-
Notifications
You must be signed in to change notification settings - Fork 125
Upgrade ego-tree to 0.11.0 and html5ever to 0.37.1 #299
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
cfvescovo
commented
Jan 23, 2026
- Bump ego-tree from 0.10.0 to 0.11.0
- Bump html5ever from 0.36.0 to 0.37.1
- Bump tendril from 0.4.3 to 0.5.0 (required by html5ever 0.37.1)
- Add clone_subtree() method to TreeSink implementation
- Bump ego-tree from 0.10.0 to 0.11.0 - Bump html5ever from 0.36.0 to 0.37.1 - Bump tendril from 0.4.3 to 0.5.0 (required by html5ever 0.37.1) - Add clone_subtree() method to TreeSink implementation
540eeea to
f13a4e6
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR upgrades key HTML parsing dependencies to their latest versions. The main changes include updating ego-tree from 0.10.0 to 0.11.0, html5ever from 0.36.0 to 0.37.1, and tendril from 0.4.3 to 0.5.0. As a requirement of the new html5ever version, a new clone_subtree() method has been implemented for the TreeSink trait.
Changes:
- Upgraded ego-tree, html5ever, and tendril dependencies to latest versions
- Implemented clone_subtree() method in TreeSink implementation
- Updated transitive dependencies (futf and mac removed, serde removed from string_cache)
Reviewed changes
Copilot reviewed 3 out of 4 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| scraper/Cargo.toml | Updated dependency versions for ego-tree (0.11.0), html5ever (0.37.1), and tendril (0.5.0) |
| Cargo.lock | Updated lock file with new dependency versions and transitive dependency changes |
| scraper/src/html/tree_sink.rs | Added Tree import and implemented clone_subtree() method for TreeSink trait |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
scraper/src/html/tree_sink.rs
Outdated
| let this = self.0.borrow(); | ||
|
|
||
| fn clone_node_recursive( | ||
| tree: &Tree<Node>, | ||
| node_id: NodeId, | ||
| new_tree: &mut Tree<Node>, | ||
| parent_id: Option<NodeId>, | ||
| ) -> NodeId { | ||
| let node = tree.get(node_id).unwrap(); | ||
| let cloned_value = node.value().clone(); | ||
|
|
||
| let new_node_id = if let Some(parent) = parent_id { | ||
| let mut parent_node = new_tree.get_mut(parent).unwrap(); | ||
| parent_node.append(cloned_value); | ||
| parent_node.last_child().unwrap().id() | ||
| } else { | ||
| new_tree.orphan(cloned_value).id() | ||
| }; | ||
|
|
||
| for child in node.children() { | ||
| clone_node_recursive(tree, child.id(), new_tree, Some(new_node_id)); | ||
| } | ||
|
|
||
| new_node_id | ||
| } | ||
|
|
||
| drop(this); | ||
| let mut this = self.0.borrow_mut(); | ||
| clone_node_recursive(&this.tree.clone(), *target, &mut this.tree, None) |
Copilot
AI
Jan 23, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The implementation clones the entire tree just to read from it, which is extremely inefficient for large trees. A better approach would be to use a two-pass algorithm: first collect all the node values to clone in a vector, then insert them into the tree. This avoids the expensive full tree clone.
Example approach:
// First pass: collect nodes to clone
let nodes_to_clone = {
let this = self.0.borrow();
collect_nodes_recursive(&this.tree, *target)
};
// Second pass: insert cloned nodes
let mut this = self.0.borrow_mut();
insert_nodes_recursive(&mut this.tree, nodes_to_clone)
This would improve performance significantly for large subtrees.
| let this = self.0.borrow(); | |
| fn clone_node_recursive( | |
| tree: &Tree<Node>, | |
| node_id: NodeId, | |
| new_tree: &mut Tree<Node>, | |
| parent_id: Option<NodeId>, | |
| ) -> NodeId { | |
| let node = tree.get(node_id).unwrap(); | |
| let cloned_value = node.value().clone(); | |
| let new_node_id = if let Some(parent) = parent_id { | |
| let mut parent_node = new_tree.get_mut(parent).unwrap(); | |
| parent_node.append(cloned_value); | |
| parent_node.last_child().unwrap().id() | |
| } else { | |
| new_tree.orphan(cloned_value).id() | |
| }; | |
| for child in node.children() { | |
| clone_node_recursive(tree, child.id(), new_tree, Some(new_node_id)); | |
| } | |
| new_node_id | |
| } | |
| drop(this); | |
| let mut this = self.0.borrow_mut(); | |
| clone_node_recursive(&this.tree.clone(), *target, &mut this.tree, None) | |
| // First pass: collect node values and their parent relationships without | |
| // cloning the entire tree. | |
| fn collect_nodes_recursive( | |
| tree: &Tree<Node>, | |
| node_id: NodeId, | |
| parent_index: Option<usize>, | |
| out: &mut Vec<(Option<usize>, Node)>, | |
| ) { | |
| let node = tree.get(node_id).unwrap(); | |
| let cloned_value = node.value().clone(); | |
| let current_index = out.len(); | |
| out.push((parent_index, cloned_value)); | |
| for child in node.children() { | |
| collect_nodes_recursive(tree, child.id(), Some(current_index), out); | |
| } | |
| } | |
| // First pass under an immutable borrow: collect the subtree structure and values. | |
| let nodes_to_clone = { | |
| let this = self.0.borrow(); | |
| let mut nodes = Vec::new(); | |
| collect_nodes_recursive(&this.tree, *target, None, &mut nodes); | |
| nodes | |
| }; | |
| // Second pass under a mutable borrow: insert cloned nodes into the tree. | |
| let mut this = self.0.borrow_mut(); | |
| let mut new_ids: Vec<NodeId> = Vec::with_capacity(nodes_to_clone.len()); | |
| for (parent_index, value) in nodes_to_clone { | |
| let new_node_id = if let Some(parent_idx) = parent_index { | |
| let parent_id = new_ids[parent_idx]; | |
| let mut parent_node = this.tree.get_mut(parent_id).unwrap(); | |
| parent_node.append(value); | |
| parent_node.last_child().unwrap().id() | |
| } else { | |
| this.tree.orphan(value).id() | |
| }; | |
| new_ids.push(new_node_id); | |
| } | |
| // The first inserted node is the root of the cloned subtree. | |
| new_ids[0] |
| fn clone_subtree(&self, target: &Self::Handle) -> Self::Handle { | ||
| let this = self.0.borrow(); | ||
|
|
||
| fn clone_node_recursive( | ||
| tree: &Tree<Node>, | ||
| node_id: NodeId, | ||
| new_tree: &mut Tree<Node>, | ||
| parent_id: Option<NodeId>, | ||
| ) -> NodeId { | ||
| let node = tree.get(node_id).unwrap(); | ||
| let cloned_value = node.value().clone(); | ||
|
|
||
| let new_node_id = if let Some(parent) = parent_id { | ||
| let mut parent_node = new_tree.get_mut(parent).unwrap(); | ||
| parent_node.append(cloned_value); | ||
| parent_node.last_child().unwrap().id() | ||
| } else { | ||
| new_tree.orphan(cloned_value).id() | ||
| }; | ||
|
|
||
| for child in node.children() { | ||
| clone_node_recursive(tree, child.id(), new_tree, Some(new_node_id)); | ||
| } | ||
|
|
||
| new_node_id | ||
| } | ||
|
|
||
| drop(this); | ||
| let mut this = self.0.borrow_mut(); | ||
| clone_node_recursive(&this.tree.clone(), *target, &mut this.tree, None) | ||
| } |
Copilot
AI
Jan 23, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The new clone_subtree method lacks test coverage. Since the codebase has comprehensive test coverage for other TreeSink methods and HTML parsing functionality, this method should also be tested to ensure it correctly clones subtrees with various node types (elements, text, comments) and nested structures.