Skip to content

Conversation

@cfvescovo
Copy link
Member

  • Bump ego-tree from 0.10.0 to 0.11.0
  • Bump html5ever from 0.36.0 to 0.37.1
  • Bump tendril from 0.4.3 to 0.5.0 (required by html5ever 0.37.1)
  • Add clone_subtree() method to TreeSink implementation

Copilot AI review requested due to automatic review settings January 23, 2026 20:36
- Bump ego-tree from 0.10.0 to 0.11.0
- Bump html5ever from 0.36.0 to 0.37.1
- Bump tendril from 0.4.3 to 0.5.0 (required by html5ever 0.37.1)
- Add clone_subtree() method to TreeSink implementation
@cfvescovo cfvescovo force-pushed the upgrade-ego-tree-html5ever branch from 540eeea to f13a4e6 Compare January 23, 2026 20:37
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR upgrades key HTML parsing dependencies to their latest versions. The main changes include updating ego-tree from 0.10.0 to 0.11.0, html5ever from 0.36.0 to 0.37.1, and tendril from 0.4.3 to 0.5.0. As a requirement of the new html5ever version, a new clone_subtree() method has been implemented for the TreeSink trait.

Changes:

  • Upgraded ego-tree, html5ever, and tendril dependencies to latest versions
  • Implemented clone_subtree() method in TreeSink implementation
  • Updated transitive dependencies (futf and mac removed, serde removed from string_cache)

Reviewed changes

Copilot reviewed 3 out of 4 changed files in this pull request and generated 2 comments.

File Description
scraper/Cargo.toml Updated dependency versions for ego-tree (0.11.0), html5ever (0.37.1), and tendril (0.5.0)
Cargo.lock Updated lock file with new dependency versions and transitive dependency changes
scraper/src/html/tree_sink.rs Added Tree import and implemented clone_subtree() method for TreeSink trait

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines 300 to 328
let this = self.0.borrow();

fn clone_node_recursive(
tree: &Tree<Node>,
node_id: NodeId,
new_tree: &mut Tree<Node>,
parent_id: Option<NodeId>,
) -> NodeId {
let node = tree.get(node_id).unwrap();
let cloned_value = node.value().clone();

let new_node_id = if let Some(parent) = parent_id {
let mut parent_node = new_tree.get_mut(parent).unwrap();
parent_node.append(cloned_value);
parent_node.last_child().unwrap().id()
} else {
new_tree.orphan(cloned_value).id()
};

for child in node.children() {
clone_node_recursive(tree, child.id(), new_tree, Some(new_node_id));
}

new_node_id
}

drop(this);
let mut this = self.0.borrow_mut();
clone_node_recursive(&this.tree.clone(), *target, &mut this.tree, None)
Copy link

Copilot AI Jan 23, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The implementation clones the entire tree just to read from it, which is extremely inefficient for large trees. A better approach would be to use a two-pass algorithm: first collect all the node values to clone in a vector, then insert them into the tree. This avoids the expensive full tree clone.

Example approach:

// First pass: collect nodes to clone
let nodes_to_clone = {
    let this = self.0.borrow();
    collect_nodes_recursive(&this.tree, *target)
};

// Second pass: insert cloned nodes
let mut this = self.0.borrow_mut();
insert_nodes_recursive(&mut this.tree, nodes_to_clone)

This would improve performance significantly for large subtrees.

Suggested change
let this = self.0.borrow();
fn clone_node_recursive(
tree: &Tree<Node>,
node_id: NodeId,
new_tree: &mut Tree<Node>,
parent_id: Option<NodeId>,
) -> NodeId {
let node = tree.get(node_id).unwrap();
let cloned_value = node.value().clone();
let new_node_id = if let Some(parent) = parent_id {
let mut parent_node = new_tree.get_mut(parent).unwrap();
parent_node.append(cloned_value);
parent_node.last_child().unwrap().id()
} else {
new_tree.orphan(cloned_value).id()
};
for child in node.children() {
clone_node_recursive(tree, child.id(), new_tree, Some(new_node_id));
}
new_node_id
}
drop(this);
let mut this = self.0.borrow_mut();
clone_node_recursive(&this.tree.clone(), *target, &mut this.tree, None)
// First pass: collect node values and their parent relationships without
// cloning the entire tree.
fn collect_nodes_recursive(
tree: &Tree<Node>,
node_id: NodeId,
parent_index: Option<usize>,
out: &mut Vec<(Option<usize>, Node)>,
) {
let node = tree.get(node_id).unwrap();
let cloned_value = node.value().clone();
let current_index = out.len();
out.push((parent_index, cloned_value));
for child in node.children() {
collect_nodes_recursive(tree, child.id(), Some(current_index), out);
}
}
// First pass under an immutable borrow: collect the subtree structure and values.
let nodes_to_clone = {
let this = self.0.borrow();
let mut nodes = Vec::new();
collect_nodes_recursive(&this.tree, *target, None, &mut nodes);
nodes
};
// Second pass under a mutable borrow: insert cloned nodes into the tree.
let mut this = self.0.borrow_mut();
let mut new_ids: Vec<NodeId> = Vec::with_capacity(nodes_to_clone.len());
for (parent_index, value) in nodes_to_clone {
let new_node_id = if let Some(parent_idx) = parent_index {
let parent_id = new_ids[parent_idx];
let mut parent_node = this.tree.get_mut(parent_id).unwrap();
parent_node.append(value);
parent_node.last_child().unwrap().id()
} else {
this.tree.orphan(value).id()
};
new_ids.push(new_node_id);
}
// The first inserted node is the root of the cloned subtree.
new_ids[0]

Copilot uses AI. Check for mistakes.
Comment on lines 299 to 329
fn clone_subtree(&self, target: &Self::Handle) -> Self::Handle {
let this = self.0.borrow();

fn clone_node_recursive(
tree: &Tree<Node>,
node_id: NodeId,
new_tree: &mut Tree<Node>,
parent_id: Option<NodeId>,
) -> NodeId {
let node = tree.get(node_id).unwrap();
let cloned_value = node.value().clone();

let new_node_id = if let Some(parent) = parent_id {
let mut parent_node = new_tree.get_mut(parent).unwrap();
parent_node.append(cloned_value);
parent_node.last_child().unwrap().id()
} else {
new_tree.orphan(cloned_value).id()
};

for child in node.children() {
clone_node_recursive(tree, child.id(), new_tree, Some(new_node_id));
}

new_node_id
}

drop(this);
let mut this = self.0.borrow_mut();
clone_node_recursive(&this.tree.clone(), *target, &mut this.tree, None)
}
Copy link

Copilot AI Jan 23, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new clone_subtree method lacks test coverage. Since the codebase has comprehensive test coverage for other TreeSink methods and HTML parsing functionality, this method should also be tested to ensure it correctly clones subtrees with various node types (elements, text, comments) and nested structures.

Copilot uses AI. Check for mistakes.
@cfvescovo cfvescovo closed this Jan 23, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants