Skip to content

Conversation

@godnight10061
Copy link
Contributor

@godnight10061 godnight10061 commented Jan 17, 2026

Summary

  • Add a component-shredded vortex.list layout for List/FixedSizeList columns
  • Enable nested projection/pushdown for list<struct> (e.g. projecting items.a)
  • Extend vortex.get_item and the list reader to support list<struct>

Why

  • Today list columns are written as opaque blobs, which prevents nested projections from being pushed down and breaks list-of-struct field selection.

Tests

  • cargo +nightly fmt --all --check
  • cargo clippy --locked --workspace --all-features --all-targets --exclude vortex-bench --exclude vortex-python --exclude vortex-duckdb --exclude vortex-fuzz --exclude duckdb-bench --exclude lance-bench --exclude datafusion-bench --exclude random-access-bench --exclude compress-bench -- -D warnings
  • cargo nextest run --locked --workspace --all-features --no-fail-fast --exclude vortex-bench --exclude vortex-python --exclude vortex-duckdb --exclude vortex-fuzz --exclude duckdb-bench --exclude lance-bench --exclude datafusion-bench --exclude random-access-bench --exclude compress-bench (fails locally: vortex-cuda for_::tests::test_cuda_for_decompression requires a CUDA driver)
  • cargo nextest run --locked --workspace --all-features --no-fail-fast --exclude vortex-cuda --exclude vortex-bench --exclude vortex-python --exclude vortex-duckdb --exclude vortex-fuzz --exclude duckdb-bench --exclude lance-bench --exclude datafusion-bench --exclude random-access-bench --exclude compress-bench

Compatibility

  • New files may be written with a new list layout encoding id (vortex.list). Older readers that don't know this encoding won't be able to read those files; new readers continue to read existing files.

References

@godnight10061
Copy link
Contributor Author

I wasn't able to apply labels from a fork (permission denied). Could a maintainer please add the ix label for the changelog/CI gating?

@robert3005 robert3005 added the changelog/fix A bug fix label Jan 17, 2026
Comment on lines 193 to 194
// List-of-struct field access: `$.items.a` where `items: list<struct{a,b}>`.
match input.dtype() {
Copy link
Contributor

@joseph-isaacs joseph-isaacs Jan 19, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't really want to tie the get_item to have this for_each element behaviour.
I am not sure we want get_item to pierce thought lists, we would want a this to be map_list(get_item("a", _), get_item("items", root()) I am not sure we have designed the map_list or the _ expression yet?

@joseph-isaacs
Copy link
Contributor

This is a great contribution, however we need to design the list expression more carefully

@codecov
Copy link

codecov bot commented Jan 19, 2026

Codecov Report

❌ Patch coverage is 68.05556% with 276 lines in your changes missing coverage. Please review.
✅ Project coverage is 82.83%. Comparing base (a30298b) to head (8a8803c).
⚠️ Report is 5 commits behind head on develop.

Files with missing lines Patch % Lines
vortex-array/src/expr/exprs/get_item.rs 43.69% 134 Missing ⚠️
vortex-layout/src/layouts/list/writer.rs 65.57% 63 Missing ⚠️
vortex-layout/src/layouts/list/reader.rs 80.86% 44 Missing ⚠️
vortex-layout/src/layouts/list/mod.rs 70.83% 35 Missing ⚠️

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@joseph-isaacs
Copy link
Contributor

joseph-isaacs commented Jan 19, 2026

Started a discussion: #6030, we can reopen once we find a good answer there.

Maybe you can expand on your use case there.

@joseph-isaacs joseph-isaacs marked this pull request as draft January 19, 2026 14:59
@godnight10061 godnight10061 force-pushed the fix-list-of-struct-projection branch from 8a8803c to 11f06b6 Compare January 20, 2026 02:50
row_range: &Range<u64>,
splits: &mut BTreeSet<u64>,
) -> VortexResult<()> {
splits.insert(row_range.end);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The one thing I struggled with the last time I tried to do ListLayout is that it's hard to split naturally.

Ideally you'd have a way to split based on the elements, then turn that back into row ranges...

@godnight10061 godnight10061 force-pushed the fix-list-of-struct-projection branch from d5a8576 to 87d9553 Compare January 21, 2026 02:35
@joseph-isaacs joseph-isaacs self-assigned this Jan 22, 2026
@joseph-isaacs joseph-isaacs marked this pull request as ready for review January 22, 2026 21:17
Comment on lines +47 to +57
fn propagate_nullability(parent_nullability: Nullability, field_dtype: DType) -> DType {
if matches!(
(parent_nullability, field_dtype.nullability()),
(Nullability::Nullable, Nullability::NonNullable)
) {
return field_dtype.with_nullability(Nullability::Nullable);
}

field_dtype
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

union_nullability

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we need these changes?

Comment on lines +31 to +40
fn propagate_nullability(parent_nullability: Nullability, field_dtype: DType) -> DType {
if matches!(
(parent_nullability, field_dtype.nullability()),
(Nullability::Nullable, Nullability::NonNullable)
) {
return field_dtype.with_nullability(Nullability::Nullable);
}

field_dtype
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

?

field_dtype
}

impl VTable for GetItemList {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

doc str please

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

make sure its clear this should never be seralized and relied on.

(element_dtype.as_ref(), *list_nullability, Some(*list_size))
}
_ => {
return Err(vortex_err!(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

vortex_bail


match input.dtype() {
DType::List(..) => {
let list = input.to_listview();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please use execute, I think needs to be updated too

Comment on lines +161 to +168
let elements = list.elements();
let elements = elements.to_struct();

let field = elements.field_by_name(field_name).cloned()?;
let field = match elements.dtype().nullability() {
Nullability::NonNullable => field,
Nullability::Nullable => mask(&field, &elements.validity_mask().not())?,
};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we share this between list and fsl

Comment on lines +51 to +54
pb::GetItemOpts {
path: field_name.to_string(),
}
.encode_to_vec(),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should panic with an error message

Comment on lines +199 to +207
let scan_projection = scan_projection
.optimize_recursive(vxf.dtype())
.map_err(|e| {
exec_datafusion_err!(
"Couldn't simplify the underlying Vortex scan projection: {e}"
)
})?;
let scan_dtype = scan_projection.return_dtype(vxf.dtype()).map_err(|e| {
exec_datafusion_err!("Couldn't get the dtype for the underlying Vortex scan: {e}")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do this

col: &Expression,
scope_dtype: &DType,
) -> VortexResult<Option<Expression>> {
let col = col.optimize_recursive(scope_dtype)?;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

again try?

_ => None,
})
.expect("items layout");
assert_eq!(items_layout.encoding_id().as_ref(), "vortex.list");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we do an assert_array_eq! that the result in what is expected?

//
// NOTE: `vortex.get_item_list` is a temporary list-of-struct projection expression;
// when pushing down we construct the element projection and pass it into the elements reader.
let (is_pushdown, element_expr) = if let Some(field_name) = expr.as_opt::<GetItemList>()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you also need to teach the struct layout reader partitioner about this expression

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

annotate_scope_access and (/Users/joeisaacs/git/spiraldb/vortex-4/vortex-array/src/expr/transform/partition.rs) otherwise the struct partitioning at the top level will be over cautious

Comment on lines +375 to +376
} else if expr.vtable().id().as_ref() == "vortex.select" {
(true, expr.clone())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should never be here, also use the expr id please

self.lazy_children.get(idx)
}

fn list_slice_future(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

doc str please

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

changelog/fix A bug fix

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants