feat(datafusion): Add LIKE predicate pushdown for StartsWith patterns #2014
base: main
liurenjie1024 left a comment
Thanks @viirya for this PR!
```rust
    assert_eq!(predicate, None);
}

#[test]
```
We recently added support for sqllogictests, see https://github.com/liurenjie1024/iceberg-rust/blob/666a9fe1aaf1692583d6f44e4f7a1d52a688b217/crates/sqllogictest/testdata/schedules/df_test.toml#L19
It would be better to also include such SQL logic tests in addition to the unit tests.
Okay. Added a test in sqllogictests.
```rust
}) => {
    // Only support simple prefix patterns (e.g., 'prefix%')
    // Escape characters and case-insensitive matching are not supported yet
    if escape_char.is_some() || *case_insensitive {
```
IIRC, Iceberg's StartsWith is case-sensitive.
I updated the code comment here.
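To illustrate why ILIKE is excluded from pushdown, here is a minimal sketch that assumes StartsWith behaves like a plain case-sensitive prefix check, as stated in the thread. Both helper names are hypothetical, and lowercasing both sides is only a rough stand-in for ILIKE, not DataFusion's implementation.

```rust
/// Case-sensitive prefix test: the behavior the review thread attributes to
/// Iceberg's StartsWith (hypothetical helper).
fn starts_with_case_sensitive(value: &str, prefix: &str) -> bool {
    value.starts_with(prefix)
}

/// Rough approximation of `ILIKE 'prefix%'` (hypothetical helper): lowercase
/// both sides before comparing. Real case-insensitive matching may differ for
/// some Unicode text.
fn ilike_prefix_match(value: &str, prefix: &str) -> bool {
    value.to_lowercase().starts_with(&prefix.to_lowercase())
}

fn main() {
    for name in ["Alice", "alice", "ALBERT", "Bob"] {
        println!(
            "{name}: LIKE 'Al%' -> {}, ILIKE 'Al%' -> {}",
            starts_with_case_sensitive(name, "Al"),
            ilike_prefix_match(name, "Al"),
        );
    }
}
```

Values such as `alice` and `ALBERT` match the ILIKE version but not the case-sensitive one, so rewriting ILIKE as StartsWith would wrongly drop them at the storage layer.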
Implement pushdown for LIKE predicates with simple prefix patterns (e.g., 'prefix%') to Iceberg's StartsWith operator. This optimization allows filtering to be performed at the storage layer, significantly improving query performance for prefix searches.

Changes:
- Add support for Expr::Like in predicate conversion
- Convert LIKE 'prefix%' patterns to StartsWith operator
- Convert NOT LIKE 'prefix%' patterns to NotStartsWith operator
- Handle edge cases: empty prefix, unicode, special characters
- Reject complex patterns that cannot be pushed down (wildcards in middle, underscore, ILIKE)
- Add 8 comprehensive unit tests covering various scenarios

Implementation details:
- Only simple prefix patterns ending with % are converted
- Patterns with % or _ wildcards in the prefix are not pushed down
- Case-insensitive LIKE (ILIKE) is not supported for pushdown
- Escape characters are not supported for pushdown
- Works seamlessly with other predicates in AND/OR expressions

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
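The pattern rules in the commit message above can be sketched as a small prefix extractor. This is a minimal illustration, assuming the pattern arrives as a plain string along with the escape character and case-insensitivity flag; `like_pattern_to_prefix` is a hypothetical name, not the PR's actual code. For NOT LIKE, the same extracted prefix would feed NotStartsWith instead of StartsWith.

```rust
/// Hypothetical helper: returns the literal prefix when a LIKE pattern has the
/// simple form 'prefix%' and could map to StartsWith (or NotStartsWith for
/// NOT LIKE). Returns None when the predicate should stay in the engine filter.
fn like_pattern_to_prefix(
    pattern: &str,
    escape_char: Option<char>,
    case_insensitive: bool,
) -> Option<&str> {
    // ESCAPE clauses and ILIKE are not pushed down.
    if escape_char.is_some() || case_insensitive {
        return None;
    }
    // The pattern must end with '%'; strip exactly one trailing '%'.
    let prefix = pattern.strip_suffix('%')?;
    // Any remaining '%' or '_' wildcard means this is not a plain prefix match.
    if prefix.contains(|c: char| c == '%' || c == '_') {
        return None;
    }
    Some(prefix)
}

fn main() {
    let cases = [
        ("prefix%", None, false),
        ("pre%fix%", None, false),
        ("pre_fix%", None, false),
        ("prefix", None, false),
        ("prefix%", None, true),
        ("prefix%", Some('\\'), false),
        ("日本%", None, false),
        ("%", None, false),
    ];
    for (pattern, escape, case_insensitive) in cases {
        let prefix = like_pattern_to_prefix(pattern, escape, case_insensitive);
        println!("{pattern:?} -> {prefix:?}");
    }
}
```

In this sketch a bare `'%'` yields an empty prefix, which would match every non-null value; the commit lists the empty prefix as a handled edge case, but how the PR treats it is not reproduced here.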
… clarify case sensitivity

Address review comments on PR apache#2014:
1. Add comprehensive sqllogictest for LIKE predicate pushdown
2. Clarify that Iceberg's StartsWith operator is case-sensitive

Changes:
- Add like_predicate_pushdown.slt test file with comprehensive test coverage:
  * Verify LIKE 'prefix%' and NOT LIKE 'prefix%' predicates push down to IcebergTableScan
  * Test case-sensitive behavior (e.g., 'Al%' matches 'Alice' but not 'alice')
  * Test empty prefix, single character prefix
  * Test LIKE combined with other predicates (AND/OR expressions)
- Add test to df_test.toml schedule
- Update code comments to explicitly note that Iceberg's StartsWith is case-sensitive, explaining why ILIKE cannot be pushed down

The sqllogictests verify that:
- LIKE predicates are correctly converted to StartsWith/NotStartsWith
- Case sensitivity is preserved (Iceberg StartsWith is case-sensitive)
- Predicates are properly pushed down to the storage layer
- Complex patterns that cannot be pushed down remain in FilterExec

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
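The commit mentions LIKE combined with other predicates in AND/OR expressions, and unconvertible patterns remaining in FilterExec. The sketch below shows one common strategy for that combination, assuming the engine re-applies the full filter above the scan. The `Expr` and `Predicate` enums are toy, hypothetical types (not DataFusion's or iceberg-rust's), and whether iceberg-rust's converter treats AND exactly this way is an assumption. The toy has no NOT node; partially converting an AND underneath a NOT would not be safe.

```rust
/// Toy engine-side expression (hypothetical; not DataFusion's Expr).
#[allow(dead_code)]
enum Expr {
    Like { column: String, pattern: String, negated: bool },
    Other(String), // any expression the converter cannot translate
    And(Box<Expr>, Box<Expr>),
    Or(Box<Expr>, Box<Expr>),
}

/// Toy storage-side predicate (hypothetical; not iceberg-rust's Predicate).
#[derive(Debug, PartialEq)]
enum Predicate {
    StartsWith(String, String),
    NotStartsWith(String, String),
    And(Box<Predicate>, Box<Predicate>),
    Or(Box<Predicate>, Box<Predicate>),
}

fn convert(expr: &Expr) -> Option<Predicate> {
    match expr {
        Expr::Like { column, pattern, negated } => {
            // Simplified: only 'prefix%' with no other wildcards converts.
            let prefix = pattern.strip_suffix('%')?;
            if prefix.contains(|c: char| c == '%' || c == '_') {
                return None;
            }
            let (col, p) = (column.clone(), prefix.to_string());
            Some(if *negated {
                Predicate::NotStartsWith(col, p)
            } else {
                Predicate::StartsWith(col, p)
            })
        }
        Expr::Other(_) => None,
        // AND: keeping only the convertible side still selects a superset of
        // the matching rows, which is safe if the full filter runs afterwards.
        Expr::And(l, r) => match (convert(l), convert(r)) {
            (Some(a), Some(b)) => Some(Predicate::And(Box::new(a), Box::new(b))),
            (Some(a), None) | (None, Some(a)) => Some(a),
            (None, None) => None,
        },
        // OR: dropping either side could discard rows matched only by that
        // side, so both sides must convert.
        Expr::Or(l, r) => Some(Predicate::Or(Box::new(convert(l)?), Box::new(convert(r)?))),
    }
}

fn like(pattern: &str) -> Expr {
    Expr::Like { column: "name".into(), pattern: pattern.into(), negated: false }
}

fn main() {
    let other = || Box::new(Expr::Other("length(name) > 3".into()));
    println!("{:?}", convert(&Expr::And(Box::new(like("Al%")), other())));
    println!("{:?}", convert(&Expr::Or(Box::new(like("Al%")), other())));
}
```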
99e1b6a to 09d52d6
Implement pushdown for LIKE predicates with simple prefix patterns (e.g., 'prefix%') to Iceberg's StartsWith operator. This optimization allows filtering to be performed at the storage layer, significantly improving query performance for prefix searches.
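Iceberg manifests record per-column lower and upper bounds for each data file. One way a pushed-down StartsWith can reduce work at the storage layer is to skip files whose bounds rule out the prefix. The following is a simplified sketch of such a check. `file_may_contain_prefix` is a hypothetical name, and the sketch assumes the bounds are valid inclusive string bounds compared byte-wise, ignoring null counts and missing statistics. It is not iceberg-rust's actual metrics evaluator.

```rust
/// Simplified sketch: given a data file's lower and upper bounds for a string
/// column, decide whether the file might contain a value starting with
/// `prefix`. Returns false only when the bounds rule that out, so the file can
/// be skipped. Ignores nulls and missing statistics.
fn file_may_contain_prefix(lower: &str, upper: &str, prefix: &str) -> bool {
    let p = prefix.as_bytes();
    // Compare only the leading bytes of each bound, up to the prefix length.
    let lo = &lower.as_bytes()[..lower.len().min(p.len())];
    let hi = &upper.as_bytes()[..upper.len().min(p.len())];
    // Every value is >= lower: if lower's leading bytes already sort after
    // the prefix, no value in the file can start with it.
    if lo > p {
        return false;
    }
    // Every value is <= upper: if upper's leading bytes sort before the
    // prefix, no value in the file can start with it.
    if hi < p {
        return false;
    }
    true
}

fn main() {
    let files = [("Aaron", "Alyssa"), ("Bob", "Carol"), ("Aaron", "Ada"), ("alice", "bob")];
    for (lower, upper) in files {
        let keep = file_may_contain_prefix(lower, upper, "Al");
        println!("[{lower}, {upper}] may contain 'Al%': {keep}");
    }
}
```

Because the comparison is byte-wise, the file bounded by `alice` and `bob` is skipped for `'Al%'`, which is consistent with StartsWith being case-sensitive.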
Changes:
Implementation details:
Which issue does this PR close?
What changes are included in this PR?
Are these changes tested?