forked from ClickHouse/ClickHouse
Here's the current Project Antalya roadmap for 2025. This year the principal focus is adapting ClickHouse to use Iceberg as shared object storage and adding separation of storage and compute. All features are open source; there are no hold-backs.
Please suggest additional features and ideas in the comments on this issue. We also welcome contributions.
Performance:
- Parquet metadata cache
  - Parquet File Metadata caching implementation #586
  - Use parquet metadata cache for ParquetMetadata format as well #636
- Parquet native reader, v1
  - Add parquet bloom filters support ClickHouse/ClickHouse#62966
  - Support for Parquet page V2 on native reader ClickHouse/ClickHouse#70807
  - Boolean support for parquet native reader ClickHouse/ClickHouse#71055
  - Merge parquet bloom filter and min/max evaluation ClickHouse/ClickHouse#71383
  - Support parquet integer logical types on native reader ClickHouse/ClickHouse#72105
- Parquet native reader, v3 (upstream): Parquet reader v3 ClickHouse/ClickHouse#82789
- ListObjectsV2 cache: Cache the list objects operation on object storage using a TTL + prefix matching cache implementation #743
- Iceberg table pruning in cluster requests: Prune table in icebergCluster functions #770
- Iceberg files metadata cache (upstream): Support Iceberg Metadata Files Cache ClickHouse/ClickHouse#77156
- Iceberg partition pruning (upstream)
  - Iceberg Partition Pruning for time-related partition transforms ClickHouse/ClickHouse#72044
  - Minmax iceberg ClickHouse/ClickHouse#78242
  - Support for Iceberg partition pruning bucket transform ClickHouse/ClickHouse#79262
- RowGroup adaptive size
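The ListObjectsV2 cache item above describes a TTL + prefix-matching cache. As a rough illustration of the idea (not the Antalya implementation; class and method names here are hypothetical), a cached listing for a short prefix can answer later requests for any longer prefix under it until the entry expires:

```python
import time

class ListObjectsCache:
    """Hypothetical sketch of a TTL + prefix-matching cache for
    ListObjectsV2 results. Illustrative only, not Antalya's API."""

    def __init__(self, ttl_seconds=300):
        self.ttl = ttl_seconds
        self._entries = {}  # cached prefix -> (expiry time, list of keys)

    def put(self, prefix, keys):
        self._entries[prefix] = (time.monotonic() + self.ttl, list(keys))

    def get(self, prefix):
        # A cached listing of "data/" can answer a request for
        # "data/2025/" by filtering its keys, avoiding an S3 round trip.
        now = time.monotonic()
        for cached_prefix, (expiry, keys) in self._entries.items():
            if prefix.startswith(cached_prefix) and expiry >= now:
                return [k for k in keys if k.startswith(prefix)]
        return None  # miss: caller performs a real ListObjectsV2 call
```

The win is that repeated table scans over the same bucket prefix skip the (slow and billable) object-storage listing call while the entry is fresh.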
Swarms:
- Auto-discovery of swarm cluster nodes: Clusters autodiscovery #629
- Cache locality improvements
  - Rendezvous hashing filesystem cache #709
  - Improve cache locality #867
- Distributed object storage table engines: Distributed request to tables with Object Storage Engines #615
- Swarm query syntax: Convert functions with object_storage_cluster setting to cluster functions #712
- Swarm reliability/retries
  - Restart cluster tasks on connection lost #780
  - SYSTEM STOP SWARM MODE command for graceful shutdown: swarm node merge attempt v2 #1014
- JOIN with *Cluster table functions
  - s3cluster joins, part 1 #972
- Improve observability: Profile events for task distribution in ObjectStorageCluster requests #1172
- Swarm for writes
- Swarm for merges/optimize
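The cache locality items above mention rendezvous hashing. The technique itself is standard (highest-random-weight hashing): every node scores each key independently and the top scorer owns it, so resizing the swarm only remaps keys whose winning node disappeared. A minimal sketch, with illustrative names rather than the Antalya code:

```python
import hashlib

def rendezvous_pick(key, nodes):
    """Rendezvous (highest-random-weight) hashing sketch.

    Each node hashes (node, key) to a score; the highest score wins.
    Removing a losing node never changes a key's owner, which keeps
    per-node object-storage caches warm as the swarm scales."""
    def score(node):
        digest = hashlib.sha256(f"{node}:{key}".encode()).hexdigest()
        return int(digest, 16)
    return max(nodes, key=score)
```

With consistent routing like this, repeated queries over the same Parquet file land on the same swarm node and hit its local filesystem cache.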
Catalogs:
- Open source catalog for Kubernetes: https://github.com/Altinity/ice
- AWS S3 Tables support
  - Support S3 tables as a warehouse: Antalya 25.3: Support different warehouses behind Iceberg REST catalog #860
- Iceberg reads improvements
  - Read optimization using Iceberg metadata #1019
  - Allow to read Iceberg data from any location #1092
- TimestampTZ support
  - Iceberg Timezone for iceberg timestamptz #1103
  - Support TimestampTZ in Glue catalog (upstream) ClickHouse/ClickHouse#83132
- Unity catalog support (upstream): Unity catalog integration ClickHouse/ClickHouse#76988
- Glue catalog support (upstream): Add glue catalog integration ClickHouse/ClickHouse#77257
- AWS S3 authentication (upstream): Implement AWS S3 authentication with an explicitly provided IAM role; implement OAuth for GCS. ClickHouse/ClickHouse#84011
- Improve observability
  - More profile metrics for Iceberg, S3 and Azure #1123
  - Antalya 25.6.5: Expose IcebergS3 partition_key and sorting_key in system.tables #959
- Cloudflare R2 Data Catalog support
- Public datasets in Iceberg
Iceberg Writes:
- Toolkit for loading files into Iceberg: https://github.com/Altinity/ice
- Support partitioning
- Support ordering (see https://www.tabular.io/apache-iceberg-cookbook/data-engineering-table-write-order/)
- CREATE TABLE for the Iceberg/DataLakeCatalog database engine
- INSERT INTO Iceberg tables
- Use a MergeTree buffer for frequent inserts into Iceberg (like async inserts, but with a much bigger buffer on disk)
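The last item proposes absorbing frequent small inserts in an on-disk buffer and flushing fewer, larger files to Iceberg. The flush policy can be sketched roughly like this (class name, threshold, and callback are hypothetical, for illustration only):

```python
class InsertBuffer:
    """Illustrative buffer for small inserts: accumulate rows and
    flush one larger batch (e.g. a single Parquet file committed to
    Iceberg) once a row threshold is reached. Not the Antalya design."""

    def __init__(self, max_rows=1_000_000, flush_fn=None):
        self.max_rows = max_rows
        self.rows = []
        # flush_fn stands in for "write Parquet + commit Iceberg snapshot"
        self.flush_fn = flush_fn or (lambda batch: None)

    def insert(self, batch):
        self.rows.extend(batch)
        if len(self.rows) >= self.max_rows:
            self.flush()

    def flush(self):
        if self.rows:
            self.flush_fn(self.rows)
            self.rows = []
```

The point of buffering is that every Iceberg commit creates a new snapshot and new data files, so many tiny inserts bloat metadata; batching trades a little freshness for far fewer, larger files.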
Tiered Storage:
- Add support for hive partition style reads and writes ClickHouse/ClickHouse#76802
- Write MergeTree parts to Parquet
  - Simple export part #1009
  - Allow any partition strategy to accept part export #1083
  - Split large parquet files on part export; preserve entire settings object in part export #1229
  - Improve observability: improve observability a bit, simplify sink #1017
- Write MergeTree partitions to Parquet
  - Yet another export replicated partition PR #1124
- Hybrid table engine
  - engine=Hybrid #1071
  - engine=Hybrid improvements #1156
- TTL to another table
- Merge tables with watermark
- Backup/restore for tiered tables (extension to Altinity Backup for ClickHouse, aka clickhouse-backup)
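A hybrid table with a watermark splits each time-range scan between tiers: data older than the watermark is read from the Parquet/Iceberg tier, newer data from MergeTree. A toy sketch of that routing decision (function and tier names are invented for illustration; the real engine=Hybrid logic lives in the PRs above):

```python
def route_query(min_ts, max_ts, watermark):
    """Split a [min_ts, max_ts) time-range scan at the watermark:
    the cold Parquet tier serves the range below it, the hot
    MergeTree tier the range at or above it. Illustrative only."""
    parts = []
    if min_ts < watermark:
        parts.append(("cold_parquet", min_ts, min(max_ts, watermark)))
    if max_ts >= watermark:
        parts.append(("hot_mergetree", max(min_ts, watermark), max_ts))
    return parts
```

A query entirely on one side of the watermark touches only one tier; a straddling query fans out to both and the results are merged.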