forked from ClickHouse/ClickHouse
Here's the current Project Antalya roadmap for 2025. This year the principal focus is adapting ClickHouse to use Iceberg as shared object storage and adding separation of storage and compute. All features are open source; there are no hold-backs.
Please suggest additional features and ideas in the comments on this issue. We also welcome contributions.
Performance:
- Parquet metadata cache
  - Parquet File Metadata caching implementation #586
  - Use parquet metadata cache for ParquetMetadata format as well #636
- Parquet native reader, v1
  - Add parquet bloom filters support ClickHouse/ClickHouse#62966
  - Support for Parquet page V2 on native reader ClickHouse/ClickHouse#70807
  - Boolean support for parquet native reader ClickHouse/ClickHouse#71055
  - Merge parquet bloom filter and min/max evaluation ClickHouse/ClickHouse#71383
  - Support parquet integer logical types on native reader ClickHouse/ClickHouse#72105
- Parquet native reader, v3 (upstream): Parquet reader v3 ClickHouse/ClickHouse#82789
- ListObjectsV2 cache: Cache the list objects operation on object storage using a TTL + prefix matching cache implementation #743
- Iceberg table pruning in cluster requests: Prune table in icebergCluster functions #770
- Iceberg files metadata cache (upstream): Support Iceberg Metadata Files Cache ClickHouse/ClickHouse#77156
- Iceberg partition pruning (upstream)
  - Iceberg Partition Pruning for time-related partition transforms ClickHouse/ClickHouse#72044
  - Minmax iceberg ClickHouse/ClickHouse#78242
  - Support for Iceberg partition pruning bucket transform ClickHouse/ClickHouse#79262
- RowGroup adaptive size
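The ListObjectsV2 cache item above describes a TTL + prefix-matching cache. As a rough illustration of the idea (not the Antalya implementation; class and method names here are hypothetical), a cached listing for a short prefix can answer later requests for any longer prefix under it until the entry expires:

```python
import time

class ListObjectsCache:
    """Hypothetical sketch of a TTL + prefix-matching cache for
    ListObjectsV2 results. Illustrative only, not Antalya's API."""

    def __init__(self, ttl_seconds=300):
        self.ttl = ttl_seconds
        self._entries = {}  # cached prefix -> (expiry time, list of keys)

    def put(self, prefix, keys):
        self._entries[prefix] = (time.monotonic() + self.ttl, list(keys))

    def get(self, prefix):
        # A cached listing of "data/" can answer a request for
        # "data/2025/" by filtering its keys, avoiding an S3 round trip.
        now = time.monotonic()
        for cached_prefix, (expiry, keys) in self._entries.items():
            if prefix.startswith(cached_prefix) and expiry >= now:
                return [k for k in keys if k.startswith(prefix)]
        return None  # miss: caller performs a real ListObjectsV2 call
```

The win is that repeated table scans over the same bucket prefix skip the (slow and billable) object-storage listing call while the entry is fresh.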
Swarms:
- Auto-discovery of swarm cluster nodes: Clusters autodiscovery #629
- Cache locality improvements
  - Rendezvous hashing filesystem cache #709
  - Improve cache locality #867
- Distributed object storage table engines: Distributed request to tables with Object Storage Engines #615
- Swarm query syntax: Convert functions with object_storage_cluster setting to cluster functions #712
- Swarm reliability/retries
  - Restart cluster tasks on connection lost #780
  - SYSTEM STOP SWARM MODE command for graceful shutdown: swarm node merge attempt v2 #1014
- JOIN with *Cluster table functions
  - s3cluster joins, part 1 #972
- Improve observability: Profile events for task distribution in ObjectStorageCluster requests #1172
- Swarm for writes
- Swarm for merges/optimize
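The cache locality items above mention rendezvous hashing. The technique itself is standard (highest-random-weight hashing): every node scores each key independently and the top scorer owns it, so resizing the swarm only remaps keys whose winning node disappeared. A minimal sketch, with illustrative names rather than the Antalya code:

```python
import hashlib

def rendezvous_pick(key, nodes):
    """Rendezvous (highest-random-weight) hashing sketch.

    Each node hashes (node, key) to a score; the highest score wins.
    Removing a losing node never changes a key's owner, which keeps
    per-node object-storage caches warm as the swarm scales."""
    def score(node):
        digest = hashlib.sha256(f"{node}:{key}".encode()).hexdigest()
        return int(digest, 16)
    return max(nodes, key=score)
```

With consistent routing like this, repeated queries over the same Parquet file land on the same swarm node and hit its local filesystem cache.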
Catalogs:
- Open source catalog for Kubernetes: https://github.com/Altinity/ice
- AWS S3 Tables support
  - Support S3 tables as a warehouse: Antalya 25.3: Support different warehouses behind Iceberg REST catalog #860
- Iceberg reads improvements
  - Read optimization using Iceberg metadata #1019
  - Allow to read Iceberg data from any location #1092
- TimestampTZ support
  - Iceberg Timezone for iceberg timestamptz #1103
  - Support TimestampTZ in Glue catalog (upstream) ClickHouse/ClickHouse#83132
- Unity catalog support (upstream): Unity catalog integration ClickHouse/ClickHouse#76988
- Glue catalog support (upstream): Add glue catalog integration ClickHouse/ClickHouse#77257
- AWS S3 authentication (upstream): Implement AWS S3 authentication with an explicitly provided IAM role; implement OAuth for GCS. ClickHouse/ClickHouse#84011
- Improve observability
  - More profile metrics for Iceberg, S3 and Azure #1123
  - Antalya 25.6.5: Expose IcebergS3 partition_key and sorting_key in system.tables #959
- Cloudflare R2 Data Catalog support
- Public datasets in Iceberg
Iceberg Writes:
- Toolkit for loading files into Iceberg: https://github.com/Altinity/ice
- Support partitioning
- Support ordering (see https://www.tabular.io/apache-iceberg-cookbook/data-engineering-table-write-order/)
- CREATE TABLE for the Iceberg/DataLakeCatalog database engine
- INSERT INTO Iceberg tables
- Use a MergeTree buffer for frequent inserts into Iceberg (like async inserts, but with a much bigger buffer on disk)
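The last item proposes absorbing frequent small inserts in an on-disk buffer and flushing fewer, larger files to Iceberg. The flush policy can be sketched roughly like this (class name, threshold, and callback are hypothetical, for illustration only):

```python
class InsertBuffer:
    """Illustrative buffer for small inserts: accumulate rows and
    flush one larger batch (e.g. a single Parquet file committed to
    Iceberg) once a row threshold is reached. Not the Antalya design."""

    def __init__(self, max_rows=1_000_000, flush_fn=None):
        self.max_rows = max_rows
        self.rows = []
        # flush_fn stands in for "write Parquet + commit Iceberg snapshot"
        self.flush_fn = flush_fn or (lambda batch: None)

    def insert(self, batch):
        self.rows.extend(batch)
        if len(self.rows) >= self.max_rows:
            self.flush()

    def flush(self):
        if self.rows:
            self.flush_fn(self.rows)
            self.rows = []
```

The point of buffering is that every Iceberg commit creates a new snapshot and new data files, so many tiny inserts bloat metadata; batching trades a little freshness for far fewer, larger files.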
Tiered Storage:
- Add support for hive partition style reads and writes ClickHouse/ClickHouse#76802
- Write MergeTree parts to Parquet
  - Simple export part #1009
  - Allow any partition strategy to accept part export #1083
  - Split large parquet files on part export; preserve entire settings object in part export #1229
  - Improve observability: improve observability a bit, simplify sink #1017
- Write MergeTree partitions to Parquet
  - Yet another export replicated partition PR #1124
- Hybrid table engine
  - engine=Hybrid #1071
  - engine=Hybrid improvements #1156
- TTL to another table
- Merge tables with watermark
- Backup/restore for tiered tables (extension to Altinity Backup for ClickHouse, aka clickhouse-backup)
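A hybrid table with a watermark splits each time-range scan between tiers: data older than the watermark is read from the Parquet/Iceberg tier, newer data from MergeTree. A toy sketch of that routing decision (function and tier names are invented for illustration; the real engine=Hybrid logic lives in the PRs above):

```python
def route_query(min_ts, max_ts, watermark):
    """Split a [min_ts, max_ts) time-range scan at the watermark:
    the cold Parquet tier serves the range below it, the hot
    MergeTree tier the range at or above it. Illustrative only."""
    parts = []
    if min_ts < watermark:
        parts.append(("cold_parquet", min_ts, min(max_ts, watermark)))
    if max_ts >= watermark:
        parts.append(("hot_mergetree", max(min_ts, watermark), max_ts))
    return parts
```

A query entirely on one side of the watermark touches only one tier; a straddling query fans out to both and the results are merged.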