Skip to content

Conversation

@HuaHuaY
Copy link
Contributor

@HuaHuaY HuaHuaY commented Jan 15, 2026

Rationale for this change

Support customizing more Zstd parameters to allow users to optimize the performance of parquet page compression/decompression.

What changes are included in this PR?

  1. Add a new class ZstdCodecOptions inherited from CodecOptions.
  2. Create ZSTD_CCtx/ZSTD_DCtx explicitly in ZSTDCodec.

Are these changes tested?

Yes.

Are there any user-facing changes?

Yes. The new class ZstdCodecOptions is added.

@HuaHuaY HuaHuaY requested a review from wgtmac as a code owner January 15, 2026 12:52
@github-actions
Copy link

⚠️ GitHub issue #48864 has been automatically assigned in GitHub to PR creator.

@HuaHuaY
Copy link
Contributor Author

HuaHuaY commented Jan 15, 2026

@pitrou @wgtmac @mapleFU Please take a look. This PR only uses context and does not reuse it. I create a new context every time I call Compress and Decompress methods. I guess this is the same behavior as before.

Copy link
Member

@pitrou pitrou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @HuaHuaY . This can be a useful improvement, I've made a bunch of comments below.

@github-actions github-actions bot added awaiting committer review Awaiting committer review awaiting review Awaiting review and removed awaiting review Awaiting review labels Jan 15, 2026
@github-actions github-actions bot added awaiting review Awaiting review and removed awaiting review Awaiting review awaiting committer review Awaiting committer review labels Jan 15, 2026
@HuaHuaY
Copy link
Contributor Author

HuaHuaY commented Jan 15, 2026

@pitrou Thank you for your detailed review! I've fixed these comments. Please take a look again.

Copy link
Member

@pitrou pitrou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks mostly good! Just a few nits.

{5, GZipFormat::DEFLATE, -12, false},
{-992, GZipFormat::GZIP, 15, false}};

template <std::derived_from<arrow::util::CodecOptions> T>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Neat, a concept 😀

ARROW_ASSIGN_OR_RAISE(auto cctx, CreateCCtx());
size_t ret = ZSTD_compressCCtx(cctx.get(), output_buffer,
static_cast<size_t>(output_buffer_len), input,
static_cast<size_t>(input_len), compression_level_);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Funny that the compression level is passed a second time here, after we already gave it to ZSTD_CCtx_setParameter. I suppose it doesn't matter?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Context will also be passed to ZSTDCompressor and it needs compression_level_. We can move auto ret = ZSTD_CCtx_setParameter(cctx.get(), ZSTD_c_compressionLevel, compression_level_) to MakeCompressor though I think the readability will decrease a bit.

Copy link
Contributor Author

@HuaHuaY HuaHuaY Jan 15, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After I check the Zstd's document again, I think there is a mistake here. We need to use ZSTD_compress2 instead of ZSTD_compressCCtx because ZSTD_compressCCtx will reset other parameters. Thank you for your good attention!

/*! ZSTD_compressCCtx() :
 *  Same as ZSTD_compress(), using an explicit ZSTD_CCtx.
 *  Important : in order to mirror `ZSTD_compress()` behavior,
 *  this function compresses at the requested compression level,
 *  __ignoring any other advanced parameter__ .
 *  If any advanced parameter was set using the advanced API,
 *  they will all be reset. Only @compressionLevel remains.
 */
ZSTDLIB_API size_t ZSTD_compressCCtx(ZSTD_CCtx* cctx,
                                     void* dst, size_t dstCapacity,
                               const void* src, size_t srcSize,
                                     int compressionLevel);

/*! ZSTD_compress2() :
 *  Behave the same as ZSTD_compressCCtx(), but compression parameters are set using the advanced API.
 *  (note that this entry point doesn't even expose a compression level parameter).
 *  ZSTD_compress2() always starts a new frame.
 *  Should cctx hold data from a previously unfinished frame, everything about it is forgotten.
 *  - Compression parameters are pushed into CCtx before starting compression, using ZSTD_CCtx_set*()
 *  - The function is always blocking, returns when compression is completed.
 *  NOTE: Providing `dstCapacity >= ZSTD_compressBound(srcSize)` guarantees that zstd will have
 *        enough space to successfully compress the data, though it is possible it fails for other reasons.
 * @return : compressed size written into `dst` (<= `dstCapacity),
 *           or an error code if it fails (which can be tested using ZSTD_isError()).
 */
ZSTDLIB_API size_t ZSTD_compress2( ZSTD_CCtx* cctx,
                                   void* dst, size_t dstCapacity,
                             const void* src, size_t srcSize);

Copy link
Contributor Author

@HuaHuaY HuaHuaY Jan 15, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have added a new test TEST(TestCodecMisc, ZstdLargerWindowLog) to ensure the other parameters work effectively.

@github-actions github-actions bot added awaiting committer review Awaiting committer review and removed awaiting review Awaiting review labels Jan 15, 2026
address review

use ZSTD_compress2 instead of ZSTD_compressCCtx

add a test to ensure zstd context parameters work effectively
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants