-
Notifications
You must be signed in to change notification settings - Fork 4k
GH-48864: [C++] Support customizing more Zstd parameters #48865
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
|
|
e60e703 to
730eb94
Compare
pitrou
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @HuaHuaY . This can be a useful improvement, I've made a bunch of comments below.
730eb94 to
4ab207e
Compare
|
@pitrou Thank you for your detailed review! I've fixed these comments. Please take a look again. |
pitrou
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks mostly good! Just a few nits.
| {5, GZipFormat::DEFLATE, -12, false}, | ||
| {-992, GZipFormat::GZIP, 15, false}}; | ||
|
|
||
| template <std::derived_from<arrow::util::CodecOptions> T> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Neat, a concept 😀
| ARROW_ASSIGN_OR_RAISE(auto cctx, CreateCCtx()); | ||
| size_t ret = ZSTD_compressCCtx(cctx.get(), output_buffer, | ||
| static_cast<size_t>(output_buffer_len), input, | ||
| static_cast<size_t>(input_len), compression_level_); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Funny that the compression level is passed a second time here, after we already gave it to ZSTD_CCtx_setParameter. I suppose it doesn't matter?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Context will also be passed to ZSTDCompressor and it needs compression_level_. We can move auto ret = ZSTD_CCtx_setParameter(cctx.get(), ZSTD_c_compressionLevel, compression_level_) to MakeCompressor though I think the readability will decrease a bit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After I check the Zstd's document again, I think there is a mistake here. We need to use ZSTD_compress2 instead of ZSTD_compressCCtx because ZSTD_compressCCtx will reset other parameters. Thank you for your good attention!
/*! ZSTD_compressCCtx() :
* Same as ZSTD_compress(), using an explicit ZSTD_CCtx.
* Important : in order to mirror `ZSTD_compress()` behavior,
* this function compresses at the requested compression level,
* __ignoring any other advanced parameter__ .
* If any advanced parameter was set using the advanced API,
* they will all be reset. Only @compressionLevel remains.
*/
ZSTDLIB_API size_t ZSTD_compressCCtx(ZSTD_CCtx* cctx,
void* dst, size_t dstCapacity,
const void* src, size_t srcSize,
int compressionLevel);
/*! ZSTD_compress2() :
* Behave the same as ZSTD_compressCCtx(), but compression parameters are set using the advanced API.
* (note that this entry point doesn't even expose a compression level parameter).
* ZSTD_compress2() always starts a new frame.
* Should cctx hold data from a previously unfinished frame, everything about it is forgotten.
* - Compression parameters are pushed into CCtx before starting compression, using ZSTD_CCtx_set*()
* - The function is always blocking, returns when compression is completed.
* NOTE: Providing `dstCapacity >= ZSTD_compressBound(srcSize)` guarantees that zstd will have
* enough space to successfully compress the data, though it is possible it fails for other reasons.
* @return : compressed size written into `dst` (<= `dstCapacity),
* or an error code if it fails (which can be tested using ZSTD_isError()).
*/
ZSTDLIB_API size_t ZSTD_compress2( ZSTD_CCtx* cctx,
void* dst, size_t dstCapacity,
const void* src, size_t srcSize);There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have added a new test TEST(TestCodecMisc, ZstdLargerWindowLog) to ensure the other parameters work effectively.
685b5e1 to
dfbca3b
Compare
address review use ZSTD_compress2 instead of ZSTD_compressCCtx add a test to ensure zstd context parameters work effectively
dfbca3b to
84dd650
Compare
Rationale for this change
Support customizing more Zstd parameters to allow users to optimize the performance of parquet page compression/decompression.
What changes are included in this PR?
ZstdCodecOptionsinherited fromCodecOptions.ZSTD_CCtx/ZSTD_DCtxexplicitly inZSTDCodec.Are these changes tested?
Yes.
Are there any user-facing changes?
Yes. The new class
ZstdCodecOptionsis added.