-
Notifications
You must be signed in to change notification settings - Fork 487
[filesystem] fix the size of fluss-fs-s3 is too big #2419
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
…op version upgrades. Signed-off-by: Pei Yu <125331682@qq.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR addresses the issue of the fluss-fs-s3 jar becoming too large after Hadoop version upgrades by removing dependency on the AWS SDK v2 bundle and replacing Hadoop's NoAwsCredentialsException with a custom implementation.
Changes:
- Excluded
software.amazon.awssdk:bundlefromhadoop-awsdependency to prevent transitive inclusion of the large AWS SDK v2 bundle - Introduced
InvalidCredentialsExceptionas a replacement forNoAwsCredentialsExceptionfrom hadoop-aws - Updated credential provider classes to use the new custom exception
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| fluss-filesystems/fluss-fs-s3/pom.xml | Added exclusion for AWS SDK v2 bundle from hadoop-aws dependency |
| fluss-filesystems/fluss-fs-s3/src/main/java/org/apache/fluss/fs/s3/exception/InvalidCredentialsException.java | New custom exception class to replace NoAwsCredentialsException |
| fluss-filesystems/fluss-fs-s3/src/main/java/org/apache/fluss/fs/s3/token/DynamicTemporaryAWSCredentialsProvider.java | Updated to throw InvalidCredentialsException instead of NoAwsCredentialsException |
| fluss-filesystems/fluss-fs-s3/src/main/java/org/apache/fluss/fs/s3/token/S3DelegationTokenReceiver.java | Updated to throw InvalidCredentialsException instead of NoAwsCredentialsException |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
luoyuxia
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@sd4324530 Thanks for the pr. Left minor comments
| <groupId>org.slf4j</groupId> | ||
| <artifactId>slf4j-reload4j</artifactId> | ||
| </exclusion> | ||
| <exclusion> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
https://hadoop.apache.org/release/3.4.1.html
"We have also introduced a lean tar which is a small tar file that does not contain the AWS SDK because the size of AWS SDK is itself 500 MB. This can ease usage for non AWS users. Even AWS users can add this jar explicitly if desired."
If we use 3.4.1, the fat jar won't be included.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, this should be a better solution. I'll test it and give my conclusion later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@luoyuxia Hadoop 3.4.1 provides a "tar" file instead of the "jar" file we usually use. Simply adjusting the version number in the pom.xml file will not reduce the final package size.
I think we still need to manually exclude the fat JAR.
|
|
||
| public static final String E_NO_AWS_CREDENTIALS = "No AWS Credentials"; | ||
|
|
||
| public InvalidCredentialsException(String credentialProvider) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm curious about the class not found issue. I also check the v0.8 jar for s3
com/amazonaws/SdkClientException.class
The same with the jar build with this pr
com/amazonaws/SdkClientException.class
So, it looks to me that the same problem will also happen in v0.8, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you so much for reminding me. I checked it again carefully.
Fluss v0.8 depends on hadoop version 3.3.4.
I found that the extends chain of NoAwsCredentialsException has changed when upgrading from 3.3.4 to 3.4.0:
hadoop 3.3.4:
classDiagram
direction RL
class AmazonClientException
class CredentialInitializationException
class NoAuthWithAWSException
class NoAwsCredentialsException
AmazonClientException --|> CredentialInitializationException
CredentialInitializationException --|> NoAuthWithAWSException
NoAuthWithAWSException --|> NoAwsCredentialsException
hadoop 3.4.0:
classDiagram
direction RL
class SdkClientException
class CredentialInitializationException
class NoAuthWithAWSException
class NoAwsCredentialsException
SdkClientException--|> CredentialInitializationException
CredentialInitializationException --|> NoAuthWithAWSException
NoAuthWithAWSException --|> NoAwsCredentialsException
In Hadoop 3.3.4, AmazonClientException originates from aws-java-sdk-core-1.12.319.jar, which is only 1MB in size.
see: https://github.com/apache/fluss/blob/release-0.8/fluss-filesystems/fluss-fs-s3/pom.xml#L188-L192
In Hadoop 3.4.0, SdkClientException originates from bundle-2.23.19.jar, which is 500+MB in size.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the explantion.
Purpose
Linked issue: close #2412
Brief change log
Fixed the issue of the
fluss-fs-s3jar becoming too big due to Hadoop version upgrades.hadoop-awsversion 3.4.0:InvalidCredentialsExceptionto replaceNoAwsCredentialsException.