
@sd4324530 (Contributor) commented Jan 20, 2026

Purpose

Linked issue: close #2412

Brief change log

Fixed the fluss-fs-s3 jar becoming too large after the Hadoop version upgrade.

  1. Excluded the AWS SDK v2 bundle from the hadoop-aws 3.4.0 dependency:
<exclusion>
   <groupId>software.amazon.awssdk</groupId>
   <artifactId>bundle</artifactId>
</exclusion>
  2. Added InvalidCredentialsException to replace NoAwsCredentialsException.
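The following is a minimal sketch of what such a replacement exception could look like. It reuses the `E_NO_AWS_CREDENTIALS` constant and the `(String credentialProvider)` constructor that appear in the review snippet later in this thread. Two parts are assumptions and not the PR's actual code: the superclass (`RuntimeException`, chosen to keep the sketch dependency-free) and the `"provider: message"` format. The point of the sketch is that nothing in its type hierarchy comes from the AWS SDK v2 bundle.

```java
/**
 * Hypothetical sketch of a credentials exception whose type hierarchy contains
 * only JDK classes, so loading it never requires the AWS SDK v2 bundle.
 * The superclass and the message format are assumptions for illustration.
 */
class InvalidCredentialsException extends RuntimeException {

    /** Default message, matching the constant shown in the PR snippet. */
    public static final String E_NO_AWS_CREDENTIALS = "No AWS Credentials";

    /** Builds a message of the form "<provider>: <message>" (assumed format). */
    public InvalidCredentialsException(String credentialProvider, String message, Throwable cause) {
        super(credentialProvider + ": " + message, cause);
    }

    public InvalidCredentialsException(String credentialProvider, String message) {
        this(credentialProvider, message, null);
    }

    /** Convenience constructor: the provider has no credentials at all. */
    public InvalidCredentialsException(String credentialProvider) {
        this(credentialProvider, E_NO_AWS_CREDENTIALS);
    }

    public static void main(String[] args) {
        // Format the default message for a provider name.
        InvalidCredentialsException e =
                new InvalidCredentialsException("DynamicTemporaryAWSCredentialsProvider");
        System.out.println(e.getMessage());
    }
}
```

Callers that previously caught NoAwsCredentialsException would need to catch the new type instead. Which superclass is the right one depends on how the S3A/SDK layer handles provider failures, and the sketch does not settle that.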

Signed-off-by: Pei Yu <125331682@qq.com>

Copilot AI left a comment


Pull request overview

This PR addresses the issue of the fluss-fs-s3 jar becoming too large after Hadoop version upgrades by removing dependency on the AWS SDK v2 bundle and replacing Hadoop's NoAwsCredentialsException with a custom implementation.

Changes:

  • Excluded software.amazon.awssdk:bundle from hadoop-aws dependency to prevent transitive inclusion of the large AWS SDK v2 bundle
  • Introduced InvalidCredentialsException as a replacement for NoAwsCredentialsException from hadoop-aws
  • Updated credential provider classes to use the new custom exception

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated no comments.

File descriptions:

  • fluss-filesystems/fluss-fs-s3/pom.xml: Added exclusion for AWS SDK v2 bundle from hadoop-aws dependency
  • fluss-filesystems/fluss-fs-s3/src/main/java/org/apache/fluss/fs/s3/exception/InvalidCredentialsException.java: New custom exception class to replace NoAwsCredentialsException
  • fluss-filesystems/fluss-fs-s3/src/main/java/org/apache/fluss/fs/s3/token/DynamicTemporaryAWSCredentialsProvider.java: Updated to throw InvalidCredentialsException instead of NoAwsCredentialsException
  • fluss-filesystems/fluss-fs-s3/src/main/java/org/apache/fluss/fs/s3/token/S3DelegationTokenReceiver.java: Updated to throw InvalidCredentialsException instead of NoAwsCredentialsException


@luoyuxia (Contributor) left a comment

@sd4324530 Thanks for the PR. I left some minor comments.

<groupId>org.slf4j</groupId>
<artifactId>slf4j-reload4j</artifactId>
</exclusion>
<exclusion>
Contributor:

https://hadoop.apache.org/release/3.4.1.html
"We have also introduced a lean tar which is a small tar file that does not contain the AWS SDK because the size of AWS SDK is itself 500 MB. This can ease usage for non AWS users. Even AWS users can add this jar explicitly if desired."

If we upgrade to 3.4.1, the fat AWS SDK jar won't be included.

Contributor Author:

Thanks, that sounds like a better solution. I'll test it and report back.

Contributor Author:

@luoyuxia The lean artifact in Hadoop 3.4.1 is a "tar" distribution, not the Maven "jar" we depend on. Simply bumping the version in pom.xml does not reduce the final package size, so I think we still need to exclude the fat JAR manually.
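You can check whether an exclusion really shrinks the artifact by measuring how much of the built jar sits under the AWS SDK v2 package prefix (`software/amazon/awssdk/`). The minimal sketch below uses only `java.util.zip`. The class name `JarPrefixSize` is hypothetical, and the jar path is whatever the build produces.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

/** Hypothetical helper: reports how much of a jar sits under a given package prefix. */
final class JarPrefixSize {

    private JarPrefixSize() {}

    /** Sums the uncompressed sizes of all file entries whose name starts with {@code prefix}. */
    static long uncompressedBytesUnder(Path jar, String prefix) throws IOException {
        try (ZipFile zip = new ZipFile(jar.toFile())) {
            return zip.stream()
                    .filter(entry -> !entry.isDirectory())
                    .filter(entry -> entry.getName().startsWith(prefix))
                    .mapToLong(ZipEntry::getSize)
                    .filter(size -> size > 0) // getSize() returns -1 when the size is unknown
                    .sum();
        }
    }

    /** Counts the file entries whose name starts with {@code prefix}. */
    static long entriesUnder(Path jar, String prefix) throws IOException {
        try (ZipFile zip = new ZipFile(jar.toFile())) {
            return zip.stream()
                    .filter(entry -> !entry.isDirectory())
                    .filter(entry -> entry.getName().startsWith(prefix))
                    .count();
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: JarPrefixSize <path-to-jar>");
            return;
        }
        Path jar = Paths.get(args[0]);
        String prefix = "software/amazon/awssdk/";
        System.out.printf("%s: %d entries, %d bytes uncompressed%n",
                prefix, entriesUnder(jar, prefix), uncompressedBytesUnder(jar, prefix));
    }
}
```

Running it on a jar built without the exclusion and on one built with it shows how much of the size comes from the SDK v2 classes. The sizes are uncompressed, so they will not match the on-disk jar size.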


public static final String E_NO_AWS_CREDENTIALS = "No AWS Credentials";

public InvalidCredentialsException(String credentialProvider) {
Contributor:

I'm curious about the class-not-found issue. I also checked the v0.8 jar for S3 and found:

com/amazonaws/SdkClientException.class

The jar built with this PR contains the same class:

com/amazonaws/SdkClientException.class

So it looks to me like the same problem would also happen in v0.8, right?
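You can script the check above, which looks for a class entry inside each jar. The minimal sketch below assumes the jar paths are passed on the command line. The class name `JarClassCheck` is hypothetical. It also assumes the v2 SDK class is named `software.amazon.awssdk.core.exception.SdkClientException`, which is the AWS SDK for Java v2 package layout and does not appear in this thread. Finding an entry only shows that the class file is packaged. It does not show that every superclass in the chain can be resolved at runtime.

```java
import java.io.IOException;
import java.nio.file.Paths;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.zip.ZipFile;

/** Hypothetical helper: reports which class files are packaged in a jar. */
final class JarClassCheck {

    private JarClassCheck() {}

    /** Converts a binary class name such as "com.amazonaws.SdkClientException" to a jar entry path. */
    static String entryName(String className) {
        return className.replace('.', '/') + ".class";
    }

    /** Returns, in input order, whether each class has a matching entry in the jar. */
    static Map<String, Boolean> presence(Path jar, List<String> classNames) throws IOException {
        Map<String, Boolean> result = new LinkedHashMap<>();
        try (ZipFile zip = new ZipFile(jar.toFile())) {
            for (String name : classNames) {
                result.put(name, zip.getEntry(entryName(name)) != null);
            }
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.err.println("usage: JarClassCheck <jar>...");
            return;
        }
        List<String> classes = Arrays.asList(
                "com.amazonaws.SdkClientException", // AWS SDK v1 (aws-java-sdk-core)
                "software.amazon.awssdk.core.exception.SdkClientException"); // AWS SDK v2 (assumed name)
        for (String arg : args) {
            System.out.println(arg + " -> " + presence(Paths.get(arg), classes));
        }
    }
}
```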

@sd4324530 (Contributor Author) commented Jan 21, 2026

Thanks for pointing that out. I checked again carefully.
Fluss v0.8 depends on Hadoop 3.3.4.
I found that the inheritance chain of NoAwsCredentialsException changed between Hadoop 3.3.4 and 3.4.0:

Hadoop 3.3.4:

classDiagram
direction RL
    class AmazonClientException
    class CredentialInitializationException
    class NoAuthWithAWSException
    class NoAwsCredentialsException

    CredentialInitializationException --|> AmazonClientException
    NoAuthWithAWSException --|> CredentialInitializationException
    NoAwsCredentialsException --|> NoAuthWithAWSException

Hadoop 3.4.0:

classDiagram
direction RL
    class SdkClientException
    class CredentialInitializationException
    class NoAuthWithAWSException
    class NoAwsCredentialsException

    CredentialInitializationException --|> SdkClientException
    NoAuthWithAWSException --|> CredentialInitializationException
    NoAwsCredentialsException --|> NoAuthWithAWSException

In Hadoop 3.3.4, AmazonClientException comes from aws-java-sdk-core-1.12.319.jar, which is only 1 MB
(see: https://github.com/apache/fluss/blob/release-0.8/fluss-filesystems/fluss-fs-s3/pom.xml#L188-L192).
In Hadoop 3.4.0, SdkClientException comes from bundle-2.23.19.jar, which is over 500 MB. Once that bundle is excluded, NoAwsCredentialsException can no longer be loaded, which is why this PR replaces it with InvalidCredentialsException.
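The diagrams above trace the superclass chain of NoAwsCredentialsException. The JVM has to resolve every class in that chain before it can load the exception, so one missing superclass is enough to break it. The minimal sketch below prints a class's superclass chain through reflection and reports when a class cannot be loaded. The class name `SuperclassChain` is hypothetical. The sketch is demonstrated on JDK classes and a deliberately nonexistent name. Pointing it at the Hadoop exception requires passing that class name as an argument with the Hadoop jars on the classpath.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: prints the superclass chain of a class, or why it could not be loaded. */
final class SuperclassChain {

    private SuperclassChain() {}

    /** Returns the class itself followed by each superclass, stopping before java.lang.Object. */
    static List<String> chain(Class<?> type) {
        List<String> names = new ArrayList<>();
        for (Class<?> c = type; c != null && c != Object.class; c = c.getSuperclass()) {
            names.add(c.getName());
        }
        return names;
    }

    /** Loads a class by name and describes its chain, or the error raised while loading it. */
    static String describe(String className) {
        try {
            return String.join(" -> ", chain(Class.forName(className)));
        } catch (ClassNotFoundException | LinkageError e) {
            // ClassNotFoundException: the class itself is absent from the classpath.
            // NoClassDefFoundError (a LinkageError): a class it depends on, such as a superclass, is absent.
            return className + " cannot be loaded: " + e;
        }
    }

    public static void main(String[] args) {
        String[] names = args.length > 0 ? args
                : new String[] {"java.io.FileNotFoundException", "com.example.DoesNotExist"};
        for (String name : names) {
            System.out.println(describe(name));
        }
    }
}
```

With no arguments, the first line is `java.io.FileNotFoundException -> java.io.IOException -> java.lang.Exception -> java.lang.Throwable`. The second line reports that the hypothetical `com.example.DoesNotExist` cannot be loaded.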

Contributor:

Thank you for the explanation.


Development

Successfully merging this pull request may close these issues.

The size of fluss-fs-s3 is too big

2 participants