HDDS-14426. OMResponse.leaderOMNodeId should not use RaftClientMessage.serverId #9682
What changes were proposed in this pull request?
Currently, in `OMRatisHelper#getOMResponseFromRaftClientReply`, `OMResponse.leaderOMNodeId` is set to the replierId (i.e. `RaftClientReply.serverId`), which is the Raft peer that served the request.

Previously this was fine, since all Ratis requests were write requests that had to be served by the leader. However, now that Ozone supports OM follower reads, OM cannot use the replierId as `OMResponse.leaderOMNodeId`.

`Hadoop27OmTransport#submitRequest` and `Hadoop3OmTransport#submitRequest` set the next OM node ID to this OM node ID. If it is incorrectly set to a non-leader OM, a subsequent failover might wrongly go to a follower, which can add to the write latency.

Therefore, I decided to remove the failover logic from the Hadoop transport, because it causes the client to fail over twice.
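To illustrate the issue, here is a minimal sketch of the problematic pattern. The types below are simplified stand-ins, not the real Ratis/Ozone classes (the actual `RaftClientReply` and `OMResponse` live in Apache Ratis and Ozone and differ in detail):

```java
// Stand-in for Ratis's RaftClientReply: carries the ID of the peer that
// actually served the request, which may be a follower once follower reads
// are enabled.
class RaftClientReply {
    private final String serverId;
    RaftClientReply(String serverId) { this.serverId = serverId; }
    String getServerId() { return serverId; }
}

// Stand-in for the OM protobuf response.
class OMResponse {
    String leaderOMNodeId;
}

public class OMRatisHelperSketch {
    // The pre-change behavior: leaderOMNodeId mirrors the replier. This is
    // only correct when every request is guaranteed to be served by the
    // leader.
    static OMResponse fromReply(RaftClientReply reply) {
        OMResponse response = new OMResponse();
        response.leaderOMNodeId = reply.getServerId();
        return response;
    }

    public static void main(String[] args) {
        // With follower reads, a follower (say om2) may serve a read, so the
        // client would wrongly record om2 as the leader.
        OMResponse r = fromReply(new RaftClientReply("om2"));
        System.out.println(r.leaderOMNodeId);
    }
}
```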
What is the link to the Apache JIRA
https://issues.apache.org/jira/browse/HDDS-14426
How was this patch tested?
IT (Clean CI: https://github.com/ivandika3/ozone/actions/runs/21431935770)
Note:
`testOMProxyProviderFailoverToCurrentLeader` is failing because of the removal of the `setNextOmProxy` logic in the Hadoop transport. I removed it because I believe the test is incorrect. The test assumes that if we trigger `OmFailoverProxyProvider#performFailover`, the next OM request will be sent to the new current OM node ID, since `FailoverProxyProvider#getProxy` is already updated to the new OM node ID. However, this is incorrect: the OM client wraps the RPC proxy with `RetryProxy#create`, which uses `RetryInvocationHandler`. `RetryInvocationHandler` caches the current proxy in `ProxyDescriptor.proxyInfo` and only calls `FailoverProxyProvider#getProxy` when a failover event triggers `ProxyDescriptor#failover`. So even if the test changes the OM node ID to a non-leader, `RetryInvocationHandler` will keep sending requests to the leader until the leader steps down or is disconnected.

We could technically make the test pass by removing the cached `ProxyDescriptor.proxyInfo` and making `RetryInvocationHandler.ProxyDescriptor#getProxy` call `FailoverProxyProvider#getProxy` directly. However, this might affect other Hadoop client use cases where `FailoverProxyProvider#getProxy` recreates the proxy every time.
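The caching behavior described above can be sketched as follows. This is a hypothetical, simplified model: the names loosely mirror Hadoop's `RetryInvocationHandler`/`FailoverProxyProvider`, but the real classes are more involved, so treat this only as an illustration of why flipping the provider's current node does not redirect in-flight clients:

```java
import java.util.function.Supplier;

// Stands in for RetryInvocationHandler: it caches the proxy it obtained at
// construction time (analogous to ProxyDescriptor.proxyInfo) and only asks
// the provider again when an explicit failover event occurs.
class CachingInvoker {
    private final Supplier<String> proxyProvider; // ~ FailoverProxyProvider#getProxy
    private String cachedProxy;                   // ~ ProxyDescriptor.proxyInfo

    CachingInvoker(Supplier<String> proxyProvider) {
        this.proxyProvider = proxyProvider;
        this.cachedProxy = proxyProvider.get();
    }

    // Requests reuse the cached proxy; the provider is NOT consulted per call.
    String invoke() { return cachedProxy; }

    // Only an actual failover event refreshes the cache (~ ProxyDescriptor#failover).
    void failover() { cachedProxy = proxyProvider.get(); }
}

public class RetryCachingSketch {
    public static void main(String[] args) {
        String[] current = {"om1"};                      // om1 is the leader
        CachingInvoker invoker = new CachingInvoker(() -> current[0]);

        current[0] = "om2";                              // test flips the "current" node...
        System.out.println(invoker.invoke());            // ...but requests still hit om1

        invoker.failover();                              // only now is the new proxy picked up
        System.out.println(invoker.invoke());
    }
}
```

This mirrors why the test's assumption fails: updating `FailoverProxyProvider#getProxy`'s answer has no effect until something actually triggers a failover on the handler.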