impl(pubsub): retry acks on connection closed #4475
base: main
Conversation
Codecov Report

✅ All modified and coverable lines are covered by tests.

```
@@            Coverage Diff             @@
##             main    #4475      +/-   ##
==========================================
- Coverage   95.04%   95.03%   -0.01%
==========================================
  Files         195      195
  Lines        7427     7433       +6
==========================================
+ Hits         7059     7064       +5
- Misses        368      369       +1
```
```rust
impl RetryPolicy for AtLeastOnceRetryPolicy {
    fn on_error(&self, state: &RetryState, error: Error) -> RetryResult {
        if state.attempt_count == 1 && error.is_transport() {
            // ... (rest of the hunk elided)
```
Fine, but an `OnlyTransportErrors.with_attempt_limit(1)` would give you more reusable components.
Do you want to retry auth errors too?
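A sketch of what those reusable pieces might look like, using the trait shapes from the hunk above (`OnlyTransportErrors` comes from the suggestion; the `with_attempt_limit` combinator and the `RetryResult` variants are assumptions, not confirmed API):

```rust
// Hypothetical reusable policy pieces; `RetryPolicy`, `RetryState`, `Error`,
// and `RetryResult` are the types quoted in the hunk above.
#[derive(Debug)]
pub struct OnlyTransportErrors;

impl RetryPolicy for OnlyTransportErrors {
    fn on_error(&self, _state: &RetryState, error: Error) -> RetryResult {
        if error.is_transport() {
            RetryResult::Continue(error) // assumed variant: keep retrying
        } else {
            RetryResult::Permanent(error) // assumed variant: give up now
        }
    }
}

/// Caps any inner policy at a fixed number of attempts.
#[derive(Debug)]
pub struct AttemptLimit<P> {
    inner: P,
    limit: u32,
}

impl<P: RetryPolicy> RetryPolicy for AttemptLimit<P> {
    fn on_error(&self, state: &RetryState, error: Error) -> RetryResult {
        if state.attempt_count > self.limit {
            return RetryResult::Exhausted(error); // assumed variant
        }
        self.inner.on_error(state, error)
    }
}

/// Extension so any policy reads as `OnlyTransportErrors.with_attempt_limit(1)`.
pub trait RetryPolicyExt: RetryPolicy + Sized {
    fn with_attempt_limit(self, limit: u32) -> AttemptLimit<Self> {
        AttemptLimit { inner: self, limit }
    }
}

impl<P: RetryPolicy> RetryPolicyExt for P {}
```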
No, because auth errors would warrant some kind of backoff.
The problem I am trying to solve is losing Acks when a channel closes, which reliably happens every hour.
From testing, this change reduces the number of lost acks, but does not eliminate them entirely. I missed that the same ack requests can get retried on a different channel that is also closed, and receive another transport error.
So maybe I should retry these immediately up to N times, where N is the number of channels (sketched below).
Or maybe I should manually reset the channels every 55 minutes, or something.
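A sketch of the first option, assuming the number of channels is known when the policy is built (`RetryPerChannel` is a hypothetical name; trait shapes as quoted above):

```rust
// Hypothetical policy: allow one immediate retry per channel in the pool, so
// an ack that keeps landing on just-closed channels still has a chance.
#[derive(Debug)]
pub struct RetryPerChannel {
    pub num_channels: u32,
}

impl RetryPolicy for RetryPerChannel {
    fn on_error(&self, state: &RetryState, error: Error) -> RetryResult {
        if !error.is_transport() {
            // Only the "connection closed" case should be retried here.
            return RetryResult::Permanent(error);
        }
        if state.attempt_count <= self.num_channels {
            RetryResult::Continue(error)
        } else {
            RetryResult::Exhausted(error)
        }
    }
}
```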
> No, because auth errors would warrant some kind of backoff.
> The problem I am trying to solve is losing Acks when a channel closes, which reliably happens every hour.
Right. I am arguing that maybe you should retry any request that never hit the network. 🤷
> Or maybe I should manually reset the channels every 55 minutes, or something.
Maybe there is a way to keep the channels active. I think gRPC for C++ had some configs to send keep-alive messages every X seconds.
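For reference, if the transport here is tonic, keep-alive pings can be configured on the endpoint. This is a sketch under that assumption (the URL and durations are placeholders); the gRPC C-core equivalents are the `GRPC_ARG_KEEPALIVE_*` channel args:

```rust
use std::time::Duration;
use tonic::transport::{Channel, Endpoint};

// Sketch: keep channels alive with HTTP/2 PING frames so they are not torn
// down while idle. The URL and durations are placeholder values.
async fn keepalive_channel() -> Result<Channel, tonic::transport::Error> {
    Endpoint::from_static("https://pubsub.googleapis.com")
        .http2_keep_alive_interval(Duration::from_secs(30)) // send a PING every 30s
        .keep_alive_timeout(Duration::from_secs(10)) // close if a PING goes unanswered
        .keep_alive_while_idle(true) // ping even with no in-flight calls
        .connect()
        .await
}
```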
Hm, I didn't look there. I'll see if it helps. Thanks.
```rust
impl BackoffPolicy for AtLeastOnceBackoffPolicy {
    fn on_failure(&self, _state: &RetryState) -> Duration {
        // Retry immediately; safe only because the retry policy caps attempts.
        Duration::ZERO
    }
}
```
I am not sure why, but this gives me the heebie-jeebies. Works, but only because of that attempt limit.
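If the zero backoff is the worrying part, one alternative is a small fixed delay that still relies on the attempt cap for termination. A sketch (`SmallFixedBackoff` and the 10ms value are hypothetical; trait shape from the hunk above):

```rust
use std::time::Duration;

// Hypothetical variant: a token 10ms pause instead of Duration::ZERO, giving
// a freshly selected channel a moment to come up while staying near-immediate.
#[derive(Debug)]
pub struct SmallFixedBackoff;

impl BackoffPolicy for SmallFixedBackoff {
    fn on_failure(&self, _state: &RetryState) -> Duration {
        Duration::from_millis(10)
    }
}
```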
Fixes #4471
I have been running a benchmark for >1 hour with these changes, and the client reports no failed acks thus far.