
Conversation

@dbolduc
Member

@dbolduc dbolduc commented Jan 31, 2026

Fixes #4471

I have been running a benchmark for >1 hour with these changes, and the client reports no failed acks thus far.

@product-auto-label bot added the api: pubsub label (Issues related to the Pub/Sub API) on Jan 31, 2026

codecov bot commented Jan 31, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 95.03%. Comparing base (eb74294) to head (4bcaadf).

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #4475      +/-   ##
==========================================
- Coverage   95.04%   95.03%   -0.01%     
==========================================
  Files         195      195              
  Lines        7427     7433       +6     
==========================================
+ Hits         7059     7064       +5     
- Misses        368      369       +1     


@dbolduc dbolduc marked this pull request as ready for review January 31, 2026 04:22
@dbolduc dbolduc requested a review from a team as a code owner January 31, 2026 04:22

impl RetryPolicy for AtLeastOnceRetryPolicy {
    fn on_error(&self, state: &RetryState, error: Error) -> RetryResult {
        if state.attempt_count == 1 && error.is_transport() {
Collaborator

Fine, but an OnlyTransportErrors.with_attempt_limit(1) would give you more reusable components.

Do you want to retry auth errors too?
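The reusable shape suggested above might look roughly like this. This is a sketch, not the client's actual API: RetryState, RetryResult, and Error are simplified stand-ins for the library's types, and AttemptLimit is a hypothetical decorator standing in for a with_attempt_limit combinator.

```rust
// Simplified stand-ins for the client library's retry types (hypothetical).
#[derive(Debug, PartialEq)]
enum RetryResult {
    Continue,
    Exhausted,
}

struct RetryState {
    attempt_count: u32,
}

struct Error {
    transport: bool,
}

impl Error {
    fn is_transport(&self) -> bool {
        self.transport
    }
}

trait RetryPolicy {
    fn on_error(&self, state: &RetryState, error: Error) -> RetryResult;
}

// A policy that retries transport errors only, as in the PR.
struct OnlyTransportErrors;

impl RetryPolicy for OnlyTransportErrors {
    fn on_error(&self, _state: &RetryState, error: Error) -> RetryResult {
        if error.is_transport() {
            RetryResult::Continue
        } else {
            RetryResult::Exhausted
        }
    }
}

// A reusable decorator that caps how many attempts any inner policy allows,
// giving the `OnlyTransportErrors.with_attempt_limit(1)` shape from the review.
struct AttemptLimit<P> {
    inner: P,
    limit: u32,
}

impl<P: RetryPolicy> RetryPolicy for AttemptLimit<P> {
    fn on_error(&self, state: &RetryState, error: Error) -> RetryResult {
        if state.attempt_count > self.limit {
            return RetryResult::Exhausted;
        }
        self.inner.on_error(state, error)
    }
}
```

Composing the limit as a decorator keeps the error-classification logic and the attempt bookkeeping in separate, independently testable pieces.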

Member Author

No, because auth errors would warrant some kind of backoff.

The problem I am trying to solve is losing Acks when a channel closes, which reliably happens every hour.

From testing, this change reduces the lost acks, but not entirely. I missed that the same Ack requests could get retried on a different channel that is also closed, and receive another transport error.

So maybe I should retry these immediately up to N times, where N is the number of channels.

Or maybe I should manually reset the channels every 55 minutes, or something.
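That "retry immediately, up to the channel count" idea could be sketched like this. Again a sketch with simplified stand-in types, not the client's real API; num_channels is assumed to come from the client's channel-pool configuration.

```rust
// Simplified stand-ins for the client library's retry types (hypothetical).
#[derive(Debug, PartialEq)]
enum RetryResult {
    Continue,
    Exhausted,
}

struct RetryState {
    attempt_count: u32,
}

struct Error {
    transport: bool,
}

impl Error {
    fn is_transport(&self) -> bool {
        self.transport
    }
}

// Allows one immediate retry per channel in the pool, so a retried Ack has a
// chance to land on a channel that is not also mid-shutdown.
struct PerChannelRetryPolicy {
    num_channels: u32,
}

impl PerChannelRetryPolicy {
    fn on_error(&self, state: &RetryState, error: Error) -> RetryResult {
        if state.attempt_count <= self.num_channels && error.is_transport() {
            RetryResult::Continue
        } else {
            RetryResult::Exhausted
        }
    }
}
```

Note this still races with channels closing between attempts; it only shrinks the window rather than eliminating it.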

Collaborator

No, because auth errors would warrant some kind of backoff.

The problem I am trying to solve is losing Acks when a channel closes, which reliably happens every hour.

Right. I am arguing that maybe you should retry any request that never hit the network. 🤷

Or maybe I should manually reset the channels every 55 minutes, or something.

Maybe there is a way to keep the channels active. I think gRPC for C++ has configuration options to send keep-alive messages every X seconds.
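If the Rust client's transport is tonic, the equivalent knobs would look roughly like this. This is a configuration sketch under that assumption; the endpoint URL and the interval/timeout values are illustrative, not tuned or taken from the client.

```rust
use std::time::Duration;
use tonic::transport::Endpoint;

// Sketch: configure HTTP/2 keep-alive pings so idle channels are less likely
// to be silently closed by intermediaries.
fn keepalive_endpoint() -> Endpoint {
    Endpoint::from_static("https://pubsub.googleapis.com")
        // Send a keep-alive ping every 30 seconds.
        .http2_keep_alive_interval(Duration::from_secs(30))
        // Drop the connection if a ping goes unanswered for 10 seconds.
        .keep_alive_timeout(Duration::from_secs(10))
        // Ping even when no RPCs are in flight.
        .keep_alive_while_idle(true)
}
```

Keep-alives would address the symptom (idle channels being closed) rather than the retry classification itself, so the two approaches are complementary.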

Member Author

Hm, I didn't look there. I'll see if it helps. Thanks.


impl BackoffPolicy for AtLeastOnceBackoffPolicy {
    fn on_failure(&self, _state: &RetryState) -> Duration {
        Duration::ZERO
Collaborator

I am not sure why, but this gives me the heebie-jeebies. Works, but only because of that attempt limit.


Successfully merging this pull request may close these issues.

Retry at-least-once lease operations that fail when a connection is closed