@alin-at-dfinity

Regardless of `max_polling_time`, it makes no sense to keep polling once a request's status has been `Unknown` for more than 5 minutes. By then the ingress message has expired and will never be inducted.

Without this change, and with `ic-system-test-driver` agents recently defaulting to a `max_polling_time` of 3600 seconds, tests with a 3600-second timeout fail whenever an ingress message is lost (which tends to be triggered by intentional replica restarts).
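Here is a minimal sketch of the intended polling behavior, written in plain Rust with only `std`. It is not `ic-agent`'s actual implementation. `Status`, `PollOutcome`, `poll_until_done` and `UNKNOWN_CAP` are hypothetical names, the status set is a simplified subset, and the loop advances a simulated clock instead of sleeping. The point is the extra check: if the status is still `Unknown` after 5 minutes, polling stops early, whatever `max_polling_time` says.

```rust
use std::time::Duration;

/// Hypothetical, simplified subset of request statuses.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Status {
    Unknown,
    Processing,
    Replied,
}

#[derive(Debug, PartialEq, Eq)]
enum PollOutcome {
    Replied,
    /// Still `Unknown` after `UNKNOWN_CAP`: assumed expired, never to be inducted.
    LostIngress { waited: Duration },
    /// Gave up after the caller's `max_polling_time`.
    TimedOut { waited: Duration },
}

/// Assumed upper bound on ingress expiry (5 minutes), per the PR description.
const UNKNOWN_CAP: Duration = Duration::from_secs(5 * 60);

/// Polls `status_at(elapsed)` every `interval` of simulated time.
fn poll_until_done<F>(mut status_at: F, interval: Duration, max_polling_time: Duration) -> PollOutcome
where
    F: FnMut(Duration) -> Status,
{
    assert!(interval > Duration::ZERO, "interval must be positive");
    let mut elapsed = Duration::ZERO;
    loop {
        match status_at(elapsed) {
            Status::Replied => return PollOutcome::Replied,
            // The new check: bail out on a lost message regardless of max_polling_time.
            Status::Unknown if elapsed >= UNKNOWN_CAP => {
                return PollOutcome::LostIngress { waited: elapsed };
            }
            _ => {}
        }
        if elapsed >= max_polling_time {
            return PollOutcome::TimedOut { waited: elapsed };
        }
        // A real client would sleep (with backoff); the sketch only advances a simulated clock.
        elapsed += interval;
    }
}
```

With a one-hour `max_polling_time`, a message that stays `Unknown` is reported as lost after 5 minutes. A message that was inducted and is merely slow (`Processing`) keeps being polled up to the full limit.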

Checklist:

  • The title of this PR complies with Conventional Commits.
  • I have edited the CHANGELOG accordingly.
  • I have made corresponding changes to the documentation.

alin-at-dfinity requested a review from a team as a code owner on January 27, 2026 18:44
github-merge-queue bot pushed a commit to dfinity/ic that referenced this pull request Jan 28, 2026
IMHO, if a test has to wait for one hour for a response to be produced,
then it's not a very useful test. And regardless of that, one hour
should not be the default for every single system test.

This is intended as a fix for the significant flakiness recently
experienced by
`//rs/tests/message_routing:state_sync_malicious_chunk_test`. All the
failures are timeouts, and the last relevant message before the test
times out is `Checking for subnet progress...`. After that, the test
makes a canister call that never completes and is never retried,
because the test's 3600-second timeout is reached before the call's own
3600-second polling limit runs out.

The other half of the fix (dfinity/agent-rs#693,
which will have to wait for an `ic-agent` release) is to bail out if the
status is still `Unknown` after 5 minutes, regardless of the configured
`max_polling_time`.
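The timing argument above can be spelled out with a little arithmetic. In this hypothetical sketch, the function names and the example call start time are illustrative and not taken from the test. It computes when a call whose ingress message was lost gives up, and how much of the test's budget is then left for a retry.

```rust
use std::time::Duration;

/// How long a call whose ingress message was lost keeps polling.
/// Without a cap it polls for the full `max_polling_time`; with the proposed
/// cap it stops once the status has been `Unknown` for `unknown_cap`.
fn lost_call_gives_up_after(max_polling_time: Duration, unknown_cap: Option<Duration>) -> Duration {
    match unknown_cap {
        Some(cap) => max_polling_time.min(cap),
        None => max_polling_time,
    }
}

/// Time left in the test for a retry once the lost call gives up
/// (zero if the test times out first).
fn retry_budget(test_timeout: Duration, call_started_at: Duration, gives_up_after: Duration) -> Duration {
    test_timeout.saturating_sub(call_started_at + gives_up_after)
}
```

Take a 3600-second test timeout and a call started 120 seconds in. The uncapped call would only give up at 3720 seconds, so no budget remains for a retry. The 5-minute cap leaves 3180 seconds.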