Record rollout start time and show rollout latency in UI #398

benjibc · 2026-01-07T05:28:02Z

Motivation

Capture an explicit start timestamp for each rollout so per-rollout latency and trace alignment can be computed and rendered.
The frontend needs an overall latency column for each rollout to help visualize performance (an OTEL-style waterfall view was requested).
Existing created_at timestamps refer to the invocation and cannot be used to compute per-rollout start times.
Make rollout timing available in the TS schema so the UI can sort/filter by latency.

Description

Added rollout_start_time to ExecutionMetadata in eval_protocol/models.py and extended the TypeScript schema in vite-app/src/types/eval-protocol.ts with rollout_start_time, rollout_duration_seconds, and eval_duration_seconds.
Stamped rollout_start_time at the start of each rollout in the main rollout entry points, setting it before the processing timer starts, in processors such as default_single_turn_rollout_process.py, default_pydantic_ai_rollout_processor.py, remote_rollout_processor.py, github_action_rollout_processor.py, openenv_rollout_processor.py, default_klavis_sandbox_rollout_processor.py, default_agent_rollout_processor.py, tinker_rollout_processor.py, priority_scheduler.py, and mcp/execution/manager.py.
Surfaced rollout latency in the frontend by adding a sortable Rollout Latency column in vite-app/src/components/EvaluationTable.tsx and a RowRolloutDuration renderer, wired into the cell in vite-app/src/components/EvaluationRow.tsx, that displays execution_metadata.rollout_duration_seconds formatted in seconds.
Kept the existing rollout_duration_seconds assignments unchanged, so durations are still computed wherever they were before, while the new start timestamp provides groundwork for future trace alignment.
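A minimal sketch of the stamping pattern described above, in Python. It assumes simplified dataclass stand-ins for ExecutionMetadata and EvaluationRow (the real classes in eval_protocol/models.py are richer and are not reproduced here), and run_rollout is a hypothetical helper, not one of the listed processors. It only shows the ordering the PR describes: the wall-clock start time is recorded first, then a separate timer measures the duration.

```python
import time
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Callable, Optional


# Hypothetical stand-in for ExecutionMetadata; only the timing fields from this PR are modeled.
@dataclass
class ExecutionMetadata:
    rollout_start_time: Optional[datetime] = None
    rollout_duration_seconds: Optional[float] = None
    eval_duration_seconds: Optional[float] = None


# Hypothetical stand-in for EvaluationRow.
@dataclass
class EvaluationRow:
    messages: list = field(default_factory=list)
    execution_metadata: ExecutionMetadata = field(default_factory=ExecutionMetadata)


def run_rollout(row: EvaluationRow, work: Callable[[EvaluationRow], None]) -> EvaluationRow:
    """Hypothetical processor body: stamp the wall-clock start, then time the work."""
    # Timezone-aware UTC timestamp, usable for aligning rollouts on a shared timeline.
    row.execution_metadata.rollout_start_time = datetime.now(timezone.utc)
    # Monotonic timer for the duration, so wall-clock adjustments cannot skew latency.
    start = time.perf_counter()
    work(row)
    row.execution_metadata.rollout_duration_seconds = time.perf_counter() - start
    return row


# Usage: a dummy rollout that just sleeps for 10 ms.
row = run_rollout(EvaluationRow(), lambda r: time.sleep(0.01))
```

Pairing a UTC datetime (absolute position for trace alignment) with a monotonic timer (robust latency) is a common design; whether the real processors use time.perf_counter for their timer is an assumption of this sketch.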

Testing

Attempted to run linters and type checks via make pre-commit, but it failed because the pre-commit tool is not installed in the environment.
Attempted npm install in vite-app to validate frontend dependencies, but npm failed in this environment with "Cannot read properties of null (reading 'matches')".
No unit tests were run successfully in this environment, since the automated checks above did not complete.
All code changes were added and committed locally (git commit) after the edits completed successfully.

Codex Task

Note

Adds explicit rollout timing to enable accurate per-rollout latency and sorting.

Protocol: Add rollout_start_time to ExecutionMetadata plus rollout_duration_seconds and eval_duration_seconds fields; keep existing duration plumbing; serialize in models.py and TS ExecutionMetadataSchema.
Processors/manager: Set execution_metadata.rollout_start_time at rollout start and compute rollout_duration_seconds in tinker_rollout_processor.py, mcp/execution/manager.py, default_single_turn_rollout_process.py, default_pydantic_ai_rollout_processor.py, default_agent_rollout_processor.py, default_klavis_sandbox_rollout_processor.py, openenv_rollout_processor.py, remote_rollout_processor.py, github_action_rollout_processor.py, and priority_scheduler.py.
Frontend: Add sortable "Rollout Latency" column in EvaluationTable.tsx and render with RowRolloutDuration in EvaluationRow.tsx using execution_metadata.rollout_duration_seconds.
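The UI needs two things from these fields: a JSON encoding that the TS ExecutionMetadataSchema can parse, and a latency sort that copes with rows lacking a duration (for example, rows recorded before this change). The Python sketch below illustrates both under stated assumptions: timing_to_json, get_latency, sort_by_latency and format_seconds are hypothetical helpers, the two-decimal seconds format is assumed, and the actual sorting and rendering live in TypeScript in EvaluationTable.tsx and EvaluationRow.tsx.

```python
from datetime import datetime
from typing import Any, Dict, List, Optional


def timing_to_json(start: Optional[datetime], duration_s: Optional[float]) -> Dict[str, Any]:
    """Hypothetical helper: encode timing fields as JSON-friendly values.

    datetime is not JSON-serializable, so the start time becomes an ISO 8601
    string with an explicit UTC offset, which JavaScript's Date can parse.
    """
    return {
        "rollout_start_time": start.isoformat() if start is not None else None,
        "rollout_duration_seconds": duration_s,
    }


def get_latency(row: Dict[str, Any]) -> Optional[float]:
    """Read execution_metadata.rollout_duration_seconds, tolerating missing keys."""
    return row.get("execution_metadata", {}).get("rollout_duration_seconds")


def sort_by_latency(rows: List[Dict[str, Any]], descending: bool = False) -> List[Dict[str, Any]]:
    """Sort by latency in either direction, always keeping rows without a value last."""
    present = [r for r in rows if get_latency(r) is not None]
    missing = [r for r in rows if get_latency(r) is None]
    present.sort(key=get_latency, reverse=descending)
    return present + missing


def format_seconds(value: Optional[float]) -> str:
    """Render a duration in seconds with two decimals; missing values render as '-'."""
    return "-" if value is None else f"{value:.2f}s"
```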

Written by Cursor Bugbot for commit 9171bc5. This will update automatically on new commits.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 9171bc586c


chatgpt-codex-connector · 2026-01-07T05:33:20Z

eval_protocol/pytest/default_single_turn_rollout_process.py

+            if row.execution_metadata.rollout_start_time is None:
+                row.execution_metadata.rollout_start_time = datetime.now(timezone.utc)


Reset rollout_start_time on retries

In retry flows (rollout_processor_with_retry reuses the same EvaluationRow), this if ... is None guard means the timestamp is only set on the first attempt. If the first attempt fails and the row is retried, the successful attempt keeps the earlier rollout_start_time while rollout_duration_seconds reflects only the last attempt, so any latency calculation or trace alignment based on rollout_start_time will be too early by the time spent in prior retries. Consider resetting rollout_start_time at the start of each attempt (or in the retry wrapper) to keep these timings consistent.
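The reviewer's suggestion can be sketched as a retry loop that resets the start time at the top of every attempt. The stand-in classes and run_with_retries below are hypothetical and do not reproduce the signature of rollout_processor_with_retry; the point is only the ordering. Because rollout_start_time and the duration timer are both reset per attempt, they always describe the same (final, successful) attempt.

```python
import time
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Callable, Optional


@dataclass
class ExecutionMetadata:  # hypothetical stand-in
    rollout_start_time: Optional[datetime] = None
    rollout_duration_seconds: Optional[float] = None


@dataclass
class EvaluationRow:  # hypothetical stand-in
    execution_metadata: ExecutionMetadata = field(default_factory=ExecutionMetadata)


def run_with_retries(
    row: EvaluationRow,
    attempt: Callable[[EvaluationRow], None],
    max_attempts: int = 3,
) -> EvaluationRow:
    """Hypothetical retry wrapper that reuses the same row across attempts."""
    last_error: Optional[Exception] = None
    for _ in range(max_attempts):
        # No "is None" guard: each attempt overwrites the previous start time.
        row.execution_metadata.rollout_start_time = datetime.now(timezone.utc)
        start = time.perf_counter()
        try:
            attempt(row)
        except Exception as exc:
            last_error = exc
            continue
        # Duration covers only the successful attempt, matching the start time above.
        row.execution_metadata.rollout_duration_seconds = time.perf_counter() - start
        return row
    raise RuntimeError(f"rollout failed after {max_attempts} attempts") from last_error
```

If total wall-clock time including failed attempts is also wanted, it would need to be recorded separately rather than by keeping the first attempt's start time in rollout_start_time.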


Add rollout start time and latency display

9171bc5

benjibc added the codex label Jan 7, 2026, with ChatGPT Codex Connector

chatgpt-codex-connector bot reviewed Jan 7, 2026
