Skip to content

Conversation

@MasterPtato
Copy link
Contributor

No description provided.

Copy link
Contributor Author

Warning

This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite.
Learn more


How to use the Graphite Merge Queue

Add the label merge-queue to this PR to add it to the merge queue.

You must have a Graphite account in order to use the merge queue. Sign up using this link.

An organization admin has enabled the Graphite Merge Queue in this repository.

Please do not merge from GitHub as this will restart CI on PRs being processed by the merge queue.

This stack of pull requests is managed by Graphite. Learn more about stacking.

@pkg-pr-new
Copy link

pkg-pr-new bot commented Jan 27, 2026

More templates

@rivetkit/cloudflare-workers

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/cloudflare-workers@4053

@rivetkit/db

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/db@4053

@rivetkit/framework-base

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/framework-base@4053

@rivetkit/next-js

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/next-js@4053

@rivetkit/react

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/react@4053

rivetkit

pnpm add https://pkg.pr.new/rivet-dev/rivet/rivetkit@4053

@rivetkit/sql-loader

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/sql-loader@4053

@rivetkit/virtual-websocket

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/virtual-websocket@4053

@rivetkit/engine-runner

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/engine-runner@4053

@rivetkit/engine-runner-protocol

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/engine-runner-protocol@4053

commit: e4b540a

@claude
Copy link

claude bot commented Jan 27, 2026

Pull Request Review: Fix active request calculation and add HWS metrics

Overview

This PR fixes a critical bug in active request tracking and adds new metrics for hibernating WebSocket connections.


🐛 Critical Bug Fix

Line 67 in engine/packages/pegboard/src/workflows/actor/metrics.rs:

-let mut destroy = true;
+let mut destroy = false;

Impact: The previous code incorrectly initialized destroy = true, causing the metrics workflow to terminate prematurely. The fix ensures the workflow continues until explicitly destroyed.

Good catch! This bug would have prevented proper metrics collection for long-running actors.


📊 New Feature: Hibernating WebSocket Metrics

Added tracking for hibernating WebSocket (HWS) connections:

  • Metric::WebsocketHibernate - Increments active HWS request count
  • Metric::WebsocketStopHibernate - Decrements active HWS request count
  • Metrics recorded on hibernation entry (line 738) and exit (line 770)

Well-structured - Follows the same pattern as HTTP and regular WebSocket metrics.


🔧 Metrics Recording Improvements

Parallel Execution: Changed metrics recording from sequential to parallel using tokio::join!

Benefits:

  • Reduces latency by recording metrics concurrently with request handling
  • Better error isolation

⚠️ Concern: Error messages state "likely corrupt now" when egress metrics fail after ingress succeeds. This could lead to permanently corrupted active request counts.

Suggestion: Consider implementing:

  • A reconciliation mechanism for inconsistent states
  • Sufficient logging context (request_id, actor_id, runner_id) for manual cleanup
  • A recovery workflow to detect and fix inconsistent states

🔄 Gasoline Event System Changes

VersionCheckEvent Enhancement: Changed from empty marker to data-carrying event with inner_version field.

Good: Provides better debugging info and follows the pattern of other events.

Backward Compatibility: Falls back to event version for old events that lack inner_version.

Excellent: Maintains compatibility with existing data.


🗑️ Configuration Cleanup

Removed from engine/artifacts/config-schema.json:

  • provision_users field
  • ClickHouseUser and ClickHouseUserRole definitions

Question: Confirm these are unused and not referenced elsewhere in the codebase.


📝 Minor Changes

✅ Unused parameter prefixed with underscore (engine/packages/gasoline/src/db/kv/mod.rs:624)
✅ Explicit type annotation added for clarity (engine/packages/namespace/src/workflows/metrics_exporter.rs:152)
✅ Extracted get_runner_protocol_version function (engine/packages/pegboard-gateway/src/lib.rs:887-902)


🧪 Testing Considerations

⚠️ Missing test coverage:

  1. The critical destroy = false fix needs a test
  2. HWS metrics tracking should be tested for consistency
  3. Error handling paths in metrics recording

Recommended tests:

  • Test metrics workflow lifecycle with Pause/Resume/Destroy signals
  • Test HWS metrics increment/decrement correctly
  • Test partial metrics failure scenarios

🔒 Security Considerations

✅ No security concerns. Changes are internal metrics tracking only.


⚡ Performance Considerations

Improved: Parallel metrics recording reduces request latency

⚠️ Watch: If metrics failures become common, error logging could be noisy. Consider rate-limiting.


📋 Summary

Strengths:

  • Critical bug fix for metrics collection
  • Well-structured HWS metrics following existing patterns
  • Good backward compatibility handling
  • Improved performance with parallel execution

Areas for Improvement:

  1. Add test coverage for the destroy flag fix
  2. Implement metrics consistency reconciliation
  3. Update PR description with context

Recommendations:

  1. Add test coverage for the critical bug fix
  2. Implement metrics reconciliation or add sufficient logging for manual recovery
  3. Update PR description with context and testing details
  4. Verify removed ClickHouse config fields are not referenced elsewhere

✅ Approval Status

Approve with suggestions - The critical bug fix is correct and new features are well-implemented. Adding test coverage (especially for the destroy flag fix) would significantly improve confidence.

@MasterPtato MasterPtato force-pushed the 01-27-fix_fix_active_req_caculation_add_hws_metrics branch from 0a34c06 to e4b540a Compare January 28, 2026 02:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants