Add Ralph Wiggum Loop and beads infrastructure #87

javdl · 2026-01-30T10:32:51Z

Summary

Adds loop.sh and PROMPT_*.md files for Ralph Wiggum autonomous loop
Adds beads issue tracking sections to AGENTS.md (Issue Tracking, Landing the Plane, Multi-Agent Orchestration, Related Tools)
Patches flake.nix with go package and beads auto-install shellHook
Adds .gitignore entries for Ralph/beads runtime state
Adds Ralph Wiggum Loop documentation to README.md

Context

Bootstrapped from the about repo via scripts/bootstrap-ralph.sh as part of the Ralph Wiggum Loop distribution epic.

Test plan

nix develop enters shell without errors
bd --version works after shell entry
./loop.sh --help shows usage
No untracked runtime files appear in git status

Adds loop.sh, PROMPT files, AGENTS.md with beads workflow, beads shellHook in flake.nix, and .gitignore entries for runtime state.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: e37f64bcba

chatgpt-codex-connector · 2026-01-30T10:35:45Z

loop.sh

+            plan|build)
+                CONFIG[mode]="$1"
+                [[ "$1" == "plan" ]] && [[ ${CONFIG[max_iterations]} -eq 0 ]] && CONFIG[max_iterations]=3
+                shift ;;


Make plan mode default to 3 iterations

When users run ./loop.sh plan without -n, the script still keeps the global default of 500 iterations because the plan-specific override only triggers if max_iterations is already 0. This contradicts both the help text and README examples that say plan mode defaults to 3 iterations, and will make plan runs much longer than intended. This will occur any time plan mode is used without explicitly setting -n or a config override.
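One way to get the documented behaviour is to record whether -n was given and apply the plan default only after all arguments are parsed. The sketch below is a simplified stand-in, not the actual loop.sh parser: it uses plain variables (MODE, MAX_ITERATIONS, explicit_max) instead of the CONFIG array, and those names are assumptions.

```bash
# Hypothetical, simplified argument parser illustrating the fix.
parse_args() {
    MODE="build"
    MAX_ITERATIONS=500      # global default reported in the review
    explicit_max=0          # becomes 1 once -n is seen

    while [ "$#" -gt 0 ]; do
        case "$1" in
            plan|build)
                MODE="$1"
                shift ;;
            -n)
                if [ "$#" -lt 2 ]; then
                    echo "missing value for -n" >&2
                    return 2
                fi
                MAX_ITERATIONS="$2"
                explicit_max=1
                shift 2 ;;
            *)
                shift ;;
        esac
    done

    # Apply the plan default last, so "plan -n 10" and "-n 10 plan"
    # behave the same and an explicit -n always wins.
    if [ "$MODE" = "plan" ] && [ "$explicit_max" -eq 0 ]; then
        MAX_ITERATIONS=3
    fi
}
```

Deferring the override avoids depending on argument order, which the `-eq 0` check in the diff cannot do once the global default is non-zero.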

chatgpt-codex-connector · 2026-01-30T10:35:45Z

flake.nix

+          shellHook = ''
+            if ! command -v bd &> /dev/null; then
+              echo "Installing beads (bd) for AI agent task tracking..."
+              go install github.com/steveyegge/beads/cmd/bd@latest
+            fi


Ensure bd is discoverable after go install

The shellHook installs bd with go install but never adds $(go env GOPATH)/bin (or GOBIN) to PATH. On systems where GOPATH/bin is not already on PATH (common in a fresh nix develop), command -v bd will keep failing, causing the install to repeat every shell entry and bd --version to keep failing. This makes the “auto-install” ineffective unless users manually fix their PATH.
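A minimal sketch of the PATH fix, assuming it would run inside the shellHook before the `command -v bd` check. The helper name path_prepend is hypothetical; `go env GOBIN` and `go env GOPATH` are standard Go commands.

```bash
# Prepend a directory to PATH only if it is not already present.
path_prepend() {
    case ":$PATH:" in
        *":$1:"*) ;;                # already on PATH, nothing to do
        *) PATH="$1:$PATH" ;;
    esac
}

# `go env GOBIN` prints an empty line unless GOBIN is set,
# so fall back to GOPATH/bin, which is where `go install` writes by default.
if command -v go > /dev/null 2>&1; then
    go_bin="$(go env GOBIN)"
    [ -n "$go_bin" ] || go_bin="$(go env GOPATH)/bin"
    path_prepend "$go_bin"
    export PATH
fi

# The existing install step would follow here, for example:
#   command -v bd > /dev/null 2>&1 || go install github.com/steveyegge/beads/cmd/bd@latest
```

With the directory on PATH, the `command -v bd` guard succeeds on the second shell entry and the install no longer repeats.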

claude · 2026-01-30T10:35:46Z

Code Review: PR #87 - Add Ralph Wiggum Loop and beads infrastructure

Summary

This PR adds the Ralph Wiggum autonomous loop system and beads infrastructure to the project. It bootstraps automation tooling for AI-driven development workflows.

✅ Strengths

Well-documented: Excellent README documentation with clear examples and configuration options
Comprehensive implementation: The loop.sh script is feature-rich with TUI, metrics, session management, and error handling
Good integration: Proper .gitignore entries to avoid tracking runtime state
Clear separation: Runtime state (.ralph/) is separate from issue tracking (.beads/)

🔴 Critical Issues

1. Security: Dangerously Skip Permissions Flag (HIGH PRIORITY)

Location: loop.sh:839, README.md:50

This flag bypasses all permission checks and is extremely risky in an autonomous loop that runs indefinitely. The script could delete important files, make unauthorized API calls, push destructive changes, or execute arbitrary commands without oversight.

Recommendation: Remove --dangerously-skip-permissions or add extensive documentation about security implications. Consider implementing a whitelist of safe operations and add a prominent warning in the README.

2. Hardcoded Jujutsu (jj) Dependency

Locations: Throughout loop.sh (lines 703-783), PROMPT_build.md:25, PROMPT_review.md:1,6

Issue: The script assumes Jujutsu VCS is available but this project uses Git. No jj installation instructions in README or flake.nix. Functions like vcs_branch(), vcs_commit_short(), vcs_push() will fail.

Recommendation: Add Git fallback logic or detect which VCS is in use. Update flake.nix to include jj package if required. Document the requirement prominently or abstract VCS operations to support both Git and Jujutsu.
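The detection step could look roughly like the sketch below. detect_vcs is a hypothetical helper, and vcs_push here is a simplified stand-in for the function of the same name in loop.sh, not its actual body. The check only looks at the given directory, not its parents.

```bash
# Report which VCS manages the given directory: "jj", "git", or "none".
# A colocated jj repo has both .jj and .git; jj is preferred in that case.
detect_vcs() {
    dir="${1:-.}"
    if [ -d "$dir/.jj" ]; then
        echo "jj"
    elif [ -e "$dir/.git" ]; then   # .git is a file in worktrees and submodules
        echo "git"
    else
        echo "none"
    fi
}

# Hypothetical dispatch: each VCS helper branches on the detected backend.
vcs_push() {
    case "$(detect_vcs)" in
        jj)  jj git push ;;
        git) git push ;;
        *)   echo "no supported VCS found" >&2; return 1 ;;
    esac
}
```

Routing every VCS call through one detection function keeps the Git and Jujutsu paths in a single place instead of scattering checks through the script.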

3. Missing Dependency: codex CLI

Location: loop.sh:979, README.md:78-84

Issue: Review functionality requires codex CLI tool but it's not mentioned in dependencies or flake.nix. Will cause fatal errors when review is enabled (default: true).

Recommendation: Add codex installation instructions to README. Consider adding to flake.nix if available. Make review feature gracefully degrade if unavailable.

⚠️ Major Issues

4. PROMPT Files Reference Wrong Project Structure

Location: PROMPT_build.md:5, PROMPT_plan.md:4

These prompts reference src/lib/* but should be customized for this Astro/Starlight project with src/content/docs/, src/components/, src/pages/.

5. Excessive Subagent Suggestion

Location: PROMPT_build.md:13

500 parallel subagents is excessive and will hit rate limits immediately, overwhelm the system, and cause unnecessary API costs. Reduce to 10-20.

6. Missing Error Handling in ShellHook

Location: flake.nix:36-42

go install requires GOPATH/bin in PATH. No error handling if installation fails. Add PATH export and error handling.

🟡 Minor Issues

Bash Script Complexity: 1516 lines - consider breaking into modules and adding shellcheck validation
Cost Calculation Hardcoded: Pricing hardcoded for Opus but script supports multiple models
Incomplete Test Plan: All checkboxes unchecked in PR description
Documentation Inconsistencies: AGENTS.md:34 says sync with jj but should clarify it syncs beads database

🎯 Required Before Merge

Fix Jujutsu/Git incompatibility - Critical blocker
Document codex dependency or make review optional
Remove or document --dangerously-skip-permissions - Security concern
Complete test plan checkboxes
Update PROMPT files to match this project's structure

📝 Verdict

This is an ambitious and well-crafted implementation of autonomous AI development tooling. However, there are critical issues around VCS compatibility, missing dependencies, and security that must be addressed before merging.

Recommendation: Request changes. The core implementation is solid but needs adaptation to this specific project's environment and security posture.

claude · 2026-01-30T10:42:33Z

Code Review: Ralph Wiggum Loop Infrastructure (PR #87)

Summary

This PR adds the Ralph Wiggum autonomous loop infrastructure - a sophisticated bash-based automation system that orchestrates Claude Code iterations with beads issue tracking and optional AI-powered code review. This is a significant addition (1,703 additions, 0 deletions).

Overall Assessment

✅ APPROVE with recommendations

The code is well-structured and production-grade. However, there are several areas where improvements would enhance security, maintainability, and reliability.

Critical Issues (Must Fix Before Merge)

1. Command Injection Risk in loop.sh

Severity: HIGH | Location: loop.sh:825-830, loop.sh:990-995

The script pipes content to the claude and codex commands without validation: the model name from config isn't validated, the prompt file path isn't validated, and the call uses the --dangerously-skip-permissions flag.

Recommendation: Add validation for model names (opus|sonnet|haiku) and verify prompt files exist before use.
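A rough sketch of both checks. The function names are hypothetical, and the allowed model list is taken from the recommendation above rather than from loop.sh itself.

```bash
# Accept only the model names listed in the recommendation.
validate_model() {
    case "$1" in
        opus|sonnet|haiku) return 0 ;;
        *) echo "invalid model: $1" >&2; return 1 ;;
    esac
}

# Require the prompt file to exist as a regular, readable file.
validate_prompt_file() {
    if [ -f "$1" ] && [ -r "$1" ]; then
        return 0
    fi
    echo "prompt file not readable: $1" >&2
    return 1
}
```

A `case` allowlist rejects anything that is not an exact match, so a config value with shell metacharacters never reaches the command line.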

2. Webhook URL Validation Missing

Severity: MEDIUM | Location: loop.sh:662-683

The webhook URL from config is passed directly to curl without validation, so a misconfigured or malicious value could reach internal services.

Recommendation: Validate URL format and restrict to http/https schemes only.
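A minimal sketch of a scheme check, assuming a hypothetical validate_webhook_url helper called before curl. It restricts the scheme only; it does not block internal hostnames or addresses, which would need a separate allowlist.

```bash
# Return 0 only for http:// or https:// URLs with a non-empty host part
# and no whitespace anywhere in the value.
validate_webhook_url() {
    case "$1" in
        *[[:space:]]*)          return 1 ;;   # whitespace is never valid
        http:///*|https:///*)   return 1 ;;   # empty host
        http://?*|https://?*)   return 0 ;;   # allowed schemes
        *)                      return 1 ;;   # file://, ftp://, empty, etc.
    esac
}
```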

3. Config Validation Incomplete

Severity: MEDIUM | Location: loop.sh:1506-1516

Config values are loaded without type or range validation: there are no checks for numeric values (max_iterations, delay) or valid model names, and booleans are not normalized.

Recommendation: Add validate_config() function to check all config values.
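One possible shape for such a function, as a sketch only. The argument order and the helper names is_uint and normalize_bool are assumptions; the real script would read these values from its config rather than from positional arguments.

```bash
# True if the argument is a non-empty string of decimal digits.
is_uint() {
    case "$1" in
        ''|*[!0-9]*) return 1 ;;
        *) return 0 ;;
    esac
}

# Map common boolean spellings to "true" or "false"; fail on anything else.
normalize_bool() {
    case "$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')" in
        true|yes|1|on)  echo "true" ;;
        false|no|0|off) echo "false" ;;
        *) return 1 ;;
    esac
}

# Usage: validate_config MAX_ITERATIONS DELAY REVIEW
validate_config() {
    if is_uint "$1" && [ "$1" -ge 1 ]; then :; else
        echo "max_iterations must be a positive integer: $1" >&2
        return 1
    fi
    if ! is_uint "$2"; then
        echo "delay must be a non-negative integer: $2" >&2
        return 1
    fi
    if ! normalize_bool "$3" > /dev/null; then
        echo "review must be a boolean: $3" >&2
        return 1
    fi
}
```

Running the check once, right after the config is loaded, lets the loop fail fast with a clear message instead of misbehaving several iterations in.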

4. PIPESTATUS Race Condition

Severity: MEDIUM | Location: loop.sh:856-858

Exit code capture from PIPESTATUS may be unreliable: any command that runs between the pipeline and the read overwrites it.

Recommendation: Copy PIPESTATUS into an array variable on the line immediately after the pipeline, before any other command runs.
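A small illustration of the capture pattern. Here `false` and `cat` are stand-ins for the real producer and consumer commands, and the sketch does not enable `set -e` or `pipefail`, unlike loop.sh.

```bash
run_pipeline() {
    # `false` stands in for the command whose exit code matters;
    # `cat` stands in for the downstream consumer.
    false | cat

    # Copy the whole array in the very next statement. Bash resets
    # PIPESTATUS after every command, so any command placed in between
    # would replace these values with its own status.
    local -a status=("${PIPESTATUS[@]}")

    echo "producer=${status[0]} consumer=${status[1]}"
}

run_pipeline   # prints: producer=1 consumer=0
```

Copying the full array, rather than one element, keeps every stage's exit code available for later logging.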

Medium Priority Issues

5. Temp File Cleanup (LOW)

Some temp files not properly cleaned up (loop.sh:1203-1212, 973). Use trap-based cleanup with TEMP_FILES array.
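The trap-based approach could be sketched as follows. make_temp, cleanup_temp_files, and LAST_TEMP are hypothetical names; only the TEMP_FILES array comes from the suggestion above.

```bash
TEMP_FILES=()

# Create a temp file and register it for cleanup. The path is returned in
# LAST_TEMP rather than on stdout, because calling this via $(...) would run
# it in a subshell and the array update would be lost.
make_temp() {
    LAST_TEMP="$(mktemp)" || return 1
    TEMP_FILES+=("$LAST_TEMP")
}

# Remove every registered file; safe to call more than once.
cleanup_temp_files() {
    local f
    for f in "${TEMP_FILES[@]}"; do
        rm -f -- "$f"
    done
    TEMP_FILES=()
}

# Run the cleanup on normal exit and on exits caused by errors.
trap cleanup_temp_files EXIT
```

Registering files at creation time means a new temp file cannot be forgotten by a later cleanup path.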

6. JSON Parsing Performance (LOW)

Multiple jq processes spawned in session_list (40 total for 10 sessions). Parse all fields in a single jq call using the @tsv format.

7. Magic Numbers (LOW)

Hardcoded values throughout. Extract to readonly constants with descriptive names.

Documentation Issues

8. Missing Prerequisites (LOW)

Ralph Wiggum section doesn't list required tools: claude CLI + auth, codex CLI (for review), jj (Jujutsu), bd (beads), bc utility.

9. Generic Placeholder Text (LOW)

PROMPT_build.md:4 and PROMPT_plan.md:26-27 contain generic placeholders that should be customized for this Astro documentation project.

Testing Recommendations

Please verify:

Test with --no-review flag
Test with --max 1 (single iteration)
Test interrupt handling (Ctrl+C)
Test with invalid config values
Test with missing dependencies
Run shellcheck -x loop.sh and address warnings

Positive Aspects ✅

Comprehensive error handling with retry logic and rate limiting
Rich TUI with colors, progress bars, and status displays
Session persistence with resume functionality
Detailed logging and metrics for observability
Graceful shutdown with proper signal handling
Modular design with clear function separation
Good use of readonly variables for constants
UTF-8 fallbacks for cross-platform compatibility
Webhook integration for external monitoring
Innovative review-revision loop implementation

Conclusion

This is a well-crafted, production-grade bash script that brings sophisticated autonomous development capabilities to the project. The code demonstrates advanced bash techniques, proper error handling, and thoughtful UX design.

The security concerns are manageable and can be addressed with validation layers. With the recommended fixes, this will be a valuable addition to the development workflow.

Recommendation: APPROVE with required changes for security validation ✅

Great work on this implementation! 🚀

Reviewed with Claude Code following project conventions from CLAUDE.md

claude · 2026-01-30T10:49:15Z

Code Review - PR #87: Add Ralph Wiggum Loop and beads infrastructure

Overview

This PR adds the Ralph Wiggum autonomous loop infrastructure to enable AI agents to work continuously on tasks tracked by the beads issue tracker. The changes include a comprehensive bash script, prompt templates, documentation, and Nix environment setup.

✅ Strengths

Comprehensive Implementation: The loop.sh script (1,516 lines) is production-grade with extensive features including session management, metrics tracking, webhooks, and a TUI dashboard.
Well-Documented: The README addition provides clear instructions on how to use the loop system with multiple examples.
Proper Gitignore: Ralph runtime state files are appropriately gitignored to prevent repository pollution.
Nix Integration: The flake.nix changes enable automatic installation of beads, improving developer experience.
Modular Design: Separate prompt files for build, plan, and review phases promote separation of concerns.

🔴 Critical Issues

1. VCS Commands Assume Jujutsu (not Git)

Severity: High

The loop.sh script heavily uses jj (Jujutsu) commands throughout:

Lines 690-770: Functions like vcs_branch(), vcs_commit_short(), vcs_changes_summary()
Line 763: jj git push
Lines 976-977, 1151-1256: Multiple jj log and jj diff calls
PROMPT_build.md line 26: jj describe -m and jj git push
PROMPT_review.md lines 7, 21: Expects jj commands

Problem: This repository uses Git, not Jujutsu. All these commands will fail.

Evidence from PR description: "Sync with jj (run at session end)" but the repo uses standard git.

Recommendation: Either:

Add jj as a requirement to the Nix flake and document it
Or convert all jj commands to git equivalents
Or check for jj availability and fall back to git

2. Hardcoded References to External Dependencies

Severity: Medium

loop.sh line 856: References PIPESTATUS which is bash-specific but may not work reliably in all environments.

PROMPT_build.md line 12: "Use up to 500 parallel subagents" - this seems unrealistic and could cause resource exhaustion.

3. Review Phase Requires `codex` Command

Severity: Medium

Lines 966-969, 990-995 in loop.sh require the codex command which:

Is not listed as a dependency
Is not installed by the Nix flake
Will cause fatal errors if review is enabled (default: true)

Recommendation: Either add codex to dependencies or disable review by default with clear documentation.

4. AGENTS.md Has Duplicate Content

Severity: Low

Lines 63-85 duplicate the "Issue Tracking" and "Related Tools" sections that already appear at lines 5-46. This creates maintenance burden and potential inconsistencies.

5. Missing Error Handling in shellHook

Severity: Low

flake.nix lines 37-42: The shellHook attempts to install beads but:

Doesn't check if go install succeeds
Doesn't verify the installation path is in PATH
Could silently fail leaving users without bd

⚠️ Security Concerns

1. Command Injection Risk

Severity: Medium

loop.sh line 825-830: The streaming function pipes user input directly to claude without sanitization. While the script uses set -euo pipefail, there's potential for injection if prompt files contain malicious content.

2. Webhook Without Authentication

Severity: Low

Lines 662-683: Webhooks are sent without any authentication mechanism. If webhook_url is misconfigured, sensitive session data could be leaked.

Recommendation: Add support for webhook secrets/authentication headers.

🔍 Code Quality Issues

1. Inconsistent Error Handling

Some functions return error codes, others use log ERROR without returning
Exit codes are inconsistent (0, 1, 2, 3 used in run_review but not documented)

2. Magic Numbers

Line 412: 10485760 (10MB) should be a constant
Line 890: Token costs hardcoded (5/M, 5/M) - will become outdated
Line 299: Default max_iterations=500 seems arbitrary

3. Testing Gap

No tests are provided for the 1,516-line bash script. The loop.sh script has complex logic including:

State management
Token parsing
Cost calculation
Review workflows

Recommendation: At minimum, add integration tests or document manual testing procedures.

4. PROMPT Files Reference Wrong Commands

PROMPT_build.md line 26 uses jj describe which doesn't match project's git workflow
PROMPT_plan.md line 26 references "[project-specific goal]" placeholder that should be filled in

📝 Documentation Issues

1. README Example Mismatch

README.md line 166: Shows the Ralph Wiggum loop as:

while :; do cat prompt.md | claude --dangerously-skip-permissions;done

But the actual loop.sh is far more complex. This oversimplification might confuse users.

2. Missing Prerequisites

Neither README nor documentation mentions that users need:

bd (beads) installed
jj (Jujutsu) installed (if that's truly required)
codex installed (for review phase)
claude CLI installed

3. Config File Format Unclear

README shows TOML config but loop.sh's load_config() (lines 1506-1516) uses a naive parser that may not handle all TOML features correctly.

🎯 Performance Considerations

Token Cost Calculation: Lines 890-894 calculate costs but use bc which may not be available on all systems. Consider using awk or bash arithmetic.
Log Rotation: Line 412 checks log size but rotation could be more efficient with logrotate or similar.
Metrics File Growth: metrics.jsonl will grow unbounded - consider rotation or archival strategy.
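For the cost calculation, an awk-based version might look like the sketch below. estimate_cost is a hypothetical helper, and the prices are passed in as arguments (USD per million tokens) rather than hardcoded; the values in the usage line are placeholders, not real pricing.

```bash
# Usage: estimate_cost INPUT_TOKENS OUTPUT_TOKENS PRICE_IN PRICE_OUT
# Prices are in USD per million tokens. LC_ALL=C keeps the decimal
# separator a period regardless of the user's locale.
estimate_cost() {
    LC_ALL=C awk -v in_tok="$1" -v out_tok="$2" \
        -v p_in="$3" -v p_out="$4" \
        'BEGIN { printf "%.4f\n", (in_tok * p_in + out_tok * p_out) / 1000000 }'
}

estimate_cost 1000000 500000 3 15   # prints 10.5000
```

awk is part of POSIX, so this drops the bc dependency without resorting to integer-only shell arithmetic.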

✨ Suggestions for Improvement

Add a --check flag to verify all dependencies (bd, jj/git, codex, claude) before starting the loop.
Make review phase opt-in rather than opt-out until codex dependency is documented/installed.
Add set -x debug mode flag for troubleshooting.
Clean up AGENTS.md to remove duplicate sections.
Add examples of actual .ralph/config.toml files for common scenarios.
Document the review workflow more clearly - the SHIP/REVISE pattern is clever but needs more explanation.
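The first two suggestions could be combined in a preflight step along these lines. This is a sketch: check_deps, preflight, and REVIEW_ENABLED are hypothetical names, and the split between required and optional tools is an assumption based on the reviews above.

```bash
# Report every missing command on stderr; return non-zero if any is missing.
check_deps() {
    missing=0
    for cmd in "$@"; do
        if ! command -v "$cmd" > /dev/null 2>&1; then
            echo "missing dependency: $cmd" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Required tools abort the run; codex is treated as optional and only
# switches the review phase off.
preflight() {
    check_deps claude bd || return 1

    # Either VCS is acceptable in this sketch.
    if ! check_deps jj 2> /dev/null && ! check_deps git 2> /dev/null; then
        echo "missing dependency: jj or git" >&2
        return 1
    fi

    if check_deps codex 2> /dev/null; then
        REVIEW_ENABLED=true
    else
        REVIEW_ENABLED=false
        echo "codex not found; review phase disabled" >&2
    fi
}
```

Listing all missing tools in one pass, instead of stopping at the first, saves the user from repeated install-and-retry cycles.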

🧪 Test Coverage

Missing:

No unit tests for bash functions
No integration tests for the loop workflow
No tests for prompt templates
No validation that jj commands work in this repo

Recommendation: At minimum, add:

# Test that required commands exist
./loop.sh --check-deps
# Test config loading
# Test session management

🔒 Security Assessment

Overall Risk: Low-Medium

The code itself is not malicious, but:

Runs with --dangerously-skip-permissions flag (by design for autonomous operation)
Could potentially execute arbitrary commands from prompts
Webhook functionality could leak data if misconfigured
No input validation on config values

Recommendation: Add validation for:

Webhook URLs (must be HTTPS)
Model names (whitelist)
Iteration limits (sanity checks)
Epic IDs (format validation)

✅ Recommendations Summary

Before Merging:

✋ BLOCKER: Fix Jujutsu vs Git incompatibility
✋ BLOCKER: Remove duplicate content in AGENTS.md
🔧 Document codex requirement or disable review by default
📝 Clarify prerequisites in README

After Merging:

🧪 Add basic integration tests
🔒 Add webhook authentication support
📊 Improve config parsing (use proper TOML library or document limitations)
♻️ Add log rotation strategy

📊 Metrics

Files Changed: 8
Lines Added: 1,703
Lines Deleted: 0
Complexity: High (1,500+ line bash script)
Test Coverage: 0%

Conclusion

This is an ambitious and well-thought-out addition that brings autonomous AI agent capabilities to the project. The implementation quality is generally high, with good separation of concerns and extensive features. However, the critical Jujutsu/Git incompatibility must be resolved before merging, and the documentation needs to clearly state all prerequisites.

Recommendation: Request Changes - Address the Jujutsu/Git issue and documentation gaps before merging.

feat: add Ralph Wiggum Loop and beads infrastructure

e37f64b

Adds loop.sh, PROMPT files, AGENTS.md with beads workflow, beads shellHook in flake.nix, and .gitignore entries for runtime state.

fix: remove gastown references and legacy .beads/loop/ gitignore entry

1ca89a8

fix: correct default model in README (opus, not sonnet)

27e0bbf

javdl merged commit 0300fc4 into main Jan 30, 2026
4 checks passed

javdl deleted the add-ralph-wiggum-loop branch January 30, 2026 12:19
