Implement rules system v2 with markdown format #56

nhorton · 2026-01-16T17:56:56Z

Summary

Complete implementation of the rules system (renamed from "policy") with v2 markdown format:

- Renamed policy → rules throughout codebase for clarity
- V2 format: Individual .deepwork/rules/*.md files with YAML frontmatter instead of single .deepwork.rules.yml
- Hook fixes: Exit code 0 for JSON format hooks (blocking via {"decision": "block"} not exit code)
- Test coverage: Comprehensive tests with critical contract warnings
- Test consolidation: Merged hook test files into single test_hooks.py

Rules v2 Format

---
name: Rule Name
trigger: "**/*.py"
compare_to: base  # or: default_tip, prompt
---
Instructions for the agent when this rule triggers.
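
To make the file layout concrete, here is a minimal sketch of how such a rule file could be split into frontmatter fields and a markdown body. It is not the project's parser: `split_frontmatter` and `EXAMPLE` are hypothetical names, and the naive `key: value` handling is an assumption that only covers flat string fields like the ones shown above (a real implementation would presumably use a YAML library).

```python
EXAMPLE = """---
name: Rule Name
trigger: "**/*.py"
compare_to: base  # or: default_tip, prompt
---
Instructions for the agent when this rule triggers.
"""


def split_frontmatter(text: str) -> tuple[dict[str, str], str]:
    """Split a rule file into (frontmatter fields, markdown body).

    Hypothetical helper: handles only flat `key: value` lines.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("rule file must start with a '---' line")
    # Find the second '---' that closes the frontmatter.
    end = None
    for index, line in enumerate(lines[1:], start=1):
        if line.strip() == "---":
            end = index
            break
    if end is None:
        raise ValueError("frontmatter is not closed by a second '---'")

    fields: dict[str, str] = {}
    for line in lines[1:end]:
        line = line.split(" #", 1)[0].strip()  # naive trailing-comment strip
        if not line:
            continue
        key, _, value = line.partition(":")
        fields[key.strip()] = value.strip().strip('"')

    body = "\n".join(lines[end + 1:]).strip()
    return fields, body


fields, body = split_frontmatter(EXAMPLE)
print(fields)
print(body)
```

The frontmatter carries configuration and the body carries the instructions, so the two halves can be validated separately.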

Key Changes

| Before | After |
| --- | --- |
| `.deepwork.rules.yml` (single file) | `.deepwork/rules/*.md` (individual files) |
| `policy` terminology | `rules` terminology |
| Exit code 2 for blocking | Exit code 0 + JSON `{"decision": "block"}` |

Test plan

- All 470 tests pass
- Rules trigger correctly with compare_to: prompt mode
- Hook JSON format follows Claude Code contract
- Exit codes verified against documentation

🤖 Generated with Claude Code

doc/policy_syntax.md

doc/policy_system_design.md

Design docs for next-generation policy system with:
- File correspondence matching (sets and pairs)
- Idempotent command execution
- Queue-based state tracking with detector/evaluator pattern
- Folder-based policy storage using frontmatter markdown files

Key changes from current system:
- Policies move from single .deepwork.policy.yml to .deepwork/policies/*.md
- YAML frontmatter for config, markdown body for instructions
- New 'set' syntax for bidirectional file relationships
- New 'pair' syntax for directional file relationships
- New 'action' field for running commands instead of prompts
- Queue system prevents duplicate policy triggers across sessions
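
The difference between 'set' (bidirectional) and 'pair' (directional) correspondence can be sketched as follows. This is an illustration under stated assumptions, not the design's actual code: the function names are hypothetical, each pattern is assumed to contain exactly one `{path}` variable, and "missing counterpart" is assumed to mean that the corresponding file is absent from the set of changed files.

```python
import re


def _to_regex(pattern: str) -> re.Pattern:
    """Turn a pattern with exactly one {path} variable into a regex."""
    head, _, tail = pattern.partition("{path}")
    return re.compile(re.escape(head) + r"(?P<path>.+)" + re.escape(tail))


def missing_counterparts(changed: set[str], source: str, target: str) -> dict[str, str]:
    """Directional check: each changed `source` file expects its `target`."""
    regex = _to_regex(source)
    missing: dict[str, str] = {}
    for file_path in sorted(changed):
        match = regex.fullmatch(file_path)
        if match is None:
            continue
        expected = target.replace("{path}", match["path"])
        if expected not in changed:
            missing[file_path] = expected
    return missing


def check_pair(changed: set[str], source: str, target: str) -> dict[str, str]:
    # Pair mode: only source -> target is checked.
    return missing_counterparts(changed, source, target)


def check_set(changed: set[str], a: str, b: str) -> dict[str, str]:
    # Set mode: the relationship is checked in both directions.
    return {**missing_counterparts(changed, a, b),
            **missing_counterparts(changed, b, a)}


changed = {"src/pkg/mod.py", "tests/other_test.py", "README.md"}
print(check_pair(changed, "src/{path}.py", "tests/{path}_test.py"))
print(check_set(changed, "src/{path}.py", "tests/{path}_test.py"))
```

In this sketch a set is simply a pair evaluated in both directions, which is one plausible reading of "bidirectional".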

Key changes:
- Restructure taxonomy: detection modes (trigger/safety, set, pair) + action types (prompt, command)
- Add required `name` field for human-friendly promise tag display (e.g., "✓ Source/Test Pairing")
- Remove priority and defer features (not needed yet)
- Clarify .deepwork/tmp is gitignored, so cleanup is not critical
- Shorten output format: group by policy name, use simple arrow notation for correspondence
- Update all examples to include name field

- Don't enforce idempotency, just document it as expected behavior
- Give lint formatters (black, ruff, prettier) as good examples
- Remove output_mode from config (not referenced elsewhere)
- Remove idempotency verification test scenarios

This implements the redesigned policy system with:
- Detection modes: trigger/safety (default), set (bidirectional), pair (directional)
- Action types: prompt (show instructions), command (run idempotent command)
- Variable pattern matching: {path} for multi-segment, {name} for single-segment
- Queue system in .deepwork/tmp/policy/queue/ for state tracking
- Frontmatter markdown format for policy files in .deepwork/policies/

New core modules:
- pattern_matcher.py: Variable pattern matching with regex
- policy_queue.py: Queue system for policy state persistence
- command_executor.py: Command action execution with substitution

Updates to existing modules:
- policy_parser.py: v2 Policy class with detection modes and action types
- policy_check.py: Uses new v2 system with queue deduplication
- evaluate_policies.py: Updated for v1 backward compatibility
- policy_schema.py: New frontmatter schema for v2 format

Tests updated to work with both v1 and v2 APIs.
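
The `{path}` versus `{name}` distinction can be illustrated with a small regex-based sketch. This is not the contents of pattern_matcher.py: `compile_pattern` and `match_variables` are hypothetical names, glob syntax is not handled, and each variable is assumed to appear at most once per pattern.

```python
import re
from typing import Optional


def compile_pattern(pattern: str) -> re.Pattern:
    """Compile a variable pattern into a regex (illustrative only)."""
    parts = []
    for token in re.split(r"(\{path\}|\{name\})", pattern):
        if token == "{path}":
            parts.append(r"(?P<path>.+)")      # may span several segments
        elif token == "{name}":
            parts.append(r"(?P<name>[^/]+)")   # exactly one segment, no '/'
        else:
            parts.append(re.escape(token))     # literal text
    return re.compile("".join(parts))


def match_variables(pattern: str, file_path: str) -> Optional[dict]:
    """Return the extracted variables, or None if the path does not match."""
    match = compile_pattern(pattern).fullmatch(file_path)
    return match.groupdict() if match else None


print(match_variables("src/{path}.py", "src/pkg/mod.py"))
print(match_variables("src/{name}.py", "src/pkg/mod.py"))
print(match_variables("src/{name}.py", "src/mod.py"))
```

Under these assumptions, `{path}` captures `pkg/mod` from `src/pkg/mod.py`, while `{name}` refuses to match across the `/`.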

- Update README.md with v2 policy examples and directory structure
- Update doc/architecture.md with v2 detection modes, action types, and queue system
- Bump version to 0.4.0 in pyproject.toml
- Add changelog entry for v2 policy system features

The hook now:
- Checks for v2 policies in .deepwork/policies/ first
- Falls back to v1 policies in .deepwork.policy.yml if no v2 found
- Passes JSON input directly to policy_check.py for v2 (via wrapper)
- Maintains existing behavior for v1 evaluate_policies.py

Remove all legacy v1 policy format (.deepwork.policy.yml) support:
- Remove evaluate_policies.py hook module
- Remove PolicyV1 class and parse_policy_file from policy_parser.py
- Remove v1 schema (POLICY_SCHEMA_V1) from policy_schema.py
- Remove v1 test fixtures and test_evaluate_policies.py
- Update test fixtures to use v2 frontmatter markdown format
- Update documentation to remove v1 references
- Fix policy_stop_hook.sh to handle exit code 2 (block) correctly

Only v2 frontmatter markdown format (.deepwork/policies/*.md) is now supported.

Rename all policy-related terminology to rules throughout the codebase:
- Rename deepwork_policy job to deepwork_rules
- Rename .deepwork.policy.yml to .deepwork.rules.yml
- Rename policy_parser.py, policy_queue.py, policy_check.py to rules_*
- Rename policy_schema.py to rules_schema.py
- Rename policy_stop_hook.sh to rules_stop_hook.sh
- Update all documentation, tests, and references

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

The previous commit renamed deepwork_policy to deepwork_rules but left duplicate hook entries in settings.json pointing to the old paths. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

- Add 134 new tests covering test plan scenarios:
  - test_pattern_matcher.py: glob patterns, variable extraction, resolution
  - test_command_executor.py: variable substitution, command execution
  - test_rules_queue.py: queue entry lifecycle, hash calculation
  - test_schema_validation.py: required fields, mutual exclusivity
  - Extended test_rules_parser.py with correspondence sets/pairs tests
- Security: Add shlex.quote() to command_executor.py to prevent command injection via malicious file paths
- Fix ruff linting issues in pattern_matcher.py, rules_queue.py, and rules_check.py (f-strings, datetime.UTC, open mode)
- Update .gitignore comment from "policy" to "rules"
- Remove doc/test_scenarios.md (all scenarios now covered by tests)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
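
The shlex.quote() fix can be illustrated with a short sketch. The `build_command` helper and the `{file}` placeholder are assumptions made for this example; the actual substitution variables in command_executor.py are not shown in this PR.

```python
import shlex


def build_command(template: str, file_path: str) -> str:
    """Substitute a file path into a shell command template.

    Hypothetical helper: `{file}` is an illustrative placeholder name.
    Quoting the path keeps shell metacharacters in it from being
    interpreted as separate commands.
    """
    return template.replace("{file}", shlex.quote(file_path))


# An ordinary path passes through unchanged.
print(build_command("ruff format {file}", "src/app.py"))

# A hostile file name is wrapped in single quotes and stays one argument.
hostile = "a b.py; rm -rf ~"
command = build_command("ruff format {file}", hostile)
print(command)
print(shlex.split(command))
```

Splitting the resulting command back with `shlex.split` shows that the hostile name remains a single argument rather than a second command.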

- Replace single .deepwork.rules.yml (v1) with individual .md files in .deepwork/rules/ directory (v2 frontmatter markdown format)
- Update install.py to create rules directory structure with:
  - README explaining v2 format
  - Example templates (.md.example files)
- Add v2 example templates in standard_jobs/deepwork_rules/rules/:
  - readme-documentation.md.example (trigger/safety mode)
  - api-documentation-sync.md.example (trigger/safety mode)
  - security-review.md.example (trigger-only mode)
  - source-test-pairing.md.example (set/bidirectional mode)
- Completely rewrite deepwork_rules.define step for v2 format:
  - Detection mode selection (trigger/safety, set, pair)
  - Variable pattern syntax ({path}, {name})
  - Updated examples and file location guidance
- Migrate this repo's bespoke rules to v2:
  - readme-accuracy.md
  - architecture-documentation-accuracy.md
  - standard-jobs-source-of-truth.md
  - version-and-changelog-update.md
- Remove deprecated src/deepwork/templates/default_rules.yml
- Update integration tests for v2 directory structure

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Hooks using JSON output format should always exit with code 0. The blocking behavior is controlled by the "decision" field in the JSON output, not the exit code. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
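
A minimal sketch of that contract, assuming a hook that reports its result as a JSON object on stdout: the `hook_response` function is hypothetical, and the `reason` key is an assumption about the hook payload rather than something stated in this PR. Only the `decision` field and the exit code of 0 come from the commit message.

```python
import json


def hook_response(triggered_rules: list[str]) -> tuple[str, int]:
    """Return (stdout payload, exit code) for a JSON-format hook.

    Blocking is expressed in the JSON body; the exit code is 0 either way.
    """
    if triggered_rules:
        payload = {
            "decision": "block",
            # Assumed field: a human-readable explanation for the agent.
            "reason": "Rules triggered: " + ", ".join(triggered_rules),
        }
    else:
        payload = {}  # nothing to report, do not block
    return json.dumps(payload), 0


output, exit_code = hook_response(["Source/Test Pairing"])
print(output)
print(exit_code)
```

A real hook script would print the payload and then call `sys.exit(exit_code)`; the sketch returns both values instead so the behavior is easy to test.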

Add prominent warning comments to test files that verify Claude Code hook JSON format and exit code contracts. These comments reference the official documentation and clearly mark tests that should not be modified without consulting the hook specification.

Files updated:
- tests/shell_script_tests/test_hooks_json_format.py
- tests/shell_script_tests/test_hook_wrappers.py
- tests/unit/test_hook_wrapper.py

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Consolidate test_hooks_json_format.py and test_hook_wrappers.py into a single test_hooks.py file with logical organization:
- TestClaudeHookWrapper / TestGeminiHookWrapper: Platform wrapper scripts
- TestRulesStopHook / TestUserPromptSubmitHook: Rules-specific hooks
- TestHooksWithTranscript: Transcript input handling
- TestHookExitCodes: Exit code contract tests (DO NOT EDIT)
- TestHookWrapperIntegration: Integration tests with Python hooks
- TestRulesCheckModule: Python module tests

Also moved hooks_dir and src_dir fixtures to conftest.py for sharing.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* Add manual test files for testing hook/rule functionality

  Creates manual_tests/claude/ directory with test files that exercise different rule styles:
  - Trigger/Safety mode (basic conditional)
  - Set mode (bidirectional correspondence)
  - Pair mode (directional correspondence)
  - Command action (automatic command execution)
  - Multi-safety (multiple safety patterns)

  Each test file includes documentation explaining what it tests, how to trigger it, and expected behavior. Corresponding rule definitions added to .deepwork/rules/.

* Move manual test files from manual_tests/claude/ to manual_tests/

  Flatten directory structure as requested. Updated all rule definitions to reference the new paths.

* Reorganize manual tests into subfolders per test type

  Group related files together:
  - test_trigger_safety_mode/
  - test_set_mode/
  - test_pair_mode/
  - test_command_action/
  - test_multi_safety/

  Updated rule definitions and README to match new structure.

* Add compare_to: prompt to manual test rules

  This ensures rules evaluate against changes since the last prompt rather than against the merge-base, allowing them to fire during the current conversation when files are edited.

* Add sub-agent testing instructions to manual tests README

  Explains that the best way to run these tests is as sub-agents using a fast model (haiku), with example prompts and verification commands.

  Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* Update manual test files with both-case test instructions

  - Updated README with test matrix showing expected results
  - Added TEST CASE sections to each test file documenting both "should fire" and "should NOT fire" scenarios
  - Added test results tracking table to README

  Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

---------

Co-authored-by: Claude <noreply@anthropic.com>

nhorton commented Jan 16, 2026

doc/policy_syntax.md (resolved)

nhorton commented Jan 16, 2026

doc/policy_system_design.md (outdated, resolved)

nhorton changed the title ~~Plan and document policy system changes~~ Implement rules system v2 with markdown format Jan 17, 2026

nhorton marked this pull request as ready for review January 17, 2026 22:18

nhorton added this pull request to the merge queue Jan 17, 2026

github-merge-queue bot removed this pull request from the merge queue due to failed status checks Jan 17, 2026

claude and others added 21 commits January 17, 2026 16:57

Feedback from review

cf8b7e2

Format policy_parser.py with ruff

84eb741

Update uv.lock

76f138c

Remove stale deepwork_policy hook entries from settings.json

aa77a79

The previous commit renamed deepwork_policy to deepwork_rules but left duplicate hook entries in settings.json pointing to the old paths. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Fix hook exit code to always return 0 with JSON format

34484a3

Hooks using JSON output format should always exit with code 0. The blocking behavior is controlled by the "decision" field in the JSON output, not the exit code. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Format code with ruff

ead2c2b

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Fix ruff linting errors (unused imports, import sorting)

78ed5d9

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Cleanup hooks and wrappers

cf756dd

nhorton force-pushed the claude/policy-system-planning-T8939 branch from ca40769 to 66f2032 Compare January 17, 2026 23:58

nhorton merged commit b7f8cdb into main Jan 17, 2026
4 checks passed

nhorton deleted the claude/policy-system-planning-T8939 branch January 17, 2026 23:59
