Summary
Replace the current congressional district (CD)-level stacking approach with a simpler architecture:
Current approach:
- Clone ECPS records per congressional district
- Assign a census block within each CD
- Calibrate per-CD weights
Proposed approach:
- Take the full national ECPS (dropping geographic identifiers)
- Clone it N times to match the total record count of the current approach
- Assign each record a random census block drawn nationally, with probability proportional to block population
- Derive the CD (and all other geography) from the block
- Use calibration to hit CD-level targets post hoc
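The proposed steps can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the project's implementation: `clone_and_assign_blocks`, `poststratify`, and the `block_to_cd` lookup are hypothetical names (the lookup stands in for something built from block_crosswalk.csv.gz, assuming each block maps to exactly one CD), and splitting each record's weight evenly across its clones is an assumed starting point before calibration. State, county, and tract are read directly from the 15-character 2020 block GEOID (2-digit state, 3-digit county, 6-digit tract, 4-digit block). The `poststratify` step is a simple per-CD ratio adjustment used as a stand-in for the real calibration, which matches many targets at once.

```python
import numpy as np


def clone_and_assign_blocks(weights, n_clones, block_geoids, block_pops, block_to_cd, seed=0):
    """Hypothetical sketch: clone a geography-free national dataset n_clones times
    and give every cloned record a population-weighted random census block.

    weights      : 1-D household weights from the national ECPS (no geography)
    block_geoids : 1-D sequence of 15-character 2020 block GEOID strings
    block_pops   : 2020 Census block populations, same order as block_geoids
    block_to_cd  : dict mapping block GEOID -> CD id (assumed one CD per block)
    """
    rng = np.random.default_rng(seed)
    weights = np.asarray(weights, dtype=float)
    n_records = weights.size

    # Stack n_clones copies; each row remembers which original record it came from.
    source_index = np.tile(np.arange(n_records), n_clones)
    # Assumption: split each weight evenly across clones so the national total is
    # unchanged until calibration adjusts it.
    cloned_weights = np.tile(weights / n_clones, n_clones)

    # National P(block), proportional to population; zero-population blocks get p = 0.
    pops = np.asarray(block_pops, dtype=float)
    p_block = pops / pops.sum()
    draws = rng.choice(len(block_geoids), size=source_index.size, p=p_block)
    blocks = np.asarray(block_geoids)[draws]

    # Derive geography from the GEOID prefix (SS CCC TTTTTT BBBB); CD via lookup.
    return {
        "source_index": source_index,
        "weight": cloned_weights,
        "block": blocks,
        "state": np.array([g[:2] for g in blocks]),
        "county": np.array([g[:5] for g in blocks]),
        "tract": np.array([g[:11] for g in blocks]),
        "cd": np.array([block_to_cd[g] for g in blocks]),
    }


def poststratify(weights, cd, cd_targets):
    """Simple stand-in for calibration: rescale weights within each CD so they
    sum to one population target per CD."""
    weights = np.asarray(weights, dtype=float).copy()
    cd = np.asarray(cd)
    for district, target in cd_targets.items():
        mask = cd == district
        current = weights[mask].sum()
        if current > 0:  # a CD with no sampled records cannot be scaled
            weights[mask] *= target / current
    return weights
```

Because the CD is looked up from the sampled block rather than fixed before cloning, the same code path yields every geography, which is the "block-first" property described in the rationale.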
Rationale
- Simpler architecture - no CD-specific cloning logic needed
- Block-first design - CD becomes just another derived geography
- Natural population weighting - CDs have roughly equal populations (~760k each), so population-weighted national block assignment gives each CD a roughly equal expected share of records
- Infrastructure already exists - block_crosswalk.csv.gz and block_cd_distributions.csv.gz provide all the needed data
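The equal-representation argument can be checked with simple arithmetic. If each of M cloned records independently draws a block with probability proportional to population, the number landing in CD d is Binomial(M, p_d) with p_d = pop_d / pop_US, so its mean is M * p_d and its relative standard deviation is sqrt((1 - p_d) / (M * p_d)). The helper below is a hypothetical illustration, and the record count in the usage note is an assumed figure, not the project's actual size.

```python
import math


def expected_cd_counts(total_records, cd_pops):
    """For each CD, return (expected record count, relative standard deviation)
    when every record independently draws a block with probability proportional
    to population, so each CD's count is Binomial(total_records, p_cd)."""
    total_pop = sum(cd_pops.values())
    out = {}
    for cd, pop in cd_pops.items():
        p = pop / total_pop
        mean = total_records * p
        sd = math.sqrt(total_records * p * (1 - p))
        out[cd] = (mean, sd / mean)
    return out
```

For example, with a hypothetical 4,350,000 cloned records spread over 435 equal-population districts, each CD expects about 10,000 records with sampling noise just under 1%. Real CD populations vary somewhat, so expected counts vary in proportion, and calibration absorbs the remaining differences.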
Implementation notes
- May need to create a national P(block) distribution file (straightforward: use 2020 Census block populations)
- Calibration already handles fine-tuning to CD targets
- Could simplify the data pipeline significantly
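The national P(block) file mentioned above could be produced along these lines. This is a sketch under assumptions: the function name, the column names (`block_geoid`, `population`, `probability`), and the output filename are hypothetical, and loading the 2020 Census block populations is left to the caller as an iterable of (GEOID, population) pairs. Zero-population blocks are dropped since they can never be sampled.

```python
import csv
import gzip


def write_national_block_distribution(block_pops, path):
    """Hypothetical sketch: write a national P(block) file from 2020 Census
    block populations.

    block_pops : iterable of (block_geoid, population) pairs
    path       : output path, e.g. a hypothetical "national_block_distribution.csv.gz"
    Returns the number of populated blocks written.
    """
    # Keep only populated blocks; empty blocks have probability zero anyway.
    rows = [(geoid, int(pop)) for geoid, pop in block_pops if int(pop) > 0]
    total = sum(pop for _, pop in rows)
    if total == 0:
        raise ValueError("no populated blocks supplied")

    with gzip.open(path, "wt", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["block_geoid", "population", "probability"])
        for geoid, pop in rows:
            writer.writerow([geoid, pop, pop / total])
    return len(rows)
```

Storing the raw population alongside the probability keeps the file reusable: any sub-national distribution (for example P(block | CD)) can be renormalized from the same counts.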
Related
- PR #484: Add census block-level geographic assignment with comprehensive lookups
🤖 Generated with Claude Code