Replace CD stacking with cloning + national block assignment #486

@MaxGhenis

Summary

Replace the current CD-level stacking approach with a simpler architecture:

Current approach:

  1. Clone ECPS records per congressional district
  2. Assign block within each CD
  3. Calibrate per-CD weights

Proposed approach:

  1. Take full national ECPS (drop geographic identifiers)
  2. Clone it N times, choosing N so the total record count matches the current approach
  3. Assign each record to a population-weighted random census block nationally
  4. Derive CD (and all other geography) from the block
  5. Use calibration to hit CD-level targets post-facto
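
The proposed steps 2 to 4 can be sketched roughly as follows. This is a minimal illustration using only the standard library, not the repository's implementation: the function name `clone_and_assign`, the record fields, and the block keys `block_id`, `cd`, and `population` are all hypothetical, and a real pipeline would likely vectorize the draw.

```python
import random


def clone_and_assign(records, blocks, n_clones, seed=0):
    """Clone national records and give each clone a population-weighted block.

    records: list of dicts, national records with geography already dropped
    blocks:  list of dicts with hypothetical keys block_id, cd, population
    """
    rng = random.Random(seed)
    weights = [b["population"] for b in blocks]
    total = len(records) * n_clones

    # One independent, population-weighted national draw per cloned record.
    drawn = rng.choices(blocks, weights=weights, k=total)

    assigned = []
    for i, block in enumerate(drawn):
        rec = dict(records[i % len(records)])  # copy so clones stay independent
        rec["clone_id"] = i // len(records)
        rec["block_id"] = block["block_id"]
        rec["cd"] = block["cd"]  # CD is derived from the block, not chosen first
        assigned.append(rec)
    return assigned


# Toy inputs, made up for illustration only.
records = [{"record_id": i, "income": 1000 * i} for i in range(5)]
blocks = [
    {"block_id": "B1", "cd": "XX-01", "population": 300},
    {"block_id": "B2", "cd": "XX-01", "population": 100},
    {"block_id": "B3", "cd": "XX-02", "population": 400},
    {"block_id": "B4", "cd": "XX-02", "population": 0},
]
assigned = clone_and_assign(records, blocks, n_clones=4)
```

Zero-population blocks get zero weight and are never drawn. Step 5 (calibration to CD-level targets) is out of scope for this sketch.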

Rationale

  1. Simpler architecture - no CD-specific cloning logic needed
  2. Block-first design - CD becomes just another derived geography
  3. Natural population weighting - CDs have roughly equal population (~760k each), so population-weighted national block assignment gives each CD roughly the same expected number of records
  4. Already have the infrastructure - block_crosswalk.csv.gz + block_cd_distributions.csv.gz provide all needed data
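
To make the equal-representation point concrete, here is a rough back-of-the-envelope sketch. It assumes each record's block is drawn independently, so the number of records landing in a CD is binomial. The total record count is a hypothetical placeholder, not a figure from this issue; the ~760k CD population is the figure quoted above, and the national total is the 2020 Census resident population.

```python
import math


def expected_records_per_cd(total_records, cd_population, national_population):
    """Mean and standard deviation of the record count landing in one CD,
    assuming independent population-weighted draws (binomial count)."""
    share = cd_population / national_population
    mean = total_records * share
    std = math.sqrt(total_records * share * (1 - share))
    return mean, std


TOTAL_RECORDS = 1_000_000          # hypothetical, for illustration only
CD_POPULATION = 760_000            # approximate figure quoted in this issue
NATIONAL_POPULATION = 331_449_281  # 2020 Census resident population

mean, std = expected_records_per_cd(
    TOTAL_RECORDS, CD_POPULATION, NATIONAL_POPULATION
)
print(f"expected records per CD: {mean:.0f} (std dev about {std:.0f})")
```

Representation is equal only in expectation: individual CDs will differ by sampling noise, and CD populations themselves are not exactly equal. The calibration step is what absorbs those differences.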

Implementation notes

  • May need to create a national P(block) distribution file (simple: normalize 2020 Census block populations, so P(block) = block population / national population)
  • Calibration already handles fine-tuning to CD targets
  • Could simplify data pipeline significantly
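
A national P(block) distribution could be built along these lines. This is a sketch under assumptions: the column names `block_geoid` and `population` and the sample rows are hypothetical, and the actual schema of block_crosswalk.csv.gz is not described in this issue.

```python
import csv
import io


def national_block_distribution(rows):
    """Return {block_geoid: probability}, proportional to block population.

    rows: iterable of dicts with hypothetical keys block_geoid, population.
    Zero-population blocks are dropped because they can never be drawn.
    """
    pops = {}
    for row in rows:
        pop = int(row["population"])
        if pop > 0:
            pops[row["block_geoid"]] = pop
    total = sum(pops.values())
    if total == 0:
        raise ValueError("no populated blocks in input")
    return {geoid: pop / total for geoid, pop in pops.items()}


# Made-up sample data standing in for a block population file.
sample_csv = """block_geoid,population
010010201001000,30
010010201001001,0
060371011101000,50
360610001001000,20
"""
dist = national_block_distribution(csv.DictReader(io.StringIO(sample_csv)))
```

For a gzip-compressed file, the same function would work on `csv.DictReader(gzip.open(path, "rt"))`. Block GEOIDs are read as strings here so that leading zeros are preserved.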

Related

🤖 Generated with Claude Code
