
@MaxGhenis
Contributor

Summary

Optimizes all np.select() statements in the codebase by removing conditions that return the same value as the default. This is the numpy-efficient pattern where the default handles the most common case(s).

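Key insight: When using np.select(), each condition requires a full boolean array comparison, regardless of how many elements it matches. By setting default= to a value that one or more conditions already return (ideally the value returned by the most conditions) and removing those conditions, we reduce the number of comparisons from N to N-k, where k is the number of conditions removed. This preserves behavior as long as the original conditions covered every possible input; otherwise, unmatched elements change from np.select's implicit default of 0 to the new default.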
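The transformation can be sketched generically for equality-based selects. The helper `select_from_mapping` below is hypothetical (it is not part of NumPy or PolicyEngine), and string labels stand in for the `fstatus` enum. It picks the value returned by the most categories as `default=` and builds comparisons only for the remaining categories, assuming every element of the input is a key of the mapping.

```python
from collections import Counter

import numpy as np


def select_from_mapping(arr, mapping):
    """Hypothetical helper: map each category in `arr` to a value via np.select.

    The value returned by the most categories becomes `default=`, so only the
    other categories get an explicit comparison (N conditions become N - k).
    Assumes every element of `arr` is a key of `mapping`; anything else
    silently receives the default.
    """
    default, _ = Counter(mapping.values()).most_common(1)[0]
    explicit = {cat: val for cat, val in mapping.items() if val != default}
    if not explicit:
        # np.select rejects an empty condition list, so fill directly.
        return np.full(np.shape(arr), default)
    # One full-array comparison per remaining category.
    conditions = [arr == cat for cat in explicit]
    return np.select(conditions, list(explicit.values()), default=default)


mstat_codes = {
    "SINGLE": 1,
    "HEAD_OF_HOUSEHOLD": 1,
    "JOINT": 2,
    "SEPARATE": 6,
    "SURVIVING_SPOUSE": 8,
}
status = np.array(["SINGLE", "JOINT", "HEAD_OF_HOUSEHOLD", "SEPARATE"])
print(select_from_mapping(status, mstat_codes))  # [1 2 1 6]
```

Here the default is chosen by how many conditions share a value, because that count, not how often a value occurs in the data, determines how many comparisons are saved.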

Example optimization (taxsim_mstat.py)

Before:

```python
return select(
    [
        filing_status == fstatus.SINGLE,
        filing_status == fstatus.HEAD_OF_HOUSEHOLD,
        filing_status == fstatus.JOINT,
        filing_status == fstatus.SEPARATE,
        filing_status == fstatus.SURVIVING_SPOUSE,
    ],
    [1, 1, 2, 6, 8],
)
```

After:

```python
return select(
    [
        filing_status == fstatus.JOINT,
        filing_status == fstatus.SEPARATE,
        filing_status == fstatus.SURVIVING_SPOUSE,
    ],
    [2, 6, 8],
    default=1,  # SINGLE, HEAD_OF_HOUSEHOLD
)
```
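To check that the rewrite preserves behavior, both versions can be run outside PolicyEngine. This sketch replaces `fstatus` with hypothetical integer codes (the real code compares against an enum) and calls `np.select` directly. It also shows why the old conditions must have been exhaustive: an out-of-range code falls through to np.select's implicit default of 0 before the change, but to 1 after it.

```python
import numpy as np

# Hypothetical integer codes standing in for PolicyEngine's filing-status enum.
SINGLE, HEAD_OF_HOUSEHOLD, JOINT, SEPARATE, SURVIVING_SPOUSE = range(5)


def mstat_before(filing_status):
    # Five comparisons; np.select's implicit default is 0.
    return np.select(
        [
            filing_status == SINGLE,
            filing_status == HEAD_OF_HOUSEHOLD,
            filing_status == JOINT,
            filing_status == SEPARATE,
            filing_status == SURVIVING_SPOUSE,
        ],
        [1, 1, 2, 6, 8],
    )


def mstat_after(filing_status):
    # Three comparisons; SINGLE and HEAD_OF_HOUSEHOLD fall through to default.
    return np.select(
        [
            filing_status == JOINT,
            filing_status == SEPARATE,
            filing_status == SURVIVING_SPOUSE,
        ],
        [2, 6, 8],
        default=1,
    )


statuses = np.array([SINGLE, HEAD_OF_HOUSEHOLD, JOINT, SEPARATE, SURVIVING_SPOUSE])
assert np.array_equal(mstat_before(statuses), mstat_after(statuses))

# Equivalence relies on exhaustiveness: an unexpected code (99) maps to
# 0 before the change and to 1 after it.
unexpected = np.array([99])
print(mstat_before(unexpected), mstat_after(unexpected))  # [0] [1]
```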

Changes

  • 74 files modified across federal and state tax/benefit calculations
  • Net -10 lines of code (fewer conditions = cleaner code)
  • Added clarifying comments documenting what cases the default covers
  • Common pattern: SINGLE filing status as default for state income tax calculations

Performance benefit

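Each removed condition eliminates one elementwise comparison over the whole array, the temporary boolean array that comparison allocates, and one masked copy inside np.select. For microsimulations with millions of tax units, this should reduce memory allocations and CPU time.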
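The claim can be checked with a micro-benchmark. The sketch below times a five-condition and a three-condition `np.select` on one million hypothetical status codes (integers 0 to 4 standing in for the filing statuses). It prints timings instead of asserting a speedup, since results vary by machine and NumPy version. Each comparison allocates an n-element boolean temporary at one byte per element, so at n = 1,000,000 dropping two conditions avoids roughly 2 MB of temporaries per call.

```python
import timeit

import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000  # roughly the scale of a microsimulation
# Hypothetical codes 0..4 standing in for the five filing statuses.
status = rng.integers(0, 5, size=n)


def five_conditions():
    return np.select([status == c for c in range(5)], [1, 1, 2, 6, 8])


def three_conditions():
    # Codes 0 and 1 both map to 1, so they fall through to the default.
    return np.select([status == c for c in (2, 3, 4)], [2, 6, 8], default=1)


assert np.array_equal(five_conditions(), three_conditions())

for fn in (five_conditions, three_conditions):
    # Best of 3 repeats, 10 calls each, reported per call.
    seconds = min(timeit.repeat(fn, number=10, repeat=3)) / 10
    print(f"{fn.__name__}: {seconds * 1e3:.2f} ms per call")
```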

Test plan

  • All 74 modified files pass Python syntax validation
  • Package imports successfully
  • CI tests pass
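The PR does not say how the syntax validation and import check were run. One way to do both with the standard library is sketched below: `ast.parse` rejects files with syntax errors without executing them, and `importlib.import_module` confirms the package imports. The function names, script name, and command-line layout are assumptions.

```python
import ast
import importlib
import pathlib
import sys


def syntax_error(source, filename="<string>"):
    """Return a short message if `source` is not valid Python, else None."""
    try:
        ast.parse(source, filename=filename)
    except SyntaxError as exc:
        return f"line {exc.lineno}: {exc.msg}"
    return None


def check_files(paths):
    """Return (path, message) pairs for files that fail to parse."""
    failures = []
    for path in paths:
        source = pathlib.Path(path).read_text(encoding="utf-8")
        message = syntax_error(source, str(path))
        if message is not None:
            failures.append((str(path), message))
    return failures


def imports_cleanly(module_name):
    """Return True if `module_name` imports without raising."""
    try:
        importlib.import_module(module_name)
    except Exception:
        return False
    return True


if __name__ == "__main__" and len(sys.argv) > 2:
    # Hypothetical usage: python check_changes.py PACKAGE FILE [FILE ...]
    package, *files = sys.argv[1:]
    failures = check_files(files)
    for path, message in failures:
        print(f"{path}: {message}")
    ok = imports_cleanly(package)
    print(f"import {package}: {'ok' if ok else 'FAILED'}")
    sys.exit(1 if failures or not ok else 0)
```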

Supersedes #7242 (which only added default= without removing redundant conditions)

🤖 Generated with Claude Code

When using np.select(), the most efficient pattern is to have the default
handle the most common case(s), eliminating explicit condition checks.

This commit:
- Removes explicit conditions that return the same value as the default
- Adds clarifying comments documenting what cases the default covers
- Reduces condition evaluations from N to N-k, where k is the number of conditions whose value matched the default

Key optimizations:
- taxsim_mstat: SINGLE and HOH both return 1, now handled by default
- age_group: WORKING_AGE (most common) now handled by default
- 70+ state tax files: SINGLE filing status now handled by default
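The same idea applies to range-based selects such as age_group. In this sketch the age thresholds and string labels are hypothetical (the real variable's cut-offs and enum are not shown in this PR): only the two boundary groups get explicit comparisons, and everyone else falls through to the working-age default.

```python
import numpy as np

# Hypothetical thresholds; the real age_group variable may use others.
CHILD_MAX_AGE = 17
SENIOR_MIN_AGE = 65


def age_group(age):
    # Two comparisons: any age that is neither a child's nor a senior's
    # falls through to the default.
    return np.select(
        [age <= CHILD_MAX_AGE, age >= SENIOR_MIN_AGE],
        ["CHILD", "SENIOR"],
        default="WORKING_AGE",
    )


print(age_group(np.array([3, 17, 18, 40, 64, 65, 90])))
```

Because the two explicit ranges and the default together cover every numeric age, the default cannot swallow an unexpected category here.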

Performance benefit: Each removed condition eliminates one array comparison
per element during vectorized calculations.

74 files changed, net -10 lines of code.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
MaxGhenis closed this Jan 26, 2026
MaxGhenis deleted the optimize-select-defaults branch January 26, 2026 14:33