Optimize select() statements by removing redundant conditions #7252

MaxGhenis · 2026-01-26T14:32:35Z

Summary

Optimizes all np.select() statements in the codebase by removing conditions that return the same value as the default. Letting default= cover the most common case(s) means those cases need no explicit condition.

Key insight: every condition passed to np.select() is a full boolean array that has to be computed before the call. By setting default= to the most common value and removing the k explicit conditions that return that value, we reduce the number of condition arrays from N to N-k.

Example optimization (taxsim_mstat.py)

Before:

return select(
    [
        filing_status == fstatus.SINGLE,
        filing_status == fstatus.HEAD_OF_HOUSEHOLD,
        filing_status == fstatus.JOINT,
        filing_status == fstatus.SEPARATE,
        filing_status == fstatus.SURVIVING_SPOUSE,
    ],
    [1, 1, 2, 6, 8],
)

After:

return select(
    [
        filing_status == fstatus.JOINT,
        filing_status == fstatus.SEPARATE,
        filing_status == fstatus.SURVIVING_SPOUSE,
    ],
    [2, 6, 8],
    default=1,  # SINGLE, HEAD_OF_HOUSEHOLD
)
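
The following is a minimal sketch for checking that the two forms agree, not code from the PR. It assumes plain integer codes in place of the fstatus enum (the constant names and values here are hypothetical). One thing it makes visible: the original call passed no default=, so np.select() fell back to 0 for any element matching none of the conditions, whereas the rewritten call returns 1 for such elements. The two forms are equivalent only if every element carries one of the five listed statuses.

```python
import numpy as np

# Hypothetical integer codes standing in for the filing-status enum;
# the real enum members in the codebase may be represented differently.
SINGLE, HEAD_OF_HOUSEHOLD, JOINT, SEPARATE, SURVIVING_SPOUSE = range(5)


def mstat_before(filing_status):
    # Five conditions, no explicit default (np.select then defaults to 0).
    return np.select(
        [
            filing_status == SINGLE,
            filing_status == HEAD_OF_HOUSEHOLD,
            filing_status == JOINT,
            filing_status == SEPARATE,
            filing_status == SURVIVING_SPOUSE,
        ],
        [1, 1, 2, 6, 8],
    )


def mstat_after(filing_status):
    # Three conditions; SINGLE and HEAD_OF_HOUSEHOLD fall through to default=1.
    return np.select(
        [
            filing_status == JOINT,
            filing_status == SEPARATE,
            filing_status == SURVIVING_SPOUSE,
        ],
        [2, 6, 8],
        default=1,
    )


statuses = np.array(
    [SINGLE, HEAD_OF_HOUSEHOLD, JOINT, SEPARATE, SURVIVING_SPOUSE]
)
before = mstat_before(statuses)
after = mstat_after(statuses)
same_on_known_statuses = np.array_equal(before, after)

# A code outside the five listed statuses exposes the one difference.
unknown = np.array([99])
before_unknown = mstat_before(unknown)  # falls through to the implicit default
after_unknown = mstat_after(unknown)    # falls through to default=1
```

If the enum is exhaustive, the difference on unknown codes never arises in practice; the sketch only shows where to look if it is not.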

Changes

- 74 files modified across federal and state tax/benefit calculations
- Net -10 lines of code (fewer conditions = cleaner code)
- Added clarifying comments documenting what cases the default covers
- Common pattern: SINGLE filing status as default for state income tax calculations

Performance benefit

Each removed condition eliminates one boolean array comparison per element. For microsimulations with millions of tax units, this reduces memory allocations and CPU cycles.
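
A rough way to measure the effect is a small timeit comparison like the one below. This is only a sketch under stated assumptions: the integer codes, the array size, and the function names are hypothetical and not taken from the codebase. Timings will vary by machine and NumPy version, so no expected numbers are given.

```python
import timeit

import numpy as np

rng = np.random.default_rng(0)
# Hypothetical status codes 0..4 for one million tax units;
# codes 0 and 1 both map to the value 1.
codes = rng.integers(0, 5, size=1_000_000)


def five_conditions():
    # Builds five boolean arrays before np.select runs.
    return np.select(
        [codes == 0, codes == 1, codes == 2, codes == 3, codes == 4],
        [1, 1, 2, 6, 8],
    )


def three_conditions():
    # Builds three boolean arrays; codes 0 and 1 use the default.
    return np.select(
        [codes == 2, codes == 3, codes == 4],
        [2, 6, 8],
        default=1,
    )


# Confirm both forms give the same answer before comparing their speed.
results_match = np.array_equal(five_conditions(), three_conditions())

t_five = timeit.timeit(five_conditions, number=5)
t_three = timeit.timeit(three_conditions, number=5)
print(f"five conditions:  {t_five:.3f} s for 5 runs")
print(f"three conditions: {t_three:.3f} s for 5 runs")
```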

Test plan

- All 74 modified files pass Python syntax validation
- Package imports successfully
- CI tests pass

Supersedes #7242 (which only added default= without removing redundant conditions)

🤖 Generated with Claude Code

When using np.select(), the most efficient pattern is to have the default handle the most common case(s), eliminating explicit condition checks.

This commit:
- Removes explicit conditions that return the same value as the default
- Adds clarifying comments documenting what cases the default covers
- Reduces condition evaluations from N to N-k where k conditions matched default

Key optimizations:
- taxsim_mstat: SINGLE and HOH both return 1, now handled by default
- age_group: WORKING_AGE (most common) now handled by default
- 70+ state tax files: SINGLE filing status now handled by default

Performance benefit: Each removed condition eliminates one array comparison per element during vectorized calculations.

74 files changed, net -10 lines of code.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

MaxGhenis closed this Jan 26, 2026

MaxGhenis deleted the optimize-select-defaults branch January 26, 2026 14:33
