@MaxGhenis MaxGhenis commented Jan 25, 2026

Summary

Optimizes all np.select() statements in the codebase by removing conditions whose result equals the default. The default then handles the most common case(s), which is the cheaper pattern in NumPy.

Key insight: Each condition passed to np.select() is a full boolean array that has to be computed. Setting default= to the most common value and dropping the explicit conditions for that value cuts the number of condition arrays from N to N-k, where N is the original number of conditions and k is the number folded into the default.

Example optimization (taxsim_mstat.py)

Before:

return select(
    [
        filing_status == fstatus.SINGLE,
        filing_status == fstatus.HEAD_OF_HOUSEHOLD,
        filing_status == fstatus.JOINT,
        filing_status == fstatus.SEPARATE,
        filing_status == fstatus.SURVIVING_SPOUSE,
    ],
    [1, 1, 2, 6, 8],
)

After:

return select(
    [
        filing_status == fstatus.JOINT,
        filing_status == fstatus.SEPARATE,
        filing_status == fstatus.SURVIVING_SPOUSE,
    ],
    [2, 6, 8],
    default=1,  # SINGLE, HEAD_OF_HOUSEHOLD
)
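
The before/after pair above can be checked in isolation. The following is a minimal sketch, not the code in taxsim_mstat.py: the integer codes stand in for the `fstatus` enum members and the helper names `mstat_before` and `mstat_after` are hypothetical. It assumes `select` behaves like `numpy.select`.

```python
import numpy as np

# Hypothetical integer codes standing in for the fstatus enum members.
SINGLE, JOINT, SEPARATE, HEAD_OF_HOUSEHOLD, SURVIVING_SPOUSE = range(5)


def mstat_before(filing_status):
    # Five conditions; rows matching none get np.select's implicit default of 0.
    return np.select(
        [
            filing_status == SINGLE,
            filing_status == HEAD_OF_HOUSEHOLD,
            filing_status == JOINT,
            filing_status == SEPARATE,
            filing_status == SURVIVING_SPOUSE,
        ],
        [1, 1, 2, 6, 8],
    )


def mstat_after(filing_status):
    # Three conditions; everything else, including SINGLE and
    # HEAD_OF_HOUSEHOLD, falls through to default=1.
    return np.select(
        [
            filing_status == JOINT,
            filing_status == SEPARATE,
            filing_status == SURVIVING_SPOUSE,
        ],
        [2, 6, 8],
        default=1,
    )


statuses = np.array([SINGLE, JOINT, SEPARATE, HEAD_OF_HOUSEHOLD, SURVIVING_SPOUSE])
known_match = np.array_equal(mstat_before(statuses), mstat_after(statuses))

# A code outside the five listed statuses (made up for illustration).
unknown = np.array([99])
```

The two versions agree on the five listed statuses. They differ only for a value that matches none of them: `mstat_before(unknown)` yields 0 and `mstat_after(unknown)` yields 1, which matters only if such values can occur in the data.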

Changes

  • 56 files modified across federal and state tax/benefit calculations
  • Net +27 lines (added clarifying comments documenting what defaults cover)
  • Common pattern: SINGLE filing status as default for state income tax calculations
  • Special cases: taxsim_mstat (SINGLE + HOH → 1), age_group (WORKING_AGE as default)

Performance benefit

Each removed condition eliminates one element-wise comparison over the whole array, along with the temporary boolean array that comparison produces. For microsimulations with millions of tax units, this reduces memory allocations and CPU cycles.
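
To make the memory side of this concrete, here is a rough sketch that counts the bytes held by the condition arrays. The array size and the integer status codes are illustrative assumptions, not measurements from this codebase, and the sketch makes no claim about wall-clock time.

```python
import numpy as np

n_tax_units = 1_000_000  # illustrative size, not a measured workload
rng = np.random.default_rng(0)

# Hypothetical integer codes 0..4 standing in for five filing statuses.
filing_status = rng.integers(0, 5, size=n_tax_units)

# One boolean array per condition, each n_tax_units elements long.
all_conditions = [filing_status == code for code in range(5)]

# Two conditions folded into the default, so only three arrays are built.
trimmed_conditions = [filing_status == code for code in (1, 2, 4)]

bytes_all = sum(c.nbytes for c in all_conditions)
bytes_trimmed = sum(c.nbytes for c in trimmed_conditions)
```

NumPy stores a boolean array at one byte per element, so with these assumed sizes the condition lists take 5 MB and 3 MB respectively.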

Test plan

  • All 56 modified files pass Python syntax validation
  • Package imports successfully
  • CI tests pass

🤖 Generated with Claude Code

codecov bot commented Jan 25, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 98.97%. Comparing base (7e781d3) to head (399c93a).
⚠️ Report is 25 commits behind head on main.

Additional details and impacted files
@@             Coverage Diff             @@
##              main    #7242      +/-   ##
===========================================
- Coverage   100.00%   98.97%   -1.03%     
===========================================
  Files           12       16       +4     
  Lines          205      294      +89     
  Branches         0        3       +3     
===========================================
+ Hits           205      291      +86     
- Misses           0        3       +3     
Flag Coverage Δ
unittests 98.97% <ø> (-1.03%) ⬇️


@MaxGhenis
Contributor Author

Optimization suggestion: Remove redundant conditions

When using np.select, the most efficient pattern is to have the default handle the most common case, allowing us to remove the explicit conditions for values that match the default. This reduces the number of condition evaluations.

taxsim_mstat.py

SINGLE and HEAD_OF_HOUSEHOLD both return 1. Since these are likely the most common, use 1 as default and remove their conditions:

return select(
    [
        filing_status == fstatus.JOINT,
        filing_status == fstatus.SEPARATE,
        filing_status == fstatus.SURVIVING_SPOUSE,
    ],
    [
        2,
        6,
        8,
    ],
    default=1,  # SINGLE, HEAD_OF_HOUSEHOLD
)

age_group.py

WORKING_AGE is the default and most common. Remove that condition:

return select(
    [
        person("is_child", period),
        person("is_senior", period),
    ],
    [AgeGroup.CHILD, AgeGroup.SENIOR],
    default=AgeGroup.WORKING_AGE,
)
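
A standalone sketch of the same shape, for readers without the model at hand. The integer codes replace the `AgeGroup` enum, and the age thresholds behind `is_child` and `is_senior` are assumptions made up for this example.

```python
import numpy as np

# Hypothetical codes standing in for AgeGroup.CHILD / WORKING_AGE / SENIOR.
CHILD, WORKING_AGE, SENIOR = 0, 1, 2

age = np.array([5, 17, 18, 40, 64, 65, 90])

# Assumed thresholds, for illustration only.
is_child = age < 18
is_senior = age >= 65

# Anyone who is neither a child nor a senior falls through to the default.
age_group = np.select(
    [is_child, is_senior],
    [CHILD, SENIOR],
    default=WORKING_AGE,
)
```

No `is_working_age` array is ever computed; the default covers that group by elimination.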

mt_capital_gains_tax_indiv.py

SINGLE rates are the default. Remove those conditions from both selects:

lower_rate = select(
    [
        filing_status == status.SEPARATE,
        filing_status == status.SURVIVING_SPOUSE,
        filing_status == status.HEAD_OF_HOUSEHOLD,
    ],
    [
        p.rates.separate.amounts[0],
        p.rates.surviving_spouse.amounts[0],
        p.rates.head_of_household.amounts[0],
    ],
    default=p.rates.single.amounts[0],
)

(Same pattern for higher_rate)
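
One consequence of this shape is worth seeing in a small sketch: any status that is not listed receives the default, not only SINGLE. The rates and integer codes below are made up for illustration and are not Montana parameters.

```python
import numpy as np

# Hypothetical integer codes standing in for the filing-status enum.
SINGLE, JOINT, SEPARATE, HEAD_OF_HOUSEHOLD, SURVIVING_SPOUSE = range(5)

# Made-up rates; the real values come from p.rates.<status>.amounts[0].
lower_rate_by_status = {
    SINGLE: 0.030,
    SEPARATE: 0.031,
    SURVIVING_SPOUSE: 0.032,
    HEAD_OF_HOUSEHOLD: 0.033,
}

filing_status = np.array(
    [SINGLE, SEPARATE, SURVIVING_SPOUSE, HEAD_OF_HOUSEHOLD, JOINT]
)

lower_rate = np.select(
    [
        filing_status == SEPARATE,
        filing_status == SURVIVING_SPOUSE,
        filing_status == HEAD_OF_HOUSEHOLD,
    ],
    [
        lower_rate_by_status[SEPARATE],
        lower_rate_by_status[SURVIVING_SPOUSE],
        lower_rate_by_status[HEAD_OF_HOUSEHOLD],
    ],
    default=lower_rate_by_status[SINGLE],  # SINGLE and any unlisted status
)
```

In this sketch the JOINT row ends up with the single rate because no condition names it. Whether that is intended depends on which statuses can reach the variable.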

mt_capital_gains_tax_applicable_threshold_indiv.py

Same optimization - remove the SINGLE condition since it's the default.

ia_standard_deduction_indiv.py and ia_amt_indiv.py

Remove the SINGLE condition since default=fsvals.SINGLE.

This pattern reduces condition evaluations from N to N-k where k is the number of conditions that match the default value.
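
Dropping a condition is only a pure optimization when it cannot overlap with a later condition, because `np.select` takes the first condition that matches. Filing-status comparisons are mutually exclusive, so they qualify. The sketch below uses deliberately overlapping conditions, which are made up and not from this PR, to show what changes otherwise.

```python
import numpy as np

x = np.array([1, 5, 10])

is_small = x < 8      # maps to 0, the same value as the default
is_odd = x % 2 == 1   # maps to 7

# With both conditions, is_small comes first and wins for 1 and 5.
full = np.select([is_small, is_odd], [0, 7], default=0)

# With is_small removed, is_odd now claims 1 and 5.
trimmed = np.select([is_odd], [7], default=0)

results_differ = not np.array_equal(full, trimmed)
```

Here `full` is all zeros, while `trimmed` assigns 7 to the first two elements, so the removal changed the output. A condition that matches the default can be removed without changing results when it is disjoint from every condition listed after it.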

When using np.select(), the most efficient pattern is to have the default
handle the most common case(s), eliminating explicit condition checks.

This commit:
- Removes explicit conditions that return the same value as the default
- Adds clarifying comments documenting what cases the default covers
- Reduces condition evaluations from N to N-k where k conditions matched default

Key optimizations:
- taxsim_mstat: SINGLE and HOH both return 1, now handled by default
- age_group: WORKING_AGE (most common) now handled by default
- 50+ state tax files: SINGLE filing status now handled by default

Performance benefit: Each removed condition eliminates one array comparison
per element during vectorized calculations.

56 files changed with optimizations.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@MaxGhenis MaxGhenis force-pushed the select-defaults-additional branch from 399c93a to ad9d8cb on January 26, 2026 14:37
@MaxGhenis MaxGhenis changed the title from "Add default parameters to additional select statements" to "Optimize select() statements by removing redundant conditions" on Jan 26, 2026
MaxGhenis and others added 4 commits January 26, 2026 11:49
- Format ny_supplemental_tax.py with black
- Add changelog_entry.yaml

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@MaxGhenis
Contributor Author

/rebase

@MaxGhenis MaxGhenis marked this pull request as draft January 28, 2026 19:30
2 participants