Optimize select() statements by removing redundant conditions #7242
base: main
Codecov Report
✅ All modified and coverable lines are covered by tests.
Additional details and impacted files
@@             Coverage Diff              @@
##              main    #7242      +/-   ##
===========================================
- Coverage   100.00%   98.97%   -1.03%
===========================================
  Files           12       16       +4
  Lines          205      294      +89
  Branches         0        3       +3
===========================================
+ Hits           205      291      +86
- Misses           0        3       +3
Flags with carried forward coverage won't be shown.
Optimization suggestion: Remove redundant conditions
When using np.select(), the most efficient pattern is to have the default handle the most common case(s), eliminating explicit condition checks.
This commit:
- Removes explicit conditions that return the same value as the default
- Adds clarifying comments documenting what cases the default covers
- Reduces condition evaluations from N to N-k, where k is the number of conditions that matched the default
Key optimizations:
- taxsim_mstat: SINGLE and HOH both return 1, now handled by default
- age_group: WORKING_AGE (most common) now handled by default
- 50+ state tax files: SINGLE filing status now handled by default
Performance benefit: Each removed condition eliminates one array comparison per element during vectorized calculations.
56 files changed with optimizations.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
399c93a to ad9d8cb
- Format ny_supplemental_tax.py with black
- Add changelog_entry.yaml
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
/rebase
Summary
Optimizes all np.select() statements in the codebase by removing conditions that return the same value as the default. This is the numpy-efficient pattern where the default handles the most common case(s).
Key insight: When using np.select(), each condition requires an array comparison. By setting default= to the most common value and removing explicit conditions for that value, we reduce evaluations from N to N-k.
Example optimization (taxsim_mstat.py)
Before:
After:
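A minimal sketch of this before/after pattern follows. It is not the actual taxsim_mstat.py code: the integer filing-status codes, the function names, and the output values for JOINT and SEPARATE are hypothetical, and only the mapping of SINGLE and HOH to 1 comes from this PR.

```python
import numpy as np

# Hypothetical filing-status codes, for illustration only.
SINGLE, JOINT, SEPARATE, HOH = 0, 1, 2, 3

filing_status = np.array([SINGLE, JOINT, HOH, SEPARATE, SINGLE])


def mstat_before(status):
    # Four comparisons, two of which return the same value as the default.
    return np.select(
        [status == SINGLE, status == HOH, status == JOINT, status == SEPARATE],
        [1, 1, 2, 6],  # 2 and 6 are assumed values, not taken from the PR
        default=1,
    )


def mstat_after(status):
    # Two comparisons; default=1 covers SINGLE and HOH.
    return np.select(
        [status == JOINT, status == SEPARATE],
        [2, 6],
        default=1,
    )


before = mstat_before(filing_status)
after = mstat_after(filing_status)
print(before, after)
```

The two versions agree here because the conditions are mutually exclusive equality checks. np.select returns the choice for the first condition that matches, so dropping a condition is only behavior-preserving when no later condition could match the same elements.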
Changes
taxsim_mstat (SINGLE + HOH → 1), age_group (WORKING_AGE as default)
Performance benefit
Each removed condition eliminates one boolean array comparison per element. For microsimulations with millions of tax units, this reduces memory allocations and CPU cycles.
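As a rough illustration of the memory side of this claim, the sketch below counts the bytes held by the boolean condition arrays before and after dropping two conditions. The array size, random data, and status codes are assumptions made for the example, and it measures only the condition arrays, not end-to-end runtime.

```python
import numpy as np

n = 1_000_000  # assumed number of tax units
rng = np.random.default_rng(0)
status = rng.integers(0, 4, size=n)  # hypothetical status codes 0..3

# Each comparison materializes one boolean array with n elements.
conds_before = [status == 0, status == 3, status == 1, status == 2]
conds_after = [status == 1, status == 2]

bytes_before = sum(c.nbytes for c in conds_before)
bytes_after = sum(c.nbytes for c in conds_after)
print(bytes_before, bytes_after)
```

NumPy stores one byte per boolean element, so each removed condition saves about n bytes of temporary storage plus one pass over the input. Whether that is noticeable in a full simulation depends on the workload and would need profiling.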
Test plan
🤖 Generated with Claude Code