@eswarchandravidyasagar
Collaborator

  • Added PHIX validation module to validate school/daycare names against the official PHIX reference list.
  • Integrated validation into the preprocessing step in orchestrator.py.
  • Configurable options added to parameters.yaml for enabling validation and handling unmatched facilities.
  • Created unit tests for the validation module covering various scenarios.
  • Added documentation for the validation plan and updated the plans directory.
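A minimal sketch of how configurable handling of unmatched facilities might look, assuming an exact-match lookup. The function name `validate_facilities`, the `on_unmatched` option, and its values are hypothetical and are not taken from this PR.

```python
import warnings


def validate_facilities(names, reference, on_unmatched="warn"):
    """Split facility names into matched and unmatched against a reference set.

    on_unmatched is a hypothetical option: "warn", "error", or "skip".
    """
    # Normalize case and surrounding whitespace before comparing.
    known = {ref.strip().casefold() for ref in reference}
    matched, unmatched = [], []
    for name in names:
        if name.strip().casefold() in known:
            matched.append(name)
        else:
            unmatched.append(name)

    if unmatched and on_unmatched == "error":
        raise ValueError(f"Facilities not in reference list: {unmatched}")
    if unmatched and on_unmatched == "warn":
        warnings.warn(f"Facilities not in reference list: {unmatched}")
    return matched, unmatched
```

With `on_unmatched="skip"`, the caller receives both lists and decides what to do with the unmatched names.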

@jangevaare
Member

We don't have redistribution permission for the PHIX reference list file, so it will need to be removed and the commits squashed. Keeping it would also bloat this repository and its history.

Users will have to bring their own PHIX reference list.

# Path to PHIX reference Excel file (relative to project root)
reference_file: PHIX Reference Lists v5.2 - 2025Jun30.xlsx
# Minimum fuzzy match score (0-100) to consider a match
match_threshold: 85

Is this required? Shouldn't the match be exact? A fuzzy threshold could bypass the very issues we'd like to protect against, such as similarly named schools being accidentally selected when a Panorama user creates a forecast query.
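To illustrate the concern, here is a small sketch comparing a fuzzy threshold with an exact match. It uses the standard library's `difflib.SequenceMatcher` as a stand-in scorer, which is not necessarily what the PR uses, and the two facility names are made up.

```python
from difflib import SequenceMatcher


def fuzzy_score(a, b):
    """Similarity on a 0-100 scale (stand-in for the PR's actual scorer)."""
    return 100 * SequenceMatcher(None, a.casefold(), b.casefold()).ratio()


def is_match(query, reference_name, threshold=None):
    """Exact (case-insensitive) match when threshold is None, else fuzzy."""
    if threshold is None:
        return query.casefold() == reference_name.casefold()
    return fuzzy_score(query, reference_name) >= threshold


# Two hypothetical, distinct facilities whose names differ by one character.
query = "St. Mary Catholic School"
other = "St. Marys Catholic School"

print(is_match(query, other, threshold=85))  # True: fuzzy accepts the wrong school
print(is_match(query, other))                # False: exact match rejects it
```

Under these assumptions, a threshold of 85 accepts a different facility, while the exact comparison does not.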

@jangevaare
Member

We likely need a mapping file that converts PHU names from the PHIX reference document to standardized PHU acronyms (which should be enforced for template folders, etc.).

We may also need to allow this mapping to be many-to-one, to cover PHUs that have merged since the reference list was last updated.
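A rough sketch of what such a many-to-one mapping could look like. All PHU names and acronyms below are placeholders, and `phu_acronym` is a hypothetical helper, not something in this repository.

```python
# Hypothetical mapping from PHU names as they appear in the reference list
# to standardized acronyms. Names and acronyms are placeholders.
PHU_NAME_TO_ACRONYM = {
    "example north health unit": "ENHU",
    "example south health unit": "ESHU",
    # Two pre-merger names pointing at one merged PHU (many-to-one).
    "example east health unit": "ECHU",
    "example west health unit": "ECHU",
}


def phu_acronym(reference_name):
    """Return the standardized acronym for a PHU name from the reference list."""
    # Collapse repeated whitespace and ignore case before the lookup.
    key = " ".join(reference_name.split()).casefold()
    try:
        return PHU_NAME_TO_ACRONYM[key]
    except KeyError:
        raise KeyError(f"No acronym mapping for PHU name: {reference_name!r}") from None
```

Failing loudly on an unmapped name, rather than guessing, keeps template folder names tied to the enforced acronyms.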

@jangevaare
Member

I know it's important to run this early in the pipeline, before other processing, but I also wonder whether we can emit something in the per-PDF validation log confirming that a valid facility is being used for the target PHU?
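One possible shape for such a log entry, as a sketch only. The record fields and the function name `facility_validation_record` are assumptions, since the existing per-PDF validation log format is not shown in this conversation.

```python
import json


def facility_validation_record(pdf_name, facility, phu, is_valid):
    """Build one JSON line for a per-PDF validation log (hypothetical schema)."""
    record = {
        "pdf": pdf_name,
        "facility": facility,
        "target_phu": phu,
        "facility_valid_for_phu": is_valid,
    }
    # sort_keys keeps the output stable, which makes logs easier to diff.
    return json.dumps(record, sort_keys=True)
```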

3 participants