@RyanDoesMath

Adds a module that extracts drug dosage numbers. It does not yet associate a unit with each number; that will come after the drug-to-number dictionary is created.

Extraction of the drug dosages currently works as follows.

  1. Filter the digit detections for those which are within the IV drug and IV fluid sections (using the landmark detections).
  2. Convert the detections' boxes from pixel coordinates (0 to image width/height) to relative coordinates (0 to 1).
  3. Pass all the boxes to a linkage proposal function, which proposes pairs of boxes that lie within a given Euclidean distance of each other. The goal is to find candidate boxes that together form a single multi-digit number.
  4. Pass all the proposals to a sklearn random forest classification (RFC) model, trained on several engineered features, which decides whether the two boxes in each proposal are actually linked as part of a single multi-digit number.
  5. Construct a graph with each digit detection as a node and each proposal the RFC model predicts as linked as an edge, then take the connected components and return them as Cluster objects (groups of bounding boxes).
  6. Find the timestamp and row for each cluster, and return a dictionary mapping each row to another dictionary that maps timestamps to the number formed by concatenating each cluster's bounding box categories from left to right.
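
The core of steps 2, 3, 5, and the concatenation in step 6 can be sketched with the standard library alone. This is an illustrative sketch, not the code in this PR: the `Box` class, the function names, the center-to-center distance, and the `max_dist` value are all assumptions, and the `is_linked` callable is a hypothetical stand-in for the trained RFC model from step 4.

```python
from __future__ import annotations

from dataclasses import dataclass
from itertools import combinations
from math import hypot
from typing import Callable


@dataclass(frozen=True)
class Box:
    """Hypothetical digit detection; `category` is the detected digit."""
    category: str
    left: float
    top: float
    right: float
    bottom: float

    @property
    def center(self) -> tuple[float, float]:
        return ((self.left + self.right) / 2, (self.top + self.bottom) / 2)


def to_relative(box: Box, width: int, height: int) -> Box:
    """Step 2: rescale pixel coordinates to the 0-1 range."""
    return Box(box.category, box.left / width, box.top / height,
               box.right / width, box.bottom / height)


def propose_links(boxes: list[Box], max_dist: float) -> list[tuple[int, int]]:
    """Step 3: propose index pairs whose centers lie within max_dist."""
    pairs = []
    for i, j in combinations(range(len(boxes)), 2):
        (x1, y1), (x2, y2) = boxes[i].center, boxes[j].center
        if hypot(x1 - x2, y1 - y2) <= max_dist:
            pairs.append((i, j))
    return pairs


def cluster(boxes: list[Box], links: list[tuple[int, int]]) -> list[list[Box]]:
    """Step 5: connected components of the link graph, via union-find."""
    parent = list(range(len(boxes)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for i, j in links:
        parent[find(i)] = find(j)

    groups: dict[int, list[Box]] = {}
    for idx, box in enumerate(boxes):
        groups.setdefault(find(idx), []).append(box)
    return list(groups.values())


def read_number(cluster_boxes: list[Box]) -> str:
    """Step 6 (partial): concatenate digit categories from left to right."""
    ordered = sorted(cluster_boxes, key=lambda b: b.left)
    return "".join(b.category for b in ordered)


def extract_numbers(pixel_boxes: list[Box], width: int, height: int,
                    max_dist: float,
                    is_linked: Callable[[Box, Box], bool]) -> list[str]:
    boxes = [to_relative(b, width, height) for b in pixel_boxes]
    proposals = propose_links(boxes, max_dist)
    # Step 4 stand-in: the real pipeline asks the RFC model here.
    accepted = [(i, j) for i, j in proposals if is_linked(boxes[i], boxes[j])]
    return [read_number(c) for c in cluster(boxes, accepted)]


# Three adjacent digits and one distant digit on a 1000x1000 image.
detections = [
    Box("1", 100, 200, 120, 230),
    Box("5", 122, 201, 142, 231),
    Box("0", 144, 200, 164, 230),
    Box("7", 600, 400, 620, 430),
]
same_row = lambda a, b: abs(a.center[1] - b.center[1]) < 0.01
numbers = extract_numbers(detections, 1000, 1000, 0.03, same_row)
print(numbers)  # ['150', '7']
```

Note that "1" and "0" are never proposed as a pair because they are too far apart, yet they still end up in the same cluster through "5". That transitivity is why the grouping is done with connected components rather than pairwise links alone.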

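Currently, the random forest classification model is loaded from a pickle. This is not ideal: a pickled sklearn model is tied to the library version that produced it, and unpickling an untrusted file can execute arbitrary code. At some point a more robust way of loading and running the model should be implemented.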

Closes #73

@RyanDoesMath RyanDoesMath added this to the v0.3.0 milestone Dec 11, 2025
@RyanDoesMath RyanDoesMath self-assigned this Dec 11, 2025
@RyanDoesMath RyanDoesMath added the enhancement New feature or request label Dec 11, 2025
@RyanDoesMath RyanDoesMath merged commit 0e9fe7c into main Dec 11, 2025
@RyanDoesMath RyanDoesMath deleted the feature-drug-dose-extraction branch December 11, 2025 15:59
Development

Successfully merging this pull request may close these issues.

Add Multi-Digit Drug and Fluid Reading via Clustering

2 participants