Clustering completed. #7

charbelmarche33 · 2024-10-29T03:07:37Z

Second PR from issue #29 of the extractor repo.

Methods Evaluated and Performance

With no errors in the bounding boxes, all three methods (k-means, agglomerative, and DBSCAN) obtained 100% accuracy.
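For readers unfamiliar with density-based clustering, here is a minimal pure-Python sketch of DBSCAN applied to bounding-box centers. It is not the notebook's code: the box format `(x_min, y_min, x_max, y_max)`, the helper names `center` and `dbscan`, and the `eps`/`min_samples` values are all assumptions made for illustration.

```python
from math import dist


def center(box):
    """Center (x, y) of a box given as (x_min, y_min, x_max, y_max) (assumed format)."""
    x0, y0, x1, y1 = box
    return ((x0 + x1) / 2, (y0 + y1) / 2)


def dbscan(points, eps, min_samples):
    """Minimal DBSCAN. Returns one label per point; -1 marks noise."""
    UNVISITED, NOISE = None, -1
    labels = [UNVISITED] * len(points)

    def neighbors(i):
        # Indices of all points within eps of point i (including i itself).
        return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_samples:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: reachable, but not expanded
                continue
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            reachable = neighbors(j)
            if len(reachable) >= min_samples:
                queue.extend(reachable)  # j is a core point, keep growing
    return labels


# Hypothetical boxes: two tight groups and one stray detection.
corners = [(0, 0), (0, 1), (1, 0), (1, 1),
           (10, 10), (10, 11), (11, 10), (11, 11),
           (50, 50)]
boxes = [(x, y, x + 2, y + 2) for x, y in corners]
labels = dbscan([center(b) for b in boxes], eps=1.5, min_samples=3)
print(labels)
```

Unlike k-means and agglomerative clustering, DBSCAN does not need the number of clusters up front and can mark isolated boxes as noise, which matters once erroneous boxes are introduced.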

When accounting for 5% error (adding and removing boxes)
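The 5% perturbation could be sketched roughly as follows. The PR does not say how the error was sampled, so this assumes the same number of boxes is removed and added, that added boxes are placed uniformly inside a rectangular region, and that boxes are `(x_min, y_min, x_max, y_max)` tuples. The name `perturb_boxes` and all of its parameters are hypothetical.

```python
import random


def perturb_boxes(boxes, error_rate=0.05, bounds=(0.0, 0.0, 1.0, 1.0),
                  size=0.02, seed=0):
    """Drop a fraction of boxes and add the same number of random ones.

    Assumption: removals simulate missed detections and additions simulate
    false detections; the notebook may split or sample the error differently.
    """
    rng = random.Random(seed)  # seeded so a test run is repeatable
    n_err = round(len(boxes) * error_rate)

    # Remove n_err boxes chosen uniformly at random.
    dropped = set(rng.sample(range(len(boxes)), n_err))
    result = [b for i, b in enumerate(boxes) if i not in dropped]

    # Add n_err square boxes of side `size` somewhere inside `bounds`.
    x_lo, y_lo, x_hi, y_hi = bounds
    for _ in range(n_err):
        x0 = rng.uniform(x_lo, x_hi - size)
        y0 = rng.uniform(y_lo, y_hi - size)
        result.append((x0, y0, x0 + size, y0 + size))
    return result


# Hypothetical input: 100 small boxes along a horizontal axis.
original = [(i / 100, 0.5, i / 100 + 0.005, 0.505) for i in range(100)]
noisy = perturb_boxes(original)
print(len(noisy), sum(b in original for b in noisy))
```

Seeding the generator keeps comparisons between clustering methods fair, since every method then sees the same corrupted input.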

Improvements:

Using density max to isolate relevant bounding boxes.
Clustering via multiple methods
Testing using mAP as well as incrementing until 0.95.
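One plausible reading of "incrementing until 0.95" is a COCO-style sweep of IoU thresholds from 0.50 to 0.95 in steps of 0.05; that reading is an assumption, as is the `(x_min, y_min, x_max, y_max)` box format. The sketch below shows only the IoU computation and the threshold sweep that such an mAP is built on, not a full precision-recall calculation. The function names are hypothetical.

```python
def iou(a, b):
    """Intersection over union of two (x_min, y_min, x_max, y_max) boxes."""
    ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
    ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix1 - ix0) * max(0.0, iy1 - iy0)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# Assumed sweep: 0.50, 0.55, ..., 0.95 (ten thresholds).
THRESHOLDS = [round(0.50 + 0.05 * i, 2) for i in range(10)]


def hits_by_threshold(predicted, truth):
    """For each threshold, does the predicted box count as a true positive?"""
    score = iou(predicted, truth)
    return {t: score >= t for t in THRESHOLDS}


# Hypothetical cluster box shifted one unit from its ground-truth box.
hits = hits_by_threshold((0, 0, 10, 10), (1, 0, 11, 10))
print(hits)
```

A prediction that passes at 0.50 but fails at the stricter thresholds pulls the averaged score down, which is why this kind of metric is more sensitive to loose cluster boxes than a single-threshold accuracy.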

Work moving forward for next semester/in-between other:

Smarter identification of clusters, so that it is more robust to detection errors (use spacing for this)
...

review-notebook-app · 2024-10-29T03:07:42Z

Check out this pull request on ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.

Powered by ReviewNB

charbelmarche33 · 2025-04-07T02:05:09Z

@RyanDoesMath Will talk about this in the next meeting, but the results are actually much better than we were getting. We were having an issue with some of the ground-truth data not being accurate to the registered images we were using. I found you had added new data in the Google Drive under cluster_bp_and_hr_yolo.zip so we used that instead, and these are the numbers we are getting:

There is still room for improvement with a more robust method of cluster naming, but all in all this is quite an improvement.

charbelmarche33 and others added 25 commits October 10, 2024 01:35

Added a clustering branch

c332a86

Added a template notebook file

f64a356

Adding poetry and an empty "Data" file by default. The data within this file will be ignored by git (all files within this directory will be ignored by git with exception of ".gitkeep"). Additionally adding details to README.md to ease set up.

570007e

Working on homography and registering. Added utils to conversion folder. May want to take some of these functions and turn them into a package to use in various microservices?

23e2b59

Homography working, need to tweak bounding boxes remapping as this is not quite accurate.

46cce3f

Fixed a typo that caused improper bounding box remapping. Ready to convert to YOLO.

5e30789

Saving YOLO bounding boxes to a json file for each sheet.

08a12fd

Removed print statements

af7ab84

Save registered image to "data/registered_images" directory

322d827

Removed loop break

2c82cf6

Saving images without bounding boxes. Working on clustering problem.

e2992c1

Selecting region of interest and relevant bounding boxes.

8dfb22f

K-Means clustering completed. Need to test and tune model.

563b20b

Clustering time and mmHg/bpm separately

92d7545

Accuracy among time stamps is 99.75%. Among number labels it is found to be 100%.

9b5584f

Density-based clustering experiment

b8319d6

Added both methods to the same notebook. ROI selected via an edited version of Ryan's selected method and datapoints are split by a diagonal line. Converted to using BoundingBox class instead of handling coordinates directly. Added labels to output.

280b9ea

Add agglomerative clustering method

f359fb0

Updated first cell of notebook

2358727

Updated dbscan parameters

2022f85

5% erroneous bounding boxes for cluster testing

08d089c

Add density approach to select relevant bounding boxes

0589cdc

Commented out old code

8b67e2d

Added 5% erroneous test with ROI bounds

ae22aa5

Formatted files, ready for PR

eedd586

charbelmarche33 requested a review from RyanDoesMath October 29, 2024 03:07

mattbeck1 and others added 3 commits October 31, 2024 15:09

Stricter selection of bounding boxes (preprocessing)

a5f2dee

Constrained erroneous bounding boxes to time and number axes

f167bdd

Imputing meaning via expected locations. Will need to get MSE of distance of center of found cluster to expected cluster center.

9299d67
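As an illustration of the metric this commit mentions, the sketch below computes the mean squared Euclidean distance between found and expected cluster centers. It assumes the found clusters have already been paired with their expected clusters in matching order; the name `center_mse` is hypothetical and not taken from the repository.

```python
from math import dist


def center_mse(found_centers, expected_centers):
    """Mean squared distance between paired (x, y) cluster centers.

    Assumption: found_centers[i] has already been matched to
    expected_centers[i]; the matching step itself is not shown here.
    """
    if not found_centers or len(found_centers) != len(expected_centers):
        raise ValueError("need two non-empty, equally long lists of centers")
    squared = [dist(f, e) ** 2 for f, e in zip(found_centers, expected_centers)]
    return sum(squared) / len(squared)


# Hypothetical example: one perfect match and one center that is 5 units off.
print(center_mse([(0, 0), (3, 4)], [(0, 0), (0, 0)]))
```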

charbelmarche33 and others added 14 commits November 5, 2024 11:35

Adjusted accuracy calculation.

b3d9fab

Clustering accuracy is now (correct - (incorrect + undetected)) / number expected clusters

09f0091
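The accuracy formula in this commit message can be written out directly. The function and argument names below are hypothetical; only the arithmetic comes from the commit message.

```python
def clustering_accuracy(correct, incorrect, undetected, expected):
    """(correct - (incorrect + undetected)) / number of expected clusters."""
    if expected <= 0:
        raise ValueError("expected must be a positive cluster count")
    return (correct - (incorrect + undetected)) / expected


# Hypothetical counts: 8 correct, 1 incorrect, 1 undetected, 10 expected.
print(clustering_accuracy(8, 1, 1, 10))
```

Because errors are subtracted rather than just left out of the numerator, this score can go negative when incorrect and undetected clusters outnumber the correct ones, so it is harsher than a plain correct/expected ratio.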

Re-ran whole notebook

3e77659

Re-ran notebook

d064e24

edit preprocessing

112245f

Add remove outliers to preprocessing

7be99a8

Generalized the process of selecting nearest expected cluster

697a60c

Pushing results

e1df718

Preprocessing testing method

e88f2f6

Merge branch 'experiment_clustering' of https://github.com/Paper-Chart-Extraction-Project/ChartExtractorSupplements into experiment_clustering

27cb7ab

Test for preprocessing effectiveness

a362f41

Updated clustering test for varying percent of erroneous bounding boxes

ac94d59

Add function to find cluster bounding box

0fdeee3

Ready for spacing function and accuracy metrics

ef07e5d

charbelmarche33 changed the title ~~Clustering completed: DBScan provides most accurate results when accounting for erroneous labels.~~ Clustering in progress... Nov 11, 2024

mattbeck1 and others added 8 commits November 11, 2024 21:18

Add categories to clusters

e6e4e9c

Added mAP as accuracy metric. Need spacing/post-processing to improve current methods.

9718985

Merge branch 'experiment_clustering' into experiment_clustering_mAP

a54ba96

Merge pull request #8 from Paper-Chart-Extraction-Project/experiment_clustering_mAP

5b80d3b

Added mAP as accuracy metric. Need spacing/post-processing to improve…

Added results

40d357e

Added results

c3093b9

Fixed filtering typo

e7c9478

Updated preprocessing test to only add new bounding boxes and constrain to outside ROI

c5101d7

charbelmarche33 marked this pull request as draft November 19, 2024 01:47

charbelmarche33 changed the title ~~Clustering in progress...~~ Clustering completed. Nov 19, 2024

Added mAP challenges

de7c23c

charbelmarche33 marked this pull request as ready for review November 19, 2024 02:54

charbelmarche33 added 2 commits April 6, 2025 21:42

Used new data for ground-truth label. Significant improvement in mAP

04703ee

Significant mAP improvement with appropriate ground-truth labels.

302d774


4 participants
