59 commits
c332a86
Added a clustering branch
charbelmarche33 Oct 10, 2024
f64a356
Added a template notebook file
charbelmarche33 Oct 10, 2024
570007e
Adding poetry and an empty "Data" file by default. The data with in t…
charbelmarche33 Oct 17, 2024
23e2b59
Working on homography and registering. Added utils to conversion fold…
charbelmarche33 Oct 17, 2024
46cce3f
Homography working, need to tweak bounding boxes remapping as this is…
charbelmarche33 Oct 17, 2024
5e30789
Fixed a typo that caused improper bounding box remapping. Ready t…
charbelmarche33 Oct 17, 2024
08a12fd
Saving YOLO bounding boxes to a json file for each sheet.
charbelmarche33 Oct 17, 2024
af7ab84
Removed print statements
charbelmarche33 Oct 17, 2024
322d827
Save registered image to "data/registered_images" directory
charbelmarche33 Oct 17, 2024
2c82cf6
Removed loop break
charbelmarche33 Oct 17, 2024
e2992c1
Saving images without bounding boxes. Working on clustering problem.
charbelmarche33 Oct 17, 2024
8dfb22f
Selecting region of interest and relevant bounding boxes.
charbelmarche33 Oct 18, 2024
563b20b
K-Means clustering completed. Need to test and tune model.
charbelmarche33 Oct 18, 2024
92d7545
Clustering time and mmHg/bpm separately
charbelmarche33 Oct 19, 2024
9b5584f
Accuracy among time stamps is 99.75%. Among number labels it is found…
charbelmarche33 Oct 19, 2024
b8319d6
Density-based clustering experiment
hvalenty Oct 23, 2024
280b9ea
Added both methods to the same notebook. ROI selected via an edite…
charbelmarche33 Oct 24, 2024
f359fb0
Add agglomerative clustering method
mattbeck1 Oct 25, 2024
2358727
Updated first cell of notebook
charbelmarche33 Oct 25, 2024
2022f85
Updated dbscan parameters
hvalenty Oct 28, 2024
08d089c
5% erroneous bounding boxes for cluster testing
hvalenty Oct 28, 2024
0589cdc
Add density approach to select relevant bounding boxes
mattbeck1 Oct 29, 2024
8b67e2d
Commented out old code
charbelmarche33 Oct 29, 2024
ae22aa5
Added 5% erroneous test with ROI bounds
hvalenty Oct 29, 2024
eedd586
Formatted files, ready for PR
charbelmarche33 Oct 29, 2024
a5f2dee
Stricter selection of bounding boxes (preprocessing)
mattbeck1 Oct 31, 2024
f167bdd
Constrained erroneous bounding boxes to time and number axes
hvalenty Nov 2, 2024
9299d67
Imputing meaning via expected locations. Will need to get MSE of dist…
charbelmarche33 Nov 4, 2024
6743185
Added average distance of clusters from the proposed cluster label.
charbelmarche33 Nov 5, 2024
bf1597a
Improved method of calculating accuracy to include undetected cluster…
charbelmarche33 Nov 5, 2024
a80a821
Increased threshold for kmeans
charbelmarche33 Nov 5, 2024
550286c
Normalized erroneous bounding box method.
hvalenty Nov 5, 2024
07c05ec
Improved kmeans results
charbelmarche33 Nov 5, 2024
556b14a
Kmeans has best performance on number labels while DBscan has best pe…
charbelmarche33 Nov 5, 2024
b3d9fab
Adjusted accuracy calculation.
charbelmarche33 Nov 5, 2024
09f0091
Clustering accuracy is now (correct - (incorrect + undetected)) / num…
charbelmarche33 Nov 5, 2024
3e77659
Re-ran whole notebook
charbelmarche33 Nov 5, 2024
d064e24
Re-ran notebook
charbelmarche33 Nov 5, 2024
112245f
edit preprocessing
mattbeck1 Nov 5, 2024
7be99a8
Add remove outliers to preprocessing
mattbeck1 Nov 5, 2024
697a60c
Generalized the process of selecting nearest expected cluster
charbelmarche33 Nov 7, 2024
e1df718
Pushing results
charbelmarche33 Nov 7, 2024
e88f2f6
Preprocessing testing method
hvalenty Nov 7, 2024
27cb7ab
Merge branch 'experiment_clustering' of https://github.com/Paper-Char…
hvalenty Nov 7, 2024
a362f41
Test for preprocessing effectiveness
hvalenty Nov 7, 2024
ac94d59
Updated clustering test for varying percent of erroneous bounding boxes
hvalenty Nov 10, 2024
0fdeee3
Add function to find cluster bounding box
mattbeck1 Nov 11, 2024
ef07e5d
Ready for spacing function and accuracy metrics
charbelmarche33 Nov 11, 2024
e6e4e9c
Add categories to clusters
mattbeck1 Nov 12, 2024
9718985
Added mAP as accuracy metric. Need spacing/post-processing to improve…
charbelmarche33 Nov 12, 2024
a54ba96
Merge branch 'experiment_clustering' into experiment_clustering_mAP
charbelmarche33 Nov 12, 2024
5b80d3b
Merge pull request #8 from Paper-Chart-Extraction-Project/experiment_…
charbelmarche33 Nov 12, 2024
40d357e
Added results
charbelmarche33 Nov 12, 2024
c3093b9
Added results
charbelmarche33 Nov 12, 2024
e7c9478
Fixed filtering typo
charbelmarche33 Nov 18, 2024
c5101d7
Updated preprocessing test to only add new bounding boxes and constra…
hvalenty Nov 18, 2024
de7c23c
Added mAP challenges
charbelmarche33 Nov 19, 2024
04703ee
Used new data for ground-truth label. Significant improvement in mAP
charbelmarche33 Apr 7, 2025
302d774
Significant mAP improvement with appropriate ground-truth labels.
charbelmarche33 Apr 7, 2025
4 changes: 3 additions & 1 deletion .gitignore
@@ -160,4 +160,6 @@ cython_debug/
# and can be added to the global gitignore or merged into this file. For a more nuclear
# option (not recommended) you can uncomment the following to ignore the entire idea folder.
#.idea/
-/data
+
+/data/*
+!/data/.gitkeep
32 changes: 32 additions & 0 deletions README.md
@@ -1,2 +1,34 @@
# ChartExtractorSupplements

This repository houses two types of content: (1) Jupyter notebooks that run experiments to improve ChartExtractor and (2) useful scripts for working with ChartExtractor.

### Getting Set Up

#### Where To Place Data

- When you clone this repository, it will contain a `data` directory holding only a `.gitkeep` file. Keep this file in the directory; do not delete it.
- Add your data files to this directory. Git ignores everything else in it, so your data files are never committed.
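
As a quick check that your files landed in the right place, the sketch below lists everything in the data directory except the `.gitkeep` placeholder. The helper name `list_data_files` and the lowercase `data` path (taken from the `.gitignore` rules) are assumptions for illustration; neither is part of ChartExtractor.

```python
from pathlib import Path


def list_data_files(data_dir: Path) -> list[Path]:
    """Return every file under data_dir except the .gitkeep placeholder."""
    # A missing directory is treated as "no data yet" rather than an error.
    if not data_dir.is_dir():
        return []
    return sorted(
        path
        for path in data_dir.rglob("*")
        if path.is_file() and path.name != ".gitkeep"
    )


if __name__ == "__main__":
    # Assumes the script is run from the repository root.
    for path in list_data_files(Path("data")):
        print(path)
```

An empty result means either that no data has been added yet or that the script was run from outside the repository root.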

#### Downloading Necessary Packages

- Start by installing Poetry with pip:
```bash
pip install poetry
```
- The `pyproject.toml` files are already created, so you only need to follow the steps below.
- Configure Poetry to create the venv inside the project directory:
```bash
poetry config virtualenvs.in-project true
```
- Create the venv and install the dependencies with Poetry:
```bash
poetry install
```
- You should now have a venv. Switch into it with the following command, then run the Python scripts:
```bash
poetry shell
```
- As you develop, you can add packages with the following command:
```bash
poetry add <package-name>
```
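
To confirm that the steps above produced an in-project environment, the sketch below compares the running interpreter's prefix with `<project root>/.venv`, which is where Poetry places the venv when `virtualenvs.in-project` is `true`. The function name `uses_in_project_venv` is hypothetical and only meant as an illustration.

```python
import sys
from pathlib import Path


def uses_in_project_venv(project_root: Path, prefix: str = sys.prefix) -> bool:
    """Return True if the interpreter prefix is <project_root>/.venv."""
    # Resolve both sides so symlinks and relative paths compare equal.
    return Path(prefix).resolve() == (project_root / ".venv").resolve()


if __name__ == "__main__":
    # Assumes the script is run from the repository root
    # with the Poetry environment active.
    print(uses_in_project_venv(Path.cwd()))
```

If this prints `False`, the environment is probably not active or was created before the configuration step.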
Empty file added data/.gitkeep
Empty file.
Empty file added experiments/clustering/.gitkeep
Empty file.
1,495 changes: 1,495 additions & 0 deletions experiments/clustering/clustering.ipynb

Large diffs are not rendered by default.
