From 09fd4f53bb0b3c1fef8e974ab52acc579d225716 Mon Sep 17 00:00:00 2001 From: fryerd1 Date: Tue, 6 Jan 2026 16:37:27 +0000 Subject: [PATCH 01/14] Remove bonus unit test exercises and change report_dir --- config/user_config.yaml | 2 +- exercises/01_introduction.ipynb | 6 +++--- exercises/02_modules.ipynb | 4 ++-- exercises/03_config_files.ipynb | 5 +---- exercises/outputs/.gitkeep | 0 exercises/outputs/02_modules/.gitkeep | 0 {data/outputs/reports => reports}/.gitkeep | 0 7 files changed, 7 insertions(+), 10 deletions(-) delete mode 100644 exercises/outputs/.gitkeep delete mode 100644 exercises/outputs/02_modules/.gitkeep rename {data/outputs/reports => reports}/.gitkeep (100%) diff --git a/config/user_config.yaml b/config/user_config.yaml index 2e1ddb7..6a47056 100644 --- a/config/user_config.yaml +++ b/config/user_config.yaml @@ -9,4 +9,4 @@ # - Any other settings you want to change without editing code input_path: data/input/health_data.csv cleaned_path: data/outputs/cleaned/health_data_cleaned.csv -report_dir: data/outputs/reports/ +report_dir: reports/ diff --git a/exercises/01_introduction.ipynb b/exercises/01_introduction.ipynb index ce0aa75..2312665 100644 --- a/exercises/01_introduction.ipynb +++ b/exercises/01_introduction.ipynb @@ -25,13 +25,13 @@ "## Where to find exercises, solutions, and outputs\n", "- **Exercises**: Located in the `exercises/` folder (e.g., `exercises/02_modules.ipynb`)\n", "- **Solutions**: Located in `exercises/solutions/` (e.g., `exercises/solutions/02_modules_solutions.ipynb`)\n", - "- **Outputs**: Saved in `exercises/outputs/` (e.g., cleaned data, reports, charts)\n", + "- **Outputs**: Saved in `data/outputs/` or `reports/` (e.g., cleaned data, reports, charts)\n", "\n", "## How to use the exercises\n", "1. **Read each exercise notebook** and follow the instructions step by step.\n", "2. **Write your code** in the notebook cells or in the `src/python_rap_demo/` modules as instructed.\n", "4. 
**Check your solutions** by comparing your results with the solutions notebooks in `exercises/solutions/`.\n", - "5. **View outputs** in the `data/outputs/` folder (e.g., cleaned data, markdown reports, charts).\n", + "5. **View outputs** in the `data/outputs/` and `reports/` folders (e.g., cleaned data, markdown reports, charts).\n", "\n", "## Adding code to the src folder\n", "- Place reusable functions and classes in the appropriate module in `src/python_rap_demo/` (e.g., `cleaning.py`, `io.py`, `report.py`).\n", @@ -39,7 +39,7 @@ "\n", "## How to check if your solutions have worked\n", "- **Compare outputs**: Check your results against the solutions notebooks and output files.\n", - "- **Check outputs**: Review generated files in `data/outputs/` for expected results.\n", + "- **Check outputs**: Review generated files in `data/outputs/` or `reports/` for expected results.\n", "- **Run tests**: For unit test exercises you can run all tests with:\n", " ```cmd\n", " pytest tests\n", diff --git a/exercises/02_modules.ipynb b/exercises/02_modules.ipynb index 4106f22..11f9b4a 100644 --- a/exercises/02_modules.ipynb +++ b/exercises/02_modules.ipynb @@ -147,7 +147,7 @@ "- Add the visualisations to the output report\n", "\n", "**Bonus:**\n", - "- Save the charts to the outputs folder\n", + "- Save the charts to the reports folder\n", "- Customise chart formatting and colours\n" ] } @@ -163,7 +163,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.12.5" + "version": "3.12.3" } }, "nbformat": 4, diff --git a/exercises/03_config_files.ipynb b/exercises/03_config_files.ipynb index b5af1a1..413e1fc 100644 --- a/exercises/03_config_files.ipynb +++ b/exercises/03_config_files.ipynb @@ -67,10 +67,7 @@ "\n", "**Reflect:**\n", "- How does using config files improve reproducibility and flexibility?\n", - "- What other parameters could you add to make your pipeline more configurable?\n", - "\n", - "**Bonus:**\n", - "- Add unit tests 
to check that your code correctly applies the config parameter" + "- What other parameters could you add to make your pipeline more configurable?\n" ] }, { diff --git a/exercises/outputs/.gitkeep b/exercises/outputs/.gitkeep deleted file mode 100644 index e69de29..0000000 diff --git a/exercises/outputs/02_modules/.gitkeep b/exercises/outputs/02_modules/.gitkeep deleted file mode 100644 index e69de29..0000000 diff --git a/data/outputs/reports/.gitkeep b/reports/.gitkeep similarity index 100% rename from data/outputs/reports/.gitkeep rename to reports/.gitkeep From d8937d79c407178def313cfb9cbb063ab6dfd3b4 Mon Sep 17 00:00:00 2001 From: fryerd1 Date: Wed, 7 Jan 2026 11:16:38 +0000 Subject: [PATCH 02/14] Add existing unit test as example and alter exercise and solution numbering --- exercises/04_unit_tests.ipynb | 50 ++++++++++++------- .../solutions/04_unit_tests_solutions.ipynb | 8 +-- 2 files changed, 37 insertions(+), 21 deletions(-) diff --git a/exercises/04_unit_tests.ipynb b/exercises/04_unit_tests.ipynb index a0a5bb8..43e9b92 100644 --- a/exercises/04_unit_tests.ipynb +++ b/exercises/04_unit_tests.ipynb @@ -31,7 +31,22 @@ "id": "2", "metadata": {}, "source": [ - "## Exercise 1: Write a simple unit test for a new function\n", + "## Exercise 1: Review an existing unit test\n", + "\n", + "The function `clean_health_data` in `src/python_rap_demo/cleaning.py` has an existing unit test in\n", + "`tests/test_cleaning.py`.\n", + "\n", + "**Task:** \n", + "1. Run the existing unit test for `clean_health_data` to understand how it works. \n", + "2. Modify the sample DataFrame in the test to understand what causes the tests to pass or fail (Note: if all rows are dropped all tests will fail)." 
+ ] + }, + { + "cell_type": "markdown", + "id": "3", + "metadata": {}, + "source": [ + "## Exercise 2: Write a simple unit test for a new function\n", "\n", "Suppose you have created a function called `impute_by_group` in `src/python_rap_demo/cleaning.py`.\n", "\n", @@ -46,10 +61,10 @@ }, { "cell_type": "markdown", - "id": "3", + "id": "4", "metadata": {}, "source": [ - "## Exercise 1a: Walkthrough - Write a unit test for flag_missing\n", + "## Exercise 2a: Walkthrough - Write a unit test for flag_missing\n", "\n", "Let's start with a simple function called `flag_missing`. This function adds a new column to your DataFrame to flag missing values in specified columns.\n", "\n", @@ -59,7 +74,7 @@ { "cell_type": "code", "execution_count": null, - "id": "4", + "id": "5", "metadata": {}, "outputs": [], "source": [ @@ -93,7 +108,7 @@ }, { "cell_type": "markdown", - "id": "5", + "id": "6", "metadata": {}, "source": [ "### How to write a unit test for `flag_missing`\n", @@ -129,7 +144,7 @@ }, { "cell_type": "markdown", - "id": "6", + "id": "7", "metadata": {}, "source": [ "#### Understanding the test_flag_missing function\n", @@ -147,10 +162,10 @@ }, { "cell_type": "markdown", - "id": "7", + "id": "8", "metadata": {}, "source": [ - "## Exercise 1b: Write a unit test for impute_by_group\n", + "## Exercise 2b: Write a unit test for impute_by_group\n", "\n", "Now try writing your own tests for the following function. 
Use the walkthrough above as a guide.\n", "\n", @@ -165,7 +180,7 @@ { "cell_type": "code", "execution_count": null, - "id": "8", + "id": "9", "metadata": {}, "outputs": [], "source": [ @@ -190,7 +205,7 @@ }, { "cell_type": "markdown", - "id": "9", + "id": "10", "metadata": {}, "source": [ "**Task:**\n", @@ -205,10 +220,10 @@ }, { "cell_type": "markdown", - "id": "10", + "id": "11", "metadata": {}, "source": [ - "## Exercise 2: Run your unit tests\n", + "## Exercise 3: Run your unit tests\n", "\n", "**Task:**\n", "- Run all tests in the `tests/` folder using the command below:\n", @@ -223,10 +238,10 @@ }, { "cell_type": "markdown", - "id": "11", + "id": "12", "metadata": {}, "source": [ - "## Exercise 3: Stretch - Check test coverage\n", + "## Exercise 4: Stretch - Check test coverage\n", "\n", "Test coverage shows how much of your code is tested by unit tests.\n", "\n", @@ -245,10 +260,10 @@ }, { "cell_type": "markdown", - "id": "12", + "id": "13", "metadata": {}, "source": [ - "## Exercise 4: Stretch - Try parameterisation in pytest\n", + "## Exercise 5: Stretch - Try parameterisation in pytest\n", "\n", "Parameterisation lets you run the same test with different inputs.\n", "\n", @@ -260,7 +275,8 @@ ], "metadata": { "language_info": { - "name": "python" + "name": "python", + "version": "3.12.3" } }, "nbformat": 4, diff --git a/exercises/solutions/04_unit_tests_solutions.ipynb b/exercises/solutions/04_unit_tests_solutions.ipynb index dd96603..67c0a4e 100644 --- a/exercises/solutions/04_unit_tests_solutions.ipynb +++ b/exercises/solutions/04_unit_tests_solutions.ipynb @@ -15,7 +15,7 @@ "id": "1", "metadata": {}, "source": [ - "## Solution 1: Write a simple unit test for a new function\n", + "## Exercise 2 Solution: Write a simple unit test for a new function\n", "\n", "Here are example unit tests for the `flag_missing` and `impute_by_group` functions in `src/python_rap_demo/cleaning.py`:" ] @@ -73,7 +73,7 @@ "id": "4", "metadata": {}, "source": [ - "## Solution 2: 
Run your unit tests\n", + "## Exercise 3 Solution: Run your unit tests\n", "\n", "Run the following command in your terminal:\n", "```cmd\n", @@ -88,7 +88,7 @@ "id": "5", "metadata": {}, "source": [ - "## Solution 3: Stretch - Check test coverage\n", + "## Exercise 4 Solution: Stretch - Check test coverage\n", "\n", "Run the following commands:\n", "```cmd\n", @@ -105,7 +105,7 @@ "id": "6", "metadata": {}, "source": [ - "## Solution 4: Stretch - Try parameterisation in pytest\n", + "## Exercise 5 Solution: Stretch - Try parameterisation in pytest\n", "\n", "Here are examples using `@pytest.mark.parametrize` for `flag_missing` and `impute_by_group`. Parameterisation lets you run the same test with different inputs, making your tests more robust and easier to maintain." ] From aa6086731039807b755f1ee7ab2b456ce448e839 Mon Sep 17 00:00:00 2001 From: fryerd1 Date: Wed, 7 Jan 2026 13:39:19 +0000 Subject: [PATCH 03/14] Add INSTRUCTIONS.md and change file paths in 02_modules_solution --- INSTRUCTIONS.md | 8 ++++++++ exercises/solutions/02_modules_solutions.ipynb | 6 +++--- 2 files changed, 11 insertions(+), 3 deletions(-) create mode 100644 INSTRUCTIONS.md diff --git a/INSTRUCTIONS.md b/INSTRUCTIONS.md new file mode 100644 index 0000000..df0008b --- /dev/null +++ b/INSTRUCTIONS.md @@ -0,0 +1,8 @@ +Offline Python RAP demo instructions + +Getting started + +1. Download and unzip the folder +2. 
Open the folder in your chosen IDE + +Once these steps are complete refer to README.md step 2 and onwards diff --git a/exercises/solutions/02_modules_solutions.ipynb b/exercises/solutions/02_modules_solutions.ipynb index 3f9f4bc..2c9469b 100644 --- a/exercises/solutions/02_modules_solutions.ipynb +++ b/exercises/solutions/02_modules_solutions.ipynb @@ -30,8 +30,8 @@ "from python_rap_demo.utils import add_bmi_column\n", "\n", "input_path = \"../../data/input/health_data.csv\"\n", - "cleaned_path = \"../outputs/02_modules/cleaned_data.csv\"\n", - "report_path = \"../outputs/02_modules/\"\n", + "cleaned_path = \"../data/outputs/cleaned/health_data_cleaned.csv\"\n", + "report_path = \"../reports/disease_prevalence_report.md\"\n", "\n", "# I/O: Read data\n", "df = read_health_data(input_path)\n", @@ -736,7 +736,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.12.5" + "version": "3.12.3" } }, "nbformat": 4, From eea3d4eb0b60642621bd46d679fa3d3ea61917bc Mon Sep 17 00:00:00 2001 From: fryerd1 Date: Thu, 8 Jan 2026 09:14:38 +0000 Subject: [PATCH 04/14] Add reports/ to repository structure in README --- README.md | 1 + 1 file changed, 1 insertion(+) diff --git a/README.md b/README.md index 7541393..37f97da 100644 --- a/README.md +++ b/README.md @@ -43,6 +43,7 @@ Welcome to the RAP (Reproducible Analytical Pipeline) demonstration repository! 
- `src/` — Main pipeline code and modules - `data/` — Example health data for analysis - `config/` — Configuration files (YAML) +- `reports/` — Graphs and reports - `tests/` — Unit tests for pipeline modules - `exercises/` — **Practice exercises** (see below) - `docs/` — Documentation From ec7ebc322550972d783e61e6bd23d4842419c918 Mon Sep 17 00:00:00 2001 From: fryerd1 Date: Thu, 8 Jan 2026 12:24:57 +0000 Subject: [PATCH 05/14] Fix INSTRUCTIONS.md --- INSTRUCTIONS.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/INSTRUCTIONS.md b/INSTRUCTIONS.md index df0008b..6fee3cf 100644 --- a/INSTRUCTIONS.md +++ b/INSTRUCTIONS.md @@ -2,7 +2,7 @@ Offline Python RAP demo instructions Getting started -1. Download and unzip the folder +1. Download and unzip the file 2. Open the folder in your chosen IDE Once these steps are complete refer to README.md step 2 and onwards From af871e0ba1549bdffacfaad0b4dbfc51386fa010 Mon Sep 17 00:00:00 2001 From: fryerd1 Date: Fri, 16 Jan 2026 10:04:15 +0000 Subject: [PATCH 06/14] Respond to comments --- INSTRUCTIONS.md | 2 +- .../solutions/02_modules_solutions.ipynb | 4 +- .../solutions/04_unit_tests_solutions.ipynb | 60 ++++++++++++++++--- 3 files changed, 56 insertions(+), 10 deletions(-) diff --git a/INSTRUCTIONS.md b/INSTRUCTIONS.md index 6fee3cf..dd87503 100644 --- a/INSTRUCTIONS.md +++ b/INSTRUCTIONS.md @@ -5,4 +5,4 @@ Getting started 1. Download and unzip the file 2. 
Open the folder in your chosen IDE -Once these steps are complete refer to README.md step 2 and onwards +Once these steps are complete refer to the README.md file in the unzipped folder and follow the instrcutions from step 2 onwards diff --git a/exercises/solutions/02_modules_solutions.ipynb b/exercises/solutions/02_modules_solutions.ipynb index 2c9469b..2c6647f 100644 --- a/exercises/solutions/02_modules_solutions.ipynb +++ b/exercises/solutions/02_modules_solutions.ipynb @@ -30,8 +30,8 @@ "from python_rap_demo.utils import add_bmi_column\n", "\n", "input_path = \"../../data/input/health_data.csv\"\n", - "cleaned_path = \"../data/outputs/cleaned/health_data_cleaned.csv\"\n", - "report_path = \"../reports/disease_prevalence_report.md\"\n", + "cleaned_path = \"../../data/outputs/cleaned/health_data_cleaned.csv\"\n", + "report_path = \"../../reports/disease_prevalence_report.md\"\n", "\n", "# I/O: Read data\n", "df = read_health_data(input_path)\n", diff --git a/exercises/solutions/04_unit_tests_solutions.ipynb b/exercises/solutions/04_unit_tests_solutions.ipynb index 67c0a4e..0835ea2 100644 --- a/exercises/solutions/04_unit_tests_solutions.ipynb +++ b/exercises/solutions/04_unit_tests_solutions.ipynb @@ -14,6 +14,21 @@ "cell_type": "markdown", "id": "1", "metadata": {}, + "source": [ + "## Exercise 1 Solution: Review and Adapt an existing unit\n", + "\n", + "Here are some examples of common issues that may be causing the unit test to fail after modification:\n", + "\n", + "The check that missing 'smoker' values default to the value \"No\" may fail if the DataFrame is modified because the test checks row 0 for the value \"No\" as the input DataFrame has a blank 'smoker' value in row 0. \n", + "(Note: The location of the missing 'smoker' value will change as rows are imputed if they have missing 'diagnosis'. 
For example if there is a missing 'smoker' value in row 2 of the DataFrame and a missing 'diagnosis' in row 1 of the DataFrame, row 1 will be imputed because of the missing 'diagnosis' resulting in row 2 becoming row 1 and therefore the unit test must check row 1 for a value of \"No\" otherwise the test will fail.)\n", + "\n", + "If all rows are imputed, in the case that each row has a missing 'diagnosis' value, the unit tests will run on an empty DataFrame and will therefore fail." + ] + }, + { + "cell_type": "markdown", + "id": "2", + "metadata": {}, "source": [ "## Exercise 2 Solution: Write a simple unit test for a new function\n", "\n", @@ -23,13 +38,18 @@ { "cell_type": "code", "execution_count": null, - "id": "2", + "id": "3", "metadata": {}, "outputs": [], "source": [ "# Walkthrough: Unit test for flag_missing\n", + "import os\n", + "import sys\n", + "\n", "import pandas as pd\n", "\n", + "sys.path.insert(0, os.path.abspath(os.path.join(os.getcwd(), \"..\", \"..\", \"src\")))\n", + "\n", "from python_rap_demo.cleaning import flag_missing\n", "\n", "\n", @@ -47,13 +67,22 @@ { "cell_type": "code", "execution_count": null, - "id": "3", + "id": "4", "metadata": {}, "outputs": [], "source": [ "# Example: Unit test for impute_by_group\n", + "\n", + "# Note: This test will not run unless impute_by_group has been entered into cleaning.py\n", + "\n", + "\n", + "import os\n", + "import sys\n", + "\n", "import pandas as pd\n", "\n", + "sys.path.insert(0, os.path.abspath(os.path.join(os.getcwd(), \"..\", \"..\", \"src\")))\n", + "\n", "from python_rap_demo.cleaning import impute_by_group\n", "\n", "\n", @@ -70,7 +99,7 @@ }, { "cell_type": "markdown", - "id": "4", + "id": "5", "metadata": {}, "source": [ "## Exercise 3 Solution: Run your unit tests\n", @@ -85,7 +114,7 @@ }, { "cell_type": "markdown", - "id": "5", + "id": "6", "metadata": {}, "source": [ "## Exercise 4 Solution: Stretch - Check test coverage\n", @@ -102,7 +131,7 @@ }, { "cell_type": "markdown", - "id": 
"6", + "id": "7", "metadata": {}, "source": [ "## Exercise 5 Solution: Stretch - Try parameterisation in pytest\n", @@ -113,13 +142,21 @@ { "cell_type": "code", "execution_count": null, - "id": "7", + "id": "8", "metadata": {}, "outputs": [], "source": [ + "# Note: These tests will not run unless impute_by_group and flag_missing have been entered into cleaning.py\n", + "\n", + "\n", + "import os\n", + "import sys\n", + "\n", "import pandas as pd\n", "import pytest\n", "\n", + "sys.path.insert(0, os.path.abspath(os.path.join(os.getcwd(), \"..\", \"..\", \"src\")))\n", + "\n", "from python_rap_demo.cleaning import flag_missing, impute_by_group\n", "\n", "# Parameterised test for flag_missing\n", @@ -188,7 +225,16 @@ ], "metadata": { "language_info": { - "name": "python" + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.12.3" } }, "nbformat": 4, From ed45fff4d016fea31b523cd08f7dd2e90e12ee43 Mon Sep 17 00:00:00 2001 From: fryerd1 Date: Fri, 16 Jan 2026 12:49:08 +0000 Subject: [PATCH 07/14] change unit test exercise 1 title --- exercises/04_unit_tests.ipynb | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/exercises/04_unit_tests.ipynb b/exercises/04_unit_tests.ipynb index 43e9b92..8ae5b48 100644 --- a/exercises/04_unit_tests.ipynb +++ b/exercises/04_unit_tests.ipynb @@ -31,7 +31,7 @@ "id": "2", "metadata": {}, "source": [ - "## Exercise 1: Review an existing unit test\n", + "## Exercise 1: Review and Adapt an existing unit test\n", "\n", "The function `clean_health_data` in `src/python_rap_demo/cleaning.py` has an existing unit test in\n", "`tests/test_cleaning.py`.\n", From ea7e72019ec4b35b07c839886eccc20e9dbc3b9e Mon Sep 17 00:00:00 2001 From: fryerd1 Date: Mon, 19 Jan 2026 10:47:57 +0000 Subject: [PATCH 08/14] Add __init.py__ , edit unit_tests_solutions, add comma to INSTRUCTIONS 
--- INSTRUCTIONS.md | 2 +- exercises/solutions/04_unit_tests_solutions.ipynb | 12 +++++++++--- src/__init__.py | 0 src/python_rap_demo/__init__.py | 0 4 files changed, 10 insertions(+), 4 deletions(-) create mode 100644 src/__init__.py create mode 100644 src/python_rap_demo/__init__.py diff --git a/INSTRUCTIONS.md b/INSTRUCTIONS.md index dd87503..a6dfc36 100644 --- a/INSTRUCTIONS.md +++ b/INSTRUCTIONS.md @@ -5,4 +5,4 @@ Getting started 1. Download and unzip the file 2. Open the folder in your chosen IDE -Once these steps are complete refer to the README.md file in the unzipped folder and follow the instrcutions from step 2 onwards +Once these steps are complete, refer to the README.md file in the unzipped folder and follow the instrcutions from step 2 onwards diff --git a/exercises/solutions/04_unit_tests_solutions.ipynb b/exercises/solutions/04_unit_tests_solutions.ipynb index 0835ea2..e03354e 100644 --- a/exercises/solutions/04_unit_tests_solutions.ipynb +++ b/exercises/solutions/04_unit_tests_solutions.ipynb @@ -61,7 +61,11 @@ " flagged = flag_missing(df, [\"height_cm\", \"weight_kg\"])\n", " # Check that the _imputed columns are correct\n", " assert flagged[\"height_cm_imputed\"].tolist() == [False, True]\n", - " assert flagged[\"weight_kg_imputed\"].tolist() == [False, True]" + " print(\"height_cm_imputed test passed.\")\n", + " assert flagged[\"weight_kg_imputed\"].tolist() == [False, True]\n", + " print(\"weight_kg_imputed test passed.\")\n", + "\n", + "test_flag_missing()" ] }, { @@ -94,7 +98,10 @@ " imputed = impute_by_group(df, \"height_cm\", \"gender\")\n", " # Check that missing value is imputed with group mean\n", " expected = [170, 160, 160]\n", - " assert imputed.tolist() == expected" + " assert imputed.tolist() == expected\n", + " print(\"impute_by_group test passed.\")\n", + "\n", + "test_impute_by_group()" ] }, { @@ -148,7 +155,6 @@ "source": [ "# Note: These tests will not run unless impute_by_group and flag_missing have been entered into 
cleaning.py\n", "\n", - "\n", "import os\n", "import sys\n", "\n", diff --git a/src/__init__.py b/src/__init__.py new file mode 100644 index 0000000..e69de29 diff --git a/src/python_rap_demo/__init__.py b/src/python_rap_demo/__init__.py new file mode 100644 index 0000000..e69de29 From e2e157205ac53110d4f7c4fac087c536cd4d84b3 Mon Sep 17 00:00:00 2001 From: fryerd1 Date: Mon, 19 Jan 2026 11:03:45 +0000 Subject: [PATCH 09/14] Change setup block in modules_solutions --- exercises/solutions/02_modules_solutions.ipynb | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/exercises/solutions/02_modules_solutions.ipynb b/exercises/solutions/02_modules_solutions.ipynb index 2c6647f..e107003 100644 --- a/exercises/solutions/02_modules_solutions.ipynb +++ b/exercises/solutions/02_modules_solutions.ipynb @@ -22,6 +22,8 @@ "import os\n", "import sys\n", "\n", + "import pandas as pd\n", + "\n", "sys.path.insert(0, os.path.abspath(os.path.join(os.getcwd(), \"..\", \"..\", \"src\")))\n", "\n", "from python_rap_demo.cleaning import clean_health_data\n", @@ -62,7 +64,6 @@ "outputs": [], "source": [ "# Modular solution using subfunctions\n", - "import pandas as pd\n", "\n", "\n", "def flag_missing(df: pd.DataFrame, columns: list[str]) -> pd.DataFrame:\n", From 53a0445666667a5321c74bd12747bd1cc86717de Mon Sep 17 00:00:00 2001 From: fryerd1 Date: Tue, 20 Jan 2026 11:54:04 +0000 Subject: [PATCH 10/14] Update INSTRUCTIONS.md Co-authored-by: alex-westwood <156091267+alex-westwood@users.noreply.github.com> --- INSTRUCTIONS.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/INSTRUCTIONS.md b/INSTRUCTIONS.md index a6dfc36..3c24f53 100644 --- a/INSTRUCTIONS.md +++ b/INSTRUCTIONS.md @@ -5,4 +5,4 @@ Getting started 1. Download and unzip the file 2. 
Open the folder in your chosen IDE -Once these steps are complete, refer to the README.md file in the unzipped folder and follow the instrcutions from step 2 onwards +Once these steps are complete, refer to the README.md file in the unzipped folder and follow the instructions from step 2 onwards From c467497ebbb82684c41c4ba32abea642139a64b1 Mon Sep 17 00:00:00 2001 From: fryerd1 Date: Tue, 20 Jan 2026 13:58:34 +0000 Subject: [PATCH 11/14] Change file path in 02_module_solutions --- exercises/solutions/02_modules_solutions.ipynb | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/exercises/solutions/02_modules_solutions.ipynb b/exercises/solutions/02_modules_solutions.ipynb index e107003..f29065b 100644 --- a/exercises/solutions/02_modules_solutions.ipynb +++ b/exercises/solutions/02_modules_solutions.ipynb @@ -33,7 +33,7 @@ "\n", "input_path = \"../../data/input/health_data.csv\"\n", "cleaned_path = \"../../data/outputs/cleaned/health_data_cleaned.csv\"\n", - "report_path = \"../../reports/disease_prevalence_report.md\"\n", + "report_path = \"../../reports/\"\n", "\n", "# I/O: Read data\n", "df = read_health_data(input_path)\n", From 71953e6940313c9fa0c68af0012967397e81412d Mon Sep 17 00:00:00 2001 From: fryerd1 Date: Tue, 20 Jan 2026 15:07:39 +0000 Subject: [PATCH 12/14] Update unit tests exercises and solutions --- exercises/04_unit_tests.ipynb | 44 ++++-- .../solutions/04_unit_tests_solutions.ipynb | 127 ++++++++++++++++-- 2 files changed, 146 insertions(+), 25 deletions(-) diff --git a/exercises/04_unit_tests.ipynb b/exercises/04_unit_tests.ipynb index 8ae5b48..0e371a8 100644 --- a/exercises/04_unit_tests.ipynb +++ b/exercises/04_unit_tests.ipynb @@ -38,13 +38,33 @@ "\n", "**Task:** \n", "1. Run the existing unit test for `clean_health_data` to understand how it works. \n", - "2. Modify the sample DataFrame in the test to understand what causes the tests to pass or fail (Note: if all rows are dropped all tests will fail)." + "2. 
Modify the sample DataFrame in the test to understand what causes the tests to pass or fail (Note: if all rows are dropped all tests will fail).\n", + "\n", + "There is an example of a modified DataFrame below however it does not have to be used, feel free to modify the DataFrame and experiment with the tests without following the set exercise." ] }, { - "cell_type": "markdown", + "cell_type": "code", + "execution_count": null, "id": "3", "metadata": {}, + "outputs": [], + "source": [ + "df = pd.DataFrame(\n", + " {\n", + " \"diagnosis\": [None, None, \"A\", \"B\"],\n", + " \"smoker\": [\"No\", \"Yes\", \"Yes\", \"No\"],\n", + " \"gender\": [\"m\", \"f\", \"M\", \"F\"],\n", + " \"height_cm\": [170, None, 160, 165],\n", + " \"weight_kg\": [70, 80, 75, 68],\n", + " }\n", + ")" + ] + }, + { + "cell_type": "markdown", + "id": "4", + "metadata": {}, "source": [ "## Exercise 2: Write a simple unit test for a new function\n", "\n", @@ -61,7 +81,7 @@ }, { "cell_type": "markdown", - "id": "4", + "id": "5", "metadata": {}, "source": [ "## Exercise 2a: Walkthrough - Write a unit test for flag_missing\n", @@ -74,7 +94,7 @@ { "cell_type": "code", "execution_count": null, - "id": "5", + "id": "6", "metadata": {}, "outputs": [], "source": [ @@ -108,7 +128,7 @@ }, { "cell_type": "markdown", - "id": "6", + "id": "7", "metadata": {}, "source": [ "### How to write a unit test for `flag_missing`\n", @@ -144,7 +164,7 @@ }, { "cell_type": "markdown", - "id": "7", + "id": "8", "metadata": {}, "source": [ "#### Understanding the test_flag_missing function\n", @@ -162,7 +182,7 @@ }, { "cell_type": "markdown", - "id": "8", + "id": "9", "metadata": {}, "source": [ "## Exercise 2b: Write a unit test for impute_by_group\n", @@ -180,7 +200,7 @@ { "cell_type": "code", "execution_count": null, - "id": "9", + "id": "10", "metadata": {}, "outputs": [], "source": [ @@ -205,7 +225,7 @@ }, { "cell_type": "markdown", - "id": "10", + "id": "11", "metadata": {}, "source": [ "**Task:**\n", @@ -220,7 
+240,7 @@ }, { "cell_type": "markdown", - "id": "11", + "id": "12", "metadata": {}, "source": [ "## Exercise 3: Run your unit tests\n", @@ -238,7 +258,7 @@ }, { "cell_type": "markdown", - "id": "12", + "id": "13", "metadata": {}, "source": [ "## Exercise 4: Stretch - Check test coverage\n", @@ -260,7 +280,7 @@ }, { "cell_type": "markdown", - "id": "13", + "id": "14", "metadata": {}, "source": [ "## Exercise 5: Stretch - Try parameterisation in pytest\n", diff --git a/exercises/solutions/04_unit_tests_solutions.ipynb b/exercises/solutions/04_unit_tests_solutions.ipynb index e03354e..9ecbd88 100644 --- a/exercises/solutions/04_unit_tests_solutions.ipynb +++ b/exercises/solutions/04_unit_tests_solutions.ipynb @@ -17,18 +17,117 @@ "source": [ "## Exercise 1 Solution: Review and Adapt an existing unit\n", "\n", - "Here are some examples of common issues that may be causing the unit test to fail after modification:\n", - "\n", - "The check that missing 'smoker' values default to the value \"No\" may fail if the DataFrame is modified because the test checks row 0 for the value \"No\" as the input DataFrame has a blank 'smoker' value in row 0. \n", - "(Note: The location of the missing 'smoker' value will change as rows are imputed if they have missing 'diagnosis'. For example if there is a missing 'smoker' value in row 2 of the DataFrame and a missing 'diagnosis' in row 1 of the DataFrame, row 1 will be imputed because of the missing 'diagnosis' resulting in row 2 becoming row 1 and therefore the unit test must check row 1 for a value of \"No\" otherwise the test will fail.)\n", - "\n", - "If all rows are imputed, in the case that each row has a missing 'diagnosis' value, the unit tests will run on an empty DataFrame and will therefore fail." 
+ "Open `tests/test_cleaning.py` and run the test using the following command in the terminal:\n", + "```cmd\n", + "pytest tests/test_cleaning.py\n", + "```\n", + "After the test passes, change the DataFrame to the following DataFrame and run the test again:" ] }, { - "cell_type": "markdown", + "cell_type": "code", + "execution_count": null, "id": "2", "metadata": {}, + "outputs": [], + "source": [ + "df = pd.DataFrame(\n", + " {\n", + " \"diagnosis\": [None, None, \"A\", \"B\"],\n", + " \"smoker\": [\"No\", \"Yes\", \"Yes\", \"No\"],\n", + " \"gender\": [\"m\", \"f\", \"M\", \"F\"],\n", + " \"height_cm\": [170, None, 160, 165],\n", + " \"weight_kg\": [70, 80, 75, 68],\n", + " }\n", + ")" + ] + }, + { + "cell_type": "markdown", + "id": "3", + "metadata": {}, + "source": [ + "Notice how the check for the missing smoker value fails. The test checks column 0 for a \"smoker\" value of \"No\", with the new DataFrame the first 2 columns are imputed due to missing diagnosis leaving the DataFrame looking like this:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "4", + "metadata": {}, + "outputs": [], + "source": [ + "df = pd.DataFrame(\n", + " {\n", + " \"diagnosis\": [\"A\", \"B\"],\n", + " \"smoker\": [\"Yes\", \"No\"],\n", + " \"gender\": [\"M\", \"F\"],\n", + " \"height_cm\": [160, 165],\n", + " \"weight_kg\": [75, 68],\n", + " }\n", + ")" + ] + }, + { + "cell_type": "markdown", + "id": "5", + "metadata": {}, + "source": [ + "The test checks column 0 and expects the value \"No\" in the \"smoker\" column, instead it receives the value \"Yes\" and therefore the test fails.\n", + "In order for the test to pass either change the column that the test checks or change the expected \"smoker\" value to \"Yes\" instead of \"No\".\n", + "The original assert statement looks like this:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "6", + "metadata": {}, + "outputs": [], + "source": [ + "assert cleaned[\"smoker\"].iloc[0] == 
\"No\"" + ] + }, + { + "cell_type": "markdown", + "id": "7", + "metadata": {}, + "source": [ + "The changed assert statement should look like either:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "8", + "metadata": {}, + "outputs": [], + "source": [ + "assert cleaned[\"smoker\"].iloc[0] == \"Yes\"" + ] + }, + { + "cell_type": "markdown", + "id": "9", + "metadata": {}, + "source": [ + "or" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "10", + "metadata": {}, + "outputs": [], + "source": [ + "assert cleaned[\"smoker\"].iloc[1] == \"No\"" + ] + }, + { + "cell_type": "markdown", + "id": "11", + "metadata": {}, "source": [ "## Exercise 2 Solution: Write a simple unit test for a new function\n", "\n", @@ -38,7 +137,7 @@ { "cell_type": "code", "execution_count": null, - "id": "3", + "id": "12", "metadata": {}, "outputs": [], "source": [ @@ -65,13 +164,14 @@ " assert flagged[\"weight_kg_imputed\"].tolist() == [False, True]\n", " print(\"weight_kg_imputed test passed.\")\n", "\n", + "\n", "test_flag_missing()" ] }, { "cell_type": "code", "execution_count": null, - "id": "4", + "id": "13", "metadata": {}, "outputs": [], "source": [ @@ -101,12 +201,13 @@ " assert imputed.tolist() == expected\n", " print(\"impute_by_group test passed.\")\n", "\n", + "\n", "test_impute_by_group()" ] }, { "cell_type": "markdown", - "id": "5", + "id": "14", "metadata": {}, "source": [ "## Exercise 3 Solution: Run your unit tests\n", @@ -121,7 +222,7 @@ }, { "cell_type": "markdown", - "id": "6", + "id": "15", "metadata": {}, "source": [ "## Exercise 4 Solution: Stretch - Check test coverage\n", @@ -138,7 +239,7 @@ }, { "cell_type": "markdown", - "id": "7", + "id": "16", "metadata": {}, "source": [ "## Exercise 5 Solution: Stretch - Try parameterisation in pytest\n", @@ -149,7 +250,7 @@ { "cell_type": "code", "execution_count": null, - "id": "8", + "id": "17", "metadata": {}, "outputs": [], "source": [ From 23aadb24662ca8154c57c0ab86c0280838b09810 
Mon Sep 17 00:00:00 2001 From: fryerd1 Date: Thu, 22 Jan 2026 10:55:32 +0000 Subject: [PATCH 13/14] Update unit tests exercise and solution --- exercises/04_unit_tests.ipynb | 35 ++++-- .../solutions/04_unit_tests_solutions.ipynb | 102 +++++++----------- 2 files changed, 60 insertions(+), 77 deletions(-) diff --git a/exercises/04_unit_tests.ipynb b/exercises/04_unit_tests.ipynb index 0e371a8..417d58c 100644 --- a/exercises/04_unit_tests.ipynb +++ b/exercises/04_unit_tests.ipynb @@ -38,9 +38,9 @@ "\n", "**Task:** \n", "1. Run the existing unit test for `clean_health_data` to understand how it works. \n", - "2. Modify the sample DataFrame in the test to understand what causes the tests to pass or fail (Note: if all rows are dropped all tests will fail).\n", + "2. Modify the function `clean_health_data` and re-run the tests to understand what causes the tests to pass or fail, then modify the unit test so that the tests pass again.\n", "\n", - "There is an example of a modified DataFrame below however it does not have to be used, feel free to modify the DataFrame and experiment with the tests without following the set exercise." + "There is an example of a modified function below; however, it does not have to be used. Feel free to modify the function and experiment with the tests without following the set exercise." 
] }, { @@ -50,15 +50,28 @@ "metadata": {}, "outputs": [], "source": [ - "df = pd.DataFrame(\n", - " {\n", - " \"diagnosis\": [None, None, \"A\", \"B\"],\n", - " \"smoker\": [\"No\", \"Yes\", \"Yes\", \"No\"],\n", - " \"gender\": [\"m\", \"f\", \"M\", \"F\"],\n", - " \"height_cm\": [170, None, 160, 165],\n", - " \"weight_kg\": [70, 80, 75, 68],\n", - " }\n", - ")" + "def clean_health_data(df: pd.DataFrame) -> pd.DataFrame:\n", + " \"\"\"\n", + " Clean health data by dropping rows with missing values in key columns.\n", + "\n", + " Args:\n", + " df (pd.DataFrame): Raw health data.\n", + "\n", + " Returns:\n", + " pd.DataFrame: Cleaned health data with no missing values in critical columns.\n", + " \"\"\"\n", + " df = df.copy()\n", + "\n", + " # Drop rows with missing values in height_cm, weight_kg, or diagnosis columns\n", + " df = df.dropna(subset=[\"height_cm\", \"weight_kg\", \"diagnosis\"])\n", + "\n", + " # Fill missing smoker values with 'Yes'\n", + " df[\"smoker\"] = df[\"smoker\"].fillna(\"Yes\")\n", + "\n", + " # Ensure gender is uppercase\n", + " df[\"gender\"] = df[\"gender\"].str.upper()\n", + "\n", + " return df\n" ] }, { diff --git a/exercises/solutions/04_unit_tests_solutions.ipynb b/exercises/solutions/04_unit_tests_solutions.ipynb index 9ecbd88..4e00fff 100644 --- a/exercises/solutions/04_unit_tests_solutions.ipynb +++ b/exercises/solutions/04_unit_tests_solutions.ipynb @@ -21,7 +21,7 @@ "```cmd\n", "pytest tests/test_cleaning.py\n", "```\n", - "After the test passes, change the DataFrame to the following DataFrame and run the test again:" + "After the test passes, change the function `clean_health_data` in `src/python_rap_demo/cleaning.py ` to the following function, save the file and run the test again:" ] }, { @@ -31,15 +31,28 @@ "metadata": {}, "outputs": [], "source": [ - "df = pd.DataFrame(\n", - " {\n", - " \"diagnosis\": [None, None, \"A\", \"B\"],\n", - " \"smoker\": [\"No\", \"Yes\", \"Yes\", \"No\"],\n", - " \"gender\": [\"m\", \"f\", 
\"M\", \"F\"],\n", - " \"height_cm\": [170, None, 160, 165],\n", - " \"weight_kg\": [70, 80, 75, 68],\n", - " }\n", - ")" + "def clean_health_data(df: pd.DataFrame) -> pd.DataFrame:\n", + " \"\"\"\n", + " Clean health data by dropping rows with missing values in key columns.\n", + "\n", + " Args:\n", + " df (pd.DataFrame): Raw health data.\n", + "\n", + " Returns:\n", + " pd.DataFrame: Cleaned health data with no missing values in critical columns.\n", + " \"\"\"\n", + " df = df.copy()\n", + "\n", + " # Drop rows with missing values in height_cm, weight_kg, or diagnosis columns\n", + " df = df.dropna(subset=[\"height_cm\", \"weight_kg\", \"diagnosis\"])\n", + "\n", + " # Fill missing smoker values with 'Yes'\n", + " df[\"smoker\"] = df[\"smoker\"].fillna(\"Yes\")\n", + "\n", + " # Ensure gender is uppercase\n", + " df[\"gender\"] = df[\"gender\"].str.upper()\n", + "\n", + " return df" ] }, { @@ -47,7 +60,10 @@ "id": "3", "metadata": {}, "source": [ - "Notice how the check for the missing smoker value fails. The test checks column 0 for a \"smoker\" value of \"No\", with the new DataFrame the first 2 columns are imputed due to missing diagnosis leaving the DataFrame looking like this:" + "Notice how the check for the missing smoker value fails. 
The test checks the first row for a \"smoker\" value of \"No\"; however, the modified `clean_health_data` function fills missing smoker values with 'Yes', changing the smoker value in the first row to 'Yes', which causes the test to fail.\n", + "\n", + "In order for the test to pass change the expected \"smoker\" value to \"Yes\" instead of \"No\".\n", + "The original assert statement looks like this:" ] }, { @@ -57,15 +73,7 @@ "metadata": {}, "outputs": [], "source": [ - "df = pd.DataFrame(\n", - " {\n", - " \"diagnosis\": [\"A\", \"B\"],\n", - " \"smoker\": [\"Yes\", \"No\"],\n", - " \"gender\": [\"M\", \"F\"],\n", - " \"height_cm\": [160, 165],\n", - " \"weight_kg\": [75, 68],\n", - " }\n", - ")" + "assert cleaned[\"smoker\"].iloc[0] == \"No\"" ] }, { @@ -73,9 +81,7 @@ "id": "5", "metadata": {}, "source": [ - "The test checks column 0 and expects the value \"No\" in the \"smoker\" column, instead it receives the value \"Yes\" and therefore the test fails.\n", - "In order for the test to pass either change the column that the test checks or change the expected \"smoker\" value to \"Yes\" instead of \"No\".\n", - "The original assert statement looks like this:" + "The changed assert statement should look like this:" ] }, { @@ -84,49 +90,13 @@ "id": "6", "metadata": {}, "outputs": [], - "source": [ - "assert cleaned[\"smoker\"].iloc[0] == \"No\"" - ] - }, - { - "cell_type": "markdown", - "id": "7", - "metadata": {}, - "source": [ - "The changed assert statement should look like either:" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "8", - "metadata": {}, - "outputs": [], "source": [ "assert cleaned[\"smoker\"].iloc[0] == \"Yes\"" ] }, { "cell_type": "markdown", - "id": "9", - "metadata": {}, - "source": [ - "or" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "10", - "metadata": {}, - "outputs": [], - "source": [ - "assert cleaned[\"smoker\"].iloc[1] == \"No\"" - ] - }, - { - "cell_type": "markdown", - "id": 
"11", + "id": "7", "metadata": {}, "source": [ "## Exercise 2 Solution: Write a simple unit test for a new function\n", @@ -137,7 +107,7 @@ { "cell_type": "code", "execution_count": null, - "id": "12", + "id": "8", "metadata": {}, "outputs": [], "source": [ @@ -171,7 +141,7 @@ { "cell_type": "code", "execution_count": null, - "id": "13", + "id": "9", "metadata": {}, "outputs": [], "source": [ @@ -207,7 +177,7 @@ }, { "cell_type": "markdown", - "id": "14", + "id": "10", "metadata": {}, "source": [ "## Exercise 3 Solution: Run your unit tests\n", @@ -222,7 +192,7 @@ }, { "cell_type": "markdown", - "id": "15", + "id": "11", "metadata": {}, "source": [ "## Exercise 4 Solution: Stretch - Check test coverage\n", @@ -239,7 +209,7 @@ }, { "cell_type": "markdown", - "id": "16", + "id": "12", "metadata": {}, "source": [ "## Exercise 5 Solution: Stretch - Try parameterisation in pytest\n", @@ -250,7 +220,7 @@ { "cell_type": "code", "execution_count": null, - "id": "17", + "id": "13", "metadata": {}, "outputs": [], "source": [ From 3321ef81721d11dbaea907d2b841213402041151 Mon Sep 17 00:00:00 2001 From: alex-westwood <156091267+alex-westwood@users.noreply.github.com> Date: Fri, 23 Jan 2026 15:41:17 +0000 Subject: [PATCH 14/14] Apply suggestions from code review --- exercises/solutions/04_unit_tests_solutions.ipynb | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/exercises/solutions/04_unit_tests_solutions.ipynb b/exercises/solutions/04_unit_tests_solutions.ipynb index 4e00fff..eefe213 100644 --- a/exercises/solutions/04_unit_tests_solutions.ipynb +++ b/exercises/solutions/04_unit_tests_solutions.ipynb @@ -15,7 +15,7 @@ "id": "1", "metadata": {}, "source": [ - "## Exercise 1 Solution: Review and Adapt an existing unit\n", + "## Exercise 1 Solution: Review and adapt an existing unit test\n", "\n", "Open `tests/test_cleaning.py` and run the test using the following command in the terminal:\n", "```cmd\n", @@ -62,7 +62,7 @@ "source": [ "Notice how the check for 
the missing smoker value fails. The test checks the first row for a \"smoker\" value of \"No\"; however, the modified `clean_health_data` function fills missing smoker values with 'Yes', changing the smoker value in the first row to 'Yes', which causes the test to fail.\n", "\n", - "In order for the test to pass change the expected \"smoker\" value to \"Yes\" instead of \"No\".\n", + "A failing test draws developers' attention to the function so that they can check whether the change was intended. If it was not, the developer can fix the error in the function. If it was, the unit test can be adapted to incorporate the change. In this case, assume the change was correct. For the test to pass, change the expected \"smoker\" value from \"No\" to \"Yes\".\n", "The original assert statement looks like this:" ] },