Merged
8 changes: 8 additions & 0 deletions INSTRUCTIONS.md
@@ -0,0 +1,8 @@
Offline Python RAP demo instructions

Getting started

1. Download and unzip the file
2. Open the folder in your chosen IDE

Once these steps are complete, refer to the README.md file in the unzipped folder and follow the instructions from step 2 onwards
1 change: 1 addition & 0 deletions README.md
@@ -47,6 +47,7 @@ Welcome to the RAP (Reproducible Analytical Pipeline) demonstration repository!
- `src/` — Main pipeline code and modules
- `data/` — Example health data for analysis
- `config/` — Configuration files (YAML)
- `reports/` — Graphs and reports
- `tests/` — Unit tests for pipeline modules
- `exercises/` — **Practice exercises** (see below)
- `docs/` — Documentation
2 changes: 1 addition & 1 deletion config/user_config.yaml
@@ -9,4 +9,4 @@
# - Any other settings you want to change without editing code
input_path: data/input/health_data.csv
cleaned_path: data/outputs/cleaned/health_data_cleaned.csv
report_dir: data/outputs/reports/
report_dir: reports/
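Because `report_dir` now points at a top-level `reports/` folder, any code that writes charts or reports needs that folder to exist. The following is a minimal sketch of one way to handle this, not the repository's actual implementation: the `config` dictionary stands in for the parsed contents of `config/user_config.yaml`, and `resolve_report_dir` and the chart file name are hypothetical names used for illustration only.

```python
import tempfile
from pathlib import Path

# Stand-in for the parsed contents of config/user_config.yaml (assumption:
# the real pipeline loads these values from the YAML file).
config = {
    "input_path": "data/input/health_data.csv",
    "cleaned_path": "data/outputs/cleaned/health_data_cleaned.csv",
    "report_dir": "reports/",
}


def resolve_report_dir(config: dict[str, str], project_root: Path) -> Path:
    """Return the report directory under project_root, creating it if needed.

    Hypothetical helper, not part of the repository.
    """
    report_dir = project_root / config["report_dir"]
    report_dir.mkdir(parents=True, exist_ok=True)
    return report_dir


# Demonstrate against a temporary folder so nothing is written to the repo.
with tempfile.TemporaryDirectory() as tmp:
    report_dir = resolve_report_dir(config, Path(tmp))
    chart_path = report_dir / "missing_values.png"  # hypothetical file name
    print(chart_path.relative_to(tmp))
```

Keeping the folder name in the config file means a change like this one only touches `user_config.yaml` rather than every function that saves output.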
6 changes: 3 additions & 3 deletions exercises/01_introduction.ipynb
@@ -24,21 +24,21 @@
"## Where to find exercises, solutions, and outputs\n",
"- **Exercises**: Located in the `exercises/` folder (e.g., `exercises/02_modules.ipynb`)\n",
"- **Solutions**: Located in `exercises/solutions/` (e.g., `exercises/solutions/02_modules_solutions.ipynb`)\n",
"- **Outputs**: Saved in `exercises/outputs/` (e.g., cleaned data, reports, charts)\n",
"- **Outputs**: Saved in `data/outputs/` or `reports/` (e.g., cleaned data, reports, charts)\n",
"\n",
"## How to use the exercises\n",
"1. **Read each exercise notebook** and follow the instructions step by step.\n",
"2. **Write your code** in the notebook cells or in the `src/python_rap_demo/` modules as instructed.\n",
"4. **Check your solutions** by comparing your results with the solutions notebooks in `exercises/solutions/`.\n",
"5. **View outputs** in the `data/outputs/` folder (e.g., cleaned data, markdown reports, charts).\n",
"5. **View outputs** in the `data/outputs/` and `reports/` folders (e.g., cleaned data, markdown reports, charts).\n",
"\n",
"## Adding code to the src folder\n",
"- Place reusable functions and classes in the appropriate module in `src/python_rap_demo/` (e.g., `cleaning.py`, `io.py`, `report.py`).\n",
"- Use clear function names, type hints, and docstrings following PEP8 standards.\n",
"\n",
"## How to check if your solutions have worked\n",
"- **Compare outputs**: Check your results against the solutions notebooks and output files.\n",
"- **Check outputs**: Review generated files in `data/outputs/` for expected results.\n",
"- **Check outputs**: Review generated files in `data/outputs/` or `reports/` for expected results.\n",
"- **Run tests**: For unit test exercises you can run all tests with:\n",
" ```cmd\n",
" pytest tests\n",
7 changes: 5 additions & 2 deletions exercises/02_modules.ipynb
@@ -146,7 +146,10 @@
"- Write a function to plot missing values per column before the data is cleaned\n",
"- Write a function to plot disease prevalence for each disease category over time after the data is cleaned\n",
"- Add the visualisations to the output report\n",
"- Save the charts to the outputs folder\n"
"\n",
"**Bonus:**\n",
"- Save the charts to the reports folder\n",
"- Customise chart formatting and colours\n"
]
}
],
@@ -161,7 +164,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.12.5"
"version": "3.12.3"
}
},
"nbformat": 4,
2 changes: 1 addition & 1 deletion exercises/03_config_files.ipynb
@@ -67,7 +67,7 @@
"\n",
"**Reflect:**\n",
"- How does using config files improve reproducibility and flexibility?\n",
"- What other parameters could you add to make your pipeline more configurable?"
"- What other parameters could you add to make your pipeline more configurable?\n"
]
},
{
83 changes: 66 additions & 17 deletions exercises/04_unit_tests.ipynb
@@ -31,7 +31,55 @@
"id": "2",
"metadata": {},
"source": [
"## Exercise 1: Write a simple unit test for a new function\n",
"## Exercise 1: Review and adapt an existing unit test\n",
"\n",
"The function `clean_health_data` in `src/python_rap_demo/cleaning.py` has an existing unit test in\n",
"`tests/test_cleaning.py`.\n",
"\n",
"**Task:** \n",
"1. Run the existing unit test for `clean_health_data` to understand how it works. \n",
"2. Modify the function `clean_health_data` and re-run the tests to see what causes them to pass or fail, then update the unit test so that the tests pass again.\n",
"\n",
"An example of a modified function is shown below, but you do not have to use it. Feel free to modify the function and experiment with the tests without following the set exercise."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "3",
"metadata": {},
"outputs": [],
"source": [
"def clean_health_data(df: pd.DataFrame) -> pd.DataFrame:\n",
" \"\"\"\n",
" Clean health data by dropping rows with missing values in key columns.\n",
"\n",
" Args:\n",
" df (pd.DataFrame): Raw health data.\n",
"\n",
" Returns:\n",
" pd.DataFrame: Cleaned health data with no missing values in critical columns.\n",
" \"\"\"\n",
" df = df.copy()\n",
"\n",
" # Drop rows with missing values in height_cm, weight_kg, or diagnosis columns\n",
" df = df.dropna(subset=[\"height_cm\", \"weight_kg\", \"diagnosis\"])\n",
"\n",
" # Fill missing smoker values with 'Yes'\n",
" df[\"smoker\"] = df[\"smoker\"].fillna(\"Yes\")\n",
"\n",
" # Ensure gender is uppercase\n",
" df[\"gender\"] = df[\"gender\"].str.upper()\n",
"\n",
" return df\n"
]
},
{
"cell_type": "markdown",
"id": "4",
"metadata": {},
"source": [
"## Exercise 2: Write a simple unit test for a new function\n",
"\n",
"Suppose you have created a function called `impute_by_group` in `src/python_rap_demo/cleaning.py`.\n",
"\n",
@@ -46,10 +94,10 @@
},
{
"cell_type": "markdown",
"id": "3",
"id": "5",
"metadata": {},
"source": [
"## Exercise 1a: Walkthrough - Write a unit test for flag_missing\n",
"## Exercise 2a: Walkthrough - Write a unit test for flag_missing\n",
"\n",
"Let's start with a simple function called `flag_missing`. This function adds a new column to your DataFrame to flag missing values in specified columns.\n",
"\n",
@@ -59,7 +107,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "4",
"id": "6",
"metadata": {},
"outputs": [],
"source": [
@@ -93,7 +141,7 @@
},
{
"cell_type": "markdown",
"id": "5",
"id": "7",
"metadata": {},
"source": [
"### How to write a unit test for `flag_missing`\n",
@@ -129,7 +177,7 @@
},
{
"cell_type": "markdown",
"id": "6",
"id": "8",
"metadata": {},
"source": [
"#### Understanding the test_flag_missing function\n",
@@ -147,10 +195,10 @@
},
{
"cell_type": "markdown",
"id": "7",
"id": "9",
"metadata": {},
"source": [
"## Exercise 1b: Write a unit test for impute_by_group\n",
"## Exercise 2b: Write a unit test for impute_by_group\n",
"\n",
"Now try writing your own tests for the following function. Use the walkthrough above as a guide.\n",
"\n",
@@ -165,7 +213,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "8",
"id": "10",
"metadata": {},
"outputs": [],
"source": [
@@ -190,7 +238,7 @@
},
{
"cell_type": "markdown",
"id": "9",
"id": "11",
"metadata": {},
"source": [
"**Task:**\n",
@@ -205,10 +253,10 @@
},
{
"cell_type": "markdown",
"id": "10",
"id": "12",
"metadata": {},
"source": [
"## Exercise 2: Run your unit tests\n",
"## Exercise 3: Run your unit tests\n",
"\n",
"**Task:**\n",
"- Run all tests in the `tests/` folder using the command below:\n",
@@ -223,10 +271,10 @@
},
{
"cell_type": "markdown",
"id": "11",
"id": "13",
"metadata": {},
"source": [
"## Exercise 3: Stretch - Check test coverage\n",
"## Exercise 4: Stretch - Check test coverage\n",
"\n",
"Test coverage shows how much of your code is tested by unit tests.\n",
"\n",
@@ -245,10 +293,10 @@
},
{
"cell_type": "markdown",
"id": "12",
"id": "14",
"metadata": {},
"source": [
"## Exercise 4: Stretch - Try parameterisation in pytest\n",
"## Exercise 5: Stretch - Try parameterisation in pytest\n",
"\n",
"Parameterisation lets you run the same test with different inputs.\n",
"\n",
@@ -260,7 +308,8 @@
],
"metadata": {
"language_info": {
"name": "python"
"name": "python",
"version": "3.12.3"
}
},
"nbformat": 4,
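As an illustration of the new Exercise 1 in `exercises/04_unit_tests.ipynb`, here is a minimal sketch of a test adapted to the modified `clean_health_data` shown in that cell. It is not the existing test in `tests/test_cleaning.py`, whose contents are not shown in this diff. The function body is copied from the exercise cell so the block runs on its own; the input values are invented for illustration.

```python
import pandas as pd


def clean_health_data(df: pd.DataFrame) -> pd.DataFrame:
    """Modified cleaning function, copied from the Exercise 1 cell."""
    df = df.copy()
    # Drop rows with missing values in the key columns
    df = df.dropna(subset=["height_cm", "weight_kg", "diagnosis"])
    # Fill missing smoker values with 'Yes'
    df["smoker"] = df["smoker"].fillna("Yes")
    # Ensure gender is uppercase
    df["gender"] = df["gender"].str.upper()
    return df


def test_clean_health_data() -> None:
    # Hypothetical input: values invented for illustration only
    raw = pd.DataFrame(
        {
            "height_cm": [170.0, None, 165.0],
            "weight_kg": [70.0, 80.0, 60.0],
            "diagnosis": ["asthma", "diabetes", "asthma"],
            "smoker": ["No", "Yes", None],
            "gender": ["f", "m", "m"],
        }
    )

    cleaned = clean_health_data(raw)

    # The row with a missing height is dropped
    assert len(cleaned) == 2
    assert cleaned["height_cm"].notna().all()
    # The missing smoker value is filled with 'Yes' in the modified function
    assert cleaned["smoker"].tolist() == ["No", "Yes"]
    # Gender values are upper-cased
    assert cleaned["gender"].tolist() == ["F", "M"]
    # The input DataFrame is left untouched because the function copies it
    assert raw["gender"].tolist() == ["f", "m", "m"]
```

If the fill value or the list of key columns in the function is changed, the matching assertion fails until the test is updated, which is the behaviour the exercise asks you to explore.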
9 changes: 5 additions & 4 deletions exercises/solutions/02_modules_solutions.ipynb
@@ -22,6 +22,8 @@
"import os\n",
"import sys\n",
"\n",
"import pandas as pd\n",
"\n",
"sys.path.insert(0, os.path.abspath(os.path.join(os.getcwd(), \"..\", \"..\", \"src\")))\n",
"\n",
"from python_rap_demo.cleaning import clean_health_data\n",
@@ -30,8 +32,8 @@
"from python_rap_demo.utils import add_bmi_column\n",
"\n",
"input_path = \"../../data/input/health_data.csv\"\n",
"cleaned_path = \"../outputs/02_modules/cleaned_data.csv\"\n",
"report_path = \"../outputs/02_modules/\"\n",
"cleaned_path = \"../../data/outputs/cleaned/health_data_cleaned.csv\"\n",
"report_path = \"../../reports/\"\n",
"\n",
"# I/O: Read data\n",
"df = read_health_data(input_path)\n",
@@ -62,7 +64,6 @@
"outputs": [],
"source": [
"# Modular solution using subfunctions\n",
"import pandas as pd\n",
"\n",
"\n",
"def flag_missing(df: pd.DataFrame, columns: list[str]) -> pd.DataFrame:\n",
@@ -736,7 +737,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.12.5"
"version": "3.12.3"
}
},
"nbformat": 4,