ONSdigital · alex-westwood · Jan 23, 2026 · Jan 6, 2026 · Jan 7, 2026 · Jan 7, 2026
diff --git a/INSTRUCTIONS.md b/INSTRUCTIONS.md
@@ -0,0 +1,8 @@
+Offline Python RAP demo instructions
+
+Getting started
+
+1. Download and unzip the file
+2. Open the folder in your chosen IDE
+
+Once these steps are complete, refer to the README.md file in the unzipped folder and follow the instructions from step 2 onwards.
diff --git a/README.md b/README.md
@@ -47,6 +47,7 @@ Welcome to the RAP (Reproducible Analytical Pipeline) demonstration repository!
 - `src/` — Main pipeline code and modules
 - `data/` — Example health data for analysis
 - `config/` — Configuration files (YAML)
+- `reports/` — Graphs and reports
 - `tests/` — Unit tests for pipeline modules
 - `exercises/` — **Practice exercises** (see below)
 - `docs/` — Documentation

diff --git a/config/user_config.yaml b/config/user_config.yaml
@@ -9,4 +9,4 @@
 # - Any other settings you want to change without editing code
 input_path: data/input/health_data.csv
 cleaned_path: data/outputs/cleaned/health_data_cleaned.csv
-report_dir: data/outputs/reports/
+report_dir: reports/
diff --git a/exercises/01_introduction.ipynb b/exercises/01_introduction.ipynb
@@ -24,21 +24,21 @@
     "## Where to find exercises, solutions, and outputs\n",
     "- **Exercises**: Located in the `exercises/` folder (e.g., `exercises/02_modules.ipynb`)\n",
     "- **Solutions**: Located in `exercises/solutions/` (e.g., `exercises/solutions/02_modules_solutions.ipynb`)\n",
-    "- **Outputs**: Saved in `exercises/outputs/` (e.g., cleaned data, reports, charts)\n",
+    "- **Outputs**: Saved in `data/outputs/` or `reports/` (e.g., cleaned data, reports, charts)\n",
     "\n",
     "## How to use the exercises\n",
     "1. **Read each exercise notebook** and follow the instructions step by step.\n",
     "2. **Write your code** in the notebook cells or in the `src/python_rap_demo/` modules as instructed.\n",
     "4. **Check your solutions** by comparing your results with the solutions notebooks in `exercises/solutions/`.\n",
-    "5. **View outputs** in the `data/outputs/` folder (e.g., cleaned data, markdown reports, charts).\n",
+    "5. **View outputs** in the `data/outputs/` and `reports/` folders (e.g., cleaned data, markdown reports, charts).\n",
     "\n",
     "## Adding code to the src folder\n",
     "- Place reusable functions and classes in the appropriate module in `src/python_rap_demo/` (e.g., `cleaning.py`, `io.py`, `report.py`).\n",
     "- Use clear function names, type hints, and docstrings following PEP8 standards.\n",
     "\n",
     "## How to check if your solutions have worked\n",
     "- **Compare outputs**: Check your results against the solutions notebooks and output files.\n",
-    "- **Check outputs**: Review generated files in `data/outputs/` for expected results.\n",
+    "- **Check outputs**: Review generated files in `data/outputs/` or `reports/` for expected results.\n",
     "- **Run tests**: For unit test exercises you can run all tests with:\n",
     "  ```cmd\n",
     "  pytest tests\n",

diff --git a/exercises/02_modules.ipynb b/exercises/02_modules.ipynb
@@ -146,7 +146,10 @@
     "- Write a function to plot missing values per column before the data is cleaned\n",
     "- Write a function to plot disease prevalence for each disease category over time after the data is cleaned\n",
     "- Add the visualisations to the output report\n",
-    "- Save the charts to the outputs folder\n"
+    "\n",
+    "**Bonus:**\n",
+    "- Save the charts to the reports folder\n",
+    "- Customise chart formatting and colours\n"
    ]
   }
  ],
@@ -161,7 +164,7 @@
    "name": "python",
    "nbconvert_exporter": "python",
    "pygments_lexer": "ipython3",
-   "version": "3.12.5"
+   "version": "3.12.3"
   }
  },
  "nbformat": 4,

diff --git a/exercises/03_config_files.ipynb b/exercises/03_config_files.ipynb
@@ -67,7 +67,7 @@
     "\n",
     "**Reflect:**\n",
     "- How does using config files improve reproducibility and flexibility?\n",
-    "- What other parameters could you add to make your pipeline more configurable?"
+    "- What other parameters could you add to make your pipeline more configurable?\n"
    ]
   },
   {

diff --git a/exercises/04_unit_tests.ipynb b/exercises/04_unit_tests.ipynb
@@ -31,7 +31,55 @@
    "id": "2",
    "metadata": {},
    "source": [
-    "## Exercise 1: Write a simple unit test for a new function\n",
+    "## Exercise 1: Review and adapt an existing unit test\n",
+    "\n",
+    "The function `clean_health_data` in `src/python_rap_demo/cleaning.py` has an existing unit test in\n",
+    "`tests/test_cleaning.py`.\n",
+    "\n",
+    "**Task:**\n",
+    "1. Run the existing unit test for `clean_health_data` to understand how it works.\n",
+    "2. Modify the function `clean_health_data` and re-run the tests to see what causes them to pass or fail, then update the unit test so that the tests pass again.\n",
+    "\n",
+    "An example of a modified function is given below, but you do not have to use it. Feel free to modify the function and experiment with the tests without following the set exercise."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "3",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "def clean_health_data(df: pd.DataFrame) -> pd.DataFrame:\n",
+    "    \"\"\"\n",
+    "    Clean health data by dropping rows with missing values in key columns.\n",
+    "\n",
+    "    Args:\n",
+    "        df (pd.DataFrame): Raw health data.\n",
+    "\n",
+    "    Returns:\n",
+    "        pd.DataFrame: Cleaned health data with no missing values in critical columns.\n",
+    "    \"\"\"\n",
+    "    df = df.copy()\n",
+    "\n",
+    "    # Drop rows with missing values in height_cm, weight_kg, or diagnosis columns\n",
+    "    df = df.dropna(subset=[\"height_cm\", \"weight_kg\", \"diagnosis\"])\n",
+    "\n",
+    "    # Fill missing smoker values with 'Yes'\n",
+    "    df[\"smoker\"] = df[\"smoker\"].fillna(\"Yes\")\n",
+    "\n",
+    "    # Ensure gender is uppercase\n",
+    "    df[\"gender\"] = df[\"gender\"].str.upper()\n",
+    "\n",
+    "    return df\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "4",
+   "metadata": {},
+   "source": [
+    "## Exercise 2: Write a simple unit test for a new function\n",
     "\n",
     "Suppose you have created a function called `impute_by_group` in `src/python_rap_demo/cleaning.py`.\n",
     "\n",
@@ -46,10 +94,10 @@
   },
   {
    "cell_type": "markdown",
-   "id": "3",
+   "id": "5",
    "metadata": {},
    "source": [
-    "## Exercise 1a: Walkthrough - Write a unit test for flag_missing\n",
+    "## Exercise 2a: Walkthrough - Write a unit test for flag_missing\n",
     "\n",
     "Let's start with a simple function called `flag_missing`. This function adds a new column to your DataFrame to flag missing values in specified columns.\n",
     "\n",
@@ -59,7 +107,7 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "4",
+   "id": "6",
    "metadata": {},
    "outputs": [],
    "source": [
@@ -93,7 +141,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "5",
+   "id": "7",
    "metadata": {},
    "source": [
     "### How to write a unit test for `flag_missing`\n",
@@ -129,7 +177,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "6",
+   "id": "8",
    "metadata": {},
    "source": [
     "#### Understanding the test_flag_missing function\n",
@@ -147,10 +195,10 @@
   },
   {
    "cell_type": "markdown",
-   "id": "7",
+   "id": "9",
    "metadata": {},
    "source": [
-    "## Exercise 1b: Write a unit test for impute_by_group\n",
+    "## Exercise 2b: Write a unit test for impute_by_group\n",
     "\n",
     "Now try writing your own tests for the following function. Use the walkthrough above as a guide.\n",
     "\n",
@@ -165,7 +213,7 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "8",
+   "id": "10",
    "metadata": {},
    "outputs": [],
    "source": [
@@ -190,7 +238,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "9",
+   "id": "11",
    "metadata": {},
    "source": [
     "**Task:**\n",
@@ -205,10 +253,10 @@
   },
   {
    "cell_type": "markdown",
-   "id": "10",
+   "id": "12",
    "metadata": {},
    "source": [
-    "## Exercise 2: Run your unit tests\n",
+    "## Exercise 3: Run your unit tests\n",
     "\n",
     "**Task:**\n",
     "- Run all tests in the `tests/` folder using the command below:\n",
@@ -223,10 +271,10 @@
   },
   {
    "cell_type": "markdown",
-   "id": "11",
+   "id": "13",
    "metadata": {},
    "source": [
-    "## Exercise 3: Stretch - Check test coverage\n",
+    "## Exercise 4: Stretch - Check test coverage\n",
     "\n",
     "Test coverage shows how much of your code is tested by unit tests.\n",
     "\n",
@@ -245,10 +293,10 @@
   },
   {
    "cell_type": "markdown",
-   "id": "12",
+   "id": "14",
    "metadata": {},
    "source": [
-    "## Exercise 4: Stretch - Try parameterisation in pytest\n",
+    "## Exercise 5: Stretch - Try parameterisation in pytest\n",
     "\n",
     "Parameterisation lets you run the same test with different inputs.\n",
     "\n",
@@ -260,7 +308,8 @@
  ],
  "metadata": {
   "language_info": {
-   "name": "python"
+   "name": "python",
+   "version": "3.12.3"
   }
  },
  "nbformat": 4,

diff --git a/exercises/solutions/02_modules_solutions.ipynb b/exercises/solutions/02_modules_solutions.ipynb
@@ -22,6 +22,8 @@
     "import os\n",
     "import sys\n",
     "\n",
+    "import pandas as pd\n",
+    "\n",
     "sys.path.insert(0, os.path.abspath(os.path.join(os.getcwd(), \"..\", \"..\", \"src\")))\n",
     "\n",
     "from python_rap_demo.cleaning import clean_health_data\n",
@@ -30,8 +32,8 @@
     "from python_rap_demo.utils import add_bmi_column\n",
     "\n",
     "input_path = \"../../data/input/health_data.csv\"\n",
-    "cleaned_path = \"../outputs/02_modules/cleaned_data.csv\"\n",
-    "report_path = \"../outputs/02_modules/\"\n",
+    "cleaned_path = \"../../data/outputs/cleaned/health_data_cleaned.csv\"\n",
+    "report_path = \"../../reports/\"\n",
     "\n",
     "# I/O: Read data\n",
     "df = read_health_data(input_path)\n",
@@ -62,7 +64,6 @@
    "outputs": [],
    "source": [
     "# Modular solution using subfunctions\n",
-    "import pandas as pd\n",
     "\n",
     "\n",
     "def flag_missing(df: pd.DataFrame, columns: list[str]) -> pd.DataFrame:\n",
@@ -736,7 +737,7 @@
    "name": "python",
    "nbconvert_exporter": "python",
    "pygments_lexer": "ipython3",
-   "version": "3.12.5"
+   "version": "3.12.3"
   }
  },
  "nbformat": 4,