ONSdigital · alex-westwood · Jan 23, 2026 · Jan 20, 2026
diff --git a/README.md b/README.md
@@ -8,14 +8,18 @@ In a RAP project, the README is essential for:
 - Documenting setup steps and usage instructions
 - Outlining folder structure and key files
 - Explaining how to run the pipeline, tests, and automation tools
-- Sharing best practices for reproducibility, automation, and transparency
+- Providing any other information that helps users and contributors understand and work with the project
 
-A well-written README makes your RAP project accessible and easy for others to use, review, or contribute to. Update it as your project evolves.
+The README file is the first file users and contributors will interact with in a RAP.
+A well-written README makes the RAP project accessible and easy for others to use, review, or contribute to.
+Update it as your project evolves.
 -->
 # Work in Progress - RAP demonstration repository for Python
 
 Welcome to the RAP (Reproducible Analytical Pipeline) demonstration repository! This repository is designed for beginner to intermediate coders to practice RAP principles, experiment with code, and learn best practices for Reproducible Analytical Pipelines in Python.
 
+See the [Reproducible Analytical Pipelines]([PROVISIONAL_LINK]) materials on the Analysis for Action platform for more information about RAPs and their importance.
+
 **This repository is still in development**
 
 ## Getting Started
@@ -99,6 +103,13 @@ Run tests with:
   pytest tests
   ```
 
+## Troubleshooting
+If you encounter issues:
+- Ensure your virtual environment is activated. The terminal prompt should show `(.venv)` at the start.
+- Check that all dependencies are installed by running `pip install -r requirements.txt`.
+- Verify you are in the project root directory when running commands. The terminal should show the path ending with `python_rap_demo`.
+- For exercise notebooks, clear the cell outputs and restart the kernel.
+
 ## AI declaration
 
 AI has been used in the production of this content.
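
Review note: the first and third troubleshooting checks above can also be run from Python rather than read off the terminal prompt. The snippet below is a minimal sketch, not part of the repository; the function names are hypothetical, and the virtual environment check assumes the environment was created with the standard `venv` module.

```python
import sys
from pathlib import Path


def in_virtual_environment() -> bool:
    """Return True when the interpreter is running inside a venv."""
    # Inside a venv, sys.prefix points at the environment, while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


def in_project_root(expected_name: str = "python_rap_demo") -> bool:
    """Return True when the current working directory has the expected name."""
    return Path.cwd().name == expected_name


if __name__ == "__main__":
    print(f"Virtual environment active: {in_virtual_environment()}")
    print(f"In project root: {in_project_root()}")
```

If the first check prints `False`, activate `.venv` before reinstalling dependencies, so that packages are installed into the project environment rather than the base interpreter.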

diff --git a/exercises/01_introduction.ipynb b/exercises/01_introduction.ipynb
@@ -6,8 +6,7 @@
    "metadata": {},
    "source": [
     "# Introduction to RAP pipeline exercises\n",
-    "\n",
-    "Welcome to the RAP (Reproducible Analytical Pipeline) exercises for this project. These exercises are designed to help you learn and apply best practices for reproducible data analysis in Python.\n",
+    "Welcome to the RAP (Reproducible Analytical Pipeline) exercises. These exercises are designed to help you understand RAP best practices and walk you through applying them in Python.\n",
     "\n",
     "## Contents of the exercises\n",
     "- **01_introduction.ipynb**: Overview and guidance for the RAP exercises\n",
@@ -17,7 +16,7 @@
     "- **05_continuous_integration.ipynb**: Implement and test continuous integration\n",
     "\n",
     "## Aim of the exercises\n",
-    "The aim is to guide to build on your understanding of reproducible analytical pipelines (RAP) with practical experience in a demonstration repository.  \n",
+    "The aim is to help you build on your understanding of RAP with practical experience in a demonstration repository.  \n",
     "  \n",
     "These exercises are intended as a starting point that you can build upon. Please use this repository to practice elements of RAP you would like to improve that may not be covered by these exercises.\n",
     "\n",
@@ -34,18 +33,25 @@
     "\n",
     "## Adding code to the src folder\n",
     "- Place reusable functions and classes in the appropriate module in `src/python_rap_demo/` (e.g., `cleaning.py`, `io.py`, `report.py`).\n",
-    "- Use clear function names, type hints, and docstrings following PEP8 standards.\n",
+    "- Use clear function names, type hints, and docstrings following [PEP8](https://peps.python.org/pep-0008/) standards.\n",
     "\n",
     "## How to check if your solutions have worked\n",
-    "- **Compare outputs**: Check your results against the solutions notebooks and output files.\n",
-    "- **Check outputs**: Review generated files in `data/outputs/` or `reports/` for expected results.\n",
-    "- **Run tests**: For unit test exercises you can run all tests with:\n",
+    "View the solutions for each exercise in the `exercises/solutions/` folder. The solutions will walk through the expected outputs and how to run them. This includes:\n",
+    "- **Comparing outputs**: Check your results against the solutions notebooks and output files.\n",
+    "- **Checking outputs**: Review generated files in `data/outputs/` or `reports/` for expected results.\n",
+    "- **Running tests**: For unit test exercises you can run all tests with:\n",
     "  ```cmd\n",
     "  pytest tests\n",
     "  ```\n",
-    "- **Use pre-commit hooks and CI**: Ensure your code passes formatting, linting, and CI checks.\n",
+    "- **Using pre-commit hooks and CI**: Ensure your code passes formatting, linting, and CI checks.\n",
     "\n",
-    "By following these exercises, you will build a reproducible, automated, and transparent analytical pipeline suitable for real-world data analysis."
+    "## Troubleshooting\n",
+    "If you encounter issues:\n",
+    "- Ensure your virtual environment is activated. The terminal prompt should show `(.venv)` at the start.\n",
+    "- Check your notebook is using the correct Python kernel associated with the virtual environment. You can change the kernel in the Jupyter notebook interface.\n",
+    "- Clear the notebook outputs and restart the kernel.\n",
+    "- Check that all dependencies are installed by running `pip install -r requirements.txt`.\n",
+    "- Verify you are in the project root directory when running commands. The terminal should show the path ending with `python_rap_demo`.\n"
    ]
   }
  ],

diff --git a/exercises/02_modules.ipynb b/exercises/02_modules.ipynb
@@ -43,7 +43,7 @@
    "id": "2",
    "metadata": {},
    "source": [
-    "## Step 1: Review the Monolithic Script\n",
+    "## Exercise 1: Review the Monolithic Script\n",
     "\n",
     "Below is a single script that identifies missing values and performs imputation. Your task is to refactor this into modular code.\n",
     "\n",
@@ -106,7 +106,7 @@
    "id": "6",
    "metadata": {},
    "source": [
-    "## Step 2: Add your functions to the pipeline\n",
+    "## Exercise 2: Add your functions to the pipeline\n",
     "\n",
     "**Tasks:**\n",
     "- Add your functions to an appropriate module in `src/python_rap_demo`\n",
@@ -122,7 +122,7 @@
    "id": "7",
    "metadata": {},
    "source": [
-    "## Step 3: Challenge & Reflection\n",
+    "## Exercise 3: Challenge & Reflection\n",
     "\n",
     "**Tasks:**\n",
     "- Compare the cleaned_data output with and without your changes\n",
@@ -138,7 +138,7 @@
    "id": "8",
    "metadata": {},
    "source": [
-    "## Step 4: Refactor Visualisation Code\n",
+    "## Exercise 4: Refactor Visualisation Code\n",
     "\n",
     "As an additional challenge, Add code to create visualisations. Place all visualisation functions in a separate module (e.g., `report.py`).\n",
     "\n",

diff --git a/exercises/03_config_files.ipynb b/exercises/03_config_files.ipynb
@@ -21,7 +21,7 @@
    "id": "1",
    "metadata": {},
    "source": [
-    "## Step 1: Add a new parameter to the config file\n",
+    "## Exercise 1: Add a new parameter to the config file\n",
     "\n",
     "Open `config/user_config.yaml` and add a new parameter. For example, add a parameter to control the minimum height allowed in your analysis:\n",
     "\n",
@@ -42,7 +42,7 @@
    "id": "2",
    "metadata": {},
    "source": [
-    "## Step 2: Update your script to use the new parameter\n",
+    "## Exercise 2: Update your script to use the new parameter\n",
     "\n",
     "Update your pipeline code (e.g., in `main.py` or `cleaning.py`) to read the new parameter from the config file and use it to filter the data.\n",
     "\n",
@@ -59,7 +59,7 @@
    "id": "3",
    "metadata": {},
    "source": [
-    "## Step 3: Test and reflect\n",
+    "## Exercise 3: Test and reflect\n",
     "\n",
     "**Tasks:**\n",
     "- Run your pipeline and check that the new parameter is applied\n",
@@ -75,7 +75,7 @@
    "id": "4",
    "metadata": {},
    "source": [
-    "## Step 4: Challenge\n",
+    "## Exercise 4: Challenge\n",
     "\n",
     "Add another parameter to your config file, such as `output_format` (e.g., `csv` or `xlsx`). Update your pipeline to use this parameter when saving output files.\n",
     "\n",
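
Review note: a short sketch of what Exercises 2 and 4 in this notebook ask for may help readers. It assumes the config file has already been parsed into a dictionary (for example by a YAML loader), and uses a plain list of records instead of a DataFrame so that it runs without extra dependencies. The key `min_height`, the sample values, and both function names are illustrative assumptions, not the repository's actual code.

```python
from typing import Any

# Stand-in for the parsed contents of config/user_config.yaml.
# Both values are hypothetical.
config: dict[str, Any] = {"min_height": 100.0, "output_format": "csv"}

ALLOWED_FORMATS = {"csv", "xlsx"}


def filter_by_min_height(
    records: list[dict[str, Any]], config: dict[str, Any]
) -> list[dict[str, Any]]:
    """Keep records whose height is at least config["min_height"].

    Records with a missing height (None) are kept, so the filter does
    not silently drop rows that may be imputed elsewhere.
    """
    min_height = config["min_height"]
    return [
        row
        for row in records
        if row["height"] is None or row["height"] >= min_height
    ]


def get_output_format(config: dict[str, Any]) -> str:
    """Read output_format from the config, rejecting unsupported values."""
    fmt = config.get("output_format", "csv")
    if fmt not in ALLOWED_FORMATS:
        raise ValueError(f"Unsupported output_format: {fmt!r}")
    return fmt


records = [
    {"id": 1, "height": 172.0},
    {"id": 2, "height": 45.0},
    {"id": 3, "height": None},
]
kept = filter_by_min_height(records, config)
print([row["id"] for row in kept])  # [1, 3]
print(get_output_format(config))  # csv
```

Passing the config into each function, rather than reading a global, keeps the functions easy to test with different parameter values.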

diff --git a/exercises/04_unit_tests.ipynb b/exercises/04_unit_tests.ipynb
@@ -18,10 +18,10 @@
     "## Why are unit tests important?\n",
     "\n",
     "Unit tests check that individual functions behave as expected. They:\n",
-    "- Help catch bugs early\n",
-    "- Make code easier to maintain\n",
-    "- Support reproducibility and automation\n",
-    "- Give confidence when refactoring or adding new features\n",
+    "- Help catch bugs by testing small pieces of code in isolation\n",
+    "- Make code easier to maintain by documenting expected behaviour\n",
+    "- Ensure code changes don't break existing functionality by running tests after modifications\n",
+    "- Give confidence when refactoring or adding new features by verifying existing tests still pass\n",
     "\n",
     "Read more in the [pytest documentation](https://docs.pytest.org/en/stable/)."
    ]
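
Review note: the revised bullets describe testing small pieces of code in isolation, which a concrete case could make clearer. The sketch below is one possible illustration; `mean_ignoring_missing` is a hypothetical helper, not a function from this repository. pytest collects functions whose names start with `test_` from files named `test_*.py`, so these tests would run under `pytest tests` if saved there.

```python
from typing import Optional


def mean_ignoring_missing(values: list[Optional[float]]) -> Optional[float]:
    """Return the mean of the non-missing values, or None if there are none."""
    present = [v for v in values if v is not None]
    if not present:
        return None
    return sum(present) / len(present)


def test_mean_ignores_missing_values():
    # The None entry should not affect the result.
    assert mean_ignoring_missing([160.0, None, 180.0]) == 170.0


def test_mean_of_all_missing_is_none():
    # Edge case: no usable values at all.
    assert mean_ignoring_missing([None, None]) is None
```

Each test checks one behaviour, so a failure points directly at the case that broke and the test names document what the function is expected to do.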

diff --git a/exercises/solutions/02_modules_solutions.ipynb b/exercises/solutions/02_modules_solutions.ipynb
@@ -23,6 +23,7 @@
     "import sys\n",
     "\n",
     "import pandas as pd\n",
+    "import plotly.express as px\n",
     "\n",
     "sys.path.insert(0, os.path.abspath(os.path.join(os.getcwd(), \"..\", \"..\", \"src\")))\n",
     "\n",
@@ -51,9 +52,9 @@
    "id": "2",
    "metadata": {},
    "source": [
-    "## Step 1 solution: Imputation function\n",
+    "## Exercise 1 solution: Imputation function\n",
     "\n",
-    "Below is a function that performs imputation for missing height and weight values using group means (e.g., by sex or diagnosis), and flags which values were imputed. This approach is more robust and transparent than using overall means."
+    "Below is a function that performs imputation for missing height and weight values using group means (e.g., by sex or diagnosis), and flags which values were imputed."
    ]
   },
   {
@@ -130,7 +131,7 @@
    "id": "4",
    "metadata": {},
    "source": [
-    "## Step 2 solution: Add functions to pipeline\n",
+    "## Exercise 2 solution: Add functions to pipeline\n",
     "Where you place these functions depends on the context and what makes most sense for your pipeline. However, for this exercise one appropriate solution is outlined below:\n",
     "\n",
     "1. Place the `flag_missing`, `impute_by_group` and `impute_height_weight` function into `src/python_rap_demo/cleaning.py`. You could also have added `flag_missing` to utils as it could be used across other modules."
@@ -147,8 +148,6 @@
     "cleaning.py: Data cleaning functions\n",
     "\"\"\"\n",
     "\n",
-    "import pandas as pd\n",
-    "\n",
     "\n",
     "def clean_health_data(df: pd.DataFrame) -> pd.DataFrame:\n",
     "    \"\"\"\n",
@@ -298,11 +297,11 @@
    "id": "9",
    "metadata": {},
    "source": [
-    "## Step 3 Solution: Reflection\n",
+    "## Exercise 3 solution: Reflection\n",
     "\n",
-    "- Group-based imputation preserves important differences in the data and improves reproducibility.\n",
-    "- Flagging imputed values helps with transparency and downstream analysis.\n",
-    "- Handling edge cases (e.g., all values missing in a group) ensures robustness.\n",
+    "- Modular code improves reproducibility and maintainability by placing small functions that can be used across the pipeline into clearly named scripts.\n",
+    "- The advantage of separating code into modules is that it allows users to find and reuse functions easily.\n",
+    "- This pipeline could be extended further by adding more functions to existing modules such as extra cleaning or analysis functions. New modules could also be created for specific tasks such as creating consistent spreadsheet outputs.\n",
     "\n",
     "**Extension:**\n",
     "The above could be taken further to automatically detect categorical and numerical columns, applying imputation to each.  \n",
@@ -374,17 +373,7 @@
     "# for cleaning.py\n",
     "\n",
     "\n",
-    "from typing import List\n",
-    "\n",
-    "# import utils functions above using\n",
-    "# from python_rap_demo.utils import (\n",
-    "#     get_column_types,\n",
-    "#     is_categorical_column,\n",
-    "#     is_numeric_column,\n",
-    "# )\n",
-    "\n",
-    "\n",
-    "def flag_missing(df: pd.DataFrame, columns: List[str]) -> pd.DataFrame:\n",
+    "def flag_missing(df: pd.DataFrame, columns: list[str]) -> pd.DataFrame:\n",
     "    \"\"\"\n",
     "    Add boolean columns to flag missing values for specified columns.\n",
     "\n",
@@ -485,7 +474,7 @@
    "id": "13",
    "metadata": {},
    "source": [
-    "## Step 4 Solution: Refactor Visualisation Code\n",
+    "## Exercise 4 solution: Refactor Visualisation Code\n",
     "\n",
     "Below are example functions for visualising missing values and disease prevalence using Plotly. These would go in `src/python_rap_demo/report.py`."
    ]
@@ -505,7 +494,9 @@
    "metadata": {},
    "outputs": [],
    "source": [
-    "import plotly.express as px\n",
+    "\"\"\"\n",
+    "report.py: Markdown report generation for RAP pipeline\n",
+    "\"\"\"\n",
     "\n",
     "\n",
     "def plot_missing_values(df: pd.DataFrame, output_path: str) -> None:\n",
@@ -569,11 +560,6 @@
     "report.py: Markdown report generation for RAP pipeline\n",
     "\"\"\"\n",
     "\n",
-    "import pandas as pd\n",
-    "\n",
-    "# Import plotly\n",
-    "import plotly.express as px\n",
-    "\n",
     "\n",
     "def format_month_section(month: str, month_df: pd.DataFrame) -> str:\n",
     "    \"\"\"\n",
@@ -677,7 +663,9 @@
    "metadata": {},
    "outputs": [],
    "source": [
-    "# report.py\n",
+    "\"\"\"\n",
+    "report.py: Markdown report generation for RAP pipeline\n",
+    "\"\"\"\n",
     "\n",
     "\n",
     "def generate_markdown_report(\n",