From 8696dac584fc34b0aa8f576f7dfd4ccea546fcc1 Mon Sep 17 00:00:00 2001 From: Alex Westwood Date: Tue, 16 Dec 2025 16:57:48 +0000 Subject: [PATCH 1/3] update wording --- .github/PULL_REQUEST_TEMPLATE.md | 80 +++++++++++++++++++--------- .github/What is the github folder.md | 17 +++--- .github/copilot-instructions.md | 2 +- .pre-commit-config.yaml | 2 +- CHANGELOG | 2 +- README.md | 10 ++-- config/What is the config folder.md | 7 +-- exercises/01_introduction.ipynb | 5 +- exercises/02_modules.ipynb | 20 ++++--- exercises/03_config_files.ipynb | 5 +- requirements.txt | 2 + src/What is the src folder.md | 7 +-- tests/RAP Unit testing guide.md | 6 ++- 13 files changed, 92 insertions(+), 73 deletions(-) diff --git a/.github/PULL_REQUEST_TEMPLATE.md b/.github/PULL_REQUEST_TEMPLATE.md index 261c6da..4821332 100644 --- a/.github/PULL_REQUEST_TEMPLATE.md +++ b/.github/PULL_REQUEST_TEMPLATE.md @@ -5,31 +5,59 @@ It helps you provide all necessary information and ensures standard checks are f You can find this template in the .github folder of the repository, and it will be visible when you create a PR via the GitHub web interface. --> -# Pull Request Checklist - -Please review and check off each item before submitting your PR: - -- [ ] I have followed RAP best practices (reproducibility, automation, transparency). -- [ ] My code follows PEP8 standards and includes comments/docstrings. -- [ ] I have added or updated unit tests for new/changed code. -- [ ] All tests pass locally (`pytest tests`). -- [ ] I have updated documentation as needed. -- [ ] I have described the changes clearly below. - -# Description of Changes - +## Description +
Please include a summary of the changes. -# Additional Notes (Optional) - - - -# Reviewer Guidance - -- Please check that all standard checks pass. -- Confirm that RAP principles are maintained. -- Ask for clarification if anything is unclear. - ---- - -*This template helps keep PRs consistent, clear, and easy to review. For more information, see the GitHub documentation on [pull request templates](https://docs.github.com/en/github/building-a-strong-community/creating-a-pull-request-template-for-your-repository).* + - What is this change? + - Is this a bug fix or a feature and does it break any existing functionality? + - How has it been tested? +
+ +## Type of change + +*You can delete options that are not relevant.* + +- [ ] Bug fix - *non-breaking change* +- [ ] New feature - *non-breaking change* +- [ ] Breaking change - *backwards incompatible change, changes expected behaviour* +- [ ] Non-user facing change, structural change, dev functionality, docs ... + +## Checklist + +- [ ] I have performed a self-review of my own code. +- [ ] I have commented my code appropriately, focusing on explaining my design decisions (explain why, not how). +- [ ] I have made corresponding changes to the documentation (comments, docstrings, etc.). +- [ ] I have added tests that prove my fix is effective or that my feature works (see here for more information). +- [ ] New and existing unit tests pass locally with my changes. +- [ ] I have updated the changelog. +- [ ] I have checked that the pipeline runs with test data. +
+ +# Peer review +Check that any new code meets all of the following: + +- **Documentation**: Docstrings and comments have been added or updated. +- **Style guidelines**: New code follows PEP8 standards and the project's existing conventions. +- **Functionality**: The code fully implements the requirements, works as expected, handles expected edge cases, and handles exceptions appropriately. +- **Complexity**: The code is not overly complex, and logic has been split into appropriately sized functions. +- **Test coverage**: Unit tests cover essential functions for a reasonable range of inputs and conditions. Added and existing tests pass on my machine. + +### Review comments +Suggestions should be tailored to the code that you are reviewing. Provide context. +Be critical and clear, but not mean. Ask questions and set actions. +
These might include: + +- bugs that need fixing (does it work as expected? Does it work with other code + that it is likely to interact with?) +- alternative methods (could it be written more efficiently or with more clarity?) +- documentation improvements (does the documentation reflect how the code actually works?) +- additional tests that should be implemented + - Do the tests effectively check that the code + works correctly? Are there additional edge cases or negative tests to consider? +- code style improvements (could the code be written more clearly?) +
+
+ +*Further reading: [code review best practices](https://best-practice-and-impact.github.io/qa-of-code-guidance/peer_review.html)* diff --git a/.github/What is the github folder.md b/.github/What is the github folder.md index 760f16b..db4d45c 100644 --- a/.github/What is the github folder.md +++ b/.github/What is the github folder.md @@ -1,28 +1,23 @@ # What is the `.github` folder? -The `.github` folder is a special directory in your repository used to store files that help automate, organise, and improve collaboration on your project on GitHub. +The `.github` folder is a directory in your repository used to store files that configure how GitHub works with your project. These include pull request templates, instructions for GitHub products such as Copilot, and GitHub Actions workflows. ## Why is it important? -- It helps you set up workflows, templates, and community standards for your project. -- It makes your project easier to contribute to and maintain, especially for teams or open-source projects. -- In RAP (Reproducible Analytical Pipeline) projects, it supports automation, reproducibility, and transparency. +- For RAP projects, it helps make your work reproducible and transparent by documenting and automating key steps in your workflow. +- It automates tasks like testing and code checks, so code is consistent and quality assured on every push or pull request. +- Templates and guidelines help document your process, so others can repeat your work, understand each step, and contribute more easily. + ## Common files in `.github` - **Workflows** (`workflows/`): - - Contains GitHub Actions workflow files (e.g., `ci.yml`) that automate tasks like running tests, checking code quality, or deploying your project. + - Contains GitHub Actions workflow files (e.g., `ci.yml`) that automate tasks like running tests, checking code quality, or deploying your project. These workflows run on a GitHub-hosted virtual machine, typically when you push code or open or update a pull request.
Results can be viewed in the repository's Actions tab on GitHub. If a check fails, the developer is notified, and merging can be blocked if the repository requires that check to pass. - **Pull request templates** (`PULL_REQUEST_TEMPLATE.md`): - Provides a checklist and guidance for contributors when they open a pull request, helping ensure best practices are followed. - **Issue templates** (`ISSUE_TEMPLATE/`): - Helps users report bugs or request features in a consistent way. -- **Contributing guidelines** (`CONTRIBUTING.md`): - - Explains how to contribute to the project, including coding standards and review processes. -- **Code of conduct** (`CODE_OF_CONDUCT.md`): - - Sets expectations for behavior in your project's community. - **Copilot instructions** (`copilot-instructions.md`): - Allows users to set baseline instructions for GitHub copilot to follow. Instructions are then followed for all prompts. For example, you may add "All functions must include docstrings with defined Args and Returns" -- **Other community files**: - - You can add files like `FUNDING.yml` (for sponsorship), `SECURITY.md` (for reporting vulnerabilities), and more. ## Practical example - In this RAP repository, `.github/workflows/ci.yml` runs automated tests every time you push code or open a pull request. diff --git a/.github/copilot-instructions.md b/.github/copilot-instructions.md index 6188e56..7cd0e60 100644 --- a/.github/copilot-instructions.md +++ b/.github/copilot-instructions.md @@ -4,7 +4,7 @@ custom instructions to Copilot. For more details, visit https://code.visualstudio.com/docs/copilot/copilot-customization#_use-a-githubcopilotinstructionsmd-file --> -This repository follows RAP (Reproducible Analytical Pipeline) best practices. Please prioritize reproducibility, automation, and transparency in all code and documentation. +This repository follows RAP (Reproducible Analytical Pipeline) best practices. Please prioritise reproducibility, automation, and transparency in all code and documentation.
This repository is a learning resource for people who are new to RAP with beginner to intermediate coding experience. Please ensure documentation is clearly worded and simple to understand. Ensure new code follows PEP8 standards and includes appropriate comments and docstrings detailing args and returns where necessary. Use type hints for all function signatures. diff --git a/.pre-commit-config.yaml b/.pre-commit-config.yaml index dff0f82..b64cc99 100644 --- a/.pre-commit-config.yaml +++ b/.pre-commit-config.yaml @@ -1,7 +1,7 @@ # .pre-commit-config.yaml for python_rap_demo RAP project # # This file configures pre-commit hooks, which automatically check and fix code before you commit changes to Git. -# Pre-commit hooks help maintain code quality, consistency, and security by running checks +# Pre-commit hooks help maintain code quality and security by running checks # (like formatting, linting, and secret detection) every time you make a commit. # # To use this file: diff --git a/CHANGELOG b/CHANGELOG index 87b4137..4e2323f 100644 --- a/CHANGELOG +++ b/CHANGELOG @@ -1,7 +1,7 @@ # Work in Progress - RAP demonstration repository for Python -Welcome to the RAP (Reproducible Analytical Pipeline) demonstration repository! This repository is designed for beginners to practice RAP principles, experiment with code, and learn best practices for reproducible, automated, and transparent analytical pipelines in Python. +Welcome to the RAP (Reproducible Analytical Pipeline) demonstration repository! This repository is designed for beginner to intermediate coders to practice RAP principles, experiment with code, and learn best practices for Reproducible Analytical Pipelines in Python. **This repository is still in development** @@ -76,7 +76,7 @@ All exercises for RAP learning are in the `exercises/` folder.
These are not par - Add new modules - Use config files - Write unit tests - - Set up and customize pre-commit hooks + - Set up and customise pre-commit hooks - Apply RAP principles in real code **Do not edit files in `src/` unless instructed by an exercise.** @@ -85,7 +85,7 @@ All exercises for RAP learning are in the `exercises/` folder. These are not par Information about different files and folders can be found throughout the pipeline: - Files: Contain information on what they are and what they are used for in a RAP in the file itself, except .secrets.baseline. .secrets.baseline information can be found in the `docs` folder - - Folders: Contain a README to explain what the folder is for and typical files it contains + - Folders: Contain a markdown (.md) file to explain what the folder is for and typical files it contains. - Scripts: Fully documented with docstrings and comments. ### Create and run tests @@ -97,10 +97,6 @@ Run tests with: pytest tests ``` -## Contributing - -This repo is for learning and experimentation. If you want to contribute improvements, please read `CONTRIBUTING.md`. - ## AI declaration AI has been used in the production of this content. diff --git a/config/What is the config folder.md b/config/What is the config folder.md index e9052b2..32e10c0 100644 --- a/config/What is the config folder.md +++ b/config/What is the config folder.md @@ -1,11 +1,11 @@ # What is the `config` folder? -The `config` folder is used to store configuration (config) files that control how your RAP (Reproducible Analytical Pipeline) project runs. These files help separate settings from code, making your analysis easier to update, share, and reproduce. +The `config` folder is used to store configuration (config) files that control how your RAP (Reproducible Analytical Pipeline) project runs. These files hold common parameters like file names, dates or other settings used across your code in one place. 
This separates settings from code, making your analysis easier to update, share, and reproduce. ## Why is it important? - Keeps all important settings in one place - Makes it easy to change file paths, parameters, or options without editing code -- Improves reproducibility and transparency +- Improves reproducibility and transparency by recording the settings used for each run - Helps users and developers understand and customise the pipeline ## Common types of config files @@ -21,6 +21,7 @@ The `config` folder is used to store configuration (config) files that control h - Advanced pipeline options - Debugging or logging settings - Experimental features + - Parameters for different types of tests - **Other config files**: - You can add config files for specific tools (e.g., `pre-commit`, `pytest`, `bandit`), or for different environments (e.g., production vs. development). @@ -29,4 +30,4 @@ The `config` folder is used to store configuration (config) files that control h - Users of the pipeline can change where files are read from or written to without updating the code. ## Summary -The `config` folder is a key part of making your RAP project flexible, reproducible, and easy to use. As your project grows, you can add more config files to organise and control different parts of your analysis. +The `config` folder is a key part of making your RAP project flexible, reproducible, and easy to use. As your project grows, you can add more parameters to the existing config file, or add new config files, to organise and control different parts of your analysis. diff --git a/exercises/01_introduction.ipynb b/exercises/01_introduction.ipynb index ce0aa75..78c7e7b 100644 --- a/exercises/01_introduction.ipynb +++ b/exercises/01_introduction.ipynb @@ -7,15 +7,14 @@ "source": [ "# Introduction to RAP pipeline exercises\n", "\n", - "Welcome to the RAP (Reproducible Analytical Pipeline) exercises for this project.
These exercises are designed to help you learn and apply best practices for reproducible, modular, and transparent data analysis in Python.\n", + "Welcome to the RAP (Reproducible Analytical Pipeline) exercises for this project. These exercises are designed to help you learn and apply best practices for reproducible data analysis in Python.\n", "\n", "## Contents of the exercises\n", "- **01_introduction.ipynb**: Overview and guidance for the RAP exercises\n", "- **02_modules.ipynb**: Refactor monolithic code into functions and modular scripts\n", "- **03_config_files.ipynb**: Use configuration files to control pipeline behaviour\n", "- **04_unit_tests.ipynb**: Write and run unit tests for your code\n", - "- **05_logging.ipynb**: Add logging for transparency, debugging and reproducibility\n", - "- **06_continuous_integration.ipynb**: Implement and test continuous integration\n", + "- **05_continuous_integration.ipynb**: Implement and test continuous integration\n", "\n", "## Aim of the exercises\n", "The aim is to guide to build on your understanding of reproducible analytical pipelines (RAP) with practical experience in a demonstration repository. \n", diff --git a/exercises/02_modules.ipynb b/exercises/02_modules.ipynb index c6c8deb..9f3aa7d 100644 --- a/exercises/02_modules.ipynb +++ b/exercises/02_modules.ipynb @@ -7,7 +7,11 @@ "source": [ "# Exercise: Using modules and functions\n", "\n", - "In this exercise, you'll practice refactoring a single, monolithic script into a modular pipeline using functions and separate files. This is a key RAP best practice for reproducibility and maintainability.\n", + "In this exercise, you'll practice refactoring a single, monolithic script into a modular pipeline using functions and separate files. This is a key RAP best practice. 
Modules and functions help ensure that:\n", + "- Code is not duplicated unnecessarily\n", + "- The pipeline can be extended easily by reusing existing functions or adding new modules when needed\n", + "- Code is easy to navigate, starting from `main.py` and following appropriately named modules\n", + "- Code is easy to understand, with clear documentation for each function\n", "\n", "**Learning objectives:**\n", "- Understand the difference between monolithic and modular code\n", @@ -47,10 +51,11 @@ "- Convert the below into functions\n", "- Extend the function to impute missing height and weight using group means (e.g., by sex or diagnosis)\n", "- Add a new column to flag which values were imputed\n", + "- Document your modules with clear comments and docstrings\n", "\n", "**Hints:**\n", + "- Your function should work for any numeric column in `df`, not just height and weight\n", "- Use clear function names and docstrings\n", - "- Use `groupby` and `transform` to calculate group means\n", "- Use boolean indexing to flag imputed values" ] }, @@ -125,11 +130,7 @@ "**Reflect:**\n", "- How does modular code improve reproducibility and maintainability?\n", "- What are the benefits of separating code into modules?\n", - "- How would you extend this pipeline to add new features or analyses?\n", - "\n", - "**Bonus:**\n", - "- Add unit tests for your new functions\n", - "- Document your modules with clear comments and docstrings\n" + "- How would you extend this pipeline to add new features or analyses?"
] }, { @@ -145,10 +146,7 @@ "- Write a function to plot missing values per column before the data is cleaned\n", "- Write a function to plot disease prevalence for each disease category over time after the data is cleaned\n", "- Add the visualisations to the output report\n", - "\n", - "**Bonus:**\n", - "- Save the charts to the outputs folder\n", - "- Customise chart formatting and colours\n" + "- Save the charts to the outputs folder\n" ] } ], diff --git a/exercises/03_config_files.ipynb b/exercises/03_config_files.ipynb index b5af1a1..5b71948 100644 --- a/exercises/03_config_files.ipynb +++ b/exercises/03_config_files.ipynb @@ -67,10 +67,7 @@ "\n", "**Reflect:**\n", "- How does using config files improve reproducibility and flexibility?\n", - "- What other parameters could you add to make your pipeline more configurable?\n", - "\n", - "**Bonus:**\n", - "- Add unit tests to check that your code correctly applies the config parameter" + "- What other parameters could you add to make your pipeline more configurable?" ] }, { diff --git a/requirements.txt b/requirements.txt index 292437e..64352b7 100644 --- a/requirements.txt +++ b/requirements.txt @@ -6,6 +6,8 @@ # This will install all required dependencies for your pipeline, unit tests, and pre-commit hooks. # Update this file whenever you add new packages to your project. # +# This is a simple form of dependency management. +# # Example requirements for RAP-compliant Python project kaleido==1.1.0 nbformat diff --git a/src/What is the src folder.md b/src/What is the src folder.md index b1fd0ee..9cb23ea 100644 --- a/src/What is the src folder.md +++ b/src/What is the src folder.md @@ -3,9 +3,10 @@ The `src` folder is used to store the main source code for your RAP (Reproducible Analytical Pipeline) or Python package project. It helps keep your code organised, separate from data, tests, and configuration files. ## Why is it important? 
-- Keeps code organised and easy to find -- Makes it easier to maintain, test, and extend your project -- Helps others understand your code structure +- Keeps all source code in one place, making it easier to find and manage as your project grows. +- Prevents code from getting mixed up with data, configuration, or test files, which improves reproducibility and reduces mistakes. +- Makes testing and automation simpler, as tools can target the `src` folder directly. +- Follows a common Python and RAP convention, so your project structure is familiar and easier for others to understand and contribute to. ## Common types of files in `src` - **Main pipeline scripts** (e.g., `main.py`): Entry points for running your analysis diff --git a/tests/RAP Unit testing guide.md b/tests/RAP Unit testing guide.md index a5aa884..f0e4542 100644 --- a/tests/RAP Unit testing guide.md +++ b/tests/RAP Unit testing guide.md @@ -1,13 +1,13 @@ # RAP Unit testing guide ## What are unit tests? -Unit tests are small, automated tests that check individual pieces of code (functions, classes, modules) to ensure they work as expected. +Unit tests are small, automated tests that check individual pieces of code (functions and classes) to ensure they work as expected. For examples of written tests, look through the scripts in the `tests` folder of the pipeline. More information on unit tests can be found in the QA for RAP learning resource [add link]. ## Why unit tests matter in RAP - **Reproducibility:** Tests ensure code produces the same results every time. -- **Automation:** Tests run automatically, saving time and reducing manual checking. +- **Automation:** Tests can be run automatically by GitHub Actions workflows (see the `.github` folder for more information), saving time and reducing manual checking. - **Transparency:** Tests document what your code is supposed to do, making it easier to review and maintain.
- **Quality:** Tests catch bugs and edge cases before code is used in production or shared with others. @@ -50,6 +50,8 @@ in the QA for RAP learning resource [add link]. ``` - After adding tests, run them to check everything works as expected. +For exercises on creating unit tests, go to the `04_unit_tests.ipynb` notebook in the `exercises/` folder. ## Practical tips - Add tests for every new function or module you create. - Use comments and docstrings to explain what each test does. From d62996f709f7c6c905e2d02f5a5864b1754a53ab Mon Sep 17 00:00:00 2001 From: alex-westwood <156091267+alex-westwood@users.noreply.github.com> Date: Wed, 17 Dec 2025 09:13:37 +0000 Subject: [PATCH 2/3] remove line --- README.md | 2 -- 1 file changed, 2 deletions(-) diff --git a/README.md b/README.md index c79fddf..5538066 100644 --- a/README.md +++ b/README.md @@ -79,8 +79,6 @@ All exercises for RAP learning are in the `exercises/` folder. These are not par - Set up and customize pre-commit hooks - Apply RAP principles in real code -**Do not edit files in `src/` unless instructed by an exercise.** - ### Understanding the purpose of each file and folder Information about different files and folders can be found throughout the pipeline: From 0ae157808c0e0d9155ce95525e664645b1de4fe3 Mon Sep 17 00:00:00 2001 From: Alex Westwood Date: Wed, 17 Dec 2025 10:32:37 +0000 Subject: [PATCH 3/3] clarify forking and output file locations --- README.md | 21 ++++++++++----------- 1 file changed, 10 insertions(+), 11 deletions(-) diff --git a/README.md b/README.md index 5538066..39b086d 100644 --- a/README.md +++ b/README.md @@ -14,20 +14,22 @@ A well-written README makes your RAP project accessible and easy for others to u --> # Work in Progress - RAP demonstration repository for Python -Welcome to the RAP (Reproducible Analytical Pipeline) demonstration repository!
This repository is designed for beginners to practice RAP principles, experiment with code, and learn best practices for reproducible, automated, and transparent analytical pipelines in Python. +Welcome to the RAP (Reproducible Analytical Pipeline) demonstration repository! This repository is designed for beginner to intermediate coders to practice RAP principles, experiment with code, and learn best practices for Reproducible Analytical Pipelines in Python. **This repository is still in development** ## Getting Started 1. **Fork the repository:** - - Go to the GitHub page for this repository. - - Click the "Fork" button in the top right to create your own copy. - - Clone your forked repository: + - Forking means creating your own copy of this project on GitHub. Go to the [GitHub page](https://github.com/ONSdigital/python_rap_demo) for this repository (if you are not there already) and click the "Fork" button in the top right. + - After forking, go to your new repository (it will be at `https://github.com//python_rap_demo`). + - Click the green "Code" button and copy the URL shown under "Clone". + - Open a terminal (Command Prompt) and run: ```cmd git clone https://github.com//python_rap_demo.git cd python_rap_demo ``` + - **Tip:** To check you are in the project root, run `dir` and make sure you see files like `README.md` and folders like `src` and `data`. 2. **Set up your environment:** - Create and activate a virtual environment: @@ -62,7 +64,8 @@ This will: - Load configuration from user_config.yaml - Read input data from health_data.csv - Clean and process the data -- Write outputs and generate a markdown report in outputs +- Write the cleaned data to `data/outputs/cleaned/health_data_cleaned.csv` +- Write outputs and generate a markdown report in `data/outputs/reports/` - You should see a message confirming the report was generated. Explore the existing code and add your own to the `src/` folder. 
@@ -76,14 +79,14 @@ All exercises for RAP learning are in the `exercises/` folder. These are not par - Add new modules - Use config files - Write unit tests - - Set up and customize pre-commit hooks + - Set up and customise pre-commit hooks - Apply RAP principles in real code ### Understanding the purpose of each file and folder Information about different files and folders can be found throughout the pipeline: - Files: Contain information on what they are and what they are used for in a RAP in the file itself, except .secrets.baseline. .secrets.baseline information can be found in the `docs` folder - - Folders: Contain a README to explain what the folder is for and typical files it contains + - Folders: Contain a markdown (.md) file to explain what the folder is for and typical files it contains. - Scripts: Fully documented with docstrings and comments. ### Create and run tests @@ -95,10 +98,6 @@ Run tests with: pytest tests ``` -## Contributing - -This repo is for learning and experimentation. If you want to contribute improvements, please read `CONTRIBUTING.md`. - ## AI declaration AI has been used in the production of this content.
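The updated unit testing guide describes tests as small, automated checks of individual functions, and the README runs them with `pytest tests`. The following is an illustration only, and a minimal test file might look like the sketch below. `calculate_bmi` and the file name `tests/test_bmi.py` are hypothetical and not part of this repository. In the pipeline, the function under test would live in `src/` and be imported into the test file rather than defined alongside the tests.

```python
"""Sketch of a unit test file, e.g. tests/test_bmi.py (hypothetical name)."""
import math


def calculate_bmi(weight_kg: float, height_m: float) -> float:
    """Calculate body mass index (BMI). Hypothetical example function.

    Args:
        weight_kg: Weight in kilograms.
        height_m: Height in metres; must be positive.

    Returns:
        BMI, i.e. weight divided by height squared.

    Raises:
        ValueError: If height_m is zero or negative.
    """
    if height_m <= 0:
        raise ValueError("height_m must be positive")
    return weight_kg / height_m**2


def test_calculate_bmi_typical_values() -> None:
    # 70 kg / (1.75 m)^2 is roughly 22.857
    assert math.isclose(calculate_bmi(70, 1.75), 22.857, rel_tol=1e-3)


def test_calculate_bmi_rejects_zero_height() -> None:
    # Edge case: a zero height would divide by zero, so expect a clear error instead.
    try:
        calculate_bmi(70, 0)
    except ValueError:
        pass
    else:
        raise AssertionError("expected ValueError for zero height")
```

By default, pytest collects functions whose names start with `test_` from files named `test_*.py` and reports failing plain `assert` statements. `pytest.raises` is the more idiomatic way to check for an exception. The `try`/`except` form is used here only so the sketch runs with the standard library alone.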