Welcome to the RAP (Reproducible Analytical Pipeline) demonstration repository! This repository is designed for beginner to intermediate coders to practice RAP principles, experiment with code, and learn best practices for Reproducible Analytical Pipelines in Python.
See the Reproducible Analytical Pipelines materials on the Analysis for Action platform for more information about RAPs and their importance.
This repository is still in development.
- Fork the repository:
  - Forking means creating your own copy of this project on GitHub. Go to the GitHub page for this repository (if you are not there already) and click the "Fork" button in the top right.
  - After forking, go to your new repository (it will be at https://github.com/<your-username>/python_rap_demo).
  - Click the green "Code" button and copy the URL shown under "Clone".
  - Open a terminal (Command Prompt) and run:
      git clone https://github.com/<your-username>/python_rap_demo.git
      cd python_rap_demo
  - Tip: To check you are in the project root, run dir and make sure you see files like README.md and folders like src and data.
- Set up your environment:
  - Create and activate a virtual environment:
      python -m venv .venv
      .venv\Scripts\activate
  - Install dependencies:
      pip install -r requirements.txt
- src/: Main pipeline code and modules
- data/: Example health data for analysis
- config/: Configuration files (YAML)
- reports/: Graphs and reports
- tests/: Unit tests for pipeline modules
- exercises/: Practice exercises (see below)
- docs/: Documentation
To run the main RAP pipeline, open a terminal in your project root and enter:
    python src/main.py
This will:
- Load configuration from user_config.yaml
- Read input data from health_data.csv
- Clean and process the data
- Write the cleaned data to data/outputs/cleaned/health_data_cleaned.csv
- Write outputs and generate a markdown report in data/outputs/reports/

You should see a message confirming the report was generated.
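The pipeline steps listed above can be sketched roughly as follows. This is a minimal illustration using only the standard library, not the code in src/main.py: the function names (clean_rows, run_pipeline), the config keys, the column names, and the cleaning rule are all hypothetical, and the configuration is passed in as a plain dictionary rather than loaded from user_config.yaml.

```python
import csv
import tempfile
from pathlib import Path


def clean_rows(rows):
    """Strip whitespace and drop rows that contain an empty value.

    Assumes each row has no more fields than the header (hypothetical rule).
    """
    cleaned = []
    for row in rows:
        stripped = {key: (value or "").strip() for key, value in row.items()}
        if all(stripped.values()):
            cleaned.append(stripped)
    return cleaned


def run_pipeline(config):
    """Read, clean, and write the data, then write a short markdown report."""
    input_path = Path(config["input_path"])
    cleaned_path = Path(config["cleaned_path"])
    report_path = Path(config["report_path"])

    # Read the raw input data.
    with input_path.open(newline="") as f:
        reader = csv.DictReader(f)
        fieldnames = reader.fieldnames or []
        rows = list(reader)

    cleaned = clean_rows(rows)

    # Write the cleaned data, creating the output folder if needed.
    cleaned_path.parent.mkdir(parents=True, exist_ok=True)
    with cleaned_path.open("w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(cleaned)

    # Write a small markdown report summarising the run.
    report_path.parent.mkdir(parents=True, exist_ok=True)
    report_path.write_text(
        "# Pipeline report\n\n"
        f"- Rows read: {len(rows)}\n"
        f"- Rows kept after cleaning: {len(cleaned)}\n"
    )
    print(f"Report generated: {report_path}")
    return report_path


if __name__ == "__main__":
    # Demo on a throwaway folder with made-up example data.
    with tempfile.TemporaryDirectory() as tmp:
        base = Path(tmp)
        (base / "health_data.csv").write_text("patient_id,age\n1, 34 \n2,\n")
        run_pipeline({
            "input_path": base / "health_data.csv",
            "cleaned_path": base / "outputs" / "cleaned" / "health_data_cleaned.csv",
            "report_path": base / "outputs" / "reports" / "report.md",
        })
```

Keeping the paths in a config object rather than hard-coding them inside the functions is what lets the same code run on a temporary folder here and on the real data folders in the project.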
Explore the existing code and add your own to the src/ folder.
All exercises for RAP learning are in the exercises/ folder. These are not part of the main pipeline, but are for practice and experimentation.
- Each exercise has its own subfolder and README with instructions.
- Work through exercises to learn how to:
  - Add new modules
  - Use config files
  - Write unit tests
  - Set up and customise pre-commit hooks
  - Apply RAP principles in real code
Information about different files and folders can be found throughout the pipeline:
- Files: Each file explains, within the file itself, what it is and what it is used for in a RAP. The exception is .secrets.baseline, which is documented in the docs folder.
- Folders: Each folder contains a markdown (.md) file explaining what the folder is for and the typical files it contains.
- Scripts: Fully documented with docstrings and comments.
Test your functions by adding tests to the tests/ folder.
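As an illustration, a test file might look like the sketch below. The function drop_missing_rows and the file name tests/test_cleaning.py are hypothetical; in the real project the function under test would normally be imported from a module in src/ rather than defined inside the test file.

```python
# tests/test_cleaning.py (hypothetical file name)


def drop_missing_rows(rows):
    """Return only the rows in which every value is non-empty.

    Defined here to keep the example self-contained; in practice this
    would live in src/ and be imported into the test file.
    """
    return [row for row in rows if all(value != "" for value in row.values())]


def test_drop_missing_rows_removes_incomplete_rows():
    rows = [{"id": "1", "age": "34"}, {"id": "2", "age": ""}]
    assert drop_missing_rows(rows) == [{"id": "1", "age": "34"}]


def test_drop_missing_rows_keeps_complete_rows():
    rows = [{"id": "1", "age": "34"}]
    assert drop_missing_rows(rows) == rows
```

By default, pytest collects functions whose names start with test_ from files named test_*.py, so no extra registration is needed.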
Run tests with:
    pytest tests
If you encounter issues:
- Ensure your virtual environment is activated. The terminal prompt should show (.venv) at the start.
- Check that all dependencies are installed by running pip install -r requirements.txt.
- Verify you are in the project root directory when running commands. The terminal should show the path ending with python_rap_demo.
- For exercise notebooks, clean outputs and restart the kernel if you face issues.
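The first and third checks above can also be done from Python. The sketch below is not part of the repository; the function names are hypothetical, and the list of expected items (README.md, src, data) is taken from the project-root tip earlier in this README.

```python
import sys
from pathlib import Path


def in_virtual_environment():
    """Return True when Python is running inside a virtual environment."""
    # Inside a venv, sys.prefix points at the environment while
    # sys.base_prefix points at the base Python installation.
    return sys.prefix != sys.base_prefix


def missing_project_items(directory, expected=("README.md", "src", "data")):
    """Return the expected files or folders that are absent from directory."""
    directory = Path(directory)
    return [name for name in expected if not (directory / name).exists()]


if __name__ == "__main__":
    print("Virtual environment active:", in_virtual_environment())
    missing = missing_project_items(Path.cwd())
    if missing:
        print("Not in the project root? Missing:", ", ".join(missing))
    else:
        print("Project root looks correct.")
```

Saving this under a name of your choice and running it from the terminal reports both checks at once.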
AI has been used in the production of this content.
Happy RAP coding!