Learn RAP principles through practical Python exercises. This repo showcases best practices for Reproducible Analytical Pipelines, with descriptions of components and exercises to help users build confidence in applying RAP techniques.

ONSdigital/python_rap_demo

Work in Progress - RAP demonstration repository for Python

Welcome to the RAP (Reproducible Analytical Pipeline) demonstration repository! This repository is designed for beginner to intermediate coders to practice RAP principles, experiment with code, and learn best practices for Reproducible Analytical Pipelines in Python.

See the Reproducible Analytical Pipelines materials on the Analysis for Action platform for more information about RAPs and their importance.

This repository is still in development

Getting Started

  1. Fork the repository:

    • Forking means creating your own copy of this project on GitHub. Go to the GitHub page for this repository (if you are not there already) and click the "Fork" button in the top right.
    • After forking, go to your new repository (it will be at https://github.com/<your-username>/python_rap_demo).
    • Click the green "Code" button and copy the URL shown under "Clone".
    • Open a terminal (Command Prompt) and run:
      git clone https://github.com/<your-username>/python_rap_demo.git
      cd python_rap_demo
    • Tip: To check you are in the project root, run dir and make sure you see files like README.md and folders like src and data.
  2. Set up your environment:

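    • Create and activate a virtual environment (the activate command shown is for Windows; on macOS or Linux, run source .venv/bin/activate instead):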
      python -m venv .venv
      .venv\Scripts\activate
    • Install dependencies:
      pip install -r requirements.txt

Repository Structure

  • src/ — Main pipeline code and modules
  • data/ — Example health data for analysis
  • config/ — Configuration files (YAML)
  • reports/ — Graphs and reports
  • tests/ — Unit tests for pipeline modules
  • exercises/ — Practice exercises (see below)
  • docs/ — Documentation

Using the repository

Run the pipeline

To run the main RAP pipeline, open a terminal in your project root and enter:

python src/main.py

This will:

  • Load configuration from user_config.yaml
  • Read input data from health_data.csv
  • Clean and process the data
  • Write the cleaned data to data/outputs/cleaned/health_data_cleaned.csv
  • Write outputs and generate a markdown report in data/outputs/reports/

When the run finishes, you should see a message confirming that the report was generated.

Explore the existing code and add your own to the src/ folder.
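
The pipeline steps listed above can be sketched roughly as follows. This is a minimal illustration using only the standard library, not the code in src/main.py. The config keys (input_path, cleaned_path, report_path, required_columns) and the cleaning rule are assumptions made for the example, and the config is passed in as a plain dict rather than parsed from user_config.yaml.

```python
import csv
from pathlib import Path


def clean_rows(rows, required_columns):
    """Strip stray whitespace and drop rows missing a required value."""
    cleaned = []
    for row in rows:
        # Skip the None key that csv.DictReader uses for surplus fields.
        stripped = {
            key: (value or "").strip()
            for key, value in row.items()
            if key is not None
        }
        if all(stripped.get(col) for col in required_columns):
            cleaned.append(stripped)
    return cleaned


def run_pipeline(config):
    """Read, clean, write and report, driven entirely by the config dict.

    The config keys used here are hypothetical, chosen for this sketch.
    """
    input_path = Path(config["input_path"])
    cleaned_path = Path(config["cleaned_path"])
    report_path = Path(config["report_path"])

    # Read the raw input data.
    with input_path.open(newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        fieldnames = reader.fieldnames or []
        rows = list(reader)

    cleaned = clean_rows(rows, config["required_columns"])

    # Write the cleaned data, creating the output folder if needed.
    cleaned_path.parent.mkdir(parents=True, exist_ok=True)
    with cleaned_path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(cleaned)

    # Write a short markdown report summarising the run.
    report_path.parent.mkdir(parents=True, exist_ok=True)
    report_path.write_text(
        "# Pipeline report\n\n"
        f"- Rows read: {len(rows)}\n"
        f"- Rows kept after cleaning: {len(cleaned)}\n",
        encoding="utf-8",
    )
    print(f"Report generated: {report_path}")
    return len(rows), len(cleaned)
```

Keeping every path and rule in the config means the same function can be rerun on new data without editing the code, which is the main point of the config-driven design.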

Practice with exercises

All exercises for RAP learning are in the exercises/ folder. These are not part of the main pipeline, but are for practice and experimentation.

  • Each exercise has its own subfolder and README with instructions.
  • Work through exercises to learn how to:
    • Add new modules
    • Use config files
    • Write unit tests
    • Set up and customise pre-commit hooks
    • Apply RAP principles in real code
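
As a small illustration of the "use config files" idea, the sketch below merges user settings over defaults and lets a function read its thresholds from the result instead of hard-coding them. The keys min_age and max_age and both function names are hypothetical and not taken from the repository. In a real pipeline the user settings would typically be parsed from a YAML file rather than written as a dict.

```python
# Hypothetical defaults for this sketch; not the repository's real settings.
DEFAULT_CONFIG = {"min_age": 0, "max_age": 120}


def build_config(user_config, defaults=DEFAULT_CONFIG):
    """Merge user settings over the defaults and reject unknown keys."""
    unknown = set(user_config) - set(defaults)
    if unknown:
        # Failing early catches typos such as "max_agee" in a config file.
        raise KeyError(f"Unknown config keys: {sorted(unknown)}")
    return {**defaults, **user_config}


def filter_ages(ages, config):
    """Keep only ages inside the configured range (inclusive)."""
    return [a for a in ages if config["min_age"] <= a <= config["max_age"]]


config = build_config({"max_age": 100})
print(filter_ages([-1, 30, 101], config))
```

Changing the accepted range now means editing one config value, with no change to filter_ages itself.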

Understanding the purpose of each file and folder

Information about different files and folders can be found throughout the pipeline:

  • Files: Each file explains, in the file itself, what it is and how it is used in a RAP. The exception is .secrets.baseline, which is described in the docs folder.
  • Folders: Each folder contains a markdown (.md) file explaining what the folder is for and the files it typically contains.
  • Scripts: Fully documented with docstrings and comments.

Create and run tests

Test your functions by adding tests to the tests/ folder.

Run tests with:

pytest tests
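
For orientation, a test file might look something like the sketch below. The function standardise_column_name is hypothetical and is defined inline only to keep the example self-contained. In practice it would live in a module under src/ and be imported by a test file such as tests/test_cleaning.py (also a hypothetical name). pytest collects files named test_*.py and functions whose names start with test_.

```python
def standardise_column_name(name):
    """Return a lower-case, underscore-separated version of a column name."""
    return "_".join(name.strip().lower().split())


def test_standardise_column_name_handles_spaces_and_case():
    # Leading/trailing spaces are removed and inner spaces become underscores.
    assert standardise_column_name("  Blood Pressure ") == "blood_pressure"


def test_standardise_column_name_leaves_clean_names_unchanged():
    # A name that is already clean should pass through untouched.
    assert standardise_column_name("age") == "age"
```

Each test checks one behaviour and has a descriptive name, so a failure message points directly at what broke.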

Troubleshooting

If you encounter issues:

  • Ensure your virtual environment is activated. The terminal prompt should show (.venv) at the start.
  • Check that all dependencies are installed by running pip install -r requirements.txt.
  • Verify you are in the project root directory when running commands. The terminal should show the path ending with python_rap_demo.
  • For exercise notebooks, clear the outputs and restart the kernel if you run into problems.
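
The first and third checks above can also be done from Python. The following is a minimal sketch, assuming the environment was created with python -m venv and that the project root contains README.md, src and data as described in Getting Started. The script name check_setup.py and both function names are hypothetical.

```python
import sys
from pathlib import Path


def in_virtual_environment():
    """Return True when Python is running inside a venv-style environment."""
    # Inside a venv, sys.prefix points at the environment folder while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


def missing_project_entries(root, expected=("README.md", "src", "data")):
    """Return the expected files or folders that are absent from root."""
    root = Path(root)
    return [name for name in expected if not (root / name).exists()]


if __name__ == "__main__":
    print("Virtual environment active:", in_virtual_environment())
    missing = missing_project_entries(Path.cwd())
    if missing:
        print("Not in the project root? Missing:", ", ".join(missing))
    else:
        print("Project root looks correct.")
```

Saved as check_setup.py, it could be run with python check_setup.py from the folder you believe is the project root.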

AI declaration

AI has been used in the production of this content.


Happy RAP coding!
