igerber · Jan 17, 2026
diff --git a/CLAUDE.md b/CLAUDE.md
@@ -148,6 +148,16 @@ pytest tests/test_rust_backend.py -v
   - `run_all_placebo_tests()` - Comprehensive suite of diagnostics
   - `PlaceboTestResults` - Dataclass for test results
 
+- **`diff_diff/datasets.py`** - Real-world datasets for teaching and examples:
+  - `load_card_krueger()` - Card & Krueger (1994) minimum wage dataset (classic 2x2 DiD)
+  - `load_castle_doctrine()` - Castle Doctrine / Stand Your Ground laws (staggered adoption)
+  - `load_divorce_laws()` - Unilateral divorce laws (staggered adoption, Stevenson-Wolfers)
+  - `load_mpdta()` - Minimum wage panel data from R `did` package (Callaway-Sant'Anna example)
+  - `list_datasets()` - List available datasets with descriptions
+  - `load_dataset(name)` - Load dataset by name
+  - `clear_cache()` - Clear locally cached datasets
+  - Datasets are downloaded from public sources and cached locally
+
 - **`diff_diff/honest_did.py`** - Honest DiD sensitivity analysis (Rambachan & Roth 2023):
   - `HonestDiD` - Main class for computing bounds under parallel trends violations
   - `DeltaSD`, `DeltaRM`, `DeltaSDRM` - Restriction classes for smoothness and relative magnitudes
@@ -239,6 +249,7 @@ See `docs/performance-plan.md` for full optimization details and `docs/benchmark
   - `06_power_analysis.ipynb` - Power analysis for study design, MDE, simulation-based power
   - `07_pretrends_power.ipynb` - Pre-trends power analysis (Roth 2022), MDV, power curves
   - `08_triple_diff.ipynb` - Triple Difference (DDD) estimation with proper covariate handling
+  - `09_real_world_examples.ipynb` - Real-world data examples (Card-Krueger, Castle Doctrine, Divorce Laws)
 
 ### Benchmarks
 
@@ -281,6 +292,7 @@ Tests mirror the source modules:
 - `tests/test_honest_did.py` - Tests for Honest DiD sensitivity analysis
 - `tests/test_power.py` - Tests for power analysis
 - `tests/test_pretrends.py` - Tests for pre-trends power analysis
+- `tests/test_datasets.py` - Tests for dataset loading functions
 
 ### Dependencies
 

diff --git a/TODO.md b/TODO.md
@@ -66,7 +66,7 @@ Different estimators compute SEs differently. Consider unified interface.
 ## Documentation Improvements
 
 - [x] ~~Comparison of estimator outputs on same data~~ ✅ Done in `02_staggered_did.ipynb` (Section 13: Comparing CS and SA)
-- [ ] Real-world data examples (currently synthetic only)
+- [x] ~~Real-world data examples (currently synthetic only)~~ ✅ Added `datasets.py` module and `09_real_world_examples.ipynb` with Card-Krueger, Castle Doctrine, and Divorce Laws datasets
 
 ---
 

diff --git a/diff_diff/__init__.py b/diff_diff/__init__.py
@@ -116,6 +116,15 @@
     plot_pretrends_power,
     plot_sensitivity,
 )
+from diff_diff.datasets import (
+    clear_cache,
+    list_datasets,
+    load_card_krueger,
+    load_castle_doctrine,
+    load_dataset,
+    load_divorce_laws,
+    load_mpdta,
+)
 
 __version__ = "2.0.3"
 __all__ = [
@@ -206,4 +215,12 @@
     # Linear algebra helpers
     "LinearRegression",
     "InferenceResult",
+    # Datasets
+    "load_card_krueger",
+    "load_castle_doctrine",
+    "load_divorce_laws",
+    "load_mpdta",
+    "load_dataset",
+    "list_datasets",
+    "clear_cache",
 ]
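
The CLAUDE.md hunk above notes that datasets are downloaded from public sources and cached locally, with name-based loading and a cache-clearing helper. The diff does not show `datasets.py` itself, so the following is only a minimal sketch of that general pattern, not the module's actual implementation. Every name in it (`register_dataset`, `list_registered`, `load_cached`, `clear_cached`, the `.csv` cache layout) is hypothetical, and the network download is replaced by an injected callable so the example runs offline.

```python
import tempfile
from pathlib import Path
from typing import Callable, Dict, Tuple

# Hypothetical registry: dataset name -> (description, fetcher returning raw bytes).
# In a real loader the fetcher would download from a public source.
_REGISTRY: Dict[str, Tuple[str, Callable[[], bytes]]] = {}


def register_dataset(name: str, description: str, fetcher: Callable[[], bytes]) -> None:
    """Make a dataset available under `name`."""
    _REGISTRY[name] = (description, fetcher)


def list_registered() -> Dict[str, str]:
    """Return {name: description} for every registered dataset."""
    return {name: desc for name, (desc, _) in sorted(_REGISTRY.items())}


def load_cached(name: str, cache_dir: str) -> bytes:
    """Return the dataset's bytes, fetching only when no cached copy exists."""
    if name not in _REGISTRY:
        raise ValueError(f"Unknown dataset {name!r}; available: {sorted(_REGISTRY)}")
    _, fetcher = _REGISTRY[name]
    path = Path(cache_dir) / f"{name}.csv"
    if path.exists():
        return path.read_bytes()  # cache hit: no fetch
    data = fetcher()  # cache miss: fetch once, then store
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(data)
    return data


def clear_cached(cache_dir: str) -> int:
    """Delete cached files and return how many were removed."""
    removed = 0
    for path in Path(cache_dir).glob("*.csv"):
        path.unlink()
        removed += 1
    return removed


if __name__ == "__main__":
    # Stand-in for a download; the contents are made up for illustration.
    register_dataset("demo", "Tiny illustrative table", lambda: b"unit,treated\n1,0\n2,1\n")
    with tempfile.TemporaryDirectory() as tmp:
        first = load_cached("demo", tmp)   # fetches and writes the cache file
        second = load_cached("demo", tmp)  # served from the cache file
        print(first == second, clear_cached(tmp))
```

Passing the fetcher in as a callable keeps the caching logic testable without network access; a real loader would presumably also parse the cached file into a DataFrame, which this sketch leaves out.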