From daf7eac58e34983a5244477d17e7307945115fc3 Mon Sep 17 00:00:00 2001
From: Dimitri Yatsenko
Date: Fri, 16 Jan 2026 13:34:21 -0600
Subject: [PATCH] docs: fix broken .ipynb links in tutorial notebooks

Change all .ipynb links to directory links (/) for mkdocs-jupyter
compatibility. Mkdocs-jupyter converts notebooks to HTML and serves
them as directories, so links must point to the directory path
without the .ipynb extension.

Fixed links in:
- basics/01-first-pipeline.ipynb
- basics/02-schema-design.ipynb
- basics/03-data-entry.ipynb
- basics/04-queries.ipynb
- basics/06-object-storage.ipynb
- examples/blob-detection.ipynb
- examples/hotel-reservations.ipynb
- examples/languages.ipynb
---
 src/tutorials/basics/01-first-pipeline.ipynb  |  46 +-------
 src/tutorials/basics/02-schema-design.ipynb   | 111 +-----------------
 src/tutorials/basics/03-data-entry.ipynb      |  22 +---
 src/tutorials/basics/04-queries.ipynb         |  59 +---------
 src/tutorials/basics/06-object-storage.ipynb  |  57 +--------
 src/tutorials/examples/blob-detection.ipynb   |  24 +---
 .../examples/hotel-reservations.ipynb         |  21 +---
 src/tutorials/examples/languages.ipynb        |  26 +---
 8 files changed, 20 insertions(+), 346 deletions(-)

diff --git a/src/tutorials/basics/01-first-pipeline.ipynb b/src/tutorials/basics/01-first-pipeline.ipynb
index 22b6e5d5..2116c148 100644
--- a/src/tutorials/basics/01-first-pipeline.ipynb
+++ b/src/tutorials/basics/01-first-pipeline.ipynb
@@ -3,22 +3,7 @@
   {
    "cell_type": "markdown",
    "metadata": {},
-   "source": [
-    "# A Simple Pipeline\n",
-    "\n",
-    "This tutorial introduces DataJoint by building a simple research lab database. You'll learn to:\n",
-    "\n",
-    "- Define tables with primary keys and dependencies\n",
-    "- Insert and query data\n",
-    "- Use the four core operations: restriction, projection, join, aggregation\n",
-    "- Understand the schema diagram\n",
-    "\n",
-    "We'll work with **Manual tables** only—tables where you enter data directly. Later tutorials introduce automated computation.\n",
-    "\n",
-    "For complete working examples, see:\n",
-    "- [University Database](../examples/university.ipynb) — Academic records with complex queries\n",
-    "- [Blob Detection](../examples/blob-detection.ipynb) — Image processing with computation"
-   ]
+   "source": "# A Simple Pipeline\n\nThis tutorial introduces DataJoint by building a simple research lab database. You'll learn to:\n\n- Define tables with primary keys and dependencies\n- Insert and query data\n- Use the four core operations: restriction, projection, join, aggregation\n- Understand the schema diagram\n\nWe'll work with **Manual tables** only—tables where you enter data directly. Later tutorials introduce automated computation.\n\nFor complete working examples, see:\n- [University Database](../examples/university/) — Academic records with complex queries\n- [Blob Detection](../examples/blob-detection/) — Image processing with computation"
   },
   {
    "cell_type": "markdown",
@@ -2698,32 +2683,7 @@
   {
    "cell_type": "markdown",
    "metadata": {},
-   "source": [
-    "## Summary\n",
-    "\n",
-    "You've learned the fundamentals of DataJoint:\n",
-    "\n",
-    "| Concept | Description |\n",
-    "|---------|-------------|\n",
-    "| **Tables** | Python classes with a `definition` string |\n",
-    "| **Primary key** | Above `---`, uniquely identifies rows |\n",
-    "| **Dependencies** | `->` creates foreign keys |\n",
-    "| **Restriction** | `&` filters rows |\n",
-    "| **Projection** | `.proj()` selects/computes columns |\n",
-    "| **Join** | `*` combines tables |\n",
-    "| **Aggregation** | `.aggr()` summarizes groups |\n",
-    "\n",
-    "### Next Steps\n",
-    "\n",
-    "- [Schema Design](02-schema-design.ipynb) — Primary keys, relationships, table tiers\n",
-    "- [Queries](04-queries.ipynb) — Advanced query patterns\n",
-    "- [Computation](05-computation.ipynb) — Automated processing with Imported/Computed tables\n",
-    "\n",
-    "### Complete Examples\n",
-    "\n",
-    "- [University Database](../examples/university.ipynb) — Complex queries on academic records\n",
-    "- [Blob Detection](../examples/blob-detection.ipynb) — Image processing pipeline with computation"
-   ]
+   "source": "## Summary\n\nYou've learned the fundamentals of DataJoint:\n\n| Concept | Description |\n|---------|-------------|\n| **Tables** | Python classes with a `definition` string |\n| **Primary key** | Above `---`, uniquely identifies rows |\n| **Dependencies** | `->` creates foreign keys |\n| **Restriction** | `&` filters rows |\n| **Projection** | `.proj()` selects/computes columns |\n| **Join** | `*` combines tables |\n| **Aggregation** | `.aggr()` summarizes groups |\n\n### Next Steps\n\n- [Schema Design](02-schema-design/) — Primary keys, relationships, table tiers\n- [Queries](04-queries/) — Advanced query patterns\n- [Computation](05-computation/) — Automated processing with Imported/Computed tables\n\n### Complete Examples\n\n- [University Database](../examples/university/) — Complex queries on academic records\n- [Blob Detection](../examples/blob-detection/) — Image processing pipeline with computation"
   },
   {
    "cell_type": "code",
@@ -2764,4 +2724,4 @@
  },
  "nbformat": 4,
  "nbformat_minor": 4
-}
+}
\ No newline at end of file
diff --git a/src/tutorials/basics/02-schema-design.ipynb b/src/tutorials/basics/02-schema-design.ipynb
index 831ca9e4..a2b7904b 100644
--- a/src/tutorials/basics/02-schema-design.ipynb
+++ b/src/tutorials/basics/02-schema-design.ipynb
@@ -1299,39 +1299,7 @@
   {
    "cell_type": "markdown",
    "metadata": {},
-   "source": [
-    "### Reading the Diagram\n",
-    "\n",
-    "DataJoint diagrams show tables as nodes and foreign keys as edges. The notation conveys relationship semantics at a glance.\n",
-    "\n",
-    "**Line Styles:**\n",
-    "\n",
-    "| Line | Style | Relationship | Meaning |\n",
-    "|------|-------|--------------|---------|\n",
-    "| ━━━ | Thick solid | Extension | FK **is** entire PK (one-to-one) |\n",
-    "| ─── | Thin solid | Containment | FK **in** PK with other fields (one-to-many) |\n",
-    "| ┄┄┄ | Dashed | Reference | FK in secondary attributes (one-to-many) |\n",
-    "\n",
-    "**Visual Indicators:**\n",
-    "\n",
-    "| Indicator | Meaning |\n",
-    "|-----------|---------|\n",
-    "| **Underlined name** | Introduces new dimension (new PK attributes) |\n",
-    "| Non-underlined name | Inherits all dimensions (PK entirely from FKs) |\n",
-    "| **Green** | Manual table |\n",
-    "| **Gray** | Lookup table |\n",
-    "| **Red** | Computed table |\n",
-    "| **Blue** | Imported table |\n",
-    "| **Orange dots** | Renamed foreign keys (via `.proj()`) |\n",
-    "\n",
-    "**Key principle:** Solid lines mean the parent's identity becomes part of the child's identity. Dashed lines mean the child maintains independent identity.\n",
-    "\n",
-    "**Note:** Diagrams do NOT show `[nullable]` or `[unique]` modifiers—check table definitions for these constraints.\n",
-    "\n",
-    "See [How to Read Diagrams](../../how-to/read-diagrams.ipynb) for diagram operations and comparison to ER notation.\n",
-    "\n",
-    "## Insert Test Data and Populate"
-   ]
+   "source": "### Reading the Diagram\n\nDataJoint diagrams show tables as nodes and foreign keys as edges. The notation conveys relationship semantics at a glance.\n\n**Line Styles:**\n\n| Line | Style | Relationship | Meaning |\n|------|-------|--------------|---------|\n| ━━━ | Thick solid | Extension | FK **is** entire PK (one-to-one) |\n| ─── | Thin solid | Containment | FK **in** PK with other fields (one-to-many) |\n| ┄┄┄ | Dashed | Reference | FK in secondary attributes (one-to-many) |\n\n**Visual Indicators:**\n\n| Indicator | Meaning |\n|-----------|---------|\n| **Underlined name** | Introduces new dimension (new PK attributes) |\n| Non-underlined name | Inherits all dimensions (PK entirely from FKs) |\n| **Green** | Manual table |\n| **Gray** | Lookup table |\n| **Red** | Computed table |\n| **Blue** | Imported table |\n| **Orange dots** | Renamed foreign keys (via `.proj()`) |\n\n**Key principle:** Solid lines mean the parent's identity becomes part of the child's identity. Dashed lines mean the child maintains independent identity.\n\n**Note:** Diagrams do NOT show `[nullable]` or `[unique]` modifiers—check table definitions for these constraints.\n\nSee [How to Read Diagrams](../../how-to/read-diagrams/) for diagram operations and comparison to ER notation.\n\n## Insert Test Data and Populate"
   },
   {
    "cell_type": "code",
@@ -1562,80 +1530,7 @@
   {
    "cell_type": "markdown",
    "metadata": {},
-   "source": [
-    "## Best Practices\n",
-    "\n",
-    "### 1. Choose Meaningful Primary Keys\n",
-    "- Use natural identifiers when possible (`subject_id = 'M001'`)\n",
-    "- Keep keys minimal but sufficient for uniqueness\n",
-    "\n",
-    "### 2. Use Appropriate Table Tiers\n",
-    "- **Manual**: Data entered by operators or instruments\n",
-    "- **Lookup**: Configuration, parameters, reference data\n",
-    "- **Imported**: Data read from files (recordings, images)\n",
-    "- **Computed**: Derived analyses and summaries\n",
-    "\n",
-    "### 3. Normalize Your Data\n",
-    "- Don't repeat information across rows\n",
-    "- Create separate tables for distinct entities\n",
-    "- Use foreign keys to link related data\n",
-    "\n",
-    "### 4. Use Core DataJoint Types\n",
-    "\n",
-    "DataJoint has a three-layer type architecture (see [Type System Specification](../reference/specs/type-system.md)):\n",
-    "\n",
-    "1. **Native database types** (Layer 1): Backend-specific types like `INT`, `FLOAT`, `TINYINT UNSIGNED`. These are **discouraged** but allowed for backward compatibility.\n",
-    "\n",
-    "2. **Core DataJoint types** (Layer 2): Standardized, scientist-friendly types that work identically across MySQL and PostgreSQL. **Always prefer these.**\n",
-    "\n",
-    "3. **Codec types** (Layer 3): Types with `encode()`/`decode()` semantics like ``, ``, ``.\n",
-    "\n",
-    "**Core types used in this tutorial:**\n",
-    "\n",
-    "| Type | Description | Example |\n",
-    "|------|-------------|---------|\n",
-    "| `uint8`, `uint16`, `int32` | Sized integers | `session_idx : uint16` |\n",
-    "| `float32`, `float64` | Sized floats | `reaction_time : float32` |\n",
-    "| `varchar(n)` | Variable-length string | `name : varchar(100)` |\n",
-    "| `bool` | Boolean | `correct : bool` |\n",
-    "| `date` | Date only | `date_of_birth : date` |\n",
-    "| `datetime` | Date and time (UTC) | `created_at : datetime` |\n",
-    "| `enum(...)` | Enumeration | `sex : enum('M', 'F', 'U')` |\n",
-    "| `json` | JSON document | `task_params : json` |\n",
-    "| `uuid` | Universally unique ID | `experimenter_id : uuid` |\n",
-    "\n",
-    "**Why native types are allowed but discouraged:**\n",
-    "\n",
-    "Native types (like `int`, `float`, `tinyint`) are passed through to the database but generate a **warning at declaration time**. They are discouraged because:\n",
-    "- They lack explicit size information\n",
-    "- They are not portable across database backends\n",
-    "- They are not recorded in field metadata for reconstruction\n",
-    "\n",
-    "If you see a warning like `\"Native type 'int' used; consider 'int32' instead\"`, update your definition to use the corresponding core type.\n",
-    "\n",
-    "### 5. Document Your Tables\n",
-    "- Add comments after `#` in definitions\n",
-    "- Document units in attribute comments\n",
-    "\n",
-    "## Key Concepts Recap\n",
-    "\n",
-    "| Concept | Description |\n",
-    "|---------|-------------|\n",
-    "| **Primary Key** | Attributes above `---` that uniquely identify rows |\n",
-    "| **Secondary Attributes** | Attributes below `---` that store additional data |\n",
-    "| **Foreign Key** (`->`) | Reference to another table, imports its primary key |\n",
-    "| **One-to-Many** | FK in primary key: parent has many children |\n",
-    "| **One-to-One** | FK is entire primary key: exactly one child per parent |\n",
-    "| **Master-Part** | Compositional integrity: master and parts inserted/deleted atomically |\n",
-    "| **Nullable FK** | `[nullable]` makes the reference optional |\n",
-    "| **Lookup Table** | Pre-populated reference data |\n",
-    "\n",
-    "## Next Steps\n",
-    "\n",
-    "- [Data Entry](03-data-entry.ipynb) — Inserting, updating, and deleting data\n",
-    "- [Queries](04-queries.ipynb) — Filtering, joining, and projecting\n",
-    "- [Computation](05-computation.ipynb) — Building computational pipelines"
-   ]
+   "source": "## Best Practices\n\n### 1. Choose Meaningful Primary Keys\n- Use natural identifiers when possible (`subject_id = 'M001'`)\n- Keep keys minimal but sufficient for uniqueness\n\n### 2. Use Appropriate Table Tiers\n- **Manual**: Data entered by operators or instruments\n- **Lookup**: Configuration, parameters, reference data\n- **Imported**: Data read from files (recordings, images)\n- **Computed**: Derived analyses and summaries\n\n### 3. Normalize Your Data\n- Don't repeat information across rows\n- Create separate tables for distinct entities\n- Use foreign keys to link related data\n\n### 4. Use Core DataJoint Types\n\nDataJoint has a three-layer type architecture (see [Type System Specification](../../reference/specs/type-system/)):\n\n1. **Native database types** (Layer 1): Backend-specific types like `INT`, `FLOAT`, `TINYINT UNSIGNED`. These are **discouraged** but allowed for backward compatibility.\n\n2. **Core DataJoint types** (Layer 2): Standardized, scientist-friendly types that work identically across MySQL and PostgreSQL. **Always prefer these.**\n\n3. **Codec types** (Layer 3): Types with `encode()`/`decode()` semantics like ``, ``, ``.\n\n**Core types used in this tutorial:**\n\n| Type | Description | Example |\n|------|-------------|---------|\n| `uint8`, `uint16`, `int32` | Sized integers | `session_idx : uint16` |\n| `float32`, `float64` | Sized floats | `reaction_time : float32` |\n| `varchar(n)` | Variable-length string | `name : varchar(100)` |\n| `bool` | Boolean | `correct : bool` |\n| `date` | Date only | `date_of_birth : date` |\n| `datetime` | Date and time (UTC) | `created_at : datetime` |\n| `enum(...)` | Enumeration | `sex : enum('M', 'F', 'U')` |\n| `json` | JSON document | `task_params : json` |\n| `uuid` | Universally unique ID | `experimenter_id : uuid` |\n\n**Why native types are allowed but discouraged:**\n\nNative types (like `int`, `float`, `tinyint`) are passed through to the database but generate a **warning at declaration time**. They are discouraged because:\n- They lack explicit size information\n- They are not portable across database backends\n- They are not recorded in field metadata for reconstruction\n\nIf you see a warning like `\"Native type 'int' used; consider 'int32' instead\"`, update your definition to use the corresponding core type.\n\n### 5. Document Your Tables\n- Add comments after `#` in definitions\n- Document units in attribute comments\n\n## Key Concepts Recap\n\n| Concept | Description |\n|---------|-------------|\n| **Primary Key** | Attributes above `---` that uniquely identify rows |\n| **Secondary Attributes** | Attributes below `---` that store additional data |\n| **Foreign Key** (`->`) | Reference to another table, imports its primary key |\n| **One-to-Many** | FK in primary key: parent has many children |\n| **One-to-One** | FK is entire primary key: exactly one child per parent |\n| **Master-Part** | Compositional integrity: master and parts inserted/deleted atomically |\n| **Nullable FK** | `[nullable]` makes the reference optional |\n| **Lookup Table** | Pre-populated reference data |\n\n## Next Steps\n\n- [Data Entry](03-data-entry/) — Inserting, updating, and deleting data\n- [Queries](04-queries/) — Filtering, joining, and projecting\n- [Computation](05-computation/) — Building computational pipelines"
   },
   {
    "cell_type": "code",
@@ -1676,4 +1571,4 @@
  },
  "nbformat": 4,
  "nbformat_minor": 4
-}
+}
\ No newline at end of file
diff --git a/src/tutorials/basics/03-data-entry.ipynb b/src/tutorials/basics/03-data-entry.ipynb
index 6838804e..4b76ae72 100644
--- a/src/tutorials/basics/03-data-entry.ipynb
+++ b/src/tutorials/basics/03-data-entry.ipynb
@@ -1588,25 +1588,7 @@
    "cell_type": "markdown",
    "id": "cell-42",
    "metadata": {},
-   "source": [
-    "## Quick Reference\n",
-    "\n",
-    "| Operation | Method | Use Case |\n",
-    "|-----------|--------|----------|\n",
-    "| Insert one | `insert1(row)` | Adding single entity |\n",
-    "| Insert many | `insert(rows)` | Bulk data loading |\n",
-    "| Update one | `update1(row)` | Surgical corrections only |\n",
-    "| Delete | `delete()` | Removing entities (cascades) |\n",
-    "| Delete quick | `delete_quick()` | Internal cleanup (no cascade) |\n",
-    "| Validate | `validate(rows)` | Pre-insert check |\n",
-    "\n",
-    "See the [Data Manipulation Specification](../reference/specs/data-manipulation.md) for complete details.\n",
-    "\n",
-    "## Next Steps\n",
-    "\n",
-    "- [Queries](04-queries.ipynb) — Filtering, joining, and projecting data\n",
-    "- [Computation](05-computation.ipynb) — Building computational pipelines"
-   ]
+   "source": "## Quick Reference\n\n| Operation | Method | Use Case |\n|-----------|--------|----------|\n| Insert one | `insert1(row)` | Adding single entity |\n| Insert many | `insert(rows)` | Bulk data loading |\n| Update one | `update1(row)` | Surgical corrections only |\n| Delete | `delete()` | Removing entities (cascades) |\n| Delete quick | `delete_quick()` | Internal cleanup (no cascade) |\n| Validate | `validate(rows)` | Pre-insert check |\n\nSee the [Data Manipulation Specification](../../reference/specs/data-manipulation/) for complete details.\n\n## Next Steps\n\n- [Queries](04-queries/) — Filtering, joining, and projecting data\n- [Computation](05-computation/) — Building computational pipelines"
   },
   {
    "cell_type": "code",
@@ -1648,4 +1630,4 @@
  },
  "nbformat": 4,
  "nbformat_minor": 5
-}
+}
\ No newline at end of file
diff --git a/src/tutorials/basics/04-queries.ipynb b/src/tutorials/basics/04-queries.ipynb
index e5fad587..12d4fcdf 100644
--- a/src/tutorials/basics/04-queries.ipynb
+++ b/src/tutorials/basics/04-queries.ipynb
@@ -3991,28 +3991,7 @@
    "cell_type": "markdown",
    "id": "tt5h1lmim2",
    "metadata": {},
-   "source": [
-    "### Primary Keys in Join Results\n",
-    "\n",
-    "Every query result has a valid primary key. For joins, the result's primary key depends on **functional dependencies** between the operands:\n",
-    "\n",
-    "| Condition | Result Primary Key |\n",
-    "|-----------|-------------------|\n",
-    "| `A → B` (A determines B) | PK(A) |\n",
-    "| `B → A` (B determines A) | PK(B) |\n",
-    "| Both | PK(A) |\n",
-    "| Neither | PK(A) ∪ PK(B) |\n",
-    "\n",
-    "**\"A determines B\"** means all of B's primary key attributes exist in A (as primary or secondary attributes).\n",
-    "\n",
-    "In our example:\n",
-    "- `Session` has PK: `(subject_id, session_idx)`\n",
-    "- `Trial` has PK: `(subject_id, session_idx, trial_idx)`\n",
-    "\n",
-    "Since Session's PK is a subset of Trial's PK, `Session → Trial`. The join `Session * Trial` has the same primary key as Session.\n",
-    "\n",
-    "See the [Query Algebra Specification](../reference/specs/query-algebra.md) for the complete functional dependency rules."
-   ]
+   "source": "### Primary Keys in Join Results\n\nEvery query result has a valid primary key. For joins, the result's primary key depends on **functional dependencies** between the operands:\n\n| Condition | Result Primary Key |\n|-----------|-------------------|\n| `A → B` (A determines B) | PK(A) |\n| `B → A` (B determines A) | PK(B) |\n| Both | PK(A) |\n| Neither | PK(A) ∪ PK(B) |\n\n**\"A determines B\"** means all of B's primary key attributes exist in A (as primary or secondary attributes).\n\nIn our example:\n- `Session` has PK: `(subject_id, session_idx)`\n- `Trial` has PK: `(subject_id, session_idx, trial_idx)`\n\nSince Session's PK is a subset of Trial's PK, `Session → Trial`. The join `Session * Trial` has the same primary key as Session.\n\nSee the [Query Algebra Specification](../../reference/specs/query-algebra/) for the complete functional dependency rules."
   },
   {
    "cell_type": "markdown",
@@ -6400,39 +6379,7 @@
    "cell_type": "markdown",
    "id": "cell-63",
    "metadata": {},
-   "source": [
-    "## Quick Reference\n",
-    "\n",
-    "### Operators\n",
-    "\n",
-    "| Operation | Syntax | Description |\n",
-    "|-----------|--------|-------------|\n",
-    "| Restrict | `A & cond` | Select matching rows |\n",
-    "| Anti-restrict | `A - cond` | Select non-matching rows |\n",
-    "| Top | `A & dj.Top(limit, order_by)` | Limit/order results |\n",
-    "| Project | `A.proj(...)` | Select/compute columns |\n",
-    "| Join | `A * B` | Combine tables |\n",
-    "| Extend | `A.extend(B)` | Add B's attributes, keep all A rows |\n",
-    "| Aggregate | `A.aggr(B, ...)` | Group and summarize |\n",
-    "| Union | `A + B` | Combine entity sets |\n",
-    "\n",
-    "### Fetch Methods\n",
-    "\n",
-    "| Method | Returns | Use Case |\n",
-    "|--------|---------|----------|\n",
-    "| `to_dicts()` | `list[dict]` | JSON, iteration |\n",
-    "| `to_pandas()` | `DataFrame` | Data analysis |\n",
-    "| `to_arrays()` | `np.ndarray` | Numeric computation |\n",
-    "| `to_arrays('a', 'b')` | `tuple[array, ...]` | Specific columns |\n",
-    "| `keys()` | `list[dict]` | Primary keys |\n",
-    "| `fetch1()` | `dict` | Single row |\n",
-    "\n",
-    "See the [Query Algebra Specification](../reference/specs/query-algebra.md) and [Fetch API](../reference/specs/fetch-api.md) for complete details.\n",
-    "\n",
-    "## Next Steps\n",
-    "\n",
-    "- [Computation](05-computation.ipynb) — Building computational pipelines"
-   ]
+   "source": "## Quick Reference\n\n### Operators\n\n| Operation | Syntax | Description |\n|-----------|--------|-------------|\n| Restrict | `A & cond` | Select matching rows |\n| Anti-restrict | `A - cond` | Select non-matching rows |\n| Top | `A & dj.Top(limit, order_by)` | Limit/order results |\n| Project | `A.proj(...)` | Select/compute columns |\n| Join | `A * B` | Combine tables |\n| Extend | `A.extend(B)` | Add B's attributes, keep all A rows |\n| Aggregate | `A.aggr(B, ...)` | Group and summarize |\n| Union | `A + B` | Combine entity sets |\n\n### Fetch Methods\n\n| Method | Returns | Use Case |\n|--------|---------|----------|\n| `to_dicts()` | `list[dict]` | JSON, iteration |\n| `to_pandas()` | `DataFrame` | Data analysis |\n| `to_arrays()` | `np.ndarray` | Numeric computation |\n| `to_arrays('a', 'b')` | `tuple[array, ...]` | Specific columns |\n| `keys()` | `list[dict]` | Primary keys |\n| `fetch1()` | `dict` | Single row |\n\nSee the [Query Algebra Specification](../../reference/specs/query-algebra/) and [Fetch API](../../reference/specs/fetch-api/) for complete details.\n\n## Next Steps\n\n- [Computation](05-computation/) — Building computational pipelines"
   },
   {
    "cell_type": "code",
@@ -6474,4 +6421,4 @@
  },
  "nbformat": 4,
  "nbformat_minor": 5
-}
+}
\ No newline at end of file
diff --git a/src/tutorials/basics/06-object-storage.ipynb b/src/tutorials/basics/06-object-storage.ipynb
index 59cdde0a..af05d260 100644
--- a/src/tutorials/basics/06-object-storage.ipynb
+++ b/src/tutorials/basics/06-object-storage.ipynb
@@ -1706,64 +1706,13 @@
    "cell_type": "markdown",
    "id": "wou1v0xdbyj",
    "metadata": {},
-   "source": [
-    "## Garbage Collection\n",
-    "\n",
-    "Hash-addressed storage (``, ``, ``) uses deduplication—identical content is stored once. This means deleting a row doesn't automatically delete the stored content, since other rows might reference it.\n",
-    "\n",
-    "Use garbage collection to clean up orphaned content:\n",
-    "\n",
-    "```python\n",
-    "import datajoint as dj\n",
-    "\n",
-    "# Preview what would be deleted (dry run)\n",
-    "stats = dj.gc.collect(dry_run=True)\n",
-    "print(f\"Orphaned items: {stats['orphaned']}\")\n",
-    "print(f\"Space to reclaim: {stats['orphaned_bytes'] / 1e6:.1f} MB\")\n",
-    "\n",
-    "# Actually delete orphaned content\n",
-    "stats = dj.gc.collect()\n",
-    "print(f\"Deleted: {stats['deleted']} items\")\n",
-    "```\n",
-    "\n",
-    "### When to Run Garbage Collection\n",
-    "\n",
-    "- **After bulk deletions** — Clean up storage after removing many rows\n",
-    "- **Periodically** — Schedule weekly/monthly cleanup jobs\n",
-    "- **Before archiving** — Reclaim space before backups\n",
-    "\n",
-    "### Key Points\n",
-    "\n",
-    "- GC only affects hash-addressed types (``, ``, ``)\n",
-    "- Schema-addressed types (``, ``) are deleted with their rows\n",
-    "- Always use `dry_run=True` first to preview changes\n",
-    "- GC is safe—it only deletes content with zero references\n",
-    "\n",
-    "See [Clean Up Storage](../how-to/garbage-collection.md) for detailed usage."
-   ]
+   "source": "## Garbage Collection\n\nHash-addressed storage (``, ``, ``) uses deduplication—identical content is stored once. This means deleting a row doesn't automatically delete the stored content, since other rows might reference it.\n\nUse garbage collection to clean up orphaned content:\n\n```python\nimport datajoint as dj\n\n# Preview what would be deleted (dry run)\nstats = dj.gc.collect(dry_run=True)\nprint(f\"Orphaned items: {stats['orphaned']}\")\nprint(f\"Space to reclaim: {stats['orphaned_bytes'] / 1e6:.1f} MB\")\n\n# Actually delete orphaned content\nstats = dj.gc.collect()\nprint(f\"Deleted: {stats['deleted']} items\")\n```\n\n### When to Run Garbage Collection\n\n- **After bulk deletions** — Clean up storage after removing many rows\n- **Periodically** — Schedule weekly/monthly cleanup jobs\n- **Before archiving** — Reclaim space before backups\n\n### Key Points\n\n- GC only affects hash-addressed types (``, ``, ``)\n- Schema-addressed types (``, ``) are deleted with their rows\n- Always use `dry_run=True` first to preview changes\n- GC is safe—it only deletes content with zero references\n\nSee [Clean Up Storage](../../how-to/garbage-collection/) for detailed usage."
   },
   {
    "cell_type": "markdown",
    "id": "cell-32",
    "metadata": {},
-   "source": [
-    "## Quick Reference\n",
-    "\n",
-    "| Pattern | Use Case |\n",
-    "|---------|----------|\n",
-    "| `` | Small Python objects |\n",
-    "| `` | Large arrays with deduplication |\n",
-    "| `` | Large arrays in specific store |\n",
-    "| `` | Files preserving names |\n",
-    "| `` | Schema-addressed data (Zarr, HDF5) |\n",
-    "\n",
-    "## Next Steps\n",
-    "\n",
-    "- [Configure Object Storage](../how-to/configure-storage.md) — Set up S3, MinIO, or filesystem stores\n",
-    "- [Clean Up Storage](../how-to/garbage-collection.md) — Garbage collection for hash-addressed storage\n",
-    "- [Custom Codecs](advanced/custom-codecs.ipynb) — Define domain-specific types\n",
-    "- [Manage Large Data](../how-to/manage-large-data.md) — Performance optimization"
-   ]
+   "source": "## Quick Reference\n\n| Pattern | Use Case |\n|---------|----------|\n| `` | Small Python objects |\n| `` | Large arrays with deduplication |\n| `` | Large arrays in specific store |\n| `` | Files preserving names |\n| `` | Schema-addressed data (Zarr, HDF5) |\n\n## Next Steps\n\n- [Configure Object Storage](../../how-to/configure-storage/) — Set up S3, MinIO, or filesystem stores\n- [Clean Up Storage](../../how-to/garbage-collection/) — Garbage collection for hash-addressed storage\n- [Custom Codecs](../../advanced/custom-codecs/) — Define domain-specific types\n- [Manage Large Data](../../how-to/manage-large-data/) — Performance optimization"
   },
   {
    "cell_type": "code",
@@ -1807,4 +1756,4 @@
  },
  "nbformat": 4,
  "nbformat_minor": 5
-}
+}
\ No newline at end of file
diff --git a/src/tutorials/examples/blob-detection.ipynb b/src/tutorials/examples/blob-detection.ipynb
index 5c9dd268..9f5b06d1 100644
--- a/src/tutorials/examples/blob-detection.ipynb
+++ b/src/tutorials/examples/blob-detection.ipynb
@@ -1589,27 +1589,7 @@
   {
    "cell_type": "markdown",
    "metadata": {},
-   "source": [
-    "## Key Concepts Recap\n",
-    "\n",
-    "| Concept | What It Does | Example |\n",
-    "|---------|--------------|--------|\n",
-    "| **Schema** | Groups related tables | `schema = dj.Schema('tutorial_blobs')` |\n",
-    "| **Manual Table** | Stores user-entered data | `Image`, `SelectedDetection` |\n",
-    "| **Lookup Table** | Stores reference/config data | `DetectionParams` |\n",
-    "| **Computed Table** | Derives data automatically | `Detection` |\n",
-    "| **Part Table** | Stores detailed results with master | `Detection.Blob` |\n",
-    "| **Foreign Key** (`->`) | Creates dependency | `-> Image` |\n",
-    "| **`populate()`** | Runs pending computations | `Detection.populate()` |\n",
-    "| **Restriction** (`&`) | Filters rows | `Detection & 'num_blobs < 300'` |\n",
-    "| **Join** (`*`) | Combines tables | `Image * Detection` |\n",
-    "\n",
-    "## Next Steps\n",
-    "\n",
-    "- [Schema Design](02-schema-design.ipynb) — Learn table types and relationships in depth\n",
-    "- [Queries](04-queries.ipynb) — Master DataJoint's query operators\n",
-    "- [Computation](05-computation.ipynb) — Build complex computational workflows"
-   ]
+   "source": "## Key Concepts Recap\n\n| Concept | What It Does | Example |\n|---------|--------------|--------|\n| **Schema** | Groups related tables | `schema = dj.Schema('tutorial_blobs')` |\n| **Manual Table** | Stores user-entered data | `Image`, `SelectedDetection` |\n| **Lookup Table** | Stores reference/config data | `DetectionParams` |\n| **Computed Table** | Derives data automatically | `Detection` |\n| **Part Table** | Stores detailed results with master | `Detection.Blob` |\n| **Foreign Key** (`->`) | Creates dependency | `-> Image` |\n| **`populate()`** | Runs pending computations | `Detection.populate()` |\n| **Restriction** (`&`) | Filters rows | `Detection & 'num_blobs < 300'` |\n| **Join** (`*`) | Combines tables | `Image * Detection` |\n\n## Next Steps\n\n- [Schema Design](../basics/02-schema-design/) — Learn table types and relationships in depth\n- [Queries](../basics/04-queries/) — Master DataJoint's query operators\n- [Computation](../basics/05-computation/) — Build complex computational workflows"
   },
   {
    "cell_type": "code",
@@ -1650,4 +1630,4 @@
  },
  "nbformat": 4,
  "nbformat_minor": 4
-}
+}
\ No newline at end of file
diff --git a/src/tutorials/examples/hotel-reservations.ipynb b/src/tutorials/examples/hotel-reservations.ipynb
index d789731d..2f054590 100644
--- a/src/tutorials/examples/hotel-reservations.ipynb
+++ b/src/tutorials/examples/hotel-reservations.ipynb
@@ -1913,24 +1913,7 @@
    "cell_type": "markdown",
    "id": "cell-summary-md",
    "metadata": {},
-   "source": [
-    "## Key Concepts\n",
-    "\n",
-    "| Concept | How It's Used |\n",
-    "|---------|---------------|\n",
-    "| **Workflow Dependencies** | `CheckOut -> CheckIn -> Reservation -> RoomAvailable` |\n",
-    "| **Unique Constraints** | One reservation per room/night (primary key) |\n",
-    "| **Referential Integrity** | Can't reserve unavailable room, can't check in without reservation |\n",
-    "| **Error Translation** | Database exceptions → domain-specific errors |\n",
-    "\n",
-    "The schema **is** the business logic. Application code just translates errors.\n",
-    "\n",
-    "## Next Steps\n",
-    "\n",
-    "- [University Database](university.ipynb) — Academic records with many-to-many relationships\n",
-    "- [Languages & Proficiency](languages.ipynb) — International standards and lookup tables\n",
-    "- [Data Entry](../basics/03-data-entry.ipynb) — Insert patterns and transactions"
-   ]
+   "source": "## Key Concepts\n\n| Concept | How It's Used |\n|---------|---------------|\n| **Workflow Dependencies** | `CheckOut -> CheckIn -> Reservation -> RoomAvailable` |\n| **Unique Constraints** | One reservation per room/night (primary key) |\n| **Referential Integrity** | Can't reserve unavailable room, can't check in without reservation |\n| **Error Translation** | Database exceptions → domain-specific errors |\n\nThe schema **is** the business logic. Application code just translates errors.\n\n## Next Steps\n\n- [University Database](university/) — Academic records with many-to-many relationships\n- [Languages & Proficiency](languages/) — International standards and lookup tables\n- [Data Entry](../basics/03-data-entry/) — Insert patterns and transactions"
   },
   {
    "cell_type": "code",
@@ -1972,4 +1955,4 @@
  },
  "nbformat": 4,
  "nbformat_minor": 5
-}
+}
\ No newline at end of file
diff --git a/src/tutorials/examples/languages.ipynb b/src/tutorials/examples/languages.ipynb
index 6dfedfe7..0db7d019 100644
--- a/src/tutorials/examples/languages.ipynb
+++ b/src/tutorials/examples/languages.ipynb
@@ -2267,29 +2267,7 @@
    "cell_type": "markdown",
    "id": "cell-summary-md",
    "metadata": {},
-   "source": [
-    "## Key Concepts\n",
-    "\n",
-    "| Pattern | Implementation |\n",
-    "|---------|----------------|\n",
-    "| **Many-to-many** | `Proficiency` links `Person` and `Language` |\n",
-    "| **Lookup tables** | `Language` and `CEFRLevel` with `contents` |\n",
-    "| **Association data** | `cefr_level` stored in the association table |\n",
-    "| **Standards** | ISO 639-1 codes, CEFR levels |\n",
-    "\n",
-    "### Benefits of Lookup Tables\n",
-    "\n",
-    "1. **Data consistency** — Only valid codes can be used\n",
-    "2. **Rich metadata** — Full names, descriptions stored once\n",
-    "3. **Easy updates** — Change \"Español\" to \"Spanish\" in one place\n",
-    "4. **Self-documenting** — `Language()` shows all valid options\n",
-    "\n",
-    "## Next Steps\n",
-    "\n",
-    "- [University Database](university.ipynb) — Academic records\n",
-    "- [Hotel Reservations](hotel-reservations.ipynb) — Workflow dependencies\n",
-    "- [Queries Tutorial](../basics/04-queries.ipynb) — Query operators in depth"
-   ]
+   "source": "## Key Concepts\n\n| Pattern | Implementation |\n|---------|----------------|\n| **Many-to-many** | `Proficiency` links `Person` and `Language` |\n| **Lookup tables** | `Language` and `CEFRLevel` with `contents` |\n| **Association data** | `cefr_level` stored in the association table |\n| **Standards** | ISO 639-1 codes, CEFR levels |\n\n### Benefits of Lookup Tables\n\n1. **Data consistency** — Only valid codes can be used\n2. **Rich metadata** — Full names, descriptions stored once\n3. **Easy updates** — Change \"Español\" to \"Spanish\" in one place\n4. **Self-documenting** — `Language()` shows all valid options\n\n## Next Steps\n\n- [University Database](university/) — Academic records\n- [Hotel Reservations](hotel-reservations/) — Workflow dependencies\n- [Queries Tutorial](../basics/04-queries/) — Query operators in depth"
   },
   {
    "cell_type": "code",
@@ -2331,4 +2309,4 @@
  },
  "nbformat": 4,
  "nbformat_minor": 5
-}
+}
\ No newline at end of file