From 7eff7b656a8a2b55aa6c827760786b7545a0a381 Mon Sep 17 00:00:00 2001
From: mfw78 <mfw78@nullis.xyz>
Date: Mon, 3 Mar 2025 09:56:38 +0000
Subject: [PATCH 1/5] feat(swip-26): strictly typed chunk system

---
 SWIPs/swip-26.md | 173 +++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 173 insertions(+)
 create mode 100644 SWIPs/swip-26.md
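
A minimal sketch, not part of this patch, of the fixed-length header lookup described in the "Fixed-Length Type-Specific Headers" section of the document added below. The sizes come from that section's example list; the names `COMMON_HEADER_SIZE`, `TYPE_SPECIFIC_HEADER_SIZES`, and `expected_header_size` are hypothetical and chosen only for illustration.

```python
from typing import Dict, Tuple

# Common header: type (1 byte) + version (1 byte), as listed in the SWIP.
COMMON_HEADER_SIZE = 2

# (type, version) -> length in bytes of the type-specific header.
# Hypothetical registry; only the two types named in the SWIP are filled in.
TYPE_SPECIFIC_HEADER_SIZES: Dict[Tuple[int, int], int] = {
    (0x00, 0): 8,        # CAC: span
    (0x01, 0): 32 + 65,  # SOC: ID + signature
}


def expected_header_size(chunk_type: int, version: int) -> int:
    """Return the full header size (common + type-specific), or fail fast."""
    try:
        specific = TYPE_SPECIFIC_HEADER_SIZES[(chunk_type, version)]
    except KeyError:
        raise ValueError(
            f"unknown chunk type/version: 0x{chunk_type:02x}/{version}"
        ) from None
    return COMMON_HEADER_SIZE + specific


if __name__ == "__main__":
    print(expected_header_size(0x00, 0))  # CAC header size
    print(expected_header_size(0x01, 0))  # SOC header size
```

Keying the registry on both type and version means a later version of a type can change its header length without touching the lookup logic; reserved types (0x02-0xFF) are rejected until they are registered.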

diff --git a/SWIPs/swip-26.md b/SWIPs/swip-26.md
new file mode 100644
index 0000000..3e10dd8
--- /dev/null
+++ b/SWIPs/swip-26.md
@@ -0,0 +1,173 @@
+---
+swip: 26
+title: Standardised Chunk Type Framework
+status: Draft
+type: Standards Track
+category: Core
+author: mfw78 (@mfw78)
+created: 2025-03-03
+---
+
+## Simple Summary
+This SWIP introduces a standardised framework for defining chunk types in Swarm, improving security and interoperability through consistent type identification and validation.
+
+## Abstract
+This SWIP proposes a standardised framework for defining and processing chunk types in Swarm. By creating a formal type system for chunks, including content-addressed chunks (CAC) and single-owner chunks (SOC), we improve security, interoperability, and maintainability across the Swarm ecosystem. The proposal defines a structured approach to chunk identification, versioning, and validation without modifying the wire protocol. Key innovations include fixed-length type-specific headers, deterministic address calculation, and formalised validation rules.
+
+## Motivation
+Swarm's storage layer is built around chunks as the fundamental unit of data. Currently, the system supports multiple chunk types, but lacks a standardised type system. This creates several issues:
+
+1. **Ambiguous Processing**: Without explicit type information, chunk processing depends on implicit detection methods, leading to potential security vulnerabilities.
+
+2. **Limited Extensibility**: Adding new chunk types requires changes to core validation logic, making it difficult to evolve the system.
+
+3. **Inconsistent Validation**: Chunk validation logic is spread across multiple components, leading to potential inconsistencies.
+
+4. **Type-Safety Gaps**: Without formal type definitions, runtime type errors can occur when processing chunks.
+
+A standardised chunk type framework would address these issues by providing a consistent, extensible system for defining, identifying, and validating different chunk types.
+
+## Specification
+
+### Core Concepts
+
+#### 1. Chunk Structure
+
+A standardised chunk shall conceptually consist of:
+
+1. **Header**: Metadata describing the chunk and its contents
+   - Common Header: Information common to all chunk types (type, version)
+   - Type-Specific Header: Additional fields specific to the chunk type
+2. **Payload**: The actual chunk data
+
+The chunk's address is not part of the chunk itself but is deterministically derived from the chunk's contents based on its type.
+
+#### 2. Common Chunk Header
+
+The common chunk header shall contain:
+
+1. **Type**: The chunk type identifier (1 byte)
+2. **Version**: The chunk format version (1 byte)
+
+| Type ID | Name | Description |
+|---------|------|-------------|
+| 0x00    | CAC  | Content-addressed chunk |
+| 0x01    | SOC  | Single-owner chunk |
+| 0x02-0xFF | Reserved | Reserved for future chunk types |
+
+#### 3. Fixed-Length Type-Specific Headers
+
+All type-specific headers MUST be of fixed length for their respective chunk types. This ensures that, at the wire level, the maximum size of a chunk is always known from the first 2 bytes (type and version).
+
+Example header sizes:
+- CAC header: 10 bytes (2 bytes common header + 8 bytes span)
+- SOC header: 99 bytes (2 bytes common header + 32 bytes ID + 65 bytes signature)
+
+### Address Calculation
+
+The address of a chunk shall be deterministically calculated based on its type, version, and contents. We define the general address calculation function as:
+
+$$\text{Address} = f_{\text{type}}(\text{header}, \text{payload})$$
+
+Where $f_{\text{type}}$ is the type-specific address calculation function.
+
+#### Generic Address Derivation Function
+
+For any chunk type, the address derivation function can be formally defined as:
+
+$$f_{\text{type}}(\text{header}, \text{payload}) = \mathcal{H}(g_{\text{type}}(\text{header}, \text{payload}))$$
+
+Where:
+- $\mathcal{H}$ is a cryptographic hash function (i.e. `keccak256`)
+- $g_{\text{type}}$ is a type-specific data preparation function
+
+Different chunk types will implement specific derivation functions based on their requirements.
+
+### Chunk Type Specifications
+
+The Swarm Specifications shall define the standardised format for each chunk type. Adding a new chunk type to the specifications requires:
+
+1. Assignment of a unique type identifier
+2. Definition of fixed-length type-specific header structure
+3. Definition of payload structure
+4. Specification of address calculation function $f_{\text{type}}$
+5. Specification of validation requirements
+
+These specifications ensure that all implementations handle chunks consistently and securely across the Swarm ecosystem.
+
+### Type Processing
+
+The chunk processing logic shall:
+
+1. Receive the chunk type and version information from the wire protocol
+2. Use the type and version to determine the expected fixed-length type-specific header size as defined in the Swarm Specifications
+3. Verify that the received header matches the expected size for the given type
+4. Fail fast if the header is malformed or incomplete
+5. Extract the type-specific header fields
+6. Calculate the chunk address using the type-specific address calculation function
+7. Apply type-specific validation rules
+8. Process the payload according to type-specific structure
+
+This approach allows for early validation of chunk integrity based on protocol-level type information, reducing parsing errors and simplifying processing logic.
+
+## Rationale
+
+The proposed standardised chunk type framework addresses several key issues in the current implementation:
+
+1. **Type Ambiguity**: By explicitly encoding chunk types in the header, we eliminate ambiguity in chunk processing, enhancing security and reliability.
+
+2. **Extensibility**: The formal specifications allow for future chunk types to be added in a standardised way without modifying core validation logic.
+
+3. **Validation Consistency**: Centralising validation rules in the specifications ensures consistent enforcement across components and implementations.
+
+4. **Memory Efficiency**: Fixed-length headers enable predictable memory allocation and reduce fragmentation.
+
+5. **Parsing Efficiency**: Type-specific parsing paths reduce the need for speculative parsing, improving performance.
+
+The design choices prioritise:
+- Security through explicit typing and validation
+- Efficiency through predictable memory allocation and fail-fast validation
+- Extensibility through the standardised specification system
+- Backward compatibility with existing chunk types
+
+## Backwards Compatibility
+
+This proposal maintains backward compatibility by:
+
+1. Preserving existing chunk address calculation methods for current chunk types
+2. Supporting current chunk formats with version 0 of each type
+3. Allowing for gradual adoption of the type system
+4. Providing a conversion layer between legacy and new chunk formats
+
+## Test Cases
+
+Test cases should include:
+
+1. **Header Validation**: Tests that verify correct parsing of type-specific headers for different chunk types
+2. **Address Calculation**: Tests that confirm proper address derivation for each chunk type
+3. **Size Verification**: Tests that ensure fixed-length headers meet their size requirements
+4. **Malformed Input**: Tests that verify proper rejection of malformed chunks
+5. **Version Handling**: Tests for correct processing of different versions of the same chunk type
+
+## Implementation
+
+Implementation will proceed in phases:
+
+1. Formalise the chunk type specifications for CAC and SOC in the Swarm Specifications
+2. Implement type-aware chunk processing in the node software
+3. Add validation framework for existing chunk types based on the specifications
+4. Develop compatibility layer for processing legacy chunks
+
+## Security Considerations
+
+The standardised chunk type framework improves security through:
+
+1. **Explicit Type Checking**: Reduces the risk of type confusion attacks
+2. **Fixed-Length Headers**: Lets parsers bounds-check header lengths up front, reducing the risk of buffer overflow attacks
+3. **Early Validation**: Enables fail-fast behaviour for malformed chunks
+4. **Deterministic Addressing**: Ensures consistent and secure chunk addressing
+5. **Versioned Security**: Allows security improvements via version updates
+
+## Copyright Waiver
+
+Copyright and related rights waived via [CC0](https://creativecommons.org/publicdomain/zero/1.0/).

From d427fc67ef70e6939264321f508a452c8532109f Mon Sep 17 00:00:00 2001
From: mfw78 <mfw78@nullis.xyz>
Date: Mon, 3 Mar 2025 10:09:45 +0000
Subject: [PATCH 2/5] chore(swip-26): add flowchart

---
 SWIPs/swip-26.md | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)

diff --git a/SWIPs/swip-26.md b/SWIPs/swip-26.md
index 3e10dd8..b4ca4d2 100644
--- a/SWIPs/swip-26.md
+++ b/SWIPs/swip-26.md
@@ -110,6 +110,24 @@ The chunk processing logic shall:
 
 This approach allows for early validation of chunk integrity based on protocol-level type information, reducing parsing errors and simplifying processing logic.
 
+#### Flowchart
+
+The flowchart below illustrates the processing steps for a chunk:
+
+```mermaid
+flowchart TD
+    Start[Receive chunk via wire protocol] --> A[Protobuf decodes chunk type, version, header, and payload]
+    A --> B{Header size matches expected size for type?}
+    B -->|No| C[Fail: Invalid header size]
+    B -->|Yes| D[Extract type-specific header fields]
+    D --> E[Calculate chunk address using type-specific function]
+    E --> F{Validate chunk content}
+    F -->|Invalid| G[Fail: Invalid chunk content]
+    F -->|Valid| H[Process payload according to type-specific structure]
+    H --> I[Pass processed chunk to appropriate protocol handler]
+    I --> End[Protocol-specific processing]
+```
+
 ## Rationale
 
 The proposed standardised chunk type framework addresses several key issues in the current implementation:

From 945322294449e1878de2168e6ec1901d198adb60 Mon Sep 17 00:00:00 2001
From: mfw78 <53399572+mfw78@users.noreply.github.com>
Date: Mon, 5 May 2025 07:16:31 +0000
Subject: [PATCH 3/5] Apply suggestions from code review

Co-authored-by: significance <daniel.nickless@gmail.com>
---
 SWIPs/swip-26.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/SWIPs/swip-26.md b/SWIPs/swip-26.md
index b4ca4d2..ffc3301 100644
--- a/SWIPs/swip-26.md
+++ b/SWIPs/swip-26.md
@@ -12,10 +12,10 @@ created: 2025-03-03
 This SWIP introduces a standardised framework for defining chunk types in Swarm, improving security and interoperability through consistent type identification and validation.
 
 ## Abstract
-This SWIP proposes a standardised framework for defining and processing chunk types in Swarm. By creating a formal type system for chunks, including content-addressed chunks (CAC) and single-owner chunks (SOC), we improve security, interoperability, and maintainability across the Swarm ecosystem. The proposal defines a structured approach to chunk identification, versioning, and validation without modifying the wire protocol. Key innovations include fixed-length type-specific headers, deterministic address calculation, and formalised validation rules.
+This SWIP proposes a standardised framework for defining and processing chunk types in Swarm. By creating a formal type system for chunks, including content-addressed chunks (CAC) and single-owner chunks (SOC), we improve security, interoperability, and maintainability across the Swarm ecosystem. The proposal defines a structured approach to chunk identification, versioning, and validation. The key innovation is the formal definition of fixed-length type-specific headers to be delivered alongside chunks and formally documenting address determination and payload validation rules.
 
 ## Motivation
-Swarm's storage layer is built around chunks as the fundamental unit of data. Currently, the system supports multiple chunk types, but lacks a standardised type system. This creates several issues:
+Swarm's storage layer is built around chunks as the fundamental unit of data. Currently, the system supports multiple chunk types, but lack standardised headers. This creates several issues:
 
 1. **Ambiguous Processing**: Without explicit type information, chunk processing depends on implicit detection methods, leading to potential security vulnerabilities.
 

From 234ffb9266963646b5c85bfe75228e670391e59e Mon Sep 17 00:00:00 2001
From: mfw78 <mfw78@nxm.rs>
Date: Tue, 20 Jan 2026 22:05:10 +0000
Subject: [PATCH 4/5] feat(swip-26): add wire protocol representation for typed
 chunks

- Define Chunk protobuf message with type, version, and payload fields
- Specify that all protocol messages referencing chunks MUST use the
  Chunk message type instead of raw bytes
- Add Delivery message example for pushsync/pullsync integration
- Include migration path for backward compatibility
- Fix minor grammar and style issues
---
 SWIPs/swip-26.md | 53 ++++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 51 insertions(+), 2 deletions(-)
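
A minimal sketch, not part of this patch, of how a receiver might split the `payload` field of the `Chunk` message into its type-specific header and chunk data, following the CAC and SOC examples in the diff below. Field lengths are taken from those examples; `HEADER_LAYOUTS` and `split_payload` are hypothetical names, and header fields are returned as raw bytes because their encoding is not specified here.

```python
from typing import Dict, Tuple

CAC = 0x00
SOC = 0x01

# (type, version) -> ordered (field name, length in bytes) pairs that make up
# the type-specific header at the start of Chunk.payload.
HEADER_LAYOUTS: Dict[Tuple[int, int], Tuple[Tuple[str, int], ...]] = {
    (CAC, 0): (("span", 8),),
    (SOC, 0): (("id", 32), ("signature", 65)),
}


def split_payload(
    chunk_type: int, version: int, payload: bytes
) -> Tuple[Dict[str, bytes], bytes]:
    """Split payload into (type-specific header fields, chunk data)."""
    layout = HEADER_LAYOUTS.get((chunk_type, version))
    if layout is None:
        raise ValueError(f"unknown chunk type/version: {chunk_type}/{version}")

    header_len = sum(length for _, length in layout)
    if len(payload) < header_len:
        # Fail fast: the payload cannot even hold the fixed-length header.
        raise ValueError(
            f"payload too short: got {len(payload)} bytes, need at least {header_len}"
        )

    fields: Dict[str, bytes] = {}
    offset = 0
    for name, length in layout:
        fields[name] = payload[offset:offset + length]
        offset += length
    return fields, payload[offset:]


if __name__ == "__main__":
    header, data = split_payload(SOC, 0, bytes(32) + bytes(65) + b"wrapped")
    print(sorted(header), data)
```

Because the header length is known from `type` and `version` alone, the length check can run before any field is interpreted.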

diff --git a/SWIPs/swip-26.md b/SWIPs/swip-26.md
index ffc3301..fc5b19a 100644
--- a/SWIPs/swip-26.md
+++ b/SWIPs/swip-26.md
@@ -15,7 +15,7 @@ This SWIP introduces a standardised framework for defining chunk types in Swarm,
 This SWIP proposes a standardised framework for defining and processing chunk types in Swarm. By creating a formal type system for chunks, including content-addressed chunks (CAC) and single-owner chunks (SOC), we improve security, interoperability, and maintainability across the Swarm ecosystem. The proposal defines a structured approach to chunk identification, versioning, and validation. The key innovation is the formal definition of fixed-length type-specific headers to be delivered alongside chunks and formally documenting address determination and payload validation rules.
 
 ## Motivation
-Swarm's storage layer is built around chunks as the fundamental unit of data. Currently, the system supports multiple chunk types, but lack standardised headers. This creates several issues:
+Swarm's storage layer is built around chunks as the fundamental unit of data. Currently, the system supports multiple chunk types, but lacks standardised headers. This creates several issues:
 
 1. **Ambiguous Processing**: Without explicit type information, chunk processing depends on implicit detection methods, leading to potential security vulnerabilities.
 
@@ -78,7 +78,7 @@ For any chunk type, the address derivation function can be formally defined as:
 $$f_{\text{type}}(\text{header}, \text{payload}) = \mathcal{H}(g_{\text{type}}(\text{header}, \text{payload}))$$
 
 Where:
-- $\mathcal{H}$ is a cryptographic hash function (i.e. `keccak256`)
+- $\mathcal{H}$ is a cryptographic hash function (e.g. `keccak256`)
 - $g_{\text{type}}$ is a type-specific data preparation function
 
 Different chunk types will implement specific derivation functions based on their requirements.
@@ -128,6 +128,55 @@ flowchart TD
     I --> End[Protocol-specific processing]
 ```
 
+### Wire Protocol Representation
+
+To enable typed chunks at the wire level, the following Protocol Buffer definitions shall be used:
+
+#### Chunk Message
+
+```protobuf
+message Chunk {
+  uint32 type = 1;    // Chunk type identifier (see type table)
+  uint32 version = 2; // Chunk format version
+  bytes payload = 3;  // Type-specific header + chunk data
+}
+```
+
+The `payload` field contains the concatenation of the type-specific header and the chunk data. Based on the `type` and `version` fields, the receiver can determine the fixed-length type-specific header size and extract it from the beginning of the payload.
+
+For example:
+- **CAC (type=0x00, version=0)**: `payload` = span (8 bytes) || BMT chunk data
+- **SOC (type=0x01, version=0)**: `payload` = ID (32 bytes) || signature (65 bytes) || wrapped chunk data
+
+#### Integration with Existing Protocols
+
+All protocol buffer definitions that reference chunk data MUST use the typed `Chunk` message instead of raw `bytes`. This ensures consistent type information is available at the wire level across all protocols.
+
+For example, the `Delivery` message used by pushsync and pullsync protocols shall be updated:
+
+```protobuf
+message Delivery {
+  bytes address = 1;
+  Chunk data = 2;
+  bytes stamp = 3;
+}
+```
+
+This pattern applies universally: any protocol message that transmits chunk content MUST embed the `Chunk` message type, ensuring:
+
+1. Chunk type and version are always available at the wire level
+2. Recipients can determine the expected type-specific header size
+3. Address calculation and validation can be performed using type-specific rules
+4. Consistent handling across all protocols that deal with chunks
+
+#### Migration Path
+
+During the transition period, implementations should:
+
+1. Accept both legacy `Delivery` messages (with raw bytes) and new typed `Delivery` messages
+2. When receiving legacy messages, attempt heuristic type detection for backward compatibility
+3. When sending, prefer the new typed format if the peer supports it
+
 ## Rationale
 
 The proposed standardised chunk type framework addresses several key issues in the current implementation:

From a71c816f1497f432eba29030495d0c930db14249 Mon Sep 17 00:00:00 2001
From: mfw78 <mfw78@nxm.rs>
Date: Tue, 20 Jan 2026 22:08:26 +0000
Subject: [PATCH 5/5] chore(swip-26): update migration path to breaking change

- Remove backward compatibility with legacy messages
- Specify lazy determination and population of type information
  for existing localstore data
---
 SWIPs/swip-26.md | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)
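
A minimal sketch, not part of this patch, of the lazy population of type information described in the updated "Migration Path" below. The in-memory `LocalStore`, the `StoredChunk` record, and the `detect_type` callback are all hypothetical; the heuristic itself is left to the caller because the SWIP does not define one.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class StoredChunk:
    data: bytes
    chunk_type: Optional[int] = None  # None marks a legacy entry without type info


class LocalStore:
    """Toy in-memory store that fills in chunk types on first access."""

    def __init__(self, detect_type: Callable[[bytes], int]) -> None:
        self._detect_type = detect_type  # heuristic supplied by the caller
        self._chunks: Dict[bytes, StoredChunk] = {}

    def put_legacy(self, address: bytes, data: bytes) -> None:
        """Store a chunk as it existed before typed chunks (no type recorded)."""
        self._chunks[address] = StoredChunk(data=data)

    def get(self, address: bytes) -> StoredChunk:
        entry = self._chunks[address]
        if entry.chunk_type is None:
            # Determine the type only now, and persist it so the heuristic
            # runs at most once per chunk.
            entry.chunk_type = self._detect_type(entry.data)
        return entry


if __name__ == "__main__":
    store = LocalStore(detect_type=lambda data: 0x00)  # placeholder heuristic
    store.put_legacy(b"addr", b"legacy bytes")
    print(store.get(b"addr").chunk_type)
```

Only chunks that are actually read pay the detection cost, which is the point of avoiding a large upfront migration.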

diff --git a/SWIPs/swip-26.md b/SWIPs/swip-26.md
index fc5b19a..57c2e20 100644
--- a/SWIPs/swip-26.md
+++ b/SWIPs/swip-26.md
@@ -171,11 +171,13 @@ This pattern applies universally: any protocol message that transmits chunk cont
 
 #### Migration Path
 
-During the transition period, implementations should:
+This specification represents a breaking change; implementations MUST use the typed `Chunk` message for all wire protocol communications.
 
-1. Accept both legacy `Delivery` messages (with raw bytes) and new typed `Delivery` messages
-2. When receiving legacy messages, attempt heuristic type detection for backward compatibility
-3. When sending, prefer the new typed format if the peer supports it
+For existing data in the localstore that lacks type information, implementations should:
+
+1. Determine the chunk type heuristically upon access (e.g. by examining the chunk structure)
+2. Lazily populate the type information in the localstore when chunks are retrieved
+3. Avoid a large upfront migration by only updating type metadata as chunks are accessed
 
 ## Rationale