Skip to content

Conversation

@mfw78
Copy link

@mfw78 mfw78 commented Mar 3, 2025

This PR introduces a draft SWIP for a standardised framework for defining chunk types in Swarm, improving security and interoperability through consistent type identification and validation.

Copy link
Member

@nugaon nugaon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The chunk address always available before chunk type assertion which kind of implies its type; if it cannot be a SOC it is a CAC based on its address.
Though the proposal has valid points and straightforward logic to solve chunk types, I don't see the urge to incorporate this new type indicator for chunks because there is no new chunk type that needs it.


1. Preserving existing chunk address calculation methods for current chunk types
2. Supporting current chunk formats with version 0 of each type
3. Allowing for gradual adoption of the type system
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for me, first, it seems like it needs a breaking change in the base protocol to handle type headers.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A localstore migration can be used in order to both assign type, and version numbers to chunks contained within the localstore, assigning version 0 to the respective chunk types.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think it is fine for a client to opine methodology around when to determine chunk types in the first instance

as far as the protocol is concerned, i believe protobuf fields are optional, but sending headers should become mandatory once the swarm has had time to adjust. in this way the change could be made with less disruption. let's pin down our approach. thank you for raising this @nugaon

@mfw78
Copy link
Author

mfw78 commented Mar 5, 2025

The chunk address always available before chunk type assertion which kind of implies its type; if it cannot be a SOC it is a CAC based on its address. Though the proposal has valid points and straightforward logic to solve chunk types, I don't see the urge to incorporate this new type indicator for chunks because there is no new chunk type that needs it.

While there is no new chunk type planned (yet), this is work that would have to be achieved prior to this. Examples that come to mind would be system chunks (whereby postage snapshots can be distributed with some other proof mechanism attached instead of stamps). In the meantime, the system benefits from clear typing and well-defined binary marshaling of chunks.

@zelig zelig changed the title feat(swip-26): strictly typed chunk system add swip-26: strictly typed chunk system Apr 23, 2025
@significance
Copy link
Member

suggested some changes, otherwise happy to merge. more SWIPs should follow which define the two current chunk types and use these as a stage to establish prescriptive examples of a formal chunktype type structure with which future chunktypes shall be defined

Co-authored-by: significance <daniel.nickless@gmail.com>
@significance
Copy link
Member

@mfw78 have read through and i think we are gtg in spirit here

however, it would be good to a little more specific in the protocol specific implementation. my memory is we agreed to send the chunk type data specified in the protobuf messages. would be very much grateful if you are up for making a suggestion on how you would like to do so. if you are able, i will canvas opinion from the Bee team and once these details are pinned down let's merge and proceed to implementation

- Define Chunk protobuf message with type, version, and payload fields
- Specify that all protocol messages referencing chunks MUST use the
  Chunk message type instead of raw bytes
- Add Delivery message example for pushsync/pullsync integration
- Include migration path for backward compatibility
- Fix minor grammar and style issues
@mfw78
Copy link
Author

mfw78 commented Jan 20, 2026

Added wire protocol representation section based on recent discussion.

Key additions:

  • Defined Chunk protobuf message with type, version, and payload fields
  • Specified that all protocol buffer definitions referencing chunk data MUST use the typed Chunk message instead of raw bytes
  • Included Delivery message example showing integration with pushsync/pullsync
  • Added migration path for backward compatibility with legacy messages

- Remove backward compatibility with legacy messages
- Specify lazy determination and population of type information
  for existing localstore data
@significance
Copy link
Member

@mfw78 🙇


```protobuf
message Chunk {
uint32 type = 1; // Chunk type identifier (see type table)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a note about this - not sure if having a uint32 is: a. too permissive, b. too big.
since i don't see the list of chunk types and their versions to be too much in flux, consider using protobuf enums here (one for type, one for version), and we will bear the brunt of maintaining that in long run. alternatively, having bytes here with len = 1 would narrow down what is needed here. honestly i think enums would work best here. also, consider having type and version to be represented in one enum - this would probably allow to simplify downstream business logic. compare:

if type == 1 {
 if version == 0 {...}
 if version == 1 {...}
}

with:

switch type {
 case swarm.CacV1:
  ...
 case swarm.CacV2:
  ...
}

Copy link
Author

@mfw78 mfw78 Jan 21, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd certainly be a fan of using enums, and would prefer this, but left this method out as there seems to have been an intentional non-committal towards firm typing. I am a little concerned that the definition of the type would leak type implementation details into the other. Ideally as well, could also compress down to u8 instead of u32, but the idea here was to keep the contents of the frame somewhat within a cache line for a 64 bit machine (to be fair though, I don't have firm benchmarks to support necessitating this).

Can you propose a concrete alternative / suggestion to the protobuf(s)?

@mfw78 mfw78 force-pushed the typed-chunk-system branch from a71c816 to 9453222 Compare January 21, 2026 11:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants