From 3a6b1d5c3b461c6251bf03abd48d59656c5e7182 Mon Sep 17 00:00:00 2001 From: Zixuan Chen Date: Sat, 6 Sep 2025 14:30:57 +0800 Subject: [PATCH 1/2] docs: encoding format --- encoding.md | 345 ++++++++++++++++++++ serde_columnar/fuzz/Cargo.lock | 577 +++++++++++++++++++++++++++++++++ 2 files changed, 922 insertions(+) create mode 100644 encoding.md create mode 100644 serde_columnar/fuzz/Cargo.lock diff --git a/encoding.md b/encoding.md new file mode 100644 index 0000000..38f1c1d --- /dev/null +++ b/encoding.md @@ -0,0 +1,345 @@ +# Serde‑Columnar Binary Encoding + +This document describes the on‑wire format produced by the `serde_columnar` +crate. It aims to be easy to read and practical to implement against, inspired +by the style of the Automerge binary format spec. + +The format is designed for compactness and forward/backward compatibility, with +column‑oriented layout and specialized codecs for common patterns. It is built +on top of the [postcard] serializer for primitive and sequence encodings. + +[postcard]: https://docs.rs/postcard/ + + +## Scope and Stability + +- This spec describes the current behavior of the `0.3.x` series of + `serde_columnar` as implemented in this repository. +- The crate is in progress and not yet stable for production use; details may + change. The compatibility model (optional fields with stable integer indexes) + is intended to remain. + + +## High‑Level Model + +At a high level, a serialized value is a single [postcard] payload. Within that +payload: + +- A “table” (a struct annotated with `#[columnar(ser, de)]`) is encoded as a + postcard sequence containing each field in declaration order. Fields marked as + containers (`class="vec"` or `class="map"`) are themselves nested postcard + sequences in a columnar layout (described below). 
+- A “row” (a struct annotated with `#[columnar(vec)]` and/or `#[columnar(map)]`) + is never serialized directly; instead, rows are only encoded as part of a + container and become one column per field. +- No field names or strategy identifiers are stored on wire. The schema (Rust + type with its `#[columnar(...)]` attributes) determines how to interpret each + element. + +Postcard provides the basic representation for integers, bytes, sequences, and +tuples. Specifically: + +- Integers are varint‑encoded; signed integers use ZigZag. +- A “bytes” value is encoded as a length (varint) followed by that many raw + octets. +- A sequence is encoded as its length (varint) followed by each element in + order. + +Serde‑columnar uses postcard’s “bytes” to embed codec output, and postcard’s +sequences to arrange columns and optional field mappings. + + +## Containers and Layout + +Two container kinds are supported: list‑like and map‑like. + +### Vec‑like containers (`class = "vec"`) + +Given a struct `Row` annotated with `#[columnar(vec)]` and a field `data: +Vec` in a table struct annotated with `#[columnar(ser, de)]`, the value of +`data` is encoded as a sequence of columns, one per field of `Row`: + +``` +data := SEQ( + COL_0_bytes, + COL_1_bytes, + ..., + COL_{F-1}_bytes, + (opt_index, opt_COL_bytes)* // 0 or more optional fields +) +``` + +- `F` is the count of non‑optional, non‑skipped fields in `Row`. +- Each `COL_i_bytes` is a postcard “bytes” element produced by the selected + codec for that column (see Codecs). +- Optional row fields are not placed positionally. Instead, each present + optional field is appended as a pair `(index: usize, bytes: Vec)`, where + `index` is the stable integer specified in the schema + `#[columnar(optional, index = N)]` and `bytes` is the codec output for that + optional column. + +Decoding: + +1. Read the `F` non‑optional columns in order. +2. Read zero or more mapping entries `(index, bytes)` until the sequence ends. +3. 
For each optional field of `Row`: + - If present in the mapping: decode that column from its `bytes`. + - Otherwise: synthesize a column of length `L` (the max length among decoded + columns) filled with `Default::default()`. +4. Reconstruct rows by zipping the per‑field columns element‑wise. + + +### Map‑like containers (`class = "map"`) + +Given a struct `Row` annotated with `#[columnar(map)]` and a field +`data: Map` in a table struct, the value of `data` is encoded as: + +``` +data := SEQ( + KEYS: Vec, // postcard Vec + COL_0_bytes, COL_1_bytes, ..., COL_{F-1}_bytes, + (opt_index, opt_COL_bytes)* +) +``` + +The columns correspond to the value type `Row` in the same way as the vec case. +Keys are serialized once up front. During decoding, `len(KEYS)` determines the +expected number of elements per column. + +Reconstruction zips the keys with the reconstructed rows to produce the map. + + +### Tables (top‑level structs) + +A table struct annotated with `#[columnar(ser, de)]` is encoded as a postcard +sequence of its fields in declaration order. For each field: + +- If `class = "vec"` or `class = "map"`: the field is encoded using the + container layouts above as a nested sequence value. +- Otherwise: the field is encoded with postcard normally. + +Optional table fields are appended after all non‑optional fields as mapping +pairs `(index: usize, bytes: &[u8])`, mirroring the container behavior. Missing +optionals decode to `Default::default()`. + + +## Columns and Codecs + +Each field of a row becomes a column. The schema controls which codec is used by +annotating the field with `#[columnar(strategy = "...")]`. If no strategy is +specified, a generic codec is used. + +On wire, every column is carried as a postcard “bytes” element. No strategy tag +is stored; the schema determines how to interpret the bytes. + + +### Generic (no strategy) + +Encodes the raw vector of field values using postcard: `Vec` → `bytes`. 
This +does not compress and is the fallback for complex types and nested containers +(e.g., a `Vec` or `Map` nested inside a row). + + +### RLE (`strategy = "Rle"`) + +General run‑length encoding for any `T` that implements Serde, `Clone`, and +`PartialEq`. + +Column bytes are a concatenation of runs encoded as postcard values: + +- Repeated run: `(len: isize > 0)` followed by a single `value: T`. +- Literal run: `(len: isize < 0)` followed by `-len` values of type `T`. + +Decoding appends `len` copies of `value` for repeated runs and replays the +exact sequence for literal runs. A safety limit of `MAX_RLE_COUNT = 1_000_000_000` +is enforced to reject obviously invalid inputs. + + +### Delta‑RLE (`strategy = "DeltaRle"`) + +Optimized for monotonic or slowly changing integers. + +Encoding maintains an `absolute` accumulator (starting at `0`). For each value +`v`, compute `delta = v - absolute`, update `absolute = v`, and append `delta` +to an RLE stream of `i128` using the RLE format above. The column bytes are the +underlying RLE bytes. + +Decoding reconstructs `absolute += delta` for each delta read from the RLE +decoder and converts back to the target integer type, erroring on overflow. + + +### Bool‑RLE (`strategy = "BoolRle"`) + +Specialized for booleans. The column bytes are a postcard sequence of `usize` +counts. Decoding proceeds as follows: + +1. Initialize `value = true` and `count = 0`. +2. For each `n` read: + - Set `value = !value` (toggle). + - Emit `n` copies of `value`. + +The encoder chooses counts so that the first toggle yields the first run’s +boolean. For example, the sequence `true, true, false, false, false` encodes as +counts `[0, 2, 3]`. + +Like RLE, a safety limit guards against pathological inputs. + + +### Delta‑of‑Delta (`strategy = "DeltaOfDelta"`) + +Compact bit‑packed encoding for timestamp‑like series with approximately +constant step. 
The first element is stored verbatim; subsequent steps encode the +second difference (`Δ² = (v_i - v_{i-1}) - (v_{i-1} - v_{i-2})`). + +Column bytes have the following structure: + +``` +bytes := postcard(Option) // head value (Some(first), or None if empty) + octet // bits_used_in_last_byte (1..=8, 8 means a full byte) + bitstream // big‑endian packed Δ² codes +``` + +The bitstream encodes each Δ² using a prefix class and payload: + +- `0` → Δ² = 0 (1 bit total) +- `10` + 7 bits unsigned → Δ² ∈ [−63, 64] +- `110` + 9 bits unsigned → Δ² ∈ [−255, 256] +- `1110` + 12 bits unsigned → Δ² ∈ [−2047, 2048] +- `11110` + 21 bits unsigned → Δ² ∈ [−(2²⁰−1), 2²⁰] +- `11111` + 64 bits unsigned → Δ² as 64‑bit two’s‑complement + +In each non‑zero class, the stored payload is `(Δ² + bias)` where the bias is +63, 255, 2047, or `(2²⁰ − 1)` respectively. Bits are appended MSB‑first, +spanning octet boundaries as needed. The single‑octet `bits_used_in_last_byte` +acts as a tail marker: when the last code ends exactly on a byte boundary, it +is set to `8`. + +Decoding reconstructs values by: + +1. Reading the head `Option` to seed `prev` and `prev_delta`. +2. Repeating: read a code; if class `0`, set `prev += prev_delta`; else read the + payload, unbias to get Δ², update `prev_delta += Δ²`, then `prev += + prev_delta`. +3. Stop when there are no more bits in the bitstream. + + +## Optional Fields and Compatibility + +To evolve schemas without a separate version number, fields may be marked: + +``` +#[columnar(optional, index = N)] +``` + +Rules: + +- All non‑optional fields must precede all optional fields in the struct. +- `index` must be unique per struct and stable across versions. + +On wire, optional fields are omitted from the positional portion and instead +encoded as `(index, bytes)` pairs after all non‑optional elements. Decoders that +do not know a field’s `index` will ignore it. Decoders that expect a field that +was not sent will use `Default::default()` for that field. 
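The lookup rule for optional fields can be sketched as below. This is only an illustration under stated assumptions, not the crate's implementation: `resolve_optionals` is a hypothetical helper, and it assumes the trailing `(index, bytes)` pairs have already been parsed out of the enclosing sequence.

```rust
use std::collections::HashMap;

/// Hypothetical helper: for every optional index the reader's schema knows,
/// return the column bytes that were sent, or `None` if the writer omitted
/// the field (the caller then substitutes the type's default value).
/// Indexes the reader does not know are simply never looked up, i.e. ignored.
fn resolve_optionals<'a>(
    pairs: &'a [(usize, Vec<u8>)],
    known_indexes: &[usize],
) -> Vec<(usize, Option<&'a [u8]>)> {
    // Index the pairs that were actually present on the wire.
    let sent: HashMap<usize, &[u8]> = pairs
        .iter()
        .map(|(index, bytes)| (*index, bytes.as_slice()))
        .collect();

    known_indexes
        .iter()
        .map(|&index| (index, sent.get(&index).copied()))
        .collect()
}
```

With pairs for indexes `0` and `7` and a reader that knows indexes `0` and `1`, the sketch yields the bytes for `0`, `None` for `1`, and never touches `7`.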
+ +This makes adding, removing, or reordering optional fields forward‑ and +backward‑compatible, as long as each field’s binary representation remains +compatible (e.g., `u32` → `u64` is fine, but changing the meaning of a codec is +not). + + +## Borrowing and Zero‑Copy + +Fields annotated with `#[columnar(borrow)]` are deserialized by borrowing from +the input buffer where possible (e.g., `Cow<'de, str>` and `Cow<'de, [u8]>`). +This does not change the on‑wire format; it affects only how bytes are mapped to +Rust values during deserialization. + + +## Iterable Decoding + +If a row type is annotated with `#[columnar(iterable)]`, an iterator view is +generated. Calling `serde_columnar::iter_from_bytes::(&bytes)` returns a +table in which any `class = "vec"` fields annotated with `iter = "Row"` produce +row iterators rather than eagerly allocating `Vec`. + +Iterators consume the same column bytes described above, using streaming +decoders: + +- `AnyRleIter` for `Rle` +- `DeltaRleIter` for `DeltaRle` +- `BoolRleIter` for `BoolRle` +- `DeltaOfDeltaIter` for `DeltaOfDelta` +- `GenericIter` for unstrategized columns + +The on‑wire layout is identical to the non‑iterable case. + + +## Error Handling and Limits + +- Column decoders validate run counts (`MAX_RLE_COUNT = 1_000_000_000`) to avoid + malicious memory usage. +- Integer conversions error if a reconstructed value cannot be represented in + the target type. +- `DeltaOfDelta` rejects inputs with insufficient header/trailer bytes. + + +## Worked Example (Informal) + +Consider these types: + +```rust +#[columnar(vec, ser, de)] +struct Row { + #[columnar(strategy = "Rle")] name: String, + #[columnar(strategy = "DeltaRle")] id: u64, + #[columnar(optional, index = 0)] note: String, +} + +#[columnar(ser, de)] +struct Table { + #[columnar(class = "vec")] rows: Vec, + version: u32, +} +``` + +Encoding a `Table` value produces one postcard payload: + +1. 
The top level is a sequence of length 2: element 0 is `rows`, element 1 is + `version`. +2. `rows` is a nested sequence: + - `COL_name` bytes (RLE of `String`) + - `COL_id` bytes (Delta‑RLE of `u64`) + - If any `note` present: a pair `(0, COL_note_bytes)` appended +3. `version` is the postcard encoding of `u32`. + +A decoder for an older schema without `note` will ignore the `(0, ...)` pair. A +decoder for a newer schema that adds `#[columnar(optional, index = 1)] nick: +Option` will read or default it independently. + + +## Implementation Notes (Pointers) + +- Container wrappers: `columnar/src/wrap.rs` +- Codecs and iterators: `columnar/src/strategy/rle.rs`, `columnar/src/iterable.rs` +- Column types: `columnar/src/column/*.rs` +- Top‑level encode/decode helpers: `columnar/src/lib.rs`, + `columnar/src/columnar_internal.rs` +- Macro expansion for tables/rows: `columnar_derive/src/**/*` + + +## Non‑Goals + +- No schema negotiation or self‑describing messages. The reader must know the + Rust types to decode against. +- No cross‑column compression (each column is independent). + + +## Summary + +- Tables serialize as postcard sequences of fields; container fields embed a + columnar layout. +- Row fields become independent columns; codecs emit postcard “bytes”. +- Optional fields are appended as `(index, bytes)` pairs, enabling + backward‑/forward‑compatible evolution. +- Iteration is a decoding optimization; the wire format remains the same. + diff --git a/serde_columnar/fuzz/Cargo.lock b/serde_columnar/fuzz/Cargo.lock new file mode 100644 index 0000000..5ce2810 --- /dev/null +++ b/serde_columnar/fuzz/Cargo.lock @@ -0,0 +1,577 @@ +# This file is automatically @generated by Cargo. +# It is not intended for manual editing. 
+version = 3 + +[[package]] +name = "adler" +version = "1.0.2" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "f26201604c87b1e01bd3d98f8d5d9a8fcbb815e8cedb41ffccbeb4bf593a35fe" + +[[package]] +name = "aho-corasick" +version = "0.7.19" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "b4f55bd91a0978cbfd91c457a164bab8b4001c833b7f323132c0a4e1922dd44e" +dependencies = [ + "memchr", +] + +[[package]] +name = "arbitrary" +version = "1.2.0" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "29d47fbf90d5149a107494b15a7dc8d69b351be2db3bb9691740e88ec17fd880" +dependencies = [ + "derive_arbitrary", +] + +[[package]] +name = "atomic-polyfill" +version = "0.1.10" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "9c041a8d9751a520ee19656232a18971f18946a7900f1520ee4400002244dd89" +dependencies = [ + "critical-section", +] + +[[package]] +name = "autocfg" +version = "1.1.0" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "d468802bab17cbc0cc575e9b053f41e72aa36bfa6b7f55e3529ffa43161b97fa" + +[[package]] +name = "bare-metal" +version = "0.2.5" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "5deb64efa5bd81e31fcd1938615a6d98c82eafcbcd787162b6f63b91d6bac5b3" +dependencies = [ + "rustc_version 0.2.3", +] + +[[package]] +name = "bare-metal" +version = "1.0.0" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "f8fe8f5a8a398345e52358e18ff07cc17a568fbca5c6f73873d3a62056309603" + +[[package]] +name = "bincode" +version = "1.3.3" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "b1f45e9417d87227c7a56d22e471c6206462cba514c7590c09aff4cf6d1ddcad" +dependencies = [ + "serde", +] + +[[package]] +name = "bit_field" +version = "0.10.1" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = 
"dcb6dd1c2376d2e096796e234a70e17e94cc2d5d54ff8ce42b28cef1d0d359a4" + +[[package]] +name = "bitfield" +version = "0.13.2" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "46afbd2983a5d5a7bd740ccb198caf5b82f45c40c09c0eed36052d91cb92e719" + +[[package]] +name = "byteorder" +version = "1.4.3" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "14c189c53d098945499cdfa7ecc63567cf3886b3332b312a5b4585d8d3a6a610" + +[[package]] +name = "cc" +version = "1.0.76" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "76a284da2e6fe2092f2353e51713435363112dfd60030e22add80be333fb928f" +dependencies = [ + "jobserver", +] + +[[package]] +name = "cfg-if" +version = "1.0.0" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "baf1de4339761588bc0619e3cbc0120ee582ebb74b53b4efbf79117bd2da40fd" + +[[package]] +name = "cobs" +version = "0.2.3" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "67ba02a97a2bd10f4b59b25c7973101c79642302776489e030cd13cdab09ed15" + +[[package]] +name = "columnar" +version = "0.1.0" +dependencies = [ + "bincode", + "columnar_derive", + "flate2", + "itertools", + "lazy_static", + "postcard", + "serde", + "thiserror", +] + +[[package]] +name = "columnar-fuzz" +version = "0.0.0" +dependencies = [ + "arbitrary", + "columnar", + "libfuzzer-sys", + "serde", +] + +[[package]] +name = "columnar_derive" +version = "0.1.0" +dependencies = [ + "darling", + "proc-macro2", + "quote", + "syn", +] + +[[package]] +name = "cortex-m" +version = "0.7.6" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "70858629a458fdfd39f9675c4dc309411f2a3f83bede76988d81bf1a0ecee9e0" +dependencies = [ + "bare-metal 0.2.5", + "bitfield", + "embedded-hal", + "volatile-register", +] + +[[package]] +name = "crc32fast" +version = "1.3.2" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = 
"b540bd8bc810d3885c6ea91e2018302f68baba2129ab3e88f32389ee9370880d" +dependencies = [ + "cfg-if", +] + +[[package]] +name = "critical-section" +version = "0.2.7" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "95da181745b56d4bd339530ec393508910c909c784e8962d15d722bacf0bcbcd" +dependencies = [ + "bare-metal 1.0.0", + "cfg-if", + "cortex-m", + "riscv", +] + +[[package]] +name = "darling" +version = "0.14.2" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "b0dd3cd20dc6b5a876612a6e5accfe7f3dd883db6d07acfbf14c128f61550dfa" +dependencies = [ + "darling_core", + "darling_macro", +] + +[[package]] +name = "darling_core" +version = "0.14.2" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "a784d2ccaf7c98501746bf0be29b2022ba41fd62a2e622af997a03e9f972859f" +dependencies = [ + "fnv", + "ident_case", + "proc-macro2", + "quote", + "strsim", + "syn", +] + +[[package]] +name = "darling_macro" +version = "0.14.2" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "7618812407e9402654622dd402b0a89dff9ba93badd6540781526117b92aab7e" +dependencies = [ + "darling_core", + "quote", + "syn", +] + +[[package]] +name = "derive_arbitrary" +version = "1.2.0" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "4903dff04948f22033ca30232ab8eca2c3fc4c913a8b6a34ee5199699814817f" +dependencies = [ + "proc-macro2", + "quote", + "syn", +] + +[[package]] +name = "either" +version = "1.8.0" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "90e5c1c8368803113bf0c9584fc495a58b86dc8a29edbf8fe877d21d9507e797" + +[[package]] +name = "embedded-hal" +version = "0.2.7" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "35949884794ad573cf46071e41c9b60efb0cb311e3ca01f7af807af1debc66ff" +dependencies = [ + "nb 0.1.3", + "void", +] + +[[package]] +name = "flate2" +version = "1.0.24" +source = 
"registry+https://github.com/rust-lang/crates.io-index" +checksum = "f82b0f4c27ad9f8bfd1f3208d882da2b09c301bc1c828fd3a00d0216d2fbbff6" +dependencies = [ + "crc32fast", + "miniz_oxide", +] + +[[package]] +name = "fnv" +version = "1.0.7" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "3f9eec918d3f24069decb9af1554cad7c880e2da24a9afd88aca000531ab82c1" + +[[package]] +name = "hash32" +version = "0.2.1" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "b0c35f58762feb77d74ebe43bdbc3210f09be9fe6742234d573bacc26ed92b67" +dependencies = [ + "byteorder", +] + +[[package]] +name = "heapless" +version = "0.7.16" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "db04bc24a18b9ea980628ecf00e6c0264f3c1426dac36c00cb49b6fbad8b0743" +dependencies = [ + "atomic-polyfill", + "hash32", + "rustc_version 0.4.0", + "serde", + "spin", + "stable_deref_trait", +] + +[[package]] +name = "ident_case" +version = "1.0.1" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "b9e0384b61958566e926dc50660321d12159025e767c18e043daf26b70104c39" + +[[package]] +name = "itertools" +version = "0.10.5" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "b0fd2260e829bddf4cb6ea802289de2f86d6a7a690192fbe91b3f46e0f2c8473" +dependencies = [ + "either", +] + +[[package]] +name = "jobserver" +version = "0.1.25" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "068b1ee6743e4d11fb9c6a1e6064b3693a1b600e7f5f5988047d98b3dc9fb90b" +dependencies = [ + "libc", +] + +[[package]] +name = "lazy_static" +version = "1.4.0" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "e2abad23fbc42b3700f2f279844dc832adb2b2eb069b2df918f455c4e18cc646" + +[[package]] +name = "libc" +version = "0.2.137" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "fc7fcc620a3bff7cdd7a365be3376c97191aeaccc2a603e600951e452615bf89" + 
+[[package]] +name = "libfuzzer-sys" +version = "0.4.5" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "c8fff891139ee62800da71b7fd5b508d570b9ad95e614a53c6f453ca08366038" +dependencies = [ + "arbitrary", + "cc", + "once_cell", +] + +[[package]] +name = "lock_api" +version = "0.4.9" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "435011366fe56583b16cf956f9df0095b405b82d76425bc8981c0e22e60ec4df" +dependencies = [ + "autocfg", + "scopeguard", +] + +[[package]] +name = "memchr" +version = "2.5.0" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "2dffe52ecf27772e601905b7522cb4ef790d2cc203488bbd0e2fe85fcb74566d" + +[[package]] +name = "miniz_oxide" +version = "0.5.4" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "96590ba8f175222643a85693f33d26e9c8a015f599c216509b1a6894af675d34" +dependencies = [ + "adler", +] + +[[package]] +name = "nb" +version = "0.1.3" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "801d31da0513b6ec5214e9bf433a77966320625a37860f910be265be6e18d06f" +dependencies = [ + "nb 1.0.0", +] + +[[package]] +name = "nb" +version = "1.0.0" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "546c37ac5d9e56f55e73b677106873d9d9f5190605e41a856503623648488cae" + +[[package]] +name = "once_cell" +version = "1.16.0" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "86f0b0d4bf799edbc74508c1e8bf170ff5f41238e5f8225603ca7caaae2b7860" + +[[package]] +name = "postcard" +version = "1.0.2" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "1c2b180dc0bade59f03fd005cb967d3f1e5f69b13922dad0cd6e047cb8af2363" +dependencies = [ + "cobs", + "heapless", + "serde", +] + +[[package]] +name = "proc-macro2" +version = "1.0.47" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = 
"5ea3d908b0e36316caf9e9e2c4625cdde190a7e6f440d794667ed17a1855e725" +dependencies = [ + "unicode-ident", +] + +[[package]] +name = "quote" +version = "1.0.21" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "bbe448f377a7d6961e30f5955f9b8d106c3f5e449d493ee1b125c1d43c2b5179" +dependencies = [ + "proc-macro2", +] + +[[package]] +name = "regex" +version = "1.7.0" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "e076559ef8e241f2ae3479e36f97bd5741c0330689e217ad51ce2c76808b868a" +dependencies = [ + "aho-corasick", + "memchr", + "regex-syntax", +] + +[[package]] +name = "regex-syntax" +version = "0.6.28" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "456c603be3e8d448b072f410900c09faf164fbce2d480456f50eea6e25f9c848" + +[[package]] +name = "riscv" +version = "0.7.0" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "6907ccdd7a31012b70faf2af85cd9e5ba97657cc3987c4f13f8e4d2c2a088aba" +dependencies = [ + "bare-metal 1.0.0", + "bit_field", + "riscv-target", +] + +[[package]] +name = "riscv-target" +version = "0.1.2" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "88aa938cda42a0cf62a20cfe8d139ff1af20c2e681212b5b34adb5a58333f222" +dependencies = [ + "lazy_static", + "regex", +] + +[[package]] +name = "rustc_version" +version = "0.2.3" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "138e3e0acb6c9fb258b19b67cb8abd63c00679d2851805ea151465464fe9030a" +dependencies = [ + "semver 0.9.0", +] + +[[package]] +name = "rustc_version" +version = "0.4.0" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "bfa0f585226d2e68097d4f95d113b15b83a82e819ab25717ec0590d9584ef366" +dependencies = [ + "semver 1.0.14", +] + +[[package]] +name = "scopeguard" +version = "1.1.0" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = 
"d29ab0c6d3fc0ee92fe66e2d99f700eab17a8d57d1c1d3b748380fb20baa78cd" + +[[package]] +name = "semver" +version = "0.9.0" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "1d7eb9ef2c18661902cc47e535f9bc51b78acd254da71d375c2f6720d9a40403" +dependencies = [ + "semver-parser", +] + +[[package]] +name = "semver" +version = "1.0.14" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "e25dfac463d778e353db5be2449d1cce89bd6fd23c9f1ea21310ce6e5a1b29c4" + +[[package]] +name = "semver-parser" +version = "0.7.0" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "388a1df253eca08550bef6c72392cfe7c30914bf41df5269b68cbd6ff8f570a3" + +[[package]] +name = "serde" +version = "1.0.147" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "d193d69bae983fc11a79df82342761dfbf28a99fc8d203dca4c3c1b590948965" +dependencies = [ + "serde_derive", +] + +[[package]] +name = "serde_derive" +version = "1.0.147" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "4f1d362ca8fc9c3e3a7484440752472d68a6caa98f1ab81d99b5dfe517cec852" +dependencies = [ + "proc-macro2", + "quote", + "syn", +] + +[[package]] +name = "spin" +version = "0.9.4" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "7f6002a767bff9e83f8eeecf883ecb8011875a21ae8da43bffb817a57e78cc09" +dependencies = [ + "lock_api", +] + +[[package]] +name = "stable_deref_trait" +version = "1.2.0" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "a8f112729512f8e442d81f95a8a7ddf2b7c6b8a1a6f509a95864142b30cab2d3" + +[[package]] +name = "strsim" +version = "0.10.0" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "73473c0e59e6d5812c5dfe2a064a6444949f089e20eec9a2e5506596494e4623" + +[[package]] +name = "syn" +version = "1.0.103" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = 
"a864042229133ada95abf3b54fdc62ef5ccabe9515b64717bcb9a1919e59445d" +dependencies = [ + "proc-macro2", + "quote", + "unicode-ident", +] + +[[package]] +name = "thiserror" +version = "1.0.37" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "10deb33631e3c9018b9baf9dcbbc4f737320d2b576bac10f6aefa048fa407e3e" +dependencies = [ + "thiserror-impl", +] + +[[package]] +name = "thiserror-impl" +version = "1.0.37" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "982d17546b47146b28f7c22e3d08465f6b8903d0ea13c1660d9d84a6e7adcdbb" +dependencies = [ + "proc-macro2", + "quote", + "syn", +] + +[[package]] +name = "unicode-ident" +version = "1.0.5" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "6ceab39d59e4c9499d4e5a8ee0e2735b891bb7308ac83dfb4e80cad195c9f6f3" + +[[package]] +name = "vcell" +version = "0.1.3" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "77439c1b53d2303b20d9459b1ade71a83c716e3f9c34f3228c00e6f185d6c002" + +[[package]] +name = "void" +version = "1.0.2" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "6a02e4885ed3bc0f2de90ea6dd45ebcbb66dacffe03547fadbb0eeae2770887d" + +[[package]] +name = "volatile-register" +version = "0.2.1" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "9ee8f19f9d74293faf70901bc20ad067dc1ad390d2cbf1e3f75f721ffee908b6" +dependencies = [ + "vcell", +] From f723068373068864960f17e8ebb47fc2dbf320e0 Mon Sep 17 00:00:00 2001 From: Zixuan Chen Date: Sat, 6 Sep 2025 15:36:40 +0800 Subject: [PATCH 2/2] docs: update encoding --- encoding.md | 286 +++++++++++++++++++++++++++------------------------- 1 file changed, 151 insertions(+), 135 deletions(-) diff --git a/encoding.md b/encoding.md index 38f1c1d..ffd8543 100644 --- a/encoding.md +++ b/encoding.md @@ -1,14 +1,12 @@ -# Serde‑Columnar Binary Encoding +# Serde‑Columnar Binary Encoding (Self‑Contained) -This document describes 
the on‑wire format produced by the `serde_columnar` -crate. It aims to be easy to read and practical to implement against, inspired -by the style of the Automerge binary format spec. +This document specifies the exact byte format produced by the `serde_columnar` +crate. It is self‑contained: you can implement a compatible encoder/decoder in +any language without depending on Rust or Postcard. -The format is designed for compactness and forward/backward compatibility, with -column‑oriented layout and specialized codecs for common patterns. It is built -on top of the [postcard] serializer for primitive and sequence encodings. - -[postcard]: https://docs.rs/postcard/ +The format is compact and schema‑directed. A top‑level value (a “table”) is a +sequence of fields. Certain fields may contain “containers” whose elements are +stored column‑wise using dedicated codecs. ## Scope and Stability @@ -19,34 +17,42 @@ on top of the [postcard] serializer for primitive and sequence encodings. change. The compatibility model (optional fields with stable integer indexes) is intended to remain. +## Terminology and Conventions -## High‑Level Model +- “Octet” means an 8‑bit byte. +- All multi‑byte integer encodings within this spec are little‑endian + base‑128 varints (LEB128) unless explicitly stated otherwise. +- “Sequence” means a varint length `L` followed by exactly `L` items, each + encoded per its type. +- “Byte string” means a varint length `N` followed by exactly `N` octets. +- “Pair (A, B)” means the bytes of `A` immediately followed by the bytes of + `B` (no extra tag). +- “Table” is the top‑level struct annotated with `#[columnar(ser, de)]`. +- “Row” is a struct annotated with `#[columnar(vec)]` and/or + `#[columnar(map)]`. Rows only appear inside containers. -At a high level, a serialized value is a single [postcard] payload. 
Within that -payload: -- A “table” (a struct annotated with `#[columnar(ser, de)]`) is encoded as a - postcard sequence containing each field in declaration order. Fields marked as - containers (`class="vec"` or `class="map"`) are themselves nested postcard - sequences in a columnar layout (described below). -- A “row” (a struct annotated with `#[columnar(vec)]` and/or `#[columnar(map)]`) - is never serialized directly; instead, rows are only encoded as part of a - container and become one column per field. -- No field names or strategy identifiers are stored on wire. The schema (Rust - type with its `#[columnar(...)]` attributes) determines how to interpret each - element. +## Primitive Encodings -Postcard provides the basic representation for integers, bytes, sequences, and -tuples. Specifically: +- Unsigned integer `uN`: LEB128 varint, 7 payload bits per octet. The MSB of + each octet is the continuation bit: 1 means more octets follow; 0 means last. + Examples: `0 → 00`, `1 → 01`, `127 → 7F`, `128 → 80 01`. +- Signed integer `iN`: ZigZag, then LEB128. ZigZag maps `…,-2,-1,0,1,2,…` to + `…,3,1,0,2,4,…` via `u = (x << 1) ^ (x >> (N-1))`. + Examples: `0 → 00`, `-1 → 01`, `1 → 02`, `-2 → 03`, `2 → 04` (all as 1‑octet varints). +- Boolean: single octet `00` (false) or `01` (true). +- Byte string (`bytes`/`Vec`): `len: varint` followed by `len` octets. +- UTF‑8 string: `len: varint` followed by `len` UTF‑8 octets. +- Sequence (`Vec`): `len: varint` then `len` elements, each encoded as `T`. +- Option: one varint tag, then optional payload. `0 = None`, `1 = Some`, + and when tag is `1`, the bytes of `T` follow immediately. -- Integers are varint‑encoded; signed integers use ZigZag. -- A “bytes” value is encoded as a length (varint) followed by that many raw - octets. -- A sequence is encoded as its length (varint) followed by each element in - order. 
+Notes: -Serde‑columnar uses postcard’s “bytes” to embed codec output, and postcard’s -sequences to arrange columns and optional field mappings. +- When an encoded integer does not fit the consumer’s target type, the decoder + must raise an error. +- This format does not embed type or schema tags; decoders must know the + expected types from the schema the data was produced with. ## Containers and Layout @@ -57,102 +63,95 @@ Two container kinds are supported: list‑like and map‑like. Given a struct `Row` annotated with `#[columnar(vec)]` and a field `data: Vec<Row>` in a table struct annotated with `#[columnar(ser, de)]`, the value of -`data` is encoded as a sequence of columns, one per field of `Row`: +`data` is a sequence organized as follows: -``` data := SEQ( - COL_0_bytes, - COL_1_bytes, - ..., - COL_{F-1}_bytes, - (opt_index, opt_COL_bytes)* // 0 or more optional fields + COL_0, COL_1, ..., COL_{F-1}, // F non‑optional fields of Row + (opt_index, opt_COL_bytes)* // 0+ optional fields by mapping ) -``` -- `F` is the count of non‑optional, non‑skipped fields in `Row`. -- Each `COL_i_bytes` is a postcard “bytes” element produced by the selected - codec for that column (see Codecs). -- Optional row fields are not placed positionally. Instead, each present - optional field is appended as a pair `(index: usize, bytes: Vec<u8>)`, where - `index` is the stable integer specified in the schema - `#[columnar(optional, index = N)]` and `bytes` is the codec output for that - optional column. +- `F` is the number of non‑optional, non‑skipped fields in `Row`. +- Each `COL_i` is the codec output for that column, encoded as a byte string + (length varint + bytes). See “Columns and Codecs”. +- Optional fields of `Row` are not stored positionally. For each present + optional field, append a pair `(index, bytes)`: + - `index`: unsigned varint (the stable field index from + `#[columnar(optional, index = N)]`).
+ - `bytes`: byte string containing the codec output for that optional column. Decoding: -1. Read the `F` non‑optional columns in order. -2. Read zero or more mapping entries `(index, bytes)` until the sequence ends. -3. For each optional field of `Row`: - - If present in the mapping: decode that column from its `bytes`. - - Otherwise: synthesize a column of length `L` (the max length among decoded - columns) filled with `Default::default()`. -4. Reconstruct rows by zipping the per‑field columns element‑wise. +1. Read the `F` non‑optional column byte strings in order and decode each. +2. Read zero or more `(index, bytes)` pairs until the sequence ends, and decode + any optional columns present. +3. For every optional field absent from the mapping, synthesize a column of + length `L` (the maximum length among decoded columns) filled with the type’s + default value. +4. Reconstruct rows by zipping columns element‑wise. ### Map‑like containers (`class = "map"`) Given a struct `Row` annotated with `#[columnar(map)]` and a field -`data: Map<K, Row>` in a table struct, the value of `data` is encoded as: +`data: Map<K, Row>` in a table struct, the value of `data` is: -``` data := SEQ( - KEYS: Vec<K>, // postcard Vec<K> - COL_0_bytes, COL_1_bytes, ..., COL_{F-1}_bytes, - (opt_index, opt_COL_bytes)* + KEYS, // Vec<K> as a sequence + COL_0, COL_1, ..., COL_{F-1}, // F non‑optional fields of Row + (opt_index, opt_COL_bytes)* // 0+ optional fields by mapping ) -``` -The columns correspond to the value type `Row` in the same way as the vec case. -Keys are serialized once up front. During decoding, `len(KEYS)` determines the -expected number of elements per column. +- `KEYS` is a `Vec<K>` encoded as a sequence (length varint + each `K`). +- Column handling matches the vec case. `len(KEYS)` determines the expected + number of elements per column. -Reconstruction zips the keys with the reconstructed rows to produce the map. +Reconstruction zips `KEYS` with the reconstructed rows to build the map.
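
As a rough illustration of the vec‑like layout, the Rust sketch below splits a container into its `F` positional column payloads. It assumes each column counts as one sequence item, which is consistent with this document's one‑column Bool‑RLE example (`01 03 00 02 03`), and it deliberately stops before any optional `(index, bytes)` pairs. `Reader` and `split_columns` are hypothetical names, not crate APIs.

```rust
use std::convert::TryFrom;

/// Minimal cursor over an input buffer (hypothetical helper).
struct Reader<'a> {
    buf: &'a [u8],
    pos: usize,
}

impl<'a> Reader<'a> {
    /// Read one unsigned LEB128 varint of at most 10 octets.
    fn varint(&mut self) -> Option<u64> {
        let mut value = 0u64;
        for i in 0..10u32 {
            let octet = *self.buf.get(self.pos)?;
            self.pos += 1;
            value |= u64::from(octet & 0x7F) << (7 * i);
            if octet & 0x80 == 0 {
                return Some(value);
            }
        }
        None // continuation bit still set after 10 octets
    }

    /// Read a byte string: varint length, then that many raw octets.
    fn byte_string(&mut self) -> Option<&'a [u8]> {
        let len = usize::try_from(self.varint()?).ok()?;
        let end = self.pos.checked_add(len)?;
        let bytes = self.buf.get(self.pos..end)?; // None if truncated
        self.pos = end;
        Some(bytes)
    }
}

/// Split a vec-like container into its `f` non-optional column payloads.
/// Assumption: the sequence length counts each column as one item.
/// Optional `(index, bytes)` pairs, if any, are left unparsed.
fn split_columns(input: &[u8], f: usize) -> Option<Vec<&[u8]>> {
    let mut reader = Reader { buf: input, pos: 0 };
    let items = reader.varint()?; // sequence length
    if items < f as u64 {
        return None; // fewer items than the schema requires
    }
    let mut columns = Vec::with_capacity(f);
    for _ in 0..f {
        columns.push(reader.byte_string()?);
    }
    Some(columns)
}
```

Each returned slice is still codec output; it would then be handed to the decoder that the schema selects for that column. A map‑like container would first read `KEYS` before the columns, which this sketch does not attempt.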
### Tables (top‑level structs) -A table struct annotated with `#[columnar(ser, de)]` is encoded as a postcard -sequence of its fields in declaration order. For each field: +A table struct annotated with `#[columnar(ser, de)]` is a sequence of its fields +in declaration order. For each field: -- If `class = "vec"` or `class = "map"`: the field is encoded using the - container layouts above as a nested sequence value. -- Otherwise: the field is encoded with postcard normally. +- If `class = "vec"` or `class = "map"`: encode using the container layouts + above as a nested sequence value. +- Otherwise: encode the value directly using the primitive rules in this spec. -Optional table fields are appended after all non‑optional fields as mapping -pairs `(index: usize, bytes: &[u8])`, mirroring the container behavior. Missing -optionals decode to `Default::default()`. +Optional table fields are not stored positionally. After all non‑optional +fields, append `(index, bytes)` pairs where `index` is the stable field index +and `bytes` is a byte string containing the field encoded per this spec. +Missing optionals decode to the type’s default value. ## Columns and Codecs -Each field of a row becomes a column. The schema controls which codec is used by -annotating the field with `#[columnar(strategy = "...")]`. If no strategy is -specified, a generic codec is used. - -On wire, every column is carried as a postcard “bytes” element. No strategy tag -is stored; the schema determines how to interpret the bytes. +Each field of a row becomes an independent column. The schema selects the codec +with `#[columnar(strategy = "...")]`. If unspecified, the “Generic” codec is +used. On wire, the result of a codec is carried as a byte string (length varint +then raw bytes). No strategy tag is stored; the schema determines how to decode. ### Generic (no strategy) -Encodes the raw vector of field values using postcard: `Vec` → `bytes`. 
This -does not compress and is the fallback for complex types and nested containers -(e.g., a `Vec` or `Map` nested inside a row). +Encodes the raw vector of field values as a sequence: `Vec<T>` → `len: varint` +then `len` elements of `T`, each encoded using the primitive rules in this +spec. This is the fallback for complex types and nested containers. ### RLE (`strategy = "Rle"`) -General run‑length encoding for any `T` that implements Serde, `Clone`, and -`PartialEq`. +General run‑length encoding for any element type `T`. -Column bytes are a concatenation of runs encoded as postcard values: +Column bytes are a concatenation of runs; there is no outer length header. Each +run starts with a ZigZag+varint `count: isize`: -- Repeated run: `(len: isize > 0)` followed by a single `value: T`. -- Literal run: `(len: isize < 0)` followed by `-len` values of type `T`. +- Repeated run: `count > 0`. Next are the bytes of a single `value: T`. +- Literal run: `count < 0`. Next are the bytes of exactly `-count` values of + type `T`, back‑to‑back. -Decoding appends `len` copies of `value` for repeated runs and replays the -exact sequence for literal runs. A safety limit of `MAX_RLE_COUNT = 1_000_000_000` -is enforced to reject obviously invalid inputs. +`count = 0` is invalid. Decoding repeats the single value for repeated runs and +copies the literal sequence for literal runs. A safety limit of +`MAX_RLE_COUNT = 1_000_000_000` applies. ### Delta‑RLE (`strategy = "DeltaRle"`) @@ -170,57 +169,53 @@ decoder and converts back to the target integer type, erroring on overflow. ### Bool‑RLE (`strategy = "BoolRle"`) -Specialized for booleans. The column bytes are a postcard sequence of `usize` -counts. Decoding proceeds as follows: +Specialized for booleans. The column bytes are a concatenation of unsigned +varint counts; there is no outer length header. Decoding: -1. Initialize `value = true` and `count = 0`. -2. For each `n` read: - - Set `value = !value` (toggle).
- - Emit `n` copies of `value`. +1. Initialize `last = true`, `count = 0`. +2. Repeatedly read a varint `n`: + - Toggle `last = !last`. + - Emit `n` copies of `last`. +3. Stop at end of input. A safety limit guards against pathological inputs. -The encoder chooses counts so that the first toggle yields the first run’s -boolean. For example, the sequence `true, true, false, false, false` encodes as -counts `[0, 2, 3]`. +The encoder emits a leading `0` count when the first run is `true`. Example: -Like RLE, a safety limit guards against pathological inputs. +- Values: `true, true, false, false, false` +- Counts: `[0, 2, 3]` +- Column bytes (hex): `00 02 03` ### Delta‑of‑Delta (`strategy = "DeltaOfDelta"`) -Compact bit‑packed encoding for timestamp‑like series with approximately -constant step. The first element is stored verbatim; subsequent steps encode the -second difference (`Δ² = (v_i - v_{i-1}) - (v_{i-1} - v_{i-2})`). +Compact bit‑packed encoding for timestamp‑like `i64` series with approximately +constant step. The first element is stored verbatim; subsequent elements encode +the second difference `Δ² = (v_i - v_{i-1}) - (v_{i-1} - v_{i-2})`. -Column bytes have the following structure: - -``` -bytes := postcard(Option<i64>) // head value (Some(first), or None if empty) - octet // bits_used_in_last_byte (1..=8, 8 means a full byte) - bitstream // big‑endian packed Δ² codes -``` +Column bytes: -The bitstream encodes each Δ² using a prefix class and payload: +- Head: `Option<i64>` (see “Primitive Encodings”). `Some(first)` when non‑empty, + `None` when empty. +- Trailer: one octet `U` giving the number of valid bits in the final data + octet, where `U ∈ {0,1,…,8}`. `U = 0` means the bitstream is empty; `U = 8` + means the last data octet is fully used. +- Bitstream: big‑endian bit‑packed Δ² codes, appended MSB‑first across octets.
-- `0` → Δ² = 0 (1 bit total) -- `10` + 7 bits unsigned → Δ² ∈ [−63, 64] -- `110` + 9 bits unsigned → Δ² ∈ [−255, 256] -- `1110` + 12 bits unsigned → Δ² ∈ [−2047, 2048] -- `11110` + 21 bits unsigned → Δ² ∈ [−(2²⁰−1), 2²⁰] -- `11111` + 64 bits unsigned → Δ² as 64‑bit two’s‑complement +Δ² code classes (prefix then payload): -In each non‑zero class, the stored payload is `(Δ² + bias)` where the bias is -63, 255, 2047, or `(2²⁰ − 1)` respectively. Bits are appended MSB‑first, -spanning octet boundaries as needed. The single‑octet `bits_used_in_last_byte` -acts as a tail marker: when the last code ends exactly on a byte boundary, it -is set to `8`. +- `0` → Δ² = 0 (1 bit total) +- `10` + 7 bits unsigned → Δ² ∈ [−63, 64] (store `Δ² + 63`) +- `110` + 9 bits unsigned → Δ² ∈ [−255, 256] (store `Δ² + 255`) +- `1110` + 12 bits unsigned → Δ² ∈ [−2047, 2048] (store `Δ² + 2047`) +- `11110` + 21 bits unsigned → Δ² ∈ [−(2²⁰−1), 2²⁰] (store `Δ² + (2²⁰−1)`) +- `11111` + 64 bits two’s‑complement → Δ² as signed 64‑bit -Decoding reconstructs values by: +Decoding: -1. Reading the head `Option<i64>` to seed `prev` and `prev_delta`. -2. Repeating: read a code; if class `0`, set `prev += prev_delta`; else read the - payload, unbias to get Δ², update `prev_delta += Δ²`, then `prev += - prev_delta`. -3. Stop when there are no more bits in the bitstream. +1. Read head `Option<i64>`. If `None`, the column is empty. If `Some(x)`, set + `prev = x` and `prev_delta = 0`, and yield `x` as the first value. +2. While bits remain: read a class per the prefixes above. If class `0`, set + `prev += prev_delta`. Otherwise, read the payload, unbias to get `Δ²`, then + `prev_delta += Δ²` and `prev += prev_delta`. Yield `prev` each time. ## Optional Fields and Compatibility @@ -238,8 +233,8 @@ Rules: On wire, optional fields are omitted from the positional portion and instead encoded as `(index, bytes)` pairs after all non‑optional elements. Decoders that -do not know a field’s `index` will ignore it.
Decoders that expect a field that -was not sent will use `Default::default()` for that field. +do not know a field’s `index` ignore it. Decoders that expect a field that was +not sent use `Default::default()` for that field. This makes adding, removing, or reordering optional fields forward‑ and backward‑compatible, as long as each field’s binary representation remains @@ -251,7 +246,7 @@ not). Fields annotated with `#[columnar(borrow)]` are deserialized by borrowing from the input buffer where possible (e.g., `Cow<'de, str>` and `Cow<'de, [u8]>`). -This does not change the on‑wire format; it affects only how bytes are mapped to +This does not change the on‑wire format; it only affects how bytes are mapped to Rust values during deserialization. @@ -283,6 +278,28 @@ The on‑wire layout is identical to the non‑iterable case. - `DeltaOfDelta` rejects inputs with insufficient header/trailer bytes. +## Worked Examples + +These examples use the primitive rules above, so they can be reproduced without +Rust or Postcard. + +- Bool‑RLE column for values `true, true, false, false, false`: + - Counts: `[0, 2, 3]` + - Column bytes: `00 02 03` + +- Encoding a simple vec‑like container with a single boolean column using + Bool‑RLE. Let `rows: Vec` where `Row { #[columnar(strategy = "BoolRle")] b: bool }` + and values are the five booleans above. The container is a sequence with one + element (one column). That element is a byte string carrying the Bool‑RLE + bytes. Therefore the container bytes are: + - Sequence length: `01` + - Column byte string length: `03` + - Column payload: `00 02 03` + - Full bytes (hex): `01 03 00 02 03` + +Implementations can use this as a cross‑check. + + ## Worked Example (Informal) Consider these types: @@ -336,10 +353,9 @@ Option` will read or default it independently. ## Summary -- Tables serialize as postcard sequences of fields; container fields embed a - columnar layout. -- Row fields become independent columns; codecs emit postcard “bytes”. 
-- Optional fields are appended as `(index, bytes)` pairs, enabling - backward‑/forward‑compatible evolution. +- Tables are sequences of fields; container fields embed a columnar layout. +- Row fields become independent columns; codecs emit byte strings. +- Optional fields are appended as `(index, bytes)` pairs for + backward/forward‑compatible evolution. - Iteration is a decoding optimization; the wire format remains the same.
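
As a cross‑check of the Bool‑RLE rules, here is a minimal Rust sketch of an encoder and decoder for the column bytes only (no container framing). The function names and the `MAX_BOOLS` limit are assumptions made for illustration; the crate's actual safety limit and its output for an empty column are not stated in this document.

```rust
/// Hypothetical safety limit on decoded elements (the real value is not given here).
const MAX_BOOLS: usize = 1_000_000;

/// Append `value` as an unsigned LEB128 varint.
fn write_varint(mut value: u64, out: &mut Vec<u8>) {
    while value >= 0x80 {
        out.push((value as u8 & 0x7F) | 0x80); // more octets follow
        value >>= 7;
    }
    out.push(value as u8); // last octet
}

/// Read one varint of at most 10 octets; returns (value, octets consumed).
/// Overflow bits in a 10th octet are silently dropped in this sketch.
fn read_varint(input: &[u8]) -> Option<(u64, usize)> {
    let mut value = 0u64;
    for (i, &octet) in input.iter().enumerate().take(10) {
        value |= u64::from(octet & 0x7F) << (7 * (i as u32));
        if octet & 0x80 == 0 {
            return Some((value, i + 1));
        }
    }
    None
}

/// Encode booleans as run counts. The first count always describes a run of
/// `false`, so a leading `0` appears when the data starts with `true`.
fn bool_rle_encode(values: &[bool]) -> Vec<u8> {
    let mut out = Vec::new();
    let mut last = false;
    let mut count: u64 = 0;
    for &v in values {
        if v == last {
            count += 1;
        } else {
            write_varint(count, &mut out);
            last = v;
            count = 1;
        }
    }
    if !values.is_empty() {
        write_varint(count, &mut out); // flush the final run
    }
    out
}

/// Decode run counts: start from `true`, toggle before each run.
fn bool_rle_decode(mut input: &[u8]) -> Option<Vec<bool>> {
    let mut out = Vec::new();
    let mut last = true;
    while !input.is_empty() {
        let (n, used) = read_varint(input)?;
        input = &input[used..];
        last = !last;
        if n > (MAX_BOOLS - out.len()) as u64 {
            return None; // reject pathological run lengths
        }
        out.extend(std::iter::repeat(last).take(n as usize));
    }
    Some(out)
}
```

Encoding `true, true, false, false, false` with this sketch yields the counts `0, 2, 3`, matching the column bytes `00 02 03` used in the examples.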