Skip to content

Conversation

@canadaduane
Copy link
Contributor

This PR replaces #896 , a previous attempt at creating a self-trimming shallow checkout.

I've tried to make each commit in this PR a "logical unit" that can be independently reviewed. It is arranged as a series of these 7 commits:

  1. arena-generation-counter -- Add arena generation counter for handler invalidation
  • Adds an atomic arena_generation: AtomicU64 counter to LoroDocInner
  • Provides arena_generation() getter and bump_arena_generation() method
  • Enables detection of when document internals have been swapped
  1. handler-generation-caching -- Handler generation-based caching
  • Modifies BasicHandler to cache (ContainerIdx, generation) tuple
  • Handlers automatically detect stale ContainerIdx when generation changes
  • Re-resolves ContainerIdx from arena on cache miss (cold path)
  1. swap-internals-from -- Add swap_internals_from to swap internal doc data
  • Adds OpLog::swap_data_with() to swap dag, change_store, history_cache, pending_changes
  • Adds DocState::swap_data_with() to swap document state
  • Adds SharedArena::swap_contents_with() to swap arena contents
  • Adds LoroDocInner::swap_internals_from() to orchestrate the full swap and bump generation
  1. replace-with-shallow-2 -- replace_with_shallow Loro internal function
  • Implements LoroDoc::replace_with_shallow(frontiers) API
  • Exports shallow snapshot at given frontiers → creates temp doc → swaps internals
  • Preserves peer ID, configuration, and handlers across the operation
  • Adds Rust tests for basic functionality, peer ID preservation, and continued editing
  1. subscription-container-id-storage -- Subscription container_id storage
  • Adds extract_all() method to SubscriberSet for extracting active subscriptions
  • Adds container_id_map to Observer to track ContainerIdx → ContainerID mapping
  • Enables looking up stable ContainerID when extracting subscriptions keyed by ContainerIdx
  1. subscription-preservation -- Preserve subscriptions on replace_with_shallow
  • Adds extract_subscriptions() and restore_subscriptions() methods to Observer
  • Preserves recording_diff state in DocState::swap_data_with() so events fire
  • Integrates subscription migration into replace_with_shallow() flow
  • Adds Rust tests for root and container subscription preservation
  1. loro-wasm-replace-with-shallow -- Add replaceWithShallow to loro-wasm
  • Exposes replaceWithShallow(frontiers) method in WASM bindings
  • Adds method to decorateMethods allowlist for proper event flushing
  • Adds comprehensive TypeScript test suite (36 test cases)

Implemented replace_with_shallow method in crates/loro-internal/src/loro.rs that:

- Exports a shallow snapshot at the given frontiers
- Creates a temporary document from the snapshot via LoroDoc::from_snapshot
- Calls swap_internals_from to swap the temp doc's internals into self
Also exposed the method in the public API at crates/loro/src/lib.rs with documentation.

Key Design Decisions
- Builds on existing APIs: Uses export(ExportMode::shallow_snapshot) and from_snapshot rather than implementing custom shallow logic, following zxch3n's guidance for correctness.

- Shallow snapshot semantics: The method exports the current state with history trimmed to start from the given frontiers. It does NOT export the state at those frontiers - this is how Loro's shallow snapshot works.

- Peer ID preservation: The original document's peer ID is preserved after the swap (handled by swap_internals_from).

Tests Added
- test_replace_with_shallow_basic - Verifies document becomes shallow with correct state
- test_replace_with_shallow_preserves_peer_id - Peer ID preservation
- test_replace_with_shallow_can_continue_editing - Can edit and sync after replace
- test_replace_with_shallow_at_intermediate_version - Intermediate version handling
- test_replace_with_shallow_cloned_doc_independence - Cloned docs share the swapped state
**`crates/loro-internal/src/utils/subscription.rs`**:
- Added `extract_all()` method to `SubscriberSet` - extracts all active subscriptions with their callbacks, enabling migration

**`crates/loro-internal/src/subscription.rs`**:
- Added `container_id_map: Mutex<FxHashMap<ContainerIdx, ContainerID>>` to `ObserverInner`
- Modified `subscribe()` to track ContainerID for each ContainerIdx
- Added `get_subscription_container_id(idx)` method for retrieval

1. **Mapping at Observer level**: Keeps SubscriberSet generic while adding Loro-specific tracking
2. **ContainerIdx → ContainerID direction**: Enables looking up the stable ID when extracting subscriptions keyed by ContainerIdx
3. **Thread-safe**: Uses Mutex since subscriptions can be created from multiple threads
Fixed subscription preservation in `replace_with_shallow` by:
1. Adding `extract_subscriptions()` and `restore_subscriptions()` methods to `Observer`
2. Modifying `DocState::swap_data_with()` to preserve the `recording_diff` state
3. Integrating subscription migration into `LoroDocInner::replace_with_shallow()`

1. **ContainerID-based migration**: Subscriptions are extracted with their `ContainerID` (stable identifier), then restored by re-resolving to new `ContainerIdx` values after the arena swap
2. **Detached subscriptions**: Restored subscriptions are detached (no unsubscribe handle returned) - the original user handles become orphaned but harmless
3. **Recording state preservation**: The `event_recorder.recording_diff` flag must be preserved across the swap, otherwise events won't be recorded and subscriptions won't fire

- `test_replace_with_shallow_preserves_subscriptions` - verifies root subscriptions fire after replace
- `test_replace_with_shallow_preserves_container_subscriptions` - verifies container-specific subscriptions fire after replace

- [`crates/loro-internal/src/subscription.rs`](crates/loro-internal/src/subscription.rs) - added `extract_subscriptions()`, `restore_subscriptions()`, `container_id_map` tracking
- [`crates/loro-internal/src/state.rs`](crates/loro-internal/src/state.rs) - fixed `swap_data_with()` to preserve recording state
- [`crates/loro-internal/src/loro.rs`](crates/loro-internal/src/loro.rs) - integrated subscription migration in `replace_with_shallow()`
- [`crates/loro-internal/src/utils/subscription.rs`](crates/loro-internal/src/utils/subscription.rs) - added `extract_all()` method to `SubscriberSet`
? "List"
: T extends LoroCounter
? "Counter"
: "Json" {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was an unintentional formatting change. Can fix.

Copy link

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 0683aeeabd

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

let mut other_root_c_idx = other.inner.root_c_idx.lock().unwrap();
std::mem::swap(&mut *self_root_c_idx, &mut *other_root_c_idx);

// Note: We intentionally do NOT swap the str arena or parent_resolver.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Swap string arena when swapping arena contents

unicode_start is recomputed during decode via alloc_str_with_slice (see encoding/outdated_encode_reordered.rs:228-233), but diff calculation later resolves those ranges through oplog.arena.slice_by_unicode (diff_calc.rs:1096-1097). When history before the shallow frontier contains text inserts, keeping the old string arena here means the swapped-in ranges point into the wrong prefix, so checkouts/diffs can return unrelated text after replace_with_shallow. The string arena should be swapped (or rebuilt) along with the rest of the arena contents.

Useful? React with 👍 / 👎.


// Swap the data fields
std::mem::swap(&mut self.frontiers, &mut other.frontiers);
std::mem::swap(&mut self.store, &mut other.store);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Restore ContainerStore config/peer after swapping stores

Swapping the entire ContainerStore also swaps its doc-level conf/peer fields to the defaults from other, but those settings are supposed to stay with the current doc. After replace_with_shallow, changes to self.config (e.g., clear recording into deleted_root_containers at loro.rs:2008-2012) no longer reach the store’s config used by InnerStore::flush (state/container_store/inner_store.rs:136-138), so cleared root containers can reappear in exported snapshots. The swapped store needs to be rebound to self.config/self.peer, not just self.arena.

Useful? React with 👍 / 👎.

self_state.swap_data_with(&mut other_state);

// Swap Arena contents
self.arena.swap_contents_with(&other.arena);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Migrate subscriptions inside swap_internals_from

Container subscriptions are keyed by ContainerIdx (subscription.rs:33), and those indices depend on registration order (arena.rs:182-185). The shallow-snapshot path registers alive containers by iterating an FxHashSet (state.rs:742-747), so the swapped-in arena can assign different indices for the same ContainerID. Because swap_internals_from doesn’t call extract_subscriptions/restore_subscriptions, using this public API directly with active container subscriptions can leave them pointing at stale indices and stop firing correctly.

Useful? React with 👍 / 👎.

container_id_map.insert(idx, id);
let (sub, enable) = self.inner.subscriber_set.insert(Some(idx), callback);
enable();
sub.detach(); // Detach so it stays active

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Preserve unsubscribe handles when restoring subscriptions

Detaching the newly created subscription discards the only unsubscribe hook tied to the active callback. After extract_all drops the original inner subscriptions, user-held Subscription handles will fail the unsubscribe.upgrade() check in Drop (utils/subscription.rs:531-535), so dropping or unsubscribing them after replace_with_shallow no longer cancels the callback and the subscription leaks. The migration needs to keep a live unsubscribe handle rather than detaching it immediately.

Useful? React with 👍 / 👎.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a valid comment @canadaduane

@canadaduane canadaduane force-pushed the loro-wasm-replace-with-shallow branch from 0683aee to 50aef14 Compare January 24, 2026 02:09

/// Cold path: re-resolve the container index from the arena and update the cache.
#[cold]
fn resolve_and_cache(&self) -> ContainerIdx {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should avoid the risk condition here. Read the generation BEFORE calling register_container, or use a mechanism to atomically ensure the generation hasn't changed during resolution. For example:

fn resolve_and_cache(&self) -> ContainerIdx {
    let mut guard = self.cached_idx.lock().unwrap();
    let gen = self.doc.arena_generation();
    let idx = self.doc.arena.register_container(&self.id);
    // Verify generation hasn't changed
    if gen == self.doc.arena_generation() {
        *guard = (idx, gen);
        idx
    } else {
        drop(guard);
        self.resolve_and_cache() // Retry
    }
}

Copy link
Member

@zxch3n zxch3n left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome! There are a few things that still need addressing though

}

struct ObserverInner {
subscriber_set: SubscriberSet<Option<ContainerIdx>, Subscriber>,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we can replace ContainerIdx here with ContainerID directly and avoid using container_id_map?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants