Skip to content
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
3 changes: 2 additions & 1 deletion Cargo.toml
Original file line number Diff line number Diff line change
@@ -1,7 +1,7 @@
[package]
name = "signet-libmdbx"
description = "Idiomatic and safe MDBX wrapper"
version = "0.5.0"
version = "0.6.0"
edition = "2024"
rust-version = "1.92"
license = "MIT OR Apache-2.0"
Expand Down Expand Up @@ -41,6 +41,7 @@ thiserror = "2.0.18"
tracing = "0.1.44"

dashmap = { version = "6.1.0", features = ["inline"], optional = true }
auto_impl = "1.3.0"

[features]
default = []
Expand Down
202 changes: 202 additions & 0 deletions ISSUE_BENCH_REGRESSION.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,202 @@
# Issue: Cursor and Iterator Benchmark Regression

the baseline `cursor::traverse::raw` is 606.19 ns.

this branch causes `cursor::traverse::iter::single_thread` performance to
degrade from 784.02 ns to 995.32 ns

and `cursor:traverse::iter` perf to degrade from 847.02 ns to 1,015.2 tns

Given that both of these benchmarks have similar degradation, the root cause
is likely the same.

## Summary

Cursor and iterator benchmarks have regressed. The root cause is mutex lock
acquisition on every cursor operation for synchronized transactions.

## Root Cause

### Hot Path Analysis

Every iterator step calls `execute_op` (iter.rs:215-238):

```rust
fn execute_op(&self) -> ReadResult<KvOpt<'tx, A, Key, Value>> {
let access = self.cursor.access();
access.with_txn_ptr(|txn| { // ← Called for EVERY item
ffi::mdbx_cursor_get(...)
})?
}
```

For `PtrSyncInner` (synchronized transactions), `with_txn_ptr` acquires a
mutex (access.rs:643-653):

```rust
fn with_txn_ptr<F, R>(&self, f: F) -> MdbxResult<R> {
let timeout_flag = self.lock(); // ← MUTEX LOCK per item
if *timeout_flag {
return Err(MdbxError::ReadTransactionTimeout);
}
let result = f(self.txn);
Ok(result)
}
```

### Impact

With 100 items in the benchmark:

- **100+ mutex lock/unlock cycles** in a tight loop
- Each lock involves atomic operations and potential thread contention
- Completely dominates the actual FFI call time

### Why single_thread Benchmarks Are Faster

`TxUnsync` uses `RoGuard` which has no mutex (access.rs:396-415):

```rust
// RoGuard::with_txn_ptr - no mutex, just Arc operations
fn with_txn_ptr<F, R>(&self, f: F) -> MdbxResult<R> {
if let Some(strong) = self.try_ref() { // Arc upgrade (atomic, no mutex)
return Ok(f(strong.ptr));
}
Err(MdbxError::ReadTransactionTimeout)
}
```

Arc atomic operations are much cheaper than mutex lock/unlock.

### Comparison with Raw FFI

The raw benchmark (cursor.rs:219-238) shows baseline performance:

```rust
while mdbx_cursor_get(cursor, &mut key, &mut data, MDBX_NEXT) == 0 {
// No locking, no checks, just FFI
}
```

## Benchmark Structure

```
benches/cursor.rs:
cursor::traverse::iter - sync tx, uses PtrSyncInner (slow)
cursor::traverse::iter_x3 - sync tx, 3 iterations
cursor::traverse::for_loop - sync tx, explicit loop
cursor::traverse::raw - raw FFI baseline (fast)
cursor::traverse::iter::single_thread - unsync tx, RoGuard (faster)
cursor::traverse::iter_x3::single_thread - unsync tx, 3 iterations
cursor::traverse::for_loop::single_thread - unsync tx, explicit loop
```

## Potential Fixes

### Option 1: Guarded Iteration

Hold lock for entire iteration session:

```rust
impl Iter {
fn with_guard<F, R>(&mut self, f: F) -> MdbxResult<R>
where
F: FnOnce(&mut Self) -> R,
{
let _guard = self.cursor.access().try_guard()?;
Ok(f(self))
}
}

// Usage: hold lock, iterate all items
cursor.iter().with_guard(|iter| {
for item in iter { ... }
})
```

### Option 2: Cached Validity Check

Cache timeout check result per iteration batch:

```rust
struct Iter {
// ... existing fields
validity_token: Option<ValidityToken>,
}

impl Iter {
fn execute_op(&mut self) -> ReadResult<...> {
// Revalidate periodically, not every call
if self.validity_token.is_none() || self.validity_token.expired() {
self.validity_token = Some(self.cursor.access().validate()?);
}
// Fast path: use cached pointer
unsafe { ffi::mdbx_cursor_get(...) }
}
}
```

### Option 3: Lock-Free Fast Path for RO

For read-only transactions that haven't timed out, skip locking:

```rust
fn with_txn_ptr<F, R>(&self, f: F) -> MdbxResult<R> {
// Fast path: check timeout flag without lock (relaxed read)
if !self.timeout_flag.load(Ordering::Relaxed) {
return Ok(f(self.txn));
}
// Slow path: acquire lock, recheck
let timeout_flag = self.lock();
if *timeout_flag {
return Err(MdbxError::ReadTransactionTimeout);
}
Ok(f(self.txn))
}
```

**Warning**: This has subtle correctness implications. The monitor sets the
flag while holding the lock, so a relaxed read could see stale data. Would
need careful analysis.

### Option 4: Batch Operations

Add batch cursor operations that acquire lock once:

```rust
impl Cursor {
fn collect_n<Key, Value>(&mut self, n: usize) -> MdbxResult<Vec<(Key, Value)>> {
self.access().with_txn_ptr(|txn| {
let mut results = Vec::with_capacity(n);
for _ in 0..n {
// All FFI calls under single lock
match unsafe { ffi::mdbx_cursor_get(...) } {
0 => results.push(decode(...)),
MDBX_NOTFOUND => break,
err => return Err(err),
}
}
Ok(results)
})
}
}
```

## Key Files

- `src/tx/iter.rs:215-238` - execute_op, the hot path
- `src/tx/access.rs:643-653` - PtrSyncInner::with_txn_ptr (mutex)
- `src/tx/access.rs:396-415` - RoGuard::with_txn_ptr (no mutex)
- `benches/cursor.rs` - benchmark definitions

## Recommendation

Option 1 (guarded iteration) is likely the cleanest solution:

- Maintains correctness guarantees
- Single lock acquisition per iteration session
- Compatible with existing API (can be opt-in)
- Clear semantics: "I'm iterating, don't timeout during this"

The guard would need to prevent timeout while held, similar to how
`SyncTxGuard` already works.
126 changes: 126 additions & 0 deletions ISSUE_READER_CLEANUP.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,126 @@
# Issue: Readers Not Being Cleaned Up

## Summary

Read-only transactions (readers) are not being properly cleaned up. This
investigation documents the reader lifecycle and potential causes.

## Architecture

### Type Hierarchy

```
TxSync<K>
└── inner: Arc<SyncInner<K>>
├── ptr: PtrSync<K>
│ └── inner: Arc<PtrSyncInner<K>> ← actual txn pointer lives here
└── db_cache: SharedCache
```

### Reader Tracking (read-tx-timeouts feature)

When `read-tx-timeouts` is enabled, the `TxnManager` maintains:

- `active: DashMap<usize, (PtrSync<RO>, Instant, Option<Backtrace>)>` - currently
active RO transactions
- `timed_out_not_aborted: DashSet<usize>` - transactions that timed out but
user hasn't dropped yet

## Lifecycle

### Creation (sync.rs:567-585)

```rust
impl TxSync<RO> {
pub(crate) fn new(env: Environment) -> MdbxResult<Self> {
// 1. Create raw MDBX transaction via FFI
mdbx_txn_begin_ex(..., &mut txn, ...);

// 2. Wrap in TxSync
let this = Self::new_from_ptr(env, txn);

// 3. Clone PtrSync and store in tracking map
#[cfg(feature = "read-tx-timeouts")]
this.env().txn_manager().add_active_read_transaction(txn, this.inner.ptr.clone());
// ^^^^^^^^^^^^^^^^^^^^
// Arc<PtrSyncInner> refcount becomes 2

Ok(this)
}
}
```

### Expected Cleanup Path

1. User drops `TxSync`
2. `Arc<SyncInner>` refcount → 0
3. `SyncInner::drop` runs (sync.rs:325-340):
- Calls `remove_active_read_transaction(ptr)` → removes from map
- Map's `PtrSync` clone is dropped → Arc refcount decreases
4. `SyncInner.ptr` (PtrSync) is dropped → Arc refcount → 0
5. `PtrSyncInner::drop` runs (access.rs:667-692):
- Calls `remove_active_read_transaction` again (no-op, already removed)
- Calls `mdbx_txn_abort(ptr)` → **reader slot released**

### Timeout Path (txn_manager.rs:231-318)

The timeout monitor thread:

1. Iterates `active` transactions
2. For transactions exceeding `max_duration`:
- Acquires lock on `PtrSyncInner`
- Calls `mdbx_txn_reset(ptr)` - **parks reader, does NOT release slot**
- Sets timeout flag to `true`
- Removes from `active` map
- Adds to `timed_out_not_aborted` set

**Critical**: `mdbx_txn_reset` parks the reader but the reader slot remains
occupied. Only `mdbx_txn_abort` releases it.

## Potential Issues

### 1. Timed-out transactions with live handles

If a transaction times out but the user still holds `TxSync`:

- Monitor calls `mdbx_txn_reset` (reader parked, slot occupied)
- Monitor removes from `active`, adds to `timed_out_not_aborted`
- User still holds `TxSync` → `PtrSyncInner` still alive
- Reader slot stays occupied until user drops `TxSync`

If users leak `TxSync` handles (never drop them), reader slots accumulate.

### 2. Arc reference in map prevents cleanup

The map stores `PtrSync<RO>` which holds `Arc<PtrSyncInner<RO>>`. If
`SyncInner::drop` fails to remove from the map for any reason, the Arc
reference keeps `PtrSyncInner` alive, preventing `mdbx_txn_abort`.

### 3. Double-removal timing

Both `SyncInner::drop` and `PtrSyncInner::drop` call
`remove_active_read_transaction`. This is intentional (belt and suspenders)
but relies on the map operations being idempotent.

## Key Files

- `src/tx/sync.rs:567-585` - TxSync<RO>::new, adds to tracking
- `src/tx/sync.rs:325-340` - SyncInner::drop, removes from tracking
- `src/tx/access.rs:667-692` - PtrSyncInner::drop, calls mdbx_txn_abort
- `src/sys/txn_manager.rs:156-176` - add/remove tracking methods
- `src/sys/txn_manager.rs:231-318` - timeout monitor thread

## Investigation Steps

1. Add tracing to `add_active` and `remove_active` to verify calls match up
2. Check `Arc::strong_count` on `PtrSyncInner` at key points
3. Monitor `timed_out_not_aborted.len()` over time
4. Check if `TxSync` handles are being leaked (never dropped)
5. Verify `mdbx_txn_abort` is actually being called in `PtrSyncInner::drop`

## Questions to Answer

- Are `TxSync` handles being leaked somewhere?
- Is `SyncInner::drop` running for all transactions?
- Is `remove_active_read_transaction` succeeding?
- What's the state of `timed_out_not_aborted` over time?
4 changes: 2 additions & 2 deletions benches/transaction.rs
Original file line number Diff line number Diff line change
Expand Up @@ -55,7 +55,7 @@ fn bench_get_rand_sync(c: &mut Criterion) {
b.iter(|| {
let mut i = 0usize;
for key in &keys {
i += *txn.get::<ObjectLength>(db.dbi(), key.as_bytes()).unwrap().unwrap();
i += *txn.get_owned::<ObjectLength>(db.dbi(), key.as_bytes()).unwrap().unwrap();
}
black_box(i);
})
Expand All @@ -75,7 +75,7 @@ fn bench_get_rand_unsync(c: &mut Criterion) {
b.iter(|| {
let mut i = 0usize;
for key in &keys {
i += *txn.get::<ObjectLength>(db.dbi(), key.as_bytes()).unwrap().unwrap();
i += *txn.get_owned::<ObjectLength>(db.dbi(), key.as_bytes()).unwrap().unwrap();
}
black_box(i);
})
Expand Down
Loading