Add a variable-length integer encoder/decoder #744

lhecker · 2026-01-20T18:46:35Z

For now, this module has no purpose.
I wrote it as an experiment for encoding VM instructions.

DHowett

How do you feel about non-canonical overlong encodings (which is a problem UTF-8 also suffers from)?

That would be something like encoding 0x01 as 0x17 0x00 0x00 0x00, if I have parsed your description correctly.
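To make the overlong case concrete, here is a minimal decoder sketch. The byte layout is an assumption reverse-engineered from the `0x17 0x00 0x00 0x00` example above, not taken from the PR: the trailing 1 bits of the first byte select the length, and the remaining bits hold the value in little-endian order. The `decode` function and the single-byte `u32::MAX` sentinel are hypothetical.

```rust
/// Hypothetical prefix-varint decoder. The layout is ASSUMED, not the PR's:
///   xxxxxxx0 -> 1 byte,   7 value bits
///   xxxxxx01 -> 2 bytes, 14 value bits
///   xxxxx011 -> 3 bytes, 21 value bits
///   xxxx0111 -> 4 bytes, 28 value bits
///   xxxx1111 -> 1 byte,  decodes to u32::MAX (assumed sentinel)
/// Returns the decoded value and the number of bytes consumed,
/// or `None` if the input is empty or truncated.
pub fn decode(bytes: &[u8]) -> Option<(u32, usize)> {
    let first = *bytes.first()?;
    let ones = first.trailing_ones() as usize;
    if ones >= 4 {
        return Some((u32::MAX, 1));
    }
    let len = ones + 1;
    let src = bytes.get(..len)?;
    let mut buf = [0u8; 4];
    buf[..len].copy_from_slice(src);
    // Shift out the tag: `ones` one bits plus the terminating zero bit.
    Some((u32::from_le_bytes(buf) >> len, len))
}
```

Under this assumed layout, `[0x02]`, `[0x05, 0x00]`, `[0x0B, 0x00, 0x00]` and `[0x17, 0x00, 0x00, 0x00]` all decode to 1, which is exactly the kind of non-canonical encoding the question is about. The decoder needs no loop over continuation bits, because the length is known after inspecting the first byte.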

lhecker · 2026-01-20T20:48:03Z

Yeah, I thought about that. SQLite's varint, for instance, doesn't allow overlong encodings and in exchange gets a more compact encoding. I intentionally decided against that, partly because decoding becomes faster, and partly because non-canonical encodings are quite beneficial:

When an LSH instruction jumps further down into the instruction stream, the address offset depends on the number of bytes in between. That number depends on the encoding size of all the varints in between. And those in turn could be downward jumps which, again, depend on the encoding size of other varints.

I'm sure I'll come up with a solution to this recursive problem at some point, if I want to. But I'm fairly certain that having non-canonical encodings will allow for easy "tie breakers" for any such algorithm.
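As an illustration of why accepting overlong encodings helps here, the sketch below uses plain fixed-width backpatching, a standard assembler technique and not necessarily what lsh does or will do. A forward jump reserves a 4-byte slot, and the target is written into that slot later at full width, so no instruction ever changes size. The byte layout (trailing 1 bits of the first byte select the length) is an assumption, and `encode_padded`, `emit_jump_placeholder`, `patch_jump` and the opcode value are all hypothetical.

```rust
/// Writes `value` into exactly `out.len()` bytes (1..=4), padding with zero
/// bits if a shorter encoding would have sufficed. ASSUMED layout: the low
/// `width` bits of the first byte are (width - 1) one bits followed by a
/// zero bit, and the value sits above them in little-endian order.
/// Returns false if the width is invalid or the value needs more bits.
pub fn encode_padded(value: u32, out: &mut [u8]) -> bool {
    let width = out.len();
    if !(1..=4).contains(&width) || (value as u64) >= 1u64 << (7 * width) {
        return false;
    }
    let tag = (1u32 << (width - 1)) - 1;
    let word = (value << width) | tag;
    out.copy_from_slice(&word.to_le_bytes()[..width]);
    true
}

/// Emits a hypothetical jump: one opcode byte followed by a 4-byte slot that
/// currently encodes 0. Returns the offset of the slot for later patching.
pub fn emit_jump_placeholder(code: &mut Vec<u8>, opcode: u8) -> usize {
    code.push(opcode);
    let slot = code.len();
    code.extend_from_slice(&[0x07, 0x00, 0x00, 0x00]);
    slot
}

/// Overwrites a previously reserved 4-byte slot with the final target.
/// The slot keeps its width, so every other address stays valid.
pub fn patch_jump(code: &mut [u8], slot: usize, target: u32) -> bool {
    match code.get_mut(slot..slot + 4) {
        Some(out) => encode_padded(target, out),
        None => false,
    }
}
```

This trades space for simplicity: every forward jump costs four operand bytes even when one would do. A smarter size-assignment pass could use the same property more sparingly, keeping an operand wider than strictly necessary whenever shrinking it would invalidate other offsets.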

DHowett · 2026-01-20T22:35:07Z

crates/stdext/src/varint.rs

+// Copyright (c) Microsoft Corporation.
+// Licensed under the MIT License.
+
+//! Variable-length `u32` encoding and decoding, with efficient storage of `u32::MAX`.


I guess I don't understand - it's not a u32 encoding, it's a u28 encoding with a special case for u32::MAX and a pretty significant gap between 268435455 and 4294967295.

lhecker

Yeah, that's fair. Perhaps I should move this into the lsh project now that I made it a library. 🤔 The reason it's a "u28" is that lsh really doesn't need values ≥ 2^28, while a compact encoding for one value above that range is still useful (it's used for setting the input offset to its maximum when matching a `.*`).
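The value range discussed here can be sketched as a length function. The range boundaries follow from 7 payload bits per encoded byte, which is consistent with the u28 limit of 268435455 mentioned above; the assumption that `u32::MAX` occupies a single byte, as well as the names `U28_MAX` and `encoded_len`, are hypothetical and not taken from the PR.

```rust
/// Largest value representable in 28 bits: 268_435_455.
pub const U28_MAX: u32 = (1 << 28) - 1;

/// Shortest encoded length in bytes for `value`, assuming 7 payload bits per
/// encoded byte. Returns `None` for values inside the gap between `U28_MAX`
/// and `u32::MAX`, which such a format cannot represent.
pub fn encoded_len(value: u32) -> Option<usize> {
    match value {
        0..=0x7F => Some(1),
        0x80..=0x3FFF => Some(2),
        0x4000..=0x1F_FFFF => Some(3),
        0x20_0000..=U28_MAX => Some(4),
        // ASSUMPTION: the special case is stored as a one-byte sentinel.
        u32::MAX => Some(1),
        _ => None,
    }
}
```

Returning `None` for the gap makes the limitation explicit at the API boundary, so callers cannot silently pass a value such as 268435456 and have it truncated.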

Add a variable-length integer encoder/decoder

7675bd5

DHowett reviewed Jan 20, 2026

Fix build

8198b20

lhecker enabled auto-merge (squash) January 20, 2026 20:56

DHowett reviewed Jan 20, 2026

DHowett approved these changes Jan 21, 2026
