rANS is one of a family of entropy coding methods that we can use to compress a stream of symbols losslessly. For some set of symbols $s \in S$, with accompanying probabilities $p_s$, Shannon's source coding theorem tells us that a symbol with probability $p_s$ carries exactly $\log_2(1/p_s)$ bits of information. A symbol that shows up half the time costs 1 bit. One that shows up a quarter of the time costs 2 bits. One that shows up 3/8 of the time costs $\log_2(8/3) \approx 1.415$ bits.
Huffman coding handles the first two cleanly. But it assigns each symbol a whole number of bits, so the third symbol ends up taking 2 bits. The result is that we can't compress all the way down to the entropy limit.
rANS is one of a set of arithmetic coding methods that can approach perfect compression, by not forcing whole-bit codes per symbol. Instead, the stream of symbols is encoded into a single integer, by a series of reversible arithmetic operations.
Encoding a sequence into an integer
The code is a single integer, call it the state $x$. Each symbol we encode transforms $x$ through some arithmetic operation. The operation has to be reversible, because the decoder needs to recover both the symbol and the previous state from the new one.
The specific operation rANS uses is:
$$x' = \left\lfloor \frac{x}{f_s} \right\rfloor \cdot M + c_s + (x \bmod f_s)$$
where $f_s$ is the frequency of symbol $s$ in our probability table, $c_s$ is its cumulative frequency (the sum of frequencies of all symbols that come before it in our enumeration), and $M$ is the total frequency $\sum_s f_s$. For our alphabet, that's:
Symbol   f_s   c_s
A        4     0
B        3     4
C        1     7
So $M = 8$.
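In code, one encode step is only a handful of integer operations. A minimal sketch, with `freq` holding the $(f_s, c_s)$ pairs from the table above:

```python
def encode_step(x, s, freq, M=8):
    """Fold symbol s into state x with the rANS encode formula."""
    f, c = freq[s]
    return (x // f) * M + c + (x % f)

freq = {"A": (4, 0), "B": (3, 4), "C": (1, 7)}  # symbol -> (f_s, c_s)
```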
Let me walk through encoding the sequence $A, B, C$ starting from some seed state $x = 13$. I picked 13 arbitrarily for now; we'll talk about why the seed value matters later.
Encode $A$ ($f = 4$, $c = 0$):

$$x' = \left\lfloor \frac{13}{4} \right\rfloor \cdot 8 + 0 + (13 \bmod 4) = 3 \cdot 8 + 0 + 1 = 25$$

Encode $B$ ($f = 3$, $c = 4$):

$$x' = \left\lfloor \frac{25}{3} \right\rfloor \cdot 8 + 4 + (25 \bmod 3) = 8 \cdot 8 + 4 + 1 = 69$$

Encode $C$ ($f = 1$, $c = 7$):

$$x' = \left\lfloor \frac{69}{1} \right\rfloor \cdot 8 + 7 + (69 \bmod 1) = 69 \cdot 8 + 7 + 0 = 559$$
The state walked $13 \to 25 \to 69 \to 559$. After encoding three symbols, our whole "code" is the single integer $559$. The entire message is packed into that number.
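That walk can be replayed mechanically (a sketch, using the frequency table from this section):

```python
freq = {"A": (4, 0), "B": (3, 4), "C": (1, 7)}  # symbol -> (f_s, c_s)
M = 8

x = 13        # seed state
trace = [x]
for s in "ABC":
    f, c = freq[s]
    x = (x // f) * M + c + (x % f)  # one encode step
    trace.append(x)
print(trace)  # [13, 25, 69, 559]
```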
Now the payoff. Let's check how much the state grew in bits. It started at $\log_2(13) \approx 3.70$ bits. It ended at $\log_2(559) \approx 9.13$ bits. We added $5.43$ bits total. Shannon says our three symbols carry $1 + 1.415 + 3 = 5.415$ bits of information. The state grew by almost exactly that much. The per-step growth wasn't exactly 1, 1.415, and 3 bits either. Encoding $A$ grew the state by $\log_2(25/13) \approx 0.94$ bits, less than Shannon's 1 bit for $A$. Encoding $B$ grew it by $\log_2(69/25) \approx 1.46$ bits, a hair over Shannon's 1.415. These individual wobbles are a quirk of encoding integer states: each step rounds slightly up or down depending on where $x$ lands relative to $f_s$. The rANS guarantee is asymptotic, so over many symbols the wobbles average out and the per-symbol cost converges to the Shannon limit.
Importantly, we didn't round anything to the nearest bit. Encoding $B$ grew the state by $1.46$ bits. Fractional bits!
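The accounting is easy to verify directly from the state walk (values taken from the walkthrough above):

```python
import math

states = [13, 25, 69, 559]  # seed, then state after A, B, C
growth = [math.log2(b / a) for a, b in zip(states, states[1:])]
shannon = [math.log2(8 / f) for f in (4, 3, 1)]  # information content of A, B, C

for g, h in zip(growth, shannon):
    print(f"state grew {g:.3f} bits vs. Shannon {h:.3f}")
print(f"total {sum(growth):.3f} vs. Shannon {sum(shannon):.3f}")
```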
Where did they come from? The formula again:
$$x' = \left\lfloor \frac{x}{f_s} \right\rfloor \cdot M + c_s + (x \bmod f_s)$$
To first order, $x' \approx x \cdot \frac{M}{f_s} = x / P(s)$. So $\log_2 x'$ grows by approximately $\log_2(1/P(s))$, which is the Shannon information content of $s$.
One way to think of rANS encoding is as a change of base that smuggles in the symbol along the way. In base $f_s$, the number $x$ has last digit $x \bmod f_s$ and "rest" $\lfloor x / f_s \rfloor$. In base $M$, the encoded $x'$ has the same rest, but the last digit is promoted from a digit in $[0, f_s)$ to a digit in $[0, M)$, specifically $c_s + (x \bmod f_s)$. That promotion is where the symbol is injected: the promoted digit lands inside symbol $s$'s range $[c_s, c_s + f_s)$. The decoder, which we'll build next, reads $x' \bmod M$, sees which range the value falls in, and recovers both the symbol and the remainder.
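A quick check of that digit-promotion view, using the states from the walkthrough: the low base-$M$ digit of each new state falls inside the encoded symbol's slice of $[0, M)$:

```python
freq = {"A": (4, 0), "B": (3, 4), "C": (1, 7)}  # symbol -> (f_s, c_s)
M = 8

for x_new, s in [(25, "A"), (69, "B"), (559, "C")]:
    f, c = freq[s]
    digit = x_new % M          # the promoted low digit
    assert c <= digit < c + f  # lands in s's range [c_s, c_s + f_s)
    print(s, "->", digit, f"in [{c}, {c + f})")
```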
Decoding reverses it
To decode, we invert the encoding. Given a state $x'$, recover the last-encoded symbol and the previous state in three steps:
1. Read the low digit $x' \bmod M$.
2. Find the symbol $s$ whose range contains it, i.e. $c_s \leq x' \bmod M < c_s + f_s$.
3. Reconstruct the previous state:
$$x = \left\lfloor \frac{x'}{M} \right\rfloor \cdot f_s + (x' \bmod M - c_s)$$
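The three steps collapse into a small function. A sketch (the linear search over symbols stands in for a real slot lookup):

```python
def decode_step(x, freq, M=8):
    """Recover the last-encoded symbol and the previous state from x."""
    slot = x % M
    # find the symbol whose range [c_s, c_s + f_s) contains the slot
    s = next(sym for sym, (f, c) in freq.items() if c <= slot < c + f)
    f, c = freq[s]
    return s, (x // M) * f + (slot - c)

freq = {"A": (4, 0), "B": (3, 4), "C": (1, 7)}  # symbol -> (f_s, c_s)
```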
Walking back from our encoded state $x' = 559$:

Decode from $x = 559$: $559 \bmod 8 = 7$, which lies in $C$'s range $[7, 8)$. Symbol is $C$. Previous state: $\lfloor 559/8 \rfloor \cdot 1 + (7 - 7) = 69 \cdot 1 + 0 = 69$.

Decode from $x = 69$: $69 \bmod 8 = 5$, which lies in $B$'s range $[4, 7)$. Symbol is $B$. Previous state: $\lfloor 69/8 \rfloor \cdot 3 + (5 - 4) = 8 \cdot 3 + 1 = 25$.

Decode from $x = 25$: $25 \bmod 8 = 1$, which lies in $A$'s range $[0, 4)$. Symbol is $A$. Previous state: $\lfloor 25/8 \rfloor \cdot 4 + (1 - 0) = 3 \cdot 4 + 1 = 13$.
The state walked $559 \to 69 \to 25 \to 13$, the mirror image of the encoding walk, and we're back at the seed. The decoded symbols appeared in the order $C, B, A$.
Note the order. We encoded $A, B, C$ but decoded $C, B, A$. rANS is last-in-first-out. The encoder behaves like a stack, and the decoder pops from it. In practice this means real rANS encoders process their input message in reverse order, so that the decoder, running forward through the compressed state, produces symbols in the correct forward order.
Until now, our compressed "code" has been a single integer $x'$. That's mathematically tidy but not directly implementable. For a long input, $\log_2 x'$ grows linearly with the message length, so $x'$ itself grows exponentially, and no fixed-width integer can hold it.
The fix is renormalization. The idea is to keep the state in a fixed range $[L, bL)$ for some chosen $L$ and base $b$. Common choices are $b = 2$ for a bit-level compressed stream or $b = 256$ for a byte-level one. Before encoding any symbol, check whether encoding would push the state above $bL$. If so, shift the low base-$b$ digits of $x$ out to an output stream, one at a time, until $x$ is small enough that encoding will leave the state inside $[L, bL)$.
This emitted stream is the actual output of the encoding algorithm. Instead of one giant integer, we get a stream of bits or bytes spilled during encoding, plus a small residual state (always in $[L, bL)$) left over at the end. The compressed output is the concatenation of the two.
Decoding reverses the process. Starting from the residual state, decode a symbol via the formula from the previous section, which shrinks the state by a factor of about $1/P(s)$. If the state has dropped below $L$, refill it by pulling digits from the stream back into the state as low digits, until the state is in $[L, bL)$ again. Repeat until the stream is empty and the state matches the seed.
Putting it together
Here is complete rANS encode and decode, in pseudocode. Pick a state range $[L, bL)$, share the frequency table $\{(f_s, c_s)\}$ and the total $M$ between encoder and decoder, and that's the whole scheme.
```python
def encode(message, freq, M, L, b):
    x = L                                 # seed state
    stream = []
    for s in reversed(message):           # process LIFO
        f, c = freq[s]
        while M * x >= b * L * f:         # renormalize: spill low digit
            stream.append(x % b)
            x //= b
        x = (x // f) * M + c + (x % f)    # apply encode formula
    return x, stream                      # residual state + digits

def decode(x, stream, freq, M, L, b, n):
    slot_to_symbol = build_slot_table(freq)  # e.g. [A,A,A,A,B,B,B,C]
    message = []
    for _ in range(n):
        slot = x % M
        s = slot_to_symbol[slot]
        message.append(s)
        f, c = freq[s]
        x = (x // M) * f + slot - c       # apply decode formula
        while x < L and stream:           # renormalize: refill low digit
            x = x * b + stream.pop()
    return message
```
slot_to_symbol is a lookup table of size $M$ holding which symbol owns each of the $M$ slot positions. For our running alphabet with $M = 8$, it is the list $[A, A, A, A, B, B, B, C]$. This is an $O(1)$ way of finding the correct range; you could also search for it instead.
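One possible sketch of building that table from the frequency dictionary (the names here are illustrative, chosen to match the pseudocode's call):

```python
def build_slot_table(freq):
    """Size-M list mapping each slot in [0, M) to the symbol that owns it."""
    M = sum(f for f, _ in freq.values())
    table = [None] * M
    for s, (f, c) in freq.items():
        for slot in range(c, c + f):
            table[slot] = s
    return table

print(build_slot_table({"A": (4, 0), "B": (3, 4), "C": (1, 7)}))
# ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'C']
```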
The decode step is cheap. Per symbol it is one mod, one table lookup, one integer division, one multiply, one add, one subtract, and possibly a few digit-reads from the stream. No variable-length prefix code, no bit-by-bit decoding. Each symbol takes near-constant work, which is why rANS and its variants are popular in performance-sensitive compressors.
Encoding and decoding are mirror images. Encoding processes the message in reverse and pushes digits onto the stream. Decoding processes the stream in reverse (via pop) and produces the message in forward order. The whole scheme is LIFO throughout: in symbols, and in stream digits.
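As a sanity check, here is the pseudocode transcribed into runnable Python and round-tripped on a short message. The frequency table is the running example; $L = 2^{16}$ and $b = 256$ are arbitrary illustrative constants, with $L$ chosen as a multiple of $M$ so the renormalization bounds work out exactly:

```python
def encode(message, freq, M, L, b):
    x = L                                 # seed state
    stream = []
    for s in reversed(message):           # process LIFO
        f, c = freq[s]
        while M * x >= b * L * f:         # renormalize: spill low digit
            stream.append(x % b)
            x //= b
        x = (x // f) * M + c + (x % f)    # encode formula
    return x, stream                      # residual state + digits

def decode(x, stream, freq, M, L, b, n):
    stream = list(stream)                 # copy; we pop destructively
    slot_to_symbol = [None] * M           # slot -> owning symbol
    for s, (f, c) in freq.items():
        for slot in range(c, c + f):
            slot_to_symbol[slot] = s
    message = []
    for _ in range(n):
        slot = x % M
        s = slot_to_symbol[slot]
        message.append(s)
        f, c = freq[s]
        x = (x // M) * f + slot - c       # decode formula
        while x < L and stream:           # renormalize: refill low digit
            x = x * b + stream.pop()
    return message

freq = {"A": (4, 0), "B": (3, 4), "C": (1, 7)}  # symbol -> (f_s, c_s)
M, L, b = 8, 1 << 16, 256

msg = list("ABACABBAACABAACABACA")
x, stream = encode(msg, freq, M, L, b)
assert decode(x, stream, freq, M, L, b, len(msg)) == msg
```

Decoding with a different table or different $L$, $b$ will not round-trip; encoder and decoder must share all of them.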
That is rANS. A single integer carries fractional bits of information per symbol, arithmetic operations encode and decode reversibly, and renormalization keeps the whole thing to a bounded state plus a digit stream.