It turns out you can encrypt more than 2^32 messages with AES-GCM with a random nonce under certain conditions. It’s still not a good idea, but you can just about do it. #cryptography

Galois/Counter Mode, better known as GCM, is a very popular way of encrypting data with AES to provide authenticated encryption with associated data (AEAD). Phew, that’s a lot of terminology. Let’s just say that AEAD is the gold standard for data encryption, and AES-GCM is one of the most popular choices—by far the most widely used encryption algorithm on the internet.

Whenever you encrypt a message with GCM, you need to use a unique nonce (number used once). For protocols like TLS, this nonce is generated deterministically: essentially a counter is initialised when the connection is first established and incremented after every message. This is mostly fine for stateful protocols like TLS (except when it isn’t), but is incredibly hard to do in a stateless protocol, where servers and clients may be coming and going, crashing, resuming VMs, etc. Reusing a nonce even once for GCM is absolutely catastrophic, as an observer can then trivially recover the authentication sub-key and probably the message content too.

So the solution that most people use is to generate random nonces using a CSPRNG. The problem is that GCM’s nonce is only 96 bits long, which means that the probability of two random nonces being the same (a collision) approaches 50% after around 2^48 messages. 50% is way too high for comfort, so NIST advises keeping the chance of a collision below 2^-32, or about 1 in 4 billion. That threshold is reached after 2^32 messages for GCM: 4 billion again, give or take, and that is the limit NIST imposes for GCM with a random nonce. That sounds like a lot, and for many uses it is, but for some high-frequency use cases, like Google’s front-end servers, that limit can be reached quite quickly. Google have stated (p. 4) that when under DDoS attack, their servers may have to produce “several hundred million [encrypted tokens] per second”. Under that load, they would hit the 2^32 limit in 43 seconds or less! The solution they designed is described in that linked paper: AES-GCM-SIV, which is able to tolerate some number of nonce collisions, but under a weaker notion of security that is only really applicable to that use case (where the data being encrypted is itself random).
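
If you want to sanity-check those numbers, the birthday bound behind them fits in a few lines of Python. This is just the standard q^2/2^(n+1) approximation, which is accurate while the probability is small:

```python
def log2_collision_prob(log2_q: float, nonce_bits: int = 96) -> float:
    # Birthday bound: P(collision) ~= q^2 / 2^(n+1) for q random n-bit
    # nonces (a good approximation while the probability is small).
    return 2.0 * log2_q - (nonce_bits + 1)

# Around 2^48 messages, the collision probability approaches 50%:
print(log2_collision_prob(48))  # -1.0, i.e. 2^-1
# NIST's limit is to keep the probability below 2^-32; 2^32 messages gives:
print(log2_collision_prob(32))  # -33.0, i.e. 2^-33, just inside the limit
```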

But is GCM limited to a 96-bit nonce in the first place? The answer turns out to be no (with some caveats).

Where does the 96-bit limit come from? GCM is, as the name suggests, based on a simpler mode called Counter Mode (or CTR for short). In CTR mode, a block cipher like AES is used to encrypt a sequence of incrementing counter values: 0, 1, 2, 3, … and so on. This produces a sequence of pseudorandom data (the keystream) that is then bitwise-XORed with the message to encrypt it. AES is a 128-bit block cipher, so the input counter is encoded as 128 bits (16 bytes), and each encryption produces 16 bytes of keystream. This means that a counter value is consumed for each 16-byte block of data encrypted, not just once per message. To ensure that no two blocks ever collide, GCM splits the 128-bit counter into two parts: a 96-bit per-message nonce, and a 32-bit block counter. Each message uses a unique nonce, and the block counter is reset to 1 each time (that first counter block is reserved internally for computing the authentication tag) and incremented for each block, allowing a maximum of 2^32 × 16 bytes = 64 GiB per message. This allows the application to simply increment the nonce for each message, without having to keep track of how many blocks of data have been encrypted.
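
Here’s a sketch of that 96-bit construction in Python (the function names are my own; see NIST SP 800-38D for the real definition):

```python
def initial_counter(nonce: bytes) -> bytes:
    # 96-bit nonce path: initial counter block = nonce || 0x00000001.
    assert len(nonce) == 12
    return nonce + (1).to_bytes(4, "big")

def counter_block(nonce: bytes, i: int) -> bytes:
    # Keystream block i encrypts the initial counter incremented i times;
    # only the low 32 bits increment (and wrap), so the nonce part of the
    # 128-bit block never changes within a message.
    ctr = int.from_bytes(initial_counter(nonce), "big")
    hi, lo = ctr >> 32, (ctr + i) & 0xFFFFFFFF
    return ((hi << 32) | lo).to_bytes(16, "big")

nonce = bytes(12)  # all-zero nonce, purely for illustration
print(counter_block(nonce, 0).hex())  # ends in ...00000001
print(counter_block(nonce, 1).hex())  # ends in ...00000002
# A 32-bit block counter caps each message's keystream:
print(2**32 * 16 // 2**30, "GiB")
```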

That’s fine for deterministic nonces, but for random nonces segregating the counter space in this way is counter-productive (pun intended!). Unless you are encrypting really large messages (approaching the 64 GiB limit), it is generally better to randomise the entire 128-bit counter space. That is, you pick a random starting point in the counter space and then increment that initial counter for each block of the message. This will create “runs” of used-up counters, spread uniformly around the full 2^128 space. Although it may initially seem like this would make collisions more likely compared to having a separate block counter, in fact 2^128 is so enormously bigger than 2^96 that the chance of two “runs” of counters overlapping is vanishingly small until you’ve encrypted a very large number of blocks. In fact, you would hit NIST’s 2^-32 probability limit after encrypting around 2^48 blocks (about 281 trillion). For small messages, of just a few blocks in length, as in Google’s case, that allows you to encrypt close to 2^48 messages. For example, suppose those messages are all ≤ 128 bytes in length. In that case, you could encrypt 2^45 messages before you hit the limit, which means Google would only need to change the key every 4 days or so, even under large-scale DDoS attack.
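
To make the “runs” picture concrete, here’s a small sketch: two runs of counters collide exactly when one run’s starting point lands inside the other, plus the rough block-counting from above:

```python
MOD = 2**128

def runs_overlap(s1: int, l1: int, s2: int, l2: int) -> bool:
    # Two runs of counters [s, s + l) collide (mod 2^128) iff one run's
    # start falls inside the other run.
    return (s2 - s1) % MOD < l1 or (s1 - s2) % MOD < l2

print(runs_overlap(1000, 8, 1004, 8))  # True: the runs overlap
print(runs_overlap(1000, 8, 5000, 8))  # False: far apart

# Rough counting: NIST's 2^-32 limit is hit around 2^48 total blocks,
# so 128-byte (8-block) messages allow about 2^48 / 2^3 = 2^45 of them:
print(2**48 // 8 == 2**45)  # True
```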

OK, very well, you might be saying, but GCM doesn’t support randomising the entire 128-bit counter space, so this is all academic, isn’t it? Well, it turns out that it does. GCM has the strange property that it allows the nonce (IV) to be any length, not just exactly 96 bits. If it is exactly 96 bits, then the initial counter value becomes the nonce concatenated with the 32-bit initial block counter (set to 1). If it is not 96 bits, then the value is first hashed with GHASH, GCM’s universal hash function, and the full 128-bit output becomes the initial counter value.
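
That derivation can be sketched in Python. This is illustrative only: in real GCM the hash key H is AES_K(0^128) under the message key K, and I’m just plugging in a fixed placeholder value here, so don’t expect interoperable output:

```python
R = 0xE1 << 120  # GCM's reduction polynomial constant (bit-reflected)

def gf_mult(x: int, y: int) -> int:
    # Multiplication in GF(2^128) as specified for GCM (MSB-first bits).
    z, v = 0, x
    for i in range(127, -1, -1):
        if (y >> i) & 1:
            z ^= v
        v = (v >> 1) ^ R if v & 1 else v >> 1
    return z

def ghash(h: int, data: bytes) -> int:
    # GHASH: a polynomial hash over 16-byte blocks, keyed by h.
    assert len(data) % 16 == 0
    y = 0
    for i in range(0, len(data), 16):
        y = gf_mult(y ^ int.from_bytes(data[i:i + 16], "big"), h)
    return y

def derive_j0(h: int, iv: bytes) -> bytes:
    # Non-96-bit IV path: J0 = GHASH_H(IV || zero padding || 64-bit bit-length),
    # giving a full 128-bit initial counter value.
    pad = b"\x00" * (-len(iv) % 16)
    data = iv + pad + (8 * len(iv)).to_bytes(16, "big")
    return ghash(h, data).to_bytes(16, "big")

# Placeholder hash key; real GCM computes h = AES_K(0^128).
H = 0x66E94BD4EF8A2C3B884CFA59CA342B2E
print(derive_j0(H, bytes(16)).hex())  # a 16-byte IV spreads over all 128 bits
```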

However, we can’t just assume that the output of GHASH is ideal with respect to collisions, and in fact the analysis in this paper shows that it is not. In the worst case, where messages can be up to 2^32 − 2 blocks long, there is an additional factor of 2^22 in the security advantage to an attacker. However, if we restrict ourselves to shorter messages then the impact is less severe. For example, taking our example of 128-byte messages and 16-byte nonces, then the probability of a collision after applying GHASH is ≤ 2^9/2^128 = 2^-119. In that case, you can encrypt about 2^43.5 messages (about 12.4 trillion) before you hit NIST’s limit, allowing Google to reuse a key for about 34.6 hours.
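
Again, the arithmetic is easy to check (using the same rough q^2 · p counting as before, and assuming 10^8 messages per second for “several hundred million”):

```python
# Pairwise collision probability after GHASH, for 128-byte messages and
# 16-byte random nonces: 2^9 / 2^128 = 2^-119.
log2_p = 9 - 128

# With q messages there are roughly q^2 pairs; keep q^2 * p below 2^-32:
log2_q = (-32 - log2_p) / 2
print(log2_q)  # 43.5

messages = 2 ** log2_q
print(round(messages / 1e12, 1), "trillion messages")  # 12.4
# At ~10^8 messages per second, that is how long one key lasts:
print(round(messages / 1e8 / 3600, 1), "hours")  # 34.6
```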

Even in the worst case, with messages up to the maximum 64 GiB in size, this analysis shows that you can still encrypt 2^36 messages with a random 128-bit nonce: 16 times as many as with a random 96-bit nonce. For more realistic message sizes, the advantage is greater.

So, technically you can use random nonces with AES-GCM with better bounds than folklore would have you believe. Would I recommend it? No. Even if it technically works, it’s a highly non-standard usage and against the letter of NIST’s recommendations. I wouldn’t spend innovation tokens on this kind of thing. Frankly, if I needed to do something like this, I would derive a fresh key for each message from a long nonce, in the way that XSalsa20 does: using a real pseudorandom function. Libsodium even exposes this functionality as a general-purpose facility.
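
To illustrate that derive-a-fresh-key pattern: here’s a sketch using HMAC-SHA-256 as the PRF. This is not what XSalsa20 actually does internally (it uses the HSalsa20 function), but the structure is the same: a long random nonce is stretched into a per-message key, so the underlying cipher’s short nonce can never repeat under the same key.

```python
import hashlib
import hmac
import secrets

def derive_subkey(key: bytes, nonce24: bytes) -> bytes:
    # Hypothetical sketch: stretch a 24-byte random nonce into a fresh
    # 256-bit per-message key using HMAC-SHA-256 as the PRF.
    assert len(nonce24) == 24
    return hmac.new(key, nonce24, hashlib.sha256).digest()

master = secrets.token_bytes(32)
n1, n2 = secrets.token_bytes(24), secrets.token_bytes(24)
print(derive_subkey(master, n1) != derive_subkey(master, n2))  # True
# Each derived key encrypts a single message, so the per-message cipher
# nonce can even be a fixed value without any risk of nonce reuse.
```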