I started writing the security considerations around avoiding nonce
repetition with AES-GCM and concluded that it just can't be done
reliably. 96 bits is impossibly short in the setting we have to deal
with.
Since the server is stateless, the number of nonces that we need to
able to generate without a repetition isn't limited to the number of
requests that a "reasonable" client would make. An adversary can
capture a request and replay it as many times as the server can keep
up with.
So random nonces are right out: 96 bits means a 2**48 birthday bound
and that's far too low.
If servers run in load balanced clusters that share keys, each server
needs its own "namespace". We don't want to rely on admins to manually
assign a unique number to each server in the cluster, so maybe we
could the MAC address of a network interface. That chews up a lot of
bits right there, but maybe it's still okay.
What happens if a server crashes and loses the state of its nonce
counter? DJB's solution to this (I can't find the cite at the moment)
was to maintain a counter value on disk that's always well ahead of
highest nonce you've used. Make sure an updated value gets flushed to
disk before you reach the old value. But this is highly subject to
administrator error: what if the disk fails and its contents are
restored from backup?
Could we use timestamps? That's brittle too. It means servers can't
serve secure traffic until they've synchronized their clock, which
maybe isn't the worst thing but it's hardly ideal. For clients it's
even dicier.
My conclusion is that we just have to move away from AES_128_GCM and
choose something that has a longer nonce and/or is misuse-resistant.
Our choices, unfortunately are a bit limited.
Here's the list of everything that's been assigned an identifier by
IANA: http://www.iana.org/assignments/aead-parameters/aead-parameters.xhtml
All the AES-CCM modes have the same problem as AES-GCM. The AES-OCB
modes have an N_MAX of 15, which means a 2**60 birthday bound. That's
better than 2**48 but still pushing our luck. CHACHA20_POLY1305 has
N_MAX 12, which is particularly gauling, both that XChaCha20 doesn't
seem to be a thing, and that the CFRG chose ChaCha20 rather than
XSalsa20.
That leaves the AES_SIV_CMAC family, which seems to offer all the
properties we want: misuse-resistance and unlimited nonce length. The
construction has a security proof and is well-studied. Unfortunately,
there seem to be very few implementations: the only one I can find is
part of hostapd and I have no idea if it has ever been audited (not
that this would be a huge deal to do -- it's 188 lines of pretty
straightforward code).
As I mentioned to Aanchal earlier today, there's also
https://tools.ietf.org/html/draft-irtf-cfrg-gcmsiv-01 which provides
similar misuse-resistance properties with better performance. N_MAX is
16, which combined with misuse-resistance is enough to make me
comfortable. But it's still a draft.
So for the time being, AES_SIV_CMAC_256 seems like the best choice
we've got and I am going to update the draft to have it replace
AES_128_GCM as the MTI algorithm. I don't know whether this will be
final. Next week I'll reach out to the CFRG for advice.