python3 -m venv venv
source venv/bin/activate
pip install -m requirements.txt
-
Introduction to XOR between strings
-
Ahh. Crypto! To break it, loop through all 256 possible bytes to find which of the keys produces the decrypted text that looks the most like English.
frequency.py has code to score how much a string looks like English (testing how close to a certain distribution our data is).
xor.py has code to encrypt/decrypt using a single-byte XOR cipher.
-
4. Detect single-character XOR
Very similar to challenge #3. Break all ciphertexts, pick the one with the best "English score".
-
5. Implement repeating-key XOR
Simply cycle the key when XOR-ing with the plaintext.
-
Interesting! Breaking it amounts to breaking a single-byte XOR cipher for the 1st character of every repeated key, then for the 2nd, the 3rd, etc.
Guessing the keysize is done by picking the keysize that minimizes the normalized hamming distance (
distance / keysize
) between "blocks" of the repeated key. This is because we can expect 2 different blocks encrypted with the right keysize to have similar bit patterns (e.g. matching characters), so minimizing the normalized hamming distance gives us the keysize that produces the most similar blocks. We need to normalize because e.g. a keysize of 4 would have 2 times as many bits as a keysize of 2. -
Intro to using the pycrypto API. aes.py has the code that handles AES encryption/decryption.
-
ECB mode encrypts the blocks independently. This means that two identical blocks of plaintext at different positions will produce the same ciphertext block. All that we need to do is find the ciphertext that has the most (or any, really) duplicate ciphertext blocks, which is very unlikely with 16 bytes of otherwise random data.
-
We pad the plaintext so that it has a length that is a multiple of the blocksize (e.g. 16). If it already is, we add a full block of padding. The padding byte to use is equal to the length of the padding.
Examples:
pad('0123456789') = '0123456789' + '666666' pad('') = '\x10' * 16
-
Wikipedia's image on the matter is enough of a description to implement this. Basically, every encrypted block is XORed with the previous block (or the IV if this is the first block). The result is that 2 identical plaintext blocks will no longer automatically encrypt to the same ciphertext block (compared to ECB).
-
11. An ECB/CBC detection oracle
When using ECB, two identical plaintext blocks will encrypt to the same ciphertext block.
Therefore, a block that contains duplicate plaintext blocks will contain duplicate ciphertext blocks once encrypted.
Our oracle checks if the ciphertext contains duplicate blocks. If it does, we consider the ciphertext to be encrypted using ECB. Otherwise, we consider that it used CBC.
-
12. Byte-at-a-time ECB decryption (Simple)
Detecting the blocksize is relatively easy. We start by noting
len(ciphertext)
. We continue adding prefix bytes until we notice a change inlen(ciphertext)
. When that happens, it means that we have crossed a block boundary and we know the blocksize to belen(after) - len(start)
.Using the ECB property that we exploited earlier (identical plaintext blocks encrypt to identical ciphertext blocks), we can bruteforce individual bytes at the end of blocks.
For example, assume that we want to bruteforce the (unknown) plaintext:
hello this is a test
.At first, we want to isolate the first character (
h
, but we don't know that yet) at the end of a block, through a prefix ofA
s. Like this:AAAAAAAAAAAAAAAh
We get the encrypted result of this (call this
C0
). We can try all possible bytes that could follow our prefix, like the following:AAAAAAAAAAAAAAAa AAAAAAAAAAAAAAAb AAAAAAAAAAAAAAAc ...
Eventually, we will get an encrypted block that will match
C0
, and thus we will know the first byte of our plaintext.We can bruteforce the second byte (
e
) similarly. We start by gettingC1
by encrypting:AAAAAAAAAAAAAAhe
Then by trying all possible bytes:
AAAAAAAAAAAAAAha AAAAAAAAAAAAAAhb AAAAAAAAAAAAAAhc ...
Until we find a block that encrypts to
C1
. And so on, for all bytes in the first block.For the following blocks, the idea is the same, but a little more thought must be put into where we get our
Ci
and what should be put as a prefix.The block
Ci
is merely the index of the block we already are at with the next byte that we want to bruteforce. The reason is quite simple: with our padding, we will only be pushing that byte at the end of its current block, not produce a next block.The padding only has to be enough to put the next bruteforced byte at the end of a block. When bruteforcing, however, the bytes before the tried byte must be the last 15 (blocksize - 1) known bytes, to fit
Ci
.This is all clearer through an example:
plaintext: hello this is a (...unknown....) |---block 0----||---block 1----| bruteforcing: t (unknown yet) We will pad with As: padded: AAAAAAAAAAAAAAAhello this is a t(...unknown....) |---block 0----||---block 1----||---block 2----| Ci = block 1 Bruteforce tries: ello this is a a ello this is a b ello this is a c ...
We repeat the steps up until the point where we reach the last block. Then, we will eventually hit the padding.
If the padding happens to be 1 byte, we will successfully find it (this is just like bruteforcing a normal byte that just happens to be
0x01
).If the padding is anything else, we will first successfully "bruteforce" a
0x01
in the plaintext (because we will be padding the plaintext until it is only missing a byte, so it will be padded with0x01
). Then, we will try to bruteforce the next byte, but fail. The reason is that the padding will now be[0x02, 0x02]
, while we are trying[0x01, 0x??]
. Once that happens, we know that we have successfully bruteforced the whole plaintext. We can remove our undesirable bruteforced0x01
padding and enjoy reading our plaintext. ๐ -
Using the same exploitable ECB property as before, we craft multiple messages and pick-and-choose the parts that we want.
Blocks A:
email=aaaaaaaaaa (A0) adminPPPPPPPPPPP (A1) (P is the padding that will be needed as a final block) &uid=10&role=use (A2) rPPPPPPPPPPPPPPP (A3)
Blocks B:
email=aaaaaaaaaa (B0) aaa&uid=10&role= (B1) userPPPPPPPPPPPP (B2)
By crafting the ciphertext
B0 + B1 + A1
, we get:email=aaaaaaaaaa (B0) aaa&uid=10&role= (B1) adminPPPPPPPPPPP (A1)
Which grants us admin access to log in.
-
14. Byte-at-a-time ECB decryption (Harder)
This attack is the same as the challenge #12, but with some required initial work, offsets and padding to apply.
We first need to find out how long the prefix is. We do this by generating 2 blocks of fixed data (e.g.
[0xA] * 32
) and gradually increasing the size of the fixed data until we find 2 neighbour duplicate blocks in the ciphertext. This indicates that we have fully padded the last block of the prefix and that we have produced two blocks of our own input after that. To make sure that we don't just have identical blocks because the prefix happened to end with our fixed value (therefore fooling us into thinking that we have padded 1 more byte than we really have), we can try with 2 different fixed values, e.g.[0] * 32
and[1] * 32
.Then, one can use the index where the duplicate blocks begin to find where the first block after the prefix starts. With that information, we can find the amount of padding that was required to pad the prefix to a multiple of blocksize through
len(fixed_data) - 2 * blocksize
. The length of the prefix is thenindex of first of the duplicates - padding length
.With the length of the prefix, we just use our algorithm from challenge #12, but prefixing our input with some padding to pad the prefix to a blocksize multiple. We also need to offset any index in the produced ciphertext by the amount of blocks in the prefix.
-
We get the last byte of the plaintext as
n
and make sure that the lastn
bytes of the plaintext are equal ton
. If it is not the case, raise an exception. -
Start out by encrypting a normal token for a block of 16 bytes. This will be where we will inject our crafted block. Call this encrypted block
current_block
.We want to inject
target = ";admin=true;abc="
.We know that the plaintext block following our input is:
next_plain = ";comment2=%20lik"
.When decrypting with CBC, the following is done:
next_plain = next_block_pre_xor ^ current_block
We can calculate
next_block_pre_xor = next_plain ^ current_block
.We want
next_block_pre_xor ^ crafted_block
to yieldtarget
, so we choose:crafted_block = target ^ next_block_pre_xor
.Then, all we need is to swap
current_block
with ourcrafted_block
to get admin access. The decryption ofcurrent_block
will yield scrambled plaintext, but it is not a problem since it only modifiescomment1
.
-
If we have an oracle that tells us if the padding on a ciphertext is valid or not, we are able to recover the plaintext.
Keep in mind that decryption happens like this:
block_pre_xor xxxxxxxxxxxxxxxx ^ prev_block pppppppppppppppp = plaintext tttttttttttttttt
We can play with the
prev_block
to search for a byte that results in valid padding:block_pre_xor xxxxxxxxxxxxxxx? ^ prev_block pppppppppppppppB = (we bruteforce 'B') plaintext ttttttttttttttt1
Once we have the
prev_block
byte that gives valid padding, we can deducepre_xor
byte?
.Once we know the last
N
bytes of thepre_xor
, we can use that information to craft the ending of the padding in just the right way to bruteforce theN+1
th byte (starting from the right).With all the
pre_xor
bytes recovered from a block, we can XOR them with out realprev_block
to find the plaintext.However, we must keep the following tricky thing in mind:
The last block (and other plaintext blocks, if we're unlucky) can produce two valid
pre_xor
bytes, instead of just one: E.g. block...55555
We'll find these 2 validpre_xor
bytes:- the one that produces
...55551
- the one that produces
...55555
When this happens, we can just alter the 2nd last byte to produce junk, so that only the padding ending with a
1
will pass: E.g. changing5
to4
- the
pre_xor
that produces...55541
passes - the
pre_xor
that produces...55545
does not
- the one that produces
-
18. Implement CTR, the stream cipher mode
There isn't much to say here... We generate a keystream with:
aes(key, nonce + counter)
Where
nonce
andcounter
are 64-bits integers encoded in little-endian andcounter
is the amount of blocks generated so far. -
19. Break fixed-nonce CTR mode using substitutions
We reuse our
english_test
function and pick the key that has the best score. In some cases, we pick a key that is not as good as some others because we notice (by analysis of the plaintexts obtained) that the key is not the right one.As mentioned by the challenge description, this is a bit manual indeed.
-
20. Break fixed-nonce CTR statistically
Very similar to the previous challenge, except that we can heavily reuse code that we've written before.
-
21. Implement the MT19937 Mersenne Twister RNG
This task surprisingly took a little longer than just copying Wikipedia's pseudocode. I spent some time wondering why Python's random would yield different numbers (and state!) from me, so I investigated.
I compared with C++'s std::mt19937, which yielded the same results as me. I couldn't find posts complaining about this specifically so I took a look at CPython's source. It turns out that the underlying C module seeds in a different way from what we see on Wikipedia. It assumes that the seed could be greater than 32-bits and just uses all of the bits of the seed in a different procedure. The result: it ends up with a different state from us.
Just to make sure I looked at how
numpy
's random and it does generate the same state as us when seeding. Therefore, the tests for this one use values that I extracted with C++'s implementation. -
Pretty straightforward. Bruteforce all sensible seeds (a couple of seconds in the past) and pick the one that matches.
-
23. Clone an MT19937 RNG from its output
The state is directly coming from the next output, so it is really easy to recover the next state from a given output.
Essentially, we need to look at
624
transformations of the state to fully recover the MT state:state = [untemper(output) for output in outputs[-624:]]
Our
untemper
needs to reverse the right and left shift operations.Let's look at a case with 8-bits:
y = 10101010
If we're shifting right by 3 bits, we'll do the following:
10101010 y 00010101 y >> 3 10111111 y ^ y >> 3
We notice that the first 3 bits of the result will match the first 3 bits of the original
y
. Then, those original 3 bits will be used (once shifted) to xor with the originaly
. By using the known first 3 bits, we can recover the next 3 bits ofy
by doing this:10111111 y ^ y >> 3 00010100 known_bits >> 3 10101011 (y ^ y >> 3) ^ (known_bits >> 3)
We can then recover the whole
y
this way. The left shift is very similar, but we&
with a constant before xoring. -
24. Create the MT19937 stream cipher and break it
This challenge was very similar to challenge 22. Basically, we bruteforce the seed space to find one that gives an expected decryption/token.
-
25. Break "random access read/write" AES CTR
Here we're doing a terrible mistake: we are reusing our keystream on a different plaintext. Solution? We provide
00000000...
and it gets directly xored with the keystream (yielding the keystream). We xor that with the original ciphertext and we get the secret back! -
This one is really straightforward. We know the plaintext, so we can find the keystream and use it to encrypt our token the way that we want.
-
27. Recover the key from CBC with IV=Key
When we split the obtained ciphertext in 3 blocks:
C1, C2, C3
We can then produce a ciphertext in the following way:
C1, 0, C1
and capture the produced plaintext from the error message.
This means that
P1
is the result ofKEY ^ P3
(sinceP3
is unchanged by xoring with0
). We can recover the key through:P1 ^ P3 = (P3 ^ KEY) ^ P3 = KEY
. -
28. Implement a SHA-1 keyed MAC
Not much to do apart from adding a python sha1 implementation here...
In progress.