HM74
We are given this Verilog hardware description code:
module encoder(
    input [3:0] data_in,
    output [6:0] ham_out
);
    wire p0, p1, p2;

    assign p0 = data_in[3] ^ data_in[2] ^ data_in[0];
    assign p1 = data_in[3] ^ data_in[1] ^ data_in[0];
    assign p2 = data_in[2] ^ data_in[1] ^ data_in[0];

    assign ham_out = { p0, p1, data_in[3], p2, data_in[2], data_in[1], data_in[0] };
endmodule

module main;
    wire[3:0] data_in = 5;
    wire[6:0] ham_out;

    encoder en(data_in, ham_out);

    initial begin
        #10;
        $display("%b", ham_out);
    end
endmodule
We also have an online instance to connect to:
$ nc 165.232.100.46 31734
Captured: 1001100011100101010011001000110100111110110001111011001111001101110000010011110111011000011101110101001011111111000111010101111000010110010000001111011100110011011101000100001110111100011111000011000001100000000100110100010110010111000011010011011111111100110110100111001010010000110011010110011100011011100101011000011001000111010001111000101110100110000110100100110011111110110001111101101110000111100001010011000001000100101011111111001111001011110010110011001010010001100001001010110111111001110100000010011110000101001110000111111111100010101100100111010111011111100110010111110011000100100111111000000011000110010011011100001000011100001001001111111111010111011100001000011100110111001101010001110010010101011111110001101001001101010110110001100110010100101111101110000100001111010010111111111000111100111000000111011111100001110000111000110101011011101011010011010010111111111110110110011001011001111100110011110011011001011001100100011110010101
Captured
Captured
Captured
Captured
Captured
Plus, this is the description of the challenge:
As you venture further into the depths of the tomb, your communication with your team becomes increasingly disrupted by noise. Despite their attempts to encode the data packets, the errors persist and prove to be a formidable obstacle. Fortunately, you have the exact Verilog module used in both ends of the communication. Will you be able to discover a solution to overcome the communication disruptions and proceed with your mission?
Understanding the challenge
We can assume that the remote instance is running a hardware device (probably an FPGA) with the above Verilog description. Also, we can guess that the device is always trying to send us the flag in binary format. However, due to a noisy channel, we have some errors when receiving and decoding the information.
Therefore, we must find a way to correct the errors and recover the information.
Hardware description analysis
The Verilog file implements an encoder that, given 4 bits of information, outputs 7 bits:
$$ \mathrm{en}\left(\begin{bmatrix} d_4 \\ d_3 \\ d_2 \\ d_1 \end{bmatrix}\right) = \begin{bmatrix} p_1 \\ p_2 \\ d_4 \\ p_3 \\ d_3 \\ d_2 \\ d_1 \end{bmatrix} = \begin{bmatrix} d_1 \oplus d_3 \oplus d_4 \\ d_1 \oplus d_2 \oplus d_4 \\ d_4 \\ d_1 \oplus d_2 \oplus d_3 \\ d_3 \\ d_2 \\ d_1 \end{bmatrix} $$
This type of encoding is known as a Hamming code. In particular, this implementation is Hamming(7, 4) (that’s why the name of the challenge is “HM74”). Hamming codes can detect one-bit and two-bit errors, or correct one-bit errors, but a decoder that corrects cannot tell a one-bit error from a two-bit one.
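The detection and correction limits follow from the minimum Hamming distance between valid codewords. The sketch below is not part of the challenge files; it mirrors the Verilog encoder in Python (the names `encode` and `hamming_distance` are illustrative) and measures that distance over all 16 codewords.

```python
from itertools import combinations, product


def encode(d4, d3, d2, d1):
    """Mirror the Verilog encoder: return the 7-bit codeword as a tuple."""
    p1 = d1 ^ d3 ^ d4
    p2 = d1 ^ d2 ^ d4
    p3 = d1 ^ d2 ^ d3
    return (p1, p2, d4, p3, d3, d2, d1)


def hamming_distance(a, b):
    """Count the positions in which two codewords differ."""
    return sum(x != y for x, y in zip(a, b))


# All 16 codewords, one per 4-bit input
codewords = [encode(*bits) for bits in product(range(2), repeat=4)]

# Smallest distance between any two distinct codewords
min_distance = min(hamming_distance(a, b) for a, b in combinations(codewords, 2))
print(min_distance)  # 3
```

A minimum distance of 3 means that one flipped bit always leaves the chunk closer to the original codeword than to any other, while two flipped bits can leave it one bit away from a different codeword.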
Initial approach
At first, we set aside Hamming codes and the theory behind them and used a statistical approach. The key observation is that the set of valid 7-bit chunks is small: only 16 of the 128 possible chunks are valid, since there are only 16 possible 4-bit inputs. So, we built a truth table and checked whether each received 7-bit chunk was equal to one of the values in it:
$ python3 -q
>>> def en(d4, d3, d2, d1):
...     p1 = d1 ^ d3 ^ d4
...     p2 = d1 ^ d2 ^ d4
...     p3 = d1 ^ d2 ^ d3
...     return ''.join(map(str, [p1, p2, d4, p3, d3, d2, d1]))
...
>>> from itertools import product
>>>
>>> for d4, d3, d2, d1 in product(*[range(2)] * 4):
...     print(f'{d4}{d3}{d2}{d1} -> {en(d4, d3, d2, d1)}')
...
0000 -> 0000000
0001 -> 1101001
0010 -> 0101010
0011 -> 1000011
0100 -> 1001100
0101 -> 0100101
0110 -> 1100110
0111 -> 0001111
1000 -> 1110000
1001 -> 0011001
1010 -> 1011010
1011 -> 0110011
1100 -> 0111100
1101 -> 1010101
1110 -> 0010110
1111 -> 1111111
>>>
>>> for d4, d3, d2, d1 in product(*[range(2)] * 4):
...     print(f'{en(d4, d3, d2, d1)} -> {d4}{d3}{d2}{d1}')
...
0000000 -> 0000
1101001 -> 0001
0101010 -> 0010
1000011 -> 0011
1001100 -> 0100
0100101 -> 0101
1100110 -> 0110
0001111 -> 0111
1110000 -> 1000
0011001 -> 1001
1011010 -> 1010
0110011 -> 1011
0111100 -> 1100
1010101 -> 1101
0010110 -> 1110
1111111 -> 1111
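As a quick illustration of how this table is used, the sketch below splits the first 14 bits of the capture shown earlier into 7-bit chunks and looks each one up. The helper names `lookup` and `split_chunks` are illustrative and do not appear in the original script.

```python
from itertools import product


def encode(d4, d3, d2, d1):
    """Mirror the Verilog encoder: return the 7-bit codeword as a string."""
    p1 = d1 ^ d3 ^ d4
    p2 = d1 ^ d2 ^ d4
    p3 = d1 ^ d2 ^ d3
    return ''.join(map(str, [p1, p2, d4, p3, d3, d2, d1]))


# Reverse truth table: valid codeword -> 4-bit input
lookup = {
    encode(*bits): ''.join(map(str, bits))
    for bits in product(range(2), repeat=4)
}


def split_chunks(stream, size=7):
    """Split a bit string into consecutive chunks of `size` bits."""
    return [stream[i:i + size] for i in range(0, len(stream), size)]


prefix = '10011000111001'  # first 14 bits of the capture shown earlier
decoded = [lookup.get(chunk) for chunk in split_chunks(prefix)]
print(decoded)  # ['0100', None]
```

The first chunk is a valid codeword, while the second one is not in the table, so a single capture is not enough and several samples are needed.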
At this point, we can create a Python script that receives samples from the remote instance and prints the matching 4-bit input when a 7-bit chunk appears in the above truth table:
#!/usr/bin/env python3

import sys
from collections import Counter

from pwn import log, remote

# Valid 7-bit codewords mapped to the 4-bit input that produces them
truth_table = {
    '0000000': '0000',
    '1101001': '0001',
    '0101010': '0010',
    '1000011': '0011',
    '1001100': '0100',
    '0100101': '0101',
    '1100110': '0110',
    '0001111': '0111',
    '1110000': '1000',
    '0011001': '1001',
    '1011010': '1010',
    '0110011': '1011',
    '0111100': '1100',
    '1010101': '1101',
    '0010110': '1110',
    '1111111': '1111',
}

host, port = sys.argv[1].split(':')
io = remote(host, port)


def get_chunks():
    # Read one captured line and split it into 7-bit chunks
    io.recvuntil(b'Captured: ')
    data = io.recvline().strip().decode()
    return [data[i : i + 7] for i in range(0, len(data), 7)]


flag = ''
binary_flag = ''

io.info('Collecting samples...')
samples = [get_chunks() for _ in range(50)]
prog = log.progress('Flag')

while '}' not in flag:
    # Vote among the samples whose current chunk is a valid codeword
    characters = Counter()

    for chunks in samples:
        chunk = chunks[len(binary_flag) // 4]

        if chunk in truth_table:
            characters[truth_table[chunk]] += 1

    if characters:
        binary_flag += characters.most_common(1)[0][0]
    else:
        # No sample had a valid codeword at this position
        io.info('Collecting more samples...')
        samples = [get_chunks() for _ in range(50)]

    # Decode only whole bytes (and never an empty bit string)
    if binary_flag and len(binary_flag) % 8 == 0:
        flag = int(binary_flag, 2).to_bytes(len(binary_flag) // 8, 'big').decode()
        prog.status(flag)

prog.success(flag)
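The voting idea can also be tried offline, without the remote instance. The following is a minimal sketch under stated assumptions: the channel is modelled as flipping each bit independently with probability 0.1 (the real noise level is unknown), the message is just the flag prefix `HTB{`, and `transmit` and `majority_decode` are hypothetical helpers written for this illustration.

```python
import random
from collections import Counter
from itertools import product


def encode(d4, d3, d2, d1):
    """Mirror the Verilog encoder: return the 7-bit codeword as a string."""
    p1 = d1 ^ d3 ^ d4
    p2 = d1 ^ d2 ^ d4
    p3 = d1 ^ d2 ^ d3
    return ''.join(map(str, [p1, p2, d4, p3, d3, d2, d1]))


# Valid codeword -> 4-bit input
TABLE = {
    encode(*bits): ''.join(map(str, bits))
    for bits in product(range(2), repeat=4)
}


def transmit(message, flip_probability, rng):
    """Encode `message` nibble by nibble and flip bits at random (assumed noise model)."""
    bits = ''.join(f'{byte:08b}' for byte in message)
    stream = ''.join(
        encode(*map(int, bits[i:i + 4])) for i in range(0, len(bits), 4)
    )
    return ''.join(str(int(b) ^ (rng.random() < flip_probability)) for b in stream)


def majority_decode(samples):
    """For each 7-bit position, vote among the samples that hold a valid codeword."""
    nibbles = []
    for position in range(0, len(samples[0]), 7):
        votes = Counter(
            TABLE[sample[position:position + 7]]
            for sample in samples
            if sample[position:position + 7] in TABLE
        )
        if not votes:
            raise ValueError('no valid codeword seen; collect more samples')
        nibbles.append(votes.most_common(1)[0][0])
    return ''.join(nibbles)


rng = random.Random(0)  # fixed seed so the run is repeatable
message = b'HTB{'
samples = [transmit(message, 0.1, rng) for _ in range(200)]

recovered_bits = majority_decode(samples)
recovered = int(recovered_bits, 2).to_bytes(len(recovered_bits) // 8, 'big')
print(recovered)
```

The vote works because a corrupted chunk only lands on a different valid codeword when at least three bits flip, which is much rarer than the chunk arriving intact.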
Flag
If we run the script, we will get the flag:
$ python3 solve.py 94.237.63.201:48734
[+] Opening connection to 94.237.63.201 on port 48734: Done
[*] Collecting samples...
[+] Flag: HTB{hmm_w1th_s0m3_ana1ys15_y0u_c4n_3x7ract_7h3_h4mmin9_7_4_3nc_fl49}
[*] Closed connection to 94.237.63.201 port 48734
The full script can be found here: solve.py.
Intended way
The intended way to solve this challenge is to apply properties of Hamming codes to detect and correct errors. There’s a lot of theory and algebra behind these codes (more information at Wikipedia).
For instance, we will take the first 7-bit chunk received in the above output: 1001100. Now we will define the parity-check matrix $H$ for this Hamming code:
$$ H = \begin{pmatrix} 1 & 0 & 1 & 0 & 1 & 0 & 1 \\ 0 & 1 & 1 & 0 & 0 & 1 & 1 \\ 0 & 0 & 0 & 1 & 1 & 1 & 1 \end{pmatrix} $$
Now we need to multiply this matrix by the 7-bit chunk as a column vector (modulo $2$):
$$ \begin{pmatrix} 1 & 0 & 1 & 0 & 1 & 0 & 1 \\ 0 & 1 & 1 & 0 & 0 & 1 & 1 \\ 0 & 0 & 0 & 1 & 1 & 1 & 1 \end{pmatrix} \cdot \begin{pmatrix} 1 \\ 0 \\ 0 \\ 1 \\ 1 \\ 0 \\ 0 \end{pmatrix} = \begin{pmatrix} 2 \\ 0 \\ 2 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix} $$
The result (the syndrome) is zero, so 1001100 is a valid codeword and we take it as transmitted correctly; in fact, 0100 -> 1001100.
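The same check can be done in a few lines of Python. This is a minimal sketch, not code from the challenge: `H` is the parity-check matrix defined above and `syndrome` is an illustrative helper name.

```python
# Parity-check matrix H for this Hamming(7, 4) code
H = [
    [1, 0, 1, 0, 1, 0, 1],
    [0, 1, 1, 0, 0, 1, 1],
    [0, 0, 0, 1, 1, 1, 1],
]


def syndrome(chunk):
    """Multiply H by the 7-bit chunk (as a column vector) modulo 2."""
    bits = [int(b) for b in chunk]
    return [sum(h * b for h, b in zip(row, bits)) % 2 for row in H]


print(syndrome('1001100'))  # [0, 0, 0]
print(syndrome('0100101'))  # [0, 0, 0], the codeword for data_in = 5
```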
Let’s see this chunk: 1011100 (from the third output):
$$ \begin{pmatrix} 1 & 0 & 1 & 0 & 1 & 0 & 1 \\ 0 & 1 & 1 & 0 & 0 & 1 & 1 \\ 0 & 0 & 0 & 1 & 1 & 1 & 1 \end{pmatrix} \cdot \begin{pmatrix} 1 \\ 0 \\ 1 \\ 1 \\ 1 \\ 0 \\ 0 \end{pmatrix} = \begin{pmatrix} 3 \\ 1 \\ 2 \end{pmatrix} = \begin{pmatrix} 1 \\ 1 \\ 0 \end{pmatrix} $$
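Since 110 in reverse order is 011, which is 3 in decimal, the bit at position 3 needs to be flipped. This is by design: the columns of $H$ are the binary representations of the positions 1 to 7 (least significant bit on top), so the syndrome of a single-bit error spells out the position of that error. So we have: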
$$ \begin{pmatrix} 1 & 0 & 1 & 0 & 1 & 0 & 1 \\ 0 & 1 & 1 & 0 & 0 & 1 & 1 \\ 0 & 0 & 0 & 1 & 1 & 1 & 1 \end{pmatrix} \cdot \begin{pmatrix} 1 \\ 0 \\ \color{yellow}{0} \\ 1 \\ 1 \\ 0 \\ 0 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix} $$
So, we have corrected one error and the correct information that was transmitted is 1001100, and again, 0100 -> 1001100.
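The whole correction step can be sketched as a short Python function. The names `correct` and `extract_data` are illustrative; the function assumes at most one flipped bit per chunk, as in the example above.

```python
# Parity-check matrix H for this Hamming(7, 4) code
H = [
    [1, 0, 1, 0, 1, 0, 1],
    [0, 1, 1, 0, 0, 1, 1],
    [0, 0, 0, 1, 1, 1, 1],
]


def correct(chunk):
    """Return (corrected chunk, flipped position); position 0 means no change."""
    bits = [int(b) for b in chunk]
    s = [sum(h * b for h, b in zip(row, bits)) % 2 for row in H]

    # Read the syndrome in reverse order as a binary number
    position = s[0] + 2 * s[1] + 4 * s[2]

    if position:
        bits[position - 1] ^= 1  # positions are 1-based

    return ''.join(map(str, bits)), position


def extract_data(codeword):
    """Pick d4, d3, d2, d1 out of p1 p2 d4 p3 d3 d2 d1."""
    return codeword[2] + codeword[4:7]


fixed, position = correct('1011100')
print(fixed, position)      # 1001100 3
print(extract_data(fixed))  # 0100
```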
This method only succeeds if there is at most 1 error per codeword. Although Hamming(7, 4) can detect up to 2 errors, a nonzero syndrome does not tell us whether the codeword has 1 error or 2, so applying the correction to a chunk with 2 errors gives a wrong result. Since the channel in this challenge is noisy enough for that to happen, the probabilistic approach works better this time.
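To make this limitation concrete, the sketch below (again with illustrative helper names) flips two bits of the codeword 1001100 and runs the same syndrome correction. The two-bit example is constructed by hand for illustration and is not taken from the capture.

```python
# Parity-check matrix H for this Hamming(7, 4) code
H = [
    [1, 0, 1, 0, 1, 0, 1],
    [0, 1, 1, 0, 0, 1, 1],
    [0, 0, 0, 1, 1, 1, 1],
]


def syndrome(chunk):
    """Multiply H by the 7-bit chunk (as a column vector) modulo 2."""
    bits = [int(b) for b in chunk]
    return [sum(h * b for h, b in zip(row, bits)) % 2 for row in H]


def correct(chunk):
    """Flip the bit the syndrome points at (if any) and return the new chunk."""
    bits = [int(b) for b in chunk]
    s = syndrome(chunk)
    position = s[0] + 2 * s[1] + 4 * s[2]
    if position:
        bits[position - 1] ^= 1
    return ''.join(map(str, bits))


one_error = '1011100'   # 1001100 with bit 3 flipped
two_errors = '0101100'  # 1001100 with bits 1 and 2 flipped

# Both chunks produce the same syndrome, so they cannot be told apart
print(syndrome(one_error), syndrome(two_errors))  # [1, 1, 0] [1, 1, 0]

print(correct(one_error))   # 1001100 (the original codeword, data 0100)
print(correct(two_errors))  # 0111100 (a valid but wrong codeword, data 1100)
```

In the second case the decoder silently outputs 1100 instead of 0100, which is why voting across many samples is more reliable on this channel.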