Up next in the Toddler’s Bottle series is “collision”. Our hint reads:
Daddy told me about cool MD5 hash collision today.
I wanna do something like that too!
ssh col@pwnable.kr -p2222 (pw:guest)
Let’s get cracking!
What’s this MD5 thing?
MD5 is a hashing algorithm that takes an input and creates a 128-bit hash as its output. It was invented in 1991, although as early as 1996, collision issues were detected. More serious/practical collisions were found in 2005 and 2008.
Hashing algorithms are supposed to produce unique outputs (hashes) for unique inputs. When two unique inputs produce the same output, that’s a hash collision.
Col.c
If we use nano
, cat
, vi
, etc to look at col.c, we see:
#include
#include
unsigned long hashcode = 0x21DD09EC;
unsigned long check_password(const char* p){
int* ip = (int*)p;
int i;
int res=0;
for(i=0; i<5; i++){
res += ip[i];
}
return res;
}
int main(int argc, char* argv[]){
if(argc<2){
printf("usage : %s [passcode]\n", argv[0]);
return 0;
}
if(strlen(argv[1]) != 20){
printf("passcode length should be 20 bytes\n");
return 0;
}
if(hashcode == check_password( argv[1] )){
system("/bin/cat flag");
return 0;
}
else
printf("wrong passcode.\n");
return 0;
}
The first if-block of main
tells us that we need to provide a passcode.
The second if-block tells us that this passcode needs to be 20 bytes long.
The third if-block sends our passcode to the check_passcode
function. If the result is equivalent to the hard-coded hashcode (0x21DD09EC), the program will print out the flag for us.
unsigned long check_password(const char* p){
int* ip = (int*)p;
int i;
int res=0;
for(i=0; i<5; i++){
res += ip[i];
}
return res;
}
GDB
To get a better idea of what’s going on, let’s use gdb
col@ubuntu:~$ gdb
(gdb)
I want to look at the col
executable, and then disassemble the check_password function shown above.
(gdb) file col
Reading symbols from col...(no debugging symbols found)...done.
(gdb) disassemble check_password
Dump of assembler code for function check_password:
0x08048494 <+0>: push %ebp
0x08048495 <+1>: mov %esp,%ebp
0x08048497 <+3>: sub $0x10,%esp
0x0804849a <+6>: mov 0x8(%ebp),%eax
0x0804849d <+9>: mov %eax,-0x4(%ebp)
0x080484a0 <+12>: movl $0x0,-0x8(%ebp)
0x080484a7 <+19>: movl $0x0,-0xc(%ebp)
0x080484ae <+26>: jmp 0x80484c2 <check_password+46>
0x080484b0 <+28>: mov -0xc(%ebp),%eax
0x080484b3 <+31>: shl $0x2,%eax
0x080484b6 <+34>: add -0x4(%ebp),%eax
0x080484b9 <+37>: mov (%eax),%eax
0x080484bb <+39>: add %eax,-0x8(%ebp)
0x080484be <+42>: addl $0x1,-0xc(%ebp)
0x080484c2 <+46>: cmpl $0x4,-0xc(%ebp)
0x080484c6 <+50>: jle 0x80484b0 <check_password+28>
0x080484c8 <+52>: mov -0x8(%ebp),%eax
0x080484cb <+55>: leave
0x080484cc <+56>: ret
End of assembler dump.
(gdb)
We can use gdb to set breakpoints. If we set one at the add
instruction (after the mov
instruction) by using its address:
(gdb) break *0x080484bb
Breakpoint 1 at 0x80484bb
If we run it with a passcode like 11112222333344445555, we can look at the pattern in the eax
(accumulator) register.
So we need 5 things (4 bytes each) that add up to 0x21DD09EC
Ugh math
If we find a hex calculator and divide 0x21DD09EC by 5, we’ll see that there’s a remainder. But! We can probably do 4 * [aaaalmost the right number] + leftovers
, right?
0x21DD09EC / 5 = 0x6C5CEC8 (plus remainder)
0x06C5CEC8 is our “aaaalmost the right number” value.
0x21DD09EC - 0x06C5CEC8 * 4 = our leftovers
0x21DD09EC - 0x1B173B20 = our leftovers = 0x06C5CECC
Double-checking:
0x06C5CEC8 * 4 + 0x06C5CECC = 0x21DD09EC
Unfortunately, thanks to endianness, we need to flip the order of the bytes so that 0x06 is last and 0xc8 is first, and so on:
- 0x06C5CEC8 -> 0xC8 0xCE 0xC5 0x06
- 0x06C5CECC -> 0xCC 0xCE 0xC5 0x06
Collision time!
If we take our little-endian 4-byte groups of data and use python to add them up into a string to pass in, we get something like:
python -c "print '\xc8\xce\xc5\x06'*4+'\xcc\xce\xc5\x06'"
If we use that as the input to col
:
col@ubuntu:~$ ./col "`python -c "print '\xc8\xce\xc5\x06'*4+'\xcc\xce\xc5\x06'"`"
daddy! I just managed to create a hash collision :)