Pwnable.kr: ‘collision’ Walkthrough

Up next in the Toddler’s Bottle series is “collision”.  Our hint reads:

Daddy told me about cool MD5 hash collision today.

I wanna do something like that too!

ssh col@pwnable.kr -p2222 (pw:guest)

Let’s get cracking!

What’s this MD5 thing?

MD5 is a hashing algorithm that takes an input and creates a 128-bit hash as its output.  It was invented in 1991, although as early as 1996, collision issues were detected.  More serious/practical collisions were found in 2005 and 2008.

Hashing algorithms are supposed to produce unique outputs (hashes) for unique inputs.  When two unique inputs produce the same output, that’s a hash collision.

Col.c

If we use nano, cat, vi, etc to look at col.c, we see:

#include
#include
unsigned long hashcode = 0x21DD09EC;
unsigned long check_password(const char* p){
    int* ip = (int*)p;
    int i;
    int res=0;
    for(i=0; i<5; i++){
        res += ip[i];
    }
    return res;
}

int main(int argc, char* argv[]){
    if(argc<2){
        printf("usage : %s [passcode]\n", argv[0]);
        return 0;
    }
    if(strlen(argv[1]) != 20){
        printf("passcode length should be 20 bytes\n");
        return 0;
    }

    if(hashcode == check_password( argv[1] )){
        system("/bin/cat flag");
        return 0;
    }
    else
        printf("wrong passcode.\n");
    return 0;
}

The first if-block of main tells us that we need to provide a passcode.

The second if-block tells us that this passcode needs to be 20 bytes long.

The third if-block sends our passcode to the check_passcode function.  If the result is equivalent to the hard-coded hashcode (0x21DD09EC), the program will print out the flag for us.

unsigned long check_password(const char* p){
    int* ip = (int*)p;
    int i;
    int res=0;
    for(i=0; i<5; i++){
        res += ip[i];
    }
    return res;
}

GDB

To get a better idea of what’s going on, let’s use gdb

col@ubuntu:~$ gdb
(gdb)

I want to look at the col executable, and then disassemble the check_password function shown above.

(gdb) file col
Reading symbols from col...(no debugging symbols found)...done.
(gdb) disassemble check_password
Dump of assembler code for function check_password:
   0x08048494 <+0>: push   %ebp
   0x08048495 <+1>: mov    %esp,%ebp
   0x08048497 <+3>: sub    $0x10,%esp
   0x0804849a <+6>: mov    0x8(%ebp),%eax
   0x0804849d <+9>: mov    %eax,-0x4(%ebp)
   0x080484a0 <+12>: movl   $0x0,-0x8(%ebp)
   0x080484a7 <+19>: movl   $0x0,-0xc(%ebp)
   0x080484ae <+26>: jmp    0x80484c2 <check_password+46>
   0x080484b0 <+28>: mov    -0xc(%ebp),%eax
   0x080484b3 <+31>: shl    $0x2,%eax
   0x080484b6 <+34>: add    -0x4(%ebp),%eax
   0x080484b9 <+37>: mov    (%eax),%eax
   0x080484bb <+39>: add    %eax,-0x8(%ebp)
   0x080484be <+42>: addl   $0x1,-0xc(%ebp)
   0x080484c2 <+46>: cmpl   $0x4,-0xc(%ebp)
   0x080484c6 <+50>: jle    0x80484b0 <check_password+28>
   0x080484c8 <+52>: mov    -0x8(%ebp),%eax
   0x080484cb <+55>: leave 
   0x080484cc <+56>: ret    
End of assembler dump.
(gdb)

We can use gdb to set breakpoints.  If we set one at the add instruction (after the mov instruction) by using its address:

(gdb) break *0x080484bb
Breakpoint 1 at 0x80484bb

If we run it with a passcode like 11112222333344445555, we can look at the pattern in the eax (accumulator) register.

So we need 5 things (4 bytes each) that add up to 0x21DD09EC

Ugh math

If we find a hex calculator and divide 0x21DD09EC by 5, we’ll see that there’s a remainder.  But!  We can probably do 4 * [aaaalmost the right number] + leftovers, right?

0x21DD09EC / 5 = 0x6C5CEC8 (plus remainder)

0x06C5CEC8 is our “aaaalmost the right number” value.

0x21DD09EC - 0x06C5CEC8 * 4 = our leftovers
0x21DD09EC - 0x1B173B20 = our leftovers = 0x06C5CECC

Double-checking:

0x06C5CEC8 * 4 + 0x06C5CECC = 0x21DD09EC

Unfortunately, thanks to endianness, we need to flip the order of the bytes so that 0x06 is last and 0xc8 is first, and so on:

Collision time!

If we take our little-endian 4-byte groups of data and use python to add them up into a string to pass in, we get something like:

python -c "print '\xc8\xce\xc5\x06'*4+'\xcc\xce\xc5\x06'"

If we use that as the input to col:

col@ubuntu:~$ ./col "`python -c "print '\xc8\xce\xc5\x06'*4+'\xcc\xce\xc5\x06'"`"
daddy! I just managed to create a hash collision :)