Hack.lu's OREO with ret2dl-resolve

Hack.lu 2014 was really well done and entertaining. For one challenge we needed to get system from an unknown libc while bypassing ASLR. The return to dl-resolve technique I used wasn't known to me and I will explain it in this post.

This exploit took me a ridiculous amount of time during the CTF and is overkill in more than one place. Nevertheless the write-up should be interesting because the technique known as return to dl-resolve wasn't known to me before and seems to only be mentioned in Volume 0x0b, Issue 0x3a, Phile #0x04 and in a Japanese writeup that I didn't quite understand because I don't speak Japanese.

Here are some links to the challenge binary, the original exploit I used during the ctf and binexpect.

I won't cover the details of the fastbin exploit since that part is trivial; reading the Malleficarum is the only thing required. I will cover the rest of my exploit even the less interesting parts because some of them might be overkill enough to be matter for amusement.

Some stats about the problem:

Controlling the stack (twice)

Once the House of Spirit ensures us control over the next pointer to malloc, one of the easiest ways to acquire execution control is to make it return an address somewhere near the beginning of the GOT so you can overwrite it as per usual.

From there you want to trigger your stack pivot so that you have a little more stack space to play with.

There is a very nice gadget at 0x08048b42 which does add esp, 0x1c ; pop3; ret. This does exactly what we want when called in place of scanf because it will move the stack right into the buffer containing what we just typed at the prompt (and we can trigger this as many times as we want.) Guess what? Yep, didn't see that one during the CTF. I did something way more complicated.

Here is what the GOT looked like after my overwrite:

original GOT overwritten
link_map target target stack
dl-resolve 0x08048ae3 pop ebp; ret;
printf printf
free 0x08048450 push link_map; jmp dl-resolve
fgets fgets
__stack_chk_fail 0x0804844c leave; ret;
malloc malloc
puts puts
__gmon_start__ garbage clobbered w/ final nullbyte.
strlen untouched
__libc_start_main untouched
__isoc99_sscanf untouched

Notice the pointers to link_map and the dl-resolve function which aren't usually considered part of the GOT but are located right in front of it.

The way this pivot works is as follows. On the next call to free, target will be pushed on the stack and instead of dl-resolve our pop ebp; ret will be excuted and return control to the original program. After some instructions the canary is checked relative to ebp, of course it doesn't match anymore and we regain control through our overwrite of __stack_chk_fail. Because now we control ebp we simply end with a classic leave; ret; which pivots our stack.

We now control where the stack points to. As we will see later my final ropchain has some data with it so I needed somewhere to store it. The only large buffer at a known address is the one where the message is normally stored. The trouble with this is that it is too close to the GOT and when the stack grows it ends up overwriting important entries such as malloc which isn't acceptable for our exploit. So I decided to go with a small loader that I stored on the heap during the initial preparations for the House of Spirit, I know the address of this because it is very easy to leak.

finaltack ebp == some location in the middle of the heap
fgets fgets(finalstack, 1025, stdin)
0x0804844c leave; ret;
finalstack this is also where we write to

After this we control the stack for real. We can simply send our ropchain and take up as much space as we want.

Getting a shell

A lot of the work until now was getting a nice stack to play with. In the first version of this exploit I didn't do any of this because I thought I could just guess which libc was running based on the addresses I leaked from the GOT and simply return to system at some point. But, in the challenge's author's own words:

cutz: yea the libc is self compiled [and] guessing offset sucks ^^

So. Let's get thinking, unknown libc and ASLR. Sounds familiar? Wasn't to me. As it turns out there are a couple of quite simple solutions (see at the end of this writeup). But you know, its way more fun to go with the trick that no one is using since 2001 especially during a 48h CTF that is scheduled in the middle of the goddamn week when I'm supposed to be working.

Return to dl-resolve

After some other wild ideas that obviously didn't work or I would be writing about them instead, it dawned upon me that this had to be a solved problem. If system (which is what I really wanted to call) had been used in the binary how would its address end up in the GOT the first time it's called anyway? Looking this up is a matter of googling; the initial call to a library function ends up calling dl-resolve and passing it an index into the binary's (not library's!) relocation table and a pointer to something called link_map. While googling I also noticed the Japanese writeup that mentioned this technique so I was pretty sure this would work.

I guess if it simply passed the symbol's name it would be too easy, right? Instead it uses the relocation entry to get an index into a symbol table which in turn yields an index in the string table which gives it the function's name. This name is then used to resolve the symbol in the library. How does it get the library? Well link_map is a linked list of all loaded libraries, it finds it in there.

This technique is all about calling dl-resolve with specially crafted arguments. We give it the index into the relocation table for system and the original value for link_map. This is all very cool, except for the fact system isn't in that table. If it was it would also be in the GOT/PLT and we wouldn't be having this whole problem in the first place. The thing is, however, that this index doesn't really have to fit in the table. It can be way bigger so that the relocation entry it points to is somewhere we control.

Structure layout

Elf32_Rel *reloc = JMPREL + index;
Elf32_Sym *sym = &SYMTAB[ELF32_R_SYM(reloc->r_info];
name = STRTAB + sym->st_name;

index is the argument to dl-resolve and JMPREL, SYMTAB and STRTAB are constants because they are addresses of parts of the binary and there is no PIE.

We have everything we need to setup some fake structures on the heap, compute the correct index and pass it to dl-resolve.

This works pretty well for the first line, but reloc->r_info is only stored using one byte and the heap is far away from the original symtab (which is in the binary!). This is a problem because there is no writable memory at all within 4096 (256 * sizeof (Elf32_Sym)) bytes of the symtab.

The solution to this is a small detour. It turns out the symtab location isn't computed from scratch each time by parsing the binary, instead it is available at a known location in the .bss. Lets quickly overwrite that using a call to fgets and make everyone believe the symtab is on the heap!

Once this is done we can execute our plan using the fake structures we have setup and dl-resolve will fetch system and return to it for us.

Alternative: Do it yourself

After doing the previous technique during the CTF I learned it isn't that hard to locate libc. You can simply round a known address down to a multiple of pagesize and keep subtracting pagesizes until it starts with \x7fELF. Another (cleaner) way is, you know, to simply loop through the link_map until you find the library you're looking for. That's what the bloody thing is supposed to be used for after all. (If you don't know what link_map is you should have read the previous section.)

Once you have the start of libc and you have something that lets you repeatedly leak from an arbitrary address it is trivial to parse any library's .dynsym to retrieve whatever symbol. This technique can be seen in this exploit.

Because those conditions are met most of the time, I think this is the reason the previous technique is so rarely used since it requires a lot of fake structure setup. I believe the return to dl-resolve can still be useful sometimes because it doesn't require any leaks per-se. In a non PIE binary it should be possible to pull it off without leaking anything. The only time I had to leak stuff in this exploit was because I had to corrupt the whole GOT to do my overly-complicated pivot and still needed the original values for later.