Low Level Etude One – Hello Worlds (Part 2)
Hello World - puts or printf?
bl - branch with link
Let's get back on track and learn about bl
. Consider the following simple program:
.globl _start
.p2align 2
say_hello:
mov x0, #1
adrp x1, msg@PAGE
add x1, x1, msg@PAGEOFF
adrp x2, msg_sz@PAGE
add x2, x2, msg_sz@PAGEOFF
ldr x2, [x2]
mov x16, #4
svc 0x80
_start:
b say_hello
mov x16, #1
svc 0x80
.data
msg: .asciz "Hello World!"
msg_sz: .word .-msg-1
If you build and start this it will print Hello World
forever. The b - branch
instruction will jump to say_hello
and continue execution after say_hello
with the next line which is the same branch instruction, thus repeating forever.
So we need to change b
to bl - branch with link
and at the end of the say_hello
block we add a ret
instruction. Now the execution will continue right after the bl say_hello
instruction. This happens because bl
saves the address of the next instruction into the lr
register and ret
jumps to the address saved in the lr
register.
But! What if we override the lr
registers content with another bl
instruction? Let's add the following:
.globl _start
.p2align 2
print_newline:
mov x0, #1
adrp x1, newline@PAGE
add x1, x1, newline@PAGEOFF
mov x2, #1
mov x16, #4
svc 0x80
ret
say_hello:
mov x0, #1
adrp x1, msg@PAGE
add x1, x1, msg@PAGEOFF
adrp x2, msg_sz@PAGE
add x2, x2, msg_sz@PAGEOFF
ldr x2, [x2]
mov x16, #4
svc 0x80
bl print_newline
ret
_start:
bl say_hello
mov x16, #1
svc 0x80
.data
msg: .asciz "Hello World!"
msg_sz: .word .-msg-1
.align 4
newline: .asciz "\n"
Can you already see the problem? With bl print_newline
we save another address to the lr
register and overwrite what was already saved. So once we call ret
from print_newline
we'll fall on the ret
instruction at the end of say_hello
which is another ret
statement that will jump to that very location, again. So we're in an endless loop.
The easy fix is to just save the content of the lr
register before the bl
instruction and restore if before the ret
instruction:
say_hello:
mov x0, #1
adrp x1, msg@PAGE
add x1, x1, msg@PAGEOFF
adrp x2, msg_sz@PAGE
add x2, x2, msg_sz@PAGEOFF
ldr x2, [x2]
mov x16, #4
svc 0x80
mov x3, lr
bl print_newline
mov lr, x3
ret
Beware that this will only work if the code that we jump into will not fiddle with the x3
register that we used to save the content of the lr
register.
Function call convention
So what is the right way to make a proper function call in arm64 assembler? Stephens Book has a nice summary:
For the calling routine:
- Save registers
x0 - x18
if you use them. - Move the first eight parameters into the registers
x0 - x7
. Functions with varadic parameters might be handled differently, we'll come to that - Push additional parameters on the stack.
- Use
bl
to call the function. - Evalute the return code in
x0
. - Restore
x0 - x18
, if needed.
For the called function:
- Push
lr
andx19 - x30
onto the stack if used in the routine. - Do the work.
- Put return code in
x0
- Pop
lr
andx19 - x30
if pushed in step 1. - Use
ret
instruction.
So that's no quite what we have been doing. Let's double check what clang did for us, again:
.LC0:
.string "Hello World!"
main:
stp x29, x30, [sp, -16]!
mov x29, sp
adrp x0, .LC0
add x0, x0, :lo12:.LC0
bl puts
ldp x29, x30, [sp], 16
ret
The first line stores a pair (st ore p air) of registers on the stack after subtracting 16 from sp
. sp
is the stack pointer that holds the currect position of the stack. Since the stack grows in negative direction subtracting 16 makes room to save the contents of two registers.
Let's check what the two registers are. Start lldb two
and enter b start
and r
. Now type re r
.
General Purpose Registers:
x0 = 0x0000000000000001
x1 = 0x000000016fdff618
x2 = 0x000000016fdff628
x3 = 0x000000016fdff778
[...]
x27 = 0x0000000000000000
x28 = 0x0000000000000000
fp = 0x000000016fdff5f0
lr = 0x000000010000d08c dyld`start + 520
sp = 0x000000016fdff4b0
pc = 0x0000000100003f8c zero`start
cpsr = 0x60001000
So x29
is the frame pointer and x30
is the link register. Let's explore this with a minimal sample program:
.globl _start
.p2align 2
_start:
stp fp, lr, [sp, -16]!
mov fp, sp
; work
ldp fp, lr, [sp], 16
ret
Build the program and start it in the debugger breaking on start. Let's explore the stack:
(lldb) re r sp
sp = 0x000000016fdff460
(lldb) m read 0x000000016fdff460
0x16fdff460: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
0x16fdff470: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
Execute the first instruction and check again:
(lldb) re r sp
sp = 0x000000016fdff450
(lldb) m read 0x000000016fdff450
0x16fdff450: a0 f5 df 6f 01 00 00 00 8c d0 00 00 01 00 00 00 ...o............
0x16fdff460: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
Let's check if that looks like it should:
(lldb) re r fp
fp = 0x000000016fdff5a0
(lldb) re r lr
lr = 0x000000010000d08c dyld`start + 520
Looks pretty good. Indeed the stack pointer got decremented by 16 bytes making room for the two 8 byte values saved in fp
and lr
and then they got pushed onto the stack. (In little endian byte order!) Next we move the fp - frame pointer
to the new position of the stack pointer so that the called function can construct a stack frame that can hold its local variables if needed. After the work is done we pop back fp
and lr
and are safe to call ret
.
So let's rewrite our last hello world example a little bit:
say_hello:
stp fp, lr, [sp, -16]!
mov fp, sp
mov x0, #1
adrp x1, msg@PAGE
add x1, x1, msg@PAGEOFF
adrp x2, msg_sz@PAGE
add x2, x2, msg_sz@PAGEOFF
ldr x2, [x2]
mov x16, #4
svc 0x80
bl print_newline
ldp fp, lr, [sp], 16
ret
So far so good!
Variadic parameters
Now let's write a printf
driven Hello World program. Since printf
uses variadic parameters we cannot use the registers x1 - x7
for all but the first parameter. The call convention simply differs. The variadic parameters go on the stack. Let's see how this is done:
.globl _start
.p2align 2
.equ variadic_param_1, 0
say_hello:
stp fp, lr, [sp, #-16]!
sub sp, sp, #16
mov fp, sp
adrp x0, format_str@PAGE
add x0, x0, format_str@PAGEOFF
adrp x1, msg@PAGE
add x1, x1, msg@PAGEOFF
str x1, [fp, #variadic_param_1]
bl _printf
add sp, sp, #16
ldp fp, lr, [sp], #16
ret
_start:
stp fp, lr, [sp, #-16]!
mov fp, sp
bl say_hello
ldp fp, lr, [sp], #16
ret
format_str: .asciz "%s\n"
.data
msg: .asciz "Hello World!"
The equ
directive gives a symbolic name to a numeric constant. We will reserve some space on the stack for the variadic parameters, and the first one will go in the first bucket, hence the 0 offset. After storing fp
and lr
onto the stack we move the stack pointer and our frame pointer 16 bytes further. This will give us room for 2 64 bit values. We only need one, but the sp
needs to be 16 byte aligned on Dariwn. After we loaded the address of msg
into the x1
register we can save it to our stack-frame (which can hold 2 64bit values). Since that is where the stack-pointer points to, that's also where printf
will be looking for it's first variadic parameter if the format string requires it.
You can play with a second variadic parameter and make another symbolic name: .equ variadic_param_2, 8
or just store the second value to our stack frame using: str reg, [fp, #8]
instead of str reg, [fp, #variadic_param_2]
.