Low Level Etude One – Hello Worlds

Hello Worlds

There is a great book about ARM64 assembler Programming with 64-Bit ARM Assembly Language from Stephen Smith and the great repository Hello Silicon in which Alex translated all the content from the book to Apple Silicon.

Why Assembler

Disclaimer: Don't write your software in plain Assembler!

With that out of the way I encourage you to actually DO WRITE (at least mini-) software in Assembler, just because it can be great fun (see here: Human Resource Machine) and – on a serious note – it might sharpen your debugging skills and you might appreciate what higher level languages can do for you – I think you even get a clearer view on what you want higher level languages to do.

Hello World – syscall

Let's start with the first Hello World example, presented in the aformentioned resources:

; zero.s .globl _start .p2align 2 _start: mov x0, #1 adr x1, msg mov x2, #13 mov x16, #4 svc 0x80 mov x16, #1 svc 0x80 msg: .asciz "Hello World!\n"

Let's build and run:

as -o zero.o zero.s ld -o zero zero.o -lSystem -syslibroot `xcrun -sdk macosx --show-sdk-path` -e _start ./zero

It should indeed print Hello World on the screen.

What happens

svc is the mnemonic for supervisor call so it calls into the OS kernel and it calls the SYS_write syscall. How do I know? Let's check.

open `xcrun -sdk macosx --show-sdk-path`/usr/include/sys/syscall.h

and see for youself.

// [...] #define SYS_syscall 0 #define SYS_exit 1 #define SYS_fork 2 #define SYS_read 3 #define SYS_write 4 // [...]

Now you know what the last two lines do. They exit the program with whatever exit code there is in the x0 register at that moment. Let's check:

./zero; echo $?

I bet the return code is 13, how do I know that? Let's find out:

open `xcrun -sdk macosx --show-sdk-path`/usr/include/unistd.h

and search for write(. You'll find the following declaration:

ssize_t write(int __fd, const void * __buf, size_t __nbyte) __DARWIN_ALIAS_C(write);

and with very little fantasy you can imagine that calling SYS_write will return the number of bytes written. You can even make sense of the values in the x0, x1 and x2 registers, now. It's the file descriptor (stdin=0, stdout=1, stderr=2) the address of the string and its length.

The first 8 arguments to a function go into the reigsters x0-x7, the return value can be read from x0. We will explore how functions with more (or variadic) parameters work.

Try changing the value for __nbyte to 5 and build and run the program again.

Normally one would expect a well written Hello World program to exit with the exit code 0 if it was successfull and some other value otherwise (do you hear me codesign 🤬). So please insert a mov x0, #0 to set the parameter to SYS_exit to 0 no matter what happend before that – totally ignoring error handling – if you want to replicate the codesign behaviour in the case of finding ambigious certificates to codesign your binary and thus not signing your binary, at all.

Make it more readable

You can actually use names instead of numbers if you use clang or gcc's preprocessor. Copy zero.s to zero_names.S and change it to the following:

#include <sys/syscall.h> .globl _start .p2align 2 _start: mov x0, #1 adr x1, msg mov x2, #13 mov x16, #SYS_write svc 0x80 mov x16, #SYS_exit svc 0x80 msg: .asciz "Hello World!\n"

Now you need a C compiler to build this:

clang -o zero_names zero_names.S -e _start

I will explain the reason behind -e start later and if I miss anything Alex or Stephen will have you covered.

Let's first ask another question:

Why linking against libSystem

That's a good question. We don't use any function provided by libSystem in zero.s do we? We talk directly to the OS kernel.

Let's build what we have in plain C and see what happens:

#include <unistd.h> #include <sys/syscall.h> int main() { return syscall(SYS_write, 1, "Hello World!\n", 13); }

Hey codesign! See what I did there returning something useful instead of 0 in every case?

Compile it with:

clang -o zero_in_c zero_in_c.c -Wno-deprecated

Before we talk about the supression of the deprecation warning I want to introduce you to a really great website: Compiler Explorer by Matt Goldbold. Open it and paste our C programm in there. Make sure to select C as the language and ARM64 as the platform. You can click here.

This is not quite the assembler programm we wrote :-/ I mean it basically does the same, but there are three differences:

  • There's some fiddeling with the stack,
  • it uses bl to call the syscall function instead of svc and
  • the addressing of the string resource is a bit more general.

There will be more etudes explaining all of this.

Back to the warning. Please delete -Wno-deprecated from the command above and read the following:

zero_in_c.c:5:9: warning: 'syscall' is deprecated: first deprecated in macOS 10.12 - syscall(2) is unsupported; please switch to a supported interface. For SYS_kdebug_trace use kdebug_signpost(). [-Wdeprecated-declarations] return syscall(SYS_write, 1, "Hello World!\n", 13); ^ /Applications/Xcode-14.0.0-Beta.3.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX.sdk/usr/include/unistd.h:746:6: note: 'syscall' has been explicitly marked deprecated here int syscall(int, ...); ^ 1 warning generated.

Let me translate this for you:

"Please don't talk to our kernel directly! Use the APIs we provide!".

Wait there is even something between the lines:

"We could change the layout and the numbering of the syscalls anytime without telling you, because we know you will only use our APIs and we'll update them alongside our kernel. So everything will work fine!"

Fair enough! We have been warned.

Now you know the reason we need to link even our first version of the Hello World program to libSystem. This is the lowest ground where Apple wants us to live. In fact neither Alex nor I manged to build a Mach executable without linking it to libSystem.

You can link it:

ld -o zero zero.o -e _start -static

But you cannot start it. (Is there a way? Please tell me!)

Improved Hello World – syscall

One thing that I found annoying right away is having to provide the length of the string. Stephens book has a neat trick for that:

; zero_length.s .globl _start .p2align 2 _start: mov x0, #1 adrp x1, msg@PAGE add x1, x1, msg@PAGEOFF adrp x2, msg_sz@PAGE add x2, x2, msg_sz@PAGEOFF ldr x2, [x2] mov x16, #4 svc 0x80 mov x16, #1 svc 0x80 .data msg: .asciz "Hello World!\n" msg_sz: .word .-msg

For this to work we needed to move the string msg out of the explicit text section of our program into the data section. Text section can be read and executed, data section can be read and written, not executed!

Untill now we could load the string in msg with adr reg, msg. That generated an address relative to the pc register. I encourage you to read Stephens book to find out how ARM64 manages it to put 64 bit addresses into opcodes that are only 64 bit long. Now we need to use adrp and add. adrp gives us the address to the memory page that holds msg and add adds the approriate offset.

Let's evaluate that the line add x1, x1, msg@PAGEOFF does in fact add nothing to x1, because msg has zero offset from the page start.

Build the programm and run it with

lldb zero_length

Once in lldb enter b start and then r. Now you can see it already:

-> 0x100003f90 <+0>: mov x0, #0x1 0x100003f94 <+4>: adrp x1, 1 0x100003f98 <+8>: add x1, x1, #0x0 ; msg 0x100003f9c <+12>: adrp x2, 1

msg lives at 0 offset. Enter n 3 times and then re r x1 to see the content of the x1 register (re gister r ead x1). Copy the value and enter:

m read 0x0000000100004000

0x0000000100004000 being the value from x1.

Now step again two times and see whats in x2

(lldb) re r x2 x2 = 0x000000010000400e msg_sz

check the memory at that address:

0x10000400e: 0e 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 0x10000401e: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................

So the value is 0e which is 14 in decimal.

ldr x2, [x2] loads the value from the address in x2 to the x2 register.

So the assembler evaluated . to the current memory address which is right after msg and -msg subtracts the offset of msg giving you its length + 1. So actually it should read:

.data msg: .asciz "Hello World!\n" msg_sz: .word .-msg-1

Let's check in the debugger:

(lldb) re r x2 x2 = 0x000000000000000d

That's 13!

Hello World - puts or printf?

Ok. We learned that we're not supposed to talk to the OS kernel directly. How would a reasonable C Hello World example look in assembler anyway.

int main() { return puts("Hello World!"); }

Put that into Compiler Explorer and get: (You can click here)

.LC0: .string "Hello World!" main: stp x29, x30, [sp, -16]! mov x29, sp adrp x0, .LC0 add x0, x0, :lo12:.LC0 bl puts ldp x29, x30, [sp], 16 ret

We understand the addressing already, but we need to learn about bl - branch with link and the stack next.

Before we do that. I just want to try out something else. Copy zero.s to zero_no_symbols.s and change it to the following:

; zero_no_symbols.s .globl _start .p2align 2 _start: mov x0, #0 // 0 adr x1, #0x18 // 4 mov x2, #13 // 8 + 4 mov x16, #4 // 12 + 8 svc 0x80 // 16 + 12 mov x16, #1 // 20 + 16 svc 0x80 // 24 + 20 msg: .asciz "Hello World!\n" // 28 + 24 = 0x18

Instead of using adr x1, msg you can simply count instructions and calculate the position of msg yourself. Darwin wants everything aligned on 4 byte boundaries. That is what p2align 2 does. So you can see that the second instruction is 4 bytes from the program start and msg is 24 bytes from the pc register after the first instruction is executed (the pc register (p rogramm c ounter) always has the next line that is going to be executed). So since 24 is 0x18 in hex we can put that there instead of msg. Pretty useless, but helpful to understand.

To prove all this execute:

lldb zero

and enter b start then r and dis:

zero`start: -> 0x100003f8c <+0>: mov x0, #0x1 0x100003f90 <+4>: adr x1, #0x18 ; msg 0x100003f94 <+8>: mov x2, #0x5 0x100003f98 <+12>: mov x16, #0x4 0x100003f9c <+16>: svc #0x80 0x100003fa0 <+20>: mov x16, #0x1 0x100003fa4 <+24>: svc #0x80

Enter n one time and then check the pc register re r pc

(lldb) re r pc pc = 0x0000000100003f90 zero`start + 4

Add 24 to 0x0000000100003f90 which is 0x0000000100003fa8 and then enter m read 0x0000000100003fa8:

(lldb) m read 0x0000000100003fa8 0x100003fa8: 48 65 6c 6c 6f 20 57 6f 72 6c 64 21 0a 00 00 00 Hello World!.... 0x100003fb8: 01 00 00 00 1c 00 00 00 00 00 00 00 1c 00 00 00 ................

See! There's your msg.

Part 2


Oh! By the way if you want your own Linux syscall to play with start here: Implementing a Linux syscall