I wanted to write a simple program in AMD64 assembly language which prints "Hello, World!". Here is the code:

    SYS_WRITE   equ 1
    SYS_EXIT    equ 60
    STD_OUTPUT  equ 1

    section .text
    global _start

    _start:
        mov rax, SYS_WRITE
        mov rdi, STD_OUTPUT
        ;lea rsi, [rel msg]
        mov rsi, msg
        mov rdx, msglen
        syscall

        mov rax, SYS_EXIT
        mov rdi, 0
        syscall

    section .data
        msg: db `Shellcode: "Hello world!"\n`
        msglen equ $-msg

The program makes just two system calls (write and exit). Assembling and linking yields an ELF64 executable that prints the message:

    [johndoe@ArchLinux]% nasm -f elf64 HelloWorld -o HelloWorld.o
    [johndoe@ArchLinux]% ld HelloWorld.o -o HelloWorld
    [johndoe@ArchLinux]% ./HelloWorld
    Shellcode: "Hello world!"

The executable contains both a code (.text) section and a data (.data) section:

    [johndoe@ArchLinux]% objdump -d -M intel HelloWorld

    HelloWorld:     file format elf64-x86-64

    Disassembly of section .text:

    00000000004000b0 <_start>:
      4000b0:   b8 01 00 00 00          mov    eax,0x1
      4000b5:   bf 01 00 00 00          mov    edi,0x1
      4000ba:   48 be d8 00 60 00 00    movabs rsi,0x6000d8
      4000c1:   00 00 00
      4000c4:   ba 1a 00 00 00          mov    edx,0x1a
      4000c9:   0f 05                   syscall
      4000cb:   b8 3c 00 00 00          mov    eax,0x3c
      4000d0:   bf 00 00 00 00          mov    edi,0x0
      4000d5:   0f 05                   syscall

    [johndoe@ArchLinux]% objdump -j .data -s HelloWorld

    HelloWorld:     file format elf64-x86-64

    Contents of section .data:
     6000d8 5368656c 6c636f64 653a2022 48656c6c  Shellcode: "Hell
     6000e8 6f20776f 726c6421 220a               o world!".

A shellcode payload, which will be embedded into a C file as a string, should not contain the data section. The message string should be moved into the code (.text) section.
To keep the moved data from being executed, a short relative jump at the entry point can skip over it. This version works fine when assembled with NASM and linked on its own:

    SYS_WRITE   equ 1
    SYS_EXIT    equ 60
    STD_OUTPUT  equ 1

    section .text
    global _start

    _start:
        jmp short MainCode

        msg: db `Shellcode: "Hello world!"\n`
        msglen equ $-msg

    MainCode:
        mov rax, SYS_WRITE
        mov rdi, STD_OUTPUT
        mov rsi, msg
        mov rdx, msglen
        syscall

        mov rax, SYS_EXIT
        mov rdi, 0
        syscall

The problem arises when the raw machine-code bytes of this program are put into a C file as a string. The instruction "mov rsi, msg" is encoded with the absolute address that the linker assigned to "msg" in the standalone executable. Inside the wrapper program the bytes end up at a different address and nothing relocates that operand, so rsi points to an irrelevant location and the program prints nothing.
To overcome this problem I borrowed the idea of Position Independent Code (PIC). In this scheme, every reference to a variable uses relative addressing: instead of an absolute address, the instruction stores an offset from the current value of the instruction pointer register RIP (the "program counter"). In NASM, the rel keyword, as in [rel variable], makes the assembler compute this offset. Overall, just one line has to change in this code. Let's replace

    mov rsi, msg

with

    lea rsi, [rel msg]

The assembler encodes this as a RIP-relative instruction, which objdump disassembles as:

    lea rsi,[rip+0xffffffffffffffd5]

So here is the final version of the assembly file, which works both when assembled with NASM as a standalone executable and when its bytes are run from a wrapper C file:

    SYS_WRITE   equ 1
    SYS_EXIT    equ 60
    STD_OUTPUT  equ 1

    section .text
    global _start

    _start:
        jmp short MainCode

        msg: db `Shellcode: "Hello world!"\n`
        msglen equ $-msg

    MainCode:
        mov rax, SYS_WRITE
        mov rdi, STD_OUTPUT
        lea rsi, [rel msg]
        mov rdx, msglen
        syscall

        mov rax, SYS_EXIT
        mov rdi, 0
        syscall

Assembling and extracting opcodes is simple:

    [johndoe@ArchLinux]% nasm -f elf64 ShellCodeHelloWorld -o ShellCodeHelloWorld.o
    [johndoe@ArchLinux]% ld ShellCodeHelloWorld.o -o ShellCodeHelloWorld_nasm
    [johndoe@ArchLinux]% ./ShellCodeHelloWorld_nasm
    Shellcode: "Hello world!"
    [johndoe@ArchLinux]% objcopy -O binary --only-section=.text -I elf64-x86-64 ShellCodeHelloWorld_nasm OpCodes.bin
    [johndoe@ArchLinux]% hexdump -v -e '"\\""x" 1/1 "%02x" ""' OpCodes.bin
    \xeb\x1a\x53\x68\x65\x6c\x6c\x63\x6f\x64\x65\x3a\x20\x22\x48\x65\x6c\x6c\x6f\x20\x77\x6f\x72\x6c\x64\x21\x22\x0a\xb8\x01\x00\x00\x00\xbf\x01\x00\x00\x00\x48\x8d\x35\xd5\xff\xff\xff\xba\x1a\x00\x00\x00\x0f\x05\xb8\x3c\x00\x00\x00\xbf\x00\x00\x00\x00\x0f\x05

Embedding generated opcodes as a string into a wrapper C file:

    const char ShellCode[] =
        "\xeb\x1a\x53\x68\x65\x6c\x6c\x63\x6f\x64\x65\x3a\x20\x22\x48\x65"
        "\x6c\x6c\x6f\x20\x77\x6f\x72\x6c\x64\x21\x22\x0a\xb8\x01\x00\x00"
        "\x00\xbf\x01\x00\x00\x00\x48\x8d\x35\xd5\xff\xff\xff\xba\x1a\x00"
        "\x00\x00\x0f\x05\xb8\x3c\x00\x00\x00\xbf\x00\x00\x00\x00\x0f\x05";

    void _start(){
        (*(void(*)())ShellCode)();
    }

Since it is a minimalistic shellcode, we don't need the "main" function, and we replace the standard system start-up files with our own code. In this case, execution starts from the "_start" function, which is the entry point of the executable. Compiling without the libc library and start-up files:

    [johndoe@ArchLinux]% gcc ShellCodeHelloWorld.c -o ShellCodeHelloWorld -nostdlib -nostartfiles
    [johndoe@ArchLinux]% ./ShellCodeHelloWorld
    Shellcode: "Hello world!"

The string may be moved into the "_start" function:

    void _start(){
        char ShellCode[] =
            "\xeb\x1a\x53\x68\x65\x6c\x6c\x63\x6f\x64\x65\x3a\x20\x22\x48\x65"
            "\x6c\x6c\x6f\x20\x77\x6f\x72\x6c\x64\x21\x22\x0a\xb8\x01\x00\x00"
            "\x00\xbf\x01\x00\x00\x00\x48\x8d\x35\xd5\xff\xff\xff\xba\x1a\x00"
            "\x00\x00\x0f\x05\xb8\x3c\x00\x00\x00\xbf\x00\x00\x00\x00\x0f\x05";
        (*(void(*)())ShellCode)();
    }

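but two more flags should be added in this case. The array now lives on the stack, so the stack has to be executable (-z execstack), and -fno-stack-protector keeps GCC from adding a stack canary check whose failure handler comes from libc, which is not linked here: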

    [johndoe@ArchLinux]% gcc ShellCodeHelloWorld.c -o ShellCodeHelloWorld -nostdlib -nostartfiles -fno-stack-protector -z execstack
    [johndoe@ArchLinux]% ./ShellCodeHelloWorld
    Shellcode: "Hello world!"

One last thing worth mentioning is that shellcode is often copied with a string function such as "strcpy", which stops at the first null byte. This implies that the shellcode must not contain null bytes. Taking that into account, we can rewrite the above shellcode using different instructions that perform the same function but have no null bytes in their encoding. The listing below shows the generated machine code next to each instruction; it is free of null bytes:

                            SYS_WRITE   equ 1
                            SYS_EXIT    equ 60
                            STD_OUTPUT  equ 1

                            section .text
                            global _start

                            _start:
    EB1A                        jmp short MainCode

                                msg:
    5368656C6C636F6465-         db `Shellcode: "Hello world!"\n`
    3A202248656C6C6F20-
    776F726C6421220A
                                msglen equ $-msg

                            MainCode:
    4831C0                      xor rax, rax
    B001                        mov al, SYS_WRITE
    4831FF                      xor rdi, rdi
    6683C701                    add di, STD_OUTPUT
    488D35D3FFFFFF              lea rsi, [rel msg]
    4831D2                      xor rdx, rdx
    4883C21A                    add rdx, msglen
    0F05                        syscall

    4831C0                      xor rax, rax
    B03C                        mov al, SYS_EXIT
    4831FF                      xor rdi, rdi
    0F05                        syscall
