The GNU Debugger (GDB) is a powerful tool for debugging binary executables, and it can be used for reverse engineering as well. Let's debug the following code written by LiveOverflow:
```c
/* The code is taken from: */
/* https://github.com/LiveOverflow/liveoverflow_youtube/blob/master/0x05_simple_crackme_intro_assembler/license_1.c */
#include <string.h>
#include <stdio.h>

int main(int argc, char *argv[]) {
    if(argc==2) {
        printf("Checking License: %s\n", argv[1]);
        if(strcmp(argv[1], "AAAA-Z10N-42-OK")==0) {
            printf("Access Granted!\n");
        } else {
            printf("WRONG!\n");
        }
    } else {
        printf("Usage: <key>\n");
    }
    return 0;
}
```
All it does is compare the key supplied as a command-line argument with the correct key embedded in the code. We need to compile it with debugging information (the "-g" flag):
```
gcc license_1.c -o license_1 -g
```
Our key "MYKEY" can be supplied as an argument in several ways. Directly, on GDB's command line:
```
[johndoe@ArchLinux]% gdb --args license_1 MYKEY
```
Or by loading the program first and then starting it with "MYKEY" as an argument:
```
[johndoe@ArchLinux]% gdb license_1
GNU gdb (GDB) 8.0
...
Reading symbols from license_1...done.
(gdb) start MYKEY
```
Or, similarly, by setting the arguments with the "set args" command; they are then used by the next "start" or "run":
```
[johndoe@ArchLinux]% gdb license_1
GNU gdb (GDB) 8.0
...
Reading symbols from license_1...done.
(gdb) set args MYKEY
```
Once the program is stopped at the beginning of main, let's print the argument count and the arguments themselves, dereferencing the pointers and using artificial arrays (the "@" operator) to specify how many characters to print:
```
(gdb) p argc
$1 = 2
(gdb) p /x argv
$2 = 0x7fffffffe2e8
(gdb) p /x *argv
$3 = 0x7fffffffe64a
(gdb) p /x *(argv+1)
$4 = 0x7fffffffe6c3
(gdb) p /s **argv@23
$5 = "/home/johndoe/license_1"
(gdb) p /s **(argv+1)@5
$6 = "MYKEY"
(gdb) p argv[0]
$7 = 0x7fffffffe64a "/home/johndoe/license_1"
(gdb) p argv[1]
$8 = 0x7fffffffe6c3 "MYKEY"
```
The first argument is the full path of the executable and the second one is the key we supplied, so there are two arguments in total and hence "argc" equals 2.
That was too easy, so let's dig deeper. From the "System V Application Binary Interface: AMD64 Architecture Processor Supplement" (the ABI used by Linux on x86-64) we know that the first integer or pointer argument of a C function (function arguments, not to be confused with the shell's command-line arguments) is passed in register RDI and the second one in register RSI. So for the main function
```c
int main (int argc, char *argv[]){
...
}
```
the integer "argc" is found in register RDI (being a 32-bit int, it occupies the lower half, EDI) and "argv", a pointer to an array of pointers to characters, is found in register RSI. What "argv" really means will become clear in a minute.
Laid out as a memory map, this looks as follows:
```
Synonym       Address               Content
=========================================================
argc          RDI                 : 2
argv          RSI                 : 0x00007fffffffe2e8
                                  :
*argv         0x00007fffffffe2e8  : 0x00007fffffffe64a
*(argv+1)     0x00007fffffffe2f0  : 0x00007fffffffe6c3
                                  :
                                  :
**argv        0x00007fffffffe64a  : /home/johndoe/license_1
                                  :
**(argv+1)    0x00007fffffffe6c3  : MYKEY
```
Let's examine the registers' contents with the "print" command. The format letter "d" stands for decimal, "x" for hexadecimal and "z" for hexadecimal zero-padded to the full size of the value:
```
(gdb) p /d $rdi
$4 = 2
(gdb) p /x $rsi
$5 = 0x7fffffffe2e8
(gdb) p /z $rsi
$6 = 0x00007fffffffe2e8
```
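GDB does not know what register RSI points to: it treats the register as a plain 64-bit integer (int64_t), and dereferencing it with "*" reads an int, i.e. only 32 bits. As a result we get a truncated 32-bit value instead of the full 64-bit address: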
```
(gdb) ptype $rsi
type = int64_t
(gdb) ptype *$rsi
type = int
(gdb) p /x *$rsi
$7 = 0xffffe64a
```
That's why we need to cast it to a pointer to a long integer, which is 64 bits wide on Linux x86-64:
```
(gdb) ptype *(long int *)$rsi
type = long
(gdb) p /x *(long int *)$rsi
$8 = 0x7fffffffe64a
```
Now the address is printed correctly. A more natural way is to cast the RSI register to the actual type of "argv", char **:
```
(gdb) p /x *(char **)$rsi
$9 = 0x7fffffffe64a
(gdb) p /s **(char **)$rsi@23
$10 = "/home/johndoe/license_1"
```
We can also dump memory directly by passing an address to the "examine" ("x") command:
```
(gdb) x /2xg 0x7fffffffe2e8
0x7fffffffe2e8: 0x00007fffffffe64a 0x00007fffffffe6c3
(gdb) x /2xg $rsi
0x7fffffffe2e8: 0x00007fffffffe64a 0x00007fffffffe6c3
(gdb) x /s 0x00007fffffffe64a
0x7fffffffe64a: "/home/johndoe/license_1"
(gdb) x /s 0x00007fffffffe6c3
0x7fffffffe6c3: "MYKEY"
```
This command examines the memory that the supplied address points to, so it answers the question "What is stored at the given address?". In other words, it implicitly dereferences the supplied pointer, and its output has the form "address: contents of the memory at that address". The "print" command, on the contrary, only evaluates the expression and prints its value, which here is just the address itself. This is a very important difference.
The format specifier "/2xg" in the "examine" command means "show two units in hexadecimal, each unit being a giant word (8 bytes, i.e. 64 bits)". Strings are displayed with the "s" format letter.
Memory can also be examined using only register or variable expressions, without typing any addresses by hand:
```
(gdb) x /s *(long int*)$rsi
0x7fffffffe64a: "/home/johndoe/license_1"
(gdb) x /s *((long int*)$rsi+1)
0x7fffffffe6c3: "MYKEY"
(gdb) x /s ((long int*)$rsi)[1]
0x7fffffffe6c3: "MYKEY"
```
Casting to new pointer types and dereferencing pointers work exactly as in C. However, dereferencing the pointers manually and printing the result with "print" instead of "examine" displays only a single character, the "/" of "/home/johndoe/license_1" (the inner value is a plain long, so GDB dereferences it as an int, and the "/c" format shows only its lowest byte):
```
(gdb) p /c **(long int*)$rsi
$9 = 47 '/'
```
This can be overcome with artificial arrays, but the result looks ugly:
```
(gdb) p /c *((char *)*(long int *)$rsi)@24
$10 = {47 '/', 104 'h', 111 'o', 109 'm', 101 'e', 47 '/', 106 'j', 111 'o', 104 'h', 110 'n', 100 'd', 111 'o', 101 'e', 47 '/', 108 'l', 105 'i', 99 'c', 101 'e', 110 'n', 115 's', 101 'e', 95 '_', 49 '1', 0 '\000'}
```
To make it prettier, use the string format letter "s" instead:
```
(gdb) p /s *((char *)*(long int *)$rsi)@23
$11 = "/home/johndoe/license_1"
```
Instead of "print" we can use GDB's own "printf" command, which accepts the familiar C format specifiers:
```
(gdb) printf "%s\n", 0x00007fffffffe64a
/home/johndoe/license_1
(gdb) printf "%s\n", 0x00007fffffffe6c3
MYKEY
(gdb) printf "%s\n", *(long int*)$rsi
/home/johndoe/license_1
(gdb) printf "%s\n", ((long int*)$rsi)[1]
MYKEY
(gdb) printf "%s\n", *((long int*)$rsi+1)
MYKEY
```
Variables and memory can be changed while debugging:
```
(gdb) set *(long int*)$rsi="My new string instead of 'home/johndoe/license_1'"
(gdb) x /s *(long int*)$rsi
0x7fffffffe64a: "My new string instead of 'home/johndoe/license_1'"
```
Dumping the top of the stack is just as easy: "x /20xg $rsp" shows 20 giant words starting at the stack pointer RSP:
```
(gdb) x /20xg $rsp
0x7fffffffe200: 0x0000000000400640 0x00007ffff7a5543a
0x7fffffffe210: 0x0000000000000000 0x00007fffffffe2e8
0x7fffffffe220: 0x0000000200000000 0x00000000004005bd
0x7fffffffe230: 0x0000000000000000 0x22417f8a58a275a3
0x7fffffffe240: 0x00000000004004d0 0x00007fffffffe2e0
0x7fffffffe250: 0x0000000000000000 0x0000000000000000
0x7fffffffe260: 0xddbe80f5900275a3 0xddbe9040f3c675a3
0x7fffffffe270: 0x0000000000000000 0x0000000000000000
0x7fffffffe280: 0x0000000000000000 0x0000000000000002
0x7fffffffe290: 0x00000000004005bd 0x00000000004006b0
```
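Disassembling the current function with "disassemble" ("disass" for short). The listings here use Intel syntax, presumably selected with "set disassembly-flavor intel", since GDB's default on x86 is AT&T syntax: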
```
(gdb) disass
Dump of assembler code for function main:
   0x0000000000400567 <+0>:   push rbp
   0x0000000000400568 <+1>:   mov rbp,rsp
   0x000000000040056b <+4>:   sub rsp,0x10
   0x000000000040056f <+8>:   mov DWORD PTR [rbp-0x4],edi
   0x0000000000400572 <+11>:  mov QWORD PTR [rbp-0x10],rsi
=> 0x0000000000400576 <+15>:  cmp DWORD PTR [rbp-0x4],0x2
   0x000000000040057a <+19>:  jne 0x4005cd <main+102>
   0x000000000040057c <+21>:  mov rax,QWORD PTR [rbp-0x10]
   0x0000000000400580 <+25>:  add rax,0x8
   0x0000000000400584 <+29>:  mov rax,QWORD PTR [rax]
   0x0000000000400587 <+32>:  mov rsi,rax
   0x000000000040058a <+35>:  mov edi,0x400664
   0x000000000040058f <+40>:  mov eax,0x0
   0x0000000000400594 <+45>:  call 0x400470 <printf@plt>
   0x0000000000400599 <+50>:  mov rax,QWORD PTR [rbp-0x10]
   0x000000000040059d <+54>:  add rax,0x8
   0x00000000004005a1 <+58>:  mov rax,QWORD PTR [rax]
   0x00000000004005a4 <+61>:  mov esi,0x40067a
   0x00000000004005a9 <+66>:  mov rdi,rax
   0x00000000004005ac <+69>:  call 0x400480 <strcmp@plt>
   0x00000000004005b1 <+74>:  test eax,eax
   0x00000000004005b3 <+76>:  jne 0x4005c1 <main+90>
   0x00000000004005b5 <+78>:  mov edi,0x40068a
   0x00000000004005ba <+83>:  call 0x400460 <puts@plt>
   0x00000000004005bf <+88>:  jmp 0x4005d7 <main+112>
   0x00000000004005c1 <+90>:  mov edi,0x40069a
   0x00000000004005c6 <+95>:  call 0x400460 <puts@plt>
   0x00000000004005cb <+100>: jmp 0x4005d7 <main+112>
   0x00000000004005cd <+102>: mov edi,0x4006a1
   0x00000000004005d2 <+107>: call 0x400460 <puts@plt>
   0x00000000004005d7 <+112>: mov eax,0x0
   0x00000000004005dc <+117>: leave
   0x00000000004005dd <+118>: ret
End of assembler dump.
```
Any part of the code, for example the area around the RIP register, can also be disassembled with the "examine" command:
```
(gdb) x /33i $rip-15
   0x400567 <main>:      push rbp
   0x400568 <main+1>:    mov rbp,rsp
   0x40056b <main+4>:    sub rsp,0x10
   0x40056f <main+8>:    mov DWORD PTR [rbp-0x4],edi
   0x400572 <main+11>:   mov QWORD PTR [rbp-0x10],rsi
=> 0x400576 <main+15>:   cmp DWORD PTR [rbp-0x4],0x2
   0x40057a <main+19>:   jne 0x4005cd <main+102>
   0x40057c <main+21>:   mov rax,QWORD PTR [rbp-0x10]
   0x400580 <main+25>:   add rax,0x8
   0x400584 <main+29>:   mov rax,QWORD PTR [rax]
   0x400587 <main+32>:   mov rsi,rax
   0x40058a <main+35>:   mov edi,0x400664
   0x40058f <main+40>:   mov eax,0x0
   0x400594 <main+45>:   call 0x400470 <printf@plt>
   0x400599 <main+50>:   mov rax,QWORD PTR [rbp-0x10]
   0x40059d <main+54>:   add rax,0x8
   0x4005a1 <main+58>:   mov rax,QWORD PTR [rax]
   0x4005a4 <main+61>:   mov esi,0x40067a
   0x4005a9 <main+66>:   mov rdi,rax
   0x4005ac <main+69>:   call 0x400480 <strcmp@plt>
   0x4005b1 <main+74>:   test eax,eax
   0x4005b3 <main+76>:   jne 0x4005c1 <main+90>
   0x4005b5 <main+78>:   mov edi,0x40068a
   0x4005ba <main+83>:   call 0x400460 <puts@plt>
   0x4005bf <main+88>:   jmp 0x4005d7 <main+112>
   0x4005c1 <main+90>:   mov edi,0x40069a
   0x4005c6 <main+95>:   call 0x400460 <puts@plt>
   0x4005cb <main+100>:  jmp 0x4005d7 <main+112>
   0x4005cd <main+102>:  mov edi,0x4006a1
   0x4005d2 <main+107>:  call 0x400460 <puts@plt>
   0x4005d7 <main+112>:  mov eax,0x0
   0x4005dc <main+117>:  leave
   0x4005dd <main+118>:  ret
```
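Here the format letter "i" interprets the bytes at the given address as machine instructions and disassembles them. Because x86-64 instructions vary in length, the start address has to fall on an instruction boundary: RIP is at main+15, so $rip-15 is 0x400567, exactly the start of main. Overall, GDB is quite suitable for disassembling and examining executables while debugging.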