Objdump's linear sweep
While objdump's linear sweep algorithm makes it fast, there are tradeoffs. For example, if we build a Linux executable, we can insert strings into various sections, which objdump will, to no surprise, blindly misinterpret. Here we use GCC's __asm__ statement to place the string zzz in the .text section. It is just an irrelevant string, but objdump will parse it as machine instructions anyway:
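#include <stdio.h>
void main() {
    __asm__(
        /* Dead bytes: a string planted in the middle of the code. */
        ".section .text\n"
        "1: .string \"zzz\";"
        /* The message lives in .data, where strings normally go. */
        ".section .data\n"
        "message: .string \"Transference..\\n\"\n"
        /* Back in .text: write(1, message, 14) via int 0x80. */
        ".section .text\n"
        "mov $4, %rax\n"
        "mov $1, %rbx\n"
        "mov $message, %rcx\n"
        "mov $14, %rdx\n"
        "int $0x80\n"
        /* exit(0) via int 0x80. */
        "mov $1, %rax\n"
        "xor %rbx, %rbx\n"
        "int $0x80\n"
    );
}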
This has to be compiled without the standard library and without position independence so gcc doesn't complain: gcc -nostdlib -no-pie -o golf golf.c. Then we can run our binary and see that it does in fact execute without crashing:
$ ./golf
Transference..
Now observe how objdump handles the string in the .text section. Running objdump -D -j .text golf against our binary:
Disassembly of section .text:
0000000000401000 <main>:
401000: 55 push %rbp
401001: 48 89 e5 mov %rsp,%rbp
401004: 7a 7a jp 401080
401006: 7a 00 jp 401008
401008: 48 c7 c0 04 00 00 00 mov $0x4,%rax
40100f: 48 c7 c3 01 00 00 00 mov $0x1,%rbx
401016: 48 c7 c1 00 30 40 00 mov $0x403000,%rcx
40101d: 48 c7 c2 0e 00 00 00 mov $0xe,%rdx
401024: cd 80 int $0x80
401026: 48 c7 c0 01 00 00 00 mov $0x1,%rax
40102d: 48 31 db xor %rbx,%rbx
401030: cd 80 int $0x80
401032: 90 nop
401033: 5d pop %rbp
401034: c3 ret
And here we see our dead bytes: "zzz" and its terminating null (7a, 7a, 7a, 00) are interpreted as two jp instructions. I believe this trick works with various other program headers, too. It could be used to create deliberately misleading binaries, or perhaps worse, and the same behavior could inadvertently mislead an analyst.
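To make the mechanism concrete, here is a minimal sketch of a linear sweep in C. It is not objdump's implementation: toy_insn_len is a hypothetical helper that only knows the lengths of the few encodings that appear in the listing above, and golf_text is simply the .text bytes copied from that listing. The point is only that the sweep decodes whatever comes next, so the planted string is split into two 2-byte jp instructions like any other bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* The 53 bytes of .text, copied from the objdump listing above. */
const uint8_t golf_text[] = {
    0x55, 0x48, 0x89, 0xe5,                    /* push %rbp; mov %rsp,%rbp */
    0x7a, 0x7a, 0x7a, 0x00,                    /* "zzz\0", the planted string */
    0x48, 0xc7, 0xc0, 0x04, 0x00, 0x00, 0x00,  /* mov $0x4,%rax */
    0x48, 0xc7, 0xc3, 0x01, 0x00, 0x00, 0x00,  /* mov $0x1,%rbx */
    0x48, 0xc7, 0xc1, 0x00, 0x30, 0x40, 0x00,  /* mov $0x403000,%rcx */
    0x48, 0xc7, 0xc2, 0x0e, 0x00, 0x00, 0x00,  /* mov $0xe,%rdx */
    0xcd, 0x80,                                /* int $0x80 */
    0x48, 0xc7, 0xc0, 0x01, 0x00, 0x00, 0x00,  /* mov $0x1,%rax */
    0x48, 0x31, 0xdb,                          /* xor %rbx,%rbx */
    0xcd, 0x80,                                /* int $0x80 */
    0x90, 0x5d, 0xc3,                          /* nop; pop %rbp; ret */
};

/* Hypothetical helper: length of the instruction at p, for ONLY the
 * encodings seen in the listing (register-to-register ModRM assumed).
 * Returns 0 for anything it does not recognize or that is truncated. */
size_t toy_insn_len(const uint8_t *p, size_t avail)
{
    size_t len = 0;
    if (avail == 0)
        return 0;
    switch (p[0]) {
    case 0x55: case 0x5d: case 0x90: case 0xc3:
        len = 1;                        /* push/pop %rbp, nop, ret */
        break;
    case 0x7a: case 0xcd:
        len = 2;                        /* jp rel8, int imm8 */
        break;
    case 0x48:                          /* REX.W prefix */
        if (avail < 2)
            return 0;
        if (p[1] == 0x89 || p[1] == 0x31)
            len = 3;                    /* mov/xor reg,reg */
        else if (p[1] == 0xc7)
            len = 7;                    /* mov $imm32,reg */
        break;
    }
    return (len != 0 && len <= avail) ? len : 0;
}

/* Linear sweep: start at offset 0 and decode back to back, treating every
 * byte as code. Records each instruction's start offset in starts[] and
 * returns how many instructions were found. */
size_t linear_sweep(const uint8_t *buf, size_t n, size_t *starts, size_t max)
{
    size_t off = 0, count = 0;
    assert(buf != NULL && starts != NULL);
    while (off < n && count < max) {
        size_t len = toy_insn_len(buf + off, n - off);
        if (len == 0)
            break;                      /* unknown byte: the toy decoder gives up */
        starts[count++] = off;
        off += len;
    }
    return count;
}
```

Calling linear_sweep(golf_text, sizeof golf_text, starts, 32) reports instruction starts at offsets 4 and 6, the two halves of the string, before it resynchronizes on the real mov at offset 8.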
r2's recursive disassembly
Recursive disassembly solves some of the problems that linear disassembly gives us. But even radare2 decodes our meaningless "zzz" sequence in the .text section as jp instructions, even though it's just a string that is irrelevant to the program. It's slightly better here, though, because radare2 follows the jp branches and provides some context: it annotates the true and false targets of the jumps at 0x00401004 and 0x00401006, and marks 0x00401008, where the relevant code actually begins, with a code cross-reference:
[0x0040102d]> pdr@entry0
;-- section..text:
;-- segment.LOAD1:
;-- main:
;-- rip:
┌ 53: entry0 ();
│ bp: 0 (vars 0, args 0)
│ sp: 0 (vars 0, args 0)
│ rg: 0 (vars 0, args 0)
│ 0x00401000 55 push rbp ; [02] -r-x section size 53 named .text
│ 0x00401001 4889e5 mov rbp, rsp
│ 0x00401004 7a7a jp 0x401080
| // true: 0x00401080 false: 0x00401006
│ 0x00401006 7a00 jp 0x401008
| // true: 0x00401008 false: 0x00401008
│ ; CODE XREF from entry0 @ 0x401006
│ 0x00401008 48c7c0040000. mov rax, 4
│ 0x0040100f 48c7c3010000. mov rbx, 1
│ 0x00401016 48c7c1003040. mov rcx, loc.message ; 0x403000 ; "Transference..\n"
│ 0x0040101d 48c7c20e0000. mov rdx, 0xe ; 14
│ 0x00401024 cd80 int 0x80
│ 0x00401026 48c7c0010000. mov rax, 1
│ 0x0040102d 4831db xor rbx, rbx
│ 0x00401030 cd80 int 0x80
│ 0x00401032 90 nop
│ 0x00401033 5d pop rbp
└ 0x00401034 c3 ret
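But what if the string we leave in our .text section doesn't decode to a jump? What if, instead, we leave the string hell in .text? Well, then it becomes a bit more ambiguous, even in radare2. We of course see our dead bytes, 68, 65, 6c, 6c, plus the terminating 00. But 68 is the opcode for push with a 32-bit immediate, so the remaining four bytes (65, 6c, 6c, 00) are read as a little-endian value, 0x6c6c65, and the string disappears into a plausible-looking instruction: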
[0x00401000]> pdr@entry0
;-- section..text:
;-- segment.LOAD1:
;-- entry0:
;-- rip:
┌ 54: int main (int argc, char **argv, char **envp);
│ 0x00401000 55 push rbp ; [02] -r-x section size 54 named .text
│ 0x00401001 4889e5 mov rbp, rsp
│ 0x00401004 68656c6c00 push 0x6c6c65 ; 'ell'
│ 0x00401009 48c7c0040000. mov rax, 4
│ 0x00401010 48c7c3010000. mov rbx, 1
│ 0x00401017 48c7c1003040. mov rcx, loc.message ; 0x403000 ; "Transference..\n"
│ 0x0040101e 48c7c20e0000. mov rdx, 0xe ; 14
│ 0x00401025 cd80 int 0x80
│ 0x00401027 48c7c0010000. mov rax, 1
│ 0x0040102e 4831db xor rbx, rbx
│ 0x00401031 cd80 int 0x80
│ 0x00401033 90 nop
│ 0x00401034 5d pop rbp
└ 0x00401035 c3 ret
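The push 0x6c6c65 in that listing can be reproduced by hand. The following is a small sketch, with decode_push_imm32 as a hypothetical helper name, of how the five string bytes are consumed: the leading 0x68 ('h') is taken as the opcode, and the next four bytes are assembled into a little-endian 32-bit immediate.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decode `push imm32`: opcode 0x68 followed by a 4-byte little-endian
 * immediate. Returns 0 and stores the immediate on success, or -1 if the
 * first byte is not the 0x68 opcode. */
int decode_push_imm32(const uint8_t insn[5], uint32_t *imm)
{
    assert(insn != NULL && imm != NULL);
    if (insn[0] != 0x68)
        return -1;
    /* Least significant byte comes first in memory. */
    *imm = (uint32_t)insn[1]
         | (uint32_t)insn[2] << 8
         | (uint32_t)insn[3] << 16
         | (uint32_t)insn[4] << 24;
    return 0;
}
```

Feeding it the bytes 68 65 6c 6c 00 yields 0x006c6c65, which is why radare2's comment shows only 'ell': the 'h' was spent on the opcode.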
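A cleaner way to inspect an executable Linux file (and glean potential strings from it) is the readelf utility, which dumps a section's raw bytes next to their ASCII rendering without trying to decode them as instructions: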
$ readelf -x .text golf
Hex dump of section '.text':
0x00401000 554889e5 68656c6c 0048c7c0 04000000 UH..hell.H......
0x00401010 48c7c301 00000048 c7c10030 400048c7 H......H...0@.H.
0x00401020 c20e0000 00cd8048 c7c00100 00004831 .......H......H1
0x00401030 dbcd8090 5dc3 ....].
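Spotting "hell" in the right-hand column of that dump can also be automated. Here is a minimal sketch, in the spirit of the strings utility but not its implementation, that scans a buffer for runs of printable ASCII. The function name next_printable_run and its parameters are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Find the first run of at least min_len printable ASCII bytes
 * (0x20..0x7e) in buf[from..n). Returns the run's starting offset and
 * stores its length in *run_len. Returns n with *run_len == 0 if no
 * such run exists. */
size_t next_printable_run(const uint8_t *buf, size_t n, size_t from,
                          size_t min_len, size_t *run_len)
{
    size_t i = from;
    assert(buf != NULL && run_len != NULL && min_len > 0);
    while (i < n) {
        size_t j = i;
        while (j < n && buf[j] >= 0x20 && buf[j] <= 0x7e)
            j++;                        /* extend the printable run */
        if (j - i >= min_len) {
            *run_len = j - i;
            return i;
        }
        i = j + 1;                      /* skip the byte that ended the run */
    }
    *run_len = 0;
    return n;
}
```

Run over the first row of the dump with a minimum length of 4, it skips the two-character run "UH" and reports a four-byte run at offset 4, which is our planted string. Shorter thresholds would also flag incidental runs such as "UH" or "0@", so the minimum length is a tradeoff between noise and missed strings.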