Elfland
Just as Windows has its various executable formats, so too does Linux. In this land, there are ELFs: files in the Executable and Linkable Format. If we look at elf.h, we can see the structures which constitute the ELF header:
#define EI_NIDENT 16
typedef struct {
unsigned char e_ident[EI_NIDENT];
Elf32_Half e_type;
Elf32_Half e_machine;
Elf32_Word e_version;
Elf32_Addr e_entry;
Elf32_Off e_phoff;
Elf32_Off e_shoff;
Elf32_Word e_flags;
Elf32_Half e_ehsize;
Elf32_Half e_phentsize;
Elf32_Half e_phnum;
Elf32_Half e_shentsize;
Elf32_Half e_shnum;
Elf32_Half e_shstrndx;
} Elf32_Ehdr;
typedef struct {
unsigned char e_ident[EI_NIDENT];
Elf64_Half e_type;
Elf64_Half e_machine;
Elf64_Word e_version;
Elf64_Addr e_entry;
Elf64_Off e_phoff;
Elf64_Off e_shoff;
Elf64_Word e_flags;
Elf64_Half e_ehsize;
Elf64_Half e_phentsize;
Elf64_Half e_phnum;
Elf64_Half e_shentsize;
Elf64_Half e_shnum;
Elf64_Half e_shstrndx;
} Elf64_Ehdr;
Straightforward enough? This is how the kernel sees the header of an ELF file. Here's a quick rundown of what each of these fields represents within the ELF format:
- e_ident: stores the file's identification info, like magic number, class, and endianness.
- e_type: tells the file's type (e.g., executable, shared library).
- e_machine: describes the architecture (e.g., x86, ARM).
- e_version: version of the ELF format.
- e_entry: address where the program starts running.
- e_phoff: offset to the program header table.
- e_shoff: offset to the section header table.
- e_flags: flags for specific machine behaviors.
- e_ehsize: size of the ELF header.
- e_phentsize: size of each program header entry.
- e_phnum: number of program header entries.
- e_shentsize: size of each section header entry.
- e_shnum: number of section headers.
- e_shstrndx: index to the section name string table.
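To make the e_ident field a little more concrete, here is a minimal C sketch that inspects the identification bytes directly. The helper names and the DEMO_ constants are hypothetical (elf.h calls the real ones EI_CLASS, EI_DATA, and so on); only the byte offsets and values come from the ELF format itself.

```c
#include <stddef.h>
#include <string.h>

/* Offsets into e_ident. elf.h names these EI_CLASS, EI_DATA and EI_NIDENT;
   the DEMO_ prefix is ours, to avoid clashing with the real header. */
enum {
    DEMO_EI_CLASS  = 4,   /* 1 = 32-bit objects, 2 = 64-bit objects */
    DEMO_EI_DATA   = 5,   /* 1 = little endian, 2 = big endian */
    DEMO_EI_NIDENT = 16
};

/* Hypothetical helper: returns 32 or 64 for a recognised ELF class,
   or 0 if the magic is missing or the class byte is unknown. */
int demo_elf_class_bits(const unsigned char ident[DEMO_EI_NIDENT])
{
    /* "\x7f" and "ELF" are split so the E is not read as a hex digit. */
    if (memcmp(ident, "\x7f" "ELF", 4) != 0)
        return 0;
    switch (ident[DEMO_EI_CLASS]) {
    case 1:  return 32;
    case 2:  return 64;
    default: return 0;
    }
}

/* Hypothetical helper: 1 if e_ident marks the file as little endian. */
int demo_elf_is_little_endian(const unsigned char ident[DEMO_EI_NIDENT])
{
    return ident[DEMO_EI_DATA] == 1;
}
```

Fed the sixteen bytes 7f 45 4c 46 01 01 01 00 ... 00, these helpers report a 32-bit, little-endian file.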
If we use readelf, we can see this for ourselves, along with program and section headers, offsets, relocations, symbol tables, and more.
$ readelf -a /usr/bin/gzip
ELF Header:
Magic: 7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00
Class: ELF32
Data: 2's complement, little endian
Version: 1 (current)
OS/ABI: UNIX - System V
ABI Version: 0
Type: EXEC (Executable file)
Machine: ARM
Version: 0x1
Entry point address: 0x11fe0
Start of program headers: 52 (bytes into file)
Start of section headers: 71084 (bytes into file)
Flags: 0x5000400, Version5 EABI, hard-float ABI
Size of this header: 52 (bytes)
Size of program headers: 32 (bytes)
Number of program headers: 9
Size of section headers: 40 (bytes)
Number of section headers: 28
Section header string table index: 27
$ readelf -l /usr/bin/ls
Elf file type is DYN (Position-Independent Executable file)
Entry point 0x6d30
There are 13 program headers, starting at offset 64
Program Headers:
Type Offset VirtAddr PhysAddr
FileSiz MemSiz Flags Align
PHDR 0x0000000000000040 0x0000000000000040 0x0000000000000040
0x00000000000002d8 0x00000000000002d8 R 0x8
INTERP 0x0000000000000318 0x0000000000000318 0x0000000000000318
0x000000000000001c 0x000000000000001c R 0x1
[Requesting program interpreter: /lib64/ld-linux-x86-64.so.2]
LOAD 0x0000000000000000 0x0000000000000000 0x0000000000000000
0x00000000000036f8 0x00000000000036f8 R 0x1000
LOAD 0x0000000000004000 0x0000000000004000 0x0000000000004000
0x0000000000014db1 0x0000000000014db1 R E 0x1000
LOAD 0x0000000000019000 0x0000000000019000 0x0000000000019000
0x00000000000071b8 0x00000000000071b8 R 0x1000
LOAD 0x0000000000020f30 0x0000000000021f30 0x0000000000021f30
0x0000000000001348 0x00000000000025e8 RW 0x1000
DYNAMIC 0x0000000000021a38 0x0000000000022a38 0x0000000000022a38
0x0000000000000200 0x0000000000000200 RW 0x8
NOTE 0x0000000000000338 0x0000000000000338 0x0000000000000338
0x0000000000000030 0x0000000000000030 R 0x8
NOTE 0x0000000000000368 0x0000000000000368 0x0000000000000368
0x0000000000000044 0x0000000000000044 R 0x4
GNU_PROPERTY 0x0000000000000338 0x0000000000000338 0x0000000000000338
0x0000000000000030 0x0000000000000030 R 0x8
GNU_EH_FRAME 0x000000000001e170 0x000000000001e170 0x000000000001e170
0x00000000000005ec 0x00000000000005ec R 0x4
GNU_STACK 0x0000000000000000 0x0000000000000000 0x0000000000000000
0x0000000000000000 0x0000000000000000 RW 0x10
GNU_RELRO 0x0000000000020f30 0x0000000000021f30 0x0000000000021f30
0x00000000000010d0 0x00000000000010d0 R 0x1
Section to Segment mapping:
Segment Sections...
00
01 .interp
02 .interp .note.gnu.property .note.gnu.build-id .note.ABI-tag .gnu.hash .dynsym .dynstr .gnu.version .gnu.version_r .rela.dyn .rela.plt
03 .init .plt .plt.got .plt.sec .text .fini
04 .rodata .eh_frame_hdr .eh_frame
05 .init_array .fini_array .data.rel.ro .dynamic .got .data .bss
06 .dynamic
07 .note.gnu.property
08 .note.gnu.build-id .note.ABI-tag
09 .note.gnu.property
10 .eh_frame_hdr
11
12 .init_array .fini_array .data.rel.ro .dynamic .got
I won't be covering all of the ELF sections in this blog post. For a comprehensive breakdown of Linux sections, I recommend this page: https://stevens.netmeister.org/631/elf.html
In this post, we're going to write x64 assembly to search for the ELF magic number: 7f 45 4c 46.
So, we'll begin by loading all of the strings we need for our program into the .data section. We'll set up variables for the magic number, success message, usage message, error message, and a fail message.
section .data
elf_magic db 0x7F, 'E', 'L', 'F' ; ELF magic number
msg db "[+] ELF magic detected", 10 ; msg to print
msg_len equ $ - msg
usage_msg db "Usage: ./elf_check <filename>", 10
usage_msg_len equ $ - usage_msg
error_msg db "Error opening file. Please supply a valid file path.", 10
error_msg_len equ $ - error_msg
not_elf db "[-] No ELF magic detected", 10
not_elf_len equ $ - not_elf
section .bss
buffer resb 4 ; allocate 4 bytes for our read buffer
section .text
global _start
We'll set the strings and their lengths. elf_magic db defines our byte signature, while msg db holds our success message followed by an ASCII newline (10). We also define its length with equ, which makes msg_len an assemble-time constant. The $ token evaluates to the current assembly position, so $ - msg subtracts the address of msg from the address just past the end of the string, yielding the length of msg in bytes. We repeat this pattern for the other strings we use in our program.
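For readers more at home in C, a rough analogue of the $ - msg idiom is sizeof on a character array. This is only a sketch; the demo_msg and DEMO_MSG_LEN names are made up for illustration.

```c
#include <stddef.h>

/* Same text as msg in the .data section, including the trailing newline. */
const char demo_msg[] = "[+] ELF magic detected\n";

/* sizeof counts the terminating NUL that C appends to string literals.
   NASM's db adds no terminator, so subtract one to match `$ - msg`. */
enum { DEMO_MSG_LEN = sizeof demo_msg - 1 };
```

In both languages the length is worked out when the program is built, not at run time.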
We also create a .bss section (historically, "block started by symbol"), which holds statically allocated data that has no initial value and is zero-filled when the program loads. The resb directive reserves bytes, so buffer resb 4 reserves four bytes for the read buffer that we will walk through, byte by byte, later in our program.
Each of these directives communicates the structure of our executable to the assembler and linker. For example, global _start in the .text section exports the _start symbol, which tells the linker (ld) where our program actually begins.
Next we'll use x86_64 instructions to open a file. If no file path is supplied (argc is less than 2), we jump via jl (jump if less) to a usage message indicating that the program requires a valid file path.
With a valid file path supplied, we open it, keep the resulting file descriptor, and prepare to process the file. Some familiarity with asm is assumed, but I've tried to make the comments clear.
We use the open system call with the O_RDONLY flag to open our file. After setting the arguments, we invoke the call and use test to check whether it was successful. On failure the kernel returns a negative value in rax, so test rax, rax sets the sign flag and we head for the exit with another variation of jump (js; jump if sign flag is set), bailing out to the .error_opening message.
But if a valid file descriptor is returned, we read from it and set up a loop to compare the bytes of the supplied file to the ELF magic byte array we stashed in our .data section.
_start:
mov rdi, [rsp] ; load argc from the top of the stack into rdi
cmp rdi, 2 ; compare argc to 2 (our executable + 1 argument)
jl .usage_msg ; no arg? jump to usage_msg; else, get the filename from argv[1]
mov rdi, [rsp + 16] ; load argv[1] (at rsp+16) into rdi
; open file (open syscall)
mov rsi, 0x0 ; rsi, O_RDONLY
mov rdx, 0 ; rdx for mode, unused
mov rax, 2 ; syscall number for open
syscall ; open(argv[1], O_RDONLY)
; success? check file descriptor, rax
test rax, rax ; check if fd is valid
js .error_opening ; jump to .error_opening if open failed
; save file descriptor in rbx
mov rbx, rax ; copy file descriptor from rax to rbx
; read first 4 bytes from the file (read syscall)
mov rdi, rbx ; rdi, file descriptor
lea rsi, [buffer] ; load buffer to rsi to store bytes
mov rdx, 4 ; arg to read four bytes
mov rax, 0 ; syscall number for read
syscall ; read(file_desc, buffer, 4)
mov rdi, elf_magic ; rdi to point to the ELF magic number
mov rcx, 4 ; set loop counter to 4 bytes
jmp .compare_loop ; jump to .compare_loop
When we begin the "read first 4 bytes" portion of the code, we're setting up the arguments according to the Linux x86_64 system call convention: the system call number goes in rax and the arguments go in rdi, rsi, and rdx.
Afterward, we invoke syscall. This is the read system call, which does read(fd, buffer, 4).
The register-to-argument mapping looks like this:
read(rdi, rsi, rdx) <------> read(file_descriptor, buffer, 4)
Lastly, we do mov rdi, elf_magic to move the address of the byte signature we're looking for, elf_magic, into the rdi register, and prepare the rcx register as a loop counter by setting it to 4 just before jumping into .compare_loop. We set rcx only after the system call because the syscall instruction overwrites rcx and r11.
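If the register shuffling is hard to follow, the same open-then-read sequence can be sketched in C with the libc wrappers. The function name demo_read_first4 is hypothetical, and unlike the assembly this version also closes the file descriptor.

```c
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper mirroring the assembly: open(path, O_RDONLY), then
   read(fd, out, 4). Returns the number of bytes read, or -1 on error. */
long demo_read_first4(const char *path, unsigned char out[4])
{
    int fd = open(path, O_RDONLY);   /* rax = 2 in the assembly */
    if (fd < 0)
        return -1;                   /* the js .error_opening branch */

    ssize_t n = read(fd, out, 4);    /* rax = 0: read(rdi, rsi, rdx) */
    close(fd);                       /* the assembly version skips this */
    return (long)n;
}
```

A return value smaller than 4 means the file was too short to contain the magic number, a case worth checking before comparing any bytes.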
If you're not familiar with calling conventions, you can read more about them here: "Arguments Passing in Linux"
Register | Argument User Space | Argument Kernel Space |
---|---|---|
%rax | Not Used | System Call Number |
%rdi | Argument 1 | Argument 1 |
%rsi | Argument 2 | Argument 2 |
%rdx | Argument 3 | Argument 3 |
%r10 | Not Used | Argument 4 |
%r8 | Argument 5 | Argument 5 |
%r9 | Argument 6 | Argument 6 |
%rcx | Argument 4 | Destroyed |
%r11 | Not Used | Destroyed |
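The kernel-space column can be exercised from C through libc's generic syscall() function, which takes the system call number followed by its arguments. The sketch below assumes glibc on Linux; demo_raw_write is a made-up name.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE          /* exposes syscall() in <unistd.h> on glibc */
#endif
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical wrapper: issue write by number instead of calling write().
   On x86_64 Linux, libc places SYS_write in rax and the three arguments
   in rdi, rsi and rdx before executing the syscall instruction. */
long demo_raw_write(int fd, const void *buf, unsigned long len)
{
    return syscall(SYS_write, fd, buf, len);
}
```

Note one difference from hand-written assembly: on failure syscall() returns -1 and stores the error code in errno, whereas the raw syscall instruction leaves a negative error number in rax.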
Next, we want to compare the bytes that we have read into the buffer to the ELF magic bytes we have stored in the .data section of our program. Note that registers such as al and bl give access to single bytes.
Here's a chart of the registers and their narrower counterparts. Note: each narrower name is not a separate register but a view of the low bytes of the 64-bit register in the same row, so writing to sil overwrites the low byte of rsi. Any byte register can be paired with any pointer register for reads, writes, compares, etc.
For example, if I want to load the byte that rsi points at into the al register for use in a loop, that is acceptable. Loading it into sil instead would corrupt the pointer held in rsi.
8-byte register | Bytes 0-3 | Bytes 0-1 | Byte 0 |
---|---|---|---|
%rax | %eax | %ax | %al |
%rcx | %ecx | %cx | %cl |
%rdx | %edx | %dx | %dl |
%rbx | %ebx | %bx | %bl |
%rsi | %esi | %si | %sil |
%rdi | %edi | %di | %dil |
%rsp | %esp | %sp | %spl |
%rbp | %ebp | %bp | %bpl |
%r8 | %r8d | %r8w | %r8b |
%r9 | %r9d | %r9w | %r9b |
%r10 | %r10d | %r10w | %r10b |
%r11 | %r11d | %r11w | %r11b |
%r12 | %r12d | %r12w | %r12b |
%r13 | %r13d | %r13w | %r13b |
%r14 | %r14d | %r14w | %r14b |
%r15 | %r15d | %r15w | %r15b |
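One way to internalize the chart is to model a 64-bit register as a uint64_t. The sketch below uses hypothetical helper names and plain integer arithmetic to show what reading and writing the narrower names does to the full register.

```c
#include <stdint.h>

/* Reading a sub-register: the narrower names expose the low bits of the
   same 64-bit register, which a narrowing cast models. */
uint32_t demo_low32(uint64_t reg) { return (uint32_t)reg; }  /* e.g. eax */
uint16_t demo_low16(uint64_t reg) { return (uint16_t)reg; }  /* e.g. ax  */
uint8_t  demo_low8(uint64_t reg)  { return (uint8_t)reg; }   /* e.g. al  */

/* Writing an 8-bit sub-register: only the low byte changes and the upper
   56 bits of the parent register are preserved. */
uint64_t demo_write_low8(uint64_t reg, uint8_t value)
{
    return (reg & ~UINT64_C(0xff)) | value;
}
```

For instance, demo_write_low8(0x402088, 0x7f) yields 0x40207f. If the 64-bit register held a pointer, it now points somewhere else. Writes to the 32-bit names behave differently: they zero the upper half of the register.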
Our compare loop keeps the two pointers in rsi and rdi and loads the bytes they point at into al and dl. Loading into sil or dil would not work here, because those are the low bytes of rsi and rdi and the load would overwrite the pointers:
.compare_loop:
mov al, byte [rsi] ; load byte from buffer
mov dl, byte [rdi] ; load byte from elf_magic
cmp al, dl ; compare bytes
jne .not_elf ; not equal, not ELF
inc rsi ; move to next byte in buffer
inc rdi ; move to next byte in elf_magic
loop .compare_loop ; repeat until counter is 0
; if compare_loop completes, the file has ELF magic
; print msg
mov rdi, 1 ; file descriptor 1, stdout, to rdi
lea rsi, [msg] ; load address of msg to rsi
mov rdx, msg_len ; length of msg to rdx
mov rax, 1 ; syscall number for write
syscall ; write(1, msg, msg_len)
; exit syscall
mov rax, 60 ; syscall number for exit
xor rdi, rdi ; exit code 0, success
syscall ; exit(0)
This iterates through the bytes, looping four times courtesy of the rcx counter we set in _start. On each pass, the inc (increment) instruction advances the pointers in rsi and rdi to the next byte.
If the bytes that we load from the buffer match the bytes in elf_magic, then we go forward, setting up to print the success message by copying the file descriptor for standard output to the rdi register, using lea to load the effective address of our msg (in bracket notation) into rsi, and copying the corresponding msg_len we set in the .data section earlier into rdx. Last, we invoke our syscall. If all goes right, we should see the message: "[+] ELF magic detected"
However, if .compare_loop iterates and a byte doesn't match the ELF magic signature, then we jump via the jne (jump-if-not-equal) instruction to the .not_elf label.
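The intended behavior of the loop can be sketched in C: two pointers advance in step, a counter runs down from four, and the first mismatch ends the comparison. The function name demo_has_elf_magic is hypothetical, and the comments map each line back to the assembly.

```c
#include <stddef.h>

/* Hypothetical C rendering of .compare_loop: returns 1 if the four bytes at
   buffer equal the ELF magic number, 0 at the first mismatch. */
int demo_has_elf_magic(const unsigned char *buffer)
{
    static const unsigned char elf_magic[4] = { 0x7f, 'E', 'L', 'F' };
    const unsigned char *p = buffer;      /* plays the role of rsi */
    const unsigned char *q = elf_magic;   /* plays the role of rdi */

    for (size_t count = 4; count != 0; count--) {  /* rcx counts down */
        if (*p != *q)
            return 0;                     /* jne .not_elf */
        p++;                              /* inc rsi */
        q++;                              /* inc rdi */
    }
    return 1;                             /* fell through: all four matched */
}
```

A useful check for any implementation is a buffer that differs only in its last byte, such as 7f 45 4c 47, which must be rejected.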
Below is our .not_elf routine. We'll reuse this epilogue pattern for exiting out of our program two more times, for both our .error_opening and .usage_msg routines.
.not_elf:
; print .not_elf message
mov rdi, 1 ; file descriptor 1, stdout, to rdi
lea rsi, [not_elf] ; load address of not_elf msg to rsi
mov rdx, not_elf_len ; not_elf msg length to rdx
mov rax, 1 ; syscall number for write
syscall ; write(1, not_elf, not_elf_len)
; exit syscall
mov rax, 60 ; syscall number for exit
mov rdi, 1 ; exit code 1, failure
syscall ; exit(1)
If we use nasm to assemble and then ld to link our executable, we can test to see if it successfully finds ELF file signatures.
stephan@vm:~$ nasm -f elf64 -o elfCheck.o elfCheck.asm
stephan@vm:~$ ld -s -o elfCheck elfCheck.o
stephan@vm:~$ ./elfCheck /etc/hostname
[-] No ELF magic detected
stephan@vm:~$ ./elfCheck /usr/bin/gzip
[+] ELF magic detected
It works! But wait, what if we spoof an ELF file? Then our ELF magic checker has been foiled!
stephan@vm:~$ echo -n -e '\x7f\x45\x4c\x46' > spoofed_elf
stephan@vm:~$ xxd spoofed_elf
00000000: 7f45 4c46 .ELF
stephan@vm:~$ ./elfCheck spoofed_elf
[+] ELF magic detected
Rats. We'll have to build an ELF validator that