The Portable Executable (PE) file format is the native file format for executable and binary files in the Microsoft Windows ecosystem.
Update: the Python code gists from this post are also available in my GitHub gist archive.
PE Format
The PE format is used for executables, DLLs, and various other Windows files. If we open a binary in PE-bear, for example, we can load it and examine the data and metadata of its Portable Executable structure more closely. I'll also use the Python library `pefile` to demonstrate.
The first thing PE-bear shows for our binary is the DOS Header: the first 64 bytes of the file. It begins with the "MZ" magic number, and its last field, e_lfanew, holds the file offset of the NT Headers.
The DOS Stub, which follows the header, is a small legacy MS-DOS program. Windows skips it, but if the file is run under DOS, the stub typically prints "This program cannot be run in DOS mode." and exits. It exists purely for backwards compatibility.
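To make the DOS header concrete, here is a minimal sketch that reads it by hand with Python's standard `struct` module instead of `pefile`. It relies only on the two fields described above: the "MZ" magic at offset 0 and the 4-byte e_lfanew field at offset 0x3C, the last field of the 64-byte header. The function name `read_e_lfanew` and the synthetic header in the usage lines are my own, for illustration.

```python
import struct

DOS_HEADER_SIZE = 64      # sizeof(IMAGE_DOS_HEADER)
E_LFANEW_OFFSET = 0x3C    # e_lfanew is the header's last field

def read_e_lfanew(data: bytes) -> int:
    """Validate the MZ magic and return the file offset of the NT Headers."""
    if len(data) < DOS_HEADER_SIZE or data[:2] != b"MZ":
        raise ValueError("not an MZ (DOS) header")
    # e_lfanew is a little-endian signed 32-bit LONG
    (e_lfanew,) = struct.unpack_from("<l", data, E_LFANEW_OFFSET)
    return e_lfanew

if __name__ == "__main__":
    # Hypothetical 64-byte header: "MZ", zero padding, then e_lfanew = 0xF8
    fake_header = b"MZ" + bytes(E_LFANEW_OFFSET - 2) + struct.pack("<l", 0xF8)
    print(hex(read_e_lfanew(fake_header)))  # 0xf8
```

On a real file you would pass in the first 64 bytes, e.g. `open(path, "rb").read(64)`.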
Next come the NT Headers, which consist of three parts: a 4-byte signature ("PE\0\0") identifying the file as a valid PE image, followed by the File Header and the Optional Header.
The File Header provides general information about the file, including machine architecture, number of sections, timestamp, etc. The Optional Header, which despite its name is required for executable images, provides detailed characteristics and memory layout information, including the image base, entry point, subsystem, etc.
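Continuing the hand-parsing approach, a sketch of checking the signature and unpacking the 20-byte File Header that follows it. The field order and sizes follow IMAGE_FILE_HEADER; the machine-name table covers only three common values and is not exhaustive, and `parse_file_header` is a hypothetical helper name.

```python
import struct

# A few common IMAGE_FILE_HEADER.Machine values (not an exhaustive list)
MACHINE_NAMES = {0x014C: "i386", 0x8664: "AMD64", 0xAA64: "ARM64"}

def parse_file_header(data: bytes, nt_offset: int) -> dict:
    """Check the PE signature at nt_offset and decode the File Header after it."""
    if data[nt_offset:nt_offset + 4] != b"PE\x00\x00":
        raise ValueError("missing PE signature")
    # IMAGE_FILE_HEADER: 2 WORDs, 3 DWORDs, 2 WORDs = 20 bytes
    (machine, num_sections, timestamp, _symtab_ptr, _num_symbols,
     opt_header_size, characteristics) = struct.unpack_from("<HHIIIHH", data, nt_offset + 4)
    return {
        "Machine": MACHINE_NAMES.get(machine, hex(machine)),
        "NumberOfSections": num_sections,
        "TimeDateStamp": timestamp,
        "SizeOfOptionalHeader": opt_header_size,
        "Characteristics": characteristics,
    }

if __name__ == "__main__":
    # Hypothetical x64 header: 7 sections, 0xF0-byte Optional Header
    sample = b"PE\x00\x00" + struct.pack("<HHIIIHH", 0x8664, 7, 0, 0, 0, 0xF0, 0x22)
    info = parse_file_header(sample, 0)
    print(info["Machine"], info["SizeOfOptionalHeader"])  # AMD64 240
```

In a real file, `nt_offset` is the e_lfanew value from the DOS header.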
The last entry we see in PE-bear is the Section Headers: an array of headers (the section table), each describing one section's name, size, and location, both in the file and in memory.
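The section table starts right after the Optional Header, at e_lfanew + 4 + 20 + SizeOfOptionalHeader, and each entry is a fixed 40-byte IMAGE_SECTION_HEADER. A sketch of decoding it with `struct`; the `SectionHeader` tuple keeps only the fields used in this post, and both it and `parse_section_table` are names of my own.

```python
import struct
from typing import List, NamedTuple

SECTION_HEADER_FORMAT = "<8s6IHHI"                           # IMAGE_SECTION_HEADER layout
SECTION_HEADER_SIZE = struct.calcsize(SECTION_HEADER_FORMAT)  # 40 bytes

class SectionHeader(NamedTuple):
    name: str
    virtual_size: int
    virtual_address: int
    size_of_raw_data: int
    pointer_to_raw_data: int
    characteristics: int

def parse_section_table(data: bytes, offset: int, count: int) -> List[SectionHeader]:
    """Decode `count` consecutive section headers starting at `offset`."""
    sections = []
    for i in range(count):
        (name, vsize, vaddr, raw_size, raw_ptr,
         _reloc_ptr, _lines_ptr, _n_relocs, _n_lines,
         characteristics) = struct.unpack_from(
            SECTION_HEADER_FORMAT, data, offset + i * SECTION_HEADER_SIZE)
        # Names are 8 bytes, null-padded when shorter
        sections.append(SectionHeader(name.rstrip(b"\x00").decode("ascii", "replace"),
                                      vsize, vaddr, raw_size, raw_ptr, characteristics))
    return sections

if __name__ == "__main__":
    # Hypothetical one-entry table using the .text values from notepad.exe shown in this post
    table = struct.pack(SECTION_HEADER_FORMAT, b".text", 0x27BC2, 0x1000,
                        0x28000, 0x1000, 0, 0, 0, 0, 0x60000020)
    print(parse_section_table(table, 0, 1)[0])
```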
The sections themselves hold the actual code and data in the file. Section names are conventions rather than requirements, but common ones include:
- .text: Executable code.
- .rdata: Read-only data.
- .data: Initialized read/write data.
- .pdata: Exception handling info.
- .didat: Delay-load import data.
- .rsrc: Resources like icons.
- .reloc: Relocation information.
- .idata: Import tables for DLLs.
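Since the names are only conventions, what tells the loader how to protect a section in memory is the Characteristics field of its section header. A small sketch that renders the three memory-permission flags (IMAGE_SCN_MEM_READ, IMAGE_SCN_MEM_WRITE, IMAGE_SCN_MEM_EXECUTE, with values from the PE specification) as an "rwx"-style string. The helper name `protection` is hypothetical, and the example values are typical of .text and .data rather than read from a specific file.

```python
IMAGE_SCN_MEM_EXECUTE = 0x20000000
IMAGE_SCN_MEM_READ = 0x40000000
IMAGE_SCN_MEM_WRITE = 0x80000000

def protection(characteristics: int) -> str:
    """Render a section's memory-permission flags as an 'rwx'-style string."""
    flags = ((IMAGE_SCN_MEM_READ, "r"), (IMAGE_SCN_MEM_WRITE, "w"), (IMAGE_SCN_MEM_EXECUTE, "x"))
    return "".join(letter if characteristics & mask else "-" for mask, letter in flags)

if __name__ == "__main__":
    print(protection(0x60000020))  # r-x  (typical .text: code, execute, read)
    print(protection(0xC0000040))  # rw-  (typical .data: initialized data, read, write)
```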
This is somewhat of a simplification, however; there are many layers to the Portable Executable format. After the DOS stub come the PE signature and the headers inherited from the Common Object File Format (COFF), followed by the Optional Header, its data directories, and the section table. The fields, which are largely self-explanatory, are as follows:
PE Signature and COFF File Header:
- Signature
- Machine
- NumberOfSections
- TimeDateStamp
- PointerToSymbolTable (deprecated)
- NumberOfSymbols (deprecated)
- SizeOfOptionalHeader
- Characteristics
Optional Header, standard fields:
- Magic Number (0x10B for PE32, 0x20B for PE32+)
- Linker Versions
- SizeOfCode (sum of all code sections)
- SizeOfInitializedData
- SizeOfUninitializedData
- AddressOfEntryPoint
- BaseOfCode
- BaseOfData (PE32 only)
Optional Header, Windows-specific fields:
- ImageBase
- SectionAlignment
- FileAlignment
- Operating System Versions
- Image Versions
- Subsystem Versions
- Win32VersionValue
- SizeOfImage
- SizeOfHeaders
- CheckSum
- Subsystem
- DllCharacteristics
- SizeOfStackReserve
- SizeOfStackCommit
- SizeOfHeapReserve
- SizeOfHeapCommit
- LoaderFlags
- NumberOfRvaAndSizes
Optional Header, data directories (each an RVA followed by a size):
- ExportTable
- SizeOfExportTable
- ImportTable
- SizeOfImportTable
- ResourceTable
- SizeOfResourceTable
- ExceptionTable
- SizeOfExceptionTable
- CertificateTable
- SizeOfCertificateTable
- BaseRelocationTable
- SizeOfBaseRelocationTable
- Debug
- SizeOfDebug
- ArchitectureData
- SizeOfArchitectureData
- GlobalPtr
- TLSTable
- SizeOfTLSTable
- LoadConfigTable
- SizeOfLoadConfigTable
- BoundImport
- SizeOfBoundImport
- ImportAddressTable
- SizeOfImportAddressTable
- DelayImportDescriptor
- SizeOfDelayImportDescriptor
- CLRRuntimeHeader
- SizeOfCLRRuntimeHeader
Section Table (one entry per section):
- Name
- VirtualSize
- VirtualAddress
- SizeOfRawData
- PointerToRawData
- PointerToRelocations
- PointerToLinenumbers
- NumberOfRelocations
- NumberOfLinenumbers
- Characteristics
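The data directories near the end of that list are where the ExceptionTable entry lives; on x64 it typically points at the .pdata table examined later in this post. Where the directory array starts depends on the Magic Number, because PE32 has the extra BaseOfData field and 4-byte ImageBase and stack/heap sizes, while PE32+ widens those to 8 bytes. A sketch, with hypothetical helper names, that locates NumberOfRvaAndSizes for either format and returns the (RVA, size) pairs:

```python
import struct

PE32_MAGIC = 0x10B
PE32PLUS_MAGIC = 0x20B
IMAGE_DIRECTORY_ENTRY_EXCEPTION = 3   # index of ExceptionTable

def data_directories(optional_header: bytes) -> list:
    """Return the Optional Header's data directories as (RVA, size) tuples."""
    (magic,) = struct.unpack_from("<H", optional_header, 0)
    if magic == PE32PLUS_MAGIC:
        count_offset = 108   # NumberOfRvaAndSizes offset in PE32+
    elif magic == PE32_MAGIC:
        count_offset = 92    # NumberOfRvaAndSizes offset in PE32
    else:
        raise ValueError(f"unknown Optional Header magic {magic:#x}")
    (count,) = struct.unpack_from("<I", optional_header, count_offset)
    count = min(count, 16)   # 16 slots are defined; clamp to guard against malformed counts
    first = count_offset + 4  # the directory array follows immediately
    return [struct.unpack_from("<II", optional_header, first + 8 * i) for i in range(count)]

if __name__ == "__main__":
    # Hypothetical PE32+ Optional Header with only the fields we read filled in
    opt = bytearray(240)
    struct.pack_into("<H", opt, 0, PE32PLUS_MAGIC)
    struct.pack_into("<I", opt, 108, 16)
    struct.pack_into("<II", opt, 112 + 8 * IMAGE_DIRECTORY_ENTRY_EXCEPTION, 0x37000, 0x1434)
    rva, size = data_directories(bytes(opt))[IMAGE_DIRECTORY_ENTRY_EXCEPTION]
    print(hex(rva), hex(size))  # 0x37000 0x1434
```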
A small but important detail is that data loaded into a running process can sit at different addresses than the same data at rest on disk, because the loader lays sections out in memory according to their virtual addresses rather than their raw file offsets.
So what we're really interested in are the Relative Virtual Addresses (RVAs) used throughout a Portable Executable: offsets relative to the image's base address. The actual virtual address at runtime is calculated by adding the RVA to the base address the image was loaded at.
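Two conversions come up constantly when working with RVAs: turning one into a runtime virtual address (add the load base), and turning one into an offset in the file on disk (find the section containing it, then apply that section's raw-versus-virtual difference). A sketch of both, using the notepad.exe section values from the output in this post as sample data. The ImageBase 0x140000000 is an assumption (a common default for 64-bit executables), and the helper names are mine.

```python
from typing import NamedTuple, Optional

class Section(NamedTuple):
    name: str
    virtual_address: int
    virtual_size: int
    pointer_to_raw_data: int
    size_of_raw_data: int

# Sample section table, taken from the trimmed notepad.exe output in this post
SECTIONS = [
    Section(".text", 0x1000, 0x27BC2, 0x1000, 0x28000),
    Section(".rdata", 0x29000, 0xA608, 0x29000, 0xB000),
    Section(".data", 0x34000, 0x26C0, 0x34000, 0x1000),
    Section(".pdata", 0x37000, 0x1434, 0x35000, 0x2000),
]

def rva_to_va(image_base: int, rva: int) -> int:
    """Runtime virtual address = actual load base + RVA."""
    return image_base + rva

def rva_to_file_offset(sections, rva: int) -> Optional[int]:
    """Map an RVA to a file offset, or None if no file bytes back it."""
    for s in sections:
        if s.virtual_address <= rva < s.virtual_address + s.virtual_size:
            delta = rva - s.virtual_address
            # Past SizeOfRawData the section is zero-filled in memory, not read from disk
            return s.pointer_to_raw_data + delta if delta < s.size_of_raw_data else None
    return None

if __name__ == "__main__":
    print(hex(rva_to_va(0x140000000, 0x1008)))         # 0x140001008
    print(hex(rva_to_file_offset(SECTIONS, 0x37010)))  # 0x35010 (inside .pdata)
    print(rva_to_file_offset(SECTIONS, 0x35000))       # None (zero-filled part of .data)
```

The last line shows why the two address spaces must not be mixed up: RVA 0x35000 is not file offset 0x35000.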
When a binary is loaded, Windows reads its headers and section table, maps the sections into memory, applies base relocations if the image is not loaded at its preferred base, resolves its imports, and then executes the entry point function.
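Of those steps, base relocation is the easiest to sketch. When the image is not loaded at its preferred ImageBase (for example because of ASLR), the loader walks the .reloc section, a series of IMAGE_BASE_RELOCATION blocks: an 8-byte header holding a page RVA and the block size, followed by 16-bit entries whose top 4 bits are a type and whose low 12 bits are an offset into that page. Each DIR64 entry marks a 64-bit pointer that gets the load delta added to it. The helper names and the sample block below are hypothetical.

```python
import struct
from typing import List, Tuple

IMAGE_REL_BASED_ABSOLUTE = 0   # padding entry; nothing to patch
IMAGE_REL_BASED_HIGHLOW = 3    # patch a 32-bit field (PE32)
IMAGE_REL_BASED_DIR64 = 10     # patch a 64-bit field (PE32+)

def parse_reloc_block(data: bytes, offset: int = 0) -> Tuple[List[Tuple[int, int]], int]:
    """Decode one IMAGE_BASE_RELOCATION block into (type, RVA) pairs and return its size."""
    page_rva, block_size = struct.unpack_from("<II", data, offset)
    entries = []
    for i in range((block_size - 8) // 2):
        (word,) = struct.unpack_from("<H", data, offset + 8 + 2 * i)
        rel_type, page_offset = word >> 12, word & 0x0FFF   # top 4 bits / low 12 bits
        if rel_type != IMAGE_REL_BASED_ABSOLUTE:
            entries.append((rel_type, page_rva + page_offset))
    return entries, block_size

def relocated_value(value: int, preferred_base: int, actual_base: int) -> int:
    """Apply the load delta to a 64-bit pointer, as a DIR64 fixup does."""
    return (value + actual_base - preferred_base) & 0xFFFFFFFFFFFFFFFF

if __name__ == "__main__":
    block = struct.pack("<IIHHHH", 0x1000, 16,
                        (IMAGE_REL_BASED_DIR64 << 12) | 0x008,
                        (IMAGE_REL_BASED_DIR64 << 12) | 0x010,
                        (IMAGE_REL_BASED_DIR64 << 12) | 0x120,
                        0)  # ABSOLUTE entry used as padding
    entries, size = parse_reloc_block(block)
    print([hex(rva) for _, rva in entries])  # ['0x1008', '0x1010', '0x1120']
    print(hex(relocated_value(0x140002000, 0x140000000, 0x7FF6A0000000)))  # 0x7ff6a0002000
```

To walk a whole .reloc section, you would keep advancing `offset` by the returned block size until the directory's size is used up.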
With Python and the `pefile` library (`pip install pefile`) on a Windows machine, we can see that, although raw and virtual addresses are sometimes identical, there can also be stark differences between them. Our code gist:
import pefile

def print_section_info(pe):
    # Compare each section's on-disk (raw) location with its in-memory (virtual) location
    for section in pe.sections:
        section_name = section.Name.decode().rstrip('\x00')
        print(f"Section Name: {section_name}")
        print(f"Raw Address: 0x{section.PointerToRawData:08X}")
        print(f"Raw Size: 0x{section.SizeOfRawData:08X}")
        print(f"Virtual Address: 0x{section.VirtualAddress:08X}")
        print(f"Virtual Size: 0x{section.Misc_VirtualSize:08X}")
        print("")

if __name__ == "__main__":
    pe = pefile.PE("C:\\Windows\\notepad.exe")
    print_section_info(pe)
I've trimmed the output for brevity, but feel free to test this on your machine against other Windows Portable Executables:
Section Name: .text
Raw Address: 0x00001000
Raw Size: 0x00028000
Virtual Address: 0x00001000
Virtual Size: 0x00027BC2

Section Name: .rdata
Raw Address: 0x00029000
Raw Size: 0x0000B000
Virtual Address: 0x00029000
Virtual Size: 0x0000A608

Section Name: .data
Raw Address: 0x00034000
Raw Size: 0x00001000
Virtual Address: 0x00034000
Virtual Size: 0x000026C0

Section Name: .pdata
Raw Address: 0x00035000
Raw Size: 0x00002000
Virtual Address: 0x00037000
Virtual Size: 0x00001434
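The shift can be traced back to .data in this output. Its Virtual Size (0x26C0) is larger than its Raw Size (0x1000): only the first 0x1000 bytes come from the file, and the rest is zero-filled at load time. So in memory .data runs from 0x34000 to 0x366C0, and the next section starts at the next SectionAlignment boundary. A sketch of that arithmetic; SectionAlignment = 0x1000 is an assumption (it is not in the trimmed output, but it is consistent with every section shown), and the helper names are mine.

```python
SECTION_ALIGNMENT = 0x1000   # assumed; a common value for Windows images

def align_up(value: int, alignment: int) -> int:
    """Round value up to the next multiple of alignment (a power of two)."""
    return (value + alignment - 1) & ~(alignment - 1)

def next_section_rva(virtual_address: int, virtual_size: int,
                     alignment: int = SECTION_ALIGNMENT) -> int:
    """Earliest RVA the following section can start at."""
    return align_up(virtual_address + virtual_size, alignment)

if __name__ == "__main__":
    # .data from the notepad.exe output: VA 0x34000, Virtual Size 0x26C0
    print(hex(next_section_rva(0x34000, 0x26C0)))  # 0x37000
```

Running the same calculation for .text and .rdata gives 0x29000 and 0x34000, matching the virtual addresses of the sections that follow them.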
Here we can see that .pdata has a different Virtual Address (0x00037000) than its Raw Address (0x00035000). But what is .pdata exactly?
Structured Exception Handling
On x64 Windows, exception handling is table-based. At build time, the compiler and linker record metadata for every function that allocates stack space or calls another function, describing how that function changes the stack. This effectively gives Windows a map of what to do during exceptions. When the program hits a condition that causes an exception (an error, an invalid instruction, an access violation, or some other abnormal event that disrupts the normal flow of execution), Windows uses that map to unwind the stack frame by frame and find a handler. Per Microsoft's documentation for x64 exception handling:
The RUNTIME_FUNCTION structure must be DWORD aligned in memory. All addresses are image relative, that is, they're 32-bit offsets from the starting address of the image that contains the function table entry. These entries are sorted, and put in the .pdata section of a PE32+ image. For dynamically generated functions [JIT compilers], the runtime to support these functions must either use RtlInstallFunctionTableCallback or RtlAddFunctionTable to provide this information to the operating system. Failure to do so will result in unreliable exception handling and debugging of processes.
In other words, for a well-formed x64 image, the function table behind stack unwinding and Structured Exception Handling lives in the .pdata section (the Optional Header's ExceptionTable data directory points to it), while each entry refers to unwind information stored elsewhere in the image, typically in .rdata. We can also use the Python library pefile to locate and analyze the .pdata section.
The entries in .pdata are RUNTIME_FUNCTION structures, defined as follows:
typedef struct _IMAGE_RUNTIME_FUNCTION_ENTRY {
    DWORD BeginAddress;
    DWORD EndAddress;
    union {
        DWORD UnwindInfoAddress;
        DWORD UnwindData;
    } DUMMYUNIONNAME;
} RUNTIME_FUNCTION, *PRUNTIME_FUNCTION, _IMAGE_RUNTIME_FUNCTION_ENTRY, *_PIMAGE_RUNTIME_FUNCTION_ENTRY;
Each RUNTIME_FUNCTION's third field points to an UNWIND_INFO structure, whose layout the Microsoft documentation lists as follows. Its trailing data takes one of two forms, and form (2), Chained Unwind Info, is itself a RUNTIME_FUNCTION: the function start address, end address, and unwind info address, the same three fields that make up each .pdata entry.
The unwind data info structure is used to record the effects a function has on the stack pointer, and where the nonvolatile registers are saved on the stack.
struct UNWIND_INFO
Size          Value
UBYTE: 3      Version
UBYTE: 5      Flags
UBYTE         Size of prolog
UBYTE         Count of unwind codes
UBYTE: 4      Frame Register
UBYTE: 4      Frame Register offset (scaled)
USHORT * n    Unwind codes array
variable      Can either be of form (1) or (2) below

(1) Exception Handler
Size          Value
ULONG         Address of exception handler
variable      Language-specific handler data (optional)

(2) Chained Unwind Info
Size          Value
ULONG         Function start address
ULONG         Function end address
ULONG         Unwind info address
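A sketch of decoding that layout with `struct`, following the bit widths in the UNWIND_INFO table (Version 3 bits, Flags 5 bits, Frame Register and its scaled offset 4 bits each) and the documented flag values UNW_FLAG_EHANDLER (1), UNW_FLAG_UHANDLER (2), and UNW_FLAG_CHAININFO (4). The unwind codes array is padded to an even number of slots, so the trailing form (1) or (2) starts after that padding. For a real image you would first translate the Unwind info RVA from .pdata into a file offset; `parse_unwind_info` and the sample bytes are my own.

```python
import struct

UNW_FLAG_EHANDLER = 0x1
UNW_FLAG_UHANDLER = 0x2
UNW_FLAG_CHAININFO = 0x4

def parse_unwind_info(data: bytes, offset: int = 0) -> dict:
    """Decode the fixed UNWIND_INFO header and the handler/chained data after the codes."""
    ver_flags, prolog_size, code_count, frame = struct.unpack_from("<4B", data, offset)
    info = {
        "Version": ver_flags & 0x7,      # low 3 bits
        "Flags": ver_flags >> 3,         # high 5 bits
        "SizeOfProlog": prolog_size,
        "CountOfCodes": code_count,
        "FrameRegister": frame & 0xF,    # low 4 bits
        "FrameOffset": frame >> 4,       # high 4 bits, scaled by 16
    }
    # Unwind codes are 2-byte slots, padded to an even count
    tail = offset + 4 + 2 * ((code_count + 1) & ~1)
    if info["Flags"] & UNW_FLAG_CHAININFO:
        # Form (2): begin, end, and unwind info RVAs of the chained function
        info["Chained"] = struct.unpack_from("<3I", data, tail)
    elif info["Flags"] & (UNW_FLAG_EHANDLER | UNW_FLAG_UHANDLER):
        # Form (1): RVA of the language-specific exception handler
        (info["HandlerRVA"],) = struct.unpack_from("<I", data, tail)
    return info

if __name__ == "__main__":
    # Hypothetical UNWIND_INFO: version 1, no flags, 4-byte prolog, one unwind code (+ padding)
    blob = bytes([0x01, 0x04, 0x01, 0x00]) + struct.pack("<H", 0x4204) + bytes(2)
    print(parse_unwind_info(blob))
```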
Therefore, to walk .pdata we iterate over an array of RUNTIME_FUNCTION entries: three unsigned 32-bit integers of 4 bytes each, for a 12-byte read on each loop iteration:
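import pefile
import struct

PE_PATH = "C:\\Windows\\notepad.exe"
ENTRY_SIZE = 12  # sizeof(RUNTIME_FUNCTION): three 4-byte DWORDs

def main():
    pe = pefile.PE(PE_PATH, fast_load=True)
    for section in pe.sections:
        if section.Name.decode().rstrip('\x00') == '.pdata':
            print(".pdata address: {} size: {}".format(hex(section.PointerToRawData), hex(section.SizeOfRawData)))
            print_pdata_info(section)

def print_pdata_info(section):
    with open(PE_PATH, "rb") as file:
        # We read the table from disk, so seek to the raw file offset, not the RVA
        file.seek(section.PointerToRawData)
        # Whole entries only, so a trailing partial read never reaches struct.unpack
        for _ in range(section.SizeOfRawData // ENTRY_SIZE):
            # '<3L': BeginAddress, EndAddress, UnwindInfoAddress as little-endian 32-bit values
            baddr, eaddr, uaddr = struct.unpack('<3L', file.read(ENTRY_SIZE))
            if not baddr:  # zero-filled padding after the last entry
                break
            print("Begin address: {} End address: {} Unwind info: {}".format(hex(baddr), hex(eaddr), hex(uaddr)))

if __name__ == "__main__":
    main()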
If we save this as pdata_dump.py and run it with `python pdata_dump.py`, we loop through the .pdata entries. I've cut the output for brevity because there are many entries in this table:
.pdata address: 0x35000 size: 0x2000
Begin address: 0x1008 End address: 0x108e Unwind info: 0x30144
Begin address: 0x1094 End address: 0x11ba Unwind info: 0x30120
...
We can double-check these values against PE-bear: these relative addresses are exactly the ones it displays as well.
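Because the .pdata entries are sorted (as the documentation quoted earlier requires), finding the entry that covers a given RVA, such as the RVA of a faulting instruction, can be done with a binary search. Windows exposes this kind of lookup through RtlLookupFunctionEntry; the sketch below is only my own illustration of the idea using Python's `bisect`, with the two entries dumped above as sample data. Treating EndAddress as exclusive is an assumption on my part.

```python
import bisect
from typing import List, Optional, Tuple

# (BeginAddress, EndAddress, UnwindInfoAddress), sorted by BeginAddress
RuntimeFunction = Tuple[int, int, int]

PDATA_SAMPLE: List[RuntimeFunction] = [
    (0x1008, 0x108E, 0x30144),
    (0x1094, 0x11BA, 0x30120),
]

def lookup_function_entry(entries: List[RuntimeFunction], rva: int) -> Optional[RuntimeFunction]:
    """Return the entry whose [BeginAddress, EndAddress) range contains rva, or None."""
    begins = [begin for begin, _, _ in entries]
    i = bisect.bisect_right(begins, rva) - 1   # last entry starting at or before rva
    if i >= 0 and rva < entries[i][1]:          # EndAddress assumed exclusive
        return entries[i]
    return None   # e.g. padding between functions, or a leaf function with no entry

if __name__ == "__main__":
    print([hex(v) for v in lookup_function_entry(PDATA_SAMPLE, 0x1100)])  # ['0x1094', '0x11ba', '0x30120']
```

Rebuilding `begins` on every call keeps the sketch short; for repeated lookups you would compute it once.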
There's a lot more you can do with `pefile`. If you're interested in working with Portable Executables, it's a great library for research.