Dates: February 10-17, 2024 (spread over the week)
Reverse engineering is the art of extracting valuable information from unknown binary programs. Whether we aim to find vulnerabilities in closed-source software, dissect the internals of nation-state malware, or simply bypass copy protection technologies, reverse engineering helps us pinpoint relevant code and data locations, enables us to reconstruct high-level constructs from machine code, and thus provides us with insights into valuable program internals.
In this training, we learn the fundamentals of reverse engineering from scratch, ranging from reconstructing high-level code, through recovering complex data structures and C++ class hierarchies, to analyzing complex malware samples. Along the way, we become proficient in using state-of-the-art tools such as IDA, Ghidra, and GDB. This way, the training accompanies students through their first reverse engineering steps and paves the way for a long journey.
First, we discuss the layers between machine code and high-level languages, introduce binary file formats, and get to know important tools such as hex editors, disassemblers, decompilers, and debuggers. Afterward, we familiarize ourselves with the x86-64 instruction set architecture, the most common architecture on desktop computers and servers. Along the way, we learn how to write assembly code by hand, inspect registers and flags in a debugger, and reconstruct arithmetic calculations and loops in a disassembler.
In the second part, we cover the reconstruction of high-level code constructs from machine code. For this, we compile C code to machine code and compare the two side by side. By using different compilers and optimization levels, we can study the many ways in which a high-level construct can be represented. Afterward, we focus on manually recovering high-level functions from compiler-generated code. Finally, we dive into the area of software cracking and deepen our skills by reverse engineering and patching serial validation schemes.
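To give a flavor of the serial validation exercises, here is a minimal sketch of what such a scheme might look like once it has been recovered from a binary. The checksum, its constants, and the names `expected_serial`, `validate`, and `keygen` are hypothetical and chosen purely for illustration; they are not taken from the course material.

```python
def expected_serial(name: str) -> int:
    """Hypothetical check value: a 16-bit rolling checksum over the name."""
    value = 0x1337  # made-up seed constant
    for byte in name.encode("ascii"):
        value = ((value * 31) ^ byte) & 0xFFFF
    return value


def validate(name: str, serial: str) -> bool:
    """Toy validation routine, as it might read after manual reconstruction."""
    try:
        return int(serial, 16) == expected_serial(name)
    except ValueError:
        # Non-hex serials and non-ASCII names are simply rejected.
        return False


def keygen(name: str) -> str:
    """Once the check is understood, a valid serial can be computed directly."""
    return f"{expected_serial(name):04X}"


if __name__ == "__main__":
    serial = keygen("alice")
    print(serial, validate("alice", serial))
```

Writing a key generator like this is one way to defeat such a check. The other is patching: changing the conditional jump that follows the comparison in the binary so that any serial is accepted.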
Before we reconstruct complex data structures and C++ classes with Ghidra, we first learn how to identify them manually. Afterward, we look at how to recover class inheritance relationships, analyze constructors and virtual functions, and resolve virtual function calls.
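The idea behind working out the target of a virtual function call can be sketched in a few lines. The example below assumes a simplified object layout in which the first 8 bytes of an object hold a pointer to a table of function pointers, and it models memory as a small byte buffer. All addresses, the `BASE` load address, and the symbol names are invented for illustration. Real vtables carry additional entries that are omitted here.

```python
import struct

BASE = 0x1000  # hypothetical load address of the toy memory image


def read_qword(memory: bytes, address: int) -> int:
    """Read a little-endian 64-bit value at a virtual address."""
    return struct.unpack_from("<Q", memory, address - BASE)[0]


def resolve_virtual_call(memory: bytes, object_address: int, slot_offset: int) -> int:
    """Follow the two loads behind an indirect call through a vtable."""
    vtable = read_qword(memory, object_address)      # mov rax, [rdi]
    return read_qword(memory, vtable + slot_offset)  # call qword ptr [rax + slot_offset]


# Toy image: a three-entry vtable at BASE, followed by one object
# consisting of a vtable pointer and a single 64-bit field.
VTABLE = [0x401000, 0x401050, 0x4010A0]
MEMORY = struct.pack("<3Q", *VTABLE) + struct.pack("<QQ", BASE, 42)
OBJECT_ADDRESS = BASE + 24

# Made-up names for the three functions in the table.
SYMBOLS = {0x401000: "Circle::~Circle", 0x401050: "Circle::area", 0x4010A0: "Circle::draw"}

if __name__ == "__main__":
    target = resolve_virtual_call(MEMORY, OBJECT_ADDRESS, 0x8)
    print(hex(target), SYMBOLS[target])
```

In a disassembler, the same reasoning is applied by hand: identify which vtable the constructor writes into the object, then use the offset in the call instruction to pick the entry.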
Finally, we put our newly acquired knowledge into practice by analyzing nation-state malware samples. After discussing challenges and strategies when dealing with complex binaries, we identify malware functionality based on API functions and reconstruct class hierarchies of malware modules. To reveal hidden strings in the binary, we script Ghidra to decrypt them automatically.
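As a rough illustration of what automated string decryption involves, the following standalone sketch decrypts a table of strings protected with a repeating-key XOR. It deliberately does not use the Ghidra scripting API and works on raw bytes instead. The cipher, the key, and the length-prefixed table layout are assumptions made for this example; real samples use their own schemes, which first have to be recovered from the decryption routine.

```python
from typing import List


def xor_crypt(data: bytes, key: bytes) -> bytes:
    """Repeating-key XOR; the same function encrypts and decrypts."""
    if not key:
        raise ValueError("key must not be empty")
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))


def decrypt_table(blob: bytes, key: bytes) -> List[str]:
    """Decrypt a table of entries, each stored as <1-byte length><encrypted bytes>."""
    strings = []
    pos = 0
    while pos < len(blob):
        length = blob[pos]
        encrypted = blob[pos + 1 : pos + 1 + length]
        strings.append(xor_crypt(encrypted, key).decode("ascii", errors="replace"))
        pos += 1 + length
    return strings


# Hypothetical key and string table, built here so the example is self-contained.
KEY = b"\x5a\xa5"
PLAINTEXTS = [b"CreateFileW", b"cmd.exe"]
BLOB = b"".join(bytes([len(s)]) + xor_crypt(s, KEY) for s in PLAINTEXTS)

if __name__ == "__main__":
    for string in decrypt_table(BLOB, KEY):
        print(string)
```

Inside a disassembler, a script would additionally locate the encrypted data, read it from the program's memory, and annotate the binary with the decrypted strings.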
Note that the training focuses on hands-on sessions. While some lecture parts provide an understanding of how high-level code can be represented in machine code, various hands-on sessions teach how to interact with reverse engineering tools and reconstruct high-level code from binary programs. The trainer actively helps students solve the given exercises. After a task is completed, we discuss different solutions in class. Furthermore, students receive detailed reference solutions that can be used during and after the course.
While this class mostly focuses on the x86-64 architecture, we can optionally take a look at the ARM32 architecture and discuss the differences and similarities between the two. Since the course teaches reverse engineering in a general way, students will find that the techniques and tools can also be applied to other architectures.
Learn reverse engineering from scratch and understand all layers between machine code and high-level languages
Become proficient in using state-of-the-art tools like IDA, Ghidra and GDB
Learn how to reconstruct (nested) conditionals and loops, functions, complex data structures and C++ classes from machine code
Get to know strategies to analyze complex binaries and apply them to nation-state malware samples
Deepen your reverse engineering skills in various hands-on sessions
The training follows this outline:
Introduction to Reverse Engineering
Reconstruction of Functions
Reconstruction of Data Structures
C++ Reverse Engineering
Malware Reverse Engineering
ARM32 Architecture (Optional)
Participants should have some familiarity with low-level programming in C. In particular, a basic understanding of pointers is recommended.
We use Zoom for video conferencing and slide/desktop sharing, and Slack as a live chat for questions and other communication. We recommend a multi-screen setup, with Zoom on one screen and the exercises on the other. We also recommend using a webcam so that participants can see each other, which gives the training a more personal touch.
Students should have access to a computer with at least 4 GB of RAM and 20 GB of free disk space. Furthermore, they should install virtualization software such as VirtualBox or VMware. Students will be provided with a Linux VM containing all necessary tools and setups.
Tim Blazytko is a well-known binary security researcher and co-founder of emproof. After working on novel methods for code deobfuscation, fuzzing and root cause analysis during his PhD, Tim now builds code obfuscation schemes tailored to embedded devices. Moreover, he gives trainings on reverse engineering & code deobfuscation, analyzes malware and performs security audits.