Let’s say we see the following arithmetic expression:
I recently gave a workshop on the Analysis of Virtualization-based Obfuscation at r2con2021 (slides, code & samples and the recording are available online). In this blog post, I would like to briefly summarize how to write disassemblers for VM-based obfuscators based on symbolic execution.
In a previous blog post, we already discussed that it is valuable to know which code areas are obfuscated; those areas often guard sensitive code and are worth a closer look. Furthermore, we designed a heuristic to automatically detect control-flow flattening and state machines in binaries by identifying specific loop characteristics in the control-flow graph. However, other code obfuscation techniques such as opaque predicates, complex arithmetic encodings or virtualization are not necessarily covered by this heuristic, especially if the control-flow graph is loop-free. For these cases, we have to develop new heuristics to identify obfuscation.
Automation plays a crucial role in reverse engineering, no matter whether we search for vulnerabilities in software, analyze malware or remove obfuscated layers from code. Once we manually identify repeating patterns, we try to automate the process as far as possible. For automation, it often doesn’t matter if you use Binary Ninja, IDA Pro or Ghidra, as long as you know how to implement it in your tool of choice. As you will see, you don’t have to be an expert to automate tedious reverse engineering tasks; sometimes it just takes a few lines of code to improve your understanding a lot.
Following my last blog post, I got a lot of questions about additional material on control-flow analysis. While most compiler books (such as the Dragon Book) cover related topics in depth, I decided to publish my own presentation that was initially built for (but never made it into) my training class on software deobfuscation. The slide deck illustrates the theory of control-flow graph construction, dominance relations and loop analysis. In the second part of this post, I would like to show you how to play around with these concepts using the reverse engineering framework Miasm.
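To make dominance relations and loop analysis a bit more concrete, here is a minimal sketch in plain Python. It deliberately does not use Miasm; the toy graph, the block names and all helper functions (`predecessors`, `compute_dominators`, `find_back_edges`, `natural_loop`) are hypothetical and only serve as illustration. The sketch assumes that every block is reachable from the entry block and uses the classic iterative data-flow formulation of dominators rather than a faster algorithm.

```python
from typing import Dict, List, Set, Tuple

# A control-flow graph as an adjacency list: block name -> successor blocks.
CFG = Dict[str, List[str]]


def predecessors(cfg: CFG) -> Dict[str, Set[str]]:
    """Invert the successor lists to obtain the predecessors of each block."""
    preds: Dict[str, Set[str]] = {node: set() for node in cfg}
    for src, succs in cfg.items():
        for dst in succs:
            preds.setdefault(dst, set()).add(src)
    return preds


def compute_dominators(cfg: CFG, entry: str) -> Dict[str, Set[str]]:
    """Iterative dominator computation.

    A block d dominates a block n if every path from the entry to n
    passes through d. Assumes all blocks are reachable from the entry.
    """
    preds = predecessors(cfg)
    nodes = set(preds)
    # Start with "everything dominates everything" and shrink the sets.
    dom = {node: set(nodes) for node in nodes}
    dom[entry] = {entry}
    changed = True
    while changed:
        changed = False
        for node in nodes - {entry}:
            incoming = [dom[p] for p in preds[node]]
            new = set.intersection(*incoming) if incoming else set()
            new.add(node)
            if new != dom[node]:
                dom[node] = new
                changed = True
    return dom


def find_back_edges(cfg: CFG, entry: str) -> List[Tuple[str, str]]:
    """An edge (tail, head) is a back edge if head dominates tail."""
    dom = compute_dominators(cfg, entry)
    return sorted(
        (src, dst) for src, succs in cfg.items() for dst in succs
        if dst in dom[src]
    )


def natural_loop(cfg: CFG, tail: str, head: str) -> Set[str]:
    """Collect all blocks that reach the tail without passing the head."""
    preds = predecessors(cfg)
    body = {head, tail}
    worklist = [tail]
    while worklist:
        node = worklist.pop()
        if node == head:
            continue  # never walk past the loop header
        for pred in preds[node]:
            if pred not in body:
                body.add(pred)
                worklist.append(pred)
    return body


if __name__ == "__main__":
    # Toy graph: one dispatcher block with two handlers that jump back to it.
    cfg = {
        "entry": ["dispatcher"],
        "dispatcher": ["case_a", "case_b", "exit"],
        "case_a": ["dispatcher"],
        "case_b": ["dispatcher"],
        "exit": [],
    }
    for tail, head in find_back_edges(cfg, "entry"):
        print(f"back edge {tail} -> {head}, loop body: "
              f"{sorted(natural_loop(cfg, tail, head))}")
```

In the toy graph, the dispatcher dominates both handlers, so both edges back to it are back edges and each one spans its own natural loop. A real framework would derive the graph from disassembled basic blocks instead of a hand-written dictionary.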
Commercial businesses and malware authors often use code obfuscation to protect specific code areas and impede reverse engineering. In my experience, knowing which code areas are obfuscated often pinpoints sensitive code parts that are worth a closer look. For example, the FinSpy samples that were discovered in September 2020 obfuscate their main modules with Obfuscator-LLVM, while the two-staged dropper isn’t obfuscated at all.