<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="4.4.1">Jekyll</generator><link href="https://synthesis.to/feed.xml" rel="self" type="application/atom+xml" /><link href="https://synthesis.to/" rel="alternate" type="text/html" /><updated>2026-05-05T17:42:49+02:00</updated><id>https://synthesis.to/feed.xml</id><title type="html">Tim Blazytko’s Blog</title><entry><title type="html">Building a Pipeline for Agentic Malware Analysis</title><link href="https://synthesis.to/2026/03/18/agentic_malware_analysis.html" rel="alternate" type="text/html" title="Building a Pipeline for Agentic Malware Analysis" /><published>2026-03-18T00:00:00+01:00</published><updated>2026-03-18T00:00:00+01:00</updated><id>https://synthesis.to/2026/03/18/agentic_malware_analysis</id><content type="html" xml:base="https://synthesis.to/2026/03/18/agentic_malware_analysis.html"><![CDATA[<p>Large language models have already become useful helpers in reverse engineering: they can clean up decompiler output, suggest better function names, answer focused questions about unfamiliar code, and write small helper scripts. I discussed some of these LLM-assisted reversing use cases in my REcon 2024 talk <a href="https://synthesis.to/presentations/recon24_llm_reversing.pdf">The Future of Reverse Engineering with Large Language Models</a>. So far, though, the model has mostly stayed in a supporting role: we still decide what is worth following, what to inspect next, and where deeper reversing effort is needed.</p>

<p>What changes this picture is the rise of agents. Once a model can use tools in a loop, inspect intermediate results, and continue analysis on its own, it can take over meaningful parts of the workflow. We already see that across tasks such as <a href="https://c.mov/nfuncs-agent/">solving CTF challenges</a>, <a href="https://security.googleblog.com/2023/08/ai-powered-fuzzing-breaking-bug-hunting.html">generating fuzzing harnesses</a>, <a href="https://zeltser.com/ai-malware-analysis-remnux">malware analysis</a>, and <a href="https://sean.heelan.io/2026/01/18/on-the-coming-industrialisation-of-exploit-generation-with-llms/">exploit-generation experiments</a>. However, while agents are already useful on their own, they become substantially more effective once they are given the right environment, tooling, and guidance.</p>

<p>In this post, we show how to build such an agentic malware-analysis pipeline in practice. We demonstrate this on a concrete malware case study in which some functionality is easy to identify from strings and APIs, while other parts only reveal themselves after deeper analysis. We first introduce the basic ingredients of agentic workflows, then look at what a strong general-purpose agent can find in the sample on its own, and finally compare that to the same agent embedded in a structured workflow. This lets us compare what already works, what the pipeline improves, and which limitations remain.</p>

<p>The companion repository with the analysis environment, pipeline, and sample setup is available <a href="https://github.com/mrphrazer/agentic-malware-analysis">here</a>.</p>

<h2 id="agents">Agents</h2>

<p>For the purposes of this post, we can use a simple definition: an <a href="https://simonwillison.net/2025/Sep/18/agents/"><em>agent</em> is an LLM that can use tools in a loop to achieve a goal</a>. Instead of answering once inside a single prompt, it can inspect tool output, decide what to do next, execute another step, and continue until it reaches a result. That is the key shift from <em>LLM-assisted reverse engineering</em> to <em>agentic workflows</em>.</p>

<p>The basic loop is straightforward. The model receives a task, chooses a tool, sees the result, and repeats this process until it can produce a final answer. In simplified pseudocode, such a reverse-engineering agent could look as follows:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># define tools
</span><span class="n">tools</span> <span class="o">=</span> <span class="p">{</span>
    <span class="sh">"</span><span class="s">strings</span><span class="sh">"</span><span class="p">:</span> <span class="n">strings</span><span class="p">,</span>
    <span class="sh">"</span><span class="s">disasm</span><span class="sh">"</span><span class="p">:</span> <span class="n">disasm</span><span class="p">,</span>
    <span class="sh">"</span><span class="s">decompile</span><span class="sh">"</span><span class="p">:</span> <span class="n">decompile</span><span class="p">,</span>
<span class="p">}</span>

<span class="c1"># set LLM instructions
</span><span class="n">msgs</span> <span class="o">=</span> <span class="p">[</span>
    <span class="p">{</span>
        <span class="sh">"</span><span class="s">role</span><span class="sh">"</span><span class="p">:</span> <span class="sh">"</span><span class="s">system</span><span class="sh">"</span><span class="p">,</span>
        <span class="sh">"</span><span class="s">content</span><span class="sh">"</span><span class="p">:</span> <span class="p">(</span>
            <span class="sh">"</span><span class="s">You are a reverse-engineering agent. </span><span class="sh">"</span>
            <span class="sh">"</span><span class="s">Use tools to analyze binaries.</span><span class="sh">"</span>
        <span class="p">),</span>
    <span class="p">},</span>
    <span class="p">{</span>
        <span class="sh">"</span><span class="s">role</span><span class="sh">"</span><span class="p">:</span> <span class="sh">"</span><span class="s">user</span><span class="sh">"</span><span class="p">,</span>
        <span class="sh">"</span><span class="s">content</span><span class="sh">"</span><span class="p">:</span> <span class="p">(</span>
            <span class="sh">"</span><span class="s">Figure out what function sub_401230 does.</span><span class="sh">"</span>
        <span class="p">),</span>
    <span class="p">},</span>
<span class="p">]</span>

<span class="c1"># agent loop
</span><span class="k">while</span> <span class="bp">True</span><span class="p">:</span>
    <span class="n">r</span> <span class="o">=</span> <span class="nf">llm</span><span class="p">(</span><span class="n">msgs</span><span class="p">,</span> <span class="n">tools</span><span class="o">=</span><span class="n">tools</span><span class="p">)</span>

    <span class="k">if</span> <span class="n">r</span><span class="p">.</span><span class="nb">type</span> <span class="o">==</span> <span class="sh">"</span><span class="s">final</span><span class="sh">"</span><span class="p">:</span>
        <span class="nf">print</span><span class="p">(</span><span class="n">r</span><span class="p">.</span><span class="n">content</span><span class="p">)</span>
        <span class="k">break</span>

    <span class="n">result</span> <span class="o">=</span> <span class="n">tools</span><span class="p">[</span><span class="n">r</span><span class="p">.</span><span class="n">tool_name</span><span class="p">](</span><span class="o">**</span><span class="n">r</span><span class="p">.</span><span class="n">tool_args</span><span class="p">)</span>
    <span class="n">msgs</span> <span class="o">+=</span> <span class="p">[</span>
        <span class="n">r</span><span class="p">,</span>
        <span class="p">{</span>
            <span class="sh">"</span><span class="s">role</span><span class="sh">"</span><span class="p">:</span> <span class="sh">"</span><span class="s">tool</span><span class="sh">"</span><span class="p">,</span>
            <span class="sh">"</span><span class="s">name</span><span class="sh">"</span><span class="p">:</span> <span class="n">r</span><span class="p">.</span><span class="n">tool_name</span><span class="p">,</span>
            <span class="sh">"</span><span class="s">content</span><span class="sh">"</span><span class="p">:</span> <span class="nf">str</span><span class="p">(</span><span class="n">result</span><span class="p">),</span>
        <span class="p">},</span>
    <span class="p">]</span>
</code></pre></div></div>

<p>The agent can retrieve strings, disassembly, and decompiler output, decide which tool to call next, observe the result, and iteratively refine its understanding until it can answer the question. Even this simple setup is already enough to turn a model from a passive assistant into an active analysis loop.</p>

<p>While there are good reasons to implement such loops from scratch, and official examples from both <a href="https://cookbook.openai.com/examples/reasoning_function_calls">OpenAI</a> and <a href="https://docs.anthropic.com/en/docs/build-with-claude/computer-use">Anthropic</a> show this approach directly, in practice many people will use command-line agents such as OpenAI’s <a href="https://github.com/openai/codex">Codex CLI</a>, Anthropic’s <a href="https://github.com/anthropics/claude-code">Claude Code</a>, or open-source agents like <a href="https://github.com/anomalyco/opencode">opencode</a>. These tools already handle the interactive loop, tool execution, and session state for us.</p>

<p>For reverse engineering, the next question is how we connect the right analysis tooling and how we guide the workflow. The <a href="https://modelcontextprotocol.io/"><em>Model Context Protocol (MCP)</em></a>, as defined in the <a href="https://modelcontextprotocol.io/specification/latest/">official specification</a>, standardizes how agents discover and call external tools. Through MCP, an agent can query a disassembler, collect metadata, inspect cross-references, or run helper scripts without us manually copying results back and forth. Projects such as <a href="https://github.com/LaurieWired/GhidraMCP">GhidraMCP</a> and <a href="https://github.com/mrexodia/ida-pro-mcp">ida-pro-mcp</a> make this kind of integration directly available for <a href="https://ghidra-sre.org/">Ghidra</a> and <a href="https://hex-rays.com/ida-pro/">IDA Pro</a>.</p>

<p><a href="https://agentskills.io/"><em>Agent skills</em></a>, on the other hand, are reusable, task-specific workflows for agents. Their main purpose is to tell the agent how to approach a task. In practice, a skill can include a workflow, supporting scripts, file templates, examples, and concrete expectations for the output. This is useful across many reverse-engineering tasks: for a crackme, we may want the agent to first identify the relevant input checks and success paths before it starts patching the binary. For obfuscated code, we may want it to first classify the protection, determine where and how it appears in the binary, and then derive a suitable workflow and tooling strategy to deobfuscate it. In malware analysis, we may want the agent to first collect broad evidence such as strings and API imports, identify the most relevant points of interest, and only then inspect the surrounding code regions in the disassembler.</p>

<p>However, tool access and workflow guidance still do not solve the main practical limitation of unstructured agent runs: the <em>context window</em>. This is the temporary amount of information an agent can keep in view during a session, including chat history, tool output, notes, and intermediate observations. In reverse engineering, this window fills up quickly. Large amounts of disassembly or decompiler output, especially for large functions or groups of related functions, can already consume a significant fraction of the available context. Once the context becomes too large, the agent usually compacts it into a summary. As a consequence, details may be lost, the analysis may drift, work may be repeated, or the agent may simply get lost.</p>

<p>To deal with this limited context, two things are especially useful in practice: subagents and planning. With <em>subagents</em>, the main agent can offload focused work to a child agent with its own context window, such as analyzing a single function or tracing references to a string. Anthropic documents this pattern explicitly as <a href="https://docs.anthropic.com/en/docs/claude-code/sub-agents">subagents</a>, while OpenAI discusses comparable delegation setups as <a href="https://openai.github.io/openai-agents-python/multi_agent/">multi-agent orchestration</a>. <em>Planning</em> helps with a different part of the same problem: it does not give the agent more context, but it can help it spend that context more deliberately. For example, Claude Code exposes this explicitly as <a href="https://docs.anthropic.com/en/docs/claude-code/common-workflows#plan-mode">Plan Mode</a>, where the agent first explores the problem with read-only operations and works out which evidence to collect and which leads are worth following.</p>

<p>Still, neither subagents nor planning solve the underlying issue. If the context gets compacted, the agent may still lose track of prior findings because it has no <em>persistent memory</em>, no stable notes to recover from, and no workflow it can resume reliably.</p>

<p>With these foundations in place, let us now look at what a strong agent can do with tool access alone. Afterwards, we will build a pipeline around the same setup and examine how much that changes the depth and reliability of the analysis.</p>

<h2 id="baseline-unstructured-agent-based-malware-analysis">Baseline: Unstructured Agent-based Malware Analysis</h2>

<p>To get a feeling for what a strong general-purpose agent can already do on its own, we start with a simple setup: direct access to a disassembler, but no task-specific workflow, no persistent case file, and no additional guidance beyond the analysis prompt. In other words, the agent can drive the tools, but it has to structure the investigation on its own. That setup is already useful in practice, and we should expect it to produce a decent first triage and some concrete leads. The more interesting question is where it starts to flatten out once the sample requires deeper reconstruction.</p>

<p>As our test case, we use <code class="language-plaintext highlighter-rouge">mfc42ul.dll</code> from the German “Staatstrojaner” (federal Trojan horse) case publicized by the <a href="https://www.ccc.de/en/updates/2011/staatstrojaner">Chaos Computer Club (CCC)</a>. The sample is a good fit because some of its functionality is relatively easy to spot from strings, imports, and a quick pass through the code: screenshots, Skype interception, persistence, and covert communication. But it also contains deeper logic that is much harder to recover from a shallow analysis, including process-aware activation, command dispatching, protocol details, hardcoded C2 information, and the statically embedded AES key. That makes it a useful test case: it lets us judge not only whether the agent finds something interesting, but also how far the analysis actually goes.</p>

<p>In our initial run, we use OpenAI’s <a href="https://github.com/openai/codex">Codex CLI</a> with <code class="language-plaintext highlighter-rouge">gpt-5.4 xhigh</code> and give it access to my headless Binary Ninja MCP server <a href="https://github.com/mrphrazer/binary-ninja-headless-mcp"><code class="language-plaintext highlighter-rouge">binary-ninja-headless-mcp</code></a>. We then ask the agent to analyze the sample via the MCP server and produce a detailed overview of the malware, its features, and the corresponding code locations. At this point, the agent has everything it needs to inspect the binary and drive the analysis on its own:</p>

<div class="language-text highlighter-rouge"><div class="highlight"><pre class="highlight"><code>› Analyze the malware in examples/samples/mfc42ul.dll
  Give me a detailed overview of the sample’s functionality and features,
  together with the corresponding code locations. Use the MCP server to
  perform the analysis.
</code></pre></div></div>

<p>One first useful observation is that the agent does not behave randomly. Even without additional workflow guidance, it starts with a fairly sensible triage strategy: open the sample, collect high-level metadata, identify likely behavior from strings and imports, and then follow those leads into the code. The following excerpt shows how the run begins:</p>

<div class="language-text highlighter-rouge"><div class="highlight"><pre class="highlight"><code>• I’m analyzing examples/samples/mfc42ul.dll with the Binary Ninja MCP tools.

• Updated Plan
  └ Static malware triage and behavior reconstruction for the DLL using Binary
    Ninja MCP.
    □ Open the DLL in Binary Ninja and collect high-level metadata.
    □ Identify likely behavior clusters from strings/xrefs.
    □ Trace the main execution paths and map capabilities to concrete code locations.

• Called
  └ binary_ninja_headless_mcp.session.open({...})
    ok session_id=...
</code></pre></div></div>

<p>As the run continues, the same pattern remains visible: broad metadata collection first, then pivots off suspicious strings and imports, and only gradually deeper inspection of the code.</p>

<div class="language-text highlighter-rouge"><div class="highlight"><pre class="highlight"><code>• The file is a 32-bit PE DLL with a large code surface, so I’m doing a fast
  triage pass before drilling into behavior. I’m collecting exports,
  imports, and structural metadata now, then I’ll pivot off suspicious
  strings and entry-point code.
</code></pre></div></div>

<p>This is already a useful and fairly reasonable starting point. The agent is not guessing blindly; it is using the tooling in a disciplined way and follows a plausible initial workflow. At the same time, the limitation is already visible: the analysis is driven primarily by obvious pivots such as strings, imports, and surface-level structure. That is often enough for a first triage, but not necessarily enough to recover the deeper logic of the sample.</p>

<p>After roughly 15 minutes and 268 tool calls, the agent produced a first high-level summary of the sample. The result is already quite useful for an initial triage. The agent:</p>

<ul>
  <li>
    <p>identifies the dropped driver <code class="language-plaintext highlighter-rouge">winsys32.sys</code>, the corresponding installation logic, and the driver device <code class="language-plaintext highlighter-rouge">\\.\KeyboardClassC</code>, and ties them to functions such as <code class="language-plaintext highlighter-rouge">sub_1000e8a0</code> and <code class="language-plaintext highlighter-rouge">sub_10003a40</code>,</p>
  </li>
  <li>
    <p>notices the Skype-related markers and target process names, and links them to nearby control logic around <code class="language-plaintext highlighter-rouge">sub_10007490</code>,</p>
  </li>
  <li>
    <p>recovers the proxy-discovery logic in <code class="language-plaintext highlighter-rouge">sub_1000ead0</code>, including registry-based handling of Internet Explorer proxy settings,</p>
  </li>
  <li>
    <p>identifies a proxy-aware outbound transport based on HTTP <code class="language-plaintext highlighter-rouge">CONNECT</code>, together with the protocol marker <code class="language-plaintext highlighter-rouge">C3PO-r2d2-POE</code>, in <code class="language-plaintext highlighter-rouge">sub_10006790</code>, and</p>
  </li>
  <li>
    <p>finds a secondary payload path around <code class="language-plaintext highlighter-rouge">sub_1000d470</code>, including temporary-file staging, execution, and later cleanup.</p>
  </li>
</ul>

<p>This is already a meaningful speedup for the initial analysis: the agent produces a useful high-level overview of the sample’s features and maps them to concrete code locations. At the same time, the result remains fairly shallow. It does not identify the cryptographic code, recover the hardcoded key, or reconstruct the deeper protocol logic. In other words, it is a good first triage, but not yet a deep analysis.</p>

<p>Another practical problem is that the run nearly exhausts the available context window. By the end, only a small fraction of the context remains available. This makes the setup brittle: even when the agent finds useful leads, it has limited ability to preserve and build on them once the investigation grows larger or requires repeated deep dives into related code. Large chunks of disassembly and decompiler output consume context quickly, and once earlier findings are compacted into summaries, details can easily be lost.</p>

<p>So the baseline is already helpful, but mostly as triage. It points us to relevant functionality, recovers several visible features, and gives us concrete starting points for manual inspection. What it does not yet provide is a deeper reconstruction of how the malware is organized internally.</p>

<p>To see whether this is mostly a Codex-specific limitation or a more general pattern, we can repeat the same experiment with another strong agent. Using Anthropic’s <a href="https://github.com/anthropics/claude-code">Claude Code</a> with <code class="language-plaintext highlighter-rouge">Opus 4.6</code> and <code class="language-plaintext highlighter-rouge">high effort</code>, the overall picture improves somewhat. In a similar amount of time, Claude recovers a few additional details:</p>

<ul>
  <li>
    <p>the audio-recording pipeline, including dual-channel capture and the related <code class="language-plaintext highlighter-rouge">waveIn*</code> handling,</p>
  </li>
  <li>
    <p>the SNMP-based host fingerprinting together with the VMware MAC-prefix filter in the anti-analysis logic, and</p>
  </li>
  <li>
    <p>the logic that reads the DLL’s own filename at runtime, suggesting that the sample checks whether it is still running under the expected module name.</p>
  </li>
</ul>

<p>At the same time, the overall pattern remains the same: Claude also gives a useful first overview, but it still does not really reconstruct the deeper logic of the sample.</p>

<p>Overall, these experiments suggest that changing the model helps only to a limited extent; it does not solve the main problem. As we will see, what matters much more is whether the agent has clear guidance, a structured plan, and a way to continue the analysis even after context-window resets. Let us now look at how to build that into the workflow.</p>

<h2 id="agentic-analysis-pipeline">Agentic Analysis Pipeline</h2>

<p>The earlier experiments already show the main limitations that the workflow needs to address. A strong general-purpose agent with access to a binary-analysis engine can produce a useful first triage: it recovers visible functionality, finds sensible pivots, and maps obvious behavior to code locations. In our case, that was enough to identify some of the easier-to-spot features, but not the cryptographic code, the deeper protocol logic, or the internal organization of the sample. At the same time, the run was already approaching the limits of the context window.</p>

<p>To address these problems, we need a workflow that gives the agent a clear analytical structure and a persistent external memory it can return to. In the end, our goal is to make the analysis more repeatable, resumable, and capable of going deeper than unstructured tool-driven exploration without exhausting the context window unnecessarily. Therefore, we can now define four design goals for the workflow:</p>

<ol>
  <li>
    <p><strong>Start with broad fingerprinting.</strong> The agent should first collect a wide range of signals—strings, API imports, symbol names, crypto indicators, points of interest, and other easy-to-extract artifacts—to build a rough high-level picture of the sample before drifting into deeper analysis.</p>
  </li>
  <li>
    <p><strong>Keep raw evidence separate from higher-level reasoning.</strong> Raw artifacts (e.g., strings), extracted signals, planning results, hypotheses, and interpretations should not all be mixed together. This lets the agent work from structured summaries without losing access to the underlying evidence.</p>
  </li>
  <li>
    <p><strong>Follow a clearly staged process.</strong> The workflow should make explicit what has already been done, what comes next, which questions remain open, and what every stage is expected to produce. In implementation terms, stages are defined by templates, expected output artifacts, and lightweight validators that check whether the required files and fields are present before the workflow continues. This ensures that the workflow produces explicit intermediate artifacts that later stages can build on directly.</p>
  </li>
  <li>
    <p><strong>Externalize state to disk.</strong> Notes, artifacts, progress, hypotheses, and reports should be written to persistent storage so that the analysis can continue even after the context window fills up.</p>
  </li>
</ol>

<p>We address these goals by structuring the agent around a staged workflow that roughly mirrors how a human analyst would approach a malware sample: first collecting broad signals such as strings, API imports, and signatures; then building a rough picture of the sample; forming initial hypotheses; prioritizing the most promising leads; and finally validating or falsifying those hypotheses through deeper analysis. To keep this workflow manageable, we <em>separate responsibilities</em> across three roles: the orchestrator, the planner, and the reporter.</p>

<p>The <em>orchestrator</em> coordinates execution, initializes the case directory, checks artifact completeness, and manages phase transitions. The <em>planner</em> turns intermediate evidence into hypotheses, priorities, and concrete deep-analysis tasks. The <em>reporter</em> turns the accumulated artifacts into a consumable write-up, separates observed facts from inferred claims, and records uncertainty and next steps. During exploration, the orchestrator can also delegate focused collection tasks to subagents and merge their outputs back into the case state.</p>

<p>At a high level, the workflow looks as follows:</p>

<ol>
  <li>
    <p><strong>Broad fingerprinting and triage:</strong> Collect cheap signals such as file metadata, hashes, entropy-related hints, strings, imports, symbol names, <a href="https://virustotal.github.io/yara/"><code class="language-plaintext highlighter-rouge">YARA</code></a> matches, <a href="https://mandiant.github.io/capa/"><code class="language-plaintext highlighter-rouge">capa</code></a> results, compiler or packer hints, crypto-related indicators, and other obvious points of interest to build an initial picture of the sample.</p>
  </li>
  <li>
    <p><strong>Filtering and grouping:</strong> Filter the raw output, group related findings, and surface the strings, imports, capability clues, and code regions that look most relevant.</p>
  </li>
  <li>
    <p><strong>Hypothesis building:</strong> Use those findings to form initial ideas about what the sample does, which major components it seems to contain, and which leads are worth following.</p>
  </li>
  <li>
    <p><strong>Internal structure mapping:</strong> Build a working map of how the sample appears to be organized internally: which parts seem to handle persistence, communication, collection, injection, or other major tasks, and how they fit together.</p>
  </li>
  <li>
    <p><strong>Deep-analysis planning:</strong> Select and prioritize the functions, components, features, and hypotheses that are the best candidates for deeper inspection with the binary-analysis tooling.</p>
  </li>
  <li>
    <p><strong>Targeted deep analysis:</strong> Inspect those high-priority targets with the binary-analysis tooling to validate or revise earlier assumptions and recover the missing low-level logic.</p>
  </li>
</ol>

<p>The key improvement comes from imposing a structured workflow on the analysis. Instead of letting the agent interact with tools in an open-ended way, the workflow organizes the process into explicit stages with concrete intermediate artifacts. Raw evidence is collected first, then filtered and grouped, and then turned into hypotheses, structure maps, and plans for deeper inspection. As new evidence emerges, the agent can revise earlier assumptions and update the case state, while continuing from the artifacts produced so far.</p>

<p>To support this, the pipeline uses a persistent on-disk case directory for every sample. After each major step, artifacts, notes, ranked observations, hypotheses, plans, and reports are updated and stored. In effect, this directory becomes the agent’s externalized working memory. If the context window gets compacted, the agent can reread prior artifacts and continue from the current state. This also makes the workflow inspectable and reproducible, since intermediate results are no longer hidden inside transient model context.</p>

<p>We implemented this pipeline as a portable analysis environment that combines helper scripts, a reusable orchestration skill, and MCP-connected disassembler tooling into a single workflow. The agent runs inside a dedicated Docker container defined by the <a href="https://github.com/mrphrazer/agentic-malware-analysis/blob/main/Dockerfile">Dockerfile</a>, with access to the available tools, scripts, and MCP servers. The workflow is packaged as a <a href="https://github.com/mrphrazer/agentic-malware-analysis/tree/main/agent_helpers/claude/skills/malware-analysis-orchestrator"><code class="language-plaintext highlighter-rouge">malware-analysis-orchestrator</code> skill for Claude Code</a> together with a <a href="https://github.com/mrphrazer/agentic-malware-analysis/tree/main/agent_helpers/codex/skills/malware-analysis-orchestrator">matching Codex variant</a>, and can use either our <a href="https://github.com/mrphrazer/binary-ninja-headless-mcp">Binary Ninja Headless MCP server</a> or <a href="https://github.com/mrphrazer/ghidra-headless-mcp">Ghidra Headless MCP server</a> for static analysis. The setup is designed to remain generic across agents, binary formats, and analysis backends. The full pipeline is available on <a href="https://github.com/mrphrazer/agentic-malware-analysis">GitHub</a>.</p>

<p>With that in place, we can rerun the same <code class="language-plaintext highlighter-rouge">mfc42ul.dll</code> sample under the structured workflow and compare the result to the earlier analysis.</p>

<h2 id="evaluation">Evaluation</h2>

<p>In the following, we focus on the Claude Code runs, since they produced the clearest write-up for this section. The runtime remained in roughly the same range as before. Codex improved in the same direction once the workflow was enabled as well, but we omit its output here because it would not change the overall conclusion.</p>

<p>Before turning to the results, let us briefly recall what is present in <code class="language-plaintext highlighter-rouge">mfc42ul.dll</code>. Some features—such as screenshot capture, Skype interception, persistence, and proxy handling—were relatively easy to identify from strings, API imports, and a shallow code pass. However, the sample also contains deeper logic and embedded data that are much harder to recover, including process-aware activation, command dispatching, protocol details, hardcoded C2 information, and the statically embedded AES key. These are exactly the parts where the structured workflow has to prove its value.</p>

<p>To evaluate this, we slightly expand the earlier query and explicitly instruct the agent to use the <code class="language-plaintext highlighter-rouge">malware-analysis-orchestrator</code> skill:</p>

<div class="language-text highlighter-rouge"><div class="highlight"><pre class="highlight"><code>❯ Analyze the malware in examples/samples/mfc42ul.dll -- Give me a detailed
  overview of the sample's functionality and features, together with the
  corresponding code locations. Use the skill
  /agent/agent_helpers/claude/skills/malware-analysis-orchestrator/ for
  analysis.
</code></pre></div></div>

<p>After roughly 17 minutes, the workflow completes its first pass. The agent populates all 13 required case files, identifies 13 functional components, records 10 behavior hypotheses, and assigns HIGH confidence to 8 of them. Even before any dedicated deep-validation step, the resulting overview is already materially richer than the earlier triage-oriented summary. To keep the presentation manageable, we show a slightly shortened version of the first-pass output below:</p>

<div class="language-text highlighter-rouge"><div class="highlight"><pre class="highlight"><code>● Analysis complete. All 13 artifacts populated. Here is the detailed overview:

  1. Process-Aware Initialization (sub_1000f590 @ 0x1000f590)
  2. Persistence — AppInit_DLLs (sub_1000d680 @ 0x1000d680)
  3. Persistence — Kernel Driver Service (sub_10003a40 @ 0x10003a40)
  6. C2 Communication (sub_10006790 @ 0x10006790)
  9. Skype VoIP Interception (sub_10007720 @ 0x10007720)
  14. Central Orchestrator (sub_1000ecc0 @ 0x1000ecc0)

  All artifacts are in status/000-mfc42ul.dll/. The analysis identified
  13 functional components, 10 behavior hypotheses (8 HIGH confidence),
  56 capa capabilities, and produced comprehensive IOCs for detection engineering.
</code></pre></div></div>

<p>Already at this stage, the workflow no longer stops at visible features alone. It begins to group functionality into components, map those components to specific code regions, and point toward the sample’s internal control structure. That is already a qualitatively different kind of result.</p>

<p>The difference is not just that the workflow finds more. It also writes much more than a short final summary. Instead, the run produces a detailed set of intermediate artifacts for hypotheses, component mapping, planning, prioritization, and reporting. The following files are just a few examples of the case state the workflow yields:</p>

<ul>
  <li>
    <p><code class="language-plaintext highlighter-rouge">05_behavior_hypotheses.md</code> records the current hypotheses, the supporting evidence, their confidence, and the checks that are still missing.</p>
  </li>
  <li>
    <p><code class="language-plaintext highlighter-rouge">08_deep_analysis_plan.md</code> turns those hypotheses into concrete follow-up tasks, with target functions, goals, and expected findings.</p>
  </li>
  <li>
    <p><code class="language-plaintext highlighter-rouge">09_priority_queue.md</code> tracks which open questions matter most, why they matter, and what depends on resolving them.</p>
  </li>
  <li>
    <p><code class="language-plaintext highlighter-rouge">10_reporting_draft.md</code> brings the current findings, code references, IOCs, and remaining gaps into a draft report that the agent can return to later.</p>
  </li>
</ul>

<p>From a higher-level perspective, the earlier run compares to the structured workflow as follows:</p>

<table>
  <thead>
    <tr>
      <th>Aspect</th>
      <th>Earlier run</th>
      <th>Structured workflow</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>Runtime</td>
      <td>~15 min</td>
      <td>~17 min</td>
    </tr>
    <tr>
      <td>Output form</td>
      <td>High-level triage summary</td>
      <td>Structured case directory with 13 required files</td>
    </tr>
    <tr>
      <td>Deeper logic recovery</td>
      <td>Limited</td>
      <td>Materially better</td>
    </tr>
    <tr>
      <td>Follow-up readiness</td>
      <td>Weak</td>
      <td>Strong</td>
    </tr>
    <tr>
      <td>Persistent state</td>
      <td>No</td>
      <td>Yes</td>
    </tr>
  </tbody>
</table>

<p>Most importantly, the first pass no longer leaves us with just a short summary. Because it produces ranked hypotheses and a persistent case directory, we can continue directly from that state rather than restarting the investigation. We therefore ask the agent to validate the 8 HIGH-confidence hypotheses. Roughly 30 minutes later, the workflow produces a much deeper second pass:</p>

<div class="language-text highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Validation Results

H1: Full-Featured Surveillance RAT — VALIDATED
- Discovered complete command dispatch vtable at 0x1003d7d0 with 10 remotely-callable commands
- Mapped orchestrator vtable at 0x1003d7f8 with C2 data loop, heartbeat (byte 0x26), fingerprinting

H3: Process-Aware Module Activation — VALIDATED
- Confirmed GetModuleFileNameA + case-insensitive path matching in sub_10003990
- Full init chain: DllMain → process detect → module alloc → orchestrator

H6: C2 with AES-128 Encryption — VALIDATED
- Protocol vtable at 0x1003d330: encrypt+send, recv+decrypt, raw I/O via WinSock send()/recv()
- Encrypt: sub_10012130, Decrypt: sub_10013260, Key expansion: sub_10014410
- Wire format: C3PO-r2d2-POE → 4-byte length → AES-128-ECB encrypted payload
</code></pre></div></div>

<p>Three improvements stand out here. First, the workflow recovers the internal command dispatch rather than stopping at the sample’s visible high-level capabilities. Second, it validates the process-aware activation logic and reconstructs the initialization chain from <code class="language-plaintext highlighter-rouge">DllMain</code> into the orchestrator. Third, it begins to recover the protocol layer itself, including the protocol vtable, the send and receive paths, and the cryptographic routines that are used together with the C2 communication.</p>

<p>A closer look at the recovered command-dispatch table shows the kind of internal structure that the deeper pass can uncover:</p>

<table>
  <thead>
    <tr>
      <th>Vtable Index</th>
      <th>Function</th>
      <th>Opcode</th>
      <th>Purpose</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>[0]</td>
      <td><code class="language-plaintext highlighter-rouge">sub_1000d270</code></td>
      <td>—</td>
      <td>Driver drop/installation</td>
    </tr>
    <tr>
      <td>[1]</td>
      <td><code class="language-plaintext highlighter-rouge">sub_1000e2b0</code></td>
      <td>—</td>
      <td>Spawn monitoring thread</td>
    </tr>
    <tr>
      <td>[2]</td>
      <td><code class="language-plaintext highlighter-rouge">sub_1000e210</code></td>
      <td>—</td>
      <td>Driver cleanup (IOCTL 0xb2012)</td>
    </tr>
    <tr>
      <td>[3]</td>
      <td><code class="language-plaintext highlighter-rouge">sub_1000d680</code></td>
      <td>—</td>
      <td>AppInit_DLLs persistence write</td>
    </tr>
    <tr>
      <td>[4]</td>
      <td><code class="language-plaintext highlighter-rouge">sub_1000db50</code></td>
      <td>—</td>
      <td>Self-update via named pipe</td>
    </tr>
    <tr>
      <td>[5]</td>
      <td><code class="language-plaintext highlighter-rouge">sub_1000e120</code></td>
      <td>—</td>
      <td>System reboot (EWX_REBOOT|EWX_FORCE)</td>
    </tr>
    <tr>
      <td>[6]</td>
      <td><code class="language-plaintext highlighter-rouge">sub_1000e370</code></td>
      <td>—</td>
      <td>Config update (“%d %d %d %d %d %d”)</td>
    </tr>
    <tr>
      <td>[7]</td>
      <td><code class="language-plaintext highlighter-rouge">sub_1000d470</code></td>
      <td>0x23</td>
      <td>Download-and-execute (%TEMP%~tmp%08x~.exe)</td>
    </tr>
    <tr>
      <td>[8]</td>
      <td><code class="language-plaintext highlighter-rouge">sub_10006710</code></td>
      <td>0x27</td>
      <td>Enable + send command</td>
    </tr>
    <tr>
      <td>[9]</td>
      <td><code class="language-plaintext highlighter-rouge">sub_10006780</code></td>
      <td>—</td>
      <td>Disable (clear flag at obj+0x20)</td>
    </tr>
  </tbody>
</table>

<p>Taken together, this is a substantial step up from the earlier runs. The workflow does not just recover a longer list of visible features; it begins to expose the malware’s internal organization: how capabilities are dispatched, how activation depends on the host process, and where protocol handling and cryptographic routines sit in relation to the C2 communication. At this point, the agent already leaves behind a much more useful working model of the sample than a conventional first-pass triage.</p>

<p>The result is still not perfect. In this run, the agent did not recover the hardcoded C2 server or the embedded AES key, and it reported AES-128-ECB even though the sample actually uses AES-256-ECB. However, this miss should be interpreted in context. The workflow did identify the relevant protocol path, the wire format, the use of ECB mode, and the cryptographic routines involved in the C2 communication. In other words, it did not miss the protocol layer entirely; its reconstruction remained incomplete at some of the finer details. At the same time, the run remains non-deterministic: a rerun may recover additional details, emphasize different parts of the sample, or make different mistakes. The result should therefore not be treated as a final report, but as an intermediate state.</p>

<p>Even with these gaps, the improvement is substantial. The structured workflow pushes the agent well beyond loose triage and leaves behind a much stronger intermediate state: a case directory with evidence, ranked hypotheses, priorities, and partially validated internal structure. In practice, that can save an analyst hours to days of early-stage work. Instead of spending most of the time collecting obvious clues and organizing first impressions, we can focus on validating deeper findings, resolving the remaining unknowns, and correcting the places where the agent is still wrong.</p>

<p>Because the workflow preserves explicit case state, follow-up analysis also becomes much sharper. The analyst can guide the agent toward a specific hypothesis, subsystem, or unresolved question, and the agent can continue from the accumulated evidence rather than starting over. It can also annotate the analysis database itself—for example by renaming symbols, attaching notes, and marking relevant functions—so that the human analyst can continue from a better-structured and better-documented starting point.</p>

<h2 id="closing-remarks">Closing Remarks</h2>

<p>As we’ve seen, agents are already changing reverse engineering in a significant way. In this post, even the unstructured run was already useful as a first triage: it recovered visible functionality, pointed to relevant code locations, and produced concrete leads for follow-up. The real improvement, however, came from giving the agent a structured workflow, persistent case state, and the right analysis tooling. This also shows where agents are already particularly strong: scaling the mechanical parts of analysis, collecting evidence, correlating artifacts, generating scripts, and producing a first structured understanding of a binary far faster than a human could.</p>

<p>And we are still early. Models, tools, and agentic workflows are improving quickly, and the current generation already shows how much repetitive reverse-engineering work can be automated. Over time, this will likely push human analysts away from manually performing every step themselves and more toward guiding workflows, validating results, refining assumptions, and deciding where deeper analysis is needed.</p>

<p>But that does not make human expertise obsolete. Reverse engineering remains an adversarial task: binaries may be unusual, intentionally deceptive, or heavily obfuscated. In those settings, a plausible answer is not the same as a correct one. Agents can help explore and accelerate, but they still struggle when the analysis requires novel reasoning, strong validation, or a change in strategy.</p>

<p>Obfuscation is a good example. Even agentic workflows that perform well on triage and structured analysis can still break down once strong control-flow obfuscation, virtualization, custom packing, or non-standard data encodings enter the picture. At that point, the challenge shifts from simply driving tools to understanding how the protections work, testing hypotheses against the binary, and designing the right custom tooling or deobfuscation workflow. An LLM can help with that process, but the direction still has to come from the analyst.</p>

<p>In that sense, agents do not remove the need for reverse-engineering expertise; they raise the bar for where that expertise matters most. And if there is one area where this becomes especially visible, it is deobfuscation. For analysts who have to deal with protected or obfuscated binaries in practice, I cover many of these workflows in my hands-on training <a href="/training_software_deobfuscation.html">Software Deobfuscation Techniques</a>.</p>]]></content><author><name>Tim Blazytko</name></author><category term="reversing" /><category term="malware" /><category term="agentic" /><category term="agents" /><category term="automation" /><category term="mcp" /><category term="llm" /><summary type="html"><![CDATA[Building agentic malware-analysis workflows that combine strong initial triage with deeper automated recovery of malware structure and behavior.]]></summary></entry><entry><title type="html">Identification of API Functions in Binaries</title><link href="https://synthesis.to/2023/08/02/api_functions.html" rel="alternate" type="text/html" title="Identification of API Functions in Binaries" /><published>2023-08-02T00:00:00+02:00</published><updated>2023-08-02T00:00:00+02:00</updated><id>https://synthesis.to/2023/08/02/api_functions</id><content type="html" xml:base="https://synthesis.to/2023/08/02/api_functions.html"><![CDATA[<p><em>During my presentation <a href="https://recon.cx/cfp.recon.cx/2023/talk/QD8UNJ/index.html">“Unveiling Secrets in Binaries using Code Detection Strategies”</a> at REcon 2023 (<a href="/presentations/recon23_code_detection.pdf">slides</a>, <a href="https://www.youtube.com/watch?v=y95MNr2Xu-g">recording</a>, <a href="https://github.com/mrphrazer/obfuscation_detection">code</a>), I showcased heuristics how to navigate in unknown binaries in various reverse engineering settings, such as malware analysis, vulnerability discovery and embedded firmware analysis. In this talk, I also presented a simple but powerful technique to identify common API functions in statically-linked executables and in embedded firmware. This blog post will delve deeper into this subject, exploring these API functions, the intuition behind the heuristic, and its additional use cases in the context of malware analysis.</em></p>

<p>In reverse engineering, API calls provide crucial insights into a program’s behavior. They often perform high-level operations like file access, network communication, string operations or memory management. They help us reverse engineers to build a better understanding of what a program does and how it works. In the context of vulnerability research, we can locate areas of code that are commonly associated with vulnerabilities, like functions related to memory management or user input processing. For malware analysis, we can learn how a malware sample interacts with the operating system, networks, files, etc.; this can help us to understand its purpose and functionality.</p>

<p>However, in some cases, API functions are hard to identify in binaries: for example in statically-linked executables and in embedded firmware. In statically-linked binaries, the code of standard libraries and third-party libraries is incorporated directly into the binary. Since there is no clear separation between the user code and the libraries, without symbols, it is difficult to identify which functions belong to the program and which are from libraries. In the case embedded reverse engineering, firmware is often designed to directly interact with the hardware of a specific device; therefore, it may contain unique API functions which contain similar functionalities as in standard libraries, but are not present in those. Sometimes, manufacturers even use their own proprietary standard library implementations; without access to the source code or documentation, it can be nearly impossible to identify these functions.</p>

<p>In this blog post, we will initially discuss common methods for detecting API functions; then,  we’ll evaluate the heuristic and showcase additional use cases for malware analysis. So, if you are curious how to detect API functions in various reversing scenarios, stay along. If you want to play around with the heuristic on hour own, you can use my <a href="https://binary.ninja/">Binary Ninja</a> plugin <a href="https://github.com/mrphrazer/obfuscation_detection"><em>obfuscation_detection</em></a>.</p>

<h2 id="methods-to-detect-api-functions">Methods to Detect API Functions</h2>

<p>Without symbol information, it remains challenging to identify API functions in statically-linked binaries and embedded firmware for the reasons mentioned above. In practice, two common approaches rely on signature matching and cross-referencing with known libraries.</p>

<p>The concept behind <em>signature matching</em> is straightforward: it involves searching for specific byte sequences—or signatures—of known API functions within a binary. These signatures, stored in a database, each act as a distinct fingerprint for an associated API function. During analysis, the binary is scanned and the byte sequences of functions are compared to the signatures from the database. When a match occurs, it signifies the presence of that particular API function in the binary. Several systems employ this function signature method, such as <a href="https://hex-rays.com/ida-pro/">IDA Pro</a>’s <a href="https://docs.hex-rays.com/user-guide/signatures/flirt">Fast Library Identification and Recognition Technology (FLIRT)</a> and Binary Ninja’s <a href="https://github.com/Vector35/sigkit">Signature Kit Plugin</a>.</p>

<p>However, the efficacy of signature matching heavily depends on the accuracy of the signature database. This database must include signatures corresponding to the specific library version used in the binary; furthermore, it must be compiled for an identical platform (operating system &amp; CPU architecture) using similar compiler settings. Variations can cause significant differences and lead to mismatches, limiting the usability of such systems in practice.</p>

<p>The second method, <em>cross-referencing with known libraries</em>, employs a comparable strategy. In this method, binary diffing techniques (e.g., <a href="https://www.zynamics.com/bindiff.html">BinDiff</a>) are used to compare the binary against different versions of known libraries to locate similarities. When a match is found, the function name and type information from the known library can be transferred to the binary under investigation. Since this method considers additional features like control-flow graph structures and call hierarchies, it allows slight differences in library versions and compiler settings without causing a complete mismatch; this way, it potentially overcomes some limitations of the signature matching approach.</p>

<p>As a downside, this method often carries a significant degree of uncertainty due to the reliance on feature matching, which can lead to false positives and negatives. Furthermore, cross-referencing is still constrained by the availability of a corresponding library. If the binary was compiled with an uncommon or proprietary library for which there is no available version to compare against, these methods would not be effective.</p>

<p>In the following, we will take a look at another simple but effective heuristic, which has proven to be very efficient in many reversing scenarios.</p>

<h2 id="heuristic-frequently-called-functions">Heuristic: Frequently Called Functions</h2>

<p>One heuristic approach with notable potential is the analysis of call frequency: identifying functions within a binary that are invoked frequently by others. This simple technique can yield substantial insights: functions with high call frequencies are often of significant interest in the reverse engineering process, since they often represent core functionalities that the software leans heavily upon.</p>

<p>Often, these frequently called functions are crucial API functions, facilitating a wide range of fundamental operations within a program. Examples include string operations (e.g., <code class="language-plaintext highlighter-rouge">strlen</code>, <code class="language-plaintext highlighter-rouge">strcmp</code>), memory management functions (e.g., <code class="language-plaintext highlighter-rouge">malloc</code>, <code class="language-plaintext highlighter-rouge">free</code>, <code class="language-plaintext highlighter-rouge">memcpy</code>, and <code class="language-plaintext highlighter-rouge">memset</code>), file access methods (e.g., <code class="language-plaintext highlighter-rouge">open</code>, <code class="language-plaintext highlighter-rouge">close</code>), and network communication (e.g., <code class="language-plaintext highlighter-rouge">send</code>, <code class="language-plaintext highlighter-rouge">receive</code>). These API functions play pivotal roles within the binary, making them frequent call targets and providing valuable insights into the binary’s operation. Therefore, identifying these high-frequency call targets can offer valuable footholds for understanding the inner workings of the binary.</p>

<p>To implement this heuristic in your disassembler of choice, intuitively, we sort all functions by their number of independent callers in descending order. To focus on the most relevant findings, one might choose to concentrate, for example, on the top 10% of these functions.</p>

<p>In BinaryNinja, this operation can be achieved within a few lines of Python code:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">find_most_frequently_called_functions</span><span class="p">(</span><span class="n">bv</span><span class="p">):</span>
    <span class="nf">print</span><span class="p">(</span><span class="sh">"</span><span class="s">Frequently Called Functions</span><span class="sh">"</span><span class="p">)</span>

    <span class="c1"># print top 10% (iterate in descending order)
</span>    <span class="k">for</span> <span class="n">f</span><span class="p">,</span> <span class="n">num_callers</span> <span class="ow">in</span> <span class="nf">get_top_10_functions</span><span class="p">(</span><span class="n">bv</span><span class="p">.</span><span class="n">functions</span><span class="p">,</span> <span class="k">lambda</span> <span class="n">f</span><span class="p">:</span> <span class="nf">len</span><span class="p">(</span><span class="n">f</span><span class="p">.</span><span class="n">callers</span><span class="p">)):</span>
        <span class="nf">print</span><span class="p">(</span>
            <span class="sa">f</span><span class="sh">"</span><span class="s">Function </span><span class="si">{</span><span class="nf">hex</span><span class="p">(</span><span class="n">f</span><span class="p">.</span><span class="n">start</span><span class="p">)</span><span class="si">}</span><span class="s"> (</span><span class="si">{</span><span class="n">f</span><span class="p">.</span><span class="n">name</span><span class="si">}</span><span class="s">) is called from </span><span class="si">{</span><span class="n">num_callers</span><span class="si">}</span><span class="s"> different functions.</span><span class="sh">"</span><span class="p">)</span>
</code></pre></div></div>

<p>In the code above, the function <code class="language-plaintext highlighter-rouge">find_most_frequently_called_functions</code> takes a binary view <code class="language-plaintext highlighter-rouge">bv</code> (a representation of the analyzed binary) as input, and applies the <code class="language-plaintext highlighter-rouge">get_top_10_functions</code> function to the list of all functions within the binary. This function ranks the API functions based on the number of independent callers; the result is then printed for the top 10% of functions. This approach surfaces those functions that are most frequently called, making them prime candidates for further investigation during the reverse engineering process.</p>

<p>Similar implementations can be realized in IDA or Ghidra: In IDA, the corresponding functions can be found by iterating over cross-references to a particular function; in <a href="https://ghidra-sre.org/">Ghidra</a>, the <a href="https://ghidra.re/ghidra_docs/api/ghidra/program/model/listing/Function.html#getCallingFunctions(ghidra.util.task.TaskMonitor)"><code class="language-plaintext highlighter-rouge">getCallingFunctions</code></a> method of a function object can be utilized to identify the callers.</p>

<p>However, while our outlined heuristic offers an intuitive approach for identifying API functions in binaries, let us now investigate whether we can validate its effectiveness in various real-world reverse engineering scenarios.</p>

<h2 id="evaluation">Evaluation</h2>

<p>To evaluate the effectiveness of our heuristic in real-world scenarios, we will apply it to a set of diverse case studies:</p>

<ul>
  <li>the <code class="language-plaintext highlighter-rouge">ls</code> command from the Linux <a href="https://www.gnu.org/software/coreutils/">Coreutils</a> package,</li>
  <li>a statically-linked malware sample from the <a href="https://en.wikipedia.org/wiki/Xor_DDoS">XOR DDos family</a>,</li>
  <li>an embedded firmware and</li>
  <li>another malware sample from the <a href="https://malpedia.caad.fkie.fraunhofer.de/details/win.plugx">PlugX family</a>.</li>
</ul>

<p>For consistency and comparability, we will limit our analysis to the top 10 identified functions in each case. We anticipate that these functions will reveal commonly invoked API calls. Moreover, our heuristic may also uncover critical functions that underpin the key functionalities or logic of the binary under analysis. In these cases, we’ll delve deeper to understand their roles and operations. With this said, let’s plunge into the results!</p>

<h3 id="coreutils">Coreutils</h3>

<p>Our first case for evaluation is the <code class="language-plaintext highlighter-rouge">ls</code> command from the Coreutils package, well-known for generating command line directory listings. Since this binary is dynamically linked, API functions appear as wrappers that enable jumps to functions not included in the binary; at load time, the exact function locations are resolved by the operating system during the dynamic linking phase.</p>

<p>Upon applying our heuristic, we obtain the following results:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Function 0x100007528 (_putchar) is called from 28 different functions.
Function 0x100007522 (_printf) is called from 23 different functions.
Function 0x100007558 (_snprintf) is called from 12 different functions.
Function 0x1000074bc (_getenv) is called from 10 different functions.
Function 0x10000755e (_strcmp) is called from 9 different functions.
Function 0x1000069f0 (sub_1000069f0) is called from 9 different functions.
Function 0x100007582 (_strlen) is called from 8 different functions.
Function 0x100007564 (_strcoll) is called from 8 different functions.
Function 0x10000745c (_err) is called from 8 different functions.
Function 0x1000073fc (___error) is called from 8 different functions.
</code></pre></div></div>

<p>A quick scan at the results reveals that 9 out of the top 10 identified functions relate directly to API calls. These include operations related to string manipulation (such as <code class="language-plaintext highlighter-rouge">_strcmp</code>, <code class="language-plaintext highlighter-rouge">_strlen</code>, <code class="language-plaintext highlighter-rouge">_strcoll</code>, <code class="language-plaintext highlighter-rouge">_snprintf</code>), output to <code class="language-plaintext highlighter-rouge">stdout</code> (<code class="language-plaintext highlighter-rouge">_putchar</code>, <code class="language-plaintext highlighter-rouge">_printf</code>), environment variable parsing (<code class="language-plaintext highlighter-rouge">_getenv</code>), and error handling (<code class="language-plaintext highlighter-rouge">_err</code>, <code class="language-plaintext highlighter-rouge">___error</code>).</p>

<p>Interestingly, one function in the top-10 list, <code class="language-plaintext highlighter-rouge">sub_1000069f0</code>, does not appear to be a typical API function. However, a closer examination reveals that this function acts as a wrapper for the API function <code class="language-plaintext highlighter-rouge">_putchar</code>:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>1000069f0  55                 push    rbp {__saved_rbp}
1000069f1  4889e5             mov     rbp, rsp {__saved_rbp}
1000069f4  e82f0b0000         call    _putchar
1000069f9  31c0               xor     eax, eax  {0x0}
1000069fb  5d                 pop     rbp {__saved_rbp}
1000069fc  c3                 retn     {__return_addr}
</code></pre></div></div>

<p>In summary, all of the top 10 identified functions are API-related.</p>

<h3 id="xor-ddos">XOR DDos</h3>

<p>Our second case study involves a statically-linked <a href="https://www.virustotal.com/gui/file/0f00c2e074c6284c556040012ef23357853ccac4ad1373d1dea683562dc24bca">malware sample</a> from the XOR DDos family. As indicated by its name, this Linux-targeting malware executes <a href="https://en.wikipedia.org/wiki/Denial-of-service_attack">DDos attacks</a> and employs XOR-based encryptions (for strings and C&amp;C server communication). This particular sample retains all its symbols; therefore, it serves as an excellent evaluation target, as we can use these function names to easily confirm whether the identified functions are API functions. The output of our heuristic test on this sample is as follows:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Function 0x8065320 (free) is called from 298 different functions.
Function 0x8066a70 (memcpy) is called from 191 different functions.
Function 0x80662b0 (strlen) is called from 184 different functions.
Function 0x80669b0 (memset) is called from 174 different functions.
Function 0x8063d30 (__libc_malloc) is called from 151 different functions.
Function 0x8053810 (__lll_unlock_wake_private) is called from 148 different functions.
Function 0x8053700 (__lll_lock_wait_private) is called from 122 different functions.
Function 0x8060080 (ptmalloc_init) is called from 114 different functions.
Function 0x80569b0 (__strtol_internal) is called from 99 different functions.
Function 0x80661f0 (strcmp) is called from 93 different functions.
</code></pre></div></div>

<p>The most frequently called function is <code class="language-plaintext highlighter-rouge">free</code>, an API function which deallocates memory on the heap. Other frequently invoked API functions are also related to memory management (<code class="language-plaintext highlighter-rouge">__libc_malloc</code>), data movement (<code class="language-plaintext highlighter-rouge">memcpy</code>), memory initialization (<code class="language-plaintext highlighter-rouge">memset</code>) and string operations (<code class="language-plaintext highlighter-rouge">strlen</code>, <code class="language-plaintext highlighter-rouge">__strtol_internal</code>, <code class="language-plaintext highlighter-rouge">strcmp</code>). Moreover, the appearance of API functions like <code class="language-plaintext highlighter-rouge">__lll_unlock_wake_private</code>, <code class="language-plaintext highlighter-rouge">__lll_unlock_wake_private</code> and <code class="language-plaintext highlighter-rouge">ptmalloc_init</code> among the top-called functions suggests that the malware extensively employs multi-threading. This behavior aligns well with the nature of the sample, which establishes a multitude of network connections to execute DDoS attacks. Once again, all identified functions are API functions.</p>

<h3 id="embedded-firmware">Embedded Firmware</h3>

<p>For our next case study, we delve into embedded firmware reverse engineering. We examine a non-further specified demo application designed for an embedded microcontroller operating on an <a href="https://developer.arm.com/documentation/ddi0553/latest/">ARMv8M architecture</a>. This application implements hardware initialization routines as well as basic network communication logic relying on cryptographic techniques. Instead of using a standard library, the firmware employs a custom statically linked library. We’ve compiled the firmware with the highest level of optimization, while retaining function symbols for easier analysis.</p>

<p>These are the results:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Function 0x8081cd8 (memcpy) is called from 379 different functions.
Function 0x808140c (vLoggingPrintfError) is called from 343 different functions.
Function 0x8081d26 (memset) is called from 338 different functions.
Function 0x80625a4 (sys_assert) is called from 307 different functions.
Function 0x80785ae (DbgConsole_Printf) is called from 250 different functions.
Function 0x8040738 (multiply_casper) is called from 247 different functions.
Function 0x807e2a0 (ulSetInterruptMask) is called from 160 different functions.
Function 0x807fd28 (CASPER_MEMCPY) is called from 139 different functions.
Function 0x80821e0 (__mbedtls_mpi_free_veneer) is called from 102 different functions.
Function 0x80511b8 (mbedtls_platform_zeroize) is called from 87 different functions.
</code></pre></div></div>

<p>The heuristic reveals a wild mix of the most frequently called functions within the embedded firmware: At the top of the list, we find <code class="language-plaintext highlighter-rouge">memcpy</code> and <code class="language-plaintext highlighter-rouge">memset</code>, reflecting an extensive amount of memory manipulation operations throughout the firmware. This is followed by <code class="language-plaintext highlighter-rouge">vLoggingPrintfError</code> and <code class="language-plaintext highlighter-rouge">DbgConsole_Printf</code>, which are associated with console output functions, implying a logging or debugging system implemented within the firmware, while <code class="language-plaintext highlighter-rouge">sys_assert</code> hints towards a custom error-handling mechanism.</p>

<p>The firmware further includes custom routines to handle memory-related tasks, such as <code class="language-plaintext highlighter-rouge">CASPER_MEMCPY</code>, <code class="language-plaintext highlighter-rouge">__mbedtls_mpi_free_veneer</code> and <code class="language-plaintext highlighter-rouge">mbedtls_platform_zeroize</code>. These functions belong to the cryptographic libraries <a href="https://mcuxpresso.nxp.com/api_doc/dev/1581/group__casper.html">CASPER</a> and <a href="https://github.com/Mbed-TLS/mbedtls">mbedTLS</a>, providing optimized or secure alternatives to standard memory operations. Notably, <code class="language-plaintext highlighter-rouge">mbedtls_platform_zeroize</code> offers a secure means to erase sensitive data from memory, a crucial function in cryptographic contexts.</p>

<p>Additionally, <code class="language-plaintext highlighter-rouge">multiply_casper</code> provides an optimized version for cryptographic multiplications. Furthermore, <code class="language-plaintext highlighter-rouge">ulSetInterruptMask</code> appears to be related to the firmware’s interrupt handling system, a common element in microcontroller-based applications.</p>

<p>In summary, our heuristic provides insights into the firmware’s core functionality by highlighting a mixture of standard library functions, application-specific error handling routines, console output functions, custom cryptographic implementations, and functions related to interrupt management within the firmware.</p>

<h3 id="plugx">PlugX</h3>

<p>For our final experiment, we take a look at a <a href="https://www.virustotal.com/gui/file/344fc6c3211e169593ab1345a5cfa9bcb46a4604fe61ab212c9316c0d72b0865">sample</a> from the PlugX malware family. Compared to the other case studies, due to the lack of function symbols, we have significantly less guidance. To put it differently, we solely rely on our heuristic and manual analysis.</p>

<p>Here are the results obtained from the PlugX sample:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Function 0x10001620 (sub_10001620) is called from 1253 different functions.
Function 0x10001590 (sub_10001590) is called from 1253 different functions.
Function 0x1002cc5e (__seterrormode) is called from 320 different functions.
Function 0x1002cc4e (sub_1002cc4e) is called from 219 different functions.
Function 0x1002d530 (__free_base) is called from 141 different functions.
Function 0x100302f0 (sub_100302f0) is called from 90 different functions.
Function 0x10001530 (sub_10001530) is called from 73 different functions.
Function 0x10003af0 (sub_10003af0) is called from 63 different functions.
Function 0x10004360 (sub_10004360) is called from 61 different functions.
Function 0x10003cb0 (sub_10003cb0) is called from 31 different functions.
</code></pre></div></div>

<p>Given these results, two observations stand out: Firstly, the only two identifiable API functions are <code class="language-plaintext highlighter-rouge">__seterrormode</code> and <code class="language-plaintext highlighter-rouge">__free_base</code>; these are responsible for error handling and memory management. Secondly, two functions—<code class="language-plaintext highlighter-rouge">sub_10001620</code> and <code class="language-plaintext highlighter-rouge">sub_10001590</code>—are called with a significantly high frequency compared to all other functions in the binary.</p>

<p>Analyzing the calling patterns of these functions reveals a fascinating interplay: Whenever <code class="language-plaintext highlighter-rouge">sub_10001620</code> is invoked, <code class="language-plaintext highlighter-rouge">sub_10001590</code> has always been executed just a few lines before. Even more, the input parameters of <code class="language-plaintext highlighter-rouge">sub_10001590</code> are string values corresponding to Windows APIs, such as <code class="language-plaintext highlighter-rouge">kernel32</code>. Finally, we also note that the return value of this function is fed as input to <code class="language-plaintext highlighter-rouge">sub_10001620</code>, indicating that the output of the one function serves as input to the other; if we put everything together, we can represent a typical instance of this call structure as follows:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>sub_10001620(0, sub_10001590("kernel32"), 0xffc97c1f)
</code></pre></div></div>

<p>We now understand why both functions are invoked an equal number of times: They are interdependent. Yet, upon closer inspection, we make another noteworthy discovery. The third parameter of the outer function—the constant <code class="language-plaintext highlighter-rouge">0xffc97c1f</code>—in combination with the string <code class="language-plaintext highlighter-rouge">kernel32</code>, suggests that this sequence of function calls represents an <em>API hashing routine</em>, a mechanism often utilized by malware to obfuscate API calls from static analysis. This mechanism works by traversing a series of API function names represented as strings, computing their hash values and subsequently comparing the computed hash values to a pre-computed constant during runtime. If a match is found, the corresponding API function is imported and executed.</p>

<p>Without going too much into detail, further analysis reveals that <code class="language-plaintext highlighter-rouge">sub_10001590</code> acts as an obfuscated call to <a href="https://learn.microsoft.com/en-us/windows/win32/api/libloaderapi/nf-libloaderapi-loadlibrarya"><code class="language-plaintext highlighter-rouge">LoadLibraryA</code></a>, which loads a specified Windows library into the address space of the calling process. Subsequently, <code class="language-plaintext highlighter-rouge">sub_10001620</code> implements a slightly altered version of the <a href="https://en.wikipedia.org/wiki/Computation_of_cyclic_redundancy_checks">CRC32</a> algorithm, which compares the computed CRC32 hash to the constant input (<code class="language-plaintext highlighter-rouge">0xffc97c1f</code>). This workflow perfectly matches the modus operandi of API function hashing.</p>

<p>Using this knowledge, we can now represent the function calls as follows:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>crc32(0, LoadLibraryA("kernel32"), 0xffc97c1f) ; "GetProcAddress\x00"
</code></pre></div></div>

<p>In this context, the constant <code class="language-plaintext highlighter-rouge">0xffc97c1f</code> corresponds to the hash of the <a href="https://learn.microsoft.com/en-us/windows/win32/api/libloaderapi/nf-libloaderapi-getprocaddress"><code class="language-plaintext highlighter-rouge">GetProcAddress</code></a> function, which is loaded from <code class="language-plaintext highlighter-rouge">kernel32</code>.</p>

<p>To conclude this case study, our heuristic once again identified all key API functions within the analyzed malware sample. Although it doesn’t pinpoint specific API functions as explicitly as it does in the other case studies, the heuristic confirms the underlying assumption that API functions are called frequently. In this particular case, an API hashing routine—designed to import hidden API functions—emerged as the most frequently called function structure in the entire malware sample. As a consequence, the heuristic pinpoints code locations which provide a deep understanding of the malware’s behavior; therefore, it can serve as a potent tool in malware analysis.</p>

<p>In a broader perspective, we can say the heuristic is very effective in detecting frequently used API functions. Even in instances where highlighted functions are no API calls, they often expose other integral functionalities of the binary—for example, cryptographic operations in the embedded firmware sample. The heuristic proves to be not just simplistic and easy to implement, but also extremely efficient. It provides deep insights into binaries, irrespective of the reverse engineering scenario at hand, be it malware analysis, embedded firmware reverse engineering or the analysis of statically-linked executables; it is an invaluable tool in our reverse engineering arsenal.</p>

<h2 id="closing-remarks">Closing Remarks</h2>

<p>In a series of <a href="/2021/08/10/obfuscation_detection.html">previous</a> <a href="/2021/03/03/flattening_detection.html">blog</a> <a href="/2023/01/26/uncommon_instruction_sequences.html">posts</a>, I’ve discussed various efficient, architecture-agnostic heuristics that allow us to pinpoint obfuscated code in large binaries. Throughout these blog posts, I’ve often highlighted that these heuristics have a wider scope, proving valuable across a diverse range of reverse engineering scenarios. These observations inspired me to present at REcon, showcasing the versatility of these heuristics in a plethora of contexts. This blog post marks another significant step on that path.</p>

<p>Interestingly, the heuristic introduced today represents a clear departure from previous ones, as it supports a wide range of reverse engineering scenarios, while it is the first not specifically designed to pinpoint obfuscated code. With the same goal of providing useful tools and insights for the entire reverse engineering community, I’ve updated the documentation of my <a href="https://github.com/mrphrazer/obfuscation_detection">obfuscation detection plugin</a> to offer more intuitive use cases and insights for everyday reverse engineering tasks. If you’re interested, I encourage you to check it out.</p>

<p>My professional work does not only span the domain of code (de)obfuscation—a topic I deeply explore in my <a href="/training_software_deobfuscation.html">training sessions</a>—but also traverses the landscapes of embedded security and malware analysis. Therefore, you can expect the introduction of more potent heuristics designed for a broad spectrum of reverse engineering scenarios over time. So, stay tuned for more exciting things to come. Until our next encounter, thanks for all the fish! :)</p>]]></content><author><name>Tim Blazytko</name></author><category term="reversing" /><category term="embedded" /><category term="malware" /><summary type="html"><![CDATA[Identify API functions in statically linked and embedded binaries using heuristics and analysis workflows.]]></summary></entry><entry><title type="html">Statistical Analysis to Detect Uncommon Code</title><link href="https://synthesis.to/2023/01/26/uncommon_instruction_sequences.html" rel="alternate" type="text/html" title="Statistical Analysis to Detect Uncommon Code" /><published>2023-01-26T00:00:00+01:00</published><updated>2023-01-26T00:00:00+01:00</updated><id>https://synthesis.to/2023/01/26/uncommon_instruction_sequences</id><content type="html" xml:base="https://synthesis.to/2023/01/26/uncommon_instruction_sequences.html"><![CDATA[<p>Statistical analysis is a set of methods which analyze and organize data to discover its underlying structure. One of the most common use cases in computer science is <a href="https://en.wikipedia.org/wiki/Machine_learning">machine learning</a>, for which they form the mathematical foundation. However, often, even the simplest analysis techniques are powerful enough to significantly simplify day-to-day tasks. In this blog post, I will show you how such a technique, <a href="https://en.wikipedia.org/wiki/N-gram">n-gram analysis</a>, can be used to identify uncommon instruction sequences in binary code. It is not only fun to see what statistics can reveal about assembly patterns, but it is also an effective technique to pinpoint obfuscated code or other obscure computations which might be worth a closer look during reverse engineering.</p>

<p>Similar to previous blog posts in which we <a href="/2021/03/03/flattening_detection.html">developed a heuristics to pinpoint control-flow flattening</a> and <a href="/2021/08/10/obfuscation_detection.html">discussed various complexity metrics to pinpoint obfuscation in general</a>, the implementation of these techniques has been added to my <a href="https://github.com/mrphrazer/obfuscation_detection">Obfuscation Detection</a> plugin for <a href="https://binary.ninja/">Binary Ninja</a>. As for the other heuristics, the implementation is architecture-agnostic, although it contains some architecture-specific optimizations for x86, x86-64, ARM32 and AARCH64.</p>

<p>In the following, we first familiarize ourselves with the foundations of n-gram analysis and its use cases. Then, we discuss how to perform statistical analysis of assembly code and develop a heuristic to identify uncommon instruction sequences. Afterward, we explore some similarities and differences between the most common CPU architectures. Finally, we evaluate the heuristic by identifying and analyzing obfuscated functions in malware, a Windows kernel module, an anti-cheat software and a mobile DRM system.</p>

<h2 id="n-gram-analysis">n-gram Analysis</h2>

<p>The term <em>n-gram</em> originates from the field of computational linguistics. <a href="https://en.wikipedia.org/wiki/N-gram">n-grams</a> are used to analyze the underlying structures of a language. For this, we partition a text into fixed-size chunks (so-called <em>grams</em> of size n) and count the number of occurrences of the individual chunks in the text. If the underlying text sample is large enough, we retrieve a representative frequency distribution of the language that gives us insights into common patterns of the underlying language, such as the most common letters or most common words of the language. Furthermore, it allows us to predict which word most likely follows after a given word; a characteristic which is used to support users while typing messages on mobile phones by <a href="https://en.wikipedia.org/wiki/T9_(predictive_text)">making suggestions based on the current writing context</a>.</p>

<p>To illustrate this, let us count all 1-grams in the following text:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>This is a test.
</code></pre></div></div>

<p>First, we normalize the text by representing it as a continuous sequence of lower-case letters only:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>thisisatest
</code></pre></div></div>

<p>For the English language, 1-grams are equivalent to individual letters. Therefore, we just count the number of occurrences of all individual letters:</p>

<ul>
  <li>t: 3</li>
  <li>h: 1</li>
  <li>i: 2</li>
  <li>s: 3</li>
  <li>a: 1</li>
  <li>e: 1</li>
</ul>

<p>As we can see, the letters ‘t’ and ‘s’ are the most common, ‘a’ and ‘e’ the least common letters. If the sample text was larger—for example, 1,000 words—we would obtain a <a href="https://en.wikipedia.org/wiki/Letter_frequency">representative frequency distribution of the English language</a>, in which ‘e’ is the most common letter with 13%, followed by ‘t’ with 9.1% and ‘a’ with 8.2%. These tables are used, for example, in cryptanalysis to break <a href="https://en.wikipedia.org/wiki/Substitution_cipher">substitution ciphers</a>: By performing a <a href="https://en.wikipedia.org/wiki/Frequency_analysis#:~:text=7%20External%20links-,Frequency%20analysis%20for%20simple%20substitution%20ciphers,same%20letter%20in%20the%20ciphertext.">frequency analysis</a> of the ciphertext, we can map the distribution of letters in the ciphertext to the distribution of letters of the English language. As a consequence, we can undo the substitution.</p>

<p>While such 1-grams can already be used to break substitution ciphers, statistical analysis becomes even more powerful if it also considers the <em>context</em> of the letters, for example which letter follows before and afterward in the text. To achieve this, we increase the size of the chunks (the number n) and count these chunks in a sliding window approach. For example, if we want to compute all 2-grams in the text above, we count the chunks as follows:</p>

<ul>
  <li>th: 1</li>
  <li>hi: 1</li>
  <li>is: 2</li>
  <li>si: 1</li>
  <li>sa: 1</li>
  <li>at: 1</li>
  <li>te: 1</li>
  <li>st: 1</li>
</ul>

<p>As we can see, ‘is’ is the most-common 2-gram, while all other 2-grams only occur once.</p>

<p>We could continue this approach and compute all 3-grams, 4-grams, 5-grams etc. Now the question is: What <em>n</em> should be used for statistical analysis? While there is no definite answer to this, we can say that the larger the <em>n</em>, the more context we consider. While 1-grams don’t consider any context, 7-grams and more usually tend to overfit to the input text and lose universality. In my experience, 3-grams or 5-grams provide a good trade-off between considering the context and keeping universality. In the following, we perform all n-gram analyses on the assembly level as 3-grams.</p>

<h2 id="statistical-analysis-of-assembly-code">Statistical Analysis of Assembly Code</h2>

<p>Similarly to natural languages, we can perform a statistical analysis on machine code and assembly code to identify underlying patterns and make predictions based on these patterns. For example, based on the byte frequency of individual bytes in the binary, we can differentiate between code and data; furthermore, we can <a href="https://corte.si/posts/visualisation/entropy/index.html">identify packed code</a> by computing the entropy of individual sections. Closer to the n-gram analysis discussed above, statistical analysis also supports the <a href="https://www.recon.cx/2012/schedule/attachments/40_Chernov-Troshina.pdf">reverse engineering of unknown instruction set architectures</a>: By mapping byte distributions to common instruction sequences, it is possible to identify the byte representation of the most common instructions, such as the architectural equivalent of <code class="language-plaintext highlighter-rouge">mov</code> or <code class="language-plaintext highlighter-rouge">call</code> instructions.</p>

<p>In our case, as stated above, the goal is to identify uncommon instruction sequences in binaries. For this, we rely on the disassembly output of one of the major disassemblers, such as <a href="https://hex-rays.com/ida-pro/">IDA Pro</a>, <a href="https://ghidra-sre.org/">Ghidra</a> or <a href="https://binary.ninja/">Binary Ninja</a>. Now, we need to accomplish two things:</p>

<ol>
  <li>We need to compute common 3-gram frequencies of assembly code as ground truth.</li>
  <li>We need a heuristic to identify code patterns which <em>deviate</em> from the most common 3-grams.</li>
</ol>

<p>For the sake of simplicity, we rely on x86-64 bit assembly to illustrate how to achieve these goals. Later on, we discuss how such a heuristic can be implemented in an architecture-agnostic way.</p>

<h3 id="obtaining-a-ground-truth">Obtaining a Ground Truth</h3>

<p>To obtain a ground truth, we need to compute common 3-grams of the assembly language. However, to avoid overfitting to specific binaries or architectures, we consider instruction opcodes only and <em>omit</em> instruction details such as concrete registers or memory addresses. For example, consider the following code sequence:</p>

<pre><code class="language-x86">push    rbp
mov     rbp, rsp
push    r14
push    rbx
test    rdi, rdi
</code></pre>

<p>To perform a 3-gram analysis, we first remove all the instruction details and obtain the following sequence:</p>

<pre><code class="language-x86">push
mov
push
push
test
</code></pre>

<p>Based on this, we can compute the following 3-grams:</p>

<ul>
  <li><code class="language-plaintext highlighter-rouge">push ; mov ; push</code></li>
  <li><code class="language-plaintext highlighter-rouge">mov ; push ; push</code></li>
  <li><code class="language-plaintext highlighter-rouge">push ; push ; test</code></li>
</ul>

<p>If we perform this analysis over a large set of binaries from different categories (for example compilers, debug tools, multimedia libraries etc.), we get a representative collection of 3-grams, from which we can use, for instance, the 1,000 most common 3-grams as a ground truth.</p>

<h3 id="identification-of-uncommon-instruction-sequences">Identification of Uncommon Instruction Sequences</h3>

<p>To identify uncommon instruction sequences, we intuitively want to find code locations whose instruction sequences highly deviate from the generated ground truth. However, for this, we need a well-defined window in which we perform the analysis. The most straightforward approach is to perform this analysis on the function level. As a consequence, we re-phrase our goal as follows:</p>

<p>Our goal is to identify functions whose instruction sequences deviate the most from the generated ground truth. For this, we perform the following steps for each function:</p>

<ol>
  <li>We count the number of 3-grams in the function which are not part of the ground truth.</li>
  <li>We divide this number by the total number of 3-grams in the function.</li>
</ol>

<p>This way, we can assign a <em>score</em> to each function; this score represents the relative amount of 3-grams which are not contained in the ground truth. The higher the score, the more do the instruction sequences in the function deviate from the ground truth.</p>

<p>Given that each function has a score representing the deviation from the ground truth, we can sort all functions by their score—analogous to <a href="/2021/08/10/obfuscation_detection.html">the other detection metrics</a>—and manually inspect the top 10% of these functions.</p>

<h3 id="architecture-agnostic-implementation">Architecture-agnostic Implementation</h3>

<p>To realize this heuristic in an architecture-agnostic way, the implementation relies on Binary Ninja’s <a href="https://docs.binary.ninja/dev/bnil-llil.html">low-level intermediate language (LLIL)</a>. This intermediate language is a common representation of assembly code for all architectures supported by Binary Ninja. During analysis, Binary Ninja lifts assembly instructions into this LLIL, which will then be used for further analysis passes, up to its decompiler output.</p>

<p>To exemplify, the architecture-dependent x86-64 sequence</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>mov
mov
mov
</code></pre></div></div>

<p>translates to LLIL as follows:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>LowLevelILOperation.LLIL_SET_REG
LowLevelILOperation.LLIL_SET_REG
LowLevelILOperation.LLIL_SET_REG
</code></pre></div></div>

<p>where the operation <code class="language-plaintext highlighter-rouge">LLIL_SET_REG</code> represents an architecture-independent assignment to a register.</p>

<p>We can use this implementation to detect uncommon instruction sequences on all architectures supported by Binary Ninja. However, this approach has a downside: Since it relies on LLIL, it <em>enforces</em> the lifting of all instructions in the binary, which increases the analysis time. On the contrary, an analysis on the assembly level would be significantly faster, but requires a ground truth for each architecture.</p>

<p>As a consequence, the implementation is a trade-off between universality and speed: For the architectures x86, x86-64, ARM32 and ARM64, there are custom ground truths and analyses based on the disassembly opcodes; for all other architectures, the implementation relies on Binary Ninja’s LLIL.</p>

<p>To build a representative ground truth for a wide variety of common applications, the Linux <a href="https://www.gnu.org/software/coreutils/">coreutils</a> &amp; <a href="https://www.gnu.org/software/binutils/">binutils</a>, compiler <a href="https://gcc.gnu.org/">GCC</a>, debugger <a href="https://www.sourceware.org/gdb/">GDB</a>, image manipulation program <a href="https://www.gimp.org/">GIMP</a> as well as the multimedia player <a href="http://www.mplayerhq.hu/">mplayer</a> were used. To cover 3-gram variations of different architectures, all projects have been compiled for the four architectures mentioned above and the according 3-grams have been computed.</p>

<h2 id="similarities-and-differences-between-architectures">Similarities and Differences between Architectures</h2>

<p>Before we evaluate the heuristic on various real-world scenarios, let’s take some time and explore some of the similarities and differences between different architectures for fun. However, keep in mind that the following observations only scratch the surface and try to give a better intuition. That said, let us take a closer look at the <a href="https://github.com/mrphrazer/obfuscation_detection/blob/main/obfuscation_detection/ngrams.py">computed ground truths</a>.</p>

<p>At a first glance, we can see that, across all architectures, the most common instruction sequences are data movement chains, sometimes combined with arithmetic operations or function calls.
Furthermore, we can see that all architectures include 3-grams of architecture-specific <code class="language-plaintext highlighter-rouge">push</code> and <code class="language-plaintext highlighter-rouge">pop</code> operations, most to be found in <a href="https://en.wikipedia.org/wiki/Function_prologue_and_epilogue">function prologues and epilogues</a>. In addition, they all include several architecture-specific <code class="language-plaintext highlighter-rouge">add</code> chains, due to offset calculations and pointer arithmetics.</p>

<p>Some of the differences, however, are quite more interesting. For instance, the ARM-based architectures contain significantly more load/store operations; the reason for that is that, in most cases, these architectures read and write memory using <a href="https://en.wikipedia.org/wiki/Load%E2%80%93store_architecture">dedicated instructions</a>, while the x86 and x86-64 architectures can access memory in nearly all data movement and arithmetic operations. Another difference can be seen between the x86 and the x86-64 architecture: The 32-bit architecture contains significantly more <code class="language-plaintext highlighter-rouge">push ; call</code> chains. This is due to differences in the <a href="https://en.wikipedia.org/wiki/Calling_convention">calling conventions</a>: the 32-bit architecture uses the stack to pass function arguments, while the 64-bit architecture mostly uses registers.</p>

<p>In short, we can say that many low-level patterns such as function management, calls, data movement and pointer arithmetic are architecture-agnostic, while there are architecture-specific concepts which lead to differences in the most common instruction patterns. This might also have an impact on the architecture-agnostic ground truth in LLIL, if the boundary for the most common 3-grams is set too low. For this reason, the ground truth in LLIL stores the 3,000 most common LLIL 3-grams obtained for all architectures, while the architecture-specific implementations only store the 1,000 most common 3-grams.</p>

<p>Finally, let us now look at some real-world binaries and evaluate what kind of instruction sequences can be identified by the heuristic.</p>

<h2 id="evaluation">Evaluation</h2>

<p>To understand what kind of code locations these heuristics identify, we take a closer look at several binaries that are only partially obfuscated; this means, we use binaries which contain mostly compiler-generated code, but are obfuscated in specific parts to hide underlying computations. In the following sections, we analyze a malware sample, a module from the Window kernel, an anti-cheat system and a mobile DRM system. Note that some of these binaries will not be specified further; however, although we omit some details, the discussion will still provide some intuition and strategies of what to expect and how to proceed. So, to reproduce the results, feel free to pick the anti-cheat software or DRM system of your choice and follow along. Now, let’s get into this and discuss the results.</p>

<h3 id="malware">Malware</h3>

<p>The chosen <a href="https://www.virustotal.com/gui/file/3261111211bb0de7eb9ccb14997865913e8841ec1a5701988102105bba4e4aba/detection">malware sample</a> belongs to the <a href="https://en.wikipedia.org/wiki/Emotet">emotet family</a> and has been already described in a <a href="/2021/03/03/flattening_detection.html">previous blog post</a>, in which we analyzed its control-flow flattening implementation. If we use the heuristic to identify uncommon instructions sequences, it identifies 14 functions; while some of them implement control-flow flattening, the majority (11) of them share a structure similar to the following code snippet:</p>

<pre><code class="language-x86">push    ecx {var_4_6}
mov     dword [esp {var_4}], 0x8f8e
shl     dword [esp {var_4_1}], 0x3  {0x47c70}
xor     dword [esp {var_4_2}], 0xe2f0cbde  {0xe2f4b7ae}
xor     dword [esp {var_4_3}], 0x3a546a44  {0xd8a0ddea}
add     dword [esp {var_4_4}], 0xffff377d  {0xd8a01567}
or      dword [esp {var_4_5}], 0xe5ad4c54  {0xfdad5d77}
xor     dword [esp], 0xfdad5c37  {0x140}
mov     eax, dword [esp]  {0x140}
pop     ecx  {0x140}
retn     {__return_addr}
</code></pre>

<p>As we can see, the function initializes some variables on the stack. If we analyze where and how these functions are called, we learn that they are part of the control-flow flattening obfuscation and used to update the state variable which encodes the control flow.</p>

<p>Overall, every function identified by the heuristic is obfuscated or pinpoint a helper function managing the obfuscated state.</p>

<h3 id="windows-kernel">Windows Kernel</h3>

<p>Next, let us take a closer look at <code class="language-plaintext highlighter-rouge">ci.dll</code>, the <a href="https://j00ru.vexillium.org/2010/06/insight-into-the-driver-signature-enforcement/">Windows kernel module responsible for code integrity and file authentication</a>. As documented by the Airbus Security Lab, this kernel module implements an AES <a href="https://blog.quarkslab.com/introduction-to-whiteboxes-and-collision-based-attacks-with-qbdi.html">white-box</a>, which is protected by a virtualization-based obfuscator, the <a href="https://github.com/airbus-seclab/warbirdvm">Warbird virtual machine</a>. Note that the concept of <em>virtualization-based</em> obfuscation (also known as <em>virtual machines</em> or <em>VMs</em>) and how to break them has been covered in a <a href="/2021/10/21/vm_based_obfuscation.html">previous blog post</a>.</p>

<p>With our heuristic, overall we identify 206 out of 2,065 functions (the top 10%, as described above). Nearly half of these functions are related to cryptographic operations, as we can deduce from by their symbol names (for example <code class="language-plaintext highlighter-rouge">SymCryptSha256AppendBlocks_shani</code>); however, 125 of them follow a different pattern: While their symbol name does not reveal any information, their overall code structure is similar to the following code snippet:</p>

<pre><code class="language-x86">mov     r10, rcx
lea     r9d, [rdx+0x62d7be82]
lea     eax, [rdx+0x3853ae71]
movzx   ecx, r9b
xor     eax, r9d
xor     rcx, 0x76
add     eax, 0x64cf8a2f
mov     r8, rdx
shr     r8, 0x20
sub     r8d, edx
shl     rax, 0x20
mov     rcx, qword [r10+rcx*8]
add     r8d, 0x4547fd6
shl     r8, 0x20
or      rax, 0x5d304937
or      r8, 0x1d570d9f
cmp     qword [r10+0x128], rcx
cmove   rax, r8
retn     {__return_addr}
</code></pre>

<p>These functions implement handlers of the Warbird VM. As we can see, the handler encodes its data flow using constant unfolding (e.g., writing <code class="language-plaintext highlighter-rouge">x + 5 </code> as <code class="language-plaintext highlighter-rouge">x + 10 - 5</code>) and conditional moves. However, if we take a closer look we notice <code class="language-plaintext highlighter-rouge">[r10+rcx*8]</code>, which might access a 64-bit jump table at address <code class="language-plaintext highlighter-rouge">r10</code> to which the offset <code class="language-plaintext highlighter-rouge">rcx</code> is added, probably computing the address of the next VM handler.</p>

<p>In summary, we can say that the heuristic identified uncommon instructions sequences; 61% of these functions are part of the Warbird VM, while the others implement cryptographic operations.</p>

<h3 id="anti-cheat">Anti-Cheat</h3>

<p>After having examined a malware sample and a Windows kernel module, let us now look into an <a href="https://en.wikipedia.org/wiki/Cheating_in_online_games#Anti-cheating_methods_and_limitations">anti-cheat software</a>, which is used to detect cheating attempts in online games. Due to its size, this time, the heuristic finds 3,433 out of 34,321 functions in total. While we do not want to manually inspect 3,433 functions, we can immediately identify a common structure by exploring some of these:</p>

<pre><code class="language-x86">pop     rsp
lea     rsi, [rel 0x141f2c3e6]
mov     rbx, r11
movq    rdi, xmm6
sub     r9, 0x4
movq    xmm7, rbx
movq    rbp, xmm7
mov     qword [rbp], rdi
movsxd  rax, dword [rsp]
add     rsp, 0x4
add     rsi, rax
jmp     rsi
</code></pre>

<p>We have hundreds of identified functions which end in indirect jumps and compute the target address by adding an offset to a fixed constant. This seems to be some kind of control-flow encoding; however, if you spend more time with virtualization-based obfuscation, this pattern might also remind you of a technique which is known as <em>threaded code</em>: At the end of each VM handler, the dispatcher is inlined and computes the address of the next handler. While we cannot know for sure without further analysis, there is a good chance that the anti-cheat software is protected via virtualization-based obfuscation, which has been located by the heuristic.</p>

<h3 id="mobile-drm">Mobile DRM</h3>

<p>Finally, let us take a look at a mobile DRM system which plays encrypted multi-media content. Given its size, we again identify a large number of functions (approximately 3,000) with our heuristic. This time, we have a hard time manually inspecting the functions; randomly looking at some function does not reveal any patterns with regards to similar code structures, graph layouts or others. Without further knowledge, way more analysis effort is required to get a better understanding.</p>

<p>However, this does not mean that the heuristics did not identify <em>interesting</em> functions. For example, we identified a lot of functions using a wide amount of floating-point instructions, which it not uncommon for multi-media applications. Furthermore, we pinpointed functions making extensive usage of hardware-based encryption instructions, indicating that these functions might perform some kind of content decryption or key derivation. Finally, we also identified some functions protected with <em>arithmetic obfuscation</em> or <a href="/2021/11/11/practical_mba_deobfuscation.html">mixed Boolean-Arithmetic (MBA) expressions</a>, indicating that we are looking at the right places.</p>

<p>Overall, we can summarize that the heuristic reliably pinpoints uncommon instruction sequences; in certain binaries, this is often obfuscated code. Furthermore, we have seen that choosing the top 10% of all functions might be a bit overwhelming for large binaries. In some cases, the sheer amount of functions can be managed by detecting repetitive obfuscation patterns, which allows an effortless classification of the identified functions. In other cases, significantly more effort is required. However, even in these cases, the heuristic can be useful. Digging into large binaries is always like looking for a needle in a haystack; each heuristic performing a reasonably helpful pre-selection speeds up reverse engineering.</p>

<h2 id="closing-remarks">Closing Remarks</h2>

<p>In this blog post, I tried to give an intuition for the power of statistical analysis, not only in the context of code obfuscation. I’ve always been fascinated by statistical analysis and its ability to reveal underlying structures, which makes it a good fit for reverse engineering.  Although I assumed after my last blog post on <a href="/2021/08/10/obfuscation_detection.html">obfuscation detection</a> that this series would be finished, I came up with the idea in a discussion with friends and implemented a prototype; after some experiments, I implemented it properly in an architecture-agnostic way. While the results certainly overlap (at least partially) with the other heuristics, it is the first which does not rely on complexity metrics, adding another dimension to the analysis. Furthermore, it is fun to explore the differences and oddities between architectures, while learning that overall, they are very similar on a higher level.</p>

<p>A further perspective of this post is to show how to classify results identified by such heuristics. While we stopped analysis rather quickly, the intention was to provide some insights and strategies how one can reason about obfuscation, starting from scratch. Further analysis until deobfuscation and understanding of the underlying protected code is an iterative process, which I partially covered in other blog posts, such as the post on <a href="/2021/10/21/vm_based_obfuscation.html">breaking virtualization-based obfuscation</a> and the post on <a href="/2021/11/11/practical_mba_deobfuscation.html">simplifying MBA-based obfuscation</a>. Furthermore, I cover such topics and strategies in-depth in my <a href="/training_software_deobfuscation.html">training classes on code deobfuscation</a>.</p>]]></content><author><name>Tim Blazytko</name></author><category term="reversing" /><category term="malware" /><category term="deobfuscation" /><summary type="html"><![CDATA[Statistical analysis for detecting uncommon instruction sequences and spotting obfuscation patterns in binaries.]]></summary></entry><entry><title type="html">Practical MBA Deobfuscation with msynth</title><link href="https://synthesis.to/2021/11/11/practical_mba_deobfuscation.html" rel="alternate" type="text/html" title="Practical MBA Deobfuscation with msynth" /><published>2021-11-11T00:00:00+01:00</published><updated>2021-11-11T00:00:00+01:00</updated><id>https://synthesis.to/2021/11/11/practical_mba_deobfuscation</id><content type="html" xml:base="https://synthesis.to/2021/11/11/practical_mba_deobfuscation.html"><![CDATA[<p>Let’s say we see the following arithmetic expression:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>((x ^ y) + (2 * (x &amp; y))) + ((x ^ z) + (2 * (x &amp; z)))
</code></pre></div></div>

<p>What does it compute? You will see, it’s not very complicated; it is <code class="language-plaintext highlighter-rouge">(x + y) + (x + z)</code>.</p>

<p>What we can observe here is one of the most sophisticated code obfuscation techniques which is known as <em>mixed Boolean-Arithmetic (MBA)</em>. MBAs <a href="https://tigress.wtf/encodeArithmetic.html">arithmetically encode</a> semantically simple operations in complex arithmetic expressions to <em>hide</em> the underlying computations. In general, it is <a href="https://blog.quarkslab.com/what-theoretical-tools-are-needed-to-simplify-mba-expressions.html">extremely difficult to simplify</a> such expressions, since there are endless possibilities to construct them. Nevertheless, recent research advanced in breaking certain instances of MBAs by using <a href="https://github.com/quarkslab/arybo">algebraic simplifications</a>, <a href="/papers/usenix17-syntia.pdf">program synthesis</a>, <a href="https://aclanthology.org/2020.findings-emnlp.56.pdf">machine learning</a> and <a href="https://www.usenix.org/system/files/sec21fall-liu-binbin.pdf">the exploitation of mathematical characteristics</a>. However, one of the most generic and powerful attacks on MBAs is known as <a href="https://www.ndss-symposium.org/wp-content/uploads/2020/04/bar2020-23009.pdf">QSynth</a> (official framework: <a href="https://github.com/quarkslab/qsynthesis">QSynthesis</a>) and has been introduced by Robin David and Luigi Coniglio. Earlier this year, I published my MBA deobfuscation framework <a href="https://github.com/mrphrazer/msynth">msynth</a> which incorporates similar ideas; it is specifically designed to work on real-world binaries and can simplify most MBAs found in the wild.</p>

<p>From this blog post, you will get a better understanding of the ideas behind msynth and learn how to use it to simplify MBA expressions in real-world binaries. Who knows, it may assist your day-to-day malware analysis and code deobfuscation tasks or just give you the ability to play around with MBA deobfuscation.</p>

<h2 id="simplification-approach">Simplification Approach</h2>

<p>Given a complex expression like the one above, our goal is to find a shorter expression that exhibits the same input-output behavior. In our case, <code class="language-plaintext highlighter-rouge">(x + y) + (x + z)</code> is such a simpler expression: No matter which inputs we assign to <code class="language-plaintext highlighter-rouge">x</code>, <code class="language-plaintext highlighter-rouge">y</code> and <code class="language-plaintext highlighter-rouge">z</code>, both expressions always compute to the same results. To find such expressions, intuitively, we want to identify parts of the complex expression—so-called <em>subexpressions</em>—and repeatedly replace them with simpler representations.  However, to keep this process efficient, we first collect a large number of relationships between complex and simple expressions. In short, our simplification approach uses two components:</p>

<ul>
  <li>A database of arithmetic simplification identities</li>
  <li>An algorithm which recursively walks over all subexpressions</li>
</ul>

<p>The database can be understood as a pre-computed lookup table mapping input-output behavior to the smallest possible expression. For example, if we know that the input tuple <code class="language-plaintext highlighter-rouge">(10, 20)</code> evaluates to <code class="language-plaintext highlighter-rouge">30</code> (written as <code class="language-plaintext highlighter-rouge">(10, 20) -&gt; 30</code>) and <code class="language-plaintext highlighter-rouge">(10, 10)</code> evaluates to <code class="language-plaintext highlighter-rouge">20</code> (written as <code class="language-plaintext highlighter-rouge">(10, 10) -&gt; 20</code>), then we can map this behavior to an addition of two variables, for example <code class="language-plaintext highlighter-rouge">x + y</code>. However, we would <em>not</em> map this behavior to an expression such as <code class="language-plaintext highlighter-rouge">x + y + 5 - 5</code>, since it is not the shortest representation.</p>

<p>The simplification algorithm recursively walks over all subexpressions–from the largest to the shortest—and evaluates the subexpression’s input-output behavior. Afterward, we query the database whether a shorter entry exists for this behavior. If so, we replace the current subexpression with the one from the database and stop; otherwise, we continue and go to the next subexpression. We repeat this process, until we cannot simplify any more subexpressions.</p>

<p>If this sounds confusing, don’t worry: Let us illustrate the interplay between algorithm &amp; database and gradually simplify the expression above. First, let us assume we have the following lookup table, in which each input-output pair uses an input tuple of length 3 and in which the expression can use the variables <code class="language-plaintext highlighter-rouge">x</code>, <code class="language-plaintext highlighter-rouge">y</code> and <code class="language-plaintext highlighter-rouge">z</code>:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(10, 20, 30) -&gt; 30: x + y
(10, 20, 30) -&gt; 40: x + z
</code></pre></div></div>

<p>The database contains two entries; one entry maps an input vector to <code class="language-plaintext highlighter-rouge">x + y</code>, another one to <code class="language-plaintext highlighter-rouge">x + z</code>.</p>

<p>To simplify the complex expression, we now derive all of its subexpressions, sort them by their lengths in descending order and iteratively walk over the following list:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>((x ^ y) + (2 * (x &amp; y))) + ((x ^ z) + (2 * (x &amp; z)))
(x ^ y) + (2 * (x &amp; y))
(x ^ z) + (2 * (x &amp; z))
2 * (x &amp; y)
2 * (x &amp; z)
x ^ y
x &amp; y
x ^ z
x &amp; z
x
y
z
</code></pre></div></div>

<p>Starting with the original expression, we evaluate it with <code class="language-plaintext highlighter-rouge">(10, 20, 30)</code> and obtain <code class="language-plaintext highlighter-rouge">70</code>. Since <code class="language-plaintext highlighter-rouge">(10, 20, 30) -&gt; 70</code> is not in our database, we continue with the next subexpression, <code class="language-plaintext highlighter-rouge">(x ^ y) + (2 * (x &amp; y))</code>. This time, it evaluates to <code class="language-plaintext highlighter-rouge">30</code>, giving us <code class="language-plaintext highlighter-rouge">(10, 20, 30) -&gt; 30</code>. This entry maps to <code class="language-plaintext highlighter-rouge">x + y</code>. Therefore, we replace the subexpression in the complex expression with <code class="language-plaintext highlighter-rouge">x + y</code>, which leads us to:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(x + y) + ((x ^ z) + (2 * (x &amp; z)))
</code></pre></div></div>

<p>Then, we re-start the algorithm, now beginning with the shorter expression. Once we reach the subexpression <code class="language-plaintext highlighter-rouge">(x ^ z) + (2 * (x &amp; z))</code>, we evaluate it to <code class="language-plaintext highlighter-rouge">(10, 20, 30) -&gt; 40</code>. If we perform a database lookup for this entry, we learn that it is equivalent to <code class="language-plaintext highlighter-rouge">x + z</code>. As a consequence, we again replace the subexpression and obtain:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(x + y) + (x + z)
</code></pre></div></div>

<p>In a final run, we re-start the process one last time. However, since we cannot simplify any further subexpressions, the algorithm terminates.</p>

<p>This example illustrates the deobfuscation process quite well. However, to be usable in real-world scenarios, we have to take care of a few things:</p>

<ul>
  <li>
    <p>If the pre-computed database contains only a handful of entries, simplification will not be very effective. In general we can say: The more entries the pre-computed database contains, the higher the success rate. To derive more database entries, we can perform an exhaustive search over all possible combinations of variables and mathematical operations up to a certain upper bound. Additionally we can generate random expressions of higher complexity and also add them to the database.</p>
  </li>
  <li>
    <p>It is error-prone to use only a single input-output pair for database lookups. Consider the following example:  The input-output pair <code class="language-plaintext highlighter-rouge">(2, 2) -&gt; 4</code> maps to more than one expression; it holds for <code class="language-plaintext highlighter-rouge">x + y</code>, <code class="language-plaintext highlighter-rouge">x + x</code>, <code class="language-plaintext highlighter-rouge">x * y</code> and others, which all have the same length. To circumvent this and increase the accuracy, we have to use a higher number of input-output pairs. In practice, numbers between 10 and 30 are sufficient. While higher numbers further increase the accuracy, they slow  down the simplification process enormously. To exclude false positives entirely, we can use an SMT solver to check if the subexpression and the matching expression from the database are equivalent.</p>
  </li>
  <li>
    <p>Expressions on the binary level often use up/downcasts to realize, for example, 32-bit operations on 64-bit architectures. To simplify such arithmetic constructs, we have to include type casts in our pre-computed lookup tables.</p>
  </li>
</ul>

<p>With that in mind, let us now dive into msynth and use it to break some MBAs.</p>

<h2 id="in-practice">In Practice</h2>

<p><a href="https://github.com/mrphrazer/msynth">msynth</a> is an MBA deobfuscation framework built on top of the <a href="https://github.com/cea-sec/miasm">Miasm</a> reverse engineering framework, which we already used in <a href="/2021/10/21/vm_based_obfuscation.html">my last blog post on VM-based deobfuscation</a>. It ships with a <a href="https://github.com/mrphrazer/msynth/blob/main/oracle.pickle">pre-computed database</a> (referred to as <em>oracle</em>) including common constants and type casts. To simplify MBAs in binaries, msynth can be combined with Miasm’s symbolic execution engine to derive expressions from assembly code. Example scripts for this can be found in <a href="https://github.com/mrphrazer/msynth/blob/main/scripts/symbolic_simplification.py">msynth’s repository</a> and <a href="https://github.com/mrphrazer/hitb2021ams_deobfuscation/blob/master/simplify_finspy.py">in the repository of one of my code deobfuscation workshops</a>.</p>

<p>Let us now first have a look at how we can use msynth to simplify arbitrary MBAs. Then, we  combine it with symbolic execution to break an MBA on the binary level. Afterward, we examine how msynth can be used to deobfuscate MBAs in malware samples. Finally, we see that msynth can even be useful in analysis scenarios where the code is not necessarily obfuscated.</p>

<h3 id="mba-simplification">MBA Simplification</h3>

<p>To simplify an MBA expression, we need msynth’s <code class="language-plaintext highlighter-rouge">Simplifier</code> class which has to be initialized with a file path to the pre-computed database. Then, we can simplify an expression in Miasm IR by calling <code class="language-plaintext highlighter-rouge">simplify</code>. Overall, this only takes three lines of code:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">from</span> <span class="n">msynth</span> <span class="kn">import</span> <span class="n">Simplifier</span>

<span class="c1"># initialize simplifier
</span><span class="n">simplifier</span> <span class="o">=</span> <span class="nc">Simplifier</span><span class="p">(</span><span class="n">oracle_path</span><span class="p">)</span>
<span class="c1"># simplify expression
</span><span class="n">simplified</span> <span class="o">=</span> <span class="n">simplifier</span><span class="p">.</span><span class="nf">simplify</span><span class="p">(</span><span class="n">expression</span><span class="p">)</span>
</code></pre></div></div>

<p>If we add some more lines, we turn the snippet into a fully-working script that simplifies our MBA from before (a similar script which can be used for own experiments can be found <a href="https://github.com/mrphrazer/hitb2021ams_deobfuscation/blob/master/simplify_mba.py">here</a>):</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">from</span> <span class="n">miasm.expression.expression</span> <span class="kn">import</span> <span class="o">*</span>
<span class="kn">from</span> <span class="n">msynth</span> <span class="kn">import</span> <span class="n">Simplifier</span>

<span class="k">def</span> <span class="nf">mba</span><span class="p">(</span><span class="n">size</span><span class="p">):</span>
    <span class="c1"># ((x ^ y) + (2 * (x &amp; y))) + ((x ^ z) + (2 * (x &amp; z)))
</span>    <span class="n">x</span> <span class="o">=</span> <span class="nc">ExprId</span><span class="p">(</span><span class="sh">"</span><span class="s">x</span><span class="sh">"</span><span class="p">,</span> <span class="n">size</span><span class="p">)</span>
    <span class="n">y</span> <span class="o">=</span> <span class="nc">ExprId</span><span class="p">(</span><span class="sh">"</span><span class="s">y</span><span class="sh">"</span><span class="p">,</span> <span class="n">size</span><span class="p">)</span>
    <span class="n">z</span> <span class="o">=</span> <span class="nc">ExprId</span><span class="p">(</span><span class="sh">"</span><span class="s">z</span><span class="sh">"</span><span class="p">,</span> <span class="n">size</span><span class="p">)</span>
    <span class="nf">return </span><span class="p">((</span><span class="n">x</span> <span class="o">^</span> <span class="n">y</span><span class="p">)</span> <span class="o">+</span> <span class="p">(</span><span class="nc">ExprInt</span><span class="p">(</span><span class="mi">2</span><span class="p">,</span> <span class="n">size</span><span class="p">)</span> <span class="o">*</span> <span class="p">(</span><span class="n">x</span> <span class="o">&amp;</span> <span class="n">y</span><span class="p">)))</span> <span class="o">+</span> <span class="p">((</span><span class="n">x</span> <span class="o">^</span> <span class="n">z</span><span class="p">)</span> <span class="o">+</span> <span class="p">(</span><span class="nc">ExprInt</span><span class="p">(</span><span class="mi">2</span><span class="p">,</span> <span class="n">size</span><span class="p">)</span> <span class="o">*</span> <span class="p">(</span><span class="n">x</span> <span class="o">&amp;</span> <span class="n">z</span><span class="p">)))</span>

<span class="c1"># path to pre-computed lookup table
</span><span class="n">oracle_path</span> <span class="o">=</span> <span class="sh">"</span><span class="s">./oracle.pickle</span><span class="sh">"</span>
<span class="c1"># initialize simplifier
</span><span class="n">simplifier</span> <span class="o">=</span> <span class="nc">Simplifier</span><span class="p">(</span><span class="n">oracle_path</span><span class="p">)</span>
<span class="c1"># simplify expression
</span><span class="n">simplified</span> <span class="o">=</span> <span class="n">simplifier</span><span class="p">.</span><span class="nf">simplify</span><span class="p">(</span><span class="nf">mba</span><span class="p">(</span><span class="mi">32</span><span class="p">))</span>

<span class="c1"># print results
</span><span class="nf">print</span><span class="p">(</span><span class="sa">f</span><span class="sh">"</span><span class="s">original: </span><span class="si">{</span><span class="nf">mba</span><span class="p">(</span><span class="mi">32</span><span class="p">)</span><span class="si">}</span><span class="sh">"</span><span class="p">)</span>
<span class="nf">print</span><span class="p">(</span><span class="sa">f</span><span class="sh">"</span><span class="s">simplified: </span><span class="si">{</span><span class="n">simplified</span><span class="si">}</span><span class="sh">"</span><span class="p">)</span>
</code></pre></div></div>

<p>The output looks as follows:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>original: (x ^ y) + 0x2 * (x &amp; y) + (x ^ z) + 0x2 * (x &amp; z)
simplified: y + z + x * 0x2
</code></pre></div></div>

<p>msynth does not only simplify the MBA, the resulting expression is even shorter than <code class="language-plaintext highlighter-rouge">(x + y) + (x + z)</code>. This is due to Miasm’s symbolic execution engine which simplifies <code class="language-plaintext highlighter-rouge">x + x</code> to <code class="language-plaintext highlighter-rouge">x * 0x2</code>. If we aim at simplifying more complex MBAs, we can easily modify the MBA functions and for example receive:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>original: ((x ^ 0xFFFFFFFF) | z) + -(((((((0x1 + z + -0x1) | (x ^ 0xFFFFFFFF)) + -(x ^ 0xFFFFFFFF)) ^ 0xFFFFFFFF) &amp; z) + z + (z &amp; (z ^ 0xFFFFFFFF))) ^ 0xFFFFFFFF) + (x &amp; ((x &amp; y) + (x &amp; y) + (x ^ y))) + (x &amp; ((x &amp; y) + (x &amp; y) + (x ^ y))) + (x ^ ((x &amp; y) + (x &amp; y) + (x ^ y))) + 0x1 + ((((y + ((y ^ 0xFFFFFFFF) &amp; z)) | (y + ((y ^ 0xFFFFFFFF) &amp; z) + (((y + ((y ^ 0xFFFFFFFF) &amp; z)) ^ 0xFFFFFFFF) &amp; z))) + -(y &amp; (y + ((y ^ 0xFFFFFFFF) &amp; z)))) ^ 0xFFFFFFFF) + --x + ((y + ((y ^ 0xFFFFFFFF) &amp; z) + (((y + ((y ^ 0xFFFFFFFF) &amp; z)) ^ 0xFFFFFFFF) &amp; z)) &amp; (z ^ 0xFFFFFFFF)) + ((x ^ 0xFFFFFFFF) | z) + -(((((((0x1 + z + -0x1) | (x ^ 0xFFFFFFFF)) + -(x ^ 0xFFFFFFFF)) ^ 0xFFFFFFFF) &amp; z) + z + (z &amp; (z ^ 0xFFFFFFFF))) ^ 0xFFFFFFFF) + (x &amp; ((x &amp; y) + (x &amp; y) + (x ^ y))) + (x &amp; ((x &amp; y) + (x &amp; y) + (x ^ y))) + (x ^ ((x &amp; y) + (x &amp; y) + (x ^ y))) + 0x1 + ((((y + ((y ^ 0xFFFFFFFF) &amp; z)) | (y + ((y ^ 0xFFFFFFFF) &amp; z) + (((y + ((y ^ 0xFFFFFFFF) &amp; z)) ^ 0xFFFFFFFF) &amp; z))) + -(y &amp; (y + ((y ^ 0xFFFFFFFF) &amp; z)))) ^ 0xFFFFFFFF) + --x + ((y + ((y ^ 0xFFFFFFFF) &amp; z) + (((y + ((y ^ 0xFFFFFFFF) &amp; z)) ^ 0xFFFFFFFF) &amp; z)) &amp; (z ^ 0xFFFFFFFF)) + ((x ^ 0xFFFFFFFF) | z) + -(((((((0x1 + z + -0x1) | (x ^ 0xFFFFFFFF)) + -(x ^ 0xFFFFFFFF)) ^ 0xFFFFFFFF) &amp; z) + z + (z &amp; (z ^ 0xFFFFFFFF))) ^ 0xFFFFFFFF) + (x &amp; ((x &amp; y) + (x &amp; y) + (x ^ y))) + (x &amp; ((x &amp; y) + (x &amp; y) + (x ^ y))) + (x ^ ((x &amp; y) + (x &amp; y) + (x ^ y))) + 0x1 + ((((y + ((y ^ 0xFFFFFFFF) &amp; z)) | (y + ((y ^ 0xFFFFFFFF) &amp; z) + (((y + ((y ^ 0xFFFFFFFF) &amp; z)) ^ 0xFFFFFFFF) &amp; z))) + -(y &amp; (y + ((y ^ 0xFFFFFFFF) &amp; z)))) ^ 0xFFFFFFFF) + --x + ((y + ((y ^ 0xFFFFFFFF) &amp; z) + (((y + ((y ^ 0xFFFFFFFF) &amp; z)) ^ 0xFFFFFFFF) &amp; z)) &amp; (z ^ 0xFFFFFFFF))
simplified: z * 0x3 + (x + y) * 0x6
</code></pre></div></div>

<h3 id="mba-challenge">MBA Challenge</h3>

<p>Since we know how to simplify MBAs, we go a step further and combine it with Miasm’s symbolic execution engine. For this, we rely on the script <a href="https://github.com/mrphrazer/msynth/blob/main/scripts/symbolic_simplification.py"><code class="language-plaintext highlighter-rouge">symbolic_simplification.py</code></a> from msynth’s repository as an example. Provided with a binary, an address and a path to the pre-computed database, the script symbolically executes a single basic block at the given address and simplifies all derived expressions. To use it, we have to call it as follows:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>python symbolic_simplification.py &lt;binary&gt; &lt;address&gt; &lt;path to database&gt;
</code></pre></div></div>

<p>Let us try this on an <a href="https://github.com/mrphrazer/msynth/blob/main/samples/mba_challenge">artificial MBA challenge</a> which is inspired by a commercial protector. In particular, let us simplify a function-level MBA which returns the result in <code class="language-plaintext highlighter-rouge">rax</code> and uses <code class="language-plaintext highlighter-rouge">rdi</code>, <code class="language-plaintext highlighter-rouge">rsi</code> and <code class="language-plaintext highlighter-rouge">rdx</code> as inputs:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>00001200  89f1               mov     ecx, esi
00001202  f7d1               not     ecx
00001204  21d1               and     ecx, edx
00001206  01f1               add     ecx, esi
00001208  4189c8             mov     r8d, ecx
0000120b  21ce               and     esi, ecx
0000120d  41f7d0             not     r8d
00001210  8d043e             lea     eax, [rsi+rdi]
00001213  4121d0             and     r8d, edx
00001216  f7d2               not     edx
00001218  4101c8             add     r8d, ecx
0000121b  4421c2             and     edx, r8d
0000121e  4409c1             or      ecx, r8d
00001221  01d0               add     eax, edx
00001223  29c8               sub     eax, ecx
00001225  c3                 retn     {__return_addr}
</code></pre></div></div>

<p>Let’s also have a look at its decompilation:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">uint64_t</span> <span class="nf">mba3</span><span class="p">(</span><span class="kt">int32_t</span> <span class="n">arg1</span><span class="p">,</span> <span class="kt">int32_t</span> <span class="n">arg2</span><span class="p">,</span> <span class="kt">int32_t</span> <span class="n">arg3</span><span class="p">)</span> <span class="p">{</span>

    <span class="kt">int32_t</span> <span class="n">rcx_2</span> <span class="o">=</span> <span class="p">(</span><span class="n">not</span><span class="p">.</span><span class="n">d</span><span class="p">(</span><span class="n">arg2</span><span class="p">)</span> <span class="o">&amp;</span> <span class="n">arg3</span><span class="p">)</span> <span class="o">+</span> <span class="n">arg2</span>
    <span class="kt">int32_t</span> <span class="n">r8_3</span> <span class="o">=</span> <span class="p">(</span><span class="n">not</span><span class="p">.</span><span class="n">d</span><span class="p">(</span><span class="n">rcx_2</span><span class="p">)</span> <span class="o">&amp;</span> <span class="n">arg3</span><span class="p">)</span> <span class="o">+</span> <span class="n">rcx_2</span>
    <span class="k">return</span> <span class="n">zx</span><span class="p">.</span><span class="n">q</span><span class="p">((</span><span class="n">arg2</span> <span class="o">&amp;</span> <span class="n">rcx_2</span><span class="p">)</span> <span class="o">+</span> <span class="n">arg1</span> <span class="o">+</span> <span class="p">(</span><span class="n">not</span><span class="p">.</span><span class="n">d</span><span class="p">(</span><span class="n">arg3</span><span class="p">)</span> <span class="o">&amp;</span> <span class="n">r8_3</span><span class="p">)</span> <span class="o">-</span> <span class="p">(</span><span class="n">rcx_2</span> <span class="o">|</span> <span class="n">r8_3</span><span class="p">))</span>
<span class="p">}</span>

</code></pre></div></div>

<p>From looking at the expression, we don’t get any clue what it computes. However, if we symbolically execute it with Miasm and simplify it with msynth, we get the following output (for the return value in <code class="language-plaintext highlighter-rouge">rax</code>):</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>before: {(((((((RSI[0:32] ^ 0xFFFFFFFF) &amp; RDX[0:32]) + RSI[0:32]) ^ 0xFFFFFFFF) &amp; RDX[0:32]) + ((RSI[0:32] ^ 0xFFFFFFFF) &amp; RDX[0:32]) + RSI[0:32]) &amp; (RDX[0:32] ^ 0xFFFFFFFF)) + -(((((((RSI[0:32] ^ 0xFFFFFFFF) &amp; RDX[0:32]) + RSI[0:32]) ^ 0xFFFFFFFF) &amp; RDX[0:32]) + ((RSI[0:32] ^ 0xFFFFFFFF) &amp; RDX[0:32]) + RSI[0:32]) | (((RSI[0:32] ^ 0xFFFFFFFF) &amp; RDX[0:32]) + RSI[0:32])) + (RDI + {(((RSI[0:32] ^ 0xFFFFFFFF) &amp; RDX[0:32]) + RSI[0:32]) &amp; RSI[0:32] 0 32, 0x0 32 64})[0:32] 0 32, 0x0 32 64}


[snip]

simplified: {-RDX[0:32] + RDI[0:32] + RSI[0:32] 0 32, 0x0 32 64}
</code></pre></div></div>

<p>We clearly see that the the simplified code performs an addition of <code class="language-plaintext highlighter-rouge">rdi</code> and <code class="language-plaintext highlighter-rouge">rsi</code> and subtracts <code class="language-plaintext highlighter-rouge">rdx</code>. However, these operations are only performed on the register’s lower 32-bit (denoted by <code class="language-plaintext highlighter-rouge">[0:32]</code>); thus, the final result is extended (with zeros) to 64-bits. If we want to clean up the decompiler output, we could rewrite it as follows:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">uint64_t</span> <span class="nf">mba3</span><span class="p">(</span><span class="kt">int32_t</span> <span class="n">arg1</span><span class="p">,</span> <span class="kt">int32_t</span> <span class="n">arg2</span><span class="p">,</span> <span class="kt">int32_t</span> <span class="n">arg3</span><span class="p">)</span> <span class="p">{</span>
    <span class="k">return</span> <span class="n">zx</span><span class="p">.</span><span class="n">q</span><span class="p">(</span><span class="n">arg1</span> <span class="o">+</span> <span class="n">arg2</span> <span class="o">-</span> <span class="n">arg3</span><span class="p">)</span>
<span class="p">}</span>
</code></pre></div></div>

<h3 id="malware-analysis">Malware Analysis</h3>

<p>Leaving the artificial examples behind, let us now use msynth to support us in the process of malware analysis. Often, malware is obfuscated with frameworks such as <a href="https://github.com/obfuscator-llvm/obfuscator/wiki">Obfuscator-LLVM</a>, which commonly deploy MBAs. For example, the <a href="https://github.com/mrphrazer/hitb2021ams_deobfuscation/blob/master/samples/finspy">FinSpy sample</a>—which we already used in a <a href="/2021/03/03/flattening_detection.html">prior blog post</a>—uses MBAs to to hide data movements based on the conditional move instruction <code class="language-plaintext highlighter-rouge">cmov</code>. If we can simplify the MBA, we know whether the data will be conditionally moved or not.</p>

<p>To deobfuscate MBAs within a basic block, we call the script as we did in the example before:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span>python symbolic_simplification.py samples/finspy 0x4035af oracle.pickle
</code></pre></div></div>

<p>After we symbolically executed the basic block at <code class="language-plaintext highlighter-rouge">0x4035af</code> and simplified it with msynth, we get the following output:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>before: (((@32[@64[0x614F40]][31:32]?({@32[@64[0x614F40]] 0 32, 0xFFFFFFFF 32 64},{@32[@64[0x614F40]] 0 32, 0x0 32 64})) * ((@32[@64[0x614F40]] + 0xFFFFFFFF)[31:32]?({@32[@64[0x614F40]] + 0xFFFFFFFF 0 32, 0xFFFFFFFF 32 64},{@32[@64[0x614F40]] + 0xFFFFFFFF 0 32, 0x0 32 64})))[0:32] &amp; 0x1)?({(((({((@32[@64[0x614F90]] ^ (@32[@64[0x614F90]] + 0xFFFFFFF6)) &amp; (@32[@64[0x614F90]] ^ 0xA))[31:32] ^ (@32[@64[0x614F90]] + 0xFFFFFFF6)[31:32] 0 1, 0x0 1 8} ^ 0x1) | 0x1) ^ 0x1) &amp; 0x1) | {((@32[@64[0x614F90]] ^ (@32[@64[0x614F90]] + 0xFFFFFFF6)) &amp; (@32[@64[0x614F90]] ^ 0xA))[31:32] ^ (@32[@64[0x614F90]] + 0xFFFFFFF6)[31:32] 0 1, 0x0 1 8} 0 8, 0xFFFFFF 8 64},{({((@32[@64[0x614F90]] ^ (@32[@64[0x614F90]] + 0xFFFFFFF6)) &amp; (@32[@64[0x614F90]] ^ 0xA))[31:32] ^ (@32[@64[0x614F90]] + 0xFFFFFFF6)[31:32] 0 1, 0x0 1 8} ^ 0x1) | {((@32[@64[0x614F90]] ^ (@32[@64[0x614F90]] + 0xFFFFFFF6)) &amp; (@32[@64[0x614F90]] ^ 0xA))[31:32] ^ (@32[@64[0x614F90]] + 0xFFFFFFF6)[31:32] 0 1, 0x0 1 8} 0 8, 0xFFFFFF 8 64})

[snip]

simplified: 0xFFFFFF01
</code></pre></div></div>

<p>While the original symbolic expression is long and complex, msynth simplifies it to the constant <code class="language-plaintext highlighter-rouge">0xFFFFFF01</code>. In another example, we can simplify a similarly complex expression to the constant <code class="language-plaintext highlighter-rouge">0x0</code>:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>before: ((((@32[@64[0x614F40]][31:32]?({@32[@64[0x614F40]] 0 32, 0xFFFFFFFF 32 64},{@32[@64[0x614F40]] 0 32, 0x0 32 64})) * ((@32[@64[0x614F40]] + 0xFFFFFFFF)[31:32]?({@32[@64[0x614F40]] + 0xFFFFFFFF 0 32, 0xFFFFFFFF 32 64},{@32[@64[0x614F40]] + 0xFFFFFFFF 0 32, 0x0 32 64})))[0:32] &amp; 0x1)?(((((({((@32[@64[0x614F90]] ^ (@32[@64[0x614F90]] + 0xFFFFFFF6)) &amp; (@32[@64[0x614F90]] ^ 0xA))[31:32] ^ (@32[@64[0x614F90]] + 0xFFFFFFF6)[31:32] 0 1, 0x0 1 8} ^ 0x1) | 0x1) ^ 0x1) &amp; 0x1) | {((@32[@64[0x614F90]] ^ (@32[@64[0x614F90]] + 0xFFFFFFF6)) &amp; (@32[@64[0x614F90]] ^ 0xA))[31:32] ^ (@32[@64[0x614F90]] + 0xFFFFFFF6)[31:32] 0 1, 0x0 1 8}) &amp; 0x1,(({((@32[@64[0x614F90]] ^ (@32[@64[0x614F90]] + 0xFFFFFFF6)) &amp; (@32[@64[0x614F90]] ^ 0xA))[31:32] ^ (@32[@64[0x614F90]] + 0xFFFFFFF6)[31:32] 0 1, 0x0 1 8} ^ 0x1) | {((@32[@64[0x614F90]] ^ (@32[@64[0x614F90]] + 0xFFFFFFF6)) &amp; (@32[@64[0x614F90]] ^ 0xA))[31:32] ^ (@32[@64[0x614F90]] + 0xFFFFFFF6)[31:32] 0 1, 0x0 1 8}) &amp; 0x1))?(0x0,0x1)

[snip]

simplified: 0x0
</code></pre></div></div>

<h3 id="complex-code">Complex Code</h3>

<p>Sometimes we have to analyze code which is not necessarily obfuscated but so complex, for example due to aggressive compiler optimizations, that we don’t have any clue of what is going on. In these cases, msynth can also support us in our analysis by removing at least some complexity in the expressions.</p>

<p>As an example, let us use the Windows kernel <code class="language-plaintext highlighter-rouge">ntoskrnl.exe</code> which we already used in a <a href="/2021/08/10/obfuscation_detection.html">previous blog post</a> to detect PatchGuard. If we take a closer look at PatchGuard’s initialization function, we find a repetitive pattern that is rather complex. This pattern might be guarded by MBAs, but it might also be just a piece of weird code whereof we currently can’t determine the functionality, given our current level of knowledge. However, even in this case, msynth allows us to massively simplify the expression:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>before: {((({(({RDI 0 64, 0x0 64 128} * {((RAX | (RDX &lt;&lt; 0x20)) &gt;&gt;&gt; 0x3) ^ (RAX | (RDX &lt;&lt; 0x20)) 0 64, 0x0 64 128})[0:32] ^ ({RDI 0 64, 0x0 64 128} * {((RAX | (RDX &lt;&lt; 0x20)) &gt;&gt;&gt; 0x3) ^ (RAX | (RDX &lt;&lt; 0x20)) 0 64, 0x0 64 128})[64:96]) &gt;&gt; 0xF 0 32, 0x0 32 64} * 0x4EC4EC4F)[32:64] &gt;&gt; 0x3)[31:32]?({({(({RDI 0 64, 0x0 64 128} * {((RAX | (RDX &lt;&lt; 0x20)) &gt;&gt;&gt; 0x3) ^ (RAX | (RDX &lt;&lt; 0x20)) 0 64, 0x0 64 128})[0:32] ^ ({RDI 0 64, 0x0 64 128} * {((RAX | (RDX &lt;&lt; 0x20)) &gt;&gt;&gt; 0x3) ^ (RAX | (RDX &lt;&lt; 0x20)) 0 64, 0x0 64 128})[64:96]) &gt;&gt; 0xF 0 32, 0x0 32 64} * 0x4EC4EC4F)[32:64] &gt;&gt; 0x3 0 32, 0xFFFFFFFF 32 64} * 0x1A,{({(({RDI 0 64, 0x0 64 128} * {((RAX | (RDX &lt;&lt; 0x20)) &gt;&gt;&gt; 0x3) ^ (RAX | (RDX &lt;&lt; 0x20)) 0 64, 0x0 64 128})[0:32] ^ ({RDI 0 64, 0x0 64 128} * {((RAX | (RDX &lt;&lt; 0x20)) &gt;&gt;&gt; 0x3) ^ (RAX | (RDX &lt;&lt; 0x20)) 0 64, 0x0 64 128})[64:96]) &gt;&gt; 0xF 0 32, 0x0 32 64} * 0x4EC4EC4F)[32:64] &gt;&gt; 0x3 0 32, 0x0 32 64} * 0x1A))[0:32] 0 32, 0x0 32 64}

[snip]

simplified: {(({(({RDI 0 64, 0x0 64 128} * {((RAX | (RDX &lt;&lt; 0x20)) &gt;&gt;&gt; 0x3) ^ (RAX | (RDX &lt;&lt; 0x20)) 0 64, 0x0 64 128})[0:32] ^ ({RDI 0 64, 0x0 64 128} * {((RAX | (RDX &lt;&lt; 0x20)) &gt;&gt;&gt; 0x3) ^ (RAX | (RDX &lt;&lt; 0x20)) 0 64, 0x0 64 128})[64:96]) &gt;&gt; 0xF 0 32, 0x0 32 64} * 0x4EC4EC4F)[32:64] &gt;&gt; 0x3) * 0x1A 0 32, 0x0 32 64}
</code></pre></div></div>

<p>Although the simplified expression is still complex and doesn’t tell us much without further analysis, the expression is significantly better to read due to its eliminated subexpressions.</p>

<p>Overall, the aforementioned scenarios are only example usages for msynth. In practice, you may stumble upon a wide variety of MBAs or other complex expressions of which you suspect shorter representations to exist. Try it on your own; msynth does not solve all of your problems, but it may be a useful tool that supports your analysis.</p>

<h2 id="closing-remarks">Closing Remarks</h2>

<p>I wrote msynth with the goal to create a stable and usable MBA deobfuscation framework that supports me in my day-to-day reverse engineering tasks; it has been used in audits of various obfuscation schemes as well as in the analysis of some malware samples; furthermore, my students use it regularly in my <a href="/training_software_deobfuscation.html">code deobfuscation training</a> to break sophisticated protection schemes such as virtual machines with MBA-obfuscated handlers. msynth isn’t perfect (especially in terms of performance for very large expressions), but it will improve over time. Furthermore, it is a playground for me (and hopefully others) to test and develop novel deobfuscation techniques which will be integrated over time. So, have fun with it and stay tuned!</p>]]></content><author><name>Tim Blazytko</name></author><category term="reversing" /><category term="malware" /><category term="deobfuscation" /><summary type="html"><![CDATA[Practical MBA deobfuscation with msynth, simplifying mixed Boolean-arithmetic expressions in binaries.]]></summary></entry><entry><title type="html">Writing Disassemblers for VM-based Obfuscators</title><link href="https://synthesis.to/2021/10/21/vm_based_obfuscation.html" rel="alternate" type="text/html" title="Writing Disassemblers for VM-based Obfuscators" /><published>2021-10-21T00:00:00+02:00</published><updated>2021-10-21T00:00:00+02:00</updated><id>https://synthesis.to/2021/10/21/vm_based_obfuscation</id><content type="html" xml:base="https://synthesis.to/2021/10/21/vm_based_obfuscation.html"><![CDATA[<p>After I recently gave a workshop on the <em>Analysis of Virtualization-based Obfuscation</em> at <a href="https://rada.re/con/2021/">r2con2021</a> (<a href="/presentations/r2con2021-deobfuscation.pdf">slides</a>, <a href="https://github.com/mrphrazer/r2con2021_deobfuscation/">code &amp; samples</a> and the <a href="https://www.youtube.com/watch?v=b6udPT79itk">recording</a> are available online), I would like to use this blog post for a brief summary on how to write disassemblers for VM-based obfuscators based on symbolic execution.</p>

<p>While we cover only a basic implementation of virtualization-based obfuscation, we focus on a generic approach to build disassemblers that scale for most virtual machines that can be found in the wild. As a leading example for this blog post, we will use a simple virtual machine that protects a <a href="https://en.wikipedia.org/wiki/Fibonacci_number">Fibonacci implementation</a> using the <a href="https://tigress.wtf">Tigress C obfuscator</a>. In the end, we will also have a look at how the original code can be reconstructed from the VM disassembly.</p>

<p>If you would like to play around with the code (based on <a href="https://github.com/cea-sec/miasm">Miasm</a>) and samples, you can find them <a href="https://github.com/mrphrazer/r2con2021_deobfuscation/">here</a>.</p>

<h2 id="virtual-machines">Virtual Machines</h2>

<p><a href="https://tigress.wtf/virtualize.html">Virtual machines</a> (VM) belong to the most sophisticated obfuscation schemes, since they create an additional layer of complexity for reverse engineers: By introducing a custom <a href="https://en.wikipedia.org/wiki/Instruction_set_architecture">instruction set architecture (ISA)</a> in software (similar to the <a href="https://en.wikipedia.org/wiki/Java_virtual_machine">Java virtual machine</a>), they hide the original code in a sequence of bytes (the so-called <em>bytecode</em>) that is <em>interpreted</em> at runtime. In other words, the bytecode is an array of bytes that encodes the to-be-protected program as machine code; without further knowledge of the virtual machine implementing the custom instruction set architecture, the bytes aren’t <em>meaningful</em> for a reverse engineer. To give them a meaning, we have to fully understand the VM and its execution flow, such that we can build a disassembler for the custom instruction set and decode the bytecode.</p>

<p>To achieve this, let us start with opening a virtual machine in the disassembler of our choice. For the simplest virtual machines, we see a control-flow graph like the following:</p>

<p class="center"><img src="/images/vm1.svg" alt="vm" width="100%" /></p>

<p>We see a control-flow graph that has a central basic block—the <em>dispatcher</em>—which directs the control flow to individual basic blocks, which jump back to the dispatcher. While this reminds us of <a href="/2021/03/03/flattening_detection.html">control-flow flattening</a>, there is a fundamental difference: In the case of  control-flow flattening, the individual basic blocks have a 1:1 mapping to the original code. However, for virtual machines, these basic blocks are known as <em>handlers</em> and implement the individual instruction semantics of the custom instruction set architecture, such as virtual addition or push/pop instructions.</p>

<p>On a high-level, the virtual machine operates as follows: Starting at the <em>entry</em> basic block, it backups the native CPU registers and initializes the VM state. Afterward, it walks over the bytecode array, iteratively <em>fetches</em> some bytes, <em>decodes</em> the corresponding instruction and <em>executes</em> the handler that implements the virtual instruction. The <em>fetch-decode-execute</em> (FDE) process is performed in a loop until the VM reaches a specific handler that restores the native CPU registers and leaves the VM. This handler is called  the VM <em>exit</em>. Given this context, we can annotate the control-flow graph as follows:</p>

<p class="center"><img src="/images/vm2.svg" alt="vm" width="100%" /></p>

<p>Internally, virtual machines often use additional data structures to preserve their internal state (like storing native registers and intermediate values). Two of the most important data structures are the <em>virtual instruction pointer</em> and (sometimes) the <em>virtual stack pointer</em>. Similar to a native instruction pointer register (such as <code class="language-plaintext highlighter-rouge">rip</code> on x86-64), the virtual instruction pointer points to the current instruction in the bytecode and is used to decode operands and keep track of the VM execution flow. Analogously, the virtual stack pointer keeps track of a VM-internal stack that <em>may</em> be used to store intermediate values.</p>

<p>Overall, we can understand a VM as an obfuscated function with a custom calling convention: It takes some parameters as inputs and calculates one or more outputs. To preserve the outer execution context, it makes use of a <a href="https://en.wikipedia.org/wiki/Function_prologue_and_epilogue">prologue and epilogue</a> (VM entry and exit) that saves and restores the native CPU state. Internally, it operates on an undocumented state and interprets a sequence of bytes that represents the protected code. To reconstruct the underlying code, we have to locate the virtual machine’s inputs/outputs, understand its internal state and instruction set. Then, we can write a disassembler for the custom instruction set architecture which decodes the bytecode into human-readable pseudocode. In the following, we will have a look at how we can write a generic disassembler based on symbolic execution. Beforehand, we make a short excursion into symbolic execution.</p>

<h2 id="symbolic-execution">Symbolic Execution</h2>

<p><em>Symbolic execution</em> is a program analysis technique which allows us to symbolically evaluate and summarize assembly code. These summaries provide concrete insights into the semantics of executed instructions.</p>

<p>To symbolically execute assembly code, we first lift it into an intermediate representation. Afterward, we evaluate the code assignment by assignment and track the individual register and memory assignments in a hashmap that is referred to as <em>symbolic state</em>. To propagate the data flow between the instructions, we always use the latest register/memory definitions from the symbolic state. For example, consider the following assembly code and the corresponding assignments:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>mov rax, rbx                  ; rax := rbx
add rax, 0x20                 ; rax := rax + 0x20
add rbx, rax                  ; rbx := rbx + rax
xor rcx, rbx                  ; rcx := rcx ^ rbx
</code></pre></div></div>

<p>Let us assume that the <em>initial</em> symbolic state maps all registers to themselves:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>rax: rax
rbx: rbx
rcx: rcx
</code></pre></div></div>

<p>After the first instruction, we update the symbolic state such that <code class="language-plaintext highlighter-rouge">rax</code> now maps to <code class="language-plaintext highlighter-rouge">rbx</code>. If we evaluate the second instruction, we propagate this information and assign <code class="language-plaintext highlighter-rouge">state[rax] + 0x20 = rbx + 0x20</code> to <code class="language-plaintext highlighter-rouge">rax</code>.  After we symbolically executed all instructions, we have the following symbolic state:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>rax: rax + 0x20
rbx: rbx + (rax + 0x20)
rcx: rcx ^ (rbx + (rax + 0x20))
</code></pre></div></div>

<p>The symbolic state reveals the underlying semantics of a sequence of code in a comprehensive manner. If we want to enrich symbolic analysis with (partially) concrete values, we can provide them to the initial symbolic state and perform a <em>concolic execution</em>. For example, if we initialize <code class="language-plaintext highlighter-rouge">rax</code> with <code class="language-plaintext highlighter-rouge">0x10</code>, we obtain</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>rax: 0x30
rbx: rbx + 0x30
rcx: rcx ^ (rbx + 0x30)
</code></pre></div></div>

<p>In a purely concolic scenario, symbolic execution can be understood as an emulator for assembly code. However, if we keep at least some registers/memory locations symbolic, we get a very powerful technique which allows us to automatically explore virtual machine interpreters. In the following, we will make use of this to build a generic disassembler.</p>

<h2 id="writing-a-disassembler">Writing a Disassembler</h2>

<p>Our goal is to design a generic disassembler approach that works for most virtual machines in the wild, no matter their complexity. Starting from scratch with nearly no prior knowledge of the virtual machine, we interactively add more and more knowledge until we have a fully-fledged disassembler. For this, we build a system which symbolically explores the virtual machine and enrich the output with useful information about the symbolic state. On a high-level, the approach consists of two steps:</p>

<ol>
  <li>We build a symbolic executor that <em>follows the VM execution flow</em> from the VM entry to the VM exit.</li>
  <li>Afterward, we add <em>callbacks to interesting code locations (such as VM handlers)</em> and dump useful information from the symbolic state.</li>
</ol>

<p>Building such a symbolic executor is an interactive process: Starting at the VM entry, we symbolically follow the execution flow until the symbolic executor stops. This is always the case if the VM does a control-flow transfer that relying on symbolic values. Then, the symbolic executor has <em>insufficient</em> information to derive a concrete value. In these cases, we manually inspect why this happens and add additional knowledge to the symbolic state. Often, these are either missing information about the bytecode or (conditional) jumps which depend on user input (inputs to the VM from the outside). For the former, we concretize the symbolic memory around the bytecode; for the latter, we can hardcode specific input values.</p>

<p>Once the symbolic executor follows the execution from VM entry to exit without any interruption, we can manually reverse engineer specific VM components and add more and more knowledge about the VM to the executor. Typically, we start by dumping the virtual instruction pointer and track all the executed handlers. Afterward, we manually inspect individual handlers, understand their semantics and add this information to the symbolic executor. In the end, we receive a fully-fledged disassembler which allows us to dump concrete values at runtime, print and optimize the disassembly or reconstruct the control-flow graph. In theory, it is even possible to perform a devirtualization and emit equivalent x86-64 code, although this may take quite some time to implement correctly.</p>

<h2 id="in-practice">In Practice</h2>

<p>After we discussed the generic approach to build a disassembler based on symbolic execution, we will now exemplify the process on a simple virtual machine. For a better understanding, we will first give a detailed overview of the virtual machine. Afterward, we explore the process from symbolically following the execution flow to building a fully-fledged disassembler.</p>

<h3 id="sample">Sample</h3>

<p>Our <a href="https://github.com/mrphrazer/r2con2021_deobfuscation/blob/main/samples/vm_basic.bin?raw=true">sample of choice</a> protects an <a href="https://github.com/mrphrazer/r2con2021_deobfuscation/blob/main/samples/src/fib.c">implementation of Fibonacci</a> with a virtual machine whose control-flow graph is comparable to the one above. The virtual machine is <em>stack-based</em>; it uses a dedicated stack to store intermediate values; for example, a calculation such as <code class="language-plaintext highlighter-rouge">5 + 8</code> is performed by the following pseudocode:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>push 5
push 8
add
</code></pre></div></div>

<p>The VM entry is at <code class="language-plaintext highlighter-rouge">0x115a</code>. The virtual machine uses <code class="language-plaintext highlighter-rouge">rdx</code> as the virtual instruction pointer and <code class="language-plaintext highlighter-rouge">rcx</code> as the virtual stack pointer; the bytecode is located at <code class="language-plaintext highlighter-rouge">0x4060</code>. The handler <code class="language-plaintext highlighter-rouge">0x11a9</code> performs a stack-based addition:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>000011a9  4883c201           add     rdx, 0x1
000011ad  8b01               mov     eax, dword [rcx]
000011af  0141f8             add     dword [rcx-0x8], eax
000011b2  4883e908           sub     rcx, 0x8
000011b6  eb4f               jmp     0x1207
</code></pre></div></div>

<p>For this, it fetches two arguments from the stack—<code class="language-plaintext highlighter-rouge">[rcx]</code> and <code class="language-plaintext highlighter-rouge">[rcx-0x8]</code>—adds them and stores them on the top of the stack. It also increments the virtual instruction pointer in <code class="language-plaintext highlighter-rouge">0x11a9</code> and the virtual stack pointer in <code class="language-plaintext highlighter-rouge">0x11b2</code>. Overall, the virtual machine has the following 11 handlers:</p>

<ul>
  <li><code class="language-plaintext highlighter-rouge">0x129e</code>: CMPBE (compare below equal)</li>
  <li><code class="language-plaintext highlighter-rouge">0x1238</code>: PUSHFROMVAR (load integer from local variable and push onto stack)</li>
  <li><code class="language-plaintext highlighter-rouge">0x126d</code>: CHECKZERO (if constant == 0x0 then push <code class="language-plaintext highlighter-rouge">var_c</code> on top of stack)</li>
  <li><code class="language-plaintext highlighter-rouge">0x11c4</code>: PUSHPTR (push pointer to local variable)</li>
  <li><code class="language-plaintext highlighter-rouge">0x1262</code>: GOTO (jump to an address)</li>
  <li><code class="language-plaintext highlighter-rouge">0x11a9</code>: ADD (add two dwords)</li>
  <li><code class="language-plaintext highlighter-rouge">0x1245</code>: VMEXIT (leave the VM)</li>
  <li><code class="language-plaintext highlighter-rouge">0x11f1</code>: CMPE (compare if two values are equal)</li>
  <li><code class="language-plaintext highlighter-rouge">0x11e1</code>: PUSH (push constant onto stack)</li>
  <li><code class="language-plaintext highlighter-rouge">0x1281</code>: JCC (conditional jump)</li>
  <li><code class="language-plaintext highlighter-rouge">0x1226</code>: POPTOVAR (assign value to local variable)</li>
</ul>

<p>To perform the Fibonacci calculation, it expects an input parameter in <code class="language-plaintext highlighter-rouge">rdi</code> representing the n-th Fibonacci number to calculate; after the VM exit, <code class="language-plaintext highlighter-rouge">rax</code> holds the output (the n-th Fibonacci number).</p>

<p>If you want to reverse engineer the virtual machine on your own, use the provided information as a guidance. However, let us now forget these analysis details and start writing the disassembler without any prior knowledge.</p>

<h3 id="following-the-vm-execution-flow">Following the VM Execution Flow</h3>

<p>To build a symbolic executor that follows the VM execution flow from  the VM entry to the exit, we use the script <a href="https://github.com/mrphrazer/r2con2021_deobfuscation/blob/main/follow_execution_flow.py">follow_execution_flow.py</a>. It hardcodes the VM entry at <code class="language-plaintext highlighter-rouge">0x115a</code> and uses the following loop to symbolically follow the execution flow:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># init worklist
</span><span class="n">basic_block_worklist</span> <span class="o">=</span> <span class="p">[</span><span class="nc">ExprInt</span><span class="p">(</span><span class="n">start_addr</span><span class="p">,</span> <span class="mi">64</span><span class="p">)]</span>

<span class="c1"># worklist algorithm
</span><span class="k">while</span> <span class="n">basic_block_worklist</span><span class="p">:</span>
    <span class="c1"># get current block
</span>    <span class="n">current_block</span> <span class="o">=</span> <span class="n">basic_block_worklist</span><span class="p">.</span><span class="nf">pop</span><span class="p">()</span>

    <span class="nf">print</span><span class="p">(</span><span class="sa">f</span><span class="sh">"</span><span class="s">current block: </span><span class="si">{</span><span class="n">current_block</span><span class="si">}</span><span class="sh">"</span><span class="p">)</span>

    <span class="c1"># symbolical execute block -&gt; next_block: symbolic value/address to execute
</span>    <span class="n">next_block</span> <span class="o">=</span> <span class="n">sb</span><span class="p">.</span><span class="nf">run_block_at</span><span class="p">(</span><span class="n">ira_cfg</span><span class="p">,</span> <span class="n">current_block</span><span class="p">)</span>

    <span class="nf">print</span><span class="p">(</span><span class="sa">f</span><span class="sh">"</span><span class="s">next block: </span><span class="si">{</span><span class="n">next_block</span><span class="si">}</span><span class="sh">"</span><span class="p">)</span>

    <span class="c1"># is next block is integer or label, continue execution
</span>    <span class="k">if</span> <span class="n">next_block</span><span class="p">.</span><span class="nf">is_int</span><span class="p">()</span> <span class="ow">or</span> <span class="n">next_block</span><span class="p">.</span><span class="nf">is_loc</span><span class="p">():</span>
        <span class="n">basic_block_worklist</span><span class="p">.</span><span class="nf">append</span><span class="p">(</span><span class="n">next_block</span><span class="p">)</span>
</code></pre></div></div>

<p>In short, it implements a worklist algorithm that iteratively executes the next basic block <em>until</em> the calculation of the next basic block relies on symbolic values. Initially, the output looks as follows:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>current block: 0x115A
next block: 0x1207
current block: 0x1207
next block: 0x120A
current block: 0x120A
next block: (@8[0x4060] == 0x80)?(0x129E,0x1212)
</code></pre></div></div>

<p>We successfully executed the basic blocks <code class="language-plaintext highlighter-rouge">0x115A</code>, <code class="language-plaintext highlighter-rouge">0x1207</code> and <code class="language-plaintext highlighter-rouge">0x120A</code>. However, the next basic block relies on a memory value that the symbolic executor cannot resolve: <code class="language-plaintext highlighter-rouge">@8[0x4060]</code> (which can be understood as an 8-bit memory read from the address <code class="language-plaintext highlighter-rouge">0x4060</code>). Let’s have a look at the corresponding disassembly:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>00001178  488d15e12e0000     lea     rdx, [rel data_4060]
[...]
00001207  0fb602             movzx   eax, byte [rdx]
[...]
0000120a  3c80               cmp     al, 0x80
</code></pre></div></div>

<p>We notice that the comparison with <code class="language-plaintext highlighter-rouge">al</code> relies on the memory dereference <code class="language-plaintext highlighter-rouge">[rdx]</code>, which again relies on the address <code class="language-plaintext highlighter-rouge">0x4060</code>. In other words, the symbolic executor propagated the memory read to the byte comparison: <code class="language-plaintext highlighter-rouge">@8[0x4060] == 0x80</code>. Semantically, the virtual machine reads the first byte from the bytecode and dispatches it to find the first handler. To allow the symbolic executor to perform the dispatching, we empower it with knowledge of the bytecode by adding it to the symbolic memory:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># add bytecode to symbolic memory -- start address and size (highest address - lowest address)
</span><span class="n">sym_address</span><span class="p">,</span> <span class="n">sym_value</span> <span class="o">=</span> <span class="nf">constraint_memory</span><span class="p">(</span><span class="mh">0x4060</span><span class="p">,</span> <span class="mh">0x4140</span> <span class="o">-</span> <span class="mh">0x4060</span><span class="p">)</span>
<span class="n">sb</span><span class="p">.</span><span class="n">symbols</span><span class="p">[</span><span class="n">sym_address</span><span class="p">]</span> <span class="o">=</span> <span class="n">sym_value</span>
</code></pre></div></div>

<p>Afterward, the symbolic executor runs successfully. The next time it stops it looks as follows:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>[...]
current block: 0x120A
next block: 0x1212
current block: 0x1212
next block: 0x1218
current block: 0x1218
next block: 0x121C
current block: 0x121C
next block: 0x121E
current block: 0x121E
next block: 0x1281
current block: 0x1281
next block: RDI[0:32]?(0x1298,0x1286)
</code></pre></div></div>

<p>This time, we stop at a conditional jump that relies on <code class="language-plaintext highlighter-rouge">rdi</code>. After some manual investigation, we come to the conclusion that this is an input parameter to the VM. Let us initialize it with <code class="language-plaintext highlighter-rouge">0</code> as follows:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># constraint VM input (rdi, first function argument). The value in `ExprInt` represents the function's input value.
</span><span class="n">rdi</span> <span class="o">=</span> <span class="nc">ExprId</span><span class="p">(</span><span class="sh">"</span><span class="s">RDI</span><span class="sh">"</span><span class="p">,</span> <span class="mi">64</span><span class="p">)</span>
<span class="n">sb</span><span class="p">.</span><span class="n">symbols</span><span class="p">[</span><span class="n">rdi</span><span class="p">]</span> <span class="o">=</span> <span class="nc">ExprInt</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="mi">64</span><span class="p">)</span>
</code></pre></div></div>

<p>If we re-run again, the symbolic executor stops here:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>[...]
next block: 0x1245
current block: 0x1245
next block: 0x125A
current block: 0x125A
next block: @64[RSP]
</code></pre></div></div>

<p>Now, the next basic block is an address taken from the stack; a typical pattern for a <code class="language-plaintext highlighter-rouge">ret</code> instruction. Indeed, if we look at the corresponding handler we see that it leaves the VM; we found the VM exit. In other words, our symbolic executor now runs from the VM entry to the VM exit. If we have a closer look at the VM exit, we also see the following assembly line:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>00001245  8b01               mov     eax, dword [rcx]
</code></pre></div></div>

<p>Here, we use the virtual stack pointer <code class="language-plaintext highlighter-rouge">rcx</code> and load a value into <code class="language-plaintext highlighter-rouge">rax</code>. In other words, we found the VM output. To verify our assumption, we can add the following code lines after the worklist algorithm to print the VM output:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># dump VMs/functions' return value -- only works if SE runs until the end
</span><span class="n">rax</span> <span class="o">=</span> <span class="nc">ExprId</span><span class="p">(</span><span class="sh">"</span><span class="s">RAX</span><span class="sh">"</span><span class="p">,</span> <span class="mi">64</span><span class="p">)</span>
<span class="n">value</span> <span class="o">=</span> <span class="n">sb</span><span class="p">.</span><span class="n">symbols</span><span class="p">[</span><span class="n">rax</span><span class="p">]</span>
<span class="nf">print</span><span class="p">(</span><span class="sa">f</span><span class="sh">"</span><span class="s">VM return value: </span><span class="si">{</span><span class="n">value</span><span class="si">}</span><span class="sh">"</span><span class="p">)</span>
</code></pre></div></div>

<p>If we re-run the script, we receive <code class="language-plaintext highlighter-rouge">0</code> as output; if we change the input value in <code class="language-plaintext highlighter-rouge">rdi</code> to <code class="language-plaintext highlighter-rouge">10</code>, we get as output <code class="language-plaintext highlighter-rouge">55</code>. In other words, the symbolic executor correctly emulates the VM execution for the provided inputs and calculates the same output as the binary (remember that it calculates the n-th Fibonacci number).</p>

<p>In summary, we have not only built a symbolic executor following the VM execution flow from the VM entry to its exit; we were also able to emulate the whole VM and inspect its input and output. In the following, we will extend the analysis and enrich the symbolic emulator with information about the individual VM handlers.</p>

<h3 id="from-symbolic-execution-to-vm-disassembly">From Symbolic Execution to VM Disassembly</h3>

<p>In the next step, we want to turn the symbolic executor into a fully-fledged disassembler. For this, we want to add callbacks for the individual handlers which dump additional information from the symbolic state. To realize this, we first start by manually creating a list of handlers. Then, we add the callback in the worklist algorithm as it can be seen in <a href="https://github.com/mrphrazer/r2con2021_deobfuscation/blob/main/vm_disassembler.py">vm_disassembler.py</a>:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># worklist algorithm
</span><span class="k">while</span> <span class="n">basic_block_worklist</span><span class="p">:</span>
    <span class="c1"># get current block
</span>    <span class="n">current_block</span> <span class="o">=</span> <span class="n">basic_block_worklist</span><span class="p">.</span><span class="nf">pop</span><span class="p">()</span>

    <span class="c1"># if current block is a VM handler, dump handler-specific knowledge
</span>    <span class="k">if</span> <span class="n">current_block</span><span class="p">.</span><span class="nf">is_int</span><span class="p">()</span> <span class="ow">and</span> <span class="nf">int</span><span class="p">(</span><span class="n">current_block</span><span class="p">)</span> <span class="ow">in</span> <span class="n">VM_HANDLERS</span><span class="p">:</span>
        <span class="nf">disassemble</span><span class="p">(</span><span class="n">sb</span><span class="p">,</span> <span class="n">current_block</span><span class="p">)</span>
    <span class="p">[...]</span>
</code></pre></div></div>

<p>If the current symbolically executed basic block is the beginning of a VM handler, we call the function <code class="language-plaintext highlighter-rouge">disassemble</code> which takes the handler and the symbolic state as input. Within the function, we dispatch the individual handler addresses and dump handler-specific information. For now, let us just create a template and print the virtual instruction pointer:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>    <span class="c1"># fetch concrete value of current virtual instruction pointer
</span>    <span class="n">vip</span> <span class="o">=</span> <span class="n">sb</span><span class="p">.</span><span class="n">symbols</span><span class="p">[</span><span class="nc">ExprId</span><span class="p">(</span><span class="sh">"</span><span class="s">RDX</span><span class="sh">"</span><span class="p">,</span> <span class="mi">64</span><span class="p">)]</span>

    <span class="c1"># catch the individual handlers and print execution context
</span>    <span class="k">if</span> <span class="nf">int</span><span class="p">(</span><span class="n">address</span><span class="p">)</span> <span class="o">==</span> <span class="mh">0x129e</span><span class="p">:</span>
        <span class="nf">print</span><span class="p">(</span><span class="sa">f</span><span class="sh">"</span><span class="si">{</span><span class="n">vip</span><span class="si">}</span><span class="s">: handler </span><span class="si">{</span><span class="n">address</span><span class="si">}</span><span class="sh">"</span><span class="p">)</span>
    <span class="p">[...]</span>
</code></pre></div></div>

<p>If we execute the script, we get the following handler trace:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>0x4060: handler 0x11E1
0x4065: handler 0x11C4
0x406A: handler 0x1226
0x406B: handler 0x11E1
0x4070: handler 0x11C4
0x4075: handler 0x1226
0x4076: handler 0x11C4
0x407B: handler 0x1238
0x407C: handler 0x11C4
0x4081: handler 0x1226
0x4082: handler 0x1262
0x4087: handler 0x11E1
0x408C: handler 0x126D
0x4091: handler 0x1238
0x4092: handler 0x11F1
0x4093: handler 0x1281
0x409D: handler 0x11E1
0x40A2: handler 0x11C4
0x40A7: handler 0x1226
0x40A8: handler 0x1262
0x40AD: handler 0x1262
0x4136: handler 0x11C4
0x413B: handler 0x1238
0x413C: handler 0x1245
</code></pre></div></div>

<p>Looking at the trace, we see that some handlers are executed more than once. To give the instruction trace more meaning, we can now manually inspect the individual handlers and replace the output with handler-specific information. For example, we could start and replace the last line <code class="language-plaintext highlighter-rouge">handler 0x1245</code> with <code class="language-plaintext highlighter-rouge">VMEXIT</code>. This process might be time-consuming; however, it becomes easier the more handlers we already know since we permanently improve our understanding of how the VM works internally. In the end, we can enrich the disassembler as it can be seen in <a href="https://github.com/mrphrazer/r2con2021_deobfuscation/blob/main/vm_disassembler_final.py">vm_disassembler_final.py</a>. Its final output is:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>0x4060: PUSH 0x0
0x4065: PUSHPTR var_0x4 (push pointer to local variable var_0x4)
0x406A: POPTOVAR (assign value to local variable)
0x406B: PUSH 0x1
0x4070: PUSHPTR var_0x8 (push pointer to local variable var_0x8)
0x4075: POPTOVAR (assign value to local variable)
0x4076: PUSHPTR var_0x8 (push pointer to local variable var_0x8)
0x407B: PUSHFROMVAR (load integer from local variable and push onto stack)
0x407C: PUSHPTR var_0xC (push pointer to local variable var_0xC)
0x4081: POPTOVAR (assign value to local variable)
0x4082: GOTO 0x4087
0x4087: PUSH 0x0
0x408C: if 0x0 == 0x0 then push var_c (rdi) on top of stack
0x4091: PUSHFROMVAR (load integer from local variable and push onto stack)
0x4092: CMPE
0x4093: conditional jump
0x409D: PUSH 0x0
0x40A2: PUSHPTR var_0x10 (push pointer to local variable var_0x10)
0x40A7: POPTOVAR (assign value to local variable)
0x40A8: GOTO 0x40AD
0x40AD: GOTO 0x4136
0x4136: PUSHPTR var_0x10 (push pointer to local variable var_0x10)
0x413B: PUSHFROMVAR (load integer from local variable and push onto stack
0x413C: VMEXIT
</code></pre></div></div>

<p>Given this detailed VM disassembly output, we can now have a closer look at the VM disassembly, detect patterns, simplify the output and reconstruct the high-level code.</p>

<h3 id="code-reconstruction">Code Reconstruction</h3>

<p>To reconstruct the high-level code for the observed VM trace, let’s first have a closer look at some instruction patterns:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>0x4060: PUSH 0x0
0x4065: PUSHPTR var_0x4 (push pointer to local variable var_0x4)
0x406A: POPTOVAR (assign value to local variable)
</code></pre></div></div>

<p>This pattern occurs several times; it pushes a constant onto the stack, afterward a local variable offset. Finally, it assigns the value to the local variable. In short, this pattern can be simplified to <code class="language-plaintext highlighter-rouge">var_0x4 = 0x0</code>.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>0x4076: PUSHPTR var_0x8 (push pointer to local variable var_0x8)
0x407B: PUSHFROMVAR (load integer from local variable and push onto stack
0x407C: PUSHPTR var_0xC (push pointer to local variable var_0xC)
0x4081: POPTOVAR (assign value to local variable)
</code></pre></div></div>

<p>This instruction pattern first pushes a pointer to a local variable onto the stack, afterward it fetches its value. Then, we push another pointer to a variable and assign it the value from the stack. In short, it can be written as <code class="language-plaintext highlighter-rouge">var_0xc = var_0x8</code>.</p>

<p>Let’s also have a closer look at the conditional jump:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>0x4087: PUSH 0x0
0x408C: if 0x0 == 0x0 then push var_c on top of stack
0x4091: PUSHFROMVAR (load integer from local variable and push onto stack)
0x4092: CMPE
0x4093: conditional jump
</code></pre></div></div>

<p>We first push <code class="language-plaintext highlighter-rouge">0x0</code> onto the stack. Then, we compare <code class="language-plaintext highlighter-rouge">0x0 == 0x0</code> and if this is true (which it is in this case), we put <code class="language-plaintext highlighter-rouge">var_c</code> onto the stack. We didn’t explore it in depth, but <code class="language-plaintext highlighter-rouge">var_c</code> holds the value of <code class="language-plaintext highlighter-rouge">rdi</code>, the input to the VM. This way, we put <code class="language-plaintext highlighter-rouge">rdi</code> onto the stack. Finally, we compare the two stack values if they are the equal: <code class="language-plaintext highlighter-rouge">if rdi == 0x0: [...]</code>.</p>

<p>If we do this for all instruction patterns and remove the gotos, we receive a cleaned output:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>; var_0x4 = 0x0
0x4060: PUSH 0x0
0x4065: PUSHPTR var_0x4 (push pointer to local variable var_0x4)
0x406A: POPTOVAR (assign value to local variable)

; var_0x8 = 0x1
0x406B: PUSH 0x1
0x4070: PUSHPTR var_0x8 (push pointer to local variable var_0x8)
0x4075: POPTOVAR (assign value to local variable)

; var_0xc = var_0x8
0x4076: PUSHPTR var_0x8 (push pointer to local variable var_0x8)
0x407B: PUSHFROMVAR (load integer from local variable and push onto stack
0x407C: PUSHPTR var_0xC (push pointer to local variable var_0xC)
0x4081: POPTOVAR (assign value to local variable)

; push rdi (== 0) onto stack
0x4087: PUSH 0x0
0x408C: if 0x0 == 0x0 then push var_c on top of stack
0x4091: PUSHFROMVAR (load integer from local variable and push onto stack

; if rdi == 0
0x4092: CMPE
0x4093: conditional jump

; var_0x10 = 0x0
0x409D: PUSH 0x0
0x40A2: PUSHPTR var_0x10 (push pointer to local variable var_0x10)
0x40A7: POPTOVAR (assign value to local variable)

; return var_0x10
0x4136: PUSHPTR var_0x10 (push pointer to local variable var_0x10)
0x413B: PUSHFROMVAR (load integer from local variable and push onto stack

0x413C: VMEXIT
</code></pre></div></div>

<p>If we also remove the VM disassembler output and re-write our comments in pseudocode, we obtain:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">var_0x4</span> <span class="o">=</span> <span class="mh">0x0</span>
<span class="n">var_0x8</span> <span class="o">=</span> <span class="mh">0x1</span>
<span class="n">var_0xc</span> <span class="o">=</span> <span class="n">var_0x8</span>

<span class="k">if</span> <span class="n">rdi</span> <span class="o">==</span> <span class="mi">0</span><span class="p">:</span>
    <span class="n">var_0x10</span> <span class="o">=</span> <span class="mh">0x0</span>
    <span class="k">return</span> <span class="n">var_0x10</span>
<span class="k">else</span><span class="p">:</span>
    <span class="nf">print</span><span class="p">(</span><span class="sh">"</span><span class="s">not seen in VM trace</span><span class="sh">"</span><span class="p">)</span>
</code></pre></div></div>

<p>We might see that it is the initialization of the Fibonacci function where we initialize two values and compare the input to <code class="language-plaintext highlighter-rouge">0x0</code>. Since we initialized <code class="language-plaintext highlighter-rouge">rdi</code> with <code class="language-plaintext highlighter-rouge">0x0</code> in the symbolic executor, the VM execution triggers exactly this case. To dispel the last doubts, we can also compare it with the original, unprotected code:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">unsigned</span> <span class="kt">int</span> <span class="nf">fib</span><span class="p">(</span><span class="kt">unsigned</span> <span class="n">n</span><span class="p">)</span> <span class="p">{</span>
  <span class="kt">unsigned</span> <span class="n">a</span><span class="o">=</span><span class="mi">0</span><span class="p">;</span>
  <span class="kt">unsigned</span> <span class="n">b</span><span class="o">=</span><span class="mi">1</span><span class="p">;</span>
  <span class="kt">unsigned</span> <span class="kt">int</span> <span class="n">s</span> <span class="o">=</span> <span class="n">b</span><span class="p">;</span>

  <span class="k">if</span> <span class="p">(</span><span class="n">n</span> <span class="o">==</span> <span class="mi">0</span><span class="p">)</span> <span class="p">{</span>
    <span class="k">return</span> <span class="mi">0</span><span class="p">;</span>
  <span class="p">}</span>
  <span class="p">[...]</span>
</code></pre></div></div>

<p>As we can see, our reconstruction nearly fits the original code snippet line by line. To reconstruct the missing parts of the original code, we can assign other values to <code class="language-plaintext highlighter-rouge">rdi</code> and re-run the whole process.</p>

<p>This has been a long journey, but in the end we analyzed a virtual machine, wrote an interactive disassembler based on symbolic execution and reconstructed the original code based on the VM disassembly.</p>

<h2 id="setting-the-scene">Setting the Scene</h2>

<p>In this post, we discussed the fundamentals of virtualization-based obfuscation and learned a generic approach on how to build disassemblers for virtual machine implementations using symbolic execution. Afterward, we put it into practice on a simple virtual machine for which we not only constructed a disassembler, but also recovered parts of the protected high-level code. Overall, the whole process has been an interplay between manual analysis and automation; especially the manual work required a lot of time. In general, large parts of this time-consuming manual work can be simplified by using heuristics, pattern matching and other analysis techniques; we could, for example, detect the handlers automatically and also extract their semantics. This may be a topic for a future post.</p>

<p>In the wild, there exist a plethora of virtualization-based obfuscators of varying complexity that are deployed in malware, commercial applications and even in the <a href="/2021/08/10/obfuscation_detection.html">Windows kernel</a>. Naturally, the example code from this post will break horribly on such samples: function calls, indirect calls, jump tables, external/API calls, additional layers of obfuscation and others must be treated in special ways. Nevertheless, the core concepts remain the same.</p>

<p>If you would like to dive deeper into these topics, learn how to combine this approach with other deobfuscation techniques such as compiler optimizations, taint analysis, SMT solving &amp; program synthesis or if you would like to know how you can handle the aforementioned special cases, then have a look at my <a href="/training_software_deobfuscation.html">training class on code deobfuscation techniques</a>.</p>]]></content><author><name>Tim Blazytko</name></author><category term="reversing" /><category term="malware" /><category term="deobfuscation" /><category term="automation" /><summary type="html"><![CDATA[How to write disassemblers for VM-based obfuscators and recover instruction semantics via symbolic execution.]]></summary></entry><entry><title type="html">Automated Detection of Obfuscated Code</title><link href="https://synthesis.to/2021/08/10/obfuscation_detection.html" rel="alternate" type="text/html" title="Automated Detection of Obfuscated Code" /><published>2021-08-10T00:00:00+02:00</published><updated>2021-08-10T00:00:00+02:00</updated><id>https://synthesis.to/2021/08/10/obfuscation_detection</id><content type="html" xml:base="https://synthesis.to/2021/08/10/obfuscation_detection.html"><![CDATA[<p>In a <a href="/2021/03/03/flattening_detection.html">previous blog post</a>, we already discussed that it is valuable to know which code areas are obfuscated; those areas often guard sensitive code and are worth a closer look. Furthermore, we designed a heuristic to automatically detect <a href="https://tigress.wtf/flatten.html">control-flow flattening</a> and state machines in binaries by identifying specific loop characteristics in the control-flow graph. However, other code obfuscation techniques such as <a href="https://tigress.wtf/opaque.html">opaque predicates</a>, <a href="https://tigress.wtf/encodeArithmetic.html">complex arithmetic encodings</a> or <a href="https://tigress.wtf/virtualize.html">virtualization</a> are not necessarily covered by this heuristic, especially if the control-flow graph is loop-free. For these cases, we have to develop new heuristics to identify obfuscation.</p>

<p>In this blog post, we will have a look at some generic heuristics that allow us to quickly identify interesting code parts. For this, we first discuss the general idea to detect code obfuscation. Afterward, we have a closer look at different heuristics and what they identify. In the end, we investigate how the heuristics work on (partially) obfuscated binaries.</p>

<p>We implemented all heuristics in a <a href="https://binary.ninja/">Binary Ninja</a> plugin called <a href="https://github.com/mrphrazer/obfuscation_detection">Obfuscation Detection</a>. The repository also contains (most of the) sample binaries we use for evaluation. If you would like to play around with it, check it out.</p>

<h2 id="detection-heuristics">Detection Heuristics</h2>

<p>Our goal is to develop heuristics that pinpoint code which shares similar characteristics to obfuscated code. Early in the reverse engineering process when we want to get a better overview over the binary, we can use these heuristics to spot <em>interesting</em> code locations that we can inspect manually. Even if these code locations are not obfuscated, they are still relevant for reverse engineers since they often implement complex dispatching routines, cryptographic algorithms or other important program logic.</p>

<p>One way to look at code obfuscation is that it tries to impede reverse engineering by artificially increasing the code’s complexity. Therefore, we can identify obfuscated code by looking for <em>complex</em> code, such as functions with large basic blocks or control-flow graphs. Another way to look at code obfuscation is that it tries to confuse reverse engineers by playing with their assumptions and analysis tools. This way, we can look for <em>anomalies</em> in our analysis tooling, such as overlapping instructions or meaningless disassembly.</p>

<p>If we want to apply such heuristics to large binaries (e.g., several hundreds of megabytes in size), they have to be efficient and provide a minimal analysis overhead. As a consequence, we rely on data points that are easy to obtain and heuristics that are cheap to compute.</p>

<p>In practice, it is useful to apply several heuristics independently: While different heuristics may find the same code locations, they may also find other locations since they look for different characteristics. In the following, we will get to know three such heuristics that accomplish the aforementioned requirements.</p>

<h3 id="complex-functions">Complex Functions</h3>

<p>Intuitively, large functions implement a complex program logic such as file parsing, network protocols, dispatching routines or cryptographic algorithms. If functions are large due to cod obfuscation, they often contain dead code, (nested) opaque predicates or control-flow flattening.</p>

<p>To determine a function’s complexity, we could, for example, count its number of instructions, basic blocks or bytes. However, in all of these cases, we ignore the function’s branch characteristics: We ignore if the function consists contains loops or (nested) branches. A more generic way to measure a function’s complexity is to measure the complexity of its control-flow graph; for this, we can calculate its <a href="https://en.wikipedia.org/wiki/Cyclomatic_complexity">cyclomatic complexity</a>. In short, the cyclomatic complexity measures the number of independent paths in a function and is calculated by <code class="language-plaintext highlighter-rouge">#edges - #blocks + 2</code>. 
If the control-flow graph has only a single basic block and no edges, we get <code class="language-plaintext highlighter-rouge">0 - 1 + 2 = 1</code>; if it contains five basic blocks and eight edges, we get <code class="language-plaintext highlighter-rouge">8 - 5 + 2 = 5</code>. The second control-flow graph is more complex. We can implement this heuristic within a few lines in Binary Ninja:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">calc_cyclomatic_complexity</span><span class="p">(</span><span class="n">function</span><span class="p">):</span>
    <span class="c1"># number of basic blocks
</span>    <span class="n">num_blocks</span> <span class="o">=</span> <span class="nf">len</span><span class="p">(</span><span class="n">function</span><span class="p">.</span><span class="n">basic_blocks</span><span class="p">)</span>
    <span class="c1"># number of edges in the graph
</span>    <span class="n">num_edges</span> <span class="o">=</span> <span class="nf">sum</span><span class="p">([</span><span class="nf">len</span><span class="p">(</span><span class="n">b</span><span class="p">.</span><span class="n">outgoing_edges</span><span class="p">)</span> <span class="k">for</span> <span class="n">b</span> <span class="ow">in</span> <span class="n">function</span><span class="p">.</span><span class="n">basic_blocks</span><span class="p">])</span>
    
    <span class="k">return</span> <span class="n">num_edges</span> <span class="o">-</span> <span class="n">num_blocks</span> <span class="o">+</span> <span class="mi">2</span>
</code></pre></div></div>

<p>To build a meaningful heuristic that is based on the cyclomatic complexity, we can <em>sort</em> all functions by their complexity and print the <em>upper</em> 10%:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">find_complex_functions</span><span class="p">(</span><span class="n">bv</span><span class="p">):</span>
    <span class="c1"># sort functions by cyclomatic complexity
</span>    <span class="n">sorted_functions</span> <span class="o">=</span> <span class="nf">sorted</span><span class="p">(</span><span class="n">bv</span><span class="p">.</span><span class="n">functions</span><span class="p">,</span> <span class="n">key</span><span class="o">=</span><span class="k">lambda</span> <span class="n">x</span><span class="p">:</span> <span class="nf">calc_cyclomatic_complexity</span><span class="p">(</span><span class="n">x</span><span class="p">))</span>

    <span class="c1"># bound to print only the top 10%
</span>    <span class="n">bound</span> <span class="o">=</span> <span class="n">math</span><span class="p">.</span><span class="nf">ceil</span><span class="p">(((</span><span class="nf">len</span><span class="p">(</span><span class="n">bv</span><span class="p">.</span><span class="n">functions</span><span class="p">)</span> <span class="o">*</span> <span class="mi">10</span><span class="p">)</span> <span class="o">/</span> <span class="mi">100</span><span class="p">))</span>
    
    <span class="c1"># print top 10% (iterate in descending order)
</span>    <span class="k">for</span> <span class="n">f</span> <span class="ow">in</span> <span class="nf">list</span><span class="p">(</span><span class="nf">reversed</span><span class="p">(</span><span class="n">sorted_functions</span><span class="p">))[:</span><span class="n">bound</span><span class="p">]:</span>
        <span class="nf">print</span><span class="p">(</span><span class="sa">f</span><span class="sh">"</span><span class="si">{</span><span class="nf">hex</span><span class="p">(</span><span class="n">f</span><span class="p">.</span><span class="n">start</span><span class="p">)</span><span class="si">}</span><span class="s">: </span><span class="si">{</span><span class="nf">calc_cyclomatic_complexity</span><span class="p">(</span><span class="n">f</span><span class="p">)</span><span class="si">}</span><span class="sh">"</span><span class="p">)</span>
</code></pre></div></div>

<p>Overall, the function’s cyclomatic complexity provides a good heuristic to fingerprint complex graphs. However, it ignores the size of basic blocks and the number of instructions in the function; data points which are interesting to pinpoint other code constructs.</p>

<h3 id="large-basic-blocks">Large Basic Blocks</h3>

<p>Large basic blocks guarantee that a sequence of code is executed in a row. Often, they implement complex calculations, (unrolled) cryptographic algorithms or initialization routines. For obfuscated code, large basic blocks often contain dead code, initialize virtual machines or hide simple arithmetic calculations in complex arithmetic encodings.</p>

<p>To pinpoint large basic blocks, we could sort all basic blocks in the binary by their number of instructions. However, this might create a lot of noise, especially if there are a lot of large blocks within the same function. Alternatively, we can consider large basic blocks in a <em>function context</em> by calculating the <em>average number of instructions</em> per basic block, relative to the number of basic blocks in the function:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code># instructions in the function
------------------------------
# basic blocks in the function
</code></pre></div></div>

<p>The average number of instructions is high if a function contains only a single large basic block or if a function contains several larger blocks. This way, we can again pinpoint complex code on the function level.</p>

<p>In Binary Ninja, we implement this as follows:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">calc_average_instructions_per_block</span><span class="p">(</span><span class="n">function</span><span class="p">):</span>
    <span class="c1"># number of basic blocks
</span>    <span class="n">num_blocks</span> <span class="o">=</span> <span class="nf">len</span><span class="p">(</span><span class="n">function</span><span class="p">.</span><span class="n">basic_blocks</span><span class="p">)</span>
    <span class="c1"># number of instructions
</span>    <span class="n">num_instructions</span> <span class="o">=</span> <span class="nf">sum</span><span class="p">(</span>
        <span class="p">[</span><span class="n">b</span><span class="p">.</span><span class="n">instruction_count</span> <span class="k">for</span> <span class="n">b</span> <span class="ow">in</span> <span class="n">function</span><span class="p">.</span><span class="n">basic_blocks</span><span class="p">])</span>
    
    <span class="k">return</span> <span class="n">num_instructions</span> <span class="o">/</span> <span class="n">num_blocks</span>
</code></pre></div></div>

<p>Similar to the previous heuristic, we can sort all functions based on their average basic block size and <em>print</em> the <em>upper</em> 10%.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">find_large_basic_blocks</span><span class="p">(</span><span class="n">bv</span><span class="p">):</span>
    <span class="c1"># sort functions by average basic block size
</span>    <span class="n">sorted_functions</span> <span class="o">=</span> <span class="nf">sorted</span><span class="p">(</span><span class="n">bv</span><span class="p">.</span><span class="n">functions</span><span class="p">,</span> <span class="n">key</span><span class="o">=</span><span class="k">lambda</span> <span class="n">x</span><span class="p">:</span> <span class="nf">calc_average_instructions_per_block</span><span class="p">(</span><span class="n">x</span><span class="p">))</span>

    <span class="c1"># bound to print only the top 10%
</span>    <span class="n">bound</span> <span class="o">=</span> <span class="n">math</span><span class="p">.</span><span class="nf">ceil</span><span class="p">(((</span><span class="nf">len</span><span class="p">(</span><span class="n">bv</span><span class="p">.</span><span class="n">functions</span><span class="p">)</span> <span class="o">*</span> <span class="mi">10</span><span class="p">)</span> <span class="o">/</span> <span class="mi">100</span><span class="p">))</span>
    <span class="c1"># print top 10% (iterate in descending order)
</span>    <span class="k">for</span> <span class="n">f</span> <span class="ow">in</span> <span class="nf">list</span><span class="p">(</span><span class="nf">reversed</span><span class="p">(</span><span class="n">sorted_functions</span><span class="p">))[:</span><span class="n">bound</span><span class="p">]:</span>
        <span class="nf">print</span><span class="p">(</span><span class="sa">f</span><span class="sh">"</span><span class="si">{</span><span class="nf">hex</span><span class="p">(</span><span class="n">f</span><span class="p">.</span><span class="n">start</span><span class="p">)</span><span class="si">}</span><span class="s">: </span><span class="si">{</span><span class="n">math</span><span class="p">.</span><span class="nf">ceil</span><span class="p">(</span><span class="nf">calc_average_instructions_per_block</span><span class="p">(</span><span class="n">f</span><span class="p">))</span><span class="si">}</span><span class="s">.</span><span class="sh">"</span><span class="p">)</span>
</code></pre></div></div>

<h3 id="instruction-overlapping">Instruction Overlapping</h3>

<p>Up until now, the developed heuristics aimed at detecting complex code. However, sometimes code obfuscation tries to confuse disassemblers by introducing opaque control transfers to addresses that are <em>in the middle</em> of valid instructions. This way, the disassembler does not know how to proceed and build the control-flow graph, since two instructions <em>overlap</em>. In non-obfuscated code, this can also happen in cases where the disassembler mistakenly interprets data as code, therefore creates <em>meaningless</em> disassembly.</p>

<p>To detect instruction overlapping, we can simply walk over all instructions and store the addresses of <em>all</em> bytes that belong to the corresponding instruction in a set. If we check before inserting a byte, whether it is already part of the set, we know that instructions do not overlap. Otherwise—if an address is already in the set before insertion—we found two instructions overlap. In this case, we can print the function the instruction belongs to. If we want to go a step further, we can also highlight the corresponding instructions in the disassembler.</p>

<p>Compared to the other heuristics in which we walked over all functions and interacted with the graph API, we now have to walk over all function bytes in the program, making the implementation and runtime overhead more expensive (we omit the code listing for readability). However, in practice, the overhead is still acceptable for large binaries (up to several minutes).</p>

<h2 id="evaluation">Evaluation</h2>

<p>To get a feeling on how good these heuristics work, we will evaluate them on three binaries: two malware samples and the Windows kernel. Intuitively, we expect to find obfuscated code as well as other interesting program logic. So, let’s dig in and have a look how the heuristics perform on the individual samples.</p>

<h3 id="emotet">Emotet</h3>

<p>The <a href="https://www.virustotal.com/gui/file/3261111211bb0de7eb9ccb14997865913e8841ec1a5701988102105bba4e4aba/detection">Emotet sample</a> sample uses a custom implementation of control-flow flattening. While most of it functions are obfuscated, the sample contains also non-obfuscated code.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Cyclomatic Complexity
Function 0x4063f0 (sub_4063f0) has a cyclomatic complexity of 76.
Function 0x4012a0 (sub_4012a0) has a cyclomatic complexity of 36.
Function 0x405800 (sub_405800) has a cyclomatic complexity of 35.
Function 0x402b60 (sub_402b60) has a cyclomatic complexity of 35.
Function 0x409e20 (sub_409e20) has a cyclomatic complexity of 31.
Function 0x404f50 (sub_404f50) has a cyclomatic complexity of 29.
Function 0x40a4b0 (sub_40a4b0) has a cyclomatic complexity of 27.
Function 0x402210 (sub_402210) has a cyclomatic complexity of 26.
Function 0x4025a0 (sub_4025a0) has a cyclomatic complexity of 24.
Function 0x40a9d0 (sub_40a9d0) has a cyclomatic complexity of 22.
Function 0x409530 (sub_409530) has a cyclomatic complexity of 22.
[snip]
</code></pre></div></div>

<p>We see that one function is significantly more complex than all others; the others, however, are in a comparable dimension. Most of the functions (including the most complex one) are obfuscated via control-flow flattening; the functions that are not obfuscated implement some sequential dispatching logic.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Large Basic Blocks
Basic blocks in function 0x405e40 (sub_405e40) contain on average 19 instructions.
Basic blocks in function 0x405d20 (sub_405d20) contain on average 19 instructions.
Basic blocks in function 0x405e00 (sub_405e00) contain on average 16 instructions.
Basic blocks in function 0x405c90 (sub_405c90) contain on average 13 instructions.
Basic blocks in function 0x404ee0 (sub_404ee0) contain on average 13 instructions.
Basic blocks in function 0x405ea0 (sub_405ea0) contain on average 12 instructions.
Basic blocks in function 0x405db0 (sub_405db0) contain on average 12 instructions.
Basic blocks in function 0x405d70 (sub_405d70) contain on average 12 instructions.
[snip]
</code></pre></div></div>

<p>The large basic block heuristic does not produce any anomaly; all the values are in a comparable range. If we dig into the individual functions, we see that most of them share a similar structure: They are single-basic block functions that set initialize memory values. If we analyze the functions’ usages, we learn that they are called from obfuscated functions to update the control-flow flattening state:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">int32_t</span> <span class="nf">sub_405c90</span><span class="p">()</span> <span class="p">{</span>
    <span class="kt">int32_t</span> <span class="n">ecx</span>
    <span class="kt">int32_t</span> <span class="n">var_4_4</span> <span class="o">=</span> <span class="n">ecx</span>
    <span class="kt">int32_t</span> <span class="n">var_4</span> <span class="o">=</span> <span class="mh">0x2224</span>
    <span class="kt">int32_t</span> <span class="n">eax</span>
    <span class="kt">int32_t</span> <span class="n">edx</span>
    <span class="nl">edx:</span><span class="n">eax</span> <span class="o">=</span> <span class="n">mulu</span><span class="p">.</span><span class="n">dp</span><span class="p">.</span><span class="n">d</span><span class="p">(</span><span class="mh">0xcccccccd</span><span class="p">,</span> <span class="mh">0x7acbf5eb</span><span class="p">)</span>
    <span class="k">return</span> <span class="p">((</span><span class="n">edx</span> <span class="n">u</span><span class="o">&gt;&gt;</span> <span class="mi">6</span><span class="p">)</span> <span class="o">-</span> <span class="mh">0xc617</span><span class="p">)</span> <span class="o">^</span> <span class="mh">0x1882c5b</span>
<span class="p">}</span>
</code></pre></div></div>

<p>The instruction overlapping heuristic does not find any overlapping instructions. So far, both complexity heuristics pinpoint different code locations, effectively clustering the obfuscated code in complex state machines and helper functions.</p>

<h3 id="adylkuzz">Adylkuzz</h3>

<p>The <a href="https://www.virustotal.com/gui/file/8200755cbedd6f15eecd8207eba534709a01957b172d7a051b9cc4769ddbf233/detection">Adylkuzz sample</a> is protected by the <a href="https://vmpsoft.com/">VMProtect</a> obfuscator. VMProtect is a virtualization-based obfuscator that heavily relies on opaque predicates thwarting disassemblers with instruction overlapping/disalinged control flow. Furthermore, it uses dead code to bloat its code size.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Cyclomatic Complexity
Function 0x70c597 (sub_70c597) has a cyclomatic complexity of 79.
Function 0x5c0821 (sub_5c0821) has a cyclomatic complexity of 76.
Function 0x6ff664 (sub_6ff664) has a cyclomatic complexity of 75.
Function 0x70b66e (sub_70b66e) has a cyclomatic complexity of 74.
Function 0x6ff79b (sub_6ff79b) has a cyclomatic complexity of 74.
Function 0x70feea (sub_70feea) has a cyclomatic complexity of 73.
Function 0x709927 (sub_709927) has a cyclomatic complexity of 73.
Function 0x5c36db (sub_5c36db) has a cyclomatic complexity of 60.
Function 0x6fefe5 (sub_6fefe5) has a cyclomatic complexity of 41.
Function 0x5c0bfc (sub_5c0bfc) has a cyclomatic complexity of 31.
Function 0x7086ab (sub_7086ab) has a cyclomatic complexity of 28.
Function 0x703be1 (sub_703be1) has a cyclomatic complexity of 28.
Function 0x70dd19 (sub_70dd19) has a cyclomatic complexity of 27.
Function 0x6f9fd3 (sub_6f9fd3) has a cyclomatic complexity of 27.
Function 0x6e53bf (sub_6e53bf) has a cyclomatic complexity of 27.
Function 0x6fbbed (sub_6fbbed) has a cyclomatic complexity of 26.
Function 0x70ac91 (sub_70ac91) has a cyclomatic complexity of 24.
[snip]
</code></pre></div></div>

<p>Most of the identified functions with a high cyclomatic complexity are garbage, since the disassembler produces an invalid disassembly due to overlapping instructions. However, the few valid functions initialize the VM and import the hidden API calls (via <code class="language-plaintext highlighter-rouge">LoadLibraryA</code>).</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Large Basic Blocks
Basic blocks in function 0x70d941 (sub_70d941) contain on average 112 instructions.
Basic blocks in function 0x6baa2e (sub_6baa2e) contain on average 104 instructions.
Basic blocks in function 0x5becad (sub_5becad) contain on average 67 instructions.
Basic blocks in function 0x6ba981 (sub_6ba981) contain on average 59 instructions.
Basic blocks in function 0x5b98a3 (sub_5b98a3) contain on average 49 instructions.
Basic blocks in function 0x5b37a2 (sub_5b37a2) contain on average 49 instructions.
Basic blocks in function 0x6f836d (sub_6f836d) contain on average 48 instructions.
Basic blocks in function 0x5b7ac6 (sub_5b7ac6) contain on average 48 instructions.
Basic blocks in function 0x5ba8e9 (sub_5ba8e9) contain on average 46 instructions.
Basic blocks in function 0x5b1236 (sub_5b1236) contain on average 45 instructions.
Basic blocks in function 0x6fe79c (sub_6fe79c) contain on average 44 instructions.
[snip]
</code></pre></div></div>

<p>For the large basic block heuristic, the results are a bit different: While the functions with the highest scores are also garbage, many other identified functions implement the instruction semantics handler of the virtualization-based obfuscation (VM handler) within a single basic block.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Overlapping instructions in function 0x5bedbd (sub_5bedbd).
Overlapping instructions in function 0x5bf05a (sub_5bf05a).
Overlapping instructions in function 0x5bf4d6 (sub_5bf4d6).
Overlapping instructions in function 0x5bf7de (sub_5bf7de).
Overlapping instructions in function 0x5c0125 (sub_5c0125).
Overlapping instructions in function 0x5c01b5 (sub_5c01b5).
Overlapping instructions in function 0x5c0363 (sub_5c0363).
Overlapping instructions in function 0x5c03bf (sub_5c03bf).
Overlapping instructions in function 0x5c0821 (sub_5c0821).
Overlapping instructions in function 0x5c0bfc (sub_5c0bfc).
Overlapping instructions in function 0x5c1003 (sub_5c1003).
Overlapping instructions in function 0x5c1447 (sub_5c1447).
Overlapping instructions in function 0x5c1563 (sub_5c1563).
[snip]
</code></pre></div></div>

<p>As indicated by the results of the other heuristics, the instruction overlapping heuristic identifies a magnitude of functions that contain  overlapping instructions and produce incorrect disassembly. While many functions can be immediately ignored since they are only garbage, some functions <em>might</em>  contain valid instructions; in those functions, only parts of the disassembly are broken. However, by purely static analysis, it is hard to tell if the instructions are valid or not.</p>

<p>In summary, we can say that all heuristics identified garbage code. If we remove all functions that are identified by the instruction overlapping heuristic from the results of the other heuristics, we can again group the identified functions into two categories: The complex functions perform VM-related initialization routine or decrypt API calls, while the large basic block heuristic pinpoints the VM handlers.</p>

<h3 id="windows-kernel">Windows Kernel</h3>

<p>After evaluating the heuristics on obfuscated malware samples, let us have a look on how they work on a commercial real-world application: the latest version of the Windows kernel, <code class="language-plaintext highlighter-rouge">ntoskrnl.exe</code> (11 MiB, MD5: <code class="language-plaintext highlighter-rouge">c9d2f9ada42052c2a34cb3e0743caf48</code>). While most parts of the Windows kernel are not obfuscated, it contains an anti-tamper protection called <a href="https://en.wikipedia.org/wiki/Kernel_Patch_Protection">PatchGuard</a> which employs a lightweight obfuscation by Microsoft’s in-house obfuscation framework <a href="https://www.youtube.com/watch?v=gu_i6LYuePg">Warbird</a>.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Function 0x140a1bee4 (sub_140a1bee4) has a cyclomatic complexity of 2964.
Function 0x1409f7010 (FsRtlMdlReadCompleteDevEx) has a cyclomatic complexity of 2371.
Function 0x1403da6d0 (sub_1403da6d0) has a cyclomatic complexity of 1506.
Function 0x1405d3a40 (PropertyEval) has a cyclomatic complexity of 718.
Function 0x14069fcf0 (NtSetInformationProcess) has a cyclomatic complexity of 642.
Function 0x14068ecb0 (ExpQuerySystemInformation) has a cyclomatic complexity of 435.
Function 0x14066bc78 (SPCall2ServerInternal) has a cyclomatic complexity of 414.
Function 0x14022fba0 (MmCheckCachedPageStates) has a cyclomatic complexity of 318.
Function 0x140a0b0fc (sub_140a0b0fc) has a cyclomatic complexity of 281.
Function 0x1406a9da0 (NtSetSystemInformation) has a cyclomatic complexity of 274.
Function 0x140675a50 (IopParseDevice) has a cyclomatic complexity of 271.
</code></pre></div></div>

<p>We notice that the first three functions have a very high complexity; afterward, the values drop quickly. Public research about PatchGuard’s internals are not very well documented, so it’s hard to tell what these functions do. However, the first three functions are definitely related to PatchGuard: The first one is related to PatchGuard’s initialization routine, while the second (<code class="language-plaintext highlighter-rouge">FsRtlMdlReadCompleteDevEx</code>) <a href="https://github.com/tandasat/PgResarch/blob/master/DisPG/DisPG/win8.cpp#L44">is known to perform some PatchGuard related checks</a>. The third function is called by <code class="language-plaintext highlighter-rouge">KiFilterFiberContext</code>, which <a href="https://blog.tetrane.com/downloads/Tetrane_PatchGuard_Analysis_RS4_v1.01.pdf">is also a known PatchGuard function</a>.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Large Basic Blocks
Basic blocks in function 0x140a62e04 (SepInitSystemDacls) contain on average 491 instructions.
Basic blocks in function 0x1403e8f74 (SymCryptSha256AppendBlocks_ul1) contain on average 236 instructions.
Basic blocks in function 0x1404bdc60 (HalpRestoreHvEnlightenment) contain on average 147 instructions.
Basic blocks in function 0x140a55cb8 (MiInitializeDummyPages) contain on average 133 instructions.
Basic blocks in function 0x1409a7744 (HalpBlkInitializeProcessorState) contain on average 103 instructions.
[snip]
</code></pre></div></div>

<p>The functions with the largest average basic block size do not seem to be related to PatchGuard. Instead, based on their function names, they initialize different data structures (<code class="language-plaintext highlighter-rouge">SepInitSystemDacls</code>, <code class="language-plaintext highlighter-rouge">HalpRestoreHvEnlightenment</code>, <code class="language-plaintext highlighter-rouge">MiInitializeDummyPages</code> and <code class="language-plaintext highlighter-rouge">HalpBlkInitializeProcessorState</code>) or implement cryptographic algorithms (<code class="language-plaintext highlighter-rouge">SymCryptSha256AppendBlocks_ul1</code>).</p>

<p>While the instruction overlapping heuristic also pinpoints some functions, they can be ignored: The results are all false positives, since the disassembler wrongly interprets data as code.</p>

<p>Overall, we can summarize the experiments and say that the heuristics pinpoint all kinds of interesting code parts, no matter if the code is obfuscated, implements a complex state machine, initialization routines or cryptographic algorithms. As we have seen for Emotet and the Windows kernel, it can be beneficial to pay special intention to peaks in the values. Furthermore, we  observed that heuristics often produce different results, but can also identify the same code locations.</p>

<h2 id="setting-the-scene">Setting the Scene</h2>

<p>In a <a href="/2021/03/03/flattening_detection.html">previous blog post</a>, we introduced a heuristic to detect control-flow flattening and state machines in binaries. This time, we developed more generic heuristics to pinpoint code obfuscation and complex code. While the heuristics are different in their nature, they all are easy to implement, cheap to calculate and exploit characteristics that are shared by obfuscated as well as interesting non-obfuscated code.</p>

<p>As part of our day-to-day reverse engineering, we can use these heuristics to get an initial overview over the binary; we can spot which code areas might be worth a closer look. If the code is obfuscated, we then can try to understand the context in which the obfuscation is embedded. Afterward—if we want to better understand the obfuscated code—we may look for patterns and come up with a strategy to automatically remove the obfuscation (as we’ll do in my <a href="/training_software_deobfuscation.html">code deobfuscation training classes</a>).</p>]]></content><author><name>Tim Blazytko</name></author><category term="obfuscation" /><category term="detection" /><category term="heuristics" /><category term="cfg" /><category term="complexity" /><category term="opaque-predicates" /><category term="virtualization" /><category term="reverse-engineering" /><summary type="html"><![CDATA[Heuristic detection of obfuscated code using CFG metrics (cyclomatic complexity, SCC structure, loop depth), entropy features, opaque predicate signatures, and VM dispatcher patterns.]]></summary></entry><entry><title type="html">Automation in Reverse Engineering: String Decryption</title><link href="https://synthesis.to/2021/06/30/automating_string_decryption.html" rel="alternate" type="text/html" title="Automation in Reverse Engineering: String Decryption" /><published>2021-06-30T00:00:00+02:00</published><updated>2021-06-30T00:00:00+02:00</updated><id>https://synthesis.to/2021/06/30/automating_string_decryption</id><content type="html" xml:base="https://synthesis.to/2021/06/30/automating_string_decryption.html"><![CDATA[<p>Automation plays a crucial rule in reverse engineering, no matter whether we search for vulnerabilities in software, analyze malware or remove obfuscated layers from code. Once we manually identify repeating patterns, we try to automate the process as far as possible. For automation, it often doesn’t matter if you use <a href="https://binary.ninja/">Binary Ninja</a>, <a href="https://hex-rays.com/ida-pro/">IDA Pro</a> or <a href="https://ghidra-sre.org/">Ghidra</a>, as long as you have the knowledge <em>how</em> to realize it in your tool of choice. As you will see, you don’t have to be an expert to automate tedious reverse engineering tasks; sometimes it just takes a few lines of code to improve your understanding a lot.</p>

<p>Today, we take a closer look at this process and automate the decryption of strings for a malware sample from the <a href="https://en.wikipedia.org/wiki/Mirai_(malware)">Mirai botnet</a>. Mirai is a malware family that hijacks embedded systems such as IP cameras or home routers by scanning for devices that accept default login credentials. To impede analysis, Mirai samples store those credentials in an encoded form and decode them at runtime using a simple XOR with a constant. In the following, we first manually analyze the string obfuscation. Afterward, we use Binary Ninja’s high-level intermediate language (HLIL) API to get all string references and decrypt them.</p>

<p>If you would like to try it on your own, you’ll find the code and the used <a href="https://www.virustotal.com/gui/file/c87e5db01d2c942fa6973f4578c9a72813b42f3daa8ba78f1ad035f756a55c78/detection">malware sample</a> on <a href="https://github.com/mrphrazer/mirai_string_deobfuscation">GitHub</a>. To better understand Mirai, you can also have a look at its leaked <a href="https://github.com/jgamblin/Mirai-Source-Code">source code</a>.</p>

<h2 id="manual-analysis">Manual Analysis</h2>

<p>In static malware analysis, one of the first things to do is to have a closer look at the identified strings, since they often reveal a lot of context. In this sample, however, we mostly see strings like <code class="language-plaintext highlighter-rouge">PMMV</code>, <code class="language-plaintext highlighter-rouge">CFOKL</code>, <code class="language-plaintext highlighter-rouge">QWRRMPV</code> and others. At first glance, they don’t make much sense. However, if we have a closer look at how they are used in the code, we notice something interesting: They are repeatedly used as function parameters for the function <code class="language-plaintext highlighter-rouge">sub_10778</code>. (The corresponding function calls can be found <a href="https://github.com/jgamblin/Mirai-Source-Code/blob/master/mirai/bot/scanner.c#L123">here</a> in the leaked source code.)</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">sub_10778</span><span class="p">(</span><span class="s">"PMMV"</span><span class="p">,</span> <span class="o">&amp;</span><span class="n">data_1616c</span><span class="p">,</span> <span class="mh">0xa</span><span class="p">)</span>
<span class="n">sub_10778</span><span class="p">(</span><span class="s">"PMMV"</span><span class="p">,</span> <span class="s">"TKXZT"</span><span class="p">,</span> <span class="mi">9</span><span class="p">)</span>
<span class="n">sub_10778</span><span class="p">(</span><span class="s">"PMMV"</span><span class="p">,</span> <span class="s">"CFOKL"</span><span class="p">,</span> <span class="mi">8</span><span class="p">)</span>
<span class="n">sub_10778</span><span class="p">(</span><span class="s">"CFOKL"</span><span class="p">,</span> <span class="s">"CFOKL"</span><span class="p">,</span> <span class="mi">7</span><span class="p">)</span>
<span class="n">sub_10778</span><span class="p">(</span><span class="s">"PMMV"</span><span class="p">,</span> <span class="o">&amp;</span><span class="n">data_16184</span><span class="p">,</span> <span class="mi">6</span><span class="p">)</span>
<span class="n">sub_10778</span><span class="p">(</span><span class="s">"PMMV"</span><span class="p">,</span> <span class="s">"ZOJFKRA"</span><span class="p">,</span> <span class="mi">5</span><span class="p">)</span>
<span class="n">sub_10778</span><span class="p">(</span><span class="s">"PMMV"</span><span class="p">,</span> <span class="s">"FGDCWNV"</span><span class="p">,</span> <span class="mi">5</span><span class="p">)</span>
<span class="n">sub_10778</span><span class="p">(</span><span class="s">"PMMV"</span><span class="p">,</span> <span class="mh">0x1619c</span><span class="p">,</span> <span class="mi">5</span><span class="p">)</span>  <span class="p">{</span><span class="s">"HWCLVGAJ"</span><span class="p">}</span>
<span class="n">sub_10778</span><span class="p">(</span><span class="s">"PMMV"</span><span class="p">,</span> <span class="o">&amp;</span><span class="n">data_161a8</span><span class="p">,</span> <span class="mi">5</span><span class="p">)</span>
<span class="n">sub_10778</span><span class="p">(</span><span class="s">"PMMV"</span><span class="p">,</span> <span class="o">&amp;</span><span class="n">data_161b0</span><span class="p">,</span> <span class="mi">5</span><span class="p">)</span>
<span class="n">sub_10778</span><span class="p">(</span><span class="s">"QWRRMPV"</span><span class="p">,</span> <span class="s">"QWRRMPV"</span><span class="p">,</span> <span class="mi">5</span><span class="p">)</span>
</code></pre></div></div>

<p>Based on this, we can assume that the passed strings are decoded and further processed in the called function. If we inspect the decompiled code of the function, we identify the following snippet that operates on the first function parameter <code class="language-plaintext highlighter-rouge">arg1</code>. For the second parameter <code class="language-plaintext highlighter-rouge">arg2</code>, we can find a similar snippet.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">uint32_t</span> <span class="n">r0_3</span> <span class="o">=</span> <span class="n">sub_12c90</span><span class="p">(</span><span class="n">arg1</span><span class="p">)</span>
<span class="kt">void</span><span class="o">*</span> <span class="n">r0_5</span> <span class="o">=</span> <span class="n">sub_14100</span><span class="p">(</span><span class="n">r0_3</span> <span class="o">+</span> <span class="mi">1</span><span class="p">)</span>
<span class="n">sub_12d0c</span><span class="p">(</span><span class="n">r0_5</span><span class="p">,</span> <span class="n">arg1</span><span class="p">,</span> <span class="n">r0_3</span> <span class="o">+</span> <span class="mi">1</span><span class="p">)</span>
<span class="k">if</span> <span class="p">(</span><span class="n">r0_3</span> <span class="n">s</span><span class="o">&gt;</span> <span class="mi">0</span><span class="p">)</span> <span class="p">{</span>
    <span class="kt">char</span><span class="o">*</span> <span class="n">r2_3</span> <span class="o">=</span> <span class="n">nullptr</span>
    <span class="k">do</span> <span class="p">{</span>
        <span class="o">*</span><span class="p">(</span><span class="n">r2_3</span> <span class="o">+</span> <span class="n">r0_5</span><span class="p">)</span> <span class="o">=</span> <span class="o">*</span><span class="p">(</span><span class="n">r2_3</span> <span class="o">+</span> <span class="n">r0_5</span><span class="p">)</span> <span class="o">^</span> <span class="mh">0x22</span>
        <span class="n">r2_3</span> <span class="o">=</span> <span class="o">&amp;</span><span class="n">r2_3</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span>
    <span class="p">}</span> <span class="k">while</span> <span class="p">(</span><span class="n">r0_3</span> <span class="o">!=</span> <span class="n">r2_3</span><span class="p">)</span>
<span class="p">}</span>
</code></pre></div></div>

<p>The code first performs some function calls using <code class="language-plaintext highlighter-rouge">arg1</code>, goes into a loop and increments a counter until the condition <code class="language-plaintext highlighter-rouge">r0_3 != r2_3</code> no longer holds. Within the loop, we notice an XOR operation <code class="language-plaintext highlighter-rouge">*(r2_3 + r0_5) ^ 0x22</code>, where <code class="language-plaintext highlighter-rouge">*(r2_3 + r0_5)</code> seems to be an array-like memory access that is xored with the constant <code class="language-plaintext highlighter-rouge">0x22</code>. After performing a deeper analysis, we can clean up the code by assigning some reasonable variable and function names.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">uint32_t</span> <span class="n">length</span> <span class="o">=</span> <span class="n">strlen</span><span class="p">(</span><span class="n">arg1</span><span class="p">)</span>
<span class="kt">void</span><span class="o">*</span> <span class="n">ptr</span> <span class="o">=</span> <span class="n">malloc</span><span class="p">(</span><span class="n">length</span> <span class="o">+</span> <span class="mi">1</span><span class="p">)</span>
<span class="n">strcpy</span><span class="p">(</span><span class="n">ptr</span><span class="p">,</span> <span class="n">arg1</span><span class="p">,</span> <span class="n">length</span> <span class="o">+</span> <span class="mi">1</span><span class="p">)</span>
<span class="k">if</span> <span class="p">(</span><span class="n">length</span> <span class="n">s</span><span class="o">&gt;</span> <span class="mi">0</span><span class="p">)</span> <span class="p">{</span>
    <span class="kt">char</span><span class="o">*</span> <span class="n">index</span> <span class="o">=</span> <span class="n">nullptr</span>
    <span class="k">do</span> <span class="p">{</span>
        <span class="o">*</span><span class="p">(</span><span class="n">index</span> <span class="o">+</span> <span class="n">ptr</span><span class="p">)</span> <span class="o">=</span> <span class="o">*</span><span class="p">(</span><span class="n">index</span> <span class="o">+</span> <span class="n">ptr</span><span class="p">)</span> <span class="o">^</span> <span class="mh">0x22</span>
        <span class="n">index</span> <span class="o">=</span> <span class="o">&amp;</span><span class="n">index</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span>
    <span class="p">}</span> <span class="k">while</span> <span class="p">(</span><span class="n">length</span> <span class="o">!=</span> <span class="n">index</span><span class="p">)</span>
<span class="p">}</span>
</code></pre></div></div>

<p>Now, we have a better understanding of what the code does: It first calculates the length of the provided string, allocates memory for a new string and copies the encrypted string into the allocated buffer. Afterward, it walks over the copied string and decrypts it bytewise by xoring each byte with <code class="language-plaintext highlighter-rouge">0x22</code>. This is also in line with the <a href="https://github.com/jgamblin/Mirai-Source-Code/blob/master/mirai/bot/scanner.c#L963">decryption routine</a> of the original source code.</p>

<p>In other words, strings are encoded using a bytewise XOR with the constant value <code class="language-plaintext highlighter-rouge">0x22</code>. If we want to decode the string <code class="language-plaintext highlighter-rouge">PMMV</code> in Python, we can do this with the following one-liner.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="o">&gt;&gt;&gt;</span> <span class="sh">''</span><span class="p">.</span><span class="nf">join</span><span class="p">([</span><span class="nf">chr</span><span class="p">(</span><span class="nf">ord</span><span class="p">(</span><span class="n">c</span><span class="p">)</span> <span class="o">^</span> <span class="mh">0x22</span><span class="p">)</span> <span class="k">for</span> <span class="n">c</span> <span class="ow">in</span> <span class="sh">"</span><span class="s">PMMV</span><span class="sh">"</span><span class="p">])</span>
<span class="sh">'</span><span class="s">root</span><span class="sh">'</span>
</code></pre></div></div>

<p>We walk over each byte of the string, get its corresponding ASCII value via <code class="language-plaintext highlighter-rouge">ord</code>, xor it with <code class="language-plaintext highlighter-rouge">0x22</code> and transform it back into a character using <code class="language-plaintext highlighter-rouge">chr</code>. In a final step, we join all characters into a single string.</p>

<p>After we manually analyzed how strings can be decrypted, we will now automate this with Binary Ninja.</p>

<h2 id="automated-decryption">Automated Decryption</h2>

<p>To automate the decryption, we first have to find a way to identify all encoded strings. In particular, we have to know where they start and where they end; in other words, we aim to identify all encrypted bytes. In the second step, we can decrypt each byte individually.</p>

<p>Beforehand, we noticed that the encoded strings are passed as the first two parameters to the function <code class="language-plaintext highlighter-rouge">sub_10778</code>. To obtain the encoded strings, we can exploit this characteristic by searching for all function calls and parse all passed parameters. Using Binary Ninja’s high-level intermediate language (HLIL) API, we can realize this within a few lines of code.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># get function instance of target function
</span><span class="n">target_function</span> <span class="o">=</span> <span class="n">bv</span><span class="p">.</span><span class="nf">get_function_at</span><span class="p">(</span><span class="mh">0x10778</span><span class="p">)</span>
<span class="c1"># set of already decrypted bytes
</span><span class="n">already_decrypted</span> <span class="o">=</span> <span class="nf">set</span><span class="p">()</span>

<span class="c1"># 1: walk over all callers
</span><span class="k">for</span> <span class="n">caller_function</span> <span class="ow">in</span> <span class="nf">set</span><span class="p">(</span><span class="n">target_function</span><span class="p">.</span><span class="n">callers</span><span class="p">):</span>
    
    <span class="c1"># 2: walk over high-level IL instructions
</span>    <span class="k">for</span> <span class="n">instruction</span> <span class="ow">in</span> <span class="n">caller_function</span><span class="p">.</span><span class="n">hlil</span><span class="p">.</span><span class="n">instructions</span><span class="p">:</span>
    
        <span class="c1"># 3: if IL instruction is a call
</span>        <span class="c1">#    and call goes to target function
</span>        <span class="nf">if </span><span class="p">(</span><span class="n">instruction</span><span class="p">.</span><span class="n">operation</span> <span class="o">==</span> <span class="n">HighLevelILOperation</span><span class="p">.</span><span class="n">HLIL_CALL</span> <span class="ow">and</span>
            <span class="n">instruction</span><span class="p">.</span><span class="n">dest</span><span class="p">.</span><span class="n">constant</span> <span class="o">==</span> <span class="n">target_function</span><span class="p">.</span><span class="n">start</span><span class="p">):</span>
                
            <span class="c1"># 4: fetch pointer to encrypted strings
</span>            <span class="n">p1</span> <span class="o">=</span> <span class="n">instruction</span><span class="p">.</span><span class="n">params</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span>
            <span class="n">p2</span> <span class="o">=</span> <span class="n">instruction</span><span class="p">.</span><span class="n">params</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span>
            
            <span class="c1"># 5: decrypt strings
</span>            <span class="nf">decrypt</span><span class="p">(</span><span class="n">p1</span><span class="p">.</span><span class="n">value</span><span class="p">.</span><span class="n">value</span><span class="p">,</span> <span class="n">already_decrypted</span><span class="p">)</span>
            <span class="nf">decrypt</span><span class="p">(</span><span class="n">p2</span><span class="p">.</span><span class="n">value</span><span class="p">.</span><span class="n">value</span><span class="p">,</span> <span class="n">already_decrypted</span><span class="p">)</span>
</code></pre></div></div>

<p>After fetching the function object of the targeted function <code class="language-plaintext highlighter-rouge">sub_10778</code>, we walk over all functions calling <code class="language-plaintext highlighter-rouge">sub_10778</code>. For each of these calling functions (referred to as <em>callers</em>), we need to identify the instruction that performs the call to <code class="language-plaintext highlighter-rouge">sub_10778</code>. In order to do this, we walk over the caller’s HLIL instructions; for each instruction, we then check if its operation is a call and if the call destination is the targeted function. If so, we access its first two parameters (the pointers to the encoded strings) and pass them to the decryption function. Since some strings—such as <code class="language-plaintext highlighter-rouge">PMMV</code>—are used as parameters multiple times, we ensure that we only decrypt them once. Therefore, we collect the addresses of all bytes that we already have decrypted in a set called <code class="language-plaintext highlighter-rouge">already_decrypted</code>.</p>

<p>Up until now, we identified all parameters that flow into the decryption routine. The only thing left to do is to identify all encrypted bytes and decrypt them. Since each parameter is a pointer to a string, we can consider it as the string’s start address. Similarly, we can determine the string’s end by scanning for terminating null bytes.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">decrypt</span><span class="p">(</span><span class="n">address</span><span class="p">,</span> <span class="n">already_decrypted</span><span class="p">):</span>
    <span class="c1"># walk over string bytes until termination
</span>    <span class="k">while</span> <span class="bp">True</span><span class="p">:</span>
        <span class="c1"># read a single byte from database
</span>        <span class="n">encrypted_byte</span> <span class="o">=</span> <span class="n">bv</span><span class="p">.</span><span class="nf">read</span><span class="p">(</span><span class="n">address</span><span class="p">,</span> <span class="mi">1</span><span class="p">)</span>

        <span class="c1"># return if null byte or already decrypted
</span>        <span class="k">if</span> <span class="n">encrypted_byte</span> <span class="o">==</span> <span class="sa">b</span><span class="sh">'</span><span class="se">\x00</span><span class="sh">'</span> <span class="ow">or</span> <span class="n">address</span> <span class="ow">in</span> <span class="n">already_decrypted</span><span class="p">:</span>
            <span class="k">return</span>

        <span class="c1"># decrypt byte
</span>        <span class="n">decrypted_byte</span> <span class="o">=</span> <span class="nf">chr</span><span class="p">(</span><span class="nf">int</span><span class="p">(</span><span class="n">encrypted_byte</span><span class="p">[</span><span class="mi">0</span><span class="p">])</span> <span class="o">^</span> <span class="mh">0x22</span><span class="p">)</span>
        
        <span class="c1"># write decrypted byte to database
</span>        <span class="n">bv</span><span class="p">.</span><span class="nf">write</span><span class="p">(</span><span class="n">address</span><span class="p">,</span> <span class="n">decrypted_byte</span><span class="p">)</span>

        <span class="c1"># add to set of decrypted addresses
</span>        <span class="n">already_decrypted</span><span class="p">.</span><span class="nf">add</span><span class="p">(</span><span class="n">address</span><span class="p">)</span>

        <span class="c1"># increment address
</span>        <span class="n">address</span> <span class="o">+=</span> <span class="mi">1</span>
</code></pre></div></div>

<p>Taking the string’s start address as input, we sequentially walk over the string until we reach a byte that terminates the string or that was already decrypted. For each byte, we then transform it into an integer, xor it with <code class="language-plaintext highlighter-rouge">0x22</code>, encode it as a character and write it back to the database. Afterward, we add the current address to the set <code class="language-plaintext highlighter-rouge">already_decrypted</code> and increment the address.</p>

<p>Finally, we have all parts together: We walk over all function calls of the string decryption function, parse the parameters for each call and decrypt all the strings in Binary Ninja’s database. If we put everything into a Python script and execute it, the decompiled code from above contains all strings in plain text.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">sub_10778</span><span class="p">(</span><span class="s">"root"</span><span class="p">,</span> <span class="s">"xc3511"</span><span class="p">,</span> <span class="mh">0xa</span><span class="p">)</span>
<span class="n">sub_10778</span><span class="p">(</span><span class="s">"root"</span><span class="p">,</span> <span class="s">"vizxv"</span><span class="p">,</span> <span class="mi">9</span><span class="p">)</span>
<span class="n">sub_10778</span><span class="p">(</span><span class="s">"root"</span><span class="p">,</span> <span class="s">"admin"</span><span class="p">,</span> <span class="mi">8</span><span class="p">)</span>
<span class="n">sub_10778</span><span class="p">(</span><span class="s">"admin"</span><span class="p">,</span> <span class="s">"admin"</span><span class="p">,</span> <span class="mi">7</span><span class="p">)</span>
<span class="n">sub_10778</span><span class="p">(</span><span class="s">"root"</span><span class="p">,</span> <span class="s">"888888"</span><span class="p">,</span> <span class="mi">6</span><span class="p">)</span>
<span class="n">sub_10778</span><span class="p">(</span><span class="s">"root"</span><span class="p">,</span> <span class="s">"xmhdipc"</span><span class="p">,</span> <span class="mi">5</span><span class="p">)</span>
<span class="n">sub_10778</span><span class="p">(</span><span class="s">"root"</span><span class="p">,</span> <span class="s">"default"</span><span class="p">,</span> <span class="mi">5</span><span class="p">)</span>
<span class="n">sub_10778</span><span class="p">(</span><span class="s">"root"</span><span class="p">,</span> <span class="mh">0x1619c</span><span class="p">,</span> <span class="mi">5</span><span class="p">)</span>  <span class="p">{</span><span class="s">"juantech"</span><span class="p">}</span>
<span class="n">sub_10778</span><span class="p">(</span><span class="s">"root"</span><span class="p">,</span> <span class="s">"123456"</span><span class="p">,</span> <span class="mi">5</span><span class="p">)</span>
<span class="n">sub_10778</span><span class="p">(</span><span class="s">"root"</span><span class="p">,</span> <span class="s">"54321"</span><span class="p">,</span> <span class="mi">5</span><span class="p">)</span>
<span class="n">sub_10778</span><span class="p">(</span><span class="s">"support"</span><span class="p">,</span> <span class="s">"support"</span><span class="p">,</span> <span class="mi">5</span><span class="p">)</span>
</code></pre></div></div>

<p>As a result, the decompilation reveals much more context information. By googling <a href="https://www.google.com/search?q=xmhdipc">some</a> <a href="https://www.google.com/search?q=vizxv">of</a> <a href="https://www.google.com/search?q=juantech+password">the</a> <a href="https://www.google.com/search?q=xc3511+password">strings</a>, we learn that the parameters are username/password tuples of default login credentials.</p>

<h2 id="setting-the-scene">Setting the Scene</h2>

<p>Automation allows us to spend less time with tedious and repetitive reverse engineering tasks. In this post, I tried to emphasize the thought process behind automation on the example of decrypting strings in malware. Starting with manual analysis, we first pinpointed interesting behavior: encrypted strings used as function parameters. Then, we put it into context by digging into the function, and learned that the strings are decrypted inside. By noticing a recurring pattern—that the function is called several times with different parameters—we developed an idea of how to automate the decryption. By using Binary Ninja’s decompiler API, we walked over all relevant function calls, parsed the parameters and decrypted the strings. In the end, 20 lines of code sufficed to improve the decompilation and achieve a much better understanding of the malware sample.</p>

<p>Even if you are just starting out, I encourage you to get familiar with the API that your tool of choice exposes, and to automate some of the tedious tasks you encounter during your day-to-day reversing. It is not only fun; reverse engineering also becomes so much easier.</p>]]></content><author><name>Tim Blazytko</name></author><category term="reversing" /><category term="malware" /><category term="automation" /><summary type="html"><![CDATA[Step-by-step automation of malware string decryption (Mirai) using Binary Ninja HLIL and scripting.]]></summary></entry><entry><title type="html">Introduction to Control-flow Graph Analysis</title><link href="https://synthesis.to/2021/03/15/control_flow_analysis.html" rel="alternate" type="text/html" title="Introduction to Control-flow Graph Analysis" /><published>2021-03-15T00:00:00+01:00</published><updated>2021-03-15T00:00:00+01:00</updated><id>https://synthesis.to/2021/03/15/control_flow_analysis</id><content type="html" xml:base="https://synthesis.to/2021/03/15/control_flow_analysis.html"><![CDATA[<p>Following my last blog post, I got a lot of questions about additional material on control-flow analysis. While most compiler books (such as the <a href="https://en.wikipedia.org/wiki/Compilers:_Principles,_Techniques,_and_Tools">Dragon Book</a>) cover related topics in-depth, I decided to publish my own presentation that was initially built for (but never made it into) my <a href="/training_software_deobfuscation.html">training class on software deobfuscation</a>. The slide deck illustrates the theory of control-flow graph construction, dominance relations and loop analysis. In the second part of this post, I would like to show you how to play around with these concepts using the reverse engineering framework <a href="https://github.com/cea-sec/miasm">Miasm</a>.</p>

<ul>
  <li>
    <p>Presentation on <a href="/presentations/blazytko_control_flow_analysis.pdf">control-flow analysis</a></p>
  </li>
  <li>
    <p>Code for <a href="https://gist.github.com/mrphrazer/ee0edd2bd0c6a121011bd86ae11d5567">graph analysis in Miasm</a></p>
  </li>
</ul>

<h2 id="graph-analysis-in-miasm">Graph Analysis in Miasm</h2>

<p>For playing around with control-flow graphs, dominance relations and loop detection, we take the following graph as an example:</p>

<p class="center"><img src="/images/graph.svg" alt="graph" /></p>

<p>As described in my previous blog post on <a href="/2021/03/03/flattening_detection.html">automated detection of control-flow flattening</a>, this graph has an <em>entry</em> node <code class="language-plaintext highlighter-rouge">a</code> and a <em>loop</em> between <code class="language-plaintext highlighter-rouge">c</code>, <code class="language-plaintext highlighter-rouge">e</code> and <code class="language-plaintext highlighter-rouge">g</code>. To automatically determine all entries,  dominance relations and loops, we can use <a href="https://github.com/cea-sec/miasm/blob/master/miasm/core/graph.py">Miasm’s <code class="language-plaintext highlighter-rouge">DiGraph</code> class</a>:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">from</span> <span class="n">miasm.core.graph</span> <span class="kn">import</span> <span class="n">DiGraph</span>

<span class="c1"># define edges
</span><span class="n">edges</span> <span class="o">=</span> <span class="p">[</span>
    <span class="p">(</span><span class="sh">"</span><span class="s">a</span><span class="sh">"</span><span class="p">,</span> <span class="sh">"</span><span class="s">b</span><span class="sh">"</span><span class="p">),</span>
    <span class="p">(</span><span class="sh">"</span><span class="s">a</span><span class="sh">"</span><span class="p">,</span> <span class="sh">"</span><span class="s">c</span><span class="sh">"</span><span class="p">),</span>
    <span class="p">(</span><span class="sh">"</span><span class="s">b</span><span class="sh">"</span><span class="p">,</span> <span class="sh">"</span><span class="s">d</span><span class="sh">"</span><span class="p">),</span>
    <span class="p">(</span><span class="sh">"</span><span class="s">c</span><span class="sh">"</span><span class="p">,</span> <span class="sh">"</span><span class="s">d</span><span class="sh">"</span><span class="p">),</span>
    <span class="p">(</span><span class="sh">"</span><span class="s">c</span><span class="sh">"</span><span class="p">,</span> <span class="sh">"</span><span class="s">e</span><span class="sh">"</span><span class="p">),</span>
    <span class="p">(</span><span class="sh">"</span><span class="s">d</span><span class="sh">"</span><span class="p">,</span> <span class="sh">"</span><span class="s">f</span><span class="sh">"</span><span class="p">),</span>
    <span class="p">(</span><span class="sh">"</span><span class="s">e</span><span class="sh">"</span><span class="p">,</span> <span class="sh">"</span><span class="s">f</span><span class="sh">"</span><span class="p">),</span>
    <span class="p">(</span><span class="sh">"</span><span class="s">e</span><span class="sh">"</span><span class="p">,</span> <span class="sh">"</span><span class="s">g</span><span class="sh">"</span><span class="p">),</span>
    <span class="p">(</span><span class="sh">"</span><span class="s">g</span><span class="sh">"</span><span class="p">,</span> <span class="sh">"</span><span class="s">c</span><span class="sh">"</span><span class="p">),</span>
<span class="p">]</span>

<span class="c1"># init graph
</span><span class="n">g</span> <span class="o">=</span> <span class="nc">DiGraph</span><span class="p">()</span>

<span class="c1"># add edges
</span><span class="k">for</span> <span class="n">x</span><span class="p">,</span> <span class="n">y</span> <span class="ow">in</span> <span class="n">edges</span><span class="p">:</span>
    <span class="n">g</span><span class="p">.</span><span class="nf">add_edge</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">y</span><span class="p">)</span>

<span class="c1"># walk over graph entries
</span><span class="k">for</span> <span class="n">entry</span> <span class="ow">in</span> <span class="n">g</span><span class="p">.</span><span class="nf">heads</span><span class="p">():</span>
    <span class="c1"># dominators
</span>    <span class="n">dominators</span> <span class="o">=</span> <span class="n">g</span><span class="p">.</span><span class="nf">compute_dominators</span><span class="p">(</span><span class="n">entry</span><span class="p">)</span>

    <span class="c1"># dominator tree
</span>    <span class="n">dominator_tree</span> <span class="o">=</span> <span class="n">g</span><span class="p">.</span><span class="nf">compute_dominator_tree</span><span class="p">(</span><span class="n">entry</span><span class="p">)</span>

    <span class="c1"># natural loops
</span>    <span class="n">loops</span> <span class="o">=</span> <span class="n">g</span><span class="p">.</span><span class="nf">compute_natural_loops</span><span class="p">(</span><span class="n">entry</span><span class="p">)</span>

</code></pre></div></div>

<p>First, we initialize a <code class="language-plaintext highlighter-rouge">DiGraph</code> instance and add all edges to the graph. Then, we walk over all graph entry nodes and compute the dominance relations, the dominator tree and all loops in the graph. With some additional code for pretty printing (omitted here due to readability), we get the following output:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Graph entry: a


Dominators:
a: a
b: a, b
c: a, c
d: a, d
e: a, c, e
f: a, f
g: a, c, e, g


Dominator Tree:
a -&gt; b
a -&gt; c
a -&gt; d
a -&gt; f
c -&gt; e
e -&gt; g


Natural Loops
g -&gt; c: {a, c, e, g}
</code></pre></div></div>

<p>To apply these techniques to real-world binaries, we just wrap around some disassembling logic:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># init symbol table
</span><span class="n">loc_db</span> <span class="o">=</span> <span class="nc">LocationDB</span><span class="p">()</span>

<span class="c1"># open the binary for analysis
</span><span class="n">container</span> <span class="o">=</span> <span class="n">Container</span><span class="p">.</span><span class="nf">from_stream</span><span class="p">(</span><span class="nf">open</span><span class="p">(</span><span class="n">file_path</span><span class="p">,</span> <span class="sh">'</span><span class="s">rb</span><span class="sh">'</span><span class="p">),</span> <span class="n">loc_db</span><span class="p">)</span>

<span class="c1"># cpu abstraction
</span><span class="n">machine</span> <span class="o">=</span> <span class="nc">Machine</span><span class="p">(</span><span class="n">container</span><span class="p">.</span><span class="n">arch</span><span class="p">)</span>

<span class="c1"># init disassemble engine
</span><span class="n">mdis</span> <span class="o">=</span> <span class="n">machine</span><span class="p">.</span><span class="nf">dis_engine</span><span class="p">(</span><span class="n">container</span><span class="p">.</span><span class="n">bin_stream</span><span class="p">,</span> <span class="n">loc_db</span><span class="o">=</span><span class="n">loc_db</span><span class="p">)</span>

<span class="c1"># disassemble the function at address and get CFG
</span><span class="n">asm_cfg</span> <span class="o">=</span> <span class="n">mdis</span><span class="p">.</span><span class="nf">dis_multiblock</span><span class="p">(</span><span class="n">start_address</span><span class="p">)</span>
</code></pre></div></div>
<p>The code takes a binary as input (provided via <code class="language-plaintext highlighter-rouge">file_path</code>), initializes the disassembly engine and disassembles a function at <code class="language-plaintext highlighter-rouge">start_address</code>. Then, the resulting <code class="language-plaintext highlighter-rouge">asm_cfg</code> is a <code class="language-plaintext highlighter-rouge">DiGraph</code> instance holding the function’s control-flow graph. The entire script that re-implements the heuristic to detect control-flow flattening in Miasm can be found <a href="https://gist.github.com/mrphrazer/da32217f231e1dd842986f94aa6d9d37">here</a>.</p>]]></content><author><name>Tim Blazytko</name></author><category term="reversing" /><category term="graphanalysis" /><category term="miasm" /><summary type="html"><![CDATA[Intro to control-flow graph construction, dominance, and loop analysis, with slides and Miasm code examples.]]></summary></entry><entry><title type="html">Automated Detection of Control-flow Flattening</title><link href="https://synthesis.to/2021/03/03/flattening_detection.html" rel="alternate" type="text/html" title="Automated Detection of Control-flow Flattening" /><published>2021-03-03T00:00:00+01:00</published><updated>2021-03-03T00:00:00+01:00</updated><id>https://synthesis.to/2021/03/03/flattening_detection</id><content type="html" xml:base="https://synthesis.to/2021/03/03/flattening_detection.html"><![CDATA[<p>Commercial businesses and malware authors often use code obfuscation to protect specific code areas to impede reverse engineering. In my experience, knowing which code areas are obfuscated often pinpoints sensitive code parts that are worth a closer look. For example, the <a href="https://www.amnesty.org/en/latest/research/2020/09/german-made-finspy-spyware-found-in-egypt-and-mac-and-linux-versions-revealed/">FinSpy samples</a> that were discovered in September 2020 obfuscate their main modules with <a href="https://github.com/obfuscator-llvm/obfuscator/wiki">Obfuscator-LLVM</a>, while the two-staged dropper isn’t obfuscated at all.</p>

<p>Therefore, I would like to use this post to introduce a heuristic that automatically identifies one of the most common obfuscation techniques, control-flow flattening. However, before we approach this, we will have a look together at how reverse engineers use control-flow graphs, the impact of control-flow flattening on reverse engineering and a tiny bit of graph theory. Afterward, we have all the fundamentals to build a heuristic that detects flattening’s underlying structure and implement it within a few lines of Python. In the end, we investigate how the heuristic performs on obfuscated and non-obfuscated code and discuss its limitations.</p>

<p>If you would like to play around with it, you’ll find the code and all binaries on <a href="https://github.com/mrphrazer/obfuscation_detection">GitHub</a>.</p>

<h2 id="control-flow-reconstruction">Control-flow Reconstruction</h2>

<p>In many cases, we can reconstruct a function’s control flow without any code analysis, just by simply looking at its <em>control-flow graph</em>. For instance, consider the following graph:</p>

<p class="center"><img src="/images/cfg_diamond.svg" alt="graph" /></p>

<p>As we can see, basic block <code class="language-plaintext highlighter-rouge">a</code> has two outgoing edges: one to <code class="language-plaintext highlighter-rouge">b</code> and one to <code class="language-plaintext highlighter-rouge">c</code>. This is a typical pattern for a conditional branch, where we jump to one basic block (let’s say to <code class="language-plaintext highlighter-rouge">b</code>) in the <code class="language-plaintext highlighter-rouge">true</code> case and to the other one (to <code class="language-plaintext highlighter-rouge">c</code>) in the <code class="language-plaintext highlighter-rouge">false</code> case. While <code class="language-plaintext highlighter-rouge">b</code> and <code class="language-plaintext highlighter-rouge">c</code> most likely perform different calculations, their control flow <em>joins</em> in the basic block <code class="language-plaintext highlighter-rouge">d</code>. On the high-level, this is a typical <code class="language-plaintext highlighter-rouge">if-then-else</code> pattern. Thus, we can recover the control flow without reading a single assembly line:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="cm">/* type */</span> <span class="n">func</span><span class="p">(</span><span class="cm">/* parameters */</span><span class="p">)</span> <span class="p">{</span>
    <span class="cm">/* block A */</span>
    <span class="k">if</span> <span class="p">(</span><span class="cm">/* jump condition */</span><span class="p">)</span> <span class="p">{</span>
        <span class="cm">/* block B */</span>
    <span class="p">}</span> <span class="k">else</span> <span class="p">{</span>
        <span class="cm">/* block C */</span>
    <span class="p">}</span>
    <span class="cm">/* block D */</span>
<span class="p">}</span>
</code></pre></div></div>

<p>After we recovered the high-level structure, we can reconstruct the code itself, basic block per basic block. In this case, let’s just assume that we recover the following function:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">int</span> <span class="nf">check</span><span class="p">(</span><span class="kt">int</span> <span class="n">x</span><span class="p">)</span> <span class="p">{</span>
    <span class="cm">/* block A */</span>
    <span class="kt">int</span> <span class="n">r</span><span class="p">;</span>
    <span class="k">if</span> <span class="p">(</span><span class="n">x</span> <span class="o">&lt;</span> <span class="mi">10</span><span class="p">)</span> <span class="p">{</span>
        <span class="cm">/* block B */</span>
        <span class="n">r</span> <span class="o">=</span> <span class="mi">20</span><span class="p">;</span>
    <span class="p">}</span> <span class="k">else</span> <span class="p">{</span>
        <span class="cm">/* block C */</span>
        <span class="n">r</span> <span class="o">=</span> <span class="mi">30</span><span class="p">;</span>
    <span class="p">}</span>
    <span class="cm">/* block D */</span>
    <span class="k">return</span> <span class="n">r</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>

<p>The function <code class="language-plaintext highlighter-rouge">check</code> takes a parameter <code class="language-plaintext highlighter-rouge">x</code> as input and checks if it is smaller than 10. In this case, the function returns 20, otherwise 30. This was easy, huh? Well, let’s have a look at control-flow flattening then.</p>

<h2 id="control-flow-flattening">Control-flow Flattening</h2>

<p><a href="https://tigress.wtf/flatten.html">Control-flow flattening</a> is a code transformation that removes structure from a control-flow graph. As a result, the control flow cannot be easily recovered as before. For instance, consider the following graph:</p>

<p class="center"><img src="/images/cfg_flattened.svg" alt="graph" /></p>

<p>If we apply the same method to recover the control flow, we now only see that the blocks <code class="language-plaintext highlighter-rouge">a</code>, <code class="language-plaintext highlighter-rouge">b</code>, <code class="language-plaintext highlighter-rouge">c</code> and <code class="language-plaintext highlighter-rouge">d</code> are connected with a block labeled  <code class="language-plaintext highlighter-rouge">dispatcher</code>. However, we do not know how these blocks relate to each other:</p>

<ul>
  <li>Is block <code class="language-plaintext highlighter-rouge">a</code> visited before or after block <code class="language-plaintext highlighter-rouge">b</code>?</li>
  <li>Is there any conditional branch?</li>
  <li>Is there any loop?</li>
</ul>

<p>In other words, we do not get any <em>meaningful</em> information by looking only at the graph, since all blocks are on the same level; they are <em>flattened</em>.</p>

<p>In its most basic form, control-flow flattening has a central basic block—the <em>dispatcher</em>—that directs the control flow to the individual blocks. This is realized by a <em>state variable</em> that tracks which block will be visited next. The <em>entry</em> initially sets the state variable to the first basic block—in this case <code class="language-plaintext highlighter-rouge">a</code>; then, each block updates the state variable in correspondence to the underlying program logic. In other words, control-flow flattening can be considered as a <em>state machine</em> in which each basic block represents a single state.</p>

<p>On the code level, this can be realized by an endless loop that dispatches a state variable in a <code class="language-plaintext highlighter-rouge">switch</code> statement. For example, we can represent our function <code class="language-plaintext highlighter-rouge">check</code> as follows:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">int</span> <span class="nf">flattened</span><span class="p">(</span><span class="kt">int</span> <span class="n">x</span><span class="p">)</span> <span class="p">{</span>
    <span class="cm">/* entry */</span>
    <span class="kt">int</span> <span class="n">r</span><span class="p">;</span>
    <span class="kt">int</span> <span class="n">state</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>

    <span class="cm">/* dispatcher */</span>
    <span class="k">while</span> <span class="p">(</span><span class="mi">1</span><span class="p">){</span>
        <span class="k">switch</span><span class="p">(</span><span class="n">state</span><span class="p">)</span> <span class="p">{</span>
            <span class="k">case</span> <span class="mi">0</span><span class="p">:</span> <span class="p">{</span>
                <span class="cm">/* block A */</span>
                <span class="k">if</span> <span class="p">(</span><span class="n">x</span> <span class="o">&lt;</span> <span class="mi">10</span><span class="p">)</span> <span class="p">{</span> 
                    <span class="n">state</span> <span class="o">=</span> <span class="mi">1</span><span class="p">;</span>
                <span class="p">}</span> <span class="k">else</span> <span class="p">{</span>
                    <span class="n">state</span> <span class="o">=</span> <span class="mi">2</span><span class="p">;</span>
                <span class="p">}</span>
                <span class="k">break</span><span class="p">;</span>
            <span class="p">}</span>
            <span class="k">case</span> <span class="mi">1</span><span class="p">:</span> <span class="p">{</span>
                <span class="cm">/* block B */</span>
                <span class="n">r</span> <span class="o">=</span> <span class="mi">20</span><span class="p">;</span>
                <span class="n">state</span> <span class="o">=</span> <span class="mi">3</span><span class="p">;</span>
                <span class="k">break</span><span class="p">;</span>
            <span class="p">}</span>
            <span class="k">case</span> <span class="mi">2</span><span class="p">:</span> <span class="p">{</span>
                <span class="cm">/* block C */</span>
                <span class="n">r</span> <span class="o">=</span> <span class="mi">30</span><span class="p">;</span>
                <span class="n">state</span> <span class="o">=</span> <span class="mi">3</span><span class="p">;</span>
                <span class="k">break</span><span class="p">;</span>
            <span class="p">}</span>
            <span class="k">case</span> <span class="mi">3</span><span class="p">:</span> <span class="p">{</span>
                <span class="cm">/* block D */</span>
                <span class="k">return</span> <span class="n">r</span><span class="p">;</span>
            <span class="p">}</span>
        <span class="p">}</span>
    <span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>

<p>Remember, each block represents a state that is identified via a variable (here: <code class="language-plaintext highlighter-rouge">state</code>). After initializing it with 0, we enter the dispatching loop until we resolve the case for 0. The first state—basic block <code class="language-plaintext highlighter-rouge">a</code>—checks if <code class="language-plaintext highlighter-rouge">x</code> is smaller than 10 and sets <code class="language-plaintext highlighter-rouge">state</code> according to the original control flow: if the condition is true, <code class="language-plaintext highlighter-rouge">state</code> becomes 2, which directs the control flow to block <code class="language-plaintext highlighter-rouge">b</code>, otherwise to <code class="language-plaintext highlighter-rouge">c</code> (by setting <code class="language-plaintext highlighter-rouge">state</code> to 3). Afterward, both blocks <code class="language-plaintext highlighter-rouge">b</code> and <code class="language-plaintext highlighter-rouge">c</code> then set the state variable to 3, realizing the joined control flow in block <code class="language-plaintext highlighter-rouge">d</code>.</p>

<p>While we cannot recover the control flow by simply looking at the graph, control-flow flattening can often be broken in a simple manner: by looking at the code and analyzing the state transitions. In <em>static</em> analysis scenarios, this works as follows: First, we identify the state variable. Second, we look how it is initialized and follow the dispatching logic until we find the corresponding block. Since this block updates the state variable in accordance to the control flow, we know which state will be executed next. If we repeat this process for each block and track the updates to the state variable, we can consecutively reconstruct the original control flow. In <em>dynamic</em> scenarios, this process is even easier, since it is often <em>sufficient</em> to know the code relevant for dispatching. Since the dispatcher is called after each state transition, we can observe the interaction between the states without any efforts. As a matter of fact, control-flow flattening can be automatically removed in practice; <a href="https://blog.quarkslab.com/deobfuscation-recovering-an-ollvm-protected-program.html">this excellent blog post by Quarkslab</a>  as well as <a href="https://github.com/eset/stadeo">Stadeo</a> are only two examples that automate this for specific flattening implementations via <a href="https://en.wikipedia.org/wiki/Symbolic_execution">symbolic execution</a>.</p>

<p>Similarly, it is easy to automatically detect generic instances of control-flow flattening based on its underlying structure. Before we go into the details, we will first cover a bit of graph theory to better understand the subsequent sections.</p>

<h2 id="control-flow-analysis">Control-flow Analysis</h2>

<p>Even though we use control-flow graphs all the time, we may have never thought about what they really represent: Intuitively, a <a href="https://en.wikipedia.org/wiki/Control-flow_graph">control-flow graph</a> represents all paths in a function that can be traversed during program execution. Such a graph has some nice mathematical properties that are valuable for any kind of code analysis, be it for compiler optimizations, for decompilation or for reverse engineering in general. In the following, we will work towards the goal of automatically detecting loops in control-flow graphs. For this, we first discuss some basic terminology; then, we introduce the concept of dominance relations. For a better understanding, let’s keep it simple and demonstrate the concepts by means of the following graph:</p>

<p class="center"><img src="/images/graph.svg" alt="graph" /></p>

<p>We see a graph with seven nodes and nine edges; one node (<code class="language-plaintext highlighter-rouge">a</code>) has no incoming edges, another node (<code class="language-plaintext highlighter-rouge">f</code>) no outgoing edges. We call <code class="language-plaintext highlighter-rouge">a</code> an <em>entry</em> and <code class="language-plaintext highlighter-rouge">f</code> an <em>exit</em> node. Furthermore, a chain of transitions between nodes is called a <em>path</em>. If we are looking at the chain of transitions <code class="language-plaintext highlighter-rouge">a -&gt; b -&gt; d -&gt; f</code>, it is a path between <code class="language-plaintext highlighter-rouge">a</code> and <code class="language-plaintext highlighter-rouge">f</code>. We further say that a <em>loop</em> is a subgraph in which each node can be reached from each other node in the subgraph. In our example, <code class="language-plaintext highlighter-rouge">c</code>, <code class="language-plaintext highlighter-rouge">e</code> and <code class="language-plaintext highlighter-rouge">g</code> form a loop, since we can walk from <code class="language-plaintext highlighter-rouge">c</code> to <code class="language-plaintext highlighter-rouge">c</code> via <code class="language-plaintext highlighter-rouge">c -&gt; e -&gt; g -&gt; c</code>; analogously, we can walk from <code class="language-plaintext highlighter-rouge">e</code> to <code class="language-plaintext highlighter-rouge">e</code> and from <code class="language-plaintext highlighter-rouge">g</code> to <code class="language-plaintext highlighter-rouge">g</code>. In the following, we introduce the concept of dominance relations that helps us to identify loops automatically.</p>

<h3 id="dominance-relations">Dominance Relations</h3>

<p><a href="https://en.wikipedia.org/wiki/Dominator_(graph_theory)">Dominance relations</a> describe the hierarchical relations between nodes. In particular, they describe which basic blocks are executed before others. More formally, a <em>dominator</em> <code class="language-plaintext highlighter-rouge">x</code> guarantees that a basic block <code class="language-plaintext highlighter-rouge">x</code> is <em>always executed</em> before a basic block <code class="language-plaintext highlighter-rouge">y</code>. In this case, we say that a node <code class="language-plaintext highlighter-rouge">x</code> <em>dominates</em>  a node <code class="language-plaintext highlighter-rouge">y</code> if <em>all</em> paths between the entry node and <code class="language-plaintext highlighter-rouge">y</code> go through <code class="language-plaintext highlighter-rouge">x</code>. In other words, if there is a path from <em>entry</em> to <code class="language-plaintext highlighter-rouge">y</code> that does <em>not</em> go through <code class="language-plaintext highlighter-rouge">x</code>, then <code class="language-plaintext highlighter-rouge">x</code> is no dominator of <code class="language-plaintext highlighter-rouge">y</code>. By design, each node dominates itself.</p>

<p>Given that, we can determine the following dominance relations in our graph. For each node <code class="language-plaintext highlighter-rouge">x</code>, we list all nodes that are dominated by <code class="language-plaintext highlighter-rouge">x</code>:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>node x: nodes dominated by x

a:      a,b,c,d,e,f
b:      b
c:      c, e, g
d:      d
e:      e, g
f:      f
g:      g
</code></pre></div></div>

<p>Since <code class="language-plaintext highlighter-rouge">a</code> is the graph’s entry, it dominates all the other nodes. <code class="language-plaintext highlighter-rouge">c</code> dominates <code class="language-plaintext highlighter-rouge">c</code>, <code class="language-plaintext highlighter-rouge">e</code> and <code class="language-plaintext highlighter-rouge">g</code>, since all paths from <code class="language-plaintext highlighter-rouge">a</code> to <code class="language-plaintext highlighter-rouge">e</code> go through <code class="language-plaintext highlighter-rouge">c</code> and all paths from <code class="language-plaintext highlighter-rouge">a</code> to <code class="language-plaintext highlighter-rouge">g</code> go through <code class="language-plaintext highlighter-rouge">g</code>. Consequently, <code class="language-plaintext highlighter-rouge">e</code> also dominates <code class="language-plaintext highlighter-rouge">g</code>. However, the nodes <code class="language-plaintext highlighter-rouge">b</code> and <code class="language-plaintext highlighter-rouge">d</code>, <code class="language-plaintext highlighter-rouge">f</code> and <code class="language-plaintext highlighter-rouge">g</code> only dominate themselves. <code class="language-plaintext highlighter-rouge">b</code> dominates only itself since <code class="language-plaintext highlighter-rouge">d</code> can also be reached through <code class="language-plaintext highlighter-rouge">c</code>; the same holds for <code class="language-plaintext highlighter-rouge">d</code>, since <code class="language-plaintext highlighter-rouge">f</code> is also reachable through <code class="language-plaintext highlighter-rouge">e</code>. Finally, <code class="language-plaintext highlighter-rouge">f</code> and <code class="language-plaintext highlighter-rouge">g</code> only dominate themselves since they are exit nodes.</p>

<p>Okay, what now? What does all of this mean? Intuitively, this shows us that <code class="language-plaintext highlighter-rouge">a</code> will always be executed first and <code class="language-plaintext highlighter-rouge">f</code> last. Since <code class="language-plaintext highlighter-rouge">c</code> dominates <code class="language-plaintext highlighter-rouge">g</code>, we also know that <code class="language-plaintext highlighter-rouge">c</code> is always executed before <code class="language-plaintext highlighter-rouge">g</code>. Going a step further, we can say that there is an <em>execution hierarchy</em>: <code class="language-plaintext highlighter-rouge">a</code> is executed before <code class="language-plaintext highlighter-rouge">c</code>, <code class="language-plaintext highlighter-rouge">c</code> is executed before <code class="language-plaintext highlighter-rouge">e</code> and <code class="language-plaintext highlighter-rouge">e</code> is executed before <code class="language-plaintext highlighter-rouge">g</code>. In this hierarchy, we can introduce an important concept: immediate dominators.</p>

<h3 id="immediate-dominator">Immediate Dominator</h3>

<p>We say that the <em>closest</em> node <code class="language-plaintext highlighter-rouge">x</code> that dominates a node <code class="language-plaintext highlighter-rouge">y</code>—where <code class="language-plaintext highlighter-rouge">x</code> and <code class="language-plaintext highlighter-rouge">y</code> are distinct—is the <em>immediate dominator</em> of <code class="language-plaintext highlighter-rouge">y</code>. In our graph, we know that <code class="language-plaintext highlighter-rouge">c</code> dominates <code class="language-plaintext highlighter-rouge">c</code>, <code class="language-plaintext highlighter-rouge">e</code> and <code class="language-plaintext highlighter-rouge">g</code>. Since <code class="language-plaintext highlighter-rouge">c</code> is the closest dominator to <code class="language-plaintext highlighter-rouge">e</code>, it is its immediate dominator. Similarly, since <code class="language-plaintext highlighter-rouge">e</code> dominates <code class="language-plaintext highlighter-rouge">g</code> and it is the closest dominator to <code class="language-plaintext highlighter-rouge">g</code>, we say <code class="language-plaintext highlighter-rouge">e</code> is the immediate dominator of <code class="language-plaintext highlighter-rouge">g</code>. Applying this to the whole graph, we end up with the following relations:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>node x: x is immediate dominator of nodes
a:      b, c, d, f
b:      -
c:      e
d:      -
e:      g
f:      -
g:      -
</code></pre></div></div>

<p>Basically, we see that <code class="language-plaintext highlighter-rouge">a</code> is the immediate dominator of <code class="language-plaintext highlighter-rouge">b</code>, <code class="language-plaintext highlighter-rouge">c</code>, <code class="language-plaintext highlighter-rouge">d</code> and <code class="language-plaintext highlighter-rouge">f</code>, while <code class="language-plaintext highlighter-rouge">c</code> is the immediate dominator of <code class="language-plaintext highlighter-rouge">e</code> and <code class="language-plaintext highlighter-rouge">e</code> of <code class="language-plaintext highlighter-rouge">g</code>. Now, we can introduce a compact representation of these dominance relations, the dominator tree.</p>

<h3 id="dominator-tree">Dominator Tree</h3>

<p>A <em>dominator tree</em> is a compact representation of dominance relations; it represents them in a tree-like graph. To build the dominator tree, we simply consider all immediate dominators as edges. Therefore, we can construct the tree by using the following edges for <code class="language-plaintext highlighter-rouge">a</code>:</p>

<ul>
  <li><code class="language-plaintext highlighter-rouge">a -&gt; b</code></li>
  <li><code class="language-plaintext highlighter-rouge">a -&gt; c</code></li>
  <li><code class="language-plaintext highlighter-rouge">a -&gt; d</code></li>
  <li><code class="language-plaintext highlighter-rouge">a -&gt; f</code></li>
</ul>

<p>Further, we add the edges for <code class="language-plaintext highlighter-rouge">c</code> and <code class="language-plaintext highlighter-rouge">e</code>:</p>

<ul>
  <li><code class="language-plaintext highlighter-rouge">c -&gt; e</code></li>
  <li><code class="language-plaintext highlighter-rouge">e -&gt; g</code></li>
</ul>

<p>Then, the final dominator tree looks as follows:</p>

<p class="center"><img src="/images/domtree.svg" alt="domtree" /></p>

<p>This tree reveals the dominance relations and hierarchy in a compact form. Especially, we see that <code class="language-plaintext highlighter-rouge">a</code> dominates the whole graph, while <code class="language-plaintext highlighter-rouge">c</code> dominates <code class="language-plaintext highlighter-rouge">e</code> and <code class="language-plaintext highlighter-rouge">g</code>. With this representation, we now can easily detect loops in the graph.</p>

<h3 id="loop-detection">Loop Detection</h3>

<p>Based on the dominator tree in the control-flow graph, we can automatically identify <em>natural loops</em> by looking for <em>back edges</em>: If we find an edge from a node <em>back</em> to one of its dominators, we identified a loop. We say that the dominator <em>controls</em> the loop since it dominates all blocks in it. In our graph, we see an edge from <code class="language-plaintext highlighter-rouge">g</code> to <code class="language-plaintext highlighter-rouge">c</code>. Since <code class="language-plaintext highlighter-rouge">c</code> dominates <code class="language-plaintext highlighter-rouge">g</code>, we have a loop between <code class="language-plaintext highlighter-rouge">c</code>, <code class="language-plaintext highlighter-rouge">e</code> and <code class="language-plaintext highlighter-rouge">g</code>, where <code class="language-plaintext highlighter-rouge">c</code> controls it. More intuitively, we can say that the dominance relations guarantee us that <code class="language-plaintext highlighter-rouge">c</code> is executed before <code class="language-plaintext highlighter-rouge">g</code>; in other words, we can reach <code class="language-plaintext highlighter-rouge">g</code> from <code class="language-plaintext highlighter-rouge">c</code>. Since we also can reach <code class="language-plaintext highlighter-rouge">c</code> from <code class="language-plaintext highlighter-rouge">g</code>, we have a loop.</p>

<p>To sum up, we can identify loops by looking for back edges to dominators. Now, we make use of this concept and build a heuristic to automatically identify control-flow flattening.</p>

<h2 id="control-flow-flattening-detection">Control-flow Flattening Detection</h2>

<p>Before we talked about control-flow analysis, we said that control-flow flattening removes structure from a control-flow graph by introducing a state machine; this is realized by dispatching a variable that keeps track of the control flow in an endless loop. The following graph is a representation of such a state machine on the binary level:</p>

<p class="center"><img src="/images/cfg_statemachine.svg" alt="graph" /></p>

<p>While this dispatcher directs the control flow to the next basic block via indirect control-flow transfers (using jump tables), other dispatchers are implemented as as a <a href="https://en.wikipedia.org/wiki/Binary_search_tree">binary search tree</a>. In most cases, the individual states merge in a single block branching to the dispatcher.</p>

<p>To automatically identify these constructs, we design a heuristic that tells us <em>how likely</em> a graph’s structure is similar to those of a flattened graph. To achieve that, we want to know</p>

<ol>
  <li>
    <p>if there is a <em>back edge</em> to a dominator …</p>
  </li>
  <li>
    <p>… that dominates <em>most of the basic blocks</em> in the function</p>
  </li>
</ol>

<p>In other words, we are looking for a block that controls a loop and also dominates large parts of the function. We express this as a score between 0 and 1 by looking for a back edge to a dominator <code class="language-plaintext highlighter-rouge">x</code> and divide the number of basic blocks that are dominated by <code class="language-plaintext highlighter-rouge">x</code> by the total number of basic blocks in the graph. In short, we define the <em>flattening score</em> as follows:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code># basic blocks dominated by x
------------------------------
# basic blocks in the function
</code></pre></div></div>

<p>To apply the heuristic to the graph above, we divide the number of basic blocks dominated by the dispatcher (9) by the total number of basic blocks in the function (10) and obtain a score of 0.9.</p>

<p>Now that we know how the heuristic works, we can implement this in a few lines of Python on top of <a href="https://binary.ninja/">Binary Ninja</a>:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">calc_flattening_score</span><span class="p">(</span><span class="n">function</span><span class="p">):</span>
    <span class="n">score</span> <span class="o">=</span> <span class="mf">0.0</span>
    <span class="c1"># 1: walk over all basic blocks
</span>    <span class="k">for</span> <span class="n">block</span> <span class="ow">in</span> <span class="n">function</span><span class="p">.</span><span class="n">basic_blocks</span><span class="p">:</span> 
        <span class="c1"># 2: get all blocks that are dominated by the current block
</span>        <span class="n">dominated</span> <span class="o">=</span> <span class="nf">get_dominated_by</span><span class="p">(</span><span class="n">block</span><span class="p">)</span>
        <span class="c1"># 3: check for a back edge
</span>        <span class="k">if</span> <span class="ow">not</span> <span class="nf">any</span><span class="p">([</span><span class="n">edge</span><span class="p">.</span><span class="n">source</span> <span class="ow">in</span> <span class="n">dominated</span> <span class="k">for</span> <span class="n">edge</span> <span class="ow">in</span> <span class="n">block</span><span class="p">.</span><span class="n">incoming_edges</span><span class="p">]):</span>
            <span class="k">continue</span>
        <span class="c1"># 4: calculate relation of dominated blocks to the blocks in the graph
</span>        <span class="n">score</span> <span class="o">=</span> <span class="nf">max</span><span class="p">(</span><span class="n">score</span><span class="p">,</span> <span class="nf">len</span><span class="p">(</span><span class="n">dominated</span><span class="p">)</span><span class="o">/</span><span class="nf">len</span><span class="p">(</span><span class="n">function</span><span class="p">.</span><span class="n">basic_blocks</span><span class="p">))</span>
    <span class="k">return</span> <span class="n">score</span>
</code></pre></div></div>

<p>The implementation works as follows:</p>

<ol>
  <li>
    <p>Walk over all basic blocks in the current function.</p>
  </li>
  <li>
    <p>Get all basic blocks dominated by the current block using <code class="language-plaintext highlighter-rouge">get_dominated_by(block)</code>.</p>
  </li>
  <li>
    <p>Check if the dominator has an incoming edge from any of the blocks that it dominates.</p>
  </li>
  <li>
    <p>Calculate the flattening score by dividing the number of dominated blocks by the number of basic blocks in the function.</p>
  </li>
</ol>

<p>To obtain all blocks that are dominated by a given block, we make use of Binary Ninja’s dominator API and perform a <a href="https://en.wikipedia.org/wiki/Depth-first_search">depth-first search</a> on the dominator tree:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">get_dominated_by</span><span class="p">(</span><span class="n">dominator</span><span class="p">):</span>
    <span class="c1"># 1: initialize worklist
</span>    <span class="n">result</span> <span class="o">=</span> <span class="nf">set</span><span class="p">()</span>
    <span class="n">worklist</span> <span class="o">=</span> <span class="p">[</span><span class="n">dominator</span><span class="p">]</span>
    <span class="c1"># 2: perform a depth-first search on the dominator tree
</span>    <span class="k">while</span> <span class="n">worklist</span><span class="p">:</span>
        <span class="c1"># get next block
</span>        <span class="n">block</span> <span class="o">=</span> <span class="n">worklist</span><span class="p">.</span><span class="nf">pop</span><span class="p">(</span><span class="mi">0</span><span class="p">)</span>
        <span class="c1"># add to result
</span>        <span class="n">result</span><span class="p">.</span><span class="nf">add</span><span class="p">(</span><span class="n">block</span><span class="p">)</span>
        <span class="c1"># add children from dominator tree to worklist
</span>        <span class="k">for</span> <span class="n">child</span> <span class="ow">in</span> <span class="n">block</span><span class="p">.</span><span class="n">dominator_tree_children</span><span class="p">:</span>
            <span class="n">worklist</span><span class="p">.</span><span class="nf">append</span><span class="p">(</span><span class="n">child</span><span class="p">)</span>      
    <span class="k">return</span> <span class="n">result</span>
</code></pre></div></div>

<p>After initializing the worklist, we iteratively visit all children in the dominator tree. In the end, we return the set of all descendants in the dominator tree for a given block.</p>

<p>Finally, we apply this heuristic to all functions. For this, we walk over all functions, calculate their flattening scores and filter those whose score is higher or equal to 0.9.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">find_flattened_functions</span><span class="p">():</span>
    <span class="c1"># walk over all functions
</span>    <span class="k">for</span> <span class="n">function</span> <span class="ow">in</span> <span class="n">bv</span><span class="p">.</span><span class="n">functions</span><span class="p">:</span>
        <span class="c1"># calculate flattening score
</span>        <span class="n">score</span> <span class="o">=</span> <span class="nf">calc_flattening_score</span><span class="p">(</span><span class="n">function</span><span class="p">)</span>
        <span class="c1"># skip if score is too low
</span>        <span class="k">if</span> <span class="n">score</span> <span class="o">&lt;</span> <span class="mf">0.9</span><span class="p">:</span>
            <span class="k">continue</span>
        <span class="c1"># print function and score
</span>        <span class="nf">print</span><span class="p">(</span><span class="sa">f</span><span class="sh">"</span><span class="s">Function </span><span class="si">{</span><span class="nf">hex</span><span class="p">(</span><span class="n">function</span><span class="p">.</span><span class="n">start</span><span class="p">)</span><span class="si">}</span><span class="s"> has a flattening score of </span><span class="si">{</span><span class="n">score</span><span class="si">}</span><span class="s">.</span><span class="sh">"</span><span class="p">)</span>
</code></pre></div></div>

<p>To sum up, we designed a simple but powerful heuristic for the detection of control-flow flattening that can be implemented within a few lines of Python. Now, we analyze how this heuristic performs on obfuscated and non-obfuscated code.</p>

<h2 id="evaluation">Evaluation</h2>

<p>To get a feeling of how good the heuristic works, we evaluate it on malware samples as well as on common non-obfuscated programs. Intuitively, we expect to pinpoint functions that implement control-flow flattening or other complex state machines which share a similar graph structure. So let’s check out how many functions we spot (relative to the amount of functions in the binary) and, in particular, what kind of state machines we identify.</p>

<h3 id="malware">Malware</h3>

<p>Starting with malware, we use three samples in total: two of them are obfuscated with control-flow flattening, the third sample is not obfuscated at all.
One sample, <a href="https://www.virustotal.com/gui/file/b4b13ecd68badacfdfadf3f01f0a6fe44056c82261668b49de43cc500a9abe6a/details">FinSpy</a>, uses the control-flow flattening from <a href="https://github.com/obfuscator-llvm/obfuscator/wiki">Obfuscator-LLVM</a>, the other, <a href="https://www.virustotal.com/gui/file/3261111211bb0de7eb9ccb14997865913e8841ec1a5701988102105bba4e4aba/detection">Emotet</a>, ships a custom flattening implementation. The third, unobfuscated sample is a version of <a href="https://www.virustotal.com/gui/file/74e1e83ac69e45a3bee78ac2fac00f9e897f281ea75ed179737e9b6fe39971e3/detection">PlugX</a>.</p>

<p><a href="https://malpedia.caad.fkie.fraunhofer.de/details/win.finfisher">FinSpy</a> is a remote access trojan being used for espionage in various countries. For this sample, the heuristic identifies 10 out of 57 functions in the binary (17.54%). While most of the functions are flattened, the only non-flattened function is of particular interest: in an endless loop, it performs anti-debug checks and terminates the application if any of the checks fail. All other functions share a similar flattening structure; the following snippet is created with Binary Ninja’s decompiler and serves as a example for the others. In this snippet,  <code class="language-plaintext highlighter-rouge">var_18</code> is used as a state variable.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">uint64_t</span> <span class="nf">sub_405270</span><span class="p">(</span><span class="kt">int64_t</span><span class="o">*</span> <span class="n">arg1</span><span class="p">)</span> <span class="p">{</span>

  <span class="kt">int32_t</span> <span class="n">var_14</span> <span class="o">=</span> <span class="mi">1</span>
  <span class="kt">int32_t</span> <span class="n">var_18</span> <span class="o">=</span> <span class="mh">0x16a0bd81</span>
  <span class="k">while</span> <span class="p">(</span><span class="nb">true</span><span class="p">)</span> <span class="p">{</span>
      <span class="kt">int32_t</span> <span class="n">var_20_1</span> <span class="o">=</span> <span class="n">var_18</span> <span class="o">-</span> <span class="mh">0xe07f7fa9</span>
      <span class="k">if</span> <span class="p">(</span><span class="n">var_18</span> <span class="o">==</span> <span class="mh">0xe07f7fa9</span><span class="p">)</span> <span class="p">{</span>
          <span class="kt">int32_t</span> <span class="n">rcx_7</span> <span class="o">=</span> <span class="mh">0x74c24cdf</span>
          <span class="k">if</span> <span class="p">(</span><span class="n">sub_405150</span><span class="p">(</span><span class="n">arg1</span><span class="p">)</span> <span class="o">!=</span> <span class="mi">0</span><span class="p">)</span> <span class="p">{</span>
              <span class="n">rcx_7</span> <span class="o">=</span> <span class="mh">0x16a0bd81</span>
          <span class="p">}</span>
          <span class="n">var_18</span> <span class="o">=</span> <span class="n">rcx_7</span>
      <span class="p">}</span> <span class="k">else</span> <span class="p">{</span>
          <span class="kt">int32_t</span> <span class="n">var_24_1</span> <span class="o">=</span> <span class="n">var_18</span> <span class="o">-</span> <span class="mh">0x16a0bd81</span>
          <span class="k">if</span> <span class="p">(</span><span class="n">var_18</span> <span class="o">==</span> <span class="mh">0x16a0bd81</span><span class="p">)</span> <span class="p">{</span>
              <span class="n">var_14</span> <span class="o">=</span> <span class="p">(</span><span class="n">var_14</span> <span class="o">&lt;&lt;</span> <span class="mi">1</span><span class="p">)</span> <span class="o">-</span> <span class="mh">0x55aabee5</span> <span class="o">+</span> <span class="n">sub_405150</span><span class="p">(</span><span class="n">arg1</span><span class="p">)</span> <span class="o">+</span> <span class="mh">0x55aabee5</span>
              <span class="n">var_18</span> <span class="o">=</span> <span class="mh">0xe07f7fa9</span>
          <span class="p">}</span> <span class="k">else</span> <span class="p">{</span>
              <span class="kt">int32_t</span> <span class="n">var_28_1</span> <span class="o">=</span> <span class="n">var_18</span> <span class="o">-</span> <span class="mh">0x74c24cdf</span>
              <span class="k">if</span> <span class="p">(</span><span class="n">var_18</span> <span class="o">==</span> <span class="mh">0x74c24cdf</span><span class="p">)</span> <span class="p">{</span>
                  <span class="k">break</span>
              <span class="p">}</span>
          <span class="p">}</span>
      <span class="p">}</span>
  <span class="p">}</span>
  <span class="k">return</span> <span class="n">zx</span><span class="p">.</span><span class="n">q</span><span class="p">(</span><span class="n">var_14</span><span class="p">)</span>
<span class="p">}</span>

</code></pre></div></div>

<p>For the <a href="https://en.wikipedia.org/wiki/Emotet">Emotet</a> sample, the heuristic identifies 57 out of 138 functions in the binary (41.30%). While a handful of non-flattened functions decode memory areas, the vast majority share a flattening structure similar to the following snippet, in which <code class="language-plaintext highlighter-rouge">eax</code> is the state variable:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">int32_t</span> <span class="kr">__fastcall</span> <span class="nf">sub_40b370</span><span class="p">(</span><span class="kt">int32_t</span><span class="o">*</span> <span class="n">arg1</span><span class="p">,</span> <span class="kt">int32_t</span><span class="o">*</span> <span class="n">arg2</span><span class="p">)</span> <span class="p">{</span>

  <span class="kt">int32_t</span><span class="o">*</span> <span class="n">var_10</span>
  <span class="kt">int32_t</span><span class="o">*</span> <span class="n">esi</span> <span class="o">=</span> <span class="n">var_10</span>
  <span class="kt">void</span><span class="o">*</span> <span class="n">var_18</span>
  <span class="kt">void</span><span class="o">*</span> <span class="n">edi</span> <span class="o">=</span> <span class="n">var_18</span>
  <span class="kt">int32_t</span> <span class="n">eax</span> <span class="o">=</span> <span class="mh">0x3b2a39f6</span>
  <span class="k">do</span> <span class="p">{</span>
      <span class="n">bool</span> <span class="n">cond</span><span class="o">:</span><span class="mi">0</span><span class="n">_1</span> <span class="o">=</span> <span class="n">eax</span> <span class="o">==</span> <span class="mh">0x2637511e</span>
      <span class="k">if</span> <span class="p">(</span><span class="n">eax</span> <span class="n">s</span><span class="o">&lt;=</span> <span class="mh">0x2637511e</span><span class="p">)</span> <span class="p">{</span>
          <span class="k">if</span> <span class="p">(</span><span class="n">cond</span><span class="o">:</span><span class="mi">0</span><span class="n">_1</span><span class="p">)</span> <span class="p">{</span>
              <span class="kt">void</span><span class="o">*</span> <span class="n">ecx_1</span> <span class="o">=</span> <span class="n">esi</span> <span class="o">+</span> <span class="mi">4</span>
              <span class="k">if</span> <span class="p">(</span><span class="n">ecx_1</span> <span class="n">u</span><span class="o">&lt;=</span> <span class="n">edi</span><span class="p">)</span> <span class="p">{</span>
                  <span class="kt">void</span><span class="o">*</span> <span class="n">eax_3</span> <span class="o">=</span> <span class="o">*</span><span class="n">esi</span>
                  <span class="o">*</span><span class="p">(</span><span class="n">arg2</span> <span class="o">+</span> <span class="mi">8</span><span class="p">)</span> <span class="o">=</span> <span class="n">eax_3</span>
                  <span class="k">if</span> <span class="p">(</span><span class="n">eax_3</span> <span class="o">+</span> <span class="n">ecx_1</span> <span class="n">u</span><span class="o">&lt;=</span> <span class="n">edi</span><span class="p">)</span> <span class="p">{</span>
                      <span class="o">*</span><span class="p">(</span><span class="n">arg2</span> <span class="o">+</span> <span class="mi">4</span><span class="p">)</span> <span class="o">=</span> <span class="n">ecx_1</span>
                      <span class="k">return</span> <span class="mi">1</span>
                  <span class="p">}</span>
              <span class="p">}</span>
              <span class="k">break</span>
          <span class="p">}</span>
          <span class="k">if</span> <span class="p">(</span><span class="n">eax</span> <span class="o">==</span> <span class="mh">0x110d5577</span><span class="p">)</span> <span class="p">{</span>
              <span class="kt">int32_t</span><span class="o">*</span> <span class="n">ecx</span> <span class="o">=</span> <span class="n">esi</span> <span class="o">+</span> <span class="mi">4</span>
              <span class="k">if</span> <span class="p">(</span><span class="n">ecx</span> <span class="n">u</span><span class="o">&gt;</span> <span class="n">edi</span><span class="p">)</span> <span class="p">{</span>
                  <span class="k">break</span>
              <span class="p">}</span>
              <span class="kt">int32_t</span> <span class="n">eax_1</span> <span class="o">=</span> <span class="o">*</span><span class="n">esi</span>
              <span class="n">esi</span> <span class="o">=</span> <span class="n">ecx</span>
              <span class="o">*</span><span class="n">arg2</span> <span class="o">=</span> <span class="n">eax_1</span>
              <span class="n">eax</span> <span class="o">=</span> <span class="mh">0x2637511e</span>
              <span class="k">continue</span>
          <span class="p">}</span> <span class="k">else</span> <span class="k">if</span> <span class="p">(</span><span class="n">eax</span> <span class="o">==</span> <span class="mh">0x14df8275</span><span class="p">)</span> <span class="p">{</span>
              <span class="n">eax</span> <span class="o">=</span> <span class="mh">0x110d5577</span>
              <span class="n">esi</span> <span class="o">=</span> <span class="o">*</span><span class="n">arg1</span>
              <span class="n">edi</span> <span class="o">=</span> <span class="o">*</span><span class="p">(</span><span class="n">arg1</span> <span class="o">+</span> <span class="mi">4</span><span class="p">)</span> <span class="o">+</span> <span class="n">esi</span>
              <span class="k">continue</span>
          <span class="p">}</span>
      <span class="p">}</span> <span class="k">else</span> <span class="k">if</span> <span class="p">(</span><span class="n">eax</span> <span class="o">==</span> <span class="mh">0x3b2a39f6</span><span class="p">)</span> <span class="p">{</span>
          <span class="n">eax</span> <span class="o">=</span> <span class="mh">0x14df8275</span>
          <span class="k">continue</span>
      <span class="p">}</span>
  <span class="p">}</span> <span class="k">while</span> <span class="p">(</span><span class="n">eax</span> <span class="o">!=</span> <span class="mh">0x188efd81</span><span class="p">)</span>
  <span class="k">return</span> <span class="mi">0</span>
<span class="p">}</span>
</code></pre></div></div>

<p>So far, the heuristic successfully identified control-flow flattening. What happens if the malware sample is not obfuscated? In order to find this out, we try our heuristic on <a href="https://malpedia.caad.fkie.fraunhofer.de/details/win.plugx">PlugX</a>, a remote access trojan. In total, we identify 26 out of 504 functions in the binary (5.16%). While most of these functions do not look very promising, some of them are <em>very</em> interesting: they implement command dispatching routines (which backdoor functionality will be executed) as well as the communication with the command-and-control server, two of the most interesting state machines in the binary.</p>

<h3 id="common-programs">Common Programs</h3>

<p>Up until now, we already noticed that the heuristic also detects non-flattened functions that implement state machines. How does this perform on common non-obfuscated programs? To answer this, let’s have a quick look at <code class="language-plaintext highlighter-rouge">ls</code> (140 KiB), <code class="language-plaintext highlighter-rouge">gcc</code> (1.2 MiB) and <code class="language-plaintext highlighter-rouge">gdb</code> (8.5 MiB). The heuristic identifies:</p>

<ul>
  <li>
    <p>8 out of 237 functions (3.38%) for <code class="language-plaintext highlighter-rouge">ls</code></p>
  </li>
  <li>
    <p>67 out of 1,456 functions (4.60%) for <code class="language-plaintext highlighter-rouge">gcc</code></p>
  </li>
  <li>
    <p>374 out of 15,603 functions (2.46%) for <code class="language-plaintext highlighter-rouge">gdb</code></p>
  </li>
</ul>

<p>While a considerable fraction of functions is not that interesting, others implement complex state machines that perform recursive directory traversals, parse file formats or encode/decode data streams. Often, these state machines are implemented as loops with large dispatching routines based on <code class="language-plaintext highlighter-rouge">switch</code> statements.</p>

<p>To conclude the experiments, we see that the heuristic identifies control-flow flattening as well as other state machines; some of them are very interesting for reverse engineers, since they implement complex dispatching routines that are relevant for understanding file formats, communication protocols or other important program logic. Although the heuristic identifies many functions that do not seem to be relevant for various reversing scenarios, they only constitute a small percentage of the whole code base, as these <em>non-representative</em> experiments show.</p>

<h2 id="setting-the-scene">Setting the Scene</h2>

<p>To automatically <em>break</em> a given code obfuscation scheme (as we do in my <a href="/training_software_deobfuscation.html">code deobfuscation training classes</a>), we must often first analyze its characteristics, generalize the patterns and automatically identify them. Only in the second stage, we are able to implement a strategy that removes the obfuscation, regardless whether we patch the binary, clean up the decompilation or work on a simplified intermediate language. If we want to automatically <em>identify</em> obfuscated code, the approach is similar:  We have to understand the obfuscation’s underlying structure, generalize its pattern and design a heuristic which pinpoints interesting code locations. In the second stage, we can have a closer look at the identified code parts.</p>

<p>In our case, we designed a simple but effective heuristic for discovering control-flow flattening by looking for functions which have a basic block that dominates large parts of the function <em>and</em> controls a loop. Personally, I use this heuristic quite a lot in early analysis stages to catch a glimpse of a complex code base. In cases where no control-flow flattening is present, it often finds complex program logic which is interesting for reverse engineers. To better filter the results, we can add additional data points to pinpoint a function’s complexity, such as its number of basic blocks or its <a href="https://en.wikipedia.org/wiki/Cyclomatic_complexity">cyclomatic complexity</a>: <code class="language-plaintext highlighter-rouge">#edges - #blocks + 2</code>. We could also think about refining the flattening heuristic to include checks for the state variable at the costs of requiring a code analysis.</p>

<p>For more advanced implementations of control-flow flattening which use indirect or call-based <a href="https://tigress.wtf/flatten.html">dispatching routines</a>, the heuristic will not work at all, since the graph structure is inherently different. Similarly, the heuristic won’t detect other obfuscation schemes such as opaque predicates, range dividers, mixed Boolean-Arithmetic or virtual machines, which have fundamentally different characteristics. For these schemes, we have to develop custom heuristics tailored to their underlying structures. Perhaps, this is a topic for a future post.</p>

<p>I hope you enjoyed this blog entry! :)</p>]]></content><author><name>Tim Blazytko</name></author><category term="control-flow-flattening" /><category term="cfg" /><category term="graph-analysis" /><category term="obfuscation" /><category term="detection" /><category term="reverse-engineering" /><summary type="html"><![CDATA[Heuristic for detecting control-flow flattening in binaries using CFG structure analysis, implementation details, and evaluation results.]]></summary></entry></feed>