Large language models have already become useful helpers in reverse engineering: they can clean up decompiler output, suggest better function names, answer focused questions about unfamiliar code, and write small helper scripts. I discussed some of these LLM-assisted reversing use cases in my REcon 2024 talk “The Future of Reverse Engineering with Large Language Models.” So far, though, the model has mostly stayed in a supporting role: we still decide what is worth following, what to inspect next, and where deeper reversing effort is needed.
What changes this picture is the rise of agents. Once a model can use tools in a loop, inspect intermediate results, and continue analysis on its own, it can take over meaningful parts of the workflow. We already see this in tasks such as solving CTF challenges, generating fuzzing harnesses, analyzing malware, and running exploit-generation experiments. However, while agents are already useful on their own, they become substantially more effective once they are given the right environment, tooling, and guidance.
In this post, we show how to build such an agentic malware-analysis pipeline in practice. We demonstrate this on a concrete malware case study in which some functionality is easy to identify from strings and APIs, while other parts only reveal themselves after deeper analysis. We first introduce the basic ingredients of agentic workflows, then look at what a strong general-purpose agent can find in the sample on its own, and finally compare that to the same agent embedded in a structured workflow. This lets us compare what already works, what the pipeline improves, and which limitations remain.
The companion repository with the analysis environment, pipeline, and sample setup is available on GitHub.
Agents
For the purposes of this post, we can use a simple definition: an agent is an LLM that can use tools in a loop to achieve a goal. Instead of answering once inside a single prompt, it can inspect tool output, decide what to do next, execute another step, and continue until it reaches a result. That is the key shift from LLM-assisted reverse engineering to agentic workflows.
The basic loop is straightforward. The model receives a task, chooses a tool, sees the result, and repeats this process until it can produce a final answer. In simplified pseudocode, such a reverse-engineering agent could look as follows:
```python
# define tools (placeholders for disassembler-backed helpers)
tools = {
    "strings": strings,
    "disasm": disasm,
    "decompile": decompile,
}

# set LLM instructions
msgs = [
    {
        "role": "system",
        "content": (
            "You are a reverse-engineering agent. "
            "Use tools to analyze binaries."
        ),
    },
    {
        "role": "user",
        "content": (
            "Figure out what function sub_401230 does."
        ),
    },
]

# agent loop: llm() stands in for a chat-completion call with tool support
while True:
    r = llm(msgs, tools=tools)
    if r.type == "final":
        # the model has enough evidence to answer
        print(r.content)
        break
    # otherwise the model requested a tool: run it and feed the result back
    result = tools[r.tool_name](**r.tool_args)
    msgs += [
        r,
        {
            "role": "tool",
            "name": r.tool_name,
            "content": str(result),
        },
    ]
```
The agent can retrieve strings, disassembly, and decompiler output, decide which tool to call next, observe the result, and iteratively refine its understanding until it can answer the question. Even this simple setup is already enough to turn a model from a passive assistant into an active analysis loop.
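To see the control flow without an API key, we can swap the model for a scripted stand-in. In the sketch below, the `Reply` type, the canned tool outputs, and `scripted_llm` are all hypothetical: the fake "model" always decompiles first and then answers. The loop is also bounded by a step budget instead of `while True`, which is a sensible guard for real agents too.

```python
import json
from dataclasses import dataclass, field


@dataclass
class Reply:
    """Hypothetical stand-in for one model response."""
    type: str                                   # "tool_call" or "final"
    content: str = ""
    tool_name: str = ""
    tool_args: dict = field(default_factory=dict)


# Hypothetical tools returning canned text; a real agent would query a disassembler.
def strings(func: str) -> list:
    return ["%d bytes", "length check failed"]


def decompile(func: str) -> str:
    return f"int {func}(char *s) {{ return strlen(s) * 2; }}"


TOOLS = {"strings": strings, "decompile": decompile}


def scripted_llm(msgs, tools):
    """Replays a fixed two-step plan instead of calling a real model."""
    tool_msgs = [m for m in msgs if m["role"] == "tool"]
    if not tool_msgs:
        return Reply("tool_call", tool_name="decompile", tool_args={"func": "sub_401230"})
    return Reply("final", content="Based on: " + tool_msgs[-1]["content"])


def run_agent(task, llm=scripted_llm, tools=TOOLS, max_steps=20):
    msgs = [
        {"role": "system", "content": "You are a reverse-engineering agent."},
        {"role": "user", "content": task},
    ]
    for _ in range(max_steps):                  # bounded instead of `while True`
        r = llm(msgs, tools)
        if r.type == "final":
            return r.content
        result = tools[r.tool_name](**r.tool_args)
        # Record the call and its result so the next model turn can see both.
        msgs.append({"role": "assistant",
                     "content": f"{r.tool_name}({json.dumps(r.tool_args)})"})
        msgs.append({"role": "tool", "name": r.tool_name, "content": str(result)})
    raise RuntimeError("step budget exhausted without a final answer")


print(run_agent("Figure out what function sub_401230 does."))
```

To plug in a real model, you would replace `scripted_llm` with a call to a chat API that supports tool use. The rest of the loop stays the same.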
While there are good reasons to implement such loops from scratch, and official examples from both OpenAI and Anthropic show this approach directly, in practice many people will use command-line agents such as OpenAI’s Codex CLI, Anthropic’s Claude Code, or open-source agents like opencode. These tools already handle the interactive loop, tool execution, and session state for us.
For reverse engineering, the next question is how we connect the right analysis tooling and how we guide the workflow. The Model Context Protocol (MCP), as defined in the official specification, standardizes how agents discover and call external tools. Through MCP, an agent can query a disassembler, collect metadata, inspect cross-references, or run helper scripts without us manually copying results back and forth. Projects such as GhidraMCP and ida-pro-mcp make this kind of integration directly available for Ghidra and IDA Pro.
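Under the hood, MCP messages are JSON-RPC 2.0 requests. An agent first lists the available tools with `tools/list` and then invokes one by name with `tools/call`. The toy handler below imitates only the shape of those two exchanges, in-process. It is not the official SDK, and it skips the initialization handshake and the transport layer. The tool name `get_strings`, its schema, and the canned strings are hypothetical.

```python
import json

# Canned strings standing in for a real disassembler backend (illustration only).
CANNED_STRINGS = ["AppInit_DLLs", "winsys32.sys", "C3PO-r2d2-POE", "ab"]

# Hypothetical tool registry; the tool name and schema are made up for this sketch.
TOOLS = {
    "get_strings": {
        "description": "Return printable strings of at least min_len characters.",
        "inputSchema": {
            "type": "object",
            "properties": {"min_len": {"type": "integer"}},
        },
        "handler": lambda args: [s for s in CANNED_STRINGS
                                 if len(s) >= args.get("min_len", 4)],
    },
}


def handle(raw: str) -> str:
    """Answer one JSON-RPC 2.0 request for the MCP tool methods."""
    req = json.loads(raw)
    resp = {"jsonrpc": "2.0", "id": req.get("id")}
    method, params = req.get("method"), req.get("params", {})
    if method == "tools/list":
        resp["result"] = {"tools": [
            {"name": name, "description": t["description"],
             "inputSchema": t["inputSchema"]}
            for name, t in TOOLS.items()]}
    elif method == "tools/call" and params.get("name") in TOOLS:
        output = TOOLS[params["name"]]["handler"](params.get("arguments", {}))
        # Tool results come back as a list of typed content items.
        resp["result"] = {"content": [{"type": "text", "text": json.dumps(output)}],
                          "isError": False}
    elif method == "tools/call":
        resp["error"] = {"code": -32602, "message": f"Unknown tool: {params.get('name')}"}
    else:
        resp["error"] = {"code": -32601, "message": "Method not found"}
    return json.dumps(resp)


# What an agent would send: discover tools first, then call one by name.
print(handle('{"jsonrpc": "2.0", "id": 1, "method": "tools/list"}'))
print(handle('{"jsonrpc": "2.0", "id": 2, "method": "tools/call", '
             '"params": {"name": "get_strings", "arguments": {"min_len": 5}}}'))
```

Because discovery and invocation are standardized, the same agent can drive GhidraMCP, ida-pro-mcp, or any other MCP server without tool-specific glue code.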
Agent skills, on the other hand, are reusable, task-specific workflows for agents. Their main purpose is to tell the agent how to approach a task. In practice, a skill can include a workflow, supporting scripts, file templates, examples, and concrete expectations for the output. This is useful across many reverse-engineering tasks: for a crackme, we may want the agent to first identify the relevant input checks and success paths before it starts patching the binary. For obfuscated code, we may want it to first classify the protection, determine where and how it appears in the binary, and then derive a suitable workflow and tooling strategy to deobfuscate it. In malware analysis, we may want the agent to first collect broad evidence such as strings and API imports, identify the most relevant points of interest, and only then inspect the surrounding code regions in the disassembler.
However, tool access and workflow guidance still do not solve the main practical limitation of unstructured agent runs: the context window. This is the limited amount of information an agent can keep in view during a session, including chat history, tool output, notes, and intermediate observations. In reverse engineering, this window fills up quickly. Large amounts of disassembly or decompiler output, especially for large functions or groups of related functions, can already consume a significant fraction of the available context. Once the context becomes too large, the agent usually compacts it into a summary. As a consequence, details may be lost, the analysis may drift, work may be repeated, or the agent may simply get lost.
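The sketch below shows why compaction loses detail. It uses a crude characters-per-token estimate, which is an assumption, since real tokenizers differ. When the history exceeds a budget, older tool outputs are reduced to their first line. The compaction policy is our own simplification; agents such as Claude Code or Codex use model-written summaries rather than truncation.

```python
def approx_tokens(text: str) -> int:
    """Rough heuristic (about four characters per token); real tokenizers differ."""
    return max(1, len(text) // 4)


def context_size(msgs) -> int:
    return sum(approx_tokens(m["content"]) for m in msgs)


def compact(msgs, budget: int, keep_recent: int = 2):
    """Shrink older tool outputs to one-line stubs until the history fits the budget.

    Each stub keeps only the first line of the original output, which is exactly
    where detail gets lost: everything below it is gone for later turns.
    """
    msgs = [dict(m) for m in msgs]              # leave the caller's history untouched
    tool_idx = [i for i, m in enumerate(msgs) if m["role"] == "tool"]
    for i in tool_idx[:max(0, len(tool_idx) - keep_recent)]:
        if context_size(msgs) <= budget:
            break
        lines = msgs[i]["content"].splitlines()
        msgs[i]["content"] = "[compacted] " + (lines[0] if lines else "")
    return msgs


# Three large (synthetic) decompiler dumps quickly dominate the history.
listing = "\n".join(f"line {n}: mov eax, [ebp+{n:#x}]" for n in range(200))
history = [{"role": "user", "content": "Analyze sub_10006790"}] + [
    {"role": "tool", "content": f"// decompile chunk {k}\n{listing}"} for k in range(3)
]
compacted = compact(history, budget=2000, keep_recent=1)
print(context_size(history), "->", context_size(compacted))
```

After compaction, the first two dumps survive only as their header lines. Any finding buried in them has to be rediscovered unless it was written down somewhere else.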
To deal with this limited context, two things are especially useful in practice: subagents and planning. With subagents, the main agent can offload focused work to a child agent with its own context window, such as analyzing a single function or tracing references to a string. Anthropic documents this pattern explicitly as subagents, while OpenAI discusses comparable delegation setups as multi-agent orchestration. Planning helps with a different part of the same problem: it does not give the agent more context, but it can help it spend that context more deliberately. For example, Claude Code exposes this explicitly as Plan Mode, where the agent first explores the problem with read-only operations and works out which evidence to collect and which leads are worth following.
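What makes subagents useful is context isolation. The child accumulates its own bulky transcript, and the parent receives only a short result. The sketch below is a hypothetical illustration of that pattern, not how Claude Code or Codex implement delegation internally. Its loop over canned outputs stands in for the child's own agent loop.

```python
def run_subagent(task: str, tool_outputs: list) -> dict:
    """Hypothetical child agent with its own, initially empty context.

    The loop over canned tool outputs stands in for the child's own agent loop;
    only a short summary is handed back to the caller.
    """
    child_msgs = [{"role": "user", "content": task}]
    for out in tool_outputs:
        child_msgs.append({"role": "tool", "content": out})
    return {
        "summary": f"{task}: checked {len(tool_outputs)} tool outputs",
        "child_chars": sum(len(m["content"]) for m in child_msgs),
    }


parent_msgs = [{"role": "user", "content": "Triage examples/samples/mfc42ul.dll"}]
report = run_subagent("trace references to C3PO-r2d2-POE", ["x" * 50_000, "y" * 30_000])
# Only the summary enters the parent's context; the bulky transcript stays in the child.
parent_msgs.append({"role": "tool", "name": "subagent", "content": report["summary"]})
print(report["child_chars"], sum(len(m["content"]) for m in parent_msgs))
```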
Still, neither subagents nor planning solve the underlying issue. If the context gets compacted, the agent may still lose track of prior findings because it has no persistent memory, no stable notes to recover from, and no workflow it can resume reliably.
With these foundations in place, let us now look at what a strong agent can do with tool access alone. Afterwards, we will build a pipeline around the same setup and examine how much that changes the depth and reliability of the analysis.
Baseline: Unstructured Agent-based Malware Analysis
To get a feeling for what a strong general-purpose agent can already do on its own, we start with a simple setup: direct access to a disassembler, but no task-specific workflow, no persistent case file, and no additional guidance beyond the analysis prompt. In other words, the agent can drive the tools, but it has to structure the investigation on its own. That setup is already useful in practice, and we should expect it to produce a decent first triage and some concrete leads. The more interesting question is where it starts to flatten out once the sample requires deeper reconstruction.
As our test case, we use mfc42ul.dll from the German “Staatstrojaner” (federal Trojan horse) case publicized by the Chaos Computer Club (CCC). The sample is a good fit because some of its functionality is relatively easy to spot from strings, imports, and a quick pass through the code: screenshots, Skype interception, persistence, and covert communication. But it also contains deeper logic that is much harder to recover from a shallow analysis, including process-aware activation, command dispatching, protocol details, hardcoded C2 information, and the statically embedded AES key. That makes it a useful test case: it lets us judge not only whether the agent finds something interesting, but also how far the analysis actually goes.
In our initial run, we use OpenAI’s Codex CLI with gpt-5.4 xhigh and give it access to my headless Binary Ninja MCP server binary-ninja-headless-mcp. We then ask the agent to analyze the sample via the MCP server and produce a detailed overview of the malware, its features, and the corresponding code locations. At this point, the agent has everything it needs to inspect the binary and drive the analysis on its own:
› Analyze the malware in examples/samples/mfc42ul.dll
Give me a detailed overview of the sample’s functionality and features,
together with the corresponding code locations. Use the MCP server to
perform the analysis.
A first useful observation is that the agent does not behave randomly. Even without additional workflow guidance, it starts with a fairly sensible triage strategy: open the sample, collect high-level metadata, identify likely behavior from strings and imports, and then follow those leads into the code. The following excerpt shows how the run begins:
• I’m analyzing examples/samples/mfc42ul.dll with the Binary Ninja MCP tools.
• Updated Plan
└ Static malware triage and behavior reconstruction for the DLL using Binary
Ninja MCP.
□ Open the DLL in Binary Ninja and collect high-level metadata.
□ Identify likely behavior clusters from strings/xrefs.
□ Trace the main execution paths and map capabilities to concrete code locations.
• Called
└ binary_ninja_headless_mcp.session.open({...})
ok session_id=...
As the run continues, the same pattern remains visible: broad metadata collection first, then pivots off suspicious strings and imports, and only gradually deeper inspection of the code.
• The file is a 32-bit PE DLL with a large code surface, so I’m doing a fast
triage pass before drilling into behavior. I’m collecting exports,
imports, and structural metadata now, then I’ll pivot off suspicious
strings and entry-point code.
This is already a useful and fairly reasonable starting point. The agent is not guessing blindly; it uses the tooling in a disciplined way and follows a plausible initial workflow. At the same time, the limitation is already visible: the analysis is driven primarily by obvious pivots such as strings, imports, and surface-level structure. That is often enough for a first triage, but not necessarily enough to recover the deeper logic of the sample.
After roughly 15 minutes and 268 tool calls, the agent produced a first high-level summary of the sample. The result is already quite useful for an initial triage. The agent:
- identifies the dropped driver winsys32.sys, the corresponding installation logic, and the driver device \\.\KeyboardClassC, and ties them to functions such as sub_1000e8a0 and sub_10003a40,
- notices the Skype-related markers and target process names, and links them to nearby control logic around sub_10007490,
- recovers the proxy-discovery logic in sub_1000ead0, including registry-based handling of Internet Explorer proxy settings,
- identifies a proxy-aware outbound transport based on HTTP CONNECT, together with the protocol marker C3PO-r2d2-POE, in sub_10006790, and
- finds a secondary payload path around sub_1000d470, including temporary-file staging, execution, and later cleanup.
This is already a meaningful speedup for the initial analysis: the agent produces a useful high-level overview of the sample’s features and maps them to concrete code locations. At the same time, the result remains fairly shallow. It does not identify the cryptographic code, recover the hardcoded key, or reconstruct the deeper protocol logic. In other words, it is a good first triage, but not yet a deep analysis.
Another practical problem is that the run nearly exhausts the available context window. By the end, only a small fraction of the context remains available. This makes the setup brittle: even when the agent finds useful leads, it has limited ability to preserve and build on them once the investigation grows larger or requires repeated deep dives into related code. Large chunks of disassembly and decompiler output consume context quickly, and once earlier findings are compacted into summaries, details can easily be lost.
So the baseline is already helpful, but mostly as triage. It points us to relevant functionality, recovers several visible features, and gives us concrete starting points for manual inspection. What it does not yet provide is a deeper reconstruction of how the malware is organized internally.
To see whether this is mostly a Codex-specific limitation or a more general pattern, we can repeat the same experiment with another strong agent. Using Anthropic’s Claude Code with Opus 4.6 and high effort, the overall picture improves somewhat. In a similar amount of time, Claude recovers a few additional details:
- the audio-recording pipeline, including dual-channel capture and the related waveIn* handling,
- the SNMP-based host fingerprinting together with the VMware MAC-prefix filter in the anti-analysis logic, and
- the logic that reads the DLL’s own filename at runtime, suggesting that the sample checks whether it is still running under the expected module name.
At the same time, the overall pattern remains the same: Claude also gives a useful first overview, but it still does not really reconstruct the deeper logic of the sample.
Overall, these experiments suggest that changing the model helps only to a limited extent; it does not solve the main problem. As we will see, what matters much more is whether the agent has clear guidance, a structured plan, and a way to continue the analysis even after context-window resets. Let us now look at how to build that into the workflow.
Agentic Analysis Pipeline
The earlier experiments already show the main limitations that the workflow needs to address. A strong general-purpose agent with access to a binary-analysis engine can produce a useful first triage: it recovers visible functionality, finds sensible pivots, and maps obvious behavior to code locations. In our case, that was enough to identify some of the easier-to-spot features, but not the cryptographic code, the deeper protocol logic, or the internal organization of the sample. At the same time, the run was already approaching the limits of the context window.
To address these problems, we need a workflow that gives the agent a clear analytical structure and a persistent external memory it can return to. In the end, our goal is to make the analysis more repeatable, resumable, and capable of going deeper than unstructured tool-driven exploration without exhausting the context window unnecessarily. Therefore, we can now define four design goals for the workflow:
- Start with broad fingerprinting. The agent should first collect a wide range of signals (strings, API imports, symbol names, crypto indicators, points of interest, and other easy-to-extract artifacts) to build a rough high-level picture of the sample before drifting into deeper analysis.
- Keep raw evidence separate from higher-level reasoning. Raw artifacts (e.g., strings), extracted signals, planning results, hypotheses, and interpretations should not all be mixed together. This lets the agent work from structured summaries without losing access to the underlying evidence.
- Follow a clearly staged process. The workflow should make explicit what has already been done, what comes next, which questions remain open, and what every stage is expected to produce. In implementation terms, stages are defined by templates, expected output artifacts, and lightweight validators that check whether the required files and fields are present before the workflow continues. This ensures that the workflow produces explicit intermediate artifacts that later stages can build on directly.
- Externalize state to disk. Notes, artifacts, progress, hypotheses, and reports should be written to persistent storage so that the analysis can continue even after the context window fills up.
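A validator in the sense of the third goal can be very small. The sketch below checks that the required case files exist and contain the expected sections before the next stage starts. The file names follow the examples given later in this post, but the required headings are hypothetical placeholders, not the schema used in the companion repository.

```python
import tempfile
from pathlib import Path

# File names follow the examples in this post; the required section headings
# are hypothetical placeholders, not the companion repository's actual schema.
REQUIRED = {
    "05_behavior_hypotheses.md": ["## Hypotheses", "## Missing checks"],
    "08_deep_analysis_plan.md": ["## Tasks"],
    "09_priority_queue.md": ["## Queue"],
}


def validate_case(case_dir: Path, required=REQUIRED) -> list:
    """Return a list of problems; an empty list means the next stage may start."""
    problems = []
    for name, headings in required.items():
        path = case_dir / name
        if not path.is_file():
            problems.append(f"missing file: {name}")
            continue
        text = path.read_text(encoding="utf-8")
        problems += [f"{name}: missing section {h!r}" for h in headings if h not in text]
    return problems


with tempfile.TemporaryDirectory() as d:
    case = Path(d)
    (case / "05_behavior_hypotheses.md").write_text("## Hypotheses\n- H1 ...\n", encoding="utf-8")
    probs = validate_case(case)
print(probs)
```

Returning a list of concrete problems, rather than a bare boolean, gives the agent something actionable to fix before it retries the stage.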
We address these goals by structuring the agent around a staged workflow that roughly mirrors how a human analyst would approach a malware sample: first collecting broad signals such as strings, API imports, and signatures; then building a rough picture of the sample; forming initial hypotheses; prioritizing the most promising leads; and finally validating or falsifying those hypotheses through deeper analysis. To keep this workflow manageable, we separate responsibilities across three roles: the orchestrator, the planner, and the reporter.
The orchestrator coordinates execution, initializes the case directory, checks artifact completeness, and manages phase transitions. The planner turns intermediate evidence into hypotheses, priorities, and concrete deep-analysis tasks. The reporter turns the accumulated artifacts into a consumable write-up, separates observed facts from inferred claims, and records uncertainty and next steps. During exploration, the orchestrator can also delegate focused collection tasks to subagents and merge their outputs back into the case state.
At a high level, the workflow looks as follows:
- Broad fingerprinting and triage: Collect cheap signals such as file metadata, hashes, entropy-related hints, strings, imports, symbol names, YARA matches, capa results, compiler or packer hints, crypto-related indicators, and other obvious points of interest to build an initial picture of the sample.
- Filtering and grouping: Filter the raw output, group related findings, and surface the strings, imports, capability clues, and code regions that look most relevant.
- Hypothesis building: Use those findings to form initial ideas about what the sample does, which major components it seems to contain, and which leads are worth following.
- Internal structure mapping: Build a working map of how the sample appears to be organized internally: which parts seem to handle persistence, communication, collection, injection, or other major tasks, and how they fit together.
- Deep-analysis planning: Select and prioritize the functions, components, features, and hypotheses that are the best candidates for deeper inspection with the binary-analysis tooling.
- Targeted deep analysis: Inspect those high-priority targets with the binary-analysis tooling to validate or revise earlier assumptions and recover the missing low-level logic.
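The filtering and grouping stage can begin with simple rules. The keyword patterns below are hypothetical, and a real pipeline would also fold in imports, capa, and YARA results. The input strings are the ones mentioned in this post, plus one CONNECT line we made up for illustration.

```python
import re

# Hypothetical keyword rules; a real pipeline would also fold in imports, capa and YARA hits.
RULES = {
    "persistence": re.compile(r"AppInit_DLLs|CurrentVersion\\Run", re.I),
    "driver": re.compile(r"\.sys$|KeyboardClass", re.I),
    "network": re.compile(r"\bCONNECT\b|HTTP/1\.[01]", re.I),
    "crypto": re.compile(r"\bAES\b|Rijndael|CryptEncrypt", re.I),
}


def group_strings(strings):
    """Bucket raw strings by capability; anything unmatched is kept, not dropped."""
    groups = {name: [] for name in RULES}
    groups["unclassified"] = []
    for s in strings:
        hits = [name for name, rx in RULES.items() if rx.search(s)]
        for name in hits or ["unclassified"]:
            groups[name].append(s)
    return groups


# Strings mentioned in this post, plus one made-up CONNECT line for illustration.
raw = ["AppInit_DLLs", "winsys32.sys", r"\\.\KeyboardClassC",
       "C3PO-r2d2-POE", "CONNECT %s:%d HTTP/1.1"]
groups = group_strings(raw)
for name, items in groups.items():
    print(f"{name:13} {items}")
```

The protocol marker C3PO-r2d2-POE matches none of the generic rules and ends up in "unclassified". This is one reason to keep raw evidence reachable instead of discarding whatever the first filter does not recognize.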
The key improvement comes from imposing a structured workflow on the analysis. Instead of letting the agent interact with tools in an open-ended way, the workflow organizes the process into explicit stages with concrete intermediate artifacts. Raw evidence is collected first, then filtered and grouped, and then turned into hypotheses, structure maps, and plans for deeper inspection. As new evidence emerges, the agent can revise earlier assumptions and update the case state, while continuing from the artifacts produced so far.
To support this, the pipeline uses a persistent on-disk case directory for every sample. After each major step, artifacts, notes, ranked observations, hypotheses, plans, and reports are updated and stored. In effect, this directory becomes the agent’s externalized working memory. If the context window gets compacted, the agent can reread prior artifacts and continue from the current state. This also makes the workflow inspectable and reproducible, since intermediate results are no longer hidden inside transient model context.
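The resume behavior can be sketched with a single state file. In the example below, the stage names mirror the workflow list above. The `state.json` layout and the `run_stage` callback are our own assumptions; the companion repository stores its state in dedicated Markdown artifacts instead. Progress is saved after every stage, so a fresh session skips whatever is already done.

```python
import json
import tempfile
from pathlib import Path

# Stage names mirror the workflow list above; the state-file layout is hypothetical.
STAGES = ["fingerprint", "filter_group", "hypotheses", "structure_map",
          "deep_plan", "deep_analysis"]


def load_state(case_dir: Path) -> dict:
    path = case_dir / "state.json"
    if path.exists():
        return json.loads(path.read_text(encoding="utf-8"))
    return {"done": [], "notes": {}}


def save_state(case_dir: Path, state: dict) -> None:
    # Write to a temp file first so a crash never leaves a half-written state.json.
    tmp = case_dir / "state.json.tmp"
    tmp.write_text(json.dumps(state, indent=2), encoding="utf-8")
    tmp.replace(case_dir / "state.json")


def run_case(case_dir: Path, run_stage, stop_after=None) -> dict:
    """Run the remaining stages in order, persisting after each one."""
    case_dir.mkdir(parents=True, exist_ok=True)
    state = load_state(case_dir)
    for stage in STAGES:
        if stage in state["done"]:
            continue                            # finished in an earlier session
        state["notes"][stage] = run_stage(stage, state)
        state["done"].append(stage)
        save_state(case_dir, state)
        if stage == stop_after:                 # simulate a context reset mid-analysis
            break
    return state


calls = []


def fake_stage(stage, state):
    """Stand-in for the real agent work done in one stage."""
    calls.append(stage)
    return f"{stage} finished after {len(state['done'])} earlier stages"


with tempfile.TemporaryDirectory() as d:
    case = Path(d) / "000-mfc42ul.dll"
    run_case(case, fake_stage, stop_after="hypotheses")   # first session ends early
    final = run_case(case, fake_stage)                     # a fresh session resumes
print(final["done"])
```

Each stage runs exactly once across both sessions. The second session reconstructs everything it needs from disk rather than from conversation history.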
We implemented this pipeline as a portable analysis environment that combines helper scripts, a reusable orchestration skill, and MCP-connected disassembler tooling into a single workflow. The agent runs inside a dedicated Docker container defined by the Dockerfile, with access to the available tools, scripts, and MCP servers. The workflow is packaged as a malware-analysis-orchestrator skill for Claude Code together with a matching Codex variant, and can use either our Binary Ninja Headless MCP server or Ghidra Headless MCP server for static analysis. The setup is designed to remain generic across agents, binary formats, and analysis backends. The full pipeline is available on GitHub.
With that in place, we can rerun the same mfc42ul.dll sample under the structured workflow and compare the result to the earlier analysis.
Evaluation
In the following, we focus on the Claude Code runs, since they produced the clearest write-up for this section. The runtime remained in roughly the same range as before. Codex improved in the same direction once the workflow was enabled as well, but we omit its output here because it would not change the overall conclusion.
Before turning to the results, let us briefly recall what is present in mfc42ul.dll. Some features—such as screenshot capture, Skype interception, persistence, and proxy handling—were relatively easy to identify from strings, API imports, and a shallow code pass. However, the sample also contains deeper logic and embedded data that are much harder to recover, including process-aware activation, command dispatching, protocol details, hardcoded C2 information, and the statically embedded AES key. These are exactly the parts where the structured workflow has to prove its value.
To evaluate this, we slightly expand the earlier query and explicitly instruct the agent to use the malware-analysis-orchestrator skill:
❯ Analyze the malware in examples/samples/mfc42ul.dll -- Give me a detailed
overview of the sample's functionality and features, together with the
corresponding code locations. Use the skill
/agent/agent_helpers/claude/skills/malware-analysis-orchestrator/ for
analysis.
After roughly 17 minutes, the workflow completes its first pass. The agent populates all 13 required case files, identifies 13 functional components, records 10 behavior hypotheses, and assigns HIGH confidence to 8 of them. Even before any dedicated deep-validation step, the resulting overview is already materially richer than the earlier triage-oriented summary. To keep the presentation manageable, we show a slightly shortened version of the first-pass output below:
● Analysis complete. All 13 artifacts populated. Here is the detailed overview:
1. Process-Aware Initialization (sub_1000f590 @ 0x1000f590)
2. Persistence — AppInit_DLLs (sub_1000d680 @ 0x1000d680)
3. Persistence — Kernel Driver Service (sub_10003a40 @ 0x10003a40)
6. C2 Communication (sub_10006790 @ 0x10006790)
9. Skype VoIP Interception (sub_10007720 @ 0x10007720)
14. Central Orchestrator (sub_1000ecc0 @ 0x1000ecc0)
All artifacts are in status/000-mfc42ul.dll/. The analysis identified
13 functional components, 10 behavior hypotheses (8 HIGH confidence),
56 capa capabilities, and produced comprehensive IOCs for detection engineering.
Already at this stage, the workflow no longer stops at visible features alone. It begins to group functionality into components, map those components to specific code regions, and point toward the sample’s internal control structure. That is already a qualitatively different kind of result.
The difference is not just that the workflow finds more. It also writes much more than a short final summary. Instead, the run produces a detailed set of intermediate artifacts for hypotheses, component mapping, planning, prioritization, and reporting. The following files are just a few examples of the case state the workflow yields:
-
05_behavior_hypotheses.mdrecords the current hypotheses, the supporting evidence, their confidence, and the checks that are still missing. -
08_deep_analysis_plan.mdturns those hypotheses into concrete follow-up tasks, with target functions, goals, and expected findings. -
09_priority_queue.mdtracks which open questions matter most, why they matter, and what depends on resolving them. -
10_reporting_draft.mdbrings the current findings, code references, IOCs, and remaining gaps into a draft report that the agent can return to later.
From a higher-level perspective, the earlier run compares to the structured workflow as follows:
| Aspect | Earlier run | Structured workflow |
|---|---|---|
| Runtime | ~15 min | ~17 min |
| Output form | High-level triage summary | Structured case directory with 13 required files |
| Deeper logic recovery | Limited | Materially better |
| Follow-up readiness | Weak | Strong |
| Persistent state | No | Yes |
Most importantly, the first pass no longer leaves us with just a short summary. Because it produces ranked hypotheses and a persistent case directory, we can continue directly from that state rather than restarting the investigation. We therefore ask the agent to validate the 8 HIGH-confidence hypotheses. Roughly 30 minutes later, the workflow produces a much deeper second pass:
Validation Results
H1: Full-Featured Surveillance RAT — VALIDATED
- Discovered complete command dispatch vtable at 0x1003d7d0 with 10 remotely-callable commands
- Mapped orchestrator vtable at 0x1003d7f8 with C2 data loop, heartbeat (byte 0x26), fingerprinting
H3: Process-Aware Module Activation — VALIDATED
- Confirmed GetModuleFileNameA + case-insensitive path matching in sub_10003990
- Full init chain: DllMain → process detect → module alloc → orchestrator
H6: C2 with AES-128 Encryption — VALIDATED
- Protocol vtable at 0x1003d330: encrypt+send, recv+decrypt, raw I/O via WinSock send()/recv()
- Encrypt: sub_10012130, Decrypt: sub_10013260, Key expansion: sub_10014410
- Wire format: C3PO-r2d2-POE → 4-byte length → AES-128-ECB encrypted payload
Three improvements stand out here. First, the workflow recovers the internal command dispatch rather than stopping at the sample’s visible high-level capabilities. Second, it validates the process-aware activation logic and reconstructs the initialization chain from DllMain into the orchestrator. Third, it begins to recover the protocol layer itself, including the protocol vtable, the send and receive paths, and the cryptographic routines that are used together with the C2 communication.
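To make the reported wire format concrete, here is a minimal parser for a single frame. It follows the layout the agent reported: marker, 4-byte length, then AES-ECB ciphertext. We add two assumptions of our own: the length is little-endian, and it counts only payload bytes. Decryption is deliberately left out. Because the agent's crypto details turned out to be partly wrong, as discussed below, any such parser should be checked against the send path in the binary.

```python
import struct

MARKER = b"C3PO-r2d2-POE"       # protocol marker reported by the agent
AES_BLOCK = 16


def parse_frame(buf: bytes):
    """Split one frame into (ciphertext, remaining bytes).

    Assumes: marker | uint32 length (little-endian, payload bytes only) | payload.
    Byte order and length semantics are assumptions, not confirmed details.
    """
    if not buf.startswith(MARKER):
        raise ValueError("missing protocol marker")
    header = len(MARKER) + 4
    if len(buf) < header:
        raise ValueError("truncated header")
    (length,) = struct.unpack_from("<I", buf, len(MARKER))
    if len(buf) < header + length:
        raise ValueError("truncated payload")
    if length % AES_BLOCK:
        raise ValueError("ECB ciphertext must be a multiple of 16 bytes")
    return buf[header:header + length], buf[header + length:]


# Synthetic frame: marker, length 32, two all-zero "ciphertext" blocks, trailing data.
frame = MARKER + struct.pack("<I", 32) + bytes(32) + b"next"
ciphertext, rest = parse_frame(frame)
print(len(ciphertext), rest)
```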
A closer look at the recovered command-dispatch table shows the kind of internal structure that the deeper pass can uncover:
| Vtable Index | Function | Opcode | Purpose |
|---|---|---|---|
| [0] | sub_1000d270 | — | Driver drop/installation |
| [1] | sub_1000e2b0 | — | Spawn monitoring thread |
| [2] | sub_1000e210 | — | Driver cleanup (IOCTL 0xb2012) |
| [3] | sub_1000d680 | — | AppInit_DLLs persistence write |
| [4] | sub_1000db50 | — | Self-update via named pipe |
| [5] | sub_1000e120 | — | System reboot (EWX_REBOOT \| EWX_FORCE) |
| [6] | sub_1000e370 | — | Config update (“%d %d %d %d %d %d”) |
| [7] | sub_1000d470 | 0x23 | Download-and-execute (%TEMP%~tmp%08x~.exe) |
| [8] | sub_10006710 | 0x27 | Enable + send command |
| [9] | sub_10006780 | — | Disable (clear flag at obj+0x20) |
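Mechanically, recovering such a table means reading consecutive 32-bit pointers from the section that holds it. In practice the agent does this through the disassembler's API. The pure-Python sketch below shows the same idea on raw bytes. Both the section base and the section contents are synthetic: we pack the handler addresses from the table above into an otherwise empty buffer purely to exercise the function.

```python
import struct


def read_vtable(section: bytes, section_va: int, table_va: int, count: int) -> list:
    """Read `count` 32-bit little-endian function pointers starting at table_va.

    `section` holds the raw bytes of the section mapped at `section_va`
    (for example, the read-only data section of a 32-bit PE at its image base).
    """
    off = table_va - section_va
    if off < 0 or off + 4 * count > len(section):
        raise ValueError("table lies outside the section")
    return list(struct.unpack_from(f"<{count}I", section, off))


# Synthetic demo: handler addresses from the table above, packed at a made-up base.
handlers = [0x1000d270, 0x1000e2b0, 0x1000e210, 0x1000d680, 0x1000db50,
            0x1000e120, 0x1000e370, 0x1000d470, 0x10006710, 0x10006780]
section_va = 0x1003d000                   # hypothetical section base for this demo
section = bytearray(0x1000)
struct.pack_into("<10I", section, 0x7d0, *handlers)
table = read_vtable(bytes(section), section_va, 0x1003d7d0, 10)
print([f"sub_{p:08x}" for p in table])
```

With ten 4-byte entries, the table starting at 0x1003d7d0 ends exactly at 0x1003d7f8, the address the agent reported for the orchestrator vtable. The two tables are therefore adjacent in memory.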
Taken together, this is a substantial step up from the earlier runs. The workflow does not just recover a longer list of visible features; it begins to expose the malware’s internal organization: how capabilities are dispatched, how activation depends on the host process, and where protocol handling and cryptographic routines sit in relation to the C2 communication. At this point, the agent already leaves behind a much more useful working model of the sample than a conventional first-pass triage.
The result is still not perfect. In this run, the agent did not recover the hardcoded C2 server or the embedded AES key, and it reported AES-128-ECB even though the sample actually uses AES-256-ECB. However, these misses should be interpreted in context. The workflow did identify the relevant protocol path, the wire format, the use of ECB mode, and the cryptographic routines involved in the C2 communication. In other words, it did not miss the protocol layer entirely; its reconstruction remained incomplete at some of the finer details. At the same time, the run remains non-deterministic: a rerun may recover additional details, emphasize different parts of the sample, or make different mistakes. The result should therefore not be treated as a final report, but as an intermediate state.
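A mix-up between AES-128 and AES-256 is the kind of detail an analyst can cross-check cheaply. The sketch below encodes the FIPS-197 relationships between key size, round count, and expanded-key size. Whether the key-expansion routine (sub_10014410 in this run) exposes such constants directly, for example as a loop bound or a buffer size, is an assumption to verify in the disassembler. The helper names are our own.

```python
# FIPS-197 parameters: Nk 32-bit key words and Nr rounds per key size.
AES_PARAMS = {128: (4, 10), 192: (6, 12), 256: (8, 14)}


def aes_sizes(key_bits: int) -> dict:
    nk, nr = AES_PARAMS[key_bits]
    return {
        "key_bytes": 4 * nk,                # raw key length
        "rounds": nr,                       # main cipher loop count
        "schedule_bytes": 16 * (nr + 1),    # Nr + 1 round keys of 16 bytes each
    }


def key_bits_from(*, rounds=None, schedule_bytes=None) -> int:
    """Infer the key size from a loop bound or buffer size seen in the code."""
    for bits in AES_PARAMS:
        s = aes_sizes(bits)
        if s["rounds"] == rounds or s["schedule_bytes"] == schedule_bytes:
            return bits
    raise ValueError("not a standard AES parameter")


for bits in AES_PARAMS:
    print(bits, aes_sizes(bits))
```

For example, a 240-byte expanded key or a 14-round loop points to AES-256, whereas AES-128 would show 176 bytes and 10 rounds.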
Even with these gaps, the improvement is substantial. The structured workflow pushes the agent well beyond loose triage and leaves behind a much stronger intermediate state: a case directory with evidence, ranked hypotheses, priorities, and partially validated internal structure. In practice, that can save an analyst hours to days of early-stage work. Instead of spending most of the time collecting obvious clues and organizing first impressions, we can focus on validating deeper findings, resolving the remaining unknowns, and correcting the places where the agent is still wrong.
Because the workflow preserves explicit case state, follow-up analysis also becomes much sharper. The analyst can guide the agent toward a specific hypothesis, subsystem, or unresolved question, and the agent can continue from the accumulated evidence rather than starting over. It can also annotate the analysis database itself—for example by renaming symbols, attaching notes, and marking relevant functions—so that the human analyst can continue from a better-structured and better-documented starting point.
Closing Remarks
As we’ve seen, agents are already changing reverse engineering in a significant way. In this post, even the unstructured run was already useful as a first triage: it recovered visible functionality, pointed to relevant code locations, and produced concrete leads for follow-up. The real improvement, however, came from giving the agent a structured workflow, persistent case state, and the right analysis tooling. This also shows where agents are already particularly strong: scaling the mechanical parts of analysis, collecting evidence, correlating artifacts, generating scripts, and producing a first structured understanding of a binary far faster than a human could.
And we are still early. Models, tools, and agentic workflows are improving quickly, and the current generation already shows how much repetitive reverse-engineering work can be automated. Over time, this will likely push human analysts away from manually performing every step themselves and more toward guiding workflows, validating results, refining assumptions, and deciding where deeper analysis is needed.
But that does not make human expertise obsolete. Reverse engineering remains an adversarial task: binaries may be unusual, intentionally deceptive, or heavily obfuscated. In those settings, a plausible answer is not the same as a correct one. Agents can help explore and accelerate, but they still struggle when the analysis requires novel reasoning, strong validation, or a change in strategy.
Obfuscation is a good example. Even agentic workflows that perform well on triage and structured analysis can still break down once strong control-flow obfuscation, virtualization, custom packing, or non-standard data encodings enter the picture. At that point, the challenge shifts from simply driving tools to understanding how the protections work, testing hypotheses against the binary, and designing the right custom tooling or deobfuscation workflow. An LLM can help with that process, but the direction still has to come from the analyst.
In that sense, agents do not remove the need for reverse-engineering expertise; they raise the bar for where that expertise matters most. And if there is one area where this becomes especially visible, it is deobfuscation. For analysts who have to deal with protected or obfuscated binaries in practice, I cover many of these workflows in my hands-on training Software Deobfuscation Techniques.