Practical MBA Deobfuscation with msynth

What does it compute? You will see, it’s not very complicated; it is (x + y) + (x + z).

What we can observe here is one of the most sophisticated code obfuscation techniques which is known as mixed Boolean-Arithmetic (MBA). MBAs arithmetically encode semantically simple operations in complex arithmetic expressions to hide the underlying computations. In general, it is extremely difficult to simplify such expressions, since there are endless possibilities to construct them. Nevertheless, recent research advanced in breaking certain instances of MBAs by using algebraic simplifications, program synthesis, machine learning and the exploitation of mathematical characteristics. However, one of the most generic and powerful attacks on MBAs is known as QSynth (official framework: QSynthesis) and has been introduced by Robin David and Luigi Coniglio. Earlier this year, I published my MBA deobfuscation framework msynth which incorporates similar ideas; it is specifically designed to work on real-world binaries and can simplify most MBAs found in the wild.

From this blog post, you will get a better understanding of the ideas behind msynth and learn how to use it to simplify MBA expressions in real-world binaries. Who knows, it may assist your day-to-day malware analysis and code deobfuscation tasks or just give you the ability to play around with MBA deobfuscation.

Simplification Approach

Given a complex expression like the one above, our goal is to find a shorter expression that exhibits the same input-output behavior. In our case, (x + y) + (x + z) is such a simpler expression: No matter which inputs we assign to x, y and z, both expressions always compute to the same results. To find such expressions, intuitively, we want to identify parts of the complex expression—so-called subexpressions—and repeatedly replace them with simpler representations. However, to keep this process efficient, we first collect a large number of relationships between complex and simple expressions. In short, our simplification approach uses two components:

The database can be understood as a pre-computed lookup table mapping input-output behavior to the smallest possible expression. For example, if we know that the input tuple (10, 20) evaluates to 30 (written as (10, 20) -> 30) and (10, 10) evaluates to 20 (written as (10, 10) -> 20), then we can map this behavior to an addition of two variables, for example x + y. However, we would not map this behavior to an expression such as x + y + 5 - 5, since it is not the shortest representation.

The simplification algorithm recursively walks over all subexpressions–from the largest to the shortest—and evaluates the subexpression’s input-output behavior. Afterward, we query the database whether a shorter entry exists for this behavior. If so, we replace the current subexpression with the one from the database and stop; otherwise, we continue and go to the next subexpression. We repeat this process, until we cannot simplify any more subexpressions.

If this sounds confusing, don’t worry: Let us illustrate the interplay between algorithm & database and gradually simplify the expression above. First, let us assume we have the following lookup table, in which each input-output pair uses an input tuple of length 3 and in which the expression can use the variables x, y and z:

(10, 20, 30) -> 30: x + y
(10, 20, 30) -> 40: x + z

The database contains two entries; one entry maps an input vector to x + y, another one to x + z.

To simplify the complex expression, we now derive all of its subexpressions, sort them by their lengths in descending order and iteratively walk over the following list:

((x ^ y) + (2 * (x & y))) + ((x ^ z) + (2 * (x & z)))
(x ^ y) + (2 * (x & y))
(x ^ z) + (2 * (x & z))
2 * (x & y)
2 * (x & z)
x ^ y
x & y
x ^ z
x & z
x
y
z

Starting with the original expression, we evaluate it with (10, 20, 30) and obtain 70. Since (10, 20, 30) -> 70 is not in our database, we continue with the next subexpression, (x ^ y) + (2 * (x & y)). This time, it evaluates to 30, giving us (10, 20, 30) -> 30. This entry maps to x + y. Therefore, we replace the subexpression in the complex expression with x + y, which leads us to:

(x + y) + ((x ^ z) + (2 * (x & z)))

Then, we re-start the algorithm, now beginning with the shorter expression. Once we reach the subexpression (x ^ z) + (2 * (x & z)), we evaluate it to (10, 20, 30) -> 40. If we perform a database lookup for this entry, we learn that it is equivalent to x + z. As a consequence, we again replace the subexpression and obtain:

In a final run, we re-start the process one last time. However, since we cannot simplify any further subexpressions, the algorithm terminates.

This example illustrates the deobfuscation process quite well. However, to be usable in real-world scenarios, we have to take care of a few things:

In Practice

Let us now first have a look at how we can use msynth to simplify arbitrary MBAs. Then, we combine it with symbolic execution to break an MBA on the binary level. Afterward, we examine how msynth can be used to deobfuscate MBAs in malware samples. Finally, we see that msynth can even be useful in analysis scenarios where the code is not necessarily obfuscated.

MBA Simplification

To simplify an MBA expression, we need msynth’s Simplifier class which has to be initialized with a file path to the pre-computed database. Then, we can simplify an expression in Miasm IR by calling simplify. Overall, this only takes three lines of code:

from msynth import Simplifier

# initialize simplifier
simplifier = Simplifier(oracle_path)
# simplify expression
simplified = simplifier.simplify(expression)

If we add some more lines, we turn the snippet into a fully-working script that simplifies our MBA from before (a similar script which can be used for own experiments can be found here):

from miasm.expression.expression import *
from msynth import Simplifier

def mba(size):
    # ((x ^ y) + (2 * (x & y))) + ((x ^ z) + (2 * (x & z)))
    x = ExprId("x", size)
    y = ExprId("y", size)
    z = ExprId("z", size)
    return ((x ^ y) + (ExprInt(2, size) * (x & y))) + ((x ^ z) + (ExprInt(2, size) * (x & z)))

# path to pre-computed lookup table
oracle_path = "./oracle.pickle"
# initialize simplifier
simplifier = Simplifier(oracle_path)
# simplify expression
simplified = simplifier.simplify(mba(32))

# print results
print(f"original: {mba(32)}")
print(f"simplified: {simplified}")

original: (x ^ y) + 0x2 * (x & y) + (x ^ z) + 0x2 * (x & z)
simplified: y + z + x * 0x2

msynth does not only simplify the MBA, the resulting expression is even shorter than (x + y) + (x + z). This is due to Miasm’s symbolic execution engine which simplifies x + x to x * 0x2. If we aim at simplifying more complex MBAs, we can easily modify the MBA functions and for example receive:

original: ((x ^ 0xFFFFFFFF) | z) + -(((((((0x1 + z + -0x1) | (x ^ 0xFFFFFFFF)) + -(x ^ 0xFFFFFFFF)) ^ 0xFFFFFFFF) & z) + z + (z & (z ^ 0xFFFFFFFF))) ^ 0xFFFFFFFF) + (x & ((x & y) + (x & y) + (x ^ y))) + (x & ((x & y) + (x & y) + (x ^ y))) + (x ^ ((x & y) + (x & y) + (x ^ y))) + 0x1 + ((((y + ((y ^ 0xFFFFFFFF) & z)) | (y + ((y ^ 0xFFFFFFFF) & z) + (((y + ((y ^ 0xFFFFFFFF) & z)) ^ 0xFFFFFFFF) & z))) + -(y & (y + ((y ^ 0xFFFFFFFF) & z)))) ^ 0xFFFFFFFF) + --x + ((y + ((y ^ 0xFFFFFFFF) & z) + (((y + ((y ^ 0xFFFFFFFF) & z)) ^ 0xFFFFFFFF) & z)) & (z ^ 0xFFFFFFFF)) + ((x ^ 0xFFFFFFFF) | z) + -(((((((0x1 + z + -0x1) | (x ^ 0xFFFFFFFF)) + -(x ^ 0xFFFFFFFF)) ^ 0xFFFFFFFF) & z) + z + (z & (z ^ 0xFFFFFFFF))) ^ 0xFFFFFFFF) + (x & ((x & y) + (x & y) + (x ^ y))) + (x & ((x & y) + (x & y) + (x ^ y))) + (x ^ ((x & y) + (x & y) + (x ^ y))) + 0x1 + ((((y + ((y ^ 0xFFFFFFFF) & z)) | (y + ((y ^ 0xFFFFFFFF) & z) + (((y + ((y ^ 0xFFFFFFFF) & z)) ^ 0xFFFFFFFF) & z))) + -(y & (y + ((y ^ 0xFFFFFFFF) & z)))) ^ 0xFFFFFFFF) + --x + ((y + ((y ^ 0xFFFFFFFF) & z) + (((y + ((y ^ 0xFFFFFFFF) & z)) ^ 0xFFFFFFFF) & z)) & (z ^ 0xFFFFFFFF)) + ((x ^ 0xFFFFFFFF) | z) + -(((((((0x1 + z + -0x1) | (x ^ 0xFFFFFFFF)) + -(x ^ 0xFFFFFFFF)) ^ 0xFFFFFFFF) & z) + z + (z & (z ^ 0xFFFFFFFF))) ^ 0xFFFFFFFF) + (x & ((x & y) + (x & y) + (x ^ y))) + (x & ((x & y) + (x & y) + (x ^ y))) + (x ^ ((x & y) + (x & y) + (x ^ y))) + 0x1 + ((((y + ((y ^ 0xFFFFFFFF) & z)) | (y + ((y ^ 0xFFFFFFFF) & z) + (((y + ((y ^ 0xFFFFFFFF) & z)) ^ 0xFFFFFFFF) & z))) + -(y & (y + ((y ^ 0xFFFFFFFF) & z)))) ^ 0xFFFFFFFF) + --x + ((y + ((y ^ 0xFFFFFFFF) & z) + (((y + ((y ^ 0xFFFFFFFF) & z)) ^ 0xFFFFFFFF) & z)) & (z ^ 0xFFFFFFFF))
simplified: z * 0x3 + (x + y) * 0x6

MBA Challenge

Since we know how to simplify MBAs, we go a step further and combine it with Miasm’s symbolic execution engine. For this, we rely on the script symbolic_simplification.py from msynth’s repository as an example. Provided with a binary, an address and a path to the pre-computed database, the script symbolically executes a single basic block at the given address and simplifies all derived expressions. To use it, we have to call it as follows:

python symbolic_simplification.py <binary> <address> <path to database>

Let us try this on an artificial MBA challenge which is inspired by a commercial protector. In particular, let us simplify a function-level MBA which returns the result in rax and uses rdi, rsi and rdx as inputs:

00001200  89f1               mov     ecx, esi
00001202  f7d1               not     ecx
00001204  21d1               and     ecx, edx
00001206  01f1               add     ecx, esi
00001208  4189c8             mov     r8d, ecx
0000120b  21ce               and     esi, ecx
0000120d  41f7d0             not     r8d
00001210  8d043e             lea     eax, [rsi+rdi]
00001213  4121d0             and     r8d, edx
00001216  f7d2               not     edx
00001218  4101c8             add     r8d, ecx
0000121b  4421c2             and     edx, r8d
0000121e  4409c1             or      ecx, r8d
00001221  01d0               add     eax, edx
00001223  29c8               sub     eax, ecx
00001225  c3                 retn     {__return_addr}

uint64_t mba3(int32_t arg1, int32_t arg2, int32_t arg3) {

    int32_t rcx_2 = (not.d(arg2) & arg3) + arg2
    int32_t r8_3 = (not.d(rcx_2) & arg3) + rcx_2
    return zx.q((arg2 & rcx_2) + arg1 + (not.d(arg3) & r8_3) - (rcx_2 | r8_3))
}

From looking at the expression, we don’t get any clue what it computes. However, if we symbolically execute it with Miasm and simplify it with msynth, we get the following output (for the return value in rax):

before: {(((((((RSI[0:32] ^ 0xFFFFFFFF) & RDX[0:32]) + RSI[0:32]) ^ 0xFFFFFFFF) & RDX[0:32]) + ((RSI[0:32] ^ 0xFFFFFFFF) & RDX[0:32]) + RSI[0:32]) & (RDX[0:32] ^ 0xFFFFFFFF)) + -(((((((RSI[0:32] ^ 0xFFFFFFFF) & RDX[0:32]) + RSI[0:32]) ^ 0xFFFFFFFF) & RDX[0:32]) + ((RSI[0:32] ^ 0xFFFFFFFF) & RDX[0:32]) + RSI[0:32]) | (((RSI[0:32] ^ 0xFFFFFFFF) & RDX[0:32]) + RSI[0:32])) + (RDI + {(((RSI[0:32] ^ 0xFFFFFFFF) & RDX[0:32]) + RSI[0:32]) & RSI[0:32] 0 32, 0x0 32 64})[0:32] 0 32, 0x0 32 64}


[snip]

simplified: {-RDX[0:32] + RDI[0:32] + RSI[0:32] 0 32, 0x0 32 64}

We clearly see that the the simplified code performs an addition of rdi and rsi and subtracts rdx. However, these operations are only performed on the register’s lower 32-bit (denoted by [0:32]); thus, the final result is extended (with zeros) to 64-bits. If we want to clean up the decompiler output, we could rewrite it as follows:

uint64_t mba3(int32_t arg1, int32_t arg2, int32_t arg3) {
    return zx.q(arg1 + arg2 - arg3)
}

Malware Analysis

Leaving the artificial examples behind, let us now use msynth to support us in the process of malware analysis. Often, malware is obfuscated with frameworks such as Obfuscator-LLVM, which commonly deploy MBAs. For example, the FinSpy sample—which we already used in a prior blog post—uses MBAs to to hide data movements based on the conditional move instruction cmov. If we can simplify the MBA, we know whether the data will be conditionally moved or not.

To deobfuscate MBAs within a basic block, we call the script as we did in the example before:

$ python symbolic_simplification.py samples/finspy 0x4035af oracle.pickle

After we symbolically executed the basic block at 0x4035af and simplified it with msynth, we get the following output:

before: (((@32[@64[0x614F40]][31:32]?({@32[@64[0x614F40]] 0 32, 0xFFFFFFFF 32 64},{@32[@64[0x614F40]] 0 32, 0x0 32 64})) * ((@32[@64[0x614F40]] + 0xFFFFFFFF)[31:32]?({@32[@64[0x614F40]] + 0xFFFFFFFF 0 32, 0xFFFFFFFF 32 64},{@32[@64[0x614F40]] + 0xFFFFFFFF 0 32, 0x0 32 64})))[0:32] & 0x1)?({(((({((@32[@64[0x614F90]] ^ (@32[@64[0x614F90]] + 0xFFFFFFF6)) & (@32[@64[0x614F90]] ^ 0xA))[31:32] ^ (@32[@64[0x614F90]] + 0xFFFFFFF6)[31:32] 0 1, 0x0 1 8} ^ 0x1) | 0x1) ^ 0x1) & 0x1) | {((@32[@64[0x614F90]] ^ (@32[@64[0x614F90]] + 0xFFFFFFF6)) & (@32[@64[0x614F90]] ^ 0xA))[31:32] ^ (@32[@64[0x614F90]] + 0xFFFFFFF6)[31:32] 0 1, 0x0 1 8} 0 8, 0xFFFFFF 8 64},{({((@32[@64[0x614F90]] ^ (@32[@64[0x614F90]] + 0xFFFFFFF6)) & (@32[@64[0x614F90]] ^ 0xA))[31:32] ^ (@32[@64[0x614F90]] + 0xFFFFFFF6)[31:32] 0 1, 0x0 1 8} ^ 0x1) | {((@32[@64[0x614F90]] ^ (@32[@64[0x614F90]] + 0xFFFFFFF6)) & (@32[@64[0x614F90]] ^ 0xA))[31:32] ^ (@32[@64[0x614F90]] + 0xFFFFFFF6)[31:32] 0 1, 0x0 1 8} 0 8, 0xFFFFFF 8 64})

[snip]

simplified: 0xFFFFFF01

While the original symbolic expression is long and complex, msynth simplifies it to the constant 0xFFFFFF01. In another example, we can simplify a similarly complex expression to the constant 0x0:

before: ((((@32[@64[0x614F40]][31:32]?({@32[@64[0x614F40]] 0 32, 0xFFFFFFFF 32 64},{@32[@64[0x614F40]] 0 32, 0x0 32 64})) * ((@32[@64[0x614F40]] + 0xFFFFFFFF)[31:32]?({@32[@64[0x614F40]] + 0xFFFFFFFF 0 32, 0xFFFFFFFF 32 64},{@32[@64[0x614F40]] + 0xFFFFFFFF 0 32, 0x0 32 64})))[0:32] & 0x1)?(((((({((@32[@64[0x614F90]] ^ (@32[@64[0x614F90]] + 0xFFFFFFF6)) & (@32[@64[0x614F90]] ^ 0xA))[31:32] ^ (@32[@64[0x614F90]] + 0xFFFFFFF6)[31:32] 0 1, 0x0 1 8} ^ 0x1) | 0x1) ^ 0x1) & 0x1) | {((@32[@64[0x614F90]] ^ (@32[@64[0x614F90]] + 0xFFFFFFF6)) & (@32[@64[0x614F90]] ^ 0xA))[31:32] ^ (@32[@64[0x614F90]] + 0xFFFFFFF6)[31:32] 0 1, 0x0 1 8}) & 0x1,(({((@32[@64[0x614F90]] ^ (@32[@64[0x614F90]] + 0xFFFFFFF6)) & (@32[@64[0x614F90]] ^ 0xA))[31:32] ^ (@32[@64[0x614F90]] + 0xFFFFFFF6)[31:32] 0 1, 0x0 1 8} ^ 0x1) | {((@32[@64[0x614F90]] ^ (@32[@64[0x614F90]] + 0xFFFFFFF6)) & (@32[@64[0x614F90]] ^ 0xA))[31:32] ^ (@32[@64[0x614F90]] + 0xFFFFFFF6)[31:32] 0 1, 0x0 1 8}) & 0x1))?(0x0,0x1)

[snip]

simplified: 0x0

Complex Code

Sometimes we have to analyze code which is not necessarily obfuscated but so complex, for example due to aggressive compiler optimizations, that we don’t have any clue of what is going on. In these cases, msynth can also support us in our analysis by removing at least some complexity in the expressions.

As an example, let us use the Windows kernel ntoskrnl.exe which we already used in a previous blog post to detect PatchGuard. If we take a closer look at PatchGuard’s initialization function, we find a repetitive pattern that is rather complex. This pattern might be guarded by MBAs, but it might also be just a piece of weird code whereof we currently can’t determine the functionality, given our current level of knowledge. However, even in this case, msynth allows us to massively simplify the expression:

before: {((({(({RDI 0 64, 0x0 64 128} * {((RAX | (RDX << 0x20)) >>> 0x3) ^ (RAX | (RDX << 0x20)) 0 64, 0x0 64 128})[0:32] ^ ({RDI 0 64, 0x0 64 128} * {((RAX | (RDX << 0x20)) >>> 0x3) ^ (RAX | (RDX << 0x20)) 0 64, 0x0 64 128})[64:96]) >> 0xF 0 32, 0x0 32 64} * 0x4EC4EC4F)[32:64] >> 0x3)[31:32]?({({(({RDI 0 64, 0x0 64 128} * {((RAX | (RDX << 0x20)) >>> 0x3) ^ (RAX | (RDX << 0x20)) 0 64, 0x0 64 128})[0:32] ^ ({RDI 0 64, 0x0 64 128} * {((RAX | (RDX << 0x20)) >>> 0x3) ^ (RAX | (RDX << 0x20)) 0 64, 0x0 64 128})[64:96]) >> 0xF 0 32, 0x0 32 64} * 0x4EC4EC4F)[32:64] >> 0x3 0 32, 0xFFFFFFFF 32 64} * 0x1A,{({(({RDI 0 64, 0x0 64 128} * {((RAX | (RDX << 0x20)) >>> 0x3) ^ (RAX | (RDX << 0x20)) 0 64, 0x0 64 128})[0:32] ^ ({RDI 0 64, 0x0 64 128} * {((RAX | (RDX << 0x20)) >>> 0x3) ^ (RAX | (RDX << 0x20)) 0 64, 0x0 64 128})[64:96]) >> 0xF 0 32, 0x0 32 64} * 0x4EC4EC4F)[32:64] >> 0x3 0 32, 0x0 32 64} * 0x1A))[0:32] 0 32, 0x0 32 64}

[snip]

simplified: {(({(({RDI 0 64, 0x0 64 128} * {((RAX | (RDX << 0x20)) >>> 0x3) ^ (RAX | (RDX << 0x20)) 0 64, 0x0 64 128})[0:32] ^ ({RDI 0 64, 0x0 64 128} * {((RAX | (RDX << 0x20)) >>> 0x3) ^ (RAX | (RDX << 0x20)) 0 64, 0x0 64 128})[64:96]) >> 0xF 0 32, 0x0 32 64} * 0x4EC4EC4F)[32:64] >> 0x3) * 0x1A 0 32, 0x0 32 64}

Although the simplified expression is still complex and doesn’t tell us much without further analysis, the expression is significantly better to read due to its eliminated subexpressions.

Overall, the aforementioned scenarios are only example usages for msynth. In practice, you may stumble upon a wide variety of MBAs or other complex expressions of which you suspect shorter representations to exist. Try it on your own; msynth does not solve all of your problems, but it may be a useful tool that supports your analysis.

Closing Remarks

I wrote msynth with the goal to create a stable and usable MBA deobfuscation framework that supports me in my day-to-day reverse engineering tasks; it has been used in audits of various obfuscation schemes as well as in the analysis of some malware samples; furthermore, my students use it regularly in my code deobfuscation training to break sophisticated protection schemes such as virtual machines with MBA-obfuscated handlers. msynth isn’t perfect (especially in terms of performance for very large expressions), but it will improve over time. Furthermore, it is a playground for me (and hopefully others) to test and develop novel deobfuscation techniques which will be integrated over time. So, have fun with it and stay tuned!

Contact

For questions, feel free to reach out via Twitter @mr_phrazer, mail tim@blazytko.to or various other channels.