11 Nov 2021 - Tim Blazytko
Let’s say we see the following arithmetic expression:
((x ^ y) + (2 * (x & y))) + ((x ^ z) + (2 * (x & z)))
What does it compute? You will see, it’s not very complicated; it is (x + y) + (x + z)
.
What we can observe here is one of the most sophisticated code obfuscation techniques which is known as mixed Boolean-Arithmetic (MBA). MBAs arithmetically encode semantically simple operations in complex arithmetic expressions to hide the underlying computations. In general, it is extremely difficult to simplify such expressions, since there are endless possibilities to construct them. Nevertheless, recent research advanced in breaking certain instances of MBAs by using algebraic simplifications, program synthesis, machine learning and the exploitation of mathematical characteristics. However, one of the most generic and powerful attacks on MBAs is known as QSynth (official framework: QSynthesis) and has been introduced by Robin David and Luigi Coniglio. Earlier this year, I published my MBA deobfuscation framework msynth which incorporates similar ideas; it is specifically designed to work on real-world binaries and can simplify most MBAs found in the wild.
From this blog post, you will get a better understanding of the ideas behind msynth and learn how to use it to simplify MBA expressions in real-world binaries. Who knows, it may assist your day-to-day malware analysis and code deobfuscation tasks or just give you the ability to play around with MBA deobfuscation.
Given a complex expression like the one above, our goal is to find a shorter expression that exhibits the same input-output behavior. In our case, (x + y) + (x + z)
is such a simpler expression: No matter which inputs we assign to x
, y
and z
, both expressions always compute to the same results. To find such expressions, intuitively, we want to identify parts of the complex expression—so-called subexpressions—and repeatedly replace them with simpler representations. However, to keep this process efficient, we first collect a large number of relationships between complex and simple expressions. In short, our simplification approach uses two components:
The database can be understood as a pre-computed lookup table mapping input-output behavior to the smallest possible expression. For example, if we know that the input tuple (10, 20)
evaluates to 30
(written as (10, 20) -> 30
) and (10, 10)
evaluates to 20
(written as (10, 10) -> 20
), then we can map this behavior to an addition of two variables, for example x + y
. However, we would not map this behavior to an expression such as x + y + 5 - 5
, since it is not the shortest representation.
The simplification algorithm recursively walks over all subexpressions–from the largest to the shortest—and evaluates the subexpression’s input-output behavior. Afterward, we query the database whether a shorter entry exists for this behavior. If so, we replace the current subexpression with the one from the database and stop; otherwise, we continue and go to the next subexpression. We repeat this process, until we cannot simplify any more subexpressions.
If this sounds confusing, don’t worry: Let us illustrate the interplay between algorithm & database and gradually simplify the expression above. First, let us assume we have the following lookup table, in which each input-output pair uses an input tuple of length 3 and in which the expression can use the variables x
, y
and z
:
(10, 20, 30) -> 30: x + y
(10, 20, 30) -> 40: x + z
The database contains two entries; one entry maps an input vector to x + y
, another one to x + z
.
To simplify the complex expression, we now derive all of its subexpressions, sort them by their lengths in descending order and iteratively walk over the following list:
((x ^ y) + (2 * (x & y))) + ((x ^ z) + (2 * (x & z)))
(x ^ y) + (2 * (x & y))
(x ^ z) + (2 * (x & z))
2 * (x & y)
2 * (x & z)
x ^ y
x & y
x ^ z
x & z
x
y
z
Starting with the original expression, we evaluate it with (10, 20, 30)
and obtain 70
. Since (10, 20, 30) -> 70
is not in our database, we continue with the next subexpression, (x ^ y) + (2 * (x & y))
. This time, it evaluates to 30
, giving us (10, 20, 30) -> 30
. This entry maps to x + y
. Therefore, we replace the subexpression in the complex expression with x + y
, which leads us to:
(x + y) + ((x ^ z) + (2 * (x & z)))
Then, we re-start the algorithm, now beginning with the shorter expression. Once we reach the subexpression (x ^ z) + (2 * (x & z))
, we evaluate it to (10, 20, 30) -> 40
. If we perform a database lookup for this entry, we learn that it is equivalent to x + z
. As a consequence, we again replace the subexpression and obtain:
(x + y) + (x + z)
In a final run, we re-start the process one last time. However, since we cannot simplify any further subexpressions, the algorithm terminates.
This example illustrates the deobfuscation process quite well. However, to be usable in real-world scenarios, we have to take care of a few things:
If the pre-computed database contains only a handful of entries, simplification will not be very effective. In general we can say: The more entries the pre-computed database contains, the higher the success rate. To derive more database entries, we can perform an exhaustive search over all possible combinations of variables and mathematical operations up to a certain upper bound. Additionally we can generate random expressions of higher complexity and also add them to the database.
It is error-prone to use only a single input-output pair for database lookups. Consider the following example: The input-output pair (2, 2) -> 4
maps to more than one expression; it holds for x + y
, x + x
, x * y
and others, which all have the same length. To circumvent this and increase the accuracy, we have to use a higher number of input-output pairs. In practice, numbers between 10 and 30 are sufficient. While higher numbers further increase the accuracy, they slow down the simplification process enormously. To exclude false positives entirely, we can use an SMT solver to check if the subexpression and the matching expression from the database are equivalent.
Expressions on the binary level often use up/downcasts to realize, for example, 32-bit operations on 64-bit architectures. To simplify such arithmetic constructs, we have to include type casts in our pre-computed lookup tables.
With that in mind, let us now dive into msynth and use it to break some MBAs.
msynth is an MBA deobfuscation framework built on top of the Miasm reverse engineering framework, which we already used in my last blog post on VM-based deobfuscation. It ships with a pre-computed database (referred to as oracle) including common constants and type casts. To simplify MBAs in binaries, msynth can be combined with Miasm’s symbolic execution engine to derive expressions from assembly code. Example scripts for this can be found in msynth’s repository and in the repository of one of my code deobfuscation workshops.
Let us now first have a look at how we can use msynth to simplify arbitrary MBAs. Then, we combine it with symbolic execution to break an MBA on the binary level. Afterward, we examine how msynth can be used to deobfuscate MBAs in malware samples. Finally, we see that msynth can even be useful in analysis scenarios where the code is not necessarily obfuscated.
To simplify an MBA expression, we need msynth’s Simplifier
class which has to be initialized with a file path to the pre-computed database. Then, we can simplify an expression in Miasm IR by calling simplify
. Overall, this only takes three lines of code:
from msynth import Simplifier
# initialize simplifier
simplifier = Simplifier(oracle_path)
# simplify expression
simplified = simplifier.simplify(expression)
If we add some more lines, we turn the snippet into a fully-working script that simplifies our MBA from before (a similar script which can be used for own experiments can be found here):
from miasm.expression.expression import *
from msynth import Simplifier
def mba(size):
# ((x ^ y) + (2 * (x & y))) + ((x ^ z) + (2 * (x & z)))
x = ExprId("x", size)
y = ExprId("y", size)
z = ExprId("z", size)
return ((x ^ y) + (ExprInt(2, size) * (x & y))) + ((x ^ z) + (ExprInt(2, size) * (x & z)))
# path to pre-computed lookup table
oracle_path = "./oracle.pickle"
# initialize simplifier
simplifier = Simplifier(oracle_path)
# simplify expression
simplified = simplifier.simplify(mba(32))
# print results
print(f"original: {mba(32)}")
print(f"simplified: {simplified}")
The output looks as follows:
original: (x ^ y) + 0x2 * (x & y) + (x ^ z) + 0x2 * (x & z)
simplified: y + z + x * 0x2
msynth does not only simplify the MBA, the resulting expression is even shorter than (x + y) + (x + z)
. This is due to Miasm’s symbolic execution engine which simplifies x + x
to x * 0x2
. If we aim at simplifying more complex MBAs, we can easily modify the MBA functions and for example receive:
original: ((x ^ 0xFFFFFFFF) | z) + -(((((((0x1 + z + -0x1) | (x ^ 0xFFFFFFFF)) + -(x ^ 0xFFFFFFFF)) ^ 0xFFFFFFFF) & z) + z + (z & (z ^ 0xFFFFFFFF))) ^ 0xFFFFFFFF) + (x & ((x & y) + (x & y) + (x ^ y))) + (x & ((x & y) + (x & y) + (x ^ y))) + (x ^ ((x & y) + (x & y) + (x ^ y))) + 0x1 + ((((y + ((y ^ 0xFFFFFFFF) & z)) | (y + ((y ^ 0xFFFFFFFF) & z) + (((y + ((y ^ 0xFFFFFFFF) & z)) ^ 0xFFFFFFFF) & z))) + -(y & (y + ((y ^ 0xFFFFFFFF) & z)))) ^ 0xFFFFFFFF) + --x + ((y + ((y ^ 0xFFFFFFFF) & z) + (((y + ((y ^ 0xFFFFFFFF) & z)) ^ 0xFFFFFFFF) & z)) & (z ^ 0xFFFFFFFF)) + ((x ^ 0xFFFFFFFF) | z) + -(((((((0x1 + z + -0x1) | (x ^ 0xFFFFFFFF)) + -(x ^ 0xFFFFFFFF)) ^ 0xFFFFFFFF) & z) + z + (z & (z ^ 0xFFFFFFFF))) ^ 0xFFFFFFFF) + (x & ((x & y) + (x & y) + (x ^ y))) + (x & ((x & y) + (x & y) + (x ^ y))) + (x ^ ((x & y) + (x & y) + (x ^ y))) + 0x1 + ((((y + ((y ^ 0xFFFFFFFF) & z)) | (y + ((y ^ 0xFFFFFFFF) & z) + (((y + ((y ^ 0xFFFFFFFF) & z)) ^ 0xFFFFFFFF) & z))) + -(y & (y + ((y ^ 0xFFFFFFFF) & z)))) ^ 0xFFFFFFFF) + --x + ((y + ((y ^ 0xFFFFFFFF) & z) + (((y + ((y ^ 0xFFFFFFFF) & z)) ^ 0xFFFFFFFF) & z)) & (z ^ 0xFFFFFFFF)) + ((x ^ 0xFFFFFFFF) | z) + -(((((((0x1 + z + -0x1) | (x ^ 0xFFFFFFFF)) + -(x ^ 0xFFFFFFFF)) ^ 0xFFFFFFFF) & z) + z + (z & (z ^ 0xFFFFFFFF))) ^ 0xFFFFFFFF) + (x & ((x & y) + (x & y) + (x ^ y))) + (x & ((x & y) + (x & y) + (x ^ y))) + (x ^ ((x & y) + (x & y) + (x ^ y))) + 0x1 + ((((y + ((y ^ 0xFFFFFFFF) & z)) | (y + ((y ^ 0xFFFFFFFF) & z) + (((y + ((y ^ 0xFFFFFFFF) & z)) ^ 0xFFFFFFFF) & z))) + -(y & (y + ((y ^ 0xFFFFFFFF) & z)))) ^ 0xFFFFFFFF) + --x + ((y + ((y ^ 0xFFFFFFFF) & z) + (((y + ((y ^ 0xFFFFFFFF) & z)) ^ 0xFFFFFFFF) & z)) & (z ^ 0xFFFFFFFF))
simplified: z * 0x3 + (x + y) * 0x6
Since we know how to simplify MBAs, we go a step further and combine it with Miasm’s symbolic execution engine. For this, we rely on the script symbolic_simplification.py
from msynth’s repository as an example. Provided with a binary, an address and a path to the pre-computed database, the script symbolically executes a single basic block at the given address and simplifies all derived expressions. To use it, we have to call it as follows:
python symbolic_simplification.py <binary> <address> <path to database>
Let us try this on an artificial MBA challenge which is inspired by a commercial protector. In particular, let us simplify a function-level MBA which returns the result in rax
and uses rdi
, rsi
and rdx
as inputs:
00001200 89f1 mov ecx, esi
00001202 f7d1 not ecx
00001204 21d1 and ecx, edx
00001206 01f1 add ecx, esi
00001208 4189c8 mov r8d, ecx
0000120b 21ce and esi, ecx
0000120d 41f7d0 not r8d
00001210 8d043e lea eax, [rsi+rdi]
00001213 4121d0 and r8d, edx
00001216 f7d2 not edx
00001218 4101c8 add r8d, ecx
0000121b 4421c2 and edx, r8d
0000121e 4409c1 or ecx, r8d
00001221 01d0 add eax, edx
00001223 29c8 sub eax, ecx
00001225 c3 retn {__return_addr}
Let’s also have a look at its decompilation:
uint64_t mba3(int32_t arg1, int32_t arg2, int32_t arg3) {
int32_t rcx_2 = (not.d(arg2) & arg3) + arg2
int32_t r8_3 = (not.d(rcx_2) & arg3) + rcx_2
return zx.q((arg2 & rcx_2) + arg1 + (not.d(arg3) & r8_3) - (rcx_2 | r8_3))
}
From looking at the expression, we don’t get any clue what it computes. However, if we symbolically execute it with Miasm and simplify it with msynth, we get the following output (for the return value in rax
):
before: {(((((((RSI[0:32] ^ 0xFFFFFFFF) & RDX[0:32]) + RSI[0:32]) ^ 0xFFFFFFFF) & RDX[0:32]) + ((RSI[0:32] ^ 0xFFFFFFFF) & RDX[0:32]) + RSI[0:32]) & (RDX[0:32] ^ 0xFFFFFFFF)) + -(((((((RSI[0:32] ^ 0xFFFFFFFF) & RDX[0:32]) + RSI[0:32]) ^ 0xFFFFFFFF) & RDX[0:32]) + ((RSI[0:32] ^ 0xFFFFFFFF) & RDX[0:32]) + RSI[0:32]) | (((RSI[0:32] ^ 0xFFFFFFFF) & RDX[0:32]) + RSI[0:32])) + (RDI + {(((RSI[0:32] ^ 0xFFFFFFFF) & RDX[0:32]) + RSI[0:32]) & RSI[0:32] 0 32, 0x0 32 64})[0:32] 0 32, 0x0 32 64}
[snip]
simplified: {-RDX[0:32] + RDI[0:32] + RSI[0:32] 0 32, 0x0 32 64}
We clearly see that the the simplified code performs an addition of rdi
and rsi
and subtracts rdx
. However, these operations are only performed on the register’s lower 32-bit (denoted by [0:32]
); thus, the final result is extended (with zeros) to 64-bits. If we want to clean up the decompiler output, we could rewrite it as follows:
uint64_t mba3(int32_t arg1, int32_t arg2, int32_t arg3) {
return zx.q(arg1 + arg2 - arg3)
}
Leaving the artificial examples behind, let us now use msynth to support us in the process of malware analysis. Often, malware is obfuscated with frameworks such as Obfuscator-LLVM, which commonly deploy MBAs. For example, the FinSpy sample—which we already used in a prior blog post—uses MBAs to to hide data movements based on the conditional move instruction cmov
. If we can simplify the MBA, we know whether the data will be conditionally moved or not.
To deobfuscate MBAs within a basic block, we call the script as we did in the example before:
$ python symbolic_simplification.py samples/finspy 0x4035af oracle.pickle
After we symbolically executed the basic block at 0x4035af
and simplified it with msynth, we get the following output:
before: (((@32[@64[0x614F40]][31:32]?({@32[@64[0x614F40]] 0 32, 0xFFFFFFFF 32 64},{@32[@64[0x614F40]] 0 32, 0x0 32 64})) * ((@32[@64[0x614F40]] + 0xFFFFFFFF)[31:32]?({@32[@64[0x614F40]] + 0xFFFFFFFF 0 32, 0xFFFFFFFF 32 64},{@32[@64[0x614F40]] + 0xFFFFFFFF 0 32, 0x0 32 64})))[0:32] & 0x1)?({(((({((@32[@64[0x614F90]] ^ (@32[@64[0x614F90]] + 0xFFFFFFF6)) & (@32[@64[0x614F90]] ^ 0xA))[31:32] ^ (@32[@64[0x614F90]] + 0xFFFFFFF6)[31:32] 0 1, 0x0 1 8} ^ 0x1) | 0x1) ^ 0x1) & 0x1) | {((@32[@64[0x614F90]] ^ (@32[@64[0x614F90]] + 0xFFFFFFF6)) & (@32[@64[0x614F90]] ^ 0xA))[31:32] ^ (@32[@64[0x614F90]] + 0xFFFFFFF6)[31:32] 0 1, 0x0 1 8} 0 8, 0xFFFFFF 8 64},{({((@32[@64[0x614F90]] ^ (@32[@64[0x614F90]] + 0xFFFFFFF6)) & (@32[@64[0x614F90]] ^ 0xA))[31:32] ^ (@32[@64[0x614F90]] + 0xFFFFFFF6)[31:32] 0 1, 0x0 1 8} ^ 0x1) | {((@32[@64[0x614F90]] ^ (@32[@64[0x614F90]] + 0xFFFFFFF6)) & (@32[@64[0x614F90]] ^ 0xA))[31:32] ^ (@32[@64[0x614F90]] + 0xFFFFFFF6)[31:32] 0 1, 0x0 1 8} 0 8, 0xFFFFFF 8 64})
[snip]
simplified: 0xFFFFFF01
While the original symbolic expression is long and complex, msynth simplifies it to the constant 0xFFFFFF01
. In another example, we can simplify a similarly complex expression to the constant 0x0
:
before: ((((@32[@64[0x614F40]][31:32]?({@32[@64[0x614F40]] 0 32, 0xFFFFFFFF 32 64},{@32[@64[0x614F40]] 0 32, 0x0 32 64})) * ((@32[@64[0x614F40]] + 0xFFFFFFFF)[31:32]?({@32[@64[0x614F40]] + 0xFFFFFFFF 0 32, 0xFFFFFFFF 32 64},{@32[@64[0x614F40]] + 0xFFFFFFFF 0 32, 0x0 32 64})))[0:32] & 0x1)?(((((({((@32[@64[0x614F90]] ^ (@32[@64[0x614F90]] + 0xFFFFFFF6)) & (@32[@64[0x614F90]] ^ 0xA))[31:32] ^ (@32[@64[0x614F90]] + 0xFFFFFFF6)[31:32] 0 1, 0x0 1 8} ^ 0x1) | 0x1) ^ 0x1) & 0x1) | {((@32[@64[0x614F90]] ^ (@32[@64[0x614F90]] + 0xFFFFFFF6)) & (@32[@64[0x614F90]] ^ 0xA))[31:32] ^ (@32[@64[0x614F90]] + 0xFFFFFFF6)[31:32] 0 1, 0x0 1 8}) & 0x1,(({((@32[@64[0x614F90]] ^ (@32[@64[0x614F90]] + 0xFFFFFFF6)) & (@32[@64[0x614F90]] ^ 0xA))[31:32] ^ (@32[@64[0x614F90]] + 0xFFFFFFF6)[31:32] 0 1, 0x0 1 8} ^ 0x1) | {((@32[@64[0x614F90]] ^ (@32[@64[0x614F90]] + 0xFFFFFFF6)) & (@32[@64[0x614F90]] ^ 0xA))[31:32] ^ (@32[@64[0x614F90]] + 0xFFFFFFF6)[31:32] 0 1, 0x0 1 8}) & 0x1))?(0x0,0x1)
[snip]
simplified: 0x0
Sometimes we have to analyze code which is not necessarily obfuscated but so complex, for example due to aggressive compiler optimizations, that we don’t have any clue of what is going on. In these cases, msynth can also support us in our analysis by removing at least some complexity in the expressions.
As an example, let us use the Windows kernel ntoskrnl.exe
which we already used in a previous blog post to detect PatchGuard. If we take a closer look at PatchGuard’s initialization function, we find a repetitive pattern that is rather complex. This pattern might be guarded by MBAs, but it might also be just a piece of weird code whereof we currently can’t determine the functionality, given our current level of knowledge. However, even in this case, msynth allows us to massively simplify the expression:
before: {((({(({RDI 0 64, 0x0 64 128} * {((RAX | (RDX << 0x20)) >>> 0x3) ^ (RAX | (RDX << 0x20)) 0 64, 0x0 64 128})[0:32] ^ ({RDI 0 64, 0x0 64 128} * {((RAX | (RDX << 0x20)) >>> 0x3) ^ (RAX | (RDX << 0x20)) 0 64, 0x0 64 128})[64:96]) >> 0xF 0 32, 0x0 32 64} * 0x4EC4EC4F)[32:64] >> 0x3)[31:32]?({({(({RDI 0 64, 0x0 64 128} * {((RAX | (RDX << 0x20)) >>> 0x3) ^ (RAX | (RDX << 0x20)) 0 64, 0x0 64 128})[0:32] ^ ({RDI 0 64, 0x0 64 128} * {((RAX | (RDX << 0x20)) >>> 0x3) ^ (RAX | (RDX << 0x20)) 0 64, 0x0 64 128})[64:96]) >> 0xF 0 32, 0x0 32 64} * 0x4EC4EC4F)[32:64] >> 0x3 0 32, 0xFFFFFFFF 32 64} * 0x1A,{({(({RDI 0 64, 0x0 64 128} * {((RAX | (RDX << 0x20)) >>> 0x3) ^ (RAX | (RDX << 0x20)) 0 64, 0x0 64 128})[0:32] ^ ({RDI 0 64, 0x0 64 128} * {((RAX | (RDX << 0x20)) >>> 0x3) ^ (RAX | (RDX << 0x20)) 0 64, 0x0 64 128})[64:96]) >> 0xF 0 32, 0x0 32 64} * 0x4EC4EC4F)[32:64] >> 0x3 0 32, 0x0 32 64} * 0x1A))[0:32] 0 32, 0x0 32 64}
[snip]
simplified: {(({(({RDI 0 64, 0x0 64 128} * {((RAX | (RDX << 0x20)) >>> 0x3) ^ (RAX | (RDX << 0x20)) 0 64, 0x0 64 128})[0:32] ^ ({RDI 0 64, 0x0 64 128} * {((RAX | (RDX << 0x20)) >>> 0x3) ^ (RAX | (RDX << 0x20)) 0 64, 0x0 64 128})[64:96]) >> 0xF 0 32, 0x0 32 64} * 0x4EC4EC4F)[32:64] >> 0x3) * 0x1A 0 32, 0x0 32 64}
Although the simplified expression is still complex and doesn’t tell us much without further analysis, the expression is significantly better to read due to its eliminated subexpressions.
Overall, the aforementioned scenarios are only example usages for msynth. In practice, you may stumble upon a wide variety of MBAs or other complex expressions of which you suspect shorter representations to exist. Try it on your own; msynth does not solve all of your problems, but it may be a useful tool that supports your analysis.
I wrote msynth with the goal to create a stable and usable MBA deobfuscation framework that supports me in my day-to-day reverse engineering tasks; it has been used in audits of various obfuscation schemes as well as in the analysis of some malware samples; furthermore, my students use it regularly in my code deobfuscation training to break sophisticated protection schemes such as virtual machines with MBA-obfuscated handlers. msynth isn’t perfect (especially in terms of performance for very large expressions), but it will improve over time. Furthermore, it is a playground for me (and hopefully others) to test and develop novel deobfuscation techniques which will be integrated over time. So, have fun with it and stay tuned!
For questions, feel free to reach out via Twitter @mr_phrazer, mail tim@blazytko.to or various other channels.