During my exploration of malware analysis, I’ve found that malware developers use many tricks to make their code harder for us to analyze. Obfuscation techniques are among their main tools, and I can confidently say that you will find at least three or four of them in each of your analyses.
Their main goal, as you can imagine, is to make the malware’s code very difficult to understand. In this discussion, I want to share what I’ve learned along this tough journey.
But before we get into the details, let’s first explain what code obfuscation means.
Code obfuscation is a technique that transforms clear code into a complex, hard-to-understand form. The transformed code does the same thing, but it is much harder to read. The technique isn’t always bad: legitimate software makers sometimes use it to protect their work. In malicious software, however, obfuscation is used very frequently, with two goals:
- Evade detection by security software.
- Complicate the analysis process for cybersecurity experts.
In this article, we’ll explore how malware obfuscation works behind the scenes. We’ll discuss how these techniques work in practice, what actions you can take when you encounter obfuscated malware, and which clues help you spot it. As you read, keep in mind that understanding obfuscation is critical for anyone in cybersecurity, especially if your work or passion involves reverse engineering or malware analysis.
Cybercriminals employ these strategies to slip past defenses, and you, as a cybersecurity expert, need to decode and counter them in a cat-and-mouse game. This article offers a clear look into malware obfuscation, even though it’s only a high-level overview. I’ll try to keep the whole article as clear as possible, even for beginners.
So let’s start with the interesting part!
Summary of Common Malware Obfuscation Techniques
- Encoding: Uses an encoding scheme, such as Base64, to transform malicious code into a different format, making it harder for security tools and analysts to recognize the threat.
- Encryption: Transforms data into a format that is readable only with a decryption key. In this category I prefer to place malware that encrypts only strings, system API names, or shellcode.
- Packing: The malware is compressed or encrypted (in one or more sections) and combined with a stub that unpacks or decrypts it when executed. This process occurs in memory, evading static analysis tools that examine the code before execution.
- API Hashing: Uses hash functions (standard or custom) to obfuscate the API calls in the code, so that at first glance it is impossible to get an idea of the malware’s behavior.
- Dead Code: Inserts irrelevant or never-used code into the malware with the goal of confusing analysts and automated tools.
- Fooling the disassembler: This category includes everything that aims to trick the disassembling algorithms by strategically inserting bytes or instructions to deceive the disassembler’s parser, rendering the code unintelligible.
- Polymorphism: Alters parts of the malware’s code each time it replicates, so that its signature changes and detection is avoided.
- Metamorphism: Metamorphic malware completely rewrites its code in each iteration, drastically changing its appearance to evade detection.
In general, malware combines several of the techniques listed above to disguise its malicious activities. The bad guys’ creativity has no limits! For example, they can use encryption to convert parts of the malware into an unreadable format that is decrypted only at execution time.
Polymorphism and metamorphism take this concept further by changing the malware’s code in every iteration, so that signature-based detection systems can hardly spot it.
Due to their complexity and effectiveness, I’m considering writing much more about them in a separate discussion. In the following sections, we’ll delve deeper into all the methods in our list, so that by the end of the article we’ll be a bit more prepared to face the malware analysis challenge.
In-Depth Analysis of Malware Obfuscation Techniques
Encoding
Encoding evades basic or automated detection by transforming critical parts of the malware into an unfamiliar, hard-to-read format. Unlike encryption, encoding doesn’t require a key and is relatively easy to reverse once detected. Common methods include Base64 and Base32 encoding (sometimes with a custom alphabet).
The attacker might encode the shellcode, or the names of dynamically imported libraries, decoding them only when the program runs.
These actions, however, are detectable. An experienced analyst can often identify Base64 encoded patterns in the strings. The task is simplified by tools like PEStudio, which marks such strings as Base64.
Additionally, dynamic imports leave traces in the Import Address Table (IAT), evident in tools like PEStudio, as shown in the screenshot below.
To increase complexity, attackers can use an alphabet different from the standard one, even though analysts can uncover this without much effort. They can also convert the byte sequences into less suspicious forms, such as a series of IP addresses, to keep the entropy low.
These are just some examples, but they give you an idea of the versatility of encoding in malware obfuscation.
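To make the Base64 discussion above more concrete, here is a minimal Python sketch of how an analyst might hunt for Base64-looking strings and undo a custom alphabet. The function names, the regular expression, and the minimum length of 16 characters are my own assumptions for illustration; this is not how PEStudio or any other tool works internally.

```python
import base64
import binascii
import re

# The standard Base64 alphabet, used as the translation target.
STANDARD = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"

# Assumption: at least 16 Base64 characters, with optional padding at the end.
BASE64_RE = re.compile(
    r"(?:[A-Za-z0-9+/]{4}){4,}(?:[A-Za-z0-9+/]{2}==|[A-Za-z0-9+/]{3}=)?"
)


def find_base64_candidates(text):
    """Return (candidate, decoded_bytes) pairs for Base64-looking substrings."""
    results = []
    for match in BASE64_RE.finditer(text):
        candidate = match.group(0)
        try:
            decoded = base64.b64decode(candidate, validate=True)
        except (binascii.Error, ValueError):
            continue  # looked like Base64 but did not decode cleanly
        results.append((candidate, decoded))
    return results


def decode_custom_base64(data, alphabet):
    """Decode Base64 text that was produced with a non-standard alphabet."""
    table = str.maketrans(alphabet, STANDARD)
    return base64.b64decode(data.translate(table))
```

Keep in mind that ordinary long identifiers can also match the pattern, so every hit is only a candidate that still needs a human look.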
Encryption
Encryption is a common technique that malware developers use to conceal parts of their malware’s code, such as strings or shellcode. This obfuscation makes it challenging for malware analysts to identify Indicators of Compromise (IOCs) and for tools like strings or FLOSS to analyze the malware effectively.
Malware developers use encryption in particular to hide:
- File paths
- URLs or IP addresses
- Dynamically imported libraries (as we’ve seen for encoding)
By encrypting these elements, malware creators make it difficult to extract IOCs, complicating the analysis process.
Attackers often encrypt shellcode, another critical component, to evade static analysis security solutions that rely on code signatures. Encryption allows malware to bypass these solutions by making the code appear benign or random.
Encryption Analysis
A key concept in detecting encryption is entropy, which is simply a measure of randomness in a byte sequence.
High entropy is a clue that encryption may be present. To avoid detection, malware developers typically encrypt only critical parts of their code.
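To make the entropy idea concrete, here is a minimal Python sketch that computes the Shannon entropy of a byte sequence. The function name is my own choice for illustration. The result ranges from 0 (a single repeated byte) to 8 bits per byte (all byte values equally likely); where exactly "high" begins is a judgment call that depends on the file.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Return the Shannon entropy of data in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)  # how many times each byte value occurs
    return -sum((c / total) * math.log2(c / total) for c in counts.values())
```

In practice you would call it on each section of the file, or on sliding windows, so that a small encrypted blob is not averaged away by the rest of the executable.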
However, the code always has to be decrypted before it can execute. This introduces a challenge for malware creators: how to store or generate the decryption key?
They might use static storage, algorithmic generation, or external retrieval. From the analyst’s point of view, dynamic analysis is often the most effective way to recover the decrypted code: it involves intercepting the output of the decryption function. However, malware developers continually experiment with new methods to hinder analysis, such as re-encrypting data after use.
The encryption algorithms range from complex ones like AES to simple substitution ciphers. Basic ciphers (like XOR) remain in use despite their simplicity because they are easy to implement, even directly in assembly. Another advantage is that there is no need to import any suspicious cryptography library. Even the more sophisticated obfuscation techniques, however, are not foolproof.
As the example below shows, suspicious cryptographic imports can appear in the IAT.
Although this method is closely related to “Packing,” I prefer to categorize it as “Encryption” because the malware encrypts only specific strings or bytes of the code, causing just a minimal impact on the entropy. As we’ve seen for encoding, this approach can leave meaningless strings in the PEStudio strings section (when any are present at all, since encryption can produce non-printable characters).
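The XOR cipher mentioned above is simple enough to sketch in a few lines of Python. This is only an illustration of a repeating-key XOR, with a function name and sample values I made up; real samples vary the key length, the key storage, and often add extra steps.

```python
from itertools import cycle


def xor_bytes(data: bytes, key: bytes) -> bytes:
    """XOR every byte of data with a repeating key.

    XOR is its own inverse, so the same call both encrypts and decrypts.
    """
    if not key:
        raise ValueError("key must not be empty")
    return bytes(b ^ k for b, k in zip(data, cycle(key)))


# Hypothetical example: hiding a URL string with a two-byte key.
hidden = xor_bytes(b"http://example.test", b"\x5a\x13")
recovered = xor_bytes(hidden, b"\x5a\x13")
```

Because applying the same key twice returns the original bytes, an analyst who recovers the key, or simply brute-forces a one-byte key, gets the plaintext back immediately.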
Packing
What Is Packing?
Packing is another technique that complicates malware analysis, particularly static examination, where the code has yet to be unpacked. Packers were originally designed for compression or encryption, but they are now a common tool for malware creators. As a consequence of packing, the Portable Executable (PE) file has an obfuscated Import Address Table (IAT), and its text section contains pseudo-random bytes that, for the most part, are not real instructions.
A packed malware file consists of an unpacking routine and the packed code. The unpacking stub, which replaces the original entry point, decrypts or decompresses the actual malware code and then transfers control to the malware’s original entry point (OEP).
Packing Tools
There are well-known packers like UPX and MPRESS; however, malware developers often customize them or create their own so that unpacking becomes harder. The analysis consists of finding the jump instruction to the packed section and setting breakpoints on long jumps that can potentially go to the OEP (Original Entry Point).
I also want to mention Scylla, a tool often used with OllyDbg, which gives essential help in repairing the IAT after manual unpacking.
Packing Indicators
Packing, like encryption, significantly impacts the entropy of malware, serving as a strong indicator of its presence. High entropy levels in a packed file suggest that the content is compressed or encrypted, making it less recognizable and more challenging to analyze.
A key hallmark of packing is the discrepancy between:
- The virtual size (the size when loaded into memory)
- The raw size (the actual size on disk) of the malware.
Packed malware typically has a smaller raw size compared to its virtual size, because the compressed content on disk expands when it is unpacked in memory.
This size difference is a critical signal that analysts use to identify packed malware, prompting a deeper investigation to unpack and analyze the underlying code.
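The size comparison described above can be sketched in Python as follows. The Section type, the function name, the ratio threshold of 4.0, and the sample values are all assumptions made for illustration; in a real workflow the numbers would come from a PE parser or from a tool like PEStudio.

```python
from typing import List, NamedTuple


class Section(NamedTuple):
    name: str
    virtual_size: int  # size once loaded into memory
    raw_size: int      # size of the section data on disk


def suspicious_sections(sections: List[Section], ratio_threshold: float = 4.0) -> List[str]:
    """Return names of sections whose in-memory size dwarfs their on-disk size."""
    flagged = []
    for section in sections:
        if section.raw_size == 0:
            # Nothing on disk but space reserved in memory.
            if section.virtual_size > 0:
                flagged.append(section.name)
        elif section.virtual_size / section.raw_size >= ratio_threshold:
            flagged.append(section.name)
    return flagged


# Hypothetical values resembling a UPX-style layout.
sample = [
    Section("UPX0", 0x20000, 0),
    Section("UPX1", 0x10000, 0xF000),
    Section(".rsrc", 0x1000, 0x1000),
]
print(suspicious_sections(sample))
```

A flagged section is only a hint: sections holding uninitialized data legitimately have no bytes on disk, so the result should be combined with entropy and the other indicators in this section.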
Another indicator of packing is the presence of just a few readable strings. We can check them with PEStudio, FLOSS, or our tool!
API Hashing
API hashing is a method malware authors use to obfuscate their code. Instead of invoking system functions by name, which would give clues to security analysts, the malware stores a unique identifier, or hash, for each function. This approach lets the malware bypass direct imports, which no longer appear in the IAT.
API Hashing Steps
Here’s how the process unfolds when the malware executes:
- Get the Loaded Modules: From the PEB, the malware enumerates the loaded modules, which are listed in the PEB_LDR_DATA structure.
- Locating the Module: From the retrieved module list, the malware obtains the HMODULE of each module, which is its base address in memory. This important step enables the malware to interact directly with the module’s contents and structure.
- Enumerating Modules and Calculating Hashes: The malware walks through the modules currently loaded in the process memory and computes a hash of each module name. This lets it stealthily identify specific modules without referring to their names directly, so that it can evade detection based on string matching.
- Identify the Target Module: Upon finding a match in its array of precomputed hashes, the malware identifies the relevant module. The next step involves computing the correct offset within this module to access the IMAGE_EXPORT_DIRECTORY, which is fundamental for mapping function names to their addresses.
- Function Name Hash Matching: Through its AddressOfNames field, the IMAGE_EXPORT_DIRECTORY points to an array listing the names of the functions exported by the module. The malware iterates through this array, hashing each function name and comparing the result against its list of target function hashes.
- Dynamic Function Invocation: Once the malware identifies a function whose hash matches its target, it uses the index stored in AddressOfNameOrdinals to locate the corresponding function address in the array referenced by AddressOfFunctions. Once it obtains that address, it can dynamically invoke the function, executing what it was programmed for without using static references in its code.
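The matching logic in the last two steps can be sketched in Python. The rotate-right-by-13 additive hash below is just one commonly seen choice, picked here as an assumption; the article does not say which hash a given sample uses, and real malware frequently tweaks the constants. The export table is simulated with a plain dictionary of made-up addresses instead of real PE parsing.

```python
from typing import Dict, Optional


def ror32(value: int, count: int) -> int:
    """Rotate a 32-bit value right by count bits."""
    value &= 0xFFFFFFFF
    return ((value >> count) | (value << (32 - count))) & 0xFFFFFFFF


def api_hash(name: str) -> int:
    """Hash a function name: rotate the state right by 13, then add each byte."""
    state = 0
    for byte in name.encode("ascii"):
        state = (ror32(state, 13) + byte) & 0xFFFFFFFF
    return state


def resolve_by_hash(exports: Dict[str, int], target_hash: int) -> Optional[int]:
    """Return the address of the export whose name hashes to target_hash."""
    for name, address in exports.items():
        if api_hash(name) == target_hash:
            return address
    return None


# Hypothetical export table: names mapped to made-up addresses.
fake_exports = {"CreateFileA": 0x1000, "VirtualAlloc": 0x2000, "WriteFile": 0x3000}
wanted = api_hash("VirtualAlloc")  # the only value the sample needs to embed
```

From the analyst’s side, the same function is useful in reverse: hashing every export name of the common system libraries produces a lookup table that maps the constants found in the sample back to API names.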
Involved Structures
Here is the structure of PEB from Microsoft’s official documentation.
typedef struct _PEB {
    BYTE Reserved1[2];
    BYTE BeingDebugged;
    BYTE Reserved2[1];
    PVOID Reserved3[2];
    PPEB_LDR_DATA Ldr;
    PRTL_USER_PROCESS_PARAMETERS ProcessParameters;
    PVOID Reserved4[3];
    PVOID AtlThunkSListPtr;
    PVOID Reserved5;
    ULONG Reserved6;
    PVOID Reserved7;
    ULONG Reserved8;
    ULONG AtlThunkSListPtr32;
    PVOID Reserved9[45];
    BYTE Reserved10[96];
    PPS_POST_PROCESS_INIT_ROUTINE PostProcessInitRoutine;
    BYTE Reserved11[128];
    PVOID Reserved12[1];
    ULONG SessionId;
} PEB, *PPEB;
And this is the structure of IMAGE_EXPORT_DIRECTORY, from this documentation.
struct IMAGE_EXPORT_DIRECTORY {
    pub Characteristics: u32,
    pub TimeDateStamp: u32,
    pub MajorVersion: u16,
    pub MinorVersion: u16,
    pub Name: u32,
    pub Base: u32,
    pub NumberOfFunctions: u32,
    pub NumberOfNames: u32,
    pub AddressOfFunctions: u32,
    pub AddressOfNames: u32,
    pub AddressOfNameOrdinals: u32,
}
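As a small companion to the structure above, this Python sketch reads the 40 bytes of an IMAGE_EXPORT_DIRECTORY with the standard struct module. The field order follows the listing; the class and function names, and the idea of receiving the raw bytes as an argument, are my own assumptions. Locating those bytes inside a module is left out.

```python
import struct
from typing import NamedTuple

# Two u32, two u16, then seven u32, little-endian: 40 bytes in total.
EXPORT_DIRECTORY_FORMAT = "<IIHHIIIIIII"


class ExportDirectory(NamedTuple):
    characteristics: int
    time_date_stamp: int
    major_version: int
    minor_version: int
    name: int
    base: int
    number_of_functions: int
    number_of_names: int
    address_of_functions: int
    address_of_names: int
    address_of_name_ordinals: int


def parse_export_directory(raw: bytes) -> ExportDirectory:
    """Unpack the fixed-size export directory header from raw bytes."""
    size = struct.calcsize(EXPORT_DIRECTORY_FORMAT)
    if len(raw) < size:
        raise ValueError(f"need at least {size} bytes, got {len(raw)}")
    return ExportDirectory(*struct.unpack_from(EXPORT_DIRECTORY_FORMAT, raw))
```

The three Address fields are offsets relative to the module base (RVAs), so they have to be added to the base address before the arrays can be read.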
This process effectively obfuscates the malware’s activities, complicating the task for analysts, who must reverse-engineer the hashes to discern the malware’s operations. Through API hashing, malware maintains a low profile, which improves its ability to evade detection.
Dead Code
Dead code is another obfuscation strategy, in which malware developers fill their code with irrelevant or non-functional operations. This technique aims to confuse human analysts and automated analysis tools. Attackers can heavily slow down reverse engineering by filling the malware with useless instructions that are unrelated to the main flow. This tactic also inflates the size of the malware, which can provide clues for our analysis. With practice, you will become better and better at distinguishing and ignoring this kind of function.
Fooling the disassembler
I want to group under this category a set of techniques that strategically insert or change bytes and instructions in the executable file to deceive disassemblers. These techniques are carefully crafted to exploit the way disassemblers interpret binary code, leading them to produce incorrect or misleading output.
One such example, as detailed by SentinelOne, involves manipulating an ELF file’s program header to reference an area outside the actual file. This makes disassemblers like IDA Pro fail to parse the file, preventing the normal loading and analysis of the malware. Analysts are forced to fix the file’s bytes manually before proceeding, which adds a layer of complexity to their work.
Another technique, outlined in the marvelous book “Practical Malware Analysis”, employs a jump instruction with a constant condition. By designing a jump that always executes and is followed by tricky code, the author throws off the disassembler’s interpretation of the byte stream, and the result is an erroneous disassembly of the subsequent instructions.
These techniques show malware creators’ creativity and knowledge of the systems they target.
Polymorphism
Polymorphism is the ability of a malicious program to alter parts of its code or structure whenever it creates a copy of itself. This makes detection difficult for antivirus programs that rely on signature-based methods.
The concept of polymorphic malware emerged in the early 1990s; one of the first known examples is a virus called “1260”, written by Mark Washburn.
Polymorphic malware is built with a mutation engine that is responsible for the variations in the code. It applies changes without altering the core functionality. The changes can be of many types, such as encrypting different parts of the program or changing the order of instructions while ensuring the outcome remains the same. The mutation engine can be extremely complex, and the more advanced versions make the malware nearly unrecognizable from one iteration to the next.
This technique easily bypasses signature-based detection, and it is also challenging to spot with heuristic and behavior-based analysis. For these reasons, cybersecurity professionals must employ more adaptive methods and develop tools capable of detecting and deactivating polymorphic threats.
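A toy Python sketch can show why a signature computed over the file bytes fails against this idea. Here a harmless text string stands in for the body, a one-byte XOR key stands in for the mutation engine, and all the names are made up for illustration. Real engines also mutate the decryption routine itself, which this sketch does not attempt.

```python
import hashlib


def make_variant(body: bytes, key: int) -> bytes:
    """Store the key in the first byte, followed by the body XORed with it."""
    if not 1 <= key <= 255:
        raise ValueError("key must be between 1 and 255")
    return bytes([key]) + bytes(b ^ key for b in body)


def recover_body(variant: bytes) -> bytes:
    """Undo make_variant: read the key from the first byte and XOR the rest."""
    key = variant[0]
    return bytes(b ^ key for b in variant[1:])


body = b"the same harmless content in every generation"
first = make_variant(body, 0x21)
second = make_variant(body, 0x7F)

# The bytes on disk differ, so a hash-based signature differs as well.
first_signature = hashlib.sha256(first).hexdigest()
second_signature = hashlib.sha256(second).hexdigest()
```

Both variants decode to exactly the same content, which is why detection has to look at what the code does after decoding rather than at the stored bytes.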
Metamorphism
Metamorphic malware is an even more advanced step in the evolution of malicious software. The idea behind this technique is that the malware rewrites its entire codebase whenever it infects a new victim. We saw this technique for the first time in the late 1990s, with malware like ZMist (also known as Z0mbie’s Mistfall). Unlike polymorphism, the changes are not limited to some parts of the code: they result in a new and different version of the malware.
The main actor behind the changes is the metamorphic engine, which can accomplish the goal by:
- Changing instruction sequences
- Using different register allocations
- Modifying the logic flow of the malware.
Metamorphic engines have a high complexity level, since even after radical changes, the malware must keep its original behavior. This can involve sophisticated programming techniques like code permutation, garbage code insertion, and using different algorithmic approaches to achieve the same objectives.
This new kind of malware posed a big challenge to traditional detection methods, teaching us that staying up to date is a necessity in fighting the advanced techniques that malware developers introduce day by day.
Obfuscation vs Evasion
Knowing the distinction between obfuscation and evasion in malware can help you plan your tasks as an analyst.
- Obfuscation is the art of making malware’s code and intent hard to understand, using the methods listed earlier: encryption, which makes data unreadable, and encoding, which alters the data format. As we have seen, there’s also dead code, simple but effective at slowing down the analysis, and finally the more advanced techniques like polymorphism and metamorphism.
- Evasion is a set of techniques related more to interaction with the environment. It’s how malware identifies and reacts to being analyzed or detected. This could involve the malware recognizing it’s running in a virtual machine or a debugger (as we’ve seen in this article) and changing its behavior to appear benign or stopping its operations. Evasion techniques are about understanding and responding to the surroundings to stay undetected, allowing the malware to operate covertly for as long as possible.
In a nutshell: obfuscation is centered on the internal complexity of the malware and is mainly focused on tricking static analysis, while evasion is about external awareness, so that the malware can identify potential threats from security solutions and adapt. Evasion is more oriented toward hiding from dynamic analysis.
Tools and Strategies for Analyzing Obfuscated Malware
When facing obfuscated malware, analysts use various tools and strategies to reverse and understand the underlying code or intent. Common tools in the analyst’s toolbox, like PEiD and Detect It Easy, are a valid help for initial examinations because they point out the indicators that malware is likely obfuscated. In particular, PEStudio is useful for finding the main indicators of packing because it provides insights into the various sections and highlights the related parts within the executable.
Tools
Debugging tools are essential during a detailed dynamic analysis.
They allow analysts to step through the malware’s execution, observing its behavior in real time by stopping at the right moment. This hands-on approach is critical for understanding how the obfuscation is implemented. In almost all cases, a breakpoint in the right place reveals the deobfuscated data, because the process itself can only use it in that form.
When we talk about static analysis, disassemblers and decompilers like Ghidra and IDA Pro come into play. These powerful tools convert binary code into a more human-readable format (such as assembly language), enabling analysts to examine the structure and flow of the malware’s code, even if it has been obfuscated. A good, complete disassembler usually also provides many facilities such as flowcharts, syntax highlighting, and links.
I cannot avoid mentioning Process Hacker, which provides a dynamic view of the system’s processes. This can reveal activities that help us identify where to focus our efforts to understand how the malware tries to conceal its presence.
Strategies
There are several strategies we can adopt. For example, we can apply heuristic and behavior-based analysis, or use sandbox environments to observe how the malware operates in a controlled setting. By combining these tools and approaches, analysts can gradually deconstruct the malware’s obfuscation techniques.
In addition to the tools listed in the previous paragraphs, an analyst may rely on network monitoring solutions to reconstruct the malware’s communication patterns. Another promising strategy, often used with success lately, is the integration of artificial intelligence, which simplifies the detection of sophisticated obfuscated threats by identifying patterns that could escape a human.
Conclusion: The Critical Nature of Malware Obfuscation Expertise
Our journey through malware obfuscation techniques showed how complex new cyber threats are, and how working in cybersecurity requires enough curiosity to never get tired of learning new concepts. It can sound too hard, but I’m sure that with the right guide, you can find the entire process interesting and fun.
That’s the reason why I invite you to stay connected with us, in particular, to follow our blog and social media profiles. I’ll do my best to provide you with the best and most updated content about the latest trends, threats, and defense mechanisms.
I’ll put all my effort into making the content precise, technical, and practical but accessible at the same time. For any questions or suggestions, feel free to contact me on the contacts page, and I’ll be happy to answer or receive your feedback.
Stay tuned and see you at the next post!