The Windows API is a large, complex topic with decades of development history and design behind it. Although it is far too vast to cover in a single article, even cursory knowledge of it can improve your event analysis and basic malware analysis skills. Understanding how Windows works helps defenders better understand and defend against threats, know where attackers might be hiding, and identify improvements that limit attackers’ abilities.
This Windows technical deep dive will provide an overview of what the Windows API is, how and why executables use the API, and how to apply that knowledge to improve your defenses.
The Windows API: What Is It?
Simply put, the Windows API is a set of standard libraries developers can use to interact with the Windows operating system. These libraries are made up of functions, which are exposed (or exported) by various dynamic-link libraries (DLLs) throughout the operating system.
There are many advantages to providing an API like the Windows API, not least of which is abstraction. Abstraction is a common concept in Computer Science and programming, and it’s simple yet powerful. According to John V. Guttag, “The essence of abstractions is preserving information that is relevant in a given context, and forgetting information that is irrelevant in that context.”
A perfect example of abstraction is the OSI networking model, in which each layer of the model does not concern itself with the workings or content handled by the other layers. The Internet Protocol functions identically whether it is being transported via WiFi radio signals or Ethernet cables, and TCP functions identically whether carried by IPv4 or IPv6.
Another advantage of the Windows API is the standardization it provides. Because Microsoft has made the Windows API so comprehensive, developers have immediate access to a huge number of functions, and, just as importantly, they all have access to the same functions. This allows someone to design an application on one computer with reasonable assurance that it will work on another system. Thanks to backwards compatibility, there can even be an expectation that an application will work on different versions of the operating system (in theory…).
Windows Architecture: The Basics
Not all DLLs are created equal. In addition to the standard DLLs that make up the Windows API, there is a set of DLLs that make up the Native API. The Native API works in the same fashion as the Windows API, with a few key differences. It provides a smaller set of basic functions that are available before the Windows API is loaded, so any Windows components loaded before the Windows API, including the Windows API itself, are designed to use the Native API. The Native API is exposed by ntdll.dll, but most of its functions are undocumented, so native applications are usually developed only by Microsoft. ntdll.dll interfaces with ntoskrnl.exe, where most of the Native API functionality is actually implemented. The Windows kernel itself is implemented in ntoskrnl.exe, and Win32k.sys also provides an interface between user mode and kernel mode. Win32k.sys is a driver that contains kernel-level graphics functionality. Below the Windows kernel lies the Hardware Abstraction Layer (HAL), which is implemented in hal.dll. The HAL insulates the kernel and drivers from differences in the underlying hardware, so the same kernel can run on different platforms (notice “Abstraction” right there in the name).
For a more detailed description of the Windows architecture, refer to Windows Internals Part 1, 7th edition.
PEs, DLLs, Importing, and Exporting
One of the most commonly encountered file types on a Windows system is the EXE file. The structure of these files is known as the Portable Executable (PE) format. The PE format is actually used for many different file extensions (CPL, OCX, and SYS, to name a few), but the main focus of this article is the DLL file extension. While they share a file structure, EXE and DLL files typically differ slightly from each other. Besides a flag indicating that the file is a DLL, the main difference is that EXE files have very few exported functions, while DLL files have many. The difference between importing and exporting is fairly straightforward, but understanding it sheds light on why EXEs differ from DLLs.
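The DLL flag is the IMAGE_FILE_DLL bit (0x2000) in the Characteristics field of the COFF file header, which sits right after the "PE\0\0" signature. As a minimal sketch, assuming the whole file has already been read into memory and doing no validation beyond the two signatures, the flag can be read with nothing but the Python standard library; a library such as pefile does the same job far more robustly.

```python
import struct

IMAGE_FILE_DLL = 0x2000  # Characteristics bit set for DLL images


def pe_characteristics(data: bytes) -> int:
    """Return the COFF Characteristics field of a PE image held in memory."""
    if data[:2] != b"MZ":
        raise ValueError("missing MZ signature")
    # e_lfanew, the file offset of the PE signature, is a 32-bit value at 0x3C
    (e_lfanew,) = struct.unpack_from("<I", data, 0x3C)
    if data[e_lfanew:e_lfanew + 4] != b"PE\0\0":
        raise ValueError("missing PE signature")
    # The 20-byte COFF header follows the signature; Characteristics is its
    # last field, 18 bytes in (after Machine, NumberOfSections, TimeDateStamp,
    # PointerToSymbolTable, NumberOfSymbols, SizeOfOptionalHeader)
    (characteristics,) = struct.unpack_from("<H", data, e_lfanew + 4 + 18)
    return characteristics


def is_dll(data: bytes) -> bool:
    """True if the IMAGE_FILE_DLL flag is set, regardless of file extension."""
    return bool(pe_characteristics(data) & IMAGE_FILE_DLL)
```

Usage would be along the lines of `is_dll(open(path, "rb").read())`. Checking the flag rather than the extension matters because the extension is just part of the file name and can be anything.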
Importing a function is essentially asking Windows for the ability to use a specific function. Exporting a function gives other files access to the capabilities of that function, assuming they are able to locate it. For example, a file called executable.exe may want to initialize Windows Sockets (Winsock) via the WSAStartup function in the ws2_32.dll library. To use this function, the file must associate WSAStartup with itself through a process called linking. There are multiple ways to link functions to a file: static, dynamic, and runtime.
If executable.exe uses static linking, the actual code for WSAStartup would be copied into executable.exe when it is built, adding the functionality of WSAStartup directly to the file. With dynamic linking, executable.exe instead records information describing all the functions it needs the operating system to provide (WSAStartup in this example) in the PE file. This information is called the Import Table, and it has interesting applications in malware analysis, but more on that later.
executable.exe also has the option of runtime linking, in which functions or libraries are only loaded when they are needed. This is accomplished with two additional functions: LoadLibrary and GetProcAddress. First, LoadLibrary is called with the name of the library to be loaded (ws2_32.dll). Once the library is loaded, LoadLibrary returns a handle to ws2_32.dll, which is a reference to the loaded library. This handle, along with the name of the specific function (WSAStartup), can then be passed to GetProcAddress, which returns the memory address of that function. Because the functions used via runtime linking are specified during execution, they are not listed in one easy-to-read place. Malware authors can leverage this to hide which functions they use, thus hiding some of the malware’s capabilities.
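Runtime linking is not completely invisible, though. A common triage heuristic is to check whether a file imports GetProcAddress and a LoadLibrary variant, and then look for API names that appear as plain strings in the file without appearing in its Import Table. The sketch below assumes the imported function names have already been extracted (for example with pefile), and the watch list is a hypothetical input you would supply. It is only a hint: the byte search is a substring match, so short names can hit inside longer strings, and malware often encrypts or hashes API names precisely to defeat this kind of check.

```python
from __future__ import annotations


def uses_runtime_linking(imported: set[str]) -> bool:
    """True if the imports include GetProcAddress plus some LoadLibrary variant."""
    return "GetProcAddress" in imported and any(
        name.startswith("LoadLibrary") for name in imported
    )


def api_strings_not_imported(data: bytes, imported: set[str],
                             watch_list: set[str]) -> set[str]:
    """Return watch-list API names that occur as ASCII bytes in the file
    but are absent from its Import Table (a hint of runtime linking)."""
    present = {name for name in watch_list if name.encode("ascii") in data}
    return present - imported
```

For the executable.exe example, a file importing only LoadLibraryA and GetProcAddress but containing the string "WSAStartup" would be flagged as likely resolving WSAStartup at runtime.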
Every function that can be imported must be defined somewhere, and the file that defines it must export it. Exporting is simply a way to make a function available for other files to use. At a fundamental level, the Windows API is a large collection of files exporting an even larger number of functions.
How Attackers and Defenders Use This Knowledge
Now that we have covered the details of the Windows API as well as how and why executables use the API, we can get into some ways to apply that knowledge. Savvy attackers can use this understanding to make their malware stealthier or otherwise obfuscate its operation. Conversely, defenders have several ways to enhance their analysis and detection capabilities with this knowledge.
One of the most basic ways to use imports is to get an at-a-glance view of a file’s potential capabilities. Since any executable that wants to use built-in Windows libraries without runtime linking needs to import the relevant functions, this is an effective technique, but there are some other interesting ways to leverage this information.
The Import Table
As previously mentioned, the Import Table contains information about the external files and functions a particular PE file requires to execute. Examining this data yields information about the file’s capabilities; for example, a file that imports functions from advapi32.dll can likely make changes to the registry, and looking at the specific functions would provide even more information.
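Turning imports into capability hints can be as simple as a lookup table. The sketch below assumes the imports are already available as (DLL, function name) pairs, for example from pefile, and uses a deliberately tiny, hypothetical mapping; a real triage list would be much larger, and a match only suggests a capability rather than proving it is used.

```python
from __future__ import annotations

# Illustrative mapping of (lowercase DLL, lowercase function) to a capability
# label; the entries and labels are assumptions chosen for this sketch.
CAPABILITY_HINTS: dict[tuple[str, str], str] = {
    ("advapi32.dll", "regsetvalueexa"): "registry modification",
    ("advapi32.dll", "regsetvalueexw"): "registry modification",
    ("advapi32.dll", "regcreatekeyexw"): "registry modification",
    ("advapi32.dll", "createservicew"): "service creation",
    ("ws2_32.dll", "wsastartup"): "networking",
    ("ws2_32.dll", "connect"): "networking",
    ("kernel32.dll", "createremotethread"): "code injection into other processes",
}


def capabilities(imports: list[tuple[str, str]]) -> dict[str, list[str]]:
    """Group a file's named imports by the capability they hint at."""
    found: dict[str, list[str]] = {}
    for dll, func in imports:
        # DLL and function names are compared case-insensitively
        label = CAPABILITY_HINTS.get((dll.lower(), func.lower()))
        if label:
            found.setdefault(label, []).append(f"{dll}!{func}")
    return found
```

Functions imported by ordinal have no name and are not handled here; they would need to be resolved to names first.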
Import Table Hashing
Import Tables are generated in a repeatable way, meaning that if identical source code is compiled by the same compiler, the same Import Table will be generated. By calculating a hash value only for the data in the Import Table, files of similar code structure can be identified, even if they aren’t identical elsewhere. This special hash value is called an imphash (short for import hash). For example, if the same code was compiled twice using the same compiler, first at 13:00 UTC and again at 14:00 UTC, the resulting files would have different MD5s because the PE compile timestamp would have changed, but the files would have identical imphashes. Another example would be if attackers changed their C2 domains or IP addresses and recompiled their backdoor. Again, the MD5 would change, and the malware would now communicate with new C2 infrastructure, but the imphash would still be the same.
This technique is nothing new. VirusTotal, for example, calculates imphash values, although imphashes are not as widely used as hashes of complete file contents. In early 2014, Mandiant published a blog post on the technique and added code to the pefile module on GitHub, enabling users to calculate their own imphashes.
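The core of the calculation is short enough to sketch. Roughly following the approach in pefile's get_imphash, each import becomes a lowercase "dll.function" string (with a .dll, .ocx, or .sys extension dropped), the strings are joined with commas in Import Table order, and the result is hashed with MD5. This sketch takes an already-parsed list of imports as input (an assumption; parsing the PE is left to a library such as pefile) and renders ordinal-only imports as "ord<N>". It omits pefile's lookup table that resolves known ordinals to names for a few DLLs, so results for such imports may not match pefile's.

```python
from __future__ import annotations

import hashlib


def imphash(imports: list[tuple[str, str | int]]) -> str:
    """Imphash-style MD5 over an ordered list of (dll, function) imports.

    A function given as an int is treated as an ordinal-only import.
    """
    parts = []
    for dll, func in imports:
        lib = dll.lower()
        base, dot, ext = lib.rpartition(".")
        if dot and ext in ("dll", "ocx", "sys"):
            lib = base  # "ws2_32.dll" -> "ws2_32"
        name = f"ord{func}" if isinstance(func, int) else func.lower()
        parts.append(f"{lib}.{name}")
    return hashlib.md5(",".join(parts).encode()).hexdigest()
```

Because the strings are joined in table order, two files that import the same functions in a different order get different imphashes, while recompiling the same code with only a new timestamp or C2 address leaves the input, and therefore the hash, unchanged.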
DLL Search Order Hijacking
One interesting way for attackers to subvert the import process is DLL search order hijacking. As I alluded to earlier, when a file is executed, the DLLs that provide the functions listed in its Import Table must first be located so they can be loaded into memory. The operating system performs this search, and Windows looks for the required files in a predictable set of directories, in a predictable order. If attackers can place a malicious DLL in a directory that is searched before the directory containing the legitimate DLL, they can get a legitimate application to load their code.
For example (I like examples), if a file called legitimate.exe is located on the desktop and tries to load ws2_32.dll in order to use the WSAStartup function, Windows first checks the directory from which legitimate.exe was loaded (the desktop), then looks in the system directory (typically C:\Windows\System32), which is where the legitimate ws2_32.dll is located, and proceeds to load the legitimate DLL. If an attacker places a malicious file named ws2_32.dll on the desktop, the next time legitimate.exe is launched from the desktop the malicious DLL would be loaded, since it would be found before the legitimate library.
DLL search order hijacking can be mitigated in a few ways. First, ensure that HKEY_LOCAL_MACHINE\System\CurrentControlSet\Control\Session Manager\SafeDllSearchMode exists and is set to 1 (safe search mode is enabled by default in Windows XP SP2 and later). Enabling this setting moves the current directory lower in the search order, thus moving the system directories higher in the search order. Note that it does not move the application’s own directory, which is still searched first, so on its own it does not prevent the desktop example above. Additionally, consider adding critical DLLs to the list of known DLLs in HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Session Manager\KnownDLLs. DLLs listed in this registry key are loaded from the system directory and are not located via the standard search mechanism. Additional information on DLL security can be found in the MSDN documentation.
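The effect of both settings can be illustrated with a small model. This is a simplified sketch of the standard search order for desktop applications, assuming default directory locations and a hypothetical user profile path; it ignores side-by-side manifests, DLL redirection, API sets, SetDllDirectory, LoadLibraryEx flags, and the check for modules that are already loaded. The file system is represented by a set of paths that exist.

```python
from __future__ import annotations

import ntpath

# Default locations assumed for this sketch
SYSTEM_DIR = r"C:\Windows\System32"
SYSTEM16_DIR = r"C:\Windows\System"
WINDOWS_DIR = r"C:\Windows"


def search_order(app_dir: str, cwd: str, path_dirs: list[str],
                 safe_mode: bool) -> list[str]:
    """Simplified standard DLL search order for desktop applications."""
    if safe_mode:
        # SafeDllSearchMode enabled: current directory comes after system dirs
        return [app_dir, SYSTEM_DIR, SYSTEM16_DIR, WINDOWS_DIR, cwd, *path_dirs]
    return [app_dir, cwd, SYSTEM_DIR, SYSTEM16_DIR, WINDOWS_DIR, *path_dirs]


def resolve_dll(name: str, files: set[str], app_dir: str, cwd: str,
                path_dirs: tuple[str, ...] = (), safe_mode: bool = True,
                known_dlls: frozenset[str] = frozenset()) -> str | None:
    """Return the path this model would load for `name`, or None.

    `files` stands in for the file system; comparisons ignore case,
    as Windows paths do by default.
    """
    existing = {f.lower() for f in files}
    if name.lower() in {k.lower() for k in known_dlls}:
        # Known DLLs are not searched for; they come from the system directory
        return ntpath.join(SYSTEM_DIR, name)
    for directory in search_order(app_dir, cwd, list(path_dirs), safe_mode):
        candidate = ntpath.join(directory, name)
        if candidate.lower() in existing:
            return candidate
    return None
```

With a malicious ws2_32.dll planted on the desktop and legitimate.exe run from there, this model returns the desktop copy whether or not safe mode is on, because the application directory is searched first. Adding the DLL to the known DLLs list makes it return the System32 copy instead. Safe mode only changes the outcome when the planted copy sits in the current directory rather than the application directory.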
Imports are only half of the story. Because DLL and EXE files share the PE file format, both can also have exports.
Exporting Functions and DllMain
A file’s exports are defined in the unsurprisingly named Export Table and, equally unsurprisingly, there are multiple ways to export functions. The main way is by name, which is also the most convenient way to import functions (for the same reason domain names are easier to remember than IP addresses). The other way is by ordinal, a number that identifies the function; exporting by ordinal only can be used to hide the functionality of exported functions, but it must be done intentionally since it requires additional steps. A DLL can also define an optional, special function called the entry point, conventionally named DllMain. It is referenced from the PE header rather than exported, and if defined, it is called whenever the library it belongs to is loaded into or unloaded from a process (and, by default, when threads in that process start or exit).
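Name and ordinal exports relate through three parallel arrays in the PE export directory: the export address table (AddressOfFunctions), indexed by ordinal minus a Base value; a sorted list of names (AddressOfNames); and, for each name, an index into the address table (AddressOfNameOrdinals). A function exported by ordinal only has an address table entry but no name. The sketch below is a simplified in-memory model of that lookup, not a PE parser; forwarded exports are ignored and the addresses in the usage are made-up RVAs.

```python
from __future__ import annotations

import bisect
from dataclasses import dataclass, field


@dataclass
class ExportDirectory:
    """Simplified model of a PE export directory."""
    base: int                # ordinal of functions[0]
    functions: list[int]     # export address table (RVAs)
    names: list[str] = field(default_factory=list)          # sorted names
    name_ordinals: list[int] = field(default_factory=list)  # index into functions

    def by_ordinal(self, ordinal: int) -> int | None:
        """Resolve an export by ordinal: index the address table directly."""
        index = ordinal - self.base
        if 0 <= index < len(self.functions):
            return self.functions[index]
        return None

    def by_name(self, name: str) -> int | None:
        """Resolve an export by name via binary search of the sorted names."""
        i = bisect.bisect_left(self.names, name)
        if i < len(self.names) and self.names[i] == name:
            return self.functions[self.name_ordinals[i]]
        return None


exports = ExportDirectory(base=1, functions=[0x1000, 0x2000, 0x3000],
                          names=["Alpha", "Gamma"], name_ordinals=[0, 2])
```

In this example, ordinal 2 (RVA 0x2000) is reachable only through `by_ordinal(2)`, since no name points at it. GetProcAddress accepts either a function name or an ordinal, so an importer that knows the ordinal can still use such a function, while a reader of the Export Table sees no name describing it.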
Because DLLs are typically libraries of many functions designed to lighten the load of other executable files, most DLLs have a large number of exported functions. Conversely, EXE files typically have few or no exported functions. The number of imported functions is less reliable: DLLs may rely heavily on other libraries and have a large number of imports, or be relatively self-contained and have few imports, and the same goes for EXE files. Looking for EXE or DLL files that do not follow these conventions can uncover files attempting to masquerade as another file type.
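A check for files that break these conventions can be sketched as follows. It assumes the DLL flag and export count have already been extracted for each file (for example with pefile), and the thresholds are hypothetical values that would need tuning against known-good files in your environment. Expect some legitimate exceptions: resource-only DLLs, for instance, usually have no exports at all.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class PESummary:
    path: str
    is_dll: bool        # IMAGE_FILE_DLL flag from the COFF header
    export_count: int   # number of entries in the Export Table


# Hypothetical thresholds for this sketch
MAX_EXE_EXPORTS = 5
MIN_DLL_EXPORTS = 1


def unusual_exports(files: list[PESummary]) -> list[str]:
    """Return descriptions of files whose export counts break EXE/DLL norms."""
    flagged = []
    for pe in files:
        if pe.is_dll and pe.export_count < MIN_DLL_EXPORTS:
            flagged.append(f"{pe.path}: DLL with no exports")
        elif not pe.is_dll and pe.export_count > MAX_EXE_EXPORTS:
            flagged.append(f"{pe.path}: EXE with {pe.export_count} exports")
    return flagged
```

Keying the check on the DLL flag rather than the file extension also catches a DLL renamed to .exe (or the reverse), which is exactly the masquerading case described above.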
While this post covers a few topics in depth, I’ve barely scratched the surface in the greater scheme of all things Windows. In this day and age, it is easy to find the “what” on a topic, but the “why” and “how” are also important. Analysts with this kind of in-depth understanding can develop new and interesting techniques for detecting today’s threats. To restate a previous example, understanding that EXE files should not have many exports allows for the detection of files that do not conform to the norm. I hope you can take what you have learned here and use it to develop your own new and interesting ways of finding evil.
If you would like to learn more about anything related to Windows, refer to the Microsoft Developer Network (MSDN). The articles referenced in the writing of this post are listed below. For a comprehensive under-the-hood look at Windows, check out the Windows Internals series; Part 1 of the 7th edition was released this past May. “Practical Malware Analysis” is another great resource and provides a different perspective, since it focuses on analyzing software written by others rather than writing your own. It also contains information on how attackers leverage legitimate Windows features in their malware, as well as a handy reference of commonly used API functions.
Selected MSDN Articles:
- Dynamic-Link Libraries
- API Index
- Dynamic-Link Library Search Order
- DllMain entry point
- Exporting from a DLL

Other Resources:
- Tracking Malware with Import Hashing (FireEye Blog)
- Practical Malware Analysis: The Hands-On Guide to Dissecting Malicious Software, by Michael Sikorski and Andrew Honig
- Windows Internals Part 1: System architecture, processes, threads, memory management, and more, 7th Edition, by Pavel Yosifovich, Alex Ionescu, Mark E. Russinovich, and David A. Solomon