Lab 7-3

For this lab, we obtained the malicious executable, Lab07-03.exe, and DLL, Lab07-03.dll, prior to executing. This is important to note because the malware might change once it runs. Both files were found in the same directory on the victim machine. If you run the program, you should ensure that both files are in the same directory on the analysis machine. A visible IP string beginning with 127 (a loopback address) connects to the local machine. (In the real version of this malware, this address connects to a remote machine, but we’ve set it to connect to localhost to protect you.)

This lab may cause considerable damage to your computer and may be difficult to remove once installed. Do not run this file without a virtual machine with a snapshot taken prior to execution.

This lab may be a bit more challenging than previous ones. You’ll need to use a combination of static and dynamic methods, and focus on the big picture in order to avoid getting bogged down by the details.

Questions and Short Answers

  1. How does this program achieve persistence to ensure that it continues running when the computer is restarted?

    A: This program achieves persistence by writing a DLL to C:\Windows\System32 and modifying every .exe file on the system to import that DLL.

  2. What are two good host-based signatures for this malware?

    A: The program is hard-coded to use the filename kerne132.dll, which makes a good signature. (Note the use of the number 1 instead of the letter l.) The program uses a hard-coded mutex named SADFHUHF.

  3. What is the purpose of this program?

    A: The purpose of this program is to create a difficult-to-remove backdoor that connects to a remote host. The backdoor has two commands: one to execute a command and one to sleep.

  4. How could you remove this malware once it is installed?

    A: This program is very hard to remove because it infects every .exe file on the system. It’s probably best in this case to restore from backups. If restoring from backups is particularly difficult, you could leave the malicious kerne132.dll file and modify it to remove the malicious content. Alternatively, you could copy kernel32.dll and name it kerne132.dll, or write a program to undo all changes to the PE files.

Detailed Analysis

First, we’ll look at Lab07-03.exe using basic static analysis techniques. When we run Strings on the executable, we get the usual invalid strings and the imported functions. We also get days of the week, months of the year, and other strings that are part of the library code, not part of the malicious executable.
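As a rough illustration of what the Strings step does, the sketch below pulls runs of printable ASCII out of a byte buffer. It is a simplified stand-in for the real tool, not a reimplementation: it ignores Unicode strings, and the sample bytes are made up for the example.

```python
import re


def extract_strings(data: bytes, min_len: int = 4) -> list:
    """Return runs of printable ASCII that are at least min_len bytes long."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [match.group().decode("ascii") for match in pattern.finditer(data)]


# Hypothetical sample buffer; a real run would read the bytes of Lab07-03.exe.
sample = b"\x00\x01kerne132.dll\x00ab\x00Lab07-03.dll\xff"
print(extract_strings(sample))  # ['kerne132.dll', 'Lab07-03.dll']
```

Short runs such as `ab` are dropped by the minimum length, which is how most of the "invalid strings" noise is filtered out.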

The following listing shows that the code has several interesting strings.

The string kerne132.dll is clearly designed to look like kernel32.dll but replaces the l with a 1.
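A look-alike name like this can be flagged mechanically. The following is a minimal sketch that only handles the one substitution seen here (digit 1 for letter l); the function name is our own, and a real detector would need a larger table of confusable characters.

```python
def is_lookalike(candidate: str, genuine: str) -> bool:
    """True if candidate is not the genuine name but matches it once 1 is read as l."""

    def normalize(name: str) -> str:
        return name.lower().replace("1", "l")

    if candidate.lower() == genuine.lower():
        return False
    return normalize(candidate) == normalize(genuine)


print(is_lookalike("kerne132.dll", "kernel32.dll"))  # True
print(is_lookalike("kernel32.dll", "kernel32.dll"))  # False
```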

NOTE

For the remainder of this section, the imposter kerne132.dll will be in bold to make it easier to differentiate from kernel32.dll.

The string Lab07-03.dll tells us that the .exe may access the DLL for this lab in some way. The string WARNING_THIS_WILL_DESTROY_YOUR_MACHINE is interesting, but it’s actually an artifact of the modifications made to this malware for this book. Normal malware would not contain this string, and we’ll see more about its usage in the malware later.

Next, we examine the imports for Lab07-03.exe. The most interesting of these are as follows:

The imports CreateFileA, CreateFileMappingA, and MapViewOfFile tell us that this program probably opens a file and maps it into memory. The FindFirstFileA and FindNextFileA combination tells us that the program probably searches directories and uses CopyFileA to copy files that it finds. The fact that the program does not import Lab07-03.dll (or use any of the functions from the DLL), LoadLibrary, or GetProcAddress suggests that it probably doesn’t load that DLL at runtime. This behavior is suspect and something we need to examine as part of our analysis.

Next, we check the DLL for any interesting strings and imports and find a few strings worth investigating, as follows:

The most interesting string is an IP address, 127.26.152.13, that the malware might connect to. (You can set up your network-based sensors to look for activity to this address.) We also see the strings hello, sleep, and exec, which we should examine when we open the program in IDA Pro.

Next, we check the imports for Lab07-03.dll. We see that the imports from ws2_32.dll contain all the functions necessary to send and receive data over a network. Also of note is the CreateProcess function, which tells us that this program may create another process.

The Imports view for Lab07-03.dll in IDA:

We also check the exports for Lab07-03.dll and see, oddly, that it has none. Without any exports, it can’t be imported by another program, though a program could still call LoadLibrary on a DLL with no exports. We’ll keep this in mind when we look more closely at the DLL.

We next try basic dynamic analysis. When we run the executable, it exits quickly without much noticeable activity. (We could try to run the DLL using rundll32, but because the DLL has no exports, that won’t work.) Unfortunately, basic dynamic analysis doesn’t tell us much.

Note: for how to run a .dll file with rundll32.exe, see the corresponding part of this post. The command I entered here was:

C:\>rundll32.exe Lab07-03.dll

Note: no suspicious behavior showed up in Process Explorer.

The next step is to perform analysis using IDA Pro. Whether you start with the DLL or EXE is a matter of preference. We’ll start with the DLL because it’s simpler than the EXE.

Analyzing the DLL

When looking at the DLL in IDA Pro, we see no exports, but we do see an entry point. We should navigate to DLLMain, which is automatically labeled by IDA Pro. Unlike the prior two labs, the DLL has a lot of code, and it would take a really long time to go through each instruction. Instead, we use a simple trick and look only at call instructions, ignoring all other instructions. This can help you get a quick view of the DLL’s functionality. Let’s see what the code would look like with only the relevant call instructions.

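Below is a screenshot of the Text search (slow!) dialog with Identifier checked:

Note: double-clicking a function in the results with the left mouse button takes us straight to its disassembly for further analysis.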
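The "look only at call instructions" trick can also be applied to a disassembly that has been exported as text. The sketch below is a minimal filter under that assumption; the input lines are invented for the example and are not copied from the lab's listing.

```python
def only_calls(disassembly_lines: list) -> list:
    """Return the target of every call instruction, ignoring all other lines."""
    targets = []
    for line in disassembly_lines:
        # Drop any trailing comment, then split the instruction into tokens.
        tokens = line.split(";", 1)[0].split()
        if "call" in tokens:
            position = tokens.index("call")
            targets.append(" ".join(tokens[position + 1:]))
    return targets


# Hypothetical lines in the style of an IDA text export.
lines = [
    "mov eax, 1",
    "call ds:Sleep ; pause",
    "push offset Name",
    "call sub_401040",
]
print(only_calls(lines))  # ['ds:Sleep', 'sub_401040']
```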

The first call is to the library function __alloca_probe to allocate space on the stack. All we can tell here is that this function uses a large stack. Following this are calls to OpenMutexA and CreateMutexA, which, like the malware in Lab 7-1, are here to ensure that only one copy of the malware is running at one time.

The other listed functions are needed to establish a connection with a remote socket, and to transmit and receive data. This function ends with calls to Sleep and CreateProcessA. At this point, we don’t know what data is sent or received, or which process is being created, but we can guess at what this DLL does. The best explanation for a function that sends and receives data and creates processes is that it is designed to receive commands from a remote machine.

Now that we know what this function is doing, we need to see what data is being sent and received. First, we check the destination address of the connection. A few lines before the connect call, we see a call to inet_addr with the fixed IP address of 127.26.152.13. We also see that the port argument is 0x50, which is port 80, the port normally used for web traffic.
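To double-check the port and address arithmetic, the short sketch below converts the values from the text. It only encodes them and does not open a connection; the helper name is our own.

```python
import socket
import struct


def encode_endpoint(ip: str, port: int) -> tuple:
    """Return the address and port as the network-order bytes a sockaddr would hold."""
    address_bytes = socket.inet_aton(ip)   # 4 bytes, network byte order
    port_bytes = struct.pack("!H", port)   # 2 bytes, network byte order
    return address_bytes, port_bytes


address_bytes, port_bytes = encode_endpoint("127.26.152.13", 0x50)
print(0x50, address_bytes.hex(), port_bytes.hex())  # 80 7f1a980d 0050
```

Seeing `0x50` next to `80` makes it easier to spot the port when it appears as a hex immediate in the disassembly.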

But what data is being communicated? The call to send is shown in the following listing.

As you can see at \({\color{red}1}\), the buf argument stores the data to be sent over the network, and IDA Pro recognizes that the pointer to buf represents the string "hello" and labels it as such. This appears to be a greeting that the victim machine sends to let the server know that it’s ready for a command.

Next, we can see what data the program is expecting in response, as follows:

If we go to the call to recv \({\color{red}1}\), we see that the buffer on the stack has been labeled by IDA Pro at \({\color{red}2}\). Notice that the instruction that first accesses buf is an lea instruction at \({\color{red}3}\). The instruction doesn’t dereference the value stored at that location, but instead only obtains a pointer to that location. The call to recv will store the incoming network traffic on the stack.

Now we must determine what the program is doing with the response. We see the buffer value checked a few lines later at \({\color{red}1}\), as shown in the following listing.

The buffer accessed at \({\color{red}1}\) is the same as the one from the previous listing, even though the offset from ESP is different (esp+1208+buf in one and esp+120C+buf in the other). The difference is due to the fact that the size of the stack has changed. IDA Pro labels both buf to make it easy to tell that they’re the same value.

This code calls strncmp at \({\color{red}2}\), and it checks to see if the first five characters are the string sleep. Then, immediately after the function call, it checks to see if the return value is 0 at \({\color{red}3}\); if so, it calls the Sleep function to sleep for 60 seconds. This tells us that if the remote server sends the command sleep, the program will call the Sleep function.

We see the buffer accessed again a few instructions later, as follows:

This time, we see that the code is checking to see if the buffer begins with exec. If so, the strncmp function will return 0, as shown at \({\color{red}1}\), and the code will fall through the jnz instruction at \({\color{red}2}\) and call the CreateProcessA function.

There are a lot of parameters to the CreateProcessA function shown at \({\color{red}3}\), but the most interesting is the CommandLine parameter at \({\color{red}4}\), which tells us the process that will be created. The listing suggests that the string in CommandLine was stored on the stack somewhere earlier in code, and we need to determine where. We search backward in our code to find CommandLine by placing the cursor on the CommandLine operand to highlight all instances within this function where the CommandLine value is accessed. Unfortunately, when you look through the whole function, you’ll see that the CommandLine pointer does not seem to be accessed or set elsewhere in the function.

At this point, we’re stuck. We see that CreateProcessA is called and that the program to be run is stored in CommandLine, but we don’t see CommandLine written anywhere. CommandLine must be written prior to being used as a parameter to CreateProcessA, so we still have some work to do.

This is a tricky case where IDA Pro’s automatic labeling has actually made it more difficult to identify where CommandLine was written. The IDA Pro function information shown in the following listing tells us that CommandLine corresponds to the value of 0x0FFB at \({\color{red}2}\).

Remember our receive buffer started at 0x1000 \({\color{red}1}\), and that this value is set using the lea instruction, which tells us that the data itself is stored on the stack, and is not just a pointer to the data. Also, the fact that 0x0FFB is 5 bytes into our receive buffer tells us that the command to be executed is whatever is stored 5 bytes into our receive buffer. In this case, that means that the data received from the remote server would be exec FullPathOfProgramToRun. When the malware receives the exec FullPathOfProgramToRun command string from the remote server, it will call CreateProcessA with FullPathOfProgramToRun.
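The command handling described above can be summarized in a small model. This is a sketch of the logic as we understand it from the analysis, not a translation of the actual code: the two stack offsets come from the text, while the function name and the handling of unrecognized input are assumptions made for the example.

```python
RECV_BUF_OFFSET = 0x1000      # stack offset of the receive buffer (from the text)
COMMAND_LINE_OFFSET = 0x0FFB  # stack offset IDA labeled CommandLine (from the text)


def parse_command(buf: bytes) -> tuple:
    """Classify a received buffer the way the backdoor's checks are described."""
    # CommandLine sits this many bytes into the receive buffer.
    skip = RECV_BUF_OFFSET - COMMAND_LINE_OFFSET
    if buf[:5] == b"sleep":
        return ("sleep", None)
    if buf[:4] == b"exec":
        # Everything from byte 5 up to the first NUL is the program to run.
        command_line = buf[skip:].split(b"\x00", 1)[0].decode("ascii", "replace")
        return ("exec", command_line)
    # Assumption: anything else is simply reported as unknown in this model.
    return ("unknown", None)


print(0x1000 - 0x0FFB)  # 5
print(parse_command(b"exec C:\\Windows\\notepad.exe\x00"))
```

The 5-byte skip covers the four letters of `exec` plus the separator that follows it, which is why the path starts exactly where CommandLine points.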

This brings us to the end of this function and DLL. We now know that this DLL implements backdoor functionality that allows the attacker to launch an executable on the system by sending a response to a packet on port 80. There’s still the mystery of why this DLL has no exported functions and how this DLL is run, and the content of the DLL offers no explanations, so we’ll need to defer those questions until later.

Analyzing the EXE

Next, we navigate to the main method in the executable. One of the first things we see is a check for the command-line arguments, as shown in the following listing.

The first comparison at \({\color{red}1}\) checks to see if the argument count is 2. If the argument count is not 2, the code jumps at \({\color{red}2}\) to another section of code, which prematurely exits. (This is what happened when we tried to perform dynamic analysis and the program ended quickly.) The program then moves argv[1] into EAX at \({\color{red}3}\) and the "WARNING_THIS_WILL_DESTROY_YOUR_MACHINE" string into ESI. The loop between \({\color{red}4}\) and \({\color{red}5}\) compares the values stored in ESI and EAX. If they are not the same, the program jumps to a location that will return from this function without doing anything else.

We’ve learned that this program exits immediately unless the correct parameters are specified on the command line. The correct usage of this program is as follows:

Lab07-03.exe WARNING_THIS_WILL_DESTROY_YOUR_MACHINE
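The argument check can be modeled in a few lines. This is a sketch of the logic described above rather than the program's actual code; the function name is our own, and `argv[0]` stands for the program name as it would in C.

```python
EXPECTED_ARGUMENT = "WARNING_THIS_WILL_DESTROY_YOUR_MACHINE"


def passes_argument_check(argv: list) -> bool:
    """Mirror the described check: exactly two arguments, and argv[1] must match."""
    if len(argv) != 2:
        return False  # the early-exit path we hit during basic dynamic analysis
    return argv[1] == EXPECTED_ARGUMENT


print(passes_argument_check(["Lab07-03.exe"]))                     # False
print(passes_argument_check(["Lab07-03.exe", EXPECTED_ARGUMENT]))  # True
```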

NOTE

Malware that has different behavior or requires command-line arguments is realistic, although this message is not. The arguments required by malware will normally be more cryptic. We chose to use this argument to ensure that you won’t accidentally run this on an important machine, because it can damage your computer and is difficult to remove.

At this point, we could go back and redo our basic dynamic analysis and enter the correct parameters to get the program to execute more of its code, but to keep the momentum going, we’ll continue with the static analysis. If we get stuck, we can perform basic dynamic analysis.

Continuing in IDA Pro, we see calls to CreateFile, CreateFileMapping, and MapViewOfFile where it opens kernel32.dll and our DLL Lab07-03.dll. Looking through this function, we see a lot of complicated reads and writes to memory. We could carefully analyze every instruction, but that would take too long, so let’s try looking at the function calls first.

We see two other function calls: sub_401040 and sub_401070. Each of these functions is relatively short, and neither calls any other function. The functions are comparing memory, calculating offsets, or writing to memory. Because we’re not trying to determine every last operation of the program, we can skip the tedious memory-operation functions. (Analyzing time-consuming functions like these is a common trap and should be avoided unless absolutely necessary.) We also see a lot of arithmetic, as well as memory movement and comparisons in this function, probably within the two open files (kernel32.dll and Lab07-03.dll). The program is reading and writing the two open files. We could painstakingly track every instruction to see what changes are being made, but it’s much easier to skip over that for now and use dynamic analysis to observe how the files are accessed and modified.

Scrolling down in IDA Pro, we see more interesting code that calls Windows API functions. First, it calls CloseHandle on the two open files, so we know that the malware is finished editing those files. Then it calls CopyFile, which copies Lab07-03.dll and places it in C:\Windows\System32\kerne132.dll, which is clearly meant to look like kernel32.dll. We can guess that kerne132.dll will be used to run in place of kernel32.dll, but at this point, we don’t know how kerne132.dll will be loaded.

The calls to CloseHandle and CopyFile tell us that this portion of code is complete, and the next section of code probably performs a separate logical task. We continue to look through the main method, and near the end, we see another function call that takes the string argument C:\\*, as follows:

Unlike the other functions called by main, sub_4011E0 calls several other imported functions and looks interesting. Navigating to sub_4011E0, we would expect to see that IDA Pro has named the first argument to the function as arg_0, but it has labeled it lpFilename instead. It knows that it is a filename, because it is used as a parameter to a Windows API function that accepts a filename as a parameter. One of the first things this function does is call FindFirstFile on C:\\* to search the C: drive.

Following the call to FindFirstFile, we see a lot of arithmetic and comparisons. This is another tedious and time-consuming function that we should skip and return to only if we need more information later. The first call we see (other than malloc) is to sub_4011E0, the function that we’re currently analyzing, which tells us that this is a recursive function that calls itself. The next function called is stricmp at \({\color{red}1}\), as follows:

The arguments to the stricmp function are pushed onto the stack about 30 instructions before the function call, but you can still find them by looking for the most recent push instructions. The string comparison checks a string against .exe, and if they match, the code calls the function sub_4010a0 at \({\color{red}2}\).

We’ll finish reviewing this function before we see what sub_4010a0 does. Digging further, we see a call to FindNextFileA, and then we see a jump instruction, which indicates that this functionality is performed in a loop. At the end of the function, FindClose is called, and then the function ends with some exception-handling code.

At this point, we can say with high confidence that this function is searching the C: drive for .exe files and doing something if a file has an .exe extension. The recursive call tells us that it’s probably searching the whole filesystem. We could go back and verify the details to be sure, but this would take a long time. A much better approach is to perform the basic dynamic analysis with Process Monitor (procmon) to verify that it’s searching every directory for files ending in .exe.
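The overall shape of this function (enumerate a directory, recurse into subdirectories, compare each filename's extension against .exe) can be sketched as follows. This is a read-only illustration built on Python's `os.scandir`; it is not the malware's code, and it only collects paths without opening or changing any file.

```python
import os


def find_exe_files(root: str) -> list:
    """Recursively collect paths under root whose names end in .exe (any case)."""
    found = []
    try:
        entries = list(os.scandir(root))
    except OSError:
        return found  # unreadable directory: skip it
    for entry in entries:
        if entry.is_dir(follow_symlinks=False):
            found.extend(find_exe_files(entry.path))  # the recursive call
        elif entry.name.lower().endswith(".exe"):     # case-insensitive, like stricmp
            found.append(entry.path)
    return found
```

Pointing this at a small test directory inside the analysis VM gives a list to compare against the file accesses that procmon reports.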

In order to see what this program is doing to .exe files, we need to analyze the function sub_4010a0, which is called when the .exe extension is found. sub_4010a0 is a complex function that would take too long to analyze carefully. Instead, we once again look only at the function calls. Here, we see that it first calls CreateFile, CreateFileMapping, and MapViewOfFile to map the entire file into memory. This tells us that the entire file is mapped into memory space, and the program can read or write the file without any additional function calls. This complicates analysis because it’s harder to tell how the file is being modified. Again, we’ll just move quickly through this function and use dynamic analysis to see what changes are made to the file.

Continuing to review the function, we see more arithmetic and calls to IsBadReadPtr, which verify that the pointer is valid. Then we see a call to stricmp as shown at \({\color{red}1}\) in the following listing.

At this call to stricmp, the program checks for a string value of kernel32.dll at \({\color{red}2}\). A few instructions later, we see that the program calls repne scasb at \({\color{red}3}\) and rep movsd at \({\color{red}4}\), which are functionally equivalent to the strlen and memcpy functions. In order to see which memory address is being written by the memcpy call, we need to determine what’s stored in EDI, the register used by the rep movsd instruction. EDI is loaded with the value from EBX at \({\color{red}5}\), so we need to see where EBX is set.

We see that EBX is loaded with the value that we passed to stricmp at \({\color{red}6}\). This means that if the function finds the string kernel32.dll, the code replaces it with something. To determine what it replaces that string with, we go to the rep movsd instruction and see that the source is at offset dword_403010.
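To make the strlen and memcpy equivalence concrete, the sketch below models the two steps on an in-memory buffer. It is a simplified model under our own assumptions: the copy is done byte-wise rather than in 4-byte units as rep movsd does, and the buffer contents and helper names are invented for the example.

```python
def c_strlen(buf: bytes, start: int = 0) -> int:
    """Length up to (not including) the first NUL, the value repne scasb is used to find."""
    return buf.index(b"\x00", start) - start


def overwrite_string(dest: bytearray, offset: int, src: bytes) -> None:
    """Copy the NUL-terminated string in src over dest at offset, as memcpy would."""
    count = c_strlen(src) + 1  # include the terminator
    dest[offset:offset + count] = src[:count]


# Hypothetical buffer standing in for a region of the mapped file.
image = bytearray(b"\x00\x00KERNEL32.dll\x00rest")
overwrite_string(image, 2, b"kerne132.dll\x00")
print(bytes(image))
```

Because both names are 12 characters long, the overwrite fits exactly and nothing after the string has to move.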

It doesn’t make sense for a DWORD value to overwrite a string of kernel32.dll, but it does make sense for one string value to overwrite another. The following listing shows what is stored at dword_403010.

You should recognize that hex values beginning with 3, 4, 5, 6, or 7 are ASCII characters. IDA Pro has mislabeled our data. If we put the cursor on the same line as dword_403010 and press the A key on the keyboard, it will convert the data into the string kerne132.dll.
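The same conversion can be checked by hand. In the sketch below, the DWORD values were derived by us from encoding the string kerne132.dll in little-endian order; they are not copied from the missing listing, so treat them as an illustration of the idea.

```python
import struct


def dwords_to_ascii(dwords: list) -> str:
    """Reassemble little-endian DWORDs into bytes and decode up to the first NUL."""
    raw = b"".join(struct.pack("<I", value) for value in dwords)
    return raw.split(b"\x00", 1)[0].decode("ascii")


# 0x6E72656B is the bytes 6B 65 72 6E in memory, that is, "kern".
print(dwords_to_ascii([0x6E72656B, 0x32333165, 0x6C6C642E, 0x00000000]))  # kerne132.dll
```

Every byte here falls in the 0x2E to 0x72 range, which is the visual cue that the data is text rather than numbers.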

Now we know that the executable searches through the filesystem for every file ending in .exe, finds a location in that file with the string kernel32.dll, and replaces it with kerne132.dll. From our previous analysis, we know that Lab07-03.dll will be copied into C:\Windows\System32 and named kerne132.dll. At this point, we can conclude that the malware modifies executables so that they access kerne132.dll instead of kernel32.dll. This also answers the earlier question of how kerne132.dll gets loaded: every modified executable loads it in place of kernel32.dll.

At this point, we’ve reached the end of the program and should be able to use dynamic analysis to fill in the gaps. We can use procmon to confirm that the program searches the filesystem for .exe files and then opens them. (Procmon will show the program opening every executable on the system.) If we select an .exe file that has been opened and check the imports directory, we confirm that the imports from kernel32.dll have been replaced with imports from kerne132.dll. This means that every executable on the system will attempt to load our malicious DLL—every single one.
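One quick way to check a suspect file without a PE viewer is to search its raw bytes for the imposter name. The sketch below does only that; it is a heuristic under our own assumptions (a match anywhere in the file, not specifically in the import directory), and the sample bytes are invented.

```python
def find_imposter_references(data: bytes, needle: bytes = b"kerne132.dll") -> list:
    """Return every offset where the imposter DLL name appears, ignoring case."""
    haystack = data.lower()
    needle = needle.lower()
    offsets = []
    position = haystack.find(needle)
    while position != -1:
        offsets.append(position)
        position = haystack.find(needle, position + 1)
    return offsets


# Hypothetical buffer; a real check would read an .exe with open(path, "rb").
sample = b"MZ..kerne132.dll\x00..KERNE132.DLL"
print(find_imposter_references(sample))  # [4, 19]
```

An empty list does not prove a file is clean, but any hit is a strong host-based indicator given that the name is hard-coded.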

We can pick any .exe file that uses kernel32.dll and test this ourselves. I used HelloWorld.exe here.

Lab07-03.exe WARNING_THIS_WILL_DESTROY_YOUR_MACHINE

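Before running Lab07-03.exe, the import table of HelloWorld.exe looks like this:

After running Lab07-03.exe, the import table of HelloWorld.exe looks like this:

We can see that kernel32.dll was not replaced with kerne132.dll. Why? My understanding is that we are looking at the static file, so naturally we do not see the expected effect. In other words, the malware did not modify the .exe files on the system themselves; rather, when an .exe needs to call into kernel32.dll at run time, it ends up calling kerne132.dll (Lab07-03.dll) instead.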

Next, we check to see how the program modified kernel32.dll and Lab07-03.dll. We can calculate the MD5 hash of kernel32.dll before and after the program runs to clearly see that this malware does not modify kernel32.dll. When we open the modified Lab07-03.dll (now named kerne132.dll), we see that it now has an export section. Opening it in PEview, we see that it exports all the functions that kernel32.dll exported, and that these are forwarded exports, so that the actual functionality is still in kernel32.dll. The overall effect of this modification is that whenever an .exe file is run on this computer, it will load the malicious kerne132.dll and run the code in DLLMain. Other than that, all functionality will be unchanged, and the code will execute as if the program were still calling the original kernel32.dll.
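The before-and-after hash comparison is easy to script. Below is a minimal sketch using Python's `hashlib`; the file path in the usage note is a placeholder for whichever copy of kernel32.dll you snapshot in the analysis VM.

```python
import hashlib


def md5_of_file(path: str, chunk_size: int = 65536) -> str:
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Call `md5_of_file(r"C:\Windows\System32\kernel32.dll")` once before and once after running the sample; identical digests indicate the file was not modified.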

We have now analyzed this malware completely. We could create host- and network-based signatures based on what we know, or we could write a malware report.

We did gloss over a lot of code in this analysis because it was too complicated, but did we miss anything? We did, but nothing of importance to malware analysis. All of the code in the main method that accessed kernel32.dll and Lab07-03.dll was parsing the export section of kernel32.dll and creating an export section in Lab07-03.dll that exported the same functions and created forward entries to kernel32.dll.

The malware needs to scan kernel32.dll for all the exports and create forward entries for the imposter kerne132.dll, because kernel32.dll is different on different systems. The tailored version of kerne132.dll exports exactly the same functions as the real kernel32.dll. In the function that modified the .exe, the code found the import directory, so it could modify the import to kernel32.dll and set the bound import table to zero so that it would not be used.
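A forwarded export is stored in the PE file as a string of the form DLLNAME.FunctionName. The sketch below shows only that naming scheme for a handful of example names; it does not build an export table, and the function name is our own.

```python
def forwarder_strings(export_names: list, target_dll: str = "kernel32") -> dict:
    """Map each export name to the forwarder string that redirects it to target_dll."""
    return {name: f"{target_dll}.{name}" for name in export_names}


# A few kernel32.dll export names used purely as examples.
print(forwarder_strings(["Sleep", "CreateFileA", "CopyFileA"]))
```

Strings of this form in the export section of kerne132.dll are what PEview displays as forwarded exports.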

With careful and time-consuming analysis, we could determine what all of these functions do. However, when analyzing malware, time is often of the essence, and you should typically focus on what’s important. Try not to worry about the little details that won’t affect your analysis.

References

Finding instructions

恶意代码分析实战 Lab 7-3 习题笔记

病毒分析教程第四话--高级静态逆向分析(下)

posted @ 2019-01-17 23:19  houhaibushihai