The Relationship Between the Nt-Series and Zw-Series Kernel Functions
The NT Insider, Vol 10, Issue 4, July-August 2003 | Published: 15-Aug-03| Modified: 27-Aug-03
Code associated with this article: Zip archive, 14KB
The NT native API is nothing new. It’s been discussed ad nauseam, it’s been exploited by umpteen different utilities, and portions of it have even migrated into the realm of the fully documented and supported in the DDK. Come to think of it, why am I even writing about this then? I think I will just pop in my Kung Faux DVD and watch the Ill Master take care of business…Oh yeah, now I remember why I started writing this: Believe it or not, people are still confused about certain aspects of the native API. Common questions include:
- Why are there two flavors, NtXxx and ZwXxx?
- Why do my calls to ZwXxx sometimes fail, but sometimes work?
- What does the Zw stand for? Was the name of the original NT developer really Zimbanza Woobie!?
OK, well, maybe people don’t really ask that last one very often. But for those that do, Zw is entirely random and the developers chose it specifically because it could never mean anything. The other questions do often come up on the NTDEV and NTFSD mailing lists though (see http://www.osronline.com/lists for more info on the NTDEV and NTFSD peer help lists), so it is about time the record was set straight. In order to do this we are going to do a bit of disassembly. All listings will be from an XP SP1 Free build. Also, note that we’ve got sample driver code to accompany this article that shows you how to use the native system services from Kernel Mode. See the description (and URL) of the sample code provided at the end of this article.
This article assumes that the reader already understands that there is a native API and how the native API relates to the other subsystems in Windows. Plenty has already been written on that subject, so it is not worth repeating here.
Vanilla or Chocolate
First, let’s do a bit of math that even I can handle. We have two sets of APIs, NtXxx and ZwXxx, and two modes to call them from, User and Kernel. This means that we have four different scenarios under which we can call these routines. Using XxReadFile as the example, we have:
- User Mode application calls NtReadFile
- User Mode application calls ZwReadFile
- Kernel Mode driver calls NtReadFile
- Kernel Mode driver calls ZwReadFile
What exactly are the differences in these scenarios? Let us start by talking about the land where no driver writer feels safe, User Mode.
Calling From User Mode
As you (probably) know, User Mode applications that call the native API link with NTDLL.LIB. Sticking with our example of XxReadFile, let’s compare the disassembly of the NtReadFile and ZwReadFile routines within NTDLL:
0: kd> u ntdll!NtReadFile
ntdll!NtReadFile:
77f761e8 b8b7000000 mov eax,0xb7
77f761ed ba0003fe7f mov edx,0x7ffe0300
77f761f2 ffd2 call edx
77f761f4 c22400 ret 0x24
That looks to me to be a stub that calls another routine and returns. Further inspection will definitely be necessary, but let’s just check out ZwReadFile before we move on.
0: kd> u ntdll!ZwReadFile
ntdll!NtReadFile:
77f761e8 b8b7000000 mov eax,0xb7
77f761ed ba0003fe7f mov edx,0x7ffe0300
77f761f2 ffd2 call edx
77f761f4 c22400 ret 0x24
Well, look at that! They both point to the exact same place (the debugger even labels the ZwReadFile listing ntdll!NtReadFile, because both names resolve to the same address). From a User Mode program it therefore does not matter which routine you call, because you end up in the same place anyway. Pick any other system service and you will see this exact format, so our example applies to any API you choose. The good news is that this article just got a bit shorter and I’ll be chillin’ with the Ill Master sooner than I thought.
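To make the “same place” observation concrete, here is a minimal C model of the stub. It is an illustration, not NTDLL’s code: ServiceStub, read_file_stub, and the export variables are hypothetical names, and only the constants 0xb7 (loaded into EAX) and 0x24 (the operand of ret) come from the listings above. Because NTDLL stubs use the stdcall convention, in which the callee pops its own arguments, ret 0x24 also tells us the call takes 0x24 / 4 = 9 four-byte arguments, which matches the nine parameters of NtReadFile.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of one NTDLL system service stub (x86, XP SP1).
   Only the two constants the stub encodes are captured. */
typedef struct {
    uint32_t service_index;  /* loaded into EAX by "mov eax,0xb7" */
    uint16_t arg_bytes;      /* popped by "ret 0x24" (stdcall callee cleanup) */
} ServiceStub;

static const ServiceStub read_file_stub = { 0xb7, 0x24 };

/* Both exported names are bound to the same stub, which is why the
   debugger labeled the ZwReadFile listing as ntdll!NtReadFile. */
static const ServiceStub *const nt_read_file_export = &read_file_stub;
static const ServiceStub *const zw_read_file_export = &read_file_stub;

/* Assumes every argument is 32 bits wide (true for NtReadFile's handles,
   pointers, and ULONGs), so each one takes a single 4-byte stack slot. */
static unsigned stub_arg_count(const ServiceStub *stub)
{
    assert(stub->arg_bytes % 4 == 0);
    return stub->arg_bytes / 4u;
}
```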
Now let’s see what exactly is at address 0x7ffe0300, which is what these stubs call (and, as mentioned previously, what every native API stub calls from User Mode).
0: kd> ln 0x7ffe0300
(7ffe0300) SharedUserData!SystemCallStub
Exact matches:
SharedUserData!SystemCallStub
0: kd> u SharedUserData!SystemCallStub
SharedUserData!SystemCallStub:
7ffe0300 8bd4 mov edx,esp
7ffe0302 0f34 sysenter
7ffe0304 c3 ret
Now how’s that for a straightforward routine: the caller puts something into EAX (it turns out to be the code that identifies which system service was called), then this routine puts a pointer to the top of the User Mode stack into EDX. Ohh, SYSENTER, uhm, ah, of course…Must be a new instruction. Let me see, that was added in…1997?? Forging ever onward, let us check the Intel documentation for SYSENTER. It says here that the SYSENTER instruction switches the processor into Kernel Mode (ring 0) and begins executing at the address held in the SYSENTER_EIP_MSR, which is MSR 0x176.
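The register hand-off just described can be sketched as a small C simulation. Treat it as a model under stated assumptions, not Windows code: Regs, wrmsr_sysenter_eip, and model_fast_call_entry are hypothetical, the MSR is an ordinary global variable, and SYSENTER is reduced to an indirect call through it. What it shows is the division of labor: the NTDLL stub supplies EAX, SystemCallStub supplies EDX, and the processor supplies the destination from MSR 0x176.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define IA32_SYSENTER_EIP 0x176u  /* MSR holding the SYSENTER target */

typedef struct {
    uint32_t eax;          /* system service number, set by the NTDLL stub */
    const uint32_t *edx;   /* User Mode stack pointer, set by SystemCallStub */
} Regs;

typedef void (*KernelEntry)(const Regs *regs);

/* Hypothetical simulated MSR: only the SYSENTER target is modeled. */
static KernelEntry sysenter_eip;

static void wrmsr_sysenter_eip(uint32_t msr, KernelEntry entry)
{
    assert(msr == IA32_SYSENTER_EIP);
    sysenter_eip = entry;
}

/* Hypothetical stand-in for nt!KiFastCallEntry: records what it receives. */
static uint32_t seen_service;
static const uint32_t *seen_user_stack;

static void model_fast_call_entry(const Regs *regs)
{
    seen_service = regs->eax;
    seen_user_stack = regs->edx;
}

/* Models SharedUserData!SystemCallStub: "mov edx,esp" then "sysenter". */
static void system_call_stub(Regs *regs, const uint32_t *user_esp)
{
    regs->edx = user_esp;   /* mov edx,esp */
    assert(sysenter_eip != NULL);
    sysenter_eip(regs);     /* sysenter: continue at the MSR 0x176 target */
}
```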
This is a good time to point out why hooking INT 2E is a bad idea and why old INT 2E hooks will not work. On systems whose processors support SYSENTER, the NTDLL stubs do not use INT 2E at all. Your hook is useless if no one is ever going to call it!
Going back to WinDBG, let us execute the rdmsr command and see what is in the SYSENTER_EIP_MSR:
0: kd> rdmsr 176
msr[176] = 00000000:8053a270
Very interesting. Let’s see what that address is:
0: kd> ln 8053a270
(8053a270) nt!KiFastCallEntry | (8053a2fb) nt!KiSystemService
Exact matches:
nt!KiFastCallEntry
I will spare all of you the disassembly of KiFastCallEntry. It is an interesting read, so I recommend it if you are curious, but all the code really does is build a trap frame so that when we exit Kernel Mode we can continue executing from where we left off. I will show the very last instruction of the function, though:
8053a2f9 eb5c jmp nt!KiSystemService+0x5c (8053a357)
We can see here that KiFastCallEntry does not actually return; it does an unconditional jump 0x5c bytes into KiSystemService. Again sparing the reader large amounts of disassembly, the code in KiSystemService eventually takes the service number that was put into EAX on the first line of the XxReadFile stub and looks up its entry in the system service table, KiServiceTable. Each entry in this table is a pointer to a native API routine, also known as a system service. Before calling the system service, the dispatch code copies the parameters being passed to it from the top of the User stack to the top of the Kernel stack. Ah! Guess that’s why a pointer to the top of the stack is saved into EDX before executing the SYSENTER.
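The lookup-and-copy step can be sketched the same way. This is a hedged model, not KiSystemService: the table layout, MAX_ARG_BYTES, and model_read_file are assumptions for illustration, and for simplicity EDX points straight at the first argument. The dispatcher has to know how many bytes of arguments each service takes; the sketch stores that count (0x24 for service 0xb7, matching the stub’s ret 0x24) next to each handler, whereas our understanding is that the real kernel keeps it in a separate argument table. An out-of-range service number is rejected with STATUS_INVALID_SYSTEM_SERVICE instead of being used as an index.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef int32_t NTSTATUS;
#define STATUS_SUCCESS                ((NTSTATUS)0x00000000L)
#define STATUS_INVALID_SYSTEM_SERVICE ((NTSTATUS)0xC000001CL)

#define MAX_ARG_BYTES 64u  /* hypothetical cap on one service's arguments */

typedef NTSTATUS (*SystemService)(const uint32_t *args);

typedef struct {
    SystemService handler;  /* what KiServiceTable[index] points to */
    uint8_t arg_bytes;      /* bytes of arguments to copy for this service */
} ServiceEntry;

/* Hypothetical toy "NtReadFile": remembers its 7th argument (Length). */
static uint32_t last_length;

static NTSTATUS model_read_file(const uint32_t *args)
{
    last_length = args[6];
    return STATUS_SUCCESS;
}

/* Sparse table: only index 0xb7 is populated in this model. */
static const ServiceEntry service_table[0xb8] = {
    [0xb7] = { model_read_file, 0x24 },
};

/* Models the dispatch step: EAX selects the entry, EDX locates the args. */
static NTSTATUS dispatch_system_service(uint32_t eax, const uint32_t *edx)
{
    uint32_t kernel_args[MAX_ARG_BYTES / 4];  /* the "Kernel stack" copy */
    const size_t count = sizeof service_table / sizeof service_table[0];

    if (eax >= count || service_table[eax].handler == NULL)
        return STATUS_INVALID_SYSTEM_SERVICE;

    assert(service_table[eax].arg_bytes <= sizeof kernel_args);
    memcpy(kernel_args, edx, service_table[eax].arg_bytes);  /* User -> Kernel */
    return service_table[eax].handler(kernel_args);
}
```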
Using the debugger extension DLL accompanying this article, we can see that index 0xb7 points to the kernel version of NtReadFile:
0: kd> !osrexts.sst
0: 0x805912c2 (nt!NtAcceptConnectPort)
1: 0x805d87b0 (nt!NtAccessCheck)
2: 0x805dc3e4 (nt!NtAccessCheckAndAuditAlarm)
...
b7: 0x8056b2ec (nt!NtReadFile)
...
And, if we look at the address of nt!NtReadFile (address 0x8056b2ec), we see:
0: kd> u nt!NtReadFile
nt!NtReadFile:
8056b2ec 6a58 push 0x58
8056b2ee 6858044e80 push 0x804e0458
8056b2f3 e8e09ffcff call nt!_SEH_prolog (805352d8)
8056b2f8 33ff xor edi,edi
8056b2fa 897de4 mov [ebp-0x1c],edi
8056b2fd 897de0 mov [ebp-0x20],edi
8056b300 897dd8 mov [ebp-0x28],edi
8056b303 897ddc mov [ebp-0x24],edi
8056b306 64a124010000 mov eax,fs:[00000124]
8056b30c 8945d4 mov [ebp-0x2c],eax
8056b30f 8a8040010000 mov al,[eax+0x140]
8056b315 8845d0 mov [ebp-0x30],al
8056b318 57 push edi
8056b319 8d45cc lea eax,[ebp-0x34]
8056b31c 50 push eax
8056b31d ff75d0 push dword ptr [ebp-0x30]
Ah, finally! This looks like the function that actually implements the read file system service.
So, to summarize the flow of a native API call from User Mode:
- The User Mode program calls either NtXxx or ZwXxx, both of which point to the same location.
- Every native API stub in User Mode has a body that simply loads an index into EAX, calls SystemCallStub, and returns.
- SystemCallStub saves a pointer to the top of the User Mode stack into EDX and executes a SYSENTER instruction.
- SYSENTER disables interrupts, switches the processor into Kernel Mode, and begins executing at the address held in the SYSENTER_EIP_MSR (which on XP SP1 is KiFastCallEntry).
- KiFastCallEntry builds a trap frame so it knows where to go when returning to User Mode, enables interrupts, and jumps into KiSystemService.
- KiSystemService, among other things, copies the parameters from the User stack (pointed to by EDX) to the Kernel stack, then uses the value previously stored in EAX as an index and calls the function located at KiServiceTable[EAX].
- The native API now executes in Kernel Mode with the previous mode of the thread set to User Mode, indicating that the caller came from User Mode. If you are going to remember one thing about this exercise, remember this! We’ll talk about it much more later in this article.
Now that we have gone through a gross amount of detail for the User Mode portion, we should be able to zip right through the Kernel Mode variants.
Calling From Kernel Mode
As you (should) know, Kernel Mode components link with NTOSKRNL.LIB. Let’s continue to use XxReadFile and see what the two variants look like from the kernel side of things. First, let’s try NtReadFile:
0: kd> u nt!NtReadFile
nt!NtReadFile:
8056b2ec 6a58 push 0x58
8056b2ee 6858044e80 push 0x804e0458
8056b2f3 e8e09ffcff call nt!_SEH_prolog (805352d8)
8056b2f8 33ff xor edi,edi
...
Well, this looks familiar! It’s the function that implements NtReadFile, the same one that was eventually called from User Mode (because it is what the system service table points to). In other words, if we call NtReadFile from a driver, we simply execute the function, bypassing the common system service dispatcher entirely.
Based on what we saw in User Mode, where NtXxx and ZwXxx were identical, I’d expect the disassembly of nt!ZwReadFile to look exactly like nt!NtReadFile. Let’s check:
0: kd> u nt!ZwReadFile
nt!ZwReadFile:
80504d4c b8b7000000 mov eax,0xb7
80504d51 8d542404 lea edx,[esp+0x4]
80504d55 9c pushfd
80504d56 6a08 push 0x8
80504d58 e89e550300 call nt!KiSystemService (8053a2fb)
80504d5d c22400 ret 0x24
Blast! I guess I have got a bit longer before I can lounge.
We see a familiar instruction at the beginning: move 0xb7 into EAX. Then we put a pointer to the parameters on the Kernel stack into EDX, push EFLAGS and a constant value (0x8, the Kernel Mode code segment selector) onto the stack, and finally call KiSystemService!? Together with the return address pushed by the call, that builds the same EFLAGS/CS/EIP frame an interrupt would have left behind. And KiSystemService is the very function that KiFastCallEntry jumped into when we did the SYSENTER from User Mode.
So why aren’t we executing a SYSENTER here? Duh! Because we are already in Kernel Mode, so what is the point of entering it again? The most important thing that is going to happen when we go this route is that we are going to call the native API from Kernel Mode, execute in Kernel Mode, and in the course of going through KiSystemService our previous mode will be set to Kernel Mode. Note that this is definitely not the case if we just call the NtXxx version from Kernel Mode. In that case, our previous mode stays untouched and we go right to the function and start executing.
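The difference between the two Kernel Mode paths comes down to who writes the thread’s previous mode, and the sketch below models just that. It is a simplified illustration, not kernel code: KThread, TrapFrame, and the model_ functions are hypothetical. It assumes, as we read the listings, that the dispatcher derives previous mode from the low bit of the code segment selector saved in the frame (0x8 is the Kernel Mode selector ZwReadFile pushes; 0x1B is the x86 User Mode selector), and that it saves the thread’s old previous mode and restores it on exit.

```c
#include <assert.h>
#include <stdint.h>

typedef enum { KernelMode = 0, UserMode = 1 } ProcessorMode;  /* like KPROCESSOR_MODE */

#define KERNEL_CS 0x08u  /* the selector ZwReadFile pushes ("push 0x8") */
#define USER_CS   0x1Bu  /* x86 User Mode code selector (RPL 3) */

typedef struct { ProcessorMode previous_mode; } KThread;  /* hypothetical */
typedef struct { uint32_t seg_cs; } TrapFrame;            /* hypothetical */

/* Hypothetical toy service: reports the previous mode it observes. */
static ProcessorMode model_nt_read_file(const KThread *thread)
{
    return thread->previous_mode;
}

/* Models the dispatcher's previous-mode handling around one service call. */
static ProcessorMode model_system_service(KThread *thread, const TrapFrame *frame)
{
    ProcessorMode saved = thread->previous_mode;
    ProcessorMode seen;

    /* Low bit of the RPL: 0 for a ring 0 caller, 1 for a ring 3 caller. */
    thread->previous_mode = (ProcessorMode)(frame->seg_cs & 1u);
    seen = model_nt_read_file(thread);
    thread->previous_mode = saved;  /* restored on the way out */
    return seen;
}

/* Kernel Mode ZwReadFile: fakes an interrupt frame carrying the kernel CS. */
static ProcessorMode model_zw_read_file(KThread *thread)
{
    const TrapFrame frame = { KERNEL_CS };
    return model_system_service(thread, &frame);
}
```

A driver running in an application’s thread typically starts out with previous mode set to User. Calling the NtXxx function directly hands that User value straight to the service, while the ZwXxx path substitutes Kernel for the duration of the call.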
So, to summarize the flow of a native API call from Kernel Mode:
Case A:
- Kernel Mode component calls NtXxx
- This is a direct call to the function that implements the system service. The call does not change previous mode.
Case B:
- Kernel Mode component calls ZwXxx
- The ZwXxx stub puts the system service code (index value) into EAX, and a pointer to the arguments that have already been pushed onto the (Kernel Mode) stack into EDX.
- It then calls KiSystemService, which, among other things, copies the parameters from the location pointed to by EDX, then uses the value stored in EAX to call the function located at KiServiceTable[EAX].
- The native API now executes (still in Kernel Mode) with the previous mode set to Kernel Mode, indicating that the caller came from Kernel Mode.
So, it’s clear that calling NtXxx directly has less overhead, but calling ZwXxx sets previous mode to Kernel Mode. So, what’s up with that? It seems like previous mode must be something pretty important.
Previous Mode
Time to step back and figure out what all of this means. An important fact to know is that Kernel Mode components by default trust all other Kernel Mode components. Because system services are always processed in Kernel Mode, Windows keeps track of whether the request originated from User Mode or Kernel Mode to determine if the caller is to be implicitly trusted. The system uses the previous mode indicator to determine the mode from which a system service call came. When a call comes from User Mode, previous mode is set to User. When a system service processing routine needs to determine whether or not to implicitly trust its caller, it checks the value of previous mode. If previous mode is set to User, the system service processing routine knows the call came from User Mode and thus any parameters passed in to the function need to be validated before they can be used.
This is why setting previous mode is really the most important part of what we have talked about so far. No matter what a User Mode application does, the system treats its system service request as a User request, coming from User Mode, and goes out of its way to validate it. All buffers are subject to validation, all access checks are performed, and absolutely no part of the request is implicitly trusted. A Kernel Mode request, however, receives far less scrutiny: its parameters are assumed to be valid.
If a Kernel component calls the ZwXxx version of a native API, all is well. The previous mode is set to Kernel and the credentials of the Kernel are used. The system service processing routine that is called assumes that any parameters that are passed are valid, because the request came from a Kernel Mode component (and Kernel Mode components implicitly trust each other).
NtXxx is the name of the function that implements the native system service itself. Thus, when a Kernel Mode component calls the NtXxx version of the system service, whatever is presently set as previous mode is left unchanged. It is therefore quite possible that the Kernel component is running in the context of an arbitrary User thread, with previous mode set to User. The system service will not know any better: it will validate the request parameters, possibly using the credentials of that arbitrary User Mode thread, and thus possibly fail the request. Another problem is that one step in validating a User Mode request is running ProbeForRead or ProbeForWrite on every passed-in buffer, depending on the buffer’s usage. These routines raise an exception if the buffer lies in Kernel Mode address space. Therefore, if you pass in Kernel Mode buffers while previous mode is set to User, your calls into the native API return STATUS_ACCESS_VIOLATION.
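That failure can be reproduced in a small model. This sketch is not the kernel’s ProbeForWrite: it assumes the default 32-bit address split, in which a probed User Mode buffer must end below 0x7FFF0000 (our understanding of MmUserProbeAddress in that configuration), it ignores alignment and page touching, and it returns a status rather than raising an exception. model_probe_for_write and model_service_write are hypothetical names; the status values are the ones defined in ntstatus.h.

```c
#include <assert.h>
#include <stdint.h>

typedef int32_t NTSTATUS;
#define STATUS_SUCCESS          ((NTSTATUS)0x00000000L)
#define STATUS_ACCESS_VIOLATION ((NTSTATUS)0xC0000005L)

typedef enum { KernelMode = 0, UserMode = 1 } ProcessorMode;

/* Assumed default x86 2GB/2GB split: first address a probe rejects. */
#define MODEL_USER_PROBE_ADDRESS 0x7FFF0000u

/* Hypothetical model of ProbeForWrite's range check only. */
static NTSTATUS model_probe_for_write(uint32_t address, uint32_t length)
{
    uint32_t last;

    if (length == 0)
        return STATUS_SUCCESS;
    last = address + length - 1u;  /* last byte of the buffer */
    if (last < address || last >= MODEL_USER_PROBE_ADDRESS)
        return STATUS_ACCESS_VIOLATION;  /* the real routine raises this */
    return STATUS_SUCCESS;
}

/* Hypothetical service validating an output buffer based on previous mode. */
static NTSTATUS model_service_write(ProcessorMode previous_mode,
                                    uint32_t buffer, uint32_t length)
{
    if (previous_mode == UserMode)
        return model_probe_for_write(buffer, length);  /* caller not trusted */
    return STATUS_SUCCESS;  /* Kernel Mode caller: buffer trusted as-is */
}
```

Calling model_service_write(UserMode, 0x8056b2ec, 512), a Kernel Mode address probed as if a User Mode caller had supplied it, yields STATUS_ACCESS_VIOLATION; the same call with KernelMode succeeds. That is the difference between calling NtXxx from a User thread’s context and calling ZwXxx.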
The moral of this bedtime story is that if you are in User Mode, use whatever variant you think makes your code look pretty. In Kernel Mode, use the ZwXxx routines and get your previous mode set properly, to Kernel Mode.
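The moral can be condensed into a decision function over the four scenarios listed at the start of the article. It is a summary of the findings above for XP SP1, written as hypothetical C rather than anything Windows exposes; the enum and function names are invented for illustration.

```c
#include <assert.h>

typedef enum { KernelMode = 0, UserMode = 1 } ProcessorMode;
typedef enum { CallNtXxx, CallZwXxx } Variant;  /* hypothetical */

/* Previous mode the system service observes in each scenario.
   current_previous_mode is whatever the calling thread already has. */
static ProcessorMode observed_previous_mode(ProcessorMode caller_mode,
                                            Variant variant,
                                            ProcessorMode current_previous_mode)
{
    assert(caller_mode == KernelMode || caller_mode == UserMode);
    if (caller_mode == UserMode)
        return UserMode;            /* both names reach the same SYSENTER stub */
    if (variant == CallZwXxx)
        return KernelMode;          /* routed through KiSystemService */
    return current_previous_mode;   /* direct NtXxx call: left untouched */
}
```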
If I keep this up I am going to be seriously late for my date with Queenie, but she is just going to have to wait because there is still more to cover.
I’ll Handle This
Native API calls that operate on objects work with handle values, which index into one of two types of handle tables. A handle either describes an entry in a table that is effectively a part of the EPROCESS structure (which means it describes an object that is specific to a particular process context) or it describes an entry in a global handle table (which means it describes an object that is visible to all process contexts). This makes for some interesting scenarios.
Say you have an existing driver and you decide that being able to optionally log to a file would be a nice feature. The first thing you do is set up two IOCTLs, one to enable logging and the other to disable it. In the handler for the enable IOCTL, you have the driver call ZwCreateFile (remember to use the Zw versions!), which returns a handle to use to write to the file. So far so good.
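OBJECT_ATTRIBUTES oa;
IO_STATUS_BLOCK   iosb;
NTSTATUS          code;

// logFileName is a UNICODE_STRING holding the log file's path
InitializeObjectAttributes(&oa, &logFileName, OBJ_CASE_INSENSITIVE,
                           NULL, NULL);
// Create or overwrite the log file, opened for synchronous writes
code = ZwCreateFile(&devExt->LogFileHandle, GENERIC_WRITE,
                    &oa, &iosb, NULL, FILE_ATTRIBUTE_NORMAL,
                    0, FILE_OVERWRITE_IF,
                    FILE_NON_DIRECTORY_FILE | FILE_SYNCHRONOUS_IO_NONALERT,
                    NULL, 0);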
From here, you set up a flag in your device extension that indicates that you are logging to a file, and start to add calls to ZwWriteFile to all of your dispatch entry points.
if (devExt->LoggingEnabled) {
    // Synchronous handle, so a NULL ByteOffset writes at the current position
    code = ZwWriteFile(devExt->LogFileHandle, NULL, NULL, NULL,
                       &iosb, (PVOID)logMessage,
                       logMessageLen,
                       NULL, NULL);
}
You note that ZwWriteFile must be called at PASSIVE_LEVEL, so you set up work items to do the logging for your timer DPC and DpcForIsr. Then you enable logging on your device and something weird happens. All of your calls to ZwWriteFile in your dispatch entry points succeed, but the ones in your work items return STATUS_INVALID_HANDLE! How can a handle switch back and forth between being valid and invalid when you have done nothing but open it and write to it?
Remember that you created that handle in your dispatch entry point, so you may well have been running in the process context of the calling application when you created it. In that case, your handle references an object in your User Mode application’s handle table, which is located via its EPROCESS. Your work items run in the SYSTEM process context, so ZwWriteFile correctly fails the call with STATUS_INVALID_HANDLE: the handle you’re passing in is meaningless in the SYSTEM process’s context.
So what is the answer? Give up on Windows and start a revolution to bring back MULTICS? Luckily, it doesn’t have to come to that. There is already a built-in solution to this problem. All you need to do is specify OBJ_KERNEL_HANDLE as one of your object’s attributes and that handle will be good in any context you might end up calling it in. This flag is the cue to the Object Manager that you want the handle to go into the global handle table, making it visible in all process contexts.
InitializeObjectAttributes(&oa, &logFileName,
OBJ_CASE_INSENSITIVE | OBJ_KERNEL_HANDLE,
NULL, NULL);
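The handle behavior described in this section can be modeled with two tables. In this hypothetical sketch, which is not how the Object Manager is implemented, each process owns a small handle table, one global table holds kernel handles, handle values are plain table indices, and a kernel handle is marked by setting the top bit of its value. Our understanding is that 32-bit Windows marks kernel handles in a similar way, but treat that as an assumption; the OBJ_KERNEL_HANDLE value is the one from the DDK headers, and everything else is invented for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef int32_t NTSTATUS;
#define STATUS_SUCCESS        ((NTSTATUS)0x00000000L)
#define STATUS_INVALID_HANDLE ((NTSTATUS)0xC0000008L)

#define OBJ_KERNEL_HANDLE       0x00000200u  /* value from the DDK headers */
#define MODEL_KERNEL_HANDLE_BIT 0x80000000u  /* hypothetical kernel-handle marker */
#define MODEL_TABLE_SIZE        8u

typedef struct { const void *objects[MODEL_TABLE_SIZE]; } HandleTable;
typedef struct { HandleTable handles; } Process;  /* stands in for EPROCESS */

static HandleTable kernel_handle_table;  /* global, shared by all processes */

/* Inserts an object and returns a handle, honoring OBJ_KERNEL_HANDLE. */
static uint32_t model_create_handle(Process *current, const void *object,
                                    uint32_t attributes)
{
    HandleTable *table = (attributes & OBJ_KERNEL_HANDLE) ? &kernel_handle_table
                                                          : &current->handles;
    for (uint32_t i = 1; i < MODEL_TABLE_SIZE; i++) {  /* 0 is never valid */
        if (table->objects[i] == NULL) {
            table->objects[i] = object;
            return (table == &kernel_handle_table) ? (i | MODEL_KERNEL_HANDLE_BIT) : i;
        }
    }
    return 0;  /* table full */
}

/* Resolves a handle in the context of whichever process is current. */
static NTSTATUS model_reference_handle(const Process *current, uint32_t handle,
                                       const void **object)
{
    const HandleTable *table = (handle & MODEL_KERNEL_HANDLE_BIT) ? &kernel_handle_table
                                                                  : &current->handles;
    uint32_t index = handle & ~MODEL_KERNEL_HANDLE_BIT;

    if (index == 0 || index >= MODEL_TABLE_SIZE || table->objects[index] == NULL)
        return STATUS_INVALID_HANDLE;
    *object = table->objects[index];
    return STATUS_SUCCESS;
}
```

In the article’s scenario, a handle created without OBJ_KERNEL_HANDLE while the application’s process is current resolves only in that process: looked up from the SYSTEM process, where work items run, it yields STATUS_INVALID_HANDLE, while a handle created with OBJ_KERNEL_HANDLE resolves in both.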
Accompanying Samples
To see some of what we’ve talked about in action, this article has an accompanying sample for you to experiment with. The sample driver creates a log file in the root directory of the C: drive in response to an IOCTL. At the beginning of the main C file is a compile-time flag, USER_HANDLE. If this flag is not set, the driver creates the handle as a Kernel Mode handle by using OBJ_KERNEL_HANDLE. Otherwise, the driver creates a User Mode handle that is valid only in the application’s context. The file is then written to using both NtWriteFile and ZwWriteFile from various parts of the driver. Each call is accompanied by a full explanation of which NTSTATUS values we expect it to return, and why, for both the User and Kernel handle cases. The driver portion of the sample is a legacy driver (non-WDM compliant) and must be installed with a utility such as OSR’s Driver Loader.
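The article does not reproduce the sample’s source, so the following is only a guess at how such a compile-time switch might look. LOG_FILE_ATTRIBUTES and log_file_attributes are hypothetical names; the two flag values are the ones defined in the DDK headers (redefined here so the sketch compiles on its own), and the resulting attributes would be passed to InitializeObjectAttributes when creating the log file.

```c
#include <assert.h>
#include <stdint.h>

#define OBJ_CASE_INSENSITIVE 0x00000040u  /* values from the DDK headers */
#define OBJ_KERNEL_HANDLE    0x00000200u

/* Hypothetical reconstruction: define USER_HANDLE to get a handle that is
   valid only in the calling application's process context. */
#ifdef USER_HANDLE
#define LOG_FILE_ATTRIBUTES (OBJ_CASE_INSENSITIVE)
#else
#define LOG_FILE_ATTRIBUTES (OBJ_CASE_INSENSITIVE | OBJ_KERNEL_HANDLE)
#endif

static uint32_t log_file_attributes(void)
{
    return LOG_FILE_ATTRIBUTES;
}
```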
Also included in the samples download is a WinDBG extension DLL that locates the system service table and displays the system services located within it. To use it, simply put osrexts.dll into WinDBG’s extension DLL directory and execute !osrexts.sst in the command window.
In Summary
I hope that this article has finally put to rest the most common problems that people experience with the native API and cleared up the NtXxx versus ZwXxx question once and for all.
Now, where’s that DVD…