SQLOS's memory manager and SQL Server's Buffer Pool
SQLOS's memory manager and SQL Server's Buffer Pool
SQLOS's memory manager consists of several components such as memory nodes, memory clerks, memory caches, and memory objects. Fig 1. depicts memory manager components and their relationship:
----------------
| Memory Object |
----------------
|
|
V
----------------
| Page Allocator |
----------------
/ \
/ \
\/ \/
--------------- ---------------
| Memory Clerk | | Caches |
--------------- ---------------
\ /
\/ \/
----------------
| Memory Node |
----------------
Fig. 1
Memory Node
A memory node is not exposed to memory manager clients. It is internal SQLOS's object. The major goal for a memory node is to provide locality of allocation. It consists of several memory allocators. There are three major types of allocators. The first type is a set of page allocators. The second type is virtual allocator leveraging Windows VirtualAlloc APIs. The third type is a shared memory allocator which is fully based on Window's file mapping APIs.
The page allocators are the most commonly used allocators in SQLOS memory manager. The reason they are called page allocators is because they allocate memory in multiple of SQLOS's page. A size of a page is 8k, the same as a size of database page in SQL Server. As you will learn further this is not coincidence.
There are four different types of page allocators. Single page allocator, multi page allocator, large page allocator and reserved page allocator. Single page allocator can only provide one page at the time. Multiple page allocator, as you might guess, provides a set of pages at the time. Large page allocator could be used to allocate large pages. SQLOS and SQL Server use large pages to minimize TLB misses when accessing hot data structures. Currently large pages are supported only on IA64 or x64 hardware with at least 8GB of RAM. A size of a large page on IA64 is 16MB. The last type of page allocators reserved could be used to allocate special set of pages reserved for emergency, i.e. when SQLOS is low on memory. Fig2. shows the memory node and its allocators.
---------------------- ---------------- ---------------------------
| Large Page Allocator |<--------| Memory Node |--------->| Reserved Page Allocator |
--------------------- / ---------------- \ ---------------------------
/ | \
/ | \
/ | \
/ | \
\/ \/ \/
-------------------- ---------------------- ----------------------
| VM & SM Allocator | | Single Page Allocator | | Multi Page Allocator |
-------------------- ---------------------- ----------------------
Fig. 2
At this point SQL Server doesn't have a dmv, dynamic management view, that would dump a set of all memory nodes and information about their allocators. Dbcc memorystatus, discussed further, comes pretty close but it dumps information about cpu nodes not about memory nodes. You might remember that CPU nodes are proper subset of memory nodes. It means that information presented by dbcc memorystatus is sufficient to understand memory distribution on the system.
Memory Clerks
Memory nodes are hidden from memory manager users. If a client of memory manager needs to allocate memory it first creates a memory clerk. There are four types of memory clerks such as generic, cache store, user store and object store. The latter three a bit convoluted. Along with memory clerk functionality they provide data caching.
One can think of a memory clerk as a bag of statistics. It supports the same type of allocators as memory nodes as well as it enables large memory consumers to hook into memory brokerage infrastructure. (I will describe infrastructure in one of the next posts). There are several global memory clerks provided by SQLOS. SQLOS's middle and large memory consumers are encouraged to use their own clerk so that one could understand memory consumption by a component. Memory clerks infrastructure enables us to track and control amount of memory consumed by a memory component. Each CPU node has a list of memory clerks that we can safely walk during runtime. SQL Server implements sys.dm_os_memory_clerks dmv to dump clerk information. In addition combined clerk information could be derived from dbcc memory status.
Memory objects
SQLOS's memory object is a heap. A memory object requires a memory clerk to allocate its memory. We support three types of memory objects. A variable memory objects is a regular heap. An incremental memory object is a mark/shrink heap. This allocation policy is very handy during compilations and execution processes. Usually both of the processes happen in two phases. First phase is to grow memory usage and the second is to shrink memory usage. If the process is isolated we don't have to call any of destructors when freeing memory. It significantly improves performance. The last type of memory object is fixed size. As you can guess components can use such policy when they need to allocate objects of a given size.
A payload for a given memory object is 8kb. It is exactly the same as a SQLOS's page size. It also means that a memory object could be created from memory clerk leveraging single page allocator. (This is yet another very important point! Keep this in mind until I will cover SQL Server's Buffer Pool) SQL Server exposes a dmv to dump all memory objects in its process: sys.dm_os_memory_objects.
If you notice both memory clerks and memory objects dmvs expose page allocator column. Also I depicted page allocator in Fig.1. Under the hood memory object uses memory clerks's page allocator interface to allocate pages. This is useful to know when you want to join memory clerk and memory object dmvs.
So far I have described how SQLOS's memory manager structured inside. Now it is time to start talking how all this fits into SQL Server.
Buffer Pool
Now we got to the point where the life becomes very interesting. In this part all the pieces that I covered so far including memory management should start fall in their places.
Remember SQL Server has two memory settings that you can control using sp_conifigure. They are max and min server memory. I am not sure if you know but these two setting really control the size of the buffer pool. They do not control overall amount of physical memory consumed by SQL Server. In reality we can't control amount of memory consumed by SQL Server because there could be external components loaded into server's process.
When SQL Server starts, during initialization, Buffer Pool first decides how much of VAS it needs to reserve for its usage. It bases its decision on the amount of physical memory, RAM, present on the box. If amount of physical memory is equal or larger than amount of VAS it can use, remember that VAS is limited resource especially on x86, it will leave 256MB of VAS for external components plus a number of threads SQL Server is configured to use multiplied by 512KB. You might remember that 512KB is SQL Server's thread stack size. In default configuration with physical memory larger than 2GB, Buffer Pool will leave 256MB+256*512KB = 384MB of VAS space. Some people name this region as MemToLeave but in reality it is in correct. SQL Server might end up using this part of VAS itself and I will show you how it could happen latter on. You might also remember -g parameter that some people recommend to use when SQL Server starts outputting "Can't Reserve Virtual Address Space" errors. First 256MB is exactly what -g parameter controls. If you specify -g 512MB, amount of VAS that BP won't use is 512MB+256*512KB = 640MB. There is no point in specifying -g 256MB. This input parameter is the same as default value.
Once BP decides amount of VAS it will use. It reserves all of it right a way. To observe such behavior you might want to monitor SQL Server's virtual bytes from perfmon or you could use vasummary view I talked about in my previous posts. In normal case Buffer Pool can't get this much memory in one chunk so if you take a closer look at SQL Server's VAS you will see several large regions reserved. This behavior is very different from many other servers that you might have seen. Some people report it as a VAS leak in SQL Server. In reality this behavior is by design.
Buffer Pool commits pages on demand. Depending on internal memory requirements and external memory state, it calculates its target, amount of memory it thinks it should commit before it can get into memory pressure. To keep system out of paging target is constantly recalculated. Target memory can't exceed max memory that represents max server memory settings. Even if you set min server memory equal to max server memory Buffer Pool will only commit its memory on demand. You can observe this behavior by monitoring corresponding profiler event.
The size of SQL Server database page is 8KB. Buffer Pool is a cache of data pages. Consequently Buffer Pool operates on pages of 8KB in size. It commits and decommits memory blocks of 8KB granularity only. If external components decide to borrow memory out of Buffer Pool they can only get blocks of 8KB in size. These blocks are not continues in memeory. Interesting, right? It means that Buffer Pool can be used as underneath memory manager forSQL Server components as long as they allocate buffers of 8KB. (Sometimes pages allocated from BP are referred as stolen)
Here is where SQLOS and Buffer Pool meet. See Fig.3
-----------------
| Memory Node |
-----------------
|
|
V
------------------------
| Single Page Allocator |
------------------------
|
|
V
-----------------
| Buffer Pool |
-----------------
Fig. 3
SQLOS' memory manager can be dynamically configured to use specific single page allocator. This is exactly what SQL Server does during a startup it configures Buffer Pool to be SQLOS's single page allocator. From that point on all dynamic single page allocations are provided by Buffer Pool. For example remember that memory object's payload is 8KB. When a component creates a memory object the allocation is served by SQLOS's single page allocator which is BP.
When describing the memory manager I mentioned that every large component has its own memory clerk. It means that Buffer Pool has its own memory clerk as well. How is it possible, BP leverages SQLOS memory clerk but SQLOS' memory manager relies on BP? This is common chicken and egg problem that you often can observe in operating systems. The key here is that Buffer Pool never uses any type of page allocator from SQLOS. It only leverages Virtual and AWE SQLOS's interfaces.
-----------------
| Buffer Pool |
-----------------
|
|
V
--------------------------
| Memory Clerk (VM/AWE) |
--------------------------
|
|
V
-----------------
| Memory Node |
-----------------
Fig. 4
All SQL Server's components optimized for 8KB allocations so that they can allocate memory through SQLOS's single page allocator and consequently through Buffer Pool. However there are cases when a component requires large buffers. If it happens allocation will be either satisfied by memory node's multi page allocator or by virtual allocator. As you might guess that memory will be allocated outside of Buffer Pool. This is exactly why I don’t like term MemToLeave, SQL Server does allocate memory out of that area!
Buffer Pool and AWE mechanism
When describing SQLOS memory manager and Buffer Pool, the discussion would be incomplete without describtion of how AWE fits in all of this. It is really important to understand how Buffer Pool allocates its memory when SQL Server configured to use AWE mechanisms. First, please remember, BP leverages SQLOS's memory clerk interfaces to allocate both VAS and physical pages through AWE. Second, there are several differences that you need to keep in mind. First BP reserves VAS in 4MB chunks instead of "single" large region. This enables SQL Server to release VAS when process is under VAS pressure. (We didn't have all bits and pieces to do this when server is not configured to use AWE mechanisms). Then it allocates all of its memory using AWE mechanism on demand. This is very big difference between SQL2000 and Yukon. In SQL Server 2000 BP would allocate all of its memory when using AWE mechanism right a way.
Buffer Pool is a preferred memory allocator for the whole server. In AWE mode it allocates its memory leveraging AWE mechanism. It means that all allocations allocated through SQLOS's single page allocator will come from pages allocated through AWE. This is what many people really missing. Let me make the point again: When Server is configured for AWE mode, most of it allocations are allocated through AWE mechanism. This is exactly the reason why you won't see private bytes and memory usage growing for SQL Server in this mode.
Since data pages are use relative addressing, i.e. self contained, Buffer Pool can map and unmap them into and out of process's VAS. Other components could have done the same if they were not relying on the actual allocation address. Unfortunately there are no components right now other than BP that can take advantage of AWE mechanism.
Future posts
I haven't completed discussion about SQLOS memory management yet . There is still much to talk about. In my next posts I will cover SQLOS caches and handling of memory pressure. It is also really important to look at dbcc memory status and related dmvs.
I understand that this material is not easy to grasp. Please let me know if you have any questions or you would like me to explain some areas in deeper details.
If you have any feedback I would love to hear that as well. This is really important to me so can adjust my posts accordingly.
The weekend is just a day away: Enjoy it!