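Linux Memory
The free command shows how the system's memory is being used, including physical memory, swap space, and the buffers and caches held by the kernel.
Adding the -h option prints sizes in human-readable units, which makes the output much easier to read.
Sometimes we need to watch memory usage continuously. In that case, use the -s option and specify an interval in seconds: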
$ free -h -s 3
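The command above prints the memory usage every 3 seconds until you press Ctrl+C.
Because the free command itself is fairly simple, this article focuses on how to use it to understand the system's current memory usage.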
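The Mem row (the second row) shows physical memory usage.
The Swap row (the third row) shows swap space usage.
The total column shows the total physical memory and swap space available to the system.
The used column shows the physical memory and swap space already in use.
The free column shows how much physical memory and swap space is still unused.
The shared column shows the physical memory that is shared.
The buff/cache column shows the physical memory used by buffers and the cache.
The available column shows the physical memory that applications can still use.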
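Let's start with a question: buffer and cache look like two different kinds of memory, so why does free lump them together? Answering that takes a little preparation. First, let's pin down what buffer and cache mean.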
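In the operating system, "buffer" refers to the buffer cache. To understand buffers, we need two other concepts: the sector and the block. A sector is the smallest addressable unit of a device, also called a "hard sector" or "device block". A block is the smallest addressable unit of a file system, also called a "file block" or "I/O block". Each block contains one or more sectors but cannot be larger than a page, so one page can hold one or more in-memory blocks. When a block is brought into memory, it is stored in a buffer. Each buffer corresponds to exactly one block and acts as the in-memory representation of that disk block (the figure in the original came from the internet).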
Note that the buffer cache only knows about blocks, not files. It simply copies disk blocks into memory without caring what kind of file data those blocks contain.
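In the operating system, "cache" refers to the page cache. The page cache is a disk cache implemented by the kernel, and its main purpose is to reduce disk I/O. Specifically, it keeps data from disk in physical memory so that disk accesses become memory accesses. What the page cache stores is memory pages. The cached pages come from reads and writes of regular files, block device files (which is exactly what the buffer cache covers), and memory-mapped files.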
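Caching of regular files works like this: when the kernel needs to read a file (say /etc/hosts), it first checks whether the file's data is already in the page cache. If it is, the kernel skips the disk and reads straight from memory; this is a cache hit. If the data is not in the cache, it is a cache miss, and the kernel schedules block I/O to read the data from disk, then places that data in the page cache. The target of this kind of caching is a file that the file system recognizes (such as /etc/hosts).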
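The hit and miss behaviour described above can be sketched with a toy read-through cache. This is only an illustration: the class name, the (path, page index) key, the least-recently-used eviction and the fake_disk function are all assumptions made for the example, and the kernel's real page cache and reclaim logic are far more involved.

```python
from collections import OrderedDict


class ToyPageCache:
    """Toy read-through cache keyed by (path, page_index); not the kernel's algorithm."""

    def __init__(self, capacity_pages, read_from_disk):
        self.capacity_pages = capacity_pages
        self.read_from_disk = read_from_disk  # callable(path, page_index) -> bytes
        self.pages = OrderedDict()
        self.hits = 0
        self.misses = 0

    def read_page(self, path, page_index):
        key = (path, page_index)
        if key in self.pages:
            # Cache hit: serve from memory and mark the page as recently used.
            self.hits += 1
            self.pages.move_to_end(key)
            return self.pages[key]
        # Cache miss: go to the "disk", then keep the page for next time.
        self.misses += 1
        data = self.read_from_disk(path, page_index)
        self.pages[key] = data
        if len(self.pages) > self.capacity_pages:
            # Evict the least recently used page to make room.
            self.pages.popitem(last=False)
        return data


def fake_disk(path, page_index):
    # Hypothetical stand-in for a block I/O request; returns placeholder bytes.
    return f"{path}:{page_index}".encode()


cache = ToyPageCache(capacity_pages=2, read_from_disk=fake_disk)
cache.read_page("/etc/hosts", 0)  # miss: the first read goes to the "disk"
cache.read_page("/etc/hosts", 0)  # hit: served from memory
print(cache.hits, cache.misses)   # prints "1 1"
```

The second read of the same page never reaches fake_disk, which is the whole point of the cache.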
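The page cache's caching of block device files is the buffer cache described earlier, because individual disk blocks are also stored in the page cache by way of buffers (buffers are ultimately backed by the page cache).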
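So why does free say buff/cache rather than simply cache? Because buffers and the page cache were not always a single implementation. They were only unified in Linux kernel 2.4. Earlier kernels had two separate disk caches: the page cache, which cached pages, and the buffer cache, which cached buffers. Once you know this history, the exact column name probably no longer matters much.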
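free vs. available
The output of free has a free column and also an available column. What is the difference between them?
free is the amount of physical memory that is truly unused. available is more interesting: it is the amount of memory available from an application's point of view. To speed up disk operations, the Linux kernel spends some memory on caching disk data; these are the buffer and cache discussed above. From the kernel's point of view, buffer and cache both count as used memory. When an application needs memory and there is not enough free memory, the kernel reclaims memory from buffer and cache to satisfy the request. So from an application's point of view, available = free + buffer + cache. Note that this is an idealized calculation. Real numbers often differ from it considerably, since not everything held in the cache can be reclaimed.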
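Swap space
Swap space is an area on disk, which can be either a partition or a file, so the concrete implementation is a swap partition or a swap file. When physical memory runs short, Linux saves infrequently accessed data from memory to swap, leaving more physical memory to serve processes. When the system needs the content stored in swap, it loads that data back into memory. This is what is commonly called swapping out and swapping in. Swap space can relieve a memory shortage to some degree, but because it requires reading and writing the disk, it is slow.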
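Machines today are generally not short of memory, so does using swap by default drag down performance? In theory yes, but in practice it is unlikely. The kernel also provides a parameter named swappiness that controls how eagerly infrequently used memory is moved to swap. Its traditional range is 0 to 100 (kernels 5.8 and later accept values up to 200). 0 tells the kernel to avoid moving memory to swap as far as possible, doing so only as a last resort, while 100 tells it to move infrequently accessed data to swap whenever it can. On Ubuntu, the default swappiness is 60. If memory is plentiful, we can set a different value through the vm.swappiness key in /etc/sysctl.conf.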
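As a rough illustration of what such a setting looks like, the sketch below parses sysctl.conf-style text and pulls out vm.swappiness. The function name parse_sysctl_conf and the sample fragment are made up for this example, and the value 10 is only a placeholder, not a recommendation.

```python
def parse_sysctl_conf(text):
    """Return {key: value} from sysctl.conf-style text, skipping comments."""
    settings = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        # Blank lines and lines starting with '#' or ';' carry no settings.
        if not line or line[0] in "#;":
            continue
        key, sep, value = line.partition("=")
        if sep:
            settings[key.strip()] = value.strip()
    return settings


# Illustrative fragment; the value 10 is an assumption, not a recommendation.
SAMPLE_CONF = """\
# prefer reclaiming cache over swapping out application memory
vm.swappiness = 10
"""

settings = parse_sysctl_conf(SAMPLE_CONF)
print(int(settings["vm.swappiness"]))  # prints 10
```

On a real system, settings in /etc/sysctl.conf are applied at boot or with sysctl -p, and the value currently in effect can be read from /proc/sys/vm/swappiness.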
The /proc/meminfo file
In fact, the information shown by free comes from the /proc/meminfo file. /proc/meminfo contains more detailed and rawer information; it is just less readable:
$ cat /proc/meminfo
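The file is a list of "Name: value kB" lines, so it is easy to read from a script. Here is a minimal sketch that parses text in that format; parse_meminfo is a name chosen for this example, and the sample values are made up rather than measured. The field names (MemTotal, MemFree, MemAvailable, Buffers, Cached) are real /proc/meminfo fields, and MemAvailable should correspond to the available column of free.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field: integer value as reported}."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue
        parts = rest.split()
        if parts:
            # Most values carry a "kB" unit; a few (e.g. HugePages_Total) are bare counts.
            fields[name.strip()] = int(parts[0])
    return fields


# Made-up sample values for illustration; a real file has many more fields.
SAMPLE = """\
MemTotal:        8000000 kB
MemFree:         1000000 kB
MemAvailable:    4400000 kB
Buffers:          200000 kB
Cached:          3500000 kB
"""

info = parse_meminfo(SAMPLE)
print(info["MemFree"], info["MemAvailable"])  # prints "1000000 4400000"
```

To try it on a live system, pass the contents of /proc/meminfo instead of SAMPLE, for example parse_meminfo(open("/proc/meminfo").read()).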
Memory that is | You'd call it | Linux calls it
used by applications | used | used
used, but can be made available | free (or available) | used (or available)
not used for anything | free | free
Experiments and fun with the Linux disk cache
Hopefully you are now convinced that Linux didn't just eat your ram. Here are some interesting things you can do to learn how the disk cache works.
Effects of disk cache on application memory allocation
Since I've already promised that disk cache doesn't prevent applications from getting the memory they want, let's start with that. Here is a C app (munch.c) that gobbles up as much memory as it can, or to a specified limit:
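#include <stdlib.h>
#include <stdio.h>
#include <string.h>

int main(int argc, char** argv) {
    int max = -1;   /* -1 means no limit */
    int mb = 0;
    char* buffer;

    /* Optional first argument: stop after this many megabytes. */
    if(argc > 1)
        max = atoi(argv[1]);

    /* Allocate 1 MB at a time and never free it. */
    while((buffer=malloc(1024*1024)) != NULL && mb != max) {
        /* Touch every byte so the memory is actually backed by RAM. */
        memset(buffer, 0, 1024*1024);
        mb++;
        printf("Allocated %d MB\n", mb);
    }

    return 0;
}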
Running out of memory isn't fun, but the OOM killer should end just this process and hopefully the rest will remain undisturbed. We'll definitely want to disable swap for this, or the app will gobble up that as well.
$ sudo swapoff -a

$ free -m
(note that your free output could be different, and have an 'available' column instead of a '-/+' row)
             total       used       free     shared    buffers     cached
Mem:          1504       1490         14          0         24        809
-/+ buffers/cache:        656        848
Swap:            0          0          0

$ gcc munch.c -o munch

$ ./munch
Allocated 1 MB
Allocated 2 MB
(...)
Allocated 877 MB
Allocated 878 MB
Allocated 879 MB
Killed

$ free -m
             total       used       free     shared    buffers     cached
Mem:          1504        650        854          0          1         67
-/+ buffers/cache:        581        923
Swap:            0          0          0

$
Even though it said 14MB "free", that didn't stop the application from grabbing 879MB. Afterwards, the cache is pretty empty (see note 2 at the end), but it will gradually fill up again as files are read and written. Give it a try.
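As a quick sanity check, the idealized figure of free plus buffers plus cached can be computed from the "Mem:" row of the first free -m output in this experiment. The helper name reclaimable_estimate is made up for this sketch.

```python
def reclaimable_estimate(free_mb, buffers_mb, cached_mb):
    """Idealized memory available to applications: free + buffers + cached."""
    return free_mb + buffers_mb + cached_mb


# Values from the "Mem:" row of the first free -m output in this experiment.
estimate = reclaimable_estimate(free_mb=14, buffers_mb=24, cached_mb=809)
print(estimate)  # prints 847
```

That is within 1 MB of the 848 shown in the free column of the '-/+ buffers/cache' row; the small difference is presumably because each column is rounded to whole megabytes.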
Effects of disk cache on swapping
I also said that disk cache won't cause applications to use swap. Let's try that as well, with the same 'munch' app as in the last experiment. This time we'll run it with swap on, and limit it to a few hundred megabytes:
$ free -m
             total       used       free     shared    buffers     cached
Mem:          1504       1490         14          0         10        874
-/+ buffers/cache:        605        899
Swap:         2047          6       2041

$ ./munch 400
Allocated 1 MB
Allocated 2 MB
(...)
Allocated 399 MB
Allocated 400 MB

$ free -m
             total       used       free     shared    buffers     cached
Mem:          1504       1090        414          0          5        485
-/+ buffers/cache:        598        906
Swap:         2047          6       2041
munch ate 400MB of ram, which was taken from the disk cache without resorting to swap. Likewise, we can fill the disk cache again and it will not start eating swap either. If you run 'watch free -m' in one terminal, and 'find . -type f -exec cat {} + > /dev/null' in another, you can see that "cached" will rise while "free" falls. After a while, it tapers off but swap is never touched (see note 1 at the end).
Clearing the disk cache
For experimentation, it's very convenient to be able to drop the disk cache. For this, we can use the special file /proc/sys/vm/drop_caches. By writing 3 to it, we can clear most of the disk cache:
$ free -m
             total       used       free     shared    buffers     cached
Mem:          1504       1471         33          0         36        801
-/+ buffers/cache:        633        871
Swap:         2047          6       2041

$ echo 3 | sudo tee /proc/sys/vm/drop_caches
3

$ free -m
             total       used       free     shared    buffers     cached
Mem:          1504        763        741          0          0        134
-/+ buffers/cache:        629        875
Swap:         2047          6       2041
Notice how "buffers" and "cached" went down, free mem went up, and free+buffers/cache stayed the same.
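If you want to drop the cache from a script rather than the shell, something like the following sketch should work when run as root. The function name drop_caches and its parameters are made up for this example; the path parameter exists only so the function can be tried against an ordinary file. The kernel documents 1 as dropping the page cache, 2 as dropping dentries and inodes, and 3 as both.

```python
import os


def drop_caches(mode=3, path="/proc/sys/vm/drop_caches", sync_first=True):
    """Write mode to the drop_caches file (1 = page cache, 2 = dentries/inodes, 3 = both)."""
    if mode not in (1, 2, 3):
        raise ValueError("mode must be 1, 2 or 3")
    if sync_first and hasattr(os, "sync"):
        # Flush dirty pages first; only clean pages can be dropped.
        os.sync()
    with open(path, "w") as f:
        f.write(f"{mode}\n")


# Example call (needs root on a real system, so it is left commented out):
# drop_caches(3)
```

Syncing first is a design choice: dirty pages have to be written out before they can be discarded, so a sync beforehand lets the write to drop_caches free more.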
Effects of disk cache on load times
Let's make two test programs, one in Python and one in Java. Python and Java both come with pretty big runtimes, which have to be loaded in order to run the application. This is a perfect scenario for disk cache to work its magic.
$ cat
print "Hello World! Love, Python"

$ cat
class Hello {
    public static void main(String[] args) throws Exception {
        System.out.println("Hello World! Regards, Java");
    }
}

$ javac

$ python
Hello World! Love, Python

$ java Hello
Hello World! Regards, Java

$
Our hello world apps work. Now let's drop the disk cache, and see how long it takes to run them.
$ echo 3 | sudo tee /proc/sys/vm/drop_caches
3

$ time python
Hello World! Love, Python

real    0m1.026s
user    0m0.020s
sys     0m0.020s

$ time java Hello
Hello World! Regards, Java

real    0m2.174s
user    0m0.100s
sys     0m0.056s

$
Wow. 1 second for Python, and 2 seconds for Java? That's a lot just to say hello. However, now all the files required to run them will be in the disk cache so they can be fetched straight from memory. Let's try again:
$ time python
Hello World! Love, Python

real    0m0.022s
user    0m0.016s
sys     0m0.008s

$ time java Hello
Hello World! Regards, Java

real    0m0.139s
user    0m0.060s
sys     0m0.028s

$
Yay! Python now runs in just 22 milliseconds, while Java uses 139ms. That's about 45 and 15 times faster! All your apps get this boost automatically!
Effects of disk cache on file reading
Let's make a big file and see how disk cache affects how fast we can read it. I'm making a 200MB file, but if you have less free ram, you can adjust it.
$ echo 3 | sudo tee /proc/sys/vm/drop_caches
3

$ free -m
             total       used       free     shared    buffers     cached
Mem:          1504        546        958          0          0         85
-/+ buffers/cache:        461       1043
Swap:         2047          6       2041

$ dd if=/dev/zero of=bigfile bs=1M count=200
200+0 records in
200+0 records out
209715200 bytes (210 MB) copied, 6.66191 s, 31.5 MB/s

$ ls -lh bigfile
-rw-r--r-- 1 vidar vidar 200M 2009-04-25 12:30 bigfile

$ free -m
             total       used       free     shared    buffers     cached
Mem:          1504        753        750          0          0        285
-/+ buffers/cache:        468       1036
Swap:         2047          6       2041

$
Since the file was just written, it will go in the disk cache. The 200MB file caused a 200MB bump in "cached". Let's read it, clear the cache, and read it again to see how fast it is:
$ time cat bigfile > /dev/null

real    0m0.139s
user    0m0.008s
sys     0m0.128s

$ echo 3 | sudo tee /proc/sys/vm/drop_caches
3

$ time cat bigfile > /dev/null

real    0m8.688s
user    0m0.020s
sys     0m0.336s

$
That's more than fifty times faster!
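The same comparison can be made from a script. The sketch below times two back-to-back reads of a file; timed_read and compare_reads are names made up for this example, and the result depends entirely on your machine and on whether the file was already cached before the first read.

```python
import time


def timed_read(path, chunk_size=1024 * 1024):
    """Read a file to the end; return (bytes_read, elapsed_seconds)."""
    total = 0
    start = time.perf_counter()
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            total += len(chunk)
    return total, time.perf_counter() - start


def compare_reads(path):
    """Time two consecutive reads; the second is likely served from the disk cache."""
    _, first = timed_read(path)
    _, second = timed_read(path)
    return first, second
```

Calling compare_reads("bigfile") right after dropping the cache should show a slow first read and a much faster second one, in line with the cat timings above.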
The Linux disk cache is very unobtrusive. It uses spare memory to greatly increase disk access speeds, and without taking any memory away from applications. A fully used store of ram on Linux is efficient hardware use, not a warning sign.
These pages do simplify a little:
1. While newly allocated memory will always (though see point #2) be taken from the disk cache instead of swap, Linux can be configured to preemptively swap out other unused applications in the background to free up memory for cache. This is tunable through the 'swappiness' setting, accessible through /proc/sys/vm/swappiness.
A server might want to swap out unused apps to speed up disk access of running ones (making the system faster), while a desktop system might want to keep apps in memory to prevent lag when the user finally uses them (making the system more responsive). This is the subject of much debate.
2. Some parts of the cache can't be dropped, not even to accommodate new applications. This includes mmap'd pages that have been mlocked by some application, dirty pages that have not yet been written to storage, and data stored in tmpfs (including /dev/shm, used for shared memory). The mmap'd, mlocked pages are stuck in the page cache. Dirty pages will for the most part swiftly be written out. Data in tmpfs will be swapped out if possible.