RHEL Swap memory usage is at 100%
Environment
- Red Hat Enterprise Linux
Issue
- Swap memory usage is at 100%
- Swap memory usage is above the error threshold
- Swap memory usage is higher than average
- Why is Swap being used while physical memory is still available?
Resolution
Introduction - what is Swap?
The primary function of Swap space is to substitute disk space for RAM when physical memory fills up and more space is needed.
The kernel's memory management code detects blocks of memory, known as pages, whose contents have not been used recently. It swaps enough of these relatively infrequently used pages out to a special partition on the hard drive designated for paging, or swapping. This frees up RAM and makes room for other data. The kernel tracks the pages swapped out to disk and can page them back into RAM when they are needed.
Overview
Regarding full Swap, multiple points should be taken into consideration:
1. On RHEL 6, a bug that led to increased Swap usage was resolved in kernel-2.6.32-504.el6 (RHSA-2014-1392) and later. Details can be found in Red Hat Bugzilla #949166.
2. Increasing the system's physical memory will make it less likely that Swap has to be used.
3. Another approach is to find the process using the most Swap (using the script below) and kill that process. This is not desirable in most situations.
4. The available Swap space can be increased by creating a swapfile, and
this can be done without service downtime. Please refer to How do I add a swap file to my Red Hat Enterprise Linux system? for details.
5. If it is not desired that current processes have pages which are also in Swap, then swapoff -a, followed by swapon -a, can be performed.
Resolution 1: Increase the system's physical memory
In some cases, a full Swap causes thrashing (continuously heavy Swap in/out activity). When sar files exist on the system (they are located in /var/log/sa; the sysstat package must be installed and the sysstat service must be running) and a sar file contains continuous pswpin and pswpout records, it indicates that the system does not have enough physical memory for the current workload. The system requires additional physical memory of at least the current total Swap size. Thrashing usually degrades system performance because it leads to heavy disk I/O. Recent systems with SSDs might not show such degradation.
Example of sar output:
12:00:00 AM  pswpin/s pswpout/s
<snip>
05:20:00 AM      0.21      0.00
05:30:00 AM      0.08      0.85
05:40:00 AM      0.47      0.00
05:50:00 AM      3.58      1.71
06:00:00 AM      2.48      0.00
06:10:00 AM     39.91      7.17   <<<<----- example of thrashing
06:20:00 AM      0.21      2.72
06:30:00 AM     13.30      1.04
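The thrashing check described above can be approximated with a small filter over sar -W output (the sar report that carries the pswpin/s and pswpout/s columns). This is a minimal sketch rather than a definitive detector: the function name flag_swap_spikes and the default threshold of 10 pages per second are assumptions chosen for illustration, and a sensible threshold depends on the workload.

```shell
# Hypothetical helper: print sar -W sample lines whose combined
# pswpin/s + pswpout/s exceeds a threshold (default 10, an assumed value).
flag_swap_spikes() {
    awk -v limit="${1:-10}" '
        # Keep only data rows: at least two fields, the last two numeric.
        # This skips blank lines, header lines and "Average:" rows.
        NF >= 2 && $1 != "Average:" && $(NF-1) ~ /^[0-9.]+$/ && $NF ~ /^[0-9.]+$/ {
            if ($(NF-1) + $NF > limit) print
        }'
}
```

A possible invocation is `sar -W -f /var/log/sa/sa15 | flag_swap_spikes 10`, where sa15 stands in for whichever daily file is being reviewed. Many consecutive lines in the result suggest sustained swapping rather than an isolated burst.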
Resolution 2: Find process's memory regions that are using the most Swap
To find the 10 processes whose memory regions are using the most Swap space, copy these commands into a bash script and execute it from the terminal:
#!/bin/bash
ps ax -o pid,args | grep -v '^ PID' | sed -e 's,^ *,,' > /tmp/ps_ax.output
echo -n > /tmp/results

# SwapPss can provide more accurate output
# Only RHEL8 onward available
SWAP_KEYWORD=$(grep -l SwapPss /proc/self/smaps)
if [ "$SWAP_KEYWORD" == "" ]; then
    SWAP_KEYWORD="Swap"
else
    SWAP_KEYWORD="SwapPss"
fi

for swappid in $(grep -l ${SWAP_KEYWORD} /proc/[1-9]*/smaps); do
    swapusage=0
    for x in $(grep ${SWAP_KEYWORD} $swappid 2>/dev/null | grep -v '\W0 kB' | awk '{print $2}'); do
        let swapusage+=$x
    done
    pid=$(echo $swappid | cut -d' ' -f3 | cut -d'/' -f3)
    if ( [ $swapusage -ne 0 ] ); then
        echo -ne "$swapusage kb\t\t" >> /tmp/results
        egrep "^$pid " /tmp/ps_ax.output | sed -e 's,^[0-9]* ,,' >> /tmp/results
    fi
done

echo "top swap using processes which are still running:"
sort -nr /tmp/results | head -n 10
This will display the 10 processes' memory regions that were using the most Swap since they were started, sorted in decreasing Swap usage order. This script does not typically display total Swap in use by a process. If you kill some of these processes that own the listed memory segments then you will observe a decrease in Swap usage. The processes killed may either be a direct contributor or a victim of the root cause of high Swap usage.
Note 1: If the above script produces no output, then it could be that none of the currently running processes in /proc/*/smaps are using Swap. You can test that by simply running:
# grep Swap /proc/[1-9]*/smaps
It is important to keep in mind that the script only shows active processes that have memory swapped out at the time the script is run. The system may already have swapped out a chunk of memory that is visible in free output while the script shows nothing. In other words, this script shows the current swapping state and cannot be used to gather historical data. For that purpose, sar can be used.
Note 2: To change the number of processes displayed to something other than 10, modify 'head -n 10' to the desired number.
Note 3: There is no way to know how much Swap space is used by a process in kernel versions prior to version 2.6.18-128.el5 (RHEL 5 update 3). So in any prior RHEL versions (all of RHEL 3, RHEL 4, and RHEL 5 up to and including RHEL 5 update 2) the necessary kernel code for determining how much Swap space is used by individual processes is not present.
Note 4: This can also be accomplished by running the top command and adding the SWAP column to the output.
To add the Swap usage to "top":
- Run top, press "f" to enter the field editor
- Navigate to the SWAP field and select it
- Toggle the display of the column by pressing "d" or "space"
- Press "q" to exit the field editor and return to the top output
- To permanently add the SWAP column, press "W" while running top. You can confirm this by reviewing ~/.config/procps/toprc
You may expect this field to show how much of a program is swapped out. However, this is not always the case, as top natively derives this information using the formula below:
VIRT = SWAP + RES, or equivalently
SWAP = VIRT - RES
- You can also save the output to a file: quit top and run "top -b -n1 > top_b_n1.txt"
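As a worked example of the formula SWAP = VIRT - RES, the sketch below subtracts the resident size from the virtual size. The helper name top_swap_estimate and the sample values are made up for illustration. Because VIRT also counts mapped but untouched address space, the result can be much larger than what is really on disk, which is why the column does not always reflect actual Swap usage.

```shell
# Hypothetical helper: apply the formula SWAP = VIRT - RES.
# Both arguments are sizes in KiB (illustrative values, not real measurements).
top_swap_estimate() {
    echo $(( $1 - $2 ))
}

top_swap_estimate 1200000 300000   # prints 900000
```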
Note 5: System V IPC maintains a list of swapped-out shared memory (shmem) pages on the shmem_swaplist list. In releases prior to RHEL 5 update 8 (2.6.18-293.el5), swapped-out System V IPC shared memory segments are not exposed to user space. Under newer kernels this information is exported through the /proc/sysvipc/shm file.
Example of a partial output:
# cat /proc/sysvipc/shm
    key   shmid perms     size cpid  lpid nattch    uid    gid      rss    swap
   8337   65536  1600   393216 9673 19030      2 106581 106581   393216       0
      0 5865473  1600   524288 9673  9364      2 106581 106581   114688    4096 <<<-----
      0  458756  1600   524288 9655 10686      2 106581 106581   512000       0
3300001  491525  1600 50216960 9673 19030      2 106581 106581 10448896 8716288 <<<-----
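To see how much System V shared memory is swapped out in total, the swap column of this file can be summed. The following is a minimal sketch under stated assumptions: the helper name shm_swap_total is hypothetical, the values are taken to be bytes (the sample values are multiples of 4096), and the column is located by its header name because the real file can carry more columns than the partial output shown above.

```shell
# Hypothetical helper: sum the "swap" column of /proc/sysvipc/shm-style
# input read from stdin. The column is found by header name, so extra
# columns in the real file do not matter.
shm_swap_total() {
    awk '
        NR == 1 { for (i = 1; i <= NF; i++) if ($i == "swap") col = i; next }
        col     { total += $col }
        END     { printf "%.0f\n", total }'
}
```

A possible invocation is `shm_swap_total < /proc/sysvipc/shm`. For the four sample rows above the total is 4096 + 8716288 = 8720384.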
Resolution 3: Increase Swap space
To increase Swap space on your system, follow the instructions in these knowledge base articles:
How do I add a swap file to my Red Hat Enterprise Linux system?
How do I check for information about the swap on my Red Hat Enterprise Linux system?
How can we increase or decrease the size of an LVM-based swap volume on Red Hat Enterprise Linux?
Use this guide to determine how much Swap size the system should have - What is the recommended swap size for Red Hat platforms?.
If a monitoring program is raising Swap threshold alerts, consider adding another 4 GB of Swap from the command line to reduce the alerts.
An example command would be:
# dd if=/dev/zero of=/swapfile bs=1k count=4M
Afterwards, follow the rest of kbase How do I check for information about the swap on my Red Hat Enterprise Linux system? to create and activate the Swap.
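In the dd command above, bs=1k with count=4M writes 4 x 1024 x 1024 = 4194304 blocks of 1 KiB each, which is 4 GiB. The sketch below prints, without running, the command sequence for a swap file of a given size. The helper name print_swapfile_commands is hypothetical, and the chmod, mkswap and swapon steps are the commonly used follow-up commands, assumed here rather than quoted from the referenced knowledge base article, which remains the authoritative procedure.

```shell
# Hypothetical helper: print (not run) the commands for an N GiB swap file.
# Usage: print_swapfile_commands PATH SIZE_GIB
print_swapfile_commands() {
    path=$1
    count=$(( $2 * 1024 * 1024 ))   # number of 1 KiB blocks in SIZE_GIB GiB
    echo "dd if=/dev/zero of=$path bs=1k count=$count"
    echo "chmod 600 $path"
    echo "mkswap $path"
    echo "swapon $path"
}

print_swapfile_commands /swapfile 4
```

Printing the commands first makes it possible to review the size and path before anything is written to disk.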
Note 6: The swapon command accepts a priority option that determines which Swap device is used first relative to the other Swap devices. The details are described in man 2 swapon.
Quote from man 2 swapon:
Each swap area has a priority, either high or low. The default priority is low. Within the low-priority areas, newer areas are even lower priority than older areas. All priorities set with swapflags are high-priority, higher than default. They may have any nonnegative value chosen by the caller. Higher numbers mean higher priority. Swap pages are allocated from areas in priority order, highest priority first. For areas with different priorities, a higher-priority area is exhausted before using a lower-priority area. If two or more areas have the same priority, and it is the highest priority available, pages are allocated on a round-robin basis between them.
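The priority of each active swap area is visible in the Priority column of /proc/swaps, and a priority can be set at activation time with swapon -p. The sketch below orders /proc/swaps-style input so that the area used first appears at the top. The helper name swaps_by_priority is hypothetical, and the device names in the usage note are examples only.

```shell
# Hypothetical helper: read /proc/swaps-style input on stdin and list the
# swap areas from highest to lowest priority.
swaps_by_priority() {
    # Skip the header line, then sort numerically (descending) on column 5,
    # which is the Priority column.
    tail -n +2 | sort -k5,5nr
}
```

A possible invocation is `swaps_by_priority < /proc/swaps`. For instance, after `swapon -p 10 /swapfile`, the swap file would be listed ahead of a default-priority partition, whose priority is negative.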
Resolution 4: Flushing the Swap
Another option is to flush the Swap by utilizing the following commands:
# swapoff -a
# swapon -a
Warning: Flushing Swap in this way will force the entire contents of swap back into main memory. If your system is already low on memory this may cause it to go into an Out Of Memory condition (OOM). Care and consideration should be taken before using this option. If the system is somewhat low on memory (and not very low) pages may have to be reclaimed while bringing the contents of Swap into memory - and this may degrade performance for a short time.
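Given the warning above, it can help to compare the memory that is available with the amount currently held in Swap before flushing. The following is a rough sketch, not a guarantee against OOM: the helper name swap_flush_is_safe is hypothetical, and it assumes the MemAvailable field is present in /proc/meminfo, which is not the case on older kernels (the check then fails on the safe side).

```shell
# Hypothetical helper: read /proc/meminfo-style input on stdin and succeed
# (exit 0) only if MemAvailable is larger than the Swap currently in use.
swap_flush_is_safe() {
    awk '
        $1 == "MemAvailable:" { avail = $2 }
        $1 == "SwapTotal:"    { total = $2 }
        $1 == "SwapFree:"     { free  = $2 }
        # Used swap is SwapTotal - SwapFree; a missing MemAvailable counts as 0.
        END { exit !(avail > total - free) }'
}
```

A possible invocation is `swap_flush_is_safe < /proc/meminfo && swapoff -a && swapon -a`. Leaving additional headroom beyond the bare comparison is advisable, since memory use can change while the flush is in progress.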
Resolution 5: Clear POSIX shared memory (/dev/shm/)
If you have followed some or all of the steps above and Swap usage is still high, the cause might be an overloaded /dev/shm POSIX shared memory directory.
/dev/shm is an implementation of the traditional shared memory concept,
and an efficient means of passing data between programs. One program
will create a memory portion, which other processes (if permitted) can
access.
shm / shmfs (also known as tmpfs), is a common name for a temporary file
storage facility on many Unix-like operating systems. It is intended to
appear as a mounted file system, but one which uses virtual memory
instead of a persistent storage device.
In the example SOS Report analysis below, it can be seen that due to an already high percentage of physical memory usage (~77%, or ~390 GB), the overload on the /dev/shm/ directory (~252 GB) was swapped out as a whole segment instead of utilizing the available RAM (~114 GB) first.
# cat sos_commands/memory/free_-m
             total       used       free     shared    buffers     cached
Mem:        516544     399419     117124       5427         41      23272
-/+ buffers/cache:     376105     140438
Swap:       263167     263144         23
- Measuring the approximate RAM used by processes running in the system does explain the high Swap usage
# awk '{ SUM += $6 } END { print SUM/1024/1024}' ps
359.867
- If we convert the amount of used memory in /dev/shm to megabytes, we will receive an amount that is close to the memory being used by Swap:
264470476/1024 = 258272 MB
# egrep "^Filesystem|^tmpfs" df
Filesystem     1K-blocks      Used Available Use% Mounted on
tmpfs          264470540 264470476        64 100% /dev/shm
- In this case the user needs to clean irrelevant files and folders from the POSIX shared memory directory (/dev/shm/), and perhaps even investigate which application wrote into the /dev/shm/ and see if there is an abuse.
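To decide which files under /dev/shm/ are worth cleaning up, the largest ones can be listed first. The following is a minimal sketch; the helper name largest_files is hypothetical, and whether a listed file is safe to remove still has to be judged against the application that owns it.

```shell
# Hypothetical helper: list the N largest regular files under a directory,
# with sizes in KiB, largest first (N defaults to 10).
# Usage: largest_files DIR [N]
largest_files() {
    find "$1" -xdev -type f -exec du -k {} + | sort -nr | head -n "${2:-10}"
}
```

A possible invocation is `largest_files /dev/shm 10`. The owner shown by `ls -l` for the reported files can hint at which application wrote them.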
Root Cause
If a system has 64GB of physical memory and 2GB of Swap memory, then it is relatively easy for Swap to become 100% full. This will cause alerts to be triggered on the system where alerts are generated by a kind of system monitoring tool. It is recommended to increase the amount of Swap space to prevent it from becoming 100% full. This will prevent alerts from being generated.
Also, it is important to note that on a Linux system, swapping itself is not bad. Data that is filesystem-backed is never swapped; only anonymous pages go into Swap. The Linux kernel's page frame reclaim algorithm (PFRA) essentially comes into play on three occasions:
- Out of memory reclaiming - kernel detects a 'low memory' condition.
- Hibernation reclaiming - kernel must free memory because it is entering suspend-to-disk state.
- Periodic reclaiming - kernel employs a thread to reclaim memory periodically.
Broadly speaking, the problem described in this article leans towards periodic reclaiming. Here, the 'kswapd' kernel thread is periodically woken to check whether the number of free memory pages is below the pages_high watermark in a particular memory zone. When it detects such a case, it moves anonymous pages to Swap, based on the scanning priority and access time of each page. Over time, the swapped-out pages can add up to a substantial size, which the monitoring system then reports. However, this swapping mechanism is in fact one of the many ways the memory management subsystem improves performance.
When Swap usage is at 100% on a Linux system, it is not necessarily a problem. If the processes that are using the Swap do not need that memory, and the operating system does not have anything else to put into Swap, then it will stay at 100% without further issues. If this is generating alerts, it is possible to increase the Swap space to stop them, but the system cannot be forced to stop using Swap unless you flush the Swap or start killing processes.
Diagnostic Steps
- Review the output of the free command
- Example:
# free -m
total used free shared buffers cached
Mem: 516544 399419 117124 5427 41 23272
-/+ buffers/cache: 376105 140438
Swap: 263167 263144 23 <<<<------
- Review the SAR file output for continuously heavy Swap in/out activity. This is represented by high values of "pswpin" / "pswpout"
- Example of sar output:
12:00:00 AM  pswpin/s pswpout/s
<snip>
05:20:00 AM      0.21      0.00
05:30:00 AM      0.08      0.85
05:40:00 AM      0.47      0.00
05:50:00 AM      3.58      1.71
06:00:00 AM      2.48      0.00
06:10:00 AM     39.91      7.17   <<<<-----
06:20:00 AM      0.21      2.72
06:30:00 AM     13.30      1.04
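The free output check in the first diagnostic step can also be expressed as a percentage, which is how monitoring tools usually report it. The following is a minimal sketch; the helper name swap_used_percent is hypothetical, and it assumes the Swap: row lists total, used and free in that order, as in the example above.

```shell
# Hypothetical helper: read free output on stdin and print the percentage of
# Swap in use, taken from the "Swap:" row (columns: total, used, free).
swap_used_percent() {
    awk '$1 == "Swap:" {
        if ($2 > 0) printf "%.2f\n", $3 * 100 / $2
        else        print "0.00"    # no Swap configured
    }'
}
```

A possible invocation is `free -m | swap_used_percent`. For the example above (263144 used of 263167 total) the result is 99.99.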
Excerpted from: https://access.redhat.com/solutions/33375