RHEL Swap memory usage is at 100%

Environment

  • Red Hat Enterprise Linux

Issue

  • Swap memory usage is at 100%
  • Swap memory usage is above the error threshold
  • Swap memory usage is higher than average
  • Why is Swap being used while physical memory is still available?

Resolution

Introduction - what is Swap?

The primary function of Swap space is to substitute disk space for RAM when physical memory fills up and more space is needed.
The kernel's memory management code detects blocks of memory, called pages, whose contents have not been used recently. It swaps enough of these relatively infrequently used pages out to an area on disk designated specifically for paging, or swapping (a swap partition or a swap file). This frees up RAM for other data. The kernel's memory management code keeps track of the pages swapped out to disk and can page them back into RAM when they are needed.
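For a quick check of how full Swap is, the kernel reports SwapTotal and SwapFree (in kB) in /proc/meminfo, which is also where free gets its numbers. The following is a minimal sketch that computes the amount used and the percentage from those two fields. The function name swap_usage is hypothetical. Its file argument exists only so that it can also be run against a saved copy, such as the meminfo file in an SOS Report.

```shell
# Hypothetical helper: report Swap usage from a meminfo-style file.
# SwapTotal and SwapFree are reported by the kernel in kB.
swap_usage() {
    local meminfo=${1:-/proc/meminfo}
    awk '
        $1 == "SwapTotal:" { total = $2 }
        $1 == "SwapFree:"  { free = $2 }
        END {
            used = total - free
            # Guard against systems with no Swap configured
            pct = (total > 0) ? int(used * 100 / total) : 0
            printf "%d kB used of %d kB (%d%%)\n", used, total, pct
        }' "$meminfo"
}
```

Running swap_usage with no argument reads the live /proc/meminfo. A result of 100% means SwapFree is 0.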

Overview

When Swap is full, consider the following points:
1. On RHEL 6, a bug that led to increased Swap usage was fixed in kernel-2.6.32-504.el6 (RHSA-2014-1392) and later kernels. Details can be found in Red Hat Bugzilla #949166.
2. Increasing the system's physical memory makes it less likely that Swap has to be used.
3. Another approach is to find the process using the most Swap (with the script in Resolution 2) and kill it. This is not desirable in most situations.
4. The available Swap space can be increased by creating a swap file, which can be done without service downtime. Please refer to How do I add a swap file to my Red Hat Enterprise Linux system? for details.
5. If you do not want running processes to keep pages in Swap, you can run swapoff -a followed by swapon -a (see the warning in Resolution 4).

Resolution 1: Increase the system's physical memory

In some cases, a full Swap is accompanied by thrashing (continuous, heavy Swap in/out activity). Check whether sar data files exist on the system. They are located in /var/log/sa, and creating them requires the sysstat package to be installed and the sysstat service to be running. If the files show continuous pswpin and pswpout activity, the system does not have enough physical memory for its current workload. In that case, the system needs additional physical memory of at least the current total Swap size. Thrashing usually degrades system performance because it causes heavy disk I/O, although recent systems with SSDs might not show such degradation.

Example of sar -W output (pages swapped in and out per second):

12:00:00 AM  pswpin/s pswpout/s
<snip>
05:20:00 AM      0.21      0.00
05:30:00 AM      0.08      0.85
05:40:00 AM      0.47      0.00
05:50:00 AM      3.58      1.71
06:00:00 AM      2.48      0.00
06:10:00 AM     39.91      7.17   <<<<----- example of thrashing
06:20:00 AM      0.21      2.72
06:30:00 AM     13.30      1.04
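You can screen sar data like this for bursts automatically. The following is a minimal sketch that reads sar -W output on standard input and prints the intervals in which either pswpin/s or pswpout/s exceeds a threshold. The function name and the default threshold of 10 pages/s are arbitrary assumptions for illustration, not Red Hat guidance. The sketch also assumes a locale that prints decimals with a dot.

```shell
# Hypothetical helper: print sar -W rows whose pswpin/s or pswpout/s
# exceeds a threshold in pages per second (default 10, an arbitrary example).
flag_swap_activity() {
    local threshold=${1:-10}
    awk -v t="$threshold" '
        # Data rows end in two numeric columns (pswpin/s pswpout/s);
        # blank lines, headers, "<snip>" markers and the Average line are skipped.
        NF >= 3 && $1 != "Average:" &&
        $(NF-1) ~ /^[0-9]+(\.[0-9]+)?$/ && $NF ~ /^[0-9]+(\.[0-9]+)?$/ {
            if ($(NF-1) + 0 > t || $NF + 0 > t) print
        }'
}
```

For example: LC_ALL=C sar -W -f /var/log/sa/sa28 | flag_swap_activity 10. The file name sa28 is only a placeholder, because sar data files are named after the day of the month.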

Resolution 2: Find the processes whose memory regions use the most Swap

To find the 10 processes whose memory regions use the most Swap space, save the following commands to a bash script and run it from a terminal:

#!/bin/bash
# Snapshot "PID ARGS" for every process; "pid=,args=" suppresses the ps header
ps ax -o pid=,args= | sed -e 's,^ *,,' > /tmp/ps_ax.output
: > /tmp/results

# SwapPss can provide more accurate output
# Only available from RHEL 8 onward; fall back to "Swap" otherwise
if grep -q '^SwapPss:' /proc/self/smaps; then
    SWAP_KEYWORD="SwapPss"
else
    SWAP_KEYWORD="Swap"
fi

for smaps in $(grep -l "^${SWAP_KEYWORD}:" /proc/[1-9]*/smaps 2>/dev/null); do
    # Sum the per-region values (kB); the process may have exited meanwhile
    swapusage=$(awk -v key="${SWAP_KEYWORD}:" '$1 == key { sum += $2 } END { print sum + 0 }' "$smaps" 2>/dev/null)
    pid=$(echo "$smaps" | cut -d/ -f3)
    if [ "${swapusage:-0}" -ne 0 ]; then
        cmd=$(grep -E "^$pid " /tmp/ps_ax.output | sed -e 's,^[0-9]* ,,')
        printf '%s kb\t\t%s\n' "$swapusage" "$cmd" >> /tmp/results
    fi
done

echo "top swap using processes which are still running:"
sort -nr /tmp/results | head -n 10

This displays the 10 processes using the most Swap, sorted by decreasing Swap usage. Each figure is the sum of the Swap (or SwapPss) values of the process's memory regions in /proc/PID/smaps. It therefore does not necessarily equal the total Swap attributable to the process. For example, SwapPss divides swapped-out shared pages among the processes that map them. If you kill some of the processes that own the listed memory regions, Swap usage will decrease. However, a killed process may be either a direct contributor to, or a victim of, the root cause of the high Swap usage.
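A lighter alternative is to rank processes by the VmSwap field in /proc/PID/status. This assumes a kernel that exposes that field, which the older releases listed in Note 3 do not. It is a different technique from the smaps-based script above. According to the proc(5) man page, VmSwap does not include swapped-out shared memory, so /dev/shm files and System V segments are not counted. The function name top_vmswap and its proc_root parameter are hypothetical. The parameter exists only so that the function can be tested against a copy of /proc.

```shell
# Hypothetical helper: list the processes with the largest VmSwap (kB).
top_vmswap() {
    local proc_root=${1:-/proc} count=${2:-10} status
    for status in "$proc_root"/[0-9]*/status; do
        [ -r "$status" ] || continue
        # Name: is the first line of the status file; kernel threads have no
        # VmSwap line and are skipped. Only the first word of Name is kept.
        awk '$1 == "Name:" { name = $2 }
             $1 == "VmSwap:" && $2 > 0 { printf "%d kB\t%s\n", $2, name }' \
            "$status" 2>/dev/null
    done | sort -nr | head -n "$count"
}
```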

Note 1: If the above script produces no output, it could be that none of the currently running processes are using Swap according to /proc/*/smaps. You can check this by running:

# grep '^Swap:' /proc/[1-9]*/smaps | grep -v ' 0 kB$'

Keep in mind that the script only shows the processes that have memory swapped out at the moment the script runs. The system may already have swapped out memory that is visible in the free output while the script shows nothing. In short, the script shows current Swap usage and cannot be used for historical data gathering. Use sar for that purpose.

Note 2: To display a number of processes other than 10, change 'head -n 10' to the desired number.

Note 3: Kernels prior to 2.6.18-128.el5 (RHEL 5 update 3) provide no way to know how much Swap space a process uses. The kernel code needed to determine per-process Swap usage is missing from all earlier RHEL versions: all of RHEL 3, RHEL 4, and RHEL 5 up to and including RHEL 5 update 2.

Note 4: This can also be accomplished by running the top command and adding the SWAP column to the output.
To add the Swap usage to "top":

  1. Run top, press "f" to enter the field editor
  2. Navigate to the SWAP field and select it
  3. Toggle the display of the column by pressing "d" or "space"
  4. Press "q" to exit the field editor and return to the top output
  5. To permanently add the SWAP column, press "W" while running top. You can confirm this by reviewing ~/.config/procps/toprc

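You might expect this field to show how much of a program is swapped out.
However, this is not always the case, because top natively derives this value from the following formula:

    SWAP = VIRT - RES   (equivalently, VIRT = SWAP + RES)

VIRT also includes address space that is neither resident nor swapped out, such as memory that is mapped but never touched, or file mappings that are not currently in RAM. As a result, this value can greatly overstate the amount of memory that is actually swapped out.
  • You can also save the output to a file: quit top and run "top -b -n1 > top_b_n1.txt"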
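To see how far the VIRT - RES estimate can be from the real figure for one process, you can compare it with the kernel's VmSwap value. The following is a minimal sketch. It assumes that VmSize and VmRSS in /proc/PID/status correspond to top's VIRT and RES columns (all values in kB) and that the kernel provides VmSwap. The function name is hypothetical.

```shell
# Hypothetical helper: compare the VIRT - RES estimate with VmSwap for one
# process, reading a /proc/PID/status file (all values in kB).
compare_swap_estimate() {
    local status=${1:?usage: compare_swap_estimate /proc/PID/status}
    awk '
        $1 == "VmSize:" { virt = $2 }
        $1 == "VmRSS:"  { res = $2 }
        $1 == "VmSwap:" { swap = $2 }
        END {
            printf "VIRT - RES = %d kB\n", virt - res
            printf "VmSwap     = %d kB\n", swap
        }' "$status"
}
```

For example, run compare_swap_estimate /proc/1234/status, where 1234 is a placeholder PID. A large VIRT - RES value next to a small VmSwap value shows the overstatement described above.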

Note 5: System V IPC maintains a list of swapped-out shared memory (shmem) pages on the shmem_swaplist list. In releases prior to RHEL 5 update 8 (2.6.18-293.el5), swapped-out System V IPC shared memory segments are not exposed to user space. Newer kernels export this information through the /proc/sysvipc/shm file.
Example of partial output (the marked segments have pages in Swap):

# cat /proc/sysvipc/shm
key      shmid    perms  size      cpid  lpid   nattch  uid     gid     rss       swap
8337     65536    1600   393216    9673  19030  2       106581  106581  393216    0
0        5865473  1600   524288    9673  9364   2       106581  106581  114688    4096     <<<-----
0        458756   1600   524288    9655  10686  2       106581  106581  512000    0
3300001  491525   1600   50216960  9673  19030  2       106581  106581  10448896  8716288  <<<-----
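To total the Swap used by System V shared memory segments without reading the table by eye, you can sum the swap column. In the example above, the column appears to be reported in bytes (4096 is one page), so this sketch treats it as bytes. It looks up the column by its header name, because the real file has more columns than this partial example (creator IDs and timestamps), and the exact layout may vary between kernel versions. The function name is hypothetical.

```shell
# Hypothetical helper: sum the "swap" column of /proc/sysvipc/shm.
shm_swap_total() {
    local shm_file=${1:-/proc/sysvipc/shm}
    awk '
        NR == 1 {
            # Locate the column by header name rather than by position
            for (i = 1; i <= NF; i++) if ($i == "swap") col = i
            if (!col) { print "no swap column in " FILENAME > "/dev/stderr"; exit 1 }
            next
        }
        { total += $col; if ($col > 0) segs++ }
        END { if (col) printf "%d bytes in Swap across %d segment(s)\n", total, segs }
    ' "$shm_file"
}
```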

Resolution 3: Increase Swap space

To increase Swap space on your system, follow the instructions in these knowledge base articles:
How do I add a swap file to my Red Hat Enterprise Linux system?
How do I check for information about the swap on my Red Hat Enterprise Linux system?
How can we increase or decrease the size of an LVM-based swap volume on Red Hat Enterprise Linux?

To determine how much Swap the system should have, see What is the recommended swap size for Red Hat platforms?

If a monitoring process or program is raising Swap threshold alerts, consider adding another 4GB of Swap from the command line to reduce the alerts.
For example, the following command creates a 4GB file (4M blocks of 1 KB each) for that purpose:

# dd if=/dev/zero of=/swapfile bs=1k count=4M

Afterwards, follow the rest of the kbase article How do I check for information about the swap on my Red Hat Enterprise Linux system? to create and activate the Swap.
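For reference, the steps that usually follow the dd command are to restrict the file's permissions, write a swap signature with mkswap, and enable the file with swapon. The following is a minimal sketch that assumes the /swapfile path from the dd example. The function name and the DRY_RUN switch, which only prints the commands, are hypothetical. The referenced article remains the authoritative procedure.

```shell
# Hypothetical helper: activate a swap file created with dd.
# Run as root. With DRY_RUN=1 the commands are only printed.
activate_swapfile() {
    local file=${1:-/swapfile} run=""
    [ "${DRY_RUN:-0}" = 1 ] && run="echo"
    $run chmod 600 "$file"   # Swap contents must not be readable by other users
    $run mkswap "$file"      # Write the swap area signature
    $run swapon "$file"      # Enable the new Swap immediately
}
```

To keep the Swap across reboots, an /etc/fstab line such as "/swapfile swap swap defaults 0 0" is typically added. You can verify the result with swapon -s or cat /proc/swaps.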

Note 6: The swapon command accepts a priority option that determines which Swap device is used first relative to the other Swap devices. The details are described in man 2 swapon.

Quote from man 2 swapon:

Each swap area has a priority, either high or low.  The default priority is low.  Within the low-priority areas, newer areas are even lower priority than older areas.

All priorities set with swapflags are high-priority, higher than default.  They may have any nonnegative value chosen by the caller.  Higher numbers mean higher priority.

Swap pages are allocated from areas in priority order, highest priority first.  For areas with different priorities, a higher-priority area is exhausted before using a lower-priority area.  If two or more areas have the same priority, and it is the highest priority available, pages are allocated on a round-robin basis between them.
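The current priority of each Swap area is shown in the Priority column of /proc/swaps, which contains the same data as swapon -s. The following is a minimal sketch that lists the areas highest priority first. According to the quoted text, this is also the order in which the areas are filled. The function name is hypothetical, and its file argument exists only so that it can be run against a saved copy.

```shell
# Hypothetical helper: list Swap areas from /proc/swaps, highest priority first.
# Columns in /proc/swaps: Filename Type Size Used Priority (Size/Used in kB).
swap_areas_by_priority() {
    local swaps=${1:-/proc/swaps}
    awk 'NR > 1 { print $5, $1, $2, $4 "/" $3 " kB used" }' "$swaps" | sort -k1,1nr
}
```

For example, after swapon -p 10 /swapfile (or pri=10 in the options field of its /etc/fstab entry), /swapfile would be listed, and used, before areas that have the default negative priorities.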

Resolution 4: Flushing the Swap

Another option is to flush the Swap with the following commands:

# swapoff -a
# swapon -a

Warning: Flushing Swap this way forces the entire contents of Swap back into main memory. If the system is already low on memory, this may cause an Out Of Memory (OOM) condition, so consider this option carefully before using it. If the system is only somewhat low on memory (not very low), pages may have to be reclaimed while the contents of Swap are brought back into memory, which may degrade performance for a short time.
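Before flushing, a rough check is whether the memory currently in Swap would fit into the memory the kernel considers available. The following is a minimal sketch that compares SwapTotal - SwapFree with MemAvailable from /proc/meminfo. On kernels without MemAvailable, it falls back to MemFree + Cached, which is a cruder estimate. This is a heuristic, not a guarantee: even when it reports OK, the reclaim described above can still slow the system down. The function name is hypothetical.

```shell
# Hypothetical helper: rough check before "swapoff -a". Returns non-zero
# when the Swap contents probably do not fit into available RAM.
swapoff_precheck() {
    local meminfo=${1:-/proc/meminfo}
    awk '
        $1 == "MemAvailable:" { avail = $2 }
        $1 == "MemFree:"      { free = $2 }
        $1 == "Cached:"       { cached = $2 }
        $1 == "SwapTotal:"    { stotal = $2 }
        $1 == "SwapFree:"     { sfree = $2 }
        END {
            # Older kernels lack MemAvailable; use a cruder estimate there
            if (avail == "") avail = free + cached
            used = stotal - sfree
            printf "Swap used: %d kB, RAM available: %d kB\n", used, avail
            if (used < avail) print "OK: Swap contents should fit into RAM"
            else { print "NOT SAFE: swapoff -a may cause an Out Of Memory condition"; exit 1 }
        }' "$meminfo"
}
```

For example: swapoff_precheck && swapoff -a && swapon -a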

Resolution 5: Clear POSIX shared memory (/dev/shm/)

If you have followed some or all of the steps above and Swap usage is still high, the cause might be an overloaded /dev/shm POSIX shared memory directory.
/dev/shm is an implementation of the traditional shared memory concept and an efficient means of passing data between programs: one program creates a memory portion that other processes can access (if permitted).
shm / shmfs (also known as tmpfs) is a common name for a temporary file storage facility on many Unix-like operating systems. It appears as a mounted file system but uses virtual memory instead of a persistent storage device.

In the example SOS Report analysis below, physical memory usage was already high (~77%, ~390 GB). As a result, the overloaded /dev/shm/ directory (~252 GB) was swapped out as a whole segment instead of first using the available RAM (~114 GB).

# cat sos_commands/memory/free_-m
             total       used       free     shared    buffers     cached
Mem:        516544     399419     117124       5427         41      23272
-/+ buffers/cache:     376105     140438
Swap:       263167     263144         23

  • Summing the RSS column (field 6, in KB) of the ps output shows that running processes use approximately 360 GB of RAM, which accounts for most of the used physical memory:
# awk '{ SUM += $6 } END { print SUM/1024/1024 }' ps
359.867

  • Converting the used space in /dev/shm to megabytes gives a value close to the amount of Swap in use (263144 MB):
    264470476 KB / 1024 = 258272 MB
# egrep "^Filesystem|^tmpfs" df
Filesystem     1K-blocks       Used  Available  Use%  Mounted on
tmpfs          264470540  264470476         64  100%  /dev/shm

  • In this case, remove unneeded files and directories from the POSIX shared memory directory (/dev/shm/). Consider also investigating which application wrote into /dev/shm/ and whether it is misusing it.
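To start that investigation, you can use the following minimal sketch. It lists the largest files under /dev/shm and the processes that currently map a file there (taken from /proc/PID/maps). Both function names are hypothetical. Files that are held open without being mapped are not caught by the second function. If lsof is installed, it can serve as a cross-check.

```shell
# Hypothetical helper: largest files under a directory (default /dev/shm), in kB.
largest_shm_files() {
    local dir=${1:-/dev/shm} count=${2:-10}
    find "$dir" -xdev -type f -exec du -k {} + 2>/dev/null | sort -nr | head -n "$count"
}

# Hypothetical helper: PIDs and command lines of processes that map a file
# under /dev/shm (a mapped file path appears in the last column of maps).
shm_mappers() {
    local maps pid
    for maps in /proc/[0-9]*/maps; do
        grep -q ' /dev/shm/' "$maps" 2>/dev/null || continue
        pid=${maps#/proc/}
        pid=${pid%/maps}
        printf '%s\t%s\n' "$pid" "$(tr '\0' ' ' 2>/dev/null < "/proc/$pid/cmdline")"
    done
}
```

Run shm_mappers as root so that the maps files of all users' processes are readable.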

Root Cause

If a system has 64GB of physical memory and only 2GB of Swap, Swap can easily become 100% full. On systems monitored by a tool that generates alerts, this triggers Swap usage alerts. It is recommended to increase the amount of Swap space so that it does not become 100% full, which prevents these alerts from being generated.

It is also important to note that on a Linux system, swapping in itself is not bad. Data backed by a file on disk is never swapped. Only anonymous pages, and shared memory/tmpfs pages such as files in /dev/shm, go into Swap. The Linux kernel's page frame reclaim algorithm (PFRA) essentially comes into play on three occasions:

  1. Out of memory reclaiming - kernel detects a 'low memory' condition.
  2. Hibernation reclaiming - kernel must free memory because it is entering suspend-to-disk state.
  3. Periodic reclaiming - kernel employs a thread to reclaim memory periodically.

Broadly speaking, the problem described in this article leans toward periodic reclaiming. Here, the kswapd kernel thread is woken when the number of free pages in a memory zone falls below the zone's low watermark. It then reclaims pages until the free count is back above the high watermark (pages_high). While doing so, it moves anonymous pages to Swap based on the scanning priority and how recently each page was accessed. Over time, those swapped-out pages can add up to a substantial amount, which the monitoring system then reports. This swapping is in fact one of the ways the memory management subsystem improves performance: rarely used pages are moved out of RAM to make room for actively used data.
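You can inspect the watermarks kswapd works against in /proc/zoneinfo. The following is a minimal sketch that prints, for each memory zone, the free page count next to its min, low, and high watermarks (all in pages). It also marks zones that are below the low watermark. The function name is hypothetical. The sketch assumes the usual layout, in which each zone's "pages free" line precedes its min/low/high lines.

```shell
# Hypothetical helper: free pages and min/low/high watermarks per memory
# zone, read from /proc/zoneinfo (values are in pages). Zones whose free
# count is below the low watermark, the point at which kswapd is woken,
# are marked.
zone_watermarks() {
    local zoneinfo=${1:-/proc/zoneinfo}
    awk '
        $1 == "Node" { node = $2; sub(/,/, "", node); zone = $4 }
        $1 == "pages" && $2 == "free" { free = $3 }
        # Per-CPU "high:" lines carry a colon, so they do not match below
        $1 == "min"  { min = $2 }
        $1 == "low"  { low = $2 }
        $1 == "high" {
            printf "node %s zone %-8s free=%s min=%s low=%s high=%s%s\n",
                node, zone, free, min, low, $2,
                (free + 0 < low + 0 ? "  <-- below low" : "")
        }' "$zoneinfo"
}
```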

When Swap usage is at 100% on a Linux system, it does not necessarily indicate a problem. If the processes using the Swap do not need that memory, and the operating system has nothing else to put into Swap, then Swap will stay at 100% without further issues. If this is generating alerts, you can increase the Swap space to stop them. However, the system can't be forced to stop using Swap unless you flush the Swap or start killing processes.

Diagnostic Steps

  • Review the output of the free command
    • Example:
# free -m
             total       used       free     shared    buffers     cached
Mem:        516544     399419     117124       5427         41      23272
-/+ buffers/cache:     376105     140438
Swap:       263167     263144         23   <<<<------
  • Review the sar output for continuous, heavy Swap in/out activity, indicated by high pswpin/s and pswpout/s values
    • Example of sar output:
12:00:00 AM  pswpin/s pswpout/s
<snip>
05:20:00 AM      0.21      0.00
05:30:00 AM      0.08      0.85
05:40:00 AM      0.47      0.00
05:50:00 AM      3.58      1.71
06:00:00 AM      2.48      0.00
06:10:00 AM     39.91      7.17   <<<<-----
06:20:00 AM      0.21      2.72
06:30:00 AM     13.30      1.04
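If no sar history is available, you can measure the same rates live from the cumulative pswpin and pswpout counters in /proc/vmstat, which count pages swapped in and out since boot. The following is a minimal sketch that takes two snapshots a few seconds apart and prints the per-second rates. The function names are hypothetical.

```shell
# Hypothetical helpers: swap-in/swap-out rates from the cumulative pswpin and
# pswpout counters in /proc/vmstat (pages since boot).
swap_rate_between() {
    local before=$1 after=$2 interval=$3
    awk -v t="$interval" '
        # First file: remember the starting counter values
        FNR == NR { if ($1 == "pswpin" || $1 == "pswpout") start[$1] = $2; next }
        # Second file: difference against the starting values
        $1 == "pswpin" || $1 == "pswpout" { delta[$1] = $2 - start[$1] }
        END { printf "pswpin/s=%.2f pswpout/s=%.2f\n", delta["pswpin"] / t, delta["pswpout"] / t }
    ' "$before" "$after"
}

swap_rate() {
    local interval=${1:-5} tmp
    tmp=$(mktemp -d) || return 1
    cat /proc/vmstat > "$tmp/before"
    sleep "$interval"
    cat /proc/vmstat > "$tmp/after"
    swap_rate_between "$tmp/before" "$tmp/after" "$interval"
    rm -rf "$tmp"
}
```

Running swap_rate 10 repeatedly gives a rough live equivalent of the sar columns shown above.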

Excerpted from: https://access.redhat.com/solutions/33375

Posted 2021-06-28 13:09 by augusite