27-案例篇：为什么我的磁盘I/O延迟很高？

案例

环境准备

# 服务器
# 机器配置：2cpu，4GB内存
# 预先安装docker、sysstat包

# 客户端
# 任意linux服务器即可

在服务端执行下面的命令，模拟案例

[root@local_sa_192-168-1-6 ~]# docker run --name=app -p 10000:80 -itd feisky/word-pop

[root@local_sa_192-168-1-6 ~]# docker ps
CONTAINER ID   IMAGE             COMMAND            CREATED         STATUS         PORTS                                     NAMES
ad3887e297e3   feisky/word-pop   "python /app.py"   8 seconds ago   Up 6 seconds   0.0.0.0:10000->80/tcp, :::10000->80/tcp   app

[root@local_sa_192-168-1-6 ~]# curl http://192.168.1.6:10000/
hello world

在客户端访问服务端接口

# 发现访问接口很慢
[root@local_deploy_192-168-1-5 ~]# curl http://192.168.1.6:10000/popularity/word

# 访问一个接口居然要2分多钟
[root@local_deploy_192-168-1-5 ~]# time curl http://192.168.1.6:10000/popularity/word
{
  "popularity": 0.0,
  "word": "word"
}

real	2m45.296s
user	0m0.009s
sys	0m0.002s

# 一直访问，便于分析
[root@local_deploy_192-168-1-5 ~]# while true; do time curl http://192.168.1.6:10000/popularity/word; sleep 1; done

在服务端，执行df命令，查看一下文件系统的使用情况

[root@local_sa_192-168-1-6 ~]# df -h
文件系统                 容量  已用  可用 已用% 挂载点
devtmpfs                 2.0G     0  2.0G    0% /dev
tmpfs                    2.0G     0  2.0G    0% /dev/shm
tmpfs                    2.0G  8.6M  2.0G    1% /run
tmpfs                    2.0G     0  2.0G    0% /sys/fs/cgroup
/dev/mapper/centos-root   98G   20G   79G   20% /
/dev/sda1               1014M  256M  759M   26% /boot
tmpfs                    394M     0  394M    0% /run/user/1007
overlay                   98G   20G   79G   20% 
##
系统还有足够多的磁盘空间，那为什么响应会变慢呢？

在服务端，用top来观察CPU和内存的使用情况

[root@local_sa_192-168-1-6 ~]# top
top - 14:36:02 up 1 day,  1:22,  1 user,  load average: 1.98, 1.18, 0.66
Tasks: 117 total,   1 running,  57 sleeping,   0 stopped,   0 zombie
%Cpu0  :  1.7 us,  1.0 sy,  0.0 ni, 29.5 id, 67.8 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu1  :  2.7 us,  1.0 sy,  0.0 ni, 24.9 id, 71.4 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem :  4027996 total,  1778052 free,   284884 used,  1965060 buff/cache
KiB Swap:  8388604 total,  8388604 free,        0 used.  3508532 avail Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
 5890 root      20   0  105912  31372   7312 S   6.6  0.8   0:19.76 python
##
两个 CPU 的 iowait 都非常高
##
剩余内存还有3GB，看起来也是充足的
##
还有python这个进程目前是使用CPU最高的进程之一，首先要分析一下它

用iostat来观察磁盘的I/O情况

# -d 选项是指显示出IO的性能指标
# -x 选项是指显示出扩展统计信息（即显示所有IO指标）
[root@local_sa_192-168-1-6 ~]# iostat -d -x 1
Linux 5.15.7-1.el7.elrepo.x86_64 (local_sa_192-168-1-6) 	2021年12月20日 	_x86_64_	(2 CPU)
Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda               0.00     0.00    0.00   20.00     0.00 15382.50  1538.25     0.96   48.15    0.00   48.15  47.70  95.40
scd0              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
dm-0              0.00     0.00    0.00    8.00     0.00 15813.00  3953.25     1.44  179.38    0.00  179.38  42.00  33.60
dm-1              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
##
磁盘sda的I/O使用率已经达到95%，接近饱和了
而且，写请求的响应时间高达20秒，每秒的写数据为15MB
这些I/O请求到底是哪些进程导致的呢？

pidstat观察进程IO情况

[root@local_sa_192-168-1-6 ~]# pidstat -d 1
Linux 5.15.7-1.el7.elrepo.x86_64 (local_sa_192-168-1-6) 	2021年12月20日 	_x86_64_	(2 CPU)

14时44分49秒   UID       PID   kB_rd/s   kB_wr/s kB_ccwr/s  Command
14时44分50秒     0      5890      0.00  18043.56      0.00  python

14时44分50秒   UID       PID   kB_rd/s   kB_wr/s kB_ccwr/s  Command
14时44分51秒     0      5890      0.00  17952.00      0.00  python
##
从pidstat的输出，看到了PID号为5890的结果
这说明，正是案例应用引发I/O的性能瓶颈

先用strace确认它是不是在写文件

[root@local_sa_192-168-1-6 ~]# strace -p 5890 -f
[pid  5890] stat("/usr/local/lib/python3.7/lib-dynload/select.cpython-37m-x86_64-linux-gnu.so", {st_mode=S_IFREG|0755, st_size=87240, ...}) = 0
[pid  5890] stat("/usr/local/lib/python3.7/lib-dynload/select.cpython-37m-x86_64-linux-gnu.so", {st_mode=S_IFREG|0755, st_size=87240, ...}) = 0
[pid  5890] stat("/usr/local/lib/python3.7/lib-dynload/select.cpython-37m-x86_64-linux-gnu.so", {st_mode=S_IFREG|0755, st_size=87240, ...}) = 0
[pid  5890] stat("/usr/local/lib/python3.7/urllib/__init__.py", {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
[pid  5890] stat("/usr/local/lib/python3.7/urllib/__init__.py", {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
[pid  5890] stat("/usr/local/lib/python3.7/urllib/__init__.py", {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
[pid  5890] stat("/usr/local/lib/python3.7/urllib/parse.py", {st_mode=S_IFREG|0644, st_size=36925, ...}) = 0
[pid  5890] stat("/usr/local/lib/python3.7/urllib/parse.py", {st_mode=S_IFREG|0644, st_size=36925, ...}) = 0
[pid  5890] stat("/usr/local/lib/python3.7/urllib/parse.py", {st_mode=S_IFREG|0644, st_size=36925, ...}) = 0
[pid  5890] stat("/usr/local/lib/python3.7/email/_parseaddr.py", {st_mode=S_IFREG|0644, 
##
从strace中，可以看到大量的stat系统调用，并且大都为python的文件，但是这里并没有任何write系统调用

## 由于strace的输出比较多，可以用grep，来过滤一下write
[root@local_sa_192-168-1-6 ~]# strace -p 5890 2>&1 | grep write
##
仍然没有任何输出

只好综合strace、pidstat和iostat这三个结果来分析了
iostat已经证明磁盘I/O有性能瓶颈，而pidstat也证明了，这个瓶颈是由5890号进程导致的
但strace跟踪这个进程，却没有找到任何write系统调用文件写，
明明应该有相应的write系统调用，但用现有工具却找不到痕迹，这时就该想想换工具的问题了

filetop是bcc软件包的一部分，主要跟踪内核中文件的读写情况，并输出线程ID、读写大小、读写类型以及文件名称

# -C 选项表示输出新内容时不清空屏幕
[root@local_sa_192-168-1-6 ~]# filetop -C
TID     COMM             READS  WRITES R_Kb    W_Kb    T FILE
5968    python           0      1      0       4199    R 711.txt
5968    python           0      1      0       3369    R 707.txt
5968    python           0      1      0       3222    R 710.txt
5968    python           0      1      0       2636    R 708.txt
5968    python           0      1      0       2392    R 709.txt
5969    filetop          1      0      4       0       R type
5969    filetop          1      0      4       0       R retprobe
5969    filetop          2      0      2       0       R loadavg
656     auditd           0      5      0       0       R audit.log
......
TID     COMM             READS  WRITES R_Kb    W_Kb    T FILE
5968    python           0      1      0       3369    R 714.txt
5968    python           0      1      0       2685    R 712.txt
5968    python           0      1      0       2587    R 718.txt
5968    python           0      1      0       2490    R 716.txt
5968    python           0      1      0       2099    R 717.txt
5968    python           0      1      0       1904    R 715.txt
5968    python           0      1      0       1757    R 713.txt
5969    filetop          2      0      2       0       R loadavg

##
filetop输出了8列内容，分别是线程ID、线程命令行、读写次数、读写的大小、文件类型以及读写的文件名称
##
多观察一会儿会发现，每隔一段时间，线程号为5968的python应用就会先写入大量的txt文件，再大量地读

##
线程号为5968的线程，属于哪个进程呢？可以用ps命令查看
[root@local_sa_192-168-1-6 ~]# ps -efT|grep 5968
root      5890  5968  5860  8 14:54 pts/0    00:00:07 /usr/local/bin/python /app.py
root      5985  5985  5348  0 14:55 pts/0    00:00:00 grep --color=auto 5971
##
这个线程正是案例应用5890的线程
filetop只给出了文件名称，却没有文件路径

opensnoop属于bcc软件包，可以动态跟踪内核中的open系统调用，这样就可以找出这些 txt 文件的路径

[root@local_sa_192-168-1-6 ~]# opensnoop
libbpf: failed to find valid kernel BTF
libbpf: vmlinux BTF is not found
PID    COMM               FD ERR PATH
5890   python              6   0 /tmp/0a9f7e26-6162-11ec-b257-0242ac110002/454.txt
5890   python              6   0 /tmp/0a9f7e26-6162-11ec-b257-0242ac110002/455.txt
5890   python              6   0 /tmp/0a9f7e26-6162-11ec-b257-0242ac110002/456.txt
5890   python              6   0 /tmp/0a9f7e26-6162-11ec-b257-0242ac110002/457.txt
5890   python              6   0 /tmp/0a9f7e26-6162-11ec-b257-0242ac110002/458.txt
5890   python              6   0 /tmp/0a9f7e26-6162-11ec-b257-0242ac110002/459.txt
5890   python              6   0 /tmp/0a9f7e26-6162-11ec-b257-0242ac110002/460.txt
5890   python              6   0 /tmp/0a9f7e26-6162-11ec-b257-0242ac110002/461.txt
5890   python              6   0 /tmp/0a9f7e26-6162-11ec-b257-0242ac110002/462.txt
5890   python              6   0 /tmp/0a9f7e26-6162-11ec-b257-0242ac110002/463.txt
5890   python              6   0 /tmp/0a9f7e26-6162-11ec-b257-0242ac110002/464.txt
5890   python              6   0 /tmp/0a9f7e26-6162-11ec-b257-0242ac110002/465.txt
5890   python              6   0 /tmp/0a9f7e26-6162-11ec-b257-0242ac110002/466.txt
##
通过opensnoop的输出可以看到，这些txt路径位于/tmp目录下
它打开的文件数量，按照数字编号，从0.txt依次增大到999.txt，这可远多于前面用filetop看到的数量
##
综合filetop和opensnoop就可以进一步分析了
可以大胆猜测，案例应用在写入1000个txt文件后，又把这些内容读到内存中进行处理。
来检查一下，这个目录中是不是真的有1000个文件
[root@local_sa_192-168-1-6 ~]# ls /tmp/0a9f7e26-6162-11ec-b257-0242ac110002|wc -l
ls: 无法访问/tmp/0a9f7e26-6162-11ec-b257-0242ac110002: 没有那个文件或目录
0
##
操作后却发现，目录居然不存在了。怎么回事呢？回到opensnoop再观察一会儿
[root@local_sa_192-168-1-6 ~]# opensnoop
libbpf: failed to find valid kernel BTF
libbpf: vmlinux BTF is not found
PID    COMM               FD ERR PATH
5890   python              6   0 /tmp/6ab71080-6162-11ec-b257-0242ac110002/309.txt
5890   python              6   0 /tmp/6ab71080-6162-11ec-b257-0242ac110002/310.txt
5890   python              6   0 /tmp/6ab71080-6162-11ec-b257-0242ac110002/311.txt
5890   python              6   0 /tmp/6ab71080-6162-11ec-b257-0242ac110002/312.txt
5890   python              6   0 /tmp/6ab71080-6162-11ec-b257-0242ac110002/313.txt
##
原来，这时的路径已经变成了另一个目录。这说明，这些目录都是应用程序动态生成的，用完就删了
##
结合前面的所有分析，基本可以判断，案例应用会动态生成一批文件，用来临时存储数据，用完就会删除它们
但不幸的是，正是这些文件读写，引发了I/O的性能瓶颈，导致整个处理过程非常慢

查看应用程序的源码 app.py

# 拷贝代码到本地目录
[root@local_sa_192-168-1-6 ~]# docker cp app:/app.py .
[root@local_sa_192-168-1-6 ~]# ls
app.py

# 查看源码
[root@local_sa_192-168-1-6 ~]# cat app.py
#!/usr/bin/env python
# -*- coding: UTF-8 -*-

import os
import uuid
import random
import shutil
from concurrent.futures import ThreadPoolExecutor
from flask import Flask, jsonify

app = Flask(__name__)


def validate(word, sentence):
    return word in sentence


def generate_article():
    s_nouns = [
        "A dude", "My mom", "The king", "Some guy", "A cat with rabies",
        "A sloth", "Your homie", "This cool guy my gardener met yesterday",
        "Superman"
    ]
    p_nouns = [
        "These dudes", "Both of my moms", "All the kings of the world",
        "Some guys", "All of a cattery's cats",
        "The multitude of sloths living under your bed", "Your homies",
        "Like, these, like, all these people", "Supermen"
    ]
    s_verbs = [
        "eats", "kicks", "gives", "treats", "meets with", "creates", "hacks",
        "configures", "spies on", "retards", "meows on", "flees from",
        "tries to automate", "explodes"
    ]
    infinitives = [
        "to make a pie.", "for no apparent reason.",
        "because the sky is green.", "for a disease.",
        "to be able to make toast explode.", "to know more about archeology."
    ]
    sentence = '{} {} {} {}'.format(
        random.choice(s_nouns), random.choice(s_verbs),
        random.choice(s_nouns).lower() or random.choice(p_nouns).lower(),
        random.choice(infinitives))
    return '\n'.join([sentence for i in range(50000)])


@app.route('/')
def hello_world():
    return 'hello world'


@app.route("/popularity/<word>")
def word_popularity(word):
    dir_path = '/tmp/{}'.format(uuid.uuid1())
    count = 0
    sample_size = 1000

    def save_to_file(file_name, content):
        with open(file_name, 'w') as f:
            f.write(content)

    try:
        # initial directory firstly
        os.mkdir(dir_path)

        # save article to files
        for i in range(sample_size):
            file_name = '{}/{}.txt'.format(dir_path, i)
            article = generate_article()
            save_to_file(file_name, article)

        # count word popularity
        for root, dirs, files in os.walk(dir_path):
            for file_name in files:
                with open('{}/{}'.format(dir_path, file_name)) as f:
                    if validate(word, f.read()):
                        count += 1
    finally:
        # clean files
        shutil.rmtree(dir_path, ignore_errors=True)

    return jsonify({'popularity': count / sample_size * 100, 'word': word})


@app.route("/popular/<word>")
def word_popular(word):
    count = 0
    sample_size = 1000
    articles = []

    try:
        for i in range(sample_size):
            articles.append(generate_article())

        for article in articles:
            if validate(word, article):
                count += 1
    finally:
        pass

    return jsonify({'popularity': count / sample_size * 100, 'word': word})


if __name__ == '__main__':
    app.run(debug=True, host='0.0.0.0', port=80)
    
##
源码中可以看到，这个案例应用，在每个请求的处理过程中，都会生成一批临时文件
然后读入内存处理，最后再把整个目录删除掉
##
这是一种常见的利用磁盘空间处理大量数据的技巧，不过，本次案例中的I/O请求太重，导致磁盘I/O利用率过高
##
要解决这一点，其实就是算法优化问题了
比如在内存充足时，就可以把所有数据都放到内存中处理，这样就能避免I/O的性能问题

在终端二中分别访问http://192.168.1.6:10000/popularity/word和http://192.168.1.6:10000/popular/word ，对比前后的效果

[root@local_deploy_192-168-1-5 ~]# time curl  http://192.168.1.6:10000/popular/word
{
  "popularity": 0.0,
  "word": "word"
}

real	0m30.063s
user	0m0.002s
sys	0m0.002s

[root@local_deploy_192-168-1-5 ~]# time curl http://192.168.1.6:10000/popularity/word
{
  "popularity": 0.0,
  "word": "word"
}

real	2m45.296s
user	0m0.009s
sys	0m0.002s
##
新的接口只要30秒就可以返回，明显比一开始的3分钟好很多
##
当然，这只是优化的第一步，并且方法也不算完善，还可以做进一步的优化
不过，在实际系统中，大都是类似的做法，先用最简单的方法，尽早解决线上问题，然后再继续思考更好的优化方法

小结

今天分析了一个响应过慢的单词热度案例

首先，用top、iostat分析了系统的CPU和磁盘使用情况，发现了磁盘I/O瓶颈，也知道了这个瓶颈是案例应用导致的
接着用strace来观察进程的系统调用，没找到任何write系统调用
于是用新的工具，借助动态追踪工具包bcc中的filetop和opensnoop，找出了案例应用的问题，发现这个根源是大量读写临时文件

找出问题后，优化方法就相对比较简单了
如果内存充足时，最简单的方法，就是把数据都放在速度更快的内存中，这样就没有磁盘I/O的瓶颈了
当然，再进一步，还可以利用Trie树等各种算法，进一步优化单词处理的效率

posted @ 2021-12-20 15:41 李成果阅读(1319) 评论(0) 编辑收藏举报

刷新页面返回顶部

Alnk

邮箱:1029612787@qq.com

27-案例篇：为什么我的磁盘I/O延迟很高？

案例

小结

公告