Analyzing Slow Requests and Requests are Blocked in a Ceph Cluster
Slow Requests, and Requests are Blocked
The ceph-osd daemon is slow to respond to a request and the ceph health detail command returns an error message similar to the following one:
HEALTH_WARN 30 requests are blocked > 32 sec; 3 osds have slow requests
30 ops are blocked > 268435 sec
1 ops are blocked > 268435 sec on osd.11
1 ops are blocked > 268435 sec on osd.18
28 ops are blocked > 268435 sec on osd.39
3 osds have slow requests
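The per-OSD lines in this output can be tallied mechanically to see which OSDs hold most of the blocked operations. The following is a minimal sketch, assuming the text keeps the line format of the sample above (other Ceph releases may word it differently). The function name blocked_ops_per_osd is hypothetical and not part of Ceph.

```python
import re
from collections import Counter

# Matches lines such as "28 ops are blocked > 268435 sec on osd.39".
# Assumption: the format is the one shown in the sample output above.
_BLOCKED_RE = re.compile(r"^\s*(\d+) ops are blocked > [\d.]+ sec on (osd\.\d+)\s*$")


def blocked_ops_per_osd(health_detail: str) -> Counter:
    """Return a Counter mapping each OSD name to its number of blocked ops."""
    counts = Counter()
    for line in health_detail.splitlines():
        match = _BLOCKED_RE.match(line)
        if match:
            counts[match.group(2)] += int(match.group(1))
    return counts


SAMPLE = """\
HEALTH_WARN 30 requests are blocked > 32 sec; 3 osds have slow requests
30 ops are blocked > 268435 sec
1 ops are blocked > 268435 sec on osd.11
1 ops are blocked > 268435 sec on osd.18
28 ops are blocked > 268435 sec on osd.39
3 osds have slow requests
"""

if __name__ == "__main__":
    # Print the OSDs with the most blocked ops first.
    for osd, count in blocked_ops_per_osd(SAMPLE).most_common():
        print(osd, count)
```

In the sample, osd.39 accounts for 28 of the 30 blocked operations, so it is the natural place to start looking.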
In addition, the Ceph logs include an error message similar to the following ones:
2015-08-24 13:18:10.024659 osd.1 127.0.0.1:6812/3032 9 : cluster [WRN] 6 slow requests, 6 included below; oldest blocked for > 61.758455 secs
2016-07-25 03:44:06.510583 osd.50 [WRN] slow request 30.005692 seconds old, received at {date-time}: osd_op(client.4240.0:8 benchmark_data_ceph-1_39426_object7 [write 0~4194304] 0.69848840) v4 currently waiting for subops from [610]
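To pull the useful fields out of such a log line (the reporting OSD, the age of the request, and the state after the word "currently"), a regular expression is enough. This is a sketch that assumes the log format of the second sample line above; the names SlowRequest and parse_slow_request are hypothetical.

```python
import re
from typing import NamedTuple, Optional


class SlowRequest(NamedTuple):
    osd: str            # OSD that reported the slow request, e.g. "osd.50"
    age_seconds: float  # how long the request has been outstanding
    state: str          # text after "currently", e.g. "waiting for subops from [610]"


# Assumption: the line looks like the "slow request ... seconds old" sample above.
_SLOW_RE = re.compile(
    r"\b(osd\.\d+)\b.*?slow request ([\d.]+) seconds old,.*\bcurrently (.+?)\s*$"
)


def parse_slow_request(line: str) -> Optional[SlowRequest]:
    """Parse one per-request log line; return None for any other line."""
    match = _SLOW_RE.search(line)
    if match is None:
        return None
    return SlowRequest(match.group(1), float(match.group(2)), match.group(3))


SAMPLE_LINE = (
    "2016-07-25 03:44:06.510583 osd.50 [WRN] slow request 30.005692 seconds old, "
    "received at {date-time}: osd_op(client.4240.0:8 benchmark_data_ceph-1_39426_object7 "
    "[write 0~4194304] 0.69848840) v4 currently waiting for subops from [610]"
)

if __name__ == "__main__":
    print(parse_slow_request(SAMPLE_LINE))
```

Summary lines such as "6 slow requests, 6 included below" do not describe a single request, so the function returns None for them.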
What This Means
An OSD with slow requests is any OSD that cannot service the I/O operations in its queue within the time defined by the osd_op_complaint_time parameter. By default, this parameter is set to 30 seconds.
The main causes of OSDs having slow requests are:
Problems with the underlying hardware, such as disk drives, hosts, racks, or network switches
Problems with the network. These problems are usually connected with flapping OSDs. See Section 5.1.4, “Flapping OSDs” for details.
System load
The following table shows the types of slow requests. Use the dump_historic_ops administration socket command to determine the type of a slow request. For details about the administration socket, see the Using the Administration Socket section in the Administration Guide for Red Hat Ceph Storage 2.
Slow request type             Description
------------------------------------------------------------------------------------------------------------------------
waiting for rw locks          The OSD is waiting to acquire a lock on a placement group for the operation.
waiting for subops            The OSD is waiting for replica OSDs to apply the operation to the journal.
no flag points reached        The OSD did not reach any major operation milestone.
waiting for degraded object   The OSDs have not replicated an object the specified number of times yet.
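When many slow requests are reported at once, it helps to group them by the types in this table. The sketch below assumes the state is available as free text, such as the part after "currently" in the log sample earlier ("waiting for subops from [610]"). The names SLOW_REQUEST_TYPES, classify_state, and tally_types are hypothetical.

```python
from collections import Counter
from typing import Iterable, Optional

# The four types and descriptions from the table above.
SLOW_REQUEST_TYPES = {
    "waiting for rw locks": "waiting to acquire a lock on a placement group",
    "waiting for subops": "waiting for replica OSDs to apply the operation to the journal",
    "no flag points reached": "no major operation milestone reached",
    "waiting for degraded object": "object not yet replicated the specified number of times",
}


def classify_state(state: str) -> Optional[str]:
    """Return the table type that the state text starts with, or None."""
    lowered = state.strip().lower()
    for request_type in SLOW_REQUEST_TYPES:
        if lowered.startswith(request_type):
            return request_type
    return None


def tally_types(states: Iterable[str]) -> Counter:
    """Count states per type; states that match no type are counted as 'other'."""
    return Counter(classify_state(state) or "other" for state in states)


if __name__ == "__main__":
    observed = [
        "waiting for subops from [610]",
        "waiting for subops from [12,40]",
        "waiting for rw locks",
    ]
    for request_type, count in tally_types(observed).items():
        print(f"{count:3d}  {request_type}")
```

A tally dominated by "waiting for subops" points at the replica OSDs or the network between them rather than at the OSD that logged the warning.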
To Troubleshoot This Problem
1. Determine if the OSDs with slow or blocked requests share a common piece of hardware, for example a disk drive, host, rack, or network switch.
2. If the OSDs share a disk:
i. Use the smartmontools utility to check the health of the disk, or check the logs for any errors on the disk.
# smartctl -i /dev/sda                  Check whether SMART support is enabled on the disk.
# smartctl -H /dev/sda                  Show the overall health status of the disk.
# smartctl -l error /dev/sda            Show the disk error log.
# smartctl -s on -a /dev/sda            Enable SMART and show all SMART information for a disk that is not behind a RAID controller.
# smartctl -a -d megaraid,0 /dev/sda    Check a disk behind a MegaRAID controller; the 0 in megaraid,0 is the physical disk number on the controller.
Note:
The smartmontools utility is included in the smartmontools package.
ii. Use the iostat utility to get the I/O wait report (%iowait) on the OSD disk to determine if the disk is under heavy load. For example: iostat -c 1 20
Note:
The iostat utility is included in the sysstat package.
3. If the OSDs share a host:
i. Check the RAM and CPU utilization, for example with free -h and top.
ii. Use the netstat utility to see the network statistics on the Network Interface Controllers (NICs) and troubleshoot any networking issues. See also Chapter 3, Troubleshooting Networking Issues for further information. For example: netstat -an
4. If the OSDs share a rack, check the network switch for the rack. For example, if you use jumbo frames, verify that the NIC in the path has jumbo frames set.
5. If you are unable to determine a common piece of hardware shared by OSDs with slow requests, or to troubleshoot and fix hardware and networking problems, open a support ticket. See Chapter 7, Contacting Red Hat Support Service for details.
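Step 1 of the procedure above (finding hardware shared by the affected OSDs) can be sketched against the CRUSH hierarchy. The code assumes the hierarchy is available as a list of nodes with "id", "name", "type", and "children" keys, which is the shape I would expect from the "nodes" array of ceph osd tree -f json; verify that against your release. The function name common_buckets and the sample tree are hypothetical.

```python
from typing import Iterable, List


def common_buckets(nodes: List[dict], osd_names: Iterable[str]) -> List[str]:
    """Return "type name" for every bucket containing all given OSDs, innermost first."""
    by_id = {node["id"]: node for node in nodes}
    name_to_id = {node["name"]: node["id"] for node in nodes}
    # Map each child id to the id of the bucket that lists it as a child.
    parent = {child: node["id"] for node in nodes for child in node.get("children", [])}

    def ancestors(osd_name: str) -> List[int]:
        chain = []
        current = name_to_id[osd_name]
        while current in parent:
            current = parent[current]
            chain.append(current)
        return chain

    chains = [ancestors(name) for name in osd_names]
    if not chains:
        return []
    shared = set(chains[0]).intersection(*chains[1:])
    return [f'{by_id[i]["type"]} {by_id[i]["name"]}' for i in chains[0] if i in shared]


# Hypothetical cluster layout, for illustration only.
SAMPLE_NODES = [
    {"id": -1, "name": "default", "type": "root", "children": [-2, -3]},
    {"id": -2, "name": "rack1", "type": "rack", "children": [-4, -5]},
    {"id": -3, "name": "rack2", "type": "rack", "children": [-6]},
    {"id": -4, "name": "node1", "type": "host", "children": [11]},
    {"id": -5, "name": "node2", "type": "host", "children": [18, 39]},
    {"id": -6, "name": "node3", "type": "host", "children": [50]},
    {"id": 11, "name": "osd.11", "type": "osd"},
    {"id": 18, "name": "osd.18", "type": "osd"},
    {"id": 39, "name": "osd.39", "type": "osd"},
    {"id": 50, "name": "osd.50", "type": "osd"},
]

if __name__ == "__main__":
    print(common_buckets(SAMPLE_NODES, ["osd.11", "osd.18", "osd.39"]))
```

If the innermost shared bucket is a host, continue with step 3; if it is a rack, continue with step 4. If only the root is shared, the OSDs have no common hardware below the cluster level.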
See Also
The Using the Administration Socket section: https://access.redhat.com/documentation/en-us/red_hat_ceph_storage/2/html-single/administration_guide/#using_the_administration_socket
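Checking connection states on the server with netstat -na
Running netstat -na on the server shows TCP connections in every possible state: established, closed, TIME_WAIT, and so on. A large number of connections in TIME_WAIT means that the server frequently initiates connection shutdown itself, which is undesirable. Frequent connections in CLOSE_WAIT mean that the system is not handling close requests promptly, which also points to a defect.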
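Counting connections per state makes an excess of TIME_WAIT or CLOSE_WAIT easy to spot. The sketch below assumes the column layout printed by the Linux net-tools netstat -na (Proto, Recv-Q, Send-Q, Local Address, Foreign Address, State). The function name count_tcp_states and the sample addresses are hypothetical.

```python
from collections import Counter


def count_tcp_states(netstat_output: str) -> Counter:
    """Count TCP connections per state in `netstat -na` style output."""
    states = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # TCP rows have six columns; the state is the last one.
        if len(fields) >= 6 and fields[0].startswith("tcp"):
            states[fields[5]] += 1
    return states


# Hypothetical sample output, for illustration only.
SAMPLE = """\
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp        0      0 0.0.0.0:6800            0.0.0.0:*               LISTEN
tcp        0      0 10.0.0.1:6800           10.0.0.2:51234          ESTABLISHED
tcp        0      0 10.0.0.1:6800           10.0.0.3:40022          TIME_WAIT
tcp        0      0 10.0.0.1:6800           10.0.0.4:40023          TIME_WAIT
tcp        1      0 10.0.0.1:6800           10.0.0.5:40100          CLOSE_WAIT
udp        0      0 0.0.0.0:123             0.0.0.0:*
"""

if __name__ == "__main__":
    for state, count in count_tcp_states(SAMPLE).most_common():
        print(f"{count:5d}  {state}")
```

On a live host the same function could be fed the text captured from netstat -na, for example through subprocess.run.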
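The Send-Q and Recv-Q columns of netstat -na also indicate how well the system is keeping up. A large Send-Q means that the system is producing data faster than the connection can send it out. A large Recv-Q means that the system is not processing incoming requests in time.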
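The queue columns can be scanned for connections that are backing up. This sketch assumes the net-tools column order (Proto, Recv-Q, Send-Q, Local Address, Foreign Address, State) and skips LISTEN rows, because the queue columns mean something different for listening sockets. The function name backed_up_connections and the threshold value are hypothetical choices, not recommended limits.

```python
from typing import List, Tuple


def backed_up_connections(netstat_output: str, threshold: int = 0) -> List[Tuple[str, str, int, int]]:
    """Return (local, remote, recv_q, send_q) for TCP rows whose queues exceed threshold."""
    results = []
    for line in netstat_output.splitlines():
        fields = line.split()
        if len(fields) < 6 or not fields[0].startswith("tcp") or fields[5] == "LISTEN":
            continue
        recv_q, send_q = int(fields[1]), int(fields[2])
        if recv_q > threshold or send_q > threshold:
            results.append((fields[3], fields[4], recv_q, send_q))
    return results


# Hypothetical sample output, for illustration only.
SAMPLE = """\
Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp      129      0 0.0.0.0:6800            0.0.0.0:*               LISTEN
tcp        0      0 10.0.0.1:6800           10.0.0.2:51234          ESTABLISHED
tcp        0  87360 10.0.0.1:6800           10.0.0.3:40022          ESTABLISHED
tcp    45000      0 10.0.0.1:6801           10.0.0.4:40023          ESTABLISHED
"""

if __name__ == "__main__":
    for local, remote, recv_q, send_q in backed_up_connections(SAMPLE, threshold=1024):
        print(f"{local} <-> {remote}  Recv-Q={recv_q}  Send-Q={send_q}")
```

A single nonzero reading is normal on a busy connection; the signal is a queue that stays large across repeated samples.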
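netstat can also show whether the server is accepting client connections normally. When the server calls listen, it passes a backlog argument, which is the number of connections that have been established but not yet accepted by the program. The kernel takes the smaller of /proc/sys/net/core/somaxconn and the backlog passed in as the length of the queue of established but not yet accepted connections.
netstat -na | grep PORT | grep LISTEN shows the Recv-Q of the listening socket. If that value is large, or even reaches the backlog value, the server cannot keep up with the rate at which connections are being established and is not accepting new connections promptly. Even if the server's internal statistics show no pressure and all request-handling metrics look normal, external service is affected, because new connections may fail. Those that do not fail wait a long time before the server handles them, by which point the client may already have timed out and reconnected. Once this happens it becomes a vicious circle: connections keep being established, but every one of them fails.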
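The two rules above (the queue length is the smaller of somaxconn and the requested backlog, and a listening socket whose Recv-Q reaches that length is not accepting fast enough) can be sketched as follows. The code assumes net-tools column order and a Linux /proc filesystem for read_somaxconn. All function names and the sample values are hypothetical.

```python
def read_somaxconn(path: str = "/proc/sys/net/core/somaxconn") -> int:
    """Read the kernel limit on the accept queue length (Linux only)."""
    with open(path) as handle:
        return int(handle.read().strip())


def effective_backlog(requested_backlog: int, somaxconn: int) -> int:
    """The kernel uses the smaller of the two values as the queue length."""
    return min(requested_backlog, somaxconn)


def listen_queue_is_full(netstat_output: str, port: int, requested_backlog: int, somaxconn: int) -> bool:
    """True if a LISTEN socket on `port` has Recv-Q >= the effective backlog."""
    limit = effective_backlog(requested_backlog, somaxconn)
    for line in netstat_output.splitlines():
        fields = line.split()
        if len(fields) < 6 or not fields[0].startswith("tcp") or fields[5] != "LISTEN":
            continue
        # The port is whatever follows the last colon of the local address.
        if fields[3].rsplit(":", 1)[-1] == str(port) and int(fields[1]) >= limit:
            return True
    return False


# Hypothetical sample output, for illustration only.
SAMPLE = """\
Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp      128      0 0.0.0.0:6800            0.0.0.0:*               LISTEN
tcp        3      0 0.0.0.0:6801            0.0.0.0:*               LISTEN
tcp        0      0 10.0.0.1:6800           10.0.0.2:51234          ESTABLISHED
"""

if __name__ == "__main__":
    # A program that asks for a backlog of 512 on a host with somaxconn 128 gets 128.
    print(effective_backlog(512, 128))
    print(listen_queue_is_full(SAMPLE, 6800, requested_backlog=512, somaxconn=128))
```

On a real host, somaxconn would come from read_somaxconn(), and the requested backlog from the server's own configuration or source code.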