Controlling the variance of request response times, not just maximizing queries per second

http://highscalability.com/blog/2010/11/4/facebook-at-13-million-queries-per-second-recommends-minimiz.html

Facebook gave a MySQL Tech Talk covering many things MySQL, but one of the subtler and more interesting points was their focus on controlling the variance of request response times, not just maximizing queries per second.

But first, the scalability porn. Facebook's OLTP performance numbers were, as usual, quite dramatic:

  • Query response times: 4ms reads, 5ms writes. 
  • Rows read per second: 450M peak
  • Network bytes per second: 38GB peak
  • Queries per second: 13M peak
  • Rows changed per second: 3.5M peak
  • InnoDB disk ops per second: 5.2M peak

Some thoughts on creating quality, not quantity:

  • They don't care about average response times; instead, they want to minimize variance. Every click must be responded to quickly. The quality of service for each request matters.
  • It's OK if a query is slow, as long as it is always slow.
  • They don't try to get the highest queries per second out of each machine. What matters is that the edge cases are not bad.
  • They figure out why the response time for the worst query is bad and then fix it.
  • The performance community often focuses on getting the highest queries per second; for Facebook it's about making sure they have the best mix of IOPS, cache size, and space. The sketch after this list shows why the tail, not the average, is the number to watch.
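
To make the variance point concrete, here is a minimal sketch (mine, not Facebook's; the latency samples are invented) of how two servers with the same average latency can differ wildly in tail behavior:

```python
import statistics

# Hypothetical per-request latencies in ms. Both servers average 5 ms,
# but server B occasionally stalls.
server_a = [4, 5, 5, 6, 5, 4, 6, 5, 5, 5]
server_b = [2, 2, 3, 2, 2, 3, 2, 2, 2, 30]

for name, samples in (("A", server_a), ("B", server_b)):
    mean = statistics.mean(samples)
    stdev = statistics.stdev(samples)
    worst = max(samples)  # crude stand-in for a high percentile
    print(f"server {name}: mean={mean:.1f} ms  stdev={stdev:.1f} ms  worst={worst} ms")
```

A page that fans out to hundreds of such servers is gated by the slowest one it touches, which is why the tail, not the mean, is the number Facebook tunes.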

To minimize variance they must be able to notice, diagnose, and then fix problems:

  • They measure how things work in production and can monitor at sub-second granularity, so they catch problems quickly.
  • Servers have miniature fractures in their performance which they call "stalls." They've built tools to find these (a minimal detector sketch follows this list).
  • Dogpile collection: every second it checks whether something looks wrong and, if so, ships the data out for analysis.
  • Poor man's profiler: attach GDB to a server and dump all thread stacks to see what's going on; with it they can see when stalls happen.
  • Problems are usually counter-intuitive, "this can never happen" type problems:
    • Extending a table locks the entire instance.
    • Flushing dirty pages was actually blocking.
    • How statistics get sampled in InnoDB.
    • Problems happen on moderately loaded systems too. Their systems are deliberately not run that hot, precisely to protect quality of service, yet the problems still happen.
  • Analyze and understand every layer of the software stack to see how it performs at scale.
  • The monitoring system tracks different aspects of performance so they can notice a change, drill down to the offending host, then to the query that might be causing the problem, kill that query, and trace it back to the source file it came from.
  • They have a small team, so they make very specific changes to Linux and MySQL to support their use cases; longer-term changes are left to others.
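
Facebook's stall-finding tools aren't published in the talk, but the idea can be sketched: run a sub-second heartbeat and flag any interval where the process woke up much later than scheduled. A minimal sketch (all names and thresholds here are illustrative, not Facebook's):

```python
import time

INTERVAL = 0.1    # sample every 100 ms (illustrative)
THRESHOLD = 0.25  # flag gaps longer than 250 ms as stalls (illustrative)

def watch_for_stalls(duration_s=10):
    """Heartbeat loop: waking up much later than scheduled means
    something (a lock, dirty-page flushing, ...) stalled the process."""
    last = time.monotonic()
    deadline = last + duration_s
    while last < deadline:
        time.sleep(INTERVAL)
        now = time.monotonic()
        gap = now - last
        if gap > THRESHOLD:
            print(f"stall: expected ~{INTERVAL * 1000:.0f} ms, got {gap * 1000:.0f} ms")
        last = now

if __name__ == "__main__":
    watch_for_stalls()
```

A real tool would aggregate these events and correlate them with server state, which is roughly what the dogpile collector described above ships out for analysis.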

Please watch the MySQL Tech Talk for more color and details.

IOPS (Baidu Baike) https://baike.baidu.com/item/IOPS
 
IOPS (Input/Output Operations Per Second) is the number of read/write (I/O) operations performed per second, used mostly in database-like settings as a measure of random-access performance. Storage-side IOPS is not the same as host-side IO: IOPS counts how many accesses per second the storage can accept from hosts, and a single host IO may take several storage accesses to complete. For example, even the smallest host write goes through three steps (send the write request, write the data, receive the write acknowledgement), which is three storage-side accesses.
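
As a back-of-the-envelope illustration of the host-IO versus storage-access distinction (the factor of 3 comes from the write example above; the host IOPS figure is invented):

```python
# Per the example above, one minimal host write = 3 storage-side accesses
# (send write request, write data, receive acknowledgement).
ACCESSES_PER_HOST_WRITE = 3

host_write_iops = 5000  # hypothetical host-side write rate
storage_accesses_per_s = host_write_iops * ACCESSES_PER_HOST_WRITE
print(storage_accesses_per_s)  # 15000 storage-side accesses per second
```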
 
Two major bottlenecks

The two major bottlenecks are throughput and IOPS.

Throughput

Throughput depends mainly on the array architecture, the bandwidth of the Fibre Channel links (arrays here are assumed to be Fibre Channel arrays; SCSI-attached SSA arrays are not discussed), and the number of disks. Architecture differs from array to array, and each has some internal bandwidth (analogous to a PC's system bus), but internal bandwidth is usually provisioned generously and is rarely the bottleneck.

The Fibre Channel links matter more. In a data-warehouse environment, for example, the required data flow is large, and a 2 Gb HBA can sustain at most 2 Gb / 10 (converting bits to bytes and absorbing encoding overhead) = 200 MB/s of actual throughput, so it takes five such cards to reach 1 GB/s; a data-warehouse environment should therefore consider 4 Gb HBAs.

Finally the disks, which matter most: once the other bottlenecks are gone, throughput comes down to the number of disks. Approximate per-disk throughput:

10K rpm   15K rpm   ATA
-------   -------   ---
10 MB/s   13 MB/s   8 MB/s

So an array of 120 15K rpm Fibre Channel disks can sustain at most 120 * 13 = 1560 MB/s from the disks. At 200 MB/s per card that takes about eight 2 Gb HBAs, while about four 4 Gb HBAs suffice.
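
A minimal sketch of the sizing arithmetic above (the per-disk figures and the divide-by-10 bits-to-bytes factor come from the text; the function names are mine):

```python
import math

DISK_MBPS = {"10k": 10, "15k": 13, "ata": 8}  # per-disk MB/s, from the table above

def fc_cards_needed(target_mbps, link_gbps):
    # One link carries roughly link_gbps * 1000 / 10 MB/s of payload
    # (the /10 converts bits to bytes and absorbs encoding overhead).
    per_card_mbps = link_gbps * 1000 / 10
    return math.ceil(target_mbps / per_card_mbps)

array_mbps = DISK_MBPS["15k"] * 120           # 120 * 13 = 1560 MB/s
print(array_mbps, "MB/s")
print("2 Gb cards needed:", fc_cards_needed(array_mbps, 2))  # -> 8
print("4 Gb cards needed:", fc_cards_needed(array_mbps, 4))  # -> 4
```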

IOPS

IOPS is determined mainly by the array's algorithms, the cache hit rate, and the number of disks. Algorithms differ from array to array; on an HDS USP, for example, queue or resource limits on a single LDEV (LUN) can keep that LDEV's IOPS from scaling, so it is worth understanding a storage system's algorithmic rules and limits before deploying on it.
The cache hit rate depends on the data layout, the cache size, the access pattern, and the cache algorithm; a complete treatment would fill a day's discussion. The one point to stress here is the read-cache hit rate: the higher an array's read hit rate, the more IOPS it can generally support, for reasons tied to the per-disk IOPS limit discussed next.
The disk limit matters most here too: each physical disk can handle only a limited number of IOPS, roughly:

10K rpm   15K rpm   ATA
-------   -------   ---
100       150       50

So the same array of 120 15K rpm Fibre Channel disks can sustain at most 120 * 150 = 18,000 IOPS. That is the theoretical hardware ceiling; pushed beyond it, disk response times can degrade until the array can no longer serve the workload.
RAID5 and RAID10 show no difference in read IOPS, but for the same write workload the IOPS that actually land on the disks differ, and disk IOPS is exactly what we are sizing: once the disk limit is reached, performance cannot improve.
Take a concrete case: the workload is 10,000 IOPS, the read-cache hit rate is 30%, reads are 60% of the IOPS and writes 40%, and there are 120 disks. Compute the per-disk IOPS under RAID5 and under RAID10.
RAID5:

per-disk IOPS = (10000 * (1 - 0.3) * 0.6 + 4 * (10000 * 0.4)) / 120
              = (4200 + 16000) / 120
              ≈ 168

Here 10000 * (1 - 0.3) * 0.6 is the read IOPS: reads are 60% of the load, and after subtracting cache hits only 4200 reach the disks. The term 4 * (10000 * 0.4) is the write IOPS: each RAID5 write actually costs 4 disk IOs (read old data, read old parity, write new data, write new parity), so the writes generate 16,000 disk IOPS.
Allowing for the two reads inside the RAID5 write penalty to also hit cache, a more precise figure is:

per-disk IOPS = (10000 * (1 - 0.3) * 0.6 + 2 * (10000 * 0.4) * (1 - 0.3) + 2 * (10000 * 0.4)) / 120
              = (4200 + 5600 + 8000) / 120
              ≈ 148

That is about 148 IOPS per disk, essentially at the disk's limit.
RAID10:

per-disk IOPS = (10000 * (1 - 0.3) * 0.6 + 2 * (10000 * 0.4)) / 120
              = (4200 + 8000) / 120
              ≈ 102

Because a RAID10 write costs only 2 disk IOs (one to each mirror), the same load on the same disks works out to about 102 IOPS per disk, still well below the disk's limit.
In one real case, a standby database under heavy recovery pressure (mostly small writes) was built on RAID5 and performed badly. Analysis showed per-disk IOPS approaching 200 at peak, which made response times dreadful. Rebuilding it as RAID10 removed the bottleneck, dropping per-disk IOPS to around 100.
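
The worked example reduces to one formula: per-disk IOPS = (cache-miss reads + back-end write IOs) / number of disks, where RAID5 turns each host write into 4 back-end IOs (2 of them cacheable reads, per the refined estimate) and RAID10 into 2. A minimal sketch reproducing the numbers above (function and parameter names are mine):

```python
def per_disk_iops(total_iops, read_frac, cache_hit, n_disks, raid):
    """Back-end IOPS per disk under the example's RAID5/RAID10 models."""
    reads = total_iops * read_frac * (1 - cache_hit)  # reads that miss cache
    writes = total_iops * (1 - read_frac)
    if raid == "raid10":
        backend_writes = 2 * writes                   # one IO to each mirror
    elif raid == "raid5":
        # 2 reads (old data + old parity, assumed cacheable) + 2 writes
        backend_writes = 2 * writes * (1 - cache_hit) + 2 * writes
    return (reads + backend_writes) / n_disks

for raid in ("raid5", "raid10"):
    print(raid, round(per_disk_iops(10000, 0.6, 0.3, 120, raid)))
# raid5 -> ~148, raid10 -> ~102; a 15K rpm disk tops out near 150 IOPS,
# so RAID5 sits at the limit while RAID10 has comfortable headroom.
```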