记测试环境一次容量报错
案发现场:收到报警登上去看发现master容量快满了,但slave还有60-70g的剩余空间,master上也没有堆积的binlog,感觉很奇怪。过了一会master容量满了,发生切换,然后老master的容量自动恢复了,感觉又很奇怪
第一时间processlist抓到一条语句如下:
*************************** 8. row ***************************
ID: 3384308
USER: us_test_zsj
HOST: 10.32.132.7:61523
DB: fltreimbursementdb
COMMAND: Query
TIME: 552
STATE: Sending data
INFO: select * from exp_expressdetail ed
inner join exp_expressinfo ei on ed.expressinfoid=ed.expressinfoid and ei.getticketway='pjs'
where isvalidinpackage=1 and invtype=2 group by ed.expressinfoid having(count(distinct ed.orderid)=2) order by ed.datachange_lasttime desc limit 10
然后去翻errorlog
[op1@FATM00001160 mysql]$ cat mysql_error.log|grep 2020-07-30
2020-07-30T16:40:00.634672+08:00 4580228 [ERROR] InnoDB: posix_fallocate(): Failed to preallocate data for file ./ibtmp1, desired size 67108864 bytes. Operating system error number 28. Check that the disk is not full or a disk quota exceeded. Make sure the file system supports this function. Some operating system error numbers are described at http://dev.mysql.com/doc/refman/5.7/en/operating-system-error-codes.html
2020-07-30T16:40:00.660929+08:00 4580228 [Warning] InnoDB: Retry attempts for writing partial data failed.
2020-07-30T16:40:00.660956+08:00 4580228 [ERROR] InnoDB: Write to file ./ibtmp1failed at offset 66315091968, 1048576 bytes should have been written, only 0 were written. Operating system error number 28. Check that your OS and file system support files of this size. Check also that the disk is not full or a disk quota exceeded.
2020-07-30T16:40:00.660985+08:00 4580228 [ERROR] InnoDB: Error number 28 means 'No space left on device'
2020-07-30T16:40:00.661002+08:00 4580228 [Warning] InnoDB: Error while writing 67108864 zeroes to ./ibtmp1 starting at offset 66249031680
2020-07-30T16:40:00.661034+08:00 4580228 [ERROR] /usr/sbin/mysqld: The table '/data/tmp/#sql_2d_7' is full
2020-07-30T16:40:00.661082+08:00 4579873 [ERROR] InnoDB: posix_fallocate(): Failed to preallocate data for file ./ibtmp1, desired size 1048576 bytes. Operating system error number 28. Check that the disk is not full or a disk quota exceeded. Make sure the file system supports this function. Some operating system error numbers are described at http://dev.mysql.com/doc/refman/5.7/en/operating-system-error-codes.html
第一句:给ibtmp1预分配空间失败,结合后面的语句,猜测就是这个语句引起的。Mysql中join会创建一个临时表用于执行语句,猜测可能是临时表太大把它撑满了。
问题复现:
#数据+索引
mysql>select concat(round(sum((data_length+index_length)/1024/1024/1024),2),'GB') as data from tables where table_schema='fltreimbursementdb' and table_name='exp_expressdetail';
+--------+
| data |
+--------+
| 0.29GB |
+--------+
1 row in set (0.00 sec)
mysql> select concat(round(sum((data_length+index_length)/1024/1024/1024),2),'GB') as data from tables where table_schema='fltreimbursementdb' and table_name='exp_expressinfo';
+--------+
| data |
+--------+
| 0.23GB |
+--------+
发现这2个表好像也不是很大
#行数
mysql> select count(*) from fltreimbursementdb.exp_expressdetail;
select count(*) from fltreimbursementdb.exp_expressinfo;
+----------+
| count(*) |
+----------+
| 1809134 |
+----------+
1 row in set (7.21 sec)
mysql> +----------+
| count(*) |
+----------+
| 668840 |
+----------+
1 row in set (2.49 sec)
但是有很多行,2个表join,笛卡尔积操作后会生产1809134*668840=1210021184560条数据(一万亿+)
把2个表dump到自己的测试机上试了一下,跑了这个语句
mysql> select * from exp_expressdetail ed
-> inner join exp_expressinfo ei on ed.expressinfoid=ed.expressinfoid and ei.getticketway='pjs'
-> where isvalidinpackage=1 and invtype=2 group by ed.expressinfoid having(count(distinct ed.orderid)=2) order by ed.datachange_lasttime desc limit 10;
发现的确临时表空间在增大,到37g时语句报错:ERROR 1114 (HY000): The table '/data/tmp/#sql_12c_0' is full,(我的机器就40g容量)
问题复现成功。容量满了以后触发切换,重启以后容量恢复正常就说得通了。Master爆了slave没爆也解释的通了,因为这个语句在master爆了的时候还没执行完,也就不会进binlog被传到slave那边执行了。
学到了不止是insert,Select也可能导致容量不足。