时不待我 天道酬勤

没有多少时间可以虚度了....

导航

Kanzher 数据库优化日志

Posted on 2011-07-04 23:02  jadesun  阅读(311)  评论(0编辑  收藏  举报
2009-08-19 ; 192.168.0.120 ; 数据库版本 5.4.1 ;
优化目的:kanzher 用户登陆后,默认打开 我的圈子 这个功能模块,这个模块里面有 我参与圈子的最新博文 的信息展示。这一块查询量很大,严重的影响了用户体验,需要针对它进行性能优化,几个表的数据如下 kanzher_article_ 700 百万的数据量,kanzher_rss_article_a 将近 800百万的数据量。
需要优化的SQL 语句如下:

 

SELECT a.aid FROM kanzher_article_ a JOIN (
SELECT ra.AID AS aid,rssid,group_id,gname FROM kanzher_rss_article_a ra JOIN (
SELECT DISTINCT krg.RSS_ID AS rssid,krg.group_id AS group_id,g.name AS gname FROM kzr_rss_gmember_group krg
JOIN xjj_group_member gm ON krg.GROUP_ID = gm.group_id JOIN xjj_group g ON krg.group_id = g.group_id
WHERE gm.member_id=24 AND krg.is_show = 1) rg ON ra.rid=rg.rssid ORDER BY ra.AID DESC LIMIT 100 ) b
ON a.aid = b.aid ORDER BY a.UPDATED_DATE DESC limit 10

 

首次在 MYSQL 中执行, flush table后的查询效率。
+----------+
| aid |
+----------+
| 10607882 |
| 10607394 |
| 10608411 |
| 10608190 |
| 10608069 |
| 10608189 |
| 10608179 |
| 10606631 |
| 10608068 |
| 10608067 |
+----------+
10 rows in set (23.23 sec)
查询需要 23 .23 sec ,够呛。

 

开始优化步骤
一:最小化语句
执行最内层SQL: SELECT DISTINCT krg.RSS_ID AS rssid,krg.group_id AS group_id,g.name AS gname FROM kzr_rss_gmember_group krg
JOIN xjj_group_member gm ON krg.GROUP_ID = gm.group_id JOIN xjj_group g ON krg.group_id = g.group_id
WHERE gm.member_id=24 AND krg.is_show = 1
116 rows in set (0.02 sec)
返回 116 行,耗时 0.02 sec 。

 

执行外层SQL : SELECT ra.AID AS aid,rssid,group_id,gname FROM kanzher_rss_article_a ra JOIN (
SELECT DISTINCT krg.RSS_ID AS rssid,krg.group_id AS group_id,g.name AS gname FROM kzr_rss_gmember_group krg
JOIN xjj_group_member gm ON krg.GROUP_ID = gm.group_id JOIN xjj_group g ON krg.group_id = g.group_id
WHERE gm.member_id=24 AND krg.is_show = 1) rg ON ra.rid=rg.rssid ORDER BY ra.AID DESC LIMIT 100
100 rows in set (19.89 sec)
返回 100 行,耗时 19.89 sec 。

先解决在这里的查询耗时问题,使用 SET PROFILING = 1; 打开 PROFILING 功能,执行 flush table 后,再次执行 SQL 语句,然后查看 PROFILING 信息。
| Copying to tmp table | 2.572060 | 0.176973 | 0.160976 |
| converting HEAP to MyISAM | 0.138391 | 0.067990 | 0.069989 |
| Copying to tmp table on disk | 9.095262 | 0.553915 | 0.611907 |
| Sorting result | 0.633986 | 0.398940 | 0.229965 |

 

可以看出,性能损耗在 I / O 这一块。 执行 EXPLAIN 来查看一下 MySQL Query Qotimizer 给出的方案。
-+----------------------------+---------+---------------------+------+-------------------------------------------+
| key | key_len | ref | rows | Extra |
-+----------------------------+---------+---------------------+------+-------------------------------------------+
| NULL |NULL | NULL | 116 | Using temporary; Using filesort |
| FK_kanzher_rss_article_a_2 | 8 | rg.rssid | 335 | |
| Unique_Member_Group | 9 | | 14 | Using where; Using index; Using temporary |
| PRIMARY | 8 | gooweb2.gm.group_id | 1 | |
| NULL | NULL | NULL | 425 | Using where; Using join buffer |

 

-+----------------------------+---------+---------------------+------+-------------------------------------------+

1,采取小结果集驱动大结果集的方案,减少 I/O 操作,更新语句如下:
SELECT ra.AID as aid ,rssid,group_id,gname FROM (SELECT DISTINCT krg.RSS_ID AS rssid,krg.group_id AS group_id,g.name AS gname FROM kzr_rss_gmember_group krg JOIN xjj_group_member gm ON krg.GROUP_ID = gm.group_id JOIN xjj_group g ON krg.group_id = g.group_id WHERE gm.member_id=24 AND krg.is_show = 1) rg JOIN kanzher_rss_article_a ra ON rg.rssid=ra.rid ORDER BY ra.AID DESC LIMIT 100
执行 SQL ,100 rows in set (2.98 sec) 。 返回 100 行, 耗时 2.98 sec 。多次 flush table 后, 查询范围在 3.02 - 3.08 之间。提高了很多效率,再查看 PROFILING 信息。
| Copying to tmp table | 0.238766 | 0.142979 | 0.095985 | 0 | 0 |
| converting HEAP to MyISAM | 0.139159 | 0.060990 | 0.077988 | 0 | 0 |
| Copying to tmp table on disk | 1.908297 | 0.491926 | 0.497925 | 0 | 0 |
| Sorting result | 0.640761 | 0.403938 | 0.232964 | 0 | 0 |

 

Copying to tmp table 的操作消耗的时间大幅度下降。继续小结果集驱动大结果集的原则,改动内层的 SQL 语句。
SELECT ra.AID as aid ,rssid,group_id,gname FROM (SELECT DISTINCT krg.RSS_ID AS rssid,krg.group_id AS group_id,g.name AS gname FROM xjj_group g JOIN kzr_rss_gmember_group krg ON krg.group_id = g.group_id JOIN xjj_group_member gm ON krg.GROUP_ID = gm.group_id WHERE gm.member_id=24 AND krg.is_show = 1) rg JOIN kanzher_rss_article_a ra ON rg.rssid=ra.rid ORDER BY ra.AID DESC LIMIT 100
执行SQL ,100 rows in set (2.38 sec)。 返回 100 行, 耗时 2.38 sec 。查看 PROFILING 信息。
| Copying to tmp table | 0.304732 | 0.205968 | 0.098985 |
| converting HEAP to MyISAM | 0.136316 | 0.062991 | 0.073989 |
| Copying to tmp table on disk | 1.218768 | 0.666898 | 0.526920 |
| Sorting result | 0.657609 | 0.414937 | 0.238964 |

 

针对 Copying to tmp table on disk 的问题,在 my.cnf 增加 tmp_table_size = 256M 的命令,提高 copy tmp table 的效率。
执行 service mysql restart 重启 MYSQL ,执行 SQL 语句
SELECT ra.AID as aid ,rssid,group_id,gname FROM (SELECT DISTINCT krg.RSS_ID AS rssid,krg.group_id AS group_id,g.name AS gname FROM xjj_group g JOIN kzr_rss_gmember_group krg ON krg.group_id = g.group_id JOIN xjj_group_member gm ON krg.GROUP_ID = gm.group_id WHERE gm.member_id=24 AND krg.is_show = 1) rg JOIN kanzher_rss_article_a ra ON rg.rssid=ra.rid ORDER BY ra.AID DESC LIMIT 100

 

返回 100 rows in set (2.09 sec) ,返回 100 行, 耗时 2.09 sec。查看 PROFILING 信息。
| Copying to tmp table | 0.230333 | 0.134979 | 0.094986 |
| converting HEAP to MyISAM | 0.138827 | 0.055992 | 0.082987 |
| Copying to tmp table on disk | 1.008573 | 0.498924 | 0.486926 |
| Sorting result | 0.648606 | 0.422936 | 0.220967 |

 

Copying to tmp table 的操作消耗的时间相对的下降, 但几次查询的 Sorting result 消耗始终在 0.63 - 0.65 之间。看来 ORDER BY ra.AID DESC 也需要优化。
取消 ORDER BY ra.AID DESC 的语句进行查询,执行SQL语句
SELECT ra.AID as aid ,rssid,group_id,gname FROM (SELECT DISTINCT krg.RSS_ID AS rssid,krg.group_id AS group_id,g.name AS gname
FROM xjj_group g JOIN kzr_rss_gmember_group krg ON krg.group_id = g.group_id JOIN xjj_group_member gm ON krg.GROUP_ID =
gm.group_id WHERE gm.member_id=24 AND krg.is_show = 1) rg JOIN kanzher_rss_article_a ra ON rg.rssid=ra.rid limit 100;
100 rows in set (0.02 sec) ,不敢相信这个值,做了几次 flush table 和 reset query cache 的 操作后,查询还是这样。太离谱了。

 

取消ORDER BY 和 LIMIT 执行SQL语句,测试结果集和全表扫描效率。
SELECT ra.AID as aid ,rssid,group_id,gname FROM (SELECT DISTINCT krg.RSS_ID AS rssid,krg.group_id AS group_id,g.name AS gname
FROM xjj_group g JOIN kzr_rss_gmember_group krg ON krg.group_id = g.group_id JOIN xjj_group_member gm ON krg.GROUP_ID =
gm.group_id WHERE gm.member_id=24 AND krg.is_show = 1) rg JOIN kanzher_rss_article_a ra ON rg.rssid=ra.rid ;
返回143416 rows in set (2.13 sec) ,一共有 14 万条数据 ,查询时间 2.13 sec 。
2009-08-20 ; 192.168.0.120 ; 数据库版本 5.4.1 ;

 

GOTO 一个新的方案,易明亮同学建了一个新的KANZHER_RSS_表,该表中有一个字段 Aritlce_IDS ,该字段保存了 RSS 中的最新 50 篇文章。基于易明亮同学的做法,修改语句,查询效率估计能大大的提高,不需要再去 14 万条数据里面找最大的 ID 值,采用 ORDER BY AID DESC 的语句消耗时间。测试开始:
1,还是小结果集驱动大结果集,从最内层的SQL开始。
SELECT DISTINCT krg.RSS_ID AS rssid,krg.group_id AS group_id,g.name AS gname FROM xjj_group g JOIN kzr_rss_gmember_group krg ON krg.group_id = g.group_id JOIN xjj_group_member gm ON krg.GROUP_ID = gm.group_id WHERE gm.member_id=24 AND krg.is_show = 1

 

2,连接 KANZHER_RSS_ 表,查询 Article_IDS 的值。
SELECT Article_IDS,rg.rssid,group_id,gname FROM KANZHER_RSS_ rss,(
SELECT DISTINCT krg.RSS_ID AS rssid,krg.group_id AS group_id,g.name AS gname FROM xjj_group g JOIN
kzr_rss_gmember_group krg ON krg.group_id = g.group_id JOIN xjj_group_member gm ON krg.GROUP_ID = gm.group_id
WHERE gm.member_id=24 AND krg.is_show = 1 )
rg WHERE rss.rss_id = rg.rssid
116 rows in set (0.08 sec),返回 116 行数据,需要消耗时间 0.08 sec 。

 

3,连接 kanzher_article_ 表,查询最终结果 AID ,打开profiling 功能,执行 flush table; reset query cache;
SELECT aid FROM kanzher_article_ ka , (
SELECT Article_IDS,rg.rssid,group_id,gname FROM KANZHER_RSS_ rss,(
SELECT DISTINCT krg.RSS_ID AS rssid,krg.group_id AS group_id,g.name AS gname FROM xjj_group g JOIN kzr_rss_gmember_group krg
ON krg.group_id = g.group_id JOIN xjj_group_member gm ON krg.GROUP_ID = gm.group_id
WHERE gm.member_id=24 AND krg.is_show = 1) rg WHERE rss.rss_id = rg.rssid) ra
WHERE ka.aid IN (ra.Article_IDS) ORDER BY ka.UPDATED_DATE DESC limit 10
10 rows in set (0.98 sec),返回 10 行数据,需要消耗 0.98 sec 。查询时间在1 sec 内,基本可以接受了。

 

查看 profiling 的信息,Copying to tmp table 和 Sorting result 降低。
| Creating tmp table | 0.000072 | 0.000000 | 0.000000 | 0 |
| executing | 0.000012 | 0.000000 | 0.000000 | 0 |
| Copying to tmp table | 0.894903 | 0.007999 | 0.006999 | 0 |
| Sorting result | 0.000127 | 0.000000 | 0.000000 | 0 |
完成 我参与圈子的最新博文 的功能优化。 总结: 小结果集驱动大结果集,能用EXISTS尽量不使用JOIN。