代码改变世界

修复分区表中错位的行

  abce  阅读(144)  评论(0编辑  收藏  举报

在MySQL中,分区表的数据被分到不同的表空间中,但仍然被视为单个表。在处理大量数据集的某些情况下,分区可能是一种有用的方法。分区表可以加快删除大型数据集的速度,但如果处理不当,可能会将数据错误地放置在表中。

在这将分享如何检查和修复这样一个表中的数据,同时尽量减少对表的破坏。

分区表:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
mysql> show create table salariesG
*************************** 1. row ***************************
       Table: salaries
Create Table: CREATE TABLE `salaries` (
  `emp_no` int(11) NOT NULL,
  `salary` int(11) NOT NULL,
  `from_date` date NOT NULL,
  `to_date` date NOT NULL,
  PRIMARY KEY (`emp_no`,`from_date`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
/*!50500 PARTITION BY RANGE  COLUMNS(from_date)
(PARTITION p01 VALUES LESS THAN ('1985-12-31') ENGINE = InnoDB,
 PARTITION p02 VALUES LESS THAN ('1986-12-31') ENGINE = InnoDB,
 PARTITION p03 VALUES LESS THAN ('1987-12-31') ENGINE = InnoDB,
 PARTITION p04 VALUES LESS THAN ('1988-12-31') ENGINE = InnoDB,
 PARTITION p05 VALUES LESS THAN ('1989-12-31') ENGINE = InnoDB,
 PARTITION p06 VALUES LESS THAN ('1990-12-31') ENGINE = InnoDB,
 PARTITION p07 VALUES LESS THAN ('1991-12-31') ENGINE = InnoDB,
 PARTITION p08 VALUES LESS THAN ('1992-12-31') ENGINE = InnoDB,
 PARTITION p09 VALUES LESS THAN ('1993-12-31') ENGINE = InnoDB,
 PARTITION p10 VALUES LESS THAN ('1994-12-31') ENGINE = InnoDB,
 PARTITION p11 VALUES LESS THAN ('1995-12-31') ENGINE = InnoDB,
 PARTITION p12 VALUES LESS THAN ('1996-12-31') ENGINE = InnoDB,
 PARTITION p13 VALUES LESS THAN ('1997-12-31') ENGINE = InnoDB,
 PARTITION p14 VALUES LESS THAN ('1998-12-31') ENGINE = InnoDB,
 PARTITION p15 VALUES LESS THAN ('1999-12-31') ENGINE = InnoDB,
 PARTITION p16 VALUES LESS THAN ('2000-12-31') ENGINE = InnoDB,
 PARTITION p17 VALUES LESS THAN ('2001-12-31') ENGINE = InnoDB,
 PARTITION p18 VALUES LESS THAN ('2002-12-31') ENGINE = InnoDB,
 PARTITION p19 VALUES LESS THAN (MAXVALUE) ENGINE = InnoDB) */

查询任意一个数据集,MySQL可以快速地返回数据,没有任何错误:

1
2
3
4
5
6
7
mysql> SELECT * FROM salaries where emp_no=10001 and from_date='1999-06-23'
+--------+--------+------------+------------+ 
| emp_no | salary | from_date  | to_date    | 
+--------+--------+------------+------------+ 
|  10001 |  84917 | 1999-06-23 | 2000-06-22 | 
+--------+--------+------------+------------+ 
1 row in set (0.00 sec)

但是如果数据发生了分区错位,就不能返回任何值了:

1
mysql> SELECT * FROM salaries where emp_no=10001 and from_date='1999-06-23';

乍看数据像是已经被删除了,使使用explain查看:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
mysql> EXPLAIN SELECT * FROM salaries where emp_no=10001 and from_date='1999-06-23'G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: NULL
   partitions: NULL
         type: NULL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: NULL
     filtered: NULL
        Extra: no matching row in const table
1 row in set, 1 warning (0.00 sec)

但是数据还在表中,只是在错误的分区中:

1
2
3
4
5
6
7
8
mysql> SELECT * FROM salaries PARTITION(p08) LIMIT 1;
+--------+--------+------------+------------+
| emp_no | salary | from_date  | to_date    |
+--------+--------+------------+------------+
|  10001 |  84917 | 1999-06-23 | 2000-06-22 |
+--------+--------+------------+------------+
1 row in set (0.00 sec)
Empty set (0.00 sec)

 

如何检测是否有错位的行呢?

可以借助工具check,来检测:(大表需要的时间会很长)

1
2
3
4
5
6
7
8
9
mysql> CHECK TABLE salaries;
+--------------------+-------+----------+---------------------------------------------------------------------------------------+
| Table              | Op    | Msg_type | Msg_text                                                                              |
+--------------------+-------+----------+---------------------------------------------------------------------------------------+
| employees.salaries | check | error    | Found a misplaced row in part 7 should be in part 14:  from_date:1999-06-23           |
| employees.salaries | check | error    | Partition p08 returned error                                                          |
| employees.salaries | check | error    | Table upgrade required. Please do "REPAIR TABLE `salaries`" or dump/reload to fix it! |
+--------------------+-------+----------+---------------------------------------------------------------------------------------+
3 rows in set (0.83 sec)

 

行是如何错位的?

上面的例子抛出了一个问题:MySQL怎么允许这种情况发生的,如何修复它?对于前者的答案是MySQL允许用户用一个单独的表交换一个分区;默认情况下,它在插入数据到表之前会检查每一行。

1
2
mysql> ALTER TABLE salaries EXCHANGE PARTITION p15 WITH TABLE salaries_p15;
Query OK, 0 rows affected (0.04 sec)

这里,salaries_p15是一个空表,结构和salaries表相同,只是没有做分区而已。

1
2
3
4
5
6
7
8
9
10
11
mysql> show create table salaries_p15G
*************************** 1. row ***************************
       Table: salaries_p15
Create Table: CREATE TABLE `salaries_p15` (
  `emp_no` int(11) NOT NULL,
  `salary` int(11) NOT NULL,
  `from_date` date NOT NULL,
  `to_date` date NOT NULL,
  PRIMARY KEY (`emp_no`,`from_date`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
1 row in set (0.00 sec)

交换分区之后,分区p15就是空的,而salaries_p15包含了分区P15的数据。

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
mysql> SELECT count(*) from salaries_p15;
+----------+
| count(*) |
+----------+
|   260926 |
+----------+
1 row in set (0.02 sec)
 
 
mysql> SELECT count(*) from salaries PARTITION(p15);
+----------+
| count(*) |
+----------+
|        0 |
+----------+
1 row in set (0.06 sec)

 

MySQL为了加快分区交换的速度,可以不做数据检查:

1
2
mysql> ALTER TABLE salaries EXCHANGE PARTITION p08 WITH TABLE salaries_p15 WITHOUT VALIDATION;
Query OK, 0 rows affected (0.04 sec)

这里分区p8可以和salaries_p15快速地交换,从而就导致了记录的错位!在生产环境下,建议谨慎使用without validation选项。

1
2
3
4
5
6
7
mysql> SELECT * FROM salaries where emp_no=10001 and from_date='1999-06-23'
+--------+--------+------------+------------+ 
| emp_no | salary | from_date  | to_date    | 
+--------+--------+------------+------------+ 
|  10001 |  84917 | 1999-06-23 | 2000-06-22 | 
+--------+--------+------------+------------+ 
1 row in set (0.00 sec)

 

如何修复?

有两种修复方法:

1.使用repair table命令

2.执行一次空的alter交换操作

repair操作会将错位的记录安排到正确的分区,但是需要在表上加上system lock。

1
2
3
4
5
6
7
8
mysql> repair table salaries;
+--------------------+--------+----------+-----------------------------+
| Table              | Op     | Msg_type | Msg_text                    |
+--------------------+--------+----------+-----------------------------+
| employees.salaries | repair | warning  | Moved 260926 misplaced rows |
| employees.salaries | repair | status   | OK                          |
+--------------------+--------+----------+-----------------------------+
2 rows in set (5.30 sec)

第二种方法也要在表上加锁,直到运行结束:

1
ALTER  employees.salaries ENGINE=Innodb;

此外,可以借助pt-online工具,在不加锁的情况下重建表。比如:

1
pt-online-schema-change --execute  --alter "ENGINE=InnoDb" h=localhost,D=employees,t=salaries

 

相关博文:
阅读排行:
· DeepSeek 开源周回顾「GitHub 热点速览」
· 物流快递公司核心技术能力-地址解析分单基础技术分享
· .NET 10首个预览版发布:重大改进与新特性概览!
· AI与.NET技术实操系列(二):开始使用ML.NET
· .NET10 - 预览版1新功能体验(一)
历史上的今天:
2021-04-03 pg_repack安装和使用
2018-04-03 Innotop的安装和使用
点击右上角即可分享
微信分享提示