Inconsistency between Mycat sharding rules and actual data distribution
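Mycat is database middleware and does not store any data itself. Through its sharding rules and read/write rules, it provides distributed access to many backend MySQL instances. In practice, however, the actual data distribution can end up disagreeing with the sharding rules. For example:
1. MySQL instances can be accessed by direct connection, so any data can be written to any instance;
2. When the sharding rules are changed later in Mycat, data written earlier no longer matches the new rules;
When Mycat's sharding rules and the actual data distribution disagree, there are some pitfalls to watch for when executing SQL. The example below illustrates them.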
Example:
1. Create the table and configure its sharding rule
# Create the employee table; the primary key is the id column
CREATE TABLE employee (
  id int(11) NOT NULL,
  name varchar(100) DEFAULT NULL,
  sharding_id int(11) NOT NULL,
  PRIMARY KEY (id)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
# Sharding configuration, data nodes dn00 - dn03
<table name="employee" primaryKey="ID" autoIncrement="true" dataNode="dn0$0-3" rule="employee" />
# The sharding column is sharding_id
<tableRule name="employee">
  <rule>
    <columns>sharding_id</columns>
    <algorithm>myfunc1</algorithm>
  </rule>
</tableRule>
# Sharding rule: shard by the enumerated value of sharding_id (each line is value=data node index)
0=0
1=1
2=2
3=3
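The enumeration rule above behaves like a simple lookup from a sharding_id value to a data node. The following is a minimal Python sketch of that idea, not Mycat's actual implementation; the names `ENUM_MAP`, `DATA_NODES` and `route_by_sharding_id` are hypothetical and only mirror the configuration shown in this example. It also assumes that no default node is configured, so an unmapped value is rejected.

```python
# Hypothetical model of the enumeration rule above (not Mycat code).
# Mapping content: sharding_id value -> data node index.
ENUM_MAP = {0: 0, 1: 1, 2: 2, 3: 3}

# dataNode="dn0$0-3" expands to dn00, dn01, dn02, dn03.
DATA_NODES = [f"dn0{i}" for i in range(4)]


def route_by_sharding_id(sharding_id: int) -> str:
    """Return the data node that the rule assigns to a sharding_id."""
    if sharding_id not in ENUM_MAP:
        # Assumption: no default node is configured, so unknown values fail.
        raise ValueError(f"no shard configured for sharding_id={sharding_id}")
    return DATA_NODES[ENUM_MAP[sharding_id]]


print(route_by_sharding_id(3))  # dn03
```

Under this rule, every row with sharding_id=3 is expected to live on dn03, which is the expectation the following steps deliberately violate.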
2. Connect to MySQL directly (not through Mycat) and write one row each on dn00 and dn03
insert into employee(id,name,sharding_id) values(101,'dn00',3);
-- written on dn00
insert into employee(id,name,sharding_id) values(101,'dn03',3);
-- written on dn03
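Note two things here:
1. Both rows have id 101, that is, a duplicate primary key. Because the rows are stored in different MySQL instances, no error is raised.
2. Both rows have sharding_id 3, yet they are actually stored on different shards.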
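The two observations above can be stated as checks over the data. The sketch below is a hypothetical consistency check in Python, not a Mycat feature: `ROWS` hard-codes the two rows from this example together with the node each one was written to, and `find_misplaced` and `find_duplicate_ids` are illustrative helper names.

```python
# Hypothetical consistency check (not part of Mycat): compare where each row
# actually lives with where the enumeration rule says it should live.
ENUM_MAP = {0: "dn00", 1: "dn01", 2: "dn02", 3: "dn03"}

# Rows after the two direct inserts: (node, id, name, sharding_id).
ROWS = [
    ("dn00", 101, "dn00", 3),
    ("dn03", 101, "dn03", 3),
]


def find_misplaced(rows):
    """Return rows stored on a node other than the one the rule assigns."""
    return [r for r in rows if ENUM_MAP.get(r[3]) != r[0]]


def find_duplicate_ids(rows):
    """Return primary-key values that appear on more than one node."""
    nodes_by_id = {}
    for node, row_id, _name, _sharding_id in rows:
        nodes_by_id.setdefault(row_id, set()).add(node)
    return {i: sorted(n) for i, n in nodes_by_id.items() if len(n) > 1}


print(find_misplaced(ROWS))      # [('dn00', 101, 'dn00', 3)]
print(find_duplicate_ids(ROWS))  # {101: ['dn00', 'dn03']}
```

In a real deployment the rows would have to be collected by querying each backend instance directly, since the steps below show that Mycat itself does not always see every copy.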
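3. Query employee without conditions. Mycat broadcasts the query to all 4 shards, which works as expected: the result is the union of the data on the 4 shards.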
mysql> explain select * from employee;
+-----------+----------------------------------+
| DATA_NODE | SQL |
+-----------+----------------------------------+
| dn00 | SELECT * FROM employee LIMIT 100 |
| dn01 | SELECT * FROM employee LIMIT 100 |
| dn02 | SELECT * FROM employee LIMIT 100 |
| dn03 | SELECT * FROM employee LIMIT 100 |
# Broadcast to every shard
+-----------+----------------------------------+
4 rows in set (0.01 sec)
mysql> select * from employee;
+-----+------+-------------+
| id | name | sharding_id |
+-----+------+-------------+
| 101 | dn00 | 3 |
| 101 | dn03 | 3 |
+-----+------+-------------+
2 rows in set (0.00 sec)
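4. Next, query by the sharding column. The test shows that
when a query filters on the sharding column, Mycat queries only the shard selected by the sharding rule, even if other shards also hold rows that satisfy the WHERE condition.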
mysql> explain select * from employee where sharding_id=3;
+-----------+--------------------------------------------------------+
| DATA_NODE | SQL |
+-----------+--------------------------------------------------------+
| dn03 | SELECT * FROM employee WHERE sharding_id = 3 LIMIT 100 |
# Routed to the single shard selected by the sharding rule
+-----------+--------------------------------------------------------+
1 row in set (0.04 sec)
mysql> select * from employee where sharding_id=3;
+-----+------+-------------+
| id | name | sharding_id |
+-----+------+-------------+
| 101 | dn03 | 3 |
+-----+------+-------------+
1 row in set (0.00 sec)
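The difference between steps 3 and 4 can be reproduced with a small in-memory model. This is a sketch of the observed routing behaviour only, not Mycat code; `SHARDS`, `select_all` and `select_by_sharding_id` are hypothetical names, and the data is the two rows from this example.

```python
# Hypothetical model of the routing behaviour observed above (not Mycat code).
SHARDS = {
    "dn00": [{"id": 101, "name": "dn00", "sharding_id": 3}],  # misplaced row
    "dn01": [],
    "dn02": [],
    "dn03": [{"id": 101, "name": "dn03", "sharding_id": 3}],
}
ENUM_MAP = {0: "dn00", 1: "dn01", 2: "dn02", 3: "dn03"}


def select_all():
    """No sharding condition: broadcast to every node and merge the results."""
    return [row for node in sorted(SHARDS) for row in SHARDS[node]]


def select_by_sharding_id(sharding_id):
    """Sharding condition present: visit only the node chosen by the rule."""
    node = ENUM_MAP[sharding_id]
    return [r for r in SHARDS[node] if r["sharding_id"] == sharding_id]


print([r["name"] for r in select_all()])              # ['dn00', 'dn03']
print([r["name"] for r in select_by_sharding_id(3)])  # ['dn03']
```

The row on dn00 matches the WHERE condition but is never visited, because the routing decision is made from the rule alone, before any data is read.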
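5. Query by primary key. Consider the first such query. On the first query by primary key (id=101), the primary-key-to-shard cache has no entry for that key, so Mycat broadcasts the query to every shard. The result is therefore the union of the results from all shards.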
mysql> explain select * from employee where id=101;
+-----------+-------------------------------------+
| DATA_NODE | SQL |
+-----------+-------------------------------------+
| dn00 | select * from employee where id=101 |
| dn01 | select * from employee where id=101 |
| dn02 | select * from employee where id=101 |
| dn03 | select * from employee where id=101 |
+-----------+-------------------------------------+
4 rows in set (0.00 sec)
mysql> select * from employee where id=101;
+-----+------+-------------+
| id | name | sharding_id |
+-----+------+-------------+
| 101 | dn00 | 3 |
| 101 | dn03 | 3 |
+-----+------+-------------+
2 rows in set (0.00 sec)
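6. Now the second query by primary key (id=101). By this point Mycat has cached the primary-key-to-shard mapping for this key, so it no longer needs to broadcast the query to every shard.
However, the first query returned 2 rows, and Mycat cached the primary key and shard of only the first row. On the next query, the cache hit therefore points to the shard of that one row.
Note also that, in this example, the row reached through the cache is the one whose actual location (dn00) disagrees with the sharding rule (sharding_id=3 maps to dn03).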
mysql> show @@cache;
+---------------------------------------+-------+------+--------+------+------+---------------+---------------+
| CACHE | MAX | CUR | ACCESS | HIT | PUT | LAST_ACCESS | LAST_PUT |
+---------------------------------------+-------+------+--------+------+------+---------------+---------------+
| ER_SQL2PARENTID | 1000 | 0 | 0 | 0 | 0 | 0 | 0 |
| SQLRouteCache | 10000 | 2 | 13 | 7 | 2 | 1513329652279 | 1513328908328 |
| TableID2DataNodeCache.TESTDB_ORDERS | 50000 | 0 | 0 | 0 | 0 | 0 | 0 |
| TableID2DataNodeCache.TESTDB_EMPLOYEE | 10000 | 1 | 4 | 2 | 1 | 1513329652280 | 1513329310256 |
+---------------------------------------+-------+------+--------+------+------+---------------+---------------+
4 rows in set (0.01 sec)
mysql> explain select * from employee where id=101;
+-----------+-------------------------------------+
| DATA_NODE | SQL |
+-----------+-------------------------------------+
| dn00 | select * from employee where id=101 |
# Only the single cached shard is queried
+-----------+-------------------------------------+
1 row in set (0.00 sec)
mysql> select * from employee where id=101;
+-----+------+-------------+
| id | name | sharding_id |
+-----+------+-------------+
| 101 | dn00 | 3 |
+-----+------+-------------+
1 row in set (0.00 sec)
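Steps 5 and 6 can be summarised as "broadcast on a cache miss, remember one node, then visit only that node". The Python sketch below models that behaviour as observed in this test; it is not Mycat's implementation. The names `make_shards`, `select_by_id` and `run_demo` are hypothetical, and the choice of dn00 as the "first" row is an assumption that happens to match the output above (the model simply visits nodes in name order).

```python
# Hypothetical model of the primary-key-to-shard cache (not Mycat code).

def make_shards():
    """Data layout after step 2: the same id on dn00 (misplaced) and dn03."""
    return {
        "dn00": [{"id": 101, "name": "dn00", "sharding_id": 3}],
        "dn01": [],
        "dn02": [],
        "dn03": [{"id": 101, "name": "dn03", "sharding_id": 3}],
    }


def select_by_id(shards, pk_cache, row_id):
    """Return (nodes visited, rows found) for SELECT ... WHERE id = row_id."""
    cached = pk_cache.get(row_id)
    # Cache hit: visit one node. Cache miss: broadcast to all nodes.
    nodes = [cached] if cached is not None else sorted(shards)
    hits = [(n, r) for n in nodes for r in shards[n] if r["id"] == row_id]
    if cached is None and hits:
        # Assumption based on the observed behaviour: only the node of the
        # first returned row is remembered.
        pk_cache[row_id] = hits[0][0]
    return nodes, [r for _n, r in hits]


def run_demo():
    """Run the same query twice against a fresh layout and an empty cache."""
    shards, pk_cache = make_shards(), {}
    first = select_by_id(shards, pk_cache, 101)
    second = select_by_id(shards, pk_cache, 101)
    return first, second, pk_cache


first, second, cache = run_demo()
print(first[0], len(first[1]))    # ['dn00', 'dn01', 'dn02', 'dn03'] 2
print(second[0], len(second[1]))  # ['dn00'] 1
print(cache)                      # {101: 'dn00'}
```

The same statement returns two rows the first time and one row the second time, which is the inconsistency shown in the console output above.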
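7. Next, delete by primary key:
(1) Because the primary key cache has an entry for this key, only the row on the cached shard is deleted (the other row still exists);
(2) After the delete, the entry for this key is not removed from the primary key cache, so a subsequent query by primary key returns no rows (the cached shard no longer holds a row with that key).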
mysql> delete from employee where id=101;
Query OK, 1 row affected (0.01 sec)
# Only one row was deleted (routed through the cache hit)
mysql> select * from employee;
+-----+------+-------------+
| id | name | sharding_id |
+-----+------+-------------+
| 101 | dn03 | 3 |
# The row on dn03 still exists
+-----+------+-------------+
1 row in set (0.00 sec)
mysql> explain select * from employee where id=101;
+-----------+-------------------------------------+
| DATA_NODE | SQL |
+-----------+-------------------------------------+
| dn00 | select * from employee where id=101 |
# A query by primary key is still routed to dn00
+-----------+-------------------------------------+
1 row in set (0.00 sec)
mysql> select * from employee where id=101;
# The row on dn03 is not returned
Empty set (0.00 sec)
mysql> show @@cache;
+---------------------------------------+-------+------+--------+------+------+---------------+---------------+
| CACHE | MAX | CUR | ACCESS | HIT | PUT | LAST_ACCESS | LAST_PUT |
+---------------------------------------+-------+------+--------+------+------+---------------+---------------+
| ER_SQL2PARENTID | 1000 | 0 | 0 | 0 | 0 | 0 | 0 |
| SQLRouteCache | 10000 | 2 | 19 | 9 | 2 | 1513330571512 | 1513328908328 |
| TableID2DataNodeCache.TESTDB_ORDERS | 50000 | 0 | 0 | 0 | 0 | 0 | 0 |
| TableID2DataNodeCache.TESTDB_EMPLOYEE | 10000 | 1 | 9 | 7 | 1 | 1513330571512 | 1513329310256 |
+---------------------------------------+-------+------+--------+------+------+---------------+---------------+
4 rows in set (0.00 sec)
# The cache entry for this primary key value is still present
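The effect of the stale cache entry in step 7 can be reproduced with the same kind of in-memory model. Again this is only a sketch of the behaviour seen in this test, not Mycat code; `run_demo` and its inner helpers are hypothetical names, and the starting cache content `{101: "dn00"}` is taken from the routing shown in step 6.

```python
# Hypothetical model of DELETE by primary key with a stale cache entry
# (not Mycat code).

def run_demo():
    shards = {
        "dn00": [{"id": 101, "name": "dn00", "sharding_id": 3}],
        "dn01": [],
        "dn02": [],
        "dn03": [{"id": 101, "name": "dn03", "sharding_id": 3}],
    }
    pk_cache = {101: "dn00"}  # entry left behind by the queries in step 6

    def nodes_for(row_id):
        # Cache hit: one node. Cache miss: every node.
        cached = pk_cache.get(row_id)
        return [cached] if cached is not None else sorted(shards)

    def delete_by_id(row_id):
        deleted = 0
        for node in nodes_for(row_id):
            kept = [r for r in shards[node] if r["id"] != row_id]
            deleted += len(shards[node]) - len(kept)
            shards[node] = kept
        # The cache entry is intentionally NOT removed, as observed above.
        return deleted

    def select_by_id(row_id):
        return [r for n in nodes_for(row_id)
                for r in shards[n] if r["id"] == row_id]

    deleted = delete_by_id(101)
    found = select_by_id(101)
    return deleted, found, shards, pk_cache


deleted, found, shards, pk_cache = run_demo()
print(deleted)                             # 1
print(found)                               # []
print([r["name"] for r in shards["dn03"]])  # ['dn03']
print(pk_cache)                            # {101: 'dn00'}
```

One row is deleted, the follow-up query by id finds nothing, and the copy on dn03 survives untouched, matching the three outputs above.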
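8. Return to the state before step 7 and delete by the sharding column instead of the primary key. Only the row on the shard selected by the rule is deleted; rows on other shards are left untouched.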
mysql> delete from employee where sharding_id=3;
Query OK, 1 row affected (0.01 sec)
# The row on the shard selected by the rule was deleted
mysql> select * from employee;
+-----+------+-------------+
| id | name | sharding_id |
+-----+------+-------------+
| 101 | dn00 | 3 |
# The row on the other shard still exists
+-----+------+-------------+
1 row in set (0.00 sec)