MySQ 迁移到uft8mb4需要考虑的事项

2022-04-29 09:05 abce 阅读(314) 评论(0) 编辑收藏举报

MySQL--迁移到uft8mb4需要考虑的事项

在MySQL8中，默认的字符集是utf8mb4，这一改变既不会影响已有的数据，也不会强制任何升级。

迁移到utf8mb4有很多好处：

·可以存储更多的符号，包含表情符号

·对于亚洲语言，增加了新的collations

·比utf8mb3更快

存储的需求

故名思意，使用utf8mb4字符集，一个字符最多可以使用4 bytes。

utf8mb3是utf8mb4的子集，迁移已有数据不会增加对磁盘空间的使用；每个字符按需使用空间。比如，任何数字和latin字母只需要1byte。其它字符最多可能会占用4bytes。

mysql🐬> set names utf8mb4;
Query OK, 0 rows affected (0,00 sec)

mysql🐬> CREATE TABLE charset_len( name VARCHAR(255), val CHAR(1) ) CHARACTER SET=utf8mb4;
Query OK, 0 rows affected (0,03 sec)

mysql🐬> INSERT INTO charset_len VALUES('Latin A', 'A'),  ('Cyrillic А', 'А'), ('Korean ㉿', '㉿'), ('Dolphin 🐬', '🐬');
Query OK, 4 rows affected (0,02 sec)
Records: 4  Duplicates: 0  Warnings: 0

mysql🐬> SELECT name, val, HEX(val), BIT_LENGTH(val)/8 FROM charset_len;
+--------------+------+----------+-------------------+
| name         | val  | HEX(val) | BIT_LENGTH(val)/8 |
+--------------+------+----------+-------------------+
| Latin A      | A    | 41       |            1.0000 |
| Cyrillic А   | А    | D090     |            2.0000 |
| Korean ㉿    | ㉿    | E389BF   |            3.0000 |
| Dolphin 🐬   | 🐬    | F09F90AC |            4.0000 |
+--------------+------+----------+-------------------+
4 rows in set (0,00 sec)

可以看到，最多使用到3个bytes的数据，存储不会有什么改变。但是可以存储需要4个bytes的字符编码。

列的最大长度

虽然数据存储不会改变，当MySQL估算列可以存储的最大量数据的时候，有些在utf8mb3字符集下正常的数据，在utf8mb4字符集下可能会发生错误。

比如：

mysql🐬> CREATE TABLE len_test(
      -> foo VARCHAR(16384)
      -> ) ENGINE=InnoDB CHARACTER SET utf8mb3;
Query OK, 0 rows affected, 1 warning (0,06 sec)

如果想转换成utf8mb4，就会发生错误：

mysql🐬> ALTER TABLE len_test CONVERT TO CHARACTER SET utf8mb4;
ERROR 1074 (42000): Column length too big for column 'foo' (max = 16383); use BLOB or TEXT instead

这是因为，MySQL的varchar列最大可以存储的是varchar(65535)，对utf8mb3是21845个字符，而对utf8mb4是16383个字符。

因此，如果你的列包含超过16383个字符，你可能需要将类型转换成text或longtext数据类型。

可以通过以下的语句：

select table_schema, table_name, column_name,character_maximum_length, data_type
from information_schema.columns
where character_maximum_length > 16383 and
      data_type not like '%text%' and 
      data_type not like '%blob%' and
      table_schema not in ('mysql', 'information_schema', 'performance_schema');

比如：

*************************** 1. row ***************************
TABLE_SCHEMA: test
TABLE_NAME: setup
COLUMN_NAME: value
CHARACTER_MAXIMUM_LENGTH: 20000
DATA_TYPE: varchar
1 row in set (0,02 sec

索引存储需求

创建索引的时候，MySQL无法提前知道列中存储的是什么字符。当估算存储索引需要的存储空间的时候，会根据字符集需要的最大长度来估算。因此，在将索引转换成utf8mb4的时候，可能会遇到空间的问题。对于innodb，行格式为redundant、compact的时候，索引的最大长度是767bytes；行格式为dynamic、compressed的时候，索引的最大长度是3072bytes

这就意味着，在转换之前，要查看一下是否会有超出限制的索引。（该查询是基于MySQL8的，低版本可能需要修改一下）

with indexes as (
     with tables as  (
          select substring_index(t.name, '/', 1) as `database`, substring_index(t.name, '/', -1) as `table`, i.name as `index`, row_format
          from information_schema.innodb_indexes i join information_schema.innodb_tables t using(table_id)
    )
    select `database`, `table`, `index`, row_format, group_concat(kcu.column_name) as columns,
           sum(c.character_maximum_length) * 4 as index_len_bytes
    from tables join information_schema.key_column_usage kcu
         on (`database` = table_schema and `table` = kcu.table_name and `index` = kcu.constraint_name)
         join information_schema.columns c on (kcu.column_name = c.column_name and `database` = c.table_schema and `table` = c.table_name)
    where c.character_maximum_length is not null
    group by `database`, `table`, `index`, row_format order by index_len_bytes
) select * from indexes where index_len_bytes >= 768;

+----------+--------------+---------+------------+------------+-----------------+
| database | table        | index   | ROW_FORMAT | columns    | index_len_bytes |
+----------+--------------+---------+------------+------------+-----------------+
| cookbook | hitcount     | PRIMARY | Dynamic    | path       |            1020 |
| cookbook | phrase       | PRIMARY | Dynamic    | phrase_val |            1020 |
| cookbook | ruby_session | PRIMARY | Dynamic    | session_id |            1020 |
+----------+--------------+---------+------------+------------+-----------------+
3 rows in set (0,04 sec)

临时表

升级成utf8mb4之后，可能会导致临时表需要的空间变大。

可以通过观察全局状态变量Created_tmp_disk_table来确认。如果转换后，该变量增加明显，可以考虑增加一些内存。当然，这个现象也有可能是因为查询sql没有优化导致的。

刷新页面返回顶部

abce don't dream it , be it

MySQ 迁移到uft8mb4需要考虑的事项

MySQL--迁移到uft8mb4需要考虑的事项

存储的需求

列的最大长度

索引存储需求

临时表

About