转 mysql 的推荐字符集 和 排序规则的使用介绍

##sample 1

https://www.percona.com/blog/migrating-to-utf8mb4-things-to-consider/

解释0:
utf8mb4 是推荐的字符集和,但是在从utf8迁移到utf8mb4 字符集的时候,需要注意几个事项

Conclusion
Converting to the utf8mb4 character set brings you the advantages of better performance, a larger range of characters that you can use, including emojis and new collations (sorting rules). This conversion comes at almost no price, and it can be done smoothly.

Ensure:
但是在从utf8迁移到utf8mb4 字符集的时候,需要注意几个事项,比如字段最大长度,索引最大长度等等,具体sql可以查询
https://www.percona.com/blog/migrating-to-utf8mb4-things-to-consider/

You converted all VARCHAR columns that could store more than 16383 characters to the TEXT or LONGTEXT data type
You adjusted index definitions that could take more than 767 bytes for the REDUNDANT and COMPACT row formats, and 3072 bytes for DYNAMIC and COMPRESSED row formats after migration.
You optimized your queries so that they should not start using internal disk-based temporary tables

 

##sample 2

 


https://www.percona.com/blog/2017/03/28/troubleshooting-issues-with-mysql-character-sets-q-a/


解释1 :每个级别,table ,database 都可以配置排序规则COLLATE, CHARSET,
column 级别可以配置 CHARSET,COLLATE

--table 配置 COLLATE

CREATE TABLE `his_config_info` (
`id` bigint unsigned NOT NULL,
`tenant_id` varchar(128) CHARACTER SET utf8 COLLATE utf8_bin DEFAULT '' COMMENT '租户字段',
PRIMARY KEY (`nid`) USING BTREE,
KEY `idx_gmt_create` (`gmt_create`) USING BTREE,
KEY `idx_gmt_modified` (`gmt_modified`) USING BTREE,
KEY `idx_did` (`data_id`) USING BTREE
) ENGINE=InnoDB AUTO_INCREMENT=196 DEFAULT CHARSET=utf8mb3 COLLATE=utf8_bin ROW_FORMAT=DYNAMIC COMMENT='多租户改造' |

 

--database 配置 COLLATE
CREATE DATABASE `account` /*!40100 DEFAULT CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci */ /*!80016 DEFAULT ENCRYPTION='N' */ |

解释2 : --table 的字段名字配置 character
注意的是如果是在字段级别配置的排序规则,那么所有字段的character规则必须是一致的,不然的话有在执行排序的时候,有可能可能导致报错
"Illegal mix of collations for operation 'concat_ws' ":

set names utf8是设置插入的客户的环境变量。类似nls_lang

如下为测试案例:表的cp1251的 一个字段字符集为 latin1,一个字段字符集为cp1251,虽然可以插入成功,但是执行排序的时候会报错。

mysql> create table cp1251(f1 varchar(100) character set latin1, f2 varchar(100) character set cp1251) engine=innodb;
Query OK, 0 rows affected (0,32 sec)

mysql> set names utf8;
Query OK, 0 rows affected (0,00 sec)

mysql> insert into cp1251 values('Sveta', 'Света');
Query OK, 1 row affected (0,07 sec)

sveta@Thinkie:~/build/mysql-8.0/mysql-test$ ~/build/percona-toolkit/bin/pt-table-checksum h=127.0.0.1,P=13000,u=root,D=test
Diffs cannot be detected because no slaves were found. Please read the --recursion-method documentation for information.
03-18T03:51:58 Error executing EXPLAIN SELECT COUNT(*) AS cnt, COALESCE(LOWER(CONV(BIT_XOR(CAST(CRC32(CONCAT_WS('#', `f1`, `f2`, CONCAT(ISNULL(`f1`), ISNULL(`f2`)))) AS UNSIGNED)), 10, 16)), 0) AS crc FROM `db1`.`cp1251` /*explain checksum table*/: DBD::mysql::st execute failed: Illegal mix of collations for operation 'concat_ws' [for Statement "EXPLAIN SELECT COUNT(*) AS cnt, COALESCE(LOWER(CONV(BIT_XOR(CAST(CRC32(CONCAT_WS('#', `f1`, `f2`, CONCAT(ISNULL(`f1`), ISNULL(`f2`)))) AS UNSIGNED)), 10, 16)), 0) AS crc FROM `db1`.`cp1251` /*explain checksum table*/"] at /home/sveta/build/percona-toolkit/bin/pt-table-checksum line 11351.

03-18T03:51:58 Error checksumming table db1.cp1251: Error executing checksum query: DBD::mysql::st execute failed: Illegal mix of collations for operation 'concat_ws' [for Statement "REPLACE INTO `percona`.`checksums` (db, tbl, chunk, chunk_index, lower_boundary, upper_boundary, this_cnt, this_crc) SELECT ?, ?, ?, ?, ?, ?, COUNT(*) AS cnt, COALESCE(LOWER(CONV(BIT_XOR(CAST(CRC32(CONCAT_WS('#', `f1`, `f2`, CONCAT(ISNULL(`f1`), ISNULL(`f2`)))) AS UNSIGNED)), 10, 16)), 0) AS crc FROM `db1`.`cp1251` /*checksum table*/" with ParamValues: 0='db1', 1='cp1251', 2=1, 3=undef, 4=undef, 5=undef] at /home/sveta/build/percona-toolkit/bin/pt-table-checksum line 10741.

 

A: With this combination, you won’t have any issues: pt-table-checksum uses a complicated set of functions that joins columns and calculates a crc32 checksum on them. In your case, all data will be converted to utf8mb4 and no conflicts will happen.

However, if you use incompatible character sets in a single table, you may get the error "Illegal mix of collations for operation 'concat_ws' ":

 

解释3: 关于排序规则的使用范围, 是指sql 在排序时候使用的,

如下为案例:yo
<charset name="utf8mb4">
<collation name="utf8mb4_russian_ci" id="1033">
<rules>
<reset>u0415</reset><p>u0451</p><t>u0401</t>
</rules>
</collaiton>
</charset>

mysql> show collation like 'utf8mb4_russian_ci';
+--------------------+---------+------+---------+----------+---------+
| Collation | Charset | Id | Default | Compiled | Sortlen |
+--------------------+---------+------+---------+----------+---------+
| utf8mb4_russian_ci | utf8mb4 | 1033 | | | 8 |
+--------------------+---------+------+---------+----------+---------+
1 row in set (0,03 sec)

mysql> create table test_yo(gen varchar(100) CHARACTER SET utf8mb4, yo varchar(100) CHARACTER SET utf8mb4 collate utf8mb4_russian_ci) engine=innodb default character set=utf8mb4;
Query OK, 0 rows affected (0,25 sec)

mysql> set names utf8mb4;
Query OK, 0 rows affected (0,02 sec)

mysql> insert into test_yo values('ел', 'ел'), ('ель', 'ель'), ('ёлка', 'ёлка');
Query OK, 3 rows affected (0,05 sec)
Records: 3 Duplicates: 0 Warnings: 0

mysql> insert into test_yo values('Ел', 'Ел'), ('Ель', 'Ель'), ('Ёлка', 'Ёлка');
Query OK, 3 rows affected (0,06 sec)
Records: 3 Duplicates: 0 Warnings: 0

插入2个字段同样的记录,使用不同的字段作为排序字段,可以发现结果输出的不同之处, gen字段 是默认的排序规则
mysql> select * from test_yo order by gen;
+----------+----------+
| gen | yo |
+----------+----------+
| ел | ел |
| Ел | Ел |
| ёлка | ёлка | <-
| Ёлка | Ёлка | <-
| ель | ель |
| Ель | Ель |
+----------+----------+
6 rows in set (0,00 sec)

插入2个字段同样的记录,使用不同的字段作为排序字段,可以发现结果输出的不同之处, yo字段 是自定义的排序规则 utf8mb4_russian_ci
mysql> select * from test_yo order by yo;
+----------+----------+
| gen | yo |
+----------+----------+
| ел | ел |
| Ел | Ел |
| ель | ель | <-
| Ель | Ель | <-
| ёлка | ёлка |
| Ёлка | Ёлка |
+----------+----------+
6 rows in set (0,00 sec)

 

posted @ 2022-04-12 11:08  feiyun8616  阅读(135)  评论(0编辑  收藏  举报