MySQL如何使用字符集配置选项
2024-04-22 11:26 abce 阅读(76) 评论(0) 编辑 收藏 举报MySQL中与character_set有关的配置选项有8个,分别是:
mysql> show variables like 'character_set%'; +--------------------------+-------------------------------------+ | Variable_name | Value | +--------------------------+-------------------------------------+ | character_set_client | utf8mb4 | | character_set_connection | utf8mb4 | | character_set_database | utf8mb4 | | character_set_filesystem | binary | | character_set_results | utf8mb4 | | character_set_server | utf8mb4 | | character_set_system | utf8mb3 | | character_sets_dir | /usr/share/percona-server/charsets/ | +--------------------------+-------------------------------------+
如果,不仔细阅读文档,很难区分这些配置项的用途;此外,有些配置项如果不经过深入的测试,也很难知道MySQL是如何使用这些配置项的。
下面将通过对它们进行分组和解释来简化这些配置项的目的。我还将使用示例说明MySQL如何使用和处理这些character_set选项,以便用户可以减少数据丢失和/或错误的发生。最后,我将解释如何使用“set names”命令将必要的配置组织成相同的。
·character_sets_dir:定义动态加载字符集的目录。
·character_set_filesystem:MySQL处理文件的时候,需要知道文件名,在打开文件之前,文件名会从character_set_client转换成character_set_filesystem的设置,默认是binary,表示不会发生转换。大多数时候,保持默认值,除非你明确知道要转换。
如果列中存储的是字符类型的数据(比如varchar、char、text等),MySQL需要知道这些字符属于哪个字符集,从而正确地存储和解析。涉及的变量有:
·character_set_system:如果没有设置character_set_server的值,character_set_system定义了system字符集和collation
·character_set_server:如果没有设置character_set_database的值,character_set_server定义了server的字符集和collation
·character_set_database:在执行create database命令的时候,如果没有指定database的字符集和collation,character_set_database定义了database的字符集和collation;如果create table没有指定表的字符集和collation,则继承数据库的配置;如果create table没有指定列的字符集和collation,则继承表的配置
换句话说,如果在create table的时候,为列指定了字符集和collation,其它级别的配置都将被忽略。例如:
mysql> CREATE TABLE t1 ( c1 INT NOT NULL AUTO_INCREMENT PRIMARY KEY, c2 VARCHAR(100) COLLATE utf8mb4_general_ci, c3 VARCHAR(100) ) DEFAULT CHARACTER SET=latin1; mysql> select TABLE_SCHEMA,TABLE_NAME,COLUMN_NAME, CHARACTER_SET_NAME from information_schema.columns where table_schema='test' and table_name='t1'; +--------------+------------+-------------+--------------------+ | TABLE_SCHEMA | TABLE_NAME | COLUMN_NAME | CHARACTER_SET_NAME | +--------------+------------+-------------+--------------------+ | test | t1 | c1 | NULL | | test | t1 | c2 | utf8mb4 | | test | t1 | c3 | latin1 | +--------------+------------+-------------+--------------------+
这里,没有没有为列c3指定字符集,因此默认是使用latin1,即create table时指定的字符集。
·character_set_client:MySQL Server将客户端发送的语句的字符集设置成character_set_client的值。
·character_set_connection:MySQL Server将客户端发送的语句的字符集从character_set_client转换成character_set_connection的值。
·character_set_results:返回结果给客户端使用的字符集。如果不想做任何转换,将character_set_results设置成null、或binary。
下面来通过例子进一步演示一下。
示例一
mysql> set names latin1; mysql> select @@character_set_client,@@character_set_connection;##二者的设置相同,不会发生转换 +------------------------+----------------------------+ | @@character_set_client | @@character_set_connection | +------------------------+----------------------------+ | latin1 | latin1 | +------------------------+----------------------------+ mysql> select binary("王"); +------------------------------+ | binary("王") | +------------------------------+ | 0xE78E8B | +------------------------------+ mysql> select binary("李"); +------------------------------+ | binary("李") | +------------------------------+ | 0xE69D8E | +------------------------------+ mysql> set names utf8mb4; mysql> select @@character_set_client,@@character_set_connection; +------------------------+----------------------------+ | @@character_set_client | @@character_set_connection | +------------------------+----------------------------+ | utf8mb4 | utf8mb4 | +------------------------+----------------------------+ mysql> select binary("王"); +------------------------------+ | binary("王") | +------------------------------+ | 0xE78E8B | +------------------------------+ mysql> select binary("李"); +------------------------------+ | binary("李") | +------------------------------+ | 0xE69D8E | +------------------------------+
不管字符集是latin1、utf8mb4, binary("王")和binary("李")的返回值不变。
示例二
(1)从小的字符集转成大的字符集
mysql> set character_set_client=latin1; set character_set_connection=utf8mb4; mysql> select @@character_set_client,@@character_set_connection; +------------------------+----------------------------+ | @@character_set_client | @@character_set_connection | +------------------------+----------------------------+ | latin1 | utf8mb4 | +------------------------+----------------------------+ mysql> select binary("王"); +------------------------------+ | binary("王") | +------------------------------+ | 0xC3A7C5BDE280B9 | +------------------------------+ mysql> select binary("李"); +------------------------------+ | binary("李") | +------------------------------+ | 0xC3A6C29DC5BD | +------------------------------+
为何binary("王")和binary("李")的返回值和示例一不同?因为MySQL做了转换,从character_set_client(latin1)转换成了character_set_connection(utf8mb4),即MySQL会使用character_set_client(latin1)解析字符串,然后将结果解析成character_set_connection(utf8mb4)。
可以来确认一下:
mysql> select binary(convert(convert('王' using latin1) using utf8mb4)); +-------------------------------------------------------------------------------+ | binary(convert(convert('王' using latin1) using utf8mb4)) | +-------------------------------------------------------------------------------+ | 0xC3A7C5BDE280B9 | +-------------------------------------------------------------------------------+ mysql> select binary(convert(convert('李' using latin1) using utf8mb4)); +-------------------------------------------------------------------------------+ | binary(convert(convert('李' using latin1) using utf8mb4)) | +-------------------------------------------------------------------------------+ | 0xC3A6C29DC5BD | +-------------------------------------------------------------------------------+
(2)从大的字符集转成小的字符集
mysql> set character_set_client=utf8mb4; set character_set_connection=latin1; mysql> select @@character_set_client,@@character_set_connection; +------------------------+----------------------------+ | @@character_set_client | @@character_set_connection | +------------------------+----------------------------+ | utf8mb4 | latin1 | +------------------------+----------------------------+ mysql> select binary("王"); +--------------------------+ | binary("?") | +--------------------------+ | 0x3F | +--------------------------+ mysql> select binary("李"); +--------------------------+ | binary("?") | +--------------------------+ | 0x3F | +--------------------------+
MySQL会做character_set_client(utf8mb4)到character_set_connection(latin1)的转换,因为是从大字符集转换成小字符集,所以会发生数据丢失。binary("王")和binary("李")的返回值相同,就是因为发生了数据丢失。
mysql> select binary(convert(convert('王' using utf8mb4) using latin1)); +-------------------------------------------------------------------------------+ | binary(convert(convert('王' using utf8mb4) using latin1)) | +-------------------------------------------------------------------------------+ | 0x3F | +-------------------------------------------------------------------------------+ mysql> select binary(convert(convert('李' using utf8mb4) using latin1)); +-------------------------------------------------------------------------------+ | binary(convert(convert('李' using utf8mb4) using latin1)) | +-------------------------------------------------------------------------------+ |0x3F | +-------------------------------------------------------------------------------+
示例三
列的字符集和character_set_connection不同,也会发生转换。
(1)从小的字符集转成大的字符集
C2列是utf8mb4字符集,与character_set_connection latin1不同,所以实际插入的数据是转换的结果,将字符串解析成latin1,然后将输出的结果解析成utf8mb4。
convert(convert('王' using latin1) using utf8mb4),binary("王")按照utf8mb4返回的结果是:0xC3A7C5BDE280B9
c3列是latin1字符集,不会涉及到转换,插入的数据就是最初的数据,即binary("李")按latin1就是0xE69D8E
mysql> set names latin1; mysql> select @@character_set_client,@@character_set_connection; +------------------------+----------------------------+ | @@character_set_client | @@character_set_connection | +------------------------+----------------------------+ | latin1 | latin1 | +------------------------+----------------------------+ mysql> select binary("王"); +------------------------------+ | binary("王") | +------------------------------+ | 0xE78E8B | +------------------------------+ mysql> select binary("李"); +------------------------------+ | binary("李") | +------------------------------+ | 0xE69D8E | +------------------------------+ mysql> insert into t1(c1,c2,c3) values(1,"王","李"); mysql> select * from t1 where c1=1; +----+------+------+ | c1 | c2 | c3 | +----+------+------+ | 1 | 王 | 李 | +----+------+------+ mysql> select c1,binary(c2),binary(c3) from t1 where c1=1; +----+------------------------+------------------------+ | c1 | binary(c2) | binary(c3) | +----+------------------------+------------------------+ | 1 | 0xC3A7C5BDE280B9 | 0xE69D8E | +----+------------------------+------------------------+
C2列是utf8mb4字符集,不同于character_set_results latin1。因此,在读取数据的时候,需要数据转换。
mysql> select @@character_set_results; +-------------------------+ | @@character_set_results | +-------------------------+ | latin1 | +-------------------------+ mysql> select * from t1 where c1=1; +----+------+------+ | c1 | c2 | c3 | +----+------+------+ | 1 | 王 | 李 | +----+------+------+
MySQL会将c2解析成utf8mb4,然后转换成latin1:
mysql> select convert(convert(0xC3A7C5BDE280B9 using utf8mb4) using latin1); +---------------------------------------------------------------+ | convert(convert(0xC3A7C5BDE280B9 using utf8mb4) using latin1) | +---------------------------------------------------------------+ | 王 | +---------------------------------------------------------------+
c3直接被解析成latin1:
mysql> select convert(0xE69D8E using latin1); +--------------------------------+ | convert(0xE69D8E using latin1) | +--------------------------------+ | 李 | +--------------------------------+
(2)从大的字符集转成小的字符集
character_set_connection 被设置成utf8mb4,c2列被定义为utf8mb4字符集,c3列是latin1字符集。
c2可以工作的很好,c3列会报错error 1366 (HY000): Incorrect string value
因为convert(convert('李' using utf8mb4) using latin1会发生数据丢失,从而在插入错误数据的时候会报错:
mysql> set names utf8mb4 ; mysql> select @@character_set_client,@@character_set_connection; +------------------------+----------------------------+ | @@character_set_client | @@character_set_connection | +------------------------+----------------------------+ | utf8mb4 | utf8mb4 | +------------------------+----------------------------+ mysql> select binary("王"); +------------------------------+ | binary("王") | +------------------------------+ | 0xE78E8B | +------------------------------+ mysql> select binary("李"); +------------------------------+ | binary("李") | +------------------------------+ | 0xE69D8E | +------------------------------+ mysql> insert into t1(c1,c2,c3) values(2,"王","李"); ERROR 1366 (HY000): Incorrect string value: '\xE6\x9D\x8E' for column 'c3' at row 1 mysql> select binary(convert(convert('李' using utf8mb4) using latin1)); +------------------------------------------------------------------------------+ | binary(convert(convert('李' using utf8mb4) using latin1)) | +------------------------------------------------------------------------------+ | 0x3F | +------------------------------------------------------------------------------+
示例四
在innodb中,如果想直接将一个列从大的字符集转换成小的字符集,不涉及配置变量的话,也会发生相同的错误:
mysql> select c1, binary(c2), binary(c3) from t1; +----+------------------------+------------------------+ | c1 | binary(c2) | binary(c3) | +----+------------------------+------------------------+ | 1 | 0xC3A7C5BDE280B9 | 0xE69D8E | | 3 | 0xE78E8B | 0x4131 | +----+------------------------+------------------------+ mysql> ALTER TABLE t1 MODIFY c2 varchar(100) CHARACTER SET latin1 COLLATE latin1_swedish_ci DEFAULT NULL; ERROR 1366 (HY000): Incorrect string value: '\xE7\x8E\x8B' for column 'c2' at row 2
示例五
在上面的例子中,多次使用了set names命令。这样可以方便的修改设置。
mysql> set names latin1; mysql> show variables like '%character_set%'; +--------------------------+-------------------------------------+ | Variable_name | Value | +--------------------------+-------------------------------------+ | character_set_client | latin1 | | character_set_connection | latin1 | | character_set_database | utf8mb4 | | character_set_filesystem | binary | | character_set_results | latin1 | | character_set_server | utf8mb4 | | character_set_system | utf8mb3 | | character_sets_dir | /usr/share/percona-server/charsets/ | +--------------------------+-------------------------------------+ 8 rows in set (0.00 sec) mysql> set names utf8mb4; mysql> show variables like '%character_set%'; +--------------------------+-------------------------------------+ | Variable_name | Value | +--------------------------+-------------------------------------+ | character_set_client | utf8mb4 | | character_set_connection | utf8mb4 | | character_set_database | utf8mb4 | | character_set_filesystem | binary | | character_set_results | utf8mb4 | | character_set_server | utf8mb4 | | character_set_system | utf8mb3 | | character_sets_dir | /usr/share/percona-server/charsets/ | +--------------------------+-------------------------------------+ 8 rows in set (0.00 sec)
MySQL进行潜在的转换,以确保character_set_client、character_set_connection和列的目标字符集是对齐的。
我们需要小心地执行转换,因为当MySQL从一个允许更多值(更大)的字符集转换到一个允许更少值(更小)的字符集,如utf8mb4->latin1,它可能会导致数据丢失或"ERROR 1366 (HY000) Incorrect string value"可能发生。
我们可以通过命令"set names"使character_set_client, character_set_connection, character_set_results完全相同。
虽然可以根据需要定义列的字符集,在将列定义为不同于character_set_connection的字符集之前,你可能需要重新考虑一下。