mysql 字符集

MySQL提供了多种字符集和排序规则选择,其中字符集设置和数据存储以及客户端与MySQL实例的交互相关,排序规则和字符串的对
比规则相关字符集的设置可以在MySQL实例、数据库、表、列四个级别
MySQL设置字符集支持在InnoDB, MyISAM, Memory三个存储引擎
查看当前MySQL支持的字符集的方式有两种,一种是通过查看information_schema.character_set系统表,一种是通过命令show character set查看

mysql> select * from information_schema.character_sets;
mysql> show character set;
字符集                默认排序规则                                               字节
+--------------------+----------------------+---------------------------------+--------+
| CHARACTER_SET_NAME | DEFAULT_COLLATE_NAME | DESCRIPTION                     | MAXLEN |
+--------------------+----------------------+---------------------------------+--------+
| big5               | big5_chinese_ci      | Big5 Traditional Chinese        |      2 |
| dec8               | dec8_swedish_ci      | DEC West European               |      1 |
| cp850              | cp850_general_ci     | DOS West European               |      1 |
| hp8                | hp8_english_ci       | HP West European                |      1 |
| koi8r              | koi8r_general_ci     | KOI8-R Relcom Russian           |      1 |
| latin1             | latin1_swedish_ci    | cp1252 West European            |      1 |
| latin2             | latin2_general_ci    | ISO 8859-2 Central European     |      1 |
| swe7               | swe7_swedish_ci      | 7bit Swedish                    |      1 |
| ascii              | ascii_general_ci     | US ASCII                        |      1 |
| ujis               | ujis_japanese_ci     | EUC-JP Japanese                 |      3 |
| sjis               | sjis_japanese_ci     | Shift-JIS Japanese              |      2 |
| hebrew             | hebrew_general_ci    | ISO 8859-8 Hebrew               |      1 |
| tis620             | tis620_thai_ci       | TIS620 Thai                     |      1 |
| euckr              | euckr_korean_ci      | EUC-KR Korean                   |      2 |
| koi8u              | koi8u_general_ci     | KOI8-U Ukrainian                |      1 |
| gb2312             | gb2312_chinese_ci    | GB2312 Simplified Chinese       |      2 |
| greek              | greek_general_ci     | ISO 8859-7 Greek                |      1 |
| cp1250             | cp1250_general_ci    | Windows Central European        |      1 |
| gbk                | gbk_chinese_ci       | GBK Simplified Chinese          |      2 |
| latin5             | latin5_turkish_ci    | ISO 8859-9 Turkish              |      1 |
| armscii8           | armscii8_general_ci  | ARMSCII-8 Armenian              |      1 |
| utf8               | utf8_general_ci      | UTF-8 Unicode                   |      3 |
| ucs2               | ucs2_general_ci      | UCS-2 Unicode                   |      2 |
| cp866              | cp866_general_ci     | DOS Russian                     |      1 |
| keybcs2            | keybcs2_general_ci   | DOS Kamenicky Czech-Slovak      |      1 |
| macce              | macce_general_ci     | Mac Central European            |      1 |
| macroman           | macroman_general_ci  | Mac West European               |      1 |
| cp852              | cp852_general_ci     | DOS Central European            |      1 |
| latin7             | latin7_general_ci    | ISO 8859-13 Baltic              |      1 |
| cp1251             | cp1251_general_ci    | Windows Cyrillic                |      1 |
| utf16              | utf16_general_ci     | UTF-16 Unicode                  |      4 |
| utf16le            | utf16le_general_ci   | UTF-16LE Unicode                |      4 |
| cp1256             | cp1256_general_ci    | Windows Arabic                  |      1 |
| cp1257             | cp1257_general_ci    | Windows Baltic                  |      1 |
| utf32              | utf32_general_ci     | UTF-32 Unicode                  |      4 |
| binary             | binary               | Binary pseudo charset           |      1 |
| geostd8            | geostd8_general_ci   | GEOSTD8 Georgian                |      1 |
| cp932              | cp932_japanese_ci    | SJIS for Windows Japanese       |      2 |
| eucjpms            | eucjpms_japanese_ci  | UJIS for Windows Japanese       |      3 |
| gb18030            | gb18030_chinese_ci   | China National Standard GB18030 |      4 |
| utf8mb4            | utf8mb4_0900_ai_ci   | UTF-8 Unicode                   |      4 |
+--------------------+----------------------+---------------------------------+--------+
41 rows in set (0.00 sec)

utf8mb4 支持新媒体(微信表情)与utf8 不同
utf8 3个字节
utf8mb4 4个字节
不同字符集底层编码不同

见excel 示例

 

排序规则名字的结尾字符代表是否大小写敏感,重音敏感以及是否是二进制的
实例级别字符集

当排序规则名字中没有指定_as或者_ai时,则是否口音敏感是由_ci或者_cs决定,当使用的是_ci是则暗指_ai,反之则暗指_as
比如latin1_general_ci显示指定大小写不敏感,也暗指口音不敏感,而latin1_general_cs显示指定大小写敏感,也暗指口音敏感

每个指定的字符集都会有一个或多个支持的排序规则,可以通过两种方式查看,一种是查看information_schema.collations表,另一种是通过show collation命令查看
不同的字符集不可能有相同的排序规则
每个字符集都会有一个默认的排序规则
mysql> SHOW CHARACTER SET LIKE 'latin%';
+---------+-----------------------------+-------------------+--------+
| Charset | Description                 | Default collation | Maxlen |
+---------+-----------------------------+-------------------+--------+
| latin1  | cp1252 West European        | latin1_swedish_ci |      1 |
| latin2  | ISO 8859-2 Central European | latin2_general_ci |      1 |
| latin5  | ISO 8859-9 Turkish          | latin5_turkish_ci |      1 |
| latin7  | ISO 8859-13 Baltic          | latin7_general_ci |      1 |
+---------+-----------------------------+-------------------+--------+
4 rows in set (0.01 sec)

mysql> SHOW COLLATION WHERE Charset = 'latin1';
+-------------------+---------+----+---------+----------+---------+---------------+
| Collation         | Charset | Id | Default | Compiled | Sortlen | Pad_attribute |
+-------------------+---------+----+---------+----------+---------+---------------+
| latin1_bin        | latin1  | 47 |         | Yes      |       1 | PAD SPACE     |
| latin1_danish_ci  | latin1  | 15 |         | Yes      |       1 | PAD SPACE     |
| latin1_general_ci | latin1  | 48 |         | Yes      |       1 | PAD SPACE     |
| latin1_general_cs | latin1  | 49 |         | Yes      |       1 | PAD SPACE     |
| latin1_german1_ci | latin1  |  5 |         | Yes      |       1 | PAD SPACE     |
| latin1_german2_ci | latin1  | 31 |         | Yes      |       2 | PAD SPACE     |
| latin1_spanish_ci | latin1  | 94 |         | Yes      |       1 | PAD SPACE     |
| latin1_swedish_ci | latin1  |  8 | Yes     | Yes      |       1 | PAD SPACE     |
+-------------------+---------+----+---------+----------+---------+---------------+
8 rows in set (0.01 sec)

每个字符集可以对应多个排序规则,但每个排序规则只能对应一个字符集,例如下面的对应关系会报错

mysql> SELECT _latin1 'x' COLLATE latin2_bin;
ERROR 1253 (42000): COLLATION 'latin2_bin' is not valid for CHARACTER SET 'latin1'

mysql> create table temp1(name varchar(10) charset latin1 collate latin1_general_ci,name2 varchar(10) charset latin1 collate latin1_general_cs);
Query OK, 0 rows affected (0.13 sec)

mysql> insert into temp1 values('a','a'),('A','A');
Query OK, 2 rows affected (0.09 sec)
Records: 2  Duplicates: 0  Warnings: 0

mysql> select * from temp1 where name='a';
+------+-------+
| name | name2 |
+------+-------+
| a    | a     |
| A    | A     |
+------+-------+
2 rows in set (0.00 sec)

mysql> mysql> select * from temp1 where name2='a';
+------+-------+
| name | name2 |
+------+-------+
| a    | a     |
+------+-------+
1 row in set (0.00 sec)

mysql> show variables like '%character%';
+--------------------------+----------------------------------+
| Variable_name | Value |
+--------------------------+----------------------------------+
| character_set_client | utf8mb4 |
| character_set_connection | utf8mb4 |
| character_set_database | utf8mb4 |
| character_set_filesystem | binary |
| character_set_results | utf8mb4 |
| character_set_server | utf8mb4 |
| character_set_system | utf8 |
| character_sets_dir | /usr/local/mysql/share/charsets/ |
+--------------------------+----------------------------------+
8 rows in set (0.01 sec)

8.0之前的 默认情况下的字符集是latin1,8.0之后默认字符集变成了了utf8mb4
latin1 为什么能够存中文

答:

Latin1覆盖了所有的单字节,因此,可以将任意字符串保存在latin1字符集中,而不用担心有内容不符合
latin1的编码规范而被抛弃。——gbk和utf8是多字节编码,没有这种特性。

mysql> set names latin1;
Query OK, 0 rows affected (0.00 sec)

mysql> insert into temp1 values('中国','中国');
Query OK, 1 row affected (0.01 sec)

mysql> select * from temp1;
+--------+--------+
| name   | name2  |
+--------+--------+
| a      | a      |
| A      | A      |
| 中国   | 中国   |
+--------+--------+
3 rows in set (0.00 sec)
退出重新登陆

mysql> insert into temp1 values('中国','中国');
ERROR 1366 (HY000): Incorrect string value: '\xE4\xB8\xAD\xE5\x9B\xBD' for column 'name' at row 1

mysql> show variables like '%character%';
+--------------------------+----------------------------------+
| Variable_name            | Value                            |
+--------------------------+----------------------------------+
| character_set_client     | utf8mb4                          |
| character_set_connection | utf8mb4                          |
| character_set_database   | latin1                           |
| character_set_filesystem | binary                           |
| character_set_results    | utf8mb4                          |
| character_set_server     | latin1                           |
| character_set_system     | utf8                             |
| character_sets_dir       | /usr/local/mysql/share/charsets/ |
+--------------------------+----------------------------------+
8 rows in set (0.01 sec)

mysql> set names latin1;
Query OK, 0 rows affected (0.00 sec)

mysql> insert into temp1 values('中国','中国');
ERROR 1046 (3D000): No database selected
mysql> use A2;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Database changed
mysql> insert into temp1 values('中国','中国');
Query OK, 1 row affected (0.04 sec)

mysql> set name utf8;
ERROR 1064 (42000): You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'utf8' at line 1
mysql> set names utf8;
Query OK, 0 rows affected, 1 warning (0.00 sec)

mysql> show variables like '%character%';
+--------------------------+----------------------------------+
| Variable_name            | Value                            |
+--------------------------+----------------------------------+
| character_set_client     | utf8                             |
| character_set_connection | utf8                             |
| character_set_database   | utf8mb4                          |
| character_set_filesystem | binary                           |
| character_set_results    | utf8                             |
| character_set_server     | latin1                           |
| character_set_system     | utf8                             |
| character_sets_dir       | /usr/local/mysql/share/charsets/ |
+--------------------------+----------------------------------+
8 rows in set (0.00 sec)

mysql> select * from temp1;
+---------------+---------------+
| name          | name2         |
+---------------+---------------+
| a             | a             |
| A             | A             |
| -        | -        |
| -        | -        |
+---------------+---------------+
4 rows in set (0.00 sec)

mysql> set names latin1;
Query OK, 0 rows affected (0.00 sec)

mysql> select * from temp1;
+--------+--------+
| name   | name2  |
+--------+--------+
| a      | a      |
| A      | A      |
| 中国   | 中国   |
| 中国   | 中国   |
+--------+--------+
4 rows in set (0.00 sec)
[root@centos7 ~]# cat /etc/my.cnf
[mysqld]
secure_file_priv=/tmp/
server-id=1
character_set_server=latin1
mysql> create database B2;
Query OK, 1 row affected (0.03 sec)

mysql> show create database B2;
+----------+---------------------------------------------------------------+
| Database | Create Database                                               |
+----------+---------------------------------------------------------------+
| B2       | CREATE DATABASE `B2` /*!40100 DEFAULT CHARACTER SET latin1 */ |
+----------+---------------------------------------------------------------+
1 row in set (0.01 sec)

mysql> use B2;
Database changed
mysql> create table jicheng(id int,name varchar(10));
Query OK, 0 rows affected (0.16 sec)

mysql> show create table jicheng;
+---------+--------------------------------------------------------------------------------------------------------------------------------+
| Table   | Create Table                                                                                                                   |
+---------+--------------------------------------------------------------------------------------------------------------------------------+
| jicheng | CREATE TABLE `jicheng` (
  `id` int(11) DEFAULT NULL,
  `name` varchar(10) DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=latin1 |
+---------+--------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)



mysql> show create database A1;
+----------+-------------------------------------------------------------------------------------------+
| Database | Create Database                                                                           |
+----------+-------------------------------------------------------------------------------------------+
| A1       | CREATE DATABASE `A1` /*!40100 DEFAULT CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_ai_ci */ |
+----------+-------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)

mysql> show tables;
+--------------+
| Tables_in_B2 |
+--------------+
| jicheng      |
+--------------+
1 row in set (0.00 sec)

mysql> use A1;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Database changed
mysql> show tables;
+--------------+
| Tables_in_A1 |
+--------------+
| temp         |
+--------------+
1 row in set (0.00 sec)

mysql> show create table temp;
+-------+---------------------------------------------------------------------------------------------------------------------------------------------------------+
| Table | Create Table                                                                                                                                            |
+-------+---------------------------------------------------------------------------------------------------------------------------------------------------------+
| temp  | CREATE TABLE `temp` (
  `id` int(11) DEFAULT NULL,
  `name` varchar(20) DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci |
+-------+---------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)

mysql> alter database A1 character set latin1;
Query OK, 1 row affected (0.09 sec)

mysql> show create database A1;
+----------+---------------------------------------------------------------+
| Database | Create Database                                               |
+----------+---------------------------------------------------------------+
| A1       | CREATE DATABASE `A1` /*!40100 DEFAULT CHARACTER SET latin1 */ |
+----------+---------------------------------------------------------------+
1 row in set (0.00 sec)

mysql> show create table temp;
+-------+---------------------------------------------------------------------------------------------------------------------------------------------------------+
| Table | Create Table                                                                                                                                            |
+-------+---------------------------------------------------------------------------------------------------------------------------------------------------------+
| temp  | CREATE TABLE `temp` (
  `id` int(11) DEFAULT NULL,
  `name` varchar(20) DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci |
+-------+---------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)

mysql> show create table temp;
+-------+---------------------------------------------------------------------------------------------------------------------------------------------------------+
| Table | Create Table                                                                                                                                            |
+-------+---------------------------------------------------------------------------------------------------------------------------------------------------------+
| temp  | CREATE TABLE `temp` (
  `id` int(11) DEFAULT NULL,
  `name` varchar(20) DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci |
+-------+---------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)


mysql> alter table temp character set latin1;
Query OK, 0 rows affected (0.01 sec)
Records: 0  Duplicates: 0  Warnings: 0

mysql> show create table temp;
+-------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Table | Create Table                                                                                                                                                                 |
+-------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| temp  | CREATE TABLE `temp` (
  `id` int(11) DEFAULT NULL,
  `name` varchar(20) CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_ai_ci DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=latin1 |
+-------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)


mysql> select case when 'a'='A' then 1 else 0 end;
+-------------------------------------+
| case when 'a'='A' then 1 else 0 end |
+-------------------------------------+
|                                   1 |
+-------------------------------------+
mysql> show variables like '%collation%';
+-------------------------------+--------------------+
| Variable_name                 | Value              |
+-------------------------------+--------------------+
| collation_connection          | latin1_swedish_ci  |
| collation_database            | latin1_swedish_ci  |
| collation_server              | latin1_swedish_ci  |
| default_collation_for_utf8mb4 | utf8mb4_0900_ai_ci |
+-------------------------------+--------------------+
4 rows in set (0.01 sec)
mysql> set collation_connection=latin1_general_cs;
Query OK, 0 rows affected (0.00 sec)

mysql> select case when 'a'='A' then 1 else 0 end;
+-------------------------------------+
| case when 'a'='A' then 1 else 0 end |
+-------------------------------------+
|                                   0 |
+-------------------------------------+
1 row in set (0.00 sec)

如果通过alter table语句句修改某一列列的字符集时,MySQL会试图转换其中已有的数据,这样的转
换有可能会导致数据丢失

mysql> insert into temp2 values('中国');
Query OK, 1 row affected (0.02 sec)
mysql> select * from temp2;
+--------+ 
| name | 
+--------+ 
| 中国 | 
mysql> alter table temp2 modify name varchar(100) character set greek;
Query OK, 1 row affected, 1 warning (0.03 sec)
Records: 1 Duplicates: 0 Warnings: 1 
mysql> select * from temp2;
+--------+ 
| name | 
+--------+ 
|  |


mysql> create table temp112(id int,name nchar(10),name2 nvarchar(10));
Query OK, 0 rows affected, 2 warnings (0.11 sec)

mysql> show create table temp112;
+---------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Table   | Create Table                                                                                                                                                                                                                                                                      |
+---------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| temp112 | CREATE TABLE `temp112` (
  `id` int(11) DEFAULT NULL,
  `name` char(10) CHARACTER SET utf8 COLLATE utf8_general_ci DEFAULT NULL,
  `name2` varchar(10) CHARACTER SET utf8 COLLATE utf8_general_ci DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci |
+---------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)


mysql> create table temp113(id int,name nchar(10),name2 nvarchar(10),name3 varchhar(10) character set utf8);
Query OK, 0 rows affected, 3 warnings (0.07 sec)

mysql> show create table temp113;
+---------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Table   | Create Table                                                                                                                                                                                                                                                                                                                                                     |
+---------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| temp113 | CREATE TABLE `temp113` (
  `id` int(11) DEFAULT NULL,
  `name` char(10) CHARACTER SET utf8 COLLATE utf8_general_ci DEFAULT NULL,
  `name2` varchar(10) CHARACTER SET utf8 COLLATE utf8_general_ci DEFAULT NULL,
  `name3` varchar(10) CHARACTER SET utf8 COLLATE utf8_general_ci DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci |
+---------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)

字符集转化
每个数据库客户端连接都有自己的字符集和排序规则属性,客户端发送的语句句的字符集是由character_set_client决定,而与服务端交互时会根据character_set_connection和collation_connection两个参数将接收到的语句句转化。当涉及到显示字符串串的比较时,由collation_connection参数决定,而当比较的是字段里里的字符串串时则根据字段本身的排序规则决定character_set_result参数决定了了语句句的执行行结果以什什么字符集返回给客户端

客户端可以很方便便的调整字符集和排序规则,比如使用SET NAMES 'charset_name' [COLLATE 'collation_name']表明后续的语句句都以该字符集格式传送给服务端,而执行行结果也以此字符集格式返回。

比如一个set names语句相当于执行了以下三行语句:
SET character_set_client = charset_name;
SET character_set_results = charset_name;
SET character_set_connection = charset_name;

[root@centos7 ~]# cat /etc/my.cnf
[mysqld]
secure_file_priv=/tmp/
server-id=1
character_set_server=latin1
init_connect='set names utf8'
[root@centos7 ~]# vi /etc/my.cnf
[root@centos7 ~]# /etc/init.d/mysql.server restart
Shutting down MySQL.. SUCCESS! 
Starting MySQL.. SUCCESS! 
[root@centos7 ~]# mysql -uroot -pmysql
mysql: [Warning] Using a password on the command line interface can be insecure.
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 8
Server version: 8.0.13 MySQL Community Server - GPL

Copyright (c) 2000, 2018, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> show variables like '%character%';
+--------------------------+----------------------------------+
| Variable_name            | Value                            |
+--------------------------+----------------------------------+
| character_set_client     | utf8mb4                          |
| character_set_connection | utf8mb4                          |
| character_set_database   | utf8mb4                          |
| character_set_filesystem | binary                           |
| character_set_results    | utf8mb4                          |
| character_set_server     | utf8mb4                          |
| character_set_system     | utf8                             |
| character_sets_dir       | /usr/local/mysql/share/charsets/ |
+--------------------------+----------------------------------+
8 rows in set (0.01 sec)
mysql> select super_priv from mysql.user where user='root' and host='localhost';
+------------+
| super_priv |
+------------+
| Y          |
+------------+
1 row in set (0.00 sec)


mysql> create user hou@localhost identified by 'mysql';
Query OK, 0 rows affected (0.05 sec)

mysql> grant select,insert,delete,update on A2.* to hou@localhost;
Query OK, 0 rows affected (0.03 sec)

[root@centos7 ~]# mysql -uhou -p
Enter password: 
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 21
Server version: 8.0.13 MySQL Community Server - GPL

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> show variables like '%character%';
+--------------------------+----------------------------------+
| Variable_name            | Value                            |
+--------------------------+----------------------------------+
| character_set_client     | utf8                             |
| character_set_connection | utf8                             |
| character_set_database   | utf8mb4                          |
| character_set_filesystem | binary                           |
| character_set_results    | utf8                             |
| character_set_server     | utf8mb4                          |
| character_set_system     | utf8                             |
| character_sets_dir       | /usr/local/mysql/share/charsets/ |
+--------------------------+----------------------------------+
8 rows in set (0.01 sec)

mysql> select * from temp1 where name=name2;
ERROR 1267 (HY000): Illegal mix of collations (latin1_general_ci,IMPLICIT) and (latin1_general_cs,IMPLICIT) for operation '='
mysql> select * from temp1 where name='a';
+------+-------+
| name | name2 |
+------+-------+
| a    | a     |
| A    | A     |
+------+-------+
2 rows in set (0.01 sec)

mysql> select * from temp1 where name2='a';
+------+-------+
| name | name2 |
+------+-------+
| a    | a     |
+------+-------+
1 row in set (0.00 sec)
写出来的权限最高

mysql> select * from temp1 where name2='a' collate latin1_general_ci;
+------+-------+
| name | name2 |
+------+-------+
| a    | a     |
| A    | A     |
+------+-------+
2 rows in set (0.00 sec)

首先计算每种情况的排序规则权重:
1. 当有显示的写明优先级的,则权重为0
2. 当两个有不不同排序规则的字符串串连接在一起的,则权重为1
3. 字段和本地参数的排序规则,权重为2
4. 由部分字符串串函数返回的系统常量量所带的排序规则,权重为3
5. 字符串串自带的排序规则,权重为4

mysql> SELECT COERCIBILITY('A' COLLATE latin1_swedish_ci);
+---------------------------------------------+
| COERCIBILITY('A' COLLATE latin1_swedish_ci) |
+---------------------------------------------+
|                                           0 |
+---------------------------------------------+
1 row in set (0.00 sec)

mysql> select coercibility(name),coercibility(name2) from temp1;
+--------------------+---------------------+
| coercibility(name) | coercibility(name2) |
+--------------------+---------------------+
|                  2 |                   2 |
|                  2 |                   2 |
+--------------------+---------------------+
2 rows in set (0.00 sec)

在最终选择使用哪种排序规则上,规则为:
1. 优先使用权重最低的
2. 如果两者拥有相同的优先级,则
1)如果两者都是Unicode,或者都不不是Unicode,则返回错误
2)如果一边是Unicode而另一边不是,则将不是Unicode的一边转化成Unicode字符集,

mysql> CREATE TABLE `temp4` (
    ->   `name` varchar(10) DEFAULT NULL,
    ->   `name2` varchar(10) CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_as_cs DEFAULT NULL,
    ->   `name3` varchar(10) CHARACTER SET latin1 COLLATE latin1_general_cs DEFAULT NULL
    -> );
Query OK, 0 rows affected (0.06 sec)

mysql> insert into temp4 values(10,'a','a','a');
ERROR 1136 (21S01): Column count doesn't match value count at row 1
mysql> insert into temp4 values('a','a','a');
Query OK, 1 row affected (0.05 sec)

mysql> insert into temp4 values('A','A','A');
Query OK, 1 row affected (0.04 sec)

mysql> select * from temp4 where name=name3;
+------+-------+-------+
| name | name2 | name3 |
+------+-------+-------+
| a    | a     | a     |
| A    | A     | A     |
+------+-------+-------+
2 rows in set (0.00 sec)

mysql> select * from temp4 where name=name2;
ERROR 1267 (HY000): Illegal mix of collations (utf8mb4_0900_ai_ci,IMPLICIT) and (utf8mb4_0900_as_cs,IMPLICIT) for operation '='


mysql> CREATE TABLE `temp4` (
    ->   `id` int(10) not null,
    ->   `name` varchar(10) DEFAULT NULL,
    ->   `name2` varchar(10) CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_as_cs DEFAULT NULL,
    ->   `name3` varchar(10) CHARACTER SET latin1 COLLATE latin1_general_cs DEFAULT NULL
    -> ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci;
Query OK, 0 rows affected (0.16 sec)

mysql> CREATE TABLE `temp5` (
    ->   `id` int(11) NOT NULL, 
    ->   `name` varchar(10) DEFAULT NULL)
    ->   CHARSET=utf8mb4 COLLATE=utf8mb4_0900_as_cs;
Query OK, 0 rows affected (0.09 sec)

mysql> insert into temp4 values(10,'a','a','a'),(11,'A','A','A');
Query OK, 2 rows affected (0.05 sec)
Records: 2  Duplicates: 0  Warnings: 0

mysql> insert into temp5 values(1,'a');
Query OK, 1 row affected (0.01 sec)

mysql> insert into temp5 values(2,'A');
Query OK, 1 row affected (0.10 sec)

mysql> select * from temp4 inner join temp5 on temp4.id=temp5.id;
Empty set (0.00 sec)

mysql> select * from temp4 inner join temp5 on temp4.name=temp5.name;
ERROR 1267 (HY000): Illegal mix of collations (utf8mb4_0900_ai_ci,IMPLICIT) and (utf8mb4_0900_as_cs,IMPLICIT) for operation '='
解决
mysql> select * from temp4 inner join temp5 on temp4.name=temp5.name collate utf8mb4_0900_as_cs;
+----+------+-------+-------+----+------+
| id | name | name2 | name3 | id | name |
+----+------+-------+-------+----+------+
| 10 | a    | a     | a     |  1 | a    |
| 11 | A    | A     | A     |  2 | A    |
+----+------+-------+-------+----+------+
2 rows in set (0.00 sec)

Unicode(Universal Code)是一种在计算机上使用的字符编码。
Unicode 是为了解决传统的字符编码方案的局限而产生的,它为每种语言中的每个字符设定了统一并且唯一的二进制编码,以满足跨语言、跨平台进行文本转换、处理理的要求。

UTF8字符集
UTF8(Unicode Transformation Format with 8-bit units)是一种存放
Unicode数据的编码规则。
1. 基础拉丁字母,数字,标点符号会占用一个字节
2. 扩展的拉丁字符,希腊语,斯拉夫语,阿拉伯语等会占用两个字节
3. 韩语,中文,日语字符会占用3个或4个字节

mysql> CREATE TABLE `temp1` (
    -> `name` varchar(100) DEFAULT NULL
    -> ) ENGINE=InnoDB DEFAULT CHARSET=utf8;
Query OK, 0 rows affected, 1 warning (0.03 sec)

mysql>  insert into temp1 values(12),(‘ab’),('中国'),('a中'),('中a');
ERROR 1054 (42S22): Unknown column '‘ab’' in 'field list'
mysql>  insert into temp1 values(12),('ab'),('中国'),('a中'),('中a');
Query OK, 5 rows affected (0.04 sec)
Records: 5  Duplicates: 0  Warnings: 0

mysql> select length(name),char_length(name) from temp1;
+--------------+-------------------+
| length(name) | char_length(name) |
+--------------+-------------------+
|            2 |                 2 |
|            2 |                 2 |
|            6 |                 2 |
|            4 |                 2 |
|            4 |                 2 |
+--------------+-------------------+
5 rows in set (0.00 sec)

mysql> select * from temp1;
+--------+
| name   |
+--------+
| 12     |
| ab     |
| 中国   |
| a中    |
| 中a    |
+--------+
5 rows in set (0.01 sec)

不同字符集下字符空间消耗

mysql需要转换字符集,用mysqldump导入导出方式,不能强行更改字符集;

mysql> create table temp1(name varchar(3));
Query OK, 0 rows affected (0.14 sec)

mysql> show create table temp1;
+-------+----------------------------------------------------------------------------------------------------------------------------+
| Table | Create Table                                                                                                               |
+-------+----------------------------------------------------------------------------------------------------------------------------+
| temp1 | CREATE TABLE `temp1` (
  `name` varchar(3) DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci |
+-------+----------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)

mysql> insert into temp1 values('aaa');
Query OK, 1 row affected (0.06 sec)

mysql> insert into temp1 values('aaaa');
ERROR 1406 (22001): Data too long for column 'name' at row 1
mysql> insert into temp1 values('中国人');
Query OK, 1 row affected (0.10 sec)

mysql> insert into temp1 values('中国人好');
ERROR 1406 (22001): Data too long for column 'name' at row 1
mysql> alter table temp1 add name2 char(255);
Query OK, 0 rows affected (0.09 sec)
Records: 0  Duplicates: 0  Warnings: 0

mysql> alter table temp1 add name2 char(256);
ERROR 1074 (42000): Column length too big for column 'name2' (max = 255); use BLOB or TEXT instead
mysql> desc information_schema.character_sets;
+----------------------+------------------+------+-----+---------+-------+
| Field                | Type             | Null | Key | Default | Extra |
+----------------------+------------------+------+-----+---------+-------+
| CHARACTER_SET_NAME   | varchar(64)      | NO   |     | NULL    |       |
| DEFAULT_COLLATE_NAME | varchar(64)      | NO   |     | NULL    |       |
| DESCRIPTION          | varchar(2048)    | NO   |     | NULL    |       |
| MAXLEN               | int(10) unsigned | NO   |     | NULL    |       |
+----------------------+------------------+------+-----+---------+-------+
4 rows in set (0.01 sec)

mysql> select * from information_schema.character_sets;
+--------------------+----------------------+---------------------------------+--------+
| CHARACTER_SET_NAME | DEFAULT_COLLATE_NAME | DESCRIPTION                     | MAXLEN |
+--------------------+----------------------+---------------------------------+--------+
| big5               | big5_chinese_ci      | Big5 Traditional Chinese        |      2 |
| dec8               | dec8_swedish_ci      | DEC West European               |      1 |
| cp850              | cp850_general_ci     | DOS West European               |      1 |
| hp8                | hp8_english_ci       | HP West European                |      1 |
| koi8r              | koi8r_general_ci     | KOI8-R Relcom Russian           |      1 |
| latin1             | latin1_swedish_ci    | cp1252 West European            |      1 |
| latin2             | latin2_general_ci    | ISO 8859-2 Central European     |      1 |
| swe7               | swe7_swedish_ci      | 7bit Swedish                    |      1 |
| ascii              | ascii_general_ci     | US ASCII                        |      1 |
| ujis               | ujis_japanese_ci     | EUC-JP Japanese                 |      3 |
| sjis               | sjis_japanese_ci     | Shift-JIS Japanese              |      2 |
| hebrew             | hebrew_general_ci    | ISO 8859-8 Hebrew               |      1 |
| tis620             | tis620_thai_ci       | TIS620 Thai                     |      1 |
| euckr              | euckr_korean_ci      | EUC-KR Korean                   |      2 |
| koi8u              | koi8u_general_ci     | KOI8-U Ukrainian                |      1 |
| gb2312             | gb2312_chinese_ci    | GB2312 Simplified Chinese       |      2 |
| greek              | greek_general_ci     | ISO 8859-7 Greek                |      1 |
| cp1250             | cp1250_general_ci    | Windows Central European        |      1 |
| gbk                | gbk_chinese_ci       | GBK Simplified Chinese          |      2 |
| latin5             | latin5_turkish_ci    | ISO 8859-9 Turkish              |      1 |
| armscii8           | armscii8_general_ci  | ARMSCII-8 Armenian              |      1 |
| utf8               | utf8_general_ci      | UTF-8 Unicode                   |      3 |
| ucs2               | ucs2_general_ci      | UCS-2 Unicode                   |      2 |
| cp866              | cp866_general_ci     | DOS Russian                     |      1 |
| keybcs2            | keybcs2_general_ci   | DOS Kamenicky Czech-Slovak      |      1 |
| macce              | macce_general_ci     | Mac Central European            |      1 |
| macroman           | macroman_general_ci  | Mac West European               |      1 |
| cp852              | cp852_general_ci     | DOS Central European            |      1 |
| latin7             | latin7_general_ci    | ISO 8859-13 Baltic              |      1 |
| cp1251             | cp1251_general_ci    | Windows Cyrillic                |      1 |
| utf16              | utf16_general_ci     | UTF-16 Unicode                  |      4 |
| utf16le            | utf16le_general_ci   | UTF-16LE Unicode                |      4 |
| cp1256             | cp1256_general_ci    | Windows Arabic                  |      1 |
| cp1257             | cp1257_general_ci    | Windows Baltic                  |      1 |
| utf32              | utf32_general_ci     | UTF-32 Unicode                  |      4 |
| binary             | binary               | Binary pseudo charset           |      1 |
| geostd8            | geostd8_general_ci   | GEOSTD8 Georgian                |      1 |
| cp932              | cp932_japanese_ci    | SJIS for Windows Japanese       |      2 |
| eucjpms            | eucjpms_japanese_ci  | UJIS for Windows Japanese       |      3 |
| gb18030            | gb18030_chinese_ci   | China National Standard GB18030 |      4 |
| utf8mb4            | utf8mb4_0900_ai_ci   | UTF-8 Unicode                   |      4 |
+--------------------+----------------------+---------------------------------+--------+
41 rows in set (0.00 sec)

mysql> desc temp1;
+-------+------------+------+-----+---------+-------+
| Field | Type       | Null | Key | Default | Extra |
+-------+------------+------+-----+---------+-------+
| name  | varchar(3) | YES  |     | NULL    |       |
| name2 | char(255)  | YES  |     | NULL    |       |
+-------+------------+------+-----+---------+-------+
2 rows in set (0.00 sec)

mysql> create table temp1(id int,name varchar(60000),name2 varchar(6000)) character set latin1;
ERROR 1118 (42000): Row size too large. The maximum row size for the used table type, not counting BLOBs, is 65535. This includes storage overhead, check the manual. You have to change some columns to TEXT or BLOBs
mysql> create table temp1(id int,name varchar(60000),name2 varchar(5000)) character set latin1;
Query OK, 0 rows affected (0.14 sec)

mysql> drop table temp1;
Query OK, 0 rows affected (0.05 sec)

mysql> create table temp1(id int,name varchar(60000),name2 varchar(5000)) character set utf8;
ERROR 1074 (42000): Column length too big for column 'name' (max = 21845); use BLOB or TEXT instead
mysql> select 65535/3;
+------------+
| 65535/3    |
+------------+
| 21845.0000 |
+------------+
1 row in set (0.00 sec)

mysql> create table temp1(id int,name varchar(20000),name2 varchar(1000)) character set utf8;
Query OK, 0 rows affected, 1 warning (0.07 sec)

mysql> drop table temp1;
Query OK, 0 rows affected (0.06 sec)

mysql> create table temp1(id int,name varchar(20000),name2 varchar(1000)) character set utf8mb4;
ERROR 1074 (42000): Column length too big for column 'name' (max = 16383); use BLOB or TEXT instead


mysql> mysql> set names utf8;
Query OK, 0 rows affected, 1 warning (0.01 sec)

mysql> create table temp1(id int,name varchar(10));
Query OK, 0 rows affected (0.11 sec)

mysql> insert into temp1 values(1,'');
Query OK, 1 row affected (0.11 sec)

mysql> select hex(convert('' using utf8));
+--------------------------------+
| hex(convert('' using utf8))  |
+--------------------------------+
| E68891                         |
+--------------------------------+
1 row in set, 1 warning (0.00 sec)

mysql> set names gbk;
Query OK, 0 rows affected (0.00 sec)

mysql> select * from temp1;
+------+------+
| id   | name |
+------+------+
|    1 | ?     |
+------+------+
1 row in set (0.00 sec)

mysql> set names utf8;
Query OK, 0 rows affected, 1 warning (0.00 sec)

mysql> select * from temp1;
+------+------+
| id   | name |
+------+------+
|    1 | 我   |
+------+------+
1 row in set (0.00 sec)

mysql> set names latin1;
Query OK, 0 rows affected (0.00 sec)

mysql> insert into temp1 values(2,'');
Query OK, 1 row affected (0.09 sec)

mysql> select * from temp1;
+------+------+
| id   | name |
+------+------+
|    1 | ?    |
|    2 | 中   |
+------+------+
2 rows in set (0.00 sec)

mysql> set names utf8;
Query OK, 0 rows affected, 1 warning (0.00 sec)

mysql> select * from temp1;
+------+--------+
| id   | name   |
+------+--------+
|    1 | 我     |
|    2 | ??-    |
+------+--------+
2 rows in set (0.00 sec)

mysql> set names gbk;
Query OK, 0 rows affected (0.00 sec)

mysql> insert into temp values(3,'');
ERROR 1406 (22001): Data too long for column 'name' at row 1

mysql> create table temp(name varchar(10)) charset latin1;
Query OK, 0 rows affected (0.02 sec)

mysql> set names utf8;
Query OK, 0 rows affected, 1 warning (0.00 sec)

mysql> insert into temp values('中国');
ERROR 1366 (HY000): Incorrect string value: '\xE4\xB8\xAD\xE5\x9B\xBD' for column 'name' at row 1

 

posted @ 2019-04-08 19:45  HoneyBuddy  阅读(472)  评论(0编辑  收藏  举报