MySQL 5.7 VARCHAR Type Experiments
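How many characters can a MySQL VARCHAR column hold at most? I vaguely remember that a VARCHAR can store up to 65535 characters, but is that true?
In theory, the number of characters a string type can hold depends on both the character set chosen and the storage length limit: the fewer bytes each character takes to encode, the more characters fit.
OK! Let's start the test with latin1, a single-byte character set and therefore the smallest possible encoding length: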
[testdb]> create table t5(name varchar(65535)) charset=latin1;
ERROR 1118 (42000): Row size too large. The maximum row size for the used table type, not counting BLOBs, is 65535. This includes storage overhead, check the manual. You have to change some columns to TEXT or BLOBs
[testdb]> create table t5(name varchar(65534)) charset=latin1;
ERROR 1118 (42000): Row size too large. The maximum row size for the used table type, not counting BLOBs, is 65535. This includes storage overhead, check the manual. You have to change some columns to TEXT or BLOBs
[testdb]> create table t5(name varchar(65533)) charset=latin1;
ERROR 1118 (42000): Row size too large. The maximum row size for the used table type, not counting BLOBs, is 65535. This includes storage overhead, check the manual. You have to change some columns to TEXT or BLOBs
[testdb]> create table t5(name varchar(65532)) charset=latin1;
Query OK, 0 rows affected (0.01 sec)
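After some trial and error we find that we are constrained by the row size. The result looks clear-cut: with the latin1 character set, a VARCHAR holds at most 65532 characters. Is that really so?
The answer is NO!
That conclusion does not hold up to scrutiny. According to the documentation, when a VARCHAR column may hold values longer than 255 bytes, it needs a 2-byte prefix that records the length of the stored string in bytes.
(2 bytes are 16 bits, and 2^16-1=65535, which also explains the 65535-byte limit from another angle.)
From the MySQL 5.7 official documentation: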
Values in VARCHAR columns are variable-length strings. The length can be specified as a value from 0 to 65,535.
The effective maximum length of a VARCHAR is subject to the maximum row size (65,535 bytes, which is shared among all columns) and the character set used.
See Section C.10.4, “Limits on Table Column Count and Row Size”.
In contrast to CHAR, VARCHAR values are stored as a 1-byte or 2-byte length prefix plus data.
The length prefix indicates the number of bytes in the value.
A column uses one length byte if values require no more than 255 bytes, two length bytes if values may require more than 255 bytes.
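The prefix rule in the manual excerpt above can be sketched in a few lines of Python. The function name `length_prefix_bytes` is made up for illustration and is not a MySQL API; it only mirrors the quoted rule.

```python
def length_prefix_bytes(max_value_bytes: int) -> int:
    """Bytes needed for the VARCHAR length prefix (hypothetical helper)."""
    # A 2-byte prefix can describe at most 2**16 - 1 = 65535 bytes.
    if not 0 <= max_value_bytes <= 2**16 - 1:
        raise ValueError("value length must be between 0 and 65535 bytes")
    # One byte is enough while the value never needs more than 255 bytes.
    return 1 if max_value_bytes <= 255 else 2


print(length_prefix_bytes(255))  # 1
print(length_prefix_bytes(256))  # 2
print(2**16 - 1)                 # 65535
```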
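So 65535-2=65533, yet create table t5(name varchar(65533)) charset=latin1 still fails. Why?
Because we overlooked the NULL flags in the row format. The name column is nullable, so the row needs NULL flags, and even with a single column they take one byte of the row (the NULL flags are not covered further here).
Declaring the name column NOT NULL removes the NULL flag, so let's continue testing: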
root@localhost 17:00: [testdb]> create table t6(name varchar(65534) not null) charset=latin1;
ERROR 1118 (42000): Row size too large. The maximum row size for the used table type, not counting BLOBs, is 65535. This includes storage overhead, check the manual. You have to change some columns to TEXT or BLOBs
root@localhost 17:00: [testdb]> create table t6(name varchar(65533) not null) charset=latin1;
Query OK, 0 rows affected (0.01 sec)
OK! The test matches the theory!
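The byte accounting behind the two latin1 results can be written down as a small sketch. It assumes a table with a single VARCHAR column, a 2-byte length prefix, and one byte of NULL flags when the column is nullable, which is what the tests above suggest; `max_single_varchar_chars` is a hypothetical helper name.

```python
ROW_SIZE_LIMIT = 65535  # bytes, taken from the ERROR 1118 message


def max_single_varchar_chars(max_bytes_per_char: int, nullable: bool) -> int:
    """Largest N for VARCHAR(N) in a one-column table (sketch, not MySQL code)."""
    overhead = 2              # assumed 2-byte length prefix
    if nullable:
        overhead += 1         # assumed 1 byte of NULL flags
    return (ROW_SIZE_LIMIT - overhead) // max_bytes_per_char


print(max_single_varchar_chars(1, nullable=True))   # 65532
print(max_single_varchar_chars(1, nullable=False))  # 65533
```

Both numbers agree with the boundaries found experimentally for t5 and t6.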
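So how many characters can be stored under utf8mb4?
First, let's look at the character set and row format settings of the test environment. The MySQL version is 5.7.22, and the default database character set is utf8mb4.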
[testdb]> show variables like '%char%';
+--------------------------+----------------------------------------------------------------+
| Variable_name | Value |
+--------------------------+----------------------------------------------------------------+
| character_set_client | utf8 |
| character_set_connection | utf8 |
| character_set_database | utf8mb4 |
| character_set_filesystem | binary |
| character_set_results | utf8 |
| character_set_server | utf8mb4 |
| character_set_system | utf8 |
| character_sets_dir | /opt/mysql/mysql-5.7.22-linux-glibc2.12-x86_64/share/charsets/ |
+--------------------------+----------------------------------------------------------------+
8 rows in set (0.00 sec)
Create a table with a column length of 65535:
[testdb]> create table t3(name varchar(65535) primary key);
ERROR 1074 (42000): Column length too big for column 'name' (max = 16383); use BLOB or TEXT instead
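According to the error message, the maximum column length is 16383. Why 16383 and not some other value?
First of all, we are again constrained by the 65,535-byte row size limit. Here is what the official documentation says about row size: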
Row Size Limits
The maximum row size for a given table is determined by several factors:
The internal representation of a MySQL table has a maximum row size limit of 65,535 bytes, even if the storage engine is capable of supporting larger rows.
BLOB and TEXT columns only contribute 9 to 12 bytes toward the row size limit because their contents are stored separately from the rest of the row.
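In other words, even if your storage engine supports larger rows, MySQL still caps the row size at 65535 bytes;
BLOB and TEXT columns only take 9 to 12 bytes of the row, and the rest of their content is stored separately.
Secondly, no character set was specified when creating the table, so it inherits the database default, utf8mb4;
In utf8mb4, a character takes at most 4 bytes (emoji, for example; most Chinese characters take 3 bytes);
So, to guarantee that the stored string never exceeds 65535 bytes, the column length cannot be greater than floor(65535/4)=16383.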
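The `max = 16383` in the ERROR 1074 message is consistent with dividing 65535 by the character set's maximum bytes per character. The sketch below assumes that rule; the `MAX_BYTES_PER_CHAR` table and the function name are illustrative and not read from the server, and only the utf8mb4 value was actually observed in this experiment.

```python
MAX_VARCHAR_BYTES = 65535

# Maximum encoded length of one character, in bytes (illustrative table).
MAX_BYTES_PER_CHAR = {"latin1": 1, "utf8": 3, "utf8mb4": 4}


def column_length_cap(charset: str) -> int:
    """Declared VARCHAR length above which the column itself is rejected (sketch)."""
    return MAX_VARCHAR_BYTES // MAX_BYTES_PER_CHAR[charset]


for name in ("latin1", "utf8", "utf8mb4"):
    print(name, column_length_cap(name))
# latin1 65535
# utf8 21845
# utf8mb4 16383
```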
But creating the table again with a length of 16383 still produces an error. Why?
[testdb]> create table t3(name varchar(16383) primary key);
ERROR 1071 (42000): Specified key was too long; max key length is 3072 bytes
Look closely at the message! This time it is no longer Column length too big, but Specified key was too long;
Look at the official description below:
Both DYNAMIC and COMPRESSED row formats support index key prefixes up to 3072 bytes.
This feature is controlled by the innodb_large_prefix configuration option, which is enabled by default.
See the innodb_large_prefix option description for more information.
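So with the DYNAMIC and COMPRESSED row formats, an index key cannot exceed 3072 bytes by default.
Our name column is the primary key, i.e. the clustered index, and the whole column value is used as the index key, so the key length is bound to exceed the limit.
It also tells us that this feature is controlled by the innodb_large_prefix variable.
Checking our test environment, the row format is indeed dynamic: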
[testdb]> show variables like '%format%';
+---------------------------+-------------------+
| Variable_name | Value |
+---------------------------+-------------------+
| binlog_format | ROW |
| date_format | %Y-%m-%d |
| datetime_format | %Y-%m-%d %H:%i:%s |
| default_week_format | 0 |
| innodb_default_row_format | dynamic |
| innodb_file_format | Barracuda |
| innodb_file_format_check | ON |
| innodb_file_format_max | Barracuda |
| time_format | %H:%i:%s |
+---------------------------+-------------------+
Dividing 3072 bytes by the 4-byte maximum encoding length of utf8mb4, the length limit for a primary key column should be 768. The test:
[testdb]> create table t4(name varchar(769) primary key) charset=utf8mb4;
ERROR 1071 (42000): Specified key was too long; max key length is 3072 bytes
[testdb]> create table t4(name varchar(768) primary key) charset=utf8mb4;
Query OK, 0 rows affected (0.01 sec)
As expected, the table with a 769-character column fails and the one with a 768-character column succeeds.
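The 768/769 boundary follows from the 3072-byte key limit. Here is a minimal sketch, assuming the whole column is used as the index key and every character is budgeted at its maximum encoded length; the helper names are hypothetical.

```python
MAX_KEY_BYTES = 3072  # limit reported by ERROR 1071


def max_indexed_chars(max_bytes_per_char: int) -> int:
    """Longest fully indexed VARCHAR, in characters (sketch)."""
    return MAX_KEY_BYTES // max_bytes_per_char


def key_fits(n_chars: int, max_bytes_per_char: int) -> bool:
    """True if a VARCHAR(n_chars) key stays within the key length limit."""
    return n_chars * max_bytes_per_char <= MAX_KEY_BYTES


print(max_indexed_chars(4))  # 768
print(key_fits(769, 4))      # False
print(key_fits(768, 4))      # True
```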
Now let's set the index length limit aside and test again:
[testdb]> create table t41(name varchar(16383) not null) charset=utf8mb4;
Query OK, 0 rows affected (0.02 sec)
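The table is created!
Based on the theory and experiments above:
In the utf8 character set, a character takes at most 3 bytes (Chinese characters, for example), so if name is the primary key, its length cannot exceed 3072/3=1024 characters;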
[testdb]> create table t3(name varchar(1025) primary key) charset=utf8;
ERROR 1071 (42000): Specified key was too long; max key length is 3072 bytes
[testdb]> create table t3(name varchar(1024) primary key) charset=utf8;
Query OK, 0 rows affected (0.01 sec)
In a utf8 environment without an index, let's verify the theory above, 65535/3=21845:
[testdb]> create table t32(name varchar(21845) not null ) charset=utf8;
ERROR 1118 (42000): Row size too large. The maximum row size for the used table type, not counting BLOBs, is 65535. This includes storage overhead, check the manual. You have to change some columns to TEXT or BLOBs
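The CREATE TABLE statement still fails? That is because "VARCHAR values are stored as a 1-byte or 2-byte length prefix plus data."
The length prefix takes 2 bytes, so the row would need 21845*3+2=65537 bytes, which exceeds the limit, and the creation fails.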
[testdb]> create table t32(name varchar(21844) not null ) charset=utf8;
Query OK, 0 rows affected (0.01 sec)
The table is created!
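To see why 21845 fails and 21844 succeeds, it helps to compute the row size of a given definition directly. This is a rough sketch of the accounting used in this post (data bytes at the maximum encoded length, plus length prefix, plus one byte of NULL flags if nullable), not MySQL's actual implementation; `row_bytes` is a hypothetical helper.

```python
ROW_SIZE_LIMIT = 65535


def row_bytes(n_chars: int, max_bytes_per_char: int, nullable: bool = False) -> int:
    """Worst-case row size of a table with one VARCHAR(n_chars) column (sketch)."""
    data = n_chars * max_bytes_per_char
    prefix = 1 if data <= 255 else 2   # length prefix rule from the manual
    null_flags = 1 if nullable else 0  # assumed 1 byte for a nullable column
    return data + prefix + null_flags


for n in (21845, 21844):
    size = row_bytes(n, 3)
    print(n, size, size <= ROW_SIZE_LIMIT)
# 21845 65537 False
# 21844 65534 True
```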
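Conclusion (for a table whose only column is a NOT NULL VARCHAR with no index on it):
With the latin1 character set, a VARCHAR column can store at most 65533 characters;
With the utf8 character set, a VARCHAR column can store at most 21844 characters;
With the utf8mb4 character set, a VARCHAR column can store at most 16383 characters;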
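The three limits explored in this post (column length, row size, index key length) can be combined by taking the smallest one. The following is only a sketch under the assumptions used throughout: a single-column table, a 2-byte length prefix, one byte of NULL flags for a nullable column, and the 3072-byte key limit of the DYNAMIC row format. `effective_max_chars` is a hypothetical name.

```python
def effective_max_chars(max_bytes_per_char: int,
                        nullable: bool = False,
                        indexed: bool = False) -> int:
    """Smallest of the column, row and (optional) index limits, in characters."""
    column_cap = 65535 // max_bytes_per_char
    row_cap = (65535 - 2 - (1 if nullable else 0)) // max_bytes_per_char
    caps = [column_cap, row_cap]
    if indexed:
        caps.append(3072 // max_bytes_per_char)  # whole column used as the key
    return min(caps)


print(effective_max_chars(1))                # 65533 (latin1)
print(effective_max_chars(3))                # 21844 (utf8)
print(effective_max_chars(4))                # 16383 (utf8mb4)
print(effective_max_chars(4, indexed=True))  # 768   (utf8mb4 primary key)
```

Taking the minimum reflects the order in which the errors appeared above: whichever limit is tightest for a given definition is the one MySQL reports.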
That wraps up this small experiment on the limits on VARCHAR character length, row size, and index length!
Corrections are welcome!