(Repost) MySQL: The Design of char, varchar, and text

from:

http://www.cnblogs.com/billyxp/p/3548540.html

A recent table design used varchar(10000), which sparked some discussion among us, so let's analyze it below.

First, some basics:

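1. In char(n) and varchar(n), the n in parentheses is the number of characters, not the number of bytes. So with Chinese text (UTF8) you can insert n Chinese characters, but they actually occupy n*3 bytes.

2. The biggest difference between char and varchar is that char always occupies the space of n characters regardless of the actual value, whereas varchar occupies only the space the actual characters need plus a length prefix of 1 byte (2 bytes once the column can hold more than 255 bytes). The actual number of characters must be <= n.

3. A string longer than the n set for char or varchar is truncated (in strict SQL mode the statement is rejected with an error instead).

4. char is limited to 255 characters, varchar to 65,535 bytes, and text to 65,535 bytes.

5. char loses trailing spaces: values are space-padded to n when stored and the trailing spaces are stripped on retrieval. varchar and text keep them.

6. varchar uses 1 to 2 bytes to store the length. text stores its length as well (2 bytes), but those bytes do not count against its 65,535-byte limit.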
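
To make point 1 concrete, here is a minimal Python sketch of the gap between character count and byte count. The helper name `utf8_bytes` and the sample strings are illustrative assumptions, not part of the original post; the sketch only models encoding size, not how MySQL lays out rows.

```python
# Character count vs. UTF-8 byte count for values stored in CHAR(n)/VARCHAR(n).
# `utf8_bytes` is a hypothetical helper name used only for this illustration.

def utf8_bytes(value: str) -> int:
    """Return the number of bytes `value` occupies when encoded as UTF-8."""
    return len(value.encode("utf-8"))


samples = ["abcd", "数据库设计"]  # ASCII text and five Chinese characters

for s in samples:
    # n in VARCHAR(n) limits the first number; disk usage follows the second.
    print(f"{s!r}: {len(s)} characters, {utf8_bytes(s)} bytes")
```

The Chinese sample fits in varchar(5) because it has five characters, even though it needs 15 bytes of storage.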

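The table below shows the result very clearly (it assumes a single-byte character set, and the last row applies only when strict SQL mode is off):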

Value	CHAR(4)	Storage Required	VARCHAR(4)	Storage Required
''	'    '	4 bytes	''	1 byte
'ab'	'ab  '	4 bytes	'ab'	3 bytes
'abcd'	'abcd'	4 bytes	'abcd'	5 bytes
'abcdefgh'	'abcd'	4 bytes	'abcd'	5 bytes
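
The rows above can be reproduced with a small model. This is only a sketch under stated assumptions: a single-byte character set, truncation instead of an error for over-long values, and hypothetical function names `char_storage` and `varchar_storage`.

```python
from typing import Tuple

# A simplified model of CHAR(n) and VARCHAR(n) storage.
# Assumption: a single-byte character set, so one character is one byte.


def char_storage(value: str, n: int) -> Tuple[str, int]:
    """CHAR(n): truncate to n characters, right-pad with spaces, always n bytes."""
    stored = value[:n].ljust(n)
    return stored, n


def varchar_storage(value: str, n: int) -> Tuple[str, int]:
    """VARCHAR(n): truncate to n characters, then add a 1- or 2-byte length prefix."""
    stored = value[:n]
    prefix = 1 if n <= 255 else 2  # 2 bytes once values may exceed 255 bytes
    return stored, len(stored) + prefix


for value in ["", "ab", "abcd", "abcdefgh"]:
    c_val, c_bytes = char_storage(value, 4)
    v_val, v_bytes = varchar_storage(value, 4)
    print(f"{value!r}\t{c_val!r}\t{c_bytes} bytes\t{v_val!r}\t{v_bytes} bytes")
```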

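In summary:

1. char: fixed length, fast, may waste space, strips trailing spaces, limit 255.

2. varchar: variable length, slower, no wasted space, leaves trailing spaces alone, limit 65,535, but because the length is stored as well, at most 65,532 are actually usable.

3. text: variable-length large data, slower, no wasted space, leaves trailing spaces alone, limit 65,535. The length is kept in extra space, so all 65,535 can be used.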
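
The 65,532 figure for varchar follows from simple arithmetic, sketched below. The assumptions are mine: the varchar is the only column in the row, the length prefix takes 2 bytes, and a nullable column costs 1 more byte. `max_varchar_chars` is a hypothetical helper name.

```python
ROW_LIMIT = 65535  # bytes; the limit is shared by all columns in a row


def max_varchar_chars(bytes_per_char: int, nullable: bool = True) -> int:
    """Largest n for VARCHAR(n), assuming it is the only column in the table."""
    length_prefix = 2                  # long varchar columns need a 2-byte prefix
    null_flag = 1 if nullable else 0   # a nullable column costs one extra byte
    usable_bytes = ROW_LIMIT - length_prefix - null_flag
    return usable_bytes // bytes_per_char


print(max_varchar_chars(1))  # single-byte charset such as latin1
print(max_varchar_chars(3))  # utf8, up to 3 bytes per character
print(max_varchar_chars(4))  # utf8mb4, up to 4 bytes per character
```

Under these assumptions a utf8 column tops out far below 65,535 characters, because n counts characters while the limit counts bytes.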

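Next, let's look at the scenario in question:

When the n in varchar(n) is very large, is varchar or text the better choice? This is clearly a case where a change in quantity becomes a change in quality. We consider two aspects: first space, then performance.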

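First, space:

From the official documentation we learn that when a varchar exceeds certain sizes it is automatically converted to text. The rules are roughly as follows:

- larger than varchar(255) becomes tinytext
- larger than varchar(500) becomes text
- larger than varchar(20000) becomes mediumtext

So for very large content, varchar and text do not differ much.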

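Second, performance:

Indexes are the most critical factor for performance. On a text column you can only add a prefix index, and that prefix index can be at most 1000 bytes.

It looks as if varchar can be indexed in full, but testing shows that this is not the case either. Because of the internal conversion, a long varchar can also only get a 1000-byte index, and anything longer is truncated automatically.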

localhost.test>create table test (a varchar(1500));
Query OK, 0 rows affected (0.01 sec)

localhost.test>alter table test add index idx_a(a);
Query OK, 0 rows affected, 2 warnings (0.00 sec)
Records: 0  Duplicates: 0  Warnings: 2

localhost.test>show warnings;
+---------+------+---------------------------------------------------------+
| Level   | Code | Message                                                 |
+---------+------+---------------------------------------------------------+
| Warning | 1071 | Specified key was too long; max key length is 767 bytes |
| Warning | 1071 | Specified key was too long; max key length is 767 bytes |
+---------+------+---------------------------------------------------------+

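As the output above clearly shows, the index was truncated. And where does the 767 come from? It is a limit of InnoDB itself, controlled by the innodb_large_prefix setting.

So in terms of indexes, long varchar and text do not differ much either.

We therefore conclude that beyond a length of 255 there is no essential difference between varchar and text. You only need to weigh the characteristics of the two types (mainly the fact that text cannot have a default value).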
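
Because the key length limit is expressed in bytes, the number of characters an index prefix can cover depends on the character set. A rough sketch of that arithmetic, using only the 767-byte and 1000-byte limits mentioned in this post (`max_index_prefix_chars` is a hypothetical helper name):

```python
def max_index_prefix_chars(key_limit_bytes: int, bytes_per_char: int) -> int:
    """Characters that fit in an index prefix, given a key length limit in bytes."""
    return key_limit_bytes // bytes_per_char


# Key length limits quoted in the text, against common per-character widths.
for limit in (767, 1000):
    for charset, width in (("latin1", 1), ("utf8", 3), ("utf8mb4", 4)):
        chars = max_index_prefix_chars(limit, width)
        print(f"limit {limit} bytes, {charset}: at most {chars} characters indexed")
```

With utf8 and the 767-byte limit, only the first 255 characters of a column take part in the index.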

CREATE TABLE `test` (
  `id` int(11) DEFAULT NULL,
  `a` varchar(500) DEFAULT NULL,
  `b` text
) ENGINE=InnoDB DEFAULT CHARSET=utf8

+----------+------------+-----------------------------------+
| Query_ID | Duration   | Query                             |
+----------+------------+-----------------------------------+
|        1 | 0.01513200 | select a from test where id=10000 |
|        2 | 0.01384500 | select b from test where id=10000 |
|        3 | 0.01124300 | select a from test where id=15000 |
|        4 | 0.01971600 | select b from test where id=15000 |
+----------+------------+-----------------------------------+

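Judging from the simple test above, there is basically no difference. Personally, though, I recommend varchar(10000): it still truncates, which keeps the maximum size of the field under control. With text, a bug in the code could easily write a very large value into the database, which is a risk.

So, in the spirit of "short is better", it is best to use varchar and cap the maximum length according to your needs.

Appendix: storage requirements of each column type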
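
The same cap can also be enforced in application code before a value reaches the database. The following is only a sketch of that idea, not something the original post prescribes; `check_length` and `MAX_CHARS` are hypothetical names, and the limit mirrors the varchar(10000) discussed above.

```python
MAX_CHARS = 10000  # mirrors the varchar(10000) column discussed in the text


def check_length(value: str, max_chars: int = MAX_CHARS) -> str:
    """Return `value` unchanged, or raise if it has more than max_chars characters."""
    if len(value) > max_chars:
        raise ValueError(
            f"value has {len(value)} characters, limit is {max_chars}"
        )
    return value


# A value within the limit passes through untouched.
print(len(check_length("x" * 10000)))

# An oversized value is rejected before any INSERT would run.
try:
    check_length("x" * 10001)
except ValueError as exc:
    print("rejected:", exc)
```

Rejecting the value explicitly avoids relying on silent truncation, while the column definition remains the last line of defense.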

Data Type	Storage Required
TINYINT	1 byte
SMALLINT	2 bytes
MEDIUMINT	3 bytes
INT, INTEGER	4 bytes
BIGINT	8 bytes
FLOAT(p)	4 bytes if 0 <= p <= 24, 8 bytes if 25 <= p <= 53
FLOAT	4 bytes
DOUBLE [PRECISION], REAL	8 bytes
DECIMAL(M,D), NUMERIC(M,D)	Varies; see following discussion
BIT(M)	approximately (M+7)/8 bytes

Data Type	Storage Required Before MySQL 5.6.4	Storage Required as of MySQL 5.6.4
YEAR	1 byte	1 byte
DATE	3 bytes	3 bytes
TIME	3 bytes	3 bytes + fractional seconds storage
DATETIME	8 bytes	5 bytes + fractional seconds storage
TIMESTAMP	4 bytes	4 bytes + fractional seconds storage

Data Type	Storage Required
CHAR(M)	M × w bytes, 0 <= M <= 255, where w is the number of bytes required for the maximum-length character in the character set
BINARY(M)	M bytes, 0 <= M <= 255
VARCHAR(M), VARBINARY(M)	L + 1 bytes if column values require 0 – 255 bytes, L + 2 bytes if values may require more than 255 bytes
TINYBLOB, TINYTEXT	L + 1 bytes, where L < 2⁸
BLOB, TEXT	L + 2 bytes, where L < 2¹⁶
MEDIUMBLOB, MEDIUMTEXT	L + 3 bytes, where L < 2²⁴
LONGBLOB, LONGTEXT	L + 4 bytes, where L < 2³²
ENUM('value1','value2',...)	1 or 2 bytes, depending on the number of enumeration values (65,535 values maximum)
SET('value1','value2',...)	1, 2, 3, 4, or 8 bytes, depending on the number of set members (64 members maximum)
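
The TEXT rows of the table above can be read as a lookup: each type has a length prefix of 1 to 4 bytes and holds fewer than 2^(8 × prefix) bytes. The sketch below picks the smallest type for a value; `smallest_text_type` is a hypothetical helper, and UTF-8 encoding is assumed for measuring L.

```python
from typing import Tuple

# (type name, length-prefix bytes), taken from the storage table above.
TEXT_TYPES = [("TINYTEXT", 1), ("TEXT", 2), ("MEDIUMTEXT", 3), ("LONGTEXT", 4)]


def smallest_text_type(value: str) -> Tuple[str, int]:
    """Return the smallest TEXT type that holds `value` and its storage in bytes."""
    length = len(value.encode("utf-8"))  # L, measured in bytes (UTF-8 assumed)
    for name, prefix in TEXT_TYPES:
        if length < 2 ** (8 * prefix):   # L < 2^8, 2^16, 2^24, 2^32
            return name, length + prefix
    raise ValueError("value is too large for any TEXT type")


for size in (255, 256, 65535, 65536):
    name, stored = smallest_text_type("a" * size)
    print(f"L = {size}: {name}, {stored} bytes stored")
```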

posted @ 2018-05-18 22:52 Dar_Alpha
