VARCHAR and CHAR differences in mysql
MySQL supports quite a few string data types, with many variations on each. These
data types changed greatly in versions 4.1 and 5.0, which makes them even more
complicated. Since MySQL 4.1, each string column can have its own character set
and set of sorting rules for that character set, or collation (see Chapter 5 for more on
these topics). This can impact performance greatly.
VARCHAR and CHAR types
The two major string types are VARCHAR and CHAR, which store character values.
Unfortunately, it’s hard to explain exactly how these values are stored on disk and in
memory, because the implementations are storage engine-dependent (for example,
Falcon uses its own storage formats for almost every data type). We assume you are
using InnoDB and/or MyISAM. If not, you should read the documentation for your
storage engine.
Let’s take a look at how VARCHAR and CHAR values are typically stored on disk. Be
aware that a storage engine may store a CHAR or VARCHAR value differently in memory
from how it stores that value on disk, and that the server may translate the value into
yet another storage format when it retrieves it from the storage engine. Here’s a general
comparison of the two types:
VARCHAR
VARCHAR stores variable-length character strings and is the most common string
data type. It can require less storage space than fixed-length types, because it
uses only as much space as it needs (i.e., less space is used to store shorter values).
The exception is a MyISAM table created with ROW_FORMAT=FIXED, which
uses a fixed amount of space on disk for each row and can thus waste space.
VARCHAR uses 1 or 2 extra bytes to record the value’s length: 1 byte if the column’s
maximum length is 255 bytes or less, and 2 bytes if it’s more. Assuming
the latin1 character set, a VARCHAR(10) will use up to 11 bytes of storage space. A
VARCHAR(1000) can use up to 1002 bytes, because it needs 2 bytes to store length
information.
VARCHAR helps performance because it saves space. However, because the rows
are variable-length, they can grow when you update them, which can cause extra
work. If a row grows and no longer fits in its original location, the behavior is
storage engine-dependent. For example, MyISAM may fragment the row, and
InnoDB may need to split the page to fit the row into it. Other storage engines
may never update data in place at all.
It’s usually worth using VARCHAR when the maximum column length is much
larger than the average length; when updates to the field are rare, so fragmentation
is not a problem; and when you’re using a complex character set such as
UTF-8, where each character uses a variable number of bytes of storage.
In version 5.0 and newer, MySQL preserves trailing spaces when you store and
retrieve values. In versions 4.1 and older, MySQL strips trailing spaces.
CHAR
CHAR is fixed-length: MySQL always allocates enough space for the specified
number of characters. When storing a CHAR value, MySQL removes any trailing
spaces. (This was also true of VARCHAR in MySQL 4.1 and older versions—CHAR
and VARCHAR were logically identical and differed only in storage format.) Values
are padded with spaces as needed for comparisons.
CHAR is useful if you want to store very short strings, or if all the values are nearly
the same length. For example, CHAR is a good choice for MD5 values for user passwords,
which are always the same length. CHAR is also better than VARCHAR for
data that’s changed frequently, because a fixed-length row is not prone to fragmentation.
For very short columns, CHAR is also more efficient than VARCHAR; a
CHAR(1) designed to hold only Y and N values will use only one byte in a singlebyte
character set,* but a VARCHAR(1) would use two bytes because of the length
byte.
How data is stored is up to the storage engines, and not all storage engines handle
fixed-length and variable-length data the same way. The Memory storage engine uses
fixed-size rows, so it has to allocate the maximum possible space for each value even
when it’s a variable-length field. On the other hand, Falcon uses variable-length columns
even for fixed-length CHAR fields. However, the padding and trimming behavior
is consistent across storage engines, because the MySQL server itself handles that.
The sibling types for CHAR and VARCHAR are BINARY and VARBINARY, which store binary
strings. Binary strings are very similar to conventional strings, but they store bytes
instead of characters. Padding is also different: MySQL pads BINARY values with \0
(the zero byte) instead of spaces and doesn’t strip the pad value on retrieval.*
These types are useful when you need to store binary data and want MySQL to compare
the values as bytes instead of characters. The advantage of byte-wise comparisons
is more than just a matter of case insensitivity. MySQL literally compares BINARY
strings one byte at a time, according to the numeric value of each byte. As a result,
binary comparisons can be much simpler than character comparisons, so they are
faster.
Generosity Can Be Unwise
Storing the value 'hello' requires the same amount of space in a VARCHAR(5) and a
VARCHAR(200) column. Is there any advantage to using the shorter column?
As it turns out, there is a big advantage. The larger column can use much more memory,
because MySQL often allocates fixed-size chunks of memory to hold values internally.
This is especially bad for sorting or operations that use in-memory temporary
tables. The same thing happens with filesorts that use on-disk temporary tables.
The best strategy is to allocate only as much space as you really need.
最后一句话非常重要!!!可以做一个测试!