postgresql 字符集

postgresql数据库支持多种字符集,在配置字符集时要分清楚服务器与客户端的字符集,字符集不一致尽管有时能够发生转换,但带来的问题也很头疼。语言环境的配置也很重要。
服务器字符集<来自文档>:

代码
Name Description Language Server? Bytes/Char Aliases
BIG5 Big Five Traditional Chinese No
1-2 WIN950, Windows950
EUC_CN Extended UNIX Code
-CN Simplified Chinese Yes 1-3
EUC_JP Extended UNIX Code
-JP Japanese Yes 1-3
EUC_JIS_2004 Extended UNIX Code
-JP, JIS X 0213 Japanese Yes 1-3
EUC_KR Extended UNIX Code
-KR Korean Yes 1-3
EUC_TW Extended UNIX Code
-TW Traditional Chinese, Taiwanese Yes 1-3
GB18030 National Standard Chinese No
1-2
GBK Extended National Standard Simplified Chinese No
1-2 WIN936, Windows936
ISO_8859_5 ISO
8859-5, ECMA 113 Latin/Cyrillic Yes 1
ISO_8859_6 ISO
8859-6, ECMA 114 Latin/Arabic Yes 1
ISO_8859_7 ISO
8859-7, ECMA 118 Latin/Greek Yes 1
ISO_8859_8 ISO
8859-8, ECMA 121 Latin/Hebrew Yes 1
JOHAB JOHAB Korean (Hangul) No
1-3
KOI8 KOI8
-R(U) Cyrillic Yes 1 KOI8R
LATIN1 ISO
8859-1, ECMA 94 Western European Yes 1 ISO88591
LATIN2 ISO
8859-2, ECMA 94 Central European Yes 1 ISO88592
LATIN3 ISO
8859-3, ECMA 94 South European Yes 1 ISO88593
LATIN4 ISO
8859-4, ECMA 94 North European Yes 1 ISO88594
LATIN5 ISO
8859-9, ECMA 128 Turkish Yes 1 ISO88599
LATIN6 ISO
8859-10, ECMA 144 Nordic Yes 1 ISO885910
LATIN7 ISO
8859-13 Baltic Yes 1 ISO885913
LATIN8 ISO
8859-14 Celtic Yes 1 ISO885914
LATIN9 ISO
8859-15 LATIN1 with Euro and accents Yes 1 ISO885915
LATIN10 ISO
8859-16, ASRO SR 14111 Romanian Yes 1 ISO885916
MULE_INTERNAL Mule
internal code Multilingual Emacs Yes 1-4
SJIS Shift JIS Japanese No
1-2 Mskanji, ShiftJIS, WIN932, Windows932
SHIFT_JIS_2004 Shift JIS, JIS X
0213 Japanese No 1-2
SQL_ASCII unspecified (see text) any Yes
1
UHC Unified Hangul Code Korean No
1-2 WIN949, Windows949
UTF8 Unicode,
8-bit all Yes 1-4 Unicode
WIN866 Windows CP866 Cyrillic Yes
1 ALT
WIN874 Windows CP874 Thai Yes
1
WIN1250 Windows CP1250 Central European Yes
1
WIN1251 Windows CP1251 Cyrillic Yes
1 WIN
WIN1252 Windows CP1252 Western European Yes
1
WIN1253 Windows CP1253 Greek Yes
1
WIN1254 Windows CP1254 Turkish Yes
1
WIN1255 Windows CP1255 Hebrew Yes
1
WIN1256 Windows CP1256 Arabic Yes
1
WIN1257 Windows CP1257 Baltic Yes
1
WIN1258 Windows CP1258 Vietnamese Yes
1 ABC, TCVN, TCVN5712, VSCII

 

常用的简体中文字符集是UTF8和EUC_CN两种。
可自动转换字符集<来自文档>:

代码
Server Character Set Available Client Character Sets
BIG5 not supported
as a server encoding
EUC_CN EUC_CN, MULE_INTERNAL, UTF8
EUC_JP EUC_JP, MULE_INTERNAL, SJIS, UTF8
EUC_KR EUC_KR, MULE_INTERNAL, UTF8
EUC_TW EUC_TW, BIG5, MULE_INTERNAL, UTF8
GB18030 not supported
as a server encoding
GBK not supported
as a server encoding
ISO_8859_5 ISO_8859_5, KOI8, MULE_INTERNAL, UTF8, WIN866, WIN1251
ISO_8859_6 ISO_8859_6, UTF8
ISO_8859_7 ISO_8859_7, UTF8
ISO_8859_8 ISO_8859_8, UTF8
JOHAB JOHAB, UTF8
KOI8 KOI8, ISO_8859_5, MULE_INTERNAL, UTF8, WIN866, WIN1251
LATIN1 LATIN1, MULE_INTERNAL, UTF8
LATIN2 LATIN2, MULE_INTERNAL, UTF8, WIN1250
LATIN3 LATIN3, MULE_INTERNAL, UTF8
LATIN4 LATIN4, MULE_INTERNAL, UTF8
LATIN5 LATIN5, UTF8
LATIN6 LATIN6, UTF8
LATIN7 LATIN7, UTF8
LATIN8 LATIN8, UTF8
LATIN9 LATIN9, UTF8
LATIN10 LATIN10, UTF8
MULE_INTERNAL MULE_INTERNAL, BIG5, EUC_CN, EUC_JP, EUC_KR, EUC_TW, ISO_8859_5, KOI8, LATIN1 to LATIN4, SJIS, WIN866, WIN1250, WIN1251
SJIS not supported
as a server encoding
SQL_ASCII any (no conversion will be performed)
UHC not supported
as a server encoding
UTF8 all supported encodings
WIN866 WIN866, ISO_8859_5, KOI8, MULE_INTERNAL, UTF8, WIN1251
WIN874 WIN874, UTF8
WIN1250 WIN1250, LATIN2, MULE_INTERNAL, UTF8
WIN1251 WIN1251, ISO_8859_5, KOI8, MULE_INTERNAL, UTF8, WIN866
WIN1252 WIN1252, UTF8
WIN1253 WIN1253, UTF8
WIN1254 WIN1254, UTF8
WIN1255 WIN1255, UTF8
WIN1256 WIN1256, UTF8
WIN1257 WIN1257, UTF8
WIN1258 WIN1258, UTF8

 

以下针对客户端与服务器字符集配置问题作几个小测试。

测试一:服务器、客户端、语言环境一致的情况

代码
[postgre@iss3984 ~]$ echo $LANG
en_US.UTF
-8
[postgre@iss3984
~]$ psql daduxiong
Welcome to psql
8.3.11 (server 8.3.10), the PostgreSQL interactive terminal.

Type: \copyright
for distribution terms
\h
for help with SQL commands
\
? for help with psql commands
\g or terminate with semicolon to execute query
\q to quit

daduxiong
=# \l
List of databases
Name
| Owner | Encoding
-----------+---------+----------
daduxiong
| postgre | UTF8
postgres
| postgre | UTF8
template0
| postgre | UTF8
template1
| postgre | UTF8
(
4 rows)

daduxiong
=# show client_encoding;
client_encoding
-----------------
UTF8
(
1 row)

daduxiong
=# insert into t1 values (1,'中国');
INSERT
0 1
daduxiong
=# select * from t1;
id
| name
----+------------------------
1 | 中国
(
1 row)

 

服务器与客户端字符集相同,在数据录入时不发生字符集转换;因语言环境也相同所以展现不会出现乱码。

测试二:客户端与服务器、语言环境不一致的情况

代码
daduxiong=# \encoding GBK
daduxiong
=# show client_encoding;
client_encoding
-----------------
GBK
(
1 row)

daduxiong
=# insert into t1 values (2,'日本');
INSERT
0 1
daduxiong
=# select * from t1;
id
| name
----+-------------------------
1 | �й�
2 | 日本
(
2 rows)

 

客户端与服务器的字符集不一致,在数据录入时将发生字符集转换;当前展现的第二条记录非乱码形式是因为客户端字符集为GBK,在UTF8下同样出现乱码,在使用时需要语言环境进行配置。

测试三:服务器与客户端、语言环境不一致的情况

代码
daduxiong=# \q
[postgre@iss3984
~]$ LANG=GBK export LANG
[postgre@iss3984
~]$ echo $LANG
GBK
[postgre@iss3984
~]$ psql daduxiong
Welcome to psql
8.3.11 (server 8.3.10), the PostgreSQL interactive terminal.

Type: \copyright
for distribution terms
\h
for help with SQL commands
\
? for help with psql commands
\g or terminate with semicolon to execute query
\q to quit

daduxiong
=# show client_encoding;
client_encoding
-----------------
UTF8
(
1 row)

daduxiong
=# \encoding GBK
daduxiong
=# select * from t1;
id
| name
----+-------------------------
1 | �й�
2 | 日本
(
2 rows)

daduxiong
=# insert into t1 values (3,'美国');
INSERT
0 1
daduxiong
=# select * from t1;
id
| name
----+-------------------------
1 | �й�
2 | 日本
3 | 美国
(
3 rows)
daduxiong
=# \q

 

客户端、语言环境均配置为GBK字符集,在当前环境下展现的为非乱码形式,数据录入时将发生字符集转换。

测试四:服务器与客户端、语言环境恢复一致的情况

代码
[postgre@iss3984 ~]$ LANG=en_US.UTF8 export LANG
[postgre@iss3984
~]$ psql daduxiong
Welcome to psql
8.3.11 (server 8.3.10), the PostgreSQL interactive terminal.

Type: \copyright
for distribution terms
\h
for help with SQL commands
\
? for help with psql commands
\g or terminate with semicolon to execute query
\q to quit

daduxiong
=# show client_encoding;
client_encoding
-----------------
UTF8
(
1 row)

daduxiong
=# select * from t1;
id
| name
----+-------------------------
1 | 中国
2 | 鏃ユ湰
3 | 缇庡浗
(
3 rows)

daduxiong
=#
通过恢复原始的字符集状态,所有环境均为UTF8字符集,此时发现经过字符集转换后的内容为乱码。

 

 

posted on 2010-10-14 17:11  大肚熊  阅读(10742)  评论(1编辑  收藏  举报

导航