CodePage and CharSet
Code Page | Charset | Language |
---|---|---|
708 | ASMO-708 | Arabic (ASMO 708) |
720 | DOS-720 | Arabic (DOS) |
28596 | iso-8859-6 | Arabic (ISO) |
1256 | windows-1256 | Arabic (Windows) |
1257 | windows-1257 | Baltic (Windows) |
852 | ibm852 | Central European (DOS) |
28592 | iso-8859-2 | Central European (ISO) |
1250 | windows-1250 | Central European (Windows) |
936 | gb2312 | Chinese Simplified (GB2312) |
950 | big5 | Chinese Traditional (Big5) |
862 | DOS-862 | Hebrew (DOS) |
866 | cp866 | Cyrillic (DOS) |
874 | windows-874 | Thai (Windows) |
932 | shift_jis | Japanese (Shift-JIS) |
949 | ks_c_5601-1987 | Korean |
1251 | windows-1251 | Cyrillic (Windows) |
1252 | iso-8859-1 | Western European |
1253 | windows-1253 | Greek (Windows) |
1254 | iso-8859-9 | Turkish (Windows) |
1255 | windows-1255 | Hebrew (Windows) |
1258 | windows-1258 | Vietnamese (Windows) |
20866 | koi8-r | Cyrillic (KOI8-R) |
21866 | koi8-ru | Cyrillic (KOI8-U) |
28595 | iso-8859-5 | Cyrillic (ISO) |
28597 | iso-8859-7 | Greek (ISO) |
28598 | iso-8859-8 | Hebrew (ISO-Visual) |
38598 | iso-8859-8-i | Hebrew (ISO-Logical) |
50932 | _autodetect | Japanese (Auto-Select) |
51932 | euc-jp | Japanese (EUC) |
52936 | hz-gb-2312 | Chinese Simplified (HZ) |
65001 | utf-8 | Unicode (UTF-8) |
Code page (CodePage) | Name (CharSet) | Display name (Chinese) | Display name (English) |
---|---|---|---|
Info.CodePage | Info.Name(CharSet) | Info.DisplayName(cn) | Info.DisplayName(en) |
37 | IBM037 | IBM EBCDIC(美国 - 加拿大) | IBM EBCDIC (US-Canada) |
437 | IBM437 | OEM 美国 | OEM United States |
500 | IBM500 | IBM EBCDIC(国际) | IBM EBCDIC (International) |
708 | ASMO-708 | 阿拉伯字符 (ASMO 708) | Arabic (ASMO 708) |
720 | DOS-720 | 阿拉伯字符 (DOS) | Arabic (DOS) |
737 | ibm737 | 希腊字符 (DOS) | Greek (DOS) |
775 | ibm775 | 波罗的海字符 (DOS) | Baltic (DOS) |
850 | ibm850 | 西欧字符 (DOS) | Western European (DOS) |
852 | ibm852 | 中欧字符 (DOS) | Central European (DOS) |
855 | IBM855 | OEM 西里尔语 | OEM Cyrillic |
857 | ibm857 | 土耳其字符 (DOS) | Turkish (DOS) |
858 | IBM00858 | OEM 多语言拉丁语 I | OEM Multilingual Latin I |
860 | IBM860 | 葡萄牙语 (DOS) | Portuguese (DOS) |
861 | ibm861 | 冰岛语 (DOS) | Icelandic (DOS) |
862 | DOS-862 | 希伯来字符 (DOS) | Hebrew (DOS) |
863 | IBM863 | 加拿大法语 (DOS) | French Canadian (DOS) |
864 | IBM864 | 阿拉伯字符 (864) | Arabic (864) |
865 | IBM865 | 北欧字符 (DOS) | Nordic (DOS) |
866 | cp866 | 西里尔字符 (DOS) | Cyrillic (DOS) |
869 | ibm869 | 现代希腊字符 (DOS) | Greek, Modern (DOS) |
870 | IBM870 | IBM EBCDIC(多语言拉丁语 2) | IBM EBCDIC (Multilingual Latin-2) |
874 | windows-874 | 泰语 (Windows) | Thai (Windows) |
875 | cp875 | IBM EBCDIC(现代希腊语) | IBM EBCDIC (Greek Modern) |
932 | shift_jis | 日语 (Shift-JIS) | Japanese (Shift-JIS) |
936 | gb2312 | 简体中文 (GB2312) | Chinese Simplified (GB2312) |
949 | ks_c_5601-1987 | 朝鲜语 | Korean |
950 | big5 | 繁体中文 (Big5) | Chinese Traditional (Big5) |
1026 | IBM1026 | IBM EBCDIC(土耳其拉丁语 5) | IBM EBCDIC (Turkish Latin-5) |
1047 | IBM01047 | IBM 拉丁语 1 | IBM Latin-1 |
1140 | IBM01140 | IBM EBCDIC(美国 - 加拿大 - 欧洲) | IBM EBCDIC (US-Canada-Euro) |
1141 | IBM01141 | IBM EBCDIC(德国 - 欧洲) | IBM EBCDIC (Germany-Euro) |
1142 | IBM01142 | IBM EBCDIC(丹麦 - 挪威 - 欧洲) | IBM EBCDIC (Denmark-Norway-Euro) |
1143 | IBM01143 | IBM EBCDIC(芬兰 - 瑞典 - 欧洲) | IBM EBCDIC (Finland-Sweden-Euro) |
1144 | IBM01144 | IBM EBCDIC(意大利 - 欧洲) | IBM EBCDIC (Italy-Euro) |
1145 | IBM01145 | IBM EBCDIC(西班牙 - 欧洲) | IBM EBCDIC (Spain-Euro) |
1146 | IBM01146 | IBM EBCDIC(英国 - 欧洲) | IBM EBCDIC (UK-Euro) |
1147 | IBM01147 | IBM EBCDIC(法国 - 欧洲) | IBM EBCDIC (France-Euro) |
1148 | IBM01148 | IBM EBCDIC(国际 - 欧洲) | IBM EBCDIC (International-Euro) |
1149 | IBM01149 | IBM EBCDIC(冰岛语 - 欧洲) | IBM EBCDIC (Icelandic-Euro) |
1200 | utf-16 | Unicode | Unicode |
1201 | UnicodeFFFE | Unicode (Big-Endian) | Unicode (Big-Endian) |
1250 | windows-1250 | 中欧字符 (Windows) | Central European (Windows) |
1251 | windows-1251 | 西里尔字符 (Windows) | Cyrillic (Windows) |
1252 | Windows-1252 | 西欧字符 (Windows) | Western European (Windows) |
1253 | windows-1253 | 希腊字符 (Windows) | Greek (Windows) |
1254 | windows-1254 | 土耳其字符 (Windows) | Turkish (Windows) |
1255 | windows-1255 | 希伯来字符 (Windows) | Hebrew (Windows) |
1256 | windows-1256 | 阿拉伯字符 (Windows) | Arabic (Windows) |
1257 | windows-1257 | 波罗的海字符 (Windows) | Baltic (Windows) |
1258 | windows-1258 | 越南字符 (Windows) | Vietnamese (Windows) |
1361 | Johab | 朝鲜语 (Johab) | Korean (Johab) |
10000 | macintosh | 西欧字符 (Mac) | Western European (Mac) |
10001 | x-mac-japanese | 日语 (Mac) | Japanese (Mac) |
10002 | x-mac-chinesetrad | 繁体中文 (Mac) | Chinese Traditional (Mac) |
10003 | x-mac-korean | 朝鲜语 (Mac) | Korean (Mac) |
10004 | x-mac-arabic | 阿拉伯字符 (Mac) | Arabic (Mac) |
10005 | x-mac-hebrew | 希伯来字符 (Mac) | Hebrew (Mac) |
10006 | x-mac-greek | 希腊字符 (Mac) | Greek (Mac) |
10007 | x-mac-cyrillic | 西里尔字符 (Mac) | Cyrillic (Mac) |
10008 | x-mac-chinesesimp | 简体中文 (Mac) | Chinese Simplified (Mac) |
10010 | x-mac-romanian | 罗马尼亚语 (Mac) | Romanian (Mac) |
10017 | x-mac-ukrainian | 乌克兰语 (Mac) | Ukrainian (Mac) |
10021 | x-mac-thai | 泰语 (Mac) | Thai (Mac) |
10029 | x-mac-ce | 中欧字符 (Mac) | Central European (Mac) |
10079 | x-mac-icelandic | 冰岛语 (Mac) | Icelandic (Mac) |
10081 | x-mac-turkish | 土耳其字符 (Mac) | Turkish (Mac) |
10082 | x-mac-croatian | 克罗地亚语 (Mac) | Croatian (Mac) |
20000 | x-Chinese-CNS | 繁体中文 (CNS) | Chinese Traditional (CNS) |
20001 | x-cp20001 | TCA 台湾 | TCA Taiwan |
20002 | x-Chinese-Eten | 繁体中文 (Eten) | Chinese Traditional (Eten) |
20003 | x-cp20003 | IBM5550 台湾 | IBM5550 Taiwan |
20004 | x-cp20004 | TeleText 台湾 | TeleText Taiwan |
20005 | x-cp20005 | Wang 台湾 | Wang Taiwan |
20105 | x-IA5 | 西欧字符 (IA5) | Western European (IA5) |
20106 | x-IA5-German | 德语 (IA5) | German (IA5) |
20107 | x-IA5-Swedish | 瑞典语 (IA5) | Swedish (IA5) |
20108 | x-IA5-Norwegian | 挪威语 (IA5) | Norwegian (IA5) |
20127 | us-ascii | US-ASCII | US-ASCII |
20261 | x-cp20261 | T.61 | T.61 |
20269 | x-cp20269 | ISO-6937 | ISO-6937 |
20273 | IBM273 | IBM EBCDIC(德国) | IBM EBCDIC (Germany) |
20277 | IBM277 | IBM EBCDIC(丹麦 - 挪威) | IBM EBCDIC (Denmark-Norway) |
20278 | IBM278 | IBM EBCDIC(芬兰 - 瑞典) | IBM EBCDIC (Finland-Sweden) |
20280 | IBM280 | IBM EBCDIC(意大利) | IBM EBCDIC (Italy) |
20284 | IBM284 | IBM EBCDIC(西班牙) | IBM EBCDIC (Spain) |
20285 | IBM285 | IBM EBCDIC(英国) | IBM EBCDIC (UK) |
20290 | IBM290 | IBM EBCDIC(日语片假名) | IBM EBCDIC (Japanese katakana) |
20297 | IBM297 | IBM EBCDIC(法国) | IBM EBCDIC (France) |
20420 | IBM420 | IBM EBCDIC(阿拉伯语) | IBM EBCDIC (Arabic) |
20423 | IBM423 | IBM EBCDIC(希腊语) | IBM EBCDIC (Greek) |
20424 | IBM424 | IBM EBCDIC(希伯来语) | IBM EBCDIC (Hebrew) |
20833 | x-EBCDIC-KoreanExtended | IBM EBCDIC(朝鲜语扩展) | IBM EBCDIC (Korean Extended) |
20838 | IBM-Thai | IBM EBCDIC(泰语) | IBM EBCDIC (Thai) |
20866 | koi8-r | 西里尔字符 (KOI8-R) | Cyrillic (KOI8-R) |
20871 | IBM871 | IBM EBCDIC(冰岛语) | IBM EBCDIC (Icelandic) |
20880 | IBM880 | IBM EBCDIC(西里尔俄语) | IBM EBCDIC (Cyrillic Russian) |
20905 | IBM905 | IBM EBCDIC(土耳其语) | IBM EBCDIC (Turkish) |
20924 | IBM00924 | IBM 拉丁语 1 | IBM Latin-1 |
20932 | EUC-JP | 日语(JIS 0208-1990 和 0212-1990) | Japanese (JIS 0208-1990 and 0212-1990) |
20936 | x-cp20936 | 简体中文 (GB2312-80) | Chinese Simplified (GB2312-80) |
20949 | x-cp20949 | 朝鲜语 Wansung | Korean Wansung |
21025 | cp1025 | IBM EBCDIC(西里尔塞尔维亚 - 保加利亚语) | IBM EBCDIC (Cyrillic Serbian-Bulgarian) |
21866 | koi8-u | 西里尔字符 (KOI8-U) | Cyrillic (KOI8-U) |
28591 | iso-8859-1 | 西欧字符 (ISO) | Western European (ISO) |
28592 | iso-8859-2 | 中欧字符 (ISO) | Central European (ISO) |
28593 | iso-8859-3 | 拉丁语 3 (ISO) | Latin 3 (ISO) |
28594 | iso-8859-4 | 波罗的海字符 (ISO) | Baltic (ISO) |
28595 | iso-8859-5 | 西里尔字符 (ISO) | Cyrillic (ISO) |
28596 | iso-8859-6 | 阿拉伯字符 (ISO) | Arabic (ISO) |
28597 | iso-8859-7 | 希腊字符 (ISO) | Greek (ISO) |
28598 | iso-8859-8 | 希伯来字符 (ISO-Visual) | Hebrew (ISO-Visual) |
28599 | iso-8859-9 | 土耳其字符 (ISO) | Turkish (ISO) |
28603 | iso-8859-13 | 爱沙尼亚语 (ISO) | Estonian (ISO) |
28605 | iso-8859-15 | 拉丁语 9 (ISO) | Latin 9 (ISO) |
29001 | x-Europa | 欧罗巴 | Europa |
38598 | iso-8859-8-i | 希伯来字符 (ISO-Logical) | Hebrew (ISO-Logical) |
50220 | iso-2022-jp | 日语 (JIS) | Japanese (JIS) |
50221 | csISO2022JP | 日语(JIS- 允许 1 字节假名) | Japanese (JIS-Allow 1 byte Kana) |
50222 | iso-2022-jp | 日语(JIS- 允许 1 字节假名 - SO/SI) | Japanese (JIS-Allow 1 byte Kana - SO/SI) |
50225 | iso-2022-kr | 朝鲜语 (ISO) | Korean (ISO) |
50227 | x-cp50227 | 简体中文 (ISO-2022) | Chinese Simplified (ISO-2022) |
51932 | euc-jp | 日语 (EUC) | Japanese (EUC) |
51936 | EUC-CN | 简体中文 (EUC) | Chinese Simplified (EUC) |
51949 | euc-kr | 朝鲜语 (EUC) | Korean (EUC) |
52936 | hz-gb-2312 | 简体中文 (HZ) | Chinese Simplified (HZ) |
54936 | GB18030 | 简体中文 (GB18030) | Chinese Simplified (GB18030) |
57002 | x-iscii-de | ISCII 梵文 | ISCII Devanagari |
57003 | x-iscii-be | ISCII 孟加拉语 | ISCII Bengali |
57004 | x-iscii-ta | ISCII 泰米尔语 | ISCII Tamil |
57005 | x-iscii-te | ISCII 泰卢固语 | ISCII Telugu |
57006 | x-iscii-as | ISCII 阿萨姆语 | ISCII Assamese |
57007 | x-iscii-or | ISCII 奥里雅语 | ISCII Oriya |
57008 | x-iscii-ka | ISCII 卡纳达语 | ISCII Kannada |
57009 | x-iscii-ma | ISCII 马拉雅拉姆语 | ISCII Malayalam |
57010 | x-iscii-gu | ISCII 古吉拉特语 | ISCII Gujarati |
57011 | x-iscii-pa | ISCII 旁遮普语 | ISCII Punjabi |
65000 | utf-7 | Unicode (UTF-7) | Unicode (UTF-7) |
65001 | utf-8 | Unicode (UTF-8) | Unicode (UTF-8) |
12000 | utf-32 | Unicode (UTF-32) | Unicode (UTF-32) |
12001 | utf-32BE | Unicode (UTF-32 Big-Endian) | Unicode (UTF-32 Big-Endian) |
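The code page numbers in this table can often be mapped onto codec names in other runtimes. The following is a minimal sketch, assuming Python's standard `codecs` module and its `cpNNN` alias convention. The helper name `codec_for_code_page` is hypothetical, and many of the code pages listed above have no Python codec at all, in which case the helper returns `None`.

```python
import codecs


def codec_for_code_page(code_page: int):
    """Return Python's canonical codec name for a code page number.

    Relies on the "cpNNN" alias convention of the standard codecs module.
    Returns None when no codec is registered under that alias.
    """
    try:
        return codecs.lookup(f"cp{code_page}").name
    except LookupError:
        return None


if __name__ == "__main__":
    # A few code pages from the table; the printed name is whatever
    # canonical name the local Python installation reports.
    for cp in (437, 936, 1252):
        print(cp, codec_for_code_page(cp))
```

Note that the canonical name need not repeat the number: Python resolves `cp936` to its `gbk` codec, for example.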
The following Windows code pages exist:
- 874 — Thai
- 932 — Japanese
- 936 — Chinese (simplified) (PRC, Singapore)
- 949 — Korean
- 950 — Chinese (traditional) (Taiwan, Hong Kong)
- 1200 — Unicode (BMP of ISO 10646, UTF-16LE)
- 1201 — Unicode (BMP of ISO 10646, UTF-16BE)
- 1250 — Latin (Central European languages)
- 1251 — Cyrillic
- 1252 — Latin (Western European languages, replacing Code page 850)
- 1253 — Greek
- 1254 — Turkish
- 1255 — Hebrew
- 1256 — Arabic
- 1257 — Latin (Baltic languages)
- 1258 — Vietnamese
- 65000 — Unicode (BMP of ISO 10646, UTF-7)
- 65001 — Unicode (BMP of ISO 10646, UTF-8)
SBCS (Single Byte Character Set) Codepages
DBCS (Double Byte Character Set) Codepages
Table 2-3 ISO 8859 Character Sets
Standard | Languages Supported |
---|---|
ISO 8859-1 | Western European |
ISO 8859-2 | Eastern European |
ISO 8859-3 | Southeastern European |
ISO 8859-4 | Northern European |
ISO 8859-5 | Eastern European |
ISO 8859-6 | Arabic |
ISO 8859-7 | Greek |
ISO 8859-8 | Hebrew |
ISO 8859-9 | Western European |
ISO 8859-10 | Northern European |
ISO 8859-13 | Baltic Rim |
ISO 8859-14 | Celtic |
ISO 8859-15 | Western European |
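Several ISO 8859 parts cover overlapping regions but assign some byte values differently. As a small illustration (a sketch using Python's built-in codecs; the helper name `decode_in_parts` is hypothetical), the byte `0xA4` can be decoded under ISO 8859-1 and ISO 8859-15:

```python
def decode_in_parts(data: bytes, parts=("iso-8859-1", "iso-8859-15")):
    """Decode the same bytes under several ISO 8859 parts.

    Returns a dict mapping each codec name to the decoded text.
    """
    return {name: data.decode(name) for name in parts}


if __name__ == "__main__":
    # Print the code point each part assigns to byte 0xA4.
    for name, text in decode_in_parts(b"\xa4").items():
        print(f"{name}: U+{ord(text):04X} {text}")
```

ISO 8859-1 maps this byte to the generic currency sign (U+00A4), while ISO 8859-15 maps it to the euro sign (U+20AC).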
Code Page Identifiers
The following table defines the available code page identifiers.
Note: ANSI code pages can differ between computers, or can be changed on a single computer, which leads to data corruption. For the most consistent results, applications should use Unicode, such as UTF-8 or UTF-16, instead of a specific code page.
Identifier | .NET Name | Additional information |
---|---|---|
037 | IBM037 | IBM EBCDIC US-Canada |
437 | IBM437 | OEM United States |
500 | IBM500 | IBM EBCDIC International |
708 | ASMO-708 | Arabic (ASMO 708) |
709 | | Arabic (ASMO-449+, BCON V4) |
710 | | Arabic - Transparent Arabic |
720 | DOS-720 | Arabic (Transparent ASMO); Arabic (DOS) |
737 | ibm737 | OEM Greek (formerly 437G); Greek (DOS) |
775 | ibm775 | OEM Baltic; Baltic (DOS) |
850 | ibm850 | OEM Multilingual Latin 1; Western European (DOS) |
852 | ibm852 | OEM Latin 2; Central European (DOS) |
855 | IBM855 | OEM Cyrillic (primarily Russian) |
857 | ibm857 | OEM Turkish; Turkish (DOS) |
858 | IBM00858 | OEM Multilingual Latin 1 + Euro symbol |
860 | IBM860 | OEM Portuguese; Portuguese (DOS) |
861 | ibm861 | OEM Icelandic; Icelandic (DOS) |
862 | DOS-862 | OEM Hebrew; Hebrew (DOS) |
863 | IBM863 | OEM French Canadian; French Canadian (DOS) |
864 | IBM864 | OEM Arabic; Arabic (864) |
865 | IBM865 | OEM Nordic; Nordic (DOS) |
866 | cp866 | OEM Russian; Cyrillic (DOS) |
869 | ibm869 | OEM Modern Greek; Greek, Modern (DOS) |
870 | IBM870 | IBM EBCDIC Multilingual/ROECE (Latin 2); IBM EBCDIC Multilingual Latin 2 |
874 | windows-874 | ANSI/OEM Thai; Thai (Windows) |
875 | cp875 | IBM EBCDIC Greek Modern |
932 | shift_jis | ANSI/OEM Japanese; Japanese (Shift-JIS) |
936 | gb2312 | ANSI/OEM Simplified Chinese (PRC, Singapore); Chinese Simplified (GB2312) |
949 | ks_c_5601-1987 | ANSI/OEM Korean (Unified Hangul Code) |
950 | big5 | ANSI/OEM Traditional Chinese (Taiwan; Hong Kong SAR, PRC); Chinese Traditional (Big5) |
1026 | IBM1026 | IBM EBCDIC Turkish (Latin 5) |
1047 | IBM01047 | IBM EBCDIC Latin 1/Open System |
1140 | IBM01140 | IBM EBCDIC US-Canada (037 + Euro symbol); IBM EBCDIC (US-Canada-Euro) |
1141 | IBM01141 | IBM EBCDIC Germany (20273 + Euro symbol); IBM EBCDIC (Germany-Euro) |
1142 | IBM01142 | IBM EBCDIC Denmark-Norway (20277 + Euro symbol); IBM EBCDIC (Denmark-Norway-Euro) |
1143 | IBM01143 | IBM EBCDIC Finland-Sweden (20278 + Euro symbol); IBM EBCDIC (Finland-Sweden-Euro) |
1144 | IBM01144 | IBM EBCDIC Italy (20280 + Euro symbol); IBM EBCDIC (Italy-Euro) |
1145 | IBM01145 | IBM EBCDIC Latin America-Spain (20284 + Euro symbol); IBM EBCDIC (Spain-Euro) |
1146 | IBM01146 | IBM EBCDIC United Kingdom (20285 + Euro symbol); IBM EBCDIC (UK-Euro) |
1147 | IBM01147 | IBM EBCDIC France (20297 + Euro symbol); IBM EBCDIC (France-Euro) |
1148 | IBM01148 | IBM EBCDIC International (500 + Euro symbol); IBM EBCDIC (International-Euro) |
1149 | IBM01149 | IBM EBCDIC Icelandic (20871 + Euro symbol); IBM EBCDIC (Icelandic-Euro) |
1200 | utf-16 | Unicode UTF-16, little endian byte order (BMP of ISO 10646); available only to managed applications |
1201 | unicodeFFFE | Unicode UTF-16, big endian byte order; available only to managed applications |
1250 | windows-1250 | ANSI Central European; Central European (Windows) |
1251 | windows-1251 | ANSI Cyrillic; Cyrillic (Windows) |
1252 | windows-1252 | ANSI Latin 1; Western European (Windows) |
1253 | windows-1253 | ANSI Greek; Greek (Windows) |
1254 | windows-1254 | ANSI Turkish; Turkish (Windows) |
1255 | windows-1255 | ANSI Hebrew; Hebrew (Windows) |
1256 | windows-1256 | ANSI Arabic; Arabic (Windows) |
1257 | windows-1257 | ANSI Baltic; Baltic (Windows) |
1258 | windows-1258 | ANSI/OEM Vietnamese; Vietnamese (Windows) |
1361 | Johab | Korean (Johab) |
10000 | macintosh | MAC Roman; Western European (Mac) |
10001 | x-mac-japanese | Japanese (Mac) |
10002 | x-mac-chinesetrad | MAC Traditional Chinese (Big5); Chinese Traditional (Mac) |
10003 | x-mac-korean | Korean (Mac) |
10004 | x-mac-arabic | Arabic (Mac) |
10005 | x-mac-hebrew | Hebrew (Mac) |
10006 | x-mac-greek | Greek (Mac) |
10007 | x-mac-cyrillic | Cyrillic (Mac) |
10008 | x-mac-chinesesimp | MAC Simplified Chinese (GB 2312); Chinese Simplified (Mac) |
10010 | x-mac-romanian | Romanian (Mac) |
10017 | x-mac-ukrainian | Ukrainian (Mac) |
10021 | x-mac-thai | Thai (Mac) |
10029 | x-mac-ce | MAC Latin 2; Central European (Mac) |
10079 | x-mac-icelandic | Icelandic (Mac) |
10081 | x-mac-turkish | Turkish (Mac) |
10082 | x-mac-croatian | Croatian (Mac) |
12000 | utf-32 | Unicode UTF-32, little endian byte order; available only to managed applications |
12001 | utf-32BE | Unicode UTF-32, big endian byte order; available only to managed applications |
20000 | x-Chinese-CNS | CNS Taiwan; Chinese Traditional (CNS) |
20001 | x-cp20001 | TCA Taiwan |
20002 | x-Chinese-Eten | Eten Taiwan; Chinese Traditional (Eten) |
20003 | x-cp20003 | IBM5550 Taiwan |
20004 | x-cp20004 | TeleText Taiwan |
20005 | x-cp20005 | Wang Taiwan |
20105 | x-IA5 | IA5 (IRV International Alphabet No. 5, 7-bit); Western European (IA5) |
20106 | x-IA5-German | IA5 German (7-bit) |
20107 | x-IA5-Swedish | IA5 Swedish (7-bit) |
20108 | x-IA5-Norwegian | IA5 Norwegian (7-bit) |
20127 | us-ascii | US-ASCII (7-bit) |
20261 | x-cp20261 | T.61 |
20269 | x-cp20269 | ISO 6937 Non-Spacing Accent |
20273 | IBM273 | IBM EBCDIC Germany |
20277 | IBM277 | IBM EBCDIC Denmark-Norway |
20278 | IBM278 | IBM EBCDIC Finland-Sweden |
20280 | IBM280 | IBM EBCDIC Italy |
20284 | IBM284 | IBM EBCDIC Latin America-Spain |
20285 | IBM285 | IBM EBCDIC United Kingdom |
20290 | IBM290 | IBM EBCDIC Japanese Katakana Extended |
20297 | IBM297 | IBM EBCDIC France |
20420 | IBM420 | IBM EBCDIC Arabic |
20423 | IBM423 | IBM EBCDIC Greek |
20424 | IBM424 | IBM EBCDIC Hebrew |
20833 | x-EBCDIC-KoreanExtended | IBM EBCDIC Korean Extended |
20838 | IBM-Thai | IBM EBCDIC Thai |
20866 | koi8-r | Russian (KOI8-R); Cyrillic (KOI8-R) |
20871 | IBM871 | IBM EBCDIC Icelandic |
20880 | IBM880 | IBM EBCDIC Cyrillic Russian |
20905 | IBM905 | IBM EBCDIC Turkish |
20924 | IBM00924 | IBM EBCDIC Latin 1/Open System (1047 + Euro symbol) |
20932 | EUC-JP | Japanese (JIS 0208-1990 and 0212-1990) |
20936 | x-cp20936 | Simplified Chinese (GB2312); Chinese Simplified (GB2312-80) |
20949 | x-cp20949 | Korean Wansung |
21025 | cp1025 | IBM EBCDIC Cyrillic Serbian-Bulgarian |
21027 | | (deprecated) |
21866 | koi8-u | Ukrainian (KOI8-U); Cyrillic (KOI8-U) |
28591 | iso-8859-1 | ISO 8859-1 Latin 1; Western European (ISO) |
28592 | iso-8859-2 | ISO 8859-2 Central European; Central European (ISO) |
28593 | iso-8859-3 | ISO 8859-3 Latin 3 |
28594 | iso-8859-4 | ISO 8859-4 Baltic |
28595 | iso-8859-5 | ISO 8859-5 Cyrillic |
28596 | iso-8859-6 | ISO 8859-6 Arabic |
28597 | iso-8859-7 | ISO 8859-7 Greek |
28598 | iso-8859-8 | ISO 8859-8 Hebrew; Hebrew (ISO-Visual) |
28599 | iso-8859-9 | ISO 8859-9 Turkish |
28603 | iso-8859-13 | ISO 8859-13 Estonian |
28605 | iso-8859-15 | ISO 8859-15 Latin 9 |
29001 | x-Europa | Europa 3 |
38598 | iso-8859-8-i | ISO 8859-8 Hebrew; Hebrew (ISO-Logical) |
50220 | iso-2022-jp | ISO 2022 Japanese with no halfwidth Katakana; Japanese (JIS) |
50221 | csISO2022JP | ISO 2022 Japanese with halfwidth Katakana; Japanese (JIS-Allow 1 byte Kana) |
50222 | iso-2022-jp | ISO 2022 Japanese JIS X 0201-1989; Japanese (JIS-Allow 1 byte Kana - SO/SI) |
50225 | iso-2022-kr | ISO 2022 Korean |
50227 | x-cp50227 | ISO 2022 Simplified Chinese; Chinese Simplified (ISO 2022) |
50229 | | ISO 2022 Traditional Chinese |
50930 | | EBCDIC Japanese (Katakana) Extended |
50931 | | EBCDIC US-Canada and Japanese |
50933 | | EBCDIC Korean Extended and Korean |
50935 | | EBCDIC Simplified Chinese Extended and Simplified Chinese |
50936 | | EBCDIC Simplified Chinese |
50937 | | EBCDIC US-Canada and Traditional Chinese |
50939 | | EBCDIC Japanese (Latin) Extended and Japanese |
51932 | euc-jp | EUC Japanese |
51936 | EUC-CN | EUC Simplified Chinese; Chinese Simplified (EUC) |
51949 | euc-kr | EUC Korean |
51950 | | EUC Traditional Chinese |
52936 | hz-gb-2312 | HZ-GB2312 Simplified Chinese; Chinese Simplified (HZ) |
54936 | GB18030 | Windows XP and later: GB18030 Simplified Chinese (4 byte); Chinese Simplified (GB18030) |
57002 | x-iscii-de | ISCII Devanagari |
57003 | x-iscii-be | ISCII Bengali |
57004 | x-iscii-ta | ISCII Tamil |
57005 | x-iscii-te | ISCII Telugu |
57006 | x-iscii-as | ISCII Assamese |
57007 | x-iscii-or | ISCII Oriya |
57008 | x-iscii-ka | ISCII Kannada |
57009 | x-iscii-ma | ISCII Malayalam |
57010 | x-iscii-gu | ISCII Gujarati |
57011 | x-iscii-pa | ISCII Punjabi |
65000 | utf-7 | Unicode (UTF-7) |
65001 | utf-8 | Unicode (UTF-8) |
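The note above the table warns that relying on a particular ANSI or OEM code page can corrupt data. A minimal sketch of why, assuming Python's built-in `cpNNN` codecs (the helper name `decode_with` is hypothetical): the same stored bytes decode to different text depending on which code page the reader assumes.

```python
# The word "café" as stored by a program that used code page 1252.
RAW = "café".encode("cp1252")  # the last byte is 0xE9


def decode_with(data: bytes, code_pages):
    """Decode the same bytes under each of the given code page numbers."""
    return {cp: data.decode(f"cp{cp}") for cp in code_pages}


if __name__ == "__main__":
    # Western European (Windows), Cyrillic (Windows), OEM United States.
    for cp, text in decode_with(RAW, (1252, 1251, 437)).items():
        print(cp, text)

    # With UTF-8 on both sides the text survives without any
    # dependence on the machine's configured code page.
    assert "café".encode("utf-8").decode("utf-8") == "café"
```

Only the reader that assumes 1252 recovers the original word; 1251 yields a Cyrillic letter and 437 a Greek one in place of "é".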
IBM PC (OEM) code pages
These code pages were originally embedded directly in the text-mode hardware of the graphics adapters used with the IBM PC and its clones, including the original MDA and CGA adapters, whose character sets could only be changed by physically replacing a ROM chip that contained the font. The interface of those adapters (emulated by all later adapters such as VGA) was typically limited to single-byte character sets with only 256 characters in each font/encoding (although VGA added partial support for slightly larger character sets). Since the original IBM PC code page (number 437) was not really designed for international use, several partially compatible country- or region-specific variants emerged. Microsoft refers to these as the OEM code pages because they were defined by the OEMs who licensed MS-DOS for distribution with their hardware, not by Microsoft or a standards body. Examples include:
- 437 – Original IBM PC hardware code page
- 667 – Polish (Mazovia)
- 668 – Slavic
- 720 – Arabic/Middle East
- 737 – Greek
- 770 – Lithuanian
- 773 – Lithuanian
- 775 – Estonian, Lithuanian and Latvian
- 790 – Polish (Mazovia)
- 819 – ISO 8859-1
- 850 – "Multilingual (Latin-1)" (Western European languages)
- 851 – Greek
- 852 – "Slavic (Latin-2)" (Central and Eastern European languages)
- 853 – Turkish (Latin-2)
- 854 – Spanish
- 855 – Cyrillic
- 857 – Turkish
- 858 – "Multilingual" with euro symbol
- 860 – Portuguese
- 861 – Icelandic
- 862 – Hebrew
- 863 – French (Quebec French)
- 864 – Arabic/Middle East
- 865 – Danish/Norwegian; differs from 437 mainly in having the letters Ø and ø in place of ¥ and ¢
- 866 – Cyrillic
- 867 – Czech (Kamenický)
- 868 – Arabic/Middle East
- 869 – Greek
- 872 – Cyrillic
- 874 – Thai[7]
- 895 – Czech (Kamenický) (conflicting ID)
- 912
- 915
- 932 – Japanese (DBCS)
- 991 – Polish (Mazovia)
When dealing with older hardware, protocols and file formats, it is often necessary to support these code pages, but use of newer code pages, in particular Unicode, is encouraged for new designs.
Code page 819 is identical to Latin-1 (ISO/IEC 8859-1) and, with slightly modified commands, permits MS-DOS machines to use that encoding. It was used with IBM AS/400 minicomputers.
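The list above describes code page 865 in terms of how it differs from 437. Such relationships between single-byte code pages can be checked mechanically. The following is a sketch that assumes Python's built-in `cp437` and `cp865` codecs faithfully reproduce the respective tables; the helper name `code_page_diff` is hypothetical.

```python
def code_page_diff(a: str, b: str):
    """Compare two single-byte codecs over all 256 byte values.

    Returns {byte_value: (char_in_a, char_in_b)} for every byte that
    the two codecs decode to different characters. Unassigned bytes
    are decoded as U+FFFD so that the comparison never raises.
    """
    diff = {}
    for value in range(256):
        raw = bytes([value])
        char_a = raw.decode(a, errors="replace")
        char_b = raw.decode(b, errors="replace")
        if char_a != char_b:
            diff[value] = (char_a, char_b)
    return diff


if __name__ == "__main__":
    # Print each differing byte with the character from each code page.
    for value, (c437, c865) in code_page_diff("cp437", "cp865").items():
        print(f"0x{value:02X}: cp437={c437!r} cp865={c865!r}")
```

The same helper works for any pair of single-byte codecs, for example `cp850` against `cp858`.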
Code pages for DBCS character sets
These code pages represent DBCS character encodings for various CJK languages. In Microsoft operating systems, these are used as both the "OEM" and "ANSI" code page for the applicable locale.
- 932 – Supports Japanese
- 936 – GBK; supports Simplified Chinese
- 949 – Supports Korean
- 950 – Supports Traditional Chinese
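In practice these encodings are variable-width: ASCII characters occupy one byte and CJK characters occupy two. A brief sketch, assuming Python's built-in `cp932`, `cp936`, `cp949` and `cp950` codecs (the helper name `byte_lengths` is hypothetical):

```python
def byte_lengths(text: str, codec: str):
    """Return (character, encoded length in bytes) for each character."""
    return [(ch, len(ch.encode(codec))) for ch in text]


if __name__ == "__main__":
    samples = (
        ("cp932", "Aあ"),   # Japanese
        ("cp936", "A中"),   # Simplified Chinese
        ("cp949", "A한"),   # Korean
        ("cp950", "A中"),   # Traditional Chinese
    )
    for codec, text in samples:
        print(codec, byte_lengths(text, codec))
```

Because a trail byte can fall in the ASCII range in some of these encodings, code that scans such text byte by byte has to track lead bytes rather than treat every byte as a character.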
Microsoft code page numbers for various other character encodings
The following code page numbers are specific to Microsoft Windows. IBM may use different numbers for these code pages.
- 1200 – UTF-16LE Unicode little-endian
- 1201 – UTF-16BE Unicode big-endian
- 65000 – UTF-7 Unicode
- 65001 – UTF-8 Unicode
- 10000 – Macintosh Roman encoding (followed by several other Mac character sets)
- 10007 – Macintosh Cyrillic encoding
- 10029 – Macintosh Central European encoding
- 20127 – US-ASCII, the classic US 7-bit character set with no character code above 127
- 28591 – ISO-8859-1
- 28592 – ISO-8859-2
- 28593 – ISO-8859-3
- 28594 – ISO-8859-4
- 28595 – ISO-8859-5
- 28596 and 38596 – ISO-8859-6
- 28597 – ISO-8859-7
- 28598 and 38598 – ISO-8859-8
- 28599 – ISO-8859-9
- 28600 – ISO-8859-10
- 28601 – ISO-8859-11
- (28602 – ISO-8859-12)
- 28603 – ISO-8859-13
- 28604 – ISO-8859-14
- 28605 – ISO-8859-15
- 28606 – ISO-8859-16
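The Unicode entries at the top of this list (1200, 1201, 65001) all encode the same characters and differ only in byte layout. A minimal sketch using Python's standard codec names, which are an assumption here since Python does not use the Microsoft numbers for these encodings (the helper name `hex_bytes` is hypothetical):

```python
def hex_bytes(text: str, codec: str) -> str:
    """Encode text and return the bytes as space-separated hex pairs."""
    return " ".join(f"{b:02x}" for b in text.encode(codec))


if __name__ == "__main__":
    # Code page number -> Python codec name (mapping assumed for illustration).
    unicode_pages = {1200: "utf-16-le", 1201: "utf-16-be", 65001: "utf-8"}
    for number, codec in unicode_pages.items():
        print(number, codec, hex_bytes("A€", codec))
```

For the letter "A", code page 1200 stores the low byte first (`41 00`) and 1201 the high byte first (`00 41`), while UTF-8 uses the single byte `41`.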
Miscellaneous
- (number missing) – ASMO449+ Supports Arabic
- (number missing) – MIK Supports Bulgarian and Russian as well
Windows (ANSI) code pages
Microsoft defined a number of code pages known as the ANSI code pages (the first one, 1252, was based on an apocryphal ANSI draft of what became ISO 8859-1). Code page 1252 is built on ISO 8859-1 but uses the range 0x80-0x9F for extra printable characters rather than the C1 control codes used in ISO-8859-1. Some of the others are based in part on other parts of ISO 8859 but are often rearranged to make them closer to 1252.
- 1250 – Central and East European Latin
- 1251 – Cyrillic
- 1252 – West European Latin
- 1253 – Greek
- 1254 – Turkish
- 1255 – Hebrew
- 1256 – Arabic
- 1257 – Baltic
- 1258 – Vietnamese
- 874 – Thai
Microsoft recommends applications use UTF-8 or UCS-2/UTF-16 instead of these code pages.[8]
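The difference between code page 1252 and ISO 8859-1 in the range 0x80-0x9F, described above, can be inspected directly. The following is a sketch that assumes Python's `cp1252` and `latin-1` codecs; the helper name `c1_range_comparison` is hypothetical. Python's `cp1252` table leaves a few bytes in this range unassigned, which the sketch reports as `None`.

```python
def c1_range_comparison():
    """Decode each byte in 0x80-0x9F under ISO 8859-1 and code page 1252.

    Returns a list of (byte_value, latin1_char, cp1252_char_or_None).
    """
    rows = []
    for value in range(0x80, 0xA0):
        raw = bytes([value])
        latin1 = raw.decode("latin-1")  # always a C1 control character
        try:
            cp1252 = raw.decode("cp1252")
        except UnicodeDecodeError:
            cp1252 = None  # unassigned in Python's cp1252 table
        rows.append((value, latin1, cp1252))
    return rows


if __name__ == "__main__":
    for value, latin1, cp1252 in c1_range_comparison():
        print(f"0x{value:02X}: latin-1=U+{ord(latin1):04X} cp1252={cp1252!r}")
```

Byte 0x80, for instance, is a control code under ISO 8859-1 but the euro sign under 1252, so text labelled as ISO 8859-1 that actually came from a Windows program often needs to be decoded as 1252.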