Python 内置编码器
基于 python 3.4.4. 原文在帮助文档
- Python »
- 3.4.4 Documentation »
- The Python Standard Library »
- 7. Binary Data Services »
-
7.2. codecs — Codec registry and base classes
Python自带许多内置的编解码器,要么实现为C函数,要么使用字典作为映射表。下表按名称列出了编解码器,以及一些常见别名,以及可能使用编码的语言。别名列表和语言列表都不是详尽无遗的。请注意,只有大小写不同或使用连字符而不是下划线的拼写备选方案也是有效的别名;因此,例如“utf-8”是“utf8”编解码器的有效别名。
CPython实现细节:一些常见的编码可以绕过编解码器查找机制以提高性能。CPython仅对有限的别名集识别这些优化机会:utf-8、utf8、latin-1、latin1、iso-8859-1、mbcs(仅限Windows)、ascii、utf-16和utf-32。对这些编码使用其他拼写可能会导致执行速度变慢。
许多字符集支持相同的语言。它们在单个字符(例如是否支持EURO SIGN)以及字符分配到代码位置方面有所不同。特别是对于欧洲语言,通常存在以下变体:
- an ISO 8859 codeset
- a Microsoft Windows code page, which is typically derived from a 8859 codeset, but replaces control characters with additional graphic characters
- an IBM EBCDIC code page
- an IBM PC code page, which is ASCII compatible
Codec | Aliases | Languages |
---|---|---|
ascii | 646, us-ascii | English |
big5 | big5-tw, csbig5 | Traditional Chinese |
big5hkscs | big5-hkscs, hkscs | Traditional Chinese |
cp037 | IBM037, IBM039 | English |
cp273 | 273, IBM273, csIBM273 |
German New in version 3.4. |
cp424 | EBCDIC-CP-HE, IBM424 | Hebrew |
cp437 | 437, IBM437 | English |
cp500 | EBCDIC-CP-BE, EBCDIC-CP-CH, IBM500 | Western Europe |
cp720 | Arabic | |
cp737 | Greek | |
cp775 | IBM775 | Baltic languages |
cp850 | 850, IBM850 | Western Europe |
cp852 | 852, IBM852 | Central and Eastern Europe |
cp855 | 855, IBM855 | Bulgarian, Byelorussian, Macedonian, Russian, Serbian |
cp856 | Hebrew | |
cp857 | 857, IBM857 | Turkish |
cp858 | 858, IBM858 | Western Europe |
cp860 | 860, IBM860 | Portuguese |
cp861 | 861, CP-IS, IBM861 | Icelandic |
cp862 | 862, IBM862 | Hebrew |
cp863 | 863, IBM863 | Canadian |
cp864 | IBM864 | Arabic |
cp865 | 865, IBM865 | Danish, Norwegian |
cp866 | 866, IBM866 | Russian |
cp869 | 869, CP-GR, IBM869 | Greek |
cp874 | Thai | |
cp875 | Greek | |
cp932 | 932, ms932, mskanji, ms-kanji | Japanese |
cp949 | 949, ms949, uhc | Korean |
cp950 | 950, ms950 | Traditional Chinese |
cp1006 | Urdu | |
cp1026 | ibm1026 | Turkish |
cp1125 | 1125, ibm1125, cp866u, ruscii |
Ukrainian New in version 3.4. |
cp1140 | ibm1140 | Western Europe |
cp1250 | windows-1250 | Central and Eastern Europe |
cp1251 | windows-1251 | Bulgarian, Byelorussian, Macedonian, Russian, Serbian |
cp1252 | windows-1252 | Western Europe |
cp1253 | windows-1253 | Greek |
cp1254 | windows-1254 | Turkish |
cp1255 | windows-1255 | Hebrew |
cp1256 | windows-1256 | Arabic |
cp1257 | windows-1257 | Baltic languages |
cp1258 | windows-1258 | Vietnamese |
cp65001 |
Windows only: Windows UTF-8 (CP_UTF8) New in version 3.3. |
|
euc_jp | eucjp, ujis, u-jis | Japanese |
euc_jis_2004 | jisx0213, eucjis2004 | Japanese |
euc_jisx0213 | eucjisx0213 | Japanese |
euc_kr | euckr, korean, ksc5601, ks_c-5601, ks_c-5601-1987, ksx1001, ks_x-1001 | Korean |
gb2312 | chinese, csiso58gb231280, euc- cn, euccn, eucgb2312-cn, gb2312-1980, gb2312-80, iso- ir-58 | Simplified Chinese |
gbk | 936, cp936, ms936 | Unified Chinese |
gb18030 | gb18030-2000 | Unified Chinese |
hz | hzgb, hz-gb, hz-gb-2312 | Simplified Chinese |
iso2022_jp | csiso2022jp, iso2022jp, iso-2022-jp | Japanese |
iso2022_jp_1 | iso2022jp-1, iso-2022-jp-1 | Japanese |
iso2022_jp_2 | iso2022jp-2, iso-2022-jp-2 | Japanese, Korean, Simplified Chinese, Western Europe, Greek |
iso2022_jp_2004 | iso2022jp-2004, iso-2022-jp-2004 | Japanese |
iso2022_jp_3 | iso2022jp-3, iso-2022-jp-3 | Japanese |
iso2022_jp_ext | iso2022jp-ext, iso-2022-jp-ext | Japanese |
iso2022_kr | csiso2022kr, iso2022kr, iso-2022-kr | Korean |
latin_1 | iso-8859-1, iso8859-1, 8859, cp819, latin, latin1, L1 | West Europe |
iso8859_2 | iso-8859-2, latin2, L2 | Central and Eastern Europe |
iso8859_3 | iso-8859-3, latin3, L3 | Esperanto, Maltese |
iso8859_4 | iso-8859-4, latin4, L4 | Baltic languages |
iso8859_5 | iso-8859-5, cyrillic | Bulgarian, Byelorussian, Macedonian, Russian, Serbian |
iso8859_6 | iso-8859-6, arabic | Arabic |
iso8859_7 | iso-8859-7, greek, greek8 | Greek |
iso8859_8 | iso-8859-8, hebrew | Hebrew |
iso8859_9 | iso-8859-9, latin5, L5 | Turkish |
iso8859_10 | iso-8859-10, latin6, L6 | Nordic languages |
iso8859_11 | iso-8859-11, thai | Thai languages |
iso8859_13 | iso-8859-13, latin7, L7 | Baltic languages |
iso8859_14 | iso-8859-14, latin8, L8 | Celtic languages |
iso8859_15 | iso-8859-15, latin9, L9 | Western Europe |
iso8859_16 | iso-8859-16, latin10, L10 | South-Eastern Europe |
johab | cp1361, ms1361 | Korean |
koi8_r | Russian | |
koi8_u | Ukrainian | |
mac_cyrillic | maccyrillic | Bulgarian, Byelorussian, Macedonian, Russian, Serbian |
mac_greek | macgreek | Greek |
mac_iceland | maciceland | Icelandic |
mac_latin2 | maclatin2, maccentraleurope | Central and Eastern Europe |
mac_roman | macroman, macintosh | Western Europe |
mac_turkish | macturkish | Turkish |
ptcp154 | csptcp154, pt154, cp154, cyrillic-asian | Kazakh |
shift_jis | csshiftjis, shiftjis, sjis, s_jis | Japanese |
shift_jis_2004 | shiftjis2004, sjis_2004, sjis2004 | Japanese |
shift_jisx0213 | shiftjisx0213, sjisx0213, s_jisx0213 | Japanese |
utf_32 | U32, utf32 | all languages |
utf_32_be | UTF-32BE | all languages |
utf_32_le | UTF-32LE | all languages |
utf_16 | U16, utf16 | all languages |
utf_16_be | UTF-16BE | all languages |
utf_16_le | UTF-16LE | all languages |
utf_7 | U7, unicode-1-1-utf-7 | all languages |
utf_8 | U8, UTF, utf8 | all languages |
utf_8_sig | all languages |
分类:
python
【推荐】国内首个AI IDE,深度理解中文开发场景,立即下载体验Trae
【推荐】编程新体验,更懂你的AI,立即体验豆包MarsCode编程助手
【推荐】抖音旗下AI助手豆包,你的智能百科全书,全免费不限次数
【推荐】轻量又高性能的 SSH 工具 IShell:AI 加持,快人一步
· 震惊!C++程序真的从main开始吗?99%的程序员都答错了
· 别再用vector<bool>了!Google高级工程师:这可能是STL最大的设计失误
· 单元测试从入门到精通
· 【硬核科普】Trae如何「偷看」你的代码?零基础破解AI编程运行原理
· 上周热点回顾(3.3-3.9)