A Brief Note on Character Encoding - Zoe233

Tips:

The Chinese edition of Windows uses GBK as its default character encoding.

Linux systems typically default to UTF-8.

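Print Python's default encoding. Note that sys.getdefaultencoding() returns the interpreter's default for converting between str and bytes, not the operating system's encoding: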

# Print Python's default encoding
import sys
print(sys.getdefaultencoding())
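The operating-system encodings in the tips above are closer to what locale.getpreferredencoding(False) reports than to sys.getdefaultencoding(). Here is a minimal sketch comparing the two. The second value depends on your OS and locale settings. For example, it is typically cp936 (GBK) on Chinese Windows unless Python's UTF-8 mode is enabled.

```python
import locale
import sys

# Python's own default for str <-> bytes conversions when no encoding is passed;
# always 'utf-8' on Python 3 (it was 'ascii' on Python 2).
interpreter_default = sys.getdefaultencoding()

# The locale encoding that open() falls back to for text files when no
# encoding argument is given; this one varies with the OS and locale.
locale_default = locale.getpreferredencoding(False)

print("interpreter default:", interpreter_default)
print("locale default:", locale_default)
```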

In Python 2 the default encoding is ASCII.

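"In Python 3, all strings are sequences of Unicode characters." Python 3's default encoding is UTF-8, but that only governs conversions between str and bytes. Every str is itself a sequence of Unicode characters, not a sequence of UTF-8 bytes.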

In Python 3, all strings are sequences of Unicode characters. There is no such thing as a Python string encoded in utf-8, or a Python string encoded as CP-1252. "Is this string utf-8?" is an invalid question. utf-8 is a way of encoding characters as a sequence of bytes. If you want to take a string and turn it into a sequence of bytes in a particular character encoding, Python 3 can help you with that. If you want to take a sequence of bytes and turn it into a string, Python 3 can help you with that too. Bytes are not characters; bytes are bytes. Characters are an abstraction. A string is a sequence of those abstractions.
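The point that bytes are not characters is easy to see by comparing lengths. In this minimal sketch, the str counts characters, while its UTF-8 encoding counts bytes.

```python
s = '中文'                  # str: a sequence of 2 Unicode characters
b = s.encode('utf-8')       # bytes: the UTF-8 encoding of those characters

print(type(s).__name__, len(s))    # str 2
print(type(b).__name__, len(b))    # bytes 6
# The characters themselves are just code points; no encoding is involved yet.
print([hex(ord(ch)) for ch in s])  # ['0x4e2d', '0x6587']
```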

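Unicode itself is a character set. It is stored using an encoding form such as:

- UTF-32: 4 bytes per character.
- UTF-16: 2 bytes per character, or 4 bytes for characters outside the Basic Multilingual Plane.
- UTF-8: 1 to 4 bytes per character.

UTF-16 is widely used as an in-memory string representation. Files, however, are usually stored as UTF-8 because it saves space, especially for mostly-ASCII text: an ASCII character takes 1 byte in UTF-8 versus 2 in UTF-16.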
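To check the byte counts above, this small sketch encodes three characters: an ASCII letter, a CJK character, and an emoji outside the Basic Multilingual Plane. It uses the -le codec variants deliberately, because plain 'utf-16' and 'utf-32' prepend a byte-order mark that would inflate the counts.

```python
text = 'A中😀'   # ASCII letter, CJK character, emoji (U+1F600, outside the BMP)

# Bytes needed for each character in each encoding form
sizes = {enc: [len(ch.encode(enc)) for ch in text]
         for enc in ('utf-8', 'utf-16-le', 'utf-32-le')}

for enc, per_char in sizes.items():
    print(enc, per_char)
# utf-8 [1, 3, 4]
# utf-16-le [2, 2, 4]
# utf-32-le [4, 4, 4]
```

The output also shows that UTF-8 is not always the smaller choice. A CJK character takes 3 bytes in UTF-8 but only 2 in UTF-16.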

In Python 3, encode() converts a str into bytes using the given encoding, and decode() converts bytes back into a str.
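Here is a minimal round-trip sketch of encode() and decode() using the two encodings discussed in this post. It also shows that bytes must be decoded with the same encoding that produced them.

```python
s = '中文'

# encode(): str -> bytes, in the encoding you name
gbk_bytes = s.encode('gbk')
utf8_bytes = s.encode('utf-8')
print(gbk_bytes)    # b'\xd6\xd0\xce\xc4'
print(utf8_bytes)   # b'\xe4\xb8\xad\xe6\x96\x87'

# decode(): bytes -> str, using the encoding that produced the bytes
decoded = gbk_bytes.decode('gbk')
print(decoded == s)  # True

# Decoding GBK bytes as UTF-8 is invalid and raises an error
try:
    gbk_bytes.decode('utf-8')
    wrong_codec_failed = False
except UnicodeDecodeError:
    wrong_codec_failed = True
print(wrong_codec_failed)  # True
```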

In PyCharm, change the default file encoding to GBK and create a new .py file. Without declaring a source encoding, run the following code:

import sys
print(sys.getdefaultencoding())

s='中文世界'  # in Python 3, string literals are Unicode (str) by default
print(s)

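PyCharm saves the file as GBK, but Python 3 assumes source files are UTF-8 unless the file declares otherwise. The GBK bytes of the Chinese literal are not valid UTF-8, so running the file fails with a SyntaxError.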

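There are two ways to fix the error:

- Change PyCharm's default file encoding, for example back to UTF-8.
- Declare the file's encoding in the source file itself.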
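To reproduce the problem outside PyCharm, this sketch writes the same literal to a temporary file encoded as GBK. It then runs the file with the current interpreter twice: once without a coding declaration and once with one. The helper run_gbk_source and the file name demo.py are hypothetical. The script uses print(len(s)) instead of print(s) so the result does not depend on the console encoding.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

# Hypothetical script body: the same literal as above, to be saved as GBK bytes.
BODY = "s = '中文世界'\nprint(len(s))\n"

def run_gbk_source(declare_encoding: bool) -> subprocess.CompletedProcess:
    """Save BODY as a GBK-encoded .py file (optionally with a coding line) and run it."""
    header = "# -*- coding: gbk -*-\n" if declare_encoding else ""
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "demo.py"
        path.write_bytes((header + BODY).encode("gbk"))
        return subprocess.run([sys.executable, str(path)],
                              capture_output=True, text=True)

without = run_gbk_source(declare_encoding=False)
print(without.returncode)                        # non-zero: the file fails to compile
print(without.stderr.strip().splitlines()[-1])   # the SyntaxError message

with_decl = run_gbk_source(declare_encoding=True)
print(with_decl.stdout.strip())                  # 4
```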

#!/usr/bin/env python
# -*- coding:gbk -*-   # declares the encoding of this source file
# Author:Zoe

import sys
print(sys.getdefaultencoding())

s='中文世界'  # still a Unicode str
print(s)

Now run the following code in Python 3:

#!/usr/bin/env python
# -*- coding:gbk -*-
# Author:Zoe

import sys
print(sys.getdefaultencoding())

s='中文世界'
print(s)
print(s.encode('gbk'))                        # str -> GBK bytes
print(s.encode('utf-8'))                      # str -> UTF-8 bytes
print(s.encode('gb2312').decode('gb2312'))    # str -> GB2312 bytes -> back to str

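It prints the following, in order:

- utf-8
- 中文世界
- the GBK-encoded bytes of the string, shown as a b'...' literal
- the UTF-8-encoded bytes of the string, shown as a b'...' literal
- 中文世界 again, after the gb2312 round trip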

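This output shows that in Python 3 a str is already Unicode. encode() turns it into bytes in whichever encoding you name, and decode() turns those bytes back into a Unicode str. Converting between two encodings therefore always goes through Unicode: decode the bytes into a str first, then encode that str into the target encoding.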
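The decode-then-encode path can be wrapped in a small helper. In this minimal sketch, transcode is a hypothetical name, not a standard-library function.

```python
def transcode(data: bytes, src: str, dst: str) -> bytes:
    """Hypothetical helper: convert bytes from encoding `src` to encoding `dst`.

    The bytes are first decoded into a Unicode str, then encoded again.
    """
    return data.decode(src).encode(dst)

gbk_bytes = '中文世界'.encode('gbk')
utf8_bytes = transcode(gbk_bytes, 'gbk', 'utf-8')
print(utf8_bytes == '中文世界'.encode('utf-8'))  # True
```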

posted on 2017-06-21 17:02 Zoe233