Tips:
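On Chinese-language editions of Windows, the default system code page is GBK (cp936).
Linux systems typically default to UTF-8.
Print Python's default encoding (note that sys.getdefaultencoding() reports the encoding Python itself uses, not the operating system's code page):
# print Python's default encoding
import sys
print(sys.getdefaultencoding())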
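To see the difference between Python's own default and the encoding suggested by the operating system, the two can be printed side by side. This is only a minimal sketch: the variable names are illustrative, and the locale value depends on the machine (for example, it is commonly cp936 on Chinese Windows and UTF-8 on most Linux systems).

```python
import locale
import sys

# Encoding Python 3 uses by default for str.encode() / bytes.decode().
python_default = sys.getdefaultencoding()

# Encoding suggested by the OS locale settings; platform dependent.
locale_preferred = locale.getpreferredencoding(False)

# Encoding used to convert file names between str and bytes.
fs_encoding = sys.getfilesystemencoding()

print("Python default:   ", python_default)
print("Locale preferred: ", locale_preferred)
print("File system:      ", fs_encoding)
```

On Python 3 the first value is always utf-8, while the other two may differ from it depending on the platform and its settings.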
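In Python 2 the default encoding is ASCII.
“In Python 3, all strings are sequences of Unicode characters.” In other words, Python 3's default encoding is UTF-8, but every string (str) in Python 3 is a sequence of Unicode characters rather than bytes in some particular encoding.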
In Python 3, all strings are sequences of Unicode characters. There is no such thing as a Python string encoded in utf-8, or a Python string encoded as CP-1252. “Is this string utf-8?” is an invalid question. utf-8 is a way of encoding characters as a sequence of bytes. If you want to take a string and turn it into a sequence of bytes in a particular character encoding, Python 3 can help you with that. If you want to take a sequence of bytes and turn it into a string, Python 3 can help you with that too. Bytes are not characters; bytes are bytes. Characters are an abstraction. A string is a sequence of those abstractions.
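Unicode can be encoded as UTF-32 (always 4 bytes per character), UTF-16 (2 or 4 bytes per character), or UTF-8 (1 to 4 bytes per character). UTF-16 is widely used as an in-memory representation (for example by the Windows APIs and Java), but files are usually stored as UTF-8, because UTF-8 saves space for text that is mostly ASCII.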
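The byte counts above can be checked directly. The sketch below uses a hypothetical helper, encoded_lengths, and the little-endian codec names (utf-16-le, utf-32-le) so that no byte order mark is added to the result.

```python
def encoded_lengths(ch):
    """Return the number of bytes one character takes in each encoding."""
    encodings = ("utf-8", "utf-16-le", "utf-32-le")
    return {enc: len(ch.encode(enc)) for enc in encodings}

ascii_lengths = encoded_lengths("A")           # an ASCII letter
cjk_lengths = encoded_lengths("中")             # a common Chinese character
emoji_lengths = encoded_lengths("\U0001F600")  # a character outside the BMP

print(ascii_lengths)  # {'utf-8': 1, 'utf-16-le': 2, 'utf-32-le': 4}
print(cjk_lengths)    # {'utf-8': 3, 'utf-16-le': 2, 'utf-32-le': 4}
print(emoji_lengths)  # {'utf-8': 4, 'utf-16-le': 4, 'utf-32-le': 4}
```

For Chinese text UTF-16 is actually more compact than UTF-8 (2 bytes versus 3 per character); UTF-8 only wins for ASCII-heavy content.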
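In Python 3, encode not only converts the text to the target encoding but also turns the str into a bytes object; decode does the reverse and turns the bytes back into a str.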
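A minimal sketch of that round trip, with illustrative variable names:

```python
text = "中文世界"

# str -> bytes: encode() returns a bytes object in the requested encoding.
data = text.encode("utf-8")
print(type(text).__name__, "->", type(data).__name__)  # str -> bytes
print(len(text), "characters,", len(data), "bytes")    # 4 characters, 12 bytes

# bytes -> str: decode() must be given the encoding that produced the bytes.
restored = data.decode("utf-8")
print(restored == text)  # True
```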
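In PyCharm, after changing the default file encoding to GBK, create a new .py file and run the following code without declaring a file encoding:
import sys
print(sys.getdefaultencoding())
s = '中文世界'  # string literals in Python 3 are Unicode by default
print(s)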
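Because PyCharm saves the file as GBK while Python 3 reads source files as UTF-8 unless told otherwise, the encoding mismatch causes an error (typically a SyntaxError about non-UTF-8 bytes in the file).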
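The same mismatch can be reproduced without PyCharm by decoding GBK bytes as UTF-8. This is only an analogy for what the interpreter does when it reads the source file, not the exact error shown by PyCharm; the variable names are illustrative.

```python
# Bytes as an editor configured for GBK would write them to disk.
gbk_bytes = "中文世界".encode("gbk")

# Reading those bytes as UTF-8 fails, because they are not valid UTF-8.
try:
    gbk_bytes.decode("utf-8")
    error = None
except UnicodeDecodeError as exc:
    error = exc

print(type(error).__name__)  # UnicodeDecodeError

# Decoding with the encoding that was actually used works as expected.
print(gbk_bytes.decode("gbk"))  # 中文世界
```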
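There are two ways to fix the error: change PyCharm's default file encoding, or declare the file encoding in the Python source.
#!/usr/bin/env python
# -*- coding:gbk -*-  # declares the encoding of this source file
# Author:Zoe
import sys
print(sys.getdefaultencoding())
s = '中文世界'  # still a Unicode string
print(s)
Output: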
Run the following code in Python 3:
#!/usr/bin/env python
# -*- coding:gbk -*-
# Author:Zoe
import sys
print(sys.getdefaultencoding())
s = '中文世界'
print(s)
print(s.encode('gbk'))     # bytes in GBK
print(s.encode('utf-8'))   # bytes in UTF-8
print(s.encode('gb2312').decode('gb2312'))  # round trip back to str
Output:
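From the results above we can see that in Python 3, encode turns a Unicode str into bytes in the target encoding, and decode turns those bytes back into a Unicode str. Converting between two encodings therefore always goes through Unicode.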
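As a rough illustration of that conclusion, the sketch below converts GBK bytes to UTF-8 bytes by going through str, and compares the byte lengths. The helper name transcode is hypothetical, and this is not the output of the script above.

```python
def transcode(data, source_encoding, target_encoding):
    """Convert bytes between encodings via an intermediate Unicode str."""
    text = data.decode(source_encoding)   # bytes -> str (Unicode)
    return text.encode(target_encoding)   # str -> bytes

s = "中文世界"
gbk_data = s.encode("gbk")
utf8_data = transcode(gbk_data, "gbk", "utf-8")

print(len(gbk_data))   # 8  (2 bytes per character in GBK)
print(len(utf8_data))  # 12 (3 bytes per character in UTF-8)
print(utf8_data == s.encode("utf-8"))      # True
print(s.encode("gb2312") == gbk_data)      # True: these characters exist in both
```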