Tips:

On Chinese-language editions of Windows, the system default character encoding is GBK (code page 936).

Most Linux systems default to UTF-8.

 

Print Python's default encoding:

# Print Python's default string encoding
import sys
print(sys.getdefaultencoding())
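Note that `sys.getdefaultencoding()` reports the encoding Python uses for `str.encode()` and `bytes.decode()` when none is given, not the operating system's encoding. As a rough sketch, the standard library exposes the OS-related values separately; the values mentioned in the comments are typical ones and will vary by machine and configuration.

```python
import locale
import sys

# Encoding used by str.encode() / bytes.decode() when no codec is named.
# On Python 3 this is 'utf-8' regardless of the operating system.
default_encoding = sys.getdefaultencoding()

# Encoding preferred by the OS locale; open() falls back to this for text
# files unless UTF-8 mode is enabled. Typically 'cp936' (GBK) on Chinese
# Windows and 'UTF-8' on most Linux systems.
locale_encoding = locale.getpreferredencoding(False)

# Encoding used to convert file names between str and bytes.
filesystem_encoding = sys.getfilesystemencoding()

print("default:   ", default_encoding)
print("locale:    ", locale_encoding)
print("filesystem:", filesystem_encoding)
```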

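In Python 2, the default encoding is ASCII.

"In Python 3, all strings are sequences of Unicode characters." In other words, Python 3's default encoding is UTF-8, but every Python 3 string (str) is a sequence of Unicode characters rather than bytes in some particular encoding.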

    In Python 3, all strings are sequences of Unicode characters. There is no such thing as a Python string encoded in utf-8, or a Python string encoded as CP-1252. “Is this string utf-8?” is an invalid question. utf-8 is a way of encoding characters as a sequence of bytes. If you want to take a string and turn it into a sequence of bytes in a particular character encoding, Python 3 can help you with that. If you want to take a sequence of bytes and turn it into a string, Python 3 can help you with that too. Bytes are not characters; bytes are bytes. Characters are an abstraction. A string is a sequence of those abstractions.
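A small sketch of the point made in the quote: the same `str` can be turned into different byte sequences depending on the codec, and each decodes back to the identical string. The sample text `中文世界` is the one used later in this post.

```python
s = "中文世界"

# A str is a sequence of characters; its length counts characters.
char_count = len(s)

# Encoding chooses a byte representation; different codecs give different bytes.
utf8_bytes = s.encode("utf-8")
gbk_bytes = s.encode("gbk")

print(char_count, len(utf8_bytes), len(gbk_bytes))

# Decoding with the matching codec recovers the same str in both cases,
# so "is this string UTF-8?" only makes sense for the bytes, not the str.
same_text = utf8_bytes.decode("utf-8") == gbk_bytes.decode("gbk") == s
print(same_text)
```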

 

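Unicode has several encoding forms: UTF-32 (always 4 bytes per character), UTF-16 (2 or 4 bytes per character), and UTF-8 (1 to 4 bytes per character). UTF-16 is widely used as an in-memory representation (for example in the Windows API and in Java), but files are usually stored as UTF-8, because UTF-8 is more compact for ASCII-heavy text and is backward compatible with ASCII.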
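To make the byte counts concrete, here is a minimal sketch that measures how many bytes one character takes in each encoding form. The helper name `encoded_sizes` is made up for this illustration; the `-le` codec variants are used so that no byte order mark is added to the count.

```python
def encoded_sizes(ch):
    """Return the encoded length in bytes of one character per codec."""
    codecs = ("utf-8", "utf-16-le", "utf-32-le")
    return {codec: len(ch.encode(codec)) for codec in codecs}

ascii_sizes = encoded_sizes("A")           # ASCII letter
cjk_sizes = encoded_sizes("中")            # CJK character in the Basic Multilingual Plane
emoji_sizes = encoded_sizes("\U0001F600")  # character outside the BMP

for label, sizes in (("A", ascii_sizes), ("中", cjk_sizes), ("U+1F600", emoji_sizes)):
    print(label, sizes)
```

UTF-8 wins for ASCII text, while UTF-16 is smaller for most Chinese characters; characters outside the Basic Multilingual Plane take 4 bytes in all three.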

In Python 3, encode() converts a str into bytes in the chosen encoding, and decode() converts bytes back into a str.
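A short sketch of the type change, plus what happens when bytes are decoded with the wrong codec. The variable names are illustrative only.

```python
text = "中文世界"

# encode(): str -> bytes
data = text.encode("gbk")
print(type(text).__name__, "->", type(data).__name__)

# decode() with the matching codec: bytes -> str
restored = data.decode("gbk")
print(restored == text)

# Decoding with a codec that does not match the bytes fails loudly.
try:
    data.decode("utf-8")
    wrong_codec_failed = False
except UnicodeDecodeError as exc:
    wrong_codec_failed = True
    print("decode failed:", exc.reason)

# errors="replace" substitutes U+FFFD instead of raising; the text is still lost.
lossy = data.decode("utf-8", errors="replace")
```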

 


 

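In PyCharm, after changing the default file encoding to GBK, create a new .py file and run the following code without declaring the file encoding:

import sys
print(sys.getdefaultencoding())

s='中文世界'  # string literals are Unicode str objects in Python 3
print(s)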

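PyCharm saves the file as GBK, whereas Python 3 assumes that a source file is UTF-8 unless the file declares otherwise, so the encoding mismatch causes an error.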
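The mismatch can be reproduced without PyCharm. The sketch below uses the standard library's `tokenize.detect_encoding`, which follows the same coding-declaration rules as the interpreter, on GBK-encoded source bytes with and without a declaration. The helper `source_encoding` and the in-memory "files" are assumptions made for this illustration.

```python
import io
import tokenize

def source_encoding(raw):
    """Return the source encoding Python would assume for these file bytes."""
    encoding, _lines = tokenize.detect_encoding(io.BytesIO(raw).readline)
    return encoding

body = "s = '中文世界'\nprint(s)\n"

# Same source text saved as GBK, without and with a coding declaration.
undeclared = body.encode("gbk")
declared = ("# -*- coding: gbk -*-\n" + body).encode("gbk")

# Without a declaration the bytes are treated as UTF-8, which they are not.
try:
    source_encoding(undeclared)
    undeclared_ok = True
except SyntaxError as exc:
    undeclared_ok = False
    print("undeclared file rejected:", exc)

# With the declaration, the bytes are read as GBK and decode cleanly.
declared_encoding = source_encoding(declared)
decoded_source = declared.decode(declared_encoding)
print(declared_encoding)
```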

There are two ways to fix the error: change the default file encoding in the PyCharm editor, or declare the file encoding in the Python file itself.

#!/usr/bin/env python
# -*- coding:gbk -*-   # declares the encoding of this file
# Author:Zoe

import sys
print(sys.getdefaultencoding())

s='中文世界'  # still a Unicode str
print(s)

Output:

 

Run the following code in Python 3:

#!/usr/bin/env python
# -*- coding:gbk -*-
# Author:Zoe

import sys
print(sys.getdefaultencoding())

s='中文世界'
print(s)
print(s.encode('gbk'))
print(s.encode('utf-8'))
print(s.encode('gb2312').decode('gb2312'))

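Output:

From the results of the code above, we can see that in Python 3 the string itself is always Unicode: encode() turns it into bytes in the requested encoding, and decode() turns those bytes back into a Unicode str.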
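As a follow-up sketch, converting bytes from one encoding to another always goes through a Unicode str in the middle: decode with the source codec, then encode with the target codec. The variable names here are illustrative.

```python
s = "中文世界"

gbk_bytes = s.encode("gbk")
utf8_bytes = s.encode("utf-8")

# GBK bytes -> Unicode str -> UTF-8 bytes
transcoded = gbk_bytes.decode("gbk").encode("utf-8")

# Encoding and then decoding with the same codec returns the original str.
round_trip = s.encode("gb2312").decode("gb2312")

print(gbk_bytes)
print(utf8_bytes)
print(transcoded == utf8_bytes)
print(round_trip)
```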

 

posted on 2017-06-21 17:02  Zoe233