Python 2.x: a study of unicode, decode, utf-8, and related string-conversion problems when writing files

The overall encoding workflow in Python:

For strings in Python 2.7:

unicode --> encode('utf-8') --> write to file @1

read from file --> decode('utf-8') --> unicode @2

# @1 Without declaring the source file as UTF-8: unicode --> encode as utf-8 --> write to file
# unicode --> encode('utf-8') --> write to file @1
FILE = 'unicode.txt'
hello_out = u'Hello ,中文测试\n'  # unicode string literal
bytes_out = hello_out.encode('utf-8')  # encode to UTF-8 before writing, so the file holds UTF-8 bytes
f = open(FILE, 'w')
f.write(bytes_out)
f.close()

================================================
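Result: Hello KEL,ÖÐÎÄ²âÊÔ
The Chinese text in the written txt file is garbled, so the write went wrong. The garbled characters are exactly what the GBK bytes of 中文测试 look like when read as Latin-1, which suggests that the literal in the undeclared source file was interpreted with the wrong encoding.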
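The garbled result above can be reproduced in isolation. The following is a minimal sketch, assuming the source file was saved as GBK and its bytes were then interpreted as Latin-1; the post does not state the actual source encoding, so this is only a plausible reconstruction. It is written to run on both Python 2.7 and Python 3.

```python
# -*- coding: utf-8 -*-
# Assumption: the original script was saved as GBK and misread as Latin-1.
text = u'中文测试'

# Bytes as a GBK-encoded source file would store the literal.
gbk_bytes = text.encode('gbk')

# Wrong codec: Latin-1 maps every single byte to one character.
misread = gbk_bytes.decode('latin-1')
print(repr(misread))

# The damage is reversible as long as the wrong and right codecs are known.
recovered = misread.encode('latin-1').decode('gbk')
print(recovered == text)
```

Encoding `misread` as UTF-8 and writing it to a file gives the garbled text reported above when the file is opened in a UTF-8 aware editor.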

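With the encoding declaration added to the file:
# -*- coding: utf-8 -*-  # declares the source file as UTF-8 encoded; otherwise an error is reported
FILE = 'unicode.txt'
hello_out = u'Hello ,中文测试\n'  # unicode string literal
bytes_out = hello_out.encode('utf-8')  # encode to UTF-8 before writing, so the file holds UTF-8 bytes
f = open(FILE, 'w')
f.write(bytes_out)
f.close()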
================================================
Result: Hello ,中文测试

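# -*- coding: utf-8 -*-  # declares the source file as UTF-8 encoded; when only reading, it makes no difference whether it is present
# read from file --> decode('utf-8') --> unicode @2

FILE = 'unicode.txt'
f = open(FILE, 'r')
bytes_in = f.read()  # raw UTF-8 bytes (a str in Python 2)
hello_in = bytes_in.decode('utf-8')  # decode the UTF-8 bytes into a unicode object
f.close()
print hello_in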

================================================
Result: Hello ,中文测试
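The manual encode-on-write and decode-on-read steps can also be delegated to the file object. The sketch below uses `io.open` from the standard library, which accepts an `encoding` argument in both Python 2.7 and Python 3. The temporary directory is an assumption made so the example does not overwrite a real `unicode.txt`.

```python
# -*- coding: utf-8 -*-
import io
import os
import tempfile

# Hypothetical location: a fresh temporary directory instead of the working directory.
path = os.path.join(tempfile.mkdtemp(), 'unicode.txt')

hello_out = u'Hello ,中文测试\n'

# Text mode with an explicit encoding: write() takes unicode and encodes it.
with io.open(path, 'w', encoding='utf-8') as f:
    f.write(hello_out)

# Reading in text mode decodes the bytes back into unicode.
with io.open(path, 'r', encoding='utf-8') as f:
    hello_in = f.read()

# Binary mode shows the raw bytes that were actually stored.
with io.open(path, 'rb') as f:
    raw = f.read()

print(hello_in == hello_out)
print(repr(raw))
```

With this approach the program only ever handles unicode objects, and the byte representation exists only inside the file.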

Corrected summary:

-*- coding:utf-8 -*- ==> unicode --> encode('utf-8') --> write to file @1
-*- coding:utf-8 -*- ==> read from file --> decode('utf-8') --> unicode @2

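When working with unicode, keep the following rules in mind:

1. Prefix every string literal in the program with u, so that it is a unicode object.

2. Do not call str(); use unicode() instead.

3. Do not use the string module.

4. Call encode only when writing to a file, a database, or the network; call decode only when reading the data back in.

5. If an object defines a __unicode__() method, unicode() can convert that object to a unicode object.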
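Rule 5 can be illustrated with a small class. This is a sketch only: the `Person` class and the `text_type` alias are hypothetical names introduced for the example. The alias makes the same code run on Python 3, where `unicode` no longer exists and `str()` plays its role.

```python
# -*- coding: utf-8 -*-
import sys

PY2 = sys.version_info[0] == 2
# Hypothetical alias: unicode on Python 2, str on Python 3.
text_type = unicode if PY2 else str  # noqa: F821


class Person(object):
    """Hypothetical example class holding a unicode name."""

    def __init__(self, name):
        self.name = name

    def __unicode__(self):
        # Called by unicode(obj) on Python 2.
        return u'Person: ' + self.name

    def __str__(self):
        # Python 2 expects bytes from __str__, Python 3 expects text.
        text = self.__unicode__()
        return text.encode('utf-8') if PY2 else text


p = Person(u'星海')
as_text = text_type(p)            # unicode(p) on Python 2
as_bytes = as_text.encode('utf-8')  # encode only when the value leaves the program
print(repr(as_text))
```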

>>> kel = '汉字'
>>> kel
'\xe6\xb1\x89\xe5\xad\x97'
>>> kel.encode('utf-8')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe6 in position 0: ordinal not in range(128)

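Calling encode on a byte string (str) makes Python 2 first decode it implicitly with the default ascii codec, and that step fails on non-ASCII bytes. If you see the error above, or a UnicodeEncodeError, the encoding or decoding step was done incorrectly: a proper unicode object was never created.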
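The hidden step behind the traceback above can be made explicit. The sketch below performs the ascii decode by hand, which is the step that fails, and then shows the fix of decoding with the codec the bytes were actually written in. It assumes the terminal in the session was using UTF-8, which matches the byte values shown. It runs on both Python 2.7 and Python 3.

```python
# -*- coding: utf-8 -*-
# The same six bytes that the interactive session printed for kel.
raw = b'\xe6\xb1\x89\xe5\xad\x97'

# What Python 2 does implicitly before str.encode(): decode with ascii.
try:
    raw.decode('ascii')
    failed = False
    message = ''
except UnicodeDecodeError as exc:
    failed = True
    message = str(exc)
print(message)

# The fix: decode explicitly with the real encoding to get a unicode object,
# and encode again only when bytes are needed.
text = raw.decode('utf-8')
round_trip = text.encode('utf-8')
print(round_trip == raw)
```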

Posted on 2017-04-11 13:45 by 星海一哥
