Encoding problems when reading and writing files in Python 3

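1. Reading a remote resource: the response body is bytes (encoded as UTF-8, GBK, and so on) and must be decoded with decode() to get a str (Unicode).

For example: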

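# coding=gbk
import re
import urllib.request

url = 'http://www.163.com'
file = 'd:/test.html'  # not used in this snippet
# read() returns the raw response body as bytes
data = urllib.request.urlopen(url).read()
r1 = re.compile('<.*?>')
# Raises TypeError: the pattern is a str, but data is bytes
c_t = r1.findall(data)
print(c_t)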

After the page is fetched, the script fails at the r1.findall(data) call with:

can't use a string pattern on a bytes-like object

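A quick search showed that this behavior changed in Python 3: read() now returns a bytes-like object, while a str pattern requires a str argument. Decoding the data fixes it: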

data = data.decode('GBK')

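Do this before passing the data to the regular expression and the match works as expected. The encoding passed to decode() has to be the one the page actually uses; GBK was the right choice for this page.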
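The fix above can be sketched without a network connection. This is a minimal illustration, not the original script: the `raw` value is a made-up stand-in for what `urllib.request.urlopen(url).read()` might return, and the names `tag_pattern`, `str_pattern_failed`, and `byte_tags` are assumptions introduced here. It shows the failing call, the decode-then-match fix, and the alternative of matching with a bytes pattern.

```python
import re

# Hypothetical stand-in for the bytes returned by urlopen(url).read():
# a small HTML snippet encoded as GBK.
raw = '<html><body>网易</body></html>'.encode('gbk')

tag_pattern = re.compile('<.*?>')  # a str pattern

# A str pattern cannot be applied to bytes; this raises TypeError.
try:
    tag_pattern.findall(raw)
    str_pattern_failed = False
except TypeError:
    str_pattern_failed = True

# Fix 1: decode the bytes to str first, then match.
text = raw.decode('gbk')
tags = tag_pattern.findall(text)

# Fix 2: keep the data as bytes and use a bytes pattern instead.
byte_tags = re.findall(rb'<.*?>', raw)

print(str_pattern_failed)
print(tags)
print(byte_tags)
```

Decoding first is usually the more convenient choice when the matched text will be printed or processed as text later; the bytes pattern avoids having to know the encoding, but its results are bytes.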

2. Reading a local text file: open(fname) in text mode returns str (Unicode); to get bytes, encode the text, for example with encode('utf-8').

For example:

fname = 'e:/data/html.txt'
# Text mode returns str; with no encoding argument, open() uses the locale's default encoding
with open(fname, 'r') as f:
    html = f.read()
# print(html)
print(type(html))    # prints <class 'str'>

u = html.encode('utf-8')
print(type(u))       # prints <class 'bytes'>

In Python 3, the str type holds Unicode text.
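The relationship between str and bytes can be checked with a short round trip. This is only a sketch under assumptions: the sample text, the temporary file, and the variable names are invented for illustration and are not from the original post. It writes a string with an explicit encoding, then reads the same file back once in text mode and once in binary mode.

```python
import os
import tempfile

text = '编码 test'              # str: a sequence of Unicode code points
data = text.encode('utf-8')     # bytes: the UTF-8 encoding of that text

# Hypothetical temporary file used only for this demonstration.
path = os.path.join(tempfile.mkdtemp(), 'sample.txt')

# Passing encoding explicitly avoids depending on the locale default.
with open(path, 'w', encoding='utf-8') as f:
    f.write(text)

# Text mode decodes the file contents and returns str.
with open(path, 'r', encoding='utf-8') as f:
    as_text = f.read()

# Binary mode returns the raw bytes without decoding.
with open(path, 'rb') as f:
    as_bytes = f.read()

print(type(as_text), len(as_text))
print(type(as_bytes), len(as_bytes))
print(as_bytes.decode('utf-8') == as_text)
```

The two lengths differ because each Chinese character here is one code point in the str but three bytes in UTF-8.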


posted @ 2015-02-06 16:31 阳光树林
