Crawler: GBK encoding error when writing to a file

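import re

import requests

# Fetch the page that contains the video URL.
# Request it once and save it to a file, so that repeated requests
# do not trigger the site's anti-crawler measures.
r = requests.get('https://www.pearvideo.com/video_1522192')
html = r.content.decode("utf-8")
print(html)
# Without encoding=..., open() uses the locale's preferred encoding (GBK on
# Chinese Windows) and raises UnicodeEncodeError on characters GBK cannot hold.
# Drop those characters first, and name the encoding explicitly.
with open("./test.html", "w", encoding="gbk") as f:
    f.write(html.encode("gbk", "ignore").decode("gbk", "ignore"))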
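The error in the title comes from writing a str that contains characters GBK cannot represent: in Python 3, open() in text mode without encoding= falls back to locale.getpreferredencoding(False), which is cp936 (GBK) on Simplified Chinese Windows. A common offender in web pages is the non-breaking space '\xa0'. The offline sketch below shows what encode("gbk", "ignore") does: unencodable characters are silently dropped, after which the text can be written and read back as GBK. The sample string and the helper names to_gbk_safe and write_then_read_gbk are made up for illustration.

```python
import locale
import os
import tempfile


def to_gbk_safe(text: str) -> str:
    """Drop every character the GBK codec cannot encode (hypothetical helper)."""
    # 'ignore' skips unencodable characters; the remaining bytes are valid GBK.
    return text.encode("gbk", "ignore").decode("gbk")


def write_then_read_gbk(text: str) -> str:
    """Write text to a temporary file as GBK, then read it back as GBK."""
    fd, path = tempfile.mkstemp(suffix=".html")
    os.close(fd)
    try:
        # Pass the encoding explicitly instead of relying on the platform default.
        with open(path, "w", encoding="gbk") as f:
            f.write(to_gbk_safe(text))
        with open(path, encoding="gbk") as f:
            return f.read()
    finally:
        os.remove(path)


# The encoding open() would use here when none is given (platform-dependent).
print(locale.getpreferredencoding(False))

sample = "视频\xa0标题😀"  # the non-breaking space and the emoji are not in GBK
print(write_then_read_gbk(sample))  # 视频标题
```

The trade-off is data loss: anything outside GBK disappears from the saved page. If that matters, writing and reading with encoding="utf-8" on both sides avoids the error without dropping anything.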

# Read the saved file back
with open('test.html', encoding='gbk') as file_obj:
    contents = file_obj.read()
# Extract the video URL with a regular expression
regex = re.compile(r'srcUrl="(.+?)"')
print(regex.findall(contents))
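The comment at the top says the page is saved so that it is requested only once, but the script as written still calls requests.get on every run. One way to get that behaviour is to read the file when it already exists and fetch only otherwise. A minimal sketch, assuming a hypothetical helper load_cached_page that takes the fetch step as a function, so it can be tried offline; in the real script that function would wrap requests.get(...).content.decode("utf-8").

```python
import os
import tempfile
from typing import Callable


def load_cached_page(path: str, fetch: Callable[[], str], encoding: str = "gbk") -> str:
    """Hypothetical helper: return the saved page if it exists, else fetch once and save."""
    if os.path.exists(path):
        # Cache hit: no network request is made.
        with open(path, encoding=encoding) as f:
            return f.read()
    html = fetch()
    # Drop characters the target encoding cannot hold (the "ignore" trick).
    safe = html.encode(encoding, "ignore").decode(encoding)
    with open(path, "w", encoding=encoding) as f:
        f.write(safe)
    return safe


calls = []


def fake_fetch() -> str:
    # Hypothetical stand-in for requests.get(url).content.decode("utf-8")
    calls.append(1)
    return "页面\xa0内容"


# A fresh temporary directory, so the first call is always a cache miss.
cache_path = os.path.join(tempfile.mkdtemp(), "test.html")
first = load_cached_page(cache_path, fake_fetch)
second = load_cached_page(cache_path, fake_fetch)
print(first == second, len(calls))  # True 1
```

Deleting the saved file forces a fresh download on the next run.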
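The pattern uses the non-greedy (.+?), which stops at the first closing quote after srcUrl=". A greedy (.+) would instead run to the last quote on the line. A quick offline check on a made-up fragment; the srcUrl/poster layout and the example.com URL are assumptions, not the actual page markup.

```python
import re

# Hypothetical page fragment; the real page's markup may differ.
page = 'var srcUrl="https://video.example.com/a.mp4",poster="cover.jpg";'

lazy = re.compile(r'srcUrl="(.+?)"')   # non-greedy: stops at the first closing quote
greedy = re.compile(r'srcUrl="(.+)"')  # greedy: backtracks only as far as the last quote

print(lazy.findall(page))    # ['https://video.example.com/a.mp4']
print(greedy.findall(page))  # ['https://video.example.com/a.mp4",poster="cover.jpg']
```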

 

posted @ 2019-03-03 15:01  TheoldmanPickgarbage