GBK encoding error when a crawler writes to a file

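import re
import requests

# Get the video URL.
# Request the page once and save it to a file, so later runs can parse the
# local copy instead of sending repeated requests that may trigger anti-scraping measures.
r = requests.get('https://www.pearvideo.com/video_1522192')
html = r.content.decode("utf-8")
print(html)
# Drop the characters GBK cannot represent; writing them raises UnicodeEncodeError.
with open("./test.html", "w", encoding="gbk") as f:
    f.write(html.encode("gbk", "ignore").decode("gbk", "ignore"))

# Read the file back
with open('test.html', encoding='gbk') as file_obj:
    contents = file_obj.read()
# Match the video URL with a regular expression
regex = re.compile('srcUrl="(.+?)"')
print(regex.findall(contents))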
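
The GBK round trip above silently discards every character that GBK cannot encode. A possible alternative, shown here only as a minimal sketch, is to write and read the file as UTF-8 so that nothing has to be dropped. The helper names `save_html` and `load_src_urls` and the sample string are hypothetical and exist only for illustration; the regular expression is the one from the post.

```python
import os
import re
import tempfile

# Same pattern as in the post: capture the value of srcUrl="...".
SRC_URL_RE = re.compile(r'srcUrl="(.+?)"')


def save_html(path, html):
    """Write the page text as UTF-8 (hypothetical helper).

    UTF-8 can represent every Unicode character, so no character
    needs to be ignored when writing.
    """
    with open(path, "w", encoding="utf-8") as f:
        f.write(html)


def load_src_urls(path):
    """Read the saved page back as UTF-8 and return all srcUrl values."""
    with open(path, encoding="utf-8") as f:
        return SRC_URL_RE.findall(f.read())


if __name__ == "__main__":
    # Made-up sample text. "\xa0" (non-breaking space) is a character
    # that the gbk codec cannot encode.
    sample = 'var srcUrl="https://example.com/video.mp4";\xa0'
    path = os.path.join(tempfile.mkdtemp(), "test.html")
    save_html(path, sample)
    print(load_src_urls(path))
```

The trade-off is that every later reader of the file must also pass `encoding="utf-8"`, since on a Chinese-locale Windows system the default for `open` may be GBK.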

posted @ 2019-03-03 15:01 TheoldmanPickgarbage
