'utf-8' codec can't decode byte 0x8b in position 1: invalid start byte

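Problem: While scraping Douyu live-stream data with Python, decoding the fetched bytes with str(bytes_read, encoding) raised this error: 'utf-8' codec can't decode byte 0x8b in position 1: invalid start byte

The code is as follows: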

from urllib import request

class Spider():

    url = 'https://www.douyu.com/'

    def __fetch_content(self):
        r = request.urlopen(Spider.url)
        htmls = r.read()    # raw response body as bytes (the HTML)
        htmls = str(htmls, encoding='utf-8')   # raises UnicodeDecodeError here

    def go(self):
        self.__fetch_content()

spider = Spider()
spider.go()
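To see what the error message means in isolation, here is a minimal offline sketch. The `sample` bytes and the `try_decode` helper are hypothetical, not part of the original code; the values were chosen only to match the byte and position named in the message.

```python
# Hypothetical stand-in for the first bytes of a response body;
# chosen to match the byte (0x8b) and position (1) in the error message.
sample = bytes([0x1f, 0x8b, 0x08])


def try_decode(data):
    """Return (text, None) on success or (None, error) on failure."""
    try:
        return data.decode('utf-8'), None
    except UnicodeDecodeError as err:
        return None, err


text, err = try_decode(sample)
if err is not None:
    # err.start is the index of the offending byte, err.reason the explanation
    print(err.start, hex(err.object[err.start]), err.reason)
    # 0x8b is 0b10001011: the leading bits "10" mark a UTF-8 continuation
    # byte, which can never begin a character
    print(format(sample[1], '08b'))
```

Byte 0 (0x1f) is a valid single-byte character, so the decoder only fails at position 1.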

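Cause: Stepping through with a debugger shows that the bytes returned by r.read() begin with b'\x1f\x8b\x08'. The first two bytes, 0x1f 0x8b, are the gzip magic number, so the response body is gzip-compressed. That is the reason for the error: 0x8b, at position 1, cannot start a UTF-8 character. The bytes therefore have to be decompressed before they are decoded. The modified code is as follows: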

from urllib import request
from io import BytesIO
import gzip


class Spider():
    url = 'https://www.douyu.com/'

    def __fetch_content(self):
        r = request.urlopen(Spider.url)
        htmls = r.read()
        buff = BytesIO(htmls)  # wrap the compressed bytes in a file-like object
        f = gzip.GzipFile(fileobj=buff)
        htmls = f.read().decode('utf-8')  # decompress first, then decode

    # Entry point
    def go(self):
        self.__fetch_content()


spider = Spider()
spider.go()

After this change, decoding works correctly.
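The fix above assumes the server always sends gzip. A slightly more defensive variant, sketched below, checks for the gzip magic number first and only then decompresses, so an uncompressed response still decodes. The names `GZIP_MAGIC` and `decode_body` are hypothetical and not part of the original code; `gzip.decompress` is the standard-library shortcut for the BytesIO/GzipFile pair.

```python
import gzip

GZIP_MAGIC = b'\x1f\x8b'  # first two bytes of a gzip stream


def decode_body(raw, encoding='utf-8'):
    """Decompress raw if it starts with the gzip magic number, then decode."""
    if raw[:2] == GZIP_MAGIC:
        raw = gzip.decompress(raw)
    return raw.decode(encoding)


# Offline check with made-up page content, no network access needed
page = '<html>斗鱼</html>'
compressed = gzip.compress(page.encode('utf-8'))
print(decode_body(compressed) == page)            # compressed input
print(decode_body(page.encode('utf-8')) == page)  # plain input
```

In the Spider class this would be used as htmls = decode_body(r.read()). Another option, under the same assumptions, is to inspect r.headers.get('Content-Encoding') instead of sniffing the first bytes.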

posted @ 2020-04-03 15:27 做个读书人
