A small problem I ran into while scraping a web page

In the page's header information I saw:

Accept-Encoding:gzip, deflate


But the Chinese characters in the scraped content came out garbled.

Check which encoding requests assigned to the response:

import re
import requests
from bs4 import BeautifulSoup

headers = {
    'Upgrade-Insecure-Requests': '1',
    'DNT': '1',
    'User-Agent': '',  # fill in your own User-Agent string
    'Referer': 'http://jnga.jinan.gov.cn/col/col22173/index.html',
}
url = "http://jnga.jinan.gov.cn/col/col22173/index.html"
response = requests.get(url, headers=headers)
print(response.encoding)

# Output: ISO-8859-1
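As far as I understand, requests falls back to ISO-8859-1 when the Content-Type header is a text type that does not declare a charset, which is presumably what happened here. The sketch below reproduces the garbling without any network access. The sample string is a made-up stand-in for the page content, and it assumes the page bytes are really UTF-8.

```python
# Hypothetical sample text standing in for the Chinese content of the page.
text = "济南公安"

# Assume the server sends UTF-8 bytes but declares no charset.
raw = text.encode("utf-8")

# Decoding with the wrong codec never fails for ISO-8859-1,
# it just maps every byte to one (wrong) character.
garbled = raw.decode("iso-8859-1")

# Decoding with the codec the bytes were written in restores the text.
fixed = raw.decode("utf-8")

print(repr(garbled))
print(fixed)
```

Because ISO-8859-1 accepts every byte value, no exception is raised, so the only symptom is the garbled text itself.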

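My own fix:

After requesting the page and getting the response, set the response's encoding to utf-8 or gbk (whichever the page actually uses) before reading response.text.

response.encoding = 'gbk'  # or 'utf-8'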
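If you do not know in advance which of the two encodings a page uses, one option is to decode the raw bytes yourself and try each candidate in turn. This is only a minimal sketch: decode_body is a hypothetical helper name, and the candidate list is an assumption based on the two encodings mentioned above.

```python
def decode_body(raw: bytes, candidates=("utf-8", "gbk")) -> str:
    """Decode raw with the first candidate codec that accepts it."""
    for name in candidates:
        try:
            return raw.decode(name)
        except UnicodeDecodeError:
            continue
    # Nothing matched: decode with the first candidate and
    # replace the undecodable bytes instead of raising.
    return raw.decode(candidates[0], errors="replace")


# Simulated page bodies (made-up sample text) in each encoding.
utf8_page = "济南".encode("utf-8")
gbk_page = "济南".encode("gbk")

print(decode_body(utf8_page))
print(decode_body(gbk_page))
```

With a real response this would be called as decode_body(response.content). UTF-8 is tried first on purpose: GBK will often accept UTF-8 bytes without raising and silently return the wrong characters, while UTF-8 is strict enough to reject most GBK input. requests also offers response.apparent_encoding, which guesses the encoding from the body and can be assigned to response.encoding.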

posted @ 2023-02-25 12:26 曦峫
