Python Web Scraping (8): GET Requests with the requests Library

The requests library is more convenient to use than urllib and offers many more features.

1. Before using requests, install it with pip. In PyCharm, open the terminal:

Enter the command pip install requests, and pip downloads and installs the library.
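To confirm the installation worked, a minimal check is to import the library and print its version. The version string you see depends on what pip installed; nothing here is specific to this tutorial.

```python
import requests

# If this import succeeds, pip installed requests into the interpreter PyCharm is using
print(requests.__version__)
```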

The requests project on GitHub includes an introduction to the library: https://github.com/requests/requests

2. GET requests

response=requests.get("https://www.baidu.com/")

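Suppose we want to retrieve information about 中国 (China) from Baidu. In a browser, this is equivalent to typing 中国 into the Baidu search box: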

Here is the same search done with scraper code:

import requests

# wd is the query parameter Baidu's search URL uses for the keyword (it appears as ?wd=... in the address bar)
params={
    'wd':'中国'
}

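# Send a browser User-Agent; without one, Baidu may return a different page than a browser would get
headers={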
    'User-Agent':"Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36"
}
# Note the /s path: search requests go to http://www.baidu.com/s, not to the home page
response=requests.get("http://www.baidu.com/s",params=params,headers=headers)

# Save the result page, decoding the raw bytes as UTF-8
with open('baidu.html','w',encoding='utf-8') as fp:
    fp.write(response.content.decode('utf-8'))

Open baidu.html and you will see the search results for 中国:
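To see what params actually does, the sketch below builds the same request with requests.Request(...).prepare() but never sends it, so it runs offline. requests.get performs an equivalent preparation and then sends the request in one step. Printing prepared.url shows that the dictionary became the query string ?wd=..., with 中国 percent-encoded as UTF-8 bytes.

```python
import requests

# Same query parameters and User-Agent as in the example above
params = {'wd': '中国'}
headers = {
    'User-Agent': "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 "
                  "(KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36"
}

# Build the request object and prepare it, without sending anything over the network
prepared = requests.Request('GET', 'http://www.baidu.com/s',
                            params=params, headers=headers).prepare()

print(prepared.url)                    # http://www.baidu.com/s?wd=%E4%B8%AD%E5%9B%BD
print(prepared.headers['User-Agent'])  # the browser User-Agent we supplied
```

This is also a handy way to debug a scraper: if the page you get back is not what you expected, check the final URL before suspecting anything else.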

3. The difference between response.text and response.content

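response.text is a string: requests decodes response.content for you, using an encoding it infers from the response headers or guesses from the content. That inference is sometimes wrong, and the result is garbled text.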

response.content is the data as received from the network, with no character decoding applied; its type is bytes.

So the most common idiom is response.content.decode('utf-8'), which decodes the bytes explicitly when you know the page is UTF-8.
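The difference can be reproduced offline. This sketch hand-builds a Response object instead of fetching a page; setting the private attribute _content is an assumption made purely for illustration and is not how you would normally use requests. It simulates a wrong guess of ISO-8859-1 (the encoding requests assumes for text/* responses whose Content-Type header has no charset), then shows two fixes: decoding response.content yourself, or setting response.encoding before reading response.text.

```python
import requests

# Hand-built response for illustration only; _content is a private attribute
resp = requests.Response()
resp._content = '中国'.encode('utf-8')  # raw bytes, as response.content would hold

# Simulate a wrong encoding guess: UTF-8 bytes decoded as ISO-8859-1 become mojibake
resp.encoding = 'ISO-8859-1'
garbled = resp.text

# Fix 1: decode the raw bytes yourself
fixed_by_decode = resp.content.decode('utf-8')

# Fix 2: tell requests the correct encoding, then read .text again
resp.encoding = 'utf-8'
fixed_by_encoding = resp.text

print(type(resp.content))                 # <class 'bytes'>
print(repr(garbled))                      # garbled characters, not 中国
print(fixed_by_decode, fixed_by_encoding) # 中国 中国
```

Fix 2 is convenient when you want to keep using response.text throughout a script; Fix 1 is explicit and does not depend on any guessing.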

Posted on 2020-02-27 20:52 by 方木Fengl
