Spider -- crawler request module: requests

1. Installation

Linux

```
sudo pip3 install requests
```

Windows

```
# Open a cmd command prompt, then run:
python -m pip install requests
```

2. Usage 1: requests.get()

Purpose

```
# Send a request to the website and get the response object
res = requests.get(url, headers=headers)
```

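Parameters

1. url: the URL to fetch
2. headers: the request headers (a dict)
3. timeout: timeout in seconds; if connecting to the server or waiting for its response takes longer than this, an exception (requests.exceptions.Timeout) is raised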
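A minimal sketch of handling the timeout parameter, assuming that a timed-out request should simply yield None. The helper name `fetch` and the "silent" local socket (it accepts connections but never answers, standing in for a slow server) are hypothetical. Catching requests.exceptions.Timeout covers both ConnectTimeout and ReadTimeout, since both derive from it.

```python
import socket

import requests


def fetch(url, timeout=3):
    """Hypothetical helper: return the HTTP status code, or None on timeout."""
    try:
        res = requests.get(url, headers={'User-Agent': 'Mozilla/5.0'}, timeout=timeout)
        return res.status_code
    except requests.exceptions.Timeout:
        # Base class of both ConnectTimeout and ReadTimeout
        return None


# Hypothetical stand-in for a slow server: a socket that listens but never replies.
# The OS completes the TCP handshake, so the request is sent and then waits for a
# response that never arrives, which ends in a ReadTimeout.
silent = socket.socket()
silent.bind(('127.0.0.1', 0))  # port 0: let the OS choose a free port
silent.listen(1)
port = silent.getsockname()[1]

result = fetch(f'http://127.0.0.1:{port}/', timeout=0.5)
print(result)  # None
silent.close()
```

timeout also accepts a (connect, read) tuple, e.g. timeout=(3, 10), when the two phases need different limits.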

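Attributes of the response object (res)

1. encoding: the character encoding used to decode the response body; it can be set explicitly, e.g.
   res.encoding = 'utf-8'
2. text: the response body decoded as a string (str)
3. content: the raw response body as bytes
4. status_code: the HTTP status code
5. url: the URL the response actually came from (after any redirects)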
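To look at these attributes without depending on a real website, the sketch below starts a throwaway local server (the handler class and its reply text are hypothetical) that sends UTF-8 text but leaves the charset out of its Content-Type header. For a text/* response without a charset, requests guesses ISO-8859-1, which is exactly the situation where setting res.encoding = 'utf-8' before reading res.text matters.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

import requests


class HelloHandler(BaseHTTPRequestHandler):
    """Hypothetical toy handler: UTF-8 text, but no charset in Content-Type."""

    def do_GET(self):
        body = '你好, requests'.encode('utf-8')
        self.send_response(200)
        self.send_header('Content-Type', 'text/plain')  # no "; charset=utf-8" on purpose
        self.send_header('Content-Length', str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the console quiet


server = HTTPServer(('127.0.0.1', 0), HelloHandler)  # port 0: let the OS pick a free port
threading.Thread(target=server.serve_forever, daemon=True).start()
url = f'http://127.0.0.1:{server.server_port}/page'

res = requests.get(url, headers={'User-Agent': 'Mozilla/5.0'}, timeout=3)
print(res.status_code)  # 200
print(res.url)          # the URL the response came from
print(res.encoding)     # ISO-8859-1: requests' guess for text/* without a charset
print(res.content)      # raw bytes exactly as received
res.encoding = 'utf-8'  # override the wrong guess before reading .text
print(res.text)         # 你好, requests

server.shutdown()
server.server_close()
```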

Example

```
import requests

url = 'https://timgsa.baidu.com/timg?image&quality=80&size=b9999_10000&sec=1567090051520&di=77e8b97b3280f999cf51340af4315b4b&imgtype=jpg&src=http%3A%2F%2F5b0988e595225.cdn.sohucs.com%2Fimages%2F20171121%2F4e6759d153d04c6badbb0a5262ec103d.jpeg'
headers = {'User-Agent': 'Mozilla/5.0'}

# .content returns the raw bytes, which is what an image file needs
img_bytes = requests.get(url=url, headers=headers, timeout=3).content
# Write the bytes in binary mode ('wb')
with open('花千骨.jpg', 'wb') as f:
    f.write(img_bytes)
```

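3. Usage 2: the params parameter of requests.get()

Parameter type:

A dict whose key-value pairs are used as the query-string parameters.

Usage:

1. res = requests.get(url, params=params, headers=headers)
2. Characteristics:
   a) url is the base URL, without the query parameters
   b) requests automatically URL-encodes the params dict and appends it to url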
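One way to check what requests builds from params, without sending anything, is to prepare the request and read its url. This sketch uses the same base URL and dict as the Tieba example in this section; the comparison with urllib.parse.urlencode is only a cross-check of the encoding.

```python
from urllib.parse import urlencode

import requests

baseurl = 'http://tieba.baidu.com/f?'
params = {'kw': '赵丽颖吧', 'pn': '50'}

# Prepare the request without sending it, just to inspect the final URL
prepared = requests.Request('GET', baseurl, params=params).prepare()
print(prepared.url)  # base URL + UTF-8 percent-encoded query string

# Cross-check: the query string matches urllib's own encoding of the dict
print(prepared.url == baseurl + urlencode(params))  # True
```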

Example:

```
import requests

baseurl = 'http://tieba.baidu.com/f?'
params = {
  'kw' : '赵丽颖吧',
  'pn' : '50'
}
headers = {'User-Agent' : 'Mozilla/4.0'}
# params is encoded automatically and appended to baseurl before the request is sent
res = requests.get(url=baseurl, params=params, headers=headers, timeout=3)
res.encoding = 'utf-8'
print(res.text)
```

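4. Usage 3: the web client authentication parameter of requests.get(): auth

Purpose and type

1. For websites that require the client to authenticate with a username and password (HTTP Basic authentication)
2. auth = ('username', 'password')

```
res = requests.get(
    url=url,
    params=params,
    auth=auth,
    headers=headers,
    timeout=3
)
```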
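Passing a (username, password) tuple as auth is shorthand for requests.auth.HTTPBasicAuth. The sketch below prepares, without sending, a request to a hypothetical URL with placeholder credentials to show the Authorization header that results. Basic authentication only Base64-encodes the credentials, it does not encrypt them, so it should be used over HTTPS.

```python
import requests
from requests.auth import HTTPBasicAuth

url = 'http://example.com/protected'  # hypothetical URL, never contacted here
auth = ('username', 'password')        # placeholder credentials

# Prepare (but do not send) the request to see what auth= adds
prepared = requests.Request('GET', url, auth=auth).prepare()
print(prepared.headers['Authorization'])  # Basic dXNlcm5hbWU6cGFzc3dvcmQ=

# The tuple form is equivalent to an explicit HTTPBasicAuth object
explicit = requests.Request('GET', url, auth=HTTPBasicAuth('username', 'password')).prepare()
print(explicit.headers['Authorization'] == prepared.headers['Authorization'])  # True
```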

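5. Usage 4: the SSL certificate verification parameter: verify

Applicable websites and scenarios:

1. Websites: HTTPS sites whose certificate was not issued by a trusted certificate authority (for example, a self-signed certificate)
2. Scenario: consider this parameter when a request raises an SSLError exception

Parameter values:

1. verify=True (default): verify the certificate
2. verify=False (commonly used): skip certificate verification; requests (through urllib3) then emits an InsecureRequestWarning

```
# Example
response = requests.get(
    url=url,
    params=params,
    headers=headers,
    verify=False
)
```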
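The advice above (try normally, fall back to verify=False only when an SSLError appears) can be sketched as a small helper. The helper name is hypothetical, and it assumes you trust the site: verify=False removes protection against man-in-the-middle attacks.

```python
import requests
from requests.exceptions import SSLError


def get_with_cert_fallback(url, **kwargs):
    """Hypothetical helper: GET with certificate checking, retry once without it on SSLError.

    Only for sites you trust: verify=False disables protection against
    man-in-the-middle attacks, and urllib3 emits an InsecureRequestWarning.
    """
    kwargs.pop('verify', None)  # this helper decides verify itself
    try:
        return requests.get(url, **kwargs)  # verify=True is the default
    except SSLError:
        return requests.get(url, verify=False, **kwargs)
```

Usage would look like get_with_cert_fallback(url, headers=headers, timeout=3). If the warning is too noisy, urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning) silences it, at the cost of hiding that verification is off.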

6. Usage 5: requests.post()

Applicable scenario:

Websites that expect POST requests

Parameter: data

```
response = requests.post(url, data=data, headers=headers)
# data: the POST data (form data, as a dict)
```
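A minimal round-trip sketch, assuming a throwaway local server instead of a real site: the handler class, the /login path and the form fields are all hypothetical. The server decodes the body it receives and echoes it back as JSON, which shows that data= arrives as an application/x-www-form-urlencoded request body.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs

import requests


class EchoFormHandler(BaseHTTPRequestHandler):
    """Hypothetical toy server: decode a form-encoded POST body, echo it as JSON."""

    def do_POST(self):
        length = int(self.headers['Content-Length'])
        raw = self.rfile.read(length).decode('utf-8')
        reply = {
            'content_type': self.headers['Content-Type'],
            'form': {k: v[0] for k, v in parse_qs(raw).items()},
        }
        body = json.dumps(reply).encode('utf-8')
        self.send_response(200)
        self.send_header('Content-Type', 'application/json')
        self.send_header('Content-Length', str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the console quiet


server = HTTPServer(('127.0.0.1', 0), EchoFormHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = f'http://127.0.0.1:{server.server_port}/login'

data = {'username': 'tom', 'page': '1'}  # hypothetical form fields
res = requests.post(url, data=data, headers={'User-Agent': 'Mozilla/5.0'}, timeout=3)
print(res.json())
# {'content_type': 'application/x-www-form-urlencoded', 'form': {'username': 'tom', 'page': '1'}}

server.shutdown()
server.server_close()
```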

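Characteristics of the two request methods (in general):

1. GET request: the parameters appear in the URL
2. POST request: the data is submitted as a form, in the request body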
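To make the contrast concrete, the sketch below prepares (without sending) a GET and a POST request carrying the same hypothetical fields to a hypothetical URL, then compares where the data ends up.

```python
import requests

url = 'http://example.com/search'     # hypothetical URL, never contacted here
fields = {'kw': 'python', 'pn': '0'}  # hypothetical parameters

get_req = requests.Request('GET', url, params=fields).prepare()
post_req = requests.Request('POST', url, data=fields).prepare()

# GET: fields are encoded into the URL; there is no body
print(get_req.url, get_req.body)    # http://example.com/search?kw=python&pn=0 None
# POST: the URL is unchanged; fields are form-encoded into the body
print(post_req.url, post_req.body)  # http://example.com/search kw=python&pn=0
```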

posted @ 2020-04-06 10:03 Be-myself
