One example to explain requests, the HTTP request library for web scraping
With requests you can simulate browser requests. Compared with urllib, which we used before, the requests API is much more convenient (under the hood it is essentially a wrapper around urllib3).
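To see the difference in convenience, here is a sketch of building the same GET request with the standard library's urllib (the URL and User-Agent string are illustrative, not from a real crawl). Neither snippet is sent over the network; we only construct the request object:

```python
import urllib.request

url = "https://tieba.baidu.com/f?kw=test"
ua = "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"

# urllib: build a Request object by hand and set the headers yourself
req = urllib.request.Request(url, headers={"User-Agent": ua})
print(req.full_url)                  # the URL we passed in
print(req.get_header("User-agent"))  # urllib capitalizes header keys this way

# requests: the equivalent is a single call (assuming requests is installed):
#   response = requests.get(url, headers={"User-Agent": ua})
```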
1. Installation
pip install requests
2. Example
from random import choice
import requests

user_agents = [
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_0) AppleWebKit/535.11 (KHTML, like Gecko) Chrome/17.0.963.56 Safari/535.11",
    "Opera/9.80 (Macintosh; Intel Mac OS X 10.6.8; U; en) Presto/2.8.131 Version/11.11",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.103 Safari/537.36",
]
headers = {"User-Agent": choice(user_agents)}

# Send a GET request
def getHttp():
    url = "https://tieba.baidu.com/f?"
    params = {"kw": "尚学堂"}
    response = requests.get(url, headers=headers, params=params)
    print(response.text)

# Send a POST request
def postHttp():
    url = "https://login.taobao.com/newlogin/login.do?appName=taobao&fromSite=0"
    data = {"loginId": "username", "password2": "password"}
    response = requests.post(url, headers=headers, data=data)
    print(response.text)

# Use a proxy
def proxyHttp():
    url = "http://httpbin.org/get"
    proxies = {"http": "http://101.37.118.54:8888"}
    response = requests.get(url, headers=headers, proxies=proxies)
    print(response.text)

# Fetch an HTTPS page (verify=False skips certificate verification)
def getHttps():
    url = "https://www.12306.cn/index/"
    response = requests.get(url, verify=False, headers=headers)
    response.encoding = "utf-8"
    print(response.text)

if __name__ == '__main__':
    # getHttp()    # send a GET request
    # postHttp()   # send a POST request
    # proxyHttp()  # use a proxy
    getHttps()     # fetch an HTTPS page
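Note that in the example above, choice(user_agents) runs once when the module loads, so every request reuses the same User-Agent. A small sketch (the helper name random_headers is my own, not from the original code) that picks a fresh one per call:

```python
from random import choice

user_agents = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.103 Safari/537.36",
    "Opera/9.80 (Macintosh; Intel Mac OS X 10.6.8; U; en) Presto/2.8.131 Version/11.11",
]

def random_headers():
    # Pick a fresh User-Agent on every call instead of once at import time
    return {"User-Agent": choice(user_agents)}

# Each request would then pass headers=random_headers()
print(random_headers()["User-Agent"] in user_agents)  # True
```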
3. Notes
requests.get sends a GET request
requests.post sends a POST request
headers carries the request headers
params passes GET query-string parameters
data passes POST form parameters
proxies routes the request through a proxy
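To see what params does under the hood: the query string is just URL-encoded key/value pairs, which you can reproduce with the standard library's urllib.parse.urlencode (a sketch of the idea, not requests' internal code; data for a POST body is form-encoded the same way):

```python
from urllib.parse import urlencode

params = {"kw": "尚学堂"}
query = urlencode(params)  # non-ASCII values are percent-encoded as UTF-8
url = "https://tieba.baidu.com/f?" + query
print(query)  # kw=%E5%B0%9A%E5%AD%A6%E5%A0%82
print(url)
```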