Web scraping: the requests module
GET requests with the requests module
1. Import the package
import requests
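2. The three parameters of a GET request
requests.get(url, params=params, headers=headers)
url: the address to request
params: the query parameters carried by the GET request
headers: the request headers, used here to spoof the User-Agent (UA)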
    url = 'https://www.sogou.com/web'
    param = {
        'query': 'RMB'
    }
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.77 Safari/537.36'
    }
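To see what requests does with params, the request can be prepared without being sent. The sketch below is a minimal illustration that reuses the url and param values from this section. The shortened User-Agent string is a placeholder, not the one a real browser sends. `requests.Request(...).prepare()` builds the final request locally, so nothing goes over the network.

```python
import requests

url = 'https://www.sogou.com/web'
param = {'query': 'RMB'}
headers = {'User-Agent': 'Mozilla/5.0'}  # shortened placeholder UA, for illustration only

# Build the request locally; nothing is sent over the network.
prepared = requests.Request('GET', url, params=param, headers=headers).prepare()

print(prepared.url)                    # the query string is appended to the URL
print(prepared.headers['User-Agent'])  # the header that would be sent
```

Inspecting the prepared request is a quick way to confirm that the parameters ended up in the query string before any real traffic is generated.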
Example: a GET request with the requests module
    import requests

    wd = input('enter a word:')
    url = 'https://www.sogou.com/web'
    # Pack the query parameters
    param = {
        'query': wd
    }
    # Spoof the User-Agent so the request looks like it comes from a browser
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.77 Safari/537.36'
    }
    response = requests.get(url=url, params=param, headers=headers)
    # Set the response encoding manually
    response.encoding = 'utf-8'
    page_text = response.text
    fileName = wd + '.html'
    with open(fileName, 'w', encoding='utf-8') as fp:
        fp.write(page_text)
    print(fileName, 'scraped successfully!')
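Why set response.encoding by hand? When a text response does not declare a charset, requests falls back to ISO-8859-1, which garbles Chinese text. The sketch below shows the effect offline. It builds a Response object by hand and fills its private `_content` attribute, which is a testing shortcut and not part of the public API. The sample bytes are made up for the demonstration.

```python
import requests

# Build a Response by hand so the example runs offline.
# Setting the private _content attribute is a testing shortcut, not public API.
resp = requests.Response()
resp._content = '人民币'.encode('utf-8')

resp.encoding = 'ISO-8859-1'   # wrong guess: the UTF-8 bytes are decoded as Latin-1
garbled = resp.text

resp.encoding = 'utf-8'        # the same manual override used in the example above
fixed = resp.text

print(repr(garbled))
print(fixed)
```

Because `text` is decoded from the raw bytes each time it is read, changing `encoding` before reading `text` is enough; the request does not need to be repeated.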
POST requests with the requests module
1. Import the package
import requests
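2. The three parameters of a POST request
requests.post(url, data=data, headers=headers)
url: the address to request
data: the form parameters carried by the POST request
headers: the request headers, used here to spoof the User-Agent (UA)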
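Unlike params, the data dictionary is sent in the request body rather than in the URL. The sketch below prepares a POST request without sending it, to show where the fields end up. It uses the Baidu Translate suggestion URL from this article, and the word 'dog' is an arbitrary sample value.

```python
import requests

url = 'https://fanyi.baidu.com/sug'
data = {'kw': 'dog'}  # 'dog' is an arbitrary sample word

# Build the request locally; nothing is sent over the network.
prepared = requests.Request('POST', url, data=data).prepare()

print(prepared.url)                      # unchanged: no query string is added
print(prepared.body)                     # the form-encoded fields
print(prepared.headers['Content-Type'])  # set automatically for dict data
```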
Example: a POST request with the requests module
    # Scrape Baidu Translate
    import requests

    url = 'https://fanyi.baidu.com/sug'
    word = input('enter an English word:')
    # Pack the request parameters
    data = {
        'kw': word
    }
    # Spoof the User-Agent
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.77 Safari/537.36'
    }
    response = requests.post(url=url, data=data, headers=headers)
    # text returns a string; json() returns the parsed object
    obj_json = response.json()
    print(obj_json)
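The difference between text and json() can be checked offline. The sketch below fills a hand-built Response with a made-up JSON payload. The field names are invented for illustration and are not the documented layout of any real service. Setting `_content` is a testing shortcut, not public API.

```python
import json
import requests

# Hand-built Response so the example runs offline; _content is a private
# attribute used here only as a testing shortcut.
resp = requests.Response()
# Made-up payload for illustration; a real response may be laid out differently.
resp._content = b'{"errno": 0, "data": [{"k": "dog", "v": "n. a common pet"}]}'
resp.encoding = 'utf-8'

as_text = resp.text    # str: the raw JSON document
as_obj = resp.json()   # dict: the parsed JSON

print(type(as_text).__name__, type(as_obj).__name__)
print(as_obj['data'][0]['k'])
```

In other words, json() is equivalent to parsing text with `json.loads`, so it only makes sense when the server actually returns JSON.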
Example: using a POST request to fetch data loaded via AJAX
    # Scrape the locations of KFC restaurants in any given city
    # The data is loaded dynamically
    import requests

    city = input('enter a cityName:')
    url = 'http://www.kfc.com.cn/kfccda/ashx/GetStoreList.ashx?op=keyword'
    data = {
        "cname": "",
        "pid": "",
        "keyword": city,
        "pageIndex": "2",   # page number to fetch
        "pageSize": "10",   # results per page
    }
    # Spoof the User-Agent
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.77 Safari/537.36'
    }
    response = requests.post(url=url, headers=headers, data=data)
    json_text = response.text
    print(json_text)
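The example above fetches a single page. To walk through several pages, only pageIndex needs to change. The sketch below builds the form fields for each page with a hypothetical helper, `build_payload`, which is not part of requests. It assumes that pages are numbered from 1, which the source does not state.

```python
def build_payload(city, page_index, page_size=10):
    """Return the form fields for one page of results (hypothetical helper)."""
    return {
        'cname': '',
        'pid': '',
        'keyword': city,
        'pageIndex': str(page_index),
        'pageSize': str(page_size),
    }

# Payloads for the first three pages of a sample city.
# Assumption: page numbering starts at 1.
payloads = [build_payload('北京', page) for page in range(1, 4)]
for p in payloads:
    print(p['pageIndex'], p['keyword'])
```

Each payload would then be passed as data to requests.post with the same url and headers as in the example.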
Exercises
1. Scrape more movie detail data from Douban Movies: https://movie.douban.com/typerank?type_name=%E5%8A%A8%E4%BD%9C&type=5&interval_id=100:90&action=
2. Scrape the detail data of every company listed at http://125.35.6.84:81/xk/
    # Scraping Douban
    import requests

    url = 'https://movie.douban.com/j/chart/top_list'
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.77 Safari/537.36'
    }
    for i in range(0, 1000000, 20):
        data = {
            'type': '11',
            'interval_id': '100:90',
            'action': '',
            'start': str(i),   # offset of the first record on this page
            'limit': '20'      # records per page
        }
        res = requests.post(url=url, headers=headers, data=data).json()
        # An empty result means there is nothing left to fetch, so stop early
        if not res:
            break
        print(res)
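The loop above advances start in steps of 20. One way to stop as soon as the data runs out, rather than counting up to a fixed upper bound, is to wrap the paging in a generator. The sketch below is a minimal illustration. `iter_pages` and `fake_fetch` are hypothetical names, and the network call is replaced by a local stand-in so the example runs offline.

```python
def iter_pages(fetch_page, step=20, max_start=1000):
    """Yield non-empty pages until fetch_page returns an empty result."""
    for start in range(0, max_start, step):
        page = fetch_page(start, step)
        if not page:
            break
        yield page

# Stand-in for the network call: slices a local list of 45 fake records.
fake_records = list(range(45))

def fake_fetch(start, limit):
    return fake_records[start:start + limit]

pages = list(iter_pages(fake_fetch))
print([len(p) for p in pages])  # prints [20, 20, 5]
```

In a real scraper, `fetch_page` would issue the request for the given start and limit and return the parsed JSON list.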