Python requests Module Example Code
The Python requests module is a simple, elegant HTTP library for sending HTTP requests, receiving responses, and extracting the needed information from them. The request URL and its parameters can usually be found in the browser's Developer Tools (F12), under the Network tab with the Fetch/XHR filter applied. This article collects example snippets for the requests module; for an introductory tutorial, see Python requests 模块 - RUNOOB and Quickstart - Python requests documentation. The examples follow:
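Before the examples, here is a minimal offline sketch (no network access needed) of how requests turns a `params` dict into the final query string, using `requests.Request` and `prepare()`:

```python
import requests

# Build a GET request without sending it; prepare() applies the same
# URL and query-string encoding that requests.get() uses internally.
req = requests.Request('GET', 'https://www.sogou.com/web',
                       params={'query': 'python'}).prepare()
print(req.method)  # GET
print(req.url)     # https://www.sogou.com/web?query=python
```

This is also a convenient way to inspect exactly what will be sent before issuing a real request.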
01. Sogou search data
import requests
headers = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/109.0.0.0 Safari/537.36' }
url = 'https://www.sogou.com/web'
kw = input('Enter a keyword:')
params = {'query': kw}
r = requests.get(url=url, headers=headers, params=params)
page_text = r.text
with open('sogou.html', 'w', encoding='utf-8') as fp:
    fp.write(page_text)
print('Request URL: ', r.url)
print('Request method: ', r.request.method)
print('Response status: ', r.status_code)
print('Over')
The output is shown in the figure below.
02. Baidu Translate
import requests
import json
headers = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/109.0.0.0 Safari/537.36'}
kw = input('Enter a keyword:')
data = {'kw': kw}
url = 'https://fanyi.baidu.com/sug'
r = requests.post(url=url, headers=headers, data=data)
json_data = r.json()
with open('baidu-fanyi.json', 'w', encoding='utf-8') as fp:
    json.dump(json_data, fp=fp, ensure_ascii=False)
print('Request URL: ', r.url)
print('Request method: ', r.request.method)
print('Response json data: ', json_data)
print('Over')
The output is shown in the figure below.
03. Douban movie chart
import requests
import json
headers = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/109.0.0.0 Safari/537.36'}
params = {
    # base params
    'interval_id': '100:90',
    'action': '',
    # other params
    'type': '24',   # movie type
    'start': '0',   # start index
    'limit': '5',   # number of movies returned
}
url = 'https://movie.douban.com/j/chart/top_list'
r = requests.get(url=url, headers=headers, params=params)
json_data = r.json()
with open('douban-movie-toplist.json', 'w', encoding='utf-8') as fp:
    json.dump(json_data, fp=fp, ensure_ascii=False)
print('Request URL: ', r.url)
print('Request method: ', r.request.method)
print('Response json data: ', json_data)
print('Over')
The output is shown in the figure below.
04. KFC store information
import requests
import json
cityname, kw = '北京', '中关村'
headers = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/109.0.0.0 Safari/537.36'}
data = {
    'cname': cityname,
    'pid': '',
    'keyword': kw,
    'pageIndex': '1',
    'pageSize': '10',
}
url = 'http://www.kfc.com.cn/kfccda/ashx/GetStoreList.ashx?op=keyword'
r = requests.post(url=url, headers=headers, data=data)
json_data = r.json()
with open('KFC-storelist.json', 'w', encoding='utf-8') as fp:
    json.dump(json_data, fp=fp, ensure_ascii=False)
print('Request URL: ', r.url)
print('Request method: ', r.request.method)
print('Response json data: ', json_data)
print('Over')
The output is shown in the figure below.
05. Sina and Tencent real-time stock quotes
import requests
stocklist = ['sh600000','sz000001']
keystr = ','.join(stocklist)
# Get sina stock spot data
print('=' * 30, 'sina', '='*30)
headers = {'referer': 'https://finance.sina.com.cn'}
url = 'https://hq.sinajs.cn/list=%s' % keystr
r = requests.get(url=url, headers=headers)
page_text = r.text
print('Request URL: ', r.url)
print('Request method: ', r.request.method)
print('Response text data: ')
print(page_text)
# Get tencent stock spot data
print('=' * 30, 'tencent', '='*30)
url = 'https://qt.gtimg.cn/q=%s' % keystr
r = requests.get(url=url)
page_text = r.text
print('Request URL: ', r.url)
print('Request method: ', r.request.method)
print('Response text data: ')
print(page_text)
print('Over')
The output is shown in the figure below.
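The Sina endpoint returns JavaScript-style assignments such as `var hq_str_sh600000="...";`, where the quoted part is a comma-separated field list (name, open, previous close, current price, ... following the widely documented but unofficial field order, which is an assumption here). A small parsing sketch over a hard-coded sample line:

```python
# Sample line in the format returned by https://hq.sinajs.cn/list=...
# (field meanings assumed from the commonly documented layout).
sample = 'var hq_str_sh600000="浦发银行,7.20,7.19,7.25,7.30,7.15";'

code = sample.split('hq_str_')[1].split('=')[0]   # stock code, e.g. sh600000
fields = sample.split('"')[1].split(',')          # comma-separated values
name, open_price, prev_close, current = fields[0], fields[1], fields[2], fields[3]
print(code, name, current)  # sh600000 浦发银行 7.25
```

The same split-based approach works for the Tencent format, though its field order differs.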
06. Eastmoney stock popularity ranking (top 100)
import requests
payload = {
    'appId': 'appId01',
    'globalId': '786e4c21-70dc-435a-93bb-38',
    'marketType': '',
    'pageNo': 1,
    'pageSize': 100,
}
url = 'https://emappdata.eastmoney.com/stockrank/getAllCurrentList'
r = requests.post(url, json=payload)
json_data = r.json()
print('Request URL: ', r.url)
print('Request method: ', r.request.method)
print('Response json data: ', json_data)
print('Over')
The output is shown in the figure below.
07. Xueqiu SPSIOP stock price
import requests
headers = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/109.0.0.0 Safari/537.36'}
params = {'symbol':'.SPSIOP', 'detail':'extend'}
url = 'https://stock.xueqiu.com/v5/stock/quote.json'
# 1. Create Session instance to get cookie automatically
session = requests.Session()
# 2. Get xueqiu.com cookie
session.get('https://xueqiu.com', headers=headers)
# 3. Get request with the cookie
r = session.get(url, headers=headers, params=params)
json_data = r.json()
print('Request URL: ', r.url)
print('Request method: ', r.request.method)
print('Response json data: ', json_data)
print('Over')
Note: xueqiu.com requires a cookie to access the target page; without one, the server returns the error message '遇到错误,请刷新页面或者重新登录帐号后再试' (roughly: "An error occurred; please refresh the page or log in again and retry"). Therefore a requests.Session object is used: first visit the Xueqiu homepage (https://xueqiu.com), which obtains the cookie and stores it automatically, then request the target URL (https://stock.xueqiu.com/v5/stock/quote.json) to get the desired data.
The output is shown in the figure below.
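The Session mechanics described above can be illustrated offline: cookies stored on a Session (whether set by a server response or manually, as simulated here) are automatically attached to later requests whose domain matches. The cookie name `xq_a_token` below is an illustrative assumption, not taken from the official API:

```python
import requests

s = requests.Session()
# Simulate the cookie that the first GET to https://xueqiu.com would set
# ('xq_a_token' is a placeholder cookie name used for illustration).
s.cookies.set('xq_a_token', 'dummy-token', domain='.xueqiu.com')

# Any request built through this session to a matching domain carries the
# cookie automatically; shown here without sending anything over the network.
req = requests.Request('GET', 'https://stock.xueqiu.com/v5/stock/quote.json',
                       params={'symbol': '.SPSIOP'})
prepared = s.prepare_request(req)
print(prepared.headers.get('Cookie'))  # xq_a_token=dummy-token
```

Because the cookie domain `.xueqiu.com` matches the subdomain `stock.xueqiu.com`, the Cookie header is attached without any extra code in the second request.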
Supplement 1 (updated 2023-06-24)
Jupyter notebook source code for this article: https://github.com/klchang/python-requests-examples