Python scraping notes
Making requests
URL parsing
from urllib import parse

url = "http://www.baidu.com/s?"
info = {"wd": "kidd"}
url = url + parse.urlencode(info)
print(url)  # http://www.baidu.com/s?wd=kidd
URL encoding and decoding
Why is this needed?
If a request contains special characters such as ?, =, /, or +, they can clash with the URL's own syntax. If you search directly with http://www.baidu.com/s?wd=/a+b=?/, the results will certainly differ from what you intended.
from urllib import parse

# Encode
url = "http://www.baidu.com/s?wd="
info = parse.quote("/a+b=?/")
url += info
print(url)
# http://www.baidu.com/s?wd=/a%2Bb%3D%3F/

# Decode
parse_url = parse.unquote(url)
print(parse_url)
# http://www.baidu.com/s?wd=/a+b=?/
requests doesn't seem to be able to do this; if it can, please let me know.
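For what it's worth, requests does appear to handle the encoding itself when you pass query parameters via its params argument (and requests.utils seems to re-export quote/unquote from urllib). A minimal sketch, not from the original notes:

import requests

# Preparing the request shows the encoded URL without actually sending it.
req = requests.Request("GET", "http://www.baidu.com/s", params={"wd": "/a+b=?/"})
prepared = req.prepare()
print(prepared.url)  # http://www.baidu.com/s?wd=%2Fa%2Bb%3D%3F%2F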
POST requests with requests
When data is not a dict
data = "name=kidd" response = requests.post("http://httpbin.org/post",data=data) print(response.text)
Response: the payload ends up under data
"{ "args": {}, "data": "name=kidd", "files": {}, "form": {}, "headers": { "Accept": "*/*", "Accept-Encoding": "gzip, deflate", "Content-Length": "9", "Host": "httpbin.org", "User-Agent": "python-requests/2.23.0", "X-Amzn-Trace-Id": "Root=1-5edeee36-d00dd8b083c14254ec60605a" }, "json": null, "origin": "39.77.220.193", "url": "http://httpbin.org/post" }"
When data is a dict
data = {"name":"kidd"} response = requests.post("http://httpbin.org/post",data=data) print(response.text)
Response: the payload ends up under form; the POST only counts as a successful form submission when the data appears under form
{ "args": {}, "data": "", "files": {}, "form": { "name": "kidd" }, "headers": { "Accept": "*/*", "Accept-Encoding": "gzip, deflate", "Content-Length": "9", "Content-Type": "application/x-www-form-urlencoded", "Host": "httpbin.org", "User-Agent": "python-requests/2.23.0", "X-Amzn-Trace-Id": "Root=1-5edeeee5-f0544530bbb1b22824acd930" }, "json": null, "origin": "39.77.220.193", "url": "http://httpbin.org/post" }