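Web Crawler Basics
urllib urlopen
How the basic library calls differ
Calling urlopen directly with a URL cannot attach complex header information; to send custom headers, build a Request object and pass it to urlopen.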
from urllib import request, parse

# Option 1 (left commented out): pass a headers dict to the Request constructor.
# url = 'http://httpbin.org/post'
# headers = {
#     "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.121 Safari/537.36",
#     "Host": "httpbin.org"
# }
# form = {
#     'name': 'Germey'
# }
# data = bytes(parse.urlencode(form), encoding='utf-8')
# req = request.Request(url=url, data=data, headers=headers, method='POST')
# response = request.urlopen(req)
# print(response.read().decode('utf-8'))

# Option 2: headers can also be added one at a time with the add_header method.
url = 'http://httpbin.org/post'
# Form fields to send; named form rather than dict to avoid shadowing the built-in type.
form = {
    'name': 'Germey'
}
# The POST body must be bytes, so url-encode the form and then encode it as UTF-8.
data = bytes(parse.urlencode(form), encoding='utf-8')
req = request.Request(url=url, data=data, method='POST')
req.add_header("User-Agent", "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.121 Safari/537.36")
response = request.urlopen(req)
print(response.read().decode('utf-8'))
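
To check what a Request object will send before calling urlopen, it can be inspected offline. The following is a minimal sketch: the helper name build_post_request and the shortened User-Agent string are assumptions made for illustration and are not part of the original notes.

```python
from urllib import parse, request


def build_post_request(url, form, user_agent):
    """Return an unsent POST Request carrying url-encoded form data.

    Hypothetical helper for illustration; nothing is sent over the network.
    """
    data = parse.urlencode(form).encode("utf-8")
    req = request.Request(url=url, data=data, method="POST")
    req.add_header("User-Agent", user_agent)
    return req


req = build_post_request(
    "http://httpbin.org/post",
    {"name": "Germey"},
    "Mozilla/5.0 (example)",  # placeholder value, not a real browser string
)

print(req.get_method())    # HTTP method that urlopen would use
print(req.data)            # encoded request body as bytes
print(req.header_items())  # urllib stores the header key as 'User-agent'
```

Note that add_header capitalizes the key, so the stored name is 'User-agent'; get_header must be called with that exact spelling.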
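The basic library is cumbersome to use: adding request headers, sending request data, configuring proxies, and setting cookies all take extra work, so the third-party Requests library is usually the more convenient choice.
Install: pip3 install requests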
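
For comparison, the same POST request could be expressed with Requests roughly as follows. This is a sketch that assumes the requests package is installed. The User-Agent value is a placeholder, and the request is only prepared, not sent, so it can be inspected without network access.

```python
import requests

url = "http://httpbin.org/post"
headers = {"User-Agent": "Mozilla/5.0 (example)"}  # placeholder value
form = {"name": "Germey"}

# Build and prepare the request without sending it.
# Requests url-encodes the form dict and sets the Content-Type header itself.
prepared = requests.Request("POST", url, data=form, headers=headers).prepare()

print(prepared.method)
print(prepared.body)
print(prepared.headers["Content-Type"])

# To actually send it (requires network access), either line would work:
# response = requests.Session().send(prepared, timeout=10)
# response = requests.post(url, data=form, headers=headers, timeout=10)
# print(response.text)
```

Here the form data and headers are passed as plain dicts, with no manual urlencode or bytes conversion.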