Python爬虫学习笔记9.28

由于内容定位是个人学习笔记,所以并不适合作为系统的学习材料!!!


请求

实现不同方式的请求

r = requests.get('https://www.baidu.com/')
r = requests.post('http://httpbin.org/post')
r = requests.put('http://httpbin.org/put')
r = requests.delete('http://httpbin.org/delete')
r = requests.head('http://httpbin.org/get')
r = requests.options('http://httpbin.org/get')

添加参数

import requests

data = {
    'name': 'germey',
    'age': 22
}
r = requests.get("http://httpbin.org/get", params=data)
print(r.text)

网页的返回类型实际上是 str 类型,但是它很特殊,是JSON格式的。所以,如果想直接解析返回结果,得到一个字典格式的话,可以直接调用 json() 方法。示例如下:

import requests

r = requests.get("http://httpbin.org/get")
print(type(r.text))
print(r.json())
print(type(r.json()))

抓取二进制文件

r = requests.get("https://github.com/favicon.ico")
print(r.content)

响应

import requests

r = requests.get('http://www.jianshu.com')
print(type(r.status_code), r.status_code)
print(type(r.headers), r.headers)
print(type(r.cookies), r.cookies)
print(type(r.url), r.url)
print(type(r.history), r.history)

文件上传

import requests

files = {'file': open('favicon.ico', 'rb')}
r = requests.post("http://httpbin.org/post", files=files)
print(r.text)
import requests

r = requests.get("https://www.baidu.com")
print(r.cookies)
for key, value in r.cookies.items():
    print(key + '=' + value)

设置Cookie

import requests

cookies = 'q_c1=31653b264a074fc9a57816d1ea93ed8b|1474273938000|1474273938000; d_c0="AGDAs254kAqPTr6NW1U3XTLFzKhMPQ6H_nc=|1474273938"; __utmv=51854390.100-1|2=registration_date=20130902=1^3=entry_date=20130902=1;a_t="2.0AACAfbwdAAAXAAAAso0QWAAAgH28HQAAAGDAs254kAoXAAAAYQJVTQ4FCVgA360us8BAklzLYNEHUd6kmHtRQX5a6hiZxKCynnycerLQ3gIkoJLOCQ==";z_c0=Mi4wQUFDQWZid2RBQUFBWU1DemJuaVFDaGNBQUFCaEFsVk5EZ1VKV0FEZnJTNnp3RUNTWE10ZzBRZFIzcVNZZTFGQmZn|1474887858|64b4d4234a21de774c42c837fe0b672fdb5763b0'
jar = requests.cookies.RequestsCookieJar()
headers = {
    'Host': 'www.zhihu.com',
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/53.0.2785.116 Safari/537.36'
}
for cookie in cookies.split(';'):
    key, value = cookie.split('=', 1)
    jar.set(key, value)
r = requests.get("http://www.zhihu.com", cookies=jar, headers=headers)
print(r.text)

Session

利用 Session,可以做到模拟同一个会话而不用担心 Cookies 的问题。

import requests

s = requests.Session()
s.get('http://httpbin.org/cookies/set/number/123456789')
r = s.get('http://httpbin.org/cookies')
print(r.text)

SSL证书验证

import requests

response = requests.get('https://www.12306.cn', verify=False)
print(response.status_code)

也可以指定一个本地证书用作客户端证书,这可以是单个文件(包含密钥和证书)或一个包含两个文件路径的元组。注意,本地私有证书的 key 必须是解密状态,加密状态的 key 是不支持的。

import requests

response = requests.get('https://www.12306.cn', cert=('/path/server.crt', '/path/key'))
print(response.status_code)

代理设置

import requests

proxies = {
  "http": "http://10.10.1.10:3128",
  "https": "http://10.10.1.10:1080",
}

requests.get("https://www.taobao.com", proxies=proxies)

身份认证

方式一:

import requestsfrom requests.auth import HTTPBasicAuthr = requests.get('http://localhost:5000', auth=HTTPBasicAuth('username', 'password'))print(r.status_code)

方式二:

import requests

r = requests.get('http://localhost:5000', auth=('username', 'password'))
print(r.status_code)

内容参考(摘抄)自Python3 网络爬虫开发实战

posted @ 2021-09-28 21:58  Rekord  阅读(34)  评论(0编辑  收藏  举报