The requests module

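# HTTP request methods (requests.get() and requests.post() are the ones used most often):
>>> import requests
>>> r = requests.get('https://api.github.com/events')                     # GET: request a page and return its content
>>> r = requests.post('http://httpbin.org/post', data={'key': 'value'})  # POST: mostly used to submit forms or upload files; the data travels in the request body
>>> r = requests.put('http://httpbin.org/put', data={'key': 'value'})    # PUT: data sent by the client replaces the content of the specified document
>>> r = requests.delete('http://httpbin.org/delete')                     # DELETE: ask the server to delete the specified page
>>> r = requests.head('http://httpbin.org/get')                          # HEAD: like GET, but the response has no body; used to fetch the headers
>>> r = requests.options('http://httpbin.org/get')                       # OPTIONS: ask the server which capabilities (e.g. methods) it supports
# CONNECT: use the server as a relay (tunnel) so it accesses other pages on the client's behalf.
#          requests has no connect() helper; the underlying urllib3 sends CONNECT itself when tunnelling HTTPS through a proxy.
>>> r = requests.request('TRACE', 'http://httpbin.org/get')              # TRACE: the server echoes back the request it received; mainly for testing or diagnostics (there is no trace() helper)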
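Each helper above is a shortcut for `requests.request(method, url, ...)`, which is why methods without a helper, such as TRACE, go through `requests.request`. A minimal sketch that builds the requests with `requests.Request(...).prepare()` so you can inspect them without sending anything (the `/anything` URL is just an illustrative choice):

```python
import requests

methods = ["GET", "POST", "PUT", "DELETE", "HEAD", "OPTIONS", "TRACE"]

# Prepare (but do not send) one request per method; prepare() upper-cases the method name.
prepared = {m: requests.Request(m, "http://httpbin.org/anything").prepare() for m in methods}

for method, p in prepared.items():
    print(p.method, p.url)

# To actually send one, you would call e.g. requests.request("TRACE", "http://httpbin.org/anything").
```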
 
    (1) How the Content-Type header relates to the way POST data is submitted:
        application/x-www-form-urlencoded   form data
        multipart/form-data                 form file uploads
        application/json                    serialized JSON data
        text/xml                            XML data
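In requests, the argument you pass picks the Content-Type: `data=` with a dict gives a form-encoded body, `json=` gives JSON, and `files=` gives a multipart upload; for XML you send a string and set the header yourself. A sketch that only prepares the requests so it runs offline (the file name and XML payload are made-up examples):

```python
import requests

url = "http://httpbin.org/post"  # nothing is sent; the requests are only prepared

form = requests.Request("POST", url, data={"key": "value"}).prepare()
as_json = requests.Request("POST", url, json={"key": "value"}).prepare()
upload = requests.Request("POST", url, files={"file": ("a.txt", b"hello")}).prepare()
xml = requests.Request(
    "POST", url,
    data="<key>value</key>",              # a plain string body: requests adds no Content-Type...
    headers={"Content-Type": "text/xml"},  # ...so we set it explicitly
).prepare()

for name, p in [("form", form), ("json", as_json), ("upload", upload), ("xml", xml)]:
    print(name, p.headers["Content-Type"])
```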
 
 
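        headers: requests to most websites typically carry the following 7 headers
        {
        Host
        Connection
        Accept
        User-Agent
        Referer            (the standard HTTP spelling of this header, not "Referrer")
        Accept-Encoding
        Accept-Language
        }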
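requests already sends several of these headers on its own. This sketch shows a Session's default headers and merges in two more. The Referer and Accept-Language values and the example.com URL are hypothetical. Host does not appear in the prepared headers; the underlying HTTP library fills it in from the URL when the request is actually sent.

```python
import requests

session = requests.Session()
# Defaults added by requests: User-Agent, Accept-Encoding, Accept, Connection
print(dict(session.headers))

# Hypothetical values for two of the remaining headers in the list above
req = requests.Request(
    "GET",
    "https://www.example.com/page",
    headers={
        "Referer": "https://www.example.com/",
        "Accept-Language": "zh-CN,zh;q=0.9,en;q=0.8",
    },
)
prepared = session.prepare_request(req)  # session defaults + per-request headers, nothing sent
for name, value in prepared.headers.items():
    print(name, ":", value)
```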
 
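headers:
    import requests
    from fake_useragent import UserAgent    # third-party package: pip install fake-useragent

    ua = UserAgent()
    headers = {'User-Agent': ua.random}      # a random real-browser User-Agent string
    res = requests.get('http://httpbin.org/headers', headers=headers)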
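If you would rather not depend on fake_useragent, you can rotate through your own pool. This is a minimal sketch: the `USER_AGENTS` strings and the `random_headers` helper are hypothetical examples, not from the original. Replace them with current browser strings.

```python
import random

import requests

# Hypothetical sample pool of browser User-Agent strings
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.0 Safari/605.1.15",
    "Mozilla/5.0 (X11; Linux x86_64; rv:121.0) Gecko/20100101 Firefox/121.0",
]


def random_headers():
    """Return a headers dict with a randomly chosen User-Agent (hypothetical helper)."""
    return {"User-Agent": random.choice(USER_AGENTS)}


# Prepare (without sending) to see which User-Agent would go out
prepared = requests.Request("GET", "http://httpbin.org/headers", headers=random_headers()).prepare()
print(prepared.headers["User-Agent"])
```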
     
data:
    requests.post(url, data=data)   # data: e.g. a dict, which is form-encoded into the request body
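`data=` goes into the request body, while `params=` goes into the URL query string. A list value is sent as a repeated key. This sketch prepares a request offline to show both; the `page`/`tag` names are made-up examples.

```python
import requests

req = requests.Request(
    "POST",
    "http://httpbin.org/post",
    params={"page": 1},                        # encoded into the URL query string
    data={"key": "value", "tag": ["a", "b"]},  # form-encoded into the body; lists repeat the key
)
prepared = req.prepare()
print(prepared.url)   # http://httpbin.org/post?page=1
print(prepared.body)  # key=value&tag=a&tag=b
```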
 
         
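Parsing JSON:
    res = requests.get("http://httpbin.org/get")   # the site root returns HTML, so use a JSON endpoint
    res.json()
    # decodes the JSON body into Python objects (dict/list), not a string; raises an error if the body is not JSON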
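To show what `res.json()` returns without touching the network, this sketch builds `Response` objects by hand. Setting the private `_content` attribute is a testing shortcut only; real code gets `res` from `requests.get()`. A non-JSON body raises an error. In requests 2.27+ that error is `requests.exceptions.JSONDecodeError`, a `ValueError` subclass, so catching `ValueError` also works on older versions.

```python
import requests


def fake_response(body: bytes) -> requests.models.Response:
    """Build a Response offline (testing shortcut: sets the private _content attribute)."""
    res = requests.models.Response()
    res.status_code = 200
    res.encoding = "utf-8"
    res._content = body
    return res


res = fake_response(b'{"args": {}, "url": "http://httpbin.org/get"}')
data = res.json()  # a Python dict, not a JSON string
print(type(data).__name__, data["url"])

bad = fake_response(b"<html>not json</html>")
try:
    bad.json()
except ValueError as exc:  # JSONDecodeError in requests >= 2.27 subclasses ValueError
    print("not JSON:", exc)
```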
 
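Response attributes:
    res.text        # body decoded to str (using res.encoding), e.g. the HTML
    res.content     # body as raw bytes
    res.status_code # status code, int
    res.headers     # response headers, a case-insensitive dict (CaseInsensitiveDict)
    res.cookies     # cookies, a dict-like RequestsCookieJar (res.cookies.get_dict() gives a plain dict)
    res.url         # final URL, str
    res.history     # redirect history, list of Response objects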
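The types above can be checked offline with a hand-built `Response`. As in any such sketch, setting `_content`, `url` and `headers` directly stands in for a real server response and is only a testing shortcut.

```python
import requests
from requests.structures import CaseInsensitiveDict

res = requests.models.Response()
res.status_code = 200
res.url = "http://httpbin.org/get"
res.encoding = "utf-8"
res.headers = CaseInsensitiveDict({"Content-Type": "text/html; charset=utf-8"})
res._content = "<p>你好</p>".encode("utf-8")  # raw bytes, as received from a server

print(type(res.text).__name__)      # str: decoded using res.encoding
print(type(res.content).__name__)   # bytes: the undecoded body
print(res.headers["content-type"])  # header lookup ignores case
print(type(res.cookies).__name__)   # RequestsCookieJar
print(res.history)                  # [] because there were no redirects
```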
     
     
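Simulating login with cookies:
    A session object tracks state automatically
    session = requests.Session()
        1. It persists cookies and headers across requests.
        2. It keeps HTTP-level state between requests (e.g. it reuses the underlying TCP connection via keep-alive).
        3. session.post() takes the same parameters as requests.post(),
           and session.get() takes the same parameters as requests.get().

            Example:
            -------------------------------------------- with a Session
            import requests

            session = requests.Session()

            params = {'username': 'username', 'password': 'password'}

            res1 = session.post(url_php, data=params)   # log in; cookies set by the response are stored in the session

            res2 = session.get(url_file)
            # res2 is requested with the cookies that res1 received

            -------------------------------------------- without a Session
            import requests

            params = {'username': 'username', 'password': 'password'}

            r = requests.post(url_php, data=params)
            print(r.cookies.get_dict())

            cookies = r.cookies

            r = requests.get(url_file, cookies=cookies)   # must be a keyword: get()'s 2nd positional argument is params
            print(r.text)
            # without a Session, the cookies must be passed explicitly with every request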
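The difference between the two approaches can be checked offline by preparing requests instead of sending them. In this sketch, `session.cookies.set(...)` stands in for the Set-Cookie header a real login response would deliver. The URL and the `sessionid`/`abc123` cookie are hypothetical.

```python
import requests

url_file = "http://example.com/file"  # hypothetical URL

# With a Session: cookies in session.cookies are attached to every later request.
session = requests.Session()
session.cookies.set("sessionid", "abc123")  # stand-in for the cookie a login response would set
with_session = session.prepare_request(requests.Request("GET", url_file))
print(with_session.headers.get("Cookie"))

# Without a Session: the cookie must be passed explicitly on each request.
manual = requests.Request("GET", url_file, cookies={"sessionid": "abc123"}).prepare()
print(manual.headers.get("Cookie"))
```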
 
         
HTTP authentication:
    (1)
        import requests
        from requests.auth import HTTPBasicAuth

        # create an HTTPBasicAuth object; the arguments are the username and password
        auth = HTTPBasicAuth('ryrl', 'password')
        r = requests.post('http://pythonscraping.com/pages/auth/login.php', auth=auth)
        print(r.text)
    (2)
        import requests

        # shorthand: a (username, password) tuple is treated as HTTP Basic auth
        r = requests.post('http://pythonscraping.com/pages/auth/login.php', auth=('user', 'password'))
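Both forms produce the same `Authorization: Basic ...` header, which is just base64 of `username:password`. It is encoded, not encrypted, so it should only be sent over HTTPS. A sketch that prepares the requests offline and compares the headers:

```python
import base64

import requests
from requests.auth import HTTPBasicAuth

url = "http://pythonscraping.com/pages/auth/login.php"

# Nothing is sent; prepare() applies the auth handler and sets the header.
with_object = requests.Request("POST", url, auth=HTTPBasicAuth("user", "password")).prepare()
with_tuple = requests.Request("POST", url, auth=("user", "password")).prepare()

expected = "Basic " + base64.b64encode(b"user:password").decode("ascii")
print(with_tuple.headers["Authorization"])
```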
 
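SSL certificates:
    (i) Set verify=False to skip SSL certificate verification, and suppress the resulting InsecureRequestWarning
        import requests
        import urllib3

        urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)
        res = requests.get('https://www.12306.cn', verify=False)   # verify only matters for https:// URLs
        print(res.status_code)
    (ii) Set certificate paths
        import requests

        # cert= is a client certificate and private key, for servers that require client authentication
        res = requests.get('https://www.12306.cn', cert=('/path/server.crt', '/path/key'))
        print(res.status_code)
        # to verify the server against your own CA bundle instead, pass its path: verify='/path/ca_bundle.crt' (hypothetical path)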
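`urllib3.disable_warnings()` silences the warning for the whole program. If you only want it silenced around the unverified call, the standard `warnings.catch_warnings()` context manager can scope it. This is a sketch: `get_without_verification` is a hypothetical helper name. Note that `catch_warnings` changes process-global state, so it is not thread-safe.

```python
import warnings

import requests
from urllib3.exceptions import InsecureRequestWarning


def get_without_verification(url, **kwargs):
    """GET url with certificate checks disabled, ignoring InsecureRequestWarning only inside this call."""
    with warnings.catch_warnings():
        warnings.simplefilter("ignore", InsecureRequestWarning)
        return requests.get(url, verify=False, **kwargs)
```

Usage would be e.g. `res = get_without_verification('https://www.12306.cn', timeout=10)`. Outside the helper, the warning keeps working as usual.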
 
 
Proxy IPs:
    (1) Basic usage
            >>> import requests
 
            >>> proxies = {
              "http": "http://10.10.1.10:3128",
              "https": "http://10.10.1.10:1080",
            }
 
            >>> requests.get("http://example.org", proxies=proxies)
            You can also configure proxies through the HTTP_PROXY and HTTPS_PROXY environment variables.
 
            $ export HTTP_PROXY="http://10.10.1.10:3128"
            $ export HTTPS_PROXY="http://10.10.1.10:1080"
 
            $ python
            >>> import requests
            >>> requests.get("http://example.org")
            If your proxy requires HTTP Basic Auth, use the http://user:password@host/ syntax:
 
            >>> proxies = {
                "http": "http://user:pass@10.10.1.10:3128/",
            }
            To set a proxy for a specific scheme and host, use scheme://hostname as the key; it will match requests to that host over that scheme.
 
            >>> proxies = {'http://10.20.1.128': 'http://10.10.1.10:5323'}
            Note that the proxy URL must include the scheme.
    (2) Proxy that requires a username and password:
            >>> import requests
            >>> user = 'loli'
            >>> password = '123456'
            >>> ip_user = "http://" + user + ":" + password + "@10.10.1.10:3128"
                    # after concatenation: "http://loli:123456@10.10.1.10:3128"
            >>> proxies = {
              "http": ip_user,}
 
            >>> requests.get("http://example.org", proxies=proxies)
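    (3) Using a SOCKS proxy (needs the optional dependency: pip install requests[socks])
            >>> import requests
            >>> proxies = {
              "http": "socks5://user:password@10.10.1.10:3128",
              "https": "socks5://user:password@10.10.1.10:1080",
            }
            # use socks5h:// instead if hostnames should be resolved by the proxy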
            >>> res = requests.get("http://www.taobao.com",proxies=proxies)
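Two details from the proxy section can be checked offline. First, plain string concatenation breaks if the password contains characters such as `@` or `:`. Percent-encoding them with `urllib.parse.quote` avoids that, and requests decodes them again when it sends the proxy credentials. Second, `requests.utils.select_proxy` shows which entry of a `proxies` dict would be chosen for a URL: a scheme://hostname key wins over a plain scheme key. `select_proxy` is an internal helper rather than documented public API, and the password below is a made-up example.

```python
from urllib.parse import quote

from requests.utils import select_proxy

# (a) Build a credentialed proxy URL safely (hypothetical password with special characters)
user = "loli"
password = "p@ss:word"
proxy_url = "http://{}:{}@10.10.1.10:3128".format(quote(user, safe=""), quote(password, safe=""))
print(proxy_url)  # http://loli:p%40ss%3Aword@10.10.1.10:3128

# (b) Which proxy would requests pick for a given URL?
proxies = {
    "http://10.20.1.128": "http://10.10.1.10:5323",  # scheme://hostname: most specific
    "http": "http://10.10.1.10:3128",
    "https": "http://10.10.1.10:1080",
}
print(select_proxy("http://10.20.1.128/page", proxies))  # http://10.10.1.10:5323
print(select_proxy("http://example.org", proxies))       # http://10.10.1.10:3128
print(select_proxy("https://example.org", proxies))      # http://10.10.1.10:1080
```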
             
             
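Exceptions and timeouts:
    ReadTimeout and related exception classes
    from requests.exceptions import ReadTimeout, ConnectionError, RequestException

    try:
        ...   # the requests call goes here; pass timeout= so a stalled request raises instead of hanging
    except ReadTimeout:
        print('read timed out')
    except ConnectionError:      # requests' ConnectionError (shadows the built-in of the same name)
        print('network problem')
    except RequestException:     # base class of all requests exceptions
        print('request failed')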
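A fuller sketch of the pattern above. `timeout=(connect, read)` sets the two timeouts separately. `ConnectTimeout` is caught first because it is a subclass of both `ConnectionError` and `Timeout`. `raise_for_status()` turns 4xx/5xx responses into `HTTPError`, which the final `RequestException` clause catches. The `fetch` helper and its default timeout values are hypothetical choices.

```python
import requests
from requests.exceptions import (
    ConnectTimeout,
    ReadTimeout,
    RequestException,
    Timeout,
)
from requests.exceptions import ConnectionError as RequestsConnectionError  # avoid shadowing the built-in


def fetch(url, connect_timeout=3.05, read_timeout=10):
    """Return the response body as str, or None if the request failed (hypothetical helper)."""
    try:
        res = requests.get(url, timeout=(connect_timeout, read_timeout))
        res.raise_for_status()  # 4xx/5xx -> HTTPError
        return res.text
    except ConnectTimeout:  # before ConnectionError: it subclasses both ConnectionError and Timeout
        print("connect timed out")
    except ReadTimeout:
        print("read timed out")
    except RequestsConnectionError:
        print("network problem")
    except RequestException as exc:  # base class: HTTPError and everything else
        print("request failed:", exc)
    return None
```

For example, `fetch("http://127.0.0.1:1/", 0.5, 0.5)` normally returns `None` because nothing listens on port 1.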
         
 
     
     
 
     
     
    
posted @ 2019-09-13 15:07 spotfg