Web Scraping: The urllib Library

I. The urllib Network Library

　　1. Introduction to urllib

　　　1.1 urllib is the HTTP request library built into Python 3.

　　　1.2 It contains four modules:

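　　　　　1) request: the basic module for sending HTTP requests.

　　　　　2) error: the exception-handling module.

　　　　　3) parse: a utility module for parsing and building URLs.

　　　　　4) robotparser: mainly used to read and interpret a site's robots.txt file.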
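
A minimal sketch of the four modules side by side, written so that it runs without any network access. The example.com URLs and the robots.txt rules are made-up values for illustration only.

```python
from urllib import error, parse, request, robotparser

# parse: split a URL into its components and build a query string
parts = parse.urlparse("https://example.com/search?q=python&page=2")
query = parse.urlencode({"name": "Bill", "age": 30})

# request: build a request object (nothing is sent until urlopen is called)
req = request.Request("https://example.com/search?" + query)

# error: HTTPError is a subclass of URLError
http_error_is_url_error = issubclass(error.HTTPError, error.URLError)

# robotparser: evaluate robots.txt rules supplied as lines of text
rp = robotparser.RobotFileParser()
rp.parse(["User-agent: *", "Disallow: /private/"])
allowed_public = rp.can_fetch("*", "https://example.com/index.html")
allowed_private = rp.can_fetch("*", "https://example.com/private/data.html")

print(parts.netloc, parts.path, parse.parse_qs(parts.query))
print(query)
print(req.full_url)
print(http_error_is_url_error, allowed_public, allowed_private)
```

In a real crawler the robots.txt rules would normally be loaded with set_url() and read() rather than passed in by hand.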

　　2. Sending Requests and Receiving Responses

　　　2.1 Sending an HTTP GET request with the urlopen function.

# Example 1: this example demonstrates the HTTPResponse object.
import urllib.request
# Send an HTTP GET request to JD.com; urlopen accepts both http and https URLs.
response = urllib.request.urlopen("https://jd.com")
# Print the type of the value returned by urlopen
print('type of response:',type(response))
# Print the status code, the response message and the HTTP version
print('status: ',response.status,' msg:',response.msg,' version:',response.version)
# Print all response headers
print('headers: ',response.getheaders())
# Print the Content-Type response header
print('headers.Content-Type',response.getheader('Content-Type'))
# Print the full HTML of the JD.com home page (decoded as utf-8)
print(response.read().decode('utf-8'))
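
Example 1 assumes the page is encoded as utf-8. As a hedged sketch, the charset can instead be taken from the Content-Type header and utf-8 used only as a fallback. The helper names charset_from_content_type and decode_body are hypothetical, not part of urllib.

```python
from email.message import Message


def charset_from_content_type(content_type, default="utf-8"):
    """Return the charset named in a Content-Type header value, or default."""
    msg = Message()
    if content_type:
        msg["Content-Type"] = content_type
    return msg.get_content_charset() or default


def decode_body(raw, content_type, default="utf-8"):
    """Decode raw bytes using the charset declared in content_type."""
    charset = charset_from_content_type(content_type, default)
    try:
        return raw.decode(charset, errors="replace")
    except LookupError:  # the server named a codec Python does not know
        return raw.decode(default, errors="replace")


# Offline demonstration with hand-made bytes instead of a live response
print(decode_body("京东".encode("gbk"), "text/html; charset=GBK"))
print(decode_body(b"hello", None))
```

With a live response this could be called as decode_body(response.read(), response.getheader('Content-Type')). The response.headers object also provides get_content_charset() directly.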

　　　2.2 Sending an HTTP POST request with the urlopen function

# Example 2: send an HTTP POST request to http://httpbin.org/post and print the result.
import urllib.parse
import urllib.request
# Convert the form data to bytes, encoded as utf-8
data = bytes(urllib.parse.urlencode({'name':'Bill','age':'30'}),encoding='utf-8')
# Submit the HTTP POST request; passing data makes urlopen use POST instead of GET
response = urllib.request.urlopen('http://httpbin.org/post',data=data)
# Print the response data
print(response.read().decode('utf-8'))
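
To make the role of the data argument concrete, here is a small sketch that builds the form body and inspects the request method without sending anything. The extra city field is a made-up value added to show how spaces are encoded.

```python
from urllib import parse, request

# Build an application/x-www-form-urlencoded body from a dict
form = {"name": "Bill", "age": "30", "city": "New York"}
data = parse.urlencode(form).encode("utf-8")

# A Request without data defaults to GET; supplying data switches it to POST
get_req = request.Request("http://httpbin.org/get")
post_req = request.Request("http://httpbin.org/post", data=data)

print(data)
print(get_req.get_method())
print(post_req.get_method())
# Decoding the body again recovers the original fields
print(parse.parse_qs(data.decode("utf-8")))
```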

　　　2.3 Request timeouts

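# Example 3: use try...except to catch the timeout raised by urlopen and handle it.
import urllib.request
import socket
import urllib.error
try:
    response = urllib.request.urlopen('http://httpbin.org/get',timeout=0.1)
except urllib.error.URLError as e:
    # Check whether the exception was caused by a timeout
    if isinstance(e.reason,socket.timeout):
        # Handle the timeout here.
        print('Timed out')
except socket.timeout:
    # A timeout while waiting for the response can be raised directly, not wrapped in URLError
    print('Timed out')
print('Continuing with the rest of the crawl')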
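
Building on the timeout handling above, a crawler will often retry a request that timed out. The following is a minimal sketch under assumptions: fetch_with_retry, is_timeout and the opener parameter are hypothetical names introduced here, and the retry count and delay are arbitrary.

```python
import socket
import time
import urllib.error
import urllib.request


def is_timeout(exc):
    """Return True if exc is a timeout, whether or not URLError wraps it."""
    if isinstance(exc, urllib.error.URLError):
        return isinstance(exc.reason, socket.timeout)
    return isinstance(exc, socket.timeout)


def fetch_with_retry(url, opener=urllib.request.urlopen, retries=3,
                     timeout=5.0, delay=0.0):
    """Call opener(url, timeout=timeout), retrying only when it times out."""
    if retries < 1:
        raise ValueError("retries must be at least 1")
    last_exc = None
    for _ in range(retries):
        try:
            return opener(url, timeout=timeout)
        except (urllib.error.URLError, socket.timeout) as exc:
            if not is_timeout(exc):
                raise  # not a timeout: let the caller handle it
            last_exc = exc
            time.sleep(delay)
    raise last_exc  # every attempt timed out
```

A call such as fetch_with_retry('http://httpbin.org/get', timeout=0.1) would try up to three times before re-raising the last timeout. The opener parameter exists only so the function can be exercised without a network connection.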

　　　2.4 Setting HTTP request headers

# Example 4: override the User-Agent and Host request headers, add a custom header named who, submit the request to http://httpbin.org/post and print the result.
from urllib import request,parse
# The URL the HTTP request is submitted to
url = "http://httpbin.org/post"
# The HTTP request headers; who is a custom header field
headers ={
    'User-Agent':'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.109 Safari/537.36',
    'Host':'httpbin.org',
    'who':'Python Scrapy'
}
# The form data (named form_data to avoid shadowing the built-in dict)
form_data = {
    'name':'Bill',
    'age': 30
}
# Convert the form data to bytes
data =bytes(parse.urlencode(form_data),encoding='utf-8')
# Create a Request object; its constructor takes the form data and the HTTP request headers
req = request.Request(url = url,data=data,headers=headers)
# urlopen sends the HTTP POST request to the server through the Request object
response = request.urlopen(req)
# Print the result
print(response.read().decode('utf-8'))
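
Headers can also be added after the Request object has been created, and inspected before anything is sent. A short offline sketch follows; the header values are made up for illustration. Note that Request stores header names in capitalized form, so lookups must use that spelling.

```python
from urllib import request

req = request.Request(
    "http://httpbin.org/post",
    data=b"name=Bill",
    headers={"User-Agent": "demo-client/0.1"},  # hypothetical value
)
# add_header attaches one more header to an existing Request
req.add_header("who", "demo")  # hypothetical custom header

# Names are stored via str.capitalize(): 'User-Agent' -> 'User-agent', 'who' -> 'Who'
print(req.header_items())
print(req.get_header("User-agent"))
print(req.get_header("Who"))
# Lookups are exact-match, so the original spelling is not found
print(req.has_header("who"))
```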

posted @ 2021-01-03 09:57 敲代码的小付
