Parsing URLs with urllib.parse
1. urlparse(): parse a URL. Note that its result has one more field, params, than the result of urlsplit() in item 3.
    from urllib.parse import urlparse

    result = urlparse('http://www.baidu.com/index.html;user?id=5#comment')
    print(type(result), result)
Output:
    <class 'urllib.parse.ParseResult'> ParseResult(scheme='http', netloc='www.baidu.com', path='/index.html', params='user', query='id=5', fragment='comment')
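Since the result is a named tuple, its fields can be read by name or by position. The following is a small sketch using the same URL as above; the variable names `url` and `rebuilt` are only for illustration.

```python
from urllib.parse import urlparse

url = 'http://www.baidu.com/index.html;user?id=5#comment'
result = urlparse(url)

# ParseResult is a named tuple: each field is reachable by name or by index.
print(result.scheme, result[0])
print(result.netloc, result[1])
print(result.params, result[3])

# geturl() reassembles the components into a URL string.
rebuilt = result.geturl()
print(rebuilt)
```

For this URL the rebuilt string is identical to the input, which makes geturl() a convenient sanity check after parsing.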
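2. urlunparse(): build a URL. The sequence passed in must contain exactly 6 elements.
    from urllib.parse import urlunparse

    # scheme, netloc, path, params, query, fragment
    data = ['http', 'www.baidu.com', 'index.html', 'user', 'a=6', 'comment']
    print(urlunparse(data))
Output:
    http://www.baidu.com/index.html;user?a=6#comment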
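A common use of urlunparse() is to change one component of an existing URL and put it back together. Here is a minimal sketch, assuming you only want to swap the query string; `_replace` is the standard named-tuple method, and the names `parts`, `updated` and `new_url` are illustrative.

```python
from urllib.parse import urlparse, urlunparse

parts = urlparse('http://www.baidu.com/index.html;user?id=5#comment')

# Named tuples are immutable, so _replace() returns a modified copy.
updated = parts._replace(query='a=6')

# The copy still has 6 fields, so it can go straight into urlunparse().
new_url = urlunparse(updated)
print(new_url)
```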
3. urlsplit(): parse a URL. This is usually the one to use, because most URLs on the web have no params component. It does not split params out; they stay in path.
    from urllib.parse import urlsplit

    result = urlsplit('http://www.baidu.com/index.html;user?id=5#comment')
    print(result)
Output:
    SplitResult(scheme='http', netloc='www.baidu.com', path='/index.html;user', query='id=5', fragment='comment')
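To see the difference between the two parsers side by side, this sketch runs both on the same URL and compares the number of fields and the path. The variable names are illustrative.

```python
from urllib.parse import urlparse, urlsplit

url = 'http://www.baidu.com/index.html;user?id=5#comment'
parsed = urlparse(url)
split = urlsplit(url)

# urlparse() yields 6 fields, urlsplit() yields 5.
print(len(parsed), len(split))

# urlparse() separates ';user' into params; urlsplit() leaves it in path.
print(parsed.path, parsed.params)
print(split.path)
```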
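4. urlunsplit(): build a URL. The sequence passed in must contain exactly 5 elements.
    from urllib.parse import urlunsplit

    # scheme, netloc, path, query, fragment
    data = ['http', 'www.baidu.com', 'index.html', 'a=6', 'comment']
    print(urlunsplit(data))
Output:
    http://www.baidu.com/index.html?a=6#comment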
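urlunsplit() pairs naturally with urlencode() when the query comes from a dict. The sketch below assumes a hypothetical search path `/s` and made-up parameters `wd` and `pn`; an empty string is passed for the fragment, which simply leaves it out of the result.

```python
from urllib.parse import urlencode, urlunsplit

# Hypothetical query parameters, serialized to 'wd=python&pn=10'.
query = urlencode({'wd': 'python', 'pn': 10})

# scheme, netloc, path, query, fragment (empty fragment is omitted).
url = urlunsplit(('https', 'www.baidu.com', '/s', query, ''))
print(url)
```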
5. urljoin(): join a base URL with another URL. If the second argument is a complete URL it wins; otherwise its missing parts are taken from the base.
    from urllib.parse import urljoin

    print(urljoin('http://www.baidu.com', 'FAQ.html'))
    print(urljoin('http://www.baidu.com', 'https://cuiqingcai.com/FAQ.html'))
    print(urljoin('http://www.baidu.com/about.html', 'https://cuiqingcai.com/FAQ.html'))
    print(urljoin('http://www.baidu.com/about.html', 'https://cuiqingcai.com/FAQ.html?question=2'))
    print(urljoin('http://www.baidu.com?wd=abc', 'https://cuiqingcai.com/index.php'))
    print(urljoin('http://www.baidu.com', '?category=2#comment'))
    print(urljoin('www.baidu.com', '?category=2#comment'))
    print(urljoin('www.baidu.com#comment', '?category=2'))
Output:
    http://www.baidu.com/FAQ.html
    https://cuiqingcai.com/FAQ.html
    https://cuiqingcai.com/FAQ.html
    https://cuiqingcai.com/FAQ.html?question=2
    https://cuiqingcai.com/index.php
    http://www.baidu.com?category=2#comment
    www.baidu.com?category=2#comment
    www.baidu.com?category=2
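In a crawler, urljoin() is typically used to resolve the href values found on a page against that page's URL. This is a minimal sketch; the base page `/docs/index.html`, the href values and the host `cdn.example.com` are all made up for illustration.

```python
from urllib.parse import urljoin

# Hypothetical page URL and the links found on it.
base = 'http://www.baidu.com/docs/index.html'
hrefs = ['FAQ.html', '../about.html', '/img/logo.png', '//cdn.example.com/app.js']

# Resolve each link relative to the page it appeared on.
resolved = [urljoin(base, href) for href in hrefs]
for link in resolved:
    print(link)
```

A plain file name replaces the last path segment, `..` climbs one directory, a leading `/` restarts from the site root, and a leading `//` keeps only the scheme of the base.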
6. urlencode(): serialize parameters into a query string.
    from urllib.parse import urlencode

    params = {
        'name': 'germey',
        'age': 22
    }
    base_url = 'http://www.baidu.com?'
    url = base_url + urlencode(params)
    print(url)
Output:
    http://www.baidu.com?name=germey&age=22
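urlencode() also takes care of escaping, so values with non-ASCII characters or spaces can be passed in as they are. The sketch below uses made-up parameter names (`wd`, `q`, `tag`) and shows the `doseq=True` option for list values.

```python
from urllib.parse import urlencode

# Non-ASCII text is percent-encoded as UTF-8; spaces become '+'.
escaped = urlencode({'wd': '壁纸', 'q': 'a b'})
print(escaped)

# With doseq=True, a list value is expanded into repeated keys.
repeated = urlencode({'tag': ['a', 'b']}, doseq=True)
print(repeated)
```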
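7. parse_qs(): deserialize a query string into a dict. This is very handy combined with urlparse() (item 1) or urlsplit() (item 3): pass the query field of the parsed result to parse_qs(). Every value comes back as a list, because a key may appear more than once.
    from urllib.parse import parse_qs

    query = 'name=germey&age=22'
    print(parse_qs(query))
Output:
    {'name': ['germey'], 'age': ['22']}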
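Here is one way the combination with urlsplit() might look. It is a sketch with a made-up URL in which the `tag` key appears twice, to show why the values are lists.

```python
from urllib.parse import parse_qs, urlsplit

# Hypothetical URL with a repeated 'tag' parameter.
url = 'http://www.baidu.com/index.html?id=5&tag=a&tag=b#comment'

# Step 1: isolate the query string. Step 2: turn it into a dict.
query = urlsplit(url).query
params = parse_qs(query)
print(params)

# Values are lists, so take the first element for single-valued keys.
item_id = params['id'][0]
print(item_id)
```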
8. parse_qsl(): convert a query string into a list of (key, value) tuples.
    from urllib.parse import parse_qsl

    query = 'name=germey&age=22'
    print(parse_qsl(query))
Output:
    [('name', 'germey'), ('age', '22')]
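Because urlencode() accepts a list of pairs as well as a dict, the output of parse_qsl() can be edited and serialized again while keeping the parameter order. A small sketch, with an invented extra parameter `page`:

```python
from urllib.parse import parse_qsl, urlencode

query = 'name=germey&age=22'
pairs = parse_qsl(query)

# Append a hypothetical extra parameter, then serialize the pairs again.
pairs.append(('page', '2'))
new_query = urlencode(pairs)
print(new_query)

# When keys are unique, the pairs also convert directly to a flat dict.
flat = dict(pairs)
print(flat)
```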
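9. quote(): percent-encode a string for use in a URL. Use it when a URL parameter contains Chinese characters.
    from urllib.parse import quote

    keyword = '壁纸'
    url = 'https://www.baidu.com/s?wd=' + quote(keyword)
    print(url)
Output:
    https://www.baidu.com/s?wd=%E5%A3%81%E7%BA%B8
Note: apply quote() only to the parameter part. Applied to the whole URL, it also encodes delimiters such as ':' and '?', and the result is no longer a usable URL.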
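To make the warning concrete, this sketch quotes a whole URL and shows what happens to the delimiters. It also shows the `safe` parameter, which controls which characters are left alone (the default is '/'). The sample string 'a/b c' is made up.

```python
from urllib.parse import quote

# Quoting the whole URL mangles ':', '?' and '=' as well.
broken = quote('https://www.baidu.com/s?wd=壁纸')
print(broken)

# By default '/' is kept; pass safe='' to encode it too.
default_safe = quote('a/b c')
no_safe = quote('a/b c', safe='')
print(default_safe, no_safe)
```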
10. unquote(): the reverse of quote() in item 9; decode a percent-encoded string.
    from urllib.parse import unquote

    url = 'https://www.baidu.com/s?wd=%E5%A3%81%E7%BA%B8'
    print(unquote(url))
Output:
    https://www.baidu.com/s?wd=壁纸
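One detail worth knowing: unquote() leaves '+' untouched, while unquote_plus() also turns '+' back into a space, which matches the form-style encoding used in query strings. A short sketch with a made-up encoded string:

```python
from urllib.parse import unquote, unquote_plus

encoded = 'wd=%E5%A3%81%E7%BA%B8+hd'

# unquote() only decodes percent escapes.
plain = unquote(encoded)
print(plain)

# unquote_plus() additionally converts '+' to a space.
with_space = unquote_plus(encoded)
print(with_space)
```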
Reference: https://cuiqingcai.com/5508.html