urllib.parse
1. urlparse
Purpose: parse a URL into its components
    from urllib import parse

    url = "https://book.qidian.com/info/1004608738"
    result = parse.urlparse(url=url)
    print(result)
Result:
ParseResult(scheme='https', netloc='book.qidian.com', path='/info/1004608738', params='', query='', fragment='')
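scheme: the protocol (here, https)
netloc: the network location, usually the domain name (plus the port, if one is given)
path: the path to the resource
params: parameters attached to the last path segment, the part after a ";" (rarely used)
query: the query string after "?", typically the parameters of a GET request
fragment: the anchor after "#", used to jump directly to a specific position on the page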
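The example URL above leaves params, query, and fragment empty. As a minimal sketch, the following parses a made-up URL (example.com and all of its parts are assumptions for illustration, not from the original example) in which every component is populated, and reads each field by name:

```python
from urllib import parse

# Hypothetical URL, chosen only so that all six components are non-empty.
url = "https://example.com/path/page;type=a?key=value&page=2#section-1"
result = parse.urlparse(url)

# ParseResult is a named tuple, so each component is available as an attribute.
print(result.scheme)    # https
print(result.netloc)    # example.com
print(result.path)      # /path/page
print(result.params)    # type=a
print(result.query)     # key=value&page=2
print(result.fragment)  # section-1
```

Because the result is a tuple, it can also be unpacked positionally in the same order as the fields listed above.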
2. urlunparse
Purpose: build a URL from its six components (the inverse of urlparse)
    from urllib import parse

    # The six components, in order: scheme, netloc, path, params, query, fragment
    url_params = ('https', 'book.qidian.com', '/info/1004608738', '', '', '')
    _url = parse.urlunparse(url_params)
    print(_url)  # https://book.qidian.com/info/1004608738
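Since urlunparse accepts the same six fields that urlparse returns, the two can be combined to change one part of a URL. The sketch below is one possible approach: it relies on ParseResult being a named tuple (so `_replace` is available), and the query value `page=2` is a made-up example.

```python
from urllib import parse

url = "https://book.qidian.com/info/1004608738"
parts = parse.urlparse(url)

# Round trip: unparsing the parsed result gives back the original URL.
same_url = parse.urlunparse(parts)
print(same_url)  # https://book.qidian.com/info/1004608738

# _replace returns a modified copy of the named tuple.
# The query string "page=2" is a hypothetical value for illustration.
modified = parts._replace(query="page=2")
rebuilt = parse.urlunparse(modified)
print(rebuilt)  # https://book.qidian.com/info/1004608738?page=2
```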
3. urljoin
Purpose: join a base URL with a relative URL
    from urllib import parse

    url_1 = "https://book.qidian.com/"
    url_2 = "info/1004608738"
    new_url = parse.urljoin(url_1, url_2)
    print(new_url)  # https://book.qidian.com/info/1004608738
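urljoin resolves the second argument relative to the first, much like a browser resolves a link, so the result depends on trailing and leading slashes. A small sketch of the common cases follows; the path `/rank` is a made-up example rather than a known page on the site.

```python
from urllib import parse

# Base ends with "/": the relative part is appended.
a = parse.urljoin("https://book.qidian.com/info/", "1004608738")
print(a)  # https://book.qidian.com/info/1004608738

# Base does not end with "/": its last path segment is replaced.
b = parse.urljoin("https://book.qidian.com/info", "1004608738")
print(b)  # https://book.qidian.com/1004608738

# Relative part starts with "/": it replaces the whole path of the base.
# "/rank" is a hypothetical path used only for illustration.
c = parse.urljoin("https://book.qidian.com/info/1004608738", "/rank")
print(c)  # https://book.qidian.com/rank

# Second argument is already an absolute URL: the base is ignored.
d = parse.urljoin("https://book.qidian.com/info/", "https://www.sogou.com/web")
print(d)  # https://www.sogou.com/web
```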
4. urlencode
Purpose: convert URL parameters from a dict into a URL query string
    from urllib import parse

    params = {
        'page': 10,
        'job': 'python'
    }
    url = "https://test.job.com/"
    url_params = parse.urlencode(params)
    # Attach the query string after "?"; urljoin would treat it as a relative path
    new_url = url + "?" + url_params
    print(new_url)  # https://test.job.com/?page=10&job=python
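The reverse direction, from a query string back to Python data, is handled by parse_qs and parse_qsl in the same module. The following is a minimal sketch that reuses the example parameters; note that the decoded values come back as strings, and parse_qs wraps each value in a list because a key may appear more than once.

```python
from urllib import parse

url = "https://test.job.com/?page=10&job=python"
query = parse.urlparse(url).query

# parse_qs returns a dict mapping each key to a list of values.
as_dict = parse.parse_qs(query)
print(as_dict)   # {'page': ['10'], 'job': ['python']}

# parse_qsl returns a list of (key, value) pairs in their original order.
as_pairs = parse.parse_qsl(query)
print(as_pairs)  # [('page', '10'), ('job', 'python')]

# Feeding the pairs back into urlencode reproduces the query string.
round_trip = parse.urlencode(as_pairs)
print(round_trip)  # page=10&job=python
```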
5. quote
Purpose: percent-encode Chinese (or other non-ASCII) text for use in a URL
    from urllib import parse

    key = "海贼王"
    _key = parse.quote(key)
    print(_key)  # %E6%B5%B7%E8%B4%BC%E7%8E%8B
    url = "https://www.sogou.com/web?query={}".format(_key)
    print(url)  # https://www.sogou.com/web?query=%E6%B5%B7%E8%B4%BC%E7%8E%8B
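quote leaves "/" unencoded by default, which suits paths but not always query values. The sketch below compares quote, quote with `safe=""`, and quote_plus on a made-up input string (`"a b/c"` is an assumption for illustration), and shows that urlencode applies the same encoding to dict values.

```python
from urllib import parse

text = "a b/c"  # hypothetical input containing a space and a slash

# Default: space becomes %20, "/" is kept because safe="/" by default.
default_quoted = parse.quote(text)
print(default_quoted)  # a%20b/c

# safe="" also encodes the slash.
strict_quoted = parse.quote(text, safe="")
print(strict_quoted)   # a%20b%2Fc

# quote_plus encodes spaces as "+" and also encodes "/".
plus_quoted = parse.quote_plus(text)
print(plus_quoted)     # a+b%2Fc

# urlencode percent-encodes non-ASCII values in the same way.
encoded_query = parse.urlencode({"query": "海贼王"})
print(encoded_query)   # query=%E6%B5%B7%E8%B4%BC%E7%8E%8B
```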
6. unquote
Purpose: decode a percent-encoded URL back into Chinese (readable) text
    from urllib import parse

    url = "https://www.sogou.com/web?query=%E6%B5%B7%E8%B4%BC%E7%8E%8B"
    unquote_url = parse.unquote(url)
    print(unquote_url)  # https://www.sogou.com/web?query=海贼王
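unquote does not turn "+" back into a space; for strings produced by quote_plus or urlencode, unquote_plus is the matching decoder. A short sketch on a made-up encoded string (`"a+b%20c"` is an assumption for illustration):

```python
from urllib import parse

encoded = "a+b%20c"  # hypothetical input mixing "+" and "%20"

# unquote decodes %20 but leaves "+" untouched.
plain = parse.unquote(encoded)
print(plain)       # a+b c

# unquote_plus also converts "+" to a space.
plus_plain = parse.unquote_plus(encoded)
print(plus_plain)  # a b c

# Round trip: quoting and then unquoting returns the original text.
key = "海贼王"
restored = parse.unquote(parse.quote(key))
print(restored == key)  # True
```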