Dev Notes -- URL Addresses Displayed in an Abnormal Format: Solving It with Python's urllib Library (1)

Scenario:

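  During development, and especially during data collection (web scraping), you will occasionally run into URLs that display in an odd, percent-encoded form, full of sequences such as %3A, %2F or %E7%AE%97. For example: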

https://gimg2.baidu.com/image_search/src=http%3A%2F%2Fp6.itc.cn%2Fq_70%2Fimages03%2F20210910%2F3a1618342d16479698e1026983dce86b.jpeg&refer=http%3A%2F%2Fp6.itc.cn&app=2002&size=f9999,10000&q=a80&n=0&g=0n&fmt=auto?sec=1670722169&t=6a3e0b1c459545b0dba348c38477ce9f
https://facert.gitbooks.io/python-data-structure-cn/2.%E7%AE%97%E6%B3%95%E5%88%86%E6%9E%90/2.2.%E4%BB%80%E4%B9%88%E6%98%AF%E7%AE%97%E6%B3%95%E5%88%86%E6%9E%90/

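If you copy one of these encoded strings straight into a browser, it will not open correctly. For example, this is the image address embedded in the src parameter of the first URL: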

http%3A%2F%2Fp6.itc.cn%2Fq_70%2Fimages03%2F20210910%2F3a1618342d16479698e1026983dce86b.jpeg
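These strings are percent-encoded. Each reserved or non-ASCII byte is written as %XX, so ':' becomes %3A, '/' becomes %2F, and a Chinese character becomes three UTF-8 bytes such as %E7%AE%97. The sketch below is illustrative only. It uses urllib.parse.quote, which the session later in this post imports but never calls, to reproduce both encodings from the example URLs. It then shows that unquote reverses them. The variable names are hypothetical, chosen for this example.

```python
from urllib.parse import quote, unquote

# The image URL that ends up embedded in the src=... parameter above.
image_url = "http://p6.itc.cn/q_70/images03/20210910/3a1618342d16479698e1026983dce86b.jpeg"

# safe="" escapes even '/', so ':' -> %3A and '/' -> %2F.
encoded_src = quote(image_url, safe="")
print(encoded_src)

# Non-ASCII path segments are encoded as UTF-8 bytes, one %XX per byte.
# The default safe="/" leaves the path separators readable.
encoded_path = quote("/python-data-structure-cn/2.算法分析/")
print(encoded_path)

# unquote() reverses both encodings.
assert unquote(encoded_src) == image_url
assert unquote(encoded_path) == "/python-data-structure-cn/2.算法分析/"
```

The first printed line matches the encoded string shown above. The second matches the path part of the gitbooks URL. The safe="" argument is what turns '/' into %2F. With the default safe="/", the slashes would be left as they are.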

How do you deal with this?

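Use Python's urllib library. The function urllib.parse.unquote decodes every %XX escape back into the original characters: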

macdeMacBook-Pro-2:~ mac$ python3
Python 3.10.4 (v3.10.4:9d38120e33, Mar 23 2022, 17:29:05) [Clang 13.0.0 (clang-1300.0.29.30)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from urllib.parse import unquote, quote
>>> 
>>> html_str = unquote('https://gimg2.baidu.com/image_search/src=http%3A%2F%2Fp6.itc.cn%2Fq_70%2Fimages03%2F20210910%2F3a1618342d16479698e1026983dce86b.jpeg&refer=http%3A%2F%2Fp6.itc.cn&app=2002&size=f9999,10000&q=a80&n=0&g=0n&fmt=auto?sec=1670722169&t=6a3e0b1c459545b0dba348c38477ce9f')
>>> html_str
'https://gimg2.baidu.com/image_search/src=http://p6.itc.cn/q_70/images03/20210910/3a1618342d16479698e1026983dce86b.jpeg&refer=http://p6.itc.cn&app=2002&size=f9999,10000&q=a80&n=0&g=0n&fmt=auto?sec=1670722169&t=6a3e0b1c459545b0dba348c38477ce9f'
>>> 
>>> html_str_1 = unquote('https://facert.gitbooks.io/python-data-structure-cn/2.%E7%AE%97%E6%B3%95%E5%88%86%E6%9E%90/2.2.%E4%BB%80%E4%B9%88%E6%98%AF%E7%AE%97%E6%B3%95%E5%88%86%E6%9E%90/')
>>> html_str_1
'https://facert.gitbooks.io/python-data-structure-cn/2.算法分析/2.2.什么是算法分析/'
>>> exit()

That's it: the values printed for html_str and html_str_1 are the decoded, human-readable addresses.

For example, here is one of the decoded addresses, the image URL taken from the src parameter of html_str:

http://p6.itc.cn/q_70/images03/20210910/3a1618342d16479698e1026983dce86b.jpeg
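Decoding the whole URL and then picking out the image address by hand works for a one-off. In a scraper, it may be more robust to split the URL first and decode only the value you need. That way an encoded '&' or '?' inside the inner URL cannot be mistaken for a separator. The following is a minimal sketch under one assumption, based only on this single Baidu example: the key=value pairs (src=..., refer=..., fmt=auto) sit in the last path segment before the '?', not in the query string. The helper name extract_src is hypothetical.

```python
from typing import Optional
from urllib.parse import urlsplit, parse_qs

# The Baidu thumbnail URL from the example above, copied verbatim.
thumb_url = (
    "https://gimg2.baidu.com/image_search/src=http%3A%2F%2Fp6.itc.cn%2Fq_70%2F"
    "images03%2F20210910%2F3a1618342d16479698e1026983dce86b.jpeg"
    "&refer=http%3A%2F%2Fp6.itc.cn&app=2002&size=f9999,10000&q=a80&n=0&g=0n"
    "&fmt=auto?sec=1670722169&t=6a3e0b1c459545b0dba348c38477ce9f"
)


def extract_src(url: str) -> Optional[str]:
    """Return the decoded value of src=..., or None if it is absent.

    Assumption (hypothetical helper): the key=value pairs live in the last
    path segment, as in this one example, rather than after the '?'.
    """
    parts = urlsplit(url)
    # Slashes inside src are still %2F here, so rsplit only cuts at the
    # real path separator after "image_search".
    last_segment = parts.path.rsplit("/", 1)[-1]
    # parse_qs splits on '&' and percent-decodes each value.
    params = parse_qs(last_segment)
    values = params.get("src")
    return values[0] if values else None


print(extract_src(thumb_url))
```

The design choice here is to parse first and decode second. parse_qs unquotes each value only after the string has been split on '&', which is the opposite order to calling unquote on the entire URL. If another site puts these parameters in the normal query string, you would read parts.query instead of the last path segment.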

posted @ 2022-11-14 11:44  hello-Jesson