Site updates: please visit https://bigdata.ministep.cn/

Getting all href attributes with lxml in Python

Getting all the tags with lxml

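from lxml import etree

# text must already hold the page's HTML source as a string
xhtmle = etree.HTML(text)
# select every <li> inside the sidebar's <ul>
eles = xhtmle.xpath("//div[@class='sidebar']/ul/li")
for ele in eles:
    # xpath() returns a list with the href of every <a> under this <li>
    href = ele.xpath(".//a/@href")
    print(href)
    print('- - - '*30)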
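
The snippet above depends on a `text` variable that is not defined in the post. Below is a minimal self-contained sketch of the same idea. The `SAMPLE_HTML` markup and the helper names `sidebar_hrefs` and `all_hrefs` are made up for illustration; only the `sidebar` class and the XPath expression come from the code above.

```python
from lxml import etree

# Hypothetical sample markup standing in for the downloaded page source.
SAMPLE_HTML = """
<html><body>
  <div class="sidebar">
    <ul>
      <li><a href="/document/path/90000">Overview</a></li>
      <li><a href="/document/path/90001">Guide</a> <a href="https://example.com/faq">FAQ</a></li>
      <li>No link here</li>
    </ul>
  </div>
  <div class="content"><a href="/ignored">Outside the sidebar</a></div>
</body></html>
"""


def sidebar_hrefs(html_text):
    """Return one list of href values per <li> in the sidebar."""
    tree = etree.HTML(html_text)
    items = tree.xpath("//div[@class='sidebar']/ul/li")
    # ".//a/@href" is relative to each <li>, so links elsewhere are skipped.
    return [li.xpath(".//a/@href") for li in items]


def all_hrefs(html_text):
    """Return the href of every <a> anywhere in the document."""
    tree = etree.HTML(html_text)
    return tree.xpath("//a/@href")


if __name__ == "__main__":
    for hrefs in sidebar_hrefs(SAMPLE_HTML):
        print(hrefs)
    print(all_hrefs(SAMPLE_HTML))
```

An `<li>` without any link yields an empty list rather than an error, so the loop needs no special case for it.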

Converting the short (relative) hrefs from lxml into full links

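from urllib.parse import urljoin

url = "https://developer.work.weixin.qq.com/document/path/97108"
# href is the list returned by ele.xpath(".//a/@href") above.
# Joining netloc + href would drop the "https://" scheme, so resolve
# each relative link against the page URL with urljoin instead.
full_urls = [urljoin(url, h) for h in href]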
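
One way to do this conversion is with the standard library's `urljoin`, which keeps the scheme of the page URL (concatenating the host name alone would lose it). The sketch below shows how it treats a few common href forms. The helper name `absolutize` and the sample hrefs are assumptions for illustration; only the page URL comes from the post.

```python
from urllib.parse import urljoin

# Page the links were scraped from (taken from the post).
PAGE_URL = "https://developer.work.weixin.qq.com/document/path/97108"


def absolutize(page_url, hrefs):
    """Resolve every href against the URL of the page it was found on."""
    return [urljoin(page_url, h) for h in hrefs]


if __name__ == "__main__":
    # Hypothetical hrefs: root-relative, path-relative, absolute, fragment.
    sample = ["/document/path/90000", "97109", "https://example.com/faq", "#section"]
    for full in absolutize(PAGE_URL, sample):
        print(full)
```

Links that are already absolute pass through unchanged, so the same call can be applied to every href without checking its form first.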


posted @ 2023-04-18 21:30 ministep88
