For site updates, please visit: https://bigdata.ministep.cn/

Using lxml in Python to get all href attributes

Getting all link tags with lxml

from lxml import etree
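# text: the page's HTML source as a string (e.g. the body of an HTTP response)
tree = etree.HTML(text)
# Select every <li> directly under the sidebar's <ul>
eles = tree.xpath("//div[@class='sidebar']/ul/li")
for ele in eles:
    # ".//a/@href" searches only inside the current <li> and returns a list of strings
    href = ele.xpath(".//a/@href")
    print(href)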
    print('- - - '*30)
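A self-contained version of the snippet above, run against a small made-up HTML string, may make the XPath behaviour easier to see. The sample markup and the helper names `sidebar_hrefs` and `all_hrefs` are hypothetical, chosen only for illustration, and the sketch assumes `lxml` is installed. It shows both the per-item `.//a/@href` query and a page-wide `//a/@href` query that collects every link at once.

```python
from lxml import etree

# Hypothetical sample page; in practice `text` is the HTML you downloaded.
text = """
<html><body>
  <div class="sidebar">
    <ul>
      <li><a href="/document/path/90000">Overview</a></li>
      <li><a href="/document/path/97108">API</a> <a href="https://example.com/">Ext</a></li>
      <li>No link here</li>
    </ul>
  </div>
  <a href="/footer">Footer</a>
</body></html>
"""


def sidebar_hrefs(html: str) -> list:
    """Return the href values inside each sidebar <li>, one list per item."""
    root = etree.HTML(html)
    items = root.xpath("//div[@class='sidebar']/ul/li")
    # './/a/@href' is evaluated relative to each <li>, so links stay grouped per item
    return [[str(h) for h in li.xpath(".//a/@href")] for li in items]


def all_hrefs(html: str) -> list:
    """Return every href on the page, in document order."""
    root = etree.HTML(html)
    # Attribute results are lxml "smart strings"; str() turns them into plain str
    return [str(h) for h in root.xpath("//a/@href")]


print(sidebar_hrefs(text))
print(all_hrefs(text))
```

Because `.//a/@href` is evaluated relative to each `<li>`, an item without a link yields an empty list instead of being skipped, while `//a/@href` also picks up links outside the sidebar.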

Converting relative hrefs from lxml into full URLs

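from urllib.parse import urlparse
url = "https://developer.work.weixin.qq.com/document/path/97108"
parsed = urlparse(url)
# netloc alone has no scheme, so prepend it: "https://developer.work.weixin.qq.com"
base = f"{parsed.scheme}://{parsed.netloc}"
# href must be a single root-relative string such as "/document/path/97108",
# not the list returned by xpath()
url = base + href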
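Plain string concatenation only works when every `href` starts with `/`. A more general approach, sketched here with the standard library's `urllib.parse.urljoin`, resolves each link against the URL of the page it was scraped from, so root-relative paths, sibling-relative paths, absolute URLs and fragments are all handled. The `hrefs` values and the `to_absolute` helper are hypothetical examples.

```python
from urllib.parse import urljoin

page_url = "https://developer.work.weixin.qq.com/document/path/97108"


def to_absolute(page_url: str, hrefs: list) -> list:
    """Resolve each href against the page it came from (hypothetical helper)."""
    return [urljoin(page_url, h) for h in hrefs]


# Hypothetical hrefs of the kinds lxml typically returns
hrefs = ["/document/path/90000", "97109", "https://example.com/", "#top"]
print(to_absolute(page_url, hrefs))
```

A bare relative path such as `97109` resolves against the current page's directory (`/document/path/`), and an absolute URL is left unchanged; prefixing only the host name would get both cases wrong.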

References:

python - Getting URL links from href with BeautifulSoup, without redirect links - 爱编程的大狗
web scraping - retrieve links from web page using python and BeautifulSoup - Stack Overflow
Python lxml/beautiful soup to find all links on a web page - Stack Overflow

posted @ 2023-04-18 21:30 ministep88