Getting all href attributes with lxml in Python
Getting all tags with lxml
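from lxml import etree

# text is the HTML source of the page, as a string
xhtmle = etree.HTML(text)
# select every <li> in the sidebar list
eles = xhtmle.xpath("//div[@class='sidebar']/ul/li")
for ele in eles:
    # xpath() returns a list of href strings (empty if the <li> has no link)
    href = ele.xpath(".//a/@href")
    print(href)
    print('- - - '*30)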
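The snippet above depends on a `text` variable that is not defined in this note. As a minimal sketch, the same XPath can be tried on an inline HTML string. The `SAMPLE_HTML` markup and the `sidebar_hrefs` helper are hypothetical and only illustrate the idea; a real page will have a different structure.

```python
from lxml import etree

# Hypothetical sample markup standing in for a downloaded page.
SAMPLE_HTML = """
<html><body>
  <div class="sidebar">
    <ul>
      <li><a href="/document/path/90664">Overview</a></li>
      <li><a href="/document/path/97108">Details</a></li>
      <li>No link here</li>
    </ul>
  </div>
  <div class="content"><a href="/ignored">Outside the sidebar</a></div>
</body></html>
"""


def sidebar_hrefs(html_text):
    """Return every href found under div.sidebar, in document order."""
    root = etree.HTML(html_text)
    # A single XPath call collects the attribute values directly;
    # str() turns lxml's string results into plain Python strings.
    found = root.xpath("//div[@class='sidebar']/ul/li//a/@href")
    return [str(h) for h in found]


hrefs = sidebar_hrefs(SAMPLE_HTML)
print(hrefs)
```

Collecting the hrefs in one XPath expression avoids the nested loop. Looping over each `li` is still useful when other data from the same list item is needed.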
Converting a relative href from lxml into a full link
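from urllib.parse import urljoin

# URL of the page the hrefs were scraped from
url = "https://developer.work.weixin.qq.com/document/path/97108"
# href is the list returned by xpath(); netloc + href would drop the
# https:// scheme, so resolve each entry against the page URL instead
links = [urljoin(url, h) for h in href]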
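To show how relative links resolve against the page URL, here is a small sketch using `urljoin` from the standard library. The `absolutize` helper and the sample href values are made up for illustration; only the base URL comes from this note.

```python
from urllib.parse import urljoin

# Base URL taken from the note above.
BASE_URL = "https://developer.work.weixin.qq.com/document/path/97108"


def absolutize(base_url, hrefs):
    """Resolve each href against base_url, the way a browser would."""
    return [urljoin(base_url, h) for h in hrefs]


# Hypothetical href values covering the common cases.
sample_hrefs = [
    "/document/path/90664",       # root-relative path
    "90665",                      # relative to the current directory
    "https://example.com/page",   # already absolute, left unchanged
    "#anchor",                    # fragment on the same page
]

full_links = absolutize(BASE_URL, sample_hrefs)
for link in full_links:
    print(link)
```

`urljoin` keeps the scheme and host of the base URL, so the result can be passed straight to an HTTP client.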
References:
python - Get URL links from href with BeautifulSoup, without redirect links - 爱编程的大狗
web scraping - retrieve links from web page using python and BeautifulSoup - Stack Overflow
Python lxml/beautiful soup to find all links on a web page - Stack Overflow