Parsing HTML and XML Documents with Python

1. BeautifulSoup

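BeautifulSoup is a Python package for parsing HTML and XML documents. It makes it quick and convenient to extract information from web pages and to process the result. It works with several parsers: Python's built-in html.parser, and third-party parsers such as lxml and html5lib.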
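As a rough sketch of how the parser choice shows up in code: the second argument to BeautifulSoup names the parser, and bs4 raises FeatureNotFound when the requested one is not installed. The helper name make_soup and the preference order below are assumptions made for illustration, not part of the library.

```python
from bs4 import BeautifulSoup, FeatureNotFound

html = "<p>Hello, <b>world</b></p>"


def make_soup(markup, preferred=("lxml", "html5lib", "html.parser")):
    """Hypothetical helper: parse with the first parser that is installed.

    Returns a (soup, parser_name) tuple.
    """
    for name in preferred:
        try:
            return BeautifulSoup(markup, name), name
        except FeatureNotFound:
            # This parser is not installed; try the next one.
            continue
    raise RuntimeError("no usable parser found")


soup, parser_name = make_soup(html)
print(parser_name, soup.b.get_text())
```

html.parser ships with Python, so the fallback at the end of the list should always be available; lxml and html5lib have to be installed separately.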

2. Example: Extracting Tags

The following example uses BeautifulSoup to parse an HTML document:

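from bs4 import BeautifulSoup
import requests

url = 'https://www.example.com'
# Fetch the page; fail early on HTTP errors instead of parsing an error page
response = requests.get(url, timeout=10)
response.raise_for_status()
soup = BeautifulSoup(response.text, 'html.parser')

# Find all links
links = soup.find_all('a')

# Find all <div> elements with the class "row"
rows = soup.find_all('div', class_='row')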

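The code above first fetches the page's HTML with the requests library and then parses it with BeautifulSoup. The soup.find_all() method returns every element that matches the given tag name and attribute filters as a list-like result, which can then be processed further.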
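To make the "further processing" step concrete without depending on a live site, here is a minimal sketch that runs find_all() on an inline HTML string. The markup and the variable names are made up for illustration.

```python
from bs4 import BeautifulSoup

# Made-up markup standing in for a downloaded page
html = """
<div class="row"><a href="/a">First</a></div>
<div class="row highlight"><a href="/b">Second</a></div>
<div class="footer"><a>No href</a></div>
"""
soup = BeautifulSoup(html, "html.parser")

# href=True keeps only <a> tags that actually carry an href attribute
hrefs = [a["href"] for a in soup.find_all("a", href=True)]

# class_="row" matches any <div> whose class list contains "row"
rows = soup.find_all("div", class_="row")
row_texts = [div.get_text(strip=True) for div in rows]

print(hrefs)
print(row_texts)
```

Filtering with href=True avoids a KeyError on anchors that have no href, which is common on real pages.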

3. Example: Extracting Tag Content

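BeautifulSoup also provides other useful features, such as navigating to an element's parents and children and modifying an element's attributes and content.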
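A small sketch of those navigation and modification features, assuming a made-up menu fragment: .parent and find_parent() walk upward, find_all(..., recursive=False) lists direct children, and assigning to tag["attr"] or tag.string edits the tree in place.

```python
from bs4 import BeautifulSoup

# Made-up fragment used only for illustration
html = (
    '<ul id="menu">'
    '<li><a href="/old">Home</a></li>'
    '<li><a href="/about">About</a></li>'
    '</ul>'
)
soup = BeautifulSoup(html, "html.parser")

first_link = soup.find("a")

# Walk upward: the direct parent, then the nearest enclosing <ul>
parent_name = first_link.parent.name
menu = first_link.find_parent("ul")

# Walk downward: only the direct <li> children of the <ul>
child_names = [li.name for li in menu.find_all("li", recursive=False)]

# Modify an attribute and the text content in place
first_link["href"] = "/home"
first_link.string = "Start"

print(parent_name, menu["id"], child_names)
print(first_link)
```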

The following example parses an HTML fragment with BeautifulSoup and reads the attributes and text of its elements:

from bs4 import BeautifulSoup

# A triple-quoted string is needed because the markup spans several lines
html_code = """
<li>
  <div class="item">
    <img src="demo" alt="">
    <p title=""><a href="" title="demo">demo</a></p>
  </div>
</li>
"""

soup = BeautifulSoup(html_code, 'html.parser')
li_tags = soup.find_all('li')

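for li in li_tags:
    # Index access raises KeyError if the attribute is missing
    img_src = li.find('img')['src']
    a_href = li.find('a')['href']
    a_text = li.find('a').text.strip()
    # .get() returns the default ('') when the attribute is absent
    a_title = li.find('a').get('title', '')

    print(f"img src: {img_src}, a href: {a_href}, a text: {a_text}, a title: {a_title}")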
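The loop above assumes every li contains both an img and an a tag; on real pages find() may return None, and indexing that raises a TypeError. One possible defensive variant is sketched below. The helper name extract_item and the sample markup are assumptions for illustration only.

```python
from bs4 import BeautifulSoup

# Made-up markup: the second <li> has no <img> and no title attribute
html = """
<ul>
  <li><img src="a.png" alt=""><a href="/a" title="A">First</a></li>
  <li><a href="/b">Second</a></li>
</ul>
"""


def extract_item(li):
    """Hypothetical helper: read fields from one <li>, using '' for anything missing."""
    img = li.find("img")
    a = li.find("a")
    return {
        "img_src": img.get("src", "") if img else "",
        "a_href": a.get("href", "") if a else "",
        "a_text": a.get_text(strip=True) if a else "",
        "a_title": a.get("title", "") if a else "",
    }


soup = BeautifulSoup(html, "html.parser")
items = [extract_item(li) for li in soup.find_all("li")]

for item in items:
    print(item)
```

Returning a dictionary per item keeps the fields together, so the results are easy to write to CSV or JSON later.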

Original article: Parsing HTML and XML Documents with Python

posted @ 2023-06-08 09:55 行走的ID
