The BeautifulSoup Library

1. Installing the BeautifulSoup library

First, check which packages are already installed in your current Python environment (for example with pip list).

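Then install BeautifulSoup. On PyPI the package is named beautifulsoup4, so the command is pip install beautifulsoup4. In code it is imported as bs4.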
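You can also run the installation check from inside Python instead of using pip list. The standard-library importlib.metadata module (Python 3.8+) can report the installed version of a distribution. Below is a minimal sketch. The helper name installed_version is made up for this example.

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


# The distribution is called "beautifulsoup4" even though the import name is "bs4"
version = installed_version("beautifulsoup4")
if version is None:
    print("beautifulsoup4 is not installed; run: pip install beautifulsoup4")
else:
    print("beautifulsoup4", version)
```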

2. Official BeautifulSoup documentation

https://www.crummy.com/software/BeautifulSoup/bs4/doc/

3. BeautifulSoup's main parsers
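The parser is selected by the second argument to BeautifulSoup. "html.parser" is built into Python. "lxml" is fast but needs the lxml package. "html5lib" parses a page the way a browser does but needs the html5lib package. Both of the last two must be installed separately.

The sketch below feeds the same broken markup to each parser so you can compare how each one repairs it. It marks any parser that is missing. try_parsers is an illustrative helper, not part of bs4.

```python
from bs4 import BeautifulSoup, FeatureNotFound

# Deliberately malformed markup: the <p> tag is never closed
broken = "<p>unclosed paragraph<b>bold</b>"


def try_parsers(markup, parsers=("html.parser", "lxml", "html5lib")):
    """Parse the same markup with each parser; None marks a parser that is not installed."""
    results = {}
    for parser in parsers:
        try:
            results[parser] = str(BeautifulSoup(markup, parser))
        except FeatureNotFound:
            # Raised when the requested parser library is not available
            results[parser] = None
    return results


for name, tree in try_parsers(broken).items():
    print(name, "->", tree if tree is not None else "not installed")
```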

4. The find function in BeautifulSoup

Finding the HTML title

from bs4 import BeautifulSoup
 
html="""
<html lang="en">
<head>
    <meta charset="UTF-8">
    <title>bobby基本信息</title>
    <script src="jquery-3.5.1.min.js"></script>
</head>
<body>
    <div id="info">
        <p style="color: blue">讲师信息</p>
        <div class="teacher_info">
            Python全栈工程师
            <p class="age">年龄:29</p>
            <p class="name">姓名:bobby</p>
            <p class="work_years">工作年限:7年</p>
            <p class="position">职位:python开发工程师</p>
        </div>
        <p style="color:aquamarine">课程信息</p>
        <table class="courses">
            <tbody><tr><th>课程名称</th>
            <th>讲师</th>
            <th>地址</th>
        </tr><tr>
                <td>django打造在线教育</td>
                <td>bobby</td>
                <td><a href="https://coding.imooc.com/class/78.html">访问</a></td>
            </tr><tr>
                <td>python高级编程</td>
                <td>bobby</td>
                <td><a href="https://coding.imooc.com/class/200.html">访问</a></td>
            </tr><tr>
                <td>scrapy分布式爬虫</td>
                <td>bobby</td>
                <td><a href="https://coding.imooc.com/class/92.html">访问</a></td>
            </tr><tr>
                <td>diango rest framework打造生鲜电商</td>
                <td>bobby</td>
                <td><a href="https://coding.imooc.com/class/131.html">访问</a></td>
            </tr><tr>
                <td>tornado从入门到精通</td>
                <td>bobby</td>
                <td><a href="https://coding.imooc.com/class/290.html">访问</a></td>
            </tr></tbody></table>
 
</div>
</body>
</html>
"""
bs=BeautifulSoup(html,"html.parser")
title_tag=bs.title.string
print(title_tag)
# Dot access (bs.title) returns only the first matching element
div_tag1=bs.title
print("div_tag1:"+str(div_tag1))

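Dot access such as bs.title is shorthand for bs.find("title"), so it only ever returns the first match. Here is a small self-contained check. The tiny HTML string is made up for the example.

```python
from bs4 import BeautifulSoup

html = "<html><head><title>demo</title></head><body><p>first</p><p>second</p></body></html>"
soup = BeautifulSoup(html, "html.parser")

# soup.p and soup.find("p") both give the first <p> only
first_by_dot = soup.p
first_by_find = soup.find("p")
print(first_by_dot, first_by_find)

# .string returns the tag's single text child (a NavigableString)
print(soup.title.string)
```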

Finding the div element in the HTML

from bs4 import BeautifulSoup
 
html="""
<html lang="en">
<head>
    <meta charset="UTF-8">
    <title>bobby基本信息</title>
    <script src="jquery-3.5.1.min.js"></script>
</head>
<body>
    <div id="info">
        <p style="color: blue">讲师信息</p>
        <div class="teacher_info">
            Python全栈工程师
            <p class="age">年龄:29</p>
            <p class="name">姓名:bobby</p>
            <p class="work_years">工作年限:7年</p>
            <p class="position">职位:python开发工程师</p>
        </div>
        <p style="color:aquamarine">课程信息</p>
        <table class="courses">
            <tbody><tr><th>课程名称</th>
            <th>讲师</th>
            <th>地址</th>
        </tr><tr>
                <td>django打造在线教育</td>
                <td>bobby</td>
                <td><a href="https://coding.imooc.com/class/78.html">访问</a></td>
            </tr><tr>
                <td>python高级编程</td>
                <td>bobby</td>
                <td><a href="https://coding.imooc.com/class/200.html">访问</a></td>
            </tr><tr>
                <td>scrapy分布式爬虫</td>
                <td>bobby</td>
                <td><a href="https://coding.imooc.com/class/92.html">访问</a></td>
            </tr><tr>
                <td>diango rest framework打造生鲜电商</td>
                <td>bobby</td>
                <td><a href="https://coding.imooc.com/class/131.html">访问</a></td>
            </tr><tr>
                <td>tornado从入门到精通</td>
                <td>bobby</td>
                <td><a href="https://coding.imooc.com/class/290.html">访问</a></td>
            </tr></tbody></table>
 
</div>
</body>
</html>
"""
bs=BeautifulSoup(html,"html.parser")
div_tag2=bs.find("div")
print("div_tag2:"+str(div_tag2))

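One detail worth knowing when you chain calls: find returns None when nothing matches, while find_all returns an empty result list. The sketch below uses a made-up one-line document to show why a guard is needed before you access .string.

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup('<div id="info"><p>hi</p></div>', "html.parser")

missing = soup.find("span")          # no <span> in the document -> None
missing_all = soup.find_all("span")  # -> empty ResultSet (behaves like [])

# Guard before chaining, otherwise missing.string raises AttributeError
text = missing.string if missing is not None else "(not found)"
print(missing, len(missing_all), text)
```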

Finding all p elements in the HTML

from bs4 import BeautifulSoup
 
html="""
<html lang="en">
<head>
    <meta charset="UTF-8">
    <title>bobby基本信息</title>
    <script src="jquery-3.5.1.min.js"></script>
</head>
<body>
    <div id="info">
        <p style="color: blue">讲师信息</p>
        <div class="teacher_info">
            Python全栈工程师
            <p class="age">年龄:29</p>
            <p class="name">姓名:bobby</p>
            <p class="work_years">工作年限:7年</p>
            <p class="position">职位:python开发工程师</p>
        </div>
        <p style="color:aquamarine">课程信息</p>
        <table class="courses">
            <tbody><tr><th>课程名称</th>
            <th>讲师</th>
            <th>地址</th>
        </tr><tr>
                <td>django打造在线教育</td>
                <td>bobby</td>
                <td><a href="https://coding.imooc.com/class/78.html">访问</a></td>
            </tr><tr>
                <td>python高级编程</td>
                <td>bobby</td>
                <td><a href="https://coding.imooc.com/class/200.html">访问</a></td>
            </tr><tr>
                <td>scrapy分布式爬虫</td>
                <td>bobby</td>
                <td><a href="https://coding.imooc.com/class/92.html">访问</a></td>
            </tr><tr>
                <td>diango rest framework打造生鲜电商</td>
                <td>bobby</td>
                <td><a href="https://coding.imooc.com/class/131.html">访问</a></td>
            </tr><tr>
                <td>tornado从入门到精通</td>
                <td>bobby</td>
                <td><a href="https://coding.imooc.com/class/290.html">访问</a></td>
            </tr></tbody></table>
 
</div>
</body>
</html>
"""
bs=BeautifulSoup(html,"html.parser")
# find_all returns every matching element as a list (a ResultSet)
div_tag3=bs.find_all("p")
print("p:"+str(div_tag3))
for p in div_tag3:
    print(p.string)

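The loop above works because each p in the sample contains a single text node. When a tag has several children, .string is None, and get_text() is the safer choice. The sketch below uses a made-up snippet to show this, along with the limit parameter of find_all.

```python
from bs4 import BeautifulSoup

html = '<div><p>plain</p><p>mixed <b>bold</b> text</p></div>'
soup = BeautifulSoup(html, "html.parser")

for p in soup.find_all("p"):
    # .string is None when the tag has more than one child node
    print(repr(p.string), "|", p.get_text())

# limit stops the search after the first N matches
print(len(soup.find_all("p", limit=1)))
```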

Searching by id

from bs4 import BeautifulSoup
 
html="""
<html lang="en">
<head>
    <meta charset="UTF-8">
    <title>bobby基本信息</title>
    <script src="jquery-3.5.1.min.js"></script>
</head>
<body>
    <div id="info">
        <p style="color: blue">讲师信息</p>
        <div class="teacher_info">
            Python全栈工程师
            <p class="age">年龄:29</p>
            <p class="name">姓名:bobby</p>
            <p class="work_years">工作年限:7年</p>
            <p class="position">职位:python开发工程师</p>
        </div>
        <p style="color:aquamarine">课程信息</p>
        <table class="courses">
            <tbody><tr><th>课程名称</th>
            <th>讲师</th>
            <th>地址</th>
        </tr><tr>
                <td>django打造在线教育</td>
                <td>bobby</td>
                <td><a href="https://coding.imooc.com/class/78.html">访问</a></td>
            </tr><tr>
                <td>python高级编程</td>
                <td>bobby</td>
                <td><a href="https://coding.imooc.com/class/200.html">访问</a></td>
            </tr><tr>
                <td>scrapy分布式爬虫</td>
                <td>bobby</td>
                <td><a href="https://coding.imooc.com/class/92.html">访问</a></td>
            </tr><tr>
                <td>diango rest framework打造生鲜电商</td>
                <td>bobby</td>
                <td><a href="https://coding.imooc.com/class/131.html">访问</a></td>
            </tr><tr>
                <td>tornado从入门到精通</td>
                <td>bobby</td>
                <td><a href="https://coding.imooc.com/class/290.html">访问</a></td>
            </tr></tbody></table>
 
</div>
</body>
</html>
"""
bs=BeautifulSoup(html,"html.parser")
div_tag4=bs.find(id="info")
print("div_tag4:"+str(div_tag4))
div_tag5=bs.find_all("div",id="info")
print("div_tag5:"+str(div_tag5))

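Searching by class works much like searching by id. Because class is a Python keyword, bs4 uses class_ for keyword-style searches, and the attrs dictionary used elsewhere in this post also works. The sketch below also uses select_one, which accepts CSS selectors through the soupsieve package that is installed together with beautifulsoup4. The HTML is a trimmed, made-up version of the sample.

```python
from bs4 import BeautifulSoup

html = '''
<div id="info">
  <p class="age">age: 29</p>
  <p class="name">name: bobby</p>
</div>
'''
soup = BeautifulSoup(html, "html.parser")

# "class" is a Python keyword, so bs4 uses class_ for keyword-style search
by_keyword = soup.find("p", class_="name")
# The same search with an attrs dictionary (the style used in this post)
by_dict = soup.find("p", {"class": "name"})
# CSS selectors via select_one() also work
by_css = soup.select_one("#info p.name")
print(by_keyword.string, by_dict.string, by_css.string)
```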

Matching elements with a regular expression

import re
 
from bs4 import BeautifulSoup
 
html="""
<html lang="en">
<head>
    <meta charset="UTF-8">
    <title>bobby基本信息</title>
    <script src="jquery-3.5.1.min.js"></script>
</head>
<body>
    <div id="info-955">
        <p style="color: blue">讲师信息</p>
        <div class="teacher_info">
            Python全栈工程师
            <p class="age">年龄:29</p>
            <p class="name">姓名:bobby</p>
            <p class="work_years">工作年限:7年</p>
            <p class="position">职位:python开发工程师</p>
        </div>
        <p style="color:aquamarine">课程信息</p>
        <table class="courses">
            <tbody><tr><th>课程名称</th>
            <th>讲师</th>
            <th>地址</th>
        </tr><tr>
                <td>django打造在线教育</td>
                <td>bobby</td>
                <td><a href="https://coding.imooc.com/class/78.html">访问</a></td>
            </tr><tr>
                <td>python高级编程</td>
                <td>bobby</td>
                <td><a href="https://coding.imooc.com/class/200.html">访问</a></td>
            </tr><tr>
                <td>scrapy分布式爬虫</td>
                <td>bobby</td>
                <td><a href="https://coding.imooc.com/class/92.html">访问</a></td>
            </tr><tr>
                <td>diango rest framework打造生鲜电商</td>
                <td>bobby</td>
                <td><a href="https://coding.imooc.com/class/131.html">访问</a></td>
            </tr><tr>
                <td>tornado从入门到精通</td>
                <td>bobby</td>
                <td><a href="https://coding.imooc.com/class/290.html">访问</a></td>
            </tr></tbody></table>
 
</div>
</body>
</html>
"""
bs=BeautifulSoup(html,"html.parser")
div_tag=bs.find("div",id=re.compile(r"info-\d+"))
print(div_tag)

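A compiled pattern can filter any attribute, not just id. bs4 tests each attribute value with the pattern's search() method. The sketch below collects the course numbers from the href links in a trimmed copy of the sample table. The "/about.html" row was added to show a non-match.

```python
import re

from bs4 import BeautifulSoup

html = '''
<table class="courses">
  <tr><td><a href="https://coding.imooc.com/class/78.html">visit</a></td></tr>
  <tr><td><a href="https://coding.imooc.com/class/200.html">visit</a></td></tr>
  <tr><td><a href="/about.html">about</a></td></tr>
</table>
'''
soup = BeautifulSoup(html, "html.parser")

# A compiled pattern as an attribute filter is matched with re.search()
course_pattern = re.compile(r"/class/(\d+)\.html$")
course_links = soup.find_all("a", href=course_pattern)

# Pull the captured number out of each matching href
course_ids = [course_pattern.search(a["href"]).group(1) for a in course_links]
print(course_ids)
```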

Locating elements by their text

import re
 
from bs4 import BeautifulSoup
 
html="""
<html lang="en">
<head>
    <meta charset="UTF-8">
    <title>bobby基本信息</title>
    <script src="jquery-3.5.1.min.js"></script>
</head>
<body>
    <div id="info-955">
        <p style="color: blue">讲师信息</p>
        <div class="teacher_info">
            Python全栈工程师
            <p class="age">年龄:29</p>
            <p class="name">姓名:bobby</p>
            <p class="work_years">工作年限:7年</p>
            <p class="position">职位:python开发工程师</p>
        </div>
        <p style="color:aquamarine">课程信息</p>
        <table class="courses">
            <tbody><tr><th>课程名称</th>
            <th>讲师</th>
            <th>地址</th>
        </tr><tr>
                <td>django打造在线教育</td>
                <td>bobby</td>
                <td><a href="https://coding.imooc.com/class/78.html">访问</a></td>
            </tr><tr>
                <td>python高级编程</td>
                <td>bobby</td>
                <td><a href="https://coding.imooc.com/class/200.html">访问</a></td>
            </tr><tr>
                <td>scrapy分布式爬虫</td>
                <td>bobby</td>
                <td><a href="https://coding.imooc.com/class/92.html">访问</a></td>
            </tr><tr>
                <td>diango rest framework打造生鲜电商</td>
                <td>bobby</td>
                <td><a href="https://coding.imooc.com/class/131.html">访问</a></td>
            </tr><tr>
                <td>tornado从入门到精通</td>
                <td>bobby</td>
                <td><a href="https://coding.imooc.com/class/290.html">访问</a></td>
            </tr></tbody></table>
 
</div>
</body>
</html>
"""
bs=BeautifulSoup(html,"html.parser")
div_tag=bs.find(string="django打造在线教育")
print(div_tag)

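Note that find(string=...) returns the text node itself, not the tag that contains it. Use .parent to get back to the tag. The sketch below uses a simplified, made-up row in place of the sample table. It also shows that string= accepts a regular expression for partial matches.

```python
import re

from bs4 import BeautifulSoup, NavigableString

html = "<table><tr><td>django course</td><td>bobby</td></tr></table>"
soup = BeautifulSoup(html, "html.parser")

# find(string=...) returns the text node itself, not the <td> around it
text_node = soup.find(string="django course")
cell = text_node.parent                  # climb up to the enclosing <td>
teacher = cell.find_next_sibling("td")   # the next cell in the same row

# string= also accepts a regular expression for partial matches
partial = soup.find(string=re.compile("django"))
print(type(text_node).__name__, cell.name, teacher.string, partial)
```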

Printing the tag names of child elements in the DOM tree

import re
 
from bs4 import BeautifulSoup
 
html="""
<html lang="en">
<head>
    <meta charset="UTF-8">
    <title>bobby基本信息</title>
    <script src="jquery-3.5.1.min.js"></script>
</head>
<body>
    <div id="info-955">
        <p style="color: blue">讲师信息</p>
        <div class="teacher_info">
            Python全栈工程师
            <p class="age">年龄:29</p>
            <p class="name">姓名:bobby</p>
            <p class="work_years">工作年限:7年</p>
            <p class="position">职位:python开发工程师</p>
        </div>
        <p style="color:aquamarine">课程信息</p>
        <table class="courses">
            <tbody><tr><th>课程名称</th>
            <th>讲师</th>
            <th>地址</th>
        </tr><tr>
                <td>django打造在线教育</td>
                <td>bobby</td>
                <td><a href="https://coding.imooc.com/class/78.html">访问</a></td>
            </tr><tr>
                <td>python高级编程</td>
                <td>bobby</td>
                <td><a href="https://coding.imooc.com/class/200.html">访问</a></td>
            </tr><tr>
                <td>scrapy分布式爬虫</td>
                <td>bobby</td>
                <td><a href="https://coding.imooc.com/class/92.html">访问</a></td>
            </tr><tr>
                <td>diango rest framework打造生鲜电商</td>
                <td>bobby</td>
                <td><a href="https://coding.imooc.com/class/131.html">访问</a></td>
            </tr><tr>
                <td>tornado从入门到精通</td>
                <td>bobby</td>
                <td><a href="https://coding.imooc.com/class/290.html">访问</a></td>
            </tr></tbody></table>
 
</div>
</body>
</html>
"""
bs=BeautifulSoup(html,"html.parser")
div_tag=bs.find("div",id=re.compile(r"info-\d+"))
childrens=div_tag.contents
for child in childrens:
    if child.name:
        print(child.name)
childrens_childrens = div_tag.descendants
for child_child in childrens_childrens:
    if child_child.name:
        print(child_child.name)

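The first loop prints the names of the direct child tags (.contents). The second loop walks every descendant (.descendants), so nested tags such as tbody, tr, td and a are printed as well. Text nodes are skipped in both loops because their name is None.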
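The difference between direct children and all descendants is easier to see on a very small, made-up tree.

```python
from bs4 import BeautifulSoup

html = "<div><p>one</p><ul><li>a</li><li>b</li></ul></div>"
soup = BeautifulSoup(html, "html.parser")
div = soup.div

# .contents is a list of direct children; .children iterates the same nodes
direct = [child.name for child in div.children if child.name]
# .descendants walks the whole subtree depth-first, including text nodes
all_tags = [node.name for node in div.descendants if node.name]
all_text = [node for node in div.descendants if node.name is None]
print(direct)
print(all_tags)
print(all_text)
```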

Printing the tag names of parent elements in the DOM tree

import re
 
from bs4 import BeautifulSoup
 
html="""
<html lang="en">
<head>
    <meta charset="UTF-8">
    <title>bobby基本信息</title>
    <script src="jquery-3.5.1.min.js"></script>
</head>
<body>
    <div id="info-955">
        <p style="color: blue">讲师信息</p>
        <div class="teacher_info">
            Python全栈工程师
            <p class="age">年龄:29</p>
            <p class="name">姓名:bobby</p>
            <p class="work_years">工作年限:7年</p>
            <p class="position">职位:python开发工程师</p>
        </div>
        <p style="color:aquamarine">课程信息</p>
        <table class="courses">
            <tbody><tr><th>课程名称</th>
            <th>讲师</th>
            <th>地址</th>
        </tr><tr>
                <td>django打造在线教育</td>
                <td>bobby</td>
                <td><a href="https://coding.imooc.com/class/78.html">访问</a></td>
            </tr><tr>
                <td>python高级编程</td>
                <td>bobby</td>
                <td><a href="https://coding.imooc.com/class/200.html">访问</a></td>
            </tr><tr>
                <td>scrapy分布式爬虫</td>
                <td>bobby</td>
                <td><a href="https://coding.imooc.com/class/92.html">访问</a></td>
            </tr><tr>
                <td>diango rest framework打造生鲜电商</td>
                <td>bobby</td>
                <td><a href="https://coding.imooc.com/class/131.html">访问</a></td>
            </tr><tr>
                <td>tornado从入门到精通</td>
                <td>bobby</td>
                <td><a href="https://coding.imooc.com/class/290.html">访问</a></td>
            </tr></tbody></table>
 
</div>
</body>
</html>
"""
bs=BeautifulSoup(html,"html.parser")
parents=bs.find("p",{"class":"name"}).parents
for parent in parents:
    print(parent.name)

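.parents climbs all the way to the top of the tree. The last item it yields is the BeautifulSoup object itself, whose name is "[document]". To stop at a specific ancestor, find_parent is usually more convenient. Below is a sketch on a compact, made-up version of the sample structure.

```python
from bs4 import BeautifulSoup

html = ('<html><body><div id="info"><div class="teacher_info">'
        '<p class="name">bobby</p></div></div></body></html>')
soup = BeautifulSoup(html, "html.parser")
p = soup.find("p", {"class": "name"})

# .parents climbs all the way up; the last "parent" is the BeautifulSoup object
parent_names = [parent.name for parent in p.parents]
print(parent_names)

# find_parent() stops at the first ancestor that matches the filter
outer = p.find_parent("div", id="info")
print(outer["id"])
```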

Printing sibling elements in the DOM tree

Iterating over the following siblings

import re
 
from bs4 import BeautifulSoup
 
html="""
<html lang="en">
<head>
    <meta charset="UTF-8">
    <title>bobby基本信息</title>
    <script src="jquery-3.5.1.min.js"></script>
</head>
<body>
    <div id="info-955">
        <p style="color: blue">讲师信息</p>
        <div class="teacher_info">
            Python全栈工程师
            <p class="age">年龄:29</p>
            <p class="name">姓名:bobby</p>
            <p class="work_years">工作年限:7年</p>
            <p class="position">职位:python开发工程师</p>
        </div>
        <p style="color:aquamarine">课程信息</p>
        <table class="courses">
            <tbody><tr><th>课程名称</th>
            <th>讲师</th>
            <th>地址</th>
        </tr><tr>
                <td>django打造在线教育</td>
                <td>bobby</td>
                <td><a href="https://coding.imooc.com/class/78.html">访问</a></td>
            </tr><tr>
                <td>python高级编程</td>
                <td>bobby</td>
                <td><a href="https://coding.imooc.com/class/200.html">访问</a></td>
            </tr><tr>
                <td>scrapy分布式爬虫</td>
                <td>bobby</td>
                <td><a href="https://coding.imooc.com/class/92.html">访问</a></td>
            </tr><tr>
                <td>diango rest framework打造生鲜电商</td>
                <td>bobby</td>
                <td><a href="https://coding.imooc.com/class/131.html">访问</a></td>
            </tr><tr>
                <td>tornado从入门到精通</td>
                <td>bobby</td>
                <td><a href="https://coding.imooc.com/class/290.html">访问</a></td>
            </tr></tbody></table>
 
</div>
</body>
</html>
"""
bs=BeautifulSoup(html,"html.parser")
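next_siblings=bs.find("p",{"class":"age"}).next_siblings
for sibling in next_siblings:
    # skip the whitespace text nodes ("\n    ") between tags; their name is None
    if sibling.name:
        print(sibling.name, sibling.string)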

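.next_sibling and .next_siblings also return the whitespace text nodes between tags. find_next_sibling and find_next_siblings return only tags that match a filter. Below is a sketch on a trimmed, made-up copy of the teacher_info block.

```python
from bs4 import BeautifulSoup

html = '''
<div class="teacher_info">
    <p class="age">age: 29</p>
    <p class="name">name: bobby</p>
    <p class="work_years">years: 7</p>
</div>
'''
soup = BeautifulSoup(html, "html.parser")
age = soup.find("p", {"class": "age"})

# .next_sibling is the whitespace text node between the tags...
print(repr(age.next_sibling))
# ...while find_next_sibling(s) only returns tags that match the filter
print(age.find_next_sibling("p").string)
print([p["class"][0] for p in age.find_next_siblings("p")])
```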

Iterating over the preceding siblings

import re
 
from bs4 import BeautifulSoup
 
html="""
<html lang="en">
<head>
    <meta charset="UTF-8">
    <title>bobby基本信息</title>
    <script src="jquery-3.5.1.min.js"></script>
</head>
<body>
    <div id="info-955">
        <p style="color: blue">讲师信息</p>
        <div class="teacher_info">
            Python全栈工程师
            <p class="age">年龄:29</p>
            <p class="name">姓名:bobby</p>
            <p class="work_years">工作年限:7年</p>
            <p class="position">职位:python开发工程师</p>
        </div>
        <p style="color:aquamarine">课程信息</p>
        <table class="courses">
            <tbody><tr><th>课程名称</th>
            <th>讲师</th>
            <th>地址</th>
        </tr><tr>
                <td>django打造在线教育</td>
                <td>bobby</td>
                <td><a href="https://coding.imooc.com/class/78.html">访问</a></td>
            </tr><tr>
                <td>python高级编程</td>
                <td>bobby</td>
                <td><a href="https://coding.imooc.com/class/200.html">访问</a></td>
            </tr><tr>
                <td>scrapy分布式爬虫</td>
                <td>bobby</td>
                <td><a href="https://coding.imooc.com/class/92.html">访问</a></td>
            </tr><tr>
                <td>diango rest framework打造生鲜电商</td>
                <td>bobby</td>
                <td><a href="https://coding.imooc.com/class/131.html">访问</a></td>
            </tr><tr>
                <td>tornado从入门到精通</td>
                <td>bobby</td>
                <td><a href="https://coding.imooc.com/class/290.html">访问</a></td>
            </tr></tbody></table>
 
</div>
</body>
</html>
"""
bs=BeautifulSoup(html,"html.parser")
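previous_siblings=bs.find("p",{"class":"name"}).previous_siblings
for sibling in previous_siblings:
    # skip text nodes (whitespace, "Python全栈工程师"); only tags have a name
    if sibling.name:
        print(sibling.name, sibling.string)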

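.previous_siblings yields the nearest sibling first, so the results come out in reverse document order. The made-up list below makes the ordering visible.

```python
from bs4 import BeautifulSoup

html = "<ul><li>1</li><li>2</li><li>3</li><li>4</li></ul>"
soup = BeautifulSoup(html, "html.parser")
last = soup.find_all("li")[-1]

# previous_siblings yields the nearest sibling first, i.e. reverse document order
nearest_first = [li.string for li in last.previous_siblings]
# reverse it to get document order again
document_order = list(reversed(nearest_first))
print(nearest_first, document_order)
```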

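To print just the single previous sibling tag with previous_sibling, the line break between the two tags must be removed first: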

import re
 
from bs4 import BeautifulSoup
 
html="""
<html lang="en">
<head>
    <meta charset="UTF-8">
    <title>bobby基本信息</title>
    <script src="jquery-3.5.1.min.js"></script>
</head>
<body>
    <div id="info-955">
        <p style="color: blue">讲师信息</p>
        <div class="teacher_info">
            Python全栈工程师
            <p class="age">年龄:29</p><p class="name">姓名:bobby</p>
            <p class="work_years">工作年限:7年</p>
            <p class="position">职位:python开发工程师</p>
        </div>
        <p style="color:aquamarine">课程信息</p>
        <table class="courses">
            <tbody><tr><th>课程名称</th>
            <th>讲师</th>
            <th>地址</th>
        </tr><tr>
                <td>django打造在线教育</td>
                <td>bobby</td>
                <td><a href="https://coding.imooc.com/class/78.html">访问</a></td>
            </tr><tr>
                <td>python高级编程</td>
                <td>bobby</td>
                <td><a href="https://coding.imooc.com/class/200.html">访问</a></td>
            </tr><tr>
                <td>scrapy分布式爬虫</td>
                <td>bobby</td>
                <td><a href="https://coding.imooc.com/class/92.html">访问</a></td>
            </tr><tr>
                <td>diango rest framework打造生鲜电商</td>
                <td>bobby</td>
                <td><a href="https://coding.imooc.com/class/131.html">访问</a></td>
            </tr><tr>
                <td>tornado从入门到精通</td>
                <td>bobby</td>
                <td><a href="https://coding.imooc.com/class/290.html">访问</a></td>
            </tr></tbody></table>
 
</div>
</body>
</html>
"""
bs=BeautifulSoup(html,"html.parser")
previous_sibling=bs.find("p",{"class":"name"}).previous_sibling
print(previous_sibling.string)

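Note: in this HTML, the line break between the age and name p tags has been removed. With the line break in place, previous_sibling would be the whitespace text node between the two tags, and the print would show only whitespace.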

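Editing the HTML is not the only option. find_previous_sibling("p") skips text nodes, so it works on normally formatted markup. Below is a sketch with a trimmed, made-up block that keeps the line break.

```python
from bs4 import BeautifulSoup

# Pretty-printed HTML, with a line break between the two <p> tags
html = '''
<div class="teacher_info">
    <p class="age">age: 29</p>
    <p class="name">name: bobby</p>
</div>
'''
soup = BeautifulSoup(html, "html.parser")
name_tag = soup.find("p", {"class": "name"})

# previous_sibling is the "\n    " text node, so its .string is just whitespace
print(repr(name_tag.previous_sibling))
# find_previous_sibling("p") skips text nodes; no need to edit the HTML
print(name_tag.find_previous_sibling("p").string)
```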

Getting attribute values of a tag

import re
 
from bs4 import BeautifulSoup
 
html="""
<html lang="en">
<head>
    <meta charset="UTF-8">
    <title>bobby基本信息</title>
    <script src="jquery-3.5.1.min.js"></script>
</head>
<body>
    <div id="info-955">
        <p style="color: blue">讲师信息</p>
        <div class="teacher_info">
            Python全栈工程师
            <p class="age">年龄:29</p>
            <p class="name">姓名:bobby</p>
            <p class="work_years">工作年限:7年</p>
            <p class="position">职位:python开发工程师</p>
        </div>
        <p style="color:aquamarine">课程信息</p>
        <table class="courses">
            <tbody><tr><th>课程名称</th>
            <th>讲师</th>
            <th>地址</th>
        </tr><tr>
                <td>django打造在线教育</td>
                <td>bobby</td>
                <td><a href="https://coding.imooc.com/class/78.html">访问</a></td>
            </tr><tr>
                <td>python高级编程</td>
                <td>bobby</td>
                <td><a href="https://coding.imooc.com/class/200.html">访问</a></td>
            </tr><tr>
                <td>scrapy分布式爬虫</td>
                <td>bobby</td>
                <td><a href="https://coding.imooc.com/class/92.html">访问</a></td>
            </tr><tr>
                <td>diango rest framework打造生鲜电商</td>
                <td>bobby</td>
                <td><a href="https://coding.imooc.com/class/131.html">访问</a></td>
            </tr><tr>
                <td>tornado从入门到精通</td>
                <td>bobby</td>
                <td><a href="https://coding.imooc.com/class/290.html">访问</a></td>
            </tr></tbody></table>
 
</div>
</body>
</html>
"""
bs=BeautifulSoup(html,"html.parser")
name_tag=bs.find("p",{"class":"name"})
print(name_tag["class"])
print(name_tag.get("class"))

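The two access styles above differ when an attribute is missing. Square brackets raise KeyError, while get returns None or a default you supply. .attrs exposes all attributes as a dictionary. Below is a sketch on a made-up one-tag document.

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup('<p class="name" id="n1">bobby</p>', "html.parser")
tag = soup.p

print(tag.attrs)                  # every attribute as a dict
print(tag.get("title"))           # missing attribute -> None
print(tag.get("title", "n/a"))    # ...or a default you choose

try:
    tag["title"]                  # square brackets raise KeyError instead
except KeyError as exc:
    print("KeyError:", exc)
```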

Multi-valued attributes

import re
 
from bs4 import BeautifulSoup
 
html="""
<html lang="en">
<head>
    <meta charset="UTF-8">
    <title>bobby基本信息</title>
    <script src="jquery-3.5.1.min.js"></script>
</head>
<body>
    <div id="info-955">
        <p style="color: blue">讲师信息</p>
        <div class="teacher_info">
            Python全栈工程师
            <p class="age">年龄:29</p>
            <p class="name bobbyname" data-bind="bobby">姓名:bobby</p>
            <p class="work_years">工作年限:7年</p>
            <p class="position">职位:python开发工程师</p>
        </div>
        <p style="color:aquamarine">课程信息</p>
        <table class="courses">
            <tbody><tr><th>课程名称</th>
            <th>讲师</th>
            <th>地址</th>
        </tr><tr>
                <td>django打造在线教育</td>
                <td>bobby</td>
                <td><a href="https://coding.imooc.com/class/78.html">访问</a></td>
            </tr><tr>
                <td>python高级编程</td>
                <td>bobby</td>
                <td><a href="https://coding.imooc.com/class/200.html">访问</a></td>
            </tr><tr>
                <td>scrapy分布式爬虫</td>
                <td>bobby</td>
                <td><a href="https://coding.imooc.com/class/92.html">访问</a></td>
            </tr><tr>
                <td>diango rest framework打造生鲜电商</td>
                <td>bobby</td>
                <td><a href="https://coding.imooc.com/class/131.html">访问</a></td>
            </tr><tr>
                <td>tornado从入门到精通</td>
                <td>bobby</td>
                <td><a href="https://coding.imooc.com/class/290.html">访问</a></td>
            </tr></tbody></table>
 
</div>
</body>
</html>
"""
bs=BeautifulSoup(html,"html.parser")
name_tag=bs.find("p",{"class":"name"})
print(name_tag["class"])
print(name_tag.get("class"))
print(name_tag["data-bind"])
print(name_tag.get("data-bind"))

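class is treated as a multi-valued attribute and split into a list, while data-bind stays a plain string. Searching for one class still matches a tag that has several. The BeautifulSoup documentation also describes a multi_valued_attributes=None constructor argument that turns the splitting off. Below is a sketch using a one-tag copy of the sample.

```python
from bs4 import BeautifulSoup

html = '<p class="name bobbyname" data-bind="bobby">bobby</p>'

# By default, class is split into a list of values; data-bind is not
soup = BeautifulSoup(html, "html.parser")
print(soup.p["class"], soup.p["data-bind"])

# Searching by a single class still matches a tag that has several classes
print(soup.find("p", class_="bobbyname") is soup.p)

# multi_valued_attributes=None keeps class as the original string
raw = BeautifulSoup(html, "html.parser", multi_valued_attributes=None)
print(raw.p["class"])
```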

Posted by leagueandlegends