CSS选择器
一、CSS选择器
二、CSS选择器实例
按照class属性值取出网页信息
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 | from scrapy import Selector html = """ <html lang="en"> <head> <meta charset="UTF-8"> <title>bobby基本信息</title> <script src="jquery-3.5.1.min.js"></script> </head> <body> <div id="info"> <p style="color: blue">讲师信息</p> <div class="teacher_info"> Python全栈工程师 <p class="age">年龄:29</p> <p class="name bobbyname" data-bind="bobby">姓名:bobby</p> <p class="work_years">工作年限:7年</p> <p class="position">职位:python开发工程师</p> </div> <p style="color:aquamarine">课程信息</p> <table class="courses"> <tbody><tr><th>课程名称</th> <th>讲师</th> <th>地址</th> </tr><tr> <td>django打造在线教育</td> <td>bobby</td> <td><a href="https://coding.imooc.com/class/78.html">访问</a></td> </tr><tr> <td>python高级编程</td> <td>bobby</td> <td><a href="https://coding.imooc.com/class/200.html">访问</a></td> </tr><tr> <td>scrapy分布式爬虫</td> <td>bobby</td> <td><a href="https://coding.imooc.com/class/92.html">访问</a></td> </tr><tr> <td>diango rest framework打造生鲜电商</td> <td>bobby</td> <td><a href="https://coding.imooc.com/class/131.html">访问</a></td> </tr><tr> <td>tornado从入门到精通</td> <td>bobby</td> <td><a href="https://coding.imooc.com/class/290.html">访问</a></td> </tr></tbody></table> </div> </body> </html> """ #先取出所有的html值 sel = Selector(text = html) teacher_info = sel.css( '.teacher_info' ).extract() print (teacher_info) |
输出结果:输出class为teacher_info的所有html元素
按照id值取出html网页元素
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 | from scrapy import Selector html = """ <html lang="en"> <head> <meta charset="UTF-8"> <title>bobby基本信息</title> <script src="jquery-3.5.1.min.js"></script> </head> <body> <div id="info"> <p style="color: blue">讲师信息</p> <div class="teacher_info"> Python全栈工程师 <p class="age">年龄:29</p> <p class="name bobbyname" data-bind="bobby">姓名:bobby</p> <p class="work_years">工作年限:7年</p> <p class="position">职位:python开发工程师</p> </div> <p style="color:aquamarine">课程信息</p> <table class="courses"> <tbody><tr><th>课程名称</th> <th>讲师</th> <th>地址</th> </tr><tr> <td>django打造在线教育</td> <td>bobby</td> <td><a href="https://coding.imooc.com/class/78.html">访问</a></td> </tr><tr> <td>python高级编程</td> <td>bobby</td> <td><a href="https://coding.imooc.com/class/200.html">访问</a></td> </tr><tr> <td>scrapy分布式爬虫</td> <td>bobby</td> <td><a href="https://coding.imooc.com/class/92.html">访问</a></td> </tr><tr> <td>diango rest framework打造生鲜电商</td> <td>bobby</td> <td><a href="https://coding.imooc.com/class/131.html">访问</a></td> </tr><tr> <td>tornado从入门到精通</td> <td>bobby</td> <td><a href="https://coding.imooc.com/class/290.html">访问</a></td> </tr></tbody></table> </div> </body> </html> """ #先取出所有的html值 sel = Selector(text = html) info_tag = sel.css( '#info' ).extract() print (info_tag) |
输出结果:
选取对应class属性值下的对应元素
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 | from scrapy import Selector html = """ <html lang="en"> <head> <meta charset="UTF-8"> <title>bobby基本信息</title> <script src="jquery-3.5.1.min.js"></script> </head> <body> <div id="info"> <p style="color: blue">讲师信息</p> <div class="teacher_info"> Python全栈工程师 <p class="age">年龄:29</p> <p class="name bobbyname" data-bind="bobby">姓名:bobby</p> <p class="work_years">工作年限:7年</p> <p class="position">职位:python开发工程师</p> </div> <p style="color:aquamarine">课程信息</p> <table class="courses"> <tbody><tr><th>课程名称</th> <th>讲师</th> <th>地址</th> </tr><tr> <td>django打造在线教育</td> <td>bobby</td> <td><a href="https://coding.imooc.com/class/78.html">访问</a></td> </tr><tr> <td>python高级编程</td> <td>bobby</td> <td><a href="https://coding.imooc.com/class/200.html">访问</a></td> </tr><tr> <td>scrapy分布式爬虫</td> <td>bobby</td> <td><a href="https://coding.imooc.com/class/92.html">访问</a></td> </tr><tr> <td>diango rest framework打造生鲜电商</td> <td>bobby</td> <td><a href="https://coding.imooc.com/class/131.html">访问</a></td> </tr><tr> <td>tornado从入门到精通</td> <td>bobby</td> <td><a href="https://coding.imooc.com/class/290.html">访问</a></td> </tr></tbody></table> </div> </body> </html> """ #先取出所有的html值 sel = Selector(text = html) age_tag = sel.css( ".teacher_info > p" ).extract()[ 0 ] print (age_tag) |
输出结果:
选取输出对应class属性值下的对应元素的值
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 | from scrapy import Selector html = """ <html lang="en"> <head> <meta charset="UTF-8"> <title>bobby基本信息</title> <script src="jquery-3.5.1.min.js"></script> </head> <body> <div id="info"> <p style="color: blue">讲师信息</p> <div class="teacher_info"> Python全栈工程师 <p class="age">年龄:29</p> <p class="name bobbyname" data-bind="bobby">姓名:bobby</p> <p class="work_years">工作年限:7年</p> <p class="position">职位:python开发工程师</p> </div> <p style="color:aquamarine">课程信息</p> <table class="courses"> <tbody><tr><th>课程名称</th> <th>讲师</th> <th>地址</th> </tr><tr> <td>django打造在线教育</td> <td>bobby</td> <td><a href="https://coding.imooc.com/class/78.html">访问</a></td> </tr><tr> <td>python高级编程</td> <td>bobby</td> <td><a href="https://coding.imooc.com/class/200.html">访问</a></td> </tr><tr> <td>scrapy分布式爬虫</td> <td>bobby</td> <td><a href="https://coding.imooc.com/class/92.html">访问</a></td> </tr><tr> <td>diango rest framework打造生鲜电商</td> <td>bobby</td> <td><a href="https://coding.imooc.com/class/131.html">访问</a></td> </tr><tr> <td>tornado从入门到精通</td> <td>bobby</td> <td><a href="https://coding.imooc.com/class/290.html">访问</a></td> </tr></tbody></table> </div> </body> </html> """ #先取出所有的html值 sel = Selector(text = html) age_tag_value = sel.css( ".teacher_info > p::text" ).extract()[ 0 ] print (age_tag_value) |
输出结果:
输出对应class属性的指定第n个孩子节点的值
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 | from scrapy import Selector html = """ <html lang="en"> <head> <meta charset="UTF-8"> <title>bobby基本信息</title> <script src="jquery-3.5.1.min.js"></script> </head> <body> <div id="info"> <p style="color: blue">讲师信息</p> <div class="teacher_info"> Python全栈工程师 <p class="age">年龄:29</p> <p class="name bobbyname" data-bind="bobby">姓名:bobby</p> <p class="work_years">工作年限:7年</p> <p class="position">职位:python开发工程师</p> </div> <p style="color:aquamarine">课程信息</p> <table class="courses"> <tbody><tr><th>课程名称</th> <th>讲师</th> <th>地址</th> </tr><tr> <td>django打造在线教育</td> <td>bobby</td> <td><a href="https://coding.imooc.com/class/78.html">访问</a></td> </tr><tr> <td>python高级编程</td> <td>bobby</td> <td><a href="https://coding.imooc.com/class/200.html">访问</a></td> </tr><tr> <td>scrapy分布式爬虫</td> <td>bobby</td> <td><a href="https://coding.imooc.com/class/92.html">访问</a></td> </tr><tr> <td>diango rest framework打造生鲜电商</td> <td>bobby</td> <td><a href="https://coding.imooc.com/class/131.html">访问</a></td> </tr><tr> <td>tornado从入门到精通</td> <td>bobby</td> <td><a href="https://coding.imooc.com/class/290.html">访问</a></td> </tr></tbody></table> </div> </body> </html> """ #先取出所有的html值 sel = Selector(text = html) name_tag = ".teacher_info p:nth_child(2)::text" name_tag_value1 = sel.css( ".teacher_info > p:nth_child(2)::text" ).extract()[ 0 ] print (name_tag_value1) name_tag_value2 = sel.css( ".teacher_info p:nth_child(2)::text" ).extract()[ 0 ] print (name_tag_value2) name_tag_value3 = sel.css(name_tag).extract()[ 0 ] print (name_tag_value3) |
输出结果:
输出对应class属性后面第一个对应P属性的值
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 | from scrapy import Selector html = """ <html lang="en"> <head> <meta charset="UTF-8"> <title>bobby基本信息</title> <script src="jquery-3.5.1.min.js"></script> </head> <body> <div id="info"> <p style="color: blue">讲师信息</p> <div class="teacher_info"> Python全栈工程师 <p class="age">年龄:29</p> <p class="name bobbyname" data-bind="bobby">姓名:bobby</p> <p class="work_years">工作年限:7年</p> <p class="position">职位:python开发工程师</p> </div> <p style="color:aquamarine">课程信息</p> <table class="courses"> <tbody><tr><th>课程名称</th> <th>讲师</th> <th>地址</th> </tr><tr> <td>django打造在线教育</td> <td>bobby</td> <td><a href="https://coding.imooc.com/class/78.html">访问</a></td> </tr><tr> <td>python高级编程</td> <td>bobby</td> <td><a href="https://coding.imooc.com/class/200.html">访问</a></td> </tr><tr> <td>scrapy分布式爬虫</td> <td>bobby</td> <td><a href="https://coding.imooc.com/class/92.html">访问</a></td> </tr><tr> <td>diango rest framework打造生鲜电商</td> <td>bobby</td> <td><a href="https://coding.imooc.com/class/131.html">访问</a></td> </tr><tr> <td>tornado从入门到精通</td> <td>bobby</td> <td><a href="https://coding.imooc.com/class/290.html">访问</a></td> </tr> </tbody> </table> </div> </body> </html> """ #先取出所有的html值 sel = Selector(text = html) course_info1 = sel.css( ".teacher_info + p::text" ).extract()[ 0 ] print (course_info1) |
输出结果:
输出所有与class属性相邻P元素的值
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 | from scrapy import Selector html = """ <html lang="en"> <head> <meta charset="UTF-8"> <title>bobby基本信息</title> <script src="jquery-3.5.1.min.js"></script> </head> <body> <div id="info"> <p style="color: blue">讲师信息</p> <div class="teacher_info"> Python全栈工程师 <p class="age">年龄:29</p> <p class="name bobbyname" data-bind="bobby">姓名:bobby</p> <p class="work_years">工作年限:7年</p> <p class="position">职位:python开发工程师</p> </div> <p style="color:aquamarine">课程信息</p> <table class="courses"> <tbody><tr><th>课程名称</th> <th>讲师</th> <th>地址</th> </tr><tr> <td>django打造在线教育</td> <td>bobby</td> <td><a href="https://coding.imooc.com/class/78.html">访问</a></td> </tr><tr> <td>python高级编程</td> <td>bobby</td> <td><a href="https://coding.imooc.com/class/200.html">访问</a></td> </tr><tr> <td>scrapy分布式爬虫</td> <td>bobby</td> <td><a href="https://coding.imooc.com/class/92.html">访问</a></td> </tr><tr> <td>diango rest framework打造生鲜电商</td> <td>bobby</td> <td><a href="https://coding.imooc.com/class/131.html">访问</a></td> </tr><tr> <td>tornado从入门到精通</td> <td>bobby</td> <td><a href="https://coding.imooc.com/class/290.html">访问</a></td> </tr></tbody></table> </div> </body> </html> """ #先取出所有的html值 sel = Selector(text = html) course_info2 = sel.css( ".teacher_info ~ p::text" ).extract()[ 0 ] print (course_info2) |
输出结果:
输出指定超链接的标签属性值
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 | from scrapy import Selector html = """ <html lang="en"> <head> <meta charset="UTF-8"> <title>bobby基本信息</title> <script src="jquery-3.5.1.min.js"></script> </head> <body> <div id="info"> <p style="color: blue">讲师信息</p> <div class="teacher_info"> Python全栈工程师 <p class="age">年龄:29</p> <p class="name bobbyname" data-bind="bobby">姓名:bobby</p> <p class="work_years">工作年限:7年</p> <p class="position">职位:python开发工程师</p> </div> <p style="color:aquamarine">课程信息</p> <table class="courses"> <tbody><tr><th>课程名称</th> <th>讲师</th> <th>地址</th> </tr><tr> <td>django打造在线教育</td> <td>bobby</td> <td><a href="https://coding.imooc.com/class/78.html">访问</a></td> </tr><tr> <td>python高级编程</td> <td>bobby</td> <td><a href="https://coding.imooc.com/class/200.html">访问</a></td> </tr><tr> <td>scrapy分布式爬虫</td> <td>bobby</td> <td><a href="https://coding.imooc.com/class/92.html">访问</a></td> </tr><tr> <td>diango rest framework打造生鲜电商</td> <td>bobby</td> <td><a href="https://coding.imooc.com/class/131.html">访问</a></td> </tr><tr> <td>tornado从入门到精通</td> <td>bobby</td> <td><a href="https://coding.imooc.com/class/290.html">访问</a></td> </tr></tbody></table> </div> </body> </html> """ #先取出所有的html值 sel = Selector(text = html) couse_url1 = sel.css( "a[href='https://coding.imooc.com/class/200.html']::text" ).extract()[ 0 ] print (couse_url1) |
输出结果:
获取指定超链接包含某字符串的所有标签属性值
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 | from scrapy import Selector html = """ <html lang="en"> <head> <meta charset="UTF-8"> <title>bobby基本信息</title> <script src="jquery-3.5.1.min.js"></script> </head> <body> <div id="info"> <p style="color: blue">讲师信息</p> <div class="teacher_info"> Python全栈工程师 <p class="age">年龄:29</p> <p class="name bobbyname" data-bind="bobby">姓名:bobby</p> <p class="work_years">工作年限:7年</p> <p class="position">职位:python开发工程师</p> </div> <p style="color:aquamarine">课程信息</p> <table class="courses"> <tbody><tr><th>课程名称</th> <th>讲师</th> <th>地址</th> </tr><tr> <td>django打造在线教育</td> <td>bobby</td> <td><a href="https://coding.imooc.com/class/78.html">访问</a></td> </tr><tr> <td>python高级编程</td> <td>bobby</td> <td><a href="https://coding.imooc.com/class/200.html">访问</a></td> </tr><tr> <td>scrapy分布式爬虫</td> <td>bobby</td> <td><a href="https://coding.imooc.com/class/92.html">访问</a></td> </tr><tr> <td>diango rest framework打造生鲜电商</td> <td>bobby</td> <td><a href="https://coding.imooc.com/class/131.html">访问</a></td> </tr><tr> <td>tornado从入门到精通</td> <td>bobby</td> <td><a href="https://coding.imooc.com/class/290.html">访问</a></td> </tr></tbody></table> </div> </body> </html> """ #先取出所有的html值 sel = Selector(text = html) course_url2 = sel.css( "a[href*='imooc']::text" ).extract() print (course_url2) |
输出结果:
向后获取同级标签属性的值
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 | from scrapy import Selector html = """ <html lang="en"> <head> <meta charset="UTF-8"> <title>bobby基本信息</title> <script src="jquery-3.5.1.min.js"></script> </head> <body> <div id="info"> <p style="color: blue">讲师信息</p> <div class="teacher_info"> Python全栈工程师 <p class="age">年龄:29</p> <p class="name bobbyname" data-bind="bobby">姓名:bobby</p> <p class="work_years">工作年限:7年</p> <p class="position">职位:python开发工程师</p> </div> <p style="color:aquamarine">课程信息</p> <table class="courses"> <tbody><tr><th>课程名称</th> <th>讲师</th> <th>地址</th> </tr><tr> <td>django打造在线教育</td> <td>bobby</td> <td><a href="https://coding.imooc.com/class/78.html">访问</a></td> </tr><tr> <td>python高级编程</td> <td>bobby</td> <td><a href="https://coding.imooc.com/class/200.html">访问</a></td> </tr><tr> <td>scrapy分布式爬虫</td> <td>bobby</td> <td><a href="https://coding.imooc.com/class/92.html">访问</a></td> </tr><tr> <td>diango rest framework打造生鲜电商</td> <td>bobby</td> <td><a href="https://coding.imooc.com/class/131.html">访问</a></td> </tr><tr> <td>tornado从入门到精通</td> <td>bobby</td> <td><a href="https://coding.imooc.com/class/290.html">访问</a></td> </tr></tbody></table> </div> </body> </html> """ #先取出所有的html值 sel = Selector(text = html) sibling_tag = sel.css( "p.name ~ p::text" ).extract() print (sibling_tag) |
输出结果:
【推荐】国内首个AI IDE,深度理解中文开发场景,立即下载体验Trae
【推荐】编程新体验,更懂你的AI,立即体验豆包MarsCode编程助手
【推荐】抖音旗下AI助手豆包,你的智能百科全书,全免费不限次数
【推荐】轻量又高性能的 SSH 工具 IShell:AI 加持,快人一步
· 全程不用写代码,我用AI程序员写了一个飞机大战
· DeepSeek 开源周回顾「GitHub 热点速览」
· 记一次.NET内存居高不下排查解决与启示
· MongoDB 8.0这个新功能碉堡了,比商业数据库还牛
· .NET10 - 预览版1新功能体验(一)
2020-05-20 IP网际协议