[Python] Traversing tags with BeautifulSoup
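1. Downward traversal
Downward traversal of the tag tree
.contents  List of child nodes: stores all of the tag's direct children in a list
.children  Iterator over the child nodes; like .contents, used to loop over the direct children
.descendants  Iterator over all descendant nodes (children, grandchildren, and so on), used for looping
Test code: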
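import requests
from bs4 import BeautifulSoup

r = requests.get("http://python123.io/ws/demo.html")
demo = r.text
soup = BeautifulSoup(demo, "html.parser")
print(soup.head)                # the <head> tag and everything inside it
print(soup.head.contents)       # list of the direct children of <head>
print(soup.body.contents)       # list of the direct children of <body>
print(len(soup.body.contents))  # number of direct children of <body> (text nodes count too)
print(soup.body.contents[1])    # the second child node of <body>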
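The difference between .children and .descendants is easier to see on a small document. The sketch below is only an illustration: the HTML string is a hypothetical stand-in rather than the contents of the demo page, and filtering with isinstance(..., Tag) is one possible way to skip text nodes.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

# Hypothetical stand-in HTML (an assumption, not the real demo page).
html = (
    "<html><head><title>Demo</title></head>"
    "<body><p>One <b>bold</b></p><p>Two</p></body></html>"
)
soup = BeautifulSoup(html, "html.parser")

# .children yields only the direct children of <body>.
child_names = [c.name for c in soup.body.children if isinstance(c, Tag)]

# .descendants walks every level below <body> in document order.
descendant_names = [d.name for d in soup.body.descendants if isinstance(d, Tag)]

print(child_names)       # ['p', 'p']
print(descendant_names)  # ['p', 'b', 'p']
```

Text between tags, including line breaks, is stored as its own node, so on a real page len(tag.contents) is often larger than the number of child tags.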
2. Upward traversal
.parent  The node's parent tag
.parents  Iterator over the node's ancestors, used for looping
Test code:
import requests
from bs4 import BeautifulSoup

r = requests.get("http://python123.io/ws/demo.html")
demo = r.text
soup = BeautifulSoup(demo, "html.parser")
# print(soup.title.parent)
# print(soup.html.parent)
for parent in soup.a.parents:
    if parent is None:
        print(parent)
    else:
        print(parent.name)
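To see where the upward walk ends, the following sketch collects the ancestor names of an <a> tag. The HTML string is again a hypothetical stand-in for the demo page. The last item yielded by .parents is the BeautifulSoup object itself, whose name is "[document]" and whose own .parent is None.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in HTML (an assumption, not the real demo page).
html = "<html><body><p>See <a href='#'>link</a></p></body></html>"
soup = BeautifulSoup(html, "html.parser")

# Walk from the first <a> tag up to the root of the tree.
ancestor_names = [parent.name for parent in soup.a.parents]
print(ancestor_names)  # ['p', 'body', 'html', '[document]']

# The top-level <html> tag's parent is the soup object; the soup has no parent.
print(soup.html.parent is soup)  # True
print(soup.parent)               # None
```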
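3. Sibling traversal
Sibling traversal of the tag tree (among nodes that share the same parent)
.next_sibling  Returns the next sibling node in HTML document order
.previous_sibling  Returns the previous sibling node in HTML document order
.next_siblings  Iterator over all following sibling nodes in HTML document order
.previous_siblings  Iterator over all preceding sibling nodes in HTML document order
Test code: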
import requests
from bs4 import BeautifulSoup

r = requests.get("http://python123.io/ws/demo.html")
demo = r.text
soup = BeautifulSoup(demo, "html.parser")
print(soup.a.next_sibling)                          # node right after the first <a> (may be text)
print(soup.a.next_sibling.next_sibling)             # node after that one
print(soup.a.previous_sibling)                      # node right before the first <a> (may be text)
print(soup.a.previous_sibling.previous_sibling)     # node before that one
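Siblings are not always tags: text between two tags is a sibling node as well. The sketch below, which uses a hypothetical HTML snippet rather than the demo page, shows one way to check the node type and to keep only the tag siblings when iterating with .next_siblings.

```python
from bs4 import BeautifulSoup
from bs4.element import NavigableString, Tag

# Hypothetical stand-in HTML (an assumption, not the real demo page).
html = "<p><a id='link1'>Basic</a> and <a id='link2'>Advanced</a>.</p>"
soup = BeautifulSoup(html, "html.parser")
first_link = soup.a

# The node right after the first <a> is the text " and ", not a tag.
next_node = first_link.next_sibling
is_text = isinstance(next_node, NavigableString)

# Keep only the siblings that are tags.
following_ids = [s["id"] for s in first_link.next_siblings if isinstance(s, Tag)]

# The first <a> is the first child of <p>, so nothing precedes it.
prev_node = first_link.previous_sibling

print(repr(str(next_node)), is_text)  # ' and ' True
print(following_ids)                  # ['link2']
print(prev_node)                      # None
```

Because .next_sibling and .previous_sibling return None at either end of the sibling list, chained calls such as a.previous_sibling.previous_sibling can raise AttributeError, so a None check before chaining is a reasonable precaution.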