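Python Exercise 60: Web Page Analysis, Extracting the Body Text and Links
Task: analyze a web page and extract its body text and its links.
The code is as follows: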
from urllib import request

from bs4 import BeautifulSoup

# Fetch the page and decode the response body as UTF-8.
response = request.urlopen('https://www.baidu.com/')
html_text = response.read().decode('utf-8')
soup = BeautifulSoup(html_text, 'lxml')
# print(soup.prettify())

# Keep only <a> tags that carry an href, so a['href'] cannot raise KeyError.
anchors = soup.find_all('a', href=True)
contents1 = soup.contents  # direct child nodes of the document

href1 = []    # links
string1 = []  # body text
for a in anchors:
    href1.append(a['href'])
for string in soup.stripped_strings:
    string1.append(repr(string))

print(href1)
print('-----------------------------')
print(contents1)
print('-----------------------------')
print(string1)
The execution output is omitted here.
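For trying the same idea without network access or third-party packages, here is a minimal sketch that swaps BeautifulSoup for the standard library's `html.parser`. It is not the method used in the exercise above, just an offline variant of it. `SAMPLE_HTML`, `LinkAndTextParser`, and `extract_links_and_text` are hypothetical names introduced for illustration, and the sample page is made up.

```python
from html.parser import HTMLParser

# Hypothetical sample page, used so the sketch runs without network access.
SAMPLE_HTML = """
<html><body>
  <p>Hello, <a href="https://example.com/a">first link</a>.</p>
  <a name="anchor-without-href">no target</a>
  <p><a href="/relative/path">second link</a></p>
</body></html>
"""


class LinkAndTextParser(HTMLParser):
    """Collect href values of <a> tags and non-empty text fragments."""

    def __init__(self):
        super().__init__()
        self.links = []  # href values, in document order
        self.texts = []  # text fragments with surrounding whitespace stripped

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; skip anchors without href.
        if tag == "a":
            href = dict(attrs).get("href")
            if href is not None:
                self.links.append(href)

    def handle_data(self, data):
        # Ignore fragments that are only whitespace.
        text = data.strip()
        if text:
            self.texts.append(text)


def extract_links_and_text(html):
    """Return (links, texts) found in the given HTML string."""
    parser = LinkAndTextParser()
    parser.feed(html)
    parser.close()
    return parser.links, parser.texts


links, texts = extract_links_and_text(SAMPLE_HTML)
print(links)
print(texts)
```

The anchor without an `href` attribute is skipped rather than raising an error, which is the same guard that `find_all('a', href=True)` provides in BeautifulSoup.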