python requests: simple web page text scraping

Page to scrape:

http://www.cnblogs.com/xrq730/archive/2018/06/11/9159586.html

The goal is to grab the text content of a single blog post.

Use requests to fetch the HTML of the whole page;
use Beautiful Soup to parse that HTML.

import requests
from bs4 import BeautifulSoup


if __name__ == '__main__':
    target = 'http://www.cnblogs.com/xrq730/archive/2018/06/11/9159586.html'
    req = requests.get(url=target)
    html = req.text
    # Name the parser explicitly; otherwise bs4 guesses one and emits a warning
    bf = BeautifulSoup(html, 'html.parser')
    # The post body is inside <div class="blogpost-body">
    texts = bf.find_all('div', class_='blogpost-body')
    # print(html)
    # .text already drops the tags, so replacing the '<p><span ...>' markup there had no effect
    print(texts[0].text)
    # print(texts[0].text.replace('\xa0' * 8, '\n\n'))
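The two steps above (fetch, then parse) can also be split into small functions, which makes the parsing part easy to try on any HTML string. This is only a rough sketch, not the one right way to do it: the function names fetch_html and extract_post_text, the 10 second timeout, and the choice to join paragraphs with a blank line are my own assumptions. The 'blogpost-body' class comes from the script above.

```python
import requests
from bs4 import BeautifulSoup


def fetch_html(url, timeout=10):
    """Download a page and return its HTML as text (hypothetical helper)."""
    resp = requests.get(url, timeout=timeout)
    resp.raise_for_status()  # raise an exception on 4xx/5xx responses
    # Guess the encoding from the body; helps when the header omits a charset
    resp.encoding = resp.apparent_encoding
    return resp.text


def extract_post_text(html, css_class='blogpost-body'):
    """Return the text of each <p> in the post body, separated by blank lines."""
    soup = BeautifulSoup(html, 'html.parser')
    body = soup.find('div', class_=css_class)
    if body is None:
        # The page has no such <div>; return an empty string instead of failing
        return ''
    paragraphs = []
    for p in body.find_all('p'):
        # Turn non-breaking spaces into normal spaces, then trim the ends
        text = p.get_text().replace('\xa0', ' ').strip()
        if text:
            paragraphs.append(text)
    return '\n\n'.join(paragraphs)
```

To run it against the post used here, something like print(extract_post_text(fetch_html(target))) should work, with target set to the URL above. Returning an empty string when the div is missing avoids the IndexError that texts[0] would raise on a page with a different layout.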

posted @ 2018-06-20 18:44 hhhaaa Views (2039) Comments (0)
