Python Web Novel Crawler

A Python web crawler for online novels

Packages in use

urllib, BeautifulSoup
urllib is part of the Python standard library; its most useful function here is urlopen, in the urllib.request submodule.
BeautifulSoup is a third-party package that you install yourself, for example through Anaconda; it parses an .html page into an object that you can search.
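
To make the urllib side more concrete, here is a minimal sketch of wrapping urlopen in a small helper. The names build_request and fetch_html, the User-Agent string, the UTF-8 default and the timeout are all assumptions made for illustration; they are not part of the original post, and a real site may need a different encoding or headers.

```python
from urllib.request import Request, urlopen

# Hypothetical User-Agent string, chosen only for this sketch.
DEFAULT_USER_AGENT = "Mozilla/5.0 (compatible; novel-crawler-sketch)"


def build_request(url, user_agent=DEFAULT_USER_AGENT):
    """Return a Request object that carries a browser-like User-Agent header."""
    return Request(url, headers={"User-Agent": user_agent})


def fetch_html(url, encoding="utf-8", timeout=10):
    """Download a page and decode it to text.

    This performs a real network call. The encoding is an assumption;
    undecodable bytes are replaced rather than raising an error.
    """
    with urlopen(build_request(url), timeout=timeout) as response:
        return response.read().decode(encoding, errors="replace")
```

Calling fetch_html("http://www.shuhai.com/read/54351/1.html") would return the page source as a string, which can then be handed to BeautifulSoup.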

Example

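from urllib.request import urlopen
from bs4 import BeautifulSoup

# Download the chapter page and parse it with the built-in HTML parser
html = urlopen("http://www.shuhai.com/read/54351/1.html")
bsObj = BeautifulSoup(html, "html.parser")
# Print the text of every <p> tag on the page
chapter_content = bsObj.findAll("p")
for content in chapter_content:
    print(content.get_text())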

Further notes

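Print bsObj to inspect the structure of the HTML body.
Use .get_text() to return the text content of a tag, without the markup.
Use .findAll() (spelled .find_all() in current BeautifulSoup 4) to collect every tag that matches a given name, such as all <p> elements.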
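
The three notes above can be tried without any network access. The sketch below parses a small, made-up HTML string; the page layout, the "content" and "footer" class names and the variable names are assumptions for illustration only and do not describe the real site. It uses find_all, the current spelling of findAll.

```python
from bs4 import BeautifulSoup

# Hypothetical page fragment, invented for this example.
SAMPLE_HTML = """
<html><body>
  <h1>Chapter 1</h1>
  <div class="content">
    <p>First paragraph.</p>
    <p>Second paragraph.</p>
  </div>
  <p class="footer">Site footer</p>
</body></html>
"""

bs_obj = BeautifulSoup(SAMPLE_HTML, "html.parser")

# Inspect the parsed structure; prettify() returns an indented copy of the markup.
print(bs_obj.prettify())

# find_all(name) returns every matching tag in document order,
# including the footer paragraph we do not want.
all_paragraphs = bs_obj.find_all("p")

# Restrict the search to the (assumed) content container to skip the footer.
content_div = bs_obj.find("div", {"class": "content"})
chapter_paragraphs = content_div.find_all("p")

# get_text() strips the tags and keeps only the text.
chapter_text = "\n".join(p.get_text() for p in chapter_paragraphs)
print(chapter_text)
```

Searching inside a container rather than the whole page is one way to avoid picking up navigation or footer paragraphs; which container to use depends on the actual page and has to be checked by inspecting its HTML.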

Posted 2020-08-01 15:28 by ～Anti
