
Python Web Novel Crawler

A Python web crawler for online novels

Packages used

urllib, BeautifulSoup
urllib is a built-in Python package; the most useful piece for crawling is the urlopen function in its urllib.request submodule.
BeautifulSoup is not part of the standard library, but you can install it yourself through Anaconda. It parses an HTML page into an object that you can search.
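To make the urlopen step concrete, here is a minimal sketch of reading a page into a string. The helper name fetch_html and its default_encoding parameter are assumptions for illustration, not part of urllib, and a data: URL stands in for a real chapter page so the snippet runs without network access.

```python
from urllib.parse import quote
from urllib.request import urlopen


def fetch_html(url, default_encoding="utf-8"):
    """Download a page and return its body as text (hypothetical helper)."""
    with urlopen(url) as response:
        raw = response.read()  # the body arrives as bytes
        # Prefer the charset declared in the Content-Type header, if any
        charset = response.headers.get_content_charset() or default_encoding
    return raw.decode(charset, errors="replace")


# A data: URL stands in for a real page so the sketch runs offline
sample_url = "data:text/html;charset=utf-8," + quote("<p>Chapter 1</p>")
page = fetch_html(sample_url)
print(page)
```

With a real site you would pass the chapter URL instead of sample_url. Many Chinese novel sites do not serve UTF-8, so the fallback encoding may need to be changed.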

Example

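from urllib.request import urlopen
from bs4 import BeautifulSoup

# Fetch the chapter page and parse it into a BeautifulSoup object
html = urlopen("http://www.shuhai.com/read/54351/1.html")
bsObj = BeautifulSoup(html, "html.parser")
# Collect every <p> element and print its text
chapter_content = bsObj.findAll("p")
for content in chapter_content:
    print(content.get_text())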

Extensions

Use bsObj to inspect the structure of the HTML body.
Use .get_text() to return the text content of an element.
Use .findAll() to collect every element that matches a tag name, such as "p".
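The three points above can be tried on a small inline page. This is only a sketch: sample_html is made-up markup rather than the real site's layout, and it uses find_all, the current name of the method that older code calls findAll.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, standing in for a downloaded chapter page
sample_html = """
<html><body>
  <h1>Chapter 1</h1>
  <p>First paragraph.</p>
  <p>Second <b>paragraph</b>.</p>
</body></html>
"""

bsObj = BeautifulSoup(sample_html, "html.parser")

# Inspect the structure: names of the tags directly under <body>
child_tags = [child.name for child in bsObj.body.find_all(True, recursive=False)]

# find_all returns every matching tag; get_text strips the markup
paragraphs = [p.get_text() for p in bsObj.find_all("p")]

print(child_tags)
print(paragraphs)
```

Note that get_text() also flattens nested tags, so the bold word in the second paragraph comes back as plain text.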

posted @ 2020-08-01 15:28  ~Anti