02_Scraping the Slam Dunk National Tournament Manga

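Preface:

  I wrote this project mainly because I wanted to take a short break and read some manga, so I decided to write a crawler for practice. I ran into a few problems along the way. As in the previous project, this one uses Selenium to simulate browser operations so that all the images load, XPath to parse out the image URLs, and finally saves the images locally.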


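1. Goal

  Use Selenium to scrape the manga images and save them locally.

2. Preparation

  The libraries and browser needed for this project are the same as in the previous section, so they are not repeated here.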

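3. Analysis

  The site to scrape is https://www.shenmanhua.com/glgsqgdspqc/. Opening it and clicking the first chapter brings up the page shown in Figure 1.

Figure 1

First, note that the page URL is https://www.shenmanhua.com/glgsqgdspqc/1.html. Pages are turned by clicking, and the URL stays the same after a click. Moving on to the next chapter, however, increases the number at the end of the URL by 1. So each chapter's URL is simply the main page URL followed by page.html, where page is the chapter number: url = 'https://www.shenmanhua.com/glgsqgdspqc/' + str(page) + '.html'. The manga has only 79 chapters, so page ranges from 1 to 79. Iterating page over range(1, 80) and joining it with the main URL and the suffix ".html" yields every chapter URL. All that remains is to extract the images from each chapter.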

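Next, let's see how to extract the images, as shown in Figure 2:

Figure 2

The image sits in an img tag under the div with class=mh_comicpic, but there is only one. Clicking to go to the next page (Figure 3) shows the image in the same position, and again there is only one.

Figure 3

At this point we could use Selenium to simulate clicking the image to move to the next page, and use XPath to pull the image out of that tag each time. The problem is that every chapter has a different number of pages, so it is hard to know how many clicks are needed to collect all the images in a chapter.

There is a way around this: each time an image is extracted, also read the current page number and the maximum page number and compare them. While the current page is less than the maximum, keep clicking; when they are equal, stop and move on to the next chapter's URL. This works, but it is more complicated than necessary, so I don't recommend it. After all, the Python philosophy favors simplicity and elegance. Let's look at a simpler and more elegant approach.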
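
For comparison, the click-through approach described above could look roughly like the sketch below. It is only an illustration: read_counter, click_next, and grab_image are hypothetical callables standing in for the Selenium lookups (the page-counter element and its format are not shown in this article), so the loop logic can be read on its own. FakeReader is likewise a made-up stand-in for a 3-page chapter.

```python
def collect_by_clicking(read_counter, click_next, grab_image, max_clicks=500):
    """Collect one image per page, clicking forward until the last page.

    read_counter() -> (current_page, total_pages)   # hypothetical helper
    click_next()   -> advances the reader one page  # hypothetical helper
    grab_image()   -> URL of the image now shown    # hypothetical helper
    """
    images = []
    for _ in range(max_clicks):          # hard cap so a stuck page cannot loop forever
        images.append(grab_image())
        current, total = read_counter()
        if current >= total:             # last page reached: stop clicking
            break
        click_next()
    return images


class FakeReader:
    """Tiny stand-in for a chapter reader, used only to exercise the loop."""

    def __init__(self, total):
        self.page, self.total = 1, total

    def counter(self):
        return self.page, self.total

    def next(self):
        self.page += 1

    def image(self):
        return 'page-%d.jpg' % self.page


reader = FakeReader(total=3)
pages = collect_by_clicking(reader.counter, reader.next, reader.image)
print(pages)  # ['page-1.jpg', 'page-2.jpg', 'page-3.jpg']
```

Even in this stripped-down form the loop needs a counter lookup, a comparison, and a safety cap for every chapter, which is the extra bookkeeping the approach is criticized for.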

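The images above were obtained by clicking. Can we change tack and, instead of clicking, select continuous reading mode, as shown in Figure 4?

Figure 4

In continuous reading mode there is no need to jump between pages: a single page holds all the image tags, and the images appear as you scroll down. So can we now select all the image tags and extract every image with XPath right away?

The answer is no.

Why?

Look closely at the figure above. The scrollbar has reached the third image, and the third image and the link under its tag are present, but the data for the fourth image onward has not been loaded yet.

The site uses Ajax to load the images, so we need to scroll down to trigger the Ajax requests that fetch them. (In fact, the click-to-read mode described earlier also triggers Ajax to fetch each image, just through a click.)

As in the JD.com project in the previous section, use Selenium to scroll slowly to the bottom so that every image is loaded. Once the complete HTML is available, extract all the images from the corresponding tags.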
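
The slow scroll can be sketched as a small helper. Instead of a fixed number of steps, this version keeps scrolling until the viewport reaches the bottom of the document. That is an assumption about how one might handle chapters of different lengths, not the approach taken by the code in this article. scroll_to_bottom and its parameters are illustrative names, and browser is expected to be a Selenium WebDriver (or anything else that provides execute_script).

```python
import time


def scroll_to_bottom(browser, step=100, pause=0.1, max_steps=2000):
    """Scroll down `step` pixels at a time until the page bottom is reached.

    Returns the number of scroll steps performed.
    """
    # True once the bottom edge of the viewport is within 1px of the page end.
    at_bottom_js = (
        'return window.innerHeight + window.pageYOffset'
        ' >= document.body.scrollHeight - 1;'
    )
    steps = 0
    while steps < max_steps:             # cap in case the page keeps growing
        if browser.execute_script(at_bottom_js):
            break
        browser.execute_script('window.scrollBy(0, %d)' % step)
        time.sleep(pause)                # give lazily loaded images time to arrive
        steps += 1
    return steps
```

With a real driver, calling scroll_to_bottom(browser) after switching to continuous reading would play the same role as a fixed-count scroll loop. The max_steps cap is there because a page that keeps appending content would otherwise never report that the bottom was reached.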

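4. Getting the URLs

  The analysis above already covered how to get the URLs of all chapters: iterate page over range(1, 80) and join it with the main URL and the string ".html" to form each chapter URL. The code is as follows (this snippet and the ones below are meant only to demonstrate each step):

for page in range(1, 80):
    url = 'https://www.shenmanhua.com/glgsqgdspqc/' + str(page) + '.html'
    print(url)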

Output:

https://www.shenmanhua.com/glgsqgdspqc/1.html
https://www.shenmanhua.com/glgsqgdspqc/2.html
https://www.shenmanhua.com/glgsqgdspqc/3.html
https://www.shenmanhua.com/glgsqgdspqc/4.html
https://www.shenmanhua.com/glgsqgdspqc/5.html
https://www.shenmanhua.com/glgsqgdspqc/6.html
https://www.shenmanhua.com/glgsqgdspqc/7.html
https://www.shenmanhua.com/glgsqgdspqc/8.html
https://www.shenmanhua.com/glgsqgdspqc/9.html
https://www.shenmanhua.com/glgsqgdspqc/10.html
https://www.shenmanhua.com/glgsqgdspqc/11.html
https://www.shenmanhua.com/glgsqgdspqc/12.html
https://www.shenmanhua.com/glgsqgdspqc/13.html
https://www.shenmanhua.com/glgsqgdspqc/14.html
https://www.shenmanhua.com/glgsqgdspqc/15.html
https://www.shenmanhua.com/glgsqgdspqc/16.html
https://www.shenmanhua.com/glgsqgdspqc/17.html
https://www.shenmanhua.com/glgsqgdspqc/18.html
https://www.shenmanhua.com/glgsqgdspqc/19.html
https://www.shenmanhua.com/glgsqgdspqc/20.html
https://www.shenmanhua.com/glgsqgdspqc/21.html
https://www.shenmanhua.com/glgsqgdspqc/22.html
https://www.shenmanhua.com/glgsqgdspqc/23.html
https://www.shenmanhua.com/glgsqgdspqc/24.html
https://www.shenmanhua.com/glgsqgdspqc/25.html
https://www.shenmanhua.com/glgsqgdspqc/26.html
https://www.shenmanhua.com/glgsqgdspqc/27.html
https://www.shenmanhua.com/glgsqgdspqc/28.html
https://www.shenmanhua.com/glgsqgdspqc/29.html
https://www.shenmanhua.com/glgsqgdspqc/30.html
https://www.shenmanhua.com/glgsqgdspqc/31.html
https://www.shenmanhua.com/glgsqgdspqc/32.html
https://www.shenmanhua.com/glgsqgdspqc/33.html
https://www.shenmanhua.com/glgsqgdspqc/34.html
https://www.shenmanhua.com/glgsqgdspqc/35.html
https://www.shenmanhua.com/glgsqgdspqc/36.html
https://www.shenmanhua.com/glgsqgdspqc/37.html
https://www.shenmanhua.com/glgsqgdspqc/38.html
https://www.shenmanhua.com/glgsqgdspqc/39.html
https://www.shenmanhua.com/glgsqgdspqc/40.html
https://www.shenmanhua.com/glgsqgdspqc/41.html
https://www.shenmanhua.com/glgsqgdspqc/42.html
https://www.shenmanhua.com/glgsqgdspqc/43.html
https://www.shenmanhua.com/glgsqgdspqc/44.html
https://www.shenmanhua.com/glgsqgdspqc/45.html
https://www.shenmanhua.com/glgsqgdspqc/46.html
https://www.shenmanhua.com/glgsqgdspqc/47.html
https://www.shenmanhua.com/glgsqgdspqc/48.html
https://www.shenmanhua.com/glgsqgdspqc/49.html
https://www.shenmanhua.com/glgsqgdspqc/50.html
https://www.shenmanhua.com/glgsqgdspqc/51.html
https://www.shenmanhua.com/glgsqgdspqc/52.html
https://www.shenmanhua.com/glgsqgdspqc/53.html
https://www.shenmanhua.com/glgsqgdspqc/54.html
https://www.shenmanhua.com/glgsqgdspqc/55.html
https://www.shenmanhua.com/glgsqgdspqc/56.html
https://www.shenmanhua.com/glgsqgdspqc/57.html
https://www.shenmanhua.com/glgsqgdspqc/58.html
https://www.shenmanhua.com/glgsqgdspqc/59.html
https://www.shenmanhua.com/glgsqgdspqc/60.html
https://www.shenmanhua.com/glgsqgdspqc/61.html
https://www.shenmanhua.com/glgsqgdspqc/62.html
https://www.shenmanhua.com/glgsqgdspqc/63.html
https://www.shenmanhua.com/glgsqgdspqc/64.html
https://www.shenmanhua.com/glgsqgdspqc/65.html
https://www.shenmanhua.com/glgsqgdspqc/66.html
https://www.shenmanhua.com/glgsqgdspqc/67.html
https://www.shenmanhua.com/glgsqgdspqc/68.html
https://www.shenmanhua.com/glgsqgdspqc/69.html
https://www.shenmanhua.com/glgsqgdspqc/70.html
https://www.shenmanhua.com/glgsqgdspqc/71.html
https://www.shenmanhua.com/glgsqgdspqc/72.html
https://www.shenmanhua.com/glgsqgdspqc/73.html
https://www.shenmanhua.com/glgsqgdspqc/74.html
https://www.shenmanhua.com/glgsqgdspqc/75.html
https://www.shenmanhua.com/glgsqgdspqc/76.html
https://www.shenmanhua.com/glgsqgdspqc/77.html
https://www.shenmanhua.com/glgsqgdspqc/78.html
https://www.shenmanhua.com/glgsqgdspqc/79.html

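This gives us the URLs of all chapters.

5. Getting the images

  The extraction method was analyzed in detail above: switch to continuous reading mode, scroll down to load all the images, then extract them with XPath. The code is as follows: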

import time

from lxml import etree
from selenium import webdriver
from selenium.webdriver.common.by import By

# URL of chapter 79; only one chapter is fetched here for demonstration
url = 'https://www.shenmanhua.com/glgsqgdspqc/79.html'
browser = webdriver.Chrome()
browser.get(url)
# Switch the reading mode to continuous reading
# (find_element(By.XPATH, ...) replaces find_element_by_xpath, which Selenium 4 removed)
button = browser.find_element(By.XPATH, '/html/body/div[2]/div[3]/select[2]/option[3]')
button.click()
# Scroll down slowly to load every image; jumping straight to the bottom does not load them all
for _ in range(180):
    browser.execute_script('window.scrollBy(0,100)')
    time.sleep(0.1)
# Use XPath to extract the URLs of all images in this chapter
html = etree.HTML(browser.page_source)
images = html.xpath("//div[@class='mh_comicpic']/img/@src")
# Print the URL of each page image in the chapter
for image in images:
    print(image)
# Shut down the browser and the driver process
browser.quit()

Output:

https://mhpic.mhxk.com/comic/G%2F%E7%81%8C%E7%AF%AE%E9%AB%98%E6%89%8B%E5%85%A8%E5%9B%BD%E5%A4%A7%E8%B5%9B%E7%AF%87%E5%85%A8%E5%BD%A9%2F79%E8%AF%9D%2F1.jpg-noresize.webp
https://mhpic.mhxk.com/comic/G%2F%E7%81%8C%E7%AF%AE%E9%AB%98%E6%89%8B%E5%85%A8%E5%9B%BD%E5%A4%A7%E8%B5%9B%E7%AF%87%E5%85%A8%E5%BD%A9%2F79%E8%AF%9D%2F2.jpg-noresize.webp
https://mhpic.mhxk.com/comic/G%2F%E7%81%8C%E7%AF%AE%E9%AB%98%E6%89%8B%E5%85%A8%E5%9B%BD%E5%A4%A7%E8%B5%9B%E7%AF%87%E5%85%A8%E5%BD%A9%2F79%E8%AF%9D%2F3.jpg-noresize.webp
https://mhpic.mhxk.com/comic/G%2F%E7%81%8C%E7%AF%AE%E9%AB%98%E6%89%8B%E5%85%A8%E5%9B%BD%E5%A4%A7%E8%B5%9B%E7%AF%87%E5%85%A8%E5%BD%A9%2F79%E8%AF%9D%2F4.jpg-noresize.webp
https://mhpic.mhxk.com/comic/G%2F%E7%81%8C%E7%AF%AE%E9%AB%98%E6%89%8B%E5%85%A8%E5%9B%BD%E5%A4%A7%E8%B5%9B%E7%AF%87%E5%85%A8%E5%BD%A9%2F79%E8%AF%9D%2F5.jpg-noresize.webp
https://mhpic.mhxk.com/comic/G%2F%E7%81%8C%E7%AF%AE%E9%AB%98%E6%89%8B%E5%85%A8%E5%9B%BD%E5%A4%A7%E8%B5%9B%E7%AF%87%E5%85%A8%E5%BD%A9%2F79%E8%AF%9D%2F6.jpg-noresize.webp
https://mhpic.mhxk.com/comic/G%2F%E7%81%8C%E7%AF%AE%E9%AB%98%E6%89%8B%E5%85%A8%E5%9B%BD%E5%A4%A7%E8%B5%9B%E7%AF%87%E5%85%A8%E5%BD%A9%2F79%E8%AF%9D%2F7.jpg-noresize.webp
https://mhpic.mhxk.com/comic/G%2F%E7%81%8C%E7%AF%AE%E9%AB%98%E6%89%8B%E5%85%A8%E5%9B%BD%E5%A4%A7%E8%B5%9B%E7%AF%87%E5%85%A8%E5%BD%A9%2F79%E8%AF%9D%2F8.jpg-noresize.webp
https://mhpic.mhxk.com/comic/G%2F%E7%81%8C%E7%AF%AE%E9%AB%98%E6%89%8B%E5%85%A8%E5%9B%BD%E5%A4%A7%E8%B5%9B%E7%AF%87%E5%85%A8%E5%BD%A9%2F79%E8%AF%9D%2F9.jpg-noresize.webp

This gives us all the images of a chapter.
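
The printed URLs are percent-encoded, which hides the fact that each one already carries the chapter and page number. As a small aside, the sketch below decodes the first URL from the output above using only the standard library. parse_image_url is a hypothetical helper name, and the path layout it relies on (chapter folder followed by a numbered file name) is inferred solely from these sample URLs.

```python
import re
from urllib.parse import unquote, urlsplit


def parse_image_url(image_url):
    """Return (chapter, page) parsed from a percent-encoded image URL."""
    # Decode the path so that %2F becomes '/' and the UTF-8 bytes become text.
    path = unquote(urlsplit(image_url).path)
    parts = path.split('/')
    # Assumed layout: the second-to-last part starts with the chapter number,
    # and the file name starts with the page number.
    chapter = int(re.match(r'\d+', parts[-2]).group())
    page = int(re.match(r'\d+', parts[-1]).group())
    return chapter, page


sample = ('https://mhpic.mhxk.com/comic/G%2F%E7%81%8C%E7%AF%AE%E9%AB%98'
          '%E6%89%8B%E5%85%A8%E5%9B%BD%E5%A4%A7%E8%B5%9B%E7%AF%87%E5%85'
          '%A8%E5%BD%A9%2F79%E8%AF%9D%2F1.jpg-noresize.webp')
print(parse_image_url(sample))  # (79, 1)
```

Decoding like this is a quick way to confirm that the scraped list is complete and in order before downloading anything.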

6. Complete code

import time

import requests
from lxml import etree
from selenium import webdriver
from selenium.webdriver.common.by import By


def main():
    print('Crawler started!')
    print('Scraping...')
    for page in range(1, 80):
        url = get_url(page)
        html = get_page(url)
        # get_page returns None when a chapter fails to load; skip it
        if html is not None:
            get_image(html, page)
    print('Crawler finished!')


def get_url(page):
    url = 'https://www.shenmanhua.com/glgsqgdspqc/' + str(page) + '.html'
    return url


def get_page(url):
    # Chapter number taken from the URL, e.g. '.../12.html' -> '12'
    current_page = url.rsplit('/', 1)[-1].split('.')[0]
    print('Scraping chapter %s...' % current_page)
    try:
        browser.get(url)
        # Switch to continuous reading. Each page shows only one image, and the site
        # appears to trigger Ajax on each page click, which is awkward to script.
        button = browser.find_element(By.XPATH, '/html/body/div[2]/div[3]/select[2]/option[3]')
        button.click()
        # Scroll down slowly to load every image; jumping straight to the bottom does not load them all
        for _ in range(180):
            browser.execute_script('window.scrollBy(0,100)')
            time.sleep(0.1)
        return etree.HTML(browser.page_source)
    except Exception as error:
        print('Chapter %s failed! URL: %s' % (current_page, url), error)
        return None


def get_image(html, page):
    images = html.xpath("//div[@class='mh_comicpic']/img/@src")
    # Files are named by chapter and page, e.g. chapter 1: 1-1.jpg, 1-2.jpg, ...
    chapter = str(page) + '-'
    for i, image in enumerate(images, start=1):
        print(image)
        image_name = chapter + str(i) + '.jpg'
        # Download first, then write the bytes to disk
        response = requests.get(image)
        with open(image_name, 'wb') as f:
            f.write(response.content)
    print('Chapter %s done!' % page)


if __name__ == "__main__":
    browser = webdriver.Chrome()
    main()
    browser.quit()

Output:

Crawler started!
Scraping...
Scraping chapter 1...
https://mhpic.mhxk.com/comic/G%2F%E7%81%8C%E7%AF%AE%E9%AB%98%E6%89%8B%E5%85%A8%E5%9B%BD%E5%A4%A7%E8%B5%9B%E7%AF%87%E5%85%A8%E5%BD%A9%2F1%E8%AF%9D%2F1.jpg-noresize.webp
https://mhpic.mhxk.com/comic/G%2F%E7%81%8C%E7%AF%AE%E9%AB%98%E6%89%8B%E5%85%A8%E5%9B%BD%E5%A4%A7%E8%B5%9B%E7%AF%87%E5%85%A8%E5%BD%A9%2F1%E8%AF%9D%2F2.jpg-noresize.webp
https://mhpic.mhxk.com/comic/G%2F%E7%81%8C%E7%AF%AE%E9%AB%98%E6%89%8B%E5%85%A8%E5%9B%BD%E5%A4%A7%E8%B5%9B%E7%AF%87%E5%85%A8%E5%BD%A9%2F1%E8%AF%9D%2F3.jpg-noresize.webp
https://mhpic.mhxk.com/comic/G%2F%E7%81%8C%E7%AF%AE%E9%AB%98%E6%89%8B%E5%85%A8%E5%9B%BD%E5%A4%A7%E8%B5%9B%E7%AF%87%E5%85%A8%E5%BD%A9%2F1%E8%AF%9D%2F4.jpg-noresize.webp
https://mhpic.mhxk.com/comic/G%2F%E7%81%8C%E7%AF%AE%E9%AB%98%E6%89%8B%E5%85%A8%E5%9B%BD%E5%A4%A7%E8%B5%9B%E7%AF%87%E5%85%A8%E5%BD%A9%2F1%E8%AF%9D%2F5.jpg-noresize.webp
https://mhpic.mhxk.com/comic/G%2F%E7%81%8C%E7%AF%AE%E9%AB%98%E6%89%8B%E5%85%A8%E5%9B%BD%E5%A4%A7%E8%B5%9B%E7%AF%87%E5%85%A8%E5%BD%A9%2F1%E8%AF%9D%2F6.jpg-noresize.webp
https://mhpic.mhxk.com/comic/G%2F%E7%81%8C%E7%AF%AE%E9%AB%98%E6%89%8B%E5%85%A8%E5%9B%BD%E5%A4%A7%E8%B5%9B%E7%AF%87%E5%85%A8%E5%BD%A9%2F1%E8%AF%9D%2F7.jpg-noresize.webp
https://mhpic.mhxk.com/comic/G%2F%E7%81%8C%E7%AF%AE%E9%AB%98%E6%89%8B%E5%85%A8%E5%9B%BD%E5%A4%A7%E8%B5%9B%E7%AF%87%E5%85%A8%E5%BD%A9%2F1%E8%AF%9D%2F8.jpg-noresize.webp
https://mhpic.mhxk.com/comic/G%2F%E7%81%8C%E7%AF%AE%E9%AB%98%E6%89%8B%E5%85%A8%E5%9B%BD%E5%A4%A7%E8%B5%9B%E7%AF%87%E5%85%A8%E5%BD%A9%2F1%E8%AF%9D%2F9.jpg-noresize.webp
https://mhpic.mhxk.com/comic/G%2F%E7%81%8C%E7%AF%AE%E9%AB%98%E6%89%8B%E5%85%A8%E5%9B%BD%E5%A4%A7%E8%B5%9B%E7%AF%87%E5%85%A8%E5%BD%A9%2F1%E8%AF%9D%2F10.jpg-noresize.webp
https://mhpic.mhxk.com/comic/G%2F%E7%81%8C%E7%AF%AE%E9%AB%98%E6%89%8B%E5%85%A8%E5%9B%BD%E5%A4%A7%E8%B5%9B%E7%AF%87%E5%85%A8%E5%BD%A9%2F1%E8%AF%9D%2F11.jpg-noresize.webp
https://mhpic.mhxk.com/comic/G%2F%E7%81%8C%E7%AF%AE%E9%AB%98%E6%89%8B%E5%85%A8%E5%9B%BD%E5%A4%A7%E8%B5%9B%E7%AF%87%E5%85%A8%E5%BD%A9%2F1%E8%AF%9D%2F12.jpg-noresize.webp
https://mhpic.mhxk.com/comic/G%2F%E7%81%8C%E7%AF%AE%E9%AB%98%E6%89%8B%E5%85%A8%E5%9B%BD%E5%A4%A7%E8%B5%9B%E7%AF%87%E5%85%A8%E5%BD%A9%2F1%E8%AF%9D%2F13.jpg-noresize.webp
https://mhpic.mhxk.com/comic/G%2F%E7%81%8C%E7%AF%AE%E9%AB%98%E6%89%8B%E5%85%A8%E5%9B%BD%E5%A4%A7%E8%B5%9B%E7%AF%87%E5%85%A8%E5%BD%A9%2F1%E8%AF%9D%2F14.jpg-noresize.webp
https://mhpic.mhxk.com/comic/G%2F%E7%81%8C%E7%AF%AE%E9%AB%98%E6%89%8B%E5%85%A8%E5%9B%BD%E5%A4%A7%E8%B5%9B%E7%AF%87%E5%85%A8%E5%BD%A9%2F1%E8%AF%9D%2F15.jpg-noresize.webp
https://mhpic.mhxk.com/comic/G%2F%E7%81%8C%E7%AF%AE%E9%AB%98%E6%89%8B%E5%85%A8%E5%9B%BD%E5%A4%A7%E8%B5%9B%E7%AF%87%E5%85%A8%E5%BD%A9%2F1%E8%AF%9D%2F16.jpg-noresize.webp
https://mhpic.mhxk.com/comic/G%2F%E7%81%8C%E7%AF%AE%E9%AB%98%E6%89%8B%E5%85%A8%E5%9B%BD%E5%A4%A7%E8%B5%9B%E7%AF%87%E5%85%A8%E5%BD%A9%2F1%E8%AF%9D%2F17.jpg-noresize.webp
https://mhpic.mhxk.com/comic/G%2F%E7%81%8C%E7%AF%AE%E9%AB%98%E6%89%8B%E5%85%A8%E5%9B%BD%E5%A4%A7%E8%B5%9B%E7%AF%87%E5%85%A8%E5%BD%A9%2F1%E8%AF%9D%2F18.jpg-noresize.webp
https://mhpic.mhxk.com/comic/G%2F%E7%81%8C%E7%AF%AE%E9%AB%98%E6%89%8B%E5%85%A8%E5%9B%BD%E5%A4%A7%E8%B5%9B%E7%AF%87%E5%85%A8%E5%BD%A9%2F1%E8%AF%9D%2F19.jpg-noresize.webp
https://mhpic.mhxk.com/comic/G%2F%E7%81%8C%E7%AF%AE%E9%AB%98%E6%89%8B%E5%85%A8%E5%9B%BD%E5%A4%A7%E8%B5%9B%E7%AF%87%E5%85%A8%E5%BD%A9%2F1%E8%AF%9D%2F20.jpg-noresize.webp
https://mhpic.mhxk.com/comic/G%2F%E7%81%8C%E7%AF%AE%E9%AB%98%E6%89%8B%E5%85%A8%E5%9B%BD%E5%A4%A7%E8%B5%9B%E7%AF%87%E5%85%A8%E5%BD%A9%2F1%E8%AF%9D%2F21.jpg-noresize.webp
Chapter 1 done!
Scraping chapter 2...
https://mhpic.mhxk.com/comic/G%2F%E7%81%8C%E7%AF%AE%E9%AB%98%E6%89%8B%E5%85%A8%E5%9B%BD%E5%A4%A7%E8%B5%9B%E7%AF%87%E5%85%A8%E5%BD%A9%2F2%E8%AF%9D%2F1.jpg-noresize.webp
https://mhpic.mhxk.com/comic/G%2F%E7%81%8C%E7%AF%AE%E9%AB%98%E6%89%8B%E5%85%A8%E5%9B%BD%E5%A4%A7%E8%B5%9B%E7%AF%87%E5%85%A8%E5%BD%A9%2F2%E8%AF%9D%2F2.jpg-noresize.webp
https://mhpic.mhxk.com/comic/G%2F%E7%81%8C%E7%AF%AE%E9%AB%98%E6%89%8B%E5%85%A8%E5%9B%BD%E5%A4%A7%E8%B5%9B%E7%AF%87%E5%85%A8%E5%BD%A9%2F2%E8%AF%9D%2F3.jpg-noresize.webp
https://mhpic.mhxk.com/comic/G%2F%E7%81%8C%E7%AF%AE%E9%AB%98%E6%89%8B%E5%85%A8%E5%9B%BD%E5%A4%A7%E8%B5%9B%E7%AF%87%E5%85%A8%E5%BD%A9%2F2%E8%AF%9D%2F4.jpg-noresize.webp
https://mhpic.mhxk.com/comic/G%2F%E7%81%8C%E7%AF%AE%E9%AB%98%E6%89%8B%E5%85%A8%E5%9B%BD%E5%A4%A7%E8%B5%9B%E7%AF%87%E5%85%A8%E5%BD%A9%2F2%E8%AF%9D%2F5.jpg-noresize.webp
https://mhpic.mhxk.com/comic/G%2F%E7%81%8C%E7%AF%AE%E9%AB%98%E6%89%8B%E5%85%A8%E5%9B%BD%E5%A4%A7%E8%B5%9B%E7%AF%87%E5%85%A8%E5%BD%A9%2F2%E8%AF%9D%2F6.jpg-noresize.webp
https://mhpic.mhxk.com/comic/G%2F%E7%81%8C%E7%AF%AE%E9%AB%98%E6%89%8B%E5%85%A8%E5%9B%BD%E5%A4%A7%E8%B5%9B%E7%AF%87%E5%85%A8%E5%BD%A9%2F2%E8%AF%9D%2F7.jpg-noresize.webp
https://mhpic.mhxk.com/comic/G%2F%E7%81%8C%E7%AF%AE%E9%AB%98%E6%89%8B%E5%85%A8%E5%9B%BD%E5%A4%A7%E8%B5%9B%E7%AF%87%E5%85%A8%E5%BD%A9%2F2%E8%AF%9D%2F8.jpg-noresize.webp
https://mhpic.mhxk.com/comic/G%2F%E7%81%8C%E7%AF%AE%E9%AB%98%E6%89%8B%E5%85%A8%E5%9B%BD%E5%A4%A7%E8%B5%9B%E7%AF%87%E5%85%A8%E5%BD%A9%2F2%E8%AF%9D%2F9.jpg-noresize.webp
https://mhpic.mhxk.com/comic/G%2F%E7%81%8C%E7%AF%AE%E9%AB%98%E6%89%8B%E5%85%A8%E5%9B%BD%E5%A4%A7%E8%B5%9B%E7%AF%87%E5%85%A8%E5%BD%A9%2F2%E8%AF%9D%2F10.jpg-noresize.webp
Chapter 2 done!
.....

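The images saved locally are shown in Figure 5:

Figure 5

With that, all the images have been downloaded and the project is done! (Look at the bottom-left corner of Figure 5: some things are just meant to be...)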
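
One optional refinement, which is not part of the original script: fetch each image before opening the output file, so a failed request never leaves an empty file behind, and skip files that already exist so an interrupted run can be resumed. The sketch below uses the standard library's urllib.request in place of requests to stay dependency-free. save_images, fetch_bytes, and the out_dir parameter are illustrative names, and whether the image server accepts urllib's default request headers is untested.

```python
import os
import urllib.request


def fetch_bytes(url, timeout=10):
    """Download a URL and return its raw bytes (standard library only)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def save_images(images, page, out_dir='.', fetch=fetch_bytes):
    """Save images as '<page>-<n>.jpg' in out_dir; return the paths written."""
    os.makedirs(out_dir, exist_ok=True)
    written = []
    for i, image in enumerate(images, start=1):
        path = os.path.join(out_dir, '%s-%d.jpg' % (page, i))
        if os.path.exists(path):         # already downloaded on an earlier run
            continue
        try:
            data = fetch(image)          # fetch first, write only on success
        except OSError as error:         # urllib's URLError and timeouts are OSErrors
            print('Failed to download %s: %s' % (image, error))
            continue
        with open(path, 'wb') as f:
            f.write(data)
        written.append(path)
    return written
```

Passing the download function in as fetch keeps the file-naming and skip logic separate from the network call, so either part can be swapped or exercised on its own.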


 

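Closing remarks:

  I learned quite a lot from this small crawler project. Above all, it let me revisit a manga I love, which brought back many memories.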

   

                                    Youth is precious precisely because of its regrets

 

posted @ 2018-12-04 19:54  我家有只大白兔