Scrapy raises "ValueError: Missing scheme in request url: h" when downloading images
2021-03-12 23:34:24 [scrapy.core.scraper] ERROR: Error processing {'article_name': '灰塔的黎明', 'article_path': 'D:/data/python/23us', 'article_url': 'https://www.23us.com/html/76/76788/', 'article_url_id': '1a8113fbd8d765f4d58002506ece2c2a', 'article_webaddr': 'https://www.23us.com/book/76788', 'author': '\xa0湖中羊', 'chapter_fullnum': '\xa03191844字', 'chapter_lastest_name': ' 第五百三十一章 活着的城市', 'chapter_lastest_url': 'https://www.23us.com/html/76/76788/', 'chapter_num': '湖中羊', 'collect_num': '\xa023', 'content_validity': '嘿!对,就是你!你这么行色匆匆的要去哪里啊?哦,我知道,我知道生活不易,不过也别太拼命了。你问我在这里干什么?哈哈,我只是坐在这里,讲一些老掉牙的故事,关于巫师,巨龙……你知道的,那些曾经在我们梦里出现过的东西。嘿,你猜怎么着,如果你不那么着急的话,为什么不坐下来听听它们呢?我虽然自认不是个好的说书人,可我敢保证这故事我绝对用了心!来听听吧,也许,它能让你重新梦到,那些早就被我们忘了的……传奇。书友群:193123031欢迎前来催稿', 'front_image_url': 'https://www.23us.com/files/article/image/76/76788/76788s.jpg', 'full_click_num': '\xa0551', 'full_recommend_num': '3', 'image_list': 'https://www.23us.com/files/article/image/76/76788/76788s.jpg', 'mon_click_num': '\xa016', 'mon_recommend_num': '1', 'novel_classify': '玄幻魔法', 'status': 1, 'update': '\xa02021-03-12', 'webaddr': 'https://www.23us.com', 'webaddr_id': 'c2affe5b45bdf9396163dec0fdcea696', 'webname': '23us', 'week_click_num': '\xa04', 'week_recommend_num': '3'}
Traceback (most recent call last):
  File "D:\data\python\environment\zhaopin\lib\site-packages\twisted\internet\defer.py", line 654, in _runCallbacks
    current.result = callback(current.result, *args, **kw)
  File "D:\data\python\environment\zhaopin\lib\site-packages\scrapy\utils\defer.py", line 150, in f
    return deferred_from_coro(coro_f(*coro_args, **coro_kwargs))
  File "D:\data\python\environment\zhaopin\lib\site-packages\scrapy\pipelines\media.py", line 88, in process_item
    dlist = [self._process_request(r, info, item) for r in requests]
  File "D:\data\python\environment\zhaopin\lib\site-packages\scrapy\pipelines\media.py", line 88, in <listcomp>
    dlist = [self._process_request(r, info, item) for r in requests]
  File "D:\data\python\zhaopin\WEB-INF\ArticleSpider\ArticleSpider\pipelines.py", line 51, in get_media_requests
    yield Request(image_url,meta={'webname': item['webname'], 'jpg_num': jpg_num,
  File "D:\data\python\environment\zhaopin\lib\site-packages\scrapy\http\request\__init__.py", line 25, in __init__
    self._set_url(url)
  File "D:\data\python\environment\zhaopin\lib\site-packages\scrapy\http\request\__init__.py", line 73, in _set_url
    raise ValueError(f'Missing scheme in request url: {self._url}')
ValueError: Missing scheme in request url: h
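Cause: image storage is enabled in settings.py through ITEM_PIPELINES = {'scrapy.pipelines.images.ImagesPipeline': 301}, and the image pipeline expects the image URL field to be a list of URLs. The Spider, however, returned a single URL string. When the pipeline loops over that field to build its requests, it iterates over the string one character at a time, so the first "URL" it receives is just the letter h, which is exactly the error above.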
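The effect can be reproduced without Scrapy. The snippet below is a minimal sketch in plain Python: it assumes the pipeline loops over the item's image_list field with an ordinary for loop (the traceback suggests this but does not show the loop itself), and the variable names are illustrative only.

```python
# Value copied from the failing item in the log: a bare string, not a list.
image_list = 'https://www.23us.com/files/article/image/76/76788/76788s.jpg'

# Looping over a string yields single characters, not URLs.
urls_from_string = [image_url for image_url in image_list]
first_url = urls_from_string[0]  # this is the 'h' named in the ValueError

# Looping over a one-element list yields the complete URL.
urls_from_list = [image_url for image_url in [image_list]]

print(repr(first_url))
print(urls_from_list)
```

Passing first_url to a request constructor is what fails, because a single letter has no scheme such as https://.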
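Solution: make sure the image URL is wrapped in double square brackets, so that the image URL field holds a list rather than a bare string (this is the part marked by the red box in the screenshot).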
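As a rough illustration of the fix, the sketch below stores the URL as a list and adds a defensive check on the pipeline side. It assumes the item is a plain dict whose field is assigned directly; in that case one pair of brackets is enough, and the extra pair in the original fix is presumably needed only when something between the spider and the pipeline unwraps one level of list. The helper as_url_list is hypothetical and not part of Scrapy.

```python
def as_url_list(value):
    """Hypothetical helper: return the field as a list of URL strings."""
    if isinstance(value, str):
        # A bare string is treated as a single URL, never as characters.
        return [value]
    return list(value)


front_image_url = 'https://www.23us.com/files/article/image/76/76788/76788s.jpg'

# Spider side: store the image URL inside a list.
item = {
    'front_image_url': front_image_url,
    'image_list': [front_image_url],
}

# Pipeline side: every value produced by the loop is now a full URL.
requested = [image_url for image_url in as_url_list(item['image_list'])]
print(requested)
```

The helper also accepts the old string form, so a pipeline that uses it would not break on items that were built before the spider was corrected.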