Crawling dynamic pages with Scrapy and the Splash rendering engine

1. Start Docker, then enter the following on the command line:

docker run -p 8050:8050 scrapinghub/splash

This runs the Splash engine in Docker and exposes it on port 8050.
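
Before writing any spider code, it can help to confirm that the container actually answers. The following is a minimal sketch, assuming Splash is reachable at http://localhost:8050 (the port mapping from the docker command) and using Splash's render.html HTTP endpoint. The helper names build_render_url and fetch_rendered_html, and the example.com target, are illustrative and not part of the original post.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: Splash was started with "-p 8050:8050" on this machine.
SPLASH_BASE = "http://localhost:8050"


def build_render_url(page_url, wait=0.5, base=SPLASH_BASE):
    """Build a render.html URL asking Splash to load page_url and wait `wait` seconds."""
    query = urlencode({"url": page_url, "wait": wait})
    return f"{base}/render.html?{query}"


def fetch_rendered_html(page_url, wait=0.5, timeout=30):
    """Return the HTML of page_url after Splash has rendered it."""
    with urlopen(build_render_url(page_url, wait), timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


if __name__ == "__main__":
    # Needs the Splash container to be running; prints the start of the rendered page.
    print(fetch_rendered_html("http://example.com")[:200])
```

If the script prints HTML, the engine is up and the Scrapy side can be configured next.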
2. Next, we can write the crawler.
First, add the following to settings.py:

SPLASH_URL = 'http://localhost:8050'
DUPEFILTER_CLASS = 'scrapy_splash.SplashAwareDupeFilter'

DOWNLOADER_MIDDLEWARES = {
    'scrapy_splash.SplashCookiesMiddleware':723,
    'scrapy_splash.SplashMiddleware':725,
    'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware':810
}
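
The scrapy-splash README lists two more settings alongside the ones above. They are not in the original post, so treat this as an optional sketch: the first lets duplicate Splash arguments be deduplicated, and the second only matters if Scrapy's HTTP cache is enabled.

```python
# settings.py (optional additions, taken from the scrapy-splash README)

# Avoids sending identical Splash arguments repeatedly.
SPIDER_MIDDLEWARES = {
    'scrapy_splash.SplashDeduplicateArgsMiddleware': 100,
}

# Only relevant when HTTPCACHE_ENABLED is True.
HTTPCACHE_STORAGE = 'scrapy_splash.SplashAwareFSCacheStorage'
```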

Also enable the item pipeline (ITEM_PIPELINES) in the same file.
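
The post does not show which pipeline is enabled, so here is a minimal sketch of what enabling one might look like. The project name myproject and the StripWhitespacePipeline class are hypothetical; only the ITEM_PIPELINES setting and the process_item method are standard Scrapy.

```python
# pipelines.py (hypothetical project named "myproject")
class StripWhitespacePipeline:
    """Trim surrounding whitespace from every string field of an item."""

    def process_item(self, item, spider):
        # Iterate over a copy of the pairs so values can be reassigned safely.
        for key, value in list(item.items()):
            if isinstance(value, str):
                item[key] = value.strip()
        return item


# settings.py: register the pipeline; lower numbers run earlier (range 0-1000).
ITEM_PIPELINES = {
    'myproject.pipelines.StripWhitespacePipeline': 300,
}
```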
3. When writing the spider file, add the following at the top:

from scrapy_splash import SplashRequest

We then use SplashRequest to submit the pages we want to parse to the Splash engine.

posted @ 2018-12-30 22:16 ayang818
