摘要:
[https://pymongo.readthedocs.io/en/stable/examples/high_availability.html#](https://pymongo.readthedocs.io/en/stable/examples/high_availability.html#) 阅读全文
摘要:
使用pymongo,具体可以参考官方文档: 语法上基本和原生mongodb是一样的,所以非常容易入手... [https://pymongo.readthedocs.io/en/stable/tutorial.html](https://pymongo.readthedocs.io/en/stabl 阅读全文
摘要:
命令参考:[https://github.com/scrapy/scrapyd-client](https://github.com/scrapy/scrapyd-client) [https://scrapyd.readthedocs.io](https://scrapyd.readthedocs 阅读全文
摘要:
scrapy本身是自带支持HTTP2的爬取: [https://docs.scrapy.org/en/latest/topics/settings.html?highlight=H2DownloadHandler#download-handlers-base](https://docs.scrapy 阅读全文
摘要:
**高级方法:** **一般方法:** 运行爬虫时使用-a传递参数 ```Bash scrapy crawl 爬虫名 -a key=values ``` 然后在爬虫类的__init__魔法方法中获取kwargs ```Python class Bang123Spider(RedisCrawlSpid 阅读全文
摘要:
settings.py中设置配置项 ```Python MONGODB_HOST = "127.0.0.1" MONGODB_PORT = 27017 MONGODB_DB_NAME = "bang123" ``` pipelines.py: ```Python from scrapy.pipeli 阅读全文
摘要:
scrapy特性就是效率高,异步,如果非要集成selenium实际上意义不是特别大....因为selenium慢.... 案例:淘宝首页推荐商品的标题获取 爬虫类 toabao.py ```Python import scrapy from scrapy.http import HtmlRespon 阅读全文
摘要:
安装包 ```Python pip install -U scrapy-redis ``` settings.py ```Python ##### Scrapy-Redis ##### ### Scrapy指定Redis 配置 ### # 其他默认配置在scrapy_redis.default.py 阅读全文
摘要:
参考官方文档:[https://docs.scrapy.org/en/latest/topics/jobs.html?highlight=JOBDIR#jobs-pausing-and-resuming-crawls](https://docs.scrapy.org/en/latest/topics 阅读全文
摘要:
CrawlSpider类型的爬虫会根据指定的rules规则自动找到url比自动爬取。 优点:适合整站爬取,自动翻页爬取 缺点:比较难以通过meta传参,只适合一个页面就能拿完数据的。 ```Python import scrapy from scrapy.http import HtmlRespon 阅读全文