Blog archive for November 24, 2017 - 人微言轻1

November 24, 2017

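Summary: 1. For sites that can only be accessed with cookies: setting the cookies on scrapy.Request() is not enough, because once PhantomJS is configured the crawler still gets banned. > The fix is to set the cookies in PhantomJS as well: driver.add_cookies(cookies) (in Selenium the actual method is driver.add_cookie, called once per cookie).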
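A minimal sketch of the fix described above, assuming the cookies are held in a plain name-to-value dict and that the driver is a Selenium WebDriver (PhantomJS in the post). Selenium's `add_cookie` takes one cookie dict at a time, so the dict has to be unpacked first. The helper names `to_selenium_cookies` and `load_cookies` and the `domain` argument are hypothetical, not from the original post.

```python
def to_selenium_cookies(cookies, domain=None):
    """Turn a {name: value} dict into the per-cookie dicts add_cookie accepts."""
    result = []
    for name, value in cookies.items():
        cookie = {"name": name, "value": value}
        if domain is not None:
            # Optional: pin the cookie to a domain (assumption, not in the post).
            cookie["domain"] = domain
        result.append(cookie)
    return result


def load_cookies(driver, cookies, domain=None):
    """Add every cookie to the driver and return how many were added."""
    selenium_cookies = to_selenium_cookies(cookies, domain)
    for cookie in selenium_cookies:
        driver.add_cookie(cookie)  # Selenium adds one cookie per call
    return len(selenium_cookies)
```

In a real run the driver typically needs to have loaded a page on the target domain (`driver.get(url)`) before cookies can be added, and the page is then requested again so the cookies are sent.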

posted @ 2017-11-24 15:53 人微言轻1 Views (118) Comments (0) Recommendations (0)

Request in scrapy

Summary: 1. The scrapy.Request object and requests.get()

posted @ 2017-11-24 15:50 人微言轻1 Views (414) Comments (0) Recommendations (0)

scrapy-scheduler

Summary:

# Role of the scheduler: controls how Request objects are stored and retrieved, and filters out duplicate Requests.

class Scheduler(object):

    def __init__(self, dupefilter, jobdir=None, dqclass=None, mqclass=None,
                 logun...
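The two jobs named in the summary, storing and handing out requests and filtering duplicates, can be illustrated with a toy version. This is only a sketch under simplifying assumptions, not Scrapy's implementation: requests are plain URL strings, the queue is a FIFO held in memory, and the class names `ToyDupeFilter` and `ToyScheduler` are hypothetical. The method names are chosen to resemble the ones Scrapy's scheduler exposes, but treat them as illustrative.

```python
from collections import deque


class ToyDupeFilter:
    """Remembers every URL it has been asked about."""

    def __init__(self):
        self.seen = set()

    def request_seen(self, url):
        # True if the URL was already recorded; otherwise record it.
        if url in self.seen:
            return True
        self.seen.add(url)
        return False


class ToyScheduler:
    """Stores pending requests and drops duplicates before queuing."""

    def __init__(self, dupefilter):
        self.dupefilter = dupefilter
        self.queue = deque()

    def enqueue_request(self, url, dont_filter=False):
        # Reject the request if it is a duplicate, unless filtering is disabled.
        if not dont_filter and self.dupefilter.request_seen(url):
            return False
        self.queue.append(url)
        return True

    def next_request(self):
        # Hand out the oldest pending request, or None when the queue is empty.
        return self.queue.popleft() if self.queue else None

    def __len__(self):
        return len(self.queue)
```

Enqueuing the same URL twice stores it only once unless `dont_filter=True` is passed, which skips the duplicate check entirely.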

posted @ 2017-11-24 13:20 人微言轻1 Views (1326) Comments (0) Recommendations (0)
