twisted的task之cooperator和scrapy的parallel()函数
Posted on 2018-10-12 13:28 王将军之武库 阅读(387) 评论(0) 编辑 收藏 举报def handle_spider_output(self, result, request, response, spider): if not result: return defer_succeed(None) it = iter_errback(result, self.handle_spider_error, request, response, spider) dfd = parallel(it, self.concurrent_items, self._process_spidermw_output, request, response, spider) return dfd
def iter_errback(iterable, errback, *a, **kw): """Wraps an iterable calling an errback if an error is caught while iterating it. """ it = iter(iterable) while True: try: yield next(it) except StopIteration: break except: errback(failure.Failure(), *a, **kw)
包装一个iter,使其可以在迭代时出现异常时调用 错误处理函数。
def parallel(iterable, count, callable, *args, **named): """Execute a callable over the objects in the given iterable, in parallel, using no more than ``count`` concurrent calls. Taken from: http://jcalderone.livejournal.com/24285.html """ coop = task.Cooperator() work = (callable(elem, *args, **named) for elem in iterable) return defer.DeferredList([coop.coiterate(work) for _ in range(count)])
并行处理函数,通过twisted的task来实现的。work是一个生成器,每次迭代时,使work前进一步。defer.DeferredList([coop.coiterate(work) for _ in range(count)])生成count个cooperatertask,定时调用work,直到迭代完成。由此可见,蜘蛛输出是一个deferredlist,一个defer在执行callback时,return是defer时,会停止执行callback,等待到结果执行callback时才能再次继续执行。这样实现了defer的串联执行,外层defer相当于总控制,callback返回defer相当于下层的分支。
def coiterate(self, iterator, doneDeferred=None): """ Add an iterator to the list of iterators this L{Cooperator} is currently running. Equivalent to L{cooperate}, but returns a L{defer.Deferred} that will be fired when the task is done. @param doneDeferred: If specified, this will be the Deferred used as the completion deferred. It is suggested that you use the default, which creates a new Deferred for you. @return: a Deferred that will fire when the iterator finishes. """ if doneDeferred is None: doneDeferred = defer.Deferred() CooperativeTask(iterator, self).whenDone().chainDeferred(doneDeferred) return doneDeferred
cooperator什么时候调用start开始执行任务?其实在构造cooperator时started=True,所以CooperativeTask()时,会把task加入cooperator,同时调用cooperator的_reschedule()使其可以参与调度。
def _addTask(self, task): """ Add a L{CooperativeTask} object to this L{Cooperator}. """ if self._stopped: self._tasks.append(task) # XXX silly, I know, but _completeWith # does the inverse task._completeWith(SchedulerStopped(), Failure(SchedulerStopped())) else: self._tasks.append(task) self._reschedule()
def _tick(self):#每次调度时会遍历没有停止的任务,每个任务会执行onework。 """ Run one scheduler tick. """ self._delayedCall = None for taskObj in self._tasksWhileNotStopped(): taskObj._oneWorkUnit() self._reschedule() _mustScheduleOnStart = False def _reschedule(self): if not self._started: self._mustScheduleOnStart = True return if self._delayedCall is None and self._tasks: self._delayedCall = self._scheduler(self._tick)延时call为定时调用tick函数。
EPSILON = 0.00000001 def _defaultScheduler(x): from twisted.internet import reactor return reactor.callLater(_EPSILON, x)
通过self._scheduler(self._tick)(_defaultScheduler(x))使twisted的reactor能不断调用tick函数。