Twisted's task Cooperator and Scrapy's parallel() function

Posted on 2018-10-12 13:28 by 王将军之武库
def handle_spider_output(self, result, request, response, spider):
        if not result:
            return defer_succeed(None)
        it = iter_errback(result, self.handle_spider_error, request, response, spider)
        dfd = parallel(it, self.concurrent_items,
            self._process_spidermw_output, request, response, spider)
        return dfd
def iter_errback(iterable, errback, *a, **kw):
    """Wraps an iterable calling an errback if an error is caught while
    iterating it.
    """
    it = iter(iterable)
    while True:
        try:
            yield next(it)
        except StopIteration:
            break
        except:
            errback(failure.Failure(), *a, **kw)

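This wraps an iterator so that the error-handling function (the errback) is called whenever an exception is raised during iteration.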
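To make the behaviour concrete, here is a minimal sketch that runs without Twisted. It is a simplified variant, not Scrapy's code: it hands the exception instance to the errback instead of a failure.Failure, and flaky_spider_output is a hypothetical stand-in for a spider callback.

```python
def iter_errback(iterable, errback, *a, **kw):
    """Stdlib-only variant: the errback receives the exception instance
    rather than a twisted.python.failure.Failure (an assumption made so
    the sketch has no dependencies)."""
    it = iter(iterable)
    while True:
        try:
            yield next(it)
        except StopIteration:
            break
        except Exception as exc:
            errback(exc, *a, **kw)


def flaky_spider_output():
    # Hypothetical spider callback: yields two items, then fails.
    yield "item-1"
    yield "item-2"
    raise ValueError("bad response")


errors = []
items = list(iter_errback(flaky_spider_output(), errors.append))
print(items, errors)
```

The items produced before the failure still reach the consumer, and the error goes to the errback instead of propagating. Once a generator has raised, it is finished, so the next call to next() raises StopIteration and the loop ends.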

def parallel(iterable, count, callable, *args, **named):
    """Execute a callable over the objects in the given iterable, in parallel,
    using no more than ``count`` concurrent calls.

    Taken from: http://jcalderone.livejournal.com/24285.html
    """
    coop = task.Cooperator()
    work = (callable(elem, *args, **named) for elem in iterable)
    return defer.DeferredList([coop.coiterate(work) for _ in range(count)])

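parallel() is the parallel-processing function, built on Twisted's task module. work is a generator; each step of iteration advances it by one element, calling callable on that element. defer.DeferredList([coop.coiterate(work) for _ in range(count)]) creates count CooperativeTasks that all share the same work generator, and the Cooperator's scheduler keeps stepping them until the generator is exhausted. If a step yields a Deferred, that task pauses until the Deferred fires, so at most count calls are in flight at once. It follows that the result of processing the spider output is a DeferredList. When a Deferred is running its callbacks and one of them returns another Deferred, the callback chain stops and resumes only once the returned Deferred has a result. This is how Deferreds are chained: the outer Deferred acts as the overall controller, and a Deferred returned from a callback is like a subordinate branch.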
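The key trick is that all count tasks pull from one shared generator, so each element is processed exactly once no matter which task picks it up. The following is a toy model of that idea in plain Python, not Twisted's implementation: parallel_model and the worker ids are hypothetical names, and the real Cooperator additionally pauses a task whenever the yielded value is a Deferred.

```python
from collections import deque


def parallel_model(iterable, count, func):
    """Toy model of parallel(): `count` workers share ONE generator."""
    work = (func(elem) for elem in iterable)  # lazy: nothing runs yet
    workers = deque(range(count))             # worker ids, round-robin order
    log = []                                  # (worker id, result) pairs
    while workers:
        wid = workers.popleft()
        try:
            result = next(work)               # one work unit for this worker
        except StopIteration:
            continue                          # generator exhausted: drop worker
        log.append((wid, result))
        workers.append(wid)                   # requeue for the next round
    return log


log = parallel_model(range(5), 2, lambda x: x * x)
print(log)
```

Because the generator is shared, the two workers split the five elements between them instead of each processing all five.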

def coiterate(self, iterator, doneDeferred=None):
        """
        Add an iterator to the list of iterators this L{Cooperator} is
        currently running.

        Equivalent to L{cooperate}, but returns a L{defer.Deferred} that will
        be fired when the task is done.

        @param doneDeferred: If specified, this will be the Deferred used as
            the completion deferred.  It is suggested that you use the default,
            which creates a new Deferred for you.

        @return: a Deferred that will fire when the iterator finishes.
        """
        if doneDeferred is None:
            doneDeferred = defer.Deferred()
        CooperativeTask(iterator, self).whenDone().chainDeferred(doneDeferred)
        return doneDeferred

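When does the Cooperator call start() to begin running tasks? The Cooperator is in fact constructed with started=True, so creating a CooperativeTask adds the task to the Cooperator (via _addTask) and calls the Cooperator's _reschedule(), which makes it eligible for scheduling.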

def _addTask(self, task):
        """
        Add a L{CooperativeTask} object to this L{Cooperator}.
        """
        if self._stopped:
            self._tasks.append(task) # XXX silly, I know, but _completeWith
                                     # does the inverse
            task._completeWith(SchedulerStopped(), Failure(SchedulerStopped()))
        else:
            self._tasks.append(task)
            self._reschedule()
def _tick(self):  # On each tick, loop over the cooperator's tasks and run one work unit (_oneWorkUnit) for each.
        """
        Run one scheduler tick.
        """
        self._delayedCall = None
        for taskObj in self._tasksWhileNotStopped():
            taskObj._oneWorkUnit()
        self._reschedule()


    _mustScheduleOnStart = False
    def _reschedule(self):
        if not self._started:
            self._mustScheduleOnStart = True
            return
        if self._delayedCall is None and self._tasks:
            self._delayedCall = self._scheduler(self._tick)  # the delayed call arranges for _tick to run
_EPSILON = 0.00000001
def _defaultScheduler(x):
    from twisted.internet import reactor
    return reactor.callLater(_EPSILON, x)

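Through self._scheduler(self._tick), which by default is _defaultScheduler(x), Twisted's reactor keeps calling _tick: each tick ends with _reschedule(), which schedules the next tick as long as tasks remain.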
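The add-task, tick, reschedule cycle can be sketched without a reactor. This is a simplified model under stated assumptions, not Twisted's code: ToyCooperator, call_later and the pending queue are hypothetical names, and a deque drained by hand stands in for reactor.callLater.

```python
from collections import deque


class ToyCooperator:
    """Simplified model of the _addTask / _tick / _reschedule cycle."""

    def __init__(self, call_later):
        self._call_later = call_later  # stands in for reactor.callLater
        self._tasks = []
        self._delayed_call = None

    def add_task(self, iterator):
        self._tasks.append(iterator)
        self._reschedule()

    def _tick(self):
        self._delayed_call = None
        for task in list(self._tasks):
            try:
                next(task)             # run one work unit
            except StopIteration:
                self._tasks.remove(task)
        self._reschedule()             # keep ticking while tasks remain

    def _reschedule(self):
        # Schedule at most one pending tick, and only if there is work.
        if self._delayed_call is None and self._tasks:
            self._delayed_call = self._call_later(self._tick)


pending = deque()                      # hypothetical event-loop queue


def call_later(func):
    pending.append(func)
    return func                        # any non-None handle will do


seen = []
coop = ToyCooperator(call_later)
coop.add_task(seen.append(n) for n in range(3))

ticks = 0
while pending:                         # drive the "reactor" by hand
    pending.popleft()()
    ticks += 1
print(seen, ticks)
```

Three ticks each advance the task by one step, and a fourth tick discovers that the iterator is exhausted and removes it. With no tasks left, _reschedule does nothing and the loop stops on its own.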
