An Introduction to Asynchronous Programming and Twisted (2)

Part 6: And Then We Took It Higher

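Client 2.0 from Part 5 is already well encapsulated: to build an application, the user only needs to understand and modify PoetryProtocol and PoetryClientFactory.

In fact, the protocol's logic is just to receive data and notify the factory once the data is complete. That logic can already serve as common framework code that the user never has to touch.

What the user really has to change for each context is the handling logic that runs once the data has arrived, poem_finished (print it? save it?), and the error-handling logic that runs on failure, connection_failed.

Both pieces of logic live in PoetryClientFactory, so the user has to modify the factory every time, and would even need a different factory class for each way of handling the result.

That is clearly unreasonable. This logic has to be abstracted out, so that PoetryClientFactory also becomes common framework code that the user does not need to modify.

The concrete approach is to pass this code to PoetryClientFactory as callback arguments. This gives better encapsulation: the user can complete an application without knowing anything about PoetryProtocol or PoetryClientFactory, one level of abstraction higher than Part 5.

With a typical asynchronous interface like the one below, the user only supplies the arguments shown, in particular a callback for completion and an errback for errors, and the asynchronous operation is done. The user does not even need to care whether Twisted is involved.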

def get_poetry(host, port, callback, errback)

With a synchronous, blocking call, the equivalent code would be written like this:

try:
    result = get_poetry(host, port)
except Exception as e:
    errback(e)
else:
    callback(result)

 

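Client 3.1

from twisted.internet.protocol import ClientFactory

# PoetryProtocol is the protocol class from Part 5
class PoetryClientFactory(ClientFactory):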
 
    protocol = PoetryProtocol
 
    def __init__(self, callback, errback):
        self.callback = callback
        self.errback = errback
 
    def poem_finished(self, poem):
        self.callback(poem)
 
    def clientConnectionFailed(self, connector, reason):
        self.errback(reason)        

def get_poetry(host, port, callback, errback):
    from twisted.internet import reactor
    factory = PoetryClientFactory(callback,errback)
    reactor.connectTCP(host, port, factory)
  
def poetry_main():
    addresses = parse_args()
    from twisted.internet import reactor
    poems = []
 
    def got_poem(poem):
        poems.append(poem)
        if len(poems) == len(addresses):
            reactor.stop()
            
    def err_back(reason):
        print(reason)
        
    for address in addresses:
        host, port = address
        get_poetry(host, port, got_poem, err_back)
 
    reactor.run()

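To sum up, this point is quite interesting: from the lowest-level socket calls all the way up to the top-level get_poetry, we can clearly see how the Twisted architecture is encapsulated layer by layer, building up an asynchronous programming framework step by step, until the user can complete an asynchronous application just by supplying callbacks.

Looking back at the asynchronous client in Part 2, an asynchronous application needs the following parts:

Data transport

This part is relatively simple. In the simplest example we use a socket, which already wraps the TCP or UDP protocol; later the Transport classes wrap it further into framework code.

Asynchronous data monitoring

At the most basic level we use the select system call to watch multiple sockets without blocking on any single one. Later the select call is wrapped in the Reactor class, and the monitoring is started with reactor.run. This is the core of Twisted.

Data parsing and assembly

Once data arrives on a socket, what do we do with it?

That is the job of the protocol. For Poetry the protocol is trivial: no parsing at all, the data is simply appended to a buffer.

At first the protocol and the data handling were mixed together. The protocol was later pulled out on its own because that code is relatively stable: it can live in the framework, and not every user has to write it.

Protocols such as HTTP, FTP, IMAP, or a custom one like Poetry are responsible for parsing and generating data. Poetry keeps receiving data and finally assembles a poem; FTP finally assembles a file.

How that data is then handled is not the protocol's responsibility. It varies endlessly (print, save, delete) with every user and every scenario, so it cannot be put into the framework.

The creation and management of protocols is encapsulated in the Factory class.

Data handling and error handling

Only this part is user-specific, and it is the only part the user has to supply to the framework.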
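The four parts above can be sketched without Twisted at all. The following is a minimal illustration in Python 3, not the code from Part 2: run_loop is a hypothetical name, and socket.socketpair() stands in for real poetry servers so the example runs on its own. The comments mark which line plays which role.

```python
import select
import socket


def run_loop(socks, callback, errback):
    """Read every socket to EOF, multiplexing them with select()."""
    buffers = {s: b"" for s in socks}   # protocol state: one buffer per socket
    while buffers:
        # Asynchronous monitoring: wait until at least one socket is readable.
        readable, _, _ = select.select(list(buffers), [], [])
        for s in readable:
            try:
                chunk = s.recv(1024)    # data transport
            except OSError as exc:
                del buffers[s]
                errback(exc)            # user-supplied error handling
                continue
            if chunk:
                buffers[s] += chunk     # protocol: no parsing, just accumulate
            else:
                # An empty read means the peer closed: the poem is complete.
                callback(buffers.pop(s).decode("utf-8"))  # user-supplied handling


# Two socket pairs stand in for two poetry servers (demo assumption).
clients = []
for text in (b"First poem.", b"Second poem."):
    server, client = socket.socketpair()
    server.sendall(text)
    server.close()
    clients.append(client)

poems, errors = [], []
run_loop(clients, poems.append, errors.append)
for client in clients:
    client.close()
print(sorted(poems))
```

Everything inside run_loop is what Twisted moves into transports, the reactor, and protocols; only the two functions passed in (here poems.append and errors.append) belong to the user.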

 

Part 7: An Interlude, Deferred

We came face-to-face with this fact: callbacks are a fundamental aspect of asynchronous programming with Twisted.

So using Twisted, or any reactor-based asynchronous system, means organizing our code in a particular way, as a series of “callback chains” invoked by a reactor loop.

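Callbacks are the fundamental mechanism of an asynchronous system and are very important, but using them directly runs into some problems.

As Part 6 shows, in a synchronous system the calls to callback and errback are direct and easy to debug, and the control flow itself guarantees that exactly one of them is called, exactly once.

In an asynchronous system the call relationship is much more obscure. The callback and errback are merely passed in as arguments; nothing guarantees when they will be called, or that they will not be called more than once.

Exception handling matters even more in an asynchronous system. In a synchronous system an uncaught exception crashes the program and prints a traceback. In an asynchronous system without exception handling, the exception may well be silently hidden.

So callbacks need a further layer of encapsulation.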
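To make the "called more than once" problem concrete, here is a small sketch in plain Python 3. Both flaky_fetch and FireOnce are hypothetical names invented for this illustration; FireOnce is not Twisted's implementation, it only imitates the fire-once rule that a Deferred enforces (Twisted itself raises AlreadyCalledError in that situation).

```python
def flaky_fetch(callback, errback):
    """Hypothetical buggy producer: reports success and then a failure."""
    callback("This poem is short.")
    errback(RuntimeError("connection lost"))   # bug: a second notification


class FireOnce:
    """Let exactly one of callback/errback run, exactly once."""

    def __init__(self, callback, errback):
        self._callback = callback
        self._errback = errback
        self.fired = False

    def _fire(self, handler, value):
        if self.fired:
            raise RuntimeError("result already delivered")
        self.fired = True
        handler(value)

    def callback(self, result):
        self._fire(self._callback, result)

    def errback(self, error):
        self._fire(self._errback, error)


calls = []

# With plain callbacks the double notification goes unnoticed.
flaky_fetch(calls.append, calls.append)
print(len(calls))

# With the guard, the second notification is reported instead of delivered.
guard = FireOnce(calls.append, calls.append)
try:
    flaky_fetch(guard.callback, guard.errback)
except RuntimeError as exc:
    print("caught:", exc)
```

The guard turns a silent logic error into a loud one at the point where it happens, which is the same trade that the fire-once rule of a Deferred makes.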

The Deferred

Since callbacks are used so much in asynchronous programming, and since using them correctly can, as we have discovered, be a bit tricky, the Twisted developers created an abstraction called a Deferred to make programming with callbacks easier. The Deferred class is defined in twisted.internet.defer.

A Deferred is simply an abstraction that wraps callbacks, intended to make callbacks easier for programmers to use.

image

As the figure above shows, a Deferred is actually quite simple and can be explained in the following three sentences:

A deferred contains a pair of callback chains, one for normal results and one for errors. A newly-created deferred has two empty chains.

We can populate the chains by adding callbacks and errbacks.

Firing (only once) the deferred will invoke the appropriate callbacks or errbacks in the order they were added.

Since deferreds don’t use the reactor, we can test them out without starting up the loop.

from twisted.internet.defer import Deferred
 
def got_poem(res):
    print('Your poem is served:')
    print(res)

def poem_failed(err):
    print('No poetry for you.')

d = Deferred()  # a newly created deferred has empty chains

# add a callback/errback pair to the chain
d.addCallbacks(got_poem, poem_failed)

# fire the chain with a normal result
d.callback('This poem is short.')

print("Finished")

Compared with plain callbacks, Deferreds have these advantages:

1. Invoking callbacks multiple times will likely result in subtle, hard-to-debug problems. Deferreds can only be fired once, making them similar to the familiar semantics of try/except statements.

2. Programming with plain callbacks can make refactoring tricky. With deferreds, we can refactor by adding links to the chain and moving code from one link to another.

import sys

from twisted.internet.defer import Deferred

def got_poem(poem):
    print(poem)

def poem_failed(err):
    print('poem download failed', file=sys.stderr)
    print('I am terribly sorry', file=sys.stderr)
    print('try again later?', file=sys.stderr)
 
def poem_done(_):
    from twisted.internet import reactor
    reactor.stop()
 
d = Deferred()
 
d.addCallbacks(got_poem, poem_failed)
d.addBoth(poem_done)

Part 8: Deferred Poetry

Client 4.0

Now that we know something about deferreds, we can rewrite our Twisted poetry client to use them.

import sys

from twisted.internet import defer
from twisted.internet.protocol import ClientFactory

def get_poetry(host, port):
    """
    Download a poem from the given host and port. This function
    returns a Deferred which will be fired with the complete text of
    the poem or a Failure if the poem could not be downloaded.
    """
    d = defer.Deferred()
    from twisted.internet import reactor
    factory = PoetryClientFactory(d)
    reactor.connectTCP(host, port, factory)
    return d

class PoetryClientFactory(ClientFactory):
 
    protocol = PoetryProtocol
 
    def __init__(self, deferred):
        self.deferred = deferred
 
    def poem_finished(self, poem):
        if self.deferred is not None:
            d, self.deferred = self.deferred, None
            d.callback(poem)
 
    def clientConnectionFailed(self, connector, reason):
        if self.deferred is not None:
            d, self.deferred = self.deferred, None
            d.errback(reason)
            
def poetry_main():
    addresses = parse_args()
    from twisted.internet import reactor
    poems = []
    errors = []
 
    def got_poem(poem):
        poems.append(poem)
 
    def poem_failed(err):
        print('Poem failed:', err, file=sys.stderr)
        errors.append(err)
 
    def poem_done(_):
        if len(poems) + len(errors) == len(addresses):
            reactor.stop()
 
    for address in addresses:
        host, port = address
        d = get_poetry(host, port)
        d.addCallbacks(got_poem, poem_failed)
        d.addBoth(poem_done)
 
    reactor.run()

Comparing the 4.0 code with 3.1 shows that a deferred is just a simple wrapper around callbacks, with no further complicated machinery.
When You’re Using Deferreds, You’re Still Using Callbacks, and They’re Still Invoked by the Reactor
 
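The advantages of this approach:
1. The PoetryClientFactory code is more general: it takes a single deferred argument, however many callback functions are needed.
2. It is more flexible: callbacks and errbacks can be added to the Deferred in any combination, in this client all before reactor.run() is called.
3. get_poetry(host, port) returns to the shape of a synchronous function: it takes only the arguments the logic needs. The only difference is that it returns a deferred object rather than the actual result.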
A Deferred object represents an “asynchronous result” or a “result that has not yet come”.
image 
 

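For people who are new to asynchronous systems, Deferreds are quite hard to understand. The author explains the reason clearly, and I personally agree with him: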

I think this mistake is caused by trying to learn Twisted without first learning the asynchronous model. Since typical Twisted code uses lots of deferreds and only occasionally refers to the reactor, it can appear that deferreds are doing all the work.

Deferreds are a useful abstraction, but we wrote several versions of our Twisted client without using them in any way.

A Deferred is really just an abstraction over callbacks and is not strictly required; the reactor is the core of an asynchronous system.

Because the behavior of deferreds is well-defined and well-known (to folks with some experience programming with Twisted), by returning deferreds from your own APIs you are making it easier for other Twisted programmers to understand and use your code.

 

Part 9: A Second Interlude, Deferred

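The callbacks shown so far are fairly simple; in real development the data-processing flow is more complicated.

It resembles a workflow with several steps, and every step also needs its own error handling.

That gets complicated quickly. Deferred encapsulates this process, which I think is its greatest value.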

 

Let’s summarize what we know about the deferred firing pattern:

  1. A deferred contains a chain of ordered callback/errback pairs (stages). The pairs are in the order they were added to the deferred.
  2. Stage 0, the first callback/errback pair, is invoked when the deferred is fired. If the deferred is fired with the callback method, then the stage 0 callback is called. If the deferred is fired with the errback method, then the stage 0 errback is called.
  3. If stage N fails, then the stage N+1 errback is called with the exception (wrapped in a Failure) as the first argument.
  4. If stage N succeeds, then the stage N+1 callback is called with the stage N return value as the first argument.

This pattern is illustrated in Figure 17:

image

Figure 17: control flow in a deferred

The green lines indicate what happens when a callback or errback succeeds and the red lines are for failures. The lines show both the flow of control and the flow of exceptions and return values down the chain.
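The four rules above are compact enough to model in a few lines. The sketch below is a toy in plain Python 3, not Twisted's Deferred: ToyDeferred, its Failure stand-in, and add_callbacks are names made up for this illustration. It ignores the fire-once check and callbacks added after firing, and, unlike the real callback method, it returns the final value so the result can be inspected.

```python
class Failure:
    """Minimal stand-in for an error wrapper; keeps the exception in .value."""

    def __init__(self, value):
        self.value = value


def passthrough(value):
    return value


class ToyDeferred:
    def __init__(self):
        self.stages = []                      # rule 1: ordered (callback, errback) pairs

    def add_callbacks(self, callback, errback=passthrough):
        self.stages.append((callback, errback))
        return self

    def _run(self, value):
        for callback, errback in self.stages:
            handler = errback if isinstance(value, Failure) else callback
            try:
                value = handler(value)        # rule 4: success feeds the next callback
            except Exception as exc:
                value = Failure(exc)          # rule 3: failure feeds the next errback
        return value

    def callback(self, result):               # rule 2: fire with a normal result
        return self._run(result)

    def errback(self, exc):                   # rule 2: fire with an error
        return self._run(Failure(exc))


def shout(poem):
    if not poem:
        raise ValueError("empty poem")
    return poem.upper()


def recover(failure):
    return "(no poem)"                        # returning a normal value ends the error


def show(poem):
    return poem


def make():
    return (ToyDeferred()
            .add_callbacks(shout)
            .add_callbacks(passthrough, recover)
            .add_callbacks(show))


print(make().callback("roses are red"))
print(make().callback(""))
```

In the second call, shout raises, so control moves to the errback of the next stage (recover), whose normal return value moves control back to the callback side for show. That crossing between the two chains corresponds to the red and green lines in Figure 17.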

 

Part 10: Poetry Transformed

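Here is an example based on Part 9. I did not find the example in the original article well written, so I changed it a little.

We get_poetry from the server, run the cummingsify transformation on it, and store the result in the poems collection. Once every poem has been processed, we stop the reactor.

First the synchronous version, to make the example easy to follow; afterwards we convert it to the asynchronous callback version.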

try:
    poem = get_poetry(host, port)
except Exception as err:
    print('First errback, pass through')
    print('Third errback, poem_failed')
    errors.append(err)
else:
    try:
        print('First callback, cummingsify')
        poem = engine.cummingsify(poem)
    except GibberishError:
        print('Second errback, cummingsify_failed, use original poem')
        print('Second callback, got_poem')
        print(poem)
        poems.append(poem)
    except Exception as err:
        print('Third errback, poem_failed')
        errors.append(err)
    else:
        print('Second callback, got_poem')
        print(poem)
        poems.append(poem)

print('Third callback, poem_done')
if len(poems) + len(errors) == len(addresses):
    reactor.stop()

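Now for the conversion. It consists of wrapping everything that follows poem = get_poetry(host, port) into callbacks and errbacks and adding them to the deferred. The deferred then takes care of the order in which they are called and how they interact, provided you add them in the right order and with the right logic.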

def poetry_main():
    addresses = parse_args()
    from twisted.internet import reactor
    poems = []
    errors = []
    
    def cummingsify(poem):
        print('First callback, cummingsify')
        poem = engine.cummingsify(poem)
        return poem

    def cummingsify_failed(err):
        if err.check(GibberishError):
            print('Second errback, cummingsify_failed, use original poem')
            # assumes GibberishError carries the original poem as its first argument
            return err.value.args[0]
        return err  # any other failure passes through to poem_failed

    def got_poem(poem):
        print('Second callback, got_poem')
        print(poem)
        poems.append(poem)

    def poem_failed(err):
        print('Third errback, poem_failed')
        errors.append(err)

    def poem_done(_):
        print('Third callback, poem_done')
        if len(poems) + len(errors) == len(addresses):
            reactor.stop()

    for address in addresses:
        host, port = address
        d = get_poetry(host, port)
        d.addCallback(cummingsify)
        d.addErrback(cummingsify_failed)
        d.addCallbacks(got_poem, poem_failed)
        d.addBoth(poem_done)

    reactor.run()

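The code above is the corresponding asynchronous callback version. The first errback does nothing in this example, so failures simply pass through it; you could add real handling there as well.

The figure below shows how the callbacks and errbacks are invoked within the deferred. As the code above shows, different add functions can be combined to produce different call relationships.

And each deferred we create has the structure pictured in the following figure: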

image
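The same structure can be written out as a table. Each add call contributes one (callback, errback) stage: addCallback and addErrback fill the missing side with a pass-through, addCallbacks supplies both sides, and addBoth uses the same function on both sides. The helper below is a hypothetical illustration in plain Python 3 that works on function names only; stage_table and PASS are made-up names, not part of Twisted.

```python
PASS = "(pass-through)"


def stage_table(calls):
    """Map a sequence of add calls to (callback, errback) stage rows."""
    rows = []
    for method, *names in calls:
        if method == "addCallback":
            rows.append((names[0], PASS))
        elif method == "addErrback":
            rows.append((PASS, names[0]))
        elif method == "addCallbacks":
            rows.append((names[0], names[1]))
        elif method == "addBoth":
            rows.append((names[0], names[0]))
        else:
            raise ValueError("unknown method: " + method)
    return rows


# The chain built in poetry_main above, written as data.
chain = [
    ("addCallback", "cummingsify"),
    ("addErrback", "cummingsify_failed"),
    ("addCallbacks", "got_poem", "poem_failed"),
    ("addBoth", "poem_done"),
]

for i, (cb, eb) in enumerate(stage_table(chain)):
    print(f"stage {i}: callback={cb:<16} errback={eb}")
```

Reading the rows top to bottom gives the four stages of the deferred: a download failure skips cummingsify, passes unchanged through cummingsify_failed, and lands in poem_failed, while poem_done runs in either case.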

posted on 2011-09-07 10:02  fxjwind