Storm 多语言支持

Using non JVM languages with Storm

https://github.com/nathanmarz/storm/wiki/Using-non-JVM-languages-with-Storm

Multilang protocol

https://github.com/nathanmarz/storm/wiki/Multilang-protocol

 

Using non JVM languages with Storm

对于JVM语言比较简单, 直接提高DSL封装Java即可
对于非JVM语言就稍微复杂一些, Storm分为两部分, topology和component(blot和spout)

对于topology用其他语言实现比较easy, 因为nimbus是thrift server, 所以什么语言最终都是都是转化为thrift结构. 而且其实topology本身逻辑就比较简单, 直接用java写也行, 没有太多的必要一定要使用其他的语言

对于component, 采用的方案和Hadoop的一样, 使用shell process来执行component, 并使用stdin, stdout作为component之间的通信 (json messages over stdin/stdout)
通信就涉及通信协议, 即每个component怎样产生别的component可以理解json message, storm的通信协议比较简单, 参考
Multilang protocol
当前storm, 实现python, ruby, 和fancy的版本, 如果需要支持其他的语言, 自己实现一下这个协议也应该很容易.
其实component支持多语言比较必要, 因为很多分析或统计模块, 不一定是使用java, 如果porting比较麻烦, 不象topology那么简单.

two pieces: creating topologies and implementing spouts and bolts in other languages

  • creating topologies in another language is easy since topologies are just thrift structures (link to storm.thrift)
  • implementing spouts and bolts in another language is called a "multilang components" or "shelling"
    • Here's a specification of the protocol: Multilang protocol
    • the thrift structure lets you define multilang components explicitly as a program and a script (e.g., python and the file implementing your bolt)
    • In Java, you override ShellBolt or ShellSpout to create multilang components
      • note that output fields declarations happens in the thrift structure, so in Java you create multilang components like the following:
        • declare fields in java, processing code in the other language by specifying it in constructor of shellbolt
    • multilang uses json messages over stdin/stdout to communicate with the subprocess
    • storm comes with ruby, python, and fancy adapters that implement the protocol. show an example of python
      • python supports emitting, anchoring, acking, and logging
  • "storm shell" command makes constructing jar and uploading to nimbus easy
    • makes jar and uploads it
    • calls your program with host/port of nimbus and the jarfile id

 

Bolt可以使用任何语言来定义. 用其它语言定义的bolt会被当作子进程(subprocess)来执行, storm使用JSON消息通过stdin/stdout来和这些subprocess通信.
这个通信协议是一个只有100行的库, storm团队给这些库开发了对应的Ruby, Python和Fancy版本.

Python版本的Bolt的定义, 和java版不同的是继承ShellBolt类

public static class SplitSentence extends ShellBolt implements IRichBolt {
    public SplitSentence() {
        super("python", "splitsentence.py");
    }
 
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("word"));
    }
}
下面是splitsentence.py的定义:
import storm
class SplitSentenceBolt(storm.BasicBolt):
    def process(self, tup):
        words = tup.values[0].split(" ")
        for word in words:
          storm.emit([word])
SplitSentenceBolt().run()

上面是使用python component的例子, 首先继承ShellBolt, 表示输入输出是通过shell stdin/stdout来完成的
然后, 下面直接将python splitsentence.py作为子进程来调用

在python中, 首先import storm, 其中封装了通信协议, 很简单的100行, 可以看看

 

 

DSLs and multilang adapters

https://github.com/nathanmarz/storm/wiki/DSLs-and-multilang-adapters

前面说了, 对于JVM的语言, 很简单只是封装一下java, 然后提供DSL即可, 上面列出所有官方提供的DSL
可以简单以Clojure为例子, 了解一下

Clojure DSL

Storm comes with a Clojure DSL for defining spouts, bolts, and topologies. The Clojure DSL has access to everything the Java API exposes, so if you're a Clojure user you can code Storm topologies without touching Java at all.

https://github.com/nathanmarz/storm/wiki/Clojure-DSL

 

Defining a non-JVM language dsl for storm

https://github.com/nathanmarz/storm/wiki/Defining-a-non-jvm-language-dsl-for-storm

对于non-JVM语言, 通过storm shell命令也可以实现类似dsl

There's a "storm shell" command that will help with submitting a topology. Its usage is like this:

storm shell resources/ python topology.py arg1 arg2

storm shell will then package resources/ into a jar, upload the jar to Nimbus, and call your topology.py script like this:

python topology.py arg1 arg2 {nimbus-host} {nimbus-port} {uploaded-jar-location}

Then you can connect to Nimbus using the Thrift API and submit the topology, passing {uploaded-jar-location} into the submitTopology method. For reference, here's the submitTopology definition:

void submitTopology(1: string name, 2: string uploadedJarLocation, 3: string jsonConf, 4: StormTopology topology) throws (1: AlreadyAliveException e, 2: InvalidTopologyException ite);

posted on 2013-05-10 17:50  fxjwind  阅读(2833)  评论(0编辑  收藏  举报