Storm 多语言支持
Using non JVM languages with Storm
https://github.com/nathanmarz/storm/wiki/Using-non-JVM-languages-with-Storm
Multilang protocol
https://github.com/nathanmarz/storm/wiki/Multilang-protocol
Using non JVM languages with Storm
对于JVM语言比较简单, 直接提高DSL封装Java即可
对于非JVM语言就稍微复杂一些, Storm分为两部分, topology和component(blot和spout)
对于topology用其他语言实现比较easy, 因为nimbus是thrift server, 所以什么语言最终都是都是转化为thrift结构. 而且其实topology本身逻辑就比较简单, 直接用java写也行, 没有太多的必要一定要使用其他的语言
对于component, 采用的方案和Hadoop的一样, 使用shell process来执行component, 并使用stdin, stdout作为component之间的通信 (json messages over stdin/stdout)
通信就涉及通信协议, 即每个component怎样产生别的component可以理解json message, storm的通信协议比较简单, 参考Multilang protocol
当前storm, 实现python, ruby, 和fancy的版本, 如果需要支持其他的语言, 自己实现一下这个协议也应该很容易.
其实component支持多语言比较必要, 因为很多分析或统计模块, 不一定是使用java, 如果porting比较麻烦, 不象topology那么简单.
two pieces: creating topologies and implementing spouts and bolts in other languages
- creating topologies in another language is easy since topologies are just thrift structures (link to storm.thrift)
- implementing spouts and bolts in another language is called a "multilang components" or "shelling"
- Here's a specification of the protocol: Multilang protocol
- the thrift structure lets you define multilang components explicitly as a program and a script (e.g., python and the file implementing your bolt)
- In Java, you override ShellBolt or ShellSpout to create multilang components
- note that output fields declarations happens in the thrift structure, so in Java you create multilang components like the following:
- declare fields in java, processing code in the other language by specifying it in constructor of shellbolt
- note that output fields declarations happens in the thrift structure, so in Java you create multilang components like the following:
- multilang uses json messages over stdin/stdout to communicate with the subprocess
- storm comes with ruby, python, and fancy adapters that implement the protocol. show an example of python
- python supports emitting, anchoring, acking, and logging
- "storm shell" command makes constructing jar and uploading to nimbus easy
- makes jar and uploads it
- calls your program with host/port of nimbus and the jarfile id
Bolt可以使用任何语言来定义. 用其它语言定义的bolt会被当作子进程(subprocess)来执行, storm使用JSON消息通过stdin/stdout来和这些subprocess通信.
这个通信协议是一个只有100行的库, storm团队给这些库开发了对应的Ruby, Python和Fancy版本.
Python版本的Bolt的定义, 和java版不同的是继承ShellBolt类
public static class SplitSentence extends ShellBolt implements IRichBolt { public SplitSentence() { super("python", "splitsentence.py"); } public void declareOutputFields(OutputFieldsDeclarer declarer) { declarer.declare(new Fields("word")); } }
import storm class SplitSentenceBolt(storm.BasicBolt): def process(self, tup): words = tup.values[0].split(" ") for word in words: storm.emit([word]) SplitSentenceBolt().run()
上面是使用python component的例子, 首先继承ShellBolt, 表示输入输出是通过shell stdin/stdout来完成的
然后, 下面直接将python splitsentence.py作为子进程来调用
在python中, 首先import storm, 其中封装了通信协议, 很简单的100行, 可以看看
DSLs and multilang adapters
https://github.com/nathanmarz/storm/wiki/DSLs-and-multilang-adapters
- Scala DSL
- JRuby DSL
- Clojure DSL
- Storm/Esper integration: Streaming SQL on top of Storm
- io-storm: Perl multilang adapter
- storm-php: PHP multilang adapter
前面说了, 对于JVM的语言, 很简单只是封装一下java, 然后提供DSL即可, 上面列出所有官方提供的DSL
可以简单以Clojure为例子, 了解一下
Clojure DSL
Storm comes with a Clojure DSL for defining spouts, bolts, and topologies. The Clojure DSL has access to everything the Java API exposes, so if you're a Clojure user you can code Storm topologies without touching Java at all.
https://github.com/nathanmarz/storm/wiki/Clojure-DSL
Defining a non-JVM language dsl for storm
https://github.com/nathanmarz/storm/wiki/Defining-a-non-jvm-language-dsl-for-storm
对于non-JVM语言, 通过storm shell命令也可以实现类似dsl
There's a "storm shell" command that will help with submitting a topology. Its usage is like this:
storm shell resources/ python topology.py arg1 arg2
storm shell will then package resources/ into a jar, upload the jar to Nimbus, and call your topology.py script like this:
python topology.py arg1 arg2 {nimbus-host} {nimbus-port} {uploaded-jar-location}
Then you can connect to Nimbus using the Thrift API and submit the topology, passing {uploaded-jar-location} into the submitTopology method. For reference, here's the submitTopology definition:
void submitTopology(1: string name, 2: string uploadedJarLocation, 3: string jsonConf, 4: StormTopology topology) throws (1: AlreadyAliveException e, 2: InvalidTopologyException ite);