| Feature | Storm (Trident) | Spark Streaming | Notes |
| --- | --- | --- | --- |
| Parallelism model | DAG-based, task-parallel continuous computation engine | Data-parallel, general-purpose batch processing engine built on Spark | |
| Processing model | One at a time: processes one event (message) at a time; Trident: micro-batch, processes several events at a time | Micro-batch: processes several events at a time | |
| Latency | Sub-second; Trident: seconds | Seconds | |
| Fault tolerance (delivery guarantee) | At least once; Trident: exactly once | Exactly once | |
| Origin | BackType and Twitter | UC Berkeley (UCB) | |
| Implementation language | Clojure | Scala | |
| API languages | Java, Python, Ruby, etc. | Scala, Java, Python | |
| Platform integration | N/A (relies on ZooKeeper) | Spark, so real-time processing and historical (batch) data processing can be unified or share code | |
| Production use and support | Storm has been around for several years and has run in production at Twitter since 2011, as well as at many other companies. | Spark Streaming is a newer project; its only production deployment (that I am aware of) has been at Sharethrough since 2013. | |
| Vendor / distribution support | Storm is the streaming solution in the Hortonworks Hadoop data platform. | Spark Streaming is in both MapR's distribution and Cloudera's Enterprise data platform; it is also backed by Databricks. | |
| Cluster integration and deployment | Depends on ZooKeeper; standalone, Mesos | Standalone, YARN, Mesos | |
| Google Trends | | | |
| Bug burndown (JIRA) | https://issues.apache.org/jira/browse/STORM/ | https://issues.apache.org/jira/browse/SPARK/ | Spark issues appear to be resolved much more promptly than Storm's |
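The processing-model and latency rows of the table can be illustrated with a toy simulation. This is a minimal sketch, not how either engine is implemented: it assumes a single worker, a fixed processing cost per event (one-at-a-time) or per batch (micro-batch), arrival times sorted in ascending order, and no backlog between batches. The function names and the cost figures are hypothetical.

```python
from typing import List


def one_at_a_time_latencies(arrivals: List[float], per_event_cost: float) -> List[float]:
    """Storm-style processing: each event is handled as soon as the worker is free.

    Assumes a single worker, a fixed cost per event, and arrivals sorted ascending.
    """
    latencies = []
    worker_free_at = 0.0
    for t in arrivals:
        start = max(t, worker_free_at)  # wait if the worker is still busy
        finish = start + per_event_cost
        worker_free_at = finish
        latencies.append(finish - t)
    return latencies


def micro_batch_latencies(arrivals: List[float], batch_interval: float,
                          per_batch_cost: float) -> List[float]:
    """Micro-batch processing (Spark Streaming, Trident): events are grouped into
    fixed intervals, and a batch is processed only after its interval closes.

    Assumes each batch finishes before the next interval closes (no backlog).
    """
    latencies = []
    for t in arrivals:
        batch_end = (int(t // batch_interval) + 1) * batch_interval
        latencies.append(batch_end + per_batch_cost - t)
    return latencies


if __name__ == "__main__":
    arrivals = [0.0, 0.25, 0.5, 1.5]  # seconds since the stream started
    print(one_at_a_time_latencies(arrivals, per_event_cost=0.125))
    # [0.125, 0.125, 0.125, 0.125]
    print(micro_batch_latencies(arrivals, batch_interval=1.0, per_batch_cost=0.5))
    # [1.5, 1.25, 1.0, 1.0]
```

The numbers are illustrative only. Under a micro-batch model, an event waits at least until its batch interval closes, which is why latency is measured in seconds rather than milliseconds. Micro-batching still appeals because fixed per-batch overheads (scheduling, state commits) are paid once per batch instead of once per event, trading latency for throughput.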
The debate between Spark Streaming and Storm has a long history.
Reference:
Thanks for the article!
Could you please explain this point in a bit more detail? "But, it relies on transactions to update state, which is slower and often has to be implemented by the user."
If I want to write my output to a persistent store, e.g. Redis, why would it be slower in Storm than in Spark Streaming?
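One plausible reading of the quoted sentence is sketched here under stated assumptions. Trident's exactly-once state updates rely on tagging each batch with a transaction id (txid) and storing the last applied txid next to each value, so that a replayed batch can be detected and skipped. This is a simplified illustration of that idea: a plain dict stands in for an external store such as Redis, `TransactionalCountStore` and `count_words` are hypothetical names (not Storm or Redis APIs), and it assumes a "transactional" source in which a replayed batch always carries the same txid and the same tuples.

```python
from typing import Dict, Iterable, Tuple


class TransactionalCountStore:
    """Hypothetical in-memory stand-in for an external key-value store.

    Each key maps to (count, txid): the count plus the id of the last batch
    that updated it. The reads/writes counters show the extra round trips.
    """

    def __init__(self) -> None:
        self._data: Dict[str, Tuple[int, int]] = {}
        self.reads = 0
        self.writes = 0

    def apply_batch(self, txid: int, counts: Dict[str, int]) -> None:
        for key, delta in counts.items():
            self.reads += 1  # every update must first read the stored txid
            stored = self._data.get(key)
            if stored is not None and stored[1] == txid:
                # This batch already updated this key, so it is a replay: skip it.
                continue
            current = stored[0] if stored is not None else 0
            self._data[key] = (current + delta, txid)
            self.writes += 1

    def get(self, key: str) -> int:
        stored = self._data.get(key)
        return stored[0] if stored is not None else 0


def count_words(words: Iterable[str]) -> Dict[str, int]:
    """Aggregate one batch of words into per-key deltas."""
    counts: Dict[str, int] = {}
    for w in words:
        counts[w] = counts.get(w, 0) + 1
    return counts


if __name__ == "__main__":
    store = TransactionalCountStore()
    batch1 = count_words(["storm", "spark", "spark"])
    store.apply_batch(1, batch1)
    store.apply_batch(1, batch1)  # replay after a simulated failure: ignored
    store.apply_batch(2, count_words(["spark"]))
    print(store.get("spark"), store.get("storm"))  # 3 1
```

Replaying batch 1 leaves the counts unchanged, whereas a plain increment would double-count under at-least-once delivery. The cost shows up in the counters: every key needs a read before its write. In a real store, that read-compare-write must also be atomic (for example with a Redis MULTI/EXEC transaction or a Lua script). This is one likely reason such updates are slower and why users often end up implementing them. For comparison, the Spark Streaming programming guide also treats output operations to external systems as at-least-once by default and calls for idempotent or transactional writes to get exactly-once results.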