One-Way
爱一人,攀一山,追一梦

官网地址:http://storm.apache.org/index.html

Why use Storm?
Apache Storm is a free and open source distributed realtime computation system. Storm makes it easy to reliably process unbounded streams of data, doing for realtime processing what Hadoop did for batch processing. Storm is simple, can be used with any programming language, and is a lot of fun to use!
Storm has many use cases: realtime analytics, online machine learning, continuous computation, distributed RPC, ETL, and more. Storm is fast: a benchmark clocked it at over a million tuples processed per second per node. It is scalable, fault-tolerant, guarantees your data will be processed, and is easy to set up and operate.
Storm integrates with the queueing and database technologies you already use. A Storm topology consumes streams of data and processes those streams in arbitrarily complex ways, repartitioning the streams between each stage of the computation however needed. Read more in the tutorial.

 

概念

The concepts discussed are:

  1. Topologies
  2. Streams
  3. Spouts
  4. Bolts
  5. Stream groupings
  6. Reliability
  7. Tasks
  8. Workers

官网说明:http://storm.apache.org/releases/0.9.6/Concepts.html

 

Topologies 

拓扑结构:实时应用程序的逻辑被打包到storm拓扑里面了。一个拓扑就是由 spouts 和 bolts 作为节点,通过 stream groupings 链接起来的图。

 

Stream

数据流,由 tuples 组成的数据流,默认包含integers, longs, shorts, bytes, strings, doubles, floats, booleans, and byte arrays.也可以自定义

每个stream声明都有个ID, 默认ID是'default'

 

Spouts

水龙头,拓扑的数据源,通常spouts从外部源读取tuples并且emit发射到拓扑流程中。spouts可以是可靠的也可以是非可靠的,

 

Bolts

水阀,所有的处理都是在bolts里完成的

 

Stream groupings

在bolts的任务中定义了stream被分流的规则。包含多种grouping

1.Shuffle grouping : 随机分发

2.Fields grouping : 指定fields

3.All grouping : 分发到全部bolts中

4.Global grouping : 分发到id最小的task中

5.None grouping :   

6.Direct grouping :

7.Local or Shuffle grouping :

 

Reliability

fail ack 方法

 

Tasks

执行线程

 

Workers

一个worker就是一个jvm,里面会执行很多tasks,

 

Nimbus(Master)
  集群管理,接收Jar包,调度topology
Supervisor(Slave)
  启动worker
Worker
  一个JVM进程资源分配的单位
Executor
  实际干活的线程
ZooKeeper
  存储状态信息,调度信息,心跳
  Supervisor向ZK写信息,Nimbus通过ZK获取这些信息,这样nimbus就知道各个slaves的状态信息。

 

集群安装

官网说明:http://storm.apache.org/releases/0.9.6/Setting-up-a-Storm-cluster.html

 

API

官网说明:http://storm.apache.org/releases/0.9.6/javadocs/index.html

 

posted on 2016-08-15 15:45  单行道|  阅读(304)  评论(0编辑  收藏  举报