storm相关技术

There are two kinds of nodes on a Storm cluster: the master node and the worker nodes.

有两种节点,主节点和worker节点

主节点,Nimbus

Nimbus is responsible for distributing code around the cluster, assigning tasks to machines, and monitoring for failures.

worker节点,Supervisor:

The supervisor listens for work assigned to its machine and starts and stops worker processes as necessary based on what Nimbus has assigned to it. Each worker process executes a subset of a topology

topology,拓扑结构(计算逻辑关系):

a running topology consists of many worker processes spread across many machines.

A topology is a graph of computation. Each node in a topology contains processing logic, and links between nodes indicate how data should be passed around between nodes.

Zookeeper:(协调系统)

All coordination between Nimbus and the Supervisors is done through a Zookeeper cluster.

Additionally, the Nimbus daemon and Supervisor daemons are fail-fast and stateless; all state is kept in Zookeeper or on local disk.

(ZooKeeper是Hadoop的正式子项目,它是一个针对大型分布式系统的可靠协调系统,提供的功能包括:配置维护、名字服务、分布式同步、组服务等。ZooKeeper的目标就是封装好复杂易出错的关键服务,将简单易用的接口和性能高效、功能稳定的系统提供给用户)

Thrift :

Since topology definitions are just Thrift structs, and Nimbus is a Thrift service, you can create and submit topologies using any programming language.

(thrift是一个软件框架,用来进行可扩展且跨语言的服务的开发。它结合了功能强大的软件堆栈和代码生成引擎,以构建在 C++, Java, Python, PHP, Ruby, Erlang, Perl, Haskell, C#, Cocoa, JavaScript, Node.js, Smalltalk, and OCaml 这些编程语言间无缝结合的、高效的服务。)

 

Streams:

The core abstraction in Storm is the “stream”. A stream is an unbounded sequence of tuples.

Storm provides the primitives for transforming a stream into a new stream in a distributed and reliable way.

spouts & bolts 

The basic primitives Storm provides for doing stream transformations are “spouts” and “bolts”. Spouts and bolts have interfaces that you implement to run your application-specific logic.

A spout is a source of streams (数据流的源头)

A bolt consumes any number of input streams,does some processing, and possibly emits (发送出)new streams. 

Bolts can do anything from run functions, filter tuples, do streaming aggregations, do streaming joins, talk to databases, and more.

(加工消耗传给他的数据流,然后发出或传给下一个bolt)

spouts 和 bolts 组成的网络,就构成了一个topology,这是提交给storm执行的高层次抽象。

 A topology is a graph of stream transformations where each node is a spout or bolt.

When a spout or bolt emits a tuple to a stream, it sends the tuple to every bolt that subscribed to that stream.

 

数据模型 

Data model

 (http://storm.apache.org/documentation/Tutorial.html)

posted @ 2014-12-22 10:25  Aveen  阅读(130)  评论(0编辑  收藏  举报
Top