2. Understanding Some Storm Concepts
1. Tuple, Value, Field
The official definition of a tuple:
“A tuple is a named list of values where each value can be any type.”
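A tuple is a list-like structure. Each element it stores is called a field, and a field can be of any type.
Storm uses the tuple as its data model. Each tuple is a group of values, and each value has a name.
A tuple represents one basic unit of processing in a data stream.
For example, a single cookie log record can contain several fields, each representing one attribute.
Conceptually a tuple is a key-value map, but the field names of the tuples passed between components are declared in advance,
so a tuple only has to carry its values in the declared order. In practice it is therefore a list of values.
An unbounded, continuous sequence of tuples forms a stream.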
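The idea that a tuple is a value list interpreted through field names agreed on in advance can be sketched in plain Java. This is a minimal illustration, not Storm's implementation: NamedTuple and NamedTupleDemo are hypothetical classes, and only the method names getValue and getValueByField are borrowed from Storm's tuple interface.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for illustration only; this is not Storm's Tuple class.
final class NamedTuple {
    private final List<String> fields; // schema, declared once by the emitting component
    private final List<Object> values; // payload, stored in schema order

    NamedTuple(List<String> fields, Object... values) {
        if (fields.size() != values.length) {
            throw new IllegalArgumentException(
                "expected " + fields.size() + " values but got " + values.length);
        }
        this.fields = fields;
        this.values = Arrays.asList(values);
    }

    // Positional access: no name lookup is needed.
    Object getValue(int index) {
        return values.get(index);
    }

    // Access by name: the schema resolves the name to a position.
    Object getValueByField(String name) {
        int index = fields.indexOf(name);
        if (index < 0) {
            throw new IllegalArgumentException("unknown field: " + name);
        }
        return values.get(index);
    }
}

class NamedTupleDemo {
    public static void main(String[] args) {
        // The schema is shared, so each tuple only carries its values.
        List<String> schema = Arrays.asList("double", "triple");
        NamedTuple t = new NamedTuple(schema, 14, 21);
        System.out.println(t.getValue(0));               // prints 14
        System.out.println(t.getValueByField("triple")); // prints 21
    }
}
```

Because the schema lives outside the payload, many tuples can share one field list, which is roughly why the map view and the list view describe the same data.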
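Every node in a topology must declare the fields of the tuples it emits.
For example, the bolt below declares that the tuples it emits contain two fields, named "double" and "triple".
Its declareOutputFields method declares the output fields: ["double", "triple"].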
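// Imports assume Storm 1.0 or later (org.apache.storm.*); older releases use backtype.storm.*.
import java.util.Map;
import org.apache.storm.task.OutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.IRichBolt;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Tuple;
import org.apache.storm.tuple.Values;

public class DoubleAndTripleBolt implements IRichBolt {
    private OutputCollector _collector;

    @Override
    public void prepare(Map conf, TopologyContext context, OutputCollector collector) {
        _collector = collector;
    }

    @Override
    public void execute(Tuple input) {
        int val = input.getInteger(0);
        _collector.emit(input, new Values(val * 2, val * 3));
        _collector.ack(input);
    }

    @Override
    public void cleanup() { }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("double", "triple"));
    }

    // Required by IRichBolt; null means no component-specific configuration.
    @Override
    public Map<String, Object> getComponentConfiguration() {
        return null;
    }
}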
Similarly, the declareOutputFields method of the following spout declares a single output field: ["sentence"].
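// Imports assume Storm 1.0 or later (org.apache.storm.*); older releases use backtype.storm.*.
import java.util.Map;
import java.util.Random;
import org.apache.storm.spout.SpoutOutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.base.BaseRichSpout;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Values;

public class RandomSentenceSpout extends BaseRichSpout {
    // Collects the tuples emitted by this spout
    private SpoutOutputCollector collector;
    private Random random;

    // Called once; the Storm framework passes in the SpoutOutputCollector
    @Override
    public void open(Map map, TopologyContext topologyContext, SpoutOutputCollector spoutOutputCollector) {
        this.collector = spoutOutputCollector;
        random = new Random();
        // Typical place to connect to Kafka or MySQL, or to open a local file
    }

    /**
     * The "hand of God": conceptually, the framework drives the spout with
     *   while (true)
     *       spout.nextTuple()
     */
    @Override
    public void nextTuple() {
        String[] sentences = new String[]{
            "the cow jumped over the moon", "the dog jumped over the moon",
            "the pig jumped over the gun", "the fish jumped over the moon", "the duck jumped over the moon",
            "the man jumped over the sun", "the girl jumped over the sun", "the boy jumped over the sun"
        };
        String sentence = sentences[random.nextInt(sentences.length)];
        collector.emit(new Values(sentence));
        System.out.println("RandomSentenceSpout emitted: " + sentence);
    }

    // A spout can emit more than one stream; this one declares only the default stream
    @Override
    public void declareOutputFields(OutputFieldsDeclarer outputFieldsDeclarer) {
        outputFieldsDeclarer.declare(new Fields("sentence"));
    }
}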
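The lifecycle described in the spout's comments (open is called once, then nextTuple is called over and over) can be sketched without Storm on the classpath. Everything below is a hypothetical simplification: SketchCollector, SketchSpout, SketchSentenceSpout and SpoutLoopSketch are made-up names, and the loop is bounded so the example terminates, whereas the real framework keeps calling nextTuple for as long as the topology runs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical minimal interfaces for illustration; these are not Storm's real API.
interface SketchCollector {
    void emit(List<Object> values);
}

interface SketchSpout {
    void open(SketchCollector collector); // called once before any tuple is requested
    void nextTuple();                     // called repeatedly by the driver
}

class SketchSentenceSpout implements SketchSpout {
    private static final String[] SENTENCES = {
        "the cow jumped over the moon",
        "the dog jumped over the moon"
    };
    private SketchCollector collector;
    private Random random;

    @Override
    public void open(SketchCollector collector) {
        this.collector = collector;
        this.random = new Random();
    }

    @Override
    public void nextTuple() {
        // Emit one tuple with a single value, matching a one-field schema ["sentence"].
        String sentence = SENTENCES[random.nextInt(SENTENCES.length)];
        List<Object> values = new ArrayList<>();
        values.add(sentence);
        collector.emit(values);
    }
}

class SpoutLoopSketch {
    // Plays the role of the framework: open once, then request tuples in a loop.
    static List<List<Object>> run(SketchSpout spout, int calls) {
        List<List<Object>> emitted = new ArrayList<>();
        spout.open(emitted::add);
        for (int i = 0; i < calls; i++) {
            spout.nextTuple();
        }
        return emitted;
    }

    public static void main(String[] args) {
        for (List<Object> tuple : run(new SketchSentenceSpout(), 3)) {
            System.out.println(tuple);
        }
    }
}
```

The sketch also shows why open must store the collector it is given: nextTuple has no other way to hand tuples back to the caller.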