Winter Vacation Study, Day 14
2. Configure Flume
With the configuration file below, Flume delivers its data to an Avro sink, which pushes events to the Spark Streaming receiver.
agent.sinks = avroSink
agent.sinks.avroSink.type = avro
agent.sinks.avroSink.channel = memoryChannel
agent.sinks.avroSink.hostname = <chosen machine IP>
agent.sinks.avroSink.port = <chosen machine port>
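The snippet above defines only the sink. For context, a minimal complete agent configuration might look like the sketch below; the netcat source, its port 44444, and the channel capacity are assumptions for local testing, not part of the original setup.

# Hypothetical complete agent: netcat source -> memory channel -> Avro sink
agent.sources = netcatSource
agent.channels = memoryChannel
agent.sinks = avroSink

# Source: reads plain-text lines from a TCP port (values assumed for testing)
agent.sources.netcatSource.type = netcat
agent.sources.netcatSource.bind = localhost
agent.sources.netcatSource.port = 44444
agent.sources.netcatSource.channels = memoryChannel

# Channel: in-memory buffer between source and sink
agent.channels.memoryChannel.type = memory
agent.channels.memoryChannel.capacity = 1000

# Sink: pushes events to the Spark Streaming receiver over Avro
agent.sinks.avroSink.type = avro
agent.sinks.avroSink.channel = memoryChannel
agent.sinks.avroSink.hostname = <chosen machine IP>
agent.sinks.avroSink.port = <chosen machine port>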
3. Configure the Spark Streaming application
A. Add the dependency
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-streaming-flume_2.11</artifactId>
    <version>${spark.version}</version>
</dependency>
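If the project is built with sbt instead of Maven, the equivalent dependency would look roughly like this; the concrete version string is an assumption and must match the Spark version on your cluster.

// Hypothetical sbt equivalent; %% appends the Scala binary suffix (_2.11 here)
libraryDependencies += "org.apache.spark" %% "spark-streaming-flume" % "2.4.8"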
B. In the streaming application code, import the FlumeUtils class and create the input DStream.
import org.apache.spark.streaming.flume._
val flumeStream = FlumeUtils.createStream(streamingContext, [chosen machine IP], [chosen machine port])
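createStream uses the push approach: Flume's Avro sink connects to the receiver started by the application and pushes events into it. Spark Streaming also ships a pull-based alternative in the same package, where the application polls a custom Spark sink running inside the Flume agent. A minimal sketch, assuming the matching spark-streaming-flume-sink jar is already on the Flume agent's classpath:

import org.apache.spark.streaming.flume._

// Pull mode: the application polls Flume's custom SparkSink instead of
// hosting an Avro receiver, which tolerates restarts of either side better.
val pollingStream = FlumeUtils.createPollingStream(streamingContext, [chosen machine IP], [chosen machine port])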
4. Test
A. Run the code directly
package com.ruozedata.streaming

import org.apache.spark.SparkConf
import org.apache.spark.streaming.flume.FlumeUtils
import org.apache.spark.streaming.{Seconds, StreamingContext}

object FlumePushApp {
  def main(args: Array[String]): Unit = {
    // Expect exactly two arguments: the hostname and port the receiver binds to
    val Array(hostname, port) = args

    val sparkConf = new SparkConf()
      .setAppName("FlumePushApp")
      .setMaster("local[2]")
    val ssc = new StreamingContext(sparkConf, Seconds(10))

    val lines = FlumeUtils.createStream(ssc, hostname, port.toInt)

    // createStream returns a DStream[SparkFlumeEvent], not DStream[String],
    // so split cannot be called directly; first map each event body to a String.
    lines.map(x => new String(x.event.getBody.array()).trim)
      .flatMap(_.split(" "))
      .map((_, 1))
      .reduceByKey(_ + _)
      .print()

    ssc.start()
    ssc.awaitTermination()
  }
}
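One ordering detail when testing push mode: start the Spark Streaming application first, because Flume's Avro sink needs a live receiver to connect to, and only then start the Flume agent, for example (the --name value must match the agent prefix used in the configuration file; the file path is a placeholder):

flume-ng agent \
  --name agent \
  --conf $FLUME_HOME/conf \
  --conf-file <path to the configuration file above> \
  -Dflume.root.logger=INFO,console

Once both sides are running, lines fed into the Flume source should appear as word counts on the application console every 10 seconds.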