SparkStreaming in Java

参考地址:Spark Streaming Programming Guide

1.新建Maven项目,POM引入依赖

        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-streaming_2.13</artifactId>
            <version>3.5.0</version>
        </dependency>

2.项目添加Scala依赖库

image

3.在资源目录添加日志配置文件log4j.properties

log4j.rootLogger=INFO, stdout
log4j.appender.stdout=org.apache.log4j.ConsoleAppender
log4j.appender.stdout.layout=org.apache.log4j.PatternLayout
log4j.appender.stdout.layout.ConversionPattern=%d %p [%c] - %m%n
log4j.appender.logfile=org.apache.log4j.FileAppender
log4j.appender.logfile.File=target/spring.log
log4j.appender.logfile.layout=org.apache.log4j.PatternLayout
log4j.appender.logfile.layout.ConversionPattern=%d %p [%c] - %m%n

4.代码

package cn.coreqi;

import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.*;
import scala.Tuple2;

import java.util.Arrays;

public class Main {
    public static void main(String[] args) throws InterruptedException {
        // 创建SparkConf对象
        SparkConf sparkConf = new SparkConf()
                .setMaster("local[*]")
                .setAppName("sparkSql");

        // 第一个参数表示环境配置,第二个参数表示批量处理的周期(采集周期)
        JavaStreamingContext ssc = new JavaStreamingContext(sparkConf, Durations.seconds(3));

        // 1.从端口获取数据
        JavaReceiverInputDStream<String> lines = ssc.socketTextStream("localhost", 9999);
        // 处理数据
        JavaDStream<String> words = lines.flatMap(l -> Arrays.asList(l.split(" ")).iterator());
        JavaPairDStream<String, Integer> pairs = words.mapToPair(s -> new Tuple2<>(s, 1));
        JavaPairDStream<String, Integer> wordCounts = pairs.reduceByKey((i1, i2) -> i1 + i2);

        // 打印结果
        wordCounts.print();

        // 由于SparkStreaming采集器是长期执行的任务,所以不能直接关闭
        // 如果main方法执行完毕,应用程序也会自动结束,所以不能让main执行完毕
        ssc.start();              // 启动采集器
        
        ssc.awaitTermination();   // 等待采集器的关闭
    }
}

5.测试

安装netcat https://eternallybored.org/misc/netcat/

nc -lp 9999
posted @ 2024-01-15 19:37  SpringCore  阅读(8)  评论(0编辑  收藏  举报