Four Kinds of Flink Sinks


The word "sink" suggests settling or flowing out. In Flink, a Sink is the operation that stores data or, more broadly, the output operation that sends processed data to a designated external storage system.
The print method we have been using all along is in fact a Sink:

public DataStreamSink<T> print(String sinkIdentifier) {
    PrintSinkFunction<T> printFunction = new PrintSinkFunction<>(sinkIdentifier, false);
    return addSink(printFunction).name("Print to Std. Out");
}
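As a minimal sketch of this, calling print on any stream simply registers that PrintSinkFunction through addSink:

import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class PrintSinkDemo {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.setParallelism(1);
        // print("demo") internally calls addSink(new PrintSinkFunction<>("demo", false))
        env.fromElements("a", "b", "c").print("demo");
        env.execute();
    }
}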

Flink ships with a number of built-in Sinks; any other Sink has to be implemented by the user.

The Flink version used for these tests is 1.12.

KafkaSink

1) Add the Kafka dependency

<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-connector-kafka_2.11</artifactId>
    <version>1.12.0</version>
</dependency>
<dependency>
    <groupId>com.alibaba</groupId>
    <artifactId>fastjson</artifactId>
    <version>1.2.75</version>
</dependency>

2) Start the Kafka cluster
Kafka cluster start script:
https://www.cnblogs.com/traveller-hzq/p/14487977.html
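If you do not have a start script handy, a minimal manual sketch is shown below. The paths and hostnames are assumptions based on the broker list used in the example code, and the topic-creation syntax with --bootstrap-server requires Kafka 2.2+; adjust to your installation.

# run on each broker node (hadoop1/2/3 in this example); paths are relative to the Kafka install dir
bin/zookeeper-server-start.sh -daemon config/zookeeper.properties
bin/kafka-server-start.sh -daemon config/server.properties

# create the topic that the example code writes to
bin/kafka-topics.sh --create --bootstrap-server hadoop1:9092 \
  --topic flink0923 --partitions 3 --replication-factor 2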
3) Example code for sinking to Kafka

import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.streaming.api.datastream.DataStreamSource;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer;

import java.util.Properties;

/**
 * TODO
 *
 * @author hzq
 * @version 1.0
 * @date 2021/3/5 11:08
 */
public class Flink01_KafkaSink {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.setParallelism(1);

        DataStreamSource<String> inputDS = env.socketTextStream("localhost", 9999);

        // TODO Sink - kafka
        Properties properties = new Properties();
        properties.setProperty("bootstrap.servers", "hadoop1:9092,hadoop2:9092,hadoop3:9092");

        FlinkKafkaProducer<String> kafkaSink = new FlinkKafkaProducer<>(
                "flink0923",
                new SimpleStringSchema(),
                properties
        );

        inputDS.addSink(kafkaSink);

        env.execute();
    }
}
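The example above writes with the connector's default at-least-once guarantee. FlinkKafkaProducer also has a constructor that takes a KafkaSerializationSchema and a Semantic for exactly-once delivery. The sketch below is only an illustrative variant, not part of the original post: it requires checkpointing to be enabled, and transaction.timeout.ms must not exceed the broker's transaction.max.timeout.ms.

import org.apache.flink.streaming.api.datastream.DataStreamSource;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer;
import org.apache.flink.streaming.connectors.kafka.KafkaSerializationSchema;
import org.apache.kafka.clients.producer.ProducerRecord;

import java.nio.charset.StandardCharsets;
import java.util.Properties;

public class Flink01_KafkaSink_ExactlyOnce {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.setParallelism(1);
        env.enableCheckpointing(5000); // exactly-once Kafka sinks are tied to checkpoints

        DataStreamSource<String> inputDS = env.socketTextStream("localhost", 9999);

        Properties properties = new Properties();
        properties.setProperty("bootstrap.servers", "hadoop1:9092,hadoop2:9092,hadoop3:9092");
        // must be <= the broker's transaction.max.timeout.ms (15 minutes by default)
        properties.setProperty("transaction.timeout.ms", "600000");

        FlinkKafkaProducer<String> kafkaSink = new FlinkKafkaProducer<>(
                "flink0923",                                   // default topic
                new KafkaSerializationSchema<String>() {
                    @Override
                    public ProducerRecord<byte[], byte[]> serialize(String element, Long timestamp) {
                        return new ProducerRecord<>("flink0923", element.getBytes(StandardCharsets.UTF_8));
                    }
                },
                properties,
                FlinkKafkaProducer.Semantic.EXACTLY_ONCE       // two-phase commit on checkpoint
        );

        inputDS.addSink(kafkaSink);

        env.execute();
    }
}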

4) Use nc -lk 9999 to send some test data
5) Start a console consumer on Linux and check whether the data arrives
bin/kafka-console-consumer.sh --bootstrap-server hadoop1:9092 --topic flink0923

RedisSink

1) Add the Redis connector dependency

<!-- https://mvnrepository.com/artifact/org.apache.flink/flink-connector-redis -->
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-connector-redis_2.11</artifactId>
    <version>1.1.5</version>
</dependency>

2) Start the Redis server

./redis-server /etc/redis/6379.conf
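To confirm the server is reachable from the host configured in the example code below (hadoop102 is the host used there), a quick check:

redis-cli -h hadoop102 -p 6379 ping
# expected reply: PONG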

3) Example code for sinking to Redis

import org.apache.flink.streaming.api.datastream.DataStreamSource;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.redis.RedisSink;
import org.apache.flink.streaming.connectors.redis.common.config.FlinkJedisPoolConfig;
import org.apache.flink.streaming.connectors.redis.common.mapper.RedisCommand;
import org.apache.flink.streaming.connectors.redis.common.mapper.RedisCommandDescription;
import org.apache.flink.streaming.connectors.redis.common.mapper.RedisMapper;

/**
 * TODO
 *
 * @author hzq
 * @version 1.0
 * @date 2021/3/5 11:08
 */
public class Flink02_RedisSink {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.setParallelism(1);

        DataStreamSource<String> inputDS = env.socketTextStream("localhost", 9999);

        // TODO Sink - Redis
        FlinkJedisPoolConfig flinkJedisPoolConfig = new FlinkJedisPoolConfig.Builder()
                .setHost("hadoop102")
                .setPort(6379)
                .build();

        RedisSink<String> redisSink = new RedisSink<>(
                flinkJedisPoolConfig,
                new RedisMapper<String>() {
                    @Override
                    public RedisCommandDescription getCommandDescription() {
                        // First argument: which Redis command to use
                        // Second argument: the outermost Redis key
                        return new RedisCommandDescription(RedisCommand.HSET, "flink0923");
                    }

                    // Extract the key from the data; for a hash structure this is the hash's inner key (field)
                    @Override
                    public String getKeyFromData(String data) {
                        return data.split(",")[1];
                    }

                    // Extract the value from the data; for a hash structure this is the hash's value
                    @Override
                    public String getValueFromData(String data) {
                        return data.split(",")[2];
                    }
                }
        );

        inputDS.addSink(redisSink);

        env.execute();
    }
}

Check in Redis whether the data arrived:

redis-cli --raw
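Inside redis-cli, the hash written by the sink can be inspected with standard hash commands (the key flink0923 comes from the example code):

HGETALL flink0923    # all field/value pairs written by the sink
HLEN flink0923       # number of fields currently stored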

Note:
Five records were sent, but only two ended up in Redis. The reason is that the hash fields were duplicated, so later writes overwrote the earlier ones.

ElasticsearchSink

1) Add the ES dependency

<!-- https://mvnrepository.com/artifact/org.apache.flink/flink-connector-elasticsearch6 -->
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-connector-elasticsearch6_2.11</artifactId>
    <version>1.12.0</version>
</dependency>

2) Start the ES cluster
3) Example code for sinking to ES

import org.apache.flink.api.common.functions.RuntimeContext;
import org.apache.flink.streaming.api.datastream.DataStreamSource;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.elasticsearch.ElasticsearchSinkFunction;
import org.apache.flink.streaming.connectors.elasticsearch.RequestIndexer;
import org.apache.flink.streaming.connectors.elasticsearch6.ElasticsearchSink;
import org.apache.http.HttpHost;
import org.elasticsearch.action.index.IndexRequest;
import org.elasticsearch.client.Requests;

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * TODO
 *
 * @author hzq
 * @version 1.0
 * @date 2021/3/5 11:08
 */
public class Flink03_EsSink {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.setParallelism(1);

        DataStreamSource<String> inputDS = env.socketTextStream("hadoop102", 9999);

        // TODO Sink - ElasticSearch
        List<HttpHost> httpHosts = new ArrayList<>();
        httpHosts.add(new HttpHost("hadoop102", 9200));
        httpHosts.add(new HttpHost("hadoop103", 9200));
        httpHosts.add(new HttpHost("hadoop104", 9200));

        ElasticsearchSink.Builder<String> esSinkBuilder = new ElasticsearchSink.Builder<>(
                httpHosts,
                new ElasticsearchSinkFunction<String>() {
                    @Override
                    public void process(String element, RuntimeContext ctx, RequestIndexer indexer) {
                        Map<String, String> dataMap = new HashMap<>();
                        dataMap.put("data", element);
                        // Build the index request with the ES API
                        IndexRequest indexRequest = Requests.indexRequest("flink0923").type("dasfgdasf").source(dataMap);
                        indexer.add(indexRequest);
                    }
                }
        );

        // TODO For demonstration only: flush after every single action; do not do this in production
        esSinkBuilder.setBulkFlushMaxActions(1);

        inputDS.addSink(esSinkBuilder.build());

        env.execute();
    }
}

/*
ES 5.x : index -> database, type -> table
ES 6.x : each index can have only one type, so an index can be treated as a table
ES 7.x : type has been removed

Check an index via URL:
    List indices:       http://hadoop102:9200/_cat/indices?v
    View index content: http://hadoop102:9200/flink0923/_search
*/

Check in Elasticsearch whether the data arrived:
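For example, using the URLs mentioned in the code comments (the hostname is the one configured above):

curl "http://hadoop102:9200/_cat/indices?v"            # list indices; flink0923 should appear
curl "http://hadoop102:9200/flink0923/_search?pretty"  # show the documents written by the sink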

Note:

If a log4j-related error occurs when running the job, add the log4j dependency:

<dependency>
    <groupId>org.apache.logging.log4j</groupId>
    <artifactId>log4j-to-slf4j</artifactId>
    <version>2.14.0</version>
</dependency>

For an unbounded stream, configure the bulk flush buffering, for example:

esSinkBuilder.setBulkFlushMaxActions(1);
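Besides the max-actions threshold, the builder also exposes a size-based and a time-based trigger; the values below are only illustrative:

// flush when any one of these thresholds is reached (example values)
esSinkBuilder.setBulkFlushMaxActions(1000);   // number of buffered requests
esSinkBuilder.setBulkFlushMaxSizeMb(5);       // buffered data size in MB
esSinkBuilder.setBulkFlushInterval(5000);     // flush interval in milliseconds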

Custom Sink

What do we do if Flink does not provide a ready-made connector for the storage system we want to write to?
We implement a custom Sink; here, one that writes to MySQL.
1) Create the database and table in MySQL

create database test;
use test;
CREATE TABLE `sensor` (
    `id` varchar(20) NOT NULL,
    `ts` bigint(20) NOT NULL,
    `vc` int(11) NOT NULL,
    PRIMARY KEY (`id`, `ts`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

2) Add the MySQL driver dependency

<dependency>
    <groupId>mysql</groupId>
    <artifactId>mysql-connector-java</artifactId>
    <version>5.1.49</version>
</dependency>

3) Example code for a custom Sink that writes to MySQL

import com.atguigu.chapter05.WaterSensor;
import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.datastream.DataStreamSource;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.sink.RichSinkFunction;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

/**
 * TODO
 *
 * @author hzq
 * @version 1.0
 * @date 2021/3/5 11:08
 */
public class Flink04_MySink {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.setParallelism(1);

        DataStreamSource<String> inputDS = env.socketTextStream("hadoop102", 9999);

        SingleOutputStreamOperator<WaterSensor> sensorDS = inputDS.map(new MapFunction<String, WaterSensor>() {
            @Override
            public WaterSensor map(String value) throws Exception {
                // split the input line: id,ts,vc
                String[] line = value.split(",");
                return new WaterSensor(line[0], Long.valueOf(line[1]), Integer.valueOf(line[2]));
            }
        });

        // TODO Sink - custom: MySQL
        sensorDS.addSink(new MySinkFunction());

        env.execute();
    }

    public static class MySinkFunction extends RichSinkFunction<WaterSensor> {

        Connection conn;
        PreparedStatement pstmt;

        @Override
        public void open(Configuration parameters) throws Exception {
            conn = DriverManager.getConnection("jdbc:mysql://hadoop102:3306/test", "root", "000000");
            pstmt = conn.prepareStatement("insert into sensor values (?,?,?)");
        }

        @Override
        public void close() throws Exception {
            if (pstmt != null) {
                pstmt.close();
            }
            if (conn != null) {
                conn.close();
            }
        }

        @Override
        public void invoke(WaterSensor value, Context context) throws Exception {
            pstmt.setString(1, value.getId());
            pstmt.setLong(2, value.getTs());
            pstmt.setInt(3, value.getVc());
            pstmt.execute();
        }
    }
}
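The WaterSensor class imported above is not shown in this post; a minimal sketch inferred from how it is used (field names and types are assumptions based on the constructor and the getId/getTs/getVc calls):

public class WaterSensor {
    private String id;
    private Long ts;
    private Integer vc;

    // Flink POJOs need a public no-arg constructor
    public WaterSensor() { }

    public WaterSensor(String id, Long ts, Integer vc) {
        this.id = id;
        this.ts = ts;
        this.vc = vc;
    }

    public String getId() { return id; }
    public Long getTs() { return ts; }
    public Integer getVc() { return vc; }

    public void setId(String id) { this.id = id; }
    public void setTs(Long ts) { this.ts = ts; }
    public void setVc(Integer vc) { this.vc = vc; }
}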

Use the nc command to send test data.
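A possible test session, as a sketch: the input format id,ts,vc matches the map function above, and the MySQL credentials are the ones hard-coded in the example.

# terminal 1: feed data into the socket the job listens on (hadoop102:9999 in the code)
nc -lk 9999
sensor_1,1000,20
sensor_2,2000,30

# terminal 2: verify that the rows arrived in MySQL
mysql -uroot -p000000 -e "select * from test.sensor;"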

