### Flume Architecture Overview
Flume's flexibility comes from one core design element: the agent. An agent is a Java process that runs on a log-collection node, i.e., a server that produces the logs to be gathered.
An agent contains three core components, source -> channel -> sink, which work like a producer, a warehouse, and a consumer.
source: the component that ingests data. It can handle log data of many types and formats, including avro, thrift, exec, jms, spooling directory, netcat, sequence generator, syslog, http, legacy, and custom sources.
channel: once the source has collected data, it is staged temporarily in the channel. The channel is the agent's short-term buffer for ingested data and can be backed by memory, jdbc, file, and so on.
sink: the component that delivers data to its destination, which can be hdfs, logger, avro, thrift, irc, file, null, HBase, solr, or a custom sink.



#### The Agent Configuration File
Define the agent and the names of its components.
```
# a1: the (user-chosen) name of the agent
# name the sources, channels, and sinks
a1.sources = r1
a1.channels = c1
a1.sinks = k1
```
Define the configuration of each component.
```
# the type of data r1 receives
a1.sources.r1.type = netcat
# the address r1 binds to
a1.sources.r1.bind = localhost
# the port r1 listens on
a1.sources.r1.port = 44444

# c1 buffers data in memory
a1.channels.c1.type = memory
# maximum number of events held in the channel
a1.channels.c1.capacity = 1000
# maximum number of events per transaction
a1.channels.c1.transactionCapacity = 100

# k1 writes events to the log
a1.sinks.k1.type = logger
```
The agent wires the three components together by name.
```
# connect the source to the channel
a1.sources.r1.channels = c1
# connect the sink to the channel (note: 'channel' with no trailing 's'; a sink drains exactly one channel, while a source may feed several)
a1.sinks.k1.channel = c1
```
#### NetCat Source: listen on a specified network port
Whenever an application writes data to this port, the source picks it up.
Sink: logger, Channel: memory
```
# create the agent configuration file
gedit /opt/flume/flume1.8.0/conf/netcat.conf
```
```
a1.sources = r1
a1.sinks = k1
a1.channels = c1

a1.sources.r1.type = netcat
a1.sources.r1.bind = 192.168.37.130
a1.sources.r1.port = 44444

a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

a1.sinks.k1.type = logger

a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
```
```
cd /opt/flume/flume1.8.0/bin

# start flume; events it receives are printed to the console
/opt/flume/flume1.8.0/bin/flume-ng agent -n a1 -c /opt/flume/flume1.8.0/conf/ -f /opt/flume/flume1.8.0/conf/netcat.conf -Dflume.root.logger=DEBUG,console
```

Send data to the port using the Windows Telnet client (on Windows 10, the Telnet feature must be enabled first).
```
# open CMD, then connect to the source
telnet 192.168.37.130 44444
# type the data you want to send...
```
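If you are testing from a Linux shell instead of Windows, netcat does the same job (a minimal sketch, assuming the nc utility is installed):
```
# pipe one line into the NetCat source, then disconnect
echo "hello flume" | nc 192.168.37.130 44444
```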


Sink: hdfs, Channel: file
```
# create directories for the file channel's data and checkpoints
mkdir /opt/flume/data
mkdir /opt/flume/checkpoint

# create the agent configuration file
gedit /opt/flume/flume1.8.0/conf/netcat.conf
```
```
# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe/configure the source
a1.sources.r1.type = netcat
a1.sources.r1.bind = 192.168.37.130
a1.sources.r1.port = 44444

# Describe the sink
a1.sinks.k1.type = hdfs
# target path on HDFS
a1.sinks.k1.hdfs.path = hdfs://slave2:9000/dataoutput
a1.sinks.k1.hdfs.writeFormat = Text
a1.sinks.k1.hdfs.fileType = DataStream
# roll to a new file every 10 seconds (size- and count-based rolling disabled)
a1.sinks.k1.hdfs.rollInterval = 10
a1.sinks.k1.hdfs.rollSize = 0
a1.sinks.k1.hdfs.rollCount = 0
# prefix for the generated file names
a1.sinks.k1.hdfs.filePrefix = %Y-%m-%d-%H-%M-%S
a1.sinks.k1.hdfs.useLocalTimeStamp = true

# Use a channel which buffers events in file
a1.channels.c1.type = file
a1.channels.c1.checkpointDir = /opt/flume/checkpoint
a1.channels.c1.dataDirs = /opt/flume/data

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
```
```
# start Hadoop
start-all.sh

# start the flume agent
/opt/flume/flume1.8.0/bin/flume-ng agent -n a1 -c /opt/flume/flume1.8.0/conf/ -f /opt/flume/flume1.8.0/conf/netcat.conf -Dflume.root.logger=DEBUG,console
```
```
# open CMD
telnet 192.168.37.130 44444
# type the data you want to send...
```
The data is written to /dataoutput on HDFS.
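To confirm the events arrived, list and read the rolled files with the standard HDFS shell:
```
# inspect the files Flume wrote to HDFS
hdfs dfs -ls /dataoutput
hdfs dfs -cat /dataoutput/*
```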
#### Spooling Directory Source: watch a specified directory
Whenever an application drops a new file into the watched directory, the source picks it up, parses its contents, and writes the events to the channel. Once a file has been fully ingested, it is either marked as completed or deleted.
Notes: 1. A file copied into the spool directory must not be opened and edited afterwards. 2. Never copy two files with the same name into the directory.
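When ingested files are kept rather than deleted (Flume's default), the source marks them by renaming them with a .COMPLETED suffix (the default fileSuffix). For example, after a file named data1.txt has been ingested:
```
# the source renames the file so it will not be re-read
ls /opt/flume/FlumeInputdata/
# data1.txt.COMPLETED
```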
Sink: logger, Channel: memory
```
# create the directory to watch
mkdir /opt/flume/FlumeInputdata/

# edit the agent configuration file
gedit /opt/flume/flume1.8.0/conf/netcat.conf
```
```
# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe/configure the source
a1.sources.r1.type = spooldir
a1.sources.r1.spoolDir = /opt/flume/FlumeInputdata
a1.sources.r1.fileHeader = true
a1.sources.r1.interceptors = i1
a1.sources.r1.interceptors.i1.type = timestamp
# whether to delete ingested files: never (keep them) or immediate (delete right away)
a1.sources.r1.deletePolicy = never

# Describe the sink
a1.sinks.k1.type = logger

# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
```
```
# start the flume agent
/opt/flume/flume1.8.0/bin/flume-ng agent -n a1 -c /opt/flume/flume1.8.0/conf/ -f /opt/flume/flume1.8.0/conf/netcat.conf -Dflume.root.logger=DEBUG,console
```
Drop a data file into the watched directory.
```
# create a small test file, then copy it into the spool directory
vim /data1.txt
cp /data1.txt /opt/flume/FlumeInputdata/
```
Watch the output appear in the console.
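With fileHeader = true and the timestamp interceptor enabled, each line of the file arrives as one event, and the logger sink prints it roughly like this (the header values shown here are illustrative):
```
Event: { headers:{file=/opt/flume/FlumeInputdata/data1.txt, timestamp=1533888888000} body: 68 65 6C 6C 6F hello }
```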
Sink: hdfs, Channel: file
```
# edit the agent configuration file
gedit /opt/flume/flume1.8.0/conf/netcat.conf
```
```
# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe/configure the source
a1.sources.r1.type = spooldir
a1.sources.r1.spoolDir = /opt/flume/FlumeInputdata
a1.sources.r1.fileHeader = true
a1.sources.r1.interceptors = i1
a1.sources.r1.interceptors.i1.type = timestamp

# Describe the sink
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = hdfs://slave2:9000/dataoutput
a1.sinks.k1.hdfs.writeFormat = Text
a1.sinks.k1.hdfs.fileType = DataStream
a1.sinks.k1.hdfs.rollInterval = 10
a1.sinks.k1.hdfs.rollSize = 0
a1.sinks.k1.hdfs.rollCount = 0
a1.sinks.k1.hdfs.filePrefix = %Y-%m-%d-%H-%M-%S
a1.sinks.k1.hdfs.useLocalTimeStamp = true

# Use a channel which buffers events in file
a1.channels.c1.type = file
a1.channels.c1.checkpointDir = /opt/flume/checkpoint
a1.channels.c1.dataDirs = /opt/flume/data

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
```
```
# start the flume agent
/opt/flume/flume1.8.0/bin/flume-ng agent -n a1 -c /opt/flume/flume1.8.0/conf/ -f /opt/flume/flume1.8.0/conf/netcat.conf -Dflume.root.logger=DEBUG,console

# send a data file
cp /data1.txt /opt/flume/FlumeInputdata/
```
The data is written to /dataoutput on HDFS.
#### Exec Source: consume the output of a specified command
The source runs a command and uses its output as the data stream.
```
# create the file to monitor
touch /words.txt

# edit the agent configuration file
gedit /opt/flume/flume1.8.0/conf/netcat.conf
```
```
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe/configure the source
a1.sources.r1.type = exec
# use tail -F to follow the file's contents
a1.sources.r1.command = tail -F /words.txt

# Describe the sink
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = hdfs://slave2:9000/dataoutput
a1.sinks.k1.hdfs.writeFormat = Text
a1.sinks.k1.hdfs.fileType = DataStream
a1.sinks.k1.hdfs.rollInterval = 10
a1.sinks.k1.hdfs.rollSize = 0
a1.sinks.k1.hdfs.rollCount = 0
a1.sinks.k1.hdfs.filePrefix = %Y-%m-%d-%H-%M-%S
a1.sinks.k1.hdfs.useLocalTimeStamp = true

# Use a channel which buffers events in file
a1.channels.c1.type = file
a1.channels.c1.checkpointDir = /opt/flume/checkpoint
a1.channels.c1.dataDirs = /opt/flume/data

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
```
```
# start the flume agent
/opt/flume/flume1.8.0/bin/flume-ng agent -n a1 -c /opt/flume/flume1.8.0/conf/ -f /opt/flume/flume1.8.0/conf/netcat.conf -Dflume.root.logger=DEBUG,console

# append a line to the monitored file
echo 'file word!' >> /words.txt
```
The data is written to /dataoutput on HDFS.
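Because tail -F keeps following the file, every subsequent append becomes a new event; a quick way to generate more test data:
```
# append a few more lines; each becomes a Flume event
for i in 1 2 3; do echo "line $i" >> /words.txt; done
```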
#### Avro Source: listen on a specified Avro port
Files sent by an Avro client to this port are picked up by the source; that is, whenever an application pushes a file over the Avro port, the source reads the file's contents as events.
```
# edit the agent configuration file
gedit /opt/flume/flume1.8.0/conf/netcat.conf
```
```
# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe/configure the source
a1.sources.r1.type = avro
a1.sources.r1.bind = 192.168.37.130
a1.sources.r1.port = 4141

# Describe the sink
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = hdfs://slave2:9000/dataoutput
a1.sinks.k1.hdfs.writeFormat = Text
a1.sinks.k1.hdfs.fileType = DataStream
a1.sinks.k1.hdfs.rollInterval = 10
a1.sinks.k1.hdfs.rollSize = 0
a1.sinks.k1.hdfs.rollCount = 0
a1.sinks.k1.hdfs.filePrefix = %Y-%m-%d-%H-%M-%S
a1.sinks.k1.hdfs.useLocalTimeStamp = true

# Use a channel which buffers events in file
a1.channels.c1.type = file
a1.channels.c1.checkpointDir = /opt/flume/checkpoint
a1.channels.c1.dataDirs = /opt/flume/data

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
```
```
# start the flume agent
/opt/flume/flume1.8.0/bin/flume-ng agent -n a1 -c /opt/flume/flume1.8.0/conf/ -f /opt/flume/flume1.8.0/conf/netcat.conf -Dflume.root.logger=DEBUG,console

# send a file to the Avro port with the bundled avro-client
/opt/flume/flume1.8.0/bin/flume-ng avro-client -c /opt/flume/flume1.8.0/conf/ -H 192.168.37.130 -p 4141 -F /words.txt
```
The data is written to /dataoutput on HDFS.
Reference: https://blog.csdn.net/qq_33366098/article/details/81565618