|NO.Z.00043|——————————|BigDataEnd|——|Hadoop&Flume.V06|——|Flume.v06|Flume.v1.9案例.v04|
1. Monitoring log file content into HDFS
### --- Monitor a log file and ship its content to HDFS
~~~ # Business requirement:
~~~ Monitor a local log file and upload newly written content to HDFS in real time
### --- Requirements analysis:
~~~ The tail -F command surfaces the lines appended to the local log file
~~~ source: exec. The exec source runs a given command and uses the command's output as the data source.
~~~ The source component reads its data from that command's output. If the agent process dies and is restarted, data may be lost;
~~~ channel: memory
~~~ sink: HDFS
tail -f // equivalent to --follow=descriptor: follows the file descriptor, so tracking stops once the file is renamed or deleted
tail -F // equivalent to --follow=name --retry: follows the file name and keeps retrying, so if a file with the same name is recreated after a delete or rename, tracking continues
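The difference matters for log rotation. A minimal sketch (assuming GNU coreutils `tail`) showing `tail -F` re-attaching after the file is renamed away, which is exactly what happens when a logging framework rotates hive.log:

```shell
workdir=$(mktemp -d)
log="$workdir/app.log"
echo "line1" > "$log"
# Follow by name (-F); a plain -f would stop at the rename below
tail -n +1 -F "$log" > "$workdir/captured.txt" 2>/dev/null &
tailpid=$!
sleep 1
mv "$log" "$log.rotated"   # simulate log rotation
echo "line2" > "$log"      # recreate the file under the original name
sleep 2                    # give tail's retry poll (1s default) time to re-attach
kill "$tailpid"
cat "$workdir/captured.txt"
```

With `-F`, `line2` from the recreated file is captured; with `-f` the capture would have ended at `line1`.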
2. Implementation steps:
### --- Environment preparation.
~~~ To write data to HDFS, Flume must have the relevant Hadoop jars on its classpath.
~~~ Copy the following jars into $FLUME_HOME/lib:
~~~ commons-configuration-1.6.jar
    hadoop-auth-2.9.2.jar
    hadoop-common-2.9.2.jar
    hadoop-hdfs-2.9.2.jar
    commons-io-2.4.jar
    htrace-core4-4.1.0-incubating.jar
### --- These files can be found under $HADOOP_HOME/share/hadoop/httpfs/tomcat/webapps/webhdfs/WEB-INF/lib
[root@linux123 ~]# cd $HADOOP_HOME/share/hadoop/httpfs/tomcat/webapps/webhdfs/WEB-INF/lib
[root@linux123 lib]# cp commons-configuration-1.6.jar $FLUME_HOME/lib/
[root@linux123 lib]# cp hadoop-auth-2.9.2.jar $FLUME_HOME/lib/
[root@linux123 lib]# cp hadoop-common-2.9.2.jar $FLUME_HOME/lib/
[root@linux123 lib]# cp hadoop-hdfs-2.9.2.jar $FLUME_HOME/lib/
[root@linux123 lib]# cp commons-io-2.4.jar $FLUME_HOME/lib/
[root@linux123 lib]# cp htrace-core4-4.1.0-incubating.jar $FLUME_HOME/lib/
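The six `cp` commands above can be collapsed into one loop. A hedged sketch, assuming the Hadoop 2.9.2 layout shown above; `copy_flume_jars` is an illustrative helper, not a Flume or Hadoop tool:

```shell
# copy_flume_jars SRC_DIR DEST_DIR -- copy the Hadoop jars Flume's HDFS sink needs
copy_flume_jars() {
  src=$1; dest=$2
  for jar in commons-configuration-1.6.jar \
             hadoop-auth-2.9.2.jar \
             hadoop-common-2.9.2.jar \
             hadoop-hdfs-2.9.2.jar \
             commons-io-2.4.jar \
             htrace-core4-4.1.0-incubating.jar; do
    cp "$src/$jar" "$dest/" || echo "missing: $jar"
  done
}

# On the cluster you would run:
# copy_flume_jars "$HADOOP_HOME/share/hadoop/httpfs/tomcat/webapps/webhdfs/WEB-INF/lib" "$FLUME_HOME/lib"
```

The `|| echo` reports any jar that is absent from the source directory instead of silently skipping it.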
### --- Create the configuration file flume-exec-hdfs.conf:
[root@linux123 ~]# vim $FLUME_HOME/conf/flume-exec-hdfs.conf
# Name the components on this agent
a2.sources = r2
a2.sinks = k2
a2.channels = c2
# Describe/configure the source
a2.sources.r2.type = exec
a2.sources.r2.command = tail -F /tmp/root/hive.log
# Use a channel which buffers events in memory
a2.channels.c2.type = memory
a2.channels.c2.capacity = 10000
a2.channels.c2.transactionCapacity = 500
# Describe the sink
a2.sinks.k2.type = hdfs
a2.sinks.k2.hdfs.path = hdfs://linux121:9000/flume/%Y%m%d/%H%M
# Prefix for uploaded file names
a2.sinks.k2.hdfs.filePrefix = logs-
# Use the local timestamp (needed here to expand the %Y%m%d/%H%M escapes in hdfs.path)
a2.sinks.k2.hdfs.useLocalTimeStamp = true
# Flush to HDFS once 500 events have accumulated
a2.sinks.k2.hdfs.batchSize = 500
# File type; compression is supported, but DataStream writes uncompressed output
a2.sinks.k2.hdfs.fileType = DataStream
# Roll to a new file every 60 seconds
a2.sinks.k2.hdfs.rollInterval = 60
# Roll when the file reaches ~128 MB
a2.sinks.k2.hdfs.rollSize = 134217700
# File rolling is independent of the number of events
a2.sinks.k2.hdfs.rollCount = 0
# Minimum number of HDFS block replicas
a2.sinks.k2.hdfs.minBlockReplicas = 1
# Bind the source and sink to the channel
a2.sources.r2.channels = c2
a2.sinks.k2.channel = c2
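Because useLocalTimeStamp = true, the sink stamps each event with the local clock and expands the %Y%m%d/%H%M escapes in hdfs.path from that timestamp. A rough sketch of how the resulting path and file name are formed (the .tmp suffix marks a file still being written; the numeric part is a creation timestamp in milliseconds):

```shell
# Sketch only: mimics how the HDFS sink expands its escape sequences
ts_s=$(date +%s)            # event timestamp, in seconds
dir=$(date +%Y%m%d/%H%M)    # %Y%m%d/%H%M rendered from the local clock
echo "hdfs://linux121:9000/flume/$dir/logs-.$((ts_s * 1000)).tmp"
```

Note also that rollSize = 134217700 bytes sits just under 128 MiB (134217728), presumably so each rolled file stays within a single 128 MB HDFS block.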
### --- Start the agent
~~~ Launch it with:
[root@linux123 ~]# $FLUME_HOME/bin/flume-ng agent --name a2 \
--conf-file $FLUME_HOME/conf/flume-exec-hdfs.conf \
-Dflume.root.logger=INFO,console
~~~ # Sample output:
INFO instrumentation.MonitoredCounterGroup: Monitored counter group for type: CHANNEL, name: c2: Successfully registered new MBean.
INFO instrumentation.MonitoredCounterGroup: Component type: CHANNEL, name: c2 started
INFO node.Application: Starting Sink k2
INFO node.Application: Starting Source r2
INFO source.ExecSource: Exec source starting with command: tail -F /tmp/root/hive.log
INFO instrumentation.MonitoredCounterGroup: Monitored counter group for type: SOURCE, name: r2: Successfully registered new MBean.
INFO instrumentation.MonitoredCounterGroup: Component type: SOURCE, name: r2 started
INFO instrumentation.MonitoredCounterGroup: Monitored counter group for type: SINK, name: k2: Successfully registered new MBean.
### --- Start Hadoop and Hive; use Hive to generate log entries
[root@linux121 ~]# start-dfs.sh
[root@linux123 ~]# start-yarn.sh
~~~ # Watch the log file as it is written
[root@linux123 ~]# tail -f /tmp/root/hive.log
~~~ # While Hive commands execute, the agent log shows data being written:
21/08/28 12:04:02 INFO hdfs.HDFSDataStream: Serializer = TEXT, UseRawLocalFileSystem = false
21/08/28 12:04:02 INFO hdfs.BucketWriter: Creating hdfs://linux121:9000/flume/20210828/1204/logs-.1630123442882.tmp
~~~ # Run this on the command line several times
[root@linux122 ~]# hive -e "show databases"
### --- View the files on HDFS
[root@linux123 ~]# hdfs dfs -ls /flume
drwxrwxrwx - root supergroup 0 2021-08-28 12:05 /flume/20210828