Flume Hands-On Cases (Operations)
I. Flume Overview
1. What is Flume
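Flume is a distributed, reliable, and available service for efficiently collecting, aggregating, and moving large amounts of log data. It has a simple and flexible architecture based on streaming data flows. It is robust and fault tolerant, with tunable reliability mechanisms and many failover and recovery mechanisms. It uses a simple, extensible data model that allows for online analytic applications. Official site: http://flume.apache.org/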

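2. Flume features
(1) High reliability: Flume provides end-to-end data reliability.
(2) Easy to scale: agents form a distributed architecture and can be scaled out horizontally.
(3) Easy to recover: the channel holds the events received from the source, and these are used for recovery after a failure.
(4) Feature-rich: Flume ships with many built-in components supporting different data sources and storage destinations.
3. Common Flume components
(1) Source: the data source, i.e., the entry point through which an agent receives data.
(2) Channel: the pipe through which data flows and in which it is buffered. Every source must be associated with at least one channel.
(3) Sink: takes events from a channel and delivers them to the configured destination; an event is removed from the channel only after delivery succeeds.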
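The three components are wired together by name in an agent's properties file. As a sketch (the helper name write_fanout_conf, the agent name a1, and port 44444 are illustrative assumptions, not taken from this article), the function below writes a config in which one netcat source feeds two memory channels, each drained by its own logger sink. Note the plural key for a source (channels) versus the singular key for a sink (channel): a source may write to several channels, while a sink reads from exactly one. With Flume's default replicating channel selector, every event is copied to both channels.

```shell
# Hypothetical helper (not part of Flume): write a minimal fan-out agent
# config to the file given as $1, using the agent name given as $2.
write_fanout_conf() {
  local out="$1" agent="${2:-a1}"
  cat > "$out" <<EOF
# Components declared for agent ${agent}
${agent}.sources = r1
${agent}.channels = c1 c2
${agent}.sinks = k1 k2

${agent}.sources.r1.type = netcat
${agent}.sources.r1.bind = localhost
${agent}.sources.r1.port = 44444

${agent}.channels.c1.type = memory
${agent}.channels.c2.type = memory

${agent}.sinks.k1.type = logger
${agent}.sinks.k2.type = logger

# A source lists one or more channels (plural key) ...
${agent}.sources.r1.channels = c1 c2
# ... while each sink drains exactly one channel (singular key).
${agent}.sinks.k1.channel = c1
${agent}.sinks.k2.channel = c2
EOF
}
```

For example, write_fanout_conf /tmp/fanout.conf a1 produces a file that could be started with flume-ng agent --name a1 --conf-file /tmp/fanout.conf.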
4. Flume architecture

II. Deploying the Flume environment
1. Download Flume
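[root@wjf-C11-71 ~]# wget http://mirrors.tuna.tsinghua.edu.cn/apache/flume/1.9.0/apache-flume-1.9.0-bin.tar.gz
If the mirror no longer carries 1.9.0, the same tarball is kept in the Apache archive at https://archive.apache.org/dist/flume/1.9.0/.
2. Extract Flume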
[root@wjf-C11-71 software]# tar zxvf apache-flume-1.9.0-bin.tar.gz -C /export/servers/
[root@wjf-C11-71 software]# ll /export/servers/apache-flume-1.9.0-bin
total 168
drwxr-xr-x  2 1000 1000    62 Aug  2 14:13 bin
-rw-rw-r--  1 1000 1000 85602 Nov 29  2018 CHANGELOG
drwxr-xr-x  2 1000 1000   127 Aug  2 14:13 conf
-rw-r--r--  1 1000 1000  5681 Nov 16  2017 DEVNOTES
-rw-r--r--  1 1000 1000  2873 Nov 16  2017 doap_Flume.rdf
drwxrwxr-x 12 1000 1000  4096 Dec 18  2018 docs
drwxr-xr-x  2 root root  8192 Aug  2 14:13 lib
-rw-rw-r--  1 1000 1000 43405 Dec 10  2018 LICENSE
-rw-r--r--  1 1000 1000   249 Nov 29  2018 NOTICE
-rw-r--r--  1 1000 1000  2483 Nov 16  2017 README.md
-rw-rw-r--  1 1000 1000  1958 Dec 10  2018 RELEASE-NOTES
drwxr-xr-x  2 root root    68 Aug  2 14:13 tools
3. Configure the Flume environment variables
[root@wjf-C11-71 software]# tail -3 /etc/profile
#Flume Add By Wangruopeng
export FLUME_HOME=/export/servers/apache-flume-1.9.0-bin
export PATH=$PATH:$FLUME_HOME/bin
[root@wjf-C11-71 software]# source /etc/profile
[root@wjf-C11-71 software]# flume-ng version
Flume 1.9.0
Source code repository: https://git-wip-us.apache.org/repos/asf/flume.git
Revision: d4fcab4f501d41597bc616921329a4339f73585e
Compiled by fszabo on Mon Dec 17 20:45:25 CET 2018
From source with checksum 35db629a3bda49d23e9b3690c80737f9
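When the same setup is scripted and re-run on several hosts, appending to /etc/profile blindly leaves duplicate export lines behind. A minimal sketch of an idempotent version, assuming a hypothetical helper named add_flume_env that takes the profile path and the Flume install directory as arguments:

```shell
# Hypothetical helper: append the FLUME_HOME/PATH exports to a profile file
# only if a FLUME_HOME export is not already present, so re-running is harmless.
add_flume_env() {
  local profile="$1" flume_home="$2"
  if ! grep -q '^export FLUME_HOME=' "$profile" 2>/dev/null; then
    {
      echo '# Flume'
      echo "export FLUME_HOME=${flume_home}"
      # Single quotes keep $PATH and $FLUME_HOME literal in the profile file.
      echo 'export PATH=$PATH:$FLUME_HOME/bin'
    } >> "$profile"
  fi
}
```

Usage on the host above would be add_flume_env /etc/profile /export/servers/apache-flume-1.9.0-bin, followed by source /etc/profile and flume-ng version as shown.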
4. Create custom directories for Flume files
[root@wjf-C11-71 software]# mkdir /export/data/flume/{log,job,shell} -p
[root@wjf-C11-71 software]# ll /export/data/flume/
total 0
drwxr-xr-x 2 root root 6 Aug  2 14:20 job      # configuration files for the Flume agents we start
drwxr-xr-x 2 root root 6 Aug  2 14:20 log      # log files
drwxr-xr-x 2 root root 6 Aug  2 14:20 shell    # start scripts
III. Flume cases
1. Monitoring port data (netcat source - memory channel - logger sink)
Install telnet, which is used below as the client that sends test data to the netcat source (net-tools is already installed):
[root@Hexindai-C11-71 software]# yum -y install telnet net-tools
Loaded plugins: fastestmirror, langpacks
Loading mirror speeds from cached hostfile
epel/x86_64/metalink                          | 7.9 kB  00:00:00
 * base: mirrors.aliyun.com
 * epel: mirrors.yun-idc.com
 * extras: mirrors.aliyun.com
 * updates: mirrors.aliyun.com
base                                          | 3.6 kB  00:00:00
epel                                          | 5.3 kB  00:00:00
extras                                        | 3.4 kB  00:00:00
updates                                       | 3.4 kB  00:00:00
(1/3): epel/x86_64/updateinfo                 | 993 kB  00:00:05
(2/3): updates/7/x86_64/primary_db            | 7.4 MB  00:00:06
(3/3): epel/x86_64/primary_db                 | 6.8 MB  00:03:28
Package net-tools-2.0-0.24.20131004git.el7.x86_64 already installed and latest version
Resolving Dependencies
--> Running transaction check
---> Package telnet.x86_64 1:0.17-64.el7 will be installed
--> Finished Dependency Resolution

Dependencies Resolved

=====================================================================
 Package      Arch         Version              Repository      Size
=====================================================================
Installing:
 telnet       x86_64       1:0.17-64.el7        base            64 k

Transaction Summary
=====================================================================
Install  1 Package

Total download size: 64 k
Installed size: 113 k
Downloading packages:
telnet-0.17-64.el7.x86_64.rpm                 |  64 kB  00:00:05
Running transaction check
Running transaction test
Transaction test succeeded
Running transaction
  Installing : 1:telnet-0.17-64.el7.x86_64                       1/1
  Verifying  : 1:telnet-0.17-64.el7.x86_64                       1/1

Installed:
  telnet.x86_64 1:0.17-64.el7

Complete!
[root@wjf-C11-71 software]# cat /export/data/flume/job/flume-netcat.conf
# "rabin" is the agent name, which we chose ourselves. The source, sink, and channel of "rabin" are named r1, k1, and c1 respectively.
rabin.sources = r1
rabin.sinks = k1
rabin.channels = c1
# netcat source: listens on wjf-C11-71:8888 and turns each received line of text into an event.
rabin.sources.r1.type = netcat
rabin.sources.r1.bind = wjf-C11-71
rabin.sources.r1.port = 8888
# Sink type logger, i.e., events are written to the log (the console here).
rabin.sinks.k1.type = logger
# Memory channel that holds at most 1000 events, with at most 100 events per transaction.
rabin.channels.c1.type = memory
rabin.channels.c1.capacity = 1000
rabin.channels.c1.transactionCapacity = 100
# Bind the source and the sink to the channel
rabin.sources.r1.channels = c1
rabin.sinks.k1.channel = c1
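The last two lines of this file are where wiring mistakes usually hide, such as a misspelled channel name or a sink that was never bound. A quick pre-flight check before starting the agent can catch these. The sketch below is a hypothetical helper (check_wiring is not a Flume tool); it assumes simple one-line key = value entries and only verifies that every declared source and sink is bound to a channel listed in the agent's channels key.

```shell
# Hypothetical pre-flight check (not part of Flume).
# Usage: check_wiring <conf-file> <agent-name>; returns non-zero on problems.
check_wiring() {
  local conf="$1" agent="$2" rc=0 name ch channels
  # Print the value of an exact key from $conf (last occurrence wins).
  _flume_conf_get() {
    sed -n "s/^[[:space:]]*$1[[:space:]]*=[[:space:]]*//p" "$conf" | tail -n 1
  }
  # Pad with spaces so membership can be tested with a " name " match.
  channels=" $(_flume_conf_get "${agent}.channels") "
  for name in $(_flume_conf_get "${agent}.sources"); do
    [ -n "$(_flume_conf_get "${agent}.sources.${name}.channels")" ] \
      || { echo "source $name has no channels" >&2; rc=1; }
    for ch in $(_flume_conf_get "${agent}.sources.${name}.channels"); do
      [[ "$channels" == *" $ch "* ]] \
        || { echo "source $name -> undeclared channel $ch" >&2; rc=1; }
    done
  done
  for name in $(_flume_conf_get "${agent}.sinks"); do
    ch=$(_flume_conf_get "${agent}.sinks.${name}.channel")
    [[ -n "$ch" && "$channels" == *" $ch "* ]] \
      || { echo "sink $name is not bound to a declared channel" >&2; rc=1; }
  done
  return "$rc"
}
```

Running check_wiring /export/data/flume/job/flume-netcat.conf rabin against the file above should return 0; it is a convenience check only and does not replace Flume's own validation at startup.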
Start the agent in the foreground with flume-ng. Note that the command below, as captured, contains a typo: -Dflume.root.logger==INFO,console has two equals signs and should be -Dflume.root.logger=INFO,console. Because "=INFO" is not a valid log4j level, log4j falls back to DEBUG, which is why the output below contains so many DEBUG lines; the later run through the start script uses the correct form and logs only at INFO level.
[root@Hexindai-C11-71 software]# flume-ng agent --conf /export/servers/apache-flume-1.9.0-bin/conf --name rabin --conf-file /export/data/flume/job/flume-netcat.conf -Dflume.monitoring.type=http -Dflume.monitoring.port=10501 -Dflume.root.logger==INFO,console
Info: Including Hadoop libraries found via (/export/servers/hadoop-2.9.2/bin/hadoop) for HDFS access
Info: Including Hive libraries found via () for Hive access
+ exec /export/servers/jdk1.8.0_211/bin/java -Xmx20m -Dflume.monitoring.type=http -Dflume.monitoring.port=10501 -Dflume.root.logger==INFO,console -cp '/export/servers/apache-flume-1.9.0-bin/conf:/export/servers/apache-flume-1.9.0-bin/lib/*:/export/servers/hadoop-2.9.2/etc/hadoop:/export/servers/hadoop-2.9.2/share/hadoop/common/lib/*:/export/servers/hadoop-2.9.2/share/hadoop/common/*:/export/servers/hadoop-2.9.2/share/hadoop/hdfs:/export/servers/hadoop-2.9.2/share/hadoop/hdfs/lib/*:/export/servers/hadoop-2.9.2/share/hadoop/hdfs/*:/export/servers/hadoop-2.9.2/share/hadoop/yarn:/export/servers/hadoop-2.9.2/share/hadoop/yarn/lib/*:/export/servers/hadoop-2.9.2/share/hadoop/yarn/*:/export/servers/hadoop-2.9.2/share/hadoop/mapreduce/lib/*:/export/servers/hadoop-2.9.2/share/hadoop/mapreduce/*:/export/servers/hadoop-2.9.2/contrib/capacity-scheduler/*.jar:/lib/*' -Djava.library.path=:/export/servers/hadoop-2.9.2/lib/native org.apache.flume.node.Application --name rabin --conf-file /export/data/flume/job/flume-netcat.conf
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/export/servers/apache-flume-1.9.0-bin/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/export/servers/hadoop-2.9.2/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
2019-08-02 14:45:30,915 (main) [DEBUG - org.apache.flume.util.SSLUtil.initSysPropFromEnvVar(SSLUtil.java:95)] No global SSL keystore path specified.
2019-08-02 14:45:30,918 (main) [DEBUG - org.apache.flume.util.SSLUtil.initSysPropFromEnvVar(SSLUtil.java:95)] No global SSL keystore password specified.
2019-08-02 14:45:30,928 (main) [DEBUG - org.apache.flume.util.SSLUtil.initSysPropFromEnvVar(SSLUtil.java:95)] No global SSL keystore type specified.
2019-08-02 14:45:30,928 (main) [DEBUG - org.apache.flume.util.SSLUtil.initSysPropFromEnvVar(SSLUtil.java:95)] No global SSL truststore path specified.
2019-08-02 14:45:30,931 (main) [DEBUG - org.apache.flume.util.SSLUtil.initSysPropFromEnvVar(SSLUtil.java:95)] No global SSL truststore password specified.
2019-08-02 14:45:30,931 (main) [DEBUG - org.apache.flume.util.SSLUtil.initSysPropFromEnvVar(SSLUtil.java:95)] No global SSL truststore type specified.
2019-08-02 14:45:30,931 (main) [DEBUG - org.apache.flume.util.SSLUtil.initSysPropFromEnvVar(SSLUtil.java:95)] No global SSL include protocols specified.
2019-08-02 14:45:30,931 (main) [DEBUG - org.apache.flume.util.SSLUtil.initSysPropFromEnvVar(SSLUtil.java:95)] No global SSL exclude protocols specified.
2019-08-02 14:45:30,931 (main) [DEBUG - org.apache.flume.util.SSLUtil.initSysPropFromEnvVar(SSLUtil.java:95)] No global SSL include cipher suites specified.
2019-08-02 14:45:30,931 (main) [DEBUG - org.apache.flume.util.SSLUtil.initSysPropFromEnvVar(SSLUtil.java:95)] No global SSL exclude cipher suites specified.
2019-08-02 14:45:31,011 (lifecycleSupervisor-1-0) [INFO - org.apache.flume.node.PollingPropertiesFileConfigurationProvider.start(PollingPropertiesFileConfigurationProvider.java:62)] Configuration provider starting
2019-08-02 14:45:31,012 (lifecycleSupervisor-1-0) [DEBUG - org.apache.flume.node.PollingPropertiesFileConfigurationProvider.start(PollingPropertiesFileConfigurationProvider.java:79)] Configuration provider started
2019-08-02 14:45:31,031 (conf-file-poller-0) [DEBUG - org.apache.flume.node.PollingPropertiesFileConfigurationProvider$FileWatcherRunnable.run(PollingPropertiesFileConfigurationProvider.java:131)] Checking file:/export/data/flume/job/flume-netcat.conf for changes
2019-08-02 14:45:31,031 (conf-file-poller-0) [INFO - org.apache.flume.node.PollingPropertiesFileConfigurationProvider$FileWatcherRunnable.run(PollingPropertiesFileConfigurationProvider.java:138)] Reloading configuration file:/export/data/flume/job/flume-netcat.conf
2019-08-02 14:45:31,034 (conf-file-poller-0) [INFO - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addComponentConfig(FlumeConfiguration.java:1203)] Processing:c1
2019-08-02 14:45:31,034 (conf-file-poller-0) [DEBUG - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addComponentConfig(FlumeConfiguration.java:1207)] Created context for c1: type
2019-08-02 14:45:31,037 (conf-file-poller-0) [INFO - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty(FlumeConfiguration.java:1117)] Added sinks: k1 Agent: rabin
2019-08-02 14:45:31,037 (conf-file-poller-0) [INFO - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addComponentConfig(FlumeConfiguration.java:1203)] Processing:r1
2019-08-02 14:45:31,037 (conf-file-poller-0) [DEBUG - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addComponentConfig(FlumeConfiguration.java:1207)] Created context for r1: type
2019-08-02 14:45:31,037 (conf-file-poller-0) [INFO - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addComponentConfig(FlumeConfiguration.java:1203)] Processing:c1
2019-08-02 14:45:31,037 (conf-file-poller-0) [INFO - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addComponentConfig(FlumeConfiguration.java:1203)] Processing:k1
2019-08-02 14:45:31,037 (conf-file-poller-0) [DEBUG - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addComponentConfig(FlumeConfiguration.java:1207)] Created context for k1: type
2019-08-02 14:45:31,037 (conf-file-poller-0) [INFO - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addComponentConfig(FlumeConfiguration.java:1203)] Processing:r1
2019-08-02 14:45:31,037 (conf-file-poller-0) [INFO - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addComponentConfig(FlumeConfiguration.java:1203)] Processing:c1
2019-08-02 14:45:31,038 (conf-file-poller-0) [INFO - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addComponentConfig(FlumeConfiguration.java:1203)] Processing:k1
2019-08-02 14:45:31,038 (conf-file-poller-0) [INFO - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addComponentConfig(FlumeConfiguration.java:1203)] Processing:r1
2019-08-02 14:45:31,038 (conf-file-poller-0) [INFO - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addComponentConfig(FlumeConfiguration.java:1203)] Processing:r1
2019-08-02 14:45:31,038 (conf-file-poller-0) [DEBUG - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.isValid(FlumeConfiguration.java:350)] Starting validation of configuration for agent: rabin
2019-08-02 14:45:31,038 (conf-file-poller-0) [INFO - org.apache.flume.conf.LogPrivacyUtil.<clinit>(LogPrivacyUtil.java:51)] Logging of configuration details is disabled. To see configuration details in the log run the agent with -Dorg.apache.flume.log.printconfig=true JVM argument. Please note that this is not recommended in production systems as it may leak private information to the logfile.
2019-08-02 14:45:31,038 (conf-file-poller-0) [WARN - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.validateConfigFilterSet(FlumeConfiguration.java:623)] Agent configuration for 'rabin' has no configfilters.
2019-08-02 14:45:31,048 (conf-file-poller-0) [DEBUG - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.validateChannels(FlumeConfiguration.java:583)] Created channel c1
2019-08-02 14:45:31,052 (conf-file-poller-0) [DEBUG - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.validateSinks(FlumeConfiguration.java:861)] Creating sink: k1 using LOGGER
2019-08-02 14:45:31,053 (conf-file-poller-0) [DEBUG - org.apache.flume.conf.FlumeConfiguration.validateConfiguration(FlumeConfiguration.java:158)] Channels:c1
2019-08-02 14:45:31,053 (conf-file-poller-0) [DEBUG - org.apache.flume.conf.FlumeConfiguration.validateConfiguration(FlumeConfiguration.java:159)] Sinks k1
2019-08-02 14:45:31,053 (conf-file-poller-0) [DEBUG - org.apache.flume.conf.FlumeConfiguration.validateConfiguration(FlumeConfiguration.java:160)] Sources r1
2019-08-02 14:45:31,053 (conf-file-poller-0) [INFO - org.apache.flume.conf.FlumeConfiguration.validateConfiguration(FlumeConfiguration.java:163)] Post-validation flume configuration contains configuration for agents: [rabin]
2019-08-02 14:45:31,053 (conf-file-poller-0) [INFO - org.apache.flume.node.AbstractConfigurationProvider.loadChannels(AbstractConfigurationProvider.java:151)] Creating channels
2019-08-02 14:45:31,058 (conf-file-poller-0) [INFO - org.apache.flume.channel.DefaultChannelFactory.create(DefaultChannelFactory.java:42)] Creating instance of channel c1 type memory
2019-08-02 14:45:31,061 (conf-file-poller-0) [INFO - org.apache.flume.node.AbstractConfigurationProvider.loadChannels(AbstractConfigurationProvider.java:205)] Created channel c1
2019-08-02 14:45:31,062 (conf-file-poller-0) [INFO - org.apache.flume.source.DefaultSourceFactory.create(DefaultSourceFactory.java:41)] Creating instance of source r1, type netcat
2019-08-02 14:45:31,067 (conf-file-poller-0) [INFO - org.apache.flume.sink.DefaultSinkFactory.create(DefaultSinkFactory.java:42)] Creating instance of sink: k1, type: logger
2019-08-02 14:45:31,069 (conf-file-poller-0) [INFO - org.apache.flume.node.AbstractConfigurationProvider.getConfiguration(AbstractConfigurationProvider.java:120)] Channel c1 connected to [r1, k1]
2019-08-02 14:45:31,078 (conf-file-poller-0) [INFO - org.apache.flume.node.Application.startAllComponents(Application.java:162)] Starting new configuration:{ sourceRunners:{r1=EventDrivenSourceRunner: { source:org.apache.flume.source.NetcatSource{name:r1,state:IDLE} }} sinkRunners:{k1=SinkRunner: { policy:org.apache.flume.sink.DefaultSinkProcessor@75a61c33 counterGroup:{ name:null counters:{} } }} channels:{c1=org.apache.flume.channel.MemoryChannel{name: c1}} }
2019-08-02 14:45:31,081 (conf-file-poller-0) [INFO - org.apache.flume.node.Application.startAllComponents(Application.java:169)] Starting Channel c1
2019-08-02 14:45:31,127 (lifecycleSupervisor-1-0) [INFO - org.apache.flume.instrumentation.MonitoredCounterGroup.register(MonitoredCounterGroup.java:119)] Monitored counter group for type: CHANNEL, name: c1: Successfully registered new MBean.
2019-08-02 14:45:31,127 (lifecycleSupervisor-1-0) [INFO - org.apache.flume.instrumentation.MonitoredCounterGroup.start(MonitoredCounterGroup.java:95)] Component type: CHANNEL, name: c1 started
2019-08-02 14:45:31,127 (conf-file-poller-0) [INFO - org.apache.flume.node.Application.startAllComponents(Application.java:196)] Starting Sink k1
2019-08-02 14:45:31,129 (conf-file-poller-0) [INFO - org.apache.flume.node.Application.startAllComponents(Application.java:207)] Starting Source r1
2019-08-02 14:45:31,130 (lifecycleSupervisor-1-2) [INFO - org.apache.flume.source.NetcatSource.start(NetcatSource.java:155)] Source starting
2019-08-02 14:45:31,137 (SinkRunner-PollingRunner-DefaultSinkProcessor) [DEBUG - org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:141)] Polling sink runner starting
2019-08-02 14:45:31,150 (lifecycleSupervisor-1-2) [INFO - org.apache.flume.source.NetcatSource.start(NetcatSource.java:166)] Created serverSocket:sun.nio.ch.ServerSocketChannelImpl[/172.20.11.71:8888]
2019-08-02 14:45:31,152 (conf-file-poller-0) [DEBUG - org.eclipse.jetty.util.log.Log.initialized(Log.java:180)] Logging to org.slf4j.impl.Log4jLoggerAdapter(org.eclipse.jetty.util.log) via org.eclipse.jetty.util.log.Slf4jLog
2019-08-02 14:45:31,152 (lifecycleSupervisor-1-2) [DEBUG - org.apache.flume.source.NetcatSource.start(NetcatSource.java:191)] Source started
2019-08-02 14:45:31,154 (Thread-2) [DEBUG - org.apache.flume.source.NetcatSource$AcceptHandler.run(NetcatSource.java:271)] Starting accept handler
2019-08-02 14:45:31,155 (conf-file-poller-0) [INFO - org.eclipse.jetty.util.log.Log.initialized(Log.java:192)] Logging initialized @562ms to org.eclipse.jetty.util.log.Slf4jLog
2019-08-02 14:45:31,161 (conf-file-poller-0) [DEBUG - org.eclipse.jetty.util.component.ContainerLifeCycle.addBean(ContainerLifeCycle.java:322)] org.eclipse.jetty.server.Server@22e3de17 added {qtp2051678454{STOPPED,8<=0<=200,i=0,q=0},AUTO}
2019-08-02 14:45:31,177 (conf-file-poller-0) [DEBUG - org.eclipse.jetty.util.component.ContainerLifeCycle.addBean(ContainerLifeCycle.java:322)] HttpConnectionFactory@7ff96c8[HTTP/1.1] added {HttpConfiguration@7db99562{32768/8192,8192/8192,https://:0,[]},POJO}
2019-08-02 14:45:31,180 (conf-file-poller-0) [DEBUG - org.eclipse.jetty.util.component.ContainerLifeCycle.addBean(ContainerLifeCycle.java:322)] ServerConnector@1e8bfcf6{null,[]}{0.0.0.0:0} added {org.eclipse.jetty.server.Server@22e3de17,UNMANAGED}
2019-08-02 14:45:31,180 (conf-file-poller-0) [DEBUG - org.eclipse.jetty.util.component.ContainerLifeCycle.addBean(ContainerLifeCycle.java:322)] ServerConnector@1e8bfcf6{null,[]}{0.0.0.0:0} added {qtp2051678454{STOPPED,8<=0<=200,i=0,q=0},AUTO}
2019-08-02 14:45:31,181 (conf-file-poller-0) [DEBUG - org.eclipse.jetty.util.component.ContainerLifeCycle.addBean(ContainerLifeCycle.java:322)] ServerConnector@1e8bfcf6{null,[]}{0.0.0.0:0} added {org.eclipse.jetty.util.thread.ScheduledExecutorScheduler@44a892a7,AUTO}
2019-08-02 14:45:31,181 (conf-file-poller-0) [DEBUG - org.eclipse.jetty.util.component.ContainerLifeCycle.addBean(ContainerLifeCycle.java:322)] ServerConnector@1e8bfcf6{null,[]}{0.0.0.0:0} added {org.eclipse.jetty.io.ArrayByteBufferPool@70827716,POJO}
2019-08-02 14:45:31,181 (conf-file-poller-0) [DEBUG - org.eclipse.jetty.util.component.ContainerLifeCycle.addBean(ContainerLifeCycle.java:322)] ServerConnector@1e8bfcf6{null,[http/1.1]}{0.0.0.0:0} added {HttpConnectionFactory@7ff96c8[HTTP/1.1],AUTO}
2019-08-02 14:45:31,182 (conf-file-poller-0) [DEBUG - org.eclipse.jetty.server.AbstractConnector.addConnectionFactory(AbstractConnector.java:406)] ServerConnector@1e8bfcf6{HTTP/1.1,[http/1.1]}{0.0.0.0:0} added HttpConnectionFactory@7ff96c8[HTTP/1.1]
2019-08-02 14:45:31,183 (conf-file-poller-0) [DEBUG - org.eclipse.jetty.util.component.ContainerLifeCycle.addBean(ContainerLifeCycle.java:322)] ServerConnector@1e8bfcf6{HTTP/1.1,[http/1.1]}{0.0.0.0:0} added {org.eclipse.jetty.server.ServerConnector$ServerConnectorManager@21aec7c0,MANAGED}
2019-08-02 14:45:31,184 (conf-file-poller-0) [DEBUG - org.eclipse.jetty.util.component.ContainerLifeCycle.addBean(ContainerLifeCycle.java:322)] org.eclipse.jetty.server.Server@22e3de17 added {ServerConnector@1e8bfcf6{HTTP/1.1,[http/1.1]}{0.0.0.0:10501},AUTO}
2019-08-02 14:45:31,220 (conf-file-poller-0) [DEBUG - org.eclipse.jetty.util.component.ContainerLifeCycle.addBean(ContainerLifeCycle.java:322)] org.eclipse.jetty.server.Server@22e3de17 added {org.apache.flume.instrumentation.http.HTTPMetricsServer$HTTPMetricsHandler@4b8b0241,MANAGED}
2019-08-02 14:45:31,220 (conf-file-poller-0) [DEBUG - org.eclipse.jetty.util.component.AbstractLifeCycle.setStarting(AbstractLifeCycle.java:185)] starting org.eclipse.jetty.server.Server@22e3de17
2019-08-02 14:45:31,223 (conf-file-poller-0) [DEBUG - org.eclipse.jetty.util.component.ContainerLifeCycle.addBean(ContainerLifeCycle.java:322)] org.eclipse.jetty.server.Server@22e3de17 added {org.eclipse.jetty.server.handler.ErrorHandler@757e6e72,AUTO}
2019-08-02 14:45:31,224 (conf-file-poller-0) [INFO - org.eclipse.jetty.server.Server.doStart(Server.java:372)] jetty-9.4.6.v20170531
2019-08-02 14:45:31,245 (conf-file-poller-0) [DEBUG - org.eclipse.jetty.server.handler.AbstractHandler.doStart(AbstractHandler.java:110)] starting org.eclipse.jetty.server.Server@22e3de17
2019-08-02 14:45:31,246 (conf-file-poller-0) [DEBUG - org.eclipse.jetty.util.component.AbstractLifeCycle.setStarting(AbstractLifeCycle.java:185)] starting qtp2051678454{STOPPED,8<=0<=200,i=0,q=0}
2019-08-02 14:45:31,248 (conf-file-poller-0) [DEBUG - org.eclipse.jetty.util.component.AbstractLifeCycle.setStarted(AbstractLifeCycle.java:177)] STARTED @656ms qtp2051678454{STARTED,8<=8<=200,i=6,q=0}
2019-08-02 14:45:31,249 (conf-file-poller-0) [DEBUG - org.eclipse.jetty.util.component.AbstractLifeCycle.setStarting(AbstractLifeCycle.java:185)] starting org.apache.flume.instrumentation.http.HTTPMetricsServer$HTTPMetricsHandler@4b8b0241
2019-08-02 14:45:31,249 (conf-file-poller-0) [DEBUG - org.eclipse.jetty.server.handler.AbstractHandler.doStart(AbstractHandler.java:110)] starting org.apache.flume.instrumentation.http.HTTPMetricsServer$HTTPMetricsHandler@4b8b0241
2019-08-02 14:45:31,249 (conf-file-poller-0) [DEBUG - org.eclipse.jetty.util.component.AbstractLifeCycle.setStarted(AbstractLifeCycle.java:177)] STARTED @656ms org.apache.flume.instrumentation.http.HTTPMetricsServer$HTTPMetricsHandler@4b8b0241
2019-08-02 14:45:31,249 (conf-file-poller-0) [DEBUG - org.eclipse.jetty.util.component.AbstractLifeCycle.setStarting(AbstractLifeCycle.java:185)] starting org.eclipse.jetty.server.handler.ErrorHandler@757e6e72
2019-08-02 14:45:31,249 (conf-file-poller-0) [DEBUG - org.eclipse.jetty.server.handler.AbstractHandler.doStart(AbstractHandler.java:110)] starting org.eclipse.jetty.server.handler.ErrorHandler@757e6e72
2019-08-02 14:45:31,249 (conf-file-poller-0) [DEBUG - org.eclipse.jetty.util.component.AbstractLifeCycle.setStarted(AbstractLifeCycle.java:177)] STARTED @657ms org.eclipse.jetty.server.handler.ErrorHandler@757e6e72
2019-08-02 14:45:31,250 (conf-file-poller-0) [DEBUG - org.eclipse.jetty.util.component.AbstractLifeCycle.setStarting(AbstractLifeCycle.java:185)] starting ServerConnector@1e8bfcf6{HTTP/1.1,[http/1.1]}{0.0.0.0:10501}
2019-08-02 14:45:31,250 (conf-file-poller-0) [DEBUG - org.eclipse.jetty.util.component.ContainerLifeCycle.addBean(ContainerLifeCycle.java:322)] ServerConnector@1e8bfcf6{HTTP/1.1,[http/1.1]}{0.0.0.0:10501} added {sun.nio.ch.ServerSocketChannelImpl[/0.0.0.0:10501],POJO}
2019-08-02 14:45:31,251 (conf-file-poller-0) [DEBUG - org.eclipse.jetty.util.component.AbstractLifeCycle.setStarting(AbstractLifeCycle.java:185)] starting org.eclipse.jetty.util.thread.ScheduledExecutorScheduler@44a892a7
2019-08-02 14:45:31,251 (conf-file-poller-0) [DEBUG - org.eclipse.jetty.util.component.AbstractLifeCycle.setStarted(AbstractLifeCycle.java:177)] STARTED @658ms org.eclipse.jetty.util.thread.ScheduledExecutorScheduler@44a892a7
2019-08-02 14:45:31,251 (conf-file-poller-0) [DEBUG - org.eclipse.jetty.util.component.AbstractLifeCycle.setStarting(AbstractLifeCycle.java:185)] starting HttpConnectionFactory@7ff96c8[HTTP/1.1]
2019-08-02 14:45:31,251 (conf-file-poller-0) [DEBUG - org.eclipse.jetty.util.component.AbstractLifeCycle.setStarted(AbstractLifeCycle.java:177)] STARTED @659ms HttpConnectionFactory@7ff96c8[HTTP/1.1]
2019-08-02 14:45:31,251 (conf-file-poller-0) [DEBUG - org.eclipse.jetty.util.component.AbstractLifeCycle.setStarting(AbstractLifeCycle.java:185)] starting org.eclipse.jetty.server.ServerConnector$ServerConnectorManager@21aec7c0
2019-08-02 14:45:31,255 (conf-file-poller-0) [DEBUG - org.eclipse.jetty.util.component.ContainerLifeCycle.addBean(ContainerLifeCycle.java:322)] org.eclipse.jetty.io.ManagedSelector@57fdf0b1 id=0 keys=-1 selected=-1 added {EatWhatYouKill@496a0561/org.eclipse.jetty.io.ManagedSelector$SelectorProducer@bca4895/IDLE/0/1,AUTO}
2019-08-02 14:45:31,255 (conf-file-poller-0) [DEBUG - org.eclipse.jetty.util.component.ContainerLifeCycle.addBean(ContainerLifeCycle.java:322)] org.eclipse.jetty.server.ServerConnector$ServerConnectorManager@21aec7c0 added {org.eclipse.jetty.io.ManagedSelector@57fdf0b1 id=0 keys=-1 selected=-1,AUTO}
2019-08-02 14:45:31,255 (conf-file-poller-0) [DEBUG - org.eclipse.jetty.util.component.ContainerLifeCycle.addBean(ContainerLifeCycle.java:322)] org.eclipse.jetty.io.ManagedSelector@109a50d3 id=1 keys=-1 selected=-1 added {EatWhatYouKill@285a5209/org.eclipse.jetty.io.ManagedSelector$SelectorProducer@31820b6e/IDLE/0/1,AUTO}
2019-08-02 14:45:31,256 (conf-file-poller-0) [DEBUG - org.eclipse.jetty.util.component.ContainerLifeCycle.addBean(ContainerLifeCycle.java:322)] org.eclipse.jetty.server.ServerConnector$ServerConnectorManager@21aec7c0 added {org.eclipse.jetty.io.ManagedSelector@109a50d3 id=1 keys=-1 selected=-1,AUTO}
2019-08-02 14:45:31,256 (conf-file-poller-0) [DEBUG - org.eclipse.jetty.util.component.AbstractLifeCycle.setStarting(AbstractLifeCycle.java:185)] starting org.eclipse.jetty.io.ManagedSelector@57fdf0b1 id=0 keys=-1 selected=-1
2019-08-02 14:45:31,256 (conf-file-poller-0) [DEBUG - org.eclipse.jetty.util.component.AbstractLifeCycle.setStarting(AbstractLifeCycle.java:185)] starting EatWhatYouKill@496a0561/org.eclipse.jetty.io.ManagedSelector$SelectorProducer@bca4895/IDLE/0/1
2019-08-02 14:45:31,256 (conf-file-poller-0) [DEBUG - org.eclipse.jetty.util.component.AbstractLifeCycle.setStarted(AbstractLifeCycle.java:177)] STARTED @663ms EatWhatYouKill@496a0561/org.eclipse.jetty.io.ManagedSelector$SelectorProducer@bca4895/IDLE/0/1
2019-08-02 14:45:31,304 (conf-file-poller-0) [DEBUG - org.eclipse.jetty.util.thread.QueuedThreadPool.execute(QueuedThreadPool.java:381)] queue org.eclipse.jetty.io.ManagedSelector$$Lambda$1/347685846@7134b136
2019-08-02 14:45:31,304 (qtp2051678454-23) [DEBUG - org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:590)] run org.eclipse.jetty.io.ManagedSelector$$Lambda$1/347685846@7134b136
2019-08-02 14:45:31,304 (conf-file-poller-0) [DEBUG - org.eclipse.jetty.util.component.AbstractLifeCycle.setStarted(AbstractLifeCycle.java:177)] STARTED @712ms org.eclipse.jetty.io.ManagedSelector@57fdf0b1 id=0 keys=0 selected=0
2019-08-02 14:45:31,305 (conf-file-poller-0) [DEBUG - org.eclipse.jetty.util.component.AbstractLifeCycle.setStarting(AbstractLifeCycle.java:185)] starting org.eclipse.jetty.io.ManagedSelector@109a50d3 id=1 keys=-1 selected=-1
2019-08-02 14:45:31,305 (qtp2051678454-23) [DEBUG - org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.produce(EatWhatYouKill.java:123)] EatWhatYouKill@496a0561/org.eclipse.jetty.io.ManagedSelector$SelectorProducer@bca4895/PRODUCING/0/1 execute true
2019-08-02 14:45:31,305 (conf-file-poller-0) [DEBUG - org.eclipse.jetty.util.component.AbstractLifeCycle.setStarting(AbstractLifeCycle.java:185)] starting EatWhatYouKill@285a5209/org.eclipse.jetty.io.ManagedSelector$SelectorProducer@31820b6e/IDLE/0/1
2019-08-02 14:45:31,305 (conf-file-poller-0) [DEBUG - org.eclipse.jetty.util.component.AbstractLifeCycle.setStarted(AbstractLifeCycle.java:177)] STARTED @713ms EatWhatYouKill@285a5209/org.eclipse.jetty.io.ManagedSelector$SelectorProducer@31820b6e/IDLE/0/1
2019-08-02 14:45:31,305 (conf-file-poller-0) [DEBUG - org.eclipse.jetty.util.thread.QueuedThreadPool.execute(QueuedThreadPool.java:381)] queue org.eclipse.jetty.io.ManagedSelector$$Lambda$1/347685846@b8ed5ad
2019-08-02 14:45:31,306 (qtp2051678454-23) [DEBUG - org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:206)] EatWhatYouKill@496a0561/org.eclipse.jetty.io.ManagedSelector$SelectorProducer@bca4895/PRODUCING/0/1 produce non-blocking
2019-08-02 14:45:31,306 (qtp2051678454-24) [DEBUG - org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:590)] run org.eclipse.jetty.io.ManagedSelector$$Lambda$1/347685846@b8ed5ad
2019-08-02 14:45:31,306 (conf-file-poller-0) [DEBUG - org.eclipse.jetty.util.component.AbstractLifeCycle.setStarted(AbstractLifeCycle.java:177)] STARTED @713ms org.eclipse.jetty.io.ManagedSelector@109a50d3 id=1 keys=0 selected=0
2019-08-02 14:45:31,306 (qtp2051678454-24) [DEBUG - org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.produce(EatWhatYouKill.java:123)] EatWhatYouKill@285a5209/org.eclipse.jetty.io.ManagedSelector$SelectorProducer@31820b6e/PRODUCING/0/1 execute true
2019-08-02 14:45:31,306 (qtp2051678454-24) [DEBUG - org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:206)] EatWhatYouKill@285a5209/org.eclipse.jetty.io.ManagedSelector$SelectorProducer@31820b6e/PRODUCING/0/1 produce non-blocking
2019-08-02 14:45:31,306 (qtp2051678454-24) [DEBUG - org.eclipse.jetty.io.ManagedSelector$SelectorProducer.select(ManagedSelector.java:233)] Selector loop waiting on select
2019-08-02 14:45:31,306 (qtp2051678454-23) [DEBUG - org.eclipse.jetty.io.ManagedSelector$SelectorProducer.select(ManagedSelector.java:233)] Selector loop waiting on select
2019-08-02 14:45:31,306 (conf-file-poller-0) [DEBUG - org.eclipse.jetty.util.component.AbstractLifeCycle.setStarted(AbstractLifeCycle.java:177)] STARTED @713ms org.eclipse.jetty.server.ServerConnector$ServerConnectorManager@21aec7c0
2019-08-02 14:45:31,308 (conf-file-poller-0) [DEBUG - org.eclipse.jetty.util.component.ContainerLifeCycle.addBean(ContainerLifeCycle.java:322)] ServerConnector@1e8bfcf6{HTTP/1.1,[http/1.1]}{0.0.0.0:10501} added {acceptor-0@7e2852e1,POJO}
2019-08-02 14:45:31,308 (conf-file-poller-0) [DEBUG - org.eclipse.jetty.util.thread.QueuedThreadPool.execute(QueuedThreadPool.java:381)] queue acceptor-0@7e2852e1
2019-08-02 14:45:31,308 (conf-file-poller-0) [INFO - org.eclipse.jetty.server.AbstractConnector.doStart(AbstractConnector.java:280)] Started ServerConnector@1e8bfcf6{HTTP/1.1,[http/1.1]}{0.0.0.0:10501}
2019-08-02 14:45:31,308 (conf-file-poller-0) [DEBUG - org.eclipse.jetty.util.component.AbstractLifeCycle.setStarted(AbstractLifeCycle.java:177)] STARTED @716ms ServerConnector@1e8bfcf6{HTTP/1.1,[http/1.1]}{0.0.0.0:10501}
2019-08-02 14:45:31,309 (conf-file-poller-0) [INFO - org.eclipse.jetty.server.Server.doStart(Server.java:444)] Started @716ms
2019-08-02 14:45:31,309 (qtp2051678454-25) [DEBUG - org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:590)] run acceptor-0@7e2852e1
2019-08-02 14:45:31,309 (conf-file-poller-0) [DEBUG - org.eclipse.jetty.util.component.AbstractLifeCycle.setStarted(AbstractLifeCycle.java:177)] STARTED @716ms org.eclipse.jetty.server.Server@22e3de17
2019-08-02 14:46:01,310 (conf-file-poller-0) [DEBUG - org.apache.flume.node.PollingPropertiesFileConfigurationProvider$FileWatcherRunnable.run(PollingPropertiesFileConfigurationProvider.java:131)] Checking file:/export/data/flume/job/flume-netcat.conf for changes
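Because the agent was started with -Dflume.monitoring.type=http -Dflume.monitoring.port=10501, Flume's built-in HTTP reporting serves its counters as JSON at /metrics on port 10501 (the Jetty server started on 0.0.0.0:10501 in the log above). A minimal sketch for reading a single counter from a shell script, assuming the flat layout {"TYPE.name":{"Counter":"value",...},...} that Flume's JSON reporting uses; flume_metric is a hypothetical helper, and where jq is available it would be the more robust choice:

```shell
# Hypothetical helper: extract one counter from Flume's /metrics JSON.
# $1 = JSON text, $2 = component key (e.g. CHANNEL.c1), $3 = counter name.
# Relies on each component object being flat (no nested braces).
flume_metric() {
  local json="$1" component="$2" counter="$3"
  printf '%s' "$json" | tr -d '\n' \
    | grep -o "\"${component}\":{[^}]*}" \
    | grep -o "\"${counter}\":\"[^\"]*\"" \
    | sed 's/.*:"\(.*\)"/\1/'
}
```

For example, flume_metric "$(curl -s http://wjf-C11-71:10501/metrics)" CHANNEL.c1 ChannelSize prints how many events are currently buffered in c1; a value that keeps growing suggests the sink is not keeping up with the source.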
In another terminal, check that the agent (pid 17764) is listening on 8888 (the netcat source) and 10501 (HTTP monitoring), then connect with telnet and type a few lines. The netcat source acknowledges each line with OK, and the corresponding events should appear in the agent's console through the logger sink.
[root@Hexindai-C11-71 software]# ss -antlp
State    Recv-Q Send-Q Local Address:Port    Peer Address:Port
LISTEN   0      50     *:10501               *:*      users:(("java",pid=17764,fd=361))
LISTEN   0      128    *:8040                *:*      users:(("java",pid=6836,fd=303))
LISTEN   0      128    *:8042                *:*      users:(("java",pid=6836,fd=314))
LISTEN   0      128    172.20.11.71:8020     *:*      users:(("java",pid=7647,fd=260))
LISTEN   0      128    172.20.11.71:50070    *:*      users:(("java",pid=7647,fd=249))
LISTEN   0      128    *:22                  *:*      users:(("sshd",pid=1076,fd=3))
LISTEN   0      128    *:36151               *:*      users:(("java",pid=6836,fd=292))
LISTEN   0      50     172.20.11.71:8888     *:*      users:(("java",pid=17764,fd=358))
LISTEN   0      100    127.0.0.1:25          *:*      users:(("master",pid=1341,fd=13))
LISTEN   0      128    *:50010               *:*      users:(("java",pid=7796,fd=242))
LISTEN   0      128    *:13562               *:*      users:(("java",pid=6836,fd=313))
LISTEN   0      128    *:50075               *:*      users:(("java",pid=7796,fd=305))
LISTEN   0      128    127.0.0.1:43937       *:*      users:(("java",pid=7796,fd=253))
LISTEN   0      128    *:50020               *:*      users:(("java",pid=7796,fd=309))
LISTEN   0      128    :::22                 :::*     users:(("sshd",pid=1076,fd=4))
[root@Hexindai-C11-71 software]# telnet Hexindai-C11-71 8888
Trying 172.20.11.71...
Connected to Hexindai-C11-71.
Escape character is '^]'.
this is a flume test!
OK
golang
OK
python
OK
C++
OK
java
OK
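The next step runs /export/data/flume/shell/start-netcat.sh, whose contents the article does not show. Judging from the log it produces (the same flume-ng invocation, now with -Dflume.root.logger=INFO,console, and output landing in /export/data/flume/log/flume-netcat.log), a plausible version might look like the hypothetical sketch below. The function name start_agent and the FLUME_NG, FLUME_CONF_DIR, JOB_FILE, and LOG_FILE override variables are assumptions added for illustration; the defaults are the paths used in this article.

```shell
# Hypothetical body of start-netcat.sh: launch the "rabin" agent in the
# background with the corrected logger flag and append its console output
# to the log file. Defaults can be overridden through environment variables.
start_agent() {
  local flume_ng="${FLUME_NG:-flume-ng}"
  local conf_dir="${FLUME_CONF_DIR:-/export/servers/apache-flume-1.9.0-bin/conf}"
  local job="${JOB_FILE:-/export/data/flume/job/flume-netcat.conf}"
  local log="${LOG_FILE:-/export/data/flume/log/flume-netcat.log}"
  # nohup + & keeps the agent running after the shell exits; stdout and
  # stderr (where the console logger writes) go to the log file.
  nohup "$flume_ng" agent --conf "$conf_dir" --name rabin --conf-file "$job" \
    -Dflume.monitoring.type=http -Dflume.monitoring.port=10501 \
    -Dflume.root.logger=INFO,console >>"$log" 2>&1 &
  echo "$!"   # PID of the background agent, e.g. for a matching stop script
}
```

The real script would end with a call to start_agent; saving the printed PID to a file makes it easy to write a matching stop script.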
[root@Hexindai-C11-71 software]#
[root@Hexindai-C11-71 software]# ss -ntl
State Recv-Q Send-Q Local Address:Port Peer Address:Port
LISTEN 0 128 *:8040 *:*
LISTEN 0 128 *:8042 *:*
LISTEN 0 128 172.20.11.71:8020 *:*
LISTEN 0 128 172.20.11.71:50070 *:*
LISTEN 0 128 *:22 *:*
LISTEN 0 128 *:36151 *:*
LISTEN 0 100 127.0.0.1:25 *:*
LISTEN 0 128 *:50010 *:*
LISTEN 0 128 *:13562 *:*
LISTEN 0 128 *:50075 *:*
LISTEN 0 128 127.0.0.1:43937 *:*
LISTEN 0 128 *:50020 *:*
LISTEN 0 128 :::22 :::*
[root@Hexindai-C11-71 software]#
[root@Hexindai-C11-71 software]#
[root@Hexindai-C11-71 software]# /export/data/flume/shell/start-netcat.sh
[root@Hexindai-C11-71 software]# tail -50f /export/data/flume/log/flume-netcat.log
Info: Including Hadoop libraries found via (/export/servers/hadoop-2.9.2/bin/hadoop) for HDFS access
Info: Including Hive libraries found via () for Hive access
+ exec /export/servers/jdk1.8.0_211/bin/java -Xmx20m -Dflume.monitoring.type=http -Dflume.monitoring.port=10501 -Dflume.root.logger=INFO,console -cp '/export/servers/apache-flume-1.9.0-bin/conf:/export/servers/apache-flume-1.9.0-bin/lib/*:/export/servers/hadoop-2.9.2/etc/hadoop:/export/servers/hadoop-2.9.2/share/hadoop/common/lib/*:/export/servers/hadoop-2.9.2/share/hadoop/common/*:/export/servers/hadoop-2.9.2/share/hadoop/hdfs:/export/servers/hadoop-2.9.2/share/hadoop/hdfs/lib/*:/export/servers/hadoop-2.9.2/share/hadoop/hdfs/*:/export/servers/hadoop-2.9.2/share/hadoop/yarn:/export/servers/hadoop-2.9.2/share/hadoop/yarn/lib/*:/export/servers/hadoop-2.9.2/share/hadoop/yarn/*:/export/servers/hadoop-2.9.2/share/hadoop/mapreduce/lib/*:/export/servers/hadoop-2.9.2/share/hadoop/mapreduce/*:/export/servers/hadoop-2.9.2/contrib/capacity-scheduler/*.jar:/lib/*' -Djava.library.path=:/export/servers/hadoop-2.9.2/lib/native org.apache.flume.node.Application --name rabin --conf-file /export/data/flume/job/flume-netcat.conf
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/export/servers/apache-flume-1.9.0-bin/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class] SLF4J: Found binding in [jar:file:/export/servers/hadoop-2.9.2/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class] SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation. SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory] 2019-08-02 15:09:54,853 (lifecycleSupervisor-1-0) [INFO - org.apache.flume.node.PollingPropertiesFileConfigurationProvider.start(PollingPropertiesFileConfigurationProvider.java:62)] Configuration provider starting 2019-08-02 15:09:54,862 (conf-file-poller-0) [INFO - org.apache.flume.node.PollingPropertiesFileConfigurationProvider$FileWatcherRunnable.run(PollingPropertiesFileConfigurationProvider.java:138)] Reloading configuration file:/export/data/flume/job/flume-netcat.conf 2019-08-02 15:09:54,868 (conf-file-poller-0) [INFO - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addComponentConfig(FlumeConfiguration.java:1203)] Processing:c1 2019-08-02 15:09:54,868 (conf-file-poller-0) [INFO - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty(FlumeConfiguration.java:1117)] Added sinks: k1 Agent: rabin 2019-08-02 15:09:54,868 (conf-file-poller-0) [INFO - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addComponentConfig(FlumeConfiguration.java:1203)] Processing:r1 2019-08-02 15:09:54,869 (conf-file-poller-0) [INFO - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addComponentConfig(FlumeConfiguration.java:1203)] Processing:c1 2019-08-02 15:09:54,869 (conf-file-poller-0) [INFO - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addComponentConfig(FlumeConfiguration.java:1203)] Processing:k1 2019-08-02 15:09:54,869 (conf-file-poller-0) [INFO - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addComponentConfig(FlumeConfiguration.java:1203)] Processing:r1 
2019-08-02 15:09:54,869 (conf-file-poller-0) [INFO - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addComponentConfig(FlumeConfiguration.java:1203)] Processing:c1 2019-08-02 15:09:54,869 (conf-file-poller-0) [INFO - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addComponentConfig(FlumeConfiguration.java:1203)] Processing:k1 2019-08-02 15:09:54,869 (conf-file-poller-0) [INFO - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addComponentConfig(FlumeConfiguration.java:1203)] Processing:r1 2019-08-02 15:09:54,869 (conf-file-poller-0) [INFO - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addComponentConfig(FlumeConfiguration.java:1203)] Processing:r1 2019-08-02 15:09:54,870 (conf-file-poller-0) [WARN - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.validateConfigFilterSet(FlumeConfiguration.java:623)] Agent configuration for 'rabin' has no configfilters. 2019-08-02 15:09:54,885 (conf-file-poller-0) [INFO - org.apache.flume.conf.FlumeConfiguration.validateConfiguration(FlumeConfiguration.java:163)] Post-validation flume configuration contains configuration for agents: [rabin] 2019-08-02 15:09:54,886 (conf-file-poller-0) [INFO - org.apache.flume.node.AbstractConfigurationProvider.loadChannels(AbstractConfigurationProvider.java:151)] Creating channels 2019-08-02 15:09:54,891 (conf-file-poller-0) [INFO - org.apache.flume.channel.DefaultChannelFactory.create(DefaultChannelFactory.java:42)] Creating instance of channel c1 type memory 2019-08-02 15:09:54,896 (conf-file-poller-0) [INFO - org.apache.flume.node.AbstractConfigurationProvider.loadChannels(AbstractConfigurationProvider.java:205)] Created channel c1 2019-08-02 15:09:54,897 (conf-file-poller-0) [INFO - org.apache.flume.source.DefaultSourceFactory.create(DefaultSourceFactory.java:41)] Creating instance of source r1, type netcat 2019-08-02 15:09:54,902 (conf-file-poller-0) [INFO - 
org.apache.flume.sink.DefaultSinkFactory.create(DefaultSinkFactory.java:42)] Creating instance of sink: k1, type: logger 2019-08-02 15:09:54,904 (conf-file-poller-0) [INFO - org.apache.flume.node.AbstractConfigurationProvider.getConfiguration(AbstractConfigurationProvider.java:120)] Channel c1 connected to [r1, k1] 2019-08-02 15:09:54,910 (conf-file-poller-0) [INFO - org.apache.flume.node.Application.startAllComponents(Application.java:162)] Starting new configuration:{ sourceRunners:{r1=EventDrivenSourceRunner: { source:org.apache.flume.source.NetcatSource{name:r1,state:IDLE} }} sinkRunners:{k1=SinkRunner: { policy:org.apache.flume.sink.DefaultSinkProcessor@1c46152f counterGroup:{ name:null counters:{} } }} channels:{c1=org.apache.flume.channel.MemoryChannel{name: c1}} } 2019-08-02 15:09:54,914 (conf-file-poller-0) [INFO - org.apache.flume.node.Application.startAllComponents(Application.java:169)] Starting Channel c1 2019-08-02 15:09:54,915 (conf-file-poller-0) [INFO - org.apache.flume.node.Application.startAllComponents(Application.java:184)] Waiting for channel: c1 to start. Sleeping for 500 ms 2019-08-02 15:09:54,962 (lifecycleSupervisor-1-0) [INFO - org.apache.flume.instrumentation.MonitoredCounterGroup.register(MonitoredCounterGroup.java:119)] Monitored counter group for type: CHANNEL, name: c1: Successfully registered new MBean. 
2019-08-02 15:09:54,962 (lifecycleSupervisor-1-0) [INFO - org.apache.flume.instrumentation.MonitoredCounterGroup.start(MonitoredCounterGroup.java:95)] Component type: CHANNEL, name: c1 started 2019-08-02 15:09:55,415 (conf-file-poller-0) [INFO - org.apache.flume.node.Application.startAllComponents(Application.java:196)] Starting Sink k1 2019-08-02 15:09:55,416 (conf-file-poller-0) [INFO - org.apache.flume.node.Application.startAllComponents(Application.java:207)] Starting Source r1 2019-08-02 15:09:55,417 (lifecycleSupervisor-1-0) [INFO - org.apache.flume.source.NetcatSource.start(NetcatSource.java:155)] Source starting 2019-08-02 15:09:55,443 (lifecycleSupervisor-1-0) [INFO - org.apache.flume.source.NetcatSource.start(NetcatSource.java:166)] Created serverSocket:sun.nio.ch.ServerSocketChannelImpl[/172.20.11.71:8888] 2019-08-02 15:09:55,450 (conf-file-poller-0) [INFO - org.eclipse.jetty.util.log.Log.initialized(Log.java:192)] Logging initialized @1007ms to org.eclipse.jetty.util.log.Slf4jLog 2019-08-02 15:09:55,533 (conf-file-poller-0) [INFO - org.eclipse.jetty.server.Server.doStart(Server.java:372)] jetty-9.4.6.v20170531 2019-08-02 15:09:55,611 (conf-file-poller-0) [INFO - org.eclipse.jetty.server.AbstractConnector.doStart(AbstractConnector.java:280)] Started ServerConnector@10792e7b{HTTP/1.1,[http/1.1]}{0.0.0.0:10501} 2019-08-02 15:09:55,611 (conf-file-poller-0) [INFO - org.eclipse.jetty.server.Server.doStart(Server.java:444)] Started @1168ms
[root@Hexindai-C11-71 software]# cat /export/data/flume/shell/start-netcat.sh
#!/bin/bash
#@author :yinzhengjie
#blog:http://www.cnblogs.com/yinzhengjie
#EMAIL:y1053419035@qq.com
#Date:Thu Oct 18 11:26:06 CST 2018

#Send the monitoring data to Ganglia. This requires the Ganglia server address; make sure the Ganglia service is deployed before using it!
#nohup flume-ng agent -c /export/data/flume/job --name rabin --conf-file /export/data/flume/job/flume-netcat.conf -Dflume.monitoring.type=ganglia -Dflume.monitoring.hosts=Hexindai-C11-71:8649 -Dflume.root.logger=INFO,console >> /export/data/flume/log/flume-ganglia-flume-netcat.log 2>&1 &

#Enable Flume's built-in monitoring; this is the command the script runs by default
nohup flume-ng agent -c /export/servers/apache-flume-1.9.0-bin/conf --name rabin --conf-file /export/data/flume/job/flume-netcat.conf -Dflume.monitoring.type=http -Dflume.monitoring.port=10501 -Dflume.root.logger=INFO,console >> /export/data/flume/log/flume-netcat.log 2>&1 &
[root@Hexindai-C11-71 software]#
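The script above hard-codes one agent. Every agent in this article is started the same way, though: an agent name, a job file, an HTTP monitoring port and a log file. One function can therefore build the command line for all of them. The bash sketch below does that. flume_args, flume_start, FLUME_CONF_DIR and FLUME_LOG_DIR are hypothetical names, and the only flume-ng options it uses are the ones that appear in the scripts in this article.

```shell
#!/usr/bin/env bash
# Hypothetical launcher shared by all agents; paths default to the layout used in this article.
FLUME_CONF_DIR=${FLUME_CONF_DIR:-/export/servers/apache-flume-1.9.0-bin/conf}
FLUME_LOG_DIR=${FLUME_LOG_DIR:-/export/data/flume/log}

# flume_args AGENT CONF_FILE HTTP_PORT -> fills the global array FLUME_ARGS
flume_args() {
  FLUME_ARGS=(flume-ng agent -c "$FLUME_CONF_DIR" --name "$1" --conf-file "$2"
    -Dflume.monitoring.type=http "-Dflume.monitoring.port=$3"
    -Dflume.root.logger=INFO,console)
}

# flume_start AGENT CONF_FILE HTTP_PORT LOG_NAME -> starts the agent in the background
flume_start() {
  flume_args "$1" "$2" "$3"
  nohup "${FLUME_ARGS[@]}" >> "$FLUME_LOG_DIR/$4" 2>&1 &
  echo "started agent $1 (pid $!), metrics at http://$(hostname):$3/metrics"
}

# Usage: flume_start rabin /export/data/flume/job/flume-netcat.conf 10501 flume-netcat.log
```

The sketch keeps -c pointed at Flume's own conf directory, where flume-env.sh and log4j.properties live, as start-netcat.sh does. The later start-hdfs.sh passes the job directory instead. That is very likely why the HDFS agent's log lines use Hadoop's "yy/MM/dd HH:mm:ss" format: log4j then picks up Hadoop's log4j.properties from the classpath.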
[root@wjf-C11-71 software]# yum -y install epel-release #Install the EPEL repository
[root@Hexindai-C11-71 software]# yum list jq
Loaded plugins: fastestmirror, langpacks
Loading mirror speeds from cached hostfile
 * base: mirrors.aliyun.com
 * epel: mirrors.yun-idc.com
 * extras: mirrors.aliyun.com
 * updates: mirrors.aliyun.com
Available Packages
jq.x86_64 1.5-1.el7 epel
[root@Hexindai-C11-71 software]#
[root@Hexindai-C11-71 software]# yum -y install jq Loaded plugins: fastestmirror, langpacks Loading mirror speeds from cached hostfile * base: mirrors.aliyun.com * epel: mirrors.yun-idc.com * extras: mirrors.aliyun.com * updates: mirrors.aliyun.com Resolving Dependencies --> Running transaction check ---> Package jq.x86_64 0:1.5-1.el7 will be installed --> Processing Dependency: libonig.so.2()(64bit) for package: jq-1.5-1.el7.x86_64 --> Running transaction check ---> Package oniguruma.x86_64 0:5.9.5-3.el7 will be installed --> Finished Dependency Resolution Dependencies Resolved ===================================================================================================================================================================================================== Package Arch Version Repository Size ===================================================================================================================================================================================================== Installing: jq x86_64 1.5-1.el7 epel 153 k Installing for dependencies: oniguruma x86_64 5.9.5-3.el7 epel 129 k Transaction Summary ===================================================================================================================================================================================================== Install 1 Package (+1 Dependent package) Total download size: 282 k Installed size: 906 k Downloading packages: (1/2): jq-1.5-1.el7.x86_64.rpm | 153 kB 00:00:05 (2/2): oniguruma-5.9.5-3.el7.x86_64.rpm | 129 kB 00:00:06 ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- Total 43 kB/s | 282 kB 00:00:06 Running transaction check Running transaction test Transaction test succeeded Running transaction Installing : oniguruma-5.9.5-3.el7.x86_64 1/2 Installing : jq-1.5-1.el7.x86_64 2/2 Verifying : 
oniguruma-5.9.5-3.el7.x86_64 1/2 Verifying : jq-1.5-1.el7.x86_64 2/2 Installed: jq.x86_64 0:1.5-1.el7 Dependency Installed: oniguruma.x86_64 0:5.9.5-3.el7 Complete! [root@Hexindai-C11-71 software]#
[root@wjf-C11-71 software]# curl http://wjf-C11-71:10501/metrics |jq
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   261    0   261    0     0  21285      0 --:--:-- --:--:-- --:--:-- 21750
{
  "CHANNEL.c1": {
    "ChannelCapacity": "1000",
    "ChannelFillPercentage": "0.0",
    "Type": "CHANNEL",
    "ChannelSize": "0",
    "EventTakeSuccessCount": "0",
    "EventTakeAttemptCount": "220",
    "StartTime": "1564729794962",
    "EventPutAttemptCount": "0",
    "EventPutSuccessCount": "0",
    "StopTime": "0"
  }
}
[root@wjf-C11-71 software]#
Tip: For more metrics, see the official documentation: http://flume.apache.org/FlumeUserGuide.html#monitoring.
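The /metrics endpoint returns plain JSON, so it is easy to poll from cron or from a monitoring agent. The bash sketch below raises an alert when a channel gets too full. It assumes the flat layout shown above, where each component is one object of string values. It uses sed and grep instead of jq, so it also runs on hosts where jq is not installed. metric_value and check_channel_fill are hypothetical names, and the 80% threshold in the usage line is only an example.

```shell
#!/usr/bin/env bash
# metric_value JSON COMPONENT KEY -> prints the value of KEY inside the COMPONENT object
metric_value() {
  local json=$1 component=$2 key=$3
  # 1) flatten, 2) keep only the component's {...} body,
  # 3) pick out "KEY": "VALUE", 4) print VALUE
  printf '%s' "$json" | tr -d '\n' |
    sed -n "s/.*\"$component\"[^{]*{\([^}]*\)}.*/\1/p" |
    grep -o "\"$key\"[[:space:]]*:[[:space:]]*\"[^\"]*\"" |
    sed 's/.*"\([^"]*\)"$/\1/'
}

# check_channel_fill JSON CHANNEL THRESHOLD_PERCENT -> returns 1 when the channel is too full
check_channel_fill() {
  local json=$1 channel=$2 threshold=$3 pct
  pct=$(metric_value "$json" "CHANNEL.$channel" ChannelFillPercentage)
  if [ -z "$pct" ]; then
    echo "UNKNOWN: no ChannelFillPercentage for channel $channel"
    return 3
  fi
  # awk does the floating-point comparison that bash arithmetic cannot
  if awk -v p="$pct" -v t="$threshold" 'BEGIN { exit !(p >= t) }'; then
    echo "WARN: channel $channel is ${pct}% full"
    return 1
  fi
  echo "OK: channel $channel is ${pct}% full"
}

# Usage: check_channel_fill "$(curl -s http://wjf-C11-71:10501/metrics)" c1 80
```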
[root@Hexindai-C11-71 software]# netstat -untalp |grep 8888
tcp 0 0 172.20.11.71:8888 0.0.0.0:* LISTEN 18333/java
[root@Hexindai-C11-71 software]# jps
18784 Jps
6836 NodeManager
7796 DataNode
18333 Application
7647 NameNode
[root@Hexindai-C11-71 software]#
[root@Hexindai-C11-71 software]# ps aux|grep flume
root 18333 0.2 1.1 3632388 90324 pts/0 Sl 15:09 0:04 /export/servers/jdk1.8.0_211/bin/java -Xmx20m -Dflume.monitoring.type=http -Dflume.monitoring.port=10501 -Dflume.root.logger=INFO,console -cp /export/servers/apache-flume-1.9.0-bin/conf:/export/servers/apache-flume-1.9.0-bin/lib/*:/export/servers/hadoop-2.9.2/etc/hadoop:/export/servers/hadoop-2.9.2/share/hadoop/common/lib/*:/export/servers/hadoop-2.9.2/share/hadoop/common/*:/export/servers/hadoop-2.9.2/share/hadoop/hdfs:/export/servers/hadoop-2.9.2/share/hadoop/hdfs/lib/*:/export/servers/hadoop-2.9.2/share/hadoop/hdfs/*:/export/servers/hadoop-2.9.2/share/hadoop/yarn:/export/servers/hadoop-2.9.2/share/hadoop/yarn/lib/*:/export/servers/hadoop-2.9.2/share/hadoop/yarn/*:/export/servers/hadoop-2.9.2/share/hadoop/mapreduce/lib/*:/export/servers/hadoop-2.9.2/share/hadoop/mapreduce/*:/export/servers/hadoop-2.9.2/contrib/capacity-scheduler/*.jar:/lib/* -Djava.library.path=:/export/servers/hadoop-2.9.2/lib/native org.apache.flume.node.Application --name rabin --conf-file /export/data/flume/job/flume-netcat.conf
root 18815 0.0 0.0 112708 984 pts/0 S+ 15:42 0:00 grep --color=auto flume
[root@Hexindai-C11-71 software]# kill 18333
[root@Hexindai-C11-71 software]#
[root@Hexindai-C11-71 software]#
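Stopping the agent above takes three manual steps: find the process, note its PID, kill it. The bash sketch below does the same by agent name. It scans /proc directly, so it needs neither pgrep nor jps. It assumes Linux, and it assumes the agent was started through flume-ng, so that the JVM command line contains "org.apache.flume.node.Application ... --name AGENT", as in the ps output above. find_agent_pids and stop_agent are hypothetical names.

```shell
#!/usr/bin/env bash
# find_agent_pids AGENT -> prints the PID of every Flume JVM started with --name AGENT
find_agent_pids() {
  local agent=$1 dir cmd
  for dir in /proc/[0-9]*; do
    [ -r "$dir/cmdline" ] || continue
    # /proc/PID/cmdline is NUL-separated; turn it into one space-separated string
    cmd=$(tr '\0' ' ' 2>/dev/null < "$dir/cmdline") || continue
    if [[ $cmd == *org.apache.flume.node.Application*"--name $agent "* ]]; then
      echo "${dir#/proc/}"
    fi
  done
  return 0
}

# stop_agent AGENT [TIMEOUT_SECONDS] -> 0 stopped, 1 not running, 2 still alive after timeout
stop_agent() {
  local agent=$1 timeout=${2:-30} pids i
  pids=$(find_agent_pids "$agent")
  if [ -z "$pids" ]; then
    echo "agent $agent is not running"
    return 1
  fi
  # Plain kill sends SIGTERM, the same signal as the manual "kill 18333" above
  kill $pids    # unquoted on purpose: one argument per PID
  for ((i = 0; i < timeout; i++)); do
    if [ -z "$(find_agent_pids "$agent")" ]; then
      echo "agent $agent stopped"
      return 0
    fi
    sleep 1
  done
  echo "agent $agent still running after ${timeout}s (pids: $pids)"
  return 2
}

# Usage: stop_agent rabin
```

SIGTERM rather than kill -9 is the deliberate choice here. It should let the JVM run Flume's shutdown handling, so that sinks such as the HDFS sink get a chance to close their open files.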
2、Read a local file into the HDFS cluster in real time (the Flume node must have the Hadoop cluster environment configured; exec source - memory channel - hdfs sink)

[root@Hexindai-C11-71 ~]# cat /export/data/flume/job/flume-hdfs.conf
rabin2.sources = file_source
rabin2.sinks = hdfs_sink
rabin2.channels = memory_channel

rabin2.sources.file_source.type = exec
rabin2.sources.file_source.command = tail -F /var/log/messages
rabin2.sources.file_source.shell = /bin/bash -c

rabin2.sinks.hdfs_sink.type = hdfs
rabin2.sinks.hdfs_sink.hdfs.path = hdfs://Hexindai-C11-71:8020/flume/%Y%m%d/%H
#Prefix of the uploaded files
rabin2.sinks.hdfs_sink.hdfs.filePrefix = 172.20.11.73-
#Whether to roll directories by time (round the timestamp down)
rabin2.sinks.hdfs_sink.hdfs.round = true
#How many time units before a new directory is created
rabin2.sinks.hdfs_sink.hdfs.roundValue = 1
#The time unit used for rounding
rabin2.sinks.hdfs_sink.hdfs.roundUnit = hour
#Use the agent's local timestamp instead of a timestamp header on the event
rabin2.sinks.hdfs_sink.hdfs.useLocalTimeStamp = true
#How many events to accumulate before flushing to HDFS
rabin2.sinks.hdfs_sink.hdfs.batchSize = 1000
#File type; DataStream writes uncompressed data (CompressedStream supports compression)
rabin2.sinks.hdfs_sink.hdfs.fileType = DataStream
#How often, in seconds, to roll to a new file
rabin2.sinks.hdfs_sink.hdfs.rollInterval = 600
#Roll size of each file, in bytes (about 128 MB)
rabin2.sinks.hdfs_sink.hdfs.rollSize = 134217700
#File rolling does not depend on the number of events (0 disables count-based rolling)
rabin2.sinks.hdfs_sink.hdfs.rollCount = 0
#Minimum number of block replicas
rabin2.sinks.hdfs_sink.hdfs.minBlockReplicas = 1

rabin2.channels.memory_channel.type = memory
rabin2.channels.memory_channel.capacity = 1000
rabin2.channels.memory_channel.transactionCapacity = 1000

rabin2.sources.file_source.channels = memory_channel
rabin2.sinks.hdfs_sink.channel = memory_channel
[root@Hexindai-C11-71 ~]#
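This configuration has hdfs.path ending in %Y%m%d/%H, with round = true, roundUnit = hour and useLocalTimeStamp = true. Under those settings, each event lands in the directory named after the agent's local date and its hour rounded down. The bash sketch below reproduces that mapping so you can predict where to look in HDFS. It follows my understanding of Flume's hour rounding, in which the hour of the day is rounded down to a multiple of roundValue. It also assumes GNU date. hdfs_bucket is a hypothetical name.

```shell
#!/usr/bin/env bash
# hdfs_bucket EPOCH_SECONDS [ROUND_VALUE] -> prints the "%Y%m%d/%H" part of hdfs.path
# Assumes roundUnit = hour and local time (useLocalTimeStamp = true); set TZ to match the agent.
hdfs_bucket() {
  local epoch=$1 round_value=${2:-1} day hour
  day=$(date -d "@$epoch" +%Y%m%d)
  hour=$(date -d "@$epoch" +%H)
  # 10# forces base 10, otherwise bash would reject "08" and "09" as invalid octal
  hour=$(( 10#$hour / round_value * round_value ))
  printf '%s/%02d\n' "$day" "$hour"
}

# Usage: hdfs_bucket "$(date +%s)"      # the current hour's directory under /flume
```

For example, the first file this agent writes is created at 14:15:33 on 2019-08-06 in UTC+8, which is epoch 1565072133. With TZ=CST-8 (a POSIX spelling of UTC+8), hdfs_bucket 1565072133 prints 20190806/14, which is the directory the file appears in. With roundValue = 4, the same event would go to 20190806/12.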
[root@Hexindai-C11-71 ~]#
[root@Hexindai-C11-71 ~]# cat /export/data/flume/shell/start-hdfs.sh
#!/bin/bash
#@author :rabin
#blog:http://www.cnblogs.com/wtnyihg

#Send the monitoring data to Ganglia. This requires the Ganglia server address; make sure the Ganglia service is deployed before using it!
#nohup flume-ng agent -c /export/softwares/apache-flume-1.9.0-bin/conf --conf-file=/export/data/flume/job/flume-hdfs.conf --name rabin2 -Dflume.monitoring.type=ganglia -Dflume.monitoring.hosts=node105.rabin.org.cn:8649 -Dflume.root.logger=INFO,console >> /export/data/flume/log/flume-ganglia-flume-hdfs.log 2>&1 &

#Enable Flume's built-in monitoring; this is the command the script runs by default
nohup flume-ng agent -c /export/data/flume/job --conf-file=/export/data/flume/job/flume-hdfs.conf --name rabin2 -Dflume.monitoring.type=http -Dflume.monitoring.port=10502 -Dflume.root.logger=INFO,console >> /export/data/flume/log/flume-hdfs.log 2>&1 &
[root@Hexindai-C11-71 ~]#
[root@Hexindai-C11-71 ~]#
[root@Hexindai-C11-71 ~]# chmod +x /export/data/flume/shell/start-hdfs.sh
[root@Hexindai-C11-71 ~]# /export/data/flume/shell/start-hdfs.sh
[root@Hexindai-C11-71 ~]# ss -antlp|grep 10502
LISTEN 0 50 *:10502 *:* users:(("java",pid=27376,fd=358))
[root@Hexindai-C11-71 ~]#
[root@Hexindai-C11-71 ~]# [root@Hexindai-C11-71 ~]# tail -50f /export/data/flume/log/flume-hdfs.log SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory] 19/08/06 14:15:29 INFO node.PollingPropertiesFileConfigurationProvider: Configuration provider starting 19/08/06 14:15:29 INFO node.PollingPropertiesFileConfigurationProvider: Reloading configuration file:/export/data/flume/job/flume-hdfs.conf 19/08/06 14:15:29 INFO conf.FlumeConfiguration: Processing:hdfs_sink 19/08/06 14:15:29 INFO conf.FlumeConfiguration: Processing:file_source 19/08/06 14:15:29 INFO conf.FlumeConfiguration: Processing:hdfs_sink 19/08/06 14:15:29 INFO conf.FlumeConfiguration: Added sinks: hdfs_sink Agent: rabin2 19/08/06 14:15:29 INFO conf.FlumeConfiguration: Processing:hdfs_sink 19/08/06 14:15:29 INFO conf.FlumeConfiguration: Processing:hdfs_sink 19/08/06 14:15:29 INFO conf.FlumeConfiguration: Processing:memory_channel 19/08/06 14:15:29 INFO conf.FlumeConfiguration: Processing:hdfs_sink 19/08/06 14:15:29 INFO conf.FlumeConfiguration: Processing:file_source 19/08/06 14:15:29 INFO conf.FlumeConfiguration: Processing:hdfs_sink 19/08/06 14:15:29 INFO conf.FlumeConfiguration: Processing:hdfs_sink 19/08/06 14:15:29 INFO conf.FlumeConfiguration: Processing:file_source 19/08/06 14:15:29 INFO conf.FlumeConfiguration: Processing:hdfs_sink 19/08/06 14:15:29 INFO conf.FlumeConfiguration: Processing:hdfs_sink 19/08/06 14:15:29 INFO conf.FlumeConfiguration: Processing:hdfs_sink 19/08/06 14:15:29 INFO conf.FlumeConfiguration: Processing:hdfs_sink 19/08/06 14:15:29 INFO conf.FlumeConfiguration: Processing:memory_channel 19/08/06 14:15:29 INFO conf.FlumeConfiguration: Processing:hdfs_sink 19/08/06 14:15:29 INFO conf.FlumeConfiguration: Processing:hdfs_sink 19/08/06 14:15:29 INFO conf.FlumeConfiguration: Processing:memory_channel 19/08/06 14:15:29 INFO conf.FlumeConfiguration: Processing:file_source 19/08/06 14:15:29 INFO conf.FlumeConfiguration: Processing:hdfs_sink 19/08/06 14:15:29 WARN 
conf.FlumeConfiguration: Agent configuration for 'rabin2' has no configfilters. 19/08/06 14:15:29 INFO conf.FlumeConfiguration: Post-validation flume configuration contains configuration for agents: [rabin2] 19/08/06 14:15:29 INFO node.AbstractConfigurationProvider: Creating channels 19/08/06 14:15:29 INFO channel.DefaultChannelFactory: Creating instance of channel memory_channel type memory 19/08/06 14:15:29 INFO node.AbstractConfigurationProvider: Created channel memory_channel 19/08/06 14:15:29 INFO source.DefaultSourceFactory: Creating instance of source file_source, type exec 19/08/06 14:15:29 INFO sink.DefaultSinkFactory: Creating instance of sink: hdfs_sink, type: hdfs 19/08/06 14:15:29 INFO node.AbstractConfigurationProvider: Channel memory_channel connected to [file_source, hdfs_sink] 19/08/06 14:15:29 INFO node.Application: Starting new configuration:{ sourceRunners:{file_source=EventDrivenSourceRunner: { source:org.apache.flume.source.ExecSource{name:file_source,state:IDLE} }} sinkRunners:{hdfs_sink=SinkRunner: { policy:org.apache.flume.sink.DefaultSinkProcessor@5f0629fc counterGroup:{ name:null counters:{} } }} channels:{memory_channel=org.apache.flume.channel.MemoryChannel{name: memory_channel}} } 19/08/06 14:15:29 INFO node.Application: Starting Channel memory_channel 19/08/06 14:15:29 INFO instrumentation.MonitoredCounterGroup: Monitored counter group for type: CHANNEL, name: memory_channel: Successfully registered new MBean. 19/08/06 14:15:29 INFO instrumentation.MonitoredCounterGroup: Component type: CHANNEL, name: memory_channel started 19/08/06 14:15:29 INFO node.Application: Starting Sink hdfs_sink 19/08/06 14:15:29 INFO node.Application: Starting Source file_source 19/08/06 14:15:29 INFO source.ExecSource: Exec source starting with command: tail -F /var/log/messages 19/08/06 14:15:29 INFO instrumentation.MonitoredCounterGroup: Monitored counter group for type: SOURCE, name: file_source: Successfully registered new MBean. 
19/08/06 14:15:29 INFO instrumentation.MonitoredCounterGroup: Component type: SOURCE, name: file_source started 19/08/06 14:15:29 INFO instrumentation.MonitoredCounterGroup: Monitored counter group for type: SINK, name: hdfs_sink: Successfully registered new MBean. 19/08/06 14:15:29 INFO instrumentation.MonitoredCounterGroup: Component type: SINK, name: hdfs_sink started 19/08/06 14:15:29 INFO util.log: Logging initialized @573ms to org.eclipse.jetty.util.log.Slf4jLog 19/08/06 14:15:29 INFO server.Server: jetty-9.4.6.v20170531 19/08/06 14:15:29 INFO server.AbstractConnector: Started ServerConnector@29309fa0{HTTP/1.1,[http/1.1]}{0.0.0.0:10502} 19/08/06 14:15:29 INFO server.Server: Started @701ms 19/08/06 14:15:33 INFO hdfs.HDFSDataStream: Serializer = TEXT, UseRawLocalFileSystem = false 19/08/06 14:15:33 INFO hdfs.BucketWriter: Creating hdfs://Hexindai-C11-71:8020/flume/20190806/14/172.20.11.73-.1565072133413.tmp ^C^C [root@Hexindai-C11-71 ~]#
[root@Hexindai-C11-71 ~]# hdfs dfs -ls /flume/20190806/14
Found 1 items
-rw-r--r-- 3 root supergroup 841 2019-08-06 14:15 /flume/20190806/14/172.20.11.73-.1565072133413.tmp
[root@Hexindai-C11-71 ~]#
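The file is still listed with a .tmp suffix because the HDFS sink keeps it open until a roll condition fires. At 841 bytes it is far below rollSize, and rollCount = 0 disables count-based rolling, so the trigger that applies is rollInterval = 600 seconds after the file was opened. The number in the file name looks like a millisecond timestamp taken when the file was created; it matches the 14:15:33 "Creating" line in the log. Under that assumption, you can estimate from the name when the .tmp suffix should disappear. The bash sketch below does that, assuming GNU date. roll_eta is a hypothetical name.

```shell
#!/usr/bin/env bash
# roll_eta FILE_NAME ROLL_INTERVAL_SECONDS -> prints the local time the file should be closed
# Assumption: the number before ".tmp" is the file's creation time in milliseconds.
roll_eta() {
  local name=$1 interval=$2 ms
  ms=${name%.tmp}       # drop the in-use suffix
  ms=${ms##*.}          # keep the text after the last dot
  [[ $ms =~ ^[0-9]+$ ]] || return 1
  date -d "@$(( ms / 1000 + interval ))" '+%F %T'
}

# Usage: roll_eta 172.20.11.73-.1565072133413.tmp 600
```

With TZ=CST-8 (UTC+8), the usage line prints 2019-08-06 14:25:33. After that time, listing the directory again should show the file without its .tmp suffix.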
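[root@wjf-C11-71 ~]#
[root@wjf-C11-71 ~]# curl http://172.20.11.71:10502/metrics|jq
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   958    0   958    0     0  66699      0 --:--:-- --:--:-- --:--:-- 73692
{
  "SOURCE.file_source": {
    "AppendBatchAcceptedCount": "0",  #Total number of batches successfully committed to the channel
    "GenericProcessingFail": "0",  #Number of general processing failures
    "EventAcceptedCount": "11",  #Total number of events successfully written to the channel
    "AppendReceivedCount": "0",  #Total number of events that arrived in batches of one (one append RPC call each)
    "StartTime": "1565072129383",  #Time the source started, in milliseconds since the epoch
    "AppendBatchReceivedCount": "0",  #Total number of event batches received
    "ChannelWriteFail": "0",  #Number of failed writes to the channel
    "EventReceivedCount": "11",  #Total number of events the source has received so far
    "EventReadFail": "0",  #Number of failed event reads
    "Type": "SOURCE",  #Component type: source
    "AppendAcceptedCount": "0",  #Total number of events passed in individually and successfully written to the channel
    "OpenConnectionCount": "0",  #Number of connections currently open with clients or sinks; currently only the avro source reports this metric
    "StopTime": "0"  #Time the source stopped, in milliseconds since the epoch; 0 means it is still running
  },
  "CHANNEL.memory_channel": {
    "ChannelCapacity": "1000",  #Capacity of the channel; currently only reported for the File channel and the Memory channel
    "ChannelFillPercentage": "0.0",  #Percentage of the channel that is full
    "Type": "CHANNEL",  #Component type: channel
    "ChannelSize": "0",  #Current number of events in the channel; currently only reported for the File channel and the Memory channel
    "EventTakeSuccessCount": "11",  #Total number of events the sink successfully took from the channel
    "EventTakeAttemptCount": "83",  #Total number of times the sink tried to take events from the channel; not every attempt returns an event, because the channel may be empty when the sink polls
    "StartTime": "1565072129380",  #Time the channel started, in milliseconds since the epoch
    "EventPutAttemptCount": "11",  #Total number of times the source tried to put events into the channel
    "EventPutSuccessCount": "11",  #Total number of events successfully put into the channel and committed
    "StopTime": "0"  #Time the channel stopped, in milliseconds since the epoch
  },
  "SINK.hdfs_sink": {
    "ConnectionCreatedCount": "1",  #Number of connections created to the next hop or storage system (for HDFS, creating a file)
    "BatchCompleteCount": "0",  #Number of batches whose size equals the configured batch size
    "EventWriteFail": "0",  #Number of event write failures
    "BatchEmptyCount": "69",  #Number of empty batches (0 events); a large value means the source produces data much more slowly than the sink can process it
    "EventDrainAttemptCount": "11",  #Total number of events the sink tried to write out to storage
    "StartTime": "1565072129386",  #Time the sink started, in milliseconds since the epoch
    "BatchUnderflowCount": "2",  #Number of batches smaller than the sink's configured maximum batch size; a high value also means the sink is faster than the source
    "ChannelReadFail": "0",  #Number of failed reads from the channel
    "ConnectionFailedCount": "0",  #Number of failed connections
    "ConnectionClosedCount": "0",  #Number of closed connections
    "Type": "SINK",  #Component type: sink
    "EventDrainSuccessCount": "11",  #Total number of events the sink successfully wrote out to storage
    "StopTime": "0"  #Time the sink stopped, in milliseconds since the epoch
  }
}
[root@wjf-C11-71 ~]#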
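Read together, these counters let you reconcile the pipeline. On the channel, EventPutSuccessCount minus EventTakeSuccessCount should roughly track ChannelSize, since it counts events committed into the channel but not yet taken. On the source, EventReceivedCount minus EventAcceptedCount counts events that never made it into the channel. The bash sketch below does that arithmetic. flow_check is a hypothetical name, and the usage lines assume jq, as installed above. The counters are read at slightly different instants, so small mismatches under load are expected.

```shell
#!/usr/bin/env bash
# flow_check PUT_OK TAKE_OK CHANNEL_SIZE RECEIVED ACCEPTED
# Prints the derived numbers; returns non-zero if the source failed to hand events to the channel.
flow_check() {
  local put_ok=$1 take_ok=$2 size=$3 received=$4 accepted=$5
  local in_flight=$(( put_ok - take_ok ))     # committed to the channel, not yet taken
  local rejected=$(( received - accepted ))   # received by the source but not accepted by the channel
  echo "in_flight=$in_flight channel_size=$size rejected=$rejected"
  (( rejected == 0 ))
}

# Usage against the HDFS agent above:
#   m=$(curl -s http://172.20.11.71:10502/metrics)
#   flow_check $(jq -r '.["CHANNEL.memory_channel"] | .EventPutSuccessCount, .EventTakeSuccessCount, .ChannelSize' <<< "$m") \
#              $(jq -r '.["SOURCE.file_source"] | .EventReceivedCount, .EventAcceptedCount' <<< "$m")
```

With the numbers shown above (11, 11, 0, 11, 11), it prints in_flight=0 channel_size=0 rejected=0 and succeeds.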
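3、Ship the files in a specified directory to the HDFS cluster in real time (the Flume node must have the Hadoop cluster environment configured; spooldir source - memory channel - hdfs sink)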
[root@Hexindai-C11-71 ~]# cat /export/data/flume/job/flume-dir.conf
rabin3.sources = spooldir_source
rabin3.sinks = hdfs_sink
rabin3.channels = memory_channel

# Describe/configure the source
rabin3.sources.spooldir_source.type = spooldir
rabin3.sources.spooldir_source.spoolDir = /export/data/flume/upload
rabin3.sources.spooldir_source.fileSuffix = .COMPLETED
rabin3.sources.spooldir_source.fileHeader = true
#Ignore every file ending in .tmp; such files are not uploaded
rabin3.sources.spooldir_source.ignorePattern = ([^ ]*\.tmp)
#Store the source file's name in a header so that the sink below can reference it as fileName
rabin3.sources.spooldir_source.basenameHeader = true
rabin3.sources.spooldir_source.basenameHeaderKey = fileName

# Describe the sink
rabin3.sinks.hdfs_sink.type = hdfs
rabin3.sinks.hdfs_sink.hdfs.path = hdfs://Hexindai-C11-71:8020/flume
#Prefix of the uploaded files
rabin3.sinks.hdfs_sink.hdfs.filePrefix = 172.20.11.73-upload-
#Whether to roll directories by time (round the timestamp down)
rabin3.sinks.hdfs_sink.hdfs.round = true
#How many time units before a new directory is created
rabin3.sinks.hdfs_sink.hdfs.roundValue = 1
#The time unit used for rounding
rabin3.sinks.hdfs_sink.hdfs.roundUnit = hour
#Use the agent's local timestamp instead of a timestamp header on the event
rabin3.sinks.hdfs_sink.hdfs.useLocalTimeStamp = true
#How many events to accumulate before flushing to HDFS
rabin3.sinks.hdfs_sink.hdfs.batchSize = 100
#File type; DataStream writes uncompressed data (CompressedStream supports compression)
rabin3.sinks.hdfs_sink.hdfs.fileType = DataStream
#How often, in seconds, to roll to a new file
rabin3.sinks.hdfs_sink.hdfs.rollInterval = 60
#Roll size of each file, in bytes (about 128 MB)
rabin3.sinks.hdfs_sink.hdfs.rollSize = 134217700
#File rolling does not depend on the number of events (0 disables count-based rolling)
rabin3.sinks.hdfs_sink.hdfs.rollCount = 0
#Minimum number of replicas
rabin3.sinks.hdfs_sink.hdfs.minBlockReplicas = 1
#Used together with the source's basenameHeader and basenameHeaderKey properties to keep the original file name on upload; because it comes later, it overrides the filePrefix set above
rabin3.sinks.hdfs_sink.hdfs.filePrefix = %{fileName}

# Use a channel which buffers events in memory
rabin3.channels.memory_channel.type = memory
rabin3.channels.memory_channel.capacity = 1000
rabin3.channels.memory_channel.transactionCapacity = 1000

# Bind the source and sink to the channel
rabin3.sources.spooldir_source.channels = memory_channel
rabin3.sinks.hdfs_sink.channel = memory_channel
[root@Hexindai-C11-71 ~]#
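Three rules in this config decide which files the spooldir source actually reads:

- Names ending in the fileSuffix .COMPLETED have already been processed.
- Names matching ignorePattern are skipped; Flume matches that regex against the whole file name.
- As far as I know, hidden files whose names start with a dot are skipped as well.

The bash sketch below implements that filter, which is handy for checking a naming convention before dropping files in. would_ingest is a hypothetical name.

```shell
#!/usr/bin/env bash
# would_ingest FILE_NAME -> succeeds if the spooldir source configured above would read the file
would_ingest() {
  local name=$1
  # Same regex as ignorePattern, anchored because Flume requires a whole-name match
  local ignore='^([^ ]*\.tmp)$'
  [[ $name == *.COMPLETED ]] && return 1   # fileSuffix: already ingested
  [[ $name =~ $ignore ]] && return 1       # ignorePattern
  [[ $name == .* ]] && return 1            # hidden files (assumed to be skipped)
  return 0
}

# Usage: for f in /export/data/flume/upload/*; do would_ingest "${f##*/}" && echo "$f"; done
```

For the three files created in the next step, it accepts rabin.blog and rabin3.txt and rejects rabin2.tmp, which matches what the source does. Note that because the pattern uses [^ ]*, a .tmp file whose name contains a space would not be ignored.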
[root@Hexindai-C11-71 ~]# cat /export/data/flume/shell/start-dir.sh
#!/bin/bash
#@author :rabin
#blog:http://www.cnblogs.com/wtnyihg

#Send the monitoring data to Ganglia. This requires the Ganglia server address; make sure the Ganglia service is deployed before using it!
#nohup flume-ng agent -c /export/data/flume/job --conf-file=/export/data/flume/job/flume-dir.conf --name rabin3 -Dflume.monitoring.type=ganglia -Dflume.monitoring.hosts=node105.rabin.org.cn:8649 -Dflume.root.logger=INFO,console >> /export/data/flume/log/flume-ganglia-flume-dir.log 2>&1 &

#Enable Flume's built-in monitoring; this is the command the script runs by default
nohup flume-ng agent -c /export/data/flume/job --conf-file=/export/data/flume/job/flume-dir.conf --name rabin3 -Dflume.monitoring.type=http -Dflume.monitoring.port=10503 -Dflume.root.logger=INFO,console >> /export/data/flume/log/flume-dir.log 2>&1 &
[root@Hexindai-C11-71 ~]#
[root@Hexindai-C11-71 ~]# mkdir /export/data/flume/upload
[root@Hexindai-C11-71 ~]#
[root@Hexindai-C11-71 ~]# echo https://www.cnblogs.com/wtnyihg > /export/data/flume/upload/rabin.blog
[root@Hexindai-C11-71 ~]# echo https://www.cnblogs.com/wtnyihg > /export/data/flume/upload/rabin2.tmp
[root@Hexindai-C11-71 ~]# echo https://www.cnblogs.com/wtnyihg > /export/data/flume/upload/rabin3.txt
[root@Hexindai-C11-71 ~]#
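Writing files straight into the spool directory with echo is fine for a demo. In production, keep in mind that the spooldir source expects a file to be complete and unchanged once it appears, and it stops with an error if a file name is reused. A common pattern is to write the file under a name that ignorePattern hides (*.tmp here) and then rename it, because a rename within one filesystem is atomic. The bash sketch below follows that pattern. spool_drop is a hypothetical name.

```shell
#!/usr/bin/env bash
# spool_drop SOURCE_FILE SPOOL_DIR -> publishes a copy of SOURCE_FILE into SPOOL_DIR atomically
spool_drop() {
  local src=$1 spool=$2 name
  name=$(basename -- "$src")
  # Refuse names Flume has already seen: reusing one makes the source stop with an error
  if [ -e "$spool/$name" ] || [ -e "$spool/$name.COMPLETED" ]; then
    echo "spool_drop: $name already exists in $spool" >&2
    return 1
  fi
  cp -- "$src" "$spool/$name.tmp" || return 1   # invisible to Flume because of ignorePattern
  mv -- "$spool/$name.tmp" "$spool/$name"       # same-filesystem rename: appears all at once
}

# Usage: spool_drop /tmp/rabin4.txt /export/data/flume/upload
```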
[root@Hexindai-C11-71 ~]# chmod +x /export/data/flume/shell/start-dir.sh
[root@Hexindai-C11-71 ~]#
[root@Hexindai-C11-71 ~]#
[root@Hexindai-C11-71 ~]# /export/data/flume/shell/start-dir.sh
[root@Hexindai-C11-71 ~]# tail -f /export/data/flume/log/flume-
flume-dir.log     flume-hdfs.log    flume-netcat.log
[root@Hexindai-C11-71 ~]# tail -f /export/data/flume/log/flume-dir.log
19/08/06 15:20:38 INFO avro.ReliableSpoolingFileEventReader: Last read took us just up to a file boundary. Rolling to the next file, if there is one.
19/08/06 15:20:38 INFO hdfs.HDFSDataStream: Serializer = TEXT, UseRawLocalFileSystem = false
19/08/06 15:20:38 INFO avro.ReliableSpoolingFileEventReader: Preparing to move file /export/data/flume/upload/rabin.blog to /export/data/flume/upload/rabin.blog.COMPLETED
19/08/06 15:20:38 INFO avro.ReliableSpoolingFileEventReader: Last read took us just up to a file boundary. Rolling to the next file, if there is one.
19/08/06 15:20:38 INFO avro.ReliableSpoolingFileEventReader: Preparing to move file /export/data/flume/upload/rabin3.txt to /export/data/flume/upload/rabin3.txt.COMPLETED
19/08/06 15:20:38 INFO server.AbstractConnector: Started ServerConnector@4ac9f13f{HTTP/1.1,[http/1.1]}{0.0.0.0:10503}
19/08/06 15:20:38 INFO server.Server: Started @832ms
19/08/06 15:20:38 INFO hdfs.BucketWriter: Creating hdfs://Hexindai-C11-71:8020/flume/rabin.blog.1565076038569.tmp
19/08/06 15:20:39 INFO hdfs.HDFSDataStream: Serializer = TEXT, UseRawLocalFileSystem = false
19/08/06 15:20:39 INFO hdfs.BucketWriter: Creating hdfs://Hexindai-C11-71:8020/flume/rabin3.txt.1565076039623.tmp
^C
[root@Hexindai-C11-71 ~]# ss -antlp|grep 10503
LISTEN 0 50 *:10503 *:* users:(("java",pid=28259,fd=365))
[root@Hexindai-C11-71 ~]#
[root@wjf-C11-71 ~]# ll /export/data/flume/upload/
total 12
-rw-r--r-- 1 root root 32 Aug 6 15:18 rabin2.tmp
-rw-r--r-- 1 root root 32 Aug 6 15:18 rabin3.txt.COMPLETED
-rw-r--r-- 1 root root 32 Aug 6 15:18 rabin.blog.COMPLETED
[root@wjf-C11-71 ~]#
[root@wjf-C11-71 ~]#
[root@wjf-C11-71 ~]#
[root@wjf-C11-71 ~]# hdfs dfs -ls /flume
Found 3 items
drwxr-xr-x - root supergroup 0 2019-08-06 15:00 /flume/20190806
-rw-r--r-- 3 root supergroup 32 2019-08-06 15:21 /flume/rabin.blog.1565076038569
-rw-r--r-- 3 root supergroup 32 2019-08-06 15:21 /flume/rabin3.txt.1565076039623
[root@wjf-C11-71 ~]#
[root@wjf-C11-71 ~]#
[root@Hexindai-C11-71 ~]# curl http://172.20.11.71:10503/metrics|jq
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   955    0   955    0     0   5850      0 --:--:-- --:--:-- --:--:--  5895
{
  "CHANNEL.memory_channel": {
    "ChannelCapacity": "1000",
    "ChannelFillPercentage": "0.0",
    "Type": "CHANNEL",
    "ChannelSize": "0",
    "EventTakeSuccessCount": "2",
    "EventTakeAttemptCount": "77",
    "StartTime": "1565076038303",
    "EventPutAttemptCount": "2",
    "EventPutSuccessCount": "2",
    "StopTime": "0"
  },
  "SOURCE.spooldir_source": {
    "AppendBatchAcceptedCount": "2",
    "GenericProcessingFail": "0",
    "EventAcceptedCount": "2",
    "AppendReceivedCount": "0",
    "StartTime": "1565076038364",
    "AppendBatchReceivedCount": "2",
    "ChannelWriteFail": "0",
    "EventReceivedCount": "2",
    "EventReadFail": "0",
    "Type": "SOURCE",
    "AppendAcceptedCount": "0",
    "OpenConnectionCount": "0",
    "StopTime": "0"
  },
  "SINK.hdfs_sink": {
    "ConnectionCreatedCount": "2",
    "BatchCompleteCount": "0",
    "EventWriteFail": "0",
    "BatchEmptyCount": "74",
    "EventDrainAttemptCount": "2",
    "StartTime": "1565076038309",
    "BatchUnderflowCount": "1",
    "ChannelReadFail": "0",
    "ConnectionFailedCount": "0",
    "ConnectionClosedCount": "2",
    "Type": "SINK",
    "EventDrainSuccessCount": "2",
    "StopTime": "0"
  }
}
[root@Hexindai-C11-71 ~]#
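4、Passing data between Flume agents: multiple Flume agents aggregate their data into one Flume agent (the Flume node must have the Hadoop cluster environment configured; the rough architecture is shown in the figure below)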
flume-1 monitors the file yinzhengjie.log and flume-2 monitors the data stream on a port. Both flume-1 and flume-2 send their data to flume-3, which writes the final data to HDFS.

[root@Hexindai-C11-71 ~]# cat /export/data/flume/job/flume-aggregation.conf
# Name the components on this agent
aggregation.sources = avro_source
aggregation.sinks = hdfs_sink
aggregation.channels = memory_channel

# Describe/configure the source
aggregation.sources.avro_source.type = avro
aggregation.sources.avro_source.bind = Hexindai-C11-71
aggregation.sources.avro_source.port = 6666

# Describe the sink
aggregation.sinks.hdfs_sink.type = hdfs
aggregation.sinks.hdfs_sink.hdfs.path = hdfs://Hexindai-C11-71:8020/flume/%Y%m%d/%H
#Prefix of the uploaded files
aggregation.sinks.hdfs_sink.hdfs.filePrefix = 172.20.11.73-
#Whether to roll directories by time (round the timestamp down)
aggregation.sinks.hdfs_sink.hdfs.round = true
#How many time units before a new directory is created
aggregation.sinks.hdfs_sink.hdfs.roundValue = 1
#The time unit used for rounding
aggregation.sinks.hdfs_sink.hdfs.roundUnit = hour
#Use the agent's local timestamp instead of a timestamp header on the event
aggregation.sinks.hdfs_sink.hdfs.useLocalTimeStamp = true
#How many events to accumulate before flushing to HDFS
aggregation.sinks.hdfs_sink.hdfs.batchSize = 100
#File type; DataStream writes uncompressed data (CompressedStream supports compression)
aggregation.sinks.hdfs_sink.hdfs.fileType = DataStream
#How often, in seconds, to roll to a new file
aggregation.sinks.hdfs_sink.hdfs.rollInterval = 600
#Roll size of each file, in bytes (about 128 MB)
aggregation.sinks.hdfs_sink.hdfs.rollSize = 134217700
#File rolling does not depend on the number of events (0 disables count-based rolling)
aggregation.sinks.hdfs_sink.hdfs.rollCount = 0
#Minimum number of replicas
aggregation.sinks.hdfs_sink.hdfs.minBlockReplicas = 1

# Describe the channel
aggregation.channels.memory_channel.type = memory
aggregation.channels.memory_channel.capacity = 1000
aggregation.channels.memory_channel.transactionCapacity = 100

# Bind the source and sink to the channel
aggregation.sources.avro_source.channels = memory_channel
aggregation.sinks.hdfs_sink.channel = memory_channel
[root@Hexindai-C11-71 ~]#
[root@Hexindai-C11-71 ~]# cat /export/data/flume/shell/start-aggregation.sh
#!/bin/bash
#@author :wtnyihg
#blog:http://www.cnblogs.com/wtnyihg

# Send monitoring data to Ganglia. The Ganglia server address must be specified; make sure the Ganglia service is deployed before using this!
#nohup flume-ng agent -c /export/data/flume/job --conf-file=/export/data/flume/job/flume-aggregation.conf --name aggregation -Dflume.monitoring.type=ganglia -Dflume.monitoring.hosts=node105.wtnyihg.org.cn:8649 -Dflume.root.logger=INFO,console >> /export/data/flume/log/flume-ganglia-flume-aggregation.log 2>&1 &

# Enable Flume's built-in HTTP monitoring; this command runs by default
nohup flume-ng agent -c /export/data/flume/job --conf-file=/export/data/flume/job/flume-aggregation.conf --name aggregation -Dflume.monitoring.type=http -Dflume.monitoring.port=10511 -Dflume.root.logger=INFO,console >> /export/data/flume/log/flume-aggregation.log 2>&1 &
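The start-*.sh scripts in this section differ only in the agent name and the HTTP monitoring port. The --name value must match the property prefix used in the config file (aggregation.sources..., my_netcat.sources..., and so on). The following hypothetical helper builds the same command from those two parameters. DRY_RUN, FLUME_JOB_DIR and FLUME_LOG_DIR are names introduced here for illustration, not Flume settings.

```shell
#!/bin/bash
# Hypothetical generic starter mirroring the start-*.sh scripts in this section.
FLUME_JOB_DIR=${FLUME_JOB_DIR:-/export/data/flume/job}
FLUME_LOG_DIR=${FLUME_LOG_DIR:-/export/data/flume/log}

# Usage: start_agent <agent-name> <http-monitoring-port>
# The agent name must equal the property prefix inside flume-<agent-name>.conf.
start_agent() {
  local agent=$1 port=$2
  set -- flume-ng agent -c "$FLUME_JOB_DIR" \
    --conf-file="$FLUME_JOB_DIR/flume-$agent.conf" --name "$agent" \
    -Dflume.monitoring.type=http -Dflume.monitoring.port="$port" \
    -Dflume.root.logger=INFO,console
  if [ -n "$DRY_RUN" ]; then
    echo "$@"          # only print the command that would run
    return 0
  fi
  # Same nohup/background pattern as the original scripts, one log file per agent
  nohup "$@" >> "$FLUME_LOG_DIR/flume-$agent.log" 2>&1 &
}
```

With this helper, `start_agent aggregation 10511`, `start_agent my_netcat 10512` and `start_agent my_exec 10513` would replace three of the scripts. Running `DRY_RUN=1 start_agent aggregation 10511` only prints the command.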
[root@Hexindai-C11-71 ~]# chmod +x /export/data/flume/shell/start-aggregation.sh
[root@Hexindai-C11-71 ~]# /export/data/flume/shell/start-aggregation.sh
[root@Hexindai-C11-71 ~]# tail -50f /export/data/flume/log/flume-aggregation.log
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
19/08/06 16:33:41 INFO node.PollingPropertiesFileConfigurationProvider: Configuration provider starting
19/08/06 16:33:41 INFO node.PollingPropertiesFileConfigurationProvider: Reloading configuration file:/export/data/flume/job/flume-aggregation.conf
19/08/06 16:33:41 INFO conf.FlumeConfiguration: Processing:hdfs_sink
19/08/06 16:33:41 INFO conf.FlumeConfiguration: Processing:hdfs_sink
19/08/06 16:33:41 INFO conf.FlumeConfiguration: Processing:memory_channel
19/08/06 16:33:41 INFO conf.FlumeConfiguration: Processing:hdfs_sink
19/08/06 16:33:41 INFO conf.FlumeConfiguration: Processing:hdfs_sink
19/08/06 16:33:41 INFO conf.FlumeConfiguration: Processing:hdfs_sink
19/08/06 16:33:41 INFO conf.FlumeConfiguration: Processing:memory_channel
19/08/06 16:33:41 INFO conf.FlumeConfiguration: Processing:hdfs_sink
19/08/06 16:33:41 INFO conf.FlumeConfiguration: Added sinks: hdfs_sink Agent: aggregation
19/08/06 16:33:41 INFO conf.FlumeConfiguration: Processing:avro_source
19/08/06 16:33:41 INFO conf.FlumeConfiguration: Processing:hdfs_sink
19/08/06 16:33:41 INFO conf.FlumeConfiguration: Processing:hdfs_sink
19/08/06 16:33:41 INFO conf.FlumeConfiguration: Processing:hdfs_sink
19/08/06 16:33:41 INFO conf.FlumeConfiguration: Processing:avro_source
19/08/06 16:33:41 INFO conf.FlumeConfiguration: Processing:hdfs_sink
19/08/06 16:33:41 INFO conf.FlumeConfiguration: Processing:avro_source
19/08/06 16:33:41 INFO conf.FlumeConfiguration: Processing:memory_channel
19/08/06 16:33:41 INFO conf.FlumeConfiguration: Processing:hdfs_sink
19/08/06 16:33:41 INFO conf.FlumeConfiguration: Processing:hdfs_sink
19/08/06 16:33:41 INFO conf.FlumeConfiguration: Processing:avro_source
19/08/06 16:33:41 INFO conf.FlumeConfiguration: Processing:hdfs_sink
19/08/06 16:33:41 INFO conf.FlumeConfiguration: Processing:hdfs_sink
19/08/06 16:33:41 WARN conf.FlumeConfiguration: Agent configuration for 'aggregation' has no configfilters.
19/08/06 16:33:41 INFO conf.FlumeConfiguration: Post-validation flume configuration contains configuration for agents: [aggregation]
19/08/06 16:33:41 INFO node.AbstractConfigurationProvider: Creating channels
19/08/06 16:33:41 INFO channel.DefaultChannelFactory: Creating instance of channel memory_channel type memory
19/08/06 16:33:41 INFO node.AbstractConfigurationProvider: Created channel memory_channel
19/08/06 16:33:41 INFO source.DefaultSourceFactory: Creating instance of source avro_source, type avro
19/08/06 16:33:41 INFO sink.DefaultSinkFactory: Creating instance of sink: hdfs_sink, type: hdfs
19/08/06 16:33:41 INFO node.AbstractConfigurationProvider: Channel memory_channel connected to [avro_source, hdfs_sink]
19/08/06 16:33:41 INFO node.Application: Starting new configuration:{ sourceRunners:{avro_source=EventDrivenSourceRunner: { source:Avro source avro_source: { bindAddress: Hexindai-C11-71, port: 6666 } }} sinkRunners:{hdfs_sink=SinkRunner: { policy:org.apache.flume.sink.DefaultSinkProcessor@377fa66b counterGroup:{ name:null counters:{} } }} channels:{memory_channel=org.apache.flume.channel.MemoryChannel{name: memory_channel}} }
19/08/06 16:33:41 INFO node.Application: Starting Channel memory_channel
19/08/06 16:33:41 INFO instrumentation.MonitoredCounterGroup: Monitored counter group for type: CHANNEL, name: memory_channel: Successfully registered new MBean.
19/08/06 16:33:41 INFO instrumentation.MonitoredCounterGroup: Component type: CHANNEL, name: memory_channel started
19/08/06 16:33:41 INFO node.Application: Starting Sink hdfs_sink
19/08/06 16:33:41 INFO node.Application: Starting Source avro_source
19/08/06 16:33:41 INFO source.AvroSource: Starting Avro source avro_source: { bindAddress: Hexindai-C11-71, port: 6666 }...
19/08/06 16:33:41 INFO instrumentation.MonitoredCounterGroup: Monitored counter group for type: SINK, name: hdfs_sink: Successfully registered new MBean.
19/08/06 16:33:41 INFO instrumentation.MonitoredCounterGroup: Component type: SINK, name: hdfs_sink started
19/08/06 16:33:41 INFO util.log: Logging initialized @608ms to org.eclipse.jetty.util.log.Slf4jLog
19/08/06 16:33:41 INFO server.Server: jetty-9.4.6.v20170531
19/08/06 16:33:41 INFO server.AbstractConnector: Started ServerConnector@5924daf5{HTTP/1.1,[http/1.1]}{0.0.0.0:10511}
19/08/06 16:33:41 INFO server.Server: Started @798ms
19/08/06 16:33:41 INFO instrumentation.MonitoredCounterGroup: Monitored counter group for type: SOURCE, name: avro_source: Successfully registered new MBean.
19/08/06 16:33:41 INFO instrumentation.MonitoredCounterGroup: Component type: SOURCE, name: avro_source started
19/08/06 16:33:41 INFO source.AvroSource: Avro source avro_source started.
^C
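Instead of watching tail -f until the startup lines scroll past, a script can check directly for the "Component type: ..., name: ... started" lines logged above. The start scripts append with >>, so a log can still hold lines from an earlier run. The following minimal sketch therefore only looks at lines after the last "Configuration provider starting". It assumes the log format shown above. The function names are hypothetical.

```shell
#!/bin/bash
# Print only the lines written since the agent last (re)started; the start scripts
# append with >>, so older runs' "started" lines may still be in the file.
last_run() {
  awk '/Configuration provider starting/ { buf = "" } { buf = buf $0 "\n" } END { printf "%s", buf }' "$1"
}

# Usage: flume_components_started <logfile> TYPE:name [TYPE:name ...]
# Succeeds only if every component logged "Component type: TYPE, name: name started" in the last run.
flume_components_started() {
  local log=$1 spec type name
  shift
  for spec in "$@"; do
    type=${spec%%:*}
    name=${spec#*:}
    if ! last_run "$log" | grep -qF "Component type: $type, name: $name started"; then
      echo "not started: $type $name" >&2
      return 1
    fi
  done
}
```

For the aggregation agent, the call would be `flume_components_started /export/data/flume/log/flume-aggregation.log CHANNEL:memory_channel SINK:hdfs_sink SOURCE:avro_source`.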
[root@Hexindai-C11-71 ~]# cat /export/data/flume/job/flume-my_netcat.conf
# Name the components on this agent
my_netcat.sources = netcat_source
my_netcat.sinks = avro_sink
my_netcat.channels = memory_channel

# Describe/configure the source
my_netcat.sources.netcat_source.type = netcat
my_netcat.sources.netcat_source.bind = Hexindai-C11-71
my_netcat.sources.netcat_source.port = 8888

# Describe the sink
my_netcat.sinks.avro_sink.type = avro
my_netcat.sinks.avro_sink.hostname = Hexindai-C11-71
my_netcat.sinks.avro_sink.port = 6666

# Use a channel which buffers events in memory
my_netcat.channels.memory_channel.type = memory
my_netcat.channels.memory_channel.capacity = 1000
my_netcat.channels.memory_channel.transactionCapacity = 100

# Bind the source and sink to the channel
my_netcat.sources.netcat_source.channels = memory_channel
my_netcat.sinks.avro_sink.channel = memory_channel
[root@Hexindai-C11-71 ~]# cat /export/data/flume/shell/start-my_netcat.sh
#!/bin/bash
#@author :wtnyihg
#blog:http://www.cnblogs.com/wtnyihg

# Send monitoring data to Ganglia. The Ganglia server address must be specified; make sure the Ganglia service is deployed before using this!
#nohup flume-ng agent -c /export/data/flume/job --conf-file=/export/data/flume/job/flume-my_netcat.conf --name my_netcat -Dflume.monitoring.type=ganglia -Dflume.monitoring.hosts=node105.wtnyihg.org.cn:8649 -Dflume.root.logger=INFO,console >> /export/data/flume/log/flume-ganglia-flume-my_netcat.log 2>&1 &

# Enable Flume's built-in HTTP monitoring; this command runs by default
nohup flume-ng agent -c /export/data/flume/job --conf-file=/export/data/flume/job/flume-my_netcat.conf --name my_netcat -Dflume.monitoring.type=http -Dflume.monitoring.port=10512 -Dflume.root.logger=INFO,console >> /export/data/flume/log/flume-my_netcat.log 2>&1 &
[root@Hexindai-C11-71 ~]# chmod +x /export/data/flume/shell/start-my_netcat.sh
[root@Hexindai-C11-71 ~]# ss -antlp|grep 10512
[root@Hexindai-C11-71 ~]# /export/data/flume/shell/start-my_netcat.sh
[root@Hexindai-C11-71 ~]# ss -antlp|grep 10512
LISTEN 0 50 *:10512 *:* users:(("java",pid=29301,fd=388))
[root@Hexindai-C11-71 ~]# tail -50f /export/data/flume/log/flume-my_netcat.log
Info: Including Hadoop libraries found via (/export/servers/hadoop-2.9.2/bin/hadoop) for HDFS access
Info: Including Hive libraries found via () for Hive access
+ exec /export/servers/jdk1.8.0_211/bin/java -Xmx20m -Dflume.monitoring.type=http -Dflume.monitoring.port=10512 -Dflume.root.logger=INFO,console -cp '/export/data/flume/job:/export/servers/apache-flume-1.9.0-bin/lib/*:/export/servers/hadoop-2.9.2/etc/hadoop:/export/servers/hadoop-2.9.2/share/hadoop/common/lib/*:/export/servers/hadoop-2.9.2/share/hadoop/common/*:/export/servers/hadoop-2.9.2/share/hadoop/hdfs:/export/servers/hadoop-2.9.2/share/hadoop/hdfs/lib/*:/export/servers/hadoop-2.9.2/share/hadoop/hdfs/*:/export/servers/hadoop-2.9.2/share/hadoop/yarn:/export/servers/hadoop-2.9.2/share/hadoop/yarn/lib/*:/export/servers/hadoop-2.9.2/share/hadoop/yarn/*:/export/servers/hadoop-2.9.2/share/hadoop/mapreduce/lib/*:/export/servers/hadoop-2.9.2/share/hadoop/mapreduce/*:/export/servers/hadoop-2.9.2/contrib/capacity-scheduler/*.jar:/lib/*' -Djava.library.path=:/export/servers/hadoop-2.9.2/lib/native org.apache.flume.node.Application --conf-file=/export/data/flume/job/flume-my_netcat.conf --name my_netcat
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/export/servers/apache-flume-1.9.0-bin/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/export/servers/hadoop-2.9.2/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
19/08/06 16:46:11 INFO node.PollingPropertiesFileConfigurationProvider: Configuration provider starting
19/08/06 16:46:11 INFO node.PollingPropertiesFileConfigurationProvider: Reloading configuration file:/export/data/flume/job/flume-my_netcat.conf
19/08/06 16:46:11 INFO conf.FlumeConfiguration: Processing:netcat_source
19/08/06 16:46:11 INFO conf.FlumeConfiguration: Added sinks: avro_sink Agent: my_netcat
19/08/06 16:46:11 INFO conf.FlumeConfiguration: Processing:netcat_source
19/08/06 16:46:11 INFO conf.FlumeConfiguration: Processing:avro_sink
19/08/06 16:46:11 INFO conf.FlumeConfiguration: Processing:avro_sink
19/08/06 16:46:11 INFO conf.FlumeConfiguration: Processing:netcat_source
19/08/06 16:46:11 INFO conf.FlumeConfiguration: Processing:avro_sink
19/08/06 16:46:11 INFO conf.FlumeConfiguration: Processing:memory_channel
19/08/06 16:46:11 INFO conf.FlumeConfiguration: Processing:netcat_source
19/08/06 16:46:11 INFO conf.FlumeConfiguration: Processing:avro_sink
19/08/06 16:46:11 INFO conf.FlumeConfiguration: Processing:memory_channel
19/08/06 16:46:11 INFO conf.FlumeConfiguration: Processing:memory_channel
19/08/06 16:46:11 WARN conf.FlumeConfiguration: Agent configuration for 'my_netcat' has no configfilters.
19/08/06 16:46:11 INFO conf.FlumeConfiguration: Post-validation flume configuration contains configuration for agents: [my_netcat]
19/08/06 16:46:11 INFO node.AbstractConfigurationProvider: Creating channels
19/08/06 16:46:11 INFO channel.DefaultChannelFactory: Creating instance of channel memory_channel type memory
19/08/06 16:46:11 INFO node.AbstractConfigurationProvider: Created channel memory_channel
19/08/06 16:46:11 INFO source.DefaultSourceFactory: Creating instance of source netcat_source, type netcat
19/08/06 16:46:11 INFO sink.DefaultSinkFactory: Creating instance of sink: avro_sink, type: avro
19/08/06 16:46:11 INFO sink.AbstractRpcSink: Connection reset is set to 0. Will not reset connection to next hop
19/08/06 16:46:11 INFO node.AbstractConfigurationProvider: Channel memory_channel connected to [netcat_source, avro_sink]
19/08/06 16:46:11 INFO node.Application: Starting new configuration:{ sourceRunners:{netcat_source=EventDrivenSourceRunner: { source:org.apache.flume.source.NetcatSource{name:netcat_source,state:IDLE} }} sinkRunners:{avro_sink=SinkRunner: { policy:org.apache.flume.sink.DefaultSinkProcessor@3ec4b723 counterGroup:{ name:null counters:{} } }} channels:{memory_channel=org.apache.flume.channel.MemoryChannel{name: memory_channel}} }
19/08/06 16:46:11 INFO node.Application: Starting Channel memory_channel
19/08/06 16:46:11 INFO instrumentation.MonitoredCounterGroup: Monitored counter group for type: CHANNEL, name: memory_channel: Successfully registered new MBean.
19/08/06 16:46:11 INFO instrumentation.MonitoredCounterGroup: Component type: CHANNEL, name: memory_channel started
19/08/06 16:46:11 INFO node.Application: Starting Sink avro_sink
19/08/06 16:46:11 INFO sink.AbstractRpcSink: Starting RpcSink avro_sink { host: Hexindai-C11-71, port: 6666 }...
19/08/06 16:46:11 INFO node.Application: Starting Source netcat_source
19/08/06 16:46:11 INFO source.NetcatSource: Source starting
19/08/06 16:46:11 INFO instrumentation.MonitoredCounterGroup: Monitored counter group for type: SINK, name: avro_sink: Successfully registered new MBean.
19/08/06 16:46:11 INFO instrumentation.MonitoredCounterGroup: Component type: SINK, name: avro_sink started
19/08/06 16:46:11 INFO sink.AbstractRpcSink: Rpc sink avro_sink: Building RpcClient with hostname: Hexindai-C11-71, port: 6666
19/08/06 16:46:11 INFO sink.AvroSink: Attempting to create Avro Rpc client.
19/08/06 16:46:11 INFO api.NettyAvroRpcClient: Using default maxIOWorkers
19/08/06 16:46:11 INFO source.NetcatSource: Created serverSocket:sun.nio.ch.ServerSocketChannelImpl[/172.20.11.71:8888]
19/08/06 16:46:12 INFO util.log: Logging initialized @638ms to org.eclipse.jetty.util.log.Slf4jLog
19/08/06 16:46:12 INFO server.Server: jetty-9.4.6.v20170531
19/08/06 16:46:12 INFO server.AbstractConnector: Started ServerConnector@5ffb42c4{HTTP/1.1,[http/1.1]}{0.0.0.0:10512}
19/08/06 16:46:12 INFO server.Server: Started @760ms
19/08/06 16:46:12 INFO sink.AbstractRpcSink: Rpc sink avro_sink started.
^C
[root@Hexindai-C11-71 ~]# cat /export/data/flume/job/flume-my_exec.conf
# Name the components on this agent
my_exec.sources = exec_source
my_exec.sinks = avro_sink
my_exec.channels = memory_channel

# Describe/configure the source
my_exec.sources.exec_source.type = exec
my_exec.sources.exec_source.command = tail -F /export/data/flume/blog.txt
my_exec.sources.exec_source.shell = /bin/bash -c

# Describe the sink
my_exec.sinks.avro_sink.type = avro
my_exec.sinks.avro_sink.hostname = Hexindai-C11-71
my_exec.sinks.avro_sink.port = 6666

# Describe the channel
my_exec.channels.memory_channel.type = memory
my_exec.channels.memory_channel.capacity = 1000
my_exec.channels.memory_channel.transactionCapacity = 100

# Bind the source and sink to the channel
my_exec.sources.exec_source.channels = memory_channel
my_exec.sinks.avro_sink.channel = memory_channel
[root@Hexindai-C11-71 ~]# cat /export/data/flume/shell/start-my_exec.sh
#!/bin/bash
#@author :wtnyihg
#blog:http://www.cnblogs.com/wtnyihg

# Send monitoring data to Ganglia. The Ganglia server address must be specified; make sure the Ganglia service is deployed before using this!
#nohup flume-ng agent -c /export/data/flume/job --conf-file=/export/data/flume/job/flume-my_exec.conf --name my_exec -Dflume.monitoring.type=ganglia -Dflume.monitoring.hosts=node105.wtnyihg.org.cn:8649 -Dflume.root.logger=INFO,console >> /export/data/flume/log/flume-ganglia-flume-my_exec.log 2>&1 &

# Enable Flume's built-in HTTP monitoring; this command runs by default
nohup flume-ng agent -c /export/data/flume/job --conf-file=/export/data/flume/job/flume-my_exec.conf --name my_exec -Dflume.monitoring.type=http -Dflume.monitoring.port=10513 -Dflume.root.logger=INFO,console >> /export/data/flume/log/flume-my_exec.log 2>&1 &
[root@Hexindai-C11-71 ~]# chmod +x /export/data/flume/shell/start-my_exec.sh
[root@Hexindai-C11-71 ~]# /export/data/flume/shell/start-my_exec.sh
[root@Hexindai-C11-71 ~]# ss -antlp|grep 10513
LISTEN 0 50 *:10513 *:* users:(("java",pid=29921,fd=359))
[root@Hexindai-C11-71 ~]# tail -50f /export/data/flume/log/flume-my_exec.log
Info: Including Hive libraries found via () for Hive access
+ exec /export/servers/jdk1.8.0_211/bin/java -Xmx20m -Dflume.monitoring.type=http -Dflume.monitoring.port=10513 -Dflume.root.logger=INFO,console -cp '/export/data/flume/job:/export/servers/apache-flume-1.9.0-bin/lib/*:/export/servers/hadoop-2.9.2/etc/hadoop:/export/servers/hadoop-2.9.2/share/hadoop/common/lib/*:/export/servers/hadoop-2.9.2/share/hadoop/common/*:/export/servers/hadoop-2.9.2/share/hadoop/hdfs:/export/servers/hadoop-2.9.2/share/hadoop/hdfs/lib/*:/export/servers/hadoop-2.9.2/share/hadoop/hdfs/*:/export/servers/hadoop-2.9.2/share/hadoop/yarn:/export/servers/hadoop-2.9.2/share/hadoop/yarn/lib/*:/export/servers/hadoop-2.9.2/share/hadoop/yarn/*:/export/servers/hadoop-2.9.2/share/hadoop/mapreduce/lib/*:/export/servers/hadoop-2.9.2/share/hadoop/mapreduce/*:/export/servers/hadoop-2.9.2/contrib/capacity-scheduler/*.jar:/lib/*' -Djava.library.path=:/export/servers/hadoop-2.9.2/lib/native org.apache.flume.node.Application --conf-file=/export/data/flume/job/flume-my_exec.conf --name my_exec
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/export/servers/apache-flume-1.9.0-bin/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/export/servers/hadoop-2.9.2/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
19/08/06 17:08:43 INFO node.PollingPropertiesFileConfigurationProvider: Configuration provider starting
19/08/06 17:08:43 INFO node.PollingPropertiesFileConfigurationProvider: Reloading configuration file:/export/data/flume/job/flume-my_exec.conf
19/08/06 17:08:43 INFO conf.FlumeConfiguration: Processing:memory_channel
19/08/06 17:08:43 INFO conf.FlumeConfiguration: Added sinks: avro_sink Agent: my_exec
19/08/06 17:08:43 INFO conf.FlumeConfiguration: Processing:exec_source
19/08/06 17:08:43 INFO conf.FlumeConfiguration: Processing:avro_sink
19/08/06 17:08:43 INFO conf.FlumeConfiguration: Processing:avro_sink
19/08/06 17:08:43 INFO conf.FlumeConfiguration: Processing:avro_sink
19/08/06 17:08:43 INFO conf.FlumeConfiguration: Processing:exec_source
19/08/06 17:08:43 INFO conf.FlumeConfiguration: Processing:avro_sink
19/08/06 17:08:43 INFO conf.FlumeConfiguration: Processing:memory_channel
19/08/06 17:08:43 INFO conf.FlumeConfiguration: Processing:exec_source
19/08/06 17:08:43 INFO conf.FlumeConfiguration: Processing:exec_source
19/08/06 17:08:43 INFO conf.FlumeConfiguration: Processing:memory_channel
19/08/06 17:08:43 WARN conf.FlumeConfiguration: Agent configuration for 'my_exec' has no configfilters.
19/08/06 17:08:43 INFO conf.FlumeConfiguration: Post-validation flume configuration contains configuration for agents: [my_exec]
19/08/06 17:08:43 INFO node.AbstractConfigurationProvider: Creating channels
19/08/06 17:08:43 INFO channel.DefaultChannelFactory: Creating instance of channel memory_channel type memory
19/08/06 17:08:43 INFO node.AbstractConfigurationProvider: Created channel memory_channel
19/08/06 17:08:43 INFO source.DefaultSourceFactory: Creating instance of source exec_source, type exec
19/08/06 17:08:43 INFO sink.DefaultSinkFactory: Creating instance of sink: avro_sink, type: avro
19/08/06 17:08:43 INFO sink.AbstractRpcSink: Connection reset is set to 0. Will not reset connection to next hop
19/08/06 17:08:43 INFO node.AbstractConfigurationProvider: Channel memory_channel connected to [exec_source, avro_sink]
19/08/06 17:08:43 INFO node.Application: Starting new configuration:{ sourceRunners:{exec_source=EventDrivenSourceRunner: { source:org.apache.flume.source.ExecSource{name:exec_source,state:IDLE} }} sinkRunners:{avro_sink=SinkRunner: { policy:org.apache.flume.sink.DefaultSinkProcessor@464c27fc counterGroup:{ name:null counters:{} } }} channels:{memory_channel=org.apache.flume.channel.MemoryChannel{name: memory_channel}} }
19/08/06 17:08:43 INFO node.Application: Starting Channel memory_channel
19/08/06 17:08:43 INFO instrumentation.MonitoredCounterGroup: Monitored counter group for type: CHANNEL, name: memory_channel: Successfully registered new MBean.
19/08/06 17:08:43 INFO instrumentation.MonitoredCounterGroup: Component type: CHANNEL, name: memory_channel started
19/08/06 17:08:43 INFO node.Application: Starting Sink avro_sink
19/08/06 17:08:43 INFO node.Application: Starting Source exec_source
19/08/06 17:08:43 INFO source.ExecSource: Exec source starting with command: tail -F /export/data/flume/blog.txt
19/08/06 17:08:43 INFO instrumentation.MonitoredCounterGroup: Monitored counter group for type: SOURCE, name: exec_source: Successfully registered new MBean.
19/08/06 17:08:43 INFO instrumentation.MonitoredCounterGroup: Component type: SOURCE, name: exec_source started
19/08/06 17:08:43 INFO sink.AbstractRpcSink: Starting RpcSink avro_sink { host: Hexindai-C11-71, port: 6666 }...
19/08/06 17:08:43 INFO instrumentation.MonitoredCounterGroup: Monitored counter group for type: SINK, name: avro_sink: Successfully registered new MBean.
19/08/06 17:08:43 INFO instrumentation.MonitoredCounterGroup: Component type: SINK, name: avro_sink started
19/08/06 17:08:43 INFO sink.AbstractRpcSink: Rpc sink avro_sink: Building RpcClient with hostname: Hexindai-C11-71, port: 6666
19/08/06 17:08:43 INFO sink.AvroSink: Attempting to create Avro Rpc client.
19/08/06 17:08:43 INFO util.log: Logging initialized @611ms to org.eclipse.jetty.util.log.Slf4jLog
19/08/06 17:08:43 INFO api.NettyAvroRpcClient: Using default maxIOWorkers
19/08/06 17:08:43 INFO server.Server: jetty-9.4.6.v20170531
19/08/06 17:08:43 INFO server.AbstractConnector: Started ServerConnector@2e48b08d{HTTP/1.1,[http/1.1]}{0.0.0.0:10513}
19/08/06 17:08:43 INFO server.Server: Started @735ms
19/08/06 17:08:43 INFO sink.AbstractRpcSink: Rpc sink avro_sink started.
^C
[root@Hexindai-C11-71 ~]# telnet Hexindai-C11-71 8888
Trying 172.20.11.71...
Connected to Hexindai-C11-71.
Escape character is '^]'.
this is a first test!
OK
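The interactive telnet session above can also be scripted. The netcat source acknowledges each received line with "OK", because its ack-every-event option defaults to true. The following sketch uses bash's /dev/tcp to send one line and check that reply. netcat_send is a hypothetical name. The default host and port come from flume-my_netcat.conf above.

```shell
#!/bin/bash
# Send one line to a Flume netcat source and print its reply; succeed only if the reply is "OK".
# Defaults to the my_netcat source above (Hexindai-C11-71:8888). Needs bash for /dev/tcp and read -t.
netcat_send() {
  local line=$1 host=${2:-Hexindai-C11-71} port=${3:-8888} reply
  reply=$(
    # Silence stderr first so a refused connection does not print a bash error
    exec 2>/dev/null 3<>"/dev/tcp/$host/$port" || exit 1
    printf '%s\n' "$line" >&3
    # Wait up to 5 s for the one-line acknowledgement
    IFS= read -r -t 5 ack <&3 && printf '%s' "$ack"
  ) || return 1
  reply=${reply%$'\r'}            # tolerate a CRLF line ending
  printf '%s\n' "$reply"
  [ "$reply" = "OK" ]
}
```

For example, `netcat_send "this is a first test!"` should print OK, like the telnet session, when my_netcat is running.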
[root@Hexindai-C11-71 ~]# echo "https://www.cnblogs.com/wtnyihg" >> /export/data/flume/blog.txt [root@Hexindai-C11-71 ~]#
[root@wjf-C11-71 ~]# hdfs dfs -cat /flume/20190806/17/172.20.11.73-.1565082663855.tmp    # view the 2 test records written above
this is a first test!
https://www.cnblogs.com/wtnyihg
5、Channel selector examples
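channel selector: decides which channel(s) each event is sent to.
(1) Replicating Channel Selector: the default selector. Every event is replicated, so one copy is sent to each channel the source is bound to.
(2) Multiplexing Channel Selector: routes different events to different channels, based on the value of an event header.
For details, see: http://flume.apache.org/releases/content/1.9.0/FlumeUserGuide.html#flume-channel-selectors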
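A multiplexing selector reads one event header and maps its values to channels. The following sketch shows an upstream agent that could feed the two receivers configured in this example (my_avro on port 8888 and my_file_roll on port 9999). Several names are hypothetical: the agent name my_selector, its netcat port 7777, the header name route, and the hdfs:/file: line tags. Netcat events carry no headers, so a regex_extractor interceptor is assumed to set route. The sketch is a shell function that writes the config in the same layout as the other job files.

```shell
#!/bin/bash
# Hypothetical upstream agent "my_selector": lines typed into netcat port 7777 that start with
# "hdfs:" go to my_avro (port 8888, HDFS); everything else goes to my_file_roll (port 9999).
# Usage: write_selector_conf <job-dir>   (e.g. /export/data/flume/job)
write_selector_conf() {
  cat > "$1/flume-my_selector.conf" <<'EOF'
my_selector.sources = netcat_source
my_selector.channels = hdfs_channel file_channel
my_selector.sinks = hdfs_avro_sink file_avro_sink

my_selector.sources.netcat_source.type = netcat
my_selector.sources.netcat_source.bind = Hexindai-C11-71
my_selector.sources.netcat_source.port = 7777
my_selector.sources.netcat_source.channels = hdfs_channel file_channel

# Copy the leading "hdfs" or "file" tag of each line into an event header named "route"
my_selector.sources.netcat_source.interceptors = tag
my_selector.sources.netcat_source.interceptors.tag.type = regex_extractor
my_selector.sources.netcat_source.interceptors.tag.regex = ^(hdfs|file):
my_selector.sources.netcat_source.interceptors.tag.serializers = s1
my_selector.sources.netcat_source.interceptors.tag.serializers.s1.name = route

# Multiplexing selector: choose the channel from the "route" header
my_selector.sources.netcat_source.selector.type = multiplexing
my_selector.sources.netcat_source.selector.header = route
my_selector.sources.netcat_source.selector.mapping.hdfs = hdfs_channel
my_selector.sources.netcat_source.selector.mapping.file = file_channel
my_selector.sources.netcat_source.selector.default = file_channel

my_selector.channels.hdfs_channel.type = memory
my_selector.channels.file_channel.type = memory

my_selector.sinks.hdfs_avro_sink.type = avro
my_selector.sinks.hdfs_avro_sink.hostname = Hexindai-C11-71
my_selector.sinks.hdfs_avro_sink.port = 8888
my_selector.sinks.hdfs_avro_sink.channel = hdfs_channel

my_selector.sinks.file_avro_sink.type = avro
my_selector.sinks.file_avro_sink.hostname = Hexindai-C11-71
my_selector.sinks.file_avro_sink.port = 9999
my_selector.sinks.file_avro_sink.channel = file_channel
EOF
}
```

Replacing the selector block with `selector.type = replicating` (the default) would instead send every line to both receivers.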
#Multiplexing flow
[root@Hexindai-C11-71 ~]# cat /export/data/flume/job/flume-my_avro.conf
# Name the components on this agent
my_avro.sources = avro_source
my_avro.sinks = hdfs_sink
my_avro.channels = memory_channel

# Describe/configure the source
my_avro.sources.avro_source.type = avro
my_avro.sources.avro_source.bind = Hexindai-C11-71
my_avro.sources.avro_source.port = 8888

# Define the HDFS sink
my_avro.sinks.hdfs_sink.type = hdfs
my_avro.sinks.hdfs_sink.hdfs.path = hdfs://Hexindai-C11-71:8020/flume/%Y%m%d/%H
# Prefix of the uploaded files
my_avro.sinks.hdfs_sink.hdfs.filePrefix = 172.20.11.71-
# Whether to roll directories by time
my_avro.sinks.hdfs_sink.hdfs.round = true
# How many time units before a new directory is created
my_avro.sinks.hdfs_sink.hdfs.roundValue = 1
# Time unit used for rounding
my_avro.sinks.hdfs_sink.hdfs.roundUnit = hour
# Whether to use the local timestamp
my_avro.sinks.hdfs_sink.hdfs.useLocalTimeStamp = true
# Number of events accumulated before each flush to HDFS
my_avro.sinks.hdfs_sink.hdfs.batchSize = 100
# File type (compressed types are also supported)
my_avro.sinks.hdfs_sink.hdfs.fileType = DataStream
# How often (in seconds) to roll a new file
my_avro.sinks.hdfs_sink.hdfs.rollInterval = 60
# Roll each file at roughly 128 MB
my_avro.sinks.hdfs_sink.hdfs.rollSize = 134210000
# File rolling does not depend on the number of events
my_avro.sinks.hdfs_sink.hdfs.rollCount = 0
# Minimum number of block replicas
my_avro.sinks.hdfs_sink.hdfs.minBlockReplicas = 1

# Describe the channel
my_avro.channels.memory_channel.type = memory
my_avro.channels.memory_channel.capacity = 1000
my_avro.channels.memory_channel.transactionCapacity = 100

# Bind the source and sink to the channel
my_avro.sources.avro_source.channels = memory_channel
my_avro.sinks.hdfs_sink.channel = memory_channel
[root@Hexindai-C11-71 ~]# cat /export/data/flume/shell/start-my_avro.sh
#!/bin/bash
#@author :wtnyihg
#blog:http://www.cnblogs.com/wtnyihg

# Send monitoring data to Ganglia. The Ganglia server address must be specified; make sure the Ganglia service is deployed before using this!
#nohup flume-ng agent -c /export/data/flume/job --conf-file=/export/data/flume/job/flume-my_avro.conf --name my_avro -Dflume.monitoring.type=ganglia -Dflume.monitoring.hosts=node105.wtnyihg.org.cn:8649 -Dflume.root.logger=INFO,console >> /export/data/flume/log/flume-ganglia-flume-my_avro.log 2>&1 &

# Enable Flume's built-in HTTP monitoring; this command runs by default
nohup flume-ng agent -c /export/data/flume/job --conf-file=/export/data/flume/job/flume-my_avro.conf --name my_avro -Dflume.monitoring.type=http -Dflume.monitoring.port=10514 -Dflume.root.logger=INFO,console >> /export/data/flume/log/flume-my_avro.log 2>&1 &
[root@Hexindai-C11-71 ~]# chmod +x /export/data/flume/shell/start-my_avro.sh
[root@Hexindai-C11-71 ~]# /export/data/flume/shell/start-my_avro.sh
[root@Hexindai-C11-71 ~]# ss -antlp|grep 10514
LISTEN 0 50 *:10514 *:* users:(("java",pid=31472,fd=358))
[root@Hexindai-C11-71 ~]# tail -50f /export/data/flume/log/flume-my_avro.log
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
19/08/06 17:57:21 INFO node.PollingPropertiesFileConfigurationProvider: Configuration provider starting
19/08/06 17:57:21 INFO node.PollingPropertiesFileConfigurationProvider: Reloading configuration file:/export/data/flume/job/flume-my_avro.conf
19/08/06 17:57:21 INFO conf.FlumeConfiguration: Processing:avro_source
19/08/06 17:57:21 INFO conf.FlumeConfiguration: Processing:hdfs_sink
19/08/06 17:57:21 INFO conf.FlumeConfiguration: Processing:hdfs_sink
19/08/06 17:57:21 INFO conf.FlumeConfiguration: Processing:avro_source
19/08/06 17:57:21 INFO conf.FlumeConfiguration: Processing:hdfs_sink
19/08/06 17:57:21 INFO conf.FlumeConfiguration: Processing:hdfs_sink
19/08/06 17:57:21 INFO conf.FlumeConfiguration: Added sinks: hdfs_sink Agent: my_avro
19/08/06 17:57:21 INFO conf.FlumeConfiguration: Processing:memory_channel
19/08/06 17:57:21 INFO conf.FlumeConfiguration: Processing:hdfs_sink
19/08/06 17:57:21 INFO conf.FlumeConfiguration: Processing:hdfs_sink
19/08/06 17:57:21 INFO conf.FlumeConfiguration: Processing:memory_channel
19/08/06 17:57:21 INFO conf.FlumeConfiguration: Processing:hdfs_sink
19/08/06 17:57:21 INFO conf.FlumeConfiguration: Processing:hdfs_sink
19/08/06 17:57:21 INFO conf.FlumeConfiguration: Processing:hdfs_sink
19/08/06 17:57:21 INFO conf.FlumeConfiguration: Processing:avro_source
19/08/06 17:57:21 INFO conf.FlumeConfiguration: Processing:hdfs_sink
19/08/06 17:57:21 INFO conf.FlumeConfiguration: Processing:avro_source
19/08/06 17:57:21 INFO conf.FlumeConfiguration: Processing:hdfs_sink
19/08/06 17:57:21 INFO conf.FlumeConfiguration: Processing:hdfs_sink
19/08/06 17:57:21 INFO conf.FlumeConfiguration: Processing:hdfs_sink
19/08/06 17:57:21 INFO conf.FlumeConfiguration: Processing:hdfs_sink
19/08/06 17:57:21 INFO conf.FlumeConfiguration: Processing:memory_channel
19/08/06 17:57:21 WARN conf.FlumeConfiguration: Agent configuration for 'my_avro' has no configfilters.
19/08/06 17:57:21 INFO conf.FlumeConfiguration: Post-validation flume configuration contains configuration for agents: [my_avro]
19/08/06 17:57:21 INFO node.AbstractConfigurationProvider: Creating channels
19/08/06 17:57:21 INFO channel.DefaultChannelFactory: Creating instance of channel memory_channel type memory
19/08/06 17:57:21 INFO node.AbstractConfigurationProvider: Created channel memory_channel
19/08/06 17:57:21 INFO source.DefaultSourceFactory: Creating instance of source avro_source, type avro
19/08/06 17:57:21 INFO sink.DefaultSinkFactory: Creating instance of sink: hdfs_sink, type: hdfs
19/08/06 17:57:21 INFO node.AbstractConfigurationProvider: Channel memory_channel connected to [avro_source, hdfs_sink]
19/08/06 17:57:21 INFO node.Application: Starting new configuration:{ sourceRunners:{avro_source=EventDrivenSourceRunner: { source:Avro source avro_source: { bindAddress: Hexindai-C11-71, port: 8888 } }} sinkRunners:{hdfs_sink=SinkRunner: { policy:org.apache.flume.sink.DefaultSinkProcessor@3e7c1790 counterGroup:{ name:null counters:{} } }} channels:{memory_channel=org.apache.flume.channel.MemoryChannel{name: memory_channel}} }
19/08/06 17:57:21 INFO node.Application: Starting Channel memory_channel
19/08/06 17:57:22 INFO instrumentation.MonitoredCounterGroup: Monitored counter group for type: CHANNEL, name: memory_channel: Successfully registered new MBean.
19/08/06 17:57:22 INFO instrumentation.MonitoredCounterGroup: Component type: CHANNEL, name: memory_channel started
19/08/06 17:57:22 INFO node.Application: Starting Sink hdfs_sink
19/08/06 17:57:22 INFO node.Application: Starting Source avro_source
19/08/06 17:57:22 INFO instrumentation.MonitoredCounterGroup: Monitored counter group for type: SINK, name: hdfs_sink: Successfully registered new MBean.
19/08/06 17:57:22 INFO instrumentation.MonitoredCounterGroup: Component type: SINK, name: hdfs_sink started
19/08/06 17:57:22 INFO source.AvroSource: Starting Avro source avro_source: { bindAddress: Hexindai-C11-71, port: 8888 }...
19/08/06 17:57:22 INFO util.log: Logging initialized @592ms to org.eclipse.jetty.util.log.Slf4jLog
19/08/06 17:57:22 INFO server.Server: jetty-9.4.6.v20170531
19/08/06 17:57:22 INFO server.AbstractConnector: Started ServerConnector@4c9f3e39{HTTP/1.1,[http/1.1]}{0.0.0.0:10514}
19/08/06 17:57:22 INFO server.Server: Started @796ms
19/08/06 17:57:22 INFO instrumentation.MonitoredCounterGroup: Monitored counter group for type: SOURCE, name: avro_source: Successfully registered new MBean.
19/08/06 17:57:22 INFO instrumentation.MonitoredCounterGroup: Component type: SOURCE, name: avro_source started
19/08/06 17:57:22 INFO source.AvroSource: Avro source avro_source started.
^C
[root@Hexindai-C11-71 ~]# cat /export/data/flume/job/flume-my_file_roll.conf
# Name the components on this agent
my_file_roll.sources = avro_source
my_file_roll.sinks = file_roll_sink
my_file_roll.channels = memory_channel

# Describe/configure the source
my_file_roll.sources.avro_source.type = avro
my_file_roll.sources.avro_source.bind = Hexindai-C11-71
my_file_roll.sources.avro_source.port = 9999

# Describe the sink
my_file_roll.sinks.file_roll_sink.type = file_roll
# The local output directory must already exist; if it does not, it will not be created.
my_file_roll.sinks.file_roll_sink.sink.directory = /export/data/flume/output

# Describe the channel
my_file_roll.channels.memory_channel.type = memory
my_file_roll.channels.memory_channel.capacity = 1000
my_file_roll.channels.memory_channel.transactionCapacity = 100

# Bind the source and sink to the channel
my_file_roll.sources.avro_source.channels = memory_channel
my_file_roll.sinks.file_roll_sink.channel = memory_channel
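The comment in this config says file_roll will not create sink.directory itself. A start script can therefore create the directory up front, which the manual mkdir in the next step does by hand. The following hypothetical helper reads every *.sink.directory entry from a config file and creates it. It assumes one property per line in the key = value form used throughout this section.

```shell
#!/bin/bash
# Hypothetical pre-start check: create every "<agent>.sinks.<sink>.sink.directory" path
# named in a Flume config file, ignoring commented-out lines.
# Usage: ensure_file_roll_dirs <config-file>
ensure_file_roll_dirs() {
  local conf=$1
  # First strip trailing whitespace, then print only the value of *.sink.directory lines
  sed -n -e 's/[[:space:]]*$//' \
         -e 's/^[[:space:]]*[^#][^=]*\.sink\.directory[[:space:]]*=[[:space:]]*//p' "$conf" |
  while IFS= read -r dir; do
    [ -n "$dir" ] && mkdir -p "$dir" && echo "directory ready: $dir"
  done
}
```

Called as `ensure_file_roll_dirs /export/data/flume/job/flume-my_file_roll.conf` before start-my_file_roll.sh, it would create /export/data/flume/output if it is missing.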
[root@Hexindai-C11-71 ~]# cat /export/data/flume/shell/start-my_file_roll.sh
#!/bin/bash
#@author :wtnyihg
#blog:http://www.cnblogs.com/wtnyihg

# Send monitoring data to Ganglia. The Ganglia server address must be specified; make sure the Ganglia service is deployed before using this!
#nohup flume-ng agent -c /export/data/flume/job --conf-file=/export/data/flume/job/flume-my_file_roll.conf --name my_file_roll -Dflume.monitoring.type=ganglia -Dflume.monitoring.hosts=node105.wtnyihg.org.cn:8649 -Dflume.root.logger=INFO,console >> /export/data/flume/log/flume-ganglia-flume-my_file_roll.log 2>&1 &

# Enable Flume's built-in HTTP monitoring; this command runs by default
nohup flume-ng agent -c /export/data/flume/job --conf-file=/export/data/flume/job/flume-my_file_roll.conf --name my_file_roll -Dflume.monitoring.type=http -Dflume.monitoring.port=10515 -Dflume.root.logger=INFO,console >> /export/data/flume/log/flume-my_file_roll.log 2>&1 &
[root@Hexindai-C11-71 ~]# chmod +x /export/data/flume/shell/start-my_file_roll.sh
[root@Hexindai-C11-71 ~]# mkdir /export/data/flume/output
[root@Hexindai-C11-71 ~]# /export/data/flume/shell/start-my_file_roll.sh
[root@Hexindai-C11-71 ~]# ss -antlp|grep 10515
LISTEN 0 50 *:10515 *:* users:(("java",pid=1031,fd=359))
[root@Hexindai-C11-71 ~]# tail -50f /export/data/flume/log/flume-my_file_roll.log
19/08/06 18:32:14 INFO server.AbstractConnector: Stopped ServerConnector@830c7d7{HTTP/1.1,[http/1.1]}{0.0.0.0:10515}
19/08/06 18:32:14 INFO lifecycle.LifecycleSupervisor: Stopping lifecycle supervisor 11
19/08/06 18:32:14 INFO node.PollingPropertiesFileConfigurationProvider: Configuration provider stopping
Info: Including Hadoop libraries found via (/export/servers/hadoop-2.9.2/bin/hadoop) for HDFS access
Info: Including Hive libraries found via () for Hive access
+ exec /export/servers/jdk1.8.0_211/bin/java -Xmx20m -Dflume.monitoring.type=http -Dflume.monitoring.port=10515 -Dflume.root.logger=INFO,console -cp '/export/data/flume/job:/export/servers/apache-flume-1.9.0-bin/lib/*:/export/servers/hadoop-2.9.2/etc/hadoop:/export/servers/hadoop-2.9.2/share/hadoop/common/lib/*:/export/servers/hadoop-2.9.2/share/hadoop/common/*:/export/servers/hadoop-2.9.2/share/hadoop/hdfs:/export/servers/hadoop-2.9.2/share/hadoop/hdfs/lib/*:/export/servers/hadoop-2.9.2/share/hadoop/hdfs/*:/export/servers/hadoop-2.9.2/share/hadoop/yarn:/export/servers/hadoop-2.9.2/share/hadoop/yarn/lib/*:/export/servers/hadoop-2.9.2/share/hadoop/yarn/*:/export/servers/hadoop-2.9.2/share/hadoop/mapreduce/lib/*:/export/servers/hadoop-2.9.2/share/hadoop/mapreduce/*:/export/servers/hadoop-2.9.2/contrib/capacity-scheduler/*.jar:/lib/*' -Djava.library.path=:/export/servers/hadoop-2.9.2/lib/native org.apache.flume.node.Application --conf-file=/export/data/flume/job/flume-my_file_roll.conf --name my_file_roll
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/export/servers/apache-flume-1.9.0-bin/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/export/servers/hadoop-2.9.2/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
19/08/06 18:32:27 INFO node.PollingPropertiesFileConfigurationProvider: Configuration provider starting
19/08/06 18:32:27 INFO node.PollingPropertiesFileConfigurationProvider: Reloading configuration file:/export/data/flume/job/flume-my_file_roll.conf
19/08/06 18:32:27 INFO conf.FlumeConfiguration: Processing:file_roll_sink
19/08/06 18:32:27 INFO conf.FlumeConfiguration: Processing:file_roll_sink
19/08/06 18:32:27 INFO conf.FlumeConfiguration: Processing:memory_channel
19/08/06 18:32:27 INFO conf.FlumeConfiguration: Processing:file_roll_sink
19/08/06 18:32:27 INFO conf.FlumeConfiguration: Added sinks: file_roll_sink Agent: my_file_roll
19/08/06 18:32:27 INFO conf.FlumeConfiguration: Processing:avro_source
19/08/06 18:32:27 INFO conf.FlumeConfiguration: Processing:memory_channel
19/08/06 18:32:27 INFO conf.FlumeConfiguration: Processing:avro_source
19/08/06 18:32:27 INFO conf.FlumeConfiguration: Processing:memory_channel
19/08/06 18:32:27 INFO conf.FlumeConfiguration: Processing:avro_source
19/08/06 18:32:27 INFO conf.FlumeConfiguration: Processing:avro_source
19/08/06 18:32:27 WARN conf.FlumeConfiguration: Agent configuration for 'my_file_roll' has no configfilters.
19/08/06 18:32:27 INFO conf.FlumeConfiguration: Post-validation flume configuration contains configuration for agents: [my_file_roll]
19/08/06 18:32:27 INFO node.AbstractConfigurationProvider: Creating channels
19/08/06 18:32:27 INFO channel.DefaultChannelFactory: Creating instance of channel memory_channel type memory
19/08/06 18:32:27 INFO node.AbstractConfigurationProvider: Created channel memory_channel
19/08/06 18:32:27 INFO source.DefaultSourceFactory: Creating instance of source avro_source, type avro
19/08/06 18:32:27 INFO sink.DefaultSinkFactory: Creating instance of sink: file_roll_sink, type: file_roll
19/08/06 18:32:27 INFO node.AbstractConfigurationProvider: Channel memory_channel connected to [avro_source, file_roll_sink]
19/08/06 18:32:27 INFO node.Application: Starting new configuration:{ sourceRunners:{avro_source=EventDrivenSourceRunner: { source:Avro source avro_source: { bindAddress: Hexindai-C11-71, port: 9999 } }} sinkRunners:{file_roll_sink=SinkRunner: { policy:org.apache.flume.sink.DefaultSinkProcessor@39fe1074 counterGroup:{ name:null counters:{} } }} channels:{memory_channel=org.apache.flume.channel.MemoryChannel{name: memory_channel}} }
19/08/06 18:32:27 INFO node.Application: Starting Channel memory_channel
19/08/06 18:32:27 INFO instrumentation.MonitoredCounterGroup: Monitored counter group for type: CHANNEL, name: memory_channel: Successfully registered new MBean.
19/08/06 18:32:27 INFO instrumentation.MonitoredCounterGroup: Component type: CHANNEL, name: memory_channel started
19/08/06 18:32:27 INFO node.Application: Starting Sink file_roll_sink
19/08/06 18:32:27 INFO sink.RollingFileSink: Starting org.apache.flume.sink.RollingFileSink{name:file_roll_sink, channel:memory_channel}...
19/08/06 18:32:27 INFO instrumentation.MonitoredCounterGroup: Monitored counter group for type: SINK, name: file_roll_sink: Successfully registered new MBean.
19/08/06 18:32:27 INFO instrumentation.MonitoredCounterGroup: Component type: SINK, name: file_roll_sink started
19/08/06 18:32:27 INFO sink.RollingFileSink: RollingFileSink file_roll_sink started.
19/08/06 18:32:27 INFO node.Application: Starting Source avro_source
19/08/06 18:32:27 INFO source.AvroSource: Starting Avro source avro_source: { bindAddress: Hexindai-C11-71, port: 9999 }...
19/08/06 18:32:27 INFO util.log: Logging initialized @590ms to org.eclipse.jetty.util.log.Slf4jLog
19/08/06 18:32:27 INFO server.Server: jetty-9.4.6.v20170531
19/08/06 18:32:27 INFO server.AbstractConnector: Started ServerConnector@400973e6{HTTP/1.1,[http/1.1]}{0.0.0.0:10515}
19/08/06 18:32:27 INFO server.Server: Started @784ms
19/08/06 18:32:27 INFO instrumentation.MonitoredCounterGroup: Monitored counter group for type: SOURCE, name: avro_source: Successfully registered new MBean.
19/08/06 18:32:27 INFO instrumentation.MonitoredCounterGroup: Component type: SOURCE, name: avro_source started
19/08/06 18:32:27 INFO source.AvroSource: Avro source avro_source started.
^C
[root@Hexindai-C11-71 ~]#
[root@Hexindai-C11-71 ~]# cat /export/data/flume/job/flume-replica.conf
# Name the components on this agent
replica.sources = exec_source
replica.sinks = hdfs_sink file_roll_sink
replica.channels = hdfs_channel file_roll_channel
# Replicate the event stream to multiple channels
replica.sources.exec_source.selector.type = replicating
# Describe/configure the source
replica.sources.exec_source.type = exec
replica.sources.exec_source.command = tail -F /export/data/flume/blog.txt
replica.sources.exec_source.shell = /bin/bash -c
# Define the sink that forwards data bound for HDFS; note the port number
replica.sinks.hdfs_sink.type = avro
replica.sinks.hdfs_sink.hostname = Hexindai-C11-71
replica.sinks.hdfs_sink.port = 8888
# Define the sink that forwards data bound for the local filesystem
replica.sinks.file_roll_sink.type = avro
replica.sinks.file_roll_sink.hostname = Hexindai-C11-71
replica.sinks.file_roll_sink.port = 9999
# Describe the channel
replica.channels.hdfs_channel.type = memory
replica.channels.hdfs_channel.capacity = 1000
replica.channels.hdfs_channel.transactionCapacity = 100
replica.channels.file_roll_channel.type = memory
replica.channels.file_roll_channel.capacity = 1000
replica.channels.file_roll_channel.transactionCapacity = 100
# Bind the source and sink to the channel
replica.sources.exec_source.channels = hdfs_channel file_roll_channel
replica.sinks.hdfs_sink.channel = hdfs_channel
replica.sinks.file_roll_sink.channel = file_roll_channel
[root@Hexindai-C11-71 ~]#
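With `selector.type = replicating`, every line that `tail -F` picks up from /export/data/flume/blog.txt is put into both hdfs_channel and file_roll_channel, so appending lines to that file is enough to exercise the whole pipeline. A minimal sketch; the helper name `append_test_lines` and the line format are hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: append COUNT numbered, timestamped lines (default 10)
# to FILE so that an exec source running "tail -F FILE" emits COUNT events.
append_test_lines() {
  local file="$1" count="${2:-10}" i=1
  while [ "$i" -le "$count" ]; do
    printf '%s flume replica test line %d\n' "$(date '+%F %T')" "$i" >> "$file"
    i=$((i + 1))
  done
}
```

For example, `append_test_lines /export/data/flume/blog.txt 10`. Assuming the my_file_roll agent is listening on port 9999, the lines should then show up in files under /export/data/flume/output; the file_roll sink starts a new file every 30 seconds by default (`sink.rollInterval`). It is probably simplest to start the downstream agents (ports 9999 and 8888) before the replica agent, since its avro sinks try to connect to them at startup.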
[root@Hexindai-C11-71 ~]# cat /export/data/flume/shell/start-replica.sh
#!/bin/bash
#@author :wtnyihg
#blog:http://www.cnblogs.com/wtnyihg
# Send monitoring data to Ganglia. This requires the Ganglia server address; make sure the Ganglia service is deployed before using it!
#nohup flume-ng agent -c /export/data/flume/job --conf-file=/export/data/flume/job/flume-replica.conf --name replica -Dflume.monitoring.type=ganglia -Dflume.monitoring.hosts=node105.wtnyihg.org.cn:8649 -Dflume.root.logger=INFO,console >> /export/data/flume/log/flume-ganglia-flume-replica.log 2>&1 &
# Use Flume's built-in HTTP monitoring; this is the command the script runs by default
nohup flume-ng agent -c /export/data/flume/job --conf-file=/export/data/flume/job/flume-replica.conf --name replica -Dflume.monitoring.type=http -Dflume.monitoring.port=10516 -Dflume.root.logger=INFO,console >> /export/data/flume/log/flume-replica.log 2>&1 &
[root@Hexindai-C11-71 ~]#
[root@Hexindai-C11-71 ~]# chmod +x /export/data/flume/shell/start-replica.sh
[root@Hexindai-C11-71 ~]#
[root@Hexindai-C11-71 ~]# /export/data/flume/shell/start-replica.sh
[root@Hexindai-C11-71 ~]# ss -antlp|grep 10516
LISTEN 0 50 *:10516 *:* users:(("java",pid=1184,fd=417))
[root@Hexindai-C11-71 ~]#
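start-my_file_roll.sh and start-replica.sh differ only in the agent name and the HTTP monitoring port, and both follow the naming pattern flume-&lt;name&gt;.conf / flume-&lt;name&gt;.log. A minimal sketch of one parameterized launcher under that assumption; `start_agent`, `DRY_RUN`, `FLUME_JOB_DIR` and `FLUME_LOG_DIR` are hypothetical names introduced here, and the Ganglia variant is left out.

```shell
#!/bin/bash
# Hypothetical generic launcher; defaults follow the article's directory layout.
FLUME_JOB_DIR="${FLUME_JOB_DIR:-/export/data/flume/job}"
FLUME_LOG_DIR="${FLUME_LOG_DIR:-/export/data/flume/log}"

start_agent() {
  local name="${1:-}" port="${2:-}"
  if [ -z "$name" ] || [ -z "$port" ]; then
    echo "usage: start_agent NAME PORT" >&2
    return 2
  fi
  # Build the same command line as the start-*.sh scripts above
  set -- flume-ng agent -c "$FLUME_JOB_DIR" \
    --conf-file="$FLUME_JOB_DIR/flume-$name.conf" --name "$name" \
    -Dflume.monitoring.type=http -Dflume.monitoring.port="$port" \
    -Dflume.root.logger=INFO,console
  if [ -n "${DRY_RUN:-}" ]; then
    # Dry run: print the command instead of starting the agent
    echo "$*"
    return 0
  fi
  nohup "$@" >> "$FLUME_LOG_DIR/flume-$name.log" 2>&1 &
}
```

With this, `start_agent my_file_roll 10515` and `start_agent replica 10516` would stand in for the two scripts, and `DRY_RUN=1 start_agent replica 10516` only prints the command for review.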
[root@Hexindai-C11-71 ~]# tail -50f /export/data/flume/log/flume-replica.log
19/08/06 18:48:28 INFO conf.FlumeConfiguration: Processing:hdfs_sink
19/08/06 18:48:28 INFO conf.FlumeConfiguration: Processing:file_roll_channel
19/08/06 18:48:28 INFO conf.FlumeConfiguration: Processing:hdfs_sink
19/08/06 18:48:28 INFO conf.FlumeConfiguration: Processing:exec_source
19/08/06 18:48:28 INFO conf.FlumeConfiguration: Processing:file_roll_sink
19/08/06 18:48:28 WARN conf.FlumeConfiguration: Agent configuration for 'replica' has no configfilters.
19/08/06 18:48:28 INFO conf.FlumeConfiguration: Post-validation flume configuration contains configuration for agents: [replica]
19/08/06 18:48:28 INFO node.AbstractConfigurationProvider: Creating channels
19/08/06 18:48:28 INFO channel.DefaultChannelFactory: Creating instance of channel hdfs_channel type memory
19/08/06 18:48:28 INFO node.AbstractConfigurationProvider: Created channel hdfs_channel
19/08/06 18:48:28 INFO channel.DefaultChannelFactory: Creating instance of channel file_roll_channel type memory
19/08/06 18:48:28 INFO node.AbstractConfigurationProvider: Created channel file_roll_channel
19/08/06 18:48:28 INFO source.DefaultSourceFactory: Creating instance of source exec_source, type exec
19/08/06 18:48:28 INFO sink.DefaultSinkFactory: Creating instance of sink: file_roll_sink, type: avro
19/08/06 18:48:28 INFO sink.AbstractRpcSink: Connection reset is set to 0. Will not reset connection to next hop
19/08/06 18:48:28 INFO sink.DefaultSinkFactory: Creating instance of sink: hdfs_sink, type: avro
19/08/06 18:48:28 INFO sink.AbstractRpcSink: Connection reset is set to 0. Will not reset connection to next hop
19/08/06 18:48:28 INFO node.AbstractConfigurationProvider: Channel hdfs_channel connected to [exec_source, hdfs_sink]
19/08/06 18:48:28 INFO node.AbstractConfigurationProvider: Channel file_roll_channel connected to [exec_source, file_roll_sink]
19/08/06 18:48:28 INFO node.Application: Starting new configuration:{ sourceRunners:{exec_source=EventDrivenSourceRunner: { source:org.apache.flume.source.ExecSource{name:exec_source,state:IDLE} }} sinkRunners:{file_roll_sink=SinkRunner: { policy:org.apache.flume.sink.DefaultSinkProcessor@2a8af881 counterGroup:{ name:null counters:{} } }, hdfs_sink=SinkRunner: { policy:org.apache.flume.sink.DefaultSinkProcessor@523e74e5 counterGroup:{ name:null counters:{} } }} channels:{hdfs_channel=org.apache.flume.channel.MemoryChannel{name: hdfs_channel}, file_roll_channel=org.apache.flume.channel.MemoryChannel{name: file_roll_channel}} }
19/08/06 18:48:28 INFO node.Application: Starting Channel hdfs_channel
19/08/06 18:48:28 INFO node.Application: Starting Channel file_roll_channel
19/08/06 18:48:28 INFO instrumentation.MonitoredCounterGroup: Monitored counter group for type: CHANNEL, name: file_roll_channel: Successfully registered new MBean.
19/08/06 18:48:28 INFO instrumentation.MonitoredCounterGroup: Monitored counter group for type: CHANNEL, name: hdfs_channel: Successfully registered new MBean.
19/08/06 18:48:28 INFO instrumentation.MonitoredCounterGroup: Component type: CHANNEL, name: file_roll_channel started
19/08/06 18:48:28 INFO instrumentation.MonitoredCounterGroup: Component type: CHANNEL, name: hdfs_channel started
19/08/06 18:48:28 INFO node.Application: Starting Sink file_roll_sink
19/08/06 18:48:28 INFO sink.AbstractRpcSink: Starting RpcSink file_roll_sink { host: Hexindai-C11-71, port: 9999 }...
19/08/06 18:48:28 INFO instrumentation.MonitoredCounterGroup: Monitored counter group for type: SINK, name: file_roll_sink: Successfully registered new MBean.
19/08/06 18:48:28 INFO instrumentation.MonitoredCounterGroup: Component type: SINK, name: file_roll_sink started
19/08/06 18:48:28 INFO sink.AbstractRpcSink: Rpc sink file_roll_sink: Building RpcClient with hostname: Hexindai-C11-71, port: 9999
19/08/06 18:48:28 INFO sink.AvroSink: Attempting to create Avro Rpc client.
19/08/06 18:48:28 INFO node.Application: Starting Sink hdfs_sink
19/08/06 18:48:28 INFO sink.AbstractRpcSink: Starting RpcSink hdfs_sink { host: Hexindai-C11-71, port: 8888 }...
19/08/06 18:48:28 INFO node.Application: Starting Source exec_source
19/08/06 18:48:29 INFO instrumentation.MonitoredCounterGroup: Monitored counter group for type: SINK, name: hdfs_sink: Successfully registered new MBean.
19/08/06 18:48:29 INFO instrumentation.MonitoredCounterGroup: Component type: SINK, name: hdfs_sink started
19/08/06 18:48:29 INFO sink.AbstractRpcSink: Rpc sink hdfs_sink: Building RpcClient with hostname: Hexindai-C11-71, port: 8888
19/08/06 18:48:29 INFO sink.AvroSink: Attempting to create Avro Rpc client.
19/08/06 18:48:29 INFO source.ExecSource: Exec source starting with command: tail -F /export/data/flume/blog.txt
19/08/06 18:48:29 INFO instrumentation.MonitoredCounterGroup: Monitored counter group for type: SOURCE, name: exec_source: Successfully registered new MBean.
19/08/06 18:48:29 INFO instrumentation.MonitoredCounterGroup: Component type: SOURCE, name: exec_source started
19/08/06 18:48:29 INFO api.NettyAvroRpcClient: Using default maxIOWorkers
19/08/06 18:48:29 INFO api.NettyAvroRpcClient: Using default maxIOWorkers
19/08/06 18:48:29 INFO util.log: Logging initialized @663ms to org.eclipse.jetty.util.log.Slf4jLog
19/08/06 18:48:29 INFO server.Server: jetty-9.4.6.v20170531
19/08/06 18:48:29 INFO server.AbstractConnector: Started ServerConnector@20b26fa4{HTTP/1.1,[http/1.1]}{0.0.0.0:10516}
19/08/06 18:48:29 INFO server.Server: Started @835ms
19/08/06 18:48:29 INFO sink.AbstractRpcSink: Rpc sink hdfs_sink started.
19/08/06 18:48:29 INFO sink.AbstractRpcSink: Rpc sink file_roll_sink started.
^C
[root@Hexindai-C11-71 ~]#
6. Host Interceptor Example
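Interceptor: a source-side component that can modify or drop events while they are being processed. Common interceptors include:
(1) host interceptor: adds a header holding the host name or IP address of the agent's host (the IP address by default)
(2) timestamp interceptor: adds a header holding the time, in milliseconds, at which the event was processed
For other interceptors, see the official documentation: http://flume.apache.org/releases/content/1.9.0/FlumeUserGuide.html#flume-interceptors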
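Both interceptors are attached to a source. A minimal sketch that appends host and timestamp interceptor settings to an existing agent config, using the documented `interceptors`, `type`, `useIP` and `hostHeader` properties; the helper `add_interceptors` and the interceptor names `host_i`/`ts_i` are hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: append host + timestamp interceptors for one source
# to a Flume properties file. Usage: add_interceptors CONF AGENT SOURCE
add_interceptors() {
  local conf="$1" agent="$2" source="$3"
  cat >> "$conf" <<EOF
# Interceptors run in the order listed here
${agent}.sources.${source}.interceptors = host_i ts_i
# host interceptor: put the host name (not the IP) into header "hostname"
${agent}.sources.${source}.interceptors.host_i.type = host
${agent}.sources.${source}.interceptors.host_i.useIP = false
${agent}.sources.${source}.interceptors.host_i.hostHeader = hostname
# timestamp interceptor: put the current time in ms into header "timestamp"
${agent}.sources.${source}.interceptors.ts_i.type = timestamp
EOF
}
```

For example, `add_interceptors /export/data/flume/job/flume-replica.conf replica exec_source`. Note that the file_roll sink's default TEXT serializer writes only event bodies, so these headers will not appear in the files under /export/data/flume/output; a logger sink, which prints headers, is one way to check them.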
