2. Setting Up Flume


1. Download and extract (from the official site)

  1. Create a directory named flume under /opt

  2. Upload the downloaded apache-flume-1.9.0-bin.tar.gz to /opt/flume with Xftp6

  3. Extract it with the command: tar -xvf apache-flume-1.9.0-bin.tar.gz (see the combined sketch below)
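
  A minimal end-to-end sketch of steps 1-3 (the download URL points at the Apache archive and is an assumption; use whatever mirror and version match your setup):

    # Create the install directory and fetch the tarball (URL assumed)
    mkdir -p /opt/flume && cd /opt/flume
    wget https://archive.apache.org/dist/flume/1.9.0/apache-flume-1.9.0-bin.tar.gz
    # Extract; this produces /opt/flume/apache-flume-1.9.0-bin
    tar -xvf apache-flume-1.9.0-bin.tar.gz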

     

2. Configure Flume

  1. Go into Flume's conf directory, copy flume-env.sh.template and rename the copy flume-env.sh:

    cp flume-env.sh.template flume-env.sh
    chmod 777 flume-env.sh
    vim flume-env.sh
  2. In flume-env.sh, set the Java JDK path:

    export JAVA_HOME=/opt/java/jdk1.8.0_261
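
  The same file can also carry JVM options for the agent; the template ships with a commented-out line along these lines (the heap sizes are illustrative, tune them for your workload):

    # Optional: JVM options for the Flume agent (values are illustrative)
    export JAVA_OPTS="-Xms100m -Xmx2000m -Dcom.sun.management.jmxremote"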

3. Configure environment variables

  1. Edit .bash_profile (the post does this from /opt, though bash normally reads it from the home directory):

    vim .bash_profile

    # Flume
    export FLUME_HOME=/opt/flume/apache-flume-1.9.0-bin
    export PATH=$PATH:$FLUME_HOME/bin

    source .bash_profile

  2. Add the same configuration in the flume-env.sh file:

    # Flume
    export FLUME_HOME=/opt/flume/apache-flume-1.9.0-bin
    export PATH=$PATH:$FLUME_HOME/bin
  3. Verify the version:

    flume-ng version

    P.S.: I also added the same configuration block to /etc/profile (refresh with: source /etc/profile)
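
    If everything is wired up correctly, flume-ng version prints something like the following (revision and build details will differ):

    Flume 1.9.0
    Source code repository: https://git-wip-us.apache.org/repos/asf/flume.git
    Revision: ...
    Compiled by ...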

4. Copy Flume to the slave nodes

scp -r /opt/flume/apache-flume-1.9.0-bin slave1:/opt/flume/
scp -r /opt/flume/apache-flume-1.9.0-bin slave2:/opt/flume/

Then repeat the steps above (environment variables and flume-env.sh) on slave1 and slave2. A quick check is sketched below.
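
A quick sanity check on each slave (this assumes the environment variables from step 3 have already been added on the slaves and that passwordless ssh is configured):

ssh slave1 "source ~/.bash_profile && flume-ng version"
ssh slave2 "source ~/.bash_profile && flume-ng version"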

5. Flume deployment examples (adapted from CSDN)

5.1 Netcat (I have not yet run or adapted the steps below)

Flume can listen on a port and capture the data sent to it; this example uses a netcat source together with a logger sink:

// Create a Flume configuration file
$ cd app/cdh/flume-1.6.0-cdh5.7.1
$ mkdir example
$ cp conf/flume-conf.properties.template example/netcat.conf

 

// Configure netcat.conf to capture, in real time, data typed into another terminal
$ vim example/netcat.conf

# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe/configure the source
a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444

# Describe the sink
a1.sinks.k1.type = logger

# Use a channel that buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

 

// Run the Flume agent, listening on port 44444 of this machine
$ flume-ng agent -c conf -f example/netcat.conf -n a1 -Dflume.root.logger=INFO,console

 


// Open another terminal, connect to port 44444 on localhost with telnet, and type some test data
$ telnet localhost 44444

 

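// The original screenshot is omitted; a typical session looks roughly like this (the typed text is illustrative, and each OK is the netcat source acknowledging an event)

Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
hello flume
OK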

// Check the data collected by Flume in the agent terminal

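// Screenshot omitted; with the logger sink, each event appears on the agent console roughly as below (body shown as hex plus a text preview; exact bytes depend on the input)

Event: { headers:{} body: 68 65 6C 6C 6F 20 66 6C 75 6D 65 0D  hello flume. }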

5.2 Spool

Spool (the spooling directory source) watches a configured directory for new files and reads out their contents. Two caveats: files copied into the spool directory must not be opened or edited afterwards (the source renames a finished file with a .COMPLETED suffix), and the spool directory must not contain subdirectories. A concrete example follows:

// Create two Flume configuration files
$ cd app/cdh/flume-1.6.0-cdh5.7.1
$ cp conf/flume-conf.properties.template example/spool1.conf
$ cp conf/flume-conf.properties.template example/spool2.conf

 

// Configure spool1.conf to watch the avro_data directory and send file contents to local port 60000
$ vim example/spool1.conf

# Name the components
local1.sources = r1
local1.sinks = k1
local1.channels = c1

# Source
local1.sources.r1.type = spooldir
local1.sources.r1.spoolDir = /home/hadoop/avro_data

# Sink
local1.sinks.k1.type = avro
local1.sinks.k1.hostname = localhost
local1.sinks.k1.port = 60000

# Channel
local1.channels.c1.type = memory

# Bind the source and sink to the channel
local1.sources.r1.channels = c1
local1.sinks.k1.channel = c1

 

// Configure spool2.conf to receive data on local port 60000 and write it to HDFS
$ vim example/spool2.conf

# Name the components
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Source
a1.sources.r1.type = avro
a1.sources.r1.bind = localhost
a1.sources.r1.port = 60000

# Sink (note: HDFS sink properties must carry the hdfs. prefix, so it is
# hdfs.rollInterval rather than rollInterval)
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = hdfs://localhost:9000/user/wcbdd/flumeData
a1.sinks.k1.hdfs.rollInterval = 0
a1.sinks.k1.hdfs.writeFormat = Text
a1.sinks.k1.hdfs.fileType = DataStream

# Channel
a1.channels.c1.type = memory
a1.channels.c1.capacity = 10000

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

 

// Open two terminals and run the following commands to start the two Flume agents; start the downstream (HDFS) agent first so port 60000 has a listener
$ flume-ng agent -c conf -f example/spool2.conf -n a1
$ flume-ng agent -c conf -f example/spool1.conf -n local1

 

// Inspect the contents of the watched avro_data directory on the local filesystem
$ cd avro_data
$ cat avro_data.txt

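// The screenshot of the file contents is omitted; any plain text file works. Once the spooling source has consumed the file, it is renamed in place with a .COMPLETED suffix:

$ ls
avro_data.txt.COMPLETED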

 

// Check the agent writing to HDFS, to verify that it captured the data and wrote it to HDFS

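// Besides the agent console, the result can also be checked from the command line (path as configured in spool2.conf)
$ hdfs dfs -ls /user/wcbdd/flumeData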

 

// View the files in HDFS through the NameNode web UI (typically port 50070 on Hadoop 2.x)


5.3 Other sources

Flume ships with a large number of built-in sources; among them Avro Source, Thrift Source, Spooling Directory Source, and Kafka Source offer good performance and cover the widest range of use cases. (The screenshots of source reference tables from the original post are omitted; see the Flume User Guide for the full property listings.)


 

posted @ 2020-09-16 10:00  小石小石摩西摩西