Flume Source

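I. Overview

1. Flume is an open-source, distributed, and reliable log collection system from Apache.

2. It efficiently collects, aggregates, and transports large volumes of log data.

3. Flume has two major versions: flume-og (Flume 0.9.x) and flume-ng (Flume 1.x). The two are not compatible with each other.

4. Flume moves events transactionally: a batch of events is removed from a channel only after the next hop has accepted it, which is what makes delivery reliable.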

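II. Basic Concepts

1. Event: every log record that is collected is wrapped in an Event object, and Events are what flow through Flume. An Event has two parts: headers (string key-value pairs) and body (a byte array). It is often written as JSON, for example when posting to the HTTP source.

2. Flume components run inside an Agent. An Agent contains a Source, a Channel, and a Sink:

  a. Source: collects data from the origin

  b. Channel: buffers data temporarily

  c. Sink: delivers data to the destination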
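To make the Event and Agent concepts concrete, here is a toy model in plain Java. It is only a sketch: ToyAgent, ToyEvent, and the "origin" header are hypothetical names invented for illustration and are not Flume classes. The point is the shape of the flow: the source wraps a line in an Event, the channel buffers it up to a fixed capacity, and the sink drains Events in batches.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Toy model only: these are NOT Flume's classes.
class ToyAgent {

    // An Event carries string headers plus an opaque byte[] body.
    static final class ToyEvent {
        final Map<String, String> headers;
        final byte[] body;

        ToyEvent(Map<String, String> headers, byte[] body) {
            this.headers = headers;
            this.body = body;
        }
    }

    // The channel is a bounded buffer between source and sink.
    private final BlockingQueue<ToyEvent> channel;

    ToyAgent(int capacity) {
        this.channel = new ArrayBlockingQueue<>(capacity);
    }

    // Source side: wrap one log line in an Event and put it on the channel.
    // Returns false when the channel is full.
    boolean source(String line) {
        ToyEvent e = new ToyEvent(Map.of("origin", "toy"),
                line.getBytes(StandardCharsets.UTF_8));
        return channel.offer(e);
    }

    // Sink side: drain up to batchSize Events and "deliver" them as strings.
    List<String> sink(int batchSize) {
        List<ToyEvent> batch = new ArrayList<>();
        channel.drainTo(batch, batchSize);
        List<String> delivered = new ArrayList<>();
        for (ToyEvent e : batch) {
            delivered.add(e.headers + " " + new String(e.body, StandardCharsets.UTF_8));
        }
        return delivered;
    }

    public static void main(String[] args) {
        ToyAgent agent = new ToyAgent(2);
        agent.source("first line");
        agent.source("second line");
        // The third put is rejected because the channel already holds 2 Events
        System.out.println(agent.source("third line"));
        System.out.println(agent.sink(10));
    }
}
```

The capacity and batch size here play roughly the roles that capacity and transactionCapacity play in the memory channel configurations below.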

Create a folder named data under the Flume installation directory.

netcat source:

In data, create a file named basic.conf with the following content:

# The agent is named a1
# Name the source
a1.sources = s1
# Name the channel
a1.channels = c1
# Name the sink
a1.sinks = k1

# Configure the source
# Source type
a1.sources.s1.type = netcat
# Address to listen on
a1.sources.s1.bind = 0.0.0.0
# Port to listen on
a1.sources.s1.port = 8090

# Configure the channel
# Channel type
a1.channels.c1.type = memory
# Channel capacity: maximum number of events it can hold
a1.channels.c1.capacity = 10000
# Maximum number of events per transaction (one batch put by the source or taken by the sink)
a1.channels.c1.transactionCapacity = 1000

# Configure the sink
# Sink type
a1.sinks.k1.type = logger

# Bind the source to its channel; a source can write to multiple channels
a1.sources.s1.channels = c1
# Bind the sink to its channel; a sink reads from exactly one channel
a1.sinks.k1.channel = c1

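Run the following command from the bin directory. -n is the agent name, -c is the conf directory, -f is the agent configuration file, and -Dflume.root.logger sets the log level and output destination.

./flume-ng agent -n a1 -c ../conf/ -f ../data/basic.conf -Dflume.root.logger=INFO,console

Open another SSH session and run nc hadoop101 8090 (hadoop101 is the host running the agent), then type some text and press Enter. Each line shows up as an Event in the agent's console log.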
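The same test can be scripted instead of typed into nc. The sketch below opens a TCP connection, sends one newline-terminated line, and reads one line of reply. NetcatLineClient and sendLine are hypothetical names; the host and port are the ones used above. It assumes the netcat source acknowledges each line (it replies OK by default, as far as I know); if acknowledgements are turned off, the read times out after 5 seconds rather than hanging.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

class NetcatLineClient {

    // Sends one newline-terminated line and returns the first line of the reply.
    static String sendLine(String host, int port, String line) throws IOException {
        try (Socket socket = new Socket(host, port)) {
            // Throw SocketTimeoutException instead of blocking forever if no reply arrives
            socket.setSoTimeout(5000);

            OutputStream out = socket.getOutputStream();
            out.write((line + "\n").getBytes(StandardCharsets.UTF_8));
            out.flush();

            BufferedReader in = new BufferedReader(
                    new InputStreamReader(socket.getInputStream(), StandardCharsets.UTF_8));
            return in.readLine();
        }
    }

    public static void main(String[] args) throws IOException {
        // hadoop101:8090 is the agent started with basic.conf
        System.out.println(sendLine("hadoop101", 8090, "hello flume"));
    }
}
```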

 

Avro source: avrosource.conf

a1.sources = s1
a1.channels = c1
a1.sinks = k1

a1.sources.s1.type = avro
a1.sources.s1.bind = 0.0.0.0
a1.sources.s1.port = 8090

a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

a1.sinks.k1.type = logger

a1.sources.s1.channels = c1
a1.sinks.k1.channel = c1

 

From the bin directory, start Flume:

./flume-ng agent -n a1 -c ../conf/ -f ../data/avrosource.conf -Dflume.root.logger=INFO,console

Then run the avro-client command, also from the bin directory of the Flume installation. /home/a.txt is the file to send and must already exist:

./flume-ng avro-client -H 0.0.0.0 -p 8090 -F /home/a.txt -c ../conf/

 

Exec source: execsource.conf

a1.sources=r1
a1.channels=c1
a1.sinks=s1

a1.sources.r1.type=exec
a1.sources.r1.command=cat /home/a.txt

a1.channels.c1.type=memory
a1.channels.c1.capacity=1000
a1.channels.c1.transactionCapacity=100

a1.sinks.s1.type=logger

a1.sources.r1.channels=c1
a1.sinks.s1.channel=c1

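From the bin directory, run:

./flume-ng agent -n a1 -c ../conf/ -f ../data/execsource.conf -Dflume.root.logger=INFO,console

In the log, the cat /home/a.txt command produces two Events in this run; the exec source emits one Event per line of command output.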

 

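Spooling Directory source: watches a directory for new files. The configuration file is spooldirsource.txt.

Create the directory flumedata under /home.

Create spooldirsource.txt at <Flume installation directory>/data/spooldirsource.txt with the following content: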

 

a1.sources=r1
a1.channels=c1
a1.sinks=s1

a1.sources.r1.type=spooldir
a1.sources.r1.spoolDir=/home/flumedata

a1.channels.c1.type=memory
a1.channels.c1.capacity=1000
a1.channels.c1.transactionCapacity=100

a1.sinks.s1.type=logger

a1.sources.r1.channels=c1
a1.sinks.s1.channel=c1

From the bin directory, run:

./flume-ng agent -n a1 -c ../conf/ -f ../data/spooldirsource.txt -Dflume.root.logger=INFO,console

Create a file a.log under /home and write any content to it.

Run mv a.log /home/flumedata. Once Flume has ingested the file, it is renamed to a.log.COMPLETED in the flumedata directory.
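The file is written under /home first and only then moved, because the spooling directory source expects files to be complete and unchanged once they appear in the watched directory. A program producing such files can follow the same pattern. The sketch below is one way to do it; SpoolDirWriter and deliver are hypothetical names, and it assumes the staging and spooling directories are on the same filesystem, since ATOMIC_MOVE may fail otherwise.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.List;

class SpoolDirWriter {

    // Writes the lines to a staging file first, then moves the finished file
    // into the spooling directory in a single step. Returns the final path.
    static Path deliver(Path stagingDir, Path spoolDir, String fileName, List<String> lines)
            throws IOException {
        Path staged = stagingDir.resolve(fileName);
        Files.write(staged, lines, StandardCharsets.UTF_8);
        return Files.move(staged, spoolDir.resolve(fileName), StandardCopyOption.ATOMIC_MOVE);
    }

    public static void main(String[] args) throws IOException {
        // /home and /home/flumedata are the directories used above
        deliver(Path.of("/home"), Path.of("/home/flumedata"), "a.log",
                List.of("line one", "line two"));
    }
}
```

Use a fresh file name for every delivery (for example, include a timestamp), since reusing a name that was already processed is reported as an error by the source.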

 

Sequence Generator source: emits Events whose body is an incrementing counter, which is handy for testing

seqsource.conf:

a1.sources=r1
a1.channels=c1
a1.sinks=s1

a1.sources.r1.type=seq

a1.channels.c1.type=memory
a1.channels.c1.capacity=1000
a1.channels.c1.transactionCapacity=100

a1.sinks.s1.type=logger

a1.sources.r1.channels=c1
a1.sinks.s1.channel=c1

Start command: ./flume-ng agent -n a1 -c ../conf/ -f ../data/seqsource.conf -Dflume.root.logger=INFO,console

 

HTTP source: httpsource.conf

a1.sources = s1
a1.channels = c1
a1.sinks = k1

a1.sources.s1.type = http
a1.sources.s1.port = 8090

a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

a1.sinks.k1.type = logger

a1.sources.s1.channels = c1
a1.sinks.k1.channel = c1

From the bin directory, run:

./flume-ng agent -n a1 -c ../conf/ -f ../data/httpsource.conf -Dflume.root.logger=INFO,console

Open a new SSH session and send an HTTP POST request:

  curl -X POST -d '[{"headers":{"a":"a1","b":"b1"},"body":"hello http-flume"}]' http://0.0.0.0:8090
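The curl call can also be made from Java. The sketch below builds the same JSON array of events and posts it with the JDK's java.net.http client (Java 11+). HttpSourceClient, toJson, quote, and post are hypothetical names, and the escaping is deliberately minimal (backslash and double quote only), so treat it as an illustration rather than a general JSON writer. A 200 status is what I would expect on success.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

class HttpSourceClient {

    // Builds a one-event JSON array: [{"headers":{...},"body":"..."}]
    static String toJson(Map<String, String> headers, String body) {
        StringJoiner h = new StringJoiner(",", "{", "}");
        headers.forEach((k, v) -> h.add(quote(k) + ":" + quote(v)));
        return "[{\"headers\":" + h + ",\"body\":" + quote(body) + "}]";
    }

    // Minimal JSON string escaping: backslash and double quote only
    static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    // POSTs the payload and returns the HTTP status code
    static int post(String url, String json) throws Exception {
        HttpRequest request = HttpRequest.newBuilder(URI.create(url))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(json))
                .build();
        HttpResponse<Void> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.discarding());
        return response.statusCode();
    }

    public static void main(String[] args) throws Exception {
        // LinkedHashMap keeps the headers in insertion order
        Map<String, String> headers = new LinkedHashMap<>();
        headers.put("a", "a1");
        headers.put("b", "b1");
        String json = toJson(headers, "hello http-flume");
        System.out.println(json);
        System.out.println(post("http://0.0.0.0:8090", json));
    }
}
```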

 

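III. Custom Source

1. If none of the Sources that ship with Flume fit the scenario, you can write a custom Source. A custom Source must first decide how it obtains data: PollableSource (Flume repeatedly calls the source to pull data) or EventDrivenSource (the source pushes Events on its own when data arrives).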
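The difference between the two styles can be modeled without Flume. In the toy sketch below, every name (SourceStyles, ToyPollableSource, ToyEventDrivenSource, runUntilBackoff) is hypothetical. It loosely mirrors how a pollable source is driven by a loop that calls process() and backs off when there is nothing to read, while an event-driven source is started once and then delivers data whenever its own thread or callback receives some.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.function.Consumer;

// Toy model only: these types are hypothetical, not Flume's interfaces.
class SourceStyles {

    enum Status { READY, BACKOFF }

    // Pull style: a runner calls process() again and again.
    static class ToyPollableSource {
        private final Queue<String> pending;
        private final Consumer<String> channel;

        ToyPollableSource(List<String> data, Consumer<String> channel) {
            this.pending = new ArrayDeque<>(data);
            this.channel = channel;
        }

        Status process() {
            String next = pending.poll();
            if (next == null) {
                return Status.BACKOFF; // nothing to read, the runner should wait
            }
            channel.accept(next);
            return Status.READY;
        }
    }

    // Push style: started once, then delivers data when it arrives.
    static class ToyEventDrivenSource {
        private Consumer<String> channel;

        void start(Consumer<String> channel) {
            this.channel = channel;
        }

        // Called by whatever produces the data (a reader thread, a socket callback, ...)
        void onData(String line) {
            channel.accept(line);
        }
    }

    // Drives a pollable source until it first backs off and returns what it delivered.
    static List<String> runUntilBackoff(List<String> data) {
        List<String> out = new ArrayList<>();
        ToyPollableSource src = new ToyPollableSource(data, out::add);
        while (src.process() == Status.READY) {
            // keep polling while data is available
        }
        return out;
    }
}
```

The AuthSource example in this section follows the second style: it starts its own reader thread in start() and pushes Events to the channel from there.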

authsource.conf:

 

a1.sources = s1
a1.channels = c1
a1.sinks = k1

# Fully qualified class name of the custom source
a1.sources.s1.type = com.apple.flume.AuthSource
# Input file read by the custom source
a1.sources.s1.file = /home/person.log

a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

a1.sinks.k1.type = logger

a1.sources.s1.channels = c1
a1.sinks.k1.channel = c1

 

The com.apple.flume.AuthSource class:
package com.apple.flume;

import org.apache.flume.Context;
import org.apache.flume.Event;
import org.apache.flume.EventDrivenSource;
import org.apache.flume.channel.ChannelProcessor;
import org.apache.flume.conf.Configurable;
import org.apache.flume.event.EventBuilder;
import org.apache.flume.source.AbstractSource;

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/**
 * Custom Flume Source.
 *
 * @author apple
 * @date 2020/06/29
 */
public class AuthSource extends AbstractSource implements Configurable, EventDrivenSource {

    private String path;
    private ExecutorService es;

    /**
     * Reads properties from the agent configuration file.
     *
     * @param context the configuration of this source
     */
    @Override
    public void configure(Context context) {
        // Matches a1.sources.s1.file in authsource.conf; if the property is
        // renamed to a1.sources.s1.path, change the key here to "path".
        path = context.getString("file");
    }

    /**
     * Starts the Source.
     */
    @Override
    public synchronized void start() {
        // The ChannelProcessor puts Events into the configured channel(s)
        ChannelProcessor cp = this.getChannelProcessor();

        // One reader thread is enough for a single file
        es = Executors.newSingleThreadExecutor();
        es.submit(new FileThread(path, cp));
        super.start();
    }

    /**
     * Stops the Source and its reader thread.
     */
    @Override
    public synchronized void stop() {
        if (es != null) {
            es.shutdownNow();
        }
        super.stop();
    }
}

class FileThread implements Runnable {

    private final String path;
    private final ChannelProcessor cp;

    public FileThread(String path, ChannelProcessor cp) {
        this.path = path;
        this.cp = cp;
    }

    @Override
    public void run() {
        // try-with-resources closes the reader even if reading fails;
        // a missing file surfaces as an IOException instead of a NullPointerException
        try (BufferedReader br = new BufferedReader(new FileReader(path))) {
            while (!Thread.currentThread().isInterrupted()) {
                // Each record is three lines: name, age, description
                String name = br.readLine();
                String age = br.readLine();
                String description = br.readLine();
                if (name == null || age == null || description == null) {
                    // End of file, or an incomplete trailing record
                    break;
                }

                Map<String, String> headers = new HashMap<>();
                headers.put("name", name);
                headers.put("age", age);

                // Build an Event: description is the body, name and age are headers
                Event e = EventBuilder.withBody(description.getBytes(), headers);

                // Put the Event into the channel
                cp.processEvent(e);
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

Package the com.apple.flume.AuthSource class as a jar and copy it into the lib directory of the Flume installation.

From the bin directory, run:

./flume-ng agent -n a1 -c ../conf/ -f ../data/authsource.conf -Dflume.root.logger=INFO,console
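AuthSource reads /home/person.log three lines at a time: name, age, description. The post does not show the file itself, so the sample lines below are made up. This standalone sketch (PersonLogFormat and Person are hypothetical names) groups lines the same way, which is a quick way to check an input file before pointing the agent at it. It drops an incomplete trailing record.

```java
import java.util.ArrayList;
import java.util.List;

class PersonLogFormat {

    // One parsed record: name and age become headers, description becomes the body.
    static final class Person {
        final String name;
        final String age;
        final String description;

        Person(String name, String age, String description) {
            this.name = name;
            this.age = age;
            this.description = description;
        }

        @Override
        public String toString() {
            return "headers={name=" + name + ", age=" + age + "} body=" + description;
        }
    }

    // Groups the lines into records of three; leftover lines at the end are ignored.
    static List<Person> parse(List<String> lines) {
        List<Person> records = new ArrayList<>();
        for (int i = 0; i + 2 < lines.size(); i += 3) {
            records.add(new Person(lines.get(i), lines.get(i + 1), lines.get(i + 2)));
        }
        return records;
    }

    public static void main(String[] args) {
        // Made-up sample content for person.log
        List<String> sample = List.of(
                "tom", "18", "likes flume",
                "amy", "20", "likes kafka");
        parse(sample).forEach(System.out::println);
    }
}
```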

 

 

Posted 2020-06-28 14:38 by alen-fly