Flume Deployment and Basic Usage

I. Flume Deployment and Basic Usage

  • Deployment

    1) tar -zxvf apache-flume-1.9.0-bin.tar.gz -C /app, extract the tarball to the /app directory

    2) Add the environment variables

      export FLUME_HOME=/app/apache-flume-1.9.0-bin

      export PATH=$PATH:$FLUME_HOME/bin

    3) Under $FLUME_HOME/conf, rename flume-env.sh.template to flume-env.sh and add the JAVA_HOME variable

      cd $FLUME_HOME/conf

      mv flume-env.sh.template flume-env.sh

      vim flume-env.sh  ->  export JAVA_HOME=/opt/lagou/servers/jdk1.8.0_231

    4) Tune Flume's JVM heap to limit GC pressure; Xms and Xmx are normally set to the same value so that heap resizing does not cause memory churn and hurt performance

      vim flume-env.sh ->  export JAVA_OPTS="-Xms2048m -Xmx2048m -Dcom.sun.management.jmxremote"

      

  • Basic Usage

    Common Source components:

      exec source, netcat source, kafka source, taildir source; Taildir Source is the most commonly used: it can monitor multiple files and offers high data reliability, since it tracks read offsets and does not lose data on restart;
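As a sketch of how a Taildir Source is wired up (the agent name, file paths, and filegroup names below are illustrative, not from the original text):

```properties
a1.sources = r1
a1.sources.r1.type = TAILDIR
# The position file records the read offset of every tailed file,
# so no data is lost when the agent restarts
a1.sources.r1.positionFile = /app/flume/taildir_position.json
a1.sources.r1.filegroups = f1 f2
# f1 tails a single file; f2 tails every file matching a regex
a1.sources.r1.filegroups.f1 = /var/log/app/access.log
a1.sources.r1.filegroups.f2 = /var/log/app/.*\.log
```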

    Common Channel components: memory channel, file channel, kafka channel, jdbc channel

    Common Sink components: HDFS sink, Hive sink, Kafka sink

    

    Test case: ingest Kafka data into HDFS:

a1.sources = r1
a1.channels = c1
a1.sinks = k1

a1.sources.r1.type = org.apache.flume.source.kafka.KafkaSource
a1.sources.r1.kafka.bootstrap.servers = node01:9092,node02:9092,node03:9092
a1.sources.r1.kafka.topics = event_test_topic
a1.sources.r1.kafka.consumer.group.id = custom.g.id
a1.sources.r1.kafka.consumer.auto.offset.reset = earliest

# Interceptor configuration
a1.sources.r1.interceptors = i1
a1.sources.r1.interceptors.i1.type = org.bigdata.com.interceptor.CustomerInterceptor$Builder

# Channel configuration
a1.channels.c1.type = memory
a1.channels.c1.capacity = 2000
a1.channels.c1.transactionCapacity = 1000

# Sink configuration
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = /flume/log/p_dymd=%{log_time}
a1.sinks.k1.hdfs.filePrefix = event
a1.sinks.k1.hdfs.rollCount=0           
a1.sinks.k1.hdfs.rollSize=134217728
a1.sinks.k1.hdfs.rollInterval=7200
a1.sinks.k1.hdfs.minBlockReplicas = 1

a1.sinks.k1.hdfs.threadsPoolSize = 10
a1.sinks.k1.hdfs.callTimeout = 20000

# Compression (uncomment the two lines below to write compressed output)
# a1.sinks.k1.hdfs.codeC = bzip2
# a1.sinks.k1.hdfs.fileType = CompressedStream
a1.sinks.k1.hdfs.fileType = DataStream

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
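With the properties above saved to a file (the name `kafka-to-hdfs.conf` is illustrative), the agent can be launched with the standard flume-ng command:

```shell
flume-ng agent \
  --conf $FLUME_HOME/conf \
  --conf-file kafka-to-hdfs.conf \
  --name a1 \
  -Dflume.root.logger=INFO,console
```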

 

  • Handling too many small files written to HDFS

    Impact:

      1) Metadata: every small file carries its own metadata (file path, file name, owner, group, permissions, creation time, and so on), all held in NameNode memory at roughly 150 bytes per entry. Too many small files therefore consume a large amount of NameNode heap, hurting NameNode performance and longevity;

      2) Compute: MapReduce launches one map task per file, which hurts job performance and also increases seek time
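To put the 150-byte figure in perspective, a quick back-of-the-envelope estimate (the file count is hypothetical):

```java
public class SmallFileMemory {

    // NameNode heap consumed by file metadata, assuming ~150 bytes per entry
    public static long estimateHeapBytes(long fileCount, long bytesPerEntry) {
        return fileCount * bytesPerEntry;
    }

    public static void main(String[] args) {
        long tenMillion = 10_000_000L;
        long bytes = estimateHeapBytes(tenMillion, 150L);
        // 10 million small files already cost about 1.40 GB of NameNode heap
        System.out.printf("%.2f GB%n", bytes / (1024.0 * 1024 * 1024));
    }
}
```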

    

    Mitigation:

      1) With the official defaults of the three roll parameters, writes to HDFS produce small files; tune each of them to the actual workload: hdfs.rollInterval, hdfs.rollSize, hdfs.rollCount. A reasonable starting point: hdfs.rollInterval=3600, hdfs.rollSize=134217728, hdfs.rollCount=0, hdfs.roundValue=3600, hdfs.roundUnit=second
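In agent-config form, that tuning looks like the fragment below (values taken from the text above; adjust rollInterval and rollSize to your actual write rate):

```properties
# Roll a new file every hour or at 128 MB, whichever comes first;
# rollCount=0 disables rolling by event count
a1.sinks.k1.hdfs.rollInterval = 3600
a1.sinks.k1.hdfs.rollSize = 134217728
a1.sinks.k1.hdfs.rollCount = 0
# Round the path timestamp down to the hour so events land in hourly directories
a1.sinks.k1.hdfs.round = true
a1.sinks.k1.hdfs.roundValue = 3600
a1.sinks.k1.hdfs.roundUnit = second
```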

 

III. Flume Custom Interceptor

package org.bigdata.com.interceptor;

import com.alibaba.fastjson.JSON;
import com.alibaba.fastjson.JSONObject;
import com.google.common.base.Charsets;
import org.apache.flume.Context;
import org.apache.flume.Event;
import org.apache.flume.interceptor.Interceptor;

import java.text.SimpleDateFormat;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/**
 * @author shydow
 * @date 2021-04-14
 */
public class CustomerInterceptor implements Interceptor {

    // Note: SimpleDateFormat is not thread-safe; acceptable here because a source
    // invokes its interceptor chain serially
    private static final SimpleDateFormat format = new SimpleDateFormat("yyyy-MM-dd");

    @Override
    public void initialize() {

    }

    @Override
    public Event intercept(Event event) {
        // Decode the event body as UTF-8
        String eventBody = new String(event.getBody(), Charsets.UTF_8);
        // Fetch the event headers; log_time feeds the %{log_time} placeholder in the HDFS sink path
        Map<String, String> headers = event.getHeaders();
        try {
            JSONObject jsonObject = JSON.parseObject(eventBody);
            String trigger_time = jsonObject.getString("server_time");
            headers.put("log_time", format.format(Long.parseLong(trigger_time)));
            event.setHeaders(headers);
        } catch (Exception e){
            headers.put("log_time", "unknown");
            event.setHeaders(headers);
        }
        return event;
    }

    @Override
    public List<Event> intercept(List<Event> list) {
        ArrayList<Event> out = new ArrayList<>();
        for (Event event : list) {
            Event outEvent = intercept(event);
            if (null != outEvent){
                out.add(outEvent);
            }
        }
        return out;
    }

    @Override
    public void close() {}

    public static class Builder implements Interceptor.Builder {

        @Override
        public Interceptor build() {
            return new CustomerInterceptor();
        }

        @Override
        public void configure(Context context) {

        }
    }

}
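The date-formatting logic above can be exercised on its own. The sketch below (class and method names are mine, not from the original) mirrors what the interceptor does: parse `server_time` as epoch milliseconds and format it as the `yyyy-MM-dd` partition key, falling back to "unknown" on bad input. A fixed UTC zone is set here only to make the output reproducible; the interceptor itself uses the JVM default zone.

```java
import java.text.SimpleDateFormat;
import java.util.TimeZone;

public class LogTimeDemo {

    // Mirrors CustomerInterceptor: epoch-millis string -> yyyy-MM-dd header value
    public static String toLogTime(String serverTime) {
        SimpleDateFormat format = new SimpleDateFormat("yyyy-MM-dd");
        format.setTimeZone(TimeZone.getTimeZone("UTC")); // fixed zone for reproducibility
        try {
            return format.format(Long.parseLong(serverTime));
        } catch (Exception e) {
            return "unknown"; // same fallback the interceptor writes into the header
        }
    }

    public static void main(String[] args) {
        System.out.println(toLogTime("1618358400000")); // 2021-04-14 in UTC
        System.out.println(toLogTime("not-a-number"));  // unknown
    }
}
```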

 

posted @ 2021-11-29 16:27  Shydow