The Power of Logstash

Introduction

1. Why write this article?

When I first came across Logstash, I saw it as a mere appendage of Elasticsearch, existing only to feed processed data into ES. Only after using it more did I discover how capable Logstash is in its own right, and wanting to share that discovery is what prompted this article.

2. What can Logstash do?

1> Simple development: when a new requirement came in, I used to reach for Flink or Spark. But building a job that way takes a long time: dependency management alone is a headache, all sorts of problems crop up at runtime, and once, just setting up the company-mandated Scala-plus-Gradle environment took a whole week. With Logstash, writing a single configuration file solves the same problem.
2> High reusability: for projects of the same type, you only change the data source address, the small piece of logic in the middle, and the output address. If the first job takes 3 days, a similar one afterwards takes a few hours, sometimes less.
3> Great flexibility: the plugin list on the official site linked below covers almost every input and output you will ever need.

Logstash official website

Logstash is install-by-unzip, so I will skip installation and go straight to the core configuration.

Case study: kafka2Oracle

Consume data from Kafka and land it in an Oracle database.

Requirements:

Incrementally load data into the Oracle table PROD_RESERV_ORDER.

1. Only rows satisfying both "MSG_TYPE":"PROD_RESERV_ORDER" and "MSG_SUBTYPE":"ADD_ORDER" may be inserted.

2. Strip the BST_ prefix from "ID":"BST_6744376"; the remainder becomes the primary key.

3. Upsert on write: if the primary key ID already exists, update the row; otherwise insert it.

  • Grab one sample message from Kafka:
kafka_2.11-0.10.0.1/bin/kafka-console-consumer.sh --zookeeper node01:2181,node02:2181,node03:2181 --topic G_P_USER_BST2UC --from-beginning --max-messages 1
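Before diving into the config, the three requirements can be sketched in plain Ruby. The sample message below is hypothetical (field values invented for illustration); only the field names MSG_TYPE, MSG_SUBTYPE, CONTANT, and ID come from this article.

```ruby
require 'json'

# Hypothetical sample message shaped like the requirements above;
# the values are invented for illustration.
raw = '{"MSG_TYPE":"PROD_RESERV_ORDER","MSG_SUBTYPE":"ADD_ORDER",' \
      '"CONTANT":{"ID":"BST_6744376","CUST_ID":"C001"}}'

msg = JSON.parse(raw)

# Requirement 1: only this type/subtype combination is loaded.
wanted = msg['MSG_TYPE'] == 'PROD_RESERV_ORDER' &&
         msg['MSG_SUBTYPE'] == 'ADD_ORDER'

# Requirement 2: strip the "BST_" prefix from ID to form the primary key.
pk = msg['CONTANT']['ID'].sub(/\ABST_/, '')

puts wanted # => true
puts pk     # => 6744376
```

Requirement 3 (the upsert) is handled later by Oracle's MERGE statement in the output block.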

A Logstash job is normally driven by a .conf file.

The input plugins handle data collection; they can read from file, kafka, http, redis, jdbc and some fifty other sources.

Here I will cover the input I use most: reading from Kafka.

kafka2Oracle_cover.conf

input {
  kafka {
    bootstrap_servers => "node01:9092,node02:9092,node03:9092" # Kafka broker addresses
    auto_offset_reset => "earliest" # consume from the beginning; "latest" starts from the current position
    group_id => "kafka2Oracle_20210823"
    consumer_threads => 3 # number of Kafka consumer threads
    topics => ["G_P_USER_BST2UC"]
    # tuning parameters
    fetch_max_wait_ms => "1000"
    fetch_min_bytes => "1000000"
  }
}
filter {
  ruby {
    path => "/home/streaming/kafkaToAds/wsy/kafka2Oracle_cover.rb"
    script_params => {"message" => "%{message}"}
    remove_field => ["message"]
  }
  mutate { # Logstash fields default to strings; convert these to integers
    convert => {
      "OPT_TYP" => "integer"
      "ORDERDATE" => "integer"
      "USERTYPE" => "integer"
      "PROD_RISK_LEVEL" => "integer"
    }
  }
  mutate {
    split => {"ID" => "_"} # "ID": "BST_6744376" is split on "_"
    add_field => {
      "ID0" => "%{[ID][0]}" # BST
      "ID1" => "%{[ID][1]}" # 6744376
    }
  }
  mutate {
    remove_field => ["ID0"]
    rename => {"ID1" => "ID"}
  }
}
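The three mutate steps above amount to simple string surgery on the ID field. The same transformation in plain Ruby (the field name ID comes from the article; the rest is a sketch):

```ruby
# Same transformation as the split / add_field / rename chain above:
# "BST_6744376" -> ["BST", "6744376"] -> keep element 1 as the new ID.
id    = 'BST_6744376'
parts = id.split('_')     # mutate split on "_"
id0   = parts[0]          # "BST"     (added then removed in the config)
id1   = parts[1]          # "6744376" (renamed back to ID)
event = { 'ID' => id1 }   # the final event carries only the numeric key
puts event['ID']          # => 6744376
```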

# dual is Oracle's built-in one-row dummy table
output {
  if [MSG_TYPE] == "PROD_RESERV_ORDER" and [MSG_SUBTYPE] == "ADD_ORDER" {
    jdbc {
      driver_jar_path => "/home/streaming/logstash-6.5.4/lib/Oracle-plugins/ojdbc6.jar"
      driver_class => "oracle.jdbc.OracleDriver"
      connection_string => "jdbc:oracle:thin:@node01:1521/qlj"
      statement => [
      "merge into PROD_RESERV_ORDER t1
      using (
      select ? as PRODNAME,? as OPT_TYP,? as ID,? as CUST_ID,? as PRODCODE,? as ORDERAMOUNT,? as ORDERDATE,? as ORDERTIME,? as USERID,? as USERTYPE,? as SOURCE,? as TGLJ_INFO,? as PROD_RISK_LEVEL,? as PROD_CATEGORY_II,? as PROD_CATEGORY_I,? as MSG_TYPE,? as MSG_SUBTYPE
     from dual ) t2
     on (t1.ID = t2.ID)
     when matched then update set
     t1.PRODNAME = t2.PRODNAME,
     t1.OPT_TYP = t2.OPT_TYP,
     t1.CUST_ID = t2.CUST_ID,
     t1.PRODCODE = t2.PRODCODE,
     t1.ORDERAMOUNT = t2.ORDERAMOUNT,
     t1.ORDERDATE = t2.ORDERDATE,
     t1.ORDERTIME = t2.ORDERTIME,
     t1.USERID = t2.USERID,
     t1.USERTYPE = t2.USERTYPE,
     t1.SOURCE = t2.SOURCE,
     t1.TGLJ_INFO = t2.TGLJ_INFO,
     t1.PROD_RISK_LEVEL = t2.PROD_RISK_LEVEL,
     t1.PROD_CATEGORY_II = t2.PROD_CATEGORY_II,
     t1.PROD_CATEGORY_I = t2.PROD_CATEGORY_I,
     t1.MSG_TYPE = t2.MSG_TYPE,
     t1.MSG_SUBTYPE = t2.MSG_SUBTYPE
     when not matched then insert (
     t1.PRODNAME,t1.OPT_TYP,t1.ID,t1.CUST_ID,t1.PRODCODE,t1.ORDERAMOUNT,t1.ORDERDATE,t1.ORDERTIME,t1.USERID,t1.USERTYPE,t1.SOURCE,t1.TGLJ_INFO,t1.PROD_RISK_LEVEL,t1.PROD_CATEGORY_II,t1.PROD_CATEGORY_I,t1.MSG_TYPE,t1.MSG_SUBTYPE
     ) values (
     t2.PRODNAME,t2.OPT_TYP,t2.ID,t2.CUST_ID,t2.PRODCODE,t2.ORDERAMOUNT,t2.ORDERDATE,t2.ORDERTIME,t2.USERID,t2.USERTYPE,t2.SOURCE,t2.TGLJ_INFO,t2.PROD_RISK_LEVEL,t2.PROD_CATEGORY_II,t2.PROD_CATEGORY_I,t2.MSG_TYPE,t2.MSG_SUBTYPE
     )
      "
      ,"PRODNAME","OPT_TYP","ID","CUST_ID","PRODCODE","ORDERAMOUNT","ORDERDATE","ORDERTIME","USERID","USERTYPE","SOURCE","TGLJ_INFO","PROD_RISK_LEVEL","PROD_CATEGORY_II","PROD_CATEGORY_I","MSG_TYPE","MSG_SUBTYPE"
      ]
      username => "edw03"
      max_pool_size => "1"
      password => "XjmDB#081820"
    }
  }
}
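Oracle's MERGE is an upsert: update the row when the join key matches, insert it otherwise, which is exactly how requirement 3 is met. A minimal sketch of that semantics, using an in-memory hash as a stand-in for the PROD_RESERV_ORDER table:

```ruby
# Stand-in for the target table, keyed by ID (the merge key).
table = {}

# Upsert mirroring MERGE: matched -> overwrite; not matched -> insert.
def merge_row(table, row)
  table[row['ID']] = row
end

merge_row(table, { 'ID' => '6744376', 'ORDERAMOUNT' => 100 }) # insert path
merge_row(table, { 'ID' => '6744376', 'ORDERAMOUNT' => 250 }) # update path

puts table.size                      # => 1
puts table['6744376']['ORDERAMOUNT'] # => 250
```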

Ruby configuration
Grok works too if you would rather not use Ruby; a handy Grok validator: https://www.5axxw.com/tools/v2/grok.html

  • kafka2Oracle_cover.rb
def register(params)
  @message = params["message"]
end

def filter(event)
  require 'json'
  txt = event.get('message')
  begin
    obj1 = JSON.parse(txt)
  rescue JSON::ParserError
    return [] # drop events whose message is not valid JSON
  end
  txt2  = obj1['CONTANT']
  type1 = obj1['MSG_TYPE']
  type2 = obj1['MSG_SUBTYPE']
  if type1 == 'PROD_RESERV_ORDER' && type2 == 'ADD_ORDER'
    txt2.each do |k, v|
      event.set(k, v) if v != ''
    end
    event.set('MSG_TYPE', type1)
    event.set('MSG_SUBTYPE', type2)
    return [event]
  end
  return []
end
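The script can be exercised outside Logstash with a tiny stand-in for the Event API (only get/set are needed). The filter logic below is inlined so the sketch is self-contained, and the sample payload is invented:

```ruby
require 'json'

# Minimal stand-in for Logstash's Event API (only get/set are used).
class StubEvent
  def initialize(fields); @fields = fields; end
  def get(k); @fields[k]; end
  def set(k, v); @fields[k] = v; end
end

# Same filtering logic as kafka2Oracle_cover.rb, inlined for the test.
def filter(event)
  begin
    obj = JSON.parse(event.get('message'))
  rescue JSON::ParserError
    return []
  end
  return [] unless obj['MSG_TYPE'] == 'PROD_RESERV_ORDER' &&
                   obj['MSG_SUBTYPE'] == 'ADD_ORDER'
  (obj['CONTANT'] || {}).each { |k, v| event.set(k, v) if v != '' }
  [event]
end

# Hypothetical sample payload (values invented for illustration).
raw = '{"MSG_TYPE":"PROD_RESERV_ORDER","MSG_SUBTYPE":"ADD_ORDER",' \
      '"CONTANT":{"ID":"BST_6744376"}}'
out = filter(StubEvent.new('message' => raw))
puts out.length          # => 1
puts out.first.get('ID') # => BST_6744376
```

Note the ID still carries its BST_ prefix here; stripping it is left to the mutate chain in the .conf file.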

Test whether the config file is valid:

/home/streaming/logstash-6.5.4/bin/logstash -f kafka2Oracle_cover.conf --config.test_and_exit

Write a launch script:

  • start_kafka2Oracle_cover.sh
nohup /home/streaming/logstash-6.5.4/bin/logstash -f kafka2Oracle_cover.conf --path.data=/home/streaming/kafkaToAds/wsy/data1 > /home/streaming/kafkaToAds/wsy/logs/nohup-kafka2Oracle_cover.out 2>&1 &

Templates

  • To use a template, add these options to the output block, e.g.:
# path to the template file
template => "/home/es/file2es/tamplate/bill.json"
# defaults to true; Logstash manages the index template itself (it must stay true for the custom template above to be installed)
manage_template => true
# name of the template
template_name => "bill"
# overwrite an existing template of the same name
template_overwrite => true

A few less common but useful features

For the everyday mutate operations the official documentation is enough; below are the less obvious tricks.

filter

Field slicing

  • Slice the field tradedate:20211105, taking the first four digits (the year), 2021:
grok {
		match => {"tradedate" => "(?<tradedate_year>^.{4})"}
}
  • Slice the field tradedate:20211105, taking the last two digits, 05:
grok {
		match => {"tradedate" => "(?<=......)(?<tradedate_day>.{2})"}
}
  • Slice the field tradedate:20211105, keeping the first six digits, 202111:
mutate {
  gsub => ["tradedate","(?<=......)(.*)",""]
}
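These grok/gsub recipes are ordinary regex slicing. The same extractions in plain Ruby, for quick experimentation:

```ruby
tradedate = '20211105'

# First 4 characters -> the year, like the first grok pattern.
year = tradedate[/\A.{4}/]   # => "2021"
# Last 2 characters -> the day, like the second grok pattern.
day = tradedate[/.{2}\z/]    # => "05"
# Keep only the first 6 characters; the gsub recipe does the same
# by deleting everything after position 6.
yearmonth = tradedate[0, 6]  # => "202111"

puts year, day, yearmonth
```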

Offline installation of Logstash plugins
Run the following command:
./bin/logstash-plugin install file:///root/logstash-output-jdbc.zip

Posted by 王双颖