The Power of Logstash
Introduction
1. Why write this article?
When I first came across Logstash, I treated it as a mere appendage of Elasticsearch, something that only did a bit of data processing for ES. Only after using Logstash more heavily did I realize how powerful it is, and I wanted to share it, which is the original motivation for this article.
2. What can Logstash do?
1> Simple development: when a new requirement comes in, I used to build it with Flink or Spark, but that takes a long time. Dependencies alone are a headache, all sorts of problems pop up at runtime, and when the company required Scala with a Gradle build it cost me a whole week just to get the environment working. With Logstash, writing a single configuration file solves the problem (see the minimal sketch below).
2> Strong reusability: for projects of the same type you only change the data source address, a little of the in-between logic, and the output address. If the first job takes three days, a similar one afterwards takes a few hours, sometimes less.
3> Great flexibility: the plugin list appended below shows what Logstash supports; it covers almost every input and output you are likely to need.
Logstash installs by simply unpacking the archive, so I will skip installation and go straight to the core configuration.
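To show how little is needed, here is a minimal, self-contained sketch (not part of the case study that follows): it reads lines typed on the console and prints them back as structured events.

input {
  stdin { }                                       # read lines from the console
}
filter {
  mutate { add_field => { "source" => "demo" } }  # tag every event with a constant field
}
output {
  stdout { codec => rubydebug }                   # print each event as a readable hash
}

Save it as, say, demo.conf and run bin/logstash -f demo.conf; every line you type comes back as an event carrying message, @timestamp and the added source field.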
Case study: kafka2Oracle
Consume data from Kafka and land it in an Oracle database.
Requirements:
Incrementally load data into the Oracle table PROD_RESERV_ORDER.
1. Only records with "MSG_TYPE":"PROD_RESERV_ORDER" and "MSG_SUBTYPE":"ADD_ORDER" may be inserted.
2. For "ID":"BST_6744376", strip the BST_ prefix and use the remainder as the primary key.
3. Upsert: if a row with the same primary key ID already exists, update it; otherwise insert it.
- Pull one sample message from Kafka:
kafka_2.11-0.10.0.1/bin/kafka-console-consumer.sh --zookeeper node01:2181,node02:2181,node03:2181 --topic G_P_USER_BST2UC --from-beginning --max-messages 1
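The sample output itself is not reproduced here; judging from the filter configuration and the Ruby script later in this article, each message is a JSON document shaped roughly like the following (only the field names and the two routing values shown elsewhere in this article are real; the "..." values are placeholders):

{
  "MSG_TYPE": "PROD_RESERV_ORDER",
  "MSG_SUBTYPE": "ADD_ORDER",
  "CONTANT": {
    "ID": "BST_6744376",
    "PRODNAME": "...",
    "OPT_TYP": "...",
    "ORDERDATE": "..."
  }
}

The CONTANT object also carries CUST_ID, PRODCODE, ORDERAMOUNT, USERTYPE, PROD_RISK_LEVEL and the other business fields that the jdbc output below binds to its ? placeholders.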
We normally run Logstash against a .conf configuration file.
The input plugin handles data collection; it can read from file, kafka, http, redis, jdbc and more than 50 other kinds of data sources.
Here I will cover the one I use most often: reading from Kafka.
kafka2Oracle_cover.conf
input {
kafka {
bootstrap_servers => "node01:9092,node02:9092,node03:9092" # Kafka broker addresses (comma-separated string)
auto_offset_reset => "earliest" # consume the topic from the beginning; "latest" starts from the current offset
group_id => "kafka2Oracle_20210823"
consumer_threads => 3 # number of Kafka consumer threads
topics => ["G_P_USER_BST2UC"]
# tuning parameters
fetch_max_wait_ms => "1000"
fetch_min_bytes => "1000000"
}
}
filter {
ruby {
path => "/home/streaming/kafkaToAds/wsy/kafka2Oracle_cover.rb"
script_params => {"message" => "%{message}"}
remove_field => ["message"]
}
mutate {
# Logstash reads every field in as a string; convert these to integer
convert => {
"OPT_TYP" => "integer"
"ORDERDATE" => "integer"
"USERTYPE" => "integer"
"PROD_RISK_LEVEL" => "integer"
}
}
mutate {
split => {"ID" => "_"} # "ID": "BST_6744376" -> split on "_" into an array
add_field => {
"ID0" => "%{[ID][0]}" # BST
"ID1" => "%{[ID][1]}" # 6744376
}
}
}
mutate {
remove_field => ["ID0"]
rename => {"ID1" => "ID"}
}
}
# dual is Oracle's built-in one-row dummy table; selecting the bound parameters from it gives the MERGE a source row
output {
if [MSG_TYPE] == "PROD_RESERV_ORDER" and [MSG_SUBTYPE] == "ADD_ORDER" {
jdbc {
driver_jar_path => "/home/streaming/logstash-6.5.4/lib/Oracle-plugins/ojdbc6.jar"
driver_class => "oracle.jdbc.OracleDriver"
connection_string => "jdbc:oracle:thin:@node01:1521/qlj"
statement => [
"merge into PROD_RESERV_ORDER t1
using (
select ? as PRODNAME,? as OPT_TYP,? as ID,? as CUST_ID,? as PRODCODE,? as ORDERAMOUNT,? as ORDERDATE,? as ORDERTIME,? as USERID,? as USERTYPE,? as SOURCE,? as TGLJ_INFO,? as PROD_RISK_LEVEL,? as PROD_CATEGORY_II,? as PROD_CATEGORY_I,? as MSG_TYPE,? as MSG_SUBTYPE
from dual ) t2
on (t1.ID = t2.ID)
when matched then update set
t1.PRODNAME = t2.PRODNAME,
t1.OPT_TYP = t2.OPT_TYP,
t1.CUST_ID = t2.CUST_ID,
t1.PRODCODE = t2.PRODCODE,
t1.ORDERAMOUNT = t2.ORDERAMOUNT,
t1.ORDERDATE = t2.ORDERDATE,
t1.ORDERTIME = t2.ORDERTIME,
t1.USERID = t2.USERID,
t1.USERTYPE = t2.USERTYPE,
t1.SOURCE = t2.SOURCE,
t1.TGLJ_INFO = t2.TGLJ_INFO,
t1.PROD_RISK_LEVEL = t2.PROD_RISK_LEVEL,
t1.PROD_CATEGORY_II = t2.PROD_CATEGORY_II,
t1.PROD_CATEGORY_I = t2.PROD_CATEGORY_I,
t1.MSG_TYPE = t2.MSG_TYPE,
t1.MSG_SUBTYPE = t2.MSG_SUBTYPE
when not matched then insert (
t1.PRODNAME,t1.OPT_TYP,t1.ID,t1.CUST_ID,t1.PRODCODE,t1.ORDERAMOUNT,t1.ORDERDATE,t1.ORDERTIME,t1.USERID,t1.USERTYPE,t1.SOURCE,t1.TGLJ_INFO,t1.PROD_RISK_LEVEL,t1.PROD_CATEGORY_II,t1.PROD_CATEGORY_I,t1.MSG_TYPE,t1.MSG_SUBTYPE
) values (
t2.PRODNAME,t2.OPT_TYP,t2.ID,t2.CUST_ID,t2.PRODCODE,t2.ORDERAMOUNT,t2.ORDERDATE,t2.ORDERTIME,t2.USERID,t2.USERTYPE,t2.SOURCE,t2.TGLJ_INFO,t2.PROD_RISK_LEVEL,t2.PROD_CATEGORY_II,t2.PROD_CATEGORY_I,t2.MSG_TYPE,t2.MSG_SUBTYPE
)
"
,"PRODNAME","OPT_TYP","ID,CUST_ID","PRODCODE","ORDERAMOUNT","ORDERDATE","ORDERTIME","USERID","USERTYPE","SOURCE","TGLJ_INFO","PROD_RISK_LEVEL","PROD_CATEGORY_II","PROD_CATEGORY_I","MSG_TYPE","MSG_SUBTYPE"
]
username => "edw03"
max_pool_size => "1"
password => "XjmDB#081820"
}
}
}
Ruby script
If you would rather not use Ruby, Grok works too; a handy Grok validator: https://www.5axxw.com/tools/v2/grok.html (a rough Grok sketch follows the script below).
- kafka2Oracle_cover.rb
# kafka2Oracle_cover.rb -- script used by the ruby filter above
require 'json'

def register(params)
  # static parameters passed in through script_params
  @message = params["message"]
end

def filter(event)
  txt = event.get('message')
  begin
    obj1  = JSON.parse(txt)
    type1 = obj1['MSG_TYPE']
    type2 = obj1['MSG_SUBTYPE']
    # keep only product-reservation "add order" messages
    if type1 == 'PROD_RESERV_ORDER' && type2 == 'ADD_ORDER'
      # copy the non-empty business fields out of CONTANT onto the event
      obj1['CONTANT'].each do |k, v|
        event.set(k, v) if v != ''
      end
      event.set('MSG_TYPE', type1)
      event.set('MSG_SUBTYPE', type2)
      return [event]
    end
  rescue
    # drop messages that cannot be parsed or lack the expected structure
  end
  [] # returning an empty array drops the event
end
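As for the Grok alternative mentioned above: a rough, illustrative sketch is to pull the two routing fields straight out of the raw JSON text with inline named captures. This is not the configuration used in this job, and it assumes MSG_TYPE appears before MSG_SUBTYPE in the message with no whitespace after the colons:

filter {
  grok {
    # inline named captures; only extracts the two routing fields
    match => { "message" => '"MSG_TYPE":"(?<MSG_TYPE>[^"]+)".*"MSG_SUBTYPE":"(?<MSG_SUBTYPE>[^"]+)"' }
  }
}

For fully structured JSON, the built-in json filter (json { source => "message" }) is usually the simpler choice; Grok shines when the payload is plain text rather than JSON.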
Test that the configuration file parses correctly:
/home/streaming/logstash-6.5.4/bin/logstash -f kafka2Oracle_cover.conf --config.test_and_exit
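While debugging, it also helps to see what the events look like after the filter stage. A common trick (remove it before going to production) is to temporarily add a stdout output alongside the jdbc one:

output {
  stdout { codec => rubydebug }   # print every event as a readable hash on the console
}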
Write a launch script:
- start_kafka2Oracle_cover.sh
nohup /home/streaming/logstash-6.5.4/bin/logstash -f kafka2Oracle_cover.conf --path.data=/home/streaming/kafkaToAds/wsy/data1 > /home/streaming/kafkaToAds/wsy/logs/nohup-kafka2Oracle_cover.out 2>&1 &
Index templates
- To use a custom template, add the following options to the elasticsearch output, for example:
# path to the template file
template => "/home/es/file2es/tamplate/bill.json"
# defaults to true: Logstash installs/updates the template given above itself; set it to false only if you prefer to manage templates in Elasticsearch manually
manage_template => true
# name of the template
template_name => "bill"
# overwrite an existing template with the same name
template_overwrite => true
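For context, here is a sketch of where these options sit inside an elasticsearch output; the hosts and index values are illustrative, not taken from a real job:

output {
  elasticsearch {
    hosts              => ["node01:9200"]         # illustrative address
    index              => "bill-%{+YYYY.MM.dd}"   # illustrative index name
    template           => "/home/es/file2es/tamplate/bill.json"
    manage_template    => true
    template_name      => "bill"
    template_overwrite => true
  }
}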
Some less common but useful features
filter
Field slicing
- Slice the field tradedate:20211105 and take the first four digits (the year): 2021
grok {
match => {"tradedate" => "(?<tradedate_year>^.{4})"}
}
- Slice the field tradedate:20211105 and take the last two digits (the day): 05
grok {
match => {"tradedate" => "(?<tradedate_day>(?<=......)(.{2}))"}
}
- Slice the field tradedate:20211105 and keep the first six digits: 202111
mutate {
gsub => ["tradedate","(?<=......)(.*)",""]
}
Offline installation of Logstash plugins
Run the command below to install from a local plugin package (such a package is typically produced on a machine with internet access via bin/logstash-plugin prepare-offline-pack logstash-output-jdbc):
./bin/logstash-plugin install file:///root/logstash-output-jdbc.zip