Logstash: Getting Started and Architecture Overview
Pipeline
input / filter / output
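
The three stages map directly onto the three top-level blocks of a pipeline config file. The following is a minimal sketch, assuming it is saved as pipeline.conf (a hypothetical file name) and started with `logstash -f pipeline.conf`; the field name "stage" is only an illustration.

```conf
# input: where Events come from
input {
  stdin {}
}

# filter: optional processing applied to every Event
filter {
  mutate {
    add_field => { "stage" => "filtered" }
  }
}

# output: where Events are sent
output {
  stdout { codec => rubydebug }
}
```

The filter block may be left empty or omitted; input and output are what make the pipeline useful.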
Input Plugins
- Stdin / File
- Log4j / JDBC / Kafka
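
As a rough sketch of how input plugins are configured, the block below reads from a log file and from a Kafka topic at the same time. The log path, broker address, topic name and group id are placeholders chosen for illustration, not values from these notes.

```conf
input {
  # Tail every .log file under a (hypothetical) application log directory
  file {
    path => "/var/log/myapp/*.log"
    start_position => "beginning"
  }

  # Consume from a Kafka topic; broker and topic are assumed values
  kafka {
    bootstrap_servers => "localhost:9092"
    topics => ["app-logs"]
    group_id => "logstash"
  }
}
```

Several inputs can be declared in one pipeline; their Events are merged into the same stream before the filter stage.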
Output Plugins
Sends Events to a specific destination; this is the last stage of the Pipeline.
Common Output Plugins
- Elasticsearch
- Kafka
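
A minimal sketch of the two outputs listed above, assuming a local Elasticsearch node and a local Kafka broker. The index name and topic name are made up for the example.

```conf
output {
  # Index each Event into Elasticsearch, one index per day
  elasticsearch {
    hosts => ["http://localhost:9200"]
    index => "app-logs-%{+YYYY.MM.dd}"
  }

  # Publish the same Event to a Kafka topic, encoded as JSON
  kafka {
    bootstrap_servers => "localhost:9092"
    topic_id => "processed-logs"
    codec => json
  }
}
```

When several outputs are declared, each Event is sent to all of them unless conditionals restrict it.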
Codec Plugin
Decodes raw data into Events, and encodes Events into the target format.
Built-in Codec plugins
- Line / Multiline
- Json / Avro
- Dots / Rubydebug
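
To make the decode/encode split concrete, here is a small sketch in which the input codec decodes each line of text into an Event and the output codec encodes the Event as JSON. It assumes interactive use on a terminal and uses only the stdin and stdout plugins.

```conf
input {
  # decode: each line typed on stdin becomes one Event with a "message" field
  stdin { codec => line }
}

output {
  # encode: each Event is serialized as a JSON document on stdout
  stdout { codec => json }
}
```

Swapping the output codec for rubydebug prints a human-readable dump of the Event instead, and dots prints a single "." per Event, which is handy for watching throughput.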
Filter Plugin
Processes Events
Built-in Filter plugins
- Mutate - modifies Event fields
- Metrics - aggregates metrics
- Ruby - executes Ruby code
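
The following sketch chains two of the filters above. The field names "source" and "message_length" are invented for the example; the Metrics filter is left out to keep the output quiet.

```conf
filter {
  # Mutate: lowercase the message text and attach a static field
  mutate {
    lowercase => ["message"]
    add_field => { "source" => "stdin" }
  }

  # Ruby: compute a new field from an existing one using the Event API
  ruby {
    code => "event.set('message_length', event.get('message').to_s.length)"
  }
}
```

Filters run in the order they are written, so the ruby filter here sees the already lowercased message.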
Queue
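In-Memory Queue (the default; data is lost if the process crashes or the machine goes down)
Persistent Queue (events are buffered on disk, so they survive a process restart; enabled by setting queue.type: persisted in logstash.yml)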
Examples:
① Read single-line input and convert it into an Event (here each line is decoded as JSON).
logstash -e "input{stdin{codec=>json}}output{stdout{codec=>rubydebug}}"
② Read multi-line data
multiline.conf
input {
  stdin {
    codec => multiline {
      # A line that starts with whitespace is appended to the previous line
      pattern => "^\s"
      what => "previous"
    }
  }
}
filter {}
output {
  stdout { codec => rubydebug }
}
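③ Putting it all together
Download the CSV file (movies.csv) from https://grouplens.org/datasets/movielens/
input {
  file {
    # Note: the file input expects an absolute path; replace this with the full path to movies.csv
    path => "movies.csv"
    start_position => "beginning"
    # Do not persist the read position, so the file is re-read on every run
    sincedb_path => "/dev/null"
  }
}
filter {
  # Parse each line into three named columns
  csv {
    separator => ","
    columns => ["id", "content", "genre"]
  }
  # Turn "A|B|C" into an array of genres
  mutate {
    split => { "genre" => "|" }
    remove_field => ["path", "host", "@timestamp", "message"]
  }
  # Split "Title (Year)" on "(" and copy the two parts into new fields
  mutate {
    split => ["content", "("]
    add_field => { "title" => "%{[content][0]}" }
    add_field => { "year" => "%{[content][1]}" }
  }
  # Clean up types and whitespace, then drop the intermediate field
  mutate {
    convert => { "year" => "integer" }
    strip => ["title"]
    remove_field => ["path", "host", "@timestamp", "message", "content"]
  }
}
output {
  elasticsearch {
    hosts => "http://localhost:9200"
    index => "movies"
    document_id => "%{id}"
  }
  stdout {}
}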