ElasticSearch: Configuring the Logstash Output to Elasticsearch

Location

In the Logstash .conf configuration file, Elasticsearch is configured inside the output block.

Example:

output {
  elasticsearch {
    action => "index"
    index => "%{[fields][product_type]}-transaction-%{+YYYY-MM}"
    hosts => ["10.0.xx.xx:9200", "10.0.xx.xx:9200", "10.0.xx.xx:9200"]
  }
}

action

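index: indexes a document (an event from Logstash).
delete: deletes a document by id (this action requires an id).
create: inserts a document, and fails if that document already exists in the index.
update: updates a document by id. Update has a special case, upsert: if the document to update does not exist yet, the upsert content is used instead.

Example: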

action => "index"
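
As a rough illustration of how the four actions differ, here is a minimal Python sketch that models an index as an in-memory dict. The names apply_action and ConflictError are hypothetical and exist only for this example; real Elasticsearch behaviour has more cases (versioning, routing, error responses) than this toy model covers.

```python
class ConflictError(Exception):
    """Raised when create targets an id that already exists (toy model)."""


def apply_action(store, action, doc_id, doc=None, upsert=None):
    """Apply one action to a dict that stands in for an index."""
    if action == "index":
        store[doc_id] = dict(doc)            # create or fully replace
    elif action == "create":
        if doc_id in store:
            raise ConflictError(doc_id)      # create fails on an existing id
        store[doc_id] = dict(doc)
    elif action == "update":
        if doc_id in store:
            store[doc_id].update(doc)        # partial update of an existing document
        elif upsert is not None:
            store[doc_id] = dict(upsert)     # document missing: use the upsert content
        else:
            raise KeyError(doc_id)           # nothing to update and no upsert given
    elif action == "delete":
        store.pop(doc_id, None)              # requires an id
    else:
        raise ValueError(f"unknown action: {action}")
    return store


store = {}
apply_action(store, "index", "1", {"a": 1})
apply_action(store, "update", "1", {"b": 2})
apply_action(store, "update", "2", {"x": 1}, upsert={"x": 0})
print(store)
```

The point of the sketch is the update branch: an existing document is merged with the new fields, while a missing one is created from the upsert content rather than from the update itself.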

index

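The index to write events to. It can be dynamic, using the %{foo} syntax. Its default value is
"logstash-%{+YYYY.MM.dd}", which partitions indices by day so that you can easily delete old data or search only a specific time range.

Index names cannot contain uppercase letters. For weekly indices, the ISO 8601 week format is recommended, for example logstash-%{+xxxx.ww}.

Example: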

index => "%{[fields][product_type]}-transaction-%{+YYYY-MM}"
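
To make the date patterns concrete, the following Python sketch mimics how the monthly pattern above and the weekly xxxx.ww pattern would expand for one timestamp. The function names are hypothetical, and the lower() call is an addition of this sketch (it only highlights the lowercase rule; Logstash itself does not lowercase field values).

```python
from datetime import datetime, timezone


def monthly_index(product_type, ts):
    """Mimic "%{[fields][product_type]}-transaction-%{+YYYY-MM}" for a UTC timestamp."""
    # lower() is this sketch's own safeguard: index names must be lowercase
    return f"{product_type}-transaction-{ts:%Y-%m}".lower()


def weekly_index(ts):
    """Mimic "logstash-%{+xxxx.ww}": ISO week-year plus two-digit ISO week number."""
    iso_year, iso_week, _ = ts.isocalendar()
    return f"logstash-{iso_year}.{iso_week:02d}"


ts = datetime(2019, 12, 30, tzinfo=timezone.utc)
print(monthly_index("Payment", ts))  # payment-transaction-2019-12
print(weekly_index(ts))              # logstash-2020.01
```

The date 2019-12-30 is chosen on purpose: it falls in ISO week 1 of 2020, which is why the weekly pattern uses the week-year (xxxx) rather than the calendar year (YYYY).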

hosts

A value of array type.

Note that the HTTP protocol is used, so these are HTTP addresses and the port is 9200. Example:

hosts => ["10.0.xx.xx:9200", "10.0.xx.xx:9200", "10.0.xx.xx:9200"]

document_type

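Defines the type within the ES index. In general, logs of the same kind should be stored in the same type; for example, debug logs and error logs go to different types.

If it is not set, the default type is logs.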

template

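If you like, you can set the path to your own template. If it is not set, the default template is used.

template_name

This option defines the name of the template inside Elasticsearch.

Remember to delete the old template. Example:

curl -XDELETE 'http://localhost:9200/_template/OldTemplateName?pretty'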
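
For readers who script this step, a small Python sketch of the same DELETE request follows. It only builds the request and does not send it; delete_template_request is a hypothetical helper name, and the host and template name are the placeholders from the curl example.

```python
from urllib.request import Request


def delete_template_request(name, base_url="http://localhost:9200"):
    """Build (but do not send) the DELETE request for a template by name."""
    return Request(f"{base_url}/_template/{name}?pretty", method="DELETE")


req = delete_template_request("OldTemplateName")
print(req.get_method(), req.full_url)
# To actually send it against a running cluster: urllib.request.urlopen(req)
```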

template_overwrite

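Boolean, default false.
When set to true, if you have a custom template named logstash, your custom template overwrites the default logstash template.

manage_template

Boolean, default true.
Setting it to false turns off Logstash's automatic template management.
For example, if you have defined a custom template that generates fields dynamically based on field names, set this to false.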

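The order parameter

While getting started with the ELK Stack, you will inevitably need to customize index mappings, or even whole templates.
At that point, many new users who read the Logstash documentation carefully set up their own template policy with a configuration like this: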

output {
    elasticsearch {
        hosts => ["127.0.0.1"]
        manage_template => true
        template => "/path/to/mytemplate"
        template_name => "myname"
    }
}

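But then you discover that the template you worked so hard on, which curl -XGET 'http://127.0.0.1:9200/_template/myname' confirms was uploaded successfully, simply has no effect when new indices are actually created!

The reason: by default Logstash uploads a template named logstash to ES. If you ever ran Logstash before using the configuration above (which is usually the case), that template already exists in ES. You can verify this with curl -XGET 'http://127.0.0.1:9200/_template/logstash'.

At this point ES holds two templates, logstash and myname, which both match the logstash-* index name and both ask for certain mapping rules.

ES tries to merge all matching templates according to certain rules and applies the result to the index.

The key point: a template can set an order parameter! If it is omitted, the default order is 0. The higher the order, the higher the priority when the rules are merged.

So the fix is simple: add one line to your custom template so that it looks like this: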

{
    "template" : "logstash-*",
    "order" : 1,
    "settings" : { ... },
    "mappings" : { ... }
}
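
The effect of order can be pictured with a short Python sketch. It is a simplified model under the assumption that matching templates are applied from lowest to highest order and that later values override earlier ones key by key; the helper names and the sample settings are hypothetical, and the real merge in Elasticsearch has more rules than this.

```python
def deep_merge(base, override):
    """Recursively merge override into a copy of base; override wins on conflicts."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


def effective_template(templates):
    """Apply templates from lowest to highest order, so a higher order overrides."""
    result = {}
    for tpl in sorted(templates, key=lambda t: t.get("order", 0)):
        body = {k: v for k, v in tpl.items() if k in ("settings", "mappings")}
        result = deep_merge(result, body)
    return result


logstash = {"template": "logstash-*",
            "settings": {"refresh_interval": "5s", "number_of_shards": 5}}
myname = {"template": "logstash-*", "order": 1,
          "settings": {"number_of_shards": 3}}

print(effective_template([myname, logstash]))
# {'settings': {'refresh_interval': '5s', 'number_of_shards': 3}}
```

With order 1 the custom value for number_of_shards wins, while settings that only the default template defines are still inherited.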

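Of course, looking at it purely from the Logstash configuration side, there is an even simpler way: modify the original default logstash template directly and keep the template name unchanged: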

output {
    elasticsearch {
        hosts => ["127.0.0.1"]
        manage_template => true
        template_overwrite => true
    }
}

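Configuring templates for Elasticsearch

When collecting logs with Logstash, we usually rely on the dynamic index template that ships with Logstash. Without any customization, it lets us push log data into the Elasticsearch cluster.

At query time, however, we find that the default index template often analyzes (tokenizes) fields that should not be analyzed, and as a result our more important aggregation statistics become inaccurate.

That is when we need to define custom index templates.

When integrating Logstash with Elasticsearch, there are three ways to use templates:

1. Use the default built-in index template: most fields are analyzed; suitable for quick validation during development.
2. Configure a custom template on the Logstash collector side: the templates are scattered across the collector machines, so maintenance is cumbersome.
3. Configure a custom template on the Elasticsearch server side: Elasticsearch loads the template, it can be changed dynamically, takes effect globally, and is easier to maintain.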

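Using the default built-in index template

By default a template named "logstash" is available (Logstash uploads it to Elasticsearch, as described above), and it is applied when Logstash writes data to Elasticsearch.

Advantage: the simplest option, no configuration needed.

Disadvantage: some settings cannot be customized, for example how fields are analyzed.

Configuring a custom template on the Logstash collector side

The second approach suits log collection for small clusters.

In the Logstash output plugin, use template to point to a template JSON file on the local machine, for example template => "/tmp/logstash.json".

Advantage: simple to configure.

Disadvantage: the templates are scattered across the Logstash indexer machines, so they are cumbersome to maintain.

Configuring a custom template on the Elasticsearch server side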

manage_template => false   # turn off Logstash's automatic template management
template_name => "xxx"     # name of the mapping template

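The third approach requires placing the template JSON under the config/templates path of the Elasticsearch cluster. Index templates in Elasticsearch fall into two kinds:

Static templates

Suitable when the index fields are fixed. Once configured, extra fields cannot be added, otherwise an error is raised.

Advantages: the schema is known and the business scenario is clear, so there is little risk that arbitrary field mappings bloat the metadata, exhaust ES memory, and bring down the whole cluster. Maintenance is easier, and the template can be changed dynamically and takes effect globally.

Disadvantage: configuration is somewhat tedious when there are many fields.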

An example of a static index template:

{
  "xxx" : {
      "template": "xxx-*",
        "settings": {
            "index.number_of_shards": 3,
            "number_of_replicas": 0
        },
    "mappings" : {
      "logs" : {
        "properties" : {
          "@timestamp" : { // a field used specifically by Kibana as the time index
            "type" : "date",  
            "format" : "dateOptionalTime",  
            "doc_values" : true  
          },  
          "@version" : {  
            "type" : "string",  
            "index" : "not_analyzed",  
            "doc_values" : true      
          },  
          "id" : {  
            "type" : "string",  
            "index" : "not_analyzed"  
          },  
          "name" : {  
            "type" : "string",  
            "index" : "not_analyzed"  
          }
        }  
      }  
    }  
  }  
}

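Dynamic templates

Suitable when the set of fields is not known in advance and many fields share the same configuration. Adding extra fields does not raise an error.

Advantage: any field can be added dynamically without changing the schema.

Disadvantage: if a very large number of fields is added, the ES cluster may go down.

An example of a dynamic index template: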

{  
  "template" : "xxx-*",  
  "settings" : {  
   "index.number_of_shards": 5,  
   "number_of_replicas": 0    
  
},  
  "mappings" : {  
    "_default_" : {  
      "_all" : {"enabled" : true, "omit_norms" : true},  
      "dynamic_templates" : [ {  
        "message_field" : {  
          "match" : "message",  
          "match_mapping_type" : "string",  
          "mapping" : {  
            "type" : "string", "index" : "analyzed", "omit_norms" : true,  
            "fielddata" : { "format" : "disabled" }  
          }  
        }  
      }, {  
        "string_fields" : {  
          "match" : "*",  
          "match_mapping_type" : "string",  
          "mapping" : {  
            "type" : "string", "index" : "not_analyzed", "doc_values" : true  
          }  
        }  
      } ],  
      "properties" : {  
        "@timestamp": { "type": "date" },  
        "@version": { "type": "string", "index": "not_analyzed" }, 
        "geoip"  : {  
          "dynamic": true,  
          "properties" : {  
            "ip": { "type": "ip" },  
            "location" : { "type" : "geo_point" },  
            "latitude" : { "type" : "float" },  
            "longitude" : { "type" : "float" }  
          }  
        }  
      }  
    }  
  }  
}

Only the message field is analyzed; all other fields are not analyzed by default.
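
How the two dynamic_templates rules decide the mapping of a new string field can be sketched in Python. This is a simplified model assuming that rules are tried in order and the first match wins, with the match pattern treated as a simple wildcard; the function name and the reduced mapping dicts are hypothetical.

```python
from fnmatch import fnmatchcase

# (rule name, match pattern, mapping), in the same order as in the template above
DYNAMIC_TEMPLATES = [
    ("message_field", "message", {"type": "string", "index": "analyzed"}),
    ("string_fields", "*", {"type": "string", "index": "not_analyzed"}),
]


def mapping_for_new_string_field(field_name):
    """Return (rule name, mapping) of the first rule whose pattern fits the field."""
    for name, pattern, mapping in DYNAMIC_TEMPLATES:
        if fnmatchcase(field_name, pattern):
            return name, mapping
    return None, None


print(mapping_for_new_string_field("message"))  # matched by message_field
print(mapping_for_new_string_field("host"))     # falls through to string_fields
```

Because the catch-all "*" rule comes second, the message field is captured by the first rule and stays analyzed, and every other string field ends up not_analyzed.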

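Template structure

General settings: mainly the filter rule that matches the template against index names, which determines which indices the template applies to.
settings: common index parameters, such as the number of replicas and the number of shards.
mappings: the most important part, where the properties of every field under every type are configured, such as the field type (string, long, date, and so on), whether it is analyzed, and whether it is cached in memory.
aliases: index aliases, which can be used for tasks such as migrating index data.

Example: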

{
  "logstash" : {
    "order" : 0,
    "template" : "logstash-*",
    "settings" : {
      "index" : {
        "refresh_interval" : "5s"
      }
    },
    "mappings" : {
      "_default_" : {
        "dynamic_templates" : [ {
          "message_field" : {
            "mapping" : {
              "fielddata" : {
                "format" : "disabled"
              },
              "index" : "analyzed",
              "omit_norms" : true,
              "type" : "string"
            },
            "match_mapping_type" : "string",
            "match" : "message"
          }
        }, {
          "string_fields" : {
            "mapping" : {
              "fielddata" : {
                "format" : "disabled"
              },
              "index" : "analyzed",
              "omit_norms" : true,
              "type" : "string",
              "fields" : {
                "raw" : {
                  "ignore_above" : 256,
                  "index" : "not_analyzed",
                  "type" : "string"
                }
              }
            },
            "match_mapping_type" : "string",
            "match" : "*"
          }
        } ],
        "_all" : {
          "omit_norms" : true,
          "enabled" : true
        },
        "properties" : {
          "@timestamp" : {
            "type" : "date"
          },
          "geoip" : {
            "dynamic" : true,
            "properties" : {
              "ip" : {
                "type" : "ip"
              },
              "latitude" : {
                "type" : "float"
              },
              "location" : {
                "type" : "geo_point"
              },
              "longitude" : {
                "type" : "float"
              }
            }
          },
          "@version" : {
            "index" : "not_analyzed",
            "type" : "string"
          }
        }
      }
    },
    "aliases" : { }
  }
}

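We create a custom dynamic template that matches every index whose name starts with "go_logs_index_" and allows new fields to be added. Every new field of type string gets a nested raw sub-field, which is also a string but not_analyzed (this mainly solves the problem that analyzed string fields cannot be used for statistics, while the raw sub-field can).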

{
  "template": "go_logs_index_*",
  "order":0,
  "settings": {
      "index.number_of_replicas": "1",
      "index.number_of_shards": "5",
      "index.refresh_interval" : "10s"
  },
  "mappings": {
    "_default_": {
      "_all": {
        "enabled": false
      },
      "dynamic_templates": [
        {
          "my_template": {
            "match_mapping_type": "string",
            "mapping": {
              "type": "string",
              "fields": {
                "raw": {
                  "type": "string",
                  "index": "not_analyzed"
                }
              }
            }
          }
        }
      ]
    },
    "go": {
      "properties": {
        "timestamp": {
          "type": "string",
          "index": "not_analyzed"
        },
        "msg": {
          "type": "string",
          "analyzer": "ik",
          "search_analyzer": "ik_smart"
        },
        "file": {
          "type": "string",
          "index": "not_analyzed"
        },
        "line": {
          "type": "string",
          "index": "not_analyzed"
        },
        "threadid": {
          "type": "string",
          "index": "not_analyzed"
        },
        "info": {
          "type": "string",
          "index": "not_analyzed"
        },
        "type": {
          "type": "string",
          "index": "not_analyzed"
        },
        "@timestamp": {
          "format": "strict_date_optional_time||epoch_millis",
          "type": "date"
        },
        "@version": {
          "type": "string",
          "index": "not_analyzed"
        }
      }
    }
  }
}

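Complex updates

Iterating over all fields with a script

output {
  elasticsearch {
    hosts => ["http://localhost:9200"]
    index => "your_index"
    document_id => "%{[@metadata][_id]}"
    action => "update"
    script_type => "inline"
    script_lang => "painless"
    # The plugin passes the event to the script as params.event
    script => "for (field in params.event.keySet()) {
                 if (ctx._source.containsKey(field)) {
                   ctx._source[field] = params.event[field];
                 }
               }"
    # JSON string: the default fields and values to create when the document does not exist
    upsert => '{"default_field": "default_value"}'
  }
}

In this configuration:
params.event is the Logstash event as a map; the script iterates over all of its fields with keySet().
ctx._source.containsKey(field) checks whether the _source of the Elasticsearch document already contains the field. If it does, the field is updated.
ctx._source[field] = params.event[field]; assigns the field value from the Logstash event to the corresponding field of the Elasticsearch document.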
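
The update rule described above (copy a field from the event only when the stored document already has it) can be checked offline with a few lines of Python. This is only a model of the script's logic on plain dicts, with a hypothetical function name; it does not talk to Elasticsearch.

```python
def update_existing_fields(source, event):
    """Overwrite only the keys that the stored document already contains."""
    for field, value in event.items():
        if field in source:
            source[field] = value
    return source


stored = {"status": "pending", "amount": 10}
incoming = {"status": "done", "extra": "ignored"}
print(update_existing_fields(stored, incoming))
# {'status': 'done', 'amount': 10}
```

Note the consequence: fields that appear only in the event (extra here) are never added to the document by this rule.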

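How to update only specific fields?

output {
  elasticsearch {
    hosts => ["http://localhost:9200"]
    index => "your_index"
    document_id => "%{[@metadata][_id]}"
    action => "update"
    script_type => "inline"
    script_lang => "painless"
    # Add more per-field updates to the script as needed
    script => "if (ctx._source.containsKey('field1')) {
                 ctx._source.field1 = params.event.get('field1');
               }
               if (ctx._source.containsKey('field2')) {
                 ctx._source.field2 = params.event.get('field2');
               }"
    # JSON string: add more default fields and values as needed
    upsert => '{"field1": "default_value1", "field2": "default_value2"}'
  }
}

In this configuration:

field1 and field2 are the specific fields you want to update.
params.event.get('field1') and params.event.get('field2') read the values of these fields from the Logstash event.
ctx._source.containsKey('field1') checks whether the _source of the Elasticsearch document already contains field1. If it does, the field is updated. The same check is made for field2.
The upsert part defines the default document content to create when the document does not exist, which ensures these fields are set in that case.
This way the script states explicitly which fields are updated, without iterating over all fields. It makes the script more efficient and the configuration easier to read. Remember to verify the script in a test environment before applying it.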

How to exclude fields that should not be updated?

output {
  elasticsearch {
    hosts => ["http://localhost:9200"]
    index => "your_index"
    document_id => "%{[@metadata][_id]}"
    action => "update"
    script_type => "inline"
    script_lang => "painless"
    script => "def fields_to_exclude = ['field_to_exclude1', 'field_to_exclude2'];
               for (field in params.event.keySet()) {
                 if (!fields_to_exclude.contains(field)) {
                   ctx._source[field] = params.event[field];
                 }
               }"
    # JSON string: default fields and values
    upsert => '{"default_field1": "default_value1", "default_field2": "default_value2"}'
  }
}

In this script, we first define a variable named fields_to_exclude that holds the names of all fields to exclude. Then, while iterating over the event fields, we use this variable to check whether a field should be skipped.
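
The exclusion logic can be modelled the same way in Python. The sketch below works on plain dicts, uses a hypothetical function name, and takes the exclusion list as a parameter purely for illustration.

```python
def update_except(source, event, fields_to_exclude):
    """Copy every event field into the document unless its name is excluded."""
    excluded = set(fields_to_exclude)
    for field, value in event.items():
        if field not in excluded:
            source[field] = value
    return source


stored = {"name": "old", "created_at": "2019-11-04"}
incoming = {"name": "new", "created_at": "2020-01-01", "note": "added"}
print(update_except(stored, incoming, ["created_at"]))
# {'name': 'new', 'created_at': '2019-11-04', 'note': 'added'}
```

Unlike the "existing fields only" rule, this one does add new fields to the document; only the excluded names keep their stored values.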

Excluding one field when a script updates all fields

elasticsearch {
  hosts => ["172.16.10.101:9200","172.16.10.102:9200","172.16.10.103:9200"]
  index => "p-ups-push-msg-%{index_date}"
  action => "update"
  document_id => "%{messageId}"
  doc_as_upsert => true
  script => "for (field in params.event.keySet()) { if (field != 'logTimestamp' || (field == 'logTimestamp' && ctx._source[field] == null)) { ctx._source[field] = params.event[field] }}"
  script_lang => "painless"
  script_type => "inline"
  manage_template => false
  template_name => "p-ups-push-template"
}

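Can only the changed fields be updated?

output {
  elasticsearch {
    hosts => ["http://localhost:9200"]
    index => "your_index"
    document_id => "%{[@metadata][_id]}"
    action => "update"
    script_type => "inline"
    script_lang => "painless"
    script => "for (field in params.event.keySet()) { if (ctx._source[field] != params.event[field]) { ctx._source[field] = params.event[field] } }"
    # JSON string: default fields and values
    upsert => '{"default_field1": "default_value1", "default_field2": "default_value2"}'
  }
}

In this configuration:

The script option defines a Painless script that iterates over all fields of the Logstash event.
For each field, the script checks whether the value in the _source of the Elasticsearch document differs from the value in the event.
If the values differ, the script updates the field in _source to the new value from the event.
This way only the fields that actually changed are written, and unchanged fields stay as they are.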
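
A small Python model of this comparison makes it easy to see which fields would be touched. It runs on plain dicts only, and update_changed_fields is a hypothetical name for this sketch.

```python
def update_changed_fields(source, event):
    """Write only the fields whose values differ; return the names that changed."""
    changed = []
    for field, value in event.items():
        if source.get(field) != value:
            source[field] = value
            changed.append(field)
    return changed


stored = {"status": "pending", "amount": 10}
incoming = {"status": "done", "amount": 10}
print(update_changed_fields(stored, incoming))  # ['status']
print(stored)                                   # {'status': 'done', 'amount': 10}
```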

Updating only when the field is empty, with a script

output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "your-index"
    document_id => "%{[@metadata][_id]}"
    action => "update"
    script_type => "inline"
    script_lang => "painless"
    script => "if (ctx._source.your_field == null) { ctx._source.your_field = params.event.get('your_field') }"
    upsert => '{"your_field": "Your new value here"}'
  }
}

In this configuration:

your_field is the name of the field you want to update.
The new value is read from the event field of the same name through params.event, because the plugin passes only the event to the script.
upsert is optional; if the document does not exist, a new document is created from the content it provides.
The Painless code in script is:
if (ctx._source.your_field == null) { ctx._source.your_field = params.event.get('your_field') }
This code checks whether your_field is null. If it is, the field is set to the value carried by the event.

Can incremental updates be implemented?

output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "your-index"
    document_id => "%{[@metadata][_id]}"
    action => "update"
    script_type => "inline"
    script_lang => "painless"
    script => "if (ctx._source.field_to_update != null) { ctx._source.field_to_update += 1 } else { ctx._source.field_to_update = 1 }"
    upsert => '{"field_to_update": 1}'
  }
}

In this configuration:

action => "update" specifies that an update is performed on an existing document.
script contains a Painless script that checks whether the field_to_update field exists in the document. If it exists, the script increases its value by the increment; if not, it sets the field to the increment.
The increment (1 here) is written directly in the script, because the plugin passes only the event to the script, as params.event.
upsert is optional; if the document does not exist, a new document is created from the content it provides.
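
The three paths of the incremental update (document missing, field missing, field present) can be traced with a short Python sketch. It models the index as a dict and uses a hypothetical function name; the field name and the step of 1 follow the example above.

```python
def increment_counter(store, doc_id, field="field_to_update", step=1):
    """Model of the increment: upsert, initialise, or add, depending on state."""
    doc = store.get(doc_id)
    if doc is None:
        store[doc_id] = {field: step}   # document missing: upsert path
    elif doc.get(field) is not None:
        doc[field] += step              # field present: add the increment
    else:
        doc[field] = step               # field missing: start at the increment
    return store[doc_id][field]


index = {"b": {"other": "x"}}
print(increment_counter(index, "a"))  # 1 (created by upsert)
print(increment_counter(index, "a"))  # 2
print(increment_counter(index, "b"))  # 1 (field initialised)
```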

Splitting a field (usually an array or a string) with split while keeping the original record

filter {
  if [requestData] {
    clone {
      clones => ["original_event"]
      add_tag => [ "original_event" ]
    }
    if "original_event" not in [tags] {
      split {
        field => "requestData"
        add_tag => [ "requestDataSplit" ]
      }
    }
    #if [tags] == ["original_event"] {
    #  mutate {
    #    add_tag => [ "requestDataOrigin" ]
    #  }
    #}
  }
}

output {
  if [requestData] and "requestDataSplit" in [tags] {
    elasticsearch {
      hosts => ["172.16.10.101:9200","172.16.10.102:9200","172.16.10.103:9200"]
      index => "b-old-app-request-params-%{index_date}"
      #action => "update"
      #document_id => "%{[requestData][Mp_Tran_No]}"
      #doc_as_upsert => true
      manage_template => false
      template_name => "b-old-app-template"
      #user => "elastic"
      #password => "macaupass@123"
    }
  }
  if "trade" in [tags] and "original_event" in [tags] {
    #stdout {
    #  codec => rubydebug
    #}

    # full log
    elasticsearch {
      hosts => ["172.16.10.101:9200","172.16.10.102:9200","172.16.10.103:9200"]
      index => "b-old-app-log-%{index_date}"
      manage_template => false
      template_name => "b-old-app-template"
      #user => "elastic"
      #password => "macaupass@123"
    }
  }
}



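When using the split filter in Logstash, if you want to keep the original record while also splitting a field (usually an array or a string), you can use clone to create a copy of the record, so that one event is split and the other is left untouched. Both the unsplit record and the split records then continue through the Logstash pipeline.

In this configuration:

The clone filter creates a copy of the event and adds the tag original_event to that copy.
The split filter runs only on events that do not carry the original_event tag, so the tagged copy keeps the unsplit requestData.
The if conditions check for the original_event tag to tell the preserved record apart from the split events.
In the output section, events are sent to different Elasticsearch indices depending on whether they carry the original_event or the requestDataSplit tag.

This way you can keep both the original record and the split records in Logstash and send them to different indices. Adjust the field names, tags, and target indices to your own needs.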
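
The clone-then-split flow can be simulated in Python to see which events come out and how they are tagged. This is a sketch on plain dicts under the assumption that strings are split on newlines; clone_and_split is a hypothetical name, and the tags match the ones used in the configuration above.

```python
import copy


def clone_and_split(event, field="requestData"):
    """Return the tagged unsplit copy plus one event per element of event[field]."""
    if not event.get(field):
        return [event]                       # nothing to split, pass through
    original = copy.deepcopy(event)
    original.setdefault("tags", []).append("original_event")
    value = event[field]
    parts = value if isinstance(value, list) else str(value).split("\n")
    split_events = []
    for part in parts:
        piece = copy.deepcopy(event)
        piece[field] = part                  # each split event carries one element
        piece.setdefault("tags", []).append("requestDataSplit")
        split_events.append(piece)
    return [original] + split_events


for e in clone_and_split({"requestData": ["a", "b"], "tags": ["trade"]}):
    print(e)
```

For an input with two elements, three events come out: one tagged original_event with the full array, and two tagged requestDataSplit with one element each, which is what the output conditions route to the two indices.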

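Summary

Managing templates centrally with the third approach is best and is the recommended choice, but each case should be judged on its own. For example, when Logstash and Elasticsearch run on the same server, the second approach works well.

Customizing index templates is an important step in a search application, and there are many things to consider, for example:
1. Is the set of fields fixed?
2. What are the field types?
3. Should a field be analyzed?
4. Should it be indexed?
5. Should it be stored?
6. Should it be sortable?
7. Should it be boosted?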


Posted on 2019-11-04 12:10 by 曹伟雄
