Elasticsearch Data streams
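Behind the scenes, a data stream can be thought of as a group of automatically created indices.
A data stream lets you store append-only time series data across multiple indices while giving you a single named resource (similar to an alias) for requests. Data streams are well suited to logs, events, metrics, and other continuously generated data.
You can submit indexing and search requests directly to a data stream. The stream automatically routes each request to the backing indices that store its data. You can use index lifecycle management (ILM) to automate the management of these backing indices.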
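The routing described above can be pictured with a small toy model. This is only an in-memory sketch, not how Elasticsearch is implemented: the class name ToyDataStream and its methods are hypothetical, and the backing index names are sample values. It assumes the documented behaviour that writes go to the most recently created backing index while searches cover all of them.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ToyDataStream:
    """Hypothetical in-memory stand-in for a data stream (illustration only)."""
    name: str
    backing_indices: List[str] = field(default_factory=list)

    def write_target(self) -> str:
        # New documents go to the most recently created backing index,
        # which is the stream's write index.
        if not self.backing_indices:
            raise ValueError("the stream has no backing indices yet")
        return self.backing_indices[-1]

    def search_targets(self) -> List[str]:
        # Searches are routed to every backing index of the stream.
        return list(self.backing_indices)


stream = ToyDataStream(
    "my-data-stream",
    [
        ".ds-my-data-stream-2099.03.07-000001",
        ".ds-my-data-stream-2099.03.08-000002",
    ],
)
print("write ->", stream.write_target())
print("search ->", stream.search_targets())
```

The caller only ever uses the stream name; which backing index receives a request is decided by the stream.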
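Reading data
Search requests sent to a data stream are routed to all of its backing indices.
Writing data
Indexing requests are routed to the stream's write index, which is the most recently created backing index.
You cannot add new documents to the other backing indices, even when you address them by their full index name. You also cannot perform operations on the write index that may hinder indexing, such as clone, delete, shrink, or split.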
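Generation
Backing index naming rule: each data stream tracks its generation, a six-digit, zero-padded integer that acts as a cumulative count of the stream's rollovers, starting at 000001.
The full name of a backing index has the form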
.ds-<data-stream>-<yyyy.MM.dd>-<generation>
For example: .ds-my-data-stream-2021.10.27-000001
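The naming rule can be reproduced in a few lines of Python. This is a minimal sketch for reading and generating names in scripts; the helper backing_index_name is hypothetical, and Elasticsearch itself assigns the real names.

```python
from datetime import date


def backing_index_name(data_stream: str, created: date, generation: int) -> str:
    """Build a name of the form .ds-<data-stream>-<yyyy.MM.dd>-<generation>."""
    if generation < 1:
        raise ValueError("generation starts at 1")
    # {generation:06d} gives the six-digit, zero-padded counter.
    return f".ds-{data_stream}-{created:%Y.%m.%d}-{generation:06d}"


print(backing_index_name("my-data-stream", date(2021, 10, 27), 1))
# .ds-my-data-stream-2021.10.27-000001
```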
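Append-only
You cannot send update or delete requests for existing documents directly to a data stream. Use the update by query and delete by query APIs instead.
If necessary, you can update or delete a document by sending the request to its backing index, addressed by the full index name.
If you need to update or delete documents frequently, use an index template with an index alias instead of a data stream. See Manage time series data without data streams.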
Creating a data stream
The usual steps:
- Create an index lifecycle policy
- Create component templates (optional)
- Create an index template
- Create the data stream
- Secure the data stream (access control, optional)
Create the ILM policy
PUT _ilm/policy/my-lifecycle-policy
{
"policy": {
"phases": {
"hot": {
"actions": {
"rollover": {
"max_primary_shard_size": "50gb"
}
}
},
"warm": {
"min_age": "30d",
"actions": {
"shrink": {
"number_of_shards": 1
},
"forcemerge": {
"max_num_segments": 1
}
}
},
"cold": {
"min_age": "60d",
"actions": {
"searchable_snapshot": {
"snapshot_repository": "found-snapshots"
}
}
},
"frozen": {
"min_age": "90d",
"actions": {
"searchable_snapshot": {
"snapshot_repository": "found-snapshots"
}
}
},
"delete": {
"min_age": "735d",
"actions": {
"delete": {}
}
}
}
}
}
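Before sending a policy like the one above, it can help to list its phases with their min_age values and confirm that the ages never decrease from one phase to the next. The following is a rough sanity-check sketch, not part of any Elasticsearch client: the helper names are hypothetical, only the d, h, m and s units are handled, and a missing min_age is treated as zero.

```python
import re

PHASE_ORDER = ["hot", "warm", "cold", "frozen", "delete"]
UNIT_SECONDS = {"d": 86400, "h": 3600, "m": 60, "s": 1}


def min_age_seconds(value: str) -> int:
    """Convert a value such as '30d' to seconds (subset of ES time units)."""
    match = re.fullmatch(r"(\d+)([dhms])", value)
    if match is None:
        raise ValueError(f"unsupported min_age: {value!r}")
    return int(match.group(1)) * UNIT_SECONDS[match.group(2)]


def phase_schedule(policy: dict) -> list:
    """Return (phase, min_age in seconds) pairs in lifecycle order."""
    phases = policy["policy"]["phases"]
    return [
        (name, min_age_seconds(phases[name].get("min_age", "0s")))
        for name in PHASE_ORDER
        if name in phases
    ]


def ages_non_decreasing(schedule: list) -> bool:
    ages = [age for _, age in schedule]
    return all(a <= b for a, b in zip(ages, ages[1:]))


# Only the min_age values of the policy above; actions are left out.
policy = {"policy": {"phases": {
    "hot": {}, "warm": {"min_age": "30d"}, "cold": {"min_age": "60d"},
    "frozen": {"min_age": "90d"}, "delete": {"min_age": "735d"},
}}}
schedule = phase_schedule(policy)
print(schedule)
print(ages_non_decreasing(schedule))
```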
Create two component templates for the index template to use:
PUT _component_template/my-mappings
{
"template": {
"mappings": {
"properties": {
"@timestamp": {
"type": "date",
"format": "date_optional_time||epoch_millis"
},
"message": {
"type": "wildcard"
}
}
}
},
"_meta": {
"description": "Mappings for @timestamp and message fields",
"my-custom-meta-field": "More arbitrary metadata"
}
}
PUT _component_template/my-settings
{
"template": {
"settings": {
"index.lifecycle.name": "my-lifecycle-policy"
}
},
"_meta": {
"description": "Settings for ILM",
"my-custom-meta-field": "More arbitrary metadata"
}
}
Create the index template
PUT _index_template/my-index-template
{
"index_patterns": ["my-data-stream*"],
"data_stream": { },
"composed_of": [ "my-mappings", "my-settings" ],
"priority": 500,
"_meta": {
"description": "Template for my time series data",
"my-custom-meta-field": "More arbitrary metadata"
}
}
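Whether a new name is picked up by this template depends on index_patterns. A quick way to reason about it is a wildcard match, sketched below with Python's fnmatch. This is an approximation: fnmatch also gives special meaning to ? and [ ], which the * wildcard used in the template does not, and the helper matches_template is hypothetical.

```python
from fnmatch import fnmatchcase


def matches_template(name: str, index_patterns: list) -> bool:
    """Return True if the name matches any of the wildcard patterns."""
    return any(fnmatchcase(name, pattern) for pattern in index_patterns)


patterns = ["my-data-stream*"]
for candidate in ["my-data-stream", "my-data-stream-alt", "other-stream"]:
    print(candidate, matches_template(candidate, patterns))
```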
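The data stream can now be created automatically. Any indexing request that targets a name matching the template's index pattern creates it, for example either of the following: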
PUT my-data-stream/_bulk
{ "create":{ } }
{ "@timestamp": "2099-05-06T16:21:15.000Z", "message": "192.0.2.42 - - [06/May/2099:16:21:15 +0000] \"GET /images/bg.jpg HTTP/1.0\" 200 24736" }
{ "create":{ } }
{ "@timestamp": "2099-05-06T16:25:42.000Z", "message": "192.0.2.255 - - [06/May/2099:16:25:42 +0000] \"GET /favicon.ico HTTP/1.0\" 200 3638" }
POST my-data-stream/_doc
{
"@timestamp": "2099-05-06T16:21:15.000Z",
"message": "192.0.2.42 - - [06/May/2099:16:21:15 +0000] \"GET /images/bg.jpg HTTP/1.0\" 200 24736"
}
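You can also create the data stream explicitly: PUT _data_stream/my-data-stream
Get the data stream: GET _data_stream/my-data-stream
Delete the data stream (this also deletes its backing indices): DELETE _data_stream/my-data-stream
Common tasks when using a data stream: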
- Add documents to a data stream
- Search a data stream
- Get statistics for a data stream
- Manually roll over a data stream
- Open closed backing indices
- Reindex with a data stream
- Update documents in a data stream by query
- Delete documents in a data stream by query
- Update or delete documents in a backing index
Add documents
POST /my-data-stream/_doc/
{
"@timestamp": "2099-03-08T11:06:07.000Z",
"user": {
"id": "8a4f500d"
},
"message": "Login successful"
}
To index a document with a specific ID you cannot use PUT /<target>/_doc/<_id>; use PUT /<target>/_create/<_id> instead.
With _bulk, only the create action is supported.
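A _bulk body for a data stream is newline-delimited JSON in which every document is preceded by a create action line. The sketch below builds such a body with the standard library only; the helper bulk_create_body is hypothetical, and sending the body (for example with an HTTP client and the Content-Type application/x-ndjson) is left out.

```python
import json


def bulk_create_body(docs: list) -> str:
    """Build an NDJSON _bulk body that uses only the create action."""
    lines = []
    for doc in docs:
        # Documents in a data stream must carry an @timestamp field.
        if "@timestamp" not in doc:
            raise ValueError("document is missing @timestamp")
        lines.append(json.dumps({"create": {}}))
        lines.append(json.dumps(doc))
    # The bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"


body = bulk_create_body([
    {"@timestamp": "2099-05-06T16:21:15.000Z", "message": "first event"},
    {"@timestamp": "2099-05-06T16:25:42.000Z", "message": "second event"},
])
print(body, end="")
```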
Search documents
Searching a data stream works the same way as searching a regular index.
Get statistics for a data stream
GET /_data_stream/my-data-stream/_stats?human=true
Manual rollover
POST /my-data-stream/_rollover/
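Open closed backing indices
You cannot search a closed backing index, nor can you update or delete documents in it.
To reopen a single closed backing index, use POST /.ds-my-data-stream-2099.03.07-000001/_open/ . To reopen all closed backing indices of the stream, use POST /my-data-stream/_open/
Reindex into a data stream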
POST /_reindex
{
"source": {
"index": "archive"
},
"dest": {
"index": "my-data-stream",
"op_type": "create"
}
}
Update documents by query
POST /my-data-stream/_update_by_query
{
"query": {
"match": {
"user.id": "l7gk7f82"
}
},
"script": {
"source": "ctx._source.user.id = params.new_id",
"params": {
"new_id": "XgdX0NoX"
}
}
}
Delete documents by query
POST /my-data-stream/_delete_by_query
{
"query": {
"match": {
"user.id": "vlb44hny"
}
}
}
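Update or delete documents in a backing index
First run a search to get the backing index name and the document ID of the document you want to change.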
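Once a search hit is available, the request paths for the backing index can be derived from it. The sketch below assumes the search was run with "seq_no_primary_term": true, so that each hit carries _seq_no and _primary_term for optimistic concurrency control. The two helpers are hypothetical and the hit values are sample data, not output from a real cluster.

```python
from urllib.parse import quote, urlencode


def backing_index_update_path(hit: dict) -> str:
    """Path for PUT: replace the document only if it has not changed since the search."""
    query = urlencode({
        "if_seq_no": hit["_seq_no"],
        "if_primary_term": hit["_primary_term"],
    })
    return f"/{hit['_index']}/_doc/{quote(hit['_id'], safe='')}?{query}"


def backing_index_delete_path(hit: dict) -> str:
    """Path for DELETE on the same document."""
    return f"/{hit['_index']}/_doc/{quote(hit['_id'], safe='')}"


# Sample hit (hypothetical values).
hit = {
    "_index": ".ds-my-data-stream-2099.03.08-000003",
    "_id": "bfspvnIBr7VVZlfp2lqX",
    "_seq_no": 0,
    "_primary_term": 1,
}
print("PUT", backing_index_update_path(hit))
print("DELETE", backing_index_delete_path(hit))
```

The PUT request body would then contain the full replacement document, including its @timestamp field.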
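Changing mappings and settings
Every data stream has a matching index template, and its backing indices take their mappings and settings from that template, so plan the mappings and settings carefully up front.
Later on you may still need to change them, for example: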
- Add a new field mapping to a data stream
- Change an existing field mapping in a data stream
- Change a dynamic index setting for a data stream
- Change a static index setting for a data stream
Add a new field mapping
First add the field to the index template, so that backing indices created from now on include the new field.
PUT /_index_template/my-data-stream-template
{
"index_patterns": [ "my-data-stream*" ],
"data_stream": { },
"priority": 500,
"template": {
"mappings": {
"properties": {
"message": {
"type": "text"
}
}
}
}
}
Then add the field to the existing backing indices as well. This request applies to all backing indices, including the write index.
PUT /my-data-stream/_mapping
{
"properties": {
"message": {
"type": "text"
}
}
}
You can also add the field to the write index only.
PUT /my-data-stream/_mapping?write_index_only=true
{
"properties": {
"message": {
"type": "text"
}
}
}
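Change an existing field mapping
The type of an existing field cannot be changed in Elasticsearch, but some of its other mapping parameters can.
First update the index template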
PUT /_index_template/my-data-stream-template
{
"index_patterns": [ "my-data-stream*" ],
"data_stream": { },
"priority": 500,
"template": {
"mappings": {
"properties": {
"host": {
"properties": {
"ip": {
"type": "ip",
"ignore_malformed": true
}
}
}
}
}
}
}
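The request above sets "ignore_malformed": true on the host.ip field.
Then apply the same change to the existing backing indices with the update mapping API, in the same way as when adding a field.
Change a dynamic index setting
The steps are the same: update the index template, then apply the setting to the existing backing indices with the update index settings API (PUT /my-data-stream/_settings).
Change a static index setting
Update the settings in the index template. Unlike a dynamic setting, a static setting change only applies to backing indices created afterwards. If you want it to take effect for new data right away, roll over the data stream manually so that a new write index is created immediately.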
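The difference between the two procedures can be summarised as an ordered list of requests. The function below is only a planning aid under the assumptions stated in the text: the name setting_change_requests is hypothetical, the caller must already know whether the setting is static, and the rollover step is optional.

```python
def setting_change_requests(data_stream: str, template: str, static: bool) -> list:
    """List the requests, in order, for changing an index setting on a data stream."""
    # Both cases start by updating the index template used by the stream.
    requests = [f"PUT /_index_template/{template}"]
    if static:
        # Static settings cannot be applied to existing backing indices;
        # an optional manual rollover creates a new write index right away.
        requests.append(f"POST /{data_stream}/_rollover")
    else:
        # Dynamic settings can be applied to all existing backing indices.
        requests.append(f"PUT /{data_stream}/_settings")
    return requests


print(setting_change_requests("my-data-stream", "my-data-stream-template", static=False))
print(setting_change_requests("my-data-stream", "my-data-stream-template", static=True))
```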
Use reindex to change a field type
As with a regular index, you can reindex a data stream, for example to change the @timestamp field from the date type to the date_nanos type.