Elasticsearch usage notes (continuously updated)

This post records some lessons learned from operating Elasticsearch (ES) clusters.

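1. Node disk usage is too high, so the ES cluster cannot allocate shards and data is lost?

Two settings control shard allocation based on disk usage:

Parameter	Default	Meaning
cluster.routing.allocation.disk.watermark.low	85%	When a node's disk usage exceeds 85%, no more shards are allocated to that node
cluster.routing.allocation.disk.watermark.high	90%	When a node's disk usage exceeds 90%, ES tries to relocate that node's shards to other nodes

How to configure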

curl -XPUT 'localhost:9200/_cluster/settings' -d '{
    "transient": {
      "cluster.routing.allocation.disk.watermark.low": "90%"
    }
}'
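To make the two thresholds concrete, here is a minimal Python sketch of how a node's disk usage maps onto the behaviour described in the table. The function name `classify_node` and the sample node names are hypothetical. ES evaluates the watermarks internally, so this only mirrors the rule as stated above.

```python
def classify_node(used_percent, low=85.0, high=90.0):
    """Map disk usage (percent) to the watermark behaviour described above."""
    if used_percent > high:
        return "relocate_away"   # above the high watermark
    if used_percent > low:
        return "no_new_shards"   # above the low watermark only
    return "ok"


# Hypothetical node names and usage figures, for illustration only.
nodes = {"node-a": 72.5, "node-b": 87.0, "node-c": 93.1}
report = {name: classify_node(pct) for name, pct in nodes.items()}
print(report)
```

Raising `low` to 90, as the curl command above does, would move node-b back into the "ok" group.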

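Recommendation: monitor the performance metrics of the ES nodes closely so that potential risks are noticed early.

2. Template management

The template mechanism is quite useful, especially when managing a large number of indices. Here is a demo template first.

order: 10. The template's priority; a higher-priority template (larger order number) overrides the fields of a lower-priority one.

template: test*. This template matches indices whose names start with test.

index.number_of_shards: 20 (an index-level setting, like the two below)

index.number_of_replicas: 1

index.refresh_interval: 5s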

{
    "aliases": {},
    "order": 10,
    "template": "test*",
    "settings": {
        "index": {
            "priority": "5",
            "merge": {
                "scheduler": {
                    "max_thread_count": "1"
                }
            },
            "search": {
                "slowlog": {
                    "threshold": {
                        "query": {
                            "warn": "10s",
                            "debug": "1s",
                            "info": "5s",
                            "trace": "500ms"
                        },
                        "fetch": {
                            "warn": "1s",
                            "debug": "500ms",
                            "info": "800ms",
                            "trace": "200ms"
                        }
                    }
                }
            },
            "unassigned": {
                "node_left": {
                    "delayed_timeout": "5m"
                }
            },
            "max_result_window": "10000",
            "number_of_shards": "20",
            "number_of_replicas": "1", 
            "translog": {
                "durability": "async"
            },
            "requests": {
                "cache": {
                    "enable": "true"
                }
            },
            "mapping": {
                "ignore_malformed": "true"
            },
            "refresh_interval": "5s"
        }
    }
}
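The override rule for `order` can be sketched in a few lines of Python. This is an illustration only, not how ES implements it: `effective_settings` and the three sample templates are hypothetical, and `fnmatch` is used as a rough stand-in for ES pattern matching (it also accepts `?` and `[]`, which is an assumption beyond the `test*` case above).

```python
import copy
import fnmatch


def deep_merge(base, override):
    """Recursively merge override into a copy of base; override wins."""
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


def effective_settings(index_name, templates):
    """Apply matching templates from lowest to highest order."""
    matching = [t for t in templates
                if fnmatch.fnmatch(index_name, t["template"])]
    settings = {}
    for tpl in sorted(matching, key=lambda t: t["order"]):
        settings = deep_merge(settings, tpl.get("settings", {}))
    return settings


# Hypothetical templates, for illustration only.
templates = [
    {"template": "test*", "order": 0,
     "settings": {"number_of_shards": 1, "refresh_interval": "1s"}},
    {"template": "test*", "order": 10,
     "settings": {"number_of_shards": 20}},
    {"template": "logs*", "order": 5,
     "settings": {"number_of_shards": 3}},
]
result = effective_settings("test_2019", templates)
print(result)
```

Here the order-10 template overrides `number_of_shards`, while `refresh_interval` from the order-0 template survives because nothing with a higher order redefines it.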

How to configure

curl -XPUT localhost:9200/_template/template_1 -d '
{
    "template" : "test*",
    "order" : 0,
    "settings" : {
        "number_of_shards" : 1
    },
    "mappings" : {
        "type1" : {
            "_source" : { "enabled" : false }
        }
    }
}
'

Once the template is in place, create an index like this:

# Create the index
curl -XPUT http://35.1.4.127:9200/index_name

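3. Things to watch when creating mappings

When creating a type mapping for an index, handle _all and _source with care, otherwise indexing performance suffers.

_all: when enabled, all fields of a type are merged into one large field, which increases indexing time and index size.

_source: when enabled, requests return the _source body, i.e. the original document.

We usually disable _all and enable _source.

Date fields can be handled as shown below; listing several formats separated by || lets the field accept a wide range of awkward date formats.

How to configure

curl -XPUT http://35.1.4.129:9200/index_name/RELATION/_mapping -d '{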
    "RELATION": {
        "_all": {
            "enabled": "false"
        },
        "_source": {
            "enabled": "true"
        },
        "properties": {
            "FROM_SFZH": {
                "type": "keyword"
            },
            "TO_SFZH": {
                "type": "keyword"
            },
            "CREATE_TIME": {
                "type": "date",
                "format": "yyyy-MM-dd HH:mm:ss.SSS Z||yyyy-MM-dd HH:mm:ss.SSS||yyyy-MM-dd HH:mm:ss,SSS||yyyy/MM/dd HH:mm:ss||yyyy-MM-dd HH:mm:ss,SSS Z||yyyy/MM/dd HH:mm:ss,SSS Z||strict_date_optional_time||epoch_millis||yyyy-MM-dd HH:mm:ss"
            }
        }
    }
}'
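The `||`-separated format list behaves like "try each pattern until one parses". A minimal Python sketch of that idea follows. The `strptime` patterns are only rough approximations of some of the Java-style patterns above (an assumption, since the two syntaxes are not equivalent), and `parse_date` is a hypothetical helper.

```python
from datetime import datetime

# Rough strptime counterparts of a few patterns from the mapping above.
FORMATS = [
    "%Y-%m-%d %H:%M:%S.%f %z",
    "%Y-%m-%d %H:%M:%S.%f",
    "%Y-%m-%d %H:%M:%S,%f",
    "%Y/%m/%d %H:%M:%S",
    "%Y-%m-%d %H:%M:%S",
]


def parse_date(text):
    """Try each format in order and return the first successful parse."""
    for fmt in FORMATS:
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue
    raise ValueError(f"no configured format matches {text!r}")


print(parse_date("2019-03-27 11:34:00,123"))
print(parse_date("2019/03/27 11:34:00"))
```

A value that matches none of the listed formats raises an error, which is comparable to ES rejecting a document with a malformed date unless ignore_malformed is set.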

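4. Disable replicas and refresh when bulk-loading data into ES

When importing data in bulk at large scale, disable replicas and refresh. If replicas exist while ES is indexing, every write is also synchronized to the replicas, which adds load.

Restore the replicas once indexing has finished.

How to configure

# Disable
curl -XPUT http://35.1.4.129:9200/_settings -d '{
  "index": {
    "number_of_replicas": 0,
    "refresh_interval": "-1"
  }
}'

# Re-enable
curl -XPUT http://35.1.4.129:9200/_settings -d '{
  "index": {
    "number_of_replicas": 1,
    "refresh_interval": "5s"
  }
}'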
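Because the restore step is easy to forget when an import fails half-way, the disable/re-enable pair can be wrapped so that the restore always runs. A minimal sketch: `bulk_load_settings` is a hypothetical helper, and `put_settings` is a caller-supplied function (assumed here, not part of any ES client) that would send the body to the `_settings` endpoint.

```python
from contextlib import contextmanager


@contextmanager
def bulk_load_settings(put_settings, replicas=1, refresh="5s"):
    """Disable replicas and refresh for the duration of a bulk load."""
    put_settings({"index": {"number_of_replicas": 0,
                            "refresh_interval": "-1"}})
    try:
        yield
    finally:
        # Restore even if the import fails part-way through.
        put_settings({"index": {"number_of_replicas": replicas,
                                "refresh_interval": refresh}})


# Stand-in for a real HTTP call: it only records the bodies it is given.
sent = []
with bulk_load_settings(sent.append):
    pass  # run the bulk import here
print(sent)
```

In real use, `put_settings` would issue the same PUT requests as the two curl commands above.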

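5. JVM-level monitoring and tuning

Elasticsearch is written in Java, so you can load-test it and watch how the JVM behaves, for example by connecting remotely with jconsole.

config/jvm.options holds the JVM settings, which can be adjusted to fit the available hardware. JVM tuning itself is not covered here. The options below enable the remote JMX connection: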

-Djava.rmi.server.hostname=192.168.1.152
-Dcom.sun.management.jmxremote
-Dcom.sun.management.jmxremote.port=9110
-Dcom.sun.management.jmxremote.ssl=false
-Dcom.sun.management.jmxremote.authenticate=false

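6. Tuning the ES thread pools for highly concurrent queries

Once query concurrency goes up, you will sometimes see this exception:

EsRejectedExecutionException[rejected execution(queue capacity 50) on.......]

The cause is that in newer Elasticsearch versions the thread pools are of type fixed, i.e. fixed-size pools (5 × the number of cores by default). When all threads are busy and the queue is full, ES rejects the request.

Different request types are served by different thread pools:

index: used for index and delete operations. Type defaults to fixed, size defaults to the number of available processors, queue size defaults to 200.
search: used for search and count requests. Type defaults to fixed, size defaults to ((number of available processors * 3) / 2) + 1, queue size defaults to 1000.
suggest: used for suggester requests. Type defaults to fixed, size defaults to the number of available processors, queue size defaults to 1000.
get: used for real-time GET requests. Type defaults to fixed, size defaults to the number of available processors, queue size defaults to 1000.
bulk: used for bulk operations. Type defaults to fixed, size defaults to the number of available processors, queue size defaults to 50.
percolate: used for percolator operations. Type defaults to fixed, size defaults to the number of available processors, queue size defaults to 1000.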
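The rejection behaviour of a fixed pool with a bounded queue can be illustrated with a toy model. This is a sketch of the idea, not ES code: `FixedPoolModel` is hypothetical, and `search_pool_size` assumes the formula quoted above uses integer division.

```python
def search_pool_size(processors):
    """Default search pool size per the formula quoted above."""
    return (processors * 3) // 2 + 1


class FixedPoolModel:
    """Toy model of a fixed pool: `size` workers plus a bounded queue."""

    def __init__(self, size, queue_size):
        self.size = size
        self.queue_size = queue_size
        self.active = 0
        self.queued = 0

    def submit(self):
        """Return 'running', 'queued', or 'rejected' for one new request."""
        if self.active < self.size:
            self.active += 1
            return "running"
        if self.queued < self.queue_size:
            self.queued += 1
            return "queued"
        return "rejected"


pool = FixedPoolModel(size=2, queue_size=3)
outcomes = [pool.submit() for _ in range(6)]
print(search_pool_size(8), outcomes)
```

With 2 workers and a queue of 3, the sixth concurrent request is rejected, which corresponds to the exception shown above. Enlarging the queue delays rejection but does not add processing capacity.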

Taking the index pool as an example, the thread pool settings can be changed in elasticsearch.yml:

threadpool.index.type: fixed
threadpool.index.size: 100
threadpool.index.queue_size: 500

Or through the API:

curl -XPUT 'localhost:9200/_cluster/settings' -d '{
    "transient": {
        "threadpool.index.type": "fixed",
        "threadpool.index.size": 100,
        "threadpool.index.queue_size": 500
    }
}'

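7. Some replica shards fail to allocate and the cluster status is yellow

7.1 Check the cluster status first

curl -XGET 'http://10.96.78.164:9200/_cluster/health?pretty'

The output looks like the following. If any shards are unassigned, unassigned_shards is non-zero and status is yellow. (The sample below was taken from a healthy cluster, so it shows green and 0.)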

{
"cluster_name": "elasticsearch",
"status": "green",
"timed_out": false,
"number_of_nodes": 1,
"number_of_data_nodes": 1,
"active_primary_shards": 575,
"active_shards": 575,
"relocating_shards": 0,
"initializing_shards": 0,
"unassigned_shards": 0,
"delayed_unassigned_shards": 0,
"number_of_pending_tasks": 0,
"number_of_in_flight_fetch": 0,
"task_max_waiting_in_queue_millis": 0,
"active_shards_percent_as_number": 100
}
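For scripted checks, the two fields of interest can be read straight from the health response. A minimal sketch, assuming the JSON has already been fetched; `needs_attention` is a hypothetical helper and the sample is trimmed from the output above.

```python
import json


def needs_attention(health):
    """True when the cluster is not green or has unassigned shards."""
    return health["status"] != "green" or health["unassigned_shards"] > 0


# Trimmed copy of the health output shown above.
sample = json.loads("""
{
  "cluster_name": "elasticsearch",
  "status": "green",
  "unassigned_shards": 0
}
""")
print(needs_attention(sample))
```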

7.2 Find out which index the unassigned shards belong to and which machine is the allocation target.

curl -XGET http://localhost:9200/_cat/shards | grep UNASSIGNED

Output

xiankan_xk_qdhj                  3 r UNASSIGNED    0    261b 10.96.78.164 yfbf9D3
xiankan_xk_qdhj                  2 r UNASSIGNED    0    261b 10.96.78.164 yfbf9D3
xiankan_xk_qdhj                  1 r UNASSIGNED    0    261b 10.96.78.164 yfbf9D3
xiankan_xk_qdhj                  4 r UNASSIGNED    0    261b 10.96.78.164 yfbf9D3

r marks a replica shard, p a primary shard, and the IP is the target machine for allocation.
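The same filtering can be done in code when the list is long. A minimal sketch, assuming the column order shown above (index, shard number, p/r flag, state); `unassigned_shards` is a hypothetical helper, and the STARTED row in the sample is made up to show that assigned shards are skipped.

```python
def unassigned_shards(cat_output):
    """Return (index, shard, kind) for every UNASSIGNED row of _cat/shards."""
    found = []
    for line in cat_output.splitlines():
        cols = line.split()
        # Columns assumed: index, shard number, p/r flag, state, ...
        if len(cols) >= 4 and cols[3] == "UNASSIGNED":
            kind = "replica" if cols[2] == "r" else "primary"
            found.append((cols[0], int(cols[1]), kind))
    return found


sample = """\
xiankan_xk_qdhj 3 r UNASSIGNED 0 261b 10.96.78.164 yfbf9D3
xiankan_xk_qdhj 0 p STARTED    0 261b 10.96.78.164 yfbf9D3
"""
shards = unassigned_shards(sample)
print(shards)
```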

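7.3 Attempt 1: reallocate replicas at the index level

For the problematic index, first set its replica count to 0, then turn replicas back on so they are allocated again.

Disable

curl -XPUT http://35.1.4.129:9200/xiankan_xk_zjhj/_settings -d '{
  "index": {
    "number_of_replicas": 0
  }
}'

Re-enable

curl -XPUT http://10.96.78.164:9200/xiankan_xk_zjhj/_settings -d '{
  "index": {
    "number_of_replicas": 1
  }
}'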

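7.4 Attempt 2: reallocate replicas at the node level

Restart the node on which shard allocation failed. If the affected shards sit on only a few nodes, restart the ES instance on each of those nodes, identified by IP.

Kill ES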

ps -ef | grep elasticsearch | grep -v grep | awk '{print $2}' | xargs kill -9

Start ES

./bin/elasticsearch -d

7.5 Attempt 3: reroute the shards of the index one by one

curl -XPOST 'localhost:9200/_cluster/reroute' -d '{
"commands" : [ {
"allocate" : {
"index" : "xiankan_xk_zjhj",
"shard" : 1,
"node" : "yfbf9D3",
"allow_primary" : true
}
}
] }'
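When several shards of the same index need the same treatment, the request body can be generated instead of edited by hand. A minimal sketch that builds the body in the same shape as the command above; `reroute_body` is a hypothetical helper and the shard numbers are illustrative.

```python
import json


def reroute_body(shards, node, allow_primary=True):
    """Build one reroute body with an allocate command per (index, shard)."""
    commands = [
        {"allocate": {"index": index, "shard": shard, "node": node,
                      "allow_primary": allow_primary}}
        for index, shard in shards
    ]
    return json.dumps({"commands": commands}, indent=2)


# Illustrative shard numbers for the index used above.
body = reroute_body([("xiankan_xk_zjhj", n) for n in (1, 2, 3, 4)],
                    node="yfbf9D3")
print(body)
```

The printed JSON can be passed to curl with -d in place of the hand-written body.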

posted @ 2019-03-27 11:34 扎心了老铁
