A Quick Introduction to Elasticsearch
1. Preface
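Elasticsearch is a distributed, scalable, real-time search and analytics engine built on Apache Lucene. Lucene is arguably the most advanced, high-performance, full-featured search engine library available today, open source or proprietary. Elasticsearch packages all of that functionality into a standalone service that you talk to through a simple RESTful API, using a client written in whatever programming language you prefer.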
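- When you search on GitHub, Elasticsearch not only finds relevant repositories but also powers code-level search with highlighting
- When you shop online, Elasticsearch can help recommend related products
- When you take a ride home after work, Elasticsearch can help the platform optimize dispatching by locating nearby riders and drivers
- Wikipedia uses Elasticsearch to provide full-text search with highlighted snippets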
3. Single-Instance Installation
Installation media:
elasticsearch-7.10.2-linux-x86_64.tar.gz
elasticsearch-analysis-ik-7.10.2.zip
elasticsearch-analysis-pinyin-7.10.2.zip
kibana-7.10.2-linux-x86_64.tar.gz
Host kernel parameters (/etc/sysctl.conf):
# sysctl settings are defined through files in
# /usr/lib/sysctl.d/, /run/sysctl.d/, and /etc/sysctl.d/.
#
# Vendors settings live in /usr/lib/sysctl.d/.
# To override a whole file, create a new file with the same in
# /etc/sysctl.d/ and put new settings there. To override
# only specific settings, add a file with a lexically later
# name in /etc/sysctl.d/ and put new settings there.
#
# For more information, see sysctl.conf(5) and sysctl.d(5).
net.ipv4.tcp_tw_reuse = 0
net.ipv4.tcp_tw_recycle = 0
net.ipv4.tcp_fin_timeout = 5
net.ipv4.tcp_keepalive_time = 15
net.ipv4.ip_local_port_range = 21000 61000
fs.file-max = 6553600
kernel.sem = 250 32000 100 128
net.ipv4.conf.all.accept_redirects = 0
net.core.somaxconn = 32768
vm.max_map_count = 524288
Apply the settings: sysctl -p
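Elasticsearch's bootstrap checks expect vm.max_map_count to be at least 262144, so it can help to confirm the value before starting the node. The following is a minimal Python sketch, not part of the original setup; the helper names parse_sysctl and max_map_count_ok are hypothetical, and the parser only understands the simple key = value form shown above.

```python
"""Sanity-check sysctl-style settings before starting Elasticsearch (illustrative sketch)."""

# Minimum vm.max_map_count expected by Elasticsearch's bootstrap checks.
MIN_MAX_MAP_COUNT = 262144


def parse_sysctl(text):
    """Parse `key = value` lines into a dict, skipping blank and comment lines."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(("#", ";")) or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


def max_map_count_ok(settings, minimum=MIN_MAX_MAP_COUNT):
    """Return True when vm.max_map_count is present and at least `minimum`."""
    value = settings.get("vm.max_map_count")
    return value is not None and int(value) >= minimum


# Example input mirroring two of the settings listed above.
example = "fs.file-max = 6553600\nvm.max_map_count = 524288\n"
print("vm.max_map_count ok:", max_map_count_ok(parse_sysctl(example)))
```

To check a real host, the same functions could be fed the contents of /etc/sysctl.conf, or the live value read from /proc/sys/vm/max_map_count.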
Host resource limits (/etc/security/limits.conf):
* soft nofile 1048576
* hard nofile 1048576
* soft nproc 65536
* hard nproc 65536
* soft memlock unlimited
* hard memlock unlimited
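Limits from limits.conf only apply to sessions opened after the change, so it is worth confirming what the current session actually received. The sketch below uses Python's standard resource module (Unix only); the 65535 threshold is the open-file minimum that Elasticsearch's bootstrap checks ask for, and the helper names are hypothetical.

```python
"""Report the open-file limit the current session actually received (illustrative sketch)."""
import resource

# Minimum open-file limit that Elasticsearch's bootstrap checks ask for.
MIN_NOFILE = 65535


def limit_ok(soft, minimum):
    """Return True when a soft limit is unlimited or at least `minimum`."""
    return soft == resource.RLIM_INFINITY or soft >= minimum


def current_nofile():
    """Return the (soft, hard) open-file limits of this process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


soft, hard = current_nofile()
print("nofile soft=%s hard=%s ok=%s" % (soft, hard, limit_ok(soft, MIN_NOFILE)))
```

Run it as the user that will start Elasticsearch, after logging in again, since the limits of an already open shell do not change.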
Directory layout:
.
|-- bin
|   |-- schema
|   |-- start-es.sh
|   |-- start-kibana.sh
|   |-- stop-es.sh
|   `-- sync
|-- data -> /data/es-data
|-- etc
|-- lib
|   |-- ojdbc8-19.8.0.0.jar
|   `-- orai18n-19.8.0.0.jar
|-- logs
|-- sbin
`-- support
    |-- elasticsearch-7.10.2
    |-- es -> elasticsearch-7.10.2
    |-- kibana -> kibana-7.10.2-linux-x86_64
    |-- kibana-7.10.2-linux-x86_64
    |-- logstash -> logstash-7.10.2
    `-- logstash-7.10.2
.bash_profile settings:
# .bash_profile

# Get the aliases and functions
if [ -f ~/.bashrc ]; then
    . ~/.bashrc
fi

# +-------------------------------------+
# |     AI'S PROFILE, DON'T MODIFY!     |
# +-------------------------------------+
alias grep='grep --colour=auto'
alias vi='vim'
alias ll='ls -l'
alias ls='ls --color=auto'
alias mv='mv -i'
alias rm='rm -i'
alias ups='ps -u `whoami` -f'

export ES_HOME=${HOME}/support/es
export JAVA_HOME=${ES_HOME}/jdk
export PS1="\[\033[01;32m\]\u@\h\[\033[01;34m\] \w \$\[\033[00m\] "
export TERM=linux
export EDITOR=vim
export PATH=${HOME}/bin:${HOME}/sbin:${JAVA_HOME}/bin:${ES_HOME}/bin:${HOME}/support/logstash/bin:$PATH
export LANG=zh_CN.utf8
# Idle-session timeout in seconds (bash reads TMOUT; the original spelling TIMOUT has no effect)
export TMOUT=3000
export HISTSIZE=1000
Adjust the JVM heap size for your environment (~/support/es/config/jvm.options):
-Xms16g
-Xmx16g
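Elastic's general guidance is to give -Xms and -Xmx the same value, to use no more than about half of the physical RAM, and to stay below the roughly 32 GB point at which the JVM stops using compressed object pointers. The sketch below turns that rule of thumb into a small helper; the 31 GB cap and the function names are assumptions made for illustration, not values taken from this setup.

```python
"""Suggest a JVM heap size for an Elasticsearch node (illustrative sketch)."""

# Assumed ceiling that keeps the heap under the ~32 GB compressed-oops threshold.
HEAP_CAP_GB = 31


def suggest_heap_gb(total_ram_gb, cap_gb=HEAP_CAP_GB):
    """Return half of physical RAM in whole GB, capped at `cap_gb` (minimum 1)."""
    return max(1, min(total_ram_gb // 2, cap_gb))


def jvm_options(heap_gb):
    """Render the two jvm.options lines with identical minimum and maximum heap."""
    return ["-Xms%dg" % heap_gb, "-Xmx%dg" % heap_gb]


# A hypothetical 32 GB host yields the same 16g values used above.
print("\n".join(jvm_options(suggest_heap_gb(32))))
```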
Set the basic configuration for your environment (~/support/es/config/elasticsearch.yml):
# ======================== Elasticsearch Configuration =========================
#
# NOTE: Elasticsearch comes with reasonable defaults for most settings.
#       Before you set out to tweak and tune the configuration, make sure you
#       understand what are you trying to accomplish and the consequences.
#
# The primary way of configuring a node is via this file. This template lists
# the most important settings you may want to configure for a production cluster.
#
# Please consult the documentation for further information on configuration options:
# https://www.elastic.co/guide/en/elasticsearch/reference/index.html
#
# ---------------------------------- Cluster -----------------------------------
#
# Use a descriptive name for your cluster:
#
cluster.name: crm
#
# ------------------------------------ Node ------------------------------------
#
# Use a descriptive name for the node:
#
node.name: node-1
#
# Add custom attributes to the node:
#
node.attr.rack: r1
#
# ----------------------------------- Paths ------------------------------------
#
# Path to directory where to store the data (separate multiple locations by comma):
#
path.data: /home/es/data
#
# Path to log files:
#
path.logs: /home/es/logs
#
# ----------------------------------- Memory -----------------------------------
#
# Lock the memory on startup:
#
#bootstrap.memory_lock: true
#
# Make sure that the heap size is set to about half the memory available
# on the system and that the owner of the process is allowed to use this
# limit.
#
# Elasticsearch performs poorly when the system is swapping the memory.
#
# ---------------------------------- Network -----------------------------------
#
# Set the bind address to a specific IP (IPv4 or IPv6):
#
network.host: 10.230.55.48
#
# Set a custom port for HTTP:
#
http.port: 9200
#
# For more information, consult the network module documentation.
#
# --------------------------------- Discovery ----------------------------------
#
# Pass an initial list of hosts to perform discovery when this node is started:
# The default list of hosts is ["127.0.0.1", "[::1]"]
#
discovery.seed_hosts: ["10.230.55.48"]
#
# Bootstrap the cluster using an initial set of master-eligible nodes:
#
cluster.initial_master_nodes: ["10.230.55.48"]
#
# For more information, consult the discovery and cluster formation module documentation.
#
# ---------------------------------- Gateway -----------------------------------
#
# Block initial recovery after a full cluster restart until N nodes are started:
#
#gateway.recover_after_nodes: 1
#
# For more information, consult the gateway module documentation.
#
# ---------------------------------- Various -----------------------------------
#
# Require explicit names when deleting indices:
#
#action.destructive_requires_name: true
#
# Security and authentication settings:
http.cors.enabled: true
http.cors.allow-origin: "*"
http.cors.allow-headers: Authorization
xpack.security.enabled: true
xpack.security.transport.ssl.enabled: true
Startup script (~/bin/start-es.sh):
#!/bin/sh
cd ~/support/es/bin
./elasticsearch -d
Set the passwords (the node must already be running):
~/support/es/bin/elasticsearch-setup-passwords interactive
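You are prompted to set a password for each of the following users: elastic, apm_system, kibana, kibana_system, logstash_system, beats_system, and remote_monitoring_user. Once all of them are set, the setup is complete.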
Verify:
es@centos01 ~/bin $ curl --user elastic:123456 -XGET 'http://10.230.55.48:9200?pretty=true'
{
  "name" : "node-1",
  "cluster_name" : "crm",
  "cluster_uuid" : "1SAd8U-zRyGKy8ztRWAQhQ",
  "version" : {
    "number" : "7.10.2",
    "build_flavor" : "default",
    "build_type" : "tar",
    "build_hash" : "747e1cc71def077253878a59143c1f785afa92b9",
    "build_date" : "2021-01-13T00:42:12.435326Z",
    "build_snapshot" : false,
    "lucene_version" : "8.7.0",
    "minimum_wire_compatibility_version" : "6.8.0",
    "minimum_index_compatibility_version" : "6.0.0-beta1"
  },
  "tagline" : "You Know, for Search"
}
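The same check can be made from a program, since the node only expects an HTTP request carrying a Basic Authorization header. The following is a minimal sketch using only Python's standard library; the function names are hypothetical, and the host and credentials are the ones used in this walkthrough.

```python
"""Query the Elasticsearch root endpoint with HTTP Basic authentication (illustrative sketch)."""
import base64
import json
import urllib.request


def basic_auth_header(user, password):
    """Build the value of an HTTP Basic `Authorization` header."""
    token = base64.b64encode(("%s:%s" % (user, password)).encode("utf-8"))
    return "Basic " + token.decode("ascii")


def build_request(url, user, password, method="GET"):
    """Create (but do not send) an authenticated request."""
    req = urllib.request.Request(url, method=method)
    req.add_header("Authorization", basic_auth_header(user, password))
    return req


def cluster_version(url, user, password, timeout=5):
    """Send the request and return the `version.number` field of the response."""
    req = build_request(url, user, password)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)["version"]["number"]
```

Calling cluster_version("http://10.230.55.48:9200", "elastic", "123456") against the node above should report the same version number that appears in the curl output.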
Installing Kibana:
Directory: ~/support/kibana
Edit the configuration (~/support/kibana/config/kibana.yml):
server.port: 5601
server.host: "10.230.55.48"
elasticsearch.hosts: ["http://10.230.55.48:9200"]
elasticsearch.username: "elastic"
elasticsearch.password: "123456"
i18n.locale: "en"
Dev Tools:
# Show Elasticsearch version information
GET /
# Check cluster health
GET _cluster/health
# List the cluster nodes
GET _cat/nodes
# Show shard allocation
GET _cat/shards
# List the indices
GET _cat/indices
# Count the documents in an index
GET sec_function/_count
4. Indices
List all indices on the current node:
es@centos01 ~ $ curl --user elastic:123456 -XGET 'http://10.230.55.48:9200/_cat/indices?v'
health status index              uuid                   pri rep docs.count docs.deleted store.size pri.store.size
green  open   sec_function       90qf16nIQNqfd_l_deqWpA  10   0       8040           51      4.2mb          4.2mb
green  open   pm_offer_for_trans GSehvq3EQZKgWnkztShSyw  10   0    1056308       172784    313.6mb        313.6mb
green  open   tf_r_address_tree  F5xwcaRfTYmfReiPgOE3Fg  10   0   11490425            0        1gb            1gb
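The `_cat` output is a whitespace-aligned text table whose first line (with `?v`) holds the column names, so it is easy to post-process. The sketch below is one way to turn it into records; the helper name parse_cat_table is hypothetical, and it assumes every row has a value in every column.

```python
"""Parse the text table returned by `_cat/indices?v` (illustrative sketch)."""


def parse_cat_table(text):
    """Turn a whitespace-aligned `_cat` table into a list of dicts keyed by header."""
    lines = [line for line in text.splitlines() if line.strip()]
    if not lines:
        return []
    header = lines[0].split()
    return [dict(zip(header, line.split())) for line in lines[1:]]


# Two rows taken from the listing above.
SAMPLE = """\
health status index              uuid                   pri rep docs.count docs.deleted store.size pri.store.size
green  open   sec_function       90qf16nIQNqfd_l_deqWpA  10   0       8040           51      4.2mb          4.2mb
green  open   tf_r_address_tree  F5xwcaRfTYmfReiPgOE3Fg  10   0   11490425            0        1gb            1gb
"""

for row in parse_cat_table(SAMPLE):
    print(row["index"], row["docs.count"])
```

For anything beyond a quick script, requesting `_cat/indices?format=json` is usually simpler, because the node then returns JSON and no text parsing is needed.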
Create and delete an index:
es@centos01 ~ $ curl --user elastic:123456 -XPUT 'http://10.230.55.48:9200/weather'
{"acknowledged":true,"shards_acknowledged":true,"index":"weather"}
es@centos01 ~ $ curl -uelastic -XGET 'http://10.230.55.48:9200/_cat/indices?v'
health status index              uuid                   pri rep docs.count docs.deleted store.size pri.store.size
green  open   sec_function       90qf16nIQNqfd_l_deqWpA  10   0       8040           51      4.2mb          4.2mb
green  open   pm_offer_for_trans GSehvq3EQZKgWnkztShSyw  10   0    1056308       172784    313.6mb        313.6mb
green  open   tf_r_address_tree  F5xwcaRfTYmfReiPgOE3Fg  10   0   11490425            0        1gb            1gb
green  open   weather            vIVMeX22SReCpKGD0Pk5uw   5   1          0            0      2.2kb          1.1kb
es@centos01 ~ $ curl -uelastic -XDELETE 'http://10.230.55.48:9200/weather'
{"acknowledged":true}
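5. Chinese Word Segmentation
Extract elasticsearch-analysis-ik-7.10.2.zip and elasticsearch-analysis-pinyin-7.10.2.zip into the ~/support/es/plugins directory, each into its own subdirectory (ik and pinyin), then restart Elasticsearch.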
es@centos01 ~/support $ tree ~/support/es/plugins/
/home/es/support/es/plugins/
|-- ik
|   |-- commons-codec-1.9.jar
|   |-- commons-logging-1.2.jar
|   |-- config
|   |   |-- extra_main.dic
|   |   |-- extra_single_word.dic
|   |   |-- extra_single_word_full.dic
|   |   |-- extra_single_word_low_freq.dic
|   |   |-- extra_stopword.dic
|   |   |-- IKAnalyzer.cfg.xml
|   |   |-- main.dic
|   |   |-- preposition.dic
|   |   |-- quantifier.dic
|   |   |-- stopword.dic
|   |   |-- suffix.dic
|   |   `-- surname.dic
|   |-- elasticsearch-analysis-ik-7.10.2.jar
|   |-- httpclient-4.5.2.jar
|   |-- httpcore-4.4.4.jar
|   |-- plugin-descriptor.properties
|   `-- plugin-security.policy
`-- pinyin
    |-- elasticsearch-analysis-pinyin-7.10.2.jar
    |-- nlp-lang-1.7.jar
    `-- plugin-descriptor.properties

3 directories, 22 files
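Once the node has restarted, a quick way to confirm that the plugins loaded is to send some text to the `_analyze` API with one of their analyzers. The sketch below only builds the request bodies; the analyzer names ik_smart, ik_max_word, and pinyin are the ones these plugins register according to their own documentation, and the sample text is an arbitrary assumption.

```python
"""Build `_analyze` request bodies to try out the installed analyzers (illustrative sketch)."""
import json


def analyze_body(analyzer, text):
    """Return the JSON body for `POST /_analyze` with the given analyzer and text."""
    return json.dumps({"analyzer": analyzer, "text": text}, ensure_ascii=False)


# Analyzer names as documented by the IK and pinyin plugins (assumed, not verified here).
for name in ("ik_smart", "ik_max_word", "pinyin"):
    print("POST /_analyze")
    print(analyze_body(name, "中华人民共和国"))
```

Each printed pair can be pasted into Kibana Dev Tools; a response containing a tokens array suggests the analyzer is available, while an error naming an unknown analyzer suggests the plugin was not loaded.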
6. Working with Data
Prepare a new index:
curl --user elastic:123456 -XPUT 'http://10.230.55.48:9200/student' -H 'Content-Type: application/json' -d '
{
  "mappings" : {
    "properties" : {
      "name" : { "type" : "keyword" },
      "age" : { "type" : "integer" }
    }
  },
  "settings" : {
    "index" : {
      "number_of_shards" : 1,
      "number_of_replicas" : 0
    }
  }
}'
Adding data, example 1 (no _id specified):
# Request: when no _id is specified, Elasticsearch generates a random string and uses it as the _id.
curl --user elastic:123456 -XPOST 'http://10.230.55.48:9200/student/_doc?pretty=true' -H 'Content-Type: application/json' -d '
{
  "name": "张三"
}'
# Response
{
  "_index" : "student",
  "_type" : "_doc",
  "_id" : "q6ek7XcBqu3Z6vLyxDD4",
  "_version" : 1,
  "result" : "created",
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "failed" : 0
  },
  "_seq_no" : 0,
  "_primary_term" : 1
}
Adding data, example 2 (with _id set to 2):
# Request, with _id set to 2
curl --user elastic:123456 -XPOST 'http://10.230.55.48:9200/student/_doc/2?pretty=true' -H 'Content-Type: application/json' -d '
{
  "name": "李四"
}'
# Response
{
  "_index" : "student",
  "_type" : "_doc",
  "_id" : "2",
  "_version" : 1,
  "result" : "created",
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "failed" : 0
  },
  "_seq_no" : 1,
  "_primary_term" : 1
}
An incorrect way to update data:
# Request
curl --user elastic:123456 -XGET 'http://10.230.55.48:9200/student/_doc/2?pretty=true'
# Response
{
  "_index" : "student",
  "_type" : "_doc",
  "_id" : "2",
  "_version" : 1,
  "_seq_no" : 1,
  "_primary_term" : 1,
  "found" : true,
  "_source" : {
    "name" : "李四"
  }
}
Note that the document has no age field yet. Suppose we try to add one by posting only the new field to the same _id:
# Request
curl --user elastic:123456 -XPOST 'http://10.230.55.48:9200/student/_doc/2?pretty=true' -H 'Content-Type: application/json' -d '
{
  "age": 10
}'
# Response
{
  "_index" : "student",
  "_type" : "_doc",
  "_id" : "2",
  "_version" : 2,
  "result" : "updated",
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "failed" : 0
  },
  "_seq_no" : 2,
  "_primary_term" : 1
}
# Request
curl --user elastic:123456 -XGET 'http://10.230.55.48:9200/student/_doc/2?pretty=true'
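# Response
{
  "_index" : "student",
  "_type" : "_doc",
  "_id" : "2",
  "_version" : 2,
  "_seq_no" : 2,
  "_primary_term" : 1,
  "found" : true,
  "_source" : {
    "age" : 10
  }
}
The result: _version went from 1 to 2, and the name field is gone. The reason is that POST student/_doc/2 overwrites the whole document. You can think of it as deleting the original document and then indexing a new one.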
Updating a document with _update:
es@centos01 ~ $ curl --user elastic:123456 -XPOST 'http://10.230.55.48:9200/student/_doc/2?pretty=true' -H 'Content-Type: application/json' -d '
{
  "name": "李四"
}'
# Typeless update endpoint; the older student/_doc/2/_update form still works in 7.x but is deprecated
es@centos01 ~ $ curl --user elastic:123456 -XPOST 'http://10.230.55.48:9200/student/_update/2?pretty=true' -H 'Content-Type: application/json' -d '
{
  "doc": {
    "age": 10
  }
}'
# Request
es@centos01 ~ $ curl --user elastic:123456 -XGET 'http://10.230.55.48:9200/student/_doc/2?pretty=true'
{
  "_index" : "student",
  "_type" : "_doc",
  "_id" : "2",
  "_version" : 4,
  "_seq_no" : 4,
  "_primary_term" : 1,
  "found" : true,
  "_source" : {
    "name" : "李四",
    "age" : 10
  }
}
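When you use _update, Elasticsearch does the following:
- Builds the JSON from the old document
- Applies the changes to that JSON
- Deletes the old document
- Indexes a new document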
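The difference between overwriting and updating can be sketched with a toy in-memory model. This is only an illustration of the steps listed above, not how Elasticsearch is implemented: the ToyIndex class is hypothetical, it keeps a single copy per _id, and it merges fields only at the top level.

```python
"""Toy in-memory model of index-overwrite versus partial update (illustrative sketch)."""


class ToyIndex:
    """Stores one (version, source) pair per document id."""

    def __init__(self):
        self._docs = {}

    def index(self, doc_id, source):
        """Replace the whole document, as `POST student/_doc/<id>` does."""
        version = self._docs[doc_id][0] + 1 if doc_id in self._docs else 1
        self._docs[doc_id] = (version, dict(source))
        return version

    def update(self, doc_id, partial):
        """Merge `partial` into the existing document, following the four steps above."""
        _, old_source = self._docs[doc_id]  # 1. build the JSON from the old document
        merged = dict(old_source)
        merged.update(partial)              # 2. apply the changes to that JSON
        # 3 and 4. drop the old copy and index the merged JSON as a new version
        return self.index(doc_id, merged)

    def get(self, doc_id):
        """Return a copy of the current source of a document."""
        return dict(self._docs[doc_id][1])


idx = ToyIndex()
idx.index("2", {"name": "李四"})
idx.index("2", {"age": 10})    # overwrite: the name field is lost
print(idx.get("2"))
idx.index("2", {"name": "李四"})
idx.update("2", {"age": 10})   # partial update: the name field is kept
print(idx.get("2"))
```

The four calls mirror the four write requests made against document 2 in this section, which is why the toy version counter also ends at 4.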