Still Can't Do CUD in ES?
Recently I've been working with ES at my day job, but I wasn't familiar with how to work with its documents, so I've spent the last two weeks studying ES and skimming Elasticsearch: The Definitive Guide. Another day of happily muddling through.
ES is a real-time distributed search and analytics engine built on Lucene. Today we won't talk about its use cases; instead we'll go through creating, modifying, and deleting ES indices and documents.
Environment: CentOS 7, Elasticsearch 6.8.3, JDK 8
(The latest ES is version 7, which requires JDK 11 or above, so I installed ES 6.8.3.)
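(A quick sanity check, assuming ES is already up and listening on 192.168.197.100:9200; hitting the root endpoint returns the node name and version:)

// the "version" block of the response should report "number": "6.8.3"
GET http://192.168.197.100:9200/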
All of the examples below use a student index.
1. Creating an index
PUT http://192.168.197.100:9200/student
{
    "mappings": {
        "_doc": {                          // "_doc" is the type; in ES 6 an index can hold only one type
            "properties": {
                "id":       { "type": "keyword" },
                "name":     { "type": "text", "index": true, "analyzer": "standard" },
                "age":      { "type": "integer", "fields": { "keyword": { "type": "keyword", "ignore_above": 256 } } },
                "birthday": { "type": "date" },
                "gender":   { "type": "keyword" },
                "grade":    { "type": "text", "fields": { "keyword": { "type": "keyword", "ignore_above": 256 } } },
                "class":    { "type": "text", "fields": { "keyword": { "type": "keyword", "ignore_above": 256 } } }
            }
        }
    },
    "settings": {
        "number_of_shards": 1,             // number of primary shards
        "number_of_replicas": 1            // number of replicas per primary shard
    }
}
The difference between the text and keyword field types:
(1) text is analyzed (split into tokens) when indexed and queried, and is used for full-text search
(2) keyword is not analyzed, and is used for exact matching, aggregations, and sorting (see the example below)
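A quick illustration of the difference (a minimal sketch; querying is covered in detail in the next post): a match query against the analyzed name field performs a full-text search, while a term query against the keyword field gender only hits documents with that exact value.

// full-text search on an analyzed text field
POST http://192.168.197.100:9200/student/_search
{
    "query": { "match": { "name": "tom" } }
}

// exact match on a keyword field
POST http://192.168.197.100:9200/student/_search
{
    "query": { "term": { "gender": "male" } }
}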
The index attribute controls how a string field is indexed. In older ES versions (before 5.x) it took one of three string values:
(1) analyzed: the field is analyzed and can be matched fuzzily, similar to LIKE in SQL
(2) not_analyzed: the field can only be matched exactly, similar to = in SQL
(3) no: the field is not indexed and cannot be searched
Since ES 5.x (including the 6.8.3 used here), index only accepts true or false, i.e. whether the field is indexed at all; the analyzed/not_analyzed distinction is now expressed by choosing the text or keyword type. That is why the mapping above uses "index": true.
The analyzer attribute sets the analyzer for the field. For Chinese this is usually the ik analyzer; you can also define a custom analyzer, as sketched below.
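For reference, a minimal sketch of a custom analyzer defined in the index settings (the index name student_v2, the analyzer name my_analyzer, and the chosen token filters are illustrative assumptions, not part of the student example above):

PUT http://192.168.197.100:9200/student_v2
{
    "settings": {
        "analysis": {
            "analyzer": {
                "my_analyzer": {                    // custom analyzer name (illustrative)
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": [ "lowercase", "asciifolding" ]
                }
            }
        }
    },
    "mappings": {
        "_doc": {
            "properties": {
                "name": { "type": "text", "analyzer": "my_analyzer" }
            }
        }
    }
}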
The number_of_shards attribute is the number of primary shards; the default is 5 in ES 6 and it cannot be changed after the index is created.
The number_of_replicas attribute is the number of replicas per primary shard; the default is 1 and it can be changed at any time.
If creation succeeds, the following JSON is returned:
{
    "acknowledged": true,
    "shards_acknowledged": true,
    "index": "student"
}
How do you view an index's details once it has been created?
GET http://192.168.197.100:9200/student/_mapping
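The _mapping endpoint returns only the mappings. To also see the settings, or everything about the index at once:

// settings only
GET http://192.168.197.100:9200/student/_settings

// aliases, mappings, and settings in a single call
GET http://192.168.197.100:9200/student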
In ES 6 an index can contain only one type, such as the "_doc" type above.
Comparing ES with a relational database: an index corresponds to a database, a type to a table, a document to a row, a field to a column, and a mapping to a table schema.
2. Modifying an index
// change the number of replicas to 2
PUT http://192.168.197.100:9200/student/_settings
{
    "number_of_replicas": 2
}
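Besides dynamic settings, new fields can be added to an existing mapping (the types of existing fields cannot be changed). A minimal sketch, assuming we want to add an email field to the student index:

// add a new field to the existing _doc mapping (the email field is just an example)
PUT http://192.168.197.100:9200/student/_mapping/_doc
{
    "properties": {
        "email": { "type": "keyword" }
    }
}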
3. Deleting an index
// delete a single index
DELETE http://192.168.197.100:9200/student

// delete all indices
DELETE http://192.168.197.100:9200/_all
4. The default standard analyzer vs. the ik analyzer
ES's default analyzer is standard. It splits English text on whitespace and punctuation, but it splits Chinese text into individual characters, so it is not suitable as a Chinese analyzer.
For example, standard on English text:
// this API shows how a piece of text is tokenized
POST http://192.168.197.100:9200/_analyze
{
    "text": "the People's Republic of China",
    "analyzer": "standard"
}
The result:
{
    "tokens": [
        { "token": "the",      "start_offset": 0,  "end_offset": 3,  "type": "<ALPHANUM>", "position": 0 },
        { "token": "people's", "start_offset": 4,  "end_offset": 12, "type": "<ALPHANUM>", "position": 1 },
        { "token": "republic", "start_offset": 13, "end_offset": 21, "type": "<ALPHANUM>", "position": 2 },
        { "token": "of",       "start_offset": 22, "end_offset": 24, "type": "<ALPHANUM>", "position": 3 },
        { "token": "china",    "start_offset": 25, "end_offset": 30, "type": "<ALPHANUM>", "position": 4 }
    ]
}
And on Chinese text:
POST http://192.168.197.100:9200/_analyze
{
    "text": "中华人民共和国万岁",
    "analyzer": "standard"
}
The result:
{
    "tokens": [
        { "token": "中", "start_offset": 0, "end_offset": 1, "type": "<IDEOGRAPHIC>", "position": 0 },
        { "token": "华", "start_offset": 1, "end_offset": 2, "type": "<IDEOGRAPHIC>", "position": 1 },
        { "token": "人", "start_offset": 2, "end_offset": 3, "type": "<IDEOGRAPHIC>", "position": 2 },
        { "token": "民", "start_offset": 3, "end_offset": 4, "type": "<IDEOGRAPHIC>", "position": 3 },
        { "token": "共", "start_offset": 4, "end_offset": 5, "type": "<IDEOGRAPHIC>", "position": 4 },
        { "token": "和", "start_offset": 5, "end_offset": 6, "type": "<IDEOGRAPHIC>", "position": 5 },
        { "token": "国", "start_offset": 6, "end_offset": 7, "type": "<IDEOGRAPHIC>", "position": 6 },
        { "token": "万", "start_offset": 7, "end_offset": 8, "type": "<IDEOGRAPHIC>", "position": 7 },
        { "token": "岁", "start_offset": 8, "end_offset": 9, "type": "<IDEOGRAPHIC>", "position": 8 }
    ]
}
The ik analyzer (an Elasticsearch plugin) does support splitting Chinese text into proper words. It provides two analyzers: ik_smart and ik_max_word.
(1) ik_smart: coarse-grained segmentation, splitting the text into the largest possible words
For example:
POST http://192.168.197.100:9200/_analyze
{
    "text": "中华人民共和国万岁",
    "analyzer": "ik_smart"
}
The result:
{
    "tokens": [
        { "token": "中华人民共和国", "start_offset": 0, "end_offset": 7, "type": "CN_WORD", "position": 0 },
        { "token": "万岁",           "start_offset": 7, "end_offset": 9, "type": "CN_WORD", "position": 1 }
    ]
}
(2) ik_max_word: fine-grained segmentation, splitting the text into as many words as possible
For example:
POST http://192.168.197.100:9200/_analyze
{
    "text": "中华人民共和国万岁",
    "analyzer": "ik_max_word"
}
The result:
{
    "tokens": [
        { "token": "中华人民共和国", "start_offset": 0, "end_offset": 7, "type": "CN_WORD", "position": 0 },
        { "token": "中华人民",       "start_offset": 0, "end_offset": 4, "type": "CN_WORD", "position": 1 },
        { "token": "中华",           "start_offset": 0, "end_offset": 2, "type": "CN_WORD", "position": 2 },
        { "token": "华人",           "start_offset": 1, "end_offset": 3, "type": "CN_WORD", "position": 3 },
        { "token": "人民共和国",     "start_offset": 2, "end_offset": 7, "type": "CN_WORD", "position": 4 },
        { "token": "人民",           "start_offset": 2, "end_offset": 4, "type": "CN_WORD", "position": 5 },
        { "token": "共和国",         "start_offset": 4, "end_offset": 7, "type": "CN_WORD", "position": 6 },
        { "token": "共和",           "start_offset": 4, "end_offset": 6, "type": "CN_WORD", "position": 7 },
        { "token": "国",             "start_offset": 6, "end_offset": 7, "type": "CN_CHAR", "position": 8 },
        { "token": "万岁",           "start_offset": 7, "end_offset": 9, "type": "CN_WORD", "position": 9 },
        { "token": "万",             "start_offset": 7, "end_offset": 8, "type": "TYPE_CNUM", "position": 10 },
        { "token": "岁",             "start_offset": 8, "end_offset": 9, "type": "COUNT", "position": 11 }
    ]
}
The ik analyzer on English text:
POST http://192.168.197.100:9200/_analyze
{
    "text": "the People's Republic of China",
    "analyzer": "ik_smart"
}
The result is below: ik drops the unimportant words (stop words such as "the" and "of"), whereas the standard analyzer keeps them. (My English has decayed to the point where I can't even say what part of speech a/an/the are; that is nothing to be proud of.)
{
    "tokens": [
        { "token": "people",   "start_offset": 4,  "end_offset": 10, "type": "ENGLISH", "position": 0 },
        { "token": "s",        "start_offset": 11, "end_offset": 12, "type": "ENGLISH", "position": 1 },
        { "token": "republic", "start_offset": 13, "end_offset": 21, "type": "ENGLISH", "position": 2 },
        { "token": "china",    "start_offset": 25, "end_offset": 30, "type": "ENGLISH", "position": 3 }
    ]
}
5. Adding a document
You can include arbitrary fields; anything not already in the mapping is added via dynamic mapping.
// the trailing 1 is the value of "_id"; it must be unique, and it can also be auto-generated
POST http://192.168.197.100:9200/student/_doc/1
{
    "id": 1,
    "name": "tom",
    "age": 20,
    "gender": "male",
    "grade": "7",
    "class": "1"
}
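To have ES generate the _id automatically, POST to the type endpoint without an id (a minimal sketch; the field values here are made up for illustration):

// ES assigns a random _id because none is supplied in the URL
POST http://192.168.197.100:9200/student/_doc
{
    "id": 2,
    "name": "lucy",
    "age": 19,
    "gender": "female",
    "grade": "7",
    "class": "2"
}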
6. Updating a document
POST http://192.168.197.100:9200/student/_doc/1/_update
{
    "doc": {
        "name": "jack"
    }
}
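The _update endpoint performs a partial update: only the fields inside "doc" are merged into the existing document. Indexing to the same _id again without _update replaces the whole document instead; a minimal sketch (field values are illustrative):

// full replacement: document 1 now contains exactly this body
PUT http://192.168.197.100:9200/student/_doc/1
{
    "id": 1,
    "name": "jack",
    "age": 21,
    "gender": "male",
    "grade": "7",
    "class": "1"
}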
7. Deleting a document
// 1 is the value of "_id"
DELETE http://192.168.197.100:9200/student/_doc/1
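Documents can also be removed by query with the _delete_by_query API (a minimal sketch; the match condition is just an example):

// delete every document whose name matches "jack"
POST http://192.168.197.100:9200/student/_delete_by_query
{
    "query": {
        "match": { "name": "jack" }
    }
}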
That is a quick tour of creating, modifying, and deleting indices and adding, updating, and deleting documents in ES. To keep this post from getting too long, document queries will be covered in the next one.