Getting Started with ES

A Brief Introduction to ES Concepts

Index
Type
Document
Field

Creating an index

Create an index named "index_test":

PUT /index_test

{
	"settings":{
		"number_of_shards":1,		// number of shards in the index
		"number_of_replicas":0		// number of replicas per shard; shards, clusters, etc. are covered in detail later
	}
}
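
As a rough sketch, the same request can be assembled from Python with only the standard library. The helper name build_create_index_request and the ES_URL constant are illustrative assumptions, not part of ES; the request is built but not sent, so the snippet runs without a node.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumed local node, as in the URLs used in this post


def build_create_index_request(index, shards=1, replicas=0):
    """Build (but do not send) the PUT request that creates an index."""
    body = {
        "settings": {
            "number_of_shards": shards,      # number of shards in the index
            "number_of_replicas": replicas,  # number of replicas per shard
        }
    }
    return urllib.request.Request(
        url=f"{ES_URL}/{index}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


req = build_create_index_request("index_test")
print(req.get_method(), req.full_url)
# To send it against a running node: urllib.request.urlopen(req)
```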

Creating the mapping

POST /index_name/type_name/_mapping?include_type_name=true

From version 7.0 onward, types are no longer supported by default, so the include_type_name=true parameter must be added.

http://localhost:9200/index_test/doc/_mapping

{
	"properties":{
		"name":{
			"type":"text"
		},
		"age":{
			"type":"integer"
		},
		"description":{
			"type":"text"
		}
	}
}
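
The mapping request above, with and without the include_type_name flag, might be built like this. It is a minimal sketch: build_put_mapping_request and its parameters are hypothetical names, and the request is only constructed, not sent.

```python
import json
import urllib.parse
import urllib.request


def build_put_mapping_request(index, doc_type, properties,
                              include_type_name=False,
                              base_url="http://localhost:9200"):
    """Build (but do not send) a typed put-mapping request."""
    url = f"{base_url}/{index}/{doc_type}/_mapping"
    if include_type_name:
        # Added for 7.x, where the type name in the URL needs this flag.
        url += "?" + urllib.parse.urlencode({"include_type_name": "true"})
    return urllib.request.Request(
        url=url,
        data=json.dumps({"properties": properties}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


properties = {
    "name": {"type": "text"},
    "age": {"type": "integer"},
    "description": {"type": "text"},
}
mapping_req = build_put_mapping_request("index_test", "doc", properties,
                                        include_type_name=True)
print(mapping_req.full_url)
```
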
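Adding a document

PUT /index/type/id

If no id is specified, ES generates one automatically (in that case send the request with POST rather than PUT).

Note that the index name is not validated here: if you mistype the index name, the document is still added successfully, because by default ES creates a missing index on the first write. Be careful with this.

http://localhost:9200/index_test/doc/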

{
    "name" : "许文祥",
    "description" : "666666666666666666666666666666666666666666666666",
    "age" : 27
}
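
Since ES itself does not reject a mistyped index name, one option is a client-side guard. The sketch below is only an illustration: KNOWN_INDICES and document_target are hypothetical names, and the allow-list is something the client would have to maintain on its own.

```python
KNOWN_INDICES = {"index_test"}  # hypothetical allow-list maintained by the client


def document_target(index, doc_type, doc_id=None):
    """Return (method, path) for indexing a document, refusing unknown indices."""
    if index not in KNOWN_INDICES:
        raise ValueError(f"unknown index {index!r}; refusing to write")
    if doc_id is None:
        # No id given: POST lets ES generate one.
        return "POST", f"/{index}/{doc_type}/"
    return "PUT", f"/{index}/{doc_type}/{doc_id}"


print(document_target("index_test", "doc", 1))
try:
    document_target("index_tset", "doc", 1)  # typo in the index name
except ValueError as err:
    print(err)
```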

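IK Chinese analyzer
By default ES does not segment Chinese text into words: for Chinese data it treats every single character as a term, which works poorly for Chinese search.

Before ES 6, the request was a GET with query parameters, as follows:

http://localhost:9200/_analyze?analyzer=ik_smart&text=手机充值

From ES 6 onward, either GET or POST can be used, but the analyzer and the text to analyze must be given as JSON in the request body.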
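
The two request styles can be put side by side in a small sketch. The function name analyze_request and its legacy switch are assumptions made for illustration; the function only returns the method, URL and body and does not contact ES.

```python
import json
import urllib.parse


def analyze_request(analyzer, text, legacy=False,
                    base_url="http://localhost:9200"):
    """Return (method, url, body) for an _analyze call."""
    if legacy:
        # Pre-ES 6 style: everything goes into the query string.
        query = urllib.parse.urlencode({"analyzer": analyzer, "text": text})
        return "GET", f"{base_url}/_analyze?{query}", None
    # ES 6+ style: analyzer and text go into a JSON body.
    body = json.dumps({"analyzer": analyzer, "text": text}, ensure_ascii=False)
    return "POST", f"{base_url}/_analyze", body


print(analyze_request("ik_smart", "手机充值", legacy=True))
print(analyze_request("ik_smart", "手机充值"))
```

Note that urlencode percent-encodes the Chinese text, which is what a browser does behind the scenes with the URL shown above.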

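Testing how ES segments Chinese text by default:
POST /_analyze
http://localhost:9200/_analyze

analyzer selects the analyzer to use; it can be omitted, in which case the default, standard, is used.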

{
  "analyzer": "standard",
  "text": "许文祥sfsafsf"
}

{
    "tokens": [
        {
            "token": "许",
            "start_offset": 0,
            "end_offset": 1,
            "type": "<IDEOGRAPHIC>",
            "position": 0
        },
        {
            "token": "文",
            "start_offset": 1,
            "end_offset": 2,
            "type": "<IDEOGRAPHIC>",
            "position": 1
        },
        {
            "token": "祥",
            "start_offset": 2,
            "end_offset": 3,
            "type": "<IDEOGRAPHIC>",
            "position": 2
        },
        {
            "token": "sfsafsf",
            "start_offset": 3,
            "end_offset": 10,
            "type": "<ALPHANUM>",
            "position": 3
        }
    ]
}
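
To make the start_offset and end_offset fields concrete, the sketch below slices the input text with them and compares the result to each token. The helper check_offsets is a hypothetical name, and the response is an abbreviated copy of the one above. The direct comparison works here because the input is already lowercase; an analyzer that normalizes tokens would need a looser check.

```python
def check_offsets(text, analyze_response):
    """Return the token strings, verifying each one against its offsets."""
    tokens = []
    for entry in analyze_response["tokens"]:
        start, end = entry["start_offset"], entry["end_offset"]
        if text[start:end] != entry["token"]:
            raise ValueError(f"offset mismatch for {entry['token']!r}")
        tokens.append(entry["token"])
    return tokens


text = "许文祥sfsafsf"
response = {
    "tokens": [
        {"token": "许", "start_offset": 0, "end_offset": 1},
        {"token": "文", "start_offset": 1, "end_offset": 2},
        {"token": "祥", "start_offset": 2, "end_offset": 3},
        {"token": "sfsafsf", "start_offset": 3, "end_offset": 10},
    ]
}
print(check_offsets(text, response))
```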

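Here you can see that the Chinese text was split into individual characters.
Download IK (see https://www.cnblogs.com/heshun/articles/10658909.html), unzip it into the plugins directory of ES, rename the unzipped directory to ik, and restart ES; the plugin is then loaded automatically.

One thing to watch out for: IK's configuration file specifies the ES version it targets, and if that does not match your ES version, ES reports an error on startup.

On startup the ES window flashes by and closes, so the only way to see the error is to read the log files.
After changing this setting to version=6.2.3, ES started normally.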

{
	"text":"中华人民共和国",
	"analyzer":"ik_max_word"	// use the IK analyzer, otherwise the default analyzer is still used; the options are ik_max_word and ik_smart
}

{
    "tokens": [
        {
            "token": "中华人民共和国",
            "start_offset": 0,
            "end_offset": 7,
            "type": "CN_WORD",
            "position": 0
        },
        {
            "token": "中华人民",
            "start_offset": 0,
            "end_offset": 4,
            "type": "CN_WORD",
            "position": 1
        },
        {
            "token": "中华",
            "start_offset": 0,
            "end_offset": 2,
            "type": "CN_WORD",
            "position": 2
        },
        {
            "token": "华人",
            "start_offset": 1,
            "end_offset": 3,
            "type": "CN_WORD",
            "position": 3
        },
        {
            "token": "人民共和国",
            "start_offset": 2,
            "end_offset": 7,
            "type": "CN_WORD",
            "position": 4
        },
        {
            "token": "人民",
            "start_offset": 2,
            "end_offset": 4,
            "type": "CN_WORD",
            "position": 5
        },
        {
            "token": "共和国",
            "start_offset": 4,
            "end_offset": 7,
            "type": "CN_WORD",
            "position": 6
        },
        {
            "token": "共和",
            "start_offset": 4,
            "end_offset": 6,
            "type": "CN_WORD",
            "position": 7
        },
        {
            "token": "国",
            "start_offset": 6,
            "end_offset": 7,
            "type": "CN_CHAR",
            "position": 8
        }
    ]
}

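With the input "中华人民共和国" the effect of IK's Chinese segmentation is visible, but with "许文祥sdfsf" it is not. The reason is that IK uses the main.dic dictionary by default (it can be found in the config folder), and that dictionary does not contain this name.

Custom dictionary
The custom dictionary file must be encoded in UTF-8, otherwise it has no effect.
We add a custom dictionary file, extra_my.dic, under the ES directory

/plugins/ik/config

and add two lines to it, "许文祥" and "文祥" (the dictionary format is one term per line), then reference this custom dictionary in IK's configuration file:

/plugins/ik/config/IKAnalyzer.cfg.xml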
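
The dictionary file itself (UTF-8, one term per line) can be written with a short script so that the encoding is never left to an editor's default. This is a sketch: write_ik_dictionary is a hypothetical helper, and a temporary directory stands in for the real plugins/ik/config folder.

```python
import tempfile
from pathlib import Path


def write_ik_dictionary(path, terms):
    """Write the terms one per line, encoded as UTF-8."""
    Path(path).write_text("\n".join(terms) + "\n", encoding="utf-8")


# Demo target: a temporary directory standing in for plugins/ik/config
demo_dir = Path(tempfile.mkdtemp())
dic_path = demo_dir / "extra_my.dic"
write_ik_dictionary(dic_path, ["许文祥", "文祥"])
print(dic_path.read_text(encoding="utf-8"))
```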

Testing the Chinese segmentation again now splits the text according to our own dictionary:

{
    "tokens": [
        {
            "token": "许文祥",
            "start_offset": 0,
            "end_offset": 3,
            "type": "CN_WORD",
            "position": 0
        },
        {
            "token": "文祥",
            "start_offset": 1,
            "end_offset": 3,
            "type": "CN_WORD",
            "position": 1
        }
    ]
}

Mappings
Adding a field
PUT /index_test/doc/_mapping

http://localhost:9200/index_test/doc/_mapping

{
    "properties":{
        "create_time":{
            "type":"date"
        }
    }
}

Viewing the mapping
GET /index/doc/_mapping

http://localhost:9200/index_test/doc/_mapping

{
    "index_test": {
        "mappings": {
            "doc": {
                "properties": {
                    "age": {
                        "type": "integer"
                    },
                    "create_time": {
                        "type": "date"
                    },
                    "description": {
                        "type": "text"
                    },
                    "name": {
                        "type": "text"
                    }
                }
            }
        }
    }
}

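Fields can be added to an existing mapping, but the definition of an existing field cannot be changed!

If you really must change a field's definition (its type, analyzer, whether it is indexed, and so on), the only option is to delete the index, recreate it with every field defined properly, and then migrate the data back in. So think the mapping through when creating the index, because fields can only be added, never redefined. For example, trying to change create_time from date to integer: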

{
    "properties":{
        "create_time":{
            "type":"integer"
        }
    }
}
This returns an error:
{
    "error": {
        "root_cause": [
            {
                "type": "illegal_argument_exception",
                "reason": "mapper [create_time] of different type, current_type [date], merged_type [integer]"
            }
        ],
        "type": "illegal_argument_exception",
        "reason": "mapper [create_time] of different type, current_type [date], merged_type [integer]"
    },
    "status": 400
}
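
The rule "add fields, never redefine them" can be mirrored on the client before a mapping update is sent. The sketch below is an illustration only, not how ES performs the check internally; find_type_conflicts is a hypothetical name, and it compares only the type of each field.

```python
def find_type_conflicts(existing, update):
    """Return the fields whose type would change; brand-new fields are fine."""
    conflicts = []
    for field, definition in update.items():
        current = existing.get(field)
        if current is not None and current.get("type") != definition.get("type"):
            conflicts.append(field)
    return conflicts


existing = {
    "age": {"type": "integer"},
    "create_time": {"type": "date"},
    "description": {"type": "text"},
    "name": {"type": "text"},
}
print(find_type_conflicts(existing, {"create_time": {"type": "integer"}}))
print(find_type_conflicts(existing, {"update_time": {"type": "date"}}))
```

Here update_time is a made-up field name used only to show that adding a new field raises no conflict.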

posted @ 2019-08-08 07:46 xwx唐宋元明清
