Getting Started with Kibana and Elasticsearch & Integrating the IK Chinese Analyzer

  Kibana is developed with Node.js.

1. Download and Install

0. The steps from the official site are as follows.

1. Download

  Download Kibana from the official site as well. (Kibana is written in Node.js and has many dependencies, so unpacking takes a while.)

 2. Unpack and Install

After unpacking, check elasticsearch.hosts in config/kibana.yml. The default is http://localhost:9200, so if Elasticsearch runs locally on that port no change is needed.

3. Start

Run bin/kibana.bat (on Windows). After startup the log shows:

 log   [14:32:25.598] [info][server][Kibana][http] http server running at http://localhost:5601

4. Open the Kibana Home Page

Visit http://localhost:5601/app/kibana. Kibana performs some default initialization work on first load.

The commonly used features are:

Discover: view and search data

Visualize: build visualizations

Dashboard: build dashboards

Dev Tools: developer tools

Management: configuration

 5. Kibana Configuration Details

The main file is config/kibana.yml. Notable settings include:

server.host / server.port: the address and port Kibana listens on. Change server.host if external access is needed. The default is localhost:5601.

elasticsearch.hosts: ["http://localhost:9200"]: the address of the Elasticsearch instance Kibana connects to; defaults to local port 9200.
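For reference, a minimal kibana.yml covering both settings might look like the sketch below (the 0.0.0.0 host is only an example value for enabling external access):

```yaml
# Listen on all interfaces instead of only localhost (example value)
server.host: "0.0.0.0"
server.port: 5601

# Elasticsearch instance(s) Kibana connects to
elasticsearch.hosts: ["http://localhost:9200"]
```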

 

Addendum: today, starting Kibana on another machine failed with the following error:

"warning","migrations","pid":6181,"message":"Another Kibana instance appears to be migrating the index. Waiting for that migration to complete. If no other Kibana instance is attempting migrations, you can get past this message by deleting index .kibana_index_1 and restarting Kibana.

Fix:

(1) Stop Kibana.

(2) Check the Kibana-related indices:

curl http://localhost:9200/.kibana*

 (3) Delete the indices, then check again:

C:\Users\Administrator>curl -XDELETE http://localhost:9200/.kibana*
{"acknowledged":true}
C:\Users\Administrator>curl http://localhost:9200/.kibana*
{}

(4) Start Kibana again.

 Addendum: Basic Kibana Usage

0. First go to Kibana's settings and create an index pattern.

1. Table view: the Discover panel

After selecting the matching index pattern, the panel shows the raw documents by default. Click the fields on the left to display them as table columns, as follows:

 2. Building a pie chart: the Visualize panel

(1) Choose the pie chart, then choose the corresponding index pattern.

(2) For example, to see the document count per thread (equivalent to grouping by thread and counting), set the parameters and click Save.

3. Viewing cluster and index status in Kibana

Open the Kibana instance for that cluster and go to Management -> Stack Monitoring to see node information and index information (cluster health and per-index shard details).

# Show nodes and their disk usage
GET _cat/allocation?v

# Show indices and shard information
GET _cat/shards?v
GET _cat/shards?h=index,shard,prirep,state,unassigned.reason
# (piping to grep UNASSIGNED only works from a shell via curl, not in Dev Tools)

# Explain problematic shards and the reason
GET _cluster/allocation/explain?pretty

# Retry failed shard allocations
POST /_cluster/reroute?retry_failed=true
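The _cat checks above can also be scripted outside Dev Tools. Below is a minimal sketch (the sample output is made up for illustration) that filters the UNASSIGNED rows out of a _cat/shards response fetched with curl or any HTTP client:

```python
# Filter UNASSIGNED shards from the text returned by
# GET _cat/shards?h=index,shard,prirep,state,unassigned.reason
def unassigned_shards(cat_output):
    rows = []
    for line in cat_output.splitlines():
        cols = line.split()
        # columns: index, shard, prirep, state[, unassigned.reason]
        if len(cols) >= 4 and cols[3] == "UNASSIGNED":
            rows.append(cols)
    return rows

sample = """accounts 0 p STARTED
accounts 0 r UNASSIGNED INDEX_CREATED
news 1 p STARTED"""
print(unassigned_shards(sample))
# → [['accounts', '0', 'r', 'UNASSIGNED', 'INDEX_CREATED']]
```

This does the same filtering that the grep pipe does when the endpoint is called via curl from a shell.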

2. Elasticsearch Terminology and Kibana Basics

  Elasticsearch exposes a RESTful API: you operate on it by sending HTTP requests.
Query: the request method should be GET.
Delete: the request method should be DELETE.
Create: the request method should be PUT/POST.
Update: the request method should be PUT/POST.
  The RESTful URL format is http://ip:port/<index>/<type>/<[id]>. index and type must be provided; id is optional, and ES generates one if it is omitted. index and type organize the data into layers, which makes it easier to manage.
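To make the URL scheme concrete, here is a small hypothetical helper (not part of any ES client library) that composes such URLs:

```python
def doc_url(host, port, index, type_, doc_id=None):
    # Compose http://ip:port/<index>/<type>/<[id]>; the id part is optional
    url = "http://{}:{}/{}/{}".format(host, port, index, type_)
    if doc_id is not None:
        url += "/" + str(doc_id)
    return url

print(doc_url("localhost", 9200, "accounts", "person", 1))
# → http://localhost:9200/accounts/person/1
```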

1. Elasticsearch Terminology

Document: a document, i.e. one record stored in ES.

Index: an index. Comparable to a database (DB) in MySQL. Every document is stored in a specific index.

Type: a data type under an index. Comparable to a table in MySQL. The ES default is _doc. Currently there is one type per index. (The version used at my company appears to be 5.x, where one index can still hold multiple types.)

Field: a field, i.e. a document attribute. Comparable to a column in a MySQL table.

Query DSL: the ES query language.

2. CRUD in ES

Here we use Kibana's Dev Tools to run the requests.

1. Create a document

POST /accounts/person/1
{
  "name": "zhi",
  "lastName": "qiao",
  "job": "enginee"
}

The response is as follows:

#! Deprecation: [types removal] Specifying types in document index requests is deprecated, use the typeless endpoints instead (/{index}/_doc/{id}, /{index}/_doc, or /{index}/_create/{id}).
{
  "_index" : "accounts",
  "_type" : "person",
  "_id" : "1",
  "_version" : 1,
  "result" : "created",
  "_shards" : {
    "total" : 2,
    "successful" : 1,
    "failed" : 0
  },
  "_seq_no" : 0,
  "_primary_term" : 1
}

   accounts is the index, person is the type, the inserted id is 1, and the version is 1.

2. Read a document

GET accounts/person/1

The response is as follows:

#! Deprecation: [types removal] Specifying types in document get requests is deprecated, use the /{index}/_doc/{id} endpoint instead.
{
  "_index" : "accounts",
  "_type" : "person",
  "_id" : "1",
  "_version" : 2,
  "_seq_no" : 1,
  "_primary_term" : 1,
  "found" : true,
  "_source" : {
    "name" : "zhi",
    "lastName" : "qiao",
    "job" : "enginee"
  }
}

3. Update a document (update the job field of the document created above):

POST /accounts/person/1/_update 
{
  "doc": {
     "job": "software enginee"
  }
}

Response:

#! Deprecation: [types removal] Specifying types in document update requests is deprecated, use the endpoint /{index}/_update/{id} instead.
{
  "_index" : "accounts",
  "_type" : "person",
  "_id" : "1",
  "_version" : 3,
  "result" : "updated",
  "_shards" : {
    "total" : 2,
    "successful" : 1,
    "failed" : 0
  },
  "_seq_no" : 2,
  "_primary_term" : 1
}

4. View the document again:

GET accounts/person/1

Result:

#! Deprecation: [types removal] Specifying types in document get requests is deprecated, use the /{index}/_doc/{id} endpoint instead.
{
  "_index" : "accounts",
  "_type" : "person",
  "_id" : "1",
  "_version" : 3,
  "_seq_no" : 2,
  "_primary_term" : 1,
  "found" : true,
  "_source" : {
    "name" : "zhi",
    "lastName" : "qiao",
    "job" : "software enginee"
  }
}

5. Delete a document

DELETE accounts/person/1

Result:

#! Deprecation: [types removal] Specifying types in document index requests is deprecated, use the /{index}/_doc/{id} endpoint instead.
{
  "_index" : "accounts",
  "_type" : "person",
  "_id" : "1",
  "_version" : 4,
  "result" : "deleted",
  "_shards" : {
    "total" : 2,
    "successful" : 1,
    "failed" : 0
  },
  "_seq_no" : 3,
  "_primary_term" : 1
}

6. View it once more

GET accounts/person/1

Result:

#! Deprecation: [types removal] Specifying types in document get requests is deprecated, use the /{index}/_doc/{id} endpoint instead.
{
  "_index" : "accounts",
  "_type" : "person",
  "_id" : "1",
  "found" : false
}

 

3. Elasticsearch Queries

First, prepare two documents:

POST /accounts/person/1
{
  "name": "zhi",
  "lastName": "qiao",
  "job": "enginee"
}

POST /accounts/person/2
{
  "name": "zhi2",
  "lastName": "qiao2",
  "job": "student"
}

 

1. Query string: search by keyword

GET accounts/person/_search?q=student

Result:

#! Deprecation: [types removal] Specifying types in search requests is deprecated.
{
  "took" : 1557,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 1.0925692,
    "hits" : [
      {
        "_index" : "accounts",
        "_type" : "person",
        "_id" : "2",
        "_score" : 1.0925692,
        "_source" : {
          "name" : "zhi2",
          "lastName" : "qiao2",
          "job" : "student"
        }
      }
    ]
  }
}
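Outside Kibana, the same query-string search is an ordinary GET request with a q parameter; a sketch of building that URL (no request is actually sent here):

```python
from urllib.parse import urlencode

# Build the same search URL that Dev Tools issues as GET .../_search?q=student
base = "http://localhost:9200/accounts/person/_search"
url = "{}?{}".format(base, urlencode({"q": "student"}))
print(url)  # → http://localhost:9200/accounts/person/_search?q=student
```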

 

Searching for a keyword that does not exist:

GET accounts/person/_search?q=teacher

The response:

#! Deprecation: [types removal] Specifying types in search requests is deprecated.
{
  "took" : 4,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 0,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  }
}

2. Query DSL: a query language expressed as JSON and sent in the HTTP body

GET accounts/person/_search
{
  "query": {
    "term": {
      "job": {
        "value": "student"
      }
    }
  }
}

Result:

#! Deprecation: [types removal] Specifying types in search requests is deprecated.
{
  "took" : 6,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 0.6931471,
    "hits" : [
      {
        "_index" : "accounts",
        "_type" : "person",
        "_id" : "2",
        "_score" : 0.6931471,
        "_source" : {
          "name" : "zhi2",
          "lastName" : "qiao2",
          "job" : "student"
        }
      }
    ]
  }
}
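The same term-query body can be assembled programmatically before being sent as the HTTP body; a minimal sketch (the helper name is my own, not an ES client API):

```python
import json

def term_query(field, value):
    # Build a Query DSL body for an exact term match on one field
    return {"query": {"term": {field: {"value": value}}}}

# Serialize to the JSON that would go into the request body
print(json.dumps(term_query("job", "student")))
```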

  

Query DSL reference: https://www.elastic.co/guide/en/elasticsearch/reference/current/index.html

 

3. Integrating the IK Chinese Analyzer with ES

  Elasticsearch ships with many built-in analyzers, such as standard (the standard analyzer), english (for English), and chinese (for Chinese). standard splits Chinese text into single characters, so it applies broadly but with low precision; english is smarter with English text, handling singular/plural forms and case and filtering stopwords (for example "the"); chinese performs poorly.

1. For example, analyze some text with the default analyzer

POST /_analyze
{
  "text": "我是一个程序员, I am CXY"
}

Result:

{
  "tokens" : [
    {
      "token" : "我",
      "start_offset" : 0,
      "end_offset" : 1,
      "type" : "<IDEOGRAPHIC>",
      "position" : 0
    },
    {
      "token" : "是",
      "start_offset" : 1,
      "end_offset" : 2,
      "type" : "<IDEOGRAPHIC>",
      "position" : 1
    },
    {
      "token" : "一",
      "start_offset" : 2,
      "end_offset" : 3,
      "type" : "<IDEOGRAPHIC>",
      "position" : 2
    },
    {
      "token" : "个",
      "start_offset" : 3,
      "end_offset" : 4,
      "type" : "<IDEOGRAPHIC>",
      "position" : 3
    },
    {
      "token" : "程",
      "start_offset" : 4,
      "end_offset" : 5,
      "type" : "<IDEOGRAPHIC>",
      "position" : 4
    },
    {
      "token" : "序",
      "start_offset" : 5,
      "end_offset" : 6,
      "type" : "<IDEOGRAPHIC>",
      "position" : 5
    },
    {
      "token" : "员",
      "start_offset" : 6,
      "end_offset" : 7,
      "type" : "<IDEOGRAPHIC>",
      "position" : 6
    },
    {
      "token" : "i",
      "start_offset" : 9,
      "end_offset" : 10,
      "type" : "<ALPHANUM>",
      "position" : 7
    },
    {
      "token" : "am",
      "start_offset" : 11,
      "end_offset" : 13,
      "type" : "<ALPHANUM>",
      "position" : 8
    },
    {
      "token" : "cxy",
      "start_offset" : 14,
      "end_offset" : 17,
      "type" : "<ALPHANUM>",
      "position" : 9
    }
  ]
}
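The per-character behavior on Chinese text can be mimicked with a simple regex; this is only an illustration of the output above, not how the standard analyzer is actually implemented:

```python
import re

def standard_like_tokens(text):
    # Emit each CJK ideograph as its own token and lowercase runs of
    # ASCII letters/digits, mimicking the analysis result shown above.
    return [m.group().lower()
            for m in re.finditer(r"[A-Za-z0-9]+|[\u4e00-\u9fff]", text)]

print(standard_like_tokens("我是一个程序员, I am CXY"))
# → ['我', '是', '一', '个', '程', '序', '员', 'i', 'am', 'cxy']
```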

2. Integrating the IK Chinese Analyzer

Chinese analyzer plugin: https://github.com/medcl/elasticsearch-analysis-ik

1. Download the plugin version that matches your ES version.

2. Unzip the package into ES_HOME/plugins/ik (create the ik directory if it does not exist), then restart ES.

 3. Test the analysis

(1) The ik_smart analyzer

POST /_analyze
{
  "analyzer":"ik_smart",
  "text": "我是一个程序员"
}

Result:

{
  "tokens" : [
    {
      "token" : "",
      "start_offset" : 0,
      "end_offset" : 1,
      "type" : "CN_CHAR",
      "position" : 0
    },
    {
      "token" : "",
      "start_offset" : 1,
      "end_offset" : 2,
      "type" : "CN_CHAR",
      "position" : 1
    },
    {
      "token" : "一个",
      "start_offset" : 2,
      "end_offset" : 4,
      "type" : "CN_WORD",
      "position" : 2
    },
    {
      "token" : "程序员",
      "start_offset" : 4,
      "end_offset" : 7,
      "type" : "CN_WORD",
      "position" : 3
    }
  ]
}

(2) The ik_max_word analyzer

POST /_analyze
{
  "analyzer":"ik_max_word",
  "text": "我是一个程序员"
}

Result:

{
  "tokens" : [
    {
      "token" : "",
      "start_offset" : 0,
      "end_offset" : 1,
      "type" : "CN_CHAR",
      "position" : 0
    },
    {
      "token" : "",
      "start_offset" : 1,
      "end_offset" : 2,
      "type" : "CN_CHAR",
      "position" : 1
    },
    {
      "token" : "一个",
      "start_offset" : 2,
      "end_offset" : 4,
      "type" : "CN_WORD",
      "position" : 2
    },
    {
      "token" : "",
      "start_offset" : 2,
      "end_offset" : 3,
      "type" : "TYPE_CNUM",
      "position" : 3
    },
    {
      "token" : "",
      "start_offset" : 3,
      "end_offset" : 4,
      "type" : "COUNT",
      "position" : 4
    },
    {
      "token" : "程序员",
      "start_offset" : 4,
      "end_offset" : 7,
      "type" : "CN_WORD",
      "position" : 5
    },
    {
      "token" : "程序",
      "start_offset" : 4,
      "end_offset" : 6,
      "type" : "CN_WORD",
      "position" : 6
    },
    {
      "token" : "",
      "start_offset" : 6,
      "end_offset" : 7,
      "type" : "CN_CHAR",
      "position" : 7
    }
  ]
}

Note:

  ik_max_word performs the finest-grained segmentation. For example, it splits "我是一个程序员" into 我, 是, 一个, 一, 个, 程序员, 程序, 员, exhausting the possible combinations.
  ik_smart performs the coarsest-grained segmentation. For example, it splits "我是一个程序员" into 我, 是, 一个, 程序员.
  ik_max_word is mostly used at index time. At search time, however, we usually want more precise results for the user's query: when searching for "花果山" we would rather have it treated as a single word than split into 花, 果, and 山, so ik_smart is more commonly used to analyze query input.

 

4. When creating a mapping, set the IK analyzer via analyzer and search_analyzer

PUT /news
{
    "settings": {
        "number_of_shards": 3,
        "number_of_replicas": 2
    },
    "mappings": {
        "properties": {
            "id": {
                "type": "long"
            },
            "title": {
                "type": "text",
                "analyzer": "ik_max_word",
                "search_analyzer": "ik_smart"
            },
            "content": {
                "type": "text",
                "analyzer": "ik_max_word",
                "search_analyzer": "ik_smart"
            },
            "description": {
                "type": "double"
            }

        }
    }
}

Result:

{
  "acknowledged" : true,
  "shards_acknowledged" : true,
  "index" : "news"
}

View the field mappings:

GET /news/_mapping?pretty=true

Result:

{
  "news" : {
    "mappings" : {
      "properties" : {
        "content" : {
          "type" : "text",
          "analyzer" : "ik_max_word",
          "search_analyzer" : "ik_smart"
        },
        "description" : {
          "type" : "double"
        },
        "id" : {
          "type" : "long"
        },
        "title" : {
          "type" : "text",
          "analyzer" : "ik_max_word",
          "search_analyzer" : "ik_smart"
        }
      }
    }
  }
}

 

Addendum: elasticsearch-head, a web UI for Elasticsearch

Reference: https://github.com/mobz/elasticsearch-hea

 

 

posted @ 2020-08-05 23:16  QiaoZhi