Elasticsearch 2.x: A Summary of Problems

1. JDK version:

Elasticsearch 2.x requires Java 8 update 20 or later, or Java 7 update 55 or later. Earlier versions have known bugs that can even lead to data loss.
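
As a rough illustration of the rule above, the following Python sketch checks a legacy-style JVM version string (the `1.8.0_20` form) against those minimums. The function name `is_supported_java` is made up for this example, and it deliberately rejects anything that is not a Java 7 or Java 8 version string.

```python
import re

# Minimum update level per legacy major version, taken from the requirement above.
MINIMUM_UPDATE = {7: 55, 8: 20}


def is_supported_java(version: str) -> bool:
    """Return True if a '1.<major>.0_<update>' version string meets the minimum."""
    match = re.fullmatch(r"1\.(\d+)\.0_(\d+)(?:-.*)?", version.strip())
    if match is None:
        return False  # not a legacy-style version string
    major, update = int(match.group(1)), int(match.group(2))
    return major in MINIMUM_UPDATE and update >= MINIMUM_UPDATE[major]


for candidate in ["1.8.0_20", "1.8.0_11", "1.7.0_55", "1.7.0_51"]:
    print(candidate, is_supported_java(candidate))
```

The version string itself can be read from the output of `java -version` on the machine that will run the node.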

2. On Linux, 2.x can only be started from a non-root account; otherwise it fails with:

Exception in thread "main" java.lang.RuntimeException: don't run elasticsearch as root.

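3. Starting with 2.x, to reach the ES cluster by IP address you must change the network.host setting in elasticsearch.yml. In ES 1.0 the default was "0.0.0.0", so a node was reachable without binding an IP explicitly. ES 2.2 with the default configuration can only be reached through localhost and "127.0.0.1". network.host accepts multiple values, for example: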

network.host: [_local_, 192.168.87.77]

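4. In ES 1.0, cluster discovery used multicast by default. ES 2.2 no longer includes that mode. A multicast plugin is available, but the official documentation advises against multicast, so unicast is the only practical choice. You must also set network.publish_host explicitly; otherwise the node picks its address dynamically and the unicast addresses you configured may no longer be correct. Part of my configuration: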

network.publish_host: 192.168.87.76
discovery.zen.ping.unicast.hosts: ["192.168.87.77","192.168.87.78"]

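discovery.zen.ping.unicast.hosts accepts a single value or several. When the ES nodes run on different servers, the port can be omitted. When several nodes run on the same server, it is best to append the transport port to the IP; otherwise the node to join may not be found. For example: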

network.publish_host: 192.168.87.76
discovery.zen.ping.unicast.hosts: ["192.168.87.77:9300"]
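
A mistyped address in this list only shows up as a node that never joins, so it can help to check the entries before starting the node. Below is a minimal Python sketch, assuming IPv4 entries of the form `ip` or `ip:port`. `parse_unicast_host` is a hypothetical helper, and the fallback port 9300 is an assumption based on the usual default transport port.

```python
import ipaddress


def parse_unicast_host(entry, default_port=9300):
    """Split an 'ip' or 'ip:port' entry and validate both parts.

    Raises ValueError for a malformed IPv4 address or port.
    """
    host, sep, port_text = entry.strip().partition(":")
    ipaddress.IPv4Address(host)  # raises a ValueError subclass if malformed
    port = int(port_text) if sep else default_port
    if not 0 < port < 65536:
        raise ValueError(f"port out of range: {port}")
    return host, port


# The last entry has five octets and is rejected.
for entry in ["192.168.87.77:9300", "192.168.87.78", "192.168.87.87.78"]:
    try:
        print(entry, "->", parse_unicast_host(entry))
    except ValueError as err:
        print(entry, "-> invalid:", err)
```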

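5. When running several nodes on the same server, it is best to set the inter-node communication port explicitly, so that the ip:port pairs used for node discovery in item 4 stay fixed. Change the following setting: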

transport.tcp.port: 9310
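
To make the relationship between items 4 and 5 concrete, here is a small Python sketch that derives the per-node settings for several nodes sharing one machine: each node gets its own transport port and lists the other nodes as unicast hosts. The helper names `local_cluster_settings` and `render` are invented for this example, and the ports are only illustrative.

```python
def local_cluster_settings(ip, transport_ports):
    """Build one settings dict per node for nodes sharing the same IP."""
    all_hosts = [f"{ip}:{port}" for port in transport_ports]
    settings = []
    for port in transport_ports:
        own = f"{ip}:{port}"
        settings.append({
            "network.publish_host": ip,
            "transport.tcp.port": port,
            # every node pings the other nodes, not itself
            "discovery.zen.ping.unicast.hosts": [h for h in all_hosts if h != own],
        })
    return settings


def render(settings):
    """Format a settings dict as lines suitable for elasticsearch.yml."""
    lines = []
    for key, value in settings.items():
        if isinstance(value, list):
            value = "[" + ", ".join(f'"{item}"' for item in value) + "]"
        lines.append(f"{key}: {value}")
    return "\n".join(lines)


for node in local_cluster_settings("192.168.87.76", [9300, 9310]):
    print(render(node))
    print()
```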

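6. The ik Chinese analyzer and mappings

The built-in analysis of Chinese splits text into single characters, which is of no use to us, so a third-party plugin is needed. Here we chose ik.

The problem I ran into: after installing ik, the ik test sentences were analyzed correctly, but analysis of the content in my documents did not work. Either an error was returned or ik was not used at all.

Later I found this sentence in the official documentation: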

the query string needs to be passed through the same (or a similar) analyzer so that the terms that it tries to find are in the same format as those that exist in the index.

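Taken in context, this roughly means: how a string field is analyzed is determined by its mapping. Whatever analyzer the mapping specified at creation time is also used at search time, and once created it cannot be changed.

My data had been imported beforehand, so the mapping did not configure ik for the string fields. Setting the analyzer to ik when the mapping is defined solves the problem.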
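
The point of the quoted sentence can be shown without a running cluster. The Python sketch below uses two toy analyzers: `char_analyzer` emits one token per character, and `toy_word_analyzer` does greedy longest-match segmentation over a tiny made-up vocabulary. Neither is the real default analyzer or ik; they only stand in for "per-character" and "word-based" analysis.

```python
def char_analyzer(text):
    """One token per non-space character."""
    return [ch for ch in text if not ch.isspace()]


def toy_word_analyzer(text, vocabulary=("丽南", "新村", "行政村")):
    """Greedy longest-match segmentation over a tiny hypothetical vocabulary."""
    words = sorted(vocabulary, key=len, reverse=True)
    tokens, i = [], 0
    while i < len(text):
        if text[i].isspace():
            i += 1
            continue
        for word in words:
            if text.startswith(word, i):
                tokens.append(word)
                i += len(word)
                break
        else:  # no vocabulary word starts here: fall back to one character
            tokens.append(text[i])
            i += 1
    return tokens


def matches(doc, query, index_analyzer, search_analyzer):
    """True if any query token equals a token produced at index time."""
    indexed = set(index_analyzer(doc))
    return any(token in indexed for token in search_analyzer(query))


doc = "丽南行政村 丽南新村"
print(matches(doc, "新村", toy_word_analyzer, toy_word_analyzer))  # same analyzer on both sides
print(matches(doc, "新村", toy_word_analyzer, char_analyzer))      # mismatched analyzers
```

With the same analyzer on both sides the query token `新村` is found among the indexed tokens; with mismatched analyzers the query is split into `新` and `村`, neither of which exists in the word-based index.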

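7. Custom mappings

If no mapping is specified, ES infers a type for each field when the field is first created, and uses the default analyzer. To use a different analyzer we have to define the mapping ourselves.

I ran into many problems while creating a custom mapping and struggled for nearly a week before finally finding the correct approach in the official documentation.

At first I created the index and then set the mapping. For some reason the mapping request succeeded but the analyzer had no effect: "新村" was still split into "新" and "村".

Creating the mapping: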

curl -XPUT http://localhost:9200/mbq
curl -XPOST http://localhost:9200/mbq/mbq/_mapping -d'
{
  "mbq": {
    "_all": {
      "analyzer": "ik_max_word",
      "search_analyzer": "ik_max_word",
      "term_vector": "no",
      "store": "false"
    },
    "properties": {
      "content": {
        "type": "string",
        "analyzer": "ik_max_word",
        "search_analyzer": "ik_max_word",
        "include_in_all": "true",
        "boost": 8
      }
    }
  }
}
'

The result:

{

    "_index": "mbq",
    "_type": "mbq",
    "_id": "_mapping -d",
    "_version": 1,
    "_shards": {
        "total": 2,
        "successful": 1,
        "failed": 0
    },
    "created": true

}

Searching for "新村" with highlighting:

curl -XPOST http://localhost:9200/mbq/mbq/_search?pretty -d '
{
  "query": {
    "match": {
      "content": "新村"
    }
  },
  "highlight": {
    "pre_tags": [
      "<tag1>",
      "<tag2>"
    ],
    "post_tags": [
      "</tag1>",
      "</tag2>"
    ],
    "fields": {
      "content": {}
    }
  }
}
'

The search result:

"hits": [
    {
        "_index": "mbq",
        "_type": "mbq",
        "_id": "1",
        "_score": 0.3200825,
        "_source": {
            "std_id": "1",
            "content": "丽水市区 万象街道 丽南行政村 丽南新村 186号3楼",
            "door": "186号3楼"
        },
        "highlight": {
            "content": [
                "丽水市区 万象街道 丽南行政<tag1>村</tag1> 丽南<tag1>新</tag1><tag1>村</tag1> 186号3楼"
            ]
        }
    }
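
One quick way to see how the query was analyzed is to pull the highlighted terms out of the fragment. The sketch below is plain Python and assumes the `<tag1>` markers configured in the request above; `highlighted_terms` is a name chosen for this example.

```python
import re


def highlighted_terms(fragment, tag="tag1"):
    """Return the pieces of text wrapped in <tag>...</tag> inside a highlight fragment."""
    return re.findall(rf"<{tag}>(.*?)</{tag}>", fragment)


fragment = "丽水市区 万象街道 丽南行政<tag1>村</tag1> 丽南<tag1>新</tag1><tag1>村</tag1> 186号3楼"
print(highlighted_terms(fragment))
```

Every highlighted piece is a single character, which is what per-character analysis looks like; with a word-based analyzer one would expect `新村` to be highlighted as a unit.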

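Looking into it, I found that when I created the index first and then set the mapping, the mapping definition had been added to the index as a document. Inspecting the index mapping showed that the analyzer had not been applied, although everything else had. Being new to this, I did not know the exact cause at the time.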
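
The response body already hints at this. A successful mapping update normally answers with `{"acknowledged": true}`, while a response carrying `_id` and `created` has the shape of an index-document response. A small Python sketch of that check follows; `classify_response` and its return labels are invented for this example.

```python
import json


def classify_response(body: str) -> str:
    """Guess what kind of operation a JSON response body belongs to."""
    data = json.loads(body)
    if data.get("acknowledged") is True:
        return "mapping-updated"
    if "_id" in data and "created" in data:
        return "document-indexed"
    return "unknown"


response = """
{
  "_index": "mbq", "_type": "mbq", "_id": "_mapping -d", "_version": 1,
  "_shards": {"total": 2, "successful": 1, "failed": 0},
  "created": true
}
"""
print(classify_response(response))
```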

So I switched to creating the mapping together with the index:

curl -XPUT http://localhost:9200/jason '
{
  "settings": {
    "number_of_shards": 5,
    "number_of_replicas": 1
  },
  "mappings": {
    "alpha": {
      "properties": {
        "first_name": {
          "type": "string"
        },
        "last_name": {
          "type": "string"
        },
        "age": {
          "type": "integer"
        },
        "about": {
          "type": "string",
          "analyzer": "ik_max_word",
          "search_analyzer": "ik_max_word",
          "include_in_all": "true",
          "store": "true",
          "boost": 8
        },
        "detail": {
          "type": "string",
          "analyzer": "ik_max_word",
          "search_analyzer": "ik_max_word"
        },
        "join_time": {
          "type": "date",
          "format": "dateOptionalTime",
          "index": "not_analyzed"
        }
      }
    }
  }
}
'

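When I finally went back and tried again, I found that the -d was the cause: it turned the set-mapping request into a document insert, with "_mapping -d" treated as a whole as the document ID (visible in the "_id" field of the response above).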
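
How the body gets attached to the request depends on the client. With the curl command-line tool the body is normally supplied through -d/--data, so the behavior described above may be specific to the client used to send the request. The Python sketch below avoids the question by passing the URL and the body as separate arguments. It only builds the request; the helper name `build_create_index_request` and the reduced field list are assumptions made for this example.

```python
import json
import urllib.request


def build_create_index_request(base_url, index, doc_type, properties):
    """Build a PUT request that creates an index together with its mapping."""
    body = {"mappings": {doc_type: {"properties": properties}}}
    return urllib.request.Request(
        url=f"{base_url}/{index}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


request = build_create_index_request(
    "http://localhost:9200",
    "jason",
    "alpha",
    {"about": {"type": "string", "analyzer": "ik_max_word", "search_analyzer": "ik_max_word"}},
)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
# To actually send it (requires a running node with the ik plugin installed):
# urllib.request.urlopen(request)
```

Because the path and the body never share one string, nothing from the body or the command line can end up in the document ID.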

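8. The pinyin analysis plugin analyzer-lc-pinyin

This plugin nearly drove me to despair. I struggled with it for more than a week and finally got it working through an accidental attempt.

The cause was again the change in mapping parameters between ES 1.x and ES 2.x.

ES 1.x mappings had an index_analyzer parameter, which is gone in ES 2.x. (I could not find it in the official documentation at the time; as far as I can tell it was replaced by plain analyzer.)

So the configuration has to change accordingly, but there is very little information about this plugin, and the tutorials that do exist target 1.x.

The correct way to create the mapping: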

curl -XPUT http://localhost:9200/addr
curl -XPOST http://localhost:9200/addr/std/_mapping '
{
  "std": {
    "_all": {
      "analyzer": "lc_index",
      "search_analyzer": "lc_search",
      "term_vector": "no",
      "store": "false"
    },
    "properties": {
      "detail_name": {
        "type": "string",
        "store": "true",
        "term_vector": "with_positions_offsets",
        "analyzer": "lc_index",
        "search_analyzer": "lc_search",
        "include_in_all": "true",
        "boost": 8
      }
    }
  }
}
'

My problem was that Chinese was matched but pinyin was not, because I had set analyzer to lc_search.
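
A mistake like this is easy to catch before the mapping is sent. The Python sketch below builds a field definition with `lc_index` at index time and `lc_search` at search time, and flags fields where `lc_search` was used as the index-time analyzer. Both helper names are hypothetical; only the analyzer names come from the mapping above.

```python
def lc_pinyin_field(**extra):
    """Field mapping that indexes with lc_index and searches with lc_search."""
    field = {
        "type": "string",
        "analyzer": "lc_index",
        "search_analyzer": "lc_search",
    }
    field.update(extra)
    return field


def find_swapped_analyzers(properties):
    """Return names of fields whose index-time analyzer is lc_search."""
    return [
        name
        for name, field in properties.items()
        if field.get("analyzer") == "lc_search"
    ]


good = {"detail_name": lc_pinyin_field(store="true", boost=8)}
bad = {"detail_name": {"type": "string", "analyzer": "lc_search", "search_analyzer": "lc_search"}}
print(find_swapped_analyzers(good))
print(find_swapped_analyzers(bad))
```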

Posted 2018-03-05 16:20 by 刍荛采葑菲

This article comes from the WeChat official account 【刍荛采葑菲】; please credit the source when reposting.