es - 杨杨09265 - 博客园

es

POST /my_store/products/_bulk
向索引中写入数据
{ "index": { "_id": 1 }}
{ "price" : 10, "productID" : "XHDK-A-1293-#fJ3" }
{ "index": { "_id": 2 }}
{ "price" : 20, "productID" : "KDKE-B-9947-#kL5" }
{ "index": { "_id": 3 }}
{ "price" : 30, "productID" : "JODL-X-1937-#pV7" }
{ "index": { "_id": 4 }}
{ "price" : 30, "productID" : "QQPX-R-3956-#aD8" }

设置索引字段
PUT /my_store
{
"mappings" : {
"products" : {
"properties" : {
"productID" : {
"type" : "string",
"index" : "not_analyzed"
}
}
}
}

}
"index" : "not_analyzed" 表示该字段不进行分词

精确值查找 term
GET /my_store/products/_search
{
"query" : {
"constant_score" : {
"filter" : {
"term" : {
"price" : 20
}
}
}
}
}
等价于
SELECT document
FROM products
WHERE price = 20

GET /my_store/products/_search
{
"query" : {
"constant_score" : {
"filter" : {
"term" : {
"productID" : "XHDK-A-1293-#fJ3"
}
}
}
}
}

组合过滤

{
"bool" : {
"must" : [],
"should" : [],
"must_not" : [],
}
}

GET /my_store/products/_search
{
"query" : {
"filtered" : {
"filter" : {
"bool" : {
"should" : [
{ "term" : {"price" : 20}},
{ "term" : {"productID" : "XHDK-A-1293-#fJ3"}}
],
"must_not" : {
"term" : {"price" : 30}
}
}
}
}
}
}

查找多个值

GET /my_store/products/_search
{
"query" : {
"constant_score" : {
"filter" : {
"terms" : {
"price" : [20, 30]
}
}
}
}
}

范围查找

GET /my_store/products/_search
{
"query" : {
"constant_score" : {
"filter" : {
"range" : {
"price" : {
"gte" : 20,
"lt" : 40
}
}
}
}
}
}

is null 和 is not null
is not null
GET /my_index/posts/_search
{
"query" : {
"constant_score" : {
"filter" : {
"exists" : { "field" : "tags" }
}
}
}
}
is null
GET /my_index/posts/_search
{
"query" : {
"constant_score" : {
"filter": {
"missing" : { "field" : "tags" }
}
}
}
}

dis_max 查询

{
"query": {
"dis_max": {
"queries": [
{ "match": { "title": "Brown fox" }},
{ "match": { "body": "Brown fox" }}
]
}
}
}

查询title或body中匹配度高的document
{
"multi_match": {
"query": "Quick brown fox",
"type": "best_fields",
"fields": [ "title", "body" ],
"tie_breaker": 0.3,
"minimum_should_match": "30%"
}
}

{
"query": {
"multi_match": {
"query": "Poland Street W1V",
"type": "most_fields",
"fields": [ "street", "city", "country", "postcode" ]
}
}
}
等价于
{
"query": {
"bool": {
"should": [
{ "match": { "street": "Poland Street W1V" }},
{ "match": { "city": "Poland Street W1V" }},
{ "match": { "country": "Poland Street W1V" }},
{ "match": { "postcode": "Poland Street W1V" }}
]
}
}
}

GET /_validate/query?explain
{
"query": {
"multi_match": {
"query": "peter smith",
"type": "cross_fields",
"operator": "and",
"fields": [ "first_name", "last_name" ]
}
}
}

peter和smith必须同时出现在"first_name", "last_name"中

短语查询
GET /my_index/my_type/_search
{
"query": {
"match_phrase": {
"title": "quick brown fox"
}
}
}
quick 、 brown 和 fox 需要全部出现在域中。
brown 的位置应该比 quick 的位置大 1 。
fox 的位置应该比 quick 的位置大 2 。

GET /my_index/my_type/_search
{
"query": {
"match_phrase": {
"title": {
"query": "quick fox",
"slop": 1
}
}
}
}
slop 参数告诉 match_phrase 查询词条相隔多远时仍然能将文档视为匹配。相隔多远的意思是为了让查询和文档匹配你需要移动词条多少次？

PUT /my_index/_mapping/groups
{
"properties": {
"names": {
"type": "string",
"position_increment_gap": 100
}
}
}position_increment_gap 设置告诉 Elasticsearch 应该为数组中每个新元素增加当前词条 position 的指定值。所以现在当我们再索引 names 数组时，

为了找到所有以 W1 开始的邮编，可以使用简单的 prefix 查询：
GET /my_index/address/_search
{
"query": {
"prefix": {
"postcode": "W1"
}
}
}

这个查询会匹配包含 W1F 7HW 和 W2F 8HW 的文档：

GET /my_index/address/_search
{
"query": {
"wildcard": {
"postcode": "W?F*HW"
}
}
}

GET /my_index/address/_search
{
"query": {
"regexp": {
"postcode": "W[0-9].+"
}
}
}

score(q,d) =
queryNorm(q)
· coord(q,d)
· ∑ (
tf(t in d)
· idf(t)²
· t.getBoost()
· norm(t,d)
) (t in q)

score(q,d) 是文档 d 与查询 q 的相关度评分。

queryNorm(q) 是查询归一化因子（新）。

coord(q,d) 是协调因子（新）。

查询 q 中每个词 t 对于文档 d 的权重和。

tf(t in d) 是词 t 在文档 d 中的词频。

idf(t) 是词 t 的逆向文档频率。

t.getBoost() 是查询中使用的 boost（新）。

norm(t,d) 是字段长度归一值，与索引时字段层 boost （如果存在）的和（新）

GET /_search
{
"query": {
"bool": {
"should": [
{
"match": {
"title": {
"query": "quick brown fox",
"boost": 2
}
}
},
{
"match": {
"content": "quick brown fox"
}
}
]
}
}
}
title 查询语句的重要性是 content 查询的 2 倍，因为它的权重提升值为 2 。

没有设置 boost 的查询语句的值为 1 。

桶
桶简单来说就是满足特定条件的文档的集合

GET /cars/transactions/_search
{
"size" : 0,
"aggs":{
"price":{
"histogram":{
"field": "price",
"interval": 20000
},
"aggs":{
"revenue": {
"sum": {
"field" : "price"
}
}
}
}
}
}
histogram 桶要求两个参数：一个数值字段以及一个定义桶大小间隔。

sum 度量嵌套在每个售价区间内，用来显示每个区间内的总收入。

GET /cars/transactions/_search
{
"size" : 0,
"query": {
"match": {
"make": "ford"
}
},
"post_filter": {
"term" : {
"color" : "green"
}
},
"aggs" : {
"all_colors": {
"terms" : { "field" : "color" }
}
}
}

post_filter 元素是 top-level 而且仅对命中结果进行过滤。

GET /cars/transactions/_search
{
"size" : 0,
"aggs" : {
"colors" : {
"terms" : {
"field" : "color",
"order": {
"_count" : "asc"
}
}
}
}
}
用关键字 _count ，我们可以按 doc_count 值的升序排序

_count
按文档数排序。对 terms 、 histogram 、 date_histogram 有效。
_term
按词项的字符串值的字母顺序排序。只在 terms 内使用。
_key
按每个桶的键值数值排序（理论上与 _term 类似）。只在 histogram 和 date_histogram 内使用。

添加索引
PUT /blogs
{
"settings" : {
"number_of_shards" : 3,
"number_of_replicas" : 1
}
}

文档元数据
一个文档不仅仅包含它的数据，也包含元数据 —— 有关文档的信息。三个必须的元数据元素如下：

_index
文档在哪存放
_type
文档表示的对象类别
_id
文档唯一标识

添加数据
PUT /{index}/{type}/{id}
{
"field": "value",
...
}

获取文档
GET /website/blog/123?pretty

返回文档的一部分
默认情况下， GET 请求会返回整个文档，这个文档正如存储在 _source 字段中的一样。但是也许你只对其中的 title 字段感兴趣。单个字段能用 _source 参数请求得到，多个字段也能使用逗号分隔的列表来指定。

GET /website/blog/123?_source=title,text

更新整个文档
在 Elasticsearch 中文档是不可改变的，不能修改它们。相反，如果想要更新现有的文档，需要重建索引或者进行替换，我们可以使用相同的 index API 进行实现，在索引文档中已经进行了讨论。

PUT /website/blog/123
{
"title": "My first blog entry",
"text": "I am starting to get the hang of this...",
"date": "2014/01/02"
}

映射
为了能够将时间域视为时间，数字域视为数字，字符串域视为全文或精确值字符串， Elasticsearch 需要知道每个域中数据的类型。这个信息包含在映射中。

如数据输入和输出中解释的，索引中每个文档都有类型。每种类型都有它自己的映射，或者模式定义。映射定义了类型中的域，每个域的数据类型，以及Elasticsearch如何处理这些域。映射也用于配置与类型有关的元数据。

我们会在类型和映射详细讨论映射。本节，我们只讨论足够让你入门的内容。

核心简单域类型
Elasticsearch 支持如下简单域类型：

字符串: string
整数 : byte, short, integer, long
浮点数: float, double
布尔型: boolean
日期: date
当你索引一个包含新域的文档—之前未曾出现-- Elasticsearch 会使用动态映射，通过JSON中基本数据类型，尝试猜测域类型，使用如下规则：
{
"gb": {
"mappings": {
"tweet": {
"properties": {
"date": {
"type": "date",
"format": "strict_date_optional_time||epoch_millis"
},
"name": {
"type": "string"
},
"tweet": {
"type": "string"
},
"user_id": {
"type": "long"
}
}
}
}
}
}

排序
按字段排序
GET /_search
{
"query" : {
"bool" : {
"filter" : { "term" : { "user_id" : 1 }}
}
},
"sort": { "date": { "order": "desc" }}
}

什么是相关性?
我们曾经讲过，默认情况下，返回结果是按相关性倒序排列的。但是什么是相关性？相关性如何计算？

每个文档都有相关性评分，用一个正浮点数字段 _score 来表示。 _score 的评分越高，相关性越高。

查询语句会为每个文档生成一个 _score 字段。评分的计算方式取决于查询类型不同的查询语句用于不同的目的： fuzzy 查询会计算与关键词的拼写相似程度，terms 查询会计算找到的内容与关键词组成部分匹配的百分比，但是通常我们说的 relevance 是我们用来计算全文本字段的值相对于全文本检索词相似程度的算法。

Elasticsearch 的相似度算法被定义为检索词频率/反向文档频率， TF/IDF ，包括以下内容：

检索词频率
检索词在该字段出现的频率？出现频率越高，相关性也越高。字段中出现过 5 次要比只出现过 1 次的相关性高。
反向文档频率
每个检索词在索引中出现的频率？频率越高，相关性越低。检索词出现在多数文档中会比出现在少数文档中的权重更低。
字段长度准则
字段的长度是多少？长度越长，相关性越低。检索词出现在一个短的 title 要比同样的词出现在一个长的 content 字段权重更大。
单个查询可以联合使用 TF/IDF 和其他方式，比如短语查询中检索词的距离或模糊查询里的检索词相似度。

相关性并不只是全文本检索的专利。也适用于 yes|no 的子句，匹配的子句越多，相关性评分越高。

如果多条查询子句被合并为一条复合查询语句，比如 bool 查询，则每个查询子句计算得出的评分会被合并到总的相关性评分中。

动态映射
当 Elasticsearch 遇到文档中以前未遇到的字段，它用 dynamic mapping 来确定字段的数据类型并自动把新的字段添加到类型映射。

有时这是想要的行为有时又不希望这样。通常没有人知道以后会有什么新字段加到文档，但是又希望这些字段被自动的索引。也许你只想忽略它们。如果Elasticsearch是作为重要的数据存储，可能就会期望遇到新字段就会抛出异常，这样能及时发现问题。

幸运的是可以用 dynamic 配置来控制这种行为，可接受的选项如下：

true
动态添加新的字段—缺省
false
忽略新的字段
strict
如果遇到新字段抛出异常

PUT /my_index
{
"mappings": {
"my_type": {
"dynamic": "strict",
"properties": {
"title": { "type": "string"},
"stash": {
"type": "object",
"dynamic": true
}
}
}
}
}

posted on 2023-04-02 02:09 杨杨09265 阅读(22) 评论(0) 编辑收藏举报

刷新页面返回顶部

yangyang12138

导航

公告

es