Elasticsearch Deep Dive —— REST APIs —— Document APIs —— Update By Query API

https://www.elastic.co/guide/en/elasticsearch/reference/8.8/docs-update-by-query.html#docs-update-by-query

Updates documents that match the specified query. If no query is specified, performs an update on every document in the data stream or index without modifying the source, which is useful for picking up mapping changes.

In other words, the update by query API (_update_by_query) updates every document that matches the supplied query. It takes a snapshot of the target when the request starts, optionally runs a script against each matching document, and re-indexes the result, using internal versioning to detect documents that changed while the request was running.

        POST /<target>/_update_by_query

1、Request

https://www.elastic.co/guide/en/elasticsearch/reference/8.8/docs-update-by-query.html#docs-update-by-query-api-request

        POST /<target>/_update_by_query

2、Prerequisites

https://www.elastic.co/guide/en/elasticsearch/reference/8.8/docs-update-by-query.html#docs-update-by-query-api-prereqs

If the Elasticsearch security features are enabled, you must have the following index privileges for the target data stream, index, or alias (a minimal role sketch is shown after the list):

    • read
    • index or write
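
For instance, a role granting these privileges might look like the sketch below. This is a hedged example, not a recommendation: the role name update_by_query_role and the index pattern my-index-* are placeholders, and it assumes the security APIs are available on your cluster.

curl -X PUT "localhost:9200/_security/role/update_by_query_role?pretty" -H 'Content-Type: application/json' -d'
        {
          "indices": [
            {
              "names": [ "my-index-*" ],
              "privileges": [ "read", "write" ]
            }
          ]
        }'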

3、Description

https://www.elastic.co/guide/en/elasticsearch/reference/8.8/docs-update-by-query.html#docs-update-by-query-api-desc

You can specify the query criteria in the request URI or the request body using the same syntax as the Search API.
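
As a quick illustration (the index name and field value are placeholders), the same term condition can be passed either as a Lucene query string in the URI or as a query DSL body:

# Query in the request URI (query string syntax)
curl -X POST "localhost:9200/my-index-000001/_update_by_query?q=user.id:kimchy&conflicts=proceed&pretty"

# Equivalent query in the request body (query DSL)
curl -X POST "localhost:9200/my-index-000001/_update_by_query?conflicts=proceed&pretty" -H 'Content-Type: application/json' -d'
        {
          "query": {
            "term": { "user.id": "kimchy" }
          }
        }'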

When you submit an update by query request, Elasticsearch gets a snapshot of the data stream or index when it begins processing the request and updates matching documents using internal versioning. When the versions match, the document is updated and the version number is incremented. If a document changes between the time that the snapshot is taken and the update operation is processed, it results in a version conflict and the operation fails. You can opt to count version conflicts instead of halting and returning by setting conflicts to proceed. Note that if you opt to count version conflicts the operation could attempt to update more documents from the source than max_docs until it has successfully updated max_docs documents, or it has gone through every document in the source query.
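
A hedged sketch of these two options together, with a placeholder index name: conflicts=proceed counts version conflicts instead of aborting, and max_docs caps how many documents are updated.

curl -X POST "localhost:9200/my-index-000001/_update_by_query?conflicts=proceed&pretty" -H 'Content-Type: application/json' -d'
        {
          "max_docs": 100,
          "query": {
            "match_all": {}
          }
        }'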

Documents with a version equal to 0 cannot be updated using update by query because internal versioning does not support 0 as a valid version number.

While processing an update by query request, Elasticsearch performs multiple search requests sequentially to find all of the matching documents. A bulk update request is performed for each batch of matching documents. Any query or update failures cause the update by query request to fail and the failures are shown in the response. Any update requests that completed successfully still stick; they are not rolled back.

4、Refreshing shards

https://www.elastic.co/guide/en/elasticsearch/reference/8.8/docs-update-by-query.html#_refreshing_shards_2

Specifying the refresh parameter refreshes all shards once the request completes. This is different than the update API’s refresh parameter, which causes just the shard that received the request to be refreshed. Unlike the update API, it does not support wait_for.

My understanding: in Elasticsearch, the update API modifies an existing document with an atomic partial update, avoiding the cost of re-indexing the whole document from the client. Internally an update still writes a new version of the document that replaces the old one; the old version is marked as deleted and cleaned up later in the background.

The update API accepts a refresh parameter with the values true, false, or wait_for, which controls when the change becomes visible to search. true tells Elasticsearch to refresh the relevant shard immediately after the update so the result is searchable right away, at some performance cost. wait_for makes the request block until the next scheduled refresh makes the change visible. If refresh is not specified, Elasticsearch refreshes the index automatically on its own schedule (every second by default).

By contrast, the refresh parameter on update by query refreshes all shards involved once the request completes, rather than only the shard that received the request, and update by query does not support wait_for.
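
For example (the index name is a placeholder), adding refresh to the request makes the updated documents searchable as soon as the request finishes:

curl -X POST "localhost:9200/my-index-000001/_update_by_query?refresh&conflicts=proceed&pretty"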

5、Running update by query asynchronously

https://www.elastic.co/guide/en/elasticsearch/reference/8.8/docs-update-by-query.html#docs-update-by-query-task-api

If the request contains wait_for_completion=false, Elasticsearch performs some preflight checks, launches the request, and returns a task you can use to cancel or get the status of the task. Elasticsearch creates a record of this task as a document at .tasks/task/${taskId}.
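
A hedged example (placeholder index name): run the update in the background and note the task ID returned in the response, which can later be used with the tasks API to check status or cancel.

# The response contains a task ID such as "r1A2WoRbTwKZ516z6NEs5A:36619"
curl -X POST "localhost:9200/my-index-000001/_update_by_query?conflicts=proceed&wait_for_completion=false&pretty"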

6、Waiting for active shards

https://www.elastic.co/guide/en/elasticsearch/reference/8.8/docs-update-by-query.html#_waiting_for_active_shards_2

wait_for_active_shards controls how many copies of a shard must be active before proceeding with the request. See Active shards for details. timeout controls how long each write request waits for unavailable shards to become available. Both work exactly the way they work in the Bulk API. Update by query uses scrolled searches, so you can also specify the scroll parameter to control how long it keeps the search context alive, for example ?scroll=10m. The default is 5 minutes.
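
A sketch combining the three parameters; the specific values (2 shard copies, 90s, 10m) are illustrative, not recommendations:

curl -X POST "localhost:9200/my-index-000001/_update_by_query?wait_for_active_shards=2&timeout=90s&scroll=10m&conflicts=proceed&pretty"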

7、Throttling update requests

https://www.elastic.co/guide/en/elasticsearch/reference/8.8/docs-update-by-query.html#_throttling_update_requests

To control the rate at which update by query issues batches of update operations, you can set requests_per_second to any positive decimal number. This pads each batch with a wait time to throttle the rate. Set requests_per_second to -1 to disable throttling.

Throttling uses a wait time between batches so that the internal scroll requests can be given a timeout that takes the request padding into account. The padding time is the difference between the batch size divided by the requests_per_second and the time spent writing. By default the batch size is 1000, so if requests_per_second is set to 500:

target_time = 1000 / 500 per second = 2 seconds
wait_time = target_time - write_time = 2 seconds - .5 seconds = 1.5 seconds

Since the batch is issued as a single _bulk request, large batch sizes cause Elasticsearch to create many requests and wait before starting the next set. This is "bursty" instead of "smooth".

My understanding: the Update by Query API lets us update documents that match a query, which is very flexible. In practice, though, a large update can put real pressure on the cluster, so the request rate may need to be limited. The requests_per_second parameter controls how many sub-requests per second the operation is allowed to issue and therefore the overall update rate; it accepts any positive decimal value, such as 0.5, 1, or 2.5. A wait time is inserted between batches so that the effective rate does not exceed requests_per_second, and setting requests_per_second to -1 disables throttling entirely.

Throttling works by padding the gap between batches, and the internal scroll request is given a timeout that accounts for this padding. The padding time is the batch size divided by requests_per_second, minus the time actually spent writing. With the default batch size of 1000 and requests_per_second set to 500, the target time per batch is 2 seconds, so the wait time is 2 seconds minus the observed write time (1.5 seconds in the example above). In practice, choose requests_per_second based on your hardware, network bandwidth, and how quickly the update needs to finish.

In short, controlling the update rate is one of the important things to watch when using the Update by Query API; setting requests_per_second appropriately keeps bulk updates from degrading cluster performance and reliability.
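
A minimal sketch (placeholder index and rate): throttle the operation to roughly 500 sub-requests per second, or pass -1 instead to remove the limit.

curl -X POST "localhost:9200/my-index-000001/_update_by_query?requests_per_second=500&conflicts=proceed&pretty"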

8、Slicing

https://www.elastic.co/guide/en/elasticsearch/reference/8.8/docs-update-by-query.html#docs-update-by-query-slice

Update by query supports sliced scroll to parallelize the update process. This can improve efficiency and provide a convenient way to break the request down into smaller parts.

Setting slices to auto chooses a reasonable number for most data streams and indices (see the sketch after the list below). If you’re slicing manually or otherwise tuning automatic slicing, keep in mind that:

    • Query performance is most efficient when the number of slices is equal to the number of shards in the index or backing index. If that number is large (for example, 500), choose a lower number as too many slices hurts performance. Setting slices higher than the number of shards generally does not improve efficiency and adds overhead.
    • Update performance scales linearly across available resources with the number of slices.

Whether query or update performance dominates the runtime depends on the documents being reindexed and cluster resources.
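
For reference, a hedged example of automatic slicing via the query string (the index name is a placeholder); manual slicing and body-level slicing are covered in their own sections later in this article:

curl -X POST "localhost:9200/my-index-000001/_update_by_query?slices=auto&conflicts=proceed&pretty"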

9、Examples

https://www.elastic.co/guide/en/elasticsearch/reference/8.8/docs-update-by-query.html#docs-update-by-query-api-example

The simplest usage of _update_by_query just performs an update on every document in the data stream or index without changing the source. This is useful to pick up a new property or some other online mapping change.
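
A minimal sketch of this no-body form; the index name is a placeholder and conflicts=proceed is optional but typical:

curl -X POST "localhost:9200/my-index-000001/_update_by_query?conflicts=proceed&pretty"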

To update selected documents, specify a query in the request body:

        curl -X POST "localhost:9200/my-index-000001/_update_by_query?conflicts=proceed&pretty" -H 'Content-Type: application/json' -d'
        {
          "query": { 
            "term": {
              "user.id": "kimchy"
            }
          }
        }'

Update documents in multiple data streams or indices:

curl -X POST "localhost:9200/my-index-000001,my-index-000002/_update_by_query?pretty"
    

Limit the update by query operation to the shards that a particular routing value maps to:

curl -X POST "localhost:9200/my-index-000001/_update_by_query?routing=1&pretty"
    

By default update by query uses scroll batches of 1000. You can change the batch size with the scroll_size parameter:

curl -X POST "localhost:9200/my-index-000001/_update_by_query?scroll_size=100&pretty"

Update a document using a unique attribute:

curl -X POST "localhost:9200/my-index-000001/_update_by_query?pretty" -H 'Content-Type: application/json' -d'
        {
          "query": {
            "term": {
              "user.id": "kimchy"
            }
          },
          "max_docs": 1
        }'        

10、Update the document source

https://www.elastic.co/guide/en/elasticsearch/reference/8.8/docs-update-by-query.html#docs-update-by-query-api-source

Update by query supports scripts to update the document source. For example, the following request increments the count field for all documents with a user.id of kimchy in my-index-000001:

curl -X POST "localhost:9200/my-index-000001/_update_by_query?pretty" -H 'Content-Type: application/json' -d'
        {
          "script": {
            "source": "ctx._source.count++",
            "lang": "painless"
          },
          "query": {
            "term": {
              "user.id": "kimchy"
            }
          }
        }'       
     

My understanding: this curl command performs a bulk update against documents in Elasticsearch. Each part of the request means the following:

  1. -X POST: use the POST HTTP method, since we are modifying documents.
  2. localhost:9200: the host name and port of the Elasticsearch service.
  3. my-index-000001: the index to update.
  4. /_update_by_query: the API that applies the update to every document in the index that matches the query.
  5. pretty: pretty-print the response for readability.
  6. -H 'Content-Type: application/json': declare that the request body is JSON.
  7. -d '{...}': the JSON request body describing the update. Specifically:
  • "script": { "source": "ctx._source.count++", "lang": "painless" }: the update to run. It is written in the Painless scripting language; ctx._source refers to the current document, so count++ increments its count field by 1.
  • "query": { "term": { "user.id": "kimchy" } }: the query selecting which documents to update. A term query is a simple exact-match query, so this matches every document whose user.id field is exactly kimchy.

In summary, this command finds all documents in the my-index-000001 index whose user.id is exactly kimchy and increments their count field by 1. Note that it updates every matching document in a single request.

Update by query only supports update, noop, and delete. Setting ctx.op to anything else is an error. Setting any other field in ctx is an error. This API only enables you to modify the source of matching documents; you cannot move them.
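
As a hedged sketch of ctx.op (the field names and index are placeholders), the script below initializes a count field where it is missing and sets ctx.op to noop for documents that already have one; skipped documents show up in the noops counter of the response. The \u0027 escapes follow the same convention as the other examples in this article.

curl -X POST "localhost:9200/my-index-000001/_update_by_query?pretty" -H 'Content-Type: application/json' -d'
        {
          "script": {
            "source": "if (ctx._source.count == null) { ctx._source.count = 1 } else { ctx.op = \u0027noop\u0027 }",
            "lang": "painless"
          },
          "query": {
            "term": {
              "user.id": "kimchy"
            }
          }
        }'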

11、Update documents using an ingest pipeline

https://www.elastic.co/guide/en/elasticsearch/reference/8.8/docs-update-by-query.html#docs-update-by-query-api-ingest-pipeline

My understanding: in Elasticsearch, an ingest pipeline is a processing chain applied to documents as they are ingested from a client or another data source. A pipeline runs a series of processors on each document, such as filtering, transforming, extracting or renaming fields, before the document is indexed. By combining an ingest pipeline with update by query, that same kind of custom processing can be applied to documents that are already in the index.

Update by query can use the Ingest pipelines feature by specifying a pipeline:

curl -X PUT "localhost:9200/_ingest/pipeline/set-foo?pretty" -H 'Content-Type: application/json' -d'
        {
          "description" : "sets foo",
          "processors" : [ {
              "set" : {
                "field": "foo",
                "value": "bar"
              }
          } ]
        }'
        curl -X POST "localhost:9200/my-index-000001/_update_by_query?pipeline=set-foo&pretty"     
    

12、Get the status of update by query operations

https://www.elastic.co/guide/en/elasticsearch/reference/8.8/docs-update-by-query.html#docs-update-by-query-fetch-tasks

You can fetch the status of all running update by query requests with the Task API:

curl -X GET "localhost:9200/_tasks?detailed=true&actions=*byquery&pretty"

The response looks like:

{
        "nodes" : {
          "r1A2WoRbTwKZ516z6NEs5A" : {
            "name" : "r1A2WoR",
            "transport_address" : "127.0.0.1:9300",
            "host" : "127.0.0.1",
            "ip" : "127.0.0.1:9300",
            "attributes" : {
              "testattr" : "test",
              "portsfile" : "true"
            },
            "tasks" : {
              "r1A2WoRbTwKZ516z6NEs5A:36619" : {
                "node" : "r1A2WoRbTwKZ516z6NEs5A",
                "id" : 36619,
                "type" : "transport",
                "action" : "indices:data/write/update/byquery",
                "status" : {    
                  "total" : 6154,
                  "updated" : 3500,
                  "created" : 0,
                  "deleted" : 0,
                  "batches" : 4,
                  "version_conflicts" : 0,
                  "noops" : 0,
                  "retries": {
                    "bulk": 0,
                    "search": 0
                  },
                  "throttled_millis": 0
                },
                "description" : ""
              }
            }
          }
        }
      }

This object contains the actual status. It is just like the response JSON with the important addition of the total field. total is the total number of operations that the reindex expects to perform. You can estimate the progress by adding the updated, created, and deleted fields. The request will finish when their sum is equal to the total field.
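
Using the sample response above as a worked example: updated + created + deleted = 3500 + 0 + 0 = 3500 of a total of 6154, so the task is roughly 57% complete and finishes once that sum reaches 6154.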

With the task id you can look up the task directly. The following example retrieves information about task r1A2WoRbTwKZ516z6NEs5A:36619:

curl -X GET "localhost:9200/_tasks/r1A2WoRbTwKZ516z6NEs5A:36619?pretty"

The advantage of this API is that it integrates with wait_for_completion=false to transparently return the status of completed tasks. If the task is completed and wait_for_completion=false was set on it, then it’ll come back with a results or an error field. The cost of this feature is the document that wait_for_completion=false creates at .tasks/task/${taskId}. It is up to you to delete that document.
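
A hedged sketch of that cleanup, reusing the task ID from the example above: the task record is a document in the .tasks system index, so it can in principle be removed with the delete document API. The exact path is an assumption; in recent versions direct access to the .tasks system index may be restricted or emit deprecation warnings, so verify the behavior on your cluster first.

# Assumption: the task document can be deleted like a normal document in the .tasks index
curl -X DELETE "localhost:9200/.tasks/_doc/r1A2WoRbTwKZ516z6NEs5A:36619?pretty"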

13、Cancel an update by query operation

https://www.elastic.co/guide/en/elasticsearch/reference/8.8/docs-update-by-query.html#docs-update-by-query-cancel-task-api

Any update by query can be cancelled using the Task Cancel API:

curl -X POST "localhost:9200/_tasks/r1A2WoRbTwKZ516z6NEs5A:36619/_cancel?pretty" 

The task ID can be found using the tasks API.

Cancellation should happen quickly but might take a few seconds. The task status API above will continue to list the update by query task until this task checks that it has been cancelled and terminates itself.

14、Change throttling for a request

https://www.elastic.co/guide/en/elasticsearch/reference/8.8/docs-update-by-query.html#docs-update-by-query-rethrottle

The value of requests_per_second can be changed on a running update by query using the _rethrottle API:

curl -X POST "localhost:9200/_update_by_query/r1A2WoRbTwKZ516z6NEs5A:36619/_rethrottle?requests_per_second=-1&pretty"

The task ID can be found using the tasks API.

Just like when setting it on the _update_by_query API, requests_per_second can be either -1 to disable throttling or any decimal number like 1.7 or 12 to throttle to that level. Rethrottling that speeds up the query takes effect immediately, but rethrottling that slows down the query will take effect after completing the current batch. This prevents scroll timeouts.

15、Slice manually

https://www.elastic.co/guide/en/elasticsearch/reference/8.8/docs-update-by-query.html#docs-update-by-query-manual-slice

Slice an update by query manually by providing a slice id and total number of slices to each request:

curl -X POST "localhost:9200/my-index-000001/_update_by_query?pretty" -H 'Content-Type: application/json' -d'
        {
          "slice": {
            "id": 0,
            "max": 2
          },
          "script": {
            "source": "ctx._source[\u0027extra\u0027] = \u0027test\u0027"
          }
        }
        '
        curl -X POST "localhost:9200/my-index-000001/_update_by_query?pretty" -H 'Content-Type: application/json' -d'
        {
          "slice": {
            "id": 1,
            "max": 2
          },
          "script": {
            "source": "ctx._source[\u0027extra\u0027] = \u0027test\u0027"
          }
        }'
        

Which you can verify works with:

curl -X GET "localhost:9200/_refresh?pretty"
        curl -X POST "localhost:9200/my-index-000001/_search?size=0&q=extra:test&filter_path=hits.total&pretty"      
    

Which results in a sensible total like this one:

{
        "hits": {
          "total": {
              "value": 120,
              "relation": "eq"
          }
        }
      } 

16、Use automatic slicing

https://www.elastic.co/guide/en/elasticsearch/reference/8.8/docs-update-by-query.html#docs-update-by-query-automatic-slice

You can also let update by query automatically parallelize using Sliced scroll to slice on _id. Use slices to specify the number of slices to use:

curl -X POST "localhost:9200/my-index-000001/_update_by_query?refresh&slices=5&pretty" -H 'Content-Type: application/json' -d'
        {
          "script": {
            "source": "ctx._source[\u0027extra\u0027] = \u0027test\u0027"
          }
        }'
    

Which you also can verify works with:

curl -X GET "localhost:9200/_refresh?pretty"
        curl -X POST "localhost:9200/my-index-000001/_search?size=0&q=extra:test&filter_path=hits.total&pretty"

17、Pick up a new property

https://www.elastic.co/guide/en/elasticsearch/reference/8.8/docs-update-by-query.html#picking-up-a-new-property

Say you created an index without dynamic mapping, filled it with data, and then added a mapping value to pick up more fields from the data:

curl -X PUT "localhost:9200/test?pretty" -H 'Content-Type: application/json' -d'
        {
          "mappings": {
            "dynamic": false,   
            "properties": {
              "text": {"type": "text"}
            }
          }
        }
        '
        curl -X POST "localhost:9200/test/_doc?refresh&pretty" -H 'Content-Type: application/json' -d'
        {
          "text": "words words",
          "flag": "bar"
        }
        '
        curl -X POST "localhost:9200/test/_doc?refresh&pretty" -H 'Content-Type: application/json' -d'
        {
          "text": "words words",
          "flag": "foo"
        }
        '
        curl -X PUT "localhost:9200/test/_mapping?pretty" -H 'Content-Type: application/json' -d'
        {
          "properties": {
            "text": {"type": "text"},
            "flag": {"type": "text", "analyzer": "keyword"}
          }
        }'
             
    

Setting "dynamic": false in the first request means that new fields won’t be indexed, just stored in _source.

The final PUT mapping request updates the mapping to add the new flag field. To pick up the new field you have to reindex all documents with it.

Searching for the data won’t find anything:

curl -X POST "localhost:9200/test/_search?filter_path=hits.total&pretty" -H 'Content-Type: application/json' -d'
        {
          "query": {
            "match": {
              "flag": "foo"
            }
          }
        }'    
    
{
        "hits" : {
          "total": {
              "value": 0,
              "relation": "eq"
          }
        }
      }  
    

But you can issue an _update_by_query request to pick up the new mapping:

curl -X POST "localhost:9200/test/_update_by_query?refresh&conflicts=proceed&pretty"
        curl -X POST "localhost:9200/test/_search?filter_path=hits.total&pretty" -H 'Content-Type: application/json' -d'
        {
          "query": {
            "match": {
              "flag": "foo"
            }
          }
        }'
    
{
        "hits" : {
          "total": {
              "value": 1,
              "relation": "eq"
          }
        }
      }

You can do the exact same thing when adding a field to a multifield.
