Elasticsearch Tutorial

Concepts

Mapping concepts across SQL and Elasticsearch

While SQL and Elasticsearch have different terms for the way the data is organized, essentially their purpose is the same.

SQL	Elasticsearch	Description
column	field	In both cases, at the lowest level, data is stored in named entries, of a variety of data types, containing one value.
row	document	Columns and fields do not exist by themselves; they are part of a row or a document.
table	index	The target against which queries, whether in SQL or Elasticsearch, get executed.
database	cluster	In SQL, catalog and database are used interchangeably and represent a set of schemas, that is, a number of tables. In Elasticsearch, the set of available indices is grouped in a cluster.
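
To make the correspondence concrete, the sketch below turns one SQL-style row into the kind of JSON document an index would hold. The helper name row_to_document and the sample values are assumptions made for illustration; no Elasticsearch client is involved.

```python
import json


def row_to_document(columns, row):
    """Pair each column name with its value, giving one field per column."""
    if len(columns) != len(row):
        raise ValueError("row must have one value per column")
    return dict(zip(columns, row))


# One row of a hypothetical SQL table becomes one document of an index.
columns = ["uuid", "title", "main_body"]
row = ("1000", "Sample title", "Sample body text")

document = row_to_document(columns, row)
print(json.dumps(document, indent=2))
```

Each column name becomes a field name, and the whole row becomes a single document.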

Field Data Type

Common types

type	description
binary	Binary value encoded as a Base64 string.
boolean	true and false values.
Keywords	The keyword family, including keyword, constant_keyword, and wildcard.
Numbers	Numeric types, such as long and double, used to express amounts.
Dates	Date types, including date and date_nanos.
Text	A field to index full-text values.
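
As a rough illustration of the types above, the sketch below pairs a mapping that uses one type from each row with a document that fits it. All field names except uuid and title are hypothetical, and the code only builds plain dictionaries.

```python
import base64
import json

# Hypothetical mapping that uses one type from each row of the table.
mapping = {
    "properties": {
        "thumbnail": {"type": "binary"},
        "published": {"type": "boolean"},
        "uuid": {"type": "keyword"},
        "views": {"type": "long"},
        "created_at": {"type": "date"},
        "title": {"type": "text"},
    }
}

# A document that fits the mapping. Binary values travel as Base64 strings.
document = {
    "thumbnail": base64.b64encode(b"\x89PNG").decode("ascii"),
    "published": True,
    "uuid": "1000",
    "views": 42,
    "created_at": "2021-05-30T12:37:00",
    "title": "Sample title",
}

print(json.dumps(document, indent=2))
```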

Mapping

Mapping is the process of defining how a document, and the fields it contains, are stored and indexed.

Each document is a collection of fields, which each have their own data type. When mapping your data, you create a mapping definition, which contains a list of fields that are pertinent to the document.

Dynamic mapping

Dynamic mapping allows you to experiment with and explore data when you’re just getting started. Elasticsearch adds new fields automatically, just by indexing a document.
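
The sketch below imitates, in plain Python, how dynamic mapping picks a type from the JSON value it sees. It is a simplified model, not the real implementation: the date check is a narrow stand-in for Elasticsearch's date detection, arrays and nulls are not covered, and the function name infer_dynamic_type is made up for this example. By default a string that is not detected as a date is mapped as text with a keyword sub-field; the sketch reports only "text".

```python
import re

# Rough stand-in for Elasticsearch's default date detection; the real
# check accepts more formats than this pattern does.
ISO_DATE = re.compile(r"^\d{4}-\d{2}-\d{2}(T\d{2}:\d{2}:\d{2})?$")


def infer_dynamic_type(value):
    """Guess the field type dynamic mapping would pick for a JSON value."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "float"
    if isinstance(value, dict):
        return "object"
    if isinstance(value, str):
        return "date" if ISO_DATE.match(value) else "text"
    raise TypeError(f"unsupported value: {value!r}")


document = {
    "title": "Sample title",
    "published": True,
    "views": 42,
    "rating": 4.5,
    "created_at": "2021-05-30",
}
for field, value in document.items():
    print(field, "->", infer_dynamic_type(value))
```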

Explicit mapping

Explicit mapping lets you define the mapping precisely, field by field. For example:

{
  "mappings": {
    "properties": {
      "uuid": {
        "type": "keyword"
      },
      "title": {
        "type": "text"
      },
      "main_body": {
        "type": "text",
        "index": false
      }
    }
  }
}

The field type "keyword" means the field is not analyzed: its value is indexed as a single term, so it is normally searched with a term query.

The field type "text" means the field is analyzed into individual terms at index time, so it is normally searched with a match query.

Setting "index" to false means the field is not indexed, so it cannot be searched.
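
The difference between an analyzed and a non-analyzed field can be pictured with a toy model. The two functions below are assumptions made for illustration: analyze_text only lowercases and splits on word characters, which roughly approximates the default analyzer on English text, and analyze_keyword keeps the value whole.

```python
import re


def analyze_text(value):
    """Toy analyzer: lowercase the value and split it into word tokens."""
    return re.findall(r"\w+", value.lower())


def analyze_keyword(value):
    """Keyword fields are not analyzed: the whole value is a single term."""
    return [value]


title = "Bank Listing Succeeds"

text_terms = analyze_text(title)
keyword_terms = analyze_keyword(title)

print(text_terms)     # ['bank', 'listing', 'succeeds']
print(keyword_terms)  # ['Bank Listing Succeeds']

# A term query looks up one exact term, so it finds the full value in the
# keyword field and only the lowercased tokens in the text field.
print("Bank Listing Succeeds" in keyword_terms)  # True
print("bank" in text_terms)                      # True
print("Bank" in text_terms)                      # False
```

This is why a term query against a text field often finds nothing: the stored terms are the analyzed tokens, not the original string.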

Query and filter context

Relevance scores

By default, Elasticsearch sorts matching search results by relevance score, which measures how well each document matches a query.

Query context

In the query context, a query clause answers the question “How well does this document match this query clause?” Besides deciding whether or not the document matches, the query clause also calculates a relevance score in the _score metadata field.

Filter context

In a filter context, a query clause answers the question “Does this document match this query clause?” The answer is a simple Yes or No — no scores are calculated.
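
A bool query is one place where both contexts appear side by side. The request body below is a minimal sketch using the uuid and title fields from the mapping example; the search text and the uuid value are arbitrary.

```python
import json

# "must" clauses run in query context and contribute to _score.
# "filter" clauses run in filter context: they only include or exclude.
body = {
    "query": {
        "bool": {
            "must": [
                {"match": {"title": "bank"}},
            ],
            "filter": [
                {"term": {"uuid": "1000"}},
            ],
        }
    }
}

print(json.dumps(body, indent=2))
```

With the 7.x Python client, a body like this could be passed to es.search(body=body, index="forward").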

Query DSL

Elasticsearch provides a full Query DSL (Domain Specific Language) based on JSON to define queries.

Leaf query clauses

query type	description
match	Returns documents that match a provided text, number, date or boolean value. The provided text is analyzed before matching.
term	Returns documents that contain an exact term in a provided field.
range	Returns documents that contain terms within a provided range.
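
The three leaf clauses can be written as small builder functions, sketched below. The function names and the numeric views field are hypothetical; the dictionaries follow the Query DSL shape for match, term and range.

```python
def match_query(field, text):
    """Analyzed search: the text is analyzed before matching."""
    return {"match": {field: text}}


def term_query(field, value):
    """Exact search: the value is looked up as a single term."""
    return {"term": {field: value}}


def range_query(field, gte=None, lte=None):
    """Range search with optional inclusive lower and upper bounds."""
    bounds = {}
    if gte is not None:
        bounds["gte"] = gte
    if lte is not None:
        bounds["lte"] = lte
    return {"range": {field: bounds}}


print(match_query("title", "bank listing"))
print(term_query("uuid", "1000"))
print(range_query("views", gte=10, lte=100))
```

Each result can be placed under the "query" key of a search request body.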

Compound query clauses

query type	description
bool	A query that matches documents matching boolean combinations of other queries. It is built using one or more boolean clauses, each clause with a typed occurrence.
dis_max	Returns documents matching one or more wrapped queries, called query clauses or clauses. If a returned document matches multiple query clauses, the dis_max query assigns the document the highest relevance score from any matching clause, plus a tie breaking increment for any additional matching subqueries.
constant_score	Wraps a filter query and returns every matching document with a relevance score equal to the boost parameter value.
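
The scoring rules of dis_max and constant_score described above can be written out as arithmetic. The functions below are a sketch of the rule only, not of how Elasticsearch computes scores internally; a score of None stands for a clause that did not match.

```python
def dis_max_score(clause_scores, tie_breaker=0.0):
    """Highest clause score plus tie_breaker times the other matching scores."""
    matching = [s for s in clause_scores if s is not None]
    if not matching:
        return None  # the document matches no clause
    best = max(matching)
    return best + tie_breaker * (sum(matching) - best)


def constant_score(matches, boost=1.0):
    """Every matching document gets the same score, equal to boost."""
    return boost if matches else None


# Two clauses match (scores 1.0 and 0.5), one does not.
print(dis_max_score([1.0, 0.5, None]))
print(dis_max_score([1.0, 0.5, None], tie_breaker=0.3))
print(constant_score(True, boost=1.2))
```

With a tie_breaker of 0, only the best clause counts; a larger value rewards documents that match several clauses.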

Allow expensive queries

Some query types generally execute slowly because of the way they are implemented. The search.allow_expensive_queries setting (true by default) controls whether the query types below are allowed to run.

query type	description
script queries	Filters documents based on a provided script. The script query is typically used in a filter context.
fuzzy queries	Returns documents that contain terms similar to the search term, as measured by a Levenshtein edit distance.
regexp queries	Returns documents that contain terms matching a regular expression.
prefix queries	Returns documents that contain a specific prefix in a provided field.
wildcard queries	Returns documents that contain terms matching a wildcard pattern. A wildcard operator is a placeholder that matches one or more characters.
range queries	Returns documents that contain terms within a provided range. Only range queries on text and keyword fields count as expensive.
Joining queries	Performing full SQL-style joins in a distributed system like Elasticsearch is prohibitively expensive.
Geo-shape query	Filter documents indexed using the geo_shape or geo_point type.
Script score query	Uses a script to provide a custom score for returned documents. The script_score query is useful if, for example, a scoring function is expensive and you only need to calculate the score of a filtered set of documents.
Percolate query	The percolate query can be used to match queries stored in an index. The percolate query itself contains the document that will be used as query to match with the stored queries.
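
The Levenshtein edit distance mentioned for fuzzy queries is the minimum number of single-character insertions, deletions and substitutions needed to turn one string into another. The function below is a plain textbook implementation, included to show what is being measured; it is not Elasticsearch's code, and the fuzzy query by default also counts swapping two adjacent characters as a single edit, which this sketch does not.

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance between two strings."""
    previous = list(range(len(b) + 1))  # distances from "" to prefixes of b
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute ca with cb
            ))
        previous = current
    return previous[-1]


print(levenshtein("kitten", "sitting"))   # 3
print(levenshtein("elastic", "elastik"))  # 1
```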

Python3 Elasticsearch in Action

Index

create

# encoding=utf-8

from elasticsearch import Elasticsearch
from pprint import pprint


def main():
    # Assumes the 7.x Python client, which connects to localhost:9200
    # when no hosts are given.
    es = Elasticsearch()
    body = {
      "mappings": {
        "properties": {
          "uuid": {
            "type": "keyword"
          },
          "title": {
            "type": "text"
          },
          "main_body": {
            "type": "text"
          }
        }
      }
    }
    ret = es.indices.create(index="forward", body=body)
    pprint(ret)


if __name__ == '__main__':
    main()

delete

# encoding=utf-8

from elasticsearch import Elasticsearch
from pprint import pprint


def main():
    es = Elasticsearch()
    ret = es.indices.delete(index="forward")
    pprint(ret)


if __name__ == '__main__':
    main()

update

Update mapping API

Adds new fields to an existing data stream or index. You can also use this API to change the search settings of existing fields.

# encoding=utf-8

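import argparse

from elasticsearch import Elasticsearch
from pprint import pprint


def parse_args():
    # Name of the index to update; defaults to the "forward" index.
    parser = argparse.ArgumentParser()
    parser.add_argument("--name", default="forward")
    return parser.parse_args()


def main():
    args = parse_args()
    es = Elasticsearch()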
    body = {
      "properties": {
        "uuid": {
          "type": "keyword"
        },
        "title": {
          "type": "text"
        },
        "main_body": {
          "type": "text"
        },
        "publish_date": {
          "type": "keyword"
        }
      }
    }
    ret = es.indices.put_mapping(index=args.name, body=body)
    pprint(ret)


if __name__ == '__main__':
    main()

Reindex API

Copies documents from a source to a destination.

# encoding=utf-8

from elasticsearch import Elasticsearch
from pprint import pprint


def main():
    es = Elasticsearch()
    body = {
      "source": {
        "index": "forward"
      },
      "dest": {
        "index": "document"
      }
    }
    ret = es.reindex(body=body)
    pprint(ret)


if __name__ == '__main__':
    main()

get

# encoding=utf-8

from elasticsearch import Elasticsearch
from pprint import pprint


def main():
    es = Elasticsearch()
    ret = es.indices.get(index="forward")
    pprint(ret)


if __name__ == '__main__':
    main()

Document

create

# encoding=utf-8

from elasticsearch import Elasticsearch
from pprint import pprint


def main():
    body = {
      "uuid": "1000",
      "title": "中国银行在港交所上市挂牌成功",
      "main_body": "中国银行在港交所上市挂牌成功，成为中国大陆首家在国际市场上市的银行。"
    }
    es = Elasticsearch()
    ret = es.index(index="forward", body=body)
    pprint(ret)


if __name__ == '__main__':
    main()

delete

# encoding=utf-8

from elasticsearch import Elasticsearch
from pprint import pprint


def main():
    es = Elasticsearch()
    # The id is presumably the auto-generated _id returned when the
    # document was indexed without an explicit id.
    ret = es.delete(index="forward", id="WRemuHkBd6vf16HuHzHq")
    pprint(ret)


if __name__ == '__main__':
    main()

update

To fully replace an existing document, use the index API, which creates or updates a document in an index.

# encoding=utf-8

from elasticsearch import Elasticsearch
from pprint import pprint


def main():
    es = Elasticsearch()
    body = {
      "uuid": "1000",
      "title": "<<中国银行在港交所上市挂牌成功>>",
      "main_body": "<<成为中国大陆首家在国际市场上市的银行>>"
    }
    ret = es.index(index="forward", body=body, id="WRemuHkBd6vf16HuHzHq")
    pprint(ret)


if __name__ == '__main__':
    main()

Updates a document with a script or partial document.

# encoding=utf-8

from elasticsearch import Elasticsearch
from pprint import pprint


def main():
    es = Elasticsearch()
    # A partial update wraps the changed fields in "doc"; fields that are
    # not listed (here, uuid) keep their current values.
    body = {
      "doc": {
        "title": "<<中国银行在港交所上市挂牌成功>>",
        "main_body": "<<成为中国大陆首家在国际市场上市的银行>>"
      }
    }
    ret = es.update(index="forward", body=body, id="WRemuHkBd6vf16HuHzHq")
    pprint(ret)


if __name__ == '__main__':
    main()
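
The difference between replacing a document through the index API and patching it through the update API can be modeled with plain dictionaries. The sketch below is an in-memory stand-in, not client code: the store dictionary and both function names are assumptions, and only top-level fields are merged.

```python
def index_document(store, doc_id, body):
    """Model of a full replace: the stored source becomes exactly body."""
    store[doc_id] = dict(body)


def update_document(store, doc_id, partial):
    """Model of a partial update: listed fields are merged into the source."""
    store[doc_id] = {**store[doc_id], **partial}


store = {}
index_document(store, "1", {"uuid": "1000", "title": "old title", "main_body": "old body"})

# Partial update: only title changes, the other fields are kept.
update_document(store, "1", {"title": "new title"})
print(store["1"])

# Full replace: fields missing from the new body are gone.
index_document(store, "1", {"title": "replacement"})
print(store["1"])
```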

Updates a document using the specified script.

# encoding=utf-8

from elasticsearch import Elasticsearch
from pprint import pprint


def main():
    es = Elasticsearch()
    # The script assumes the document already has a numeric "counter" field.
    body = {
      "script" : {
        "source": "ctx._source.counter += params.count",
        "lang": "painless",
        "params" : {
          "count" : 4
        }
      }
    }
    ret = es.update(index="forward", body=body, id="WRemuHkBd6vf16HuHzHq")
    pprint(ret)


if __name__ == '__main__':
    main()

get

Returns a document.

# encoding=utf-8

from elasticsearch import Elasticsearch
from pprint import pprint


def main():
    es = Elasticsearch()
    ret = es.get(index="forward", id="WRemuHkBd6vf16HuHzHq")
    pprint(ret)


if __name__ == '__main__':
    main()

Search

The match_phrase query supports character-based Boolean retrieval of Chinese text, which gives exact phrase matching and exact queries for Chinese.

# encoding=utf-8

from elasticsearch import Elasticsearch
from pprint import pprint


def main():
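    es = Elasticsearch()
    # A dict cannot hold two "match_phrase" keys (the second overwrites the
    # first), so the two phrase clauses are combined in a bool query.
    # "should" matches when either field contains the phrase; use "must"
    # to require the phrase in both fields.
    body = {
      "query": {
        "bool": {
          "should": [
            {"match_phrase": {"title": "中国石油"}},
            {"match_phrase": {"main_body": "中国石油"}}
          ]
        }
      }
    }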
    ret = es.search(body=body, index="forward")
    pprint(ret)


if __name__ == '__main__':
    main()
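
Why match_phrase gives exact matching on Chinese text can be seen in a toy model. When no Chinese-specific analyzer is configured, Chinese text is split into one token per character; a match query then accepts any shared character, while match_phrase requires the characters to appear consecutively and in order. The functions below are illustrative assumptions, not Elasticsearch code, and they ignore punctuation handling and scoring.

```python
def tokenize(text):
    """Toy tokenizer: one token per non-space character."""
    return [ch for ch in text if not ch.isspace()]


def match_any(query, field_value):
    """Toy match query: true if any query token occurs in the field."""
    tokens = set(tokenize(field_value))
    return any(token in tokens for token in tokenize(query))


def match_phrase(query, field_value):
    """Toy match_phrase: query tokens must occur consecutively, in order."""
    q, f = tokenize(query), tokenize(field_value)
    return any(f[i:i + len(q)] == q for i in range(len(f) - len(q) + 1))


query = "中国石油"
print(match_phrase(query, "中国石油天然气集团"))  # True: the phrase is contiguous
print(match_phrase(query, "中国的石油公司"))      # False: "的" breaks the phrase
print(match_any(query, "中国的石油公司"))         # True: single characters match
```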

Multi-match query: the multi_match query builds on the match query to allow multi-field queries.

{
  "query": {
    "multi_match" : {
      "query":    "中国石油",
      "fields": [ "title", "main_body" ]
    }
  }
}
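
The same body can be built in Python and, with the 7.x client, passed to es.search(body=body, index="forward") as in the earlier search script. The helper below is a sketch: multi_match_body and its boosts parameter are hypothetical names, while the "field^boost" notation is the Query DSL syntax for weighting a field.

```python
def multi_match_body(text, fields, boosts=None):
    """Build a multi_match request body, optionally boosting some fields."""
    boosts = boosts or {}
    boosted = [f"{field}^{boosts[field]}" if field in boosts else field
               for field in fields]
    return {"query": {"multi_match": {"query": text, "fields": boosted}}}


# Same query as above, then a variant where title counts twice as much.
print(multi_match_body("中国石油", ["title", "main_body"]))
print(multi_match_body("中国石油", ["title", "main_body"], boosts={"title": 2}))
```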

Highlighters let you get highlighted snippets from one or more fields in your search results.

{
    "query" : {
        "match": { "title": "中国石油" }
    },
    "highlight" : {
        "pre_tags" : ["<tag1>"],
        "post_tags" : ["</tag1>"],
        "fields" : {
            "title" : {}
        }
    }
}
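
In a search response, each hit that has highlighted text carries a "highlight" object mapping field names to lists of fragments. The helper below sketches how those fragments might be collected. The function name is an assumption, and sample_response is a hand-written stand-in shaped like a response, not output captured from a cluster.

```python
def extract_highlights(response):
    """Map each hit's _id to its highlighted fragments, per field."""
    results = {}
    for hit in response.get("hits", {}).get("hits", []):
        # Hits without highlighted text have no "highlight" key.
        results[hit["_id"]] = hit.get("highlight", {})
    return results


# Hand-written stand-in shaped like a search response; not real output.
sample_response = {
    "hits": {
        "hits": [
            {
                "_id": "doc-1",
                "_source": {"title": "sample title"},
                "highlight": {"title": ["<tag1>sample</tag1> title"]},
            },
            {"_id": "doc-2", "_source": {"title": "other"}},
        ]
    }
}

for doc_id, fragments in extract_highlights(sample_response).items():
    print(doc_id, fragments)
```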

posted @ 2021-05-30 12:37 健康平安快乐