Using Elasticsearch with IK Analysis in Django-DRF

I. Install dependencies

django-haystack==2.8.1
drf-haystack==1.8.6
Django==2.0.5
djangorestframework==3.8.2
elasticsearch==6.4.0

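II. Install the Java SDK

First download the installation package from the official site:

Download link: https://www.oracle.com/technetwork/java/javase/downloads/jdk8-downloads-2133151.html

I installed Elasticsearch 2.4.1 with JDK 1.8, because ES versions after 2.x have compatibility problems with haystack.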

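Installation steps:

# First:
cd /usr/local/
mkdir javajdk
# Upload the downloaded file to /usr/local/javajdk
# Extract the archive in that directory
cd /usr/local/javajdk
tar -xzvf jdk-8u231-linux-i586.tar.gz
mv jdk1.8.0_231 java
# Configure the environment variables:
vim /etc/profile

# Append these lines to the end of the file:

export JAVA_HOME=/usr/local/javajdk/java
export JRE_HOME=${JAVA_HOME}/jre
export CLASSPATH=.:${JAVA_HOME}/lib:${JRE_HOME}/lib
export PATH=${JAVA_HOME}/bin:$PATH

# Then reload the profile

source /etc/profile

Run java -version; if it prints the JDK version information, the installation succeeded.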

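III. Install Elasticsearch

Download: https://www.elastic.co/cn/downloads/past-releases#elasticsearch

Note that Elasticsearch reports an error if it is started as the root user!

So first create a dedicated user:

# Create the group first, otherwise useradd -g fails
groupadd elastic
useradd -g elastic elastic
# Create the user's directory under /home
mkdir -p /home/elastic
# Upload the downloaded archive to the elastic directory, then extract it
tar -xzvf elasticsearch-2.4.1.tar.gz -C /home/elastic/
# Give the elastic user ownership of the directory
chown -R elastic:elastic /home/elastic
# Switch user
su - elastic
# Edit the configuration file:
vim /home/elastic/elasticsearch-2.4.1/config/elasticsearch.yml
# Settings to change
path.data: /home/elastic/elasticsearch-2.4.1/data
path.logs: /home/elastic/elasticsearch-2.4.1/logs
network.host: 172.xxx.xxx.xxx
http.cors.allow-origin: "*"
# If the data and logs directories do not exist, create them at the paths above

# Start ES from the bin directory of elasticsearch:
./elasticsearch

If opening http://{your IP address}:9200/ in a browser returns the node information as JSON, the installation succeeded!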

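If startup fails, the usual fixes are:

1. The maximum number of file descriptors is too low; it must be at least 65536. Edit /etc/security/limits.conf
Command: vim /etc/security/limits.conf
Change the entry to: * hard nofile 65536

2. The number of VMAs (virtual memory areas) a process may own is too low; it must be at least 262144. Edit /etc/sysctl.conf, then run sysctl -p to apply the change
Command: vim /etc/sysctl.conf
Add: vm.max_map_count=262144

3. The maximum number of threads is too low; it must be at least 4096. Edit /etc/security/limits.conf
Command: vim /etc/security/limits.conf
Add: * hard nproc 65536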
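
The three limits above can be checked from Python before starting ES. This is a minimal sketch that assumes a Linux host; the function names are hypothetical, and the thresholds are simply the minimums quoted in the list above.

```python
import resource

# Minimums quoted in the three fixes above.
MIN_NOFILE = 65536
MIN_MAX_MAP_COUNT = 262144
MIN_NPROC = 4096


def find_limit_problems(nofile_hard, nproc_hard, max_map_count):
    """Return a list of readable problems; an empty list means every limit passes."""
    problems = []
    # RLIM_INFINITY means "unlimited", which always satisfies a minimum.
    if nofile_hard != resource.RLIM_INFINITY and nofile_hard < MIN_NOFILE:
        problems.append("nofile hard limit %d < %d" % (nofile_hard, MIN_NOFILE))
    if nproc_hard != resource.RLIM_INFINITY and nproc_hard < MIN_NPROC:
        problems.append("nproc hard limit %d < %d" % (nproc_hard, MIN_NPROC))
    if max_map_count < MIN_MAX_MAP_COUNT:
        problems.append(
            "vm.max_map_count %d < %d" % (max_map_count, MIN_MAX_MAP_COUNT)
        )
    return problems


def read_current_limits():
    """Read the current user's hard limits and the kernel VMA setting (Linux only)."""
    _, nofile_hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    _, nproc_hard = resource.getrlimit(resource.RLIMIT_NPROC)
    with open("/proc/sys/vm/max_map_count") as f:
        max_map_count = int(f.read().strip())
    return nofile_hard, nproc_hard, max_map_count
```

Run it as the elastic user, for example with print(find_limit_problems(*read_current_limits())), because the limits in limits.conf are applied per login session.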

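IV. Install the IK analysis plugin

Download the package:

Download: https://github.com/medcl/elasticsearch-analysis-ik/releases?after=v5.0.0

The plugin version must match the ES version:

ES 2.4.1 corresponds to IK version 1.10.1

Extract the package into plugins/ik under the ES installation directory

If there is no ik directory under plugins, create it manually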
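
After restarting ES, one way to confirm that the plugin loaded is to ask the node to tokenize a sentence with the ik_max_word analyzer. The sketch below assumes the ES 2.x _analyze endpoint accepts analyzer and text as query parameters; the helper names and the base URL passed in are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def analyze_url(base_url, text, analyzer="ik_max_word"):
    """Build an _analyze URL asking ES to tokenize `text` with `analyzer`."""
    query = urlencode({"analyzer": analyzer, "text": text})
    return "%s/_analyze?%s" % (base_url.rstrip("/"), query)


def tokens_from_response(response):
    """Extract the token strings from a parsed _analyze response."""
    return [item["token"] for item in response.get("tokens", [])]


def analyze(base_url, text, analyzer="ik_max_word"):
    """Call the running node; needs network access to ES."""
    with urlopen(analyze_url(base_url, text, analyzer)) as resp:
        return tokens_from_response(json.loads(resp.read().decode("utf-8")))
```

If the plugin is missing, ES answers with an error about an unknown analyzer instead of a token list, so urlopen raises an HTTPError.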

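V. Install the visualization plugin

1. Install as a plugin (recommended)
# In the Elasticsearch directory
elasticsearch/bin/plugin install mobz/elasticsearch-head

2. Install from a download
Download the ZIP package from https://github.com/mobz/elasticsearch-head.

In the elasticsearch directory, create plugins/head/_site and copy everything from the extracted elasticsearch-head-master directory into it.

Note that for versions 5.x and later the installation method is different!

3. Restart elasticsearch and open:
 The address is http://{your IP address}:9200/_plugin/head/
 The default HTTP port is 9200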

 

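VI. Build a cluster

Setting up an Elasticsearch cluster:

  1. Prepare three elasticsearch servers

    Create an elasticsearch-cluster folder and copy three elasticsearch services into it

  2. Edit the configuration of each server

    Edit elasticsearch-cluster\node*\config\elasticsearch.yml

If you copy the nodes from an existing standalone installation, note that the /elasticsearch/data directory under the node's installation directory must not contain data, otherwise building the cluster fails. Delete the data directory first.

# Configuration for node 1
# Cluster name; keep it unique (every node of this cluster uses the same name)
cluster.name: my-elasticsearch
# Node name; must differ between nodes
node.name: node-1
# Must be the IP address of this machine
network.host: 172.xxx.xxx.xxx
# HTTP port; must differ between nodes on the same machine
http.port: 9200
# Port for communication between cluster nodes; must differ between nodes on the same machine
transport.tcp.port: 9300
# List of node addresses used for automatic cluster discovery
discovery.zen.ping.unicast.hosts: ["172.xxx.xxx.xxx:9300", "172.xxx.xxx.xxx:9301", "172.xxx.xxx.xxx:9303"]

 Then start the services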
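
Since only the node name and the two ports change between nodes on one machine, the per-node files can be generated rather than edited by hand. This is a sketch under stated assumptions: the function name is hypothetical, and it assumes consecutive ports (9200, 9201, ... for HTTP and 9300, 9301, ... for transport), which may differ from the ports you actually choose.

```python
def node_config(index, host, cluster_name="my-elasticsearch", node_count=3):
    """Return elasticsearch.yml text for node number `index` (1-based) on one machine."""
    # Assumption of this sketch: ports are consecutive, starting at 9200 / 9300.
    http_port = 9200 + index - 1
    transport_port = 9300 + index - 1
    unicast_hosts = ", ".join(
        '"%s:%d"' % (host, 9300 + i) for i in range(node_count)
    )
    lines = [
        "cluster.name: %s" % cluster_name,
        "node.name: node-%d" % index,
        "network.host: %s" % host,
        "http.port: %d" % http_port,
        "transport.tcp.port: %d" % transport_port,
        "discovery.zen.ping.unicast.hosts: [%s]" % unicast_hosts,
    ]
    return "\n".join(lines) + "\n"
```

Calling node_config(1, host), node_config(2, host) and node_config(3, host) yields three files that share the cluster name and the discovery list but differ in node name and ports.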

VII. Configure Django

First create a search_indexes.py file in the app; django-haystack requires this file name

django-haystack documentation: https://django-haystack.readthedocs.io/en/master/tutorial.html#configuration

drf-haystack documentation: https://drf-haystack.readthedocs.io/en/latest/07_faceting.html#serializing-faceted-results

Create the model class:

from django.db import models

class Article(models.Model):
    title = models.CharField(max_length=128)
    files = models.FileField(upload_to='%Y/%m/')
    content = models.TextField(default='')

Create the index class:

from haystack import indexes
from app001.models import Article

class DocsIndex(indexes.SearchIndex, indexes.Indexable):
    # 1. Fields that make up the index
    text = indexes.CharField(document=True, use_template=True)
    files = indexes.CharField(model_attr='files')
    content = indexes.CharField(model_attr='content')

    # 2. The model class being indexed
    def get_model(self):
        return Article

    # 3. The queryset that supplies the data
    def index_queryset(self, using=None):
        """Used when the entire index for model is updated."""
        return self.get_model().objects.all()
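
Because the text field is declared with use_template=True, haystack renders it from a data template that it looks up at search/indexes/{app_label}/{model_name}_{field_name}.txt inside a templates directory. The sketch below creates that file; the helper names are hypothetical, and the template body (title plus content) is an assumption about which model fields should be searchable.

```python
import os

# Assumed template body: the model fields that feed the `text` document field.
TEMPLATE_BODY = "{{ object.title }}\n{{ object.content }}\n"


def data_template_path(templates_dir, app_label, model_name, field_name="text"):
    """Path where haystack looks up the template of a `use_template=True` field."""
    filename = "%s_%s.txt" % (model_name.lower(), field_name)
    return os.path.join(templates_dir, "search", "indexes", app_label, filename)


def write_data_template(templates_dir, app_label, model_name):
    """Create the template file (and its parent directories) if it is missing."""
    path = data_template_path(templates_dir, app_label, model_name)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    if not os.path.exists(path):
        with open(path, "w", encoding="utf-8") as f:
            f.write(TEMPLATE_BODY)
    return path
```

For the Article model in app001 this produces search/indexes/app001/article_text.txt. Once the template exists, python manage.py rebuild_index builds the index from the existing rows.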

The views:

import os
import datetime
import uuid

from rest_framework.views import APIView
from rest_framework import serializers
from rest_framework.response import Response
from django.conf import settings
from drf_haystack.serializers import HaystackSerializer
from drf_haystack.viewsets import HaystackViewSet

from .models import Article
from .search_indexes import DocsIndex


class DemoSerializer(serializers.ModelSerializer):
    """
    Serializer for the Article model
    """
    class Meta:
        model = Article
        fields = ('id', 'title', 'files')


class LocationSerializer(HaystackSerializer):
    object = DemoSerializer(read_only=True)  # read-only; not used for deserialization

    class Meta:
        # The `index_classes` attribute is a list of which search indexes
        # we want to include in the search.
        index_classes = [DocsIndex]

        # The `fields` contains all the fields we want to include.
        # NOTE: Make sure you don't confuse these with model attributes. These
        # fields belong to the search index!
        fields = [
             "text","files","id","title"
        ]
 
class LocationSearchView(HaystackViewSet):

    # `index_models` is an optional list of which models you would like to include
    # in the search result. You might have several models indexed, and this provides
    # a way to filter out those of no interest for this particular view.
    # (Translates to `SearchQuerySet().models(*index_models)` behind the scenes.
    index_models = [Article]

    serializer_class = LocationSerializer

settings configuration:

INSTALLED_APPS = [
    'django.contrib.admin',
    'django.contrib.auth',
    'django.contrib.contenttypes',
    'django.contrib.sessions',
    'django.contrib.messages',
    'django.contrib.staticfiles',
    'rest_framework',
    'silk',
    'debug_toolbar',
    'haystack',
    'app001',
]


# Search engine configuration:
# haystack settings
HAYSTACK_CONNECTIONS = {
    'default': {
        # 'ENGINE': 'haystack.backends.elasticsearch_backend.ElasticsearchSearchEngine',
        # IK analysis needs a custom engine, which is defined further down
        'ENGINE': 'app001.elasticsearch_ik_backend.IKSearchEngine',
        'URL': 'http://172.16.xxx.xxx:9200/',  # address of the elasticsearch service
        'INDEX_NAME': 'haystack',  # index name
    },
}
# Keep the index up to date whenever a model is saved or deleted
HAYSTACK_SIGNAL_PROCESSOR = 'haystack.signals.RealtimeSignalProcessor'
# Maximum number of search results per page
HAYSTACK_SEARCH_RESULTS_PER_PAGE = 50

 

Override the engine to configure IK analysis:

Create an elasticsearch_ik_backend.py file in the app:

from haystack.backends.elasticsearch_backend import ElasticsearchSearchBackend
from haystack.backends.elasticsearch_backend import ElasticsearchSearchEngine
class IKSearchBackend(ElasticsearchSearchBackend):
    DEFAULT_ANALYZER = "ik_max_word"  # use ik_max_word as the default ES analyzer

    def __init__(self, connection_alias, **connection_options):
        super().__init__(connection_alias, **connection_options)

    def build_schema(self, fields):
        content_field_name, mapping = super(IKSearchBackend, self).build_schema(fields)
        for field_name, field_class in fields.items():
            field_mapping = mapping[field_class.index_fieldname]
            if field_mapping["type"] == "string" and field_class.indexed:
                if not hasattr(
                    field_class, "facet_for"
                ) and field_class.field_type not in ("ngram", "edge_ngram"):
                    field_mapping["analyzer"] = getattr(
                        field_class, "analyzer", self.DEFAULT_ANALYZER
                    )
            mapping.update({field_class.index_fieldname: field_mapping})
        return content_field_name, mapping


class IKSearchEngine(ElasticsearchSearchEngine):
    backend = IKSearchBackend
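
The effect of build_schema can be seen on a plain dictionary. The following is a simplified stand-alone sketch, not the haystack code itself: it sets the analyzer on every string field and ignores the facet and ngram exceptions handled above. The function name and the sample mapping are hypothetical.

```python
DEFAULT_ANALYZER = "ik_max_word"


def apply_analyzer(mapping, analyzer=DEFAULT_ANALYZER):
    """Return a copy of `mapping` with `analyzer` set on every string field."""
    updated = {}
    for name, field_mapping in mapping.items():
        field_mapping = dict(field_mapping)  # copy so the input stays untouched
        if field_mapping.get("type") == "string":
            field_mapping["analyzer"] = analyzer
        updated[name] = field_mapping
    return updated


# Hypothetical mapping as a backend might build it before the override.
sample = {
    "text": {"type": "string", "analyzer": "snowball"},
    "pub_date": {"type": "date"},
}
result = apply_analyzer(sample)
```

Only the string field changes its analyzer; non-string fields such as dates pass through unchanged. An existing index keeps its old mapping, so it has to be rebuilt before the new analyzer takes effect.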

 

Querying through drf-haystack in Django is still fairly limited:

I did not find drf-haystack's querying convenient, so here I query with the Python elasticsearch client instead:

from elasticsearch import Elasticsearch


class EsSearch(APIView):
    def get(self, request):
        es = Elasticsearch(["http://xxx.xxx.xxx.xxx:9200"])
        query = request.GET.get("query")
        # The search body can be customized to whatever query you need:
        # https://www.elastic.co/guide/cn/elasticsearch/guide/current/match-query.html
        body = {
            "query": {
                "multi_match": {
                    "query": "%s" % query,
                    "fields": ["text", "content"],
                }
            },
            "highlight": {"fields": {"content": {}, "text": {}}},
        }
        result = es.search(index="haystack", doc_type="modelresult", body=body)
        return Response(result)
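
The request body used in this view can be built and inspected without a running cluster. Here is a small sketch with hypothetical helper names; it produces the same multi_match and highlight structure and pulls the highlight fragments out of a response dictionary.

```python
def build_search_body(query, fields=("text", "content")):
    """Build a multi_match query over `fields`, highlighting each of them."""
    return {
        "query": {"multi_match": {"query": query, "fields": list(fields)}},
        "highlight": {"fields": {name: {} for name in fields}},
    }


def highlights_from_result(result):
    """Collect (document id, highlight fragments) pairs from a search response."""
    pairs = []
    for hit in result.get("hits", {}).get("hits", []):
        pairs.append((hit.get("_id"), hit.get("highlight", {})))
    return pairs
```

Keeping the body in a function makes it easy to swap multi_match for another query type later without touching the view.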

 

URL configuration:

"""tool_bar URL Configuration

The `urlpatterns` list routes URLs to views. For more information please see:
    https://docs.djangoproject.com/en/2.0/topics/http/urls/
Examples:
Function views
    1. Add an import:  from my_app import views
    2. Add a URL to urlpatterns:  path('', views.home, name='home')
Class-based views
    1. Add an import:  from other_app.views import Home
    2. Add a URL to urlpatterns:  path('', Home.as_view(), name='home')
Including another URLconf
    1. Import the include() function: from django.urls import include, path
    2. Add a URL to urlpatterns:  path('blog/', include('blog.urls'))
"""
from django.contrib import admin
from django.urls import path
from django.conf import settings
from django.conf.urls import url, include
from django.conf.urls.static import static

from rest_framework import routers

from app001.views import Index, Uploads
from app001.views import LocationSearchView, EsSearch
from app002.views import BlogView

# drf-haystack search
router = routers.DefaultRouter()
router.register("search", LocationSearchView, base_name="location-search")

urlpatterns = [
    # Custom query view
    url(r'elastic_search/', EsSearch.as_view()),
    url(r"api/", include(router.urls)),
]

 

Query results:

posted @ 2020-01-08 17:41  zhaijihai