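Keyword Full-Text Search in PHP: Installing and Configuring Sphinx with the Coreseek Chinese Word Segmenter
I. Requirements
Find articles whose title or category (or even body text) contains the search term, and display them ranked by a weight based on how often the term occurs.
II. Environment
Nginx + PHP + MySQL (on CentOS 7).
III. Installation
1. Install dependencies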
yum -y install make gcc gcc-c++ libtool autoconf automake imake mariadb mariadb-server mariadb-devel libxml2-devel expat-devel
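Before moving on, it can help to confirm that the build tools are actually on the PATH. The following is a minimal sketch, not part of the original procedure; the function name `missing_tools` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the name of every command in the argument
# list that cannot be found on the PATH. Prints nothing if all are present.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Example call (tool names taken from the packages installed above):
# missing_tools make gcc g++ libtool autoconf automake
```

An empty result suggests the toolchain is in place; any name printed points at a package that did not install.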
2. Download the source package
git clone https://github.com/wanqianworld/coreseek4.1.git
cd coreseek4.1 # enter the directory once the clone has finished
3. Extract coreseek
tar -xzf coreseek-4.1-beta.tar.gz
4. Install mmseg
cd coreseek-4.1-beta/mmseg-3.2.14
./bootstrap
./configure --prefix=/usr/local/mmseg3
make && make install
5. Install coreseek
5.1. Edit the build configuration
cd ../csft-4.1
vim configure.ac
Change
AM_INIT_AUTOMAKE([-Wall -Werror foreign])
to
AM_INIT_AUTOMAKE([-Wall foreign])
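The same change to configure.ac can be applied without opening an editor. This is a sketch only; the function name `relax_automake_werror` is hypothetical, and it assumes the macro line appears exactly as shown in this step.

```shell
#!/bin/sh
# Hypothetical helper: remove -Werror from the AM_INIT_AUTOMAKE line of the
# given configure.ac. Lines that do not match exactly are left untouched.
relax_automake_werror() {
    file="$1"
    # Write to a temporary file first, then replace the original,
    # so the edit does not depend on a sed in-place option.
    sed 's/AM_INIT_AUTOMAKE(\[-Wall -Werror foreign])/AM_INIT_AUTOMAKE([-Wall foreign])/' \
        "$file" > "$file.tmp" && mv "$file.tmp" "$file"
}

# Example call, run from the csft-4.1 directory:
# relax_automake_werror configure.ac
```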
5.2. Install the patch utility
yum -y install patch
5.3. Apply the patch
patch -p1 < /yourpath/sphinx/sphinxexpr.cpp-csft-4.1-beta.patch
If patch prompts for the file to patch, enter:
/yourpath/sphinx/coreseek-4.1-beta/csft-4.1/src/sphinxexpr.cpp
5.4. Build and install
sh buildconf.sh
./configure --prefix=/usr/local/coreseek --without-unixodbc --with-mmseg --with-mmseg-includes=/usr/local/mmseg3/include/mmseg/ --with-mmseg-libs=/usr/local/mmseg3/lib/ --with-mysql
make && make install
6.1. Test Chinese word segmentation
cd ../testpack/
/usr/local/mmseg3/bin/mmseg -d /usr/local/mmseg3/etc var/test/test.xml # test Chinese word segmentation
6.2. Build the index
/usr/local/coreseek/bin/indexer -c etc/csft.conf --all
6.3. Run a test search
/usr/local/coreseek/bin/search -c etc/csft.conf 李彦宏
7. Connect PHP to Sphinx
cd ../csft-4.1/api/libsphinxclient/ # enter the client library directory
aclocal
libtoolize --force
automake --add-missing
autoconf
autoheader
make clean
./configure --prefix=/usr/local/sphinxclient
make && make install # build and install the client library
cd ../../../../ # return to the package directory
tar -xzf sphinx-1.3.0.tgz # extract the PHP extension source
yum -y install php php-devel # install php-devel
cd sphinx-1.3.0 # build and install the extension
phpize
./configure --with-php-config=/usr/bin/php-config --with-sphinx=/usr/local/sphinxclient
make && make install
7.1. Enable the php-sphinx extension
vim /etc/php.ini
Append the following at the end of the file:
[sphinx]
extension=sphinx.so
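If this step is scripted, the append should be safe to run more than once. A minimal sketch follows; the function name `enable_sphinx_extension` is hypothetical, and it assumes no other file already loads the extension.

```shell
#!/bin/sh
# Hypothetical helper: append the [sphinx] section to the given php.ini
# unless an uncommented extension=sphinx.so line is already present.
enable_sphinx_extension() {
    ini="$1"
    if grep -q '^extension=sphinx\.so' "$ini"; then
        return 0
    fi
    printf '\n[sphinx]\nextension=sphinx.so\n' >> "$ini"
}

# Example call:
# enable_sphinx_extension /etc/php.ini
```

Afterwards, `php -m` lists the loaded modules, and sphinx should appear among them if the extension was picked up.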
8. Test the search service
8.1. Load the test data
mysql -uroot -p123456 < /usr/local/coreseek/etc/example.sql
8.2. Copy the configuration files
cp /usr/local/coreseek/etc/sphinx.conf.dist /usr/local/coreseek/etc/csft.conf
cp /home/lee/sphinx/coreseek-4.1-beta/mmseg-3.2.14/data/* /usr/local/mmseg3/etc/
8.3. Edit the configuration file
vim /usr/local/coreseek/etc/csft.conf
source src1
{
    type = mysql
    sql_host = 127.0.0.1
    sql_user = root
    sql_pass = 123456
    sql_db = test
    sql_port = 3306 # optional, default is 3306
    sql_query_pre = SET NAMES utf8
    sql_sock = /var/lib/mysql/mysql.sock
    sql_query = \
        SELECT id, group_id, UNIX_TIMESTAMP(date_added) AS date_added, title, content \
        FROM documents
    sql_attr_uint = group_id
    sql_attr_timestamp = date_added
    sql_ranged_throttle = 0
    sql_query_info_pre = SET NAMES utf8
    sql_query_info = SELECT * FROM documents WHERE id=$id
}

source src1throttled : src1
{
    sql_ranged_throttle = 100
}

index test1
{
    source = src1
    path = /usr/local/coreseek/var/data/test1
    docinfo = extern
    mlock = 0
    morphology = none
    min_word_len = 1
    html_strip = 0
    charset_dictpath = /usr/local/mmseg3/etc/
    charset_type = zh_cn.utf-8
}

indexer
{
    mem_limit = 128M
}

searchd
{
    listen = 9312
    listen = 9306:mysql41
    log = /usr/local/coreseek/var/log/searchd.log
    query_log = /usr/local/coreseek/var/log/query.log
    read_timeout = 5
    client_timeout = 300
    max_children = 30
    pid_file = /usr/local/coreseek/var/log/searchd.pid
    max_matches = 1000
    seamless_rotate = 1
    preopen_indexes = 1
    unlink_old = 1
    mva_updates_pool = 1M
    max_packet_size = 8M
    max_filters = 256
    max_filter_values = 4096
    max_batch_queries = 32
    workers = threads # for RT to work
}
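Several settings in this file are filesystem paths (path, charset_dictpath, log, pid_file), and a wrong one is an easy mistake to make. The sketch below pulls a single value out of the file so it can be checked by hand; the function name `conf_value` is hypothetical, and it assumes one `key = value` setting per line.

```shell
#!/bin/sh
# Hypothetical helper: print the value of the first "key = value" line
# whose key matches exactly. Any trailing comment on the line is kept.
conf_value() {
    key="$1"
    file="$2"
    sed -n "s/^[[:space:]]*$key[[:space:]]*=[[:space:]]*//p" "$file" | head -n 1
}

# Example: check that the dictionary directory named in the config exists.
# dict_dir=$(conf_value charset_dictpath /usr/local/coreseek/etc/csft.conf)
# [ -d "$dict_dir" ] || echo "dictionary directory not found: $dict_dir"
```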
8.4. Copy the binaries
cp /usr/local/coreseek/bin/* /usr/bin/
8.5. Build the index
indexer --rotate --all
8.6. Start the service
searchd
8.7. Stop the service
searchd --stop
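To check whether the service is up after starting or stopping it, one option is to look at the PID file named by pid_file in the configuration. This is a sketch under assumptions: the function name `searchd_running` is hypothetical, and it only confirms that a process with the recorded PID exists, not that the process is searchd.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the PID file is readable and a process
# with that PID exists. kill -0 sends no signal; it only tests for the
# process, and fails for processes owned by another user unless run as root.
searchd_running() {
    pid_file="$1"
    [ -r "$pid_file" ] || return 1
    pid=$(cat "$pid_file")
    [ -n "$pid" ] && kill -0 "$pid" 2>/dev/null
}

# Example call, using the pid_file path from csft.conf:
# searchd_running /usr/local/coreseek/var/log/searchd.pid && echo "running"
```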
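9. Test from PHP
Write a test script:
vim test.php
<?php
// Connect to the local searchd on the port set by "listen = 9312"
$sphinx = new SphinxClient();
$sphinx->SetServer("127.0.0.1", 9312);
// Require every query word to match
$sphinx->SetMatchMode(SPH_MATCH_ALL);
// Offset 0, 20 results per page, at most 1000 matches
$sphinx->SetLimits(0, 20, 1000);
$sphinx->SetArrayResult(true);
// Search the test1 index for "one"
$result = $sphinx->query("one", "test1");
var_dump($result);
Run the script: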
php test.php