Compiling, Installing, and Testing Sphinx Chinese Word Segmentation on CentOS: CoreSeek
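To support Chinese word segmentation you also need to download Coreseek, which you can find by searching the official site. I use version 4.1 here.
Baidu Cloud download link: https://pan.baidu.com/s/1slNIyHf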
tar -zxvf coreseek-4.1-beta.tar.gz
cd coreseek-4.1-beta
cd mmseg-3.2.14/
./bootstrap    # prepare and check the build environment
libtoolize: putting auxiliary files in AC_CONFIG_AUX_DIR, `config'.
libtoolize: copying file `config/ltmain.sh'
libtoolize: Consider adding `AC_CONFIG_MACRO_DIR([m4])' to configure.in and
libtoolize: rerunning libtoolize, to keep the correct libtool macros in-tree.
libtoolize: Consider adding `-I m4' to ACLOCAL_AMFLAGS in Makefile.am.
+ autoheader
+ automake --add-missing --copy
+ autoconf
./configure --prefix=/usr/local/mmseg3
------------------------------------------------------------------------
Configuration:
  Source code location: .
  Compiler: gcc
  Compiler flags: -g -O2
  Host System Type: x86_64-redhat-linux-gnu
  Install path: /usr/local/mmseg3
  See config.h for further configuration information.
------------------------------------------------------------------------
make && make install
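Before moving on, it can help to confirm that the install produced the paths the later steps rely on. The following is a minimal sketch, not part of Coreseek: `check_mmseg_install` is a hypothetical helper, and the list of paths it checks is an assumption drawn from the commands used in this article (`bin/mmseg`, the `etc` dictionary directory, `include/mmseg`, and `lib`).

```shell
# Hypothetical helper: report which expected paths are missing under a prefix.
# The path list is an assumption based on the commands used in this article.
check_mmseg_install() {
    prefix=$1
    missing=0
    for rel in bin/mmseg etc include/mmseg lib; do
        if [ ! -e "$prefix/$rel" ]; then
            echo "missing: $prefix/$rel"
            missing=1
        fi
    done
    # Exit status 0 when everything is present, 1 otherwise.
    return "$missing"
}

# Usage: check_mmseg_install /usr/local/mmseg3 && echo "mmseg install looks complete"
```

If anything is reported missing, rerunning make install and reading its output is probably the quickest way to find the cause.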
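Create a text file in the source directory to test the segmenter. The last line below is the content typed into test.txt.
cd /usr/local/mmseg3
cd /usr/local/src/coreseek-4.1-beta/mmseg-3.2.14/src
vim test.txt
山东省德州市 北京朝阳市 中国北京 中国德州 中国山东德州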
cd /usr/local/mmseg3/bin
./mmseg -d /usr/local/mmseg3/etc/ /usr/local/src/coreseek-4.1-beta/mmseg-3.2.14/src/test.txt
山东省/x 德州市/x /x /x 北京/x 朝阳市/x 中国/x 北京/x 中国/x 德州/x 中国/x 山东/x 德州/x
Word Splite took: 0 ms.
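The segmenter output is easier to inspect with one token per line. The sketch below assumes the format shown above: tokens separated by spaces, each ending in /x, with empty tokens (apparently whitespace in the input) and the timing line mixed in. `mmseg_tokens` is a hypothetical helper name, not something shipped with mmseg.

```shell
# Hypothetical helper: read mmseg output on stdin, print one token per line.
mmseg_tokens() {
    # 1. put every space-separated item on its own line
    # 2. keep only items carrying the /x tag (drops the timing line)
    # 3. strip the tag, then drop tokens that are empty after stripping
    tr -s ' ' '\n' | grep '/x$' | sed -e 's|/x$||' -e '/^$/d'
}

# Usage:
#   ./mmseg -d /usr/local/mmseg3/etc/ test.txt | mmseg_tokens
```

Piping the result through sort | uniq -c would give a rough token frequency count for a larger test file.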
cd /usr/local/src/coreseek-4.1-beta/csft-4.1    # from here on, csft can be treated as Sphinx
sh buildconf.sh    # if this script finishes without errors, the build can proceed
./configure --prefix=/usr/local/coreseek --without-unixodbc --with-mmseg --with-mmseg-includes=/usr/local/mmseg3/include/mmseg/ --with-mmseg-libs=/usr/local/mmseg3/lib/ --with-mysql
You can now run 'make install' to build and install Sphinx binaries.
On a multi-core machine, try 'make -j4 install' to speed up the build.

Updates, articles, help forum, and commercial support, consulting, training,
and development services are available at http://sphinxsearch.com/

Thank you for choosing Sphinx!
make && make install
make[3]: Entering directory `/usr/local/src/coreseek-4.1-beta/csft-4.1'
mkdir -p /usr/local/coreseek/var/data && mkdir -p /usr/local/coreseek/var/log
make[3]: Leaving directory `/usr/local/src/coreseek-4.1-beta/csft-4.1'
make[2]: Leaving directory `/usr/local/src/coreseek-4.1-beta/csft-4.1'
make[1]: Leaving directory `/usr/local/src/coreseek-4.1-beta/csft-4.1'
Then open the mysql client and create a table to test with.
create table kecheng(
    id int primary key auto_increment,
    name varchar(50),
    info varchar(50)
) charset utf8;
insert into kecheng(name,info) values('java','java是一门很牛的语言,性能整体来说比PHP要强,但是不如php开发速度快');
insert into kecheng(name,info) values('redis','redis是一种内存缓存数据库,比memcache支持的数据格式多');
insert into kecheng(name,info) values('memcache','memcache支持简单的key value形式,不像redis支持持久化');
insert into kecheng(name,info) values('jquery','jquery是一种前端脚本,结合php和java可以做web开发');
cd /usr/local/coreseek/    # this is effectively the Sphinx directory
cd bin
ls    # the layout resembles that of stock Sphinx
cd /usr/local/coreseek/etc
cp sphinx.conf.dist csft.conf
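-- This table stores the id up to which the index has been built, so the whole table need not be reindexed every time
CREATE TABLE index_table(
    Counter_id int unsigned not null primary key auto_increment,
    Max_id int unsigned not null comment 'largest id that has already been indexed'
);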
Edit the configuration file csft.conf
source src1
{
    # data source type. mandatory, no default value
    # known types are mysql, pgsql, mssql, xmlpipe, xmlpipe2, odbc
    # (database type)
    type = mysql

    #####################################################################
    ## SQL settings (for 'mysql' and 'pgsql' types)
    #####################################################################

    # some straightforward parameters for SQL source types
    sql_host = localhost
    sql_user = root
    sql_pass =
    sql_db = test
    sql_port = 3306 # optional, default is 3306
    .....
    # set the character set
    sql_query_pre = SET NAMES utf8
    # turn off the MySQL query cache
    sql_query_pre = SET SESSION query_cache_type=OFF

    # mandatory, integer document ID field MUST be the first selected column
    # (the default query is commented out)
    #sql_query = \
    # SELECT id, group_id, UNIX_TIMESTAMP(date_added) AS date_added, title, content \
    # FROM documents

    # the query to index; if the primary key is not named id, alias it to id,
    # e.g. select tid id from tableName;
    sql_query = SELECT id,name,info FROM kecheng

    # SQL run after the main query. index_table stores the last indexed
    # primary key id, so later runs only need to index the newest rows
    sql_query_post = REPLACE INTO index_table SELECT 1,MAX(id) FROM kecheng;
    .....
    # fields returned for each match by the search tool; all columns here (testing only)
    sql_query_info = SELECT * FROM kecheng WHERE id=$id
    .....

index test1
{
    .....
    # where the index files are created
    path = /usr/local/coreseek/var/data/test1

    # document attribute values (docinfo) storage mode
    .....
    # switch to Chinese
    charset_type = zh_cn.utf-8
    # dictionary directory
    charset_dictpath = /usr/local/mmseg3/etc/
    .....

#----------------
source zengliangsuoyin : src1
{
    # select the rows that have not been indexed yet
    sql_query = SELECT id,name,info FROM kecheng WHERE id > (SELECT max_id FROM index_table)
    # the statement that writes the last id back to index_table is inherited from src1, so it is not repeated here
}

# an index can only inherit from another index, so the parent is test1 rather than the source src1
index zengliangsuoyin : test1
{
    source = zengliangsuoyin
    # separate path so the incremental index files do not overwrite those of test1
    path = /usr/local/coreseek/var/data/zengliangsuoyin
}
Save and exit.
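The incremental source and index in csft.conf are meant to be rebuilt on their own and then folded into the main index. The sketch below only prints the two indexer invocations involved, so nothing runs by accident. `delta_update_commands` is a hypothetical helper, and the paths are assumptions taken from the install prefix used in this article. `--config` and `--merge` are standard indexer options; with searchd running you would normally also add `--rotate`.

```shell
# Assumed locations, based on the --prefix used earlier in this article.
INDEXER=/usr/local/coreseek/bin/indexer
CONF=/usr/local/coreseek/etc/csft.conf

# Hypothetical helper: print the commands for one incremental update.
#   $1 = main index name, $2 = incremental (delta) index name
delta_update_commands() {
    main=$1
    delta=$2
    # Step 1: rebuild only the delta index (rows with id > max_id).
    echo "$INDEXER --config $CONF $delta"
    # Step 2: merge the delta index into the main index.
    echo "$INDEXER --config $CONF --merge $main $delta"
}

# Usage (prints the commands; append "| sh" to actually run them):
#   delta_update_commands test1 zengliangsuoyin
```

Merging assumes both indexes have already been built once and are stored under different path values.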
cd /usr/local/coreseek/bin/
./indexer --all
using config file '/usr/local/coreseek/etc/csft.conf'...    -- the config file in use; the name matches the file copied earlier
indexing index 'test1'...
WARNING: attribute 'group_id' not found - IGNORING
WARNING: attribute 'date_added' not found - IGNORING
WARNING: Attribute count is 0: switching to none docinfo
collected 5 docs, 0.0 MB
sorted 0.0 Mhits, 100.0% done
total 5 docs, 351 bytes
total 0.178 sec, 1971 bytes/sec, 28.07 docs/sec
indexing index 'test1stemmed'...
WARNING: attribute 'group_id' not found - IGNORING
WARNING: attribute 'date_added' not found - IGNORING
WARNING: Attribute count is 0: switching to none docinfo
collected 5 docs, 0.0 MB    -- five documents found, i.e. five MySQL records, so the database connection works
sorted 0.0 Mhits, 100.0% done
total 5 docs, 351 bytes
total 0.007 sec, 47677 bytes/sec, 679.16 docs/sec
skipping non-plain index 'dist1'...
skipping non-plain index 'rt'...
total 4 reads, 0.000 sec, 0.3 kb/call avg, 0.0 msec/call avg
total 12 writes, 0.000 sec, 0.2 kb/call avg, 0.0 msec/call avg
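To check the document counts without reading the whole log, the "total N docs" lines can be pulled out of the indexer output. This is a small sketch that assumes the output format shown above; `indexed_doc_counts` is a hypothetical helper name.

```shell
# Hypothetical helper: read indexer output on stdin and print the document
# count from each "total N docs, ..." line, one number per indexed index.
indexed_doc_counts() {
    # Lines such as "total 0.178 sec, ..." do not match, because the
    # integer must be followed directly by " docs,".
    sed -n 's/^total \([0-9][0-9]*\) docs,.*/\1/p'
}

# Usage:
#   ./indexer --all | indexed_doc_counts
```

Comparing each number against SELECT COUNT(*) on the source table is a quick way to confirm that every row was indexed.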
./search php
Coreseek Fulltext 4.1 [ Sphinx 2.0.2-dev (r2922)]
Copyright (c) 2007-2011,
Beijing Choice Software Technologies Inc (http://www.coreseek.com)

using config file '/usr/local/coreseek/etc/csft.conf'...
index 'test1': query 'php ': returned 3 matches of 3 total in 0.000 sec

displaying matches:
1. document=1, weight=2500
    id=1
    group_id=1
    group_id2=5
    date_added=2017-02-08 06:22:36
    title=test one
    content=this is my test document number one. also checking search within phrases.
2. document=2, weight=1500
    id=2
    group_id=1
    group_id2=6
    date_added=2017-02-08 06:22:36
    title=test two
    content=this is my test document number two
3. document=5, weight=1500
    (document not found in db)

words:
1. 'php': 3 documents, 5 hits    -- number of occurrences

index 'test1stemmed': query 'php ': returned 3 matches of 3 total in 0.000 sec

displaying matches:
1. document=1, weight=2500
    id=1
    group_id=1
    group_id2=5
    date_added=2017-02-08 06:22:36
    title=test one
    content=this is my test document number one. also checking search within phrases.
2. document=2, weight=1500
    id=2
    group_id=1
    group_id2=6
    date_added=2017-02-08 06:22:36
    title=test two
    content=this is my test document number two
3. document=5, weight=1500
    (document not found in db)

words:
1. 'php': 3 documents, 5 hits
Testing is complete. The next step is installing the PHP extension.