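Compiling, Installing, and Testing Sphinx Chinese Word Segmentation on CentOS: CoreSeek
Chinese word segmentation requires Coreseek in addition to Sphinx. It can be found by searching the official site; version 4.1 is used here.
Baidu Cloud download link: https://pan.baidu.com/s/1slNIyHf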
    tar -zxvf coreseek-4.1-beta.tar.gz
    cd coreseek-4.1-beta
    cd mmseg-3.2.14/
    ./bootstrap    # check the build environment
    libtoolize: putting auxiliary files in AC_CONFIG_AUX_DIR, `config'.
    libtoolize: copying file `config/ltmain.sh'
    libtoolize: Consider adding `AC_CONFIG_MACRO_DIR([m4])' to configure.in and
    libtoolize: rerunning libtoolize, to keep the correct libtool macros in-tree.
    libtoolize: Consider adding `-I m4' to ACLOCAL_AMFLAGS in Makefile.am.
    + autoheader
    + automake --add-missing --copy
    + autoconf
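The bootstrap output above shows the script calling libtoolize, autoheader, automake, and autoconf. The following is a minimal pre-flight sketch, assuming a POSIX shell, that checks whether those tools (plus gcc and make, which the later steps rely on) are on PATH before running ./bootstrap. The function name missing_tools is hypothetical and is not part of Coreseek.

```shell
# Print the names of the given commands that are not found on PATH,
# separated by spaces. Prints an empty line when nothing is missing.
missing_tools() {
    missing=""
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            missing="$missing $tool"
        fi
    done
    # Strip the single leading space left by the concatenation above.
    printf '%s\n' "${missing# }"
}

# Tools taken from the bootstrap trace, plus the compiler and make.
missing_tools libtoolize autoheader automake autoconf gcc make
```

If the call prints any names, install the corresponding packages first; which package provides each tool depends on the distribution.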
    ./configure --prefix=/usr/local/mmseg3
    ------------------------------------------------------------------------
    Configuration:

      Source code location:       .
      Compiler:                   gcc
      Compiler flags:             -g -O2
      Host System Type:           x86_64-redhat-linux-gnu
      Install path:               /usr/local/mmseg3

      See config.h for further configuration information.
    ------------------------------------------------------------------------
    make && make install
Create a text file in the source directory to test the segmenter.
    cd /usr/local/mmseg3
    cd /usr/local/src/coreseek-4.1-beta/mmseg-3.2.14/src
    vim test.txt

Contents of test.txt:

    山东省德州市
    北京朝阳市
    中国北京
    中国德州
    中国山东德州
    cd /usr/local/mmseg3/bin
    ./mmseg -d /usr/local/mmseg3/etc/ /usr/local/src/coreseek-4.1-beta/mmseg-3.2.14/src/test.txt
    山东省/x 德州市/x /x /x 北京/x 朝阳市/x 中国/x 北京/x 中国/x 德州/x 中国/x 山东/x 德州/x
    Word Splite took: 0 ms.
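In the output above every token is followed by /x, and the bare /x entries appear to correspond to the line breaks in test.txt. As a rough sketch, assuming that space-separated format, a small filter can turn the output into one token per line, which makes the segmentation easier to inspect. The function name split_tokens is hypothetical.

```shell
# Read mmseg output on stdin and print one token per line.
# Only pieces of the form "<token>/x" are kept, so bare "/x" entries
# and the trailing timing line are dropped.
split_tokens() {
    tr ' ' '\n' | sed -n 's|^\(..*\)/x$|\1|p'
}

# Small demonstration on a fragment of the output shown above.
printf '中国/x 北京/x \n' | split_tokens
```

The demonstration prints 中国 and 北京 on separate lines. On a real run, the output of ./mmseg can be piped into split_tokens in the same way.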
    cd /usr/local/src/coreseek-4.1-beta/csft-4.1    # from here on, csft can be treated as sphinx
    sh buildconf.sh    # run the script; if it finishes without errors, the build can proceed
    ./configure --prefix=/usr/local/coreseek --without-unixodbc --with-mmseg --with-mmseg-includes=/usr/local/mmseg3/include/mmseg/ --with-mmseg-libs=/usr/local/mmseg3/lib/ --with-mysql
    You can now run 'make install' to build and install Sphinx binaries.
    On a multi-core machine, try 'make -j4 install' to speed up the build.

    Updates, articles, help forum, and commercial support, consulting, training,
    and development services are available at http://sphinxsearch.com/

    Thank you for choosing Sphinx!
    make && make install
    make[3]: Entering directory `/usr/local/src/coreseek-4.1-beta/csft-4.1'
    mkdir -p /usr/local/coreseek/var/data && mkdir -p /usr/local/coreseek/var/log
    make[3]: Leaving directory `/usr/local/src/coreseek-4.1-beta/csft-4.1'
    make[2]: Leaving directory `/usr/local/src/coreseek-4.1-beta/csft-4.1'
    make[1]: Leaving directory `/usr/local/src/coreseek-4.1-beta/csft-4.1'
Next, open the mysql client and create a table to test with.
    create table kecheng(id int primary key auto_increment, name varchar(50), info varchar(50)) charset utf8;
    insert into kecheng(name,info) values('java','java是一门很牛的语言,性能整体来说比PHP要强,但是不如php开发速度快');
    insert into kecheng(name,info) values('redis','redis是一种内存缓存数据库,比memcache支持的数据格式多');
    insert into kecheng(name,info) values('memcache','memcache支持简单的key value形式,不像redis支持持久化');
    insert into kecheng(name,info) values('jquery','jquery是一种前端脚本,结合php和java可以做web开发');
    cd /usr/local/coreseek/    # this is effectively the sphinx directory
    cd bin
    ls    # same layout as a stock sphinx install
    cd /usr/local/coreseek/etc
    cp sphinx.conf.dist csft.conf
    -- Stores the highest id that has already been indexed, so the whole table need not be reindexed every time
    CREATE TABLE index_table(
        Counter_id int unsigned not null primary key auto_increment,
        Max_id int unsigned not null comment 'highest id already indexed'
    );
Edit the configuration file csft.conf:
    source src1
    {
        # data source type. mandatory, no default value
        # known types are mysql, pgsql, mssql, xmlpipe, xmlpipe2, odbc
        type = mysql    # database type

        #####################################################################
        ## SQL settings (for 'mysql' and 'pgsql' types)
        #####################################################################

        # some straightforward parameters for SQL source types
        sql_host = localhost    # self-explanatory
        sql_user = root
        sql_pass =
        sql_db = test
        sql_port = 3306 # optional, default is 3306
        .....
        # set the character set
        sql_query_pre = SET NAMES utf8
        # disable the MySQL query cache
        sql_query_pre = SET SESSION query_cache_type=OFF

        # mandatory, integer document ID field MUST be the first selected column
        # (the default query is commented out)
        #sql_query = \
        #   SELECT id, group_id, UNIX_TIMESTAMP(date_added) AS date_added, title, content \
        #   FROM documents

        # the data to index; if the primary key is not named id, alias it as id,
        # e.g. SELECT tid AS id FROM tableName
        sql_query = SELECT id,name,info FROM kecheng

        # runs after the main query; index_table stores the last indexed primary key,
        # so later runs can index only the newest rows instead of the whole table
        sql_query_post = REPLACE INTO index_table SELECT 1,MAX(id) FROM kecheng;
        .....
        # columns returned when querying with the search tool; all of them here (testing only)
        sql_query_info = SELECT * FROM kecheng WHERE id=$id
        .....
    }

    index test1
    {
        .....
        path = /usr/local/coreseek/var/data/test1    # where the index files are created
        .....
        # document attribute values (docinfo) storage mode
        .....
        charset_type = zh_cn.utf-8    # switch to Chinese
        charset_dictpath = /usr/local/mmseg3/etc/    # dictionary directory
        .....
    }

    #----------------
    # zengliangsuoyin means "incremental index"
    source zengliangsuoyin : src1
    {
        # fetch only the rows that have not been indexed yet
        sql_query = SELECT id,name,info FROM kecheng WHERE id > (SELECT max_id FROM index_table)
        # the query that stores the last id in index_table is inherited from src1, so it is not repeated
    }

    # an index inherits from another index (test1), not from a source,
    # and the incremental index needs its own path, separate from test1
    index zengliangsuoyin : test1
    {
        source = zengliangsuoyin
        path = /usr/local/coreseek/var/data/zengliangsuoyin
    }
Save and exit.
    cd /usr/local/coreseek/bin/
    ./indexer --all
    using config file '/usr/local/coreseek/etc/csft.conf'...    # the config file in use; same name as the copy made earlier
    indexing index 'test1'...
    WARNING: attribute 'group_id' not found - IGNORING
    WARNING: attribute 'date_added' not found - IGNORING
    WARNING: Attribute count is 0: switching to none docinfo
    collected 5 docs, 0.0 MB
    sorted 0.0 Mhits, 100.0% done
    total 5 docs, 351 bytes
    total 0.178 sec, 1971 bytes/sec, 28.07 docs/sec
    indexing index 'test1stemmed'...
    WARNING: attribute 'group_id' not found - IGNORING
    WARNING: attribute 'date_added' not found - IGNORING
    WARNING: Attribute count is 0: switching to none docinfo
    collected 5 docs, 0.0 MB    # five documents, i.e. five MySQL rows, so the database connection works
    sorted 0.0 Mhits, 100.0% done
    total 5 docs, 351 bytes
    total 0.007 sec, 47677 bytes/sec, 679.16 docs/sec
    skipping non-plain index 'dist1'...
    skipping non-plain index 'rt'...
    total 4 reads, 0.000 sec, 0.3 kb/call avg, 0.0 msec/call avg
    total 12 writes, 0.000 sec, 0.2 kb/call avg, 0.0 msec/call avg
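The run above rebuilds every index. The zengliangsuoyin source in csft.conf exists so that later runs can index only new rows. One way this could be driven from the shell is sketched below: build the incremental index, then fold it into the main one with indexer's --merge option. This is only a sketch. It assumes the two indexes are stored under separate path values and that no search daemon is holding the index files. The function names run and update_delta and the DRY_RUN switch are hypothetical helpers, not part of Coreseek.

```shell
# Locations from this walkthrough; override through the environment if needed.
INDEXER=${INDEXER:-/usr/local/coreseek/bin/indexer}
CONFIG=${CONFIG:-/usr/local/coreseek/etc/csft.conf}

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Rebuild the incremental index, then merge it into the main index.
# $1 = main index name, $2 = incremental index name.
update_delta() {
    main_index=$1
    delta_index=$2
    run "$INDEXER" --config "$CONFIG" "$delta_index" &&
    run "$INDEXER" --config "$CONFIG" --merge "$main_index" "$delta_index"
}
```

Setting DRY_RUN=1 before calling update_delta test1 zengliangsuoyin prints the two indexer commands instead of running them, which is a cheap way to check the arguments first.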
    ./search php
    Coreseek Fulltext 4.1 [ Sphinx 2.0.2-dev (r2922)]
    Copyright (c) 2007-2011,
    Beijing Choice Software Technologies Inc (http://www.coreseek.com)

     using config file '/usr/local/coreseek/etc/csft.conf'...
    index 'test1': query 'php ': returned 3 matches of 3 total in 0.000 sec

    displaying matches:
    1. document=1, weight=2500
            id=1
            group_id=1
            group_id2=5
            date_added=2017-02-08 06:22:36
            title=test one
            content=this is my test document number one. also checking search within phrases.
    2. document=2, weight=1500
            id=2
            group_id=1
            group_id2=6
            date_added=2017-02-08 06:22:36
            title=test two
            content=this is my test document number two
    3. document=5, weight=1500
            (document not found in db)

    words:
    1. 'php': 3 documents, 5 hits    # number of occurrences

    index 'test1stemmed': query 'php ': returned 3 matches of 3 total in 0.000 sec

    displaying matches:
    1. document=1, weight=2500
            id=1
            group_id=1
            group_id2=5
            date_added=2017-02-08 06:22:36
            title=test one
            content=this is my test document number one. also checking search within phrases.
    2. document=2, weight=1500
            id=2
            group_id=1
            group_id2=6
            date_added=2017-02-08 06:22:36
            title=test two
            content=this is my test document number two
    3. document=5, weight=1500
            (document not found in db)

    words:
    1. 'php': 3 documents, 5 hits
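The search tool prints one "returned N matches" summary line per index, followed by the matches themselves. For a quick check it can be handy to keep only those summaries. The filter below is a sketch that assumes the summary line format shown above. The function name match_counts is hypothetical.

```shell
# Read the output of ./search on stdin and print "<index> <matches>"
# for every per-index summary line.
match_counts() {
    sed -n "s/^[[:space:]]*index '\([^']*\)': query .*: returned \([0-9][0-9]*\) matches.*/\1 \2/p"
}

# Demonstration on the summary line from the run above.
printf '%s\n' "index 'test1': query 'php ': returned 3 matches of 3 total in 0.000 sec" | match_counts
```

The demonstration prints test1 3. On a real run the filter would be used as ./search php | match_counts.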
Testing is complete. The next step is installing the PHP extension.