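Sphinx Installation and Testing
Sphinx features:
* Fast indexing (peak performance up to 10 MB/s on contemporary CPUs);
* High-performance search (average response time under 0.1 s per query on 2–4 GB of text);
* Handles massive data (known to process over 100 GB of text, and up to 100M documents on a single-CPU system);
* Good relevance ranking, combining phrase proximity with statistical (BM25) ranking;
* Distributed search support;
* Document excerpt (snippet) generation;
* Can provide search as a MySQL storage engine;
* Multiple query modes, including boolean, phrase, and word proximity;
* Multiple full-text fields per document (up to 32);
* Multiple additional attributes per document (for example, group information, timestamps);
* Stop-word support;
* Supports single-byte encodings and UTF-8;
* Native MySQL support (both MyISAM and InnoDB);
* Native PostgreSQL support.
Most of the material I found online was copied from one site to another and was unreliable, which was rather frustrating, so I kept my own notes. If they help fellow enthusiasts, read on.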
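1. Find the latest Windows build at http://www.sphinxsearch.com/downloads.html.
Download it and unpack it into D:\usr\sphinx;
2. Under D:\usr\sphinx\, create a data directory for the index files and a log directory for the log files, then copy D:\usr\sphinx\sphinx.conf.in to D:\usr\sphinx\bin\sphinx.conf (note the change of file name);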
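If you would rather script step 2 than click through Explorer, here is a minimal sketch in Python. The function name prepare_sphinx_dirs is my own invention, and it assumes sphinx.conf.in sits directly under the install root as described above; it never overwrites an existing sphinx.conf.

```python
import shutil
from pathlib import Path


def prepare_sphinx_dirs(root: Path) -> Path:
    """Create data/ and log/ under root and copy sphinx.conf.in to bin/sphinx.conf.

    Hypothetical helper: the layout mirrors step 2 of these notes.
    """
    (root / "data").mkdir(parents=True, exist_ok=True)  # index files
    (root / "log").mkdir(parents=True, exist_ok=True)   # searchd log and pid file
    target = root / "bin" / "sphinx.conf"
    target.parent.mkdir(parents=True, exist_ok=True)
    if not target.exists():  # keep a config that was already edited
        shutil.copyfile(root / "sphinx.conf.in", target)
    return target


if __name__ == "__main__":
    root = Path(r"D:\usr\sphinx")
    if (root / "sphinx.conf.in").exists():
        print(prepare_sphinx_dirs(root))
```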
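3. Edit D:\usr\sphinx\bin\sphinx.conf. These are the settings that need changing:
- type = mysql # data source type; MySQL in my case
- sql_host = localhost # database server
- sql_user = root # database user name
- sql_pass = 'root' # database password
- sql_db = test # database name
- sql_port = 3306 # database port
- sql_query_pre = SET NAMES utf8 # uncomment this line if your database uses UTF-8 encoding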
- index test1
- {
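- # where the index files are stored
- path = D:/usr/sphinx/data/
- # encoding
- charset_type = utf-8
- # character table for UTF-8
- charset_table = 0..9, A..Z->a..z, _, a..z, U+410..U+42F->U+430..U+44F, U+430..U+44F
- # simple n-gram tokenizing; only 0 and 1 are supported. Set it to 1 to search Chinese
- ngram_len = 1
- # characters to split into n-grams; uncomment this line to search Chinese
- ngram_chars = U+3000..U+2FA1F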
- }
- # index test1stemmed : test1
- # {
- # path = @CONFDIR@/data/test1stemmed
- # morphology = stem_en
- # }
- # if you have no distributed index, comment out the section below
- # index dist1
- # {
- # 'distributed' index type MUST be specified
- # type = distributed
- # local index to be searched
- # there can be many local indexes configured
- # local = test1
- # local = test1stemmed
- # remote agent
- # multiple remote agents may be specified
- # syntax is 'hostname:port:index1,[index2[,...]]
- # agent = localhost:3313:remote1
- # agent = localhost:3314:remote2,remote3
- # remote agent connection timeout, milliseconds
- # optional, default is 1000 ms, ie. 1 sec
- # agent_connect_timeout = 1000
- # remote agent query timeout, milliseconds
- # optional, default is 3000 ms, ie. 3 sec
- # agent_query_timeout = 3000
- # }
- # settings to change for the search service
- searchd
- {
- # log file
- log = D:/usr/sphinx/log/searchd.log
- # PID file, searchd process ID file name
- pid_file = D:/usr/sphinx/log/searchd.pid
- # this must be commented out to start searchd on Windows
- # seamless_rotate = 1
- }
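To get a feel for what ngram_len = 1 together with ngram_chars does, here is a rough Python model. It is only my own simplification, not Sphinx's real tokenizer: every character inside the configured range U+3000..U+2FA1F becomes a token of its own, ASCII letters, digits and "_" are lowercased and grouped into words, and everything else is treated as a separator (the Cyrillic part of charset_table is ignored).

```python
def tokenize(text, ngram_lo=0x3000, ngram_hi=0x2FA1F):
    """Rough model of ngram_len = 1 (an illustration, not Sphinx's code)."""
    tokens, word = [], []

    def flush():
        # Emit the ASCII word collected so far, if any.
        if word:
            tokens.append("".join(word))
            word.clear()

    for ch in text.lower():
        if ngram_lo <= ord(ch) <= ngram_hi:
            flush()
            tokens.append(ch)          # each n-gram character stands alone
        elif ch.isascii() and (ch.isalnum() or ch == "_"):
            word.append(ch)            # part of a normal word
        else:
            flush()                    # anything else acts as a separator
    flush()
    return tokens


print(tokenize("test 骨头躲在这里document"))
```

Under this model a query such as 骨头 is matched as the two single-character tokens 骨 and 头, which is why no Chinese word dictionary is needed.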
4. Import the bundled sample data, example.sql, into the test database:
D:\usr\mysql\bin>mysql -uroot -proot test<d:/usr/sphinx/example.sql
5. Build the index:
D:\usr\sphinx\bin>indexer.exe --all
Sphinx 0.9.8-release (r1533)
Copyright (c) 2001-2008, Andrew Aksyonoff
using config file './sphinx.conf'...
indexing index 'test1'...
collected 4 docs, 0.0 MB
sorted 0.0 Mhits, 100.0% done
total 4 docs, 193 bytes
total 0.034 sec, 5701.98 bytes/sec, 118.18 docs/sec
Now search for "test":
D:\usr\sphinx\bin>search.exe test
Sphinx 0.9.8-release (r1533)
Copyright (c) 2001-2008, Andrew Aksyonoff
using config file './sphinx.conf'...
index 'test1': query 'test ': returned 3 matches of 3 total in 0.000 sec
displaying matches:
1. document=1, weight=2, group_id=1, date_added=Sun May 31 13:46:10 2009
id=1
group_id=1
group_id2=5
date_added=2009-05-31 13:46:10
title=test one
content=this is my test document number one. also checking search within phrases.
2. document=2, weight=2, group_id=1, date_added=Sun May 31 13:46:10 2009
id=2
group_id=1
group_id2=6
date_added=2009-05-31 13:46:10
title=test two
content=this is my test document number two
3. document=4, weight=1, group_id=2, date_added=Sun May 31 13:46:10 2009
id=4
group_id=2
group_id2=8
date_added=2009-05-31 13:46:10
title=doc number four
content=this is to test groups
words:
1. 'test': 3 documents, 5 hits
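If you want to check search.exe results from a script rather than by eye, the summary line is easy to pick apart. The sketch below assumes the line keeps exactly the format shown in the output above (Sphinx 0.9.8); the name parse_summary is my own.

```python
import re

# Matches e.g.: index 'test1': query 'test ': returned 3 matches of 3 total in 0.000 sec
SUMMARY_RE = re.compile(
    r"index '(?P<index>[^']*)': query '(?P<query>[^']*)': "
    r"returned (?P<returned>\d+) matches of (?P<total>\d+) total"
)


def parse_summary(line):
    """Return a dict with index, query, returned, total; None if the line does not match."""
    m = SUMMARY_RE.search(line)
    if m is None:
        return None
    return {
        "index": m["index"],
        "query": m["query"].strip(),  # search.exe prints a trailing space
        "returned": int(m["returned"]),
        "total": int(m["total"]),
    }


line = "index 'test1': query 'test ': returned 3 matches of 3 total in 0.000 sec"
print(parse_summary(line))
```

A result with "returned" equal to 0 is a quick signal that a query found nothing.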
(The screenshot shows all the rows in the database.)
6. Test Chinese search
Modify the documents table in the test database:
- UPDATE `test`.`documents` SET `content` = 'this is my test 骨头躲在这里document number one. also checking search within phrases.中文测试,let''s go ,应该搜索的到吧,搜索愉快!' WHERE `documents`.`id` =1 LIMIT 1 ;
Rebuild the index:
D:\usr\sphinx\bin>indexer.exe --all
Then run a test:
D:\usr\sphinx\bin>search.exe 中文
Sphinx 0.9.8-release (r1533)
Copyright (c) 2001-2008, Andrew Aksyonoff
using config file './sphinx.conf'...
index 'test1': query '中文 ': returned 0 matches of 0 total in 0.000 sec
words:
D:\usr\sphinx\bin>search.exe 骨头
Sphinx 0.9.8-release (r1533)
Copyright (c) 2001-2008, Andrew Aksyonoff
using config file './sphinx.conf'...
index 'test1': query '骨头 ': returned 0 matches of 0 total in 0.000 sec
words:
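OK, so how is the test supposed to pass?? Both queries above return 0 matches.
Tip: some people online report that Chinese text cannot be found. One possible cause is encoding: Chinese typed at the Windows command line is GBK-encoded, so a console query should only be able to match if the database (and therefore the index) is GBK-encoded as well.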
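To see why the console encoding matters, compare the bytes behind the same two characters in each encoding. This is only an illustration in Python of the mismatch, not a description of what Sphinx does internally; the helper looks_like_utf8 is my own.

```python
def looks_like_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


query = "中文"
utf8_bytes = query.encode("utf-8")  # what a UTF-8 database and index contain
gbk_bytes = query.encode("gbk")     # what a GBK console would send

print(utf8_bytes.hex())             # e4b8ade69687
print(gbk_bytes.hex())              # d6d0cec4
print(looks_like_utf8(gbk_bytes))   # False
```

The two byte sequences have nothing in common, so a GBK query cannot line up with keywords indexed from UTF-8 text.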
You can also search from a program, but the Sphinx searchd service has to be running first:
D:\usr\sphinx\bin>searchd.exe
Sphinx 0.9.8-release (r1533)
Copyright (c) 2001-2008, Andrew Aksyonoff
WARNING: forcing --console mode on Windows
using config file './sphinx.conf'...
creating server socket on 127.0.0.1:3312
accepting connections
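Before pointing a client at searchd, it can help to confirm that something is really listening on the address from the log above. A small sketch using only Python's standard socket module; the function name is my own, and the defaults 127.0.0.1:3312 are taken from the "creating server socket" line.

```python
import socket


def searchd_is_listening(host="127.0.0.1", port=3312, timeout=1.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host unreachable.
        return False


if __name__ == "__main__":
    print(searchd_is_listening())
```

This only checks that the port accepts connections; it does not speak the Sphinx protocol.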
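Then save the following code as a PHP file and run it.
- <?php
- // sphinxapi.php is the PHP client that ships with Sphinx
- require 'sphinxapi.php';
- $s = new SphinxClient();
- // the port must match the one searchd printed at startup (3312 above)
- $s->SetServer('localhost', 3312);
- $result = $s->Query('中文');
- var_dump($result);
- ?>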
http://www.phpwind.net/read-htm-tid-801035.html
I viewed the result directly in the browser:
array(8) { ["error"]=> string(0) "" ["warning"]=> string(0) "" ["status"]=> int(0) ["fields"]=> array(2) { [0]=> string(5) "title" [1]=> string(7) "content" } ["attrs"]=> array(2) { ["group_id"]=> int(1) ["date_added"]=> int(2) } ["total"]=> string(1) "0" ["total_found"]=> string(1) "0" ["time"]=> string(5) "0.000" }
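OK, the API call works~~ The error field is empty, although total is still 0, so the Chinese query itself matched nothing.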