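Importing MySQL data into a Solr index
The environment used throughout this post is 64-bit CentOS with JDK 1.7.79.
Copy the MySQL JDBC driver jar into /home/hadoop/cloudsolr/solr-4.10.4/contrib/dataimporthandler/lib.
Edit the corresponding solrconfig.xml. My core is collection1, so the file is example/solr/collection1/conf/solrconfig.xml.
Add the following to that file: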
<lib dir="/home/hadoop/cloudsolr/solr-4.10.4/dist/" regex="solr-dataimporthandler-\d.*\.jar" />
<lib dir="/home/hadoop/cloudsolr/solr-4.10.4/contrib/dataimporthandler/lib/" regex=".*\.jar" />
Still in solrconfig.xml, register the DataImportHandler. Keep the class attribute value on a single line with no whitespace inside it:
<!-- the dataimport requestHandler -->
<requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">db-data-config.xml</str>
  </lst>
</requestHandler>
Create db-data-config.xml (next to solrconfig.xml in the core's conf directory):
vim db-data-config.xml
<?xml version="1.0" encoding="UTF-8" ?>
<dataConfig>
  <dataSource driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://ip:3306/database"
              user="laiba"
              password="laiba123"
              batchSize="-1"/>
  <!-- Note: with MySQL, batchSize="-1" must be set, otherwise an exception is thrown.
       It makes the MySQL driver stream rows instead of loading the whole result set into memory. -->
  <document>
    <entity name="bns_article" pk="id"
            query="select id, title, author, cover, digest, content from bns_article"
            deltaImportQuery="select id, title, author, cover, digest, content from bns_article where id='${dataimporter.delta.id}'"
            deltaQuery="select id from bns_article where updatetime > '${dataimporter.last_index_time}'">
      <field column="id" name="id"/>
      <field column="title" name="title"/>
      <field column="author" name="author"/>
      <field column="cover" name="cover"/>
      <field column="digest" name="digest"/>
      <field column="content" name="content"/>
    </entity>
  </document>
</dataConfig>
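When configuring an entity, note that in each field element the column attribute is the MySQL column name, while name is the field name defined in Solr's schema, which is also the name shown in the admin page.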
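To make the column/name distinction concrete, here is a small sketch in which the two differ. The Solr field name summary is hypothetical and not part of the original setup; it would have to be declared in schema.xml before this could work.

```xml
<!-- Sketch only: "summary" is a hypothetical Solr field name -->
<entity name="bns_article" pk="id" query="select id, digest from bns_article">
  <!-- column = name of the column in the SQL result set -->
  <!-- name   = name of the field declared in schema.xml -->
  <field column="id" name="id"/>
  <field column="digest" name="summary"/>
</entity>
```

When column and name are identical, as in the configuration used in this post, each value is simply passed through to the Solr field of the same name.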
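Next, configure schema.xml and add the fields below (that is, the database columns to be indexed). Following the IK analyzer setup from the previous post, these fields could also be given a tokenized type.
Add two fields:
<field name="cover" type="string" indexed="true" stored="true" multiValued="false"/>
<field name="digest" type="string" indexed="true" stored="true" multiValued="false"/>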
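If digest should be searchable by individual words rather than matched as one exact string, it could be given an analyzed type instead. The sketch below assumes the IK setup from the previous post registered a field type named text_ik. That name is an assumption, so substitute whatever name your schema.xml actually defines.

```xml
<!-- Assumption: a fieldType named "text_ik" already exists in schema.xml -->
<field name="digest" type="text_ik" indexed="true" stored="true" multiValued="false"/>
```

This would be used in place of the string definition of digest, not in addition to it, since a field name may be declared only once.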
After restarting the service, the following error appeared:
HTTP ERROR 500
Problem accessing /solr/. Reason:
{msg=SolrCore 'collection1' is not available due to init failure: RequestHandler init failure,trace=org.apache.solr.common.SolrException: SolrCore 'collection1' is not available due to init failure: RequestHandler init failure
    at org.apache.solr.core.CoreContainer.getCore(CoreContainer.java:745)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:347)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:207)
    at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
    at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
    at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)
    at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
    at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)
    at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)
    at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
    at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
    at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
    at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
    at org.eclipse.jetty.server.Server.handle(Server.java:368)
Cause:
<!-- the dataimport requestHandler -->
<requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.
DataImportHandler">
  <lst name="defaults">
    <str name="config">db-data-config.xml</str>
  </lst>
</requestHandler>
The opening <requestHandler name="/dataimport" ...> tag had been broken across two lines in the middle of the class attribute value, so the class name was no longer valid and the handler failed to initialize.
Fix:
Put <requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler"> back on a single line.
Start the cluster
Import the data
Query
Reference: http://wiki.apache.org/solr/DIHQuickStart
Importing multiple tables into Solr
Config file: vim db-data-config.xml
<document>
  <entity name="bns_article" pk="id"
          query="select id, title, author, cover, digest, content from bns_article"
          deltaImportQuery="select id, title, author, cover, digest, content from bns_article where id='${dataimporter.delta.id}'"
          deltaQuery="select id from bns_article where updatetime > '${dataimporter.last_index_time}'">
    <field column="id" name="id"/>
    <field column="title" name="title"/>
    <field column="author" name="author"/>
    <field column="cover" name="cover"/>
    <field column="digest" name="digest"/>
    <field column="content" name="content"/>
  </entity>
  <entity name="bns_word" pk="id"
          query="select id, content, avgfreel, state, sentencenum, articlenum, updatetime, createtime from bns_word"
          deltaImportQuery="select id, content, avgfreel, state, sentencenum, articlenum, updatetime, createtime from bns_word where id='${dataimporter.delta.id}'"
          deltaQuery="select id from bns_word where updatetime > '${dataimporter.last_index_time}'">
    <field column="id" name="id"/>
    <field column="content" name="content"/>
    <field column="avgfreel" name="avgfreel"/>
    <field column="state" name="state"/>
    <field column="sentencenum" name="sentencenum"/>
    <field column="articlenum" name="articlenum"/>
    <field column="updatetime" name="updatetime"/>
    <field column="createtime" name="createtime"/>
  </entity>
</document>
Configure schema.xml
Add these fields:
<field name="avgfreel" type="string" indexed="true" stored="true" multiValued="false"/>
<field name="state" type="string" indexed="true" stored="true" multiValued="false"/>
<field name="sentencenum" type="string" indexed="true" stored="true" multiValued="false"/>
<field name="articlenum" type="string" indexed="true" stored="true" multiValued="false"/>
<field name="updatetime" type="string" indexed="true" stored="true" multiValued="false"/>
<field name="createtime" type="string" indexed="true" stored="true" multiValued="false"/>
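Declaring every column as string works, but range queries and sorting then treat the values as text (so "10" sorts before "9"). As a sketch, the counters and timestamps could instead use the int and tdate types that the Solr 4.10 example schema.xml ships with. Whether the MySQL columns really are integers and datetimes is an assumption here, since the table definitions are not shown.

```xml
<!-- Sketch: assumes sentencenum/articlenum are integer columns
     and updatetime/createtime are DATETIME or TIMESTAMP columns -->
<field name="sentencenum" type="int" indexed="true" stored="true" multiValued="false"/>
<field name="articlenum" type="int" indexed="true" stored="true" multiValued="false"/>
<field name="updatetime" type="tdate" indexed="true" stored="true" multiValued="false"/>
<field name="createtime" type="tdate" indexed="true" stored="true" multiValued="false"/>
```

These would replace the corresponding string definitions, and the core would need a full re-import after the type change.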
Newly added MySQL tables (these entities also go inside <document>):
<entity name="bns_sentence" pk="id"
        query="select id, uid, createname, createheadimg, wid, word, content, articlenum, state, feel, forwardnum, supportnum, updatetime, createtime from bns_sentence"
        deltaImportQuery="select id, uid, createname, createheadimg, wid, word, content, articlenum, state, feel, forwardnum, supportnum, updatetime, createtime from bns_sentence where id='${dataimporter.delta.id}'"
        deltaQuery="select id, uid, createname, createheadimg, wid, word, content, articlenum, state, feel, forwardnum, supportnum, updatetime, createtime from bns_sentence">
  <field column="id" name="id"/>
  <field column="uid" name="uid"/>
  <field column="createname" name="createname"/>
  <field column="createheadimg" name="createheadimg"/>
  <field column="wid" name="wid"/>
  <field column="word" name="word"/>
  <field column="content" name="content"/>
  <field column="articlenum" name="articlenum"/>
  <field column="state" name="state"/>
  <field column="feel" name="feel"/>
  <field column="forwardnum" name="forwardnum"/>
  <field column="supportnum" name="supportnum"/>
  <field column="updatetime" name="updatetime"/>
  <field column="createtime" name="createtime"/>
</entity>
<entity name="bns_user" pk="id"
        query="select id, username, password, money, nickname, headimg, sex, articlenum, sentencenum, wordnum, createtime from bns_user"
        deltaImportQuery="select id, username, password, money, nickname, headimg, sex, articlenum, sentencenum, wordnum, createtime from bns_user where id='${dataimporter.delta.id}'"
        deltaQuery="select id, username, password, money, nickname, headimg, sex, articlenum, sentencenum, wordnum, createtime from bns_user">
  <field column="id" name="id"/>
  <field column="username" name="username"/>
  <field column="password" name="password"/>
  <field column="money" name="money"/>
  <field column="nickname" name="nickname"/>
  <field column="headimg" name="headimg"/>
  <field column="sex" name="sex"/>
  <field column="articlenum" name="articlenum"/>
  <field column="sentencenum" name="sentencenum"/>
  <field column="wordnum" name="wordnum"/>
  <field column="createtime" name="createtime"/>
</entity>
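One thing to watch when several entities write to the same core: each of them maps its own id column to the Solr id field. If id is the core's uniqueKey (it is in the stock collection1 schema), rows from different tables that share an id value will overwrite one another. A possible workaround, sketched below for a single entity, is to prefix the key in SQL. The prefix 'word_' is a made-up convention, and the deltaImportQuery and deltaQuery would need matching changes, which are omitted here.

```xml
<!-- Sketch: make ids unique across tables by prefixing them in SQL.
     'word_' is a hypothetical prefix; delta queries are left out. -->
<entity name="bns_word" pk="id"
        query="select concat('word_', id) as id, content, state from bns_word">
  <field column="id" name="id"/>
  <field column="content" name="content"/>
  <field column="state" name="state"/>
</entity>
```

Each of the other entities would get its own prefix in the same way, so that a document's origin table can be read off its id.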
Configure schema.xml
Add these fields:
<field name="uid" type="string" indexed="true" stored="true" multiValued="false"/>
<field name="word" type="string" indexed="true" stored="true" multiValued="false"/>
<field name="feel" type="string" indexed="true" stored="true" multiValued="false"/>
<field name="forwardnum" type="string" indexed="true" stored="true" multiValued="false"/>
<field name="supportnum" type="string" indexed="true" stored="true" multiValued="false"/>
<field name="username" type="string" indexed="true" stored="true" multiValued="false"/>
<field name="password" type="string" indexed="true" stored="true" multiValued="false"/>
<field name="money" type="string" indexed="true" stored="true" multiValued="false"/>
<field name="nickname" type="string" indexed="true" stored="true" multiValued="false"/>
<field name="headimg" type="string" indexed="true" stored="true" multiValued="false"/>
<field name="sex" type="string" indexed="true" stored="true" multiValued="false"/>
<field name="wordnum" type="string" indexed="true" stored="true" multiValued="false"/>
Problem encountered: even when importing only a handful of records, indexing was very slow.