Solr DIH dataconfig Configuration
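1. The configuration file data-config.xml defines the basic database connection settings and the mapping rules for the exported data: which column of the database table supplies the value of each index field, and how the values of particular fields are processed.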
```xml
<dataConfig>
  <dataSource name="jdbc" driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://172.0.8.249:5606/marketing_db_saved?zeroDateTimeBehavior=convertToNull"
              user="developer" password="sedept@shiyanjun.cn" />
  <document name="mkt_data">
    <!-- full-import: ${dataimporter.request.offset} is taken from the request URL;
         BETWEEN is inclusive, so each run reads ids offset .. offset+1000000.
         delta-import: deltaQuery finds the ids changed since the last index time,
         and deltaImportQuery fetches each changed row by its id. -->
    <entity name="marketing_data" pk="id"
            query="select * from marketing_data where id between ${dataimporter.request.offset} and ${dataimporter.request.offset}+1000000"
            deltaQuery="select id from marketing_data where updated_at > '${dih.last_index_time}'"
            deltaImportQuery="select * from marketing_data where id='${dih.delta.id}'"
            transformer="RegexTransformer">
      <field column="id" name="id" />
      <field column="domain" name="domain" />
      <field column="alex_rank" name="alex_rank" />
      <field column="server_port" name="server_port" />
      <field column="cert_validity_notBefore" name="cert_validity_notBefore" />
      <!-- no name attribute: the column name is used as the field name -->
      <field column="cert_validity_notAfter" />
      <!-- RegexTransformer: store capture group 1 of cert_validity_notAfter
           (the text before the first whitespace, i.e. the date part) -->
      <field column="cert_validity_notAfter_yyyyMMdd" regex="(.*?)\s+.*"
             name="cert_validity_notAfter_yyyyMMdd" sourceColName="cert_validity_notAfter" />
      <field column="cert_issuer_brand" name="cert_issuer_brand" />
      <field column="cert_validation" name="cert_validation" />
      <field column="cert_isMultiDomain" name="cert_isMultiDomain" />
      <field column="cert_issuer_brand_isXRelated" name="cert_issuer_brand_isXRelated" />
      <field column="cert_isWildcard" name="cert_isWildcard" />
      <field column="cert_notAfter" name="cert_notAfter" />
      <field column="special_ssl" name="special_ssl" />
      <field column="competitor_logo" name="competitor_logo" />
      <field column="segment" name="segment" />
    </entity>
  </document>
</dataConfig>
```
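The cert_validity_notAfter_yyyyMMdd field relies on RegexTransformer: with only regex and sourceColName set, the transformer matches the pattern against the source column and stores the capture group in the target column. The Python sketch below imitates that step with `re.search` so the pattern can be tried on sample values. It assumes values shaped like `2016-03-01 23:59:59` (an assumed format, not taken from the original data), and `extract_date_part` is a hypothetical helper; Java and Python regular expressions behave the same for this particular pattern.

```python
import re
from typing import Optional

# Same pattern as the regex attribute of cert_validity_notAfter_yyyyMMdd.
NOT_AFTER_DATE = re.compile(r"(.*?)\s+.*")


def extract_date_part(not_after: str) -> Optional[str]:
    """Hypothetical imitation of RegexTransformer for this one field.

    Searches the source value and returns capture group 1 (everything
    before the first run of whitespace). Returns None when the value
    contains no whitespace, i.e. when the pattern does not match.
    """
    m = NOT_AFTER_DATE.search(not_after)
    return m.group(1) if m else None


if __name__ == "__main__":
    print(extract_date_part("2016-03-01 23:59:59"))  # 2016-03-01
    print(extract_date_part("2016-03-01"))           # None
```

A value without any whitespace does not match at all, so in this sketch the result is `None`; if the source column can hold a bare date, the pattern needs adjusting before relying on this field.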
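Solr's DIH exposes the parameters of the import request as variables: a request parameter named offset is available in the configuration as ${dataimporter.request.offset}. In other words, extra parameters can be attached to the request sent to the DIH requestHandler, such as offset=5000000 in the following request URL: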
http://172.0.8.212:8080/seaarch-server/core0/dataimport?command=full-import&offset=5000000
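To make the substitution concrete, here is a simplified Python imitation of how the ${dataimporter.request.offset} placeholders in the entity query are filled from request parameters. It is not DIH's actual variable resolver: `resolve` and `import_url` are hypothetical helpers, and in this sketch a parameter missing from the request simply becomes an empty string.

```python
import re
from urllib.parse import urlencode

# Query template copied from the entity's query attribute.
QUERY = ("select * from marketing_data where id between "
         "${dataimporter.request.offset} and ${dataimporter.request.offset}+1000000")

# Matches ${dataimporter.request.<name>} placeholders and captures <name>.
_PLACEHOLDER = re.compile(r"\$\{dataimporter\.request\.(\w+)\}")


def resolve(template: str, params: dict) -> str:
    """Replace each request placeholder with the parameter value, as text."""
    return _PLACEHOLDER.sub(lambda m: str(params.get(m.group(1), "")), template)


def import_url(base: str, **params) -> str:
    """Append request parameters to the dataimport handler URL."""
    return base + "?" + urlencode(params)


if __name__ == "__main__":
    base = "http://172.0.8.212:8080/seaarch-server/core0/dataimport"
    print(import_url(base, command="full-import", offset=5000000))
    print(resolve(QUERY, {"offset": 5000000}))
```

With offset=5000000 the resulting SQL reads `... where id between 5000000 and 5000000+1000000`; the value is inserted as text and MySQL itself evaluates the `+1000000` arithmetic.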
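Another important parameter is clean, which decides whether the existing index data is deleted before the import starts. It defaults to clean=true, so if you do not want to delete previously indexed data, you must explicitly set it to false in the request URL: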
http://172.0.8.212:8080/seaarch-server/core0/dataimport?command=full-import&offset=5000000&clean=false
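When a large table is imported in several offset batches, every batch after the first needs clean=false; otherwise each full-import deletes the documents added by the previous one. The sketch below generates such a sequence of URLs. `batch_urls` is a hypothetical helper, and letting only the first batch clean the index is an assumed workflow choice, not something the original prescribes.

```python
from typing import List
from urllib.parse import urlencode

BASE = "http://172.0.8.212:8080/seaarch-server/core0/dataimport"
BATCH = 1000000  # matches the +1000000 in the entity query


def batch_urls(start: int, stop: int, step: int = BATCH + 1) -> List[str]:
    """Build one full-import URL per id batch in [start, stop).

    BETWEEN includes both ends, so one query covers BATCH + 1 ids;
    stepping by BATCH + 1 keeps the batches from overlapping.
    Only the first batch sends clean=true (assumed: start from an
    empty index); every later batch sends clean=false.
    """
    urls = []
    for i, offset in enumerate(range(start, stop, step)):
        params = {
            "command": "full-import",
            "offset": offset,
            "clean": "true" if i == 0 else "false",
        }
        urls.append(BASE + "?" + urlencode(params))
    return urls


if __name__ == "__main__":
    for url in batch_urls(0, 3000003):
        print(url)
```

DIH runs a full-import in the background and the request returns immediately, so a real driver script should wait until `command=status` reports the handler as idle before sending the next URL.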
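After indexing finishes, a commit is generally needed to persist the in-memory index data to the file system so that the changes are not lost. DIH's commit parameter already defaults to true, but it can be stated explicitly by adding commit=true to the request URL, for example: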