1.6.9 UIMA Integration

1. UIMA Integration

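  You can integrate Solr with Apache's Unstructured Information Management Architecture (UIMA). UIMA lets you define custom pipelines of analysis engines that incrementally add metadata to your documents as annotations.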

  For more information about Solr's UIMA integration, see https://wiki.apache.org/solr/SolrUIMA.

1.1 Configuring UIMA

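 The Solr UIMA UpdateRequestProcessor is a custom update request processor. It takes the documents being indexed, sends them to a UIMA pipeline, and returns them enriched with metadata. To configure UIMA, follow these steps: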

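  1. Copy solr-uima-4.x.y.jar (under /solr-4.x.y/dist/) and its libraries (under contrib/uima/lib) to a Solr library directory, or add <lib/> tags to solrconfig.xml that point to those JAR files: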

<lib dir="../../contrib/uima/lib" />
<lib dir="../../dist/" regex="solr-uima-\d.*\.jar" />

 

  2. In schema.xml, add the fields that will hold the metadata:

<field name="language" type="string" indexed="true" stored="true"  required="false" />
<field name="concept" type="string" indexed="true" stored="true" multiValued="true" required="false" />
<field name="sentence" type="text" indexed="true" stored="true" multiValued="true" required="false" />

 

  3. Add the following snippet to solrconfig.xml:

<updateRequestProcessorChain name="uima">
    <processor
        class="org.apache.solr.uima.processor.UIMAUpdateRequestProcessorFactory">
        <lst name="uimaConfig">
            <lst name="runtimeParameters">
                <str name="keyword_apikey">VALID_ALCHEMYAPI_KEY</str>
                <str name="concept_apikey">VALID_ALCHEMYAPI_KEY</str>
                <str name="lang_apikey">VALID_ALCHEMYAPI_KEY</str>
                <str name="cat_apikey">VALID_ALCHEMYAPI_KEY</str>
                <str name="entities_apikey">VALID_ALCHEMYAPI_KEY</str>
                <str name="oc_licenseID">VALID_OPENCALAIS_KEY</str>
            </lst>
            <str name="analysisEngine">
                /org/apache/uima/desc/OverridingParamsExtServicesAE.xml
            </str>
                <!-- Set to true if you want to continue indexing even if text processing 
                    fails. Default is false, in which case Solr throws a RuntimeException 
                    and none of the documents in your session are indexed. -->
                <bool name="ignoreErrors">true</bool>
                <!-- This is optional. It is used for logging when text processing fails. 
                    If logField is not specified, uniqueKey will be used as logField. <str name="logField">id</str> -->
                <lst name="analyzeFields">
                    <bool name="merge">false</bool>
                    <arr name="fields">
                        <str>text</str>
                    </arr>
                </lst>
                <lst name="fieldMappings">
                    <lst name="type">
                        <str name="name">org.apache.uima.alchemy.ts.concept.ConceptFS</str>
                        <lst name="mapping">
                            <str name="feature">text</str>
                            <str name="field">concept</str>
                        </lst>
                    </lst>
                    <lst name="type">
                        <str name="name">org.apache.uima.alchemy.ts.language.LanguageFS</str>
                        <lst name="mapping">
                            <str name="feature">language</str>
                            <str name="field">language</str>
                        </lst>
                    </lst>
                    <lst name="type">
                        <str name="name">org.apache.uima.SentenceAnnotation</str>
                        <lst name="mapping">
                            <str name="feature">coveredText</str>
                            <str name="field">sentence</str>
                        </lst>
                    </lst>
                </lst>
        </lst>
    </processor>
    <processor class="solr.LogUpdateProcessorFactory" />
    <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>

   4. In solrconfig.xml, replace the existing default UpdateRequestHandler or create a new UpdateRequestHandler that uses the uima chain:

<requestHandler name="/update" class="solr.XmlUpdateRequestHandler">
  <lst name="defaults">
    <str name="update.chain">uima</str>
  </lst>
</requestHandler>
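
Once the handler points at the uima chain, any document posted to /update passes through the UIMA pipeline before it is indexed. As a rough illustration, an XML update message such as the one below would have its text field analyzed, since text is the field listed under analyzeFields. The id field and both sample values are assumptions made for this sketch: it presumes the schema declares id as its uniqueKey and defines a text field, neither of which is shown in the steps above.

```xml
<!-- Hypothetical sample document; the id and text values are made up for illustration. -->
<add>
  <doc>
    <!-- Assumed uniqueKey field; not defined in the schema snippet above. -->
    <field name="id">doc-1</field>
    <!-- The field named in analyzeFields; its content is sent to the UIMA pipeline. -->
    <field name="text">Apache Solr is an open source enterprise search platform. It is built on Apache Lucene.</field>
  </doc>
</add>
```

If the analysis succeeds, the indexed document should also carry the language, concept, and sentence fields filled in from the field mappings. This depends on valid AlchemyAPI and OpenCalais keys being set in runtimeParameters; with ignoreErrors set to true, a failed analysis would leave the document indexed without the extra fields.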

 

posted @ 2015-03-09 11:49  勿妄