Solr feature: Schemaless Mode (automatically adding fields to the schema)
Wiki: https://cwiki.apache.org/confluence/display/solr/Schemaless+Mode
Introduction:
Schemaless Mode is a set of Solr features that, when used together, allow users to rapidly construct an effective schema by simply indexing sample data, without having to manually edit the schema. These Solr features, all specified in solrconfig.xml, are:
- Managed schema: Schema modifications are made through Solr APIs rather than manual edits - see Managed Schema Definition in SolrConfig.
- Field value class guessing: Previously unseen fields are run through a cascading set of value-based parsers, which guess the Java class of field values - parsers for Boolean, Integer, Long, Float, Double, and Date are currently available.
- Automatic schema field addition, based on field value class(es): Previously unseen fields are added to the schema, based on field value Java classes, which are mapped to schema field types - see Solr Field Types
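To make the idea of cascading value-based parsers concrete, here is a minimal Python sketch. It is not Solr's implementation: the function name `guess_value_class`, the short `DATE_FORMATS` list, and the parser order (boolean, long, double, date, then string as a fallback) are assumptions chosen for illustration, and Python's own number parsing is more permissive than Solr's.

```python
from datetime import datetime

# Hypothetical, shortened list of date layouts (Python strptime syntax).
DATE_FORMATS = ["%Y-%m-%dT%H:%M:%S", "%Y-%m-%d %H:%M:%S", "%Y-%m-%d"]


def guess_value_class(raw: str) -> str:
    """Return the Java class name a cascade of parsers might guess for raw."""
    text = raw.strip()

    # 1. Boolean parser
    if text.lower() in ("true", "false"):
        return "java.lang.Boolean"

    # 2. Integer/long parser
    try:
        int(text)
        return "java.lang.Long"
    except ValueError:
        pass

    # 3. Floating-point parser
    try:
        float(text)
        return "java.lang.Double"
    except ValueError:
        pass

    # 4. Date parser: try each known layout in turn
    for fmt in DATE_FORMATS:
        try:
            datetime.strptime(text, fmt)
            return "java.util.Date"
        except ValueError:
            continue

    # Nothing matched: leave the value as a string
    return "java.lang.String"
```

Each parser only sees values that every earlier parser rejected, which is why the order matters: "42" would also parse as a double, but the long parser claims it first.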
Configuration:
1.Enable Managed Schema
As described in the section Managed Schema Definition in SolrConfig, changing the schemaFactory will allow the schema to be modified by the Schema API. Your solrconfig.xml should have a section like the one below (and the ClassicIndexSchemaFactory should be commented out or removed).
<schemaFactory class="ManagedIndexSchemaFactory">
  <bool name="mutable">true</bool>
  <str name="managedSchemaResourceName">managed-schema</str>
</schemaFactory>
2.Define an UpdateRequestProcessorChain
The UpdateRequestProcessorChain allows Solr to guess field types, and you can define the default field type classes to use. To start, you should define it as follows (see the javadoc links below for update processor factory documentation):
<updateRequestProcessorChain name="add-unknown-fields-to-the-schema">
  <!-- UUIDUpdateProcessorFactory will generate an id if none is present in the incoming document -->
  <processor class="solr.UUIDUpdateProcessorFactory"/>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.DistributedUpdateProcessorFactory"/>
  <processor class="solr.RemoveBlankFieldUpdateProcessorFactory"/>
  <processor class="solr.FieldNameMutatingUpdateProcessorFactory">
    <str name="pattern">[^\w-\.]</str>
    <str name="replacement">_</str>
  </processor>
  <processor class="solr.ParseBooleanFieldUpdateProcessorFactory"/>
  <processor class="solr.ParseLongFieldUpdateProcessorFactory"/>
  <processor class="solr.ParseDoubleFieldUpdateProcessorFactory"/>
  <processor class="solr.ParseDateFieldUpdateProcessorFactory">
    <arr name="format">
      <str>yyyy-MM-dd'T'HH:mm:ss.SSSZ</str>
      <str>yyyy-MM-dd'T'HH:mm:ss,SSSZ</str>
      <str>yyyy-MM-dd'T'HH:mm:ss.SSS</str>
      <str>yyyy-MM-dd'T'HH:mm:ss,SSS</str>
      <str>yyyy-MM-dd'T'HH:mm:ssZ</str>
      <str>yyyy-MM-dd'T'HH:mm:ss</str>
      <str>yyyy-MM-dd'T'HH:mmZ</str>
      <str>yyyy-MM-dd'T'HH:mm</str>
      <str>yyyy-MM-dd HH:mm:ss.SSSZ</str>
      <str>yyyy-MM-dd HH:mm:ss,SSSZ</str>
      <str>yyyy-MM-dd HH:mm:ss.SSS</str>
      <str>yyyy-MM-dd HH:mm:ss,SSS</str>
      <str>yyyy-MM-dd HH:mm:ssZ</str>
      <str>yyyy-MM-dd HH:mm:ss</str>
      <str>yyyy-MM-dd HH:mmZ</str>
      <str>yyyy-MM-dd HH:mm</str>
      <str>yyyy-MM-dd</str>
    </arr>
  </processor>
  <processor class="solr.AddSchemaFieldsUpdateProcessorFactory">
    <str name="defaultFieldType">strings</str>
    <lst name="typeMapping">
      <str name="valueClass">java.lang.Boolean</str>
      <str name="fieldType">booleans</str>
    </lst>
    <lst name="typeMapping">
      <str name="valueClass">java.util.Date</str>
      <str name="fieldType">tdates</str>
    </lst>
    <lst name="typeMapping">
      <str name="valueClass">java.lang.Long</str>
      <str name="valueClass">java.lang.Integer</str>
      <str name="fieldType">tlongs</str>
    </lst>
    <lst name="typeMapping">
      <str name="valueClass">java.lang.Number</str>
      <str name="fieldType">tdoubles</str>
    </lst>
  </processor>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
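Two steps of this chain are easy to reason about in isolation: the field-name clean-up and the mapping from value class to field type. The Python sketch below imitates both under stated assumptions. The helper names `sanitize_field_name` and `field_type_for` are hypothetical, the regex is written with the hyphen last because Python's `re` module is stricter than Java about a hyphen following `\w` inside a character class, and the treatment of `java.lang.Number` as a catch-all for other numeric classes is an assumption about the intent of the mapping, not a description of Solr's code.

```python
import re

# Same character set as the configured pattern [^\w-\.], hyphen moved last.
# re.ASCII keeps \w limited to [a-zA-Z0-9_], as in Java's default \w.
INVALID_CHARS = re.compile(r"[^\w.\-]", re.ASCII)

# Mirrors the typeMapping entries in the configuration above.
TYPE_MAPPING = {
    "java.lang.Boolean": "booleans",
    "java.util.Date": "tdates",
    "java.lang.Long": "tlongs",
    "java.lang.Integer": "tlongs",
    "java.lang.Number": "tdoubles",
}
DEFAULT_FIELD_TYPE = "strings"

# Assumption: numeric classes without their own entry fall back to Number.
NUMBER_SUBCLASSES = {"java.lang.Double", "java.lang.Float"}


def sanitize_field_name(name: str) -> str:
    """Replace every character outside [A-Za-z0-9_.-] with an underscore."""
    return INVALID_CHARS.sub("_", name)


def field_type_for(value_class: str) -> str:
    """Pick a schema field type for a guessed Java value class."""
    if value_class in TYPE_MAPPING:
        return TYPE_MAPPING[value_class]
    if value_class in NUMBER_SUBCLASSES:
        return TYPE_MAPPING["java.lang.Number"]
    return DEFAULT_FIELD_TYPE
```

For example, `sanitize_field_name("first name")` gives `first_name`, and a value guessed as `java.lang.String` falls through to the `strings` default.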
3.Make the UpdateRequestProcessorChain the Default for the UpdateRequestHandler
Once the UpdateRequestProcessorChain has been defined, you must instruct your UpdateRequestHandlers to use it when working with index updates (i.e., adding, removing, or replacing documents). Here is an example using InitParams to set the defaults on all /update request handlers:
<initParams path="/update/**">
  <lst name="defaults">
    <str name="update.chain">add-unknown-fields-to-the-schema</str>
  </lst>
</initParams>
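With the chain set as the default, a document containing fields the schema has never seen can be posted to /update and the new fields should be added automatically. The Python sketch below only builds such a request; it does not send it. The host `http://localhost:8983`, the core name `mycore`, the helper name `build_update_request`, and the sample document are all assumptions for illustration.

```python
import json
from urllib.request import Request


def build_update_request(base_url: str, core: str, docs: list) -> Request:
    """Build (but do not send) a JSON POST to the core's /update handler."""
    url = f"{base_url}/solr/{core}/update?commit=true"
    body = json.dumps(docs).encode("utf-8")
    return Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical host, core name, and document with previously unseen fields.
req = build_update_request(
    "http://localhost:8983",
    "mycore",
    [{"title": "Schemaless test", "price": 9.99, "in_stock": True}],
)
# To actually send it against a running Solr instance:
#   from urllib.request import urlopen
#   urlopen(req)
```

After sending a request like this, the Schema API or the managed-schema file can be inspected to see which field types were chosen for `title`, `price`, and `in_stock`.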