An Introduction to Fields in Lucene

Lucene 6.1.0 provides a number of field types (classes whose names end in Field). Several commonly used ones are introduced below:

TextField

A field that is indexed and tokenized, without term vectors. For example this would be used on a 'body' field, that contains the bulk of a document's text.
In other words, a TextField is indexed and tokenized automatically. It is typically used for the body of an article.

StringField

A field that is indexed but not tokenized: the entire String value is indexed as a single token. For example this might be used for a 'country' field or an 'id' field. If you also need to sort on this field, separately add a SortedDocValuesField to your document.
A StringField is indexed but not tokenized: the whole value is treated as a single token. It is typically used for fields such as "country" or "ID".
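The difference between the two types shows up at query time. The following is a minimal sketch, assuming a Lucene 6.x classpath with StandardAnalyzer and an in-memory RAMDirectory; the class name TokenizedVsExact and the field names country_text and country_exact are made up for illustration. It indexes the same value under both field types and counts the hits for a raw term.

```java
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;

public class TokenizedVsExact {

    /** Indexes "United States" under both field types, then counts hits for one raw term. */
    static int countHits(String field, String term) throws Exception {
        Directory dir = new RAMDirectory();
        try (IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(new StandardAnalyzer()))) {
            Document doc = new Document();
            // Tokenized by the analyzer into the terms "united" and "states"
            doc.add(new TextField("country_text", "United States", Field.Store.NO));
            // Indexed verbatim as the single term "United States"
            doc.add(new StringField("country_exact", "United States", Field.Store.NO));
            writer.addDocument(doc);
        } // closing the writer commits the document

        try (DirectoryReader reader = DirectoryReader.open(dir)) {
            IndexSearcher searcher = new IndexSearcher(reader);
            // TermQuery does no analysis: the term must match an indexed token exactly
            return searcher.count(new TermQuery(new Term(field, term)));
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(countHits("country_text", "united"));         // 1
        System.out.println(countHits("country_text", "United States"));  // 0
        System.out.println(countHits("country_exact", "United States")); // 1
        System.out.println(countHits("country_exact", "united"));        // 0
    }
}
```

A TermQuery is used here on purpose: it bypasses the analyzer, so the counts reflect exactly which tokens each field type put into the index.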

StoredField

A field whose value is stored so that IndexSearcher.doc(int) and IndexReader.document() will return the field and its value.
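In other words, a Field whose value is always stored. It is not indexed, so it can be returned with a hit but cannot be searched on.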
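A small sketch of that behavior, again assuming Lucene 6.x with StandardAnalyzer and RAMDirectory. The class name StoredOnlyDemo, the field names id and summary, and the sample values are hypothetical. The document is located through an indexed StringField, and the StoredField value is read back from the hit.

```java
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StoredField;
import org.apache.lucene.document.StringField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;

public class StoredOnlyDemo {

    /** Builds a one-document in-memory index. */
    static Directory buildIndex() throws Exception {
        Directory dir = new RAMDirectory();
        try (IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(new StandardAnalyzer()))) {
            Document doc = new Document();
            doc.add(new StringField("id", "42", Field.Store.NO));       // searchable, not stored
            doc.add(new StoredField("summary", "short teaser text"));   // stored, not searchable
            writer.addDocument(doc);
        }
        return dir;
    }

    /** Searching the stored-only field finds nothing because it has no indexed terms. */
    static int hitsOnStoredOnlyField() throws Exception {
        try (DirectoryReader reader = DirectoryReader.open(buildIndex())) {
            return new IndexSearcher(reader).count(new TermQuery(new Term("summary", "teaser")));
        }
    }

    /** Finds the document by its indexed id and returns the stored summary. */
    static String summaryForId(String id) throws Exception {
        try (DirectoryReader reader = DirectoryReader.open(buildIndex())) {
            IndexSearcher searcher = new IndexSearcher(reader);
            TopDocs hits = searcher.search(new TermQuery(new Term("id", id)), 1);
            return searcher.doc(hits.scoreDocs[0].doc).get("summary");
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(hitsOnStoredOnlyField()); // 0
        System.out.println(summaryForId("42"));      // short teaser text
    }
}
```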

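An example
(The code below indexes news data stored in a MySQL database: the title column holds the article title, content holds the body, url holds the link to the article, and author is the article's author.)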

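import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;

// rs is a java.sql.ResultSet positioned on the current row and writer is an
// open IndexWriter; both are assumed to be set up earlier.
Document document = new Document();

// title: tokenized for search and stored so it can be shown in results
String title = rs.getString("title");
if (title != null) {
    document.add(new TextField("title", title, Field.Store.YES));
}
// content: tokenized for search but not stored
String content = rs.getString("content");
if (content != null) {
    document.add(new TextField("content", content, Field.Store.NO));
}
// url: indexed as a single token and stored
String url = rs.getString("url");
if (url != null) {
    document.add(new StringField("url", url, Field.Store.YES));
}
// author: tokenized and stored
String author = rs.getString("author");
if (author != null) {
    document.add(new TextField("author", author, Field.Store.YES));
}
writer.addDocument(document);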

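The first argument sets the field's name, the second its value, and the third chooses whether the value is stored. A stored value can be returned at search time.
The article body generally does not need to be stored: a search result only has to return the article's title and url.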
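The effect of Field.Store.YES versus Field.Store.NO can be checked by reading a hit back from the index. This is a sketch under assumptions, not the original indexing program: it uses Lucene 6.x with StandardAnalyzer and RAMDirectory instead of MySQL data, and the class name StoredFieldLookup, the sample text, and the example.com URL are invented.

```java
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;

public class StoredFieldLookup {

    /** Indexes one article, searches its body, and returns the stored fields of the first hit. */
    static Document fetchFirstHit() throws Exception {
        Directory dir = new RAMDirectory();
        try (IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(new StandardAnalyzer()))) {
            Document doc = new Document();
            doc.add(new TextField("title", "Lucene field types", Field.Store.YES));
            doc.add(new TextField("content", "TextField is tokenized, StringField is not.", Field.Store.NO));
            doc.add(new StringField("url", "http://example.com/lucene-fields", Field.Store.YES));
            writer.addDocument(doc);
        }

        try (DirectoryReader reader = DirectoryReader.open(dir)) {
            IndexSearcher searcher = new IndexSearcher(reader);
            // The body is searchable even though it was not stored
            TopDocs hits = searcher.search(new TermQuery(new Term("content", "tokenized")), 1);
            // doc(int) loads only the stored fields of the matching document
            return searcher.doc(hits.scoreDocs[0].doc);
        }
    }

    public static void main(String[] args) throws Exception {
        Document hit = fetchFirstHit();
        System.out.println(hit.get("title"));   // Lucene field types
        System.out.println(hit.get("url"));     // http://example.com/lucene-fields
        System.out.println(hit.get("content")); // null, because content used Field.Store.NO
    }
}
```

Leaving the body unstored keeps the stored-fields part of the index small, at the cost of having to fetch the full text from the original data source when it is needed.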

Posted on 2017-03-01 13:13 by 代码ok