DocValues and FieldCache

FieldCache:

docID->document->fieldvalue

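Aggregation, sorting, joining and similar operations all start by reading the value of some field for each document. Without a cache, that means using the docID to load the whole document, pulling the field out of it, and converting the term into its final typed value. FieldCache instead loads the values of one specific field (any numeric type, or an untokenized StringField) for every document into memory up front, so that the field value can be read by random access.

How FieldCache is implemented: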
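The idea can be illustrated with a toy model. This is a minimal sketch in plain Python, not Lucene code: `docs`, `build_inverted_index` and `uninvert` are hypothetical names, and terms are simply the values encoded as strings.

```python
from collections import defaultdict

# Toy stored documents: docID -> stored fields (hypothetical data).
docs = {
    0: {"price": 30},
    1: {"price": 10},
    2: {"price": 20},
}

def build_inverted_index(docs, field):
    """Map each term (the value encoded as a string) to the docIDs containing it."""
    postings = defaultdict(list)
    for doc_id, fields in docs.items():
        postings[str(fields[field])].append(doc_id)
    return postings

def uninvert(postings, max_doc, parse=int):
    """FieldCache-style load: walk every term, convert it, fill a docID-indexed array."""
    values = [None] * max_doc
    for term, doc_ids in postings.items():
        value = parse(term)  # term -> typed value conversion
        for doc_id in doc_ids:
            values[doc_id] = value
    return values

postings = build_inverted_index(docs, "price")
price_cache = uninvert(postings, max_doc=len(docs))

# Once the array is in memory, sorting needs only lookups by docID.
sorted_doc_ids = sorted(docs, key=lambda doc_id: price_cache[doc_id])
print(price_cache)     # [30, 10, 20]
print(sorted_doc_ids)  # [1, 2, 0]
```

The array is cheap to read afterwards, but building it touches every term of the field, which is where the loading cost comes from.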

http://moshalanye.iteye.com/blog/281379

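Drawbacks:

1. It stays resident in memory; its size is (number of documents) × (size of the field's value type)

2. The initial load is slow, because it has to walk the inverted index and convert the types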
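As a rough feel for the first drawback, the size formula can be worked through for a hypothetical index. The document count and value sizes below are assumptions chosen for illustration, and the estimate ignores any per-object overhead.

```python
def field_cache_bytes(num_docs, bytes_per_value):
    """Heap needed for one uninverted field: one fixed-size slot per document."""
    return num_docs * bytes_per_value

# Hypothetical index: 100 million documents, one 8-byte long field.
long_field = field_cache_bytes(100_000_000, 8)
# The same index with a 4-byte int field.
int_field = field_cache_bytes(100_000_000, 4)

print(long_field)  # 800000000 bytes
print(int_field)   # 400000000 bytes
```

Every field that is sorted or aggregated on adds another array of this size to the heap.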

DocValues:

docID->fieldvalue

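At index time, a column-oriented forward index from document to field value is built. A known docID then leads straight to the field value, with no need to load the document, convert terms, or walk the terms looking for the doc.

Advantage: saves roughly one third of the memory.

Drawback: values are read from disk rather than from memory. For large batches of lookups the advantage is clear and it is faster; for small numbers of lookups it is not as fast as memory. Overall it is about 15-20% slower.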
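A minimal sketch of such a column, assuming one fixed-width 64-bit value per document written in docID order. Real Lucene doc values use more compact encodings; `write_column` and `ColumnReader` are hypothetical names used only for this illustration.

```python
import mmap
import os
import struct
import tempfile

VALUE = struct.Struct("<q")  # one fixed-width signed 64-bit value per document

def write_column(path, values):
    """Index time: write the field values in docID order as a flat column."""
    with open(path, "wb") as f:
        for v in values:
            f.write(VALUE.pack(v))

class ColumnReader:
    """Search time: map the file and read a value straight from its docID offset."""

    def __init__(self, path):
        self._file = open(path, "rb")
        self._map = mmap.mmap(self._file.fileno(), 0, access=mmap.ACCESS_READ)

    def get(self, doc_id):
        # docID -> byte offset -> value; no document load, no term conversion.
        return VALUE.unpack_from(self._map, doc_id * VALUE.size)[0]

    def close(self):
        self._map.close()
        self._file.close()

path = os.path.join(tempfile.mkdtemp(), "price.dv")
write_column(path, [30, 10, 20])

reader = ColumnReader(path)
prices = [reader.get(doc_id) for doc_id in range(3)]
reader.close()
print(prices)  # [30, 10, 20]
```

Because the file is memory-mapped rather than copied onto the heap, the operating system's page cache decides how much of the column actually sits in RAM.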

20 February 2015 - Apache Lucene 5.0.0 and Apache Solr 5.0.0 Available

http://lucene.apache.org/

FieldCache is gone (moved to a dedicated UninvertingReader in the misc module). This means when you intend to sort on a field, you should index that field using doc values, which is much faster and less heap consuming than FieldCache.

LUCENE-5666: Change uninverted access (sorting, faceting, grouping, etc) to use the DocValues API instead of FieldCache

In Elasticsearch:

https://www.elastic.co/guide/en/elasticsearch/guide/current/doc-values.html
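For illustration, a mapping that turns doc values on for one field might be built like this. This is a sketch assuming the Elasticsearch 1.x/2.x-era mapping syntax, where untokenized strings are declared with `"index": "not_analyzed"` (newer versions use the `keyword` type instead); the type name `my_type` and field name `status` are made up.

```python
import json

# Hypothetical index mapping; "my_type" and "status" are made-up names.
mapping = {
    "mappings": {
        "my_type": {
            "properties": {
                "status": {
                    "type": "string",
                    "index": "not_analyzed",  # untokenized string field
                    "doc_values": True,       # keep a docID -> value column on disk
                }
            }
        }
    }
}

# JSON body that would be sent when creating the index.
body = json.dumps(mapping, indent=2)
print(body)
```

Check the guide linked above for the defaults of the version in use before relying on this.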

In Solr:

http://wiki.apache.org/solr/DocValues?cm_mc_uid=56088888487714180880058&cm_mc_sid_50200000=1448507379

https://cwiki.apache.org/confluence/display/solr/DocValues

posted on 2016-09-01 16:00 lovebeauty