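Solr Deep Paging
Overview
Deep paging has been a long-standing problem for us. Jumping straight to a page far into the result set makes the query slow, because Solr has to walk the results from the very beginning to get there. There was no good solution for this until Solr 4.7 introduced cursors.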
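The Problem
The deep paging problem is easy to see. Solr has to build a list of the matching results and return one slice of it. When that slice comes from the front of the list, this is cheap. But if we ask for page 10,000 at 20 records per page, Solr has to build a list of 200,000 entries (10000 * 20). That costs memory as well as time. Our production history data has reached about 600 million records; jumping straight to the last page of that would almost certainly end in an out-of-memory error.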
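To make the arithmetic concrete, here is a small sketch of how large the sorted list gets for a given page. The class and method names are made up for illustration and are not part of Solr; the model ignores everything except the size of the list.

```java
// Illustrative only: models the size of the sorted list that offset-based
// paging (start + rows) requires. Not a Solr API.
final class OffsetPagingCost {

    // Entries that must be collected to serve the given 1-based page:
    // everything up to and including the last row of that page.
    static long entriesToCollect(long page, int rowsPerPage) {
        if (page < 1 || rowsPerPage < 1) {
            throw new IllegalArgumentException("page and rowsPerPage must be >= 1");
        }
        return Math.multiplyExact(page, (long) rowsPerPage);
    }

    // The start offset that corresponds to that page.
    static long startOffset(long page, int rowsPerPage) {
        return entriesToCollect(page, rowsPerPage) - rowsPerPage;
    }

    public static void main(String[] args) {
        System.out.println(entriesToCollect(10000, 20)); // 200000
        System.out.println(startOffset(10000, 20));      // 199980
    }
}
```

The cost grows linearly with the page number, which is why the first pages feel fast and the last ones do not.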
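How does Solr 4.7 solve this problem?
Answer: Solr 4.7 changed the situation by introducing the concept of a cursor. A cursor is a dynamic structure that does not need to be stored on the server. It records where the previous page ended in the sorted result set (the sort values of the last document returned), so Solr no longer has to walk the results from the beginning to reach the records we want. Cursors can greatly improve deep-paging performance.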
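The idea can be illustrated without Solr at all. The sketch below pages through an in-memory set sorted by price descending, then id ascending. Instead of an offset, each call receives the last row already returned and resumes just after it. All names here are invented for the illustration; Solr's internals differ, and the real cursorMark is an opaque encoded string rather than an object like this.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

// In-memory illustration of cursor-style paging. Not Solr's implementation.
final class CursorSketch {

    static final class Doc {
        final String id;
        final double price;
        Doc(String id, double price) { this.id = id; this.price = price; }
    }

    // price desc, id asc; the id acts as a tie-breaker so the order is total.
    static final Comparator<Doc> ORDER =
            Comparator.comparingDouble((Doc d) -> d.price).reversed()
                      .thenComparing((Doc d) -> d.id);

    // Returns up to `rows` docs that sort strictly after `last`
    // (or from the beginning when `last` is null).
    static List<Doc> nextPage(NavigableSet<Doc> index, Doc last, int rows) {
        // Seek directly past `last`; earlier rows are never visited.
        NavigableSet<Doc> remaining = (last == null) ? index : index.tailSet(last, false);
        List<Doc> page = new ArrayList<>();
        for (Doc d : remaining) {
            if (page.size() == rows) break;
            page.add(d);
        }
        return page;
    }

    public static void main(String[] args) {
        NavigableSet<Doc> index = new TreeSet<>(ORDER);
        for (int i = 0; i < 5; i++) {
            index.add(new Doc("a00" + (4180000 - i), 5180000.0 - i));
        }
        Doc last = null;
        List<Doc> page;
        while (!(page = nextPage(index, last, 3)).isEmpty()) {
            for (Doc d : page) System.out.println(d.id + " " + d.price);
            System.out.println("-- end of page --");
            last = page.get(page.size() - 1); // this plays the role of the cursor
        }
    }
}
```

Each page costs about the same no matter how deep it is, because the work depends on `rows` and not on how many rows came before.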
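Usage
Using a cursor is simple. In the first query we pass one extra parameter, cursorMark=*, which tells Solr to return a cursor. Besides the search results, the response then carries a nextCursorMark value. Note that the sort must include the uniqueKey field (id in this example) so that the ordering is unambiguous. Here is an example.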
- http://192.168.238.133:8080/solr/collection1/select?q=*:*&rows=3&sort=price desc,id asc&cursorMark=*
The response looks like this:
- <response>
- <lst name="responseHeader">
- <int name="status">0</int>
- <int name="QTime">186</int>
- <lst name="params">
- <str name="sort">price desc,id asc</str>
- <str name="q">*:*</str>
- <str name="cursorMark">*</str>
- <str name="rows">3</str>
- </lst>
- </lst>
- <result name="response" numFound="4160002" start="0">
- <doc>
- <str name="id">a004180000</str>
- <str name="name">ickes_4180000</str>
- <float name="price">5180000.0</float>
- <str name="price_c">5180000.0,USD</str>
- <str name="url">www.eksliang.iteye4180000</str>
- <long name="_version_">1483095619858857993</long>
- </doc>
- <doc>
- <str name="id">a004179999</str>
- <str name="name">ickes_4179999</str>
- <float name="price">5179999.0</float>
- <str name="price_c">5179999.0,USD</str>
- <str name="url">www.eksliang.iteye4179999</str>
- <long name="_version_">1483095619858857992</long>
- </doc>
- <doc>
- <str name="id">a004179998</str>
- <str name="name">ickes_4179998</str>
- <float name="price">5179998.0</float>
- <str name="price_c">5179998.0,USD</str>
- <str name="url">www.eksliang.iteye4179998</str>
- <long name="_version_">1483095619858857991</long>
- </doc>
- </result>
- <str name="nextCursorMark">AoIISp4UvCphMDA0MTc5OTk4</str>
- </response>
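As we can see, in addition to the usual results the response contains the cursor value nextCursorMark. We use this value as the parameter for fetching the next page.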
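If the XML response is handled by hand rather than through a client library, the cursor value has to be pulled out of it. The following is a minimal sketch using only the JDK's DOM parser; the class name NextCursorMarkReader is made up, and it assumes the response shape shown above, with the cursor in a str element directly under response.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical helper, not part of Solr or SolrJ.
final class NextCursorMarkReader {

    // Returns the text of <str name="nextCursorMark"> found directly under
    // the root element, or null when the response carries no cursor.
    static String read(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList children = doc.getDocumentElement().getChildNodes();
            for (int i = 0; i < children.getLength(); i++) {
                Node n = children.item(i);
                if (n instanceof Element
                        && "str".equals(n.getNodeName())
                        && "nextCursorMark".equals(((Element) n).getAttribute("name"))) {
                    return n.getTextContent();
                }
            }
            return null;
        } catch (Exception e) {
            throw new IllegalArgumentException("not a parseable XML response", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<response>"
                + "<result name=\"response\" numFound=\"4160002\" start=\"0\"/>"
                + "<str name=\"nextCursorMark\">AoIISp4UvCphMDA0MTc5OTk4</str>"
                + "</response>";
        System.out.println(read(xml)); // AoIISp4UvCphMDA0MTc5OTk4
    }
}
```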
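To get the next page from here, set cursorMark to the nextCursorMark value returned by the previous response.
For example, the request for the next page now looks like this: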
- http://192.168.238.133:8080/solr/collection1/select?q=*:*&rows=3&sort=price desc,id asc&cursorMark=AoIISp4UvCphMDA0MTc5OTk4
This returns the next page of data:
- <response>
- <lst name="responseHeader">
- <int name="status">0</int>
- <int name="QTime">234</int>
- <lst name="params">
- <str name="sort">price desc,id asc</str>
- <str name="q">*:*</str>
- <str name="cursorMark">AoIISp4UvCphMDA0MTc5OTk4</str>
- <str name="rows">3</str>
- </lst>
- </lst>
- <result name="response" numFound="4160002" start="0">
- <doc>
- <str name="id">a004179997</str>
- <str name="name">ickes_4179997</str>
- <float name="price">5179997.0</float>
- <str name="price_c">5179997.0,USD</str>
- <str name="url">www.eksliang.iteye4179997</str>
- <long name="_version_">1483095619858857990</long>
- </doc>
- <doc>
- <str name="id">a004179996</str>
- <str name="name">ickes_4179996</str>
- <float name="price">5179996.0</float>
- <str name="price_c">5179996.0,USD</str>
- <str name="url">www.eksliang.iteye4179996</str>
- <long name="_version_">1483095619858857989</long>
- </doc>
- <doc>
- <str name="id">a004179995</str>
- <str name="name">ickes_4179995</str>
- <float name="price">5179995.0</float>
- <str name="price_c">5179995.0,USD</str>
- <str name="url">www.eksliang.iteye4179995</str>
- <long name="_version_">1483095619858857988</long>
- </doc>
- </result>
- <str name="nextCursorMark">AoIISp4UtiphMDA0MTc5OTk1</str>
- </response>
From here, fetching further pages is straightforward. Just pass the latest nextCursorMark again:
- http://192.168.238.133:8080/solr/collection1/select?q=*:*&rows=3&sort=price desc,id asc&cursorMark=AoIISp4UtiphMDA0MTc5OTk1
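The URLs above contain raw spaces and commas, and cursor marks are opaque encoded strings that may contain characters such as +, / or =, so when such a URL is built in code it is safer to URL-encode every parameter value. A minimal sketch with the JDK only follows; the class and method names are made up for illustration.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper for building a cursor request URL. Not a Solr API.
final class CursorUrlBuilder {

    // coreUrl is the core's base URL without a trailing slash.
    static String selectUrl(String coreUrl, String q, int rows,
                            String sort, String cursorMark) {
        return coreUrl + "/select"
                + "?q=" + enc(q)
                + "&rows=" + rows
                + "&sort=" + enc(sort)
                + "&cursorMark=" + enc(cursorMark);
    }

    private static String enc(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is a required charset on every JVM.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(selectUrl(
                "http://192.168.238.133:8080/solr/collection1",
                "*:*", 3, "price desc,id asc", "AoIISp4UtiphMDA0MTc5OTk1"));
    }
}
```

For the values used in this article the cursor mark happens to contain only letters and digits, so encoding leaves it unchanged; that is not something to rely on in general.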
SolrJ support for Solr deep paging
Here is the code:
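- import java.util.List;
- import org.apache.solr.client.solrj.SolrQuery;
- import org.apache.solr.client.solrj.SolrQuery.ORDER;
- import org.apache.solr.client.solrj.SolrServerException;
- import org.apache.solr.client.solrj.impl.HttpSolrServer;
- import org.apache.solr.client.solrj.response.QueryResponse;
- import org.apache.solr.common.params.CursorMarkParams;
- // CursorMark is the result bean used by this example (its definition is not shown here).
- static void deepPaging() throws SolrServerException {
-     HttpSolrServer server = new HttpSolrServer("http://192.168.238.133:8080/solr/collection1");
-     server.setSoTimeout(10000);                  // socket read timeout in ms
-     server.setConnectionTimeout(10000);          // connect timeout in ms
-     server.setDefaultMaxConnectionsPerHost(12);
-     server.setAllowCompression(true);
-     SolrQuery query = new SolrQuery();
-     query.setQuery("*:*");
-     query.setRows(4);
-     // A cursor requires a sort that includes the uniqueKey field (id).
-     query.addSort("price", ORDER.desc).addSort("id", ORDER.desc);
-     // CURSOR_MARK_START is "*", the value for the first request.
-     query.set(CursorMarkParams.CURSOR_MARK_PARAM, CursorMarkParams.CURSOR_MARK_START);
-     QueryResponse rsp = server.query(query);
-     List<CursorMark> beans = rsp.getBeans(CursorMark.class);
-     System.out.println(rsp.getNextCursorMark()); // cursor for the next page
-     for (CursorMark cursorMark : beans) {
-         System.out.println(cursorMark);
-     }
- }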
The output is:
- AoIISp4UuiphMDA0MTc5OTk3
- CursorMark [id=a004180000, name=ickes_4180000, price=5180000.0, url=www.eksliang.iteye4180000]
- CursorMark [id=a004179999, name=ickes_4179999, price=5179999.0, url=www.eksliang.iteye4179999]
- CursorMark [id=a004179998, name=ickes_4179998, price=5179998.0, url=www.eksliang.iteye4179998]
- CursorMark [id=a004179997, name=ickes_4179997, price=5179997.0, url=www.eksliang.iteye4179997]
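The SolrJ snippet above fetches only the first page. To walk an entire result set, the usual pattern is to keep feeding nextCursorMark back in until Solr returns the same value that was sent, which signals that there are no more results. The sketch below shows that loop in isolation. PageFetcher, Page and the in-memory fetcher are hypothetical stand-ins so the example runs without a Solr server; with SolrJ, fetch would wrap server.query(...) and rsp.getNextCursorMark().

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Sketch of the cursor loop. PageFetcher and Page are made up for illustration.
final class CursorWalk {

    // One page of results plus the cursor returned with it.
    static final class Page {
        final List<String> ids;
        final String nextCursorMark;
        Page(List<String> ids, String nextCursorMark) {
            this.ids = ids;
            this.nextCursorMark = nextCursorMark;
        }
    }

    // Stand-in for "run the query with this cursorMark".
    interface PageFetcher {
        Page fetch(String cursorMark);
    }

    static List<String> fetchAll(PageFetcher fetcher) {
        List<String> all = new ArrayList<>();
        String cursorMark = "*"; // start value for the first request
        while (true) {
            Page page = fetcher.fetch(cursorMark);
            all.addAll(page.ids);
            if (cursorMark.equals(page.nextCursorMark)) {
                break; // cursor did not advance: no more results
            }
            cursorMark = page.nextCursorMark;
        }
        return all;
    }

    // Fake backend: the "cursor" is just an index encoded as a string.
    static PageFetcher inMemory(List<String> ids, int rows) {
        return cursorMark -> {
            int from = "*".equals(cursorMark) ? 0 : Integer.parseInt(cursorMark);
            int to = Math.min(from + rows, ids.size());
            if (from >= to) {
                return new Page(new ArrayList<>(), cursorMark); // end: cursor unchanged
            }
            return new Page(new ArrayList<>(ids.subList(from, to)), String.valueOf(to));
        };
    }

    public static void main(String[] args) {
        List<String> ids = Arrays.asList("a", "b", "c", "d", "e", "f", "g");
        System.out.println(fetchAll(inMemory(ids, 3))); // [a, b, c, d, e, f, g]
    }
}
```

Comparing the two cursor values, rather than checking for an empty page, keeps the stop condition tied to what the server reports.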