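Notes on an HBase Row Key Filter Incident
Total data: 746 rows
A later step needs to run a composition algorithm that Spark does not currently support well, so I wrote that part by hand and queried HBase directly from Java.
To speed up the queries, I used filters wherever I could.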
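1: Initial HBase rowkey layout: time + "_" + order id
Query approach:
1: Retrieve quickly and reduce GC pressure by using filters
2: Support queries over a time range
Based on these two points I filtered on time, for example startTime=201904010000 and endTime=201904180000| (note the trailing "|" character: in byte order it sorts after "_" and the digits, so every key that starts with 201904180000_ still falls at or below the upper bound). The bounds are then compared with row key filters using
CompareFilter.CompareOp.GREATER_OR_EQUAL and
CompareFilter.CompareOp.LESS_OR_EQUAL.
The query code adds the row key filters like this: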
FilterList filterList = new FilterList();
// time+id
if (startTime != null) {
    RowFilter rf = new RowFilter(CompareFilter.CompareOp.GREATER_OR_EQUAL,
            new BinaryComparator(Bytes.toBytes(startTime)));
    filterList.addFilter(rf);
}
if (endTime != null) {
    RowFilter rf = new RowFilter(CompareFilter.CompareOp.LESS_OR_EQUAL,
            new BinaryComparator(Bytes.toBytes(endTime)));
    filterList.addFilter(rf);
}
scan.setFilter(filterList);
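The byte-wise comparison that these two bounds rely on can be sketched in plain Java, without a cluster. This is only an illustration under stated assumptions: `RowKeyRange`, `compareBytes`, and `inRange` are hypothetical names that are not part of HBase, and the sample row keys are made up. It shows which time-first keys fall between the start bound and the "|"-terminated end bound when keys are compared as unsigned bytes.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of HBase: mimics an unsigned, byte-wise
// lexicographic comparison of row keys on the client side.
class RowKeyRange {

    // Compare two byte arrays lexicographically, treating bytes as unsigned.
    static int compareBytes(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int x = a[i] & 0xff;
            int y = b[i] & 0xff;
            if (x != y) {
                return x - y;
            }
        }
        // On a shared prefix, the shorter array sorts first.
        return a.length - b.length;
    }

    // True when start <= rowKey <= end in byte order.
    static boolean inRange(String rowKey, String start, String end) {
        byte[] k = rowKey.getBytes(StandardCharsets.UTF_8);
        return compareBytes(k, start.getBytes(StandardCharsets.UTF_8)) >= 0
                && compareBytes(k, end.getBytes(StandardCharsets.UTF_8)) <= 0;
    }

    public static void main(String[] args) {
        String start = "201904010000";
        String end = "201904180000|";
        // Made-up sample keys in the time + "_" + order id layout.
        System.out.println(inRange("201904010000_1001", start, end));
        System.out.println(inRange("201904180000_1002", start, end));
        System.out.println(inRange("201904180001_1003", start, end));
    }
}
```

The first two sample keys are inside the range and the third is not, because "_" (0x5F) is smaller than "|" (0x7C) while 201904180001 is already past the end time. Running a check like this over the real row keys is one way to see how many rows the range ought to return.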
Full code:
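import java.io.IOException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.filter.BinaryComparator;
import org.apache.hadoop.hbase.filter.CompareFilter;
import org.apache.hadoop.hbase.filter.FilterList;
import org.apache.hadoop.hbase.filter.RowFilter;
import org.apache.hadoop.hbase.util.Bytes;

/**
 * Row key filter (time + "_" + order id layout).
 * `config` is the HBase Configuration held by the enclosing class.
 */
public static List<Map<String, String>> rowFilter(String tableName, String startTime, String endTime) {
    Connection connection = null;
    Scan scan = new Scan();
    scan.setCacheBlocks(false);
    ResultScanner rs = null;
    Table table = null;
    List<Map<String, String>> list = new ArrayList<Map<String, String>>();
    try {
        connection = ConnectionFactory.createConnection(config);
        table = connection.getTable(TableName.valueOf(tableName));
        FilterList filterList = new FilterList();
        // time+id: keep rows whose key lies between startTime and endTime
        if (startTime != null) {
            RowFilter rf = new RowFilter(CompareFilter.CompareOp.GREATER_OR_EQUAL,
                    new BinaryComparator(Bytes.toBytes(startTime)));
            filterList.addFilter(rf);
        }
        if (endTime != null) {
            RowFilter rf = new RowFilter(CompareFilter.CompareOp.LESS_OR_EQUAL,
                    new BinaryComparator(Bytes.toBytes(endTime)));
            filterList.addFilter(rf);
        }
        scan.setFilter(filterList);
        rs = table.getScanner(scan);
        for (Result r : rs) {
            // One map per row: column qualifier -> cell value
            Map<String, String> map = new HashMap<String, String>();
            for (Cell cell : r.listCells()) {
                map.put(Bytes.toString(cell.getQualifierArray(), cell.getQualifierOffset(), cell.getQualifierLength()),
                        Bytes.toString(cell.getValueArray(), cell.getValueOffset(), cell.getValueLength()));
            }
            list.add(map);
        }
    } catch (Exception e) {
        e.printStackTrace();
    } finally {
        if (null != rs) {
            rs.close();
        }
        try {
            if (null != table) {
                table.close();
            }
            if (null != connection && !connection.isClosed()) {
                System.out.println("scan Result is closed");
                connection.close();
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
    return list;
}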
This approach returned only 361 rows, while the HBase test table actually holds 746, so the row key filter must be at fault (I will look into the exact cause later).
Improvement:
Change the rowkey layout to order id + "_" + time.
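A small sketch of what this layout change does to the sort order of the keys. It is an illustration only: `RowKeyLayout` and `sorted` are hypothetical names, the keys are made up, and a `TreeSet` of strings stands in for byte-ordered row keys (for ASCII keys like these, natural String order and byte order agree).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

// Hypothetical helper, not part of HBase: sorts sample keys the way
// byte-ordered row keys would be sorted (valid for ASCII-only keys).
class RowKeyLayout {

    static List<String> sorted(List<String> keys) {
        return new ArrayList<>(new TreeSet<>(keys));
    }

    public static void main(String[] args) {
        // time + "_" + order id: sorted order follows the time.
        System.out.println(sorted(Arrays.asList("201904020000_7", "201904010000_9")));
        // order id + "_" + time: sorted order follows the order id.
        System.out.println(sorted(Arrays.asList("7_201904020000", "9_201904010000")));
    }
}
```

With the id-first layout the keys are ordered by order id, so rows from one time range are no longer adjacent and the time has to be recognized from the suffix of each key.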
The filter code then changes to:
FilterList filterList = new FilterList();
// id+time
if (startTime != null) {
    RowFilter rf = new RowFilter(CompareFilter.CompareOp.GREATER_OR_EQUAL,
            new RegexStringComparator(".*_" + startTime));
    filterList.addFilter(rf);
}
if (endTime != null) {
    RowFilter rf = new RowFilter(CompareFilter.CompareOp.LESS_OR_EQUAL,
            new RegexStringComparator(".*_" + endTime));
    filterList.addFilter(rf);
}
scan.setFilter(filterList);
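This uses a regular expression to recognize the suffix of the row key, so that the time stored there can be used for filtering. One caveat: the HBase Javadoc states that RegexStringComparator is only valid with EQUAL or NOT_EQUAL, so GREATER_OR_EQUAL and LESS_OR_EQUAL here do not perform an ordered comparison on the time suffix, and the returned rows are worth checking against the requested time range.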
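The difference between matching a suffix pattern and comparing a time range can be sketched on the client side with `java.util.regex`. This is a minimal illustration under stated assumptions, not HBase's implementation: `SuffixTimeFilter` and its methods are hypothetical names, the keys are made up, and the pattern simply mirrors the `".*_" + time` expression used in the filter code.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of HBase: contrasts a regex suffix match
// with an ordered comparison on the time part of an id + "_" + time key.
class SuffixTimeFilter {

    // A regex can only answer "matches" or "does not match".
    static boolean matchesSuffixPattern(String rowKey, String time) {
        return Pattern.compile(".*_" + time).matcher(rowKey).find();
    }

    // Extract the text after the last underscore (the time part).
    static String timeOf(String rowKey) {
        return rowKey.substring(rowKey.lastIndexOf('_') + 1);
    }

    // Ordered comparison on the extracted, fixed-width time string.
    static boolean timeInRange(String rowKey, String start, String end) {
        String t = timeOf(rowKey);
        return t.compareTo(start) >= 0 && t.compareTo(end) <= 0;
    }

    public static void main(String[] args) {
        String key = "1001_201904050000";
        System.out.println(matchesSuffixPattern(key, "201904010000"));
        System.out.println(timeInRange(key, "201904010000", "201904180000"));
    }
}
```

For the sample key, the pattern built from the start time does not match, while the extracted time does lie inside the range. A check like this, run over the returned row keys, is one way to verify that a suffix-based filter really selects the intended time window.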
Full code:
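import java.io.IOException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.filter.CompareFilter;
import org.apache.hadoop.hbase.filter.FilterList;
import org.apache.hadoop.hbase.filter.RegexStringComparator;
import org.apache.hadoop.hbase.filter.RowFilter;
import org.apache.hadoop.hbase.util.Bytes;

/**
 * Row key filter (order id + "_" + time layout).
 * `config` is the HBase Configuration held by the enclosing class.
 */
public static List<Map<String, String>> rowEndFilter(String tableName, String startTime, String endTime) {
    Connection connection = null;
    Scan scan = new Scan();
    scan.setCacheBlocks(false);
    ResultScanner rs = null;
    Table table = null;
    List<Map<String, String>> list = new ArrayList<Map<String, String>>();
    try {
        connection = ConnectionFactory.createConnection(config);
        table = connection.getTable(TableName.valueOf(tableName));
        FilterList filterList = new FilterList();
        // id+time: match the time suffix of the row key with a regex
        if (startTime != null) {
            RowFilter rf = new RowFilter(CompareFilter.CompareOp.GREATER_OR_EQUAL,
                    new RegexStringComparator(".*_" + startTime));
            filterList.addFilter(rf);
        }
        if (endTime != null) {
            RowFilter rf = new RowFilter(CompareFilter.CompareOp.LESS_OR_EQUAL,
                    new RegexStringComparator(".*_" + endTime));
            filterList.addFilter(rf);
        }
        scan.setFilter(filterList);
        rs = table.getScanner(scan);
        for (Result r : rs) {
            // One map per row: column qualifier -> cell value
            Map<String, String> map = new HashMap<String, String>();
            for (Cell cell : r.listCells()) {
                map.put(Bytes.toString(cell.getQualifierArray(), cell.getQualifierOffset(), cell.getQualifierLength()),
                        Bytes.toString(cell.getValueArray(), cell.getValueOffset(), cell.getValueLength()));
            }
            list.add(map);
        }
    } catch (Exception e) {
        e.printStackTrace();
    } finally {
        if (null != rs) {
            rs.close();
        }
        try {
            if (null != table) {
                table.close();
            }
            if (null != connection && !connection.isClosed()) {
                System.out.println("scan Result is closed");
                connection.close();
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
    return list;
}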
With this version the query returns the complete data set.