MyCat Source Code Analysis Series — Result Merging
For more MyCat source code analysis, see the MyCat Source Code Analysis series.
Result Merging
As described in the SQL dispatch flow and the frontend/backend authentication flow, the NIOHandler bound to a backend connection that has passed authentication is a MySQLConnectionHandler instance. When the MySQL server returns an execution result, MySQLConnectionHandler.handleData() is invoked to dispatch the different packet types for handling:
protected void handleData(byte[] data) {
    switch (resultStatus) {
    case RESULT_STATUS_INIT:
        switch (data[4]) {
        case OkPacket.FIELD_COUNT:
            handleOkPacket(data);
            break;
        case ErrorPacket.FIELD_COUNT:
            handleErrorPacket(data);
            break;
        case RequestFilePacket.FIELD_COUNT:
            handleRequestPacket(data);
            break;
        default:
            resultStatus = RESULT_STATUS_HEADER;
            header = data;
            fields = new ArrayList<byte[]>((int) ByteUtil.readLength(data, 4));
        }
        break;
    case RESULT_STATUS_HEADER:
        switch (data[4]) {
        case ErrorPacket.FIELD_COUNT:
            resultStatus = RESULT_STATUS_INIT;
            handleErrorPacket(data);
            break;
        case EOFPacket.FIELD_COUNT:
            resultStatus = RESULT_STATUS_FIELD_EOF;
            handleFieldEofPacket(data);
            break;
        default:
            fields.add(data);
        }
        break;
    case RESULT_STATUS_FIELD_EOF:
        switch (data[4]) {
        case ErrorPacket.FIELD_COUNT:
            resultStatus = RESULT_STATUS_INIT;
            handleErrorPacket(data);
            break;
        case EOFPacket.FIELD_COUNT:
            resultStatus = RESULT_STATUS_INIT;
            handleRowEofPacket(data);
            break;
        default:
            handleRowPacket(data);
        }
        break;
    default:
        throw new RuntimeException("unknown status!");
    }
}
The core methods in the snippet above are handleOkPacket(), which handles the results of insert/update/delete and any other statement that returns an OK packet, and handleFieldEofPacket(), handleRowPacket() and handleRowEofPacket(), which handle the results of select statements. Internally, each of these methods simply invokes the corresponding method of the ResponseHandler instance (a SingleNodeHandler or a MultiNodeQueryHandler) bound to the connection; a minimal sketch of this delegation follows.
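The sketch below is illustrative only (simplified types, not MyCat's real interfaces): the backend handler keeps a reference to the ResponseHandler of the statement currently running on the connection and forwards each decoded packet to it.

public class DispatchSketch {

    // stands in for SingleNodeHandler / MultiNodeQueryHandler
    interface ResponseHandlerSketch {
        void rowResponse(byte[] row);
    }

    private volatile ResponseHandlerSketch responseHandler;

    void bind(ResponseHandlerSketch handler) {
        this.responseHandler = handler;
    }

    void handleRowPacket(byte[] data) {
        ResponseHandlerSketch respHand = responseHandler;
        if (respHand != null) {
            respHand.rowResponse(data);   // delegate, just like handleOkPacket() etc.
        }
    }

    public static void main(String[] args) {
        DispatchSketch backend = new DispatchSketch();
        backend.bind(row -> System.out.println("row packet of " + row.length + " bytes"));
        backend.handleRowPacket(new byte[] { 5, 0, 0, 4, 1 });  // fake packet
    }
}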
1. Let's look at the single-node case first. SingleNodeHandler implements these methods as follows:
public void okResponse(byte[] data, BackendConnection conn) {
    boolean executeResponse = conn.syncAndExcute();
    if (executeResponse) {
        session.releaseConnectionIfSafe(conn, LOGGER.isDebugEnabled(), false);
        endRunning();
        ServerConnection source = session.getSource();
        OkPacket ok = new OkPacket();
        ok.read(data);
        if (rrs.isLoadData()) {
            byte lastPackId = source.getLoadDataInfileHandler().getLastPackId();
            ok.packetId = ++lastPackId;// OK_PACKET
            source.getLoadDataInfileHandler().clear();
        } else {
            ok.packetId = ++packetId;// OK_PACKET
        }
        ok.serverStatus = source.isAutocommit() ? 2 : 1;
        recycleResources();
        source.setLastInsertId(ok.insertId);
        ok.write(source);

        //TODO: add by zhuam
        // dispatch the query result
        QueryResult queryResult = new QueryResult(session.getSource().getUser(),
                rrs.getSqlType(), rrs.getStatement(), startTime);
        QueryResultDispatcher.dispatchQuery(queryResult);
    }
}

public void fieldEofResponse(byte[] header, List<byte[]> fields, byte[] eof,
        BackendConnection conn) {
    //TODO: add by zhuam
    // dispatch the query result
    QueryResult queryResult = new QueryResult(session.getSource().getUser(),
            rrs.getSqlType(), rrs.getStatement(), startTime);
    QueryResultDispatcher.dispatchQuery(queryResult);

    header[3] = ++packetId;
    ServerConnection source = session.getSource();
    buffer = source.writeToBuffer(header, allocBuffer());
    for (int i = 0, len = fields.size(); i < len; ++i) {
        byte[] field = fields.get(i);
        field[3] = ++packetId;
        buffer = source.writeToBuffer(field, buffer);
    }
    eof[3] = ++packetId;
    buffer = source.writeToBuffer(eof, buffer);
    if (isDefaultNodeShowTable) {
        for (String name : shardingTablesSet) {
            RowDataPacket row = new RowDataPacket(1);
            row.add(StringUtil.encode(name.toLowerCase(), source.getCharset()));
            row.packetId = ++packetId;
            buffer = row.write(buffer, source, true);
        }
    }
}

public void rowResponse(byte[] row, BackendConnection conn) {
    if (isDefaultNodeShowTable) {
        RowDataPacket rowDataPacket = new RowDataPacket(1);
        rowDataPacket.read(row);
        String table = StringUtil.decode(rowDataPacket.fieldValues.get(0), conn.getCharset());
        if (shardingTablesSet.contains(table.toUpperCase()))
            return;
    }
    row[3] = ++packetId;
    buffer = session.getSource().writeToBuffer(row, allocBuffer());
}

public void rowEofResponse(byte[] eof, BackendConnection conn) {
    ServerConnection source = session.getSource();
    conn.recordSql(source.getHost(), source.getSchema(), node.getStatement());
    // if this is a stored procedure call, the connection must not be released here
    if (!rrs.isCallStatement()) {
        session.releaseConnectionIfSafe(conn, LOGGER.isDebugEnabled(), false);
        endRunning();
    }
    eof[3] = ++packetId;
    buffer = source.writeToBuffer(eof, allocBuffer());
    source.write(buffer);
}
okResponse() starts by calling conn.syncAndExcute(). This resolves a point mentioned earlier in the SQL dispatch flow: when some of a connection's existing settings have to be changed, MyCat does not wait at dispatch time for those change commands to return successfully; the check is only made here:
public boolean syncAndExcute() {
    StatusSync sync = this.statusSync;
    if (sync == null) {
        return true;
    } else {
        boolean executed = sync.synAndExecuted(this);
        if (executed) {
            statusSync = null;
        }
        return executed;
    }
}
This in turn calls StatusSync.synAndExecuted() and updateConnectionInfo():
public boolean synAndExecuted(MySQLConnection conn) {
    int remains = synCmdCount.decrementAndGet();
    if (remains == 0) {// syn command finished
        this.updateConnectionInfo(conn);
        conn.metaDataSyned = true;
        return false;
    } else if (remains < 0) {
        return true;
    }
    return false;
}

private void updateConnectionInfo(MySQLConnection conn) {
    conn.xaStatus = (xaStarted == true) ? 1 : 0;
    if (schema != null) {
        conn.schema = schema;
        conn.oldSchema = conn.schema;
    }
    if (charsetIndex != null) {
        conn.setCharset(CharsetUtil.getCharset(charsetIndex));
    }
    if (txtIsolation != null) {
        conn.txIsolation = txtIsolation;
    }
    if (autocommit != null) {
        conn.autocommit = autocommit;
    }
}
Suppose the current connection differs from the required one in both database (schema) and character set. Two sync commands are then needed, and if both changes succeed MySQL returns two OK packets, triggering SingleNodeHandler.okResponse() twice. synAndExecuted() counts the OK packets received via synCmdCount; once all of them have arrived it calls updateConnectionInfo() to apply the new settings to the connection. Only after this synchronization completes does processing move on to the result of the SQL statement itself (select/insert/update/delete); the small trace below walks through this counting.
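A self-contained trace (illustrative only, not MyCat code) of the scenario above: synCmdCount starts at 2 for the schema and charset sync commands, and a third OK packet then arrives for the user's actual statement.

import java.util.concurrent.atomic.AtomicInteger;

public class SyncCountTrace {
    public static void main(String[] args) {
        AtomicInteger synCmdCount = new AtomicInteger(2); // two pending sync commands
        String[] okPackets = { "OK for USE schema", "OK for SET charset", "OK for the INSERT" };
        for (String ok : okPackets) {
            int remains = synCmdCount.decrementAndGet();
            if (remains == 0) {
                System.out.println(ok + " -> sync finished, updateConnectionInfo()");
            }
            boolean executed = remains < 0;  // same outcome as synAndExecuted()
            System.out.println(ok + " -> executed = " + executed);
        }
        // Output: executed = false, false, true -- only the third OK packet
        // belongs to the user's SQL and is processed as its real result.
    }
}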
1.1 insert/update/delete
- okResponse(): reads the data byte array into an OkPacket and calls ok.write(source), which puts the result into the write queue (writeQueue) of the frontend connection (FrontendConnection); the actual send to the client is performed by the corresponding NIOSocketWR, which takes ByteBuffers off the write queue and writes them out.
1.2 select
- fieldEofResponse(): triggered when the column metadata arrives; writes the header and the field packets into the buffer in order;
- rowResponse(): triggered when a row arrives; writes the row data into the buffer;
- rowEofResponse(): triggered when the end-of-rows marker arrives; writes the EOF packet into the buffer and finally calls source.write(buffer) to put the buffer into the frontend connection's write queue, from which NIOSocketWR sends it to the client.
2. Now let's look at how results are merged and returned for multi-node operations, which is the job of MultiNodeQueryHandler.
Multi-node operations differ from single-node operations in that:
1) results arrive separately from several MySQL nodes, and all of them may simply need to be combined (the overall ordering is non-deterministic; rows are handled first come, first served);
2) if the statement involves aggregate functions, group by, order by or limit, the results additionally have to go through a series of merge steps.
For the first case, insert/update/delete and select differ as follows:
- insert/update/delete: each of these statements returns an OK packet whose key field is affectedRows, so every time a MySQL node's OK packet arrives its affectedRows is added to a running total; once the last node's result has been received (determined via decrementOkCountBy()), the aggregated result is returned to the client (a simplified sketch of this accumulation follows the list);
- select: each MySQL node returns, in order, the column metadata, row 1, row 2, ..., row n, and an end-of-rows marker (EOF). The metadata and the EOF are identical for every node, but since MultiNodeQueryHandler merges all the data, only one copy of the metadata, all nodes' rows and a single EOF are actually needed. fieldEofResponse() therefore uses a boolean flag, fieldsReturned, to keep the first metadata it receives and discard the rest, while rowEofResponse() uses decrementCountBy() to check whether EOF packets have arrived from all nodes, and only then writes the final EOF to the buffer and returns it to the client.
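The following is a minimal, self-contained sketch of the accumulate-and-count-down pattern from the first bullet; it is not MultiNodeQueryHandler's actual code (the real okResponse() also releases connections, tracks insertId, handles autocommit, and so on), and the names used here are illustrative.

import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicLong;

public class OkMergeSketch {
    private final AtomicInteger nodeCount;                 // nodes that still owe an OK packet
    private final AtomicLong affectedRows = new AtomicLong();

    OkMergeSketch(int nodes) {
        this.nodeCount = new AtomicInteger(nodes);
    }

    /** Called once per backend OK packet; returns the merged total for the last one, else -1. */
    long onOkPacket(long nodeAffectedRows) {
        affectedRows.addAndGet(nodeAffectedRows);          // accumulate every node's affectedRows
        if (nodeCount.decrementAndGet() == 0) {            // plays the role of decrementOkCountBy()
            return affectedRows.get();                     // last node: write one aggregated OK packet
        }
        return -1;
    }

    public static void main(String[] args) {
        OkMergeSketch merge = new OkMergeSketch(3);        // statement routed to 3 nodes
        System.out.println(merge.onOkPacket(2));           // -1: still waiting
        System.out.println(merge.onOkPacket(0));           // -1: still waiting
        System.out.println(merge.onOkPacket(5));           // 7: client sees affectedRows = 7
    }
}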
For the second case, insert/update/delete is handled exactly as in the first case, but for select the DataMergeService instance held by MultiNodeQueryHandler merges the result set (limit is actually applied inside MultiNodeQueryHandler). Let's first look at the core fields of DataMergeService:
private int fieldCount;                              // number of columns
private RouteResultset rrs;                          // routing information
private RowDataSorter sorter;                        // sorter
private RowDataPacketGrouper grouper;                // grouper
private volatile boolean hasOrderBy = false;
private MultiNodeQueryHandler multiQueryHandler;
public PackWraper END_FLAG_PACK = new PackWraper();  // end-of-stream marker packet
private AtomicInteger areadyAdd = new AtomicInteger();
private List<RowDataPacket> result = new Vector<RowDataPacket>();  // list holding the merged result
private static Logger LOGGER = Logger.getLogger(DataMergeService.class);
private BlockingQueue<PackWraper> packs = new LinkedBlockingQueue<PackWraper>();  // queue of wrapped raw row data
private ConcurrentHashMap<String, Boolean> canDiscard = new ConcurrentHashMap<String, Boolean>();  // per-node discard flags
The most important of these are sorter (used for ordering), grouper (used for grouping) and result, the list of row packets that holds the final merged result. The core methods of DataMergeService are implemented as follows:
public void onRowMetaData(Map<String, ColMeta> columToIndx, int fieldCount) {
    if (LOGGER.isDebugEnabled()) {
        LOGGER.debug("field metadata inf:" + columToIndx.entrySet());
    }
    int[] groupColumnIndexs = null;
    this.fieldCount = fieldCount;
    if (rrs.getGroupByCols() != null) {
        groupColumnIndexs = toColumnIndex(rrs.getGroupByCols(), columToIndx);
    }
    if (rrs.getHavingCols() != null) {
        ColMeta colMeta = columToIndx.get(rrs.getHavingCols().getLeft().toUpperCase());
        if (colMeta != null) {
            rrs.getHavingCols().setColMeta(colMeta);
        }
    }
    if (rrs.isHasAggrColumn()) {
        List<MergeCol> mergCols = new LinkedList<MergeCol>();
        Map<String, Integer> mergeColsMap = rrs.getMergeCols();
        if (mergeColsMap != null) {
            for (Map.Entry<String, Integer> mergEntry : mergeColsMap.entrySet()) {
                String colName = mergEntry.getKey().toUpperCase();
                int type = mergEntry.getValue();
                if (MergeCol.MERGE_AVG == type) {
                    ColMeta sumColMeta = columToIndx.get(colName + "SUM");
                    ColMeta countColMeta = columToIndx.get(colName + "COUNT");
                    if (sumColMeta != null && countColMeta != null) {
                        ColMeta colMeta = new ColMeta(sumColMeta.colIndex,
                                countColMeta.colIndex, sumColMeta.getColType());
                        mergCols.add(new MergeCol(colMeta, mergEntry.getValue()));
                    }
                } else {
                    ColMeta colMeta = columToIndx.get(colName);
                    mergCols.add(new MergeCol(colMeta, mergEntry.getValue()));
                }
            }
        }
        // add no alias merg column
        for (Map.Entry<String, ColMeta> fieldEntry : columToIndx.entrySet()) {
            String colName = fieldEntry.getKey();
            int result = MergeCol.tryParseAggCol(colName);
            if (result != MergeCol.MERGE_UNSUPPORT && result != MergeCol.MERGE_NOMERGE) {
                mergCols.add(new MergeCol(fieldEntry.getValue(), result));
            }
        }
        grouper = new RowDataPacketGrouper(groupColumnIndexs,
                mergCols.toArray(new MergeCol[mergCols.size()]),
                rrs.getHavingCols());
    }
    if (rrs.getOrderByCols() != null) {
        LinkedHashMap<String, Integer> orders = rrs.getOrderByCols();
        OrderCol[] orderCols = new OrderCol[orders.size()];
        int i = 0;
        for (Map.Entry<String, Integer> entry : orders.entrySet()) {
            String key = StringUtil.removeBackquote(entry.getKey().toUpperCase());
            ColMeta colMeta = columToIndx.get(key);
            if (colMeta == null) {
                throw new java.lang.IllegalArgumentException(
                        "all columns in order by clause should be in the selected column list!"
                                + entry.getKey());
            }
            orderCols[i++] = new OrderCol(colMeta, entry.getValue());
        }
        // sorter = new RowDataPacketSorter(orderCols);
        RowDataSorter tmp = new RowDataSorter(orderCols);
        tmp.setLimit(rrs.getLimitStart(), rrs.getLimitSize());
        hasOrderBy = true;
        sorter = tmp;
    } else {
        hasOrderBy = false;
    }
    MycatServer.getInstance().getBusinessExecutor().execute(this);
}

public boolean onNewRecord(String dataNode, byte[] rowData) {
    // For sorted queries the rows each MySQL node returns are already ordered,
    // so once a node's current row can no longer enter the heap,
    // none of its later rows can either.
    if (canDiscard.size() == rrs.getNodes().length) {
        LOGGER.error("now we output to client");
        packs.add(END_FLAG_PACK);
        return true;
    }
    if (canDiscard.get(dataNode) != null) {
        return true;
    }
    PackWraper data = new PackWraper();
    data.node = dataNode;
    data.data = rowData;
    packs.add(data);
    areadyAdd.getAndIncrement();
    return false;
}

public void run() {
    int warningCount = 0;
    EOFPacket eofp = new EOFPacket();
    ByteBuffer eof = ByteBuffer.allocate(9);
    BufferUtil.writeUB3(eof, eofp.calcPacketSize());
    eof.put(eofp.packetId);
    eof.put(eofp.fieldCount);
    BufferUtil.writeUB2(eof, warningCount);
    BufferUtil.writeUB2(eof, eofp.status);
    ServerConnection source = multiQueryHandler.getSession().getSource();
    while (!Thread.interrupted()) {
        try {
            PackWraper pack = packs.take();
            if (pack == END_FLAG_PACK) {
                break;
            }
            RowDataPacket row = new RowDataPacket(fieldCount);
            row.read(pack.data);
            if (grouper != null) {
                grouper.addRow(row);
            } else if (sorter != null) {
                if (!sorter.addRow(row)) {
                    canDiscard.put(pack.node, true);
                }
            } else {
                result.add(row);
            }
        } catch (Exception e) {
            LOGGER.error("Merge multi data error", e);
        }
    }
    byte[] array = eof.array();
    multiQueryHandler.outputMergeResult(source, array, getResults(array));
}

private List<RowDataPacket> getResults(byte[] eof) {
    List<RowDataPacket> tmpResult = result;
    if (this.grouper != null) {
        tmpResult = grouper.getResult();
        grouper = null;
    }
    if (sorter != null) {
        // feed the rows produced by the grouper into the sorter
        if (tmpResult != null) {
            Iterator<RowDataPacket> itor = tmpResult.iterator();
            while (itor.hasNext()) {
                sorter.addRow(itor.next());
                itor.remove();
            }
        }
        tmpResult = sorter.getSortedResult();
        sorter = null;
    }
    if (LOGGER.isDebugEnabled()) {
        LOGGER.debug("prepare mpp merge result for " + rrs.getStatement());
    }
    return tmpResult;
}
Let's go through these methods one by one:
- onRowMetaData(): called from MultiNodeQueryHandler.fieldEofResponse(); initializes grouper and sorter and then submits run() to the business thread pool;
- onNewRecord(): called from MultiNodeQueryHandler.rowResponse(). It first checks whether canDiscard already contains as many entries as there are target nodes; if so, all remaining data from every node would be discarded anyway, so an END_FLAG_PACK is put into packs to terminate the loop in run() (the other, more common way the loop is terminated is from MultiNodeQueryHandler.rowEofResponse(), once the EOF packets of all nodes have been received). If the current node is already in canDiscard, its subsequent rows are likewise ignored. Otherwise the node name and row data are wrapped in a PackWraper instance and put into packs;
- run(): the core is a loop that blocks on packs, takes one PackWraper at a time and turns it into a RowDataPacket; if it is END_FLAG_PACK the loop exits, otherwise the row goes through the following if-else chain:
1) if grouping is needed, the row is added via grouper.addRow(). Grouping takes precedence over sorting, so whenever grouping is required, sorting can only start after all grouping has finished (in getResults());
2) otherwise, if sorting is needed, sorter.addRow() tries to add the row. If the row is rejected, none of that node's later rows can be accepted either (the sort is built on a bounded max-heap: once the heap is full it evicts entries, and each node's stream is already sorted), so the node is put into canDiscard and its remaining data is ignored (see the top-N heap sketch after this list);
3) otherwise the row is simply appended to result.
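The discard logic in 2) relies on a bounded heap. Below is a self-contained sketch of the idea only, not MyCat's RowDataSorter (which works on RowDataPacket and OrderCol): keep at most N rows for an "ORDER BY ... LIMIT N" query in a max-heap; addRow() returns false as soon as a row cannot displace the current worst row, and because each node's stream is already sorted, all of that node's later rows can be skipped.

import java.util.Comparator;
import java.util.PriorityQueue;

public class TopNSorter<T> {
    private final int capacity;                  // the LIMIT n
    private final Comparator<T> order;           // ascending order of the final result
    private final PriorityQueue<T> maxHeap;      // worst (largest) kept row sits at the head

    public TopNSorter(int capacity, Comparator<T> order) {
        this.capacity = capacity;
        this.order = order;
        this.maxHeap = new PriorityQueue<>(capacity, order.reversed());
    }

    /** Returns false when the row is rejected; later rows of the same sorted stream can be skipped. */
    public boolean addRow(T row) {
        if (maxHeap.size() < capacity) {
            maxHeap.offer(row);
            return true;
        }
        if (order.compare(row, maxHeap.peek()) < 0) {   // beats the current worst row
            maxHeap.poll();
            maxHeap.offer(row);
            return true;
        }
        return false;                                   // heap full and row is not better
    }

    public static void main(String[] args) {
        TopNSorter<Integer> top3 = new TopNSorter<>(3, Comparator.<Integer>naturalOrder());
        for (int row : new int[] { 2, 5, 9, 12 }) {     // one node's already-sorted stream
            if (!top3.addRow(row)) {
                System.out.println("row " + row + " rejected -> discard the rest of this node");
                break;                                  // mirrors canDiscard.put(node, true)
            }
        }
    }
}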
When the loop exits, MultiNodeQueryHandler.outputMergeResult() is invoked with the rows obtained from getResults() (grouped / grouped-and-sorted / sorted / plain data). outputMergeResult() exists to apply the limit clause; it then writes the processed result set into the buffer row by row and finally returns it to the client:
public void outputMergeResult(final ServerConnection source, final byte[] eof,
        List<RowDataPacket> results) {
    try {
        lock.lock();
        ByteBuffer buffer = session.getSource().allocate();
        final RouteResultset rrs = this.dataMergeSvr.getRrs();

        // apply the limit clause
        int start = rrs.getLimitStart();
        int end = start + rrs.getLimitSize();
        if (start < 0)
            start = 0;
        if (rrs.getLimitSize() < 0)
            end = results.size();
        if (end > results.size())
            end = results.size();

        for (int i = start; i < end; i++) {
            RowDataPacket row = results.get(i);
            row.packetId = ++packetId;
            buffer = row.write(buffer, source, true);
        }

        eof[3] = ++packetId;
        if (LOGGER.isDebugEnabled()) {
            LOGGER.debug("last packet id:" + packetId);
        }
        source.write(source.writeToBuffer(eof, buffer));
    } catch (Exception e) {
        handleDataProcessException(e);
    } finally {
        lock.unlock();
        dataMergeSvr.clear();
    }
}
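As a worked example of the window computed above (just the arithmetic of that code, with assumed values for limitStart, limitSize and the merged result size):

public class LimitWindowDemo {
    public static void main(String[] args) {
        int limitStart = 10, limitSize = 20, resultSize = 25;  // assumed values
        int start = limitStart;
        int end = start + limitSize;                           // 30
        if (start < 0) start = 0;
        if (limitSize < 0) end = resultSize;
        if (end > resultSize) end = resultSize;                // clamped to 25
        System.out.println("rows written: [" + start + ", " + end + ") = " + (end - start));
        // prints: rows written: [10, 25) = 15 -- fewer than the requested 20
    }
}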
Out of respect for the original work, if you repost this article please credit the source: