4 Eventual-Consistency Transaction Scheme Based on a Transactional MQ -- 2 Check-Back of RocketMQ Transaction Messages
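3.2.3 Checking back the transaction status
As the end-transaction source code showed, in the first (prepared) phase a transaction message is written to the commitlog and dispatched to the RMQ_SYS_TRANS_HALF_TOPIC queue instead of its real topic. After the prepared message has been sent successfully, the producer calls executeLocalTransaction to run the local transaction and obtain its state. The business transaction has usually not committed yet when endTransaction runs right afterwards, so it is recommended that the local transaction method return the UNKNOW state; the broker then takes no action when the transaction is ended. The definite state (commit or rollback) is later obtained from the producer through the broker's periodic transaction status check-back.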
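To make the recommended pattern concrete, here is a minimal, dependency-free sketch of the producer-side logic: the first call answers UNKNOW, business code later records the real outcome, and the check-back reads it. The enum `LocalTxState` only mirrors the names of RocketMQ's `LocalTransactionState`; `LocalTxTable` and its methods are hypothetical, and a real application would implement `TransactionListener` and keep outcomes in durable storage (for example the business database) rather than in memory.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Mirrors the three values of RocketMQ's LocalTransactionState (names only).
enum LocalTxState { COMMIT_MESSAGE, ROLLBACK_MESSAGE, UNKNOW }

// Hypothetical producer-side helper; a real system would persist outcomes durably.
class LocalTxTable {
    private final Map<String, Boolean> outcomes = new ConcurrentHashMap<>();

    // Called right after the half message is sent: the business tx has not committed yet.
    LocalTxState executeLocalTransaction(String txId) {
        return LocalTxState.UNKNOW;
    }

    // Called by business code once its local transaction has committed or rolled back.
    void recordOutcome(String txId, boolean committed) {
        outcomes.put(txId, committed);
    }

    // Called when the broker's check-back request for txId arrives.
    LocalTxState checkLocalTransaction(String txId) {
        Boolean committed = outcomes.get(txId);
        if (committed == null) {
            return LocalTxState.UNKNOW; // still running: the broker will ask again later
        }
        return committed ? LocalTxState.COMMIT_MESSAGE : LocalTxState.ROLLBACK_MESSAGE;
    }
}
```

Answering UNKNOW on a missing record, rather than ROLLBACK, matters: the check can arrive before the business transaction finishes, and UNKNOW simply defers the decision to a later round.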
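RocketMQ uses the TransactionalMessageCheckService thread to periodically scan the messages in the RMQ_SYS_TRANS_HALF_TOPIC topic. For each message that needs checking, it picks one producer from the message's producer group and asks it for the state of the local transaction. (The check interval defaults to 1 minute.)
TransactionalMessageCheckService extends ServiceThread, an abstract class that implements Runnable.
TransactionalMessageCheckService is a field of BrokerController. It is created in BrokerController.initialTransaction and started via BrokerController.start --> startProcessorByHa --> this.transactionalMessageCheckService.start() (only when the broker is not a slave).
………………Broker side: starting the transaction status check-back………………
CHECK1 TransactionalMessageCheckService.start (inherited from ServiceThread): starting the check service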
public abstract class ServiceThread implements Runnable {
public void start() {
log.info("Try to start service thread:{} started:{} lastThread:{}", getServiceName(), started.get(), thread);
if (!started.compareAndSet(false, true)) {
return;
}
stopped = false;
//run this service (here TransactionalMessageCheckService) on its own dedicated thread and start it
this.thread = new Thread(this, getServiceName());
this.thread.setDaemon(isDaemon);
this.thread.start();
}
The new thread then executes TransactionalMessageCheckService.run:
public class TransactionalMessageCheckService extends ServiceThread {
@Override
public void run() {
log.info("Start transaction check service thread!");
//read the check interval (transactionCheckInterval) from the broker config
long checkInterval = brokerController.getBrokerConfig().getTransactionCheckInterval();
//CH1 isStopped
while (!this.isStopped()) {
//keep looping until the service thread is marked stopped
//CH2 waitForRunning
this.waitForRunning(checkInterval);
}
log.info("End transaction check service thread!");
}
ServiceThread
protected volatile boolean stopped = false;
protected final CountDownLatch2 waitPoint = new CountDownLatch2(1);//resettable latch with count 1
protected volatile AtomicBoolean hasNotified = new AtomicBoolean(false);
public boolean isStopped() {
return stopped;
}
protected void waitForRunning(long interval) {
//case1 hasNotified is true: wakeup() has already been called
if (hasNotified.compareAndSet(true, false)) {
//CH3 onWaitEnd: runs the transaction check TransactionalMessageService.check
this.onWaitEnd();
return;
}
//case2 hasNotified is false: not notified yet (or already reset to false above)
waitPoint.reset();//reset the latch
try {
//block until waitPoint.countDown() (a wakeup) or until the check interval elapses
waitPoint.await(interval, TimeUnit.MILLISECONDS);
} catch (InterruptedException e) {
log.error("Interrupted", e);
} finally {
hasNotified.set(false); //always reset hasNotified to false at the end
//CH3 onWaitEnd
this.onWaitEnd();
}
}
TransactionalMessageCheckService
@Override
protected void onWaitEnd() {
//transaction timeout: only half messages older than this (born time + timeout < now) are checked; younger ones wait for a later round
long timeout = brokerController.getBrokerConfig().getTransactionTimeOut();
//maximum number of checks per message (messages exceeding it are discarded)
int checkMax = brokerController.getBrokerConfig().getTransactionCheckMax();
long begin = System.currentTimeMillis();
log.info("Begin to check prepare message, begin time:{}", begin);
//CHECK2 call the check method of the transactional message service
this.brokerController.getTransactionalMessageService().check(timeout, checkMax, this.brokerController.getTransactionalMessageCheckListener());
log.info("End to check prepare message, consumed time:{}", System.currentTimeMillis() - begin);
}
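Here TransactionalMessageCheckService relies on AtomicBoolean hasNotified (the notification flag) and CountDownLatch2 waitPoint (a resettable latch) to implement wait/notify. Each iteration of the loop in run() calls waitForRunning, whose logic is:
TransactionalMessageCheckService.run(){
case1 if hasNotified is true, reset it to false,
then call the check TransactionalMessageService.check;
case2 if hasNotified is false, waitPoint.await(interval) blocks the current thread for at most the check interval and returns early if wakeup() is called,
then call the check TransactionalMessageService.check as well;
}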
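A minimal sketch of the same handshake in plain Java, assuming a fresh `java.util.concurrent.CountDownLatch` per wait as a stand-in for RocketMQ's resettable CountDownLatch2. The class name and the `checkRuns` counter are hypothetical, and `onWaitEnd` only counts calls instead of running TransactionalMessageService.check. Calling `wakeup()` before `waitForRunning` makes it return at once through case 1; otherwise the call blocks for up to the interval (case 2). Either way `onWaitEnd` runs exactly once per call.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

class CheckServiceWaitSketch {
    private final AtomicBoolean hasNotified = new AtomicBoolean(false);
    private volatile CountDownLatch waitPoint = new CountDownLatch(1);
    final AtomicInteger checkRuns = new AtomicInteger();

    // Like ServiceThread.wakeup(): set the flag once and release a waiting thread.
    void wakeup() {
        if (hasNotified.compareAndSet(false, true)) {
            waitPoint.countDown();
        }
    }

    // Like ServiceThread.waitForRunning(interval).
    void waitForRunning(long intervalMs) {
        if (hasNotified.compareAndSet(true, false)) { // case 1: already notified
            onWaitEnd();
            return;
        }
        waitPoint = new CountDownLatch(1); // case 2: "reset" by replacing the latch
        try {
            waitPoint.await(intervalMs, TimeUnit.MILLISECONDS); // returns on countDown or timeout
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            hasNotified.set(false);
            onWaitEnd();
        }
    }

    // Stand-in for TransactionalMessageService.check(...): only counts invocations.
    void onWaitEnd() {
        checkRuns.incrementAndGet();
    }
}
```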
CHECK2 TransactionalMessageService.check: the body of the periodic check
TransactionalMessageServiceImpl
@Override
public void check(long transactionTimeout, int transactionCheckMax,
AbstractTransactionalMessageCheckListener listener) {
try { //1
String topic = MixAll.RMQ_SYS_TRANS_HALF_TOPIC;//RMQ_SYS_TRANS_HALF_TOPIC
//get all message queues of the RMQ_SYS_TRANS_HALF_TOPIC topic, which hold the half messages
Set<MessageQueue> msgQueues = transactionalMessageBridge.fetchMessageQueues(topic);
//process each half-message queue in turn
for (MessageQueue messageQueue : msgQueues) { //2
long startTime = System.currentTimeMillis();
//get the RMQ_SYS_TRANS_OP_HALF_TOPIC (op) queue paired with this half queue; it records messages already resolved
MessageQueue opQueue = getOpQueue(messageQueue);
//consume offset of the half queue
long halfOffset = transactionalMessageBridge.fetchConsumeOffset(messageQueue);
//consume offset of the op queue
long opOffset = transactionalMessageBridge.fetchConsumeOffset(opQueue);
//op offsets that have already been processed
List<Long> doneOpOffset = new ArrayList<>();
//resolved half offset -> op offset
HashMap<Long, Long> removeMap = new HashMap<>();
//fillOpRemoveMap: starting from the current progress, pull 32 messages from the op queue so the scan can tell whether a half message has already been resolved (committed or rolled back); a resolved message needs no check-back request to the producer
PullResult pullResult = fillOpRemoveMap(removeMap, opQueue, opOffset, halfOffset, doneOpOffset);
if (null == pullResult) {
log.error("The queue={} check msgOffset={} with opOffset={} failed, pullResult is null",
messageQueue, halfOffset, opOffset);
continue;
}
// single thread
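int getMessageNullCount = 1;//number of times a half message could not be read
long newOffset = halfOffset;//half-queue offset to persist when this round ends
long i = halfOffset;//offset of the half message currently examined
while (true) { //3
//if checking this queue takes longer than MAX_PROCESS_TIME_LIMIT (60 seconds), stop and leave the rest to the next scheduled round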
if (System.currentTimeMillis() - startTime > MAX_PROCESS_TIME_LIMIT) {
log.info("Queue={} process time reach max={}", messageQueue, MAX_PROCESS_TIME_LIMIT);
break;
}
//the half message at offset i has already been resolved: drop it from removeMap and fall through to i++ for the next message
if (removeMap.containsKey(i)) {
log.info("Half offset {} has been committed/rolled back", i);
removeMap.remove(i);
} else {//4 not resolved yet
//read the half message at offset i from the RMQ_SYS_TRANS_HALF_TOPIC queue
GetResult getResult = getHalfMsg(messageQueue, i);
MessageExt msgExt = getResult.getMsg();
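//half message not found: stop after MAX_RETRY_COUNT_WHEN_HALF_NULL misses or when there is no new message; otherwise jump to the next valid offset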
if (msgExt == null) {
if (getMessageNullCount++ > MAX_RETRY_COUNT_WHEN_HALF_NULL) {
break;
}
if (getResult.getPullResult().getPullStatus() == PullStatus.NO_NEW_MSG) {
log.debug("No new msg, the miss offset={} in={}, continue check={}, pull result={}", i,
messageQueue, getMessageNullCount, getResult.getPullResult());
break;
} else {
log.info("Illegal offset, the miss offset={} in={}, continue check={}, pull result={}",
i, messageQueue, getMessageNullCount, getResult.getPullResult());
i = getResult.getPullResult().getNextBeginOffset();
newOffset = i;
continue;
}
}
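//the half message is not null
/**Decide whether the current message should be discarded or skipped:
discard: the message has reached the maximum number of checks; its counter grows by 1 per check, up to 15 by default;
skip: the message is older than the commit log retention time, 72 hours by default;
both cases simply move past the current message (i++)*/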
if (needDiscard(msgExt, transactionCheckMax) || needSkip(msgExt)) {
listener.resolveDiscardMsg(msgExt);
newOffset = i + 1;
i++;
continue;
}
if (msgExt.getStoreTimestamp() >= startTime) {
log.debug("Fresh stored. the miss offset={}, check it later, store={}", i,
new Date(msgExt.getStoreTimestamp()));
break;
}
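//age of the message since it was born (sent by the producer)
long valueOfCurrentMinusBorn = System.currentTimeMillis() - msgExt.getBornTimestamp();
//checkImmunityTime: the immunity window after the half message is sent, during which it is not checked; the local transaction is probably still running, so the check-back is issued only after this window
//transactionTimeout: the transaction timeout; if the newest op message pulled was born more than transactionTimeout after this round started, the message is checked regardless of checkImmunityTime
long checkImmunityTime = transactionTimeout;
//optional per-message immunity time in seconds, set by the producer (null by default)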
String checkImmunityTimeStr = msgExt.getUserProperty(MessageConst.PROPERTY_CHECK_IMMUNITY_TIME_IN_SECONDS);
if (null != checkImmunityTimeStr) {
checkImmunityTime = getImmunityTime(checkImmunityTimeStr, transactionTimeout);
if (valueOfCurrentMinusBorn < checkImmunityTime) {
if (checkPrepareQueueOffset(removeMap, doneOpOffset, msgExt)) {
newOffset = i + 1;
i++;
continue;
}
}
//
} else {
//message younger than checkImmunityTime: stop scanning this queue for now (later messages are younger still) and check it in a later round
if ((0 <= valueOfCurrentMinusBorn) && (valueOfCurrentMinusBorn < checkImmunityTime)) {
log.debug("New arrived, the miss offset={}, check it later checkImmunity={}, born={}", i,
checkImmunityTime, new Date(msgExt.getBornTimestamp()));
break;
}
}
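//op messages pulled so far (at most 32 per pull)
List<MessageExt> opMsg = pullResult.getMsgFoundList();
//decide whether the current message needs a check-back; three cases:
//1 no op messages were pulled and the message has outlived checkImmunityTime;
//2 op messages were pulled and the newest was born more than transactionTimeout after this round started; the message is then checked regardless of checkImmunityTime;
//3 the message age is negative (its born timestamp is ahead of the broker clock)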
boolean isNeedCheck = (opMsg == null && valueOfCurrentMinusBorn > checkImmunityTime)
|| (opMsg != null && (opMsg.get(opMsg.size() - 1).getBornTimestamp() - startTime > transactionTimeout))
|| (valueOfCurrentMinusBorn <= -1);
if (isNeedCheck) {
//CHECK3 putBackHalfMsgQueue: if a check is needed, append the half message to RMQ_SYS_TRANS_HALF_TOPIC again
if (!putBackHalfMsgQueue(msgExt, i)) {
continue;
}
//CHECK4 resolveHalfMsg: send the check-back request asynchronously through a thread pool
listener.resolveHalfMsg(msgExt);
} else {
//cannot decide yet: pull the next 32 op messages and evaluate the same half message again
pullResult = fillOpRemoveMap(removeMap, opQueue, pullResult.getNextBeginOffset(), halfOffset, doneOpOffset);
log.info("The miss offset:{} in messageQueue:{} need to get more opMsg, result is:{}", i,
messageQueue, pullResult);
continue;
}
} //4
newOffset = i + 1;
i++;
} //3
if (newOffset != halfOffset) {
//persist the new consume offset of the half queue
transactionalMessageBridge.updateConsumeOffset(messageQueue, newOffset);
}
long newOpOffset = calculateOpOffset(doneOpOffset, opOffset);
if (newOpOffset != opOffset) {
//persist the new consume offset of the op queue
transactionalMessageBridge.updateConsumeOffset(opQueue, newOpOffset);
}
} //2
} catch (Exception e) { //1
e.printStackTrace();
log.error("Check error", e);
}
}
The code above is the periodic check performed by TransactionalMessageServiceImpl. It is long, so it is analyzed step by step below:
step1 Get all queues of the RMQ_SYS_TRANS_HALF_TOPIC topic
String topic = MixAll.RMQ_SYS_TRANS_HALF_TOPIC;//RMQ_SYS_TRANS_HALF_TOPIC
//get all message queues of the RMQ_SYS_TRANS_HALF_TOPIC topic, which hold the half messages
Set<MessageQueue> msgQueues = transactionalMessageBridge.fetchMessageQueues(topic);
forstep1 Process each queue in a loop
//process each half-message queue in turn
for (MessageQueue messageQueue : msgQueues) { //2
long startTime = System.currentTimeMillis();
//get the RMQ_SYS_TRANS_OP_HALF_TOPIC (op) queue paired with this half queue; it records messages already resolved
MessageQueue opQueue = getOpQueue(messageQueue);
//consume offset of the half queue
long halfOffset = transactionalMessageBridge.fetchConsumeOffset(messageQueue);
//consume offset of the op queue
long opOffset = transactionalMessageBridge.fetchConsumeOffset(opQueue);
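For each RMQ_SYS_TRANS_HALF_TOPIC queue, get its paired op queue and the consume offsets (progress) of both queues.
forstep2 Pull 32 messages from the op queue
//op offsets that have already been processed
List<Long> doneOpOffset = new ArrayList<>();
//resolved half offset -> op offset
HashMap<Long, Long> removeMap = new HashMap<>();
//fillOpRemoveMap: starting from the current progress, pull 32 messages from the op queue so the scan can tell whether a half message has already been resolved (committed or rolled back); a resolved message needs no check-back request to the producer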
PullResult pullResult = fillOpRemoveMap(removeMap, opQueue, opOffset, halfOffset, doneOpOffset);
if (null == pullResult) {
log.error("The queue={} check msgOffset={} with opOffset={} failed, pullResult is null",
messageQueue, halfOffset, opOffset);
continue;
}
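Pull 32 already-resolved (committed or rolled back) messages from the op queue and use them to decide whether each half message has been resolved, which avoids unnecessary check-backs.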
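How the pulled op messages are sorted into removeMap and doneOpOffset can be sketched as below. As far as the RocketMQ 4.x source goes, each op message body carries the half-queue offset it resolves: if that offset is below the current half consume offset, the half message is already behind the scan and only the op offset is recorded in doneOpOffset; otherwise it goes into removeMap so the scan can skip that half message. `OpEntry` and `OpRemoveMapSketch.fill` are simplified, hypothetical stand-ins (no pulling, no tag check).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Simplified op message: the half offset it resolves and its own op-queue offset.
final class OpEntry {
    final long halfOffset;
    final long opOffset;

    OpEntry(long halfOffset, long opOffset) {
        this.halfOffset = halfOffset;
        this.opOffset = opOffset;
    }
}

class OpRemoveMapSketch {
    // Classify pulled op entries relative to the half-queue consume offset.
    static void fill(List<OpEntry> pulled, long halfConsumeOffset,
                     Map<Long, Long> removeMap, List<Long> doneOpOffset) {
        for (OpEntry op : pulled) {
            if (op.halfOffset < halfConsumeOffset) {
                doneOpOffset.add(op.opOffset);            // half message already behind the scan
            } else {
                removeMap.put(op.halfOffset, op.opOffset); // scan will skip this half offset
            }
        }
    }
}
```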
whilestep1 The message is already resolved, or is unresolved and cannot be read (null)
while (true) { //3
//if checking this queue takes longer than MAX_PROCESS_TIME_LIMIT (60 seconds), stop and leave the rest to the next scheduled round
if (System.currentTimeMillis() - startTime > MAX_PROCESS_TIME_LIMIT) {
log.info("Queue={} process time reach max={}", messageQueue, MAX_PROCESS_TIME_LIMIT);
break;
}
//the half message at offset i has already been resolved: drop it from removeMap and fall through to i++ for the next message
if (removeMap.containsKey(i)) {
log.info("Half offset {} has been committed/rolled back", i);
removeMap.remove(i);
} else {//4 not resolved yet
//read the half message at offset i from the RMQ_SYS_TRANS_HALF_TOPIC queue
GetResult getResult = getHalfMsg(messageQueue, i);
MessageExt msgExt = getResult.getMsg();
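//half message not found: stop after MAX_RETRY_COUNT_WHEN_HALF_NULL misses or when there is no new message; otherwise jump to the next valid offset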
if (msgExt == null) {
if (getMessageNullCount++ > MAX_RETRY_COUNT_WHEN_HALF_NULL) {
break;
}
if (getResult.getPullResult().getPullStatus() == PullStatus.NO_NEW_MSG) {
log.debug("No new msg, the miss offset={} in={}, continue check={}, pull result={}", i,
messageQueue, getMessageNullCount, getResult.getPullResult());
break;
} else {
log.info("Illegal offset, the miss offset={} in={}, continue check={}, pull result={}",
i, messageQueue, getMessageNullCount, getResult.getPullResult());
i = getResult.getPullResult().getNextBeginOffset();
newOffset = i;
continue;
}
}
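If the message has already been resolved, nothing else is done: the scan just advances i (and newOffset) to the next message.
If it is unresolved and no message can be read at offset i, the miss counter getMessageNullCount is incremented. The scan stops once the counter exceeds MAX_RETRY_COUNT_WHEN_HALF_NULL or when there is no new message; otherwise (an illegal offset) it jumps to the next valid offset and continues.
whilestep2 The message is unresolved and not null
//the half message is not null
/**Decide whether the current message should be discarded or skipped:
discard: the message has reached the maximum number of checks; its counter grows by 1 per check, up to 15 by default;
skip: the message is older than the commit log retention time, 72 hours by default;
both cases simply move past the current message (i++)*/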
if (needDiscard(msgExt, transactionCheckMax) || needSkip(msgExt)) {
listener.resolveDiscardMsg(msgExt);
newOffset = i + 1;
i++;
continue;
}
if (msgExt.getStoreTimestamp() >= startTime) {
log.debug("Fresh stored. the miss offset={}, check it later, store={}", i,
new Date(msgExt.getStoreTimestamp()));
break;
}
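//age of the message since it was born (sent by the producer)
long valueOfCurrentMinusBorn = System.currentTimeMillis() - msgExt.getBornTimestamp();
//checkImmunityTime: the immunity window after the half message is sent, during which it is not checked; the local transaction is probably still running, so the check-back is issued only after this window
//transactionTimeout: the transaction timeout; if the newest op message pulled was born more than transactionTimeout after this round started, the message is checked regardless of checkImmunityTime
long checkImmunityTime = transactionTimeout;
//optional per-message immunity time in seconds, set by the producer (null by default)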
String checkImmunityTimeStr = msgExt.getUserProperty(MessageConst.PROPERTY_CHECK_IMMUNITY_TIME_IN_SECONDS);
if (null != checkImmunityTimeStr) {
checkImmunityTime = getImmunityTime(checkImmunityTimeStr, transactionTimeout);
if (valueOfCurrentMinusBorn < checkImmunityTime) {
if (checkPrepareQueueOffset(removeMap, doneOpOffset, msgExt)) {
newOffset = i + 1;
i++;
continue;
}
}
//
} else {
//message younger than checkImmunityTime: stop scanning this queue for now (later messages are younger still) and check it in a later round
if ((0 <= valueOfCurrentMinusBorn) && (valueOfCurrentMinusBorn < checkImmunityTime)) {
log.debug("New arrived, the miss offset={}, check it later checkImmunity={}, born={}", i,
checkImmunityTime, new Date(msgExt.getBornTimestamp()));
break;
}
}
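Decide whether the current message should be discarded or skipped:
discard: the message has reached the maximum number of checks; its counter grows by 1 per check, up to 15 by default;
skip: the message is older than the commit log retention time, 72 hours by default;
Both cases simply move past the current message (i++).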
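The two tests can be sketched as below, modeled on needDiscard and needSkip as they appear in the RocketMQ 4.x source: needDiscard keeps the counter in a message property and increments it on every check that is not discarded, and needSkip compares the message age with fileReservedTime (in hours). The property map, the string key and the class name are simplified stand-ins for MessageExt and MessageConst.PROPERTY_TRANSACTION_CHECK_TIMES.

```java
import java.util.HashMap;
import java.util.Map;

class DiscardSkipSketch {
    // Stand-in key for MessageConst.PROPERTY_TRANSACTION_CHECK_TIMES.
    static final String CHECK_TIMES = "TRANSACTION_CHECK_TIMES";

    // True once the stored count has reached checkMax; otherwise bump (or initialize) the count.
    static boolean needDiscard(Map<String, String> props, int checkMax) {
        String stored = props.get(CHECK_TIMES);
        int checkTimes = 1;
        if (stored != null) {
            checkTimes = Integer.parseInt(stored);
            if (checkTimes >= checkMax) {
                return true;
            }
            checkTimes++;
        }
        props.put(CHECK_TIMES, String.valueOf(checkTimes));
        return false;
    }

    // True when the message is older than the commit log retention time.
    static boolean needSkip(long bornTimestampMs, long nowMs, int fileReservedHours) {
        return nowMs - bornTimestampMs > fileReservedHours * 3600L * 1000;
    }
}
```

With checkMax = 15, the first 15 evaluations return false and the 16th returns true, which matches "up to 15 checks".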
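Next come the parameters used to decide whether the message needs a check-back:
1 checkImmunityTime: the immunity window after the half message is sent. The local transaction is most likely still running during this window, so the broker checks the local transaction only after it has passed;
2 transactionTimeout: the transaction timeout. If the newest op message pulled was born more than transactionTimeout after this check round started, the message is checked regardless of checkImmunityTime;
3 PROPERTY_CHECK_IMMUNITY_TIME_IN_SECONDS: an optional per-message immunity time in seconds set by the producer (null by default); when present, it replaces transactionTimeout as checkImmunityTime.
These three parameters decide whether a transaction message is checked:
a check-back is needed if no op messages were pulled and the message age valueOfCurrentMinusBorn exceeds checkImmunityTime, or if the born timestamp of the newest pulled op message minus the start time of this check round exceeds transactionTimeout (regardless of checkImmunityTime). A negative age also triggers a check.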
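How checkImmunityTime is derived can be sketched as follows. This assumes getImmunityTime behaves as in the RocketMQ 4.x source: the property value is read as seconds and converted to milliseconds, and an unparsable value (or -1) falls back to transactionTimeout. The class and method names are hypothetical.

```java
class ImmunityTimeSketch {
    // Seconds -> milliseconds; absent, unparsable or -1 falls back to transactionTimeout.
    static long immunityTimeMs(String checkImmunitySeconds, long transactionTimeoutMs) {
        if (checkImmunitySeconds == null) {
            return transactionTimeoutMs; // property absent: check() uses transactionTimeout directly
        }
        long seconds;
        try {
            seconds = Long.parseLong(checkImmunitySeconds);
        } catch (NumberFormatException e) {
            seconds = -1;
        }
        return seconds == -1 ? transactionTimeoutMs : seconds * 1000;
    }
}
```

For example, a producer that sets the property to "30" makes the broker leave that message alone for 30 seconds after it was born, independently of the broker-wide transactionTimeout.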
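whilestep3 Decide whether a check is needed (isNeedCheck)
//op messages pulled so far (at most 32 per pull)
List<MessageExt> opMsg = pullResult.getMsgFoundList();
//decide whether the current message needs a check-back; three cases:
//1 no op messages were pulled and the message has outlived checkImmunityTime;
//2 op messages were pulled and the newest was born more than transactionTimeout after this round started; the message is then checked regardless of checkImmunityTime;
//3 the message age is negative (its born timestamp is ahead of the broker clock)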
boolean isNeedCheck = (opMsg == null && valueOfCurrentMinusBorn > checkImmunityTime)
|| (opMsg != null && (opMsg.get(opMsg.size() - 1).getBornTimestamp() - startTime > transactionTimeout))
|| (valueOfCurrentMinusBorn <= -1);
if (isNeedCheck) {
//CHECK3 putBackHalfMsgQueue: if a check is needed, append the half message to RMQ_SYS_TRANS_HALF_TOPIC again
if (!putBackHalfMsgQueue(msgExt, i)) {
continue;
}
//CHECK4 resolveHalfMsg: send the check-back request asynchronously through a thread pool
listener.resolveHalfMsg(msgExt);
} else {
//cannot decide yet: pull the next 32 op messages and evaluate the same half message again
pullResult = fillOpRemoveMap(removeMap, opQueue, pullResult.getNextBeginOffset(), halfOffset, doneOpOffset);
log.info("The miss offset:{} in messageQueue:{} need to get more opMsg, result is:{}", i,
messageQueue, pullResult);
continue;
}
} //4
newOffset = i + 1;
i++;
} //3
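Here the code uses these parameters to decide whether a check-back is needed:
//1 no op messages were pulled and the message has outlived checkImmunityTime;
//2 op messages were pulled and the newest was born more than transactionTimeout after this round started; the message is then checked regardless of checkImmunityTime;
//3 the message age is negative (its born timestamp is ahead of the broker clock)
If a check is needed, two operations are performed:
CHECK3 putBackHalfMsgQueue: append the half message to RMQ_SYS_TRANS_HALF_TOPIC again
CHECK4 resolveHalfMsg: send the check-back request asynchronously through a thread pool
If it cannot yet be decided whether a check is needed, the next 32 messages are pulled from the op queue and the next while iteration evaluates the same message again.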
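The decision above can be isolated into a pure function for experimentation. This sketch reproduces the three-way condition of isNeedCheck, with op messages reduced to their born timestamps; the class and parameter names are hypothetical, and unlike the original it treats an empty op list like null instead of failing on `get(-1)`.

```java
import java.util.List;

class NeedCheckSketch {
    /**
     * @param opBornTimestamps born timestamps of the op messages pulled so far, or null if none
     * @param ageMs            now minus the half message's born timestamp
     */
    static boolean isNeedCheck(List<Long> opBornTimestamps, long ageMs, long checkImmunityTimeMs,
                               long roundStartMs, long transactionTimeoutMs) {
        boolean noOps = opBornTimestamps == null || opBornTimestamps.isEmpty();
        // 1) nothing pulled from the op queue and the immunity window has passed
        boolean oldEnough = noOps && ageMs > checkImmunityTimeMs;
        // 2) the newest op message was born more than transactionTimeout after this round started
        boolean opMovedOn = !noOps
                && opBornTimestamps.get(opBornTimestamps.size() - 1) - roundStartMs > transactionTimeoutMs;
        // 3) negative age: the born timestamp is ahead of the broker clock
        boolean clockSkew = ageMs <= -1;
        return oldEnough || opMovedOn || clockSkew;
    }
}
```

When all three are false, the caller pulls the next batch of op messages and asks again, which is exactly the "cannot decide yet" branch.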
whilestep4 After the loop, update the consume offsets of the half and op queues
if (newOffset != halfOffset) {
//persist the new consume offset of the half queue
transactionalMessageBridge.updateConsumeOffset(messageQueue, newOffset);
}
long newOpOffset = calculateOpOffset(doneOpOffset, opOffset);
if (newOpOffset != opOffset) {
//persist the new consume offset of the op queue
transactionalMessageBridge.updateConsumeOffset(opQueue, newOpOffset);
}
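The op-queue offset only moves across a contiguous run of processed op offsets, so an op entry whose half message is still ahead of the scan keeps the op consume offset from jumping past it. A sketch modeled on calculateOpOffset in TransactionalMessageServiceImpl (the original sorts doneOpOffset in place; this version sorts a copy so an immutable list can be passed):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class OpOffsetSketch {
    // Advance from oldOffset while the sorted processed offsets are contiguous.
    static long calculateOpOffset(List<Long> doneOpOffset, long oldOffset) {
        List<Long> sorted = new ArrayList<>(doneOpOffset);
        Collections.sort(sorted);
        long newOffset = oldOffset;
        for (long done : sorted) {
            if (done == newOffset) {
                newOffset++;
            } else {
                break; // gap: an op offset not yet processed blocks further progress
            }
        }
        return newOffset;
    }
}
```

For example, with an old offset of 3 and processed op offsets {5, 3, 4, 7}, the offset advances to 6; offset 6 is missing, so 7 cannot be committed yet.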
Now let us look more closely at the two check-back operations in whilestep3:
CHECK3 putBackHalfMsgQueue: why a message to be checked is appended to the half queue again
TransactionalMessageServiceImpl
private boolean putBackHalfMsgQueue(MessageExt msgExt, long offset) {
//append the message to be checked to the RMQ_SYS_TRANS_HALF_TOPIC queue again
PutMessageResult putMessageResult = putBackToHalfQueueReturnResult(msgExt);
if (putMessageResult != null
&& putMessageResult.getPutMessageStatus() == PutMessageStatus.PUT_OK) {
msgExt.setQueueOffset(
putMessageResult.getAppendMessageResult().getLogicsOffset());
msgExt.setCommitLogOffset(
putMessageResult.getAppendMessageResult().getWroteOffset());
msgExt.setMsgId(putMessageResult.getAppendMessageResult().getMsgId());
log.debug(
"Send check message, the offset={} restored in queueOffset={} "
+ "commitLogOffset={} "
+ "newMsgId={} realMsgId={} topic={}",
offset, msgExt.getQueueOffset(), msgExt.getCommitLogOffset(), msgExt.getMsgId(),
msgExt.getUserProperty(MessageConst.PROPERTY_UNIQ_CLIENT_MESSAGE_ID_KEYIDX),
msgExt.getTopic());
return true;
} else {
log.error(
"PutBackToHalfQueueReturnResult write failed, topic: {}, queueId: {}, "
+ "msgId: {}",
msgExt.getTopic(), msgExt.getQueueId(), msgExt.getMsgId());
return false;
}
}
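A message that needs checking is appended to the RMQ_SYS_TRANS_HALF_TOPIC queue again, and msgExt is updated with the new queue offset, commitlog offset and msgId. The purpose and benefits are:
1 Once a message is found to need a check, the check-back request is sent asynchronously by a thread pool (listener.resolveHalfMsg). Because the send is asynchronous, the scan cannot know whether the check succeeded. So the message is written to the RMQ_SYS_TRANS_HALF_TOPIC queue in the commitlog again, and the consume progress of the half and op queues keeps moving forward. When the scan later reaches the new copy, the op queue tells whether the transaction has been resolved in the meantime. The copy also carries the check counter that needDiscard has just incremented, so the maximum number of checks still applies across copies;
2 RocketMQ stores messages sequentially (append-only), which is efficient. Waiting for the asynchronous check result and then going back to reprocess an earlier position in the half queue would hurt performance.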
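The effect of re-appending can be seen in a toy model of an append-only half queue. Everything here is hypothetical: `resolvedTxIds` stands in for the op-queue lookup, and taking a snapshot of the queue length at the start of a round imitates the `getStoreTimestamp() >= startTime` check that leaves freshly stored copies for the next round. Each round only moves forward; an unresolved message is simply met again at its new position.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Simplified half message: transaction id plus the offset it was stored at.
final class HalfMsg {
    final String txId;
    final long queueOffset;

    HalfMsg(String txId, long queueOffset) {
        this.txId = txId;
        this.queueOffset = queueOffset;
    }
}

// Hypothetical append-only queue standing in for RMQ_SYS_TRANS_HALF_TOPIC.
class HalfQueueSketch {
    final List<HalfMsg> log = new ArrayList<>();

    long append(String txId) {
        long offset = log.size();
        log.add(new HalfMsg(txId, offset));
        return offset;
    }

    // Like putBackHalfMsgQueue: write a copy at the tail and return its new offset.
    long putBack(HalfMsg msg) {
        return append(msg.txId);
    }

    // One check round; returns the new half-queue consume offset.
    long scanOnce(long fromOffset, Set<String> resolvedTxIds) {
        long end = log.size(); // copies appended during this round are left for the next one
        for (long i = fromOffset; i < end; i++) {
            HalfMsg msg = log.get((int) i);
            if (!resolvedTxIds.contains(msg.txId)) {
                putBack(msg); // the broker would now send the async check request
            }
        }
        return end;
    }
}
```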
CHECK4 AbstractTransactionalMessageCheckListener.resolveHalfMsg: sending the check-back request asynchronously in a thread pool
AbstractTransactionalMessageCheckListener
public void resolveHalfMsg(final MessageExt msgExt) {
executorService.execute(new Runnable() {
@Override
public void run() {
try {
//send the check-back request
sendCheckMessage(msgExt);
} catch (Exception e) {
LOGGER.error("Send check message error!", e);
}
}
});
}
public void sendCheckMessage(MessageExt msgExt) throws Exception {
//build the request header for the transaction status check-back
CheckTransactionStateRequestHeader checkTransactionStateRequestHeader = new CheckTransactionStateRequestHeader();
checkTransactionStateRequestHeader.setCommitLogOffset(msgExt.getCommitLogOffset());
checkTransactionStateRequestHeader.setOffsetMsgId(msgExt.getMsgId());
checkTransactionStateRequestHeader.setMsgId(msgExt.getUserProperty(MessageConst.PROPERTY_UNIQ_CLIENT_MESSAGE_ID_KEYIDX));
checkTransactionStateRequestHeader.setTransactionId(checkTransactionStateRequestHeader.getMsgId());
checkTransactionStateRequestHeader.setTranStateTableOffset(msgExt.getQueueOffset());
msgExt.setTopic(msgExt.getUserProperty(MessageConst.PROPERTY_REAL_TOPIC));
msgExt.setQueueId(Integer.parseInt(msgExt.getUserProperty(MessageConst.PROPERTY_REAL_QUEUE_ID)));
msgExt.setStoreSize(0);
//choose one producer connection from the message's producer group
String groupId = msgExt.getProperty(MessageConst.PROPERTY_PRODUCER_GROUP);
Channel channel = brokerController.getProducerManager().getAvaliableChannel(groupId);
if (channel != null) {
//send the check-back request
brokerController.getBroker2Client().checkProducerTransactionState(groupId, channel, checkTransactionStateRequestHeader, msgExt);
} else {
LOGGER.warn("Check transaction failed, channel is null. groupId={}", groupId);
}
}
So the broker picks one producer from the producer group identified by groupId and sends the check-back request to it.
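The fire-and-forget dispatch can be sketched as follows. The exact selection policy inside getAvaliableChannel varies between RocketMQ versions, so the round-robin pick here is an assumption, and producer connections are modeled as plain strings. The class name, `register`, and the `oneWaySender` callback are hypothetical. The point of the sketch is that `resolveHalfMsg` returns immediately, so the scan never learns whether a producer answered.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.BiConsumer;

// Hypothetical broker-side dispatcher; producer connections are modeled as plain strings.
class CheckDispatchSketch {
    private final Map<String, List<String>> channelsByGroup = new ConcurrentHashMap<>();
    private final AtomicInteger counter = new AtomicInteger();
    private final ExecutorService executor = Executors.newFixedThreadPool(2);
    private final BiConsumer<String, String> oneWaySender; // (channel, msgId): fire and forget

    CheckDispatchSketch(BiConsumer<String, String> oneWaySender) {
        this.oneWaySender = oneWaySender;
    }

    void register(String producerGroup, List<String> channels) {
        channelsByGroup.put(producerGroup, channels);
    }

    // Assumed round-robin choice; null means no producer of this group is connected.
    String pickChannel(String producerGroup) {
        List<String> channels = channelsByGroup.get(producerGroup);
        if (channels == null || channels.isEmpty()) {
            return null;
        }
        return channels.get(Math.floorMod(counter.getAndIncrement(), channels.size()));
    }

    // Like resolveHalfMsg: submit and return at once; the scan never waits for the outcome.
    void resolveHalfMsg(String producerGroup, String msgId) {
        executor.execute(() -> {
            String channel = pickChannel(producerGroup);
            if (channel != null) {
                oneWaySender.accept(channel, msgId);
            }
            // No channel: the real code only logs; the re-appended copy is checked in a later round.
        });
    }

    void shutdownAndWait() {
        executor.shutdown();
        try {
            executor.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```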
Broker2Client then sends the check-back request:
public class Broker2Client {
private static final InternalLogger log = InternalLoggerFactory.getLogger(LoggerName.BROKER_LOGGER_NAME);
private final BrokerController brokerController;
public Broker2Client(BrokerController brokerController) {
this.brokerController = brokerController;
}
public void checkProducerTransactionState(
final String group,
final Channel channel,
final CheckTransactionStateRequestHeader requestHeader,
final MessageExt messageExt) throws Exception {
RemotingCommand request =
RemotingCommand.createRequestCommand(RequestCode.CHECK_TRANSACTION_STATE, requestHeader);
request.setBody(MessageDecoder.encode(messageExt, false));
try {
//send the check-back request one-way (no response is awaited; timeout 10 ms)
this.brokerController.getRemotingServer().invokeOneway(channel, request, 10);
} catch (Exception e) {
log.error("Check transaction failed because invoke producer exception. group={}, msgId={}", group, messageExt.getMsgId(), e.getMessage());
}
}
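………………Producer side: responding to the transaction status check-back………………
After the producer receives the request, it handles the check-back.
CHECK5 ClientRemotingProcessor.processRequest
The request is handled by the ClientRemotingProcessor: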
public class ClientRemotingProcessor implements NettyRequestProcessor {
private final InternalLogger log = ClientLogger.getLog();
private final MQClientInstance mqClientFactory;
@Override
public RemotingCommand processRequest(ChannelHandlerContext ctx,
RemotingCommand request) throws RemotingCommandException {
switch (request.getCode()) {
//check
case RequestCode.CHECK_TRANSACTION_STATE:
return this.checkTransactionState(ctx, request);
case RequestCode.NOTIFY_CONSUMER_IDS_CHANGED:
return this.notifyConsumerIdsChanged(ctx, request);
case RequestCode.RESET_CONSUMER_CLIENT_OFFSET:
return this.resetOffset(ctx, request);
case RequestCode.GET_CONSUMER_STATUS_FROM_CLIENT:
return this.getConsumeStatus(ctx, request);
case RequestCode.GET_CONSUMER_RUNNING_INFO:
return this.getConsumerRunningInfo(ctx, request);
case RequestCode.CONSUME_MESSAGE_DIRECTLY:
return this.consumeMessageDirectly(ctx, request);
default:
break;
}
return null;
}
public RemotingCommand checkTransactionState(ChannelHandlerContext ctx,
RemotingCommand request) throws RemotingCommandException {
final CheckTransactionStateRequestHeader requestHeader =
(CheckTransactionStateRequestHeader) request.decodeCommandCustomHeader(CheckTransactionStateRequestHeader.class);
final ByteBuffer byteBuffer = ByteBuffer.wrap(request.getBody());
//decode the message
final MessageExt messageExt = MessageDecoder.decode(byteBuffer);
if (messageExt != null) {
String transactionId = messageExt.getProperty(MessageConst.PROPERTY_UNIQ_CLIENT_MESSAGE_ID_KEYIDX);
if (null != transactionId && !"".equals(transactionId)) {
messageExt.setTransactionId(transactionId);
}
//get the producer group name
final String group = messageExt.getProperty(MessageConst.PROPERTY_PRODUCER_GROUP);
if (group != null) {
//look up the producer registered for this group in this client instance
MQProducerInner producer = this.mqClientFactory.selectProducer(group);
if (producer != null) {
final String addr = RemotingHelper.parseChannelRemoteAddr(ctx.channel());
//CHECK6 producer.checkTransactionState
producer.checkTransactionState(addr, messageExt, requestHeader);
} else {
log.debug("checkTransactionState, pick producer by group[{}] failed", group);
}
} else {
log.warn("checkTransactionState, pick producer group failed");
}
} else {
log.warn("checkTransactionState, decode message failed");
}
return null;
}
CHECK6 producer.checkTransactionState (DefaultMQProducerImpl)
@Override
public void checkTransactionState(final String addr, final MessageExt msg,
final CheckTransactionStateRequestHeader header) {
Runnable request = new Runnable() {
private final String brokerAddr = addr;
private final MessageExt message = msg;
private final CheckTransactionStateRequestHeader checkRequestHeader = header;
private final String group = DefaultMQProducerImpl.this.defaultMQProducer.getProducerGroup();
@Override
public void run() {
TransactionCheckListener transactionCheckListener = DefaultMQProducerImpl.this.checkListener();
TransactionListener transactionListener = getCheckListener();
if (transactionCheckListener != null || transactionListener != null) {
LocalTransactionState localTransactionState = LocalTransactionState.UNKNOW;
Throwable exception = null;
try {
if (transactionCheckListener != null) {
//run the local transaction state check with the legacy API transactionCheckListener.checkLocalTransactionState(message)
localTransactionState = transactionCheckListener.checkLocalTransactionState(message);
} else if (transactionListener != null) {
log.debug("Used new check API in transaction message");
localTransactionState = transactionListener.checkLocalTransaction(message);
} else {
log.warn("CheckTransactionState, pick transactionListener by group[{}] failed", group);
}
} catch (Throwable e) {
log.error("Broker call checkTransactionState, but checkLocalTransactionState exception", e);
exception = e;
}
this.processTransactionState(
localTransactionState,
group,
exception);
} else {
log.warn("CheckTransactionState, pick transactionCheckListener by group[{}] failed", group);
}
}
private void processTransactionState(
final LocalTransactionState localTransactionState,
final String producerGroup,
final Throwable exception) {
final EndTransactionRequestHeader thisHeader = new EndTransactionRequestHeader();
thisHeader.setCommitLogOffset(checkRequestHeader.getCommitLogOffset());
thisHeader.setProducerGroup(producerGroup);
thisHeader.setTranStateTableOffset(checkRequestHeader.getTranStateTableOffset());
thisHeader.setFromTransactionCheck(true);
String uniqueKey = message.getProperties().get(MessageConst.PROPERTY_UNIQ_CLIENT_MESSAGE_ID_KEYIDX);
if (uniqueKey == null) {
uniqueKey = message.getMsgId();
}
thisHeader.setMsgId(uniqueKey);
thisHeader.setTransactionId(checkRequestHeader.getTransactionId());
switch (localTransactionState) {
case COMMIT_MESSAGE:
thisHeader.setCommitOrRollback(MessageSysFlag.TRANSACTION_COMMIT_TYPE);
break;
case ROLLBACK_MESSAGE:
thisHeader.setCommitOrRollback(MessageSysFlag.TRANSACTION_ROLLBACK_TYPE);
log.warn("when broker check, client rollback this transaction, {}", thisHeader);
break;
case UNKNOW:
thisHeader.setCommitOrRollback(MessageSysFlag.TRANSACTION_NOT_TYPE);
log.warn("when broker check, client does not know this transaction state, {}", thisHeader);
break;
default:
break;
}
String remark = null;
if (exception != null) {
remark = "checkLocalTransactionState Exception: " + RemotingHelper.exceptionSimpleDesc(exception);
}
try {
//report the transaction state (commit, rollback or unknown) to the broker
DefaultMQProducerImpl.this.mQClientFactory.getMQClientAPIImpl().endTransactionOneway(brokerAddr, thisHeader, remark,
3000);
} catch (Exception e) {
log.error("endTransactionOneway exception", e);
}
}
};
//hand the transaction status check to the check thread pool
this.checkExecutor.submit(request);
}
The check-back is executed by a thread pool on the producer side:
this.checkExecutor = new ThreadPoolExecutor(
producer.getCheckThreadPoolMinSize(),
producer.getCheckThreadPoolMaxSize(),
1000 * 60,
TimeUnit.MILLISECONDS,
this.checkRequestQueue);
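The producer side therefore does two things: it runs the user's check on the check pool instead of the network thread, and it maps the local state to the end-transaction type with the same switch as processTransactionState. A self-contained sketch of both, where the enums are hypothetical stand-ins for LocalTransactionState and the MessageSysFlag.TRANSACTION_*_TYPE constants, the `sink` stands in for endTransactionOneway, and the pool sizes and queue capacity of 2000 are arbitrary:

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Function;

class ProducerCheckSketch {
    enum LocalState { COMMIT_MESSAGE, ROLLBACK_MESSAGE, UNKNOW }

    enum EndTxType { COMMIT, ROLLBACK, NOT_TYPE }

    // Same mapping as the switch in processTransactionState.
    static EndTxType toEndTxType(LocalState state) {
        switch (state) {
            case COMMIT_MESSAGE:
                return EndTxType.COMMIT;
            case ROLLBACK_MESSAGE:
                return EndTxType.ROLLBACK;
            default:
                return EndTxType.NOT_TYPE;
        }
    }

    final ThreadPoolExecutor checkExecutor = new ThreadPoolExecutor(
            1, 1, 60_000, TimeUnit.MILLISECONDS, new LinkedBlockingQueue<>(2000));

    // Like checkTransactionState: run the user's check off the network thread, then report.
    void onCheckRequest(String msgId, Function<String, LocalState> checker,
                        AtomicReference<EndTxType> sink) {
        checkExecutor.submit(() -> {
            LocalState state = LocalState.UNKNOW;
            try {
                state = checker.apply(msgId);
            } catch (Throwable e) {
                // as in the original, an exception leaves the state UNKNOW
            }
            sink.set(toEndTxType(state)); // stands in for endTransactionOneway(...)
        });
    }

    void drain() {
        checkExecutor.shutdown();
        try {
            checkExecutor.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

Keep in mind the standard ThreadPoolExecutor rule: threads beyond the core size are created only when the work queue is full, so with a large queue the configured maximum pool size is rarely reached.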
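In the process above, if the local check returns COMMIT_MESSAGE, the producer sends a commit request to the broker to commit the transaction;
if it returns ROLLBACK_MESSAGE, the producer sends a rollback request to roll the transaction back;
if it returns UNKNOW, the producer still reports the state (as TRANSACTION_NOT_TYPE), but the broker takes no action, and the half message is checked again in a later round until transactionCheckMax is reached.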
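What the broker does with each reported state can be summarized in a toy model, assuming the usual behavior of the broker's EndTransactionProcessor: on commit, the message is restored to its real topic and delivered, then an op entry is written so that later checks skip the half message; on rollback, only the op entry is written; TRANSACTION_NOT_TYPE is ignored. The lists stand in for the real topic and the op queue, and all names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class EndTxBrokerSketch {
    enum EndTxType { COMMIT, ROLLBACK, NOT_TYPE }

    final List<String> realTopicLog = new ArrayList<>(); // messages visible to consumers
    final List<Long> opQueue = new ArrayList<>();         // half offsets marked as resolved

    // Assumed outline of the broker's reaction; store operations are reduced to list appends.
    void onEndTransaction(long halfOffset, String body, EndTxType type) {
        switch (type) {
            case COMMIT:
                realTopicLog.add(body);  // deliver to the original topic
                opQueue.add(halfOffset); // mark resolved so later checks skip it
                break;
            case ROLLBACK:
                opQueue.add(halfOffset); // mark resolved; never delivered
                break;
            case NOT_TYPE:
            default:
                break;                   // ignored: the half message will be checked again
        }
    }
}
```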
This essentially completes the processing of a transaction message.