RocketMQ Transactional Messages

1. Overview

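Apache RocketMQ has supported distributed transactional messages since version 4.3.0. Handling the transaction asynchronously through the message ensures that the local transaction and the message send either both succeed or both fail, which gives eventual consistency of the data.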

2. Example

Based on the official example, TransactionProducer.java looks like this:

public class TransactionProducer {

    public static final String PRODUCER_GROUP = "please_rename_unique_group_name";
    public static final String DEFAULT_NAMESRVADDR = "127.0.0.1:9876";
    public static final String TOPIC = "TopicTest1234";

    public static final int MESSAGE_COUNT = 10;

    public static void main(String[] args) throws MQClientException, InterruptedException {
        // Create the TransactionListener instance
        TransactionListener transactionListener = new TransactionListenerImpl();
        TransactionMQProducer producer = new TransactionMQProducer(PRODUCER_GROUP);

        //Uncomment the following line while debugging, namesrvAddr should be set to your local address
        //producer.setNamesrvAddr(DEFAULT_NAMESRVADDR);
        // Create the thread pool; its threads are named "client-transaction-msg-check-thread"
        ExecutorService executorService = new ThreadPoolExecutor(2, 5, 100,
            TimeUnit.SECONDS, new ArrayBlockingQueue<>(2000), r -> {
                Thread thread = new Thread(r);
                thread.setName("client-transaction-msg-check-thread");
                return thread;
        });

        // Set the thread pool on the transactional producer
        producer.setExecutorService(executorService);
        // Set the transaction listener on the transactional producer
        producer.setTransactionListener(transactionListener);
        producer.start();

        String[] tags = new String[] {"TagA", "TagB", "TagC", "TagD", "TagE"};

        for (int i = 0; i < MESSAGE_COUNT; i++) {
            try {
                Message msg = new Message(TOPIC, tags[i % tags.length], "KEY" + i,
                        ("Hello RocketMQ " + i).getBytes(RemotingHelper.DEFAULT_CHARSET));
                // Send the transactional message
                SendResult sendResult = producer.sendMessageInTransaction(msg, null);
                System.out.printf("%s%n", sendResult);

                Thread.sleep(10);
            } catch (MQClientException | UnsupportedEncodingException e) {
                e.printStackTrace();
            }
        }

        for (int i = 0; i < 100000; i++) {
            Thread.sleep(1000);
        }
        producer.shutdown();
    }
}

The TransactionListener implementation, TransactionListenerImpl:

public class TransactionListenerImpl implements TransactionListener {

    private AtomicInteger transactionIndex = new AtomicInteger(0);

    private ConcurrentHashMap<String, Integer> localTrans = new ConcurrentHashMap<>();

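    /**
     * Executes the local transaction and records its state.
     *
     * @param msg the half message that was sent
     * @param arg the custom argument passed to sendMessageInTransaction
     * @return always UNKNOW here, so the final state is decided by the check-back
     */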
    @Override
    public LocalTransactionState executeLocalTransaction(Message msg, Object arg) {
        int value = transactionIndex.getAndIncrement();
        int status = value % 3;
        localTrans.put(msg.getTransactionId(), status);
        return LocalTransactionState.UNKNOW;
    }

    @Override
    public LocalTransactionState checkLocalTransaction(MessageExt msg) {
        Integer status = localTrans.get(msg.getTransactionId());
        if (null != status) {
            switch (status) {
                case 0:
                    return LocalTransactionState.UNKNOW;
                case 1:
                    return LocalTransactionState.COMMIT_MESSAGE;
                case 2:
                    return LocalTransactionState.ROLLBACK_MESSAGE;
                default:
                    return LocalTransactionState.COMMIT_MESSAGE;
            }
        }
        return LocalTransactionState.COMMIT_MESSAGE;
    }
}

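3. Process Overview

RocketMQ borrows the idea of two-phase commit (2PC) to commit transactional messages, and adds compensation logic to handle messages whose second phase times out or fails, as shown in the figure below.

The figure above outlines the overall scheme for transactional messages, which consists of two flows: the normal send-and-commit flow, and the compensation flow.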

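3.1 Sending and Committing a Transactional Message

  1. The Producer sends the message (the half message).
  2. The server responds with the result of writing the message.
  3. The Producer executes the local transaction according to the send result (if the write failed, the half message is not visible to the business side and the local logic is not executed).
  4. The Producer issues a Commit or Rollback according to the state of the local transaction (a Commit builds the message index, which makes the message visible to consumers).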
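
The four steps above can be sketched from the producer's point of view. This is a minimal, self-contained model rather than RocketMQ's client code: the class TwoPhaseSendSketch, its LocalState enum and the run method are hypothetical names chosen for illustration, and the half-message write is reduced to a boolean.

```java
import java.util.function.Supplier;

// Hypothetical model of steps 1-4; none of these names exist in RocketMQ.
final class TwoPhaseSendSketch {

    enum LocalState { COMMIT, ROLLBACK, UNKNOWN }

    /**
     * @param halfWritten      whether the server acknowledged the half message (steps 1-2)
     * @param localTransaction the business logic, run only after a successful write (step 3)
     * @return the phase-two decision that would be sent to the server (step 4)
     */
    static LocalState run(boolean halfWritten, Supplier<LocalState> localTransaction) {
        if (!halfWritten) {
            // The half message was not stored, so the local logic is skipped.
            return LocalState.ROLLBACK;
        }
        try {
            LocalState state = localTransaction.get();
            return state == null ? LocalState.UNKNOWN : state;
        } catch (RuntimeException e) {
            // The outcome is unclear; leave the decision to the compensation flow.
            return LocalState.UNKNOWN;
        }
    }

    public static void main(String[] args) {
        System.out.println(run(true, () -> LocalState.COMMIT));
        System.out.println(run(false, () -> LocalState.COMMIT));
        System.out.println(run(true, () -> {
            throw new IllegalStateException("database unavailable");
        }));
    }
}
```

Returning UNKNOWN on an exception, rather than ROLLBACK, is a deliberate choice in this sketch: the local transaction may have partly run, so the safer answer is to let the server ask again later.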

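3.2 Compensation Flow

  1. For transactional messages that have received neither a Commit nor a Rollback (messages in the pending state), the server initiates a "check-back".
  2. The Producer receives the check-back request and checks the state of the local transaction that corresponds to the message.
  3. The Producer issues the Commit or Rollback again according to the state of the local transaction.

The compensation phase exists to handle the cases where the Commit or Rollback of a message timed out or failed.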
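
One round of the compensation flow can be modelled in memory as follows. This is only a sketch under simplifying assumptions: CheckBackSketch, its State enum and its methods are hypothetical, the server's view is a plain map keyed by transaction id, and the producer's check is a function passed in by the caller.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical in-memory model of the compensation flow; not RocketMQ code.
final class CheckBackSketch {

    enum State { COMMIT, ROLLBACK, UNKNOWN }

    // transactionId -> state as seen by the server; UNKNOWN means pending
    private final Map<String, State> serverView = new LinkedHashMap<>();

    void halfMessageStored(String transactionId) {
        serverView.put(transactionId, State.UNKNOWN);
    }

    State stateOf(String transactionId) {
        return serverView.get(transactionId);
    }

    /**
     * Runs one check-back round: asks the producer about every pending
     * transaction and applies its answer. Returns the ids that stay pending.
     */
    List<String> checkPending(Function<String, State> producerCheck) {
        List<String> stillPending = new ArrayList<>();
        for (Map.Entry<String, State> entry : serverView.entrySet()) {
            if (entry.getValue() != State.UNKNOWN) {
                continue; // already committed or rolled back
            }
            State answer = producerCheck.apply(entry.getKey());
            if (answer == null || answer == State.UNKNOWN) {
                stillPending.add(entry.getKey());
            } else {
                entry.setValue(answer); // step 3: commit or roll back again
            }
        }
        return stillPending;
    }

    public static void main(String[] args) {
        CheckBackSketch server = new CheckBackSketch();
        server.halfMessageStored("tx-1");
        server.halfMessageStored("tx-2");
        // The producer knows tx-1 committed but cannot tell for tx-2 yet.
        List<String> pending = server.checkPending(
                id -> id.equals("tx-1") ? State.COMMIT : State.UNKNOWN);
        System.out.println(server.stateOf("tx-1") + " " + pending);
    }
}
```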

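3.3 Overall Execution Flow of a Transactional Message

  1. The Producer sends the transactional message; the Broker converts it into a half message and backs up its original topic and queueId.
  2. The Producer executes the local transaction and, depending on the result, sends a commit or rollback request. On a commit, the Broker first restores the original topic and queueId of the half message, writes it to a queue that consumers can read, and then marks the half message as deleted. On a rollback, the Broker simply marks the half message as deleted. In both cases the Broker then writes a message tagged 'd' to the half_op queue to record that the half message has been processed, which the later check-back relies on.
  3. If the Broker has not received a commit or rollback request within 60s, it starts a check-back: it uses the op messages to filter out half messages that have already been processed, writes each remaining half message to the CommitLog again so that the consume progress can keep moving forward, and finally sends a check-back request to the Producer.
  4. The Producer then sends a commit or rollback request according to the state of the transaction.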

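4. Transactional Message Design

4.1 Transactional Messages Are Invisible to Users in Phase One

Compared with ordinary messages, the defining feature of a transactional message is that the message sent in phase one is invisible to users. So how can a message be written and yet stay invisible? RocketMQ does it like this: if the message is a half message, the Broker backs up its original topic and consume queue and then changes the topic to RMQ_SYS_TRANS_HALF_TOPIC. Because no consumer group subscribes to that topic, consumers cannot consume half messages. RocketMQ then runs a scheduled task that pulls messages from the topic RMQ_SYS_TRANS_HALF_TOPIC, picks a producer from the message's producer group, sends it a request to check the transaction state, and commits or rolls back the message according to the answer.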

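In RocketMQ, messages are stored on the server as shown below. Every message has corresponding index information, and a Consumer reads the message body through the ConsumeQueue, which serves as a secondary index. The flow is as follows: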

rocketmqdesign11.png

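RocketMQ's concrete strategy is as follows: if the message being written is a transactional message, the Broker replaces attributes such as its Topic and Queue and stores the original Topic and Queue in the message's properties. Because the topic has been replaced, the message is not dispatched to the consume queue of the original topic, so consumers are unaware of it and do not consume it. Changing the topic of a message is in fact a common trick in RocketMQ; delayed messages are implemented in the same way.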
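
The topic swap can be illustrated with a small, self-contained model. It is a sketch, not the Broker's code: the Msg class and the toHalf method are hypothetical, and the property keys REAL_TOPIC and REAL_QID used for the backup should be treated as assumptions. Only the half topic name comes from the text above.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified model of the topic swap; Msg and toHalf are hypothetical names.
final class HalfMessageSketch {

    static final String HALF_TOPIC = "RMQ_SYS_TRANS_HALF_TOPIC";
    // Property keys used here for the backup; treat the exact names as assumptions.
    static final String REAL_TOPIC = "REAL_TOPIC";
    static final String REAL_QUEUE_ID = "REAL_QID";

    static final class Msg {
        final String topic;
        final int queueId;
        final Map<String, String> properties = new HashMap<>();

        Msg(String topic, int queueId) {
            this.topic = topic;
            this.queueId = queueId;
        }
    }

    /** Returns a copy of the message that is stored under the half topic. */
    static Msg toHalf(Msg original) {
        Msg half = new Msg(HALF_TOPIC, 0); // this sketch uses a single queue, 0
        half.properties.putAll(original.properties);
        // Back up the routing information so it can be restored on commit.
        half.properties.put(REAL_TOPIC, original.topic);
        half.properties.put(REAL_QUEUE_ID, String.valueOf(original.queueId));
        return half;
    }

    public static void main(String[] args) {
        Msg half = toHalf(new Msg("TopicTest1234", 3));
        System.out.println(half.topic + " " + half.properties);
    }
}
```

A consumer subscribed to TopicTest1234 never sees the stored copy, because it is filed under the half topic; the original routing survives only as properties.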

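4.2 Commit and Rollback Operations and the Introduction of Op Messages

Once phase one has written a message that is invisible to users, phase two must either make the message visible (Commit) or undo the phase-one message (Rollback). Take Rollback first. Since the phase-one message is invisible to users anyway, it does not actually need to be undone (in fact RocketMQ cannot truly delete a message, because the files are written sequentially). Still, a rolled-back message must be distinguishable from one whose state is undetermined (the pending state, where the transaction is still unresolved), so some operation is needed to mark the final state of the message. For this purpose the RocketMQ transactional message design introduces the Op message, which marks a transactional message as having a determined state (Commit or Rollback). If a transactional message has no corresponding Op message, the state of the transaction cannot yet be determined (phase two may have failed). With Op messages in place, both Commit and Rollback record an Op operation; the only difference is that Commit builds the index for the half message before writing the Op message.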

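4.3 Storage of Op Messages and Their Mapping to Half Messages

RocketMQ writes Op messages to a single, global, dedicated topic, whose name is built by the source method TransactionalMessageUtil.buildOpTopic(). Like the topic for half messages, it is an internal topic that users do not consume. The body of an Op message is the storage offset of the corresponding half message, so the half message can be located from the Op message for the subsequent check-back.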

rocketmqdesign12.png
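
Because an Op message carries the offset of the half message it settles, the pending half messages are simply those whose offset appears in no Op message. The following sketch shows that comparison on plain lists; OpMessageSketch and pendingOffsets are hypothetical names, and the real check logic in the Broker is considerably more involved.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper; it only illustrates the half/Op comparison.
final class OpMessageSketch {

    /**
     * @param halfOffsets offsets of the half messages, in storage order
     * @param opBodies    bodies of the Op messages, each the offset of a settled half message
     * @return offsets of half messages with no Op message, i.e. still pending
     */
    static List<Long> pendingOffsets(List<Long> halfOffsets, List<Long> opBodies) {
        Set<Long> settled = new HashSet<>(opBodies);
        List<Long> pending = new ArrayList<>();
        for (Long offset : halfOffsets) {
            if (!settled.contains(offset)) {
                pending.add(offset);
            }
        }
        return pending;
    }

    public static void main(String[] args) {
        List<Long> half = Arrays.asList(0L, 1L, 2L, 3L);
        List<Long> op = Arrays.asList(0L, 2L);
        System.out.println(pendingOffsets(half, op));
    }
}
```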

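4.4 Building the Index for Half Messages

A phase-two Commit has to build the index for the half message. Because the phase-one half message was written to a special topic, the Broker reads the half message back, replaces its Topic and Queue with the real target Topic and Queue, and then performs an ordinary message write to produce a message that is visible to users. In other words, phase two reuses the content stored in phase one to restore a complete ordinary message and then runs it through the normal write path.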
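
The restore step can be sketched as the inverse of the phase-one topic swap. As before, this is an illustrative model only: CommitRestoreSketch, Msg and restore are hypothetical names, and the property keys REAL_TOPIC and REAL_QID are assumptions standing in for wherever the original Topic and Queue were backed up.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified model of the phase-two restore; all names here are hypothetical.
final class CommitRestoreSketch {

    static final class Msg {
        final String topic;
        final int queueId;
        final String body;
        final Map<String, String> properties = new HashMap<>();

        Msg(String topic, int queueId, String body) {
            this.topic = topic;
            this.queueId = queueId;
            this.body = body;
        }
    }

    /** Rebuilds the ordinary message from a stored half message. */
    static Msg restore(Msg half) {
        String realTopic = half.properties.get("REAL_TOPIC");
        String realQueueId = half.properties.get("REAL_QID");
        if (realTopic == null || realQueueId == null) {
            throw new IllegalArgumentException("no backed-up topic/queue in properties");
        }
        // Same body, original routing; writing this message through the normal
        // path is what makes it visible to consumers.
        Msg restored = new Msg(realTopic, Integer.parseInt(realQueueId), half.body);
        restored.properties.putAll(half.properties);
        return restored;
    }

    public static void main(String[] args) {
        Msg half = new Msg("RMQ_SYS_TRANS_HALF_TOPIC", 0, "Hello RocketMQ 0");
        half.properties.put("REAL_TOPIC", "TopicTest1234");
        half.properties.put("REAL_QID", "3");
        Msg restored = restore(half);
        System.out.println(restored.topic + " " + restored.queueId + " " + restored.body);
    }
}
```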

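4.5 How Are Messages Whose Second Phase Failed Handled?

If phase two of a transactional message fails, for example because a network problem makes the Commit fail, some strategy is needed to make sure the message is eventually committed. RocketMQ uses a compensation mechanism called "check-back". The Broker initiates a check-back for each message in an undetermined state by sending the message to a Producer in the same producer group; the Producer checks the state of the local transaction for that message and then issues a Commit or Rollback. The Broker performs the check-back by comparing half messages against Op messages, and it advances a checkpoint that records which transactional messages have a determined state.

Note that RocketMQ does not check the transaction state of a message indefinitely. By default it checks 15 times; if the transaction state is still unknown after 15 checks, RocketMQ rolls the message back by default.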
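
The bounded check-back can be pictured as a loop with an upper limit. The sketch below is a toy model, not the Broker's implementation: CheckLimitSketch, Result and checkUntilResolved are hypothetical names, and the producer is reduced to a predicate that says on which attempt it can report a final state. Only the default of 15 comes from the text.

```java
import java.util.function.IntPredicate;

// Hypothetical model of the bounded check-back; not RocketMQ's implementation.
final class CheckLimitSketch {

    static final int DEFAULT_MAX_CHECKS = 15; // default mentioned in the text

    static final class Result {
        final boolean resolved; // false means the limit was reached
        final int checksSent;

        Result(boolean resolved, int checksSent) {
            this.resolved = resolved;
            this.checksSent = checksSent;
        }
    }

    /**
     * @param resolvesOnCheck tells whether the producer can report a final
     *                        state on the n-th check (n starts at 1)
     * @param maxChecks       upper bound on the number of checks
     */
    static Result checkUntilResolved(IntPredicate resolvesOnCheck, int maxChecks) {
        for (int n = 1; n <= maxChecks; n++) {
            if (resolvesOnCheck.test(n)) {
                return new Result(true, n);
            }
        }
        // State still unknown after maxChecks checks: stop checking.
        return new Result(false, maxChecks);
    }

    public static void main(String[] args) {
        Result never = checkUntilResolved(n -> false, DEFAULT_MAX_CHECKS);
        System.out.println(never.resolved + " after " + never.checksSent + " checks");
    }
}
```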

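5. Source Code Analysis

5.1 Sending a Transactional Message

5.1.1 Entry Point

The entry point for sending a transactional message is TransactionMQProducer#sendMessageInTransaction.

Transactional messages are sent with TransactionMQProducer, which extends DefaultMQProducer and delegates the actual send logic to DefaultMQProducerImpl.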

//org.apache.rocketmq.client.producer;
public class TransactionMQProducer extends DefaultMQProducer {
    
    //....
    
    @Override
    public TransactionSendResult sendMessageInTransaction(final Message msg,
        final Object arg) throws MQClientException {
        // Throw an exception right away if no transactionListener is set
        if (null == this.transactionListener) {
            throw new MQClientException("TransactionListener is null", null);
        }

        msg.setTopic(NamespaceUtil.wrapNamespace(this.getNamespace(), msg.getTopic()));
        // Delegate the send to DefaultMQProducerImpl
        return this.defaultMQProducerImpl.sendMessageInTransaction(msg, null, arg);
    }
}

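5.1.2 Core Send Method - sendMessageInTransaction

This is the core of the send path. Its main steps are:

  • Validate the transaction listener and the message (message, topic, message size, and so on).
  • Set the transaction property on the message to mark it as a transactional prepare message, and set the producer group. The producer group is used for the check-back of the local transaction: a producer is picked at random from the group, so that the check-back does not keep failing because one producer has gone down.
  • Send the prepare message. The transaction listener (transactionListener) is called back to run the local transaction only after a success result (SEND_OK) is returned; if the send throws an exception, the listener is not invoked.
  • Send the local result to the Broker, which rolls back or commits accordingly. If the local transaction succeeded, the Broker moves the transactional message from the transaction topic's queue to the target Topic and Queue, after which subscribers can consume it.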
// org.apache.rocketmq.client.impl.producer;
public class DefaultMQProducerImpl implements MQProducerInner {

    //...

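    /**
     * Sends a transactional message.
     *
     * @param msg                           the message
     * @param localTransactionExecuter      the deprecated local transaction executer;
     *                                      null when a TransactionListener is used
     * @param arg                           custom argument, passed unchanged to the TransactionListener callback
     * @return the send result together with the local transaction state
     * @throws MQClientException if no executer or listener is available, or if sending fails
     */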
    public TransactionSendResult sendMessageInTransaction(final Message msg, 
                    final LocalTransactionExecuter localTransactionExecuter,
                    final Object arg) throws MQClientException {
        // Get the configured transaction listener
        TransactionListener transactionListener = getCheckListener();
        if (null == localTransactionExecuter && null == transactionListener) {
            throw new MQClientException("tranExecutor is null", null);
        }

        // ignore DelayTimeLevel parameter
        if (msg.getDelayTimeLevel() != 0) {
            MessageAccessor.clearProperty(msg, MessageConst.PROPERTY_DELAY_TIME_LEVEL);
        }

        Validators.checkMessage(msg, this.defaultMQProducer);

        SendResult sendResult = null;
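        // Set the TRAN_MSG property to true to mark this as a transactional message
        MessageAccessor.putProperty(msg, MessageConst.PROPERTY_TRANSACTION_PREPARED, "true");
        // Set the producer group, which the check-back uses to pick a producer at random,
        // so that it does not keep failing because one producer has gone down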
        MessageAccessor.putProperty(msg, MessageConst.PROPERTY_PRODUCER_GROUP,
                this.defaultMQProducer.getProducerGroup());
        try {
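            // Send the half message synchronously, as an ordinary message.
            // The Broker rewrites its topic and queue to the transaction topic and queue
            // and keeps the original topic and queue in the message properties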
            sendResult = this.send(msg);
        } catch (Exception e) {
            throw new MQClientException("send message Exception", e);
        }

        LocalTransactionState localTransactionState = LocalTransactionState.UNKNOW;
        Throwable localException = null;
        switch (sendResult.getSendStatus()) {
            case SEND_OK: {
                try {
                    // Transaction ID
                    if (sendResult.getTransactionId() != null) {
                        msg.putUserProperty("__transactionId__", sendResult.getTransactionId());
                    }
                    // UNIQ_KEY, the unique ID generated by the client at send time
                    String transactionId = msg.getProperty(
                            MessageConst.PROPERTY_UNIQ_CLIENT_MESSAGE_ID_KEYIDX);
                    if (null != transactionId && !"".equals(transactionId)) {
                        msg.setTransactionId(transactionId);
                    }
                    // Run the local transaction; localTransactionExecuter is deprecated
                    // in favour of transactionListener
                    if (null != localTransactionExecuter) {
                        localTransactionState = localTransactionExecuter
                                .executeLocalTransactionBranch(msg, arg);
                    } else {
                        log.debug("Used new transaction API");
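                        // Call back into the listener to run the local transaction after a successful send.
                        // If this call throws, localTransactionState keeps its default, UNKNOW;
                        // if it returns null, UNKNOW is assigned below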
                        localTransactionState = transactionListener.executeLocalTransaction(msg, arg);
                    }
                    if (null == localTransactionState) {
                        localTransactionState = LocalTransactionState.UNKNOW;
                    }

                    if (localTransactionState != LocalTransactionState.COMMIT_MESSAGE) {
                        log.info("executeLocalTransactionBranch return: {} messageTopic: {}" +
                                        " transactionId: {} tag: {} key: {}",
                                localTransactionState, msg.getTopic(), msg.getTransactionId(),
                                msg.getTags(), msg.getKeys());
                    }
                } catch (Throwable e) {
                    log.error("executeLocalTransactionBranch exception, messageTopic: {}" +
                                    " transactionId: {} tag: {} key: {}",
                            msg.getTopic(), msg.getTransactionId(), msg.getTags(),
                            msg.getKeys(), e);
                    localException = e;
                }
            }
            break;
            // The half message was not stored successfully, so roll back
            case FLUSH_DISK_TIMEOUT:
            case FLUSH_SLAVE_TIMEOUT:
            case SLAVE_NOT_AVAILABLE:
                localTransactionState = LocalTransactionState.ROLLBACK_MESSAGE;
                break;
            default:
                break;
        }

        try {
            // Send the local transaction result to the Broker
            this.endTransaction(msg, sendResult, localTransactionState, localException);
        } catch (Exception e) {
            log.warn("local transaction execute " + localTransactionState
                    + ", but end broker transaction failed", e);
        }

        TransactionSendResult transactionSendResult = new TransactionSendResult();
        transactionSendResult.setSendStatus(sendResult.getSendStatus());
        transactionSendResult.setMessageQueue(sendResult.getMessageQueue());
        transactionSendResult.setMsgId(sendResult.getMsgId());
        transactionSendResult.setQueueOffset(sendResult.getQueueOffset());
        transactionSendResult.setTransactionId(sendResult.getTransactionId());
        transactionSendResult.setLocalTransactionState(localTransactionState);
        return transactionSendResult;
    }

    /**
     * DEFAULT SYNC -------------------------------------------------------
     */
    public SendResult send(Message msg) throws MQClientException,
            RemotingException, MQBrokerException, InterruptedException {
        return send(msg, this.defaultMQProducer.getSendMsgTimeout());
    }

    public SendResult send(Message msg, long timeout) throws MQClientException,
            RemotingException, MQBrokerException, InterruptedException {
        return this.sendDefaultImpl(msg, CommunicationMode.SYNC, null, timeout);
    }
}

The sendDefaultImpl method was covered in the RocketMQ message send source analysis and is not repeated here.

5.1.3 Sending the Local Result - endTransaction

The local result is sent to the Broker after being converted as follows:

  • If localTransactionState == COMMIT_MESSAGE, it becomes MessageSysFlag.TRANSACTION_COMMIT_TYPE (0x2 << 2, binary 1000)
  • If localTransactionState == ROLLBACK_MESSAGE, it becomes MessageSysFlag.TRANSACTION_ROLLBACK_TYPE (0x3 << 2, binary 1100)
  • If localTransactionState == UNKNOW, it becomes MessageSysFlag.TRANSACTION_NOT_TYPE (0, binary 0000)
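
The three values above occupy two bits of the system flag, which a few lines of bit arithmetic make concrete. This is a standalone sketch: TransactionFlagSketch and its methods are hypothetical, the constants are local copies rather than references to MessageSysFlag, and PREPARED_TYPE (0x1 << 2) is not listed above and is included as an assumption to complete the two-bit field.

```java
// Local copies of the flag values for illustration; the real constants live in MessageSysFlag.
final class TransactionFlagSketch {

    static final int NOT_TYPE = 0;              // 0000
    static final int PREPARED_TYPE = 0x1 << 2;  // 0100 (assumed, not listed in the text)
    static final int COMMIT_TYPE = 0x2 << 2;    // 1000
    static final int ROLLBACK_TYPE = 0x3 << 2;  // 1100, also covers both transaction bits

    /** Extracts the two transaction bits from a system flag. */
    static int transactionValue(int sysFlag) {
        return sysFlag & ROLLBACK_TYPE;
    }

    /** Replaces the two transaction bits and keeps every other bit. */
    static int withTransactionValue(int sysFlag, int type) {
        return (sysFlag & ~ROLLBACK_TYPE) | type;
    }

    public static void main(String[] args) {
        int sysFlag = 0x1 | PREPARED_TYPE; // an unrelated low bit plus "prepared"
        System.out.println(Integer.toBinaryString(transactionValue(sysFlag)));
        System.out.println(Integer.toBinaryString(withTransactionValue(sysFlag, COMMIT_TYPE)));
    }
}
```

Since the rollback value has both bits set, it doubles as the mask for the field, which is why the sketch reuses it in both helper methods.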
// org.apache.rocketmq.client.impl.producer;
public class DefaultMQProducerImpl implements MQProducerInner {

    //...
    public void endTransaction(final Message msg, final SendResult sendResult,
            final LocalTransactionState localTransactionState, final Throwable localException) 
            throws RemotingException, MQBrokerException, InterruptedException, UnknownHostException {
        final MessageId id;
        // Message ID assigned by the server
        if (sendResult.getOffsetMsgId() != null) {
            id = MessageDecoder.decodeMessageId(sendResult.getOffsetMsgId());
        } else {
            id = MessageDecoder.decodeMessageId(sendResult.getMsgId());
        }
        // Transaction ID
        String transactionId = sendResult.getTransactionId();
        final String destBrokerName = this.mQClientFactory.getBrokerNameFromMessageQueue(
                defaultMQProducer.queueWithNamespace(sendResult.getMessageQueue()));
        // Commit or roll back on the same Broker that received the prepare message
        final String brokerAddr = this.mQClientFactory.findBrokerAddressInPublish(destBrokerName);
        EndTransactionRequestHeader requestHeader = new EndTransactionRequestHeader();
        requestHeader.setTransactionId(transactionId);
        // CommitLog offset of the transactional message
        requestHeader.setCommitLogOffset(id.getOffset());
        requestHeader.setBname(destBrokerName);
        switch (localTransactionState) {
            case COMMIT_MESSAGE:
                requestHeader.setCommitOrRollback(MessageSysFlag.TRANSACTION_COMMIT_TYPE);
                break;
            case ROLLBACK_MESSAGE:
                requestHeader.setCommitOrRollback(MessageSysFlag.TRANSACTION_ROLLBACK_TYPE);
                break;
            case UNKNOW:
                requestHeader.setCommitOrRollback(MessageSysFlag.TRANSACTION_NOT_TYPE);
                break;
            default:
                break;
        }

        doExecuteEndTransactionHook(msg, sendResult.getMsgId(), brokerAddr,
                localTransactionState, false);
        requestHeader.setProducerGroup(this.defaultMQProducer.getProducerGroup());
        requestHeader.setTranStateTableOffset(sendResult.getQueueOffset());
        requestHeader.setMsgId(sendResult.getMsgId());
        // Carry the exception, if any, thrown by the local transaction callback
        String remark = localException != null 
                ? ("executeLocalTransactionBranch exception: " + localException.toString())
                : null;
        
        // Send the local transaction result to the Broker
        this.mQClientFactory.getMQClientAPIImpl().endTransactionOneway(brokerAddr, requestHeader,
                remark, this.defaultMQProducer.getSendMsgTimeout());
    }
}

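5.2 How the Broker Handles Prepare Messages

5.2.1 Transactional Message Properties

Like ordinary messages, transactional prepare messages are handled by SendMessageProcessor#processRequest. What differs is how they are stored: the prepare message is handed to TransactionalMessageService.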

// org.apache.rocketmq.broker.processor;
public class SendMessageProcessor 
        extends AbstractSendMessageProcessor 
        implements NettyRequestProcessor {

    //...
    
    @Override
    public RemotingCommand processRequest(ChannelHandlerContext ctx,
                                          RemotingCommand request) throws RemotingCommandException {
        SendMessageContext sendMessageContext;
        switch (request.getCode()) {
            case RequestCode.CONSUMER_SEND_MSG_BACK:
                return this.consumerSendMsgBack(ctx, request);
            default:
                SendMessageRequestHeader requestHeader = parseRequestHeader(request);
                if (requestHeader == null) {
                    return null;
                }
                TopicQueueMappingContext mappingContext = this.brokerController
                    .getTopicQueueMappingManager().buildTopicQueueMappingContext(requestHeader, true);
                RemotingCommand rewriteResult = this.brokerController.getTopicQueueMappingManager()
                        .rewriteRequestForStaticTopic(requestHeader, mappingContext);
                if (rewriteResult != null) {
                    return rewriteResult;
                }
                sendMessageContext = buildMsgContext(ctx, requestHeader, request);
                try {
                    this.executeSendMessageHookBefore(sendMessageContext);
                } catch (AbortProcessException e) {
                    final RemotingCommand errorResponse = RemotingCommand.createResponseCommand(
                            e.getResponseCode(), e.getErrorMessage());
                    errorResponse.setOpaque(request.getOpaque());
                    return errorResponse;
                }

                RemotingCommand response;
                if (requestHeader.isBatch()) {
                    response = this.sendBatchMessage(ctx, request, sendMessageContext,
                            requestHeader, mappingContext,
                            (ctx1, response1) -> executeSendMessageHookAfter(response1, ctx1));
                } else {
                    response = this.sendMessage(ctx, request, sendMessageContext, 
                            requestHeader, mappingContext,
                            (ctx12, response12) -> executeSendMessageHookAfter(response12, ctx12));
                }

                return response;
        }
    }
    
    public RemotingCommand sendMessage(final ChannelHandlerContext ctx, final RemotingCommand request, 
                                       final SendMessageContext sendMessageContext, 
                                       final SendMessageRequestHeader requestHeader,
                                       final TopicQueueMappingContext mappingContext,
                                       final SendMessageCallback sendMessageCallback) 
            throws RemotingCommandException {

        final RemotingCommand response = preSend(ctx, request, requestHeader);
        if (response.getCode() != -1) {
            return response;
        }

        final SendMessageResponseHeader responseHeader = 
                (SendMessageResponseHeader) response.readCustomHeader();

        final byte[] body = request.getBody();

        int queueIdInt = requestHeader.getQueueId();
        TopicConfig topicConfig = this.brokerController.getTopicConfigManager()
                .selectTopicConfig(requestHeader.getTopic());

        if (queueIdInt < 0) {
            queueIdInt = randomQueueId(topicConfig.getWriteQueueNums());
        }

        MessageExtBrokerInner msgInner = new MessageExtBrokerInner();
        msgInner.setTopic(requestHeader.getTopic());
        msgInner.setQueueId(queueIdInt);

        Map<String, String> oriProps = MessageDecoder.string2messageProperties(
                requestHeader.getProperties());
        if (!handleRetryAndDLQ(requestHeader, response, request, msgInner, topicConfig, oriProps)) {
            return response;
        }

        msgInner.setBody(body);
        msgInner.setFlag(requestHeader.getFlag());

        String uniqKey = oriProps.get(MessageConst.PROPERTY_UNIQ_CLIENT_MESSAGE_ID_KEYIDX);
        if (uniqKey == null || uniqKey.length() <= 0) {
            uniqKey = MessageClientIDSetter.createUniqID();
            oriProps.put(MessageConst.PROPERTY_UNIQ_CLIENT_MESSAGE_ID_KEYIDX, uniqKey);
        }

        MessageAccessor.setProperties(msgInner, oriProps);

        CleanupPolicy cleanupPolicy = CleanupPolicyUtils.getDeletePolicy(Optional.of(topicConfig));
        if (Objects.equals(cleanupPolicy, CleanupPolicy.COMPACTION)) {
            if (StringUtils.isBlank(msgInner.getKeys())) {
                response.setCode(ResponseCode.MESSAGE_ILLEGAL);
                response.setRemark("Required message key is missing");
                return response;
            }
        }

        msgInner.setTagsCode(MessageExtBrokerInner.tagsString2tagsCode(
                topicConfig.getTopicFilterType(), msgInner.getTags()));
        msgInner.setBornTimestamp(requestHeader.getBornTimestamp());
        msgInner.setBornHost(ctx.channel().remoteAddress());
        msgInner.setStoreHost(this.getStoreHost());
        msgInner.setReconsumeTimes(requestHeader.getReconsumeTimes() == null 
                ? 0 
                : requestHeader.getReconsumeTimes());
        String clusterName = this.brokerController.getBrokerConfig().getBrokerClusterName();
        MessageAccessor.putProperty(msgInner, MessageConst.PROPERTY_CLUSTER, clusterName);

        msgInner.setPropertiesString(MessageDecoder
              .messageProperties2String(msgInner.getProperties()));

        // Read the transaction property set by the producer at send time (prepare message or commit message)
        String traFlag = oriProps.get(MessageConst.PROPERTY_TRANSACTION_PREPARED);
        boolean sendTransactionPrepareMessage = false;
        //For client under version 4.6.1
        if (Boolean.parseBoolean(traFlag)
                && !(msgInner.getReconsumeTimes() > 0 && msgInner.getDelayTimeLevel() > 0)) { 
            // Check whether storing transactional messages is rejected; it is allowed by default
            if (this.brokerController.getBrokerConfig().isRejectTransactionMessage()) {
                response.setCode(ResponseCode.NO_PERMISSION);
                response.setRemark(
                        "the broker[" + this.brokerController.getBrokerConfig().getBrokerIP1()
                                + "] sending transaction message is forbidden");
                return response;
            }
            sendTransactionPrepareMessage = true;
        }

        long beginTimeMillis = this.brokerController.getMessageStore().now();

        if (brokerController.getBrokerConfig().isAsyncSendEnable()) {
            CompletableFuture<PutMessageResult> asyncPutMessageFuture;
            if (sendTransactionPrepareMessage) {
                asyncPutMessageFuture = this.brokerController.getTransactionalMessageService()
                        .asyncPrepareMessage(msgInner);
            } else {
                asyncPutMessageFuture = this.brokerController.getMessageStore()
                        .asyncPutMessage(msgInner);
            }

            final int finalQueueIdInt = queueIdInt;
            final MessageExtBrokerInner finalMsgInner = msgInner;
            asyncPutMessageFuture.thenAcceptAsync(putMessageResult -> {
                RemotingCommand responseFuture = handlePutMessageResult(putMessageResult, response,
                        request, finalMsgInner, responseHeader, sendMessageContext, ctx,
                        finalQueueIdInt, beginTimeMillis, mappingContext, 
                        BrokerMetricsManager.getMessageType(requestHeader));
                if (responseFuture != null) {
                    doResponse(ctx, request, responseFuture);
                }
                sendMessageCallback.onComplete(sendMessageContext, response);
            }, this.brokerController.getPutMessageFutureExecutor());
            // Returns null to release the send message thread
            return null;
        } else {
            PutMessageResult putMessageResult = null;
            if (sendTransactionPrepareMessage) {
                putMessageResult = this.brokerController.getTransactionalMessageService()
                        .prepareMessage(msgInner);
            } else {
                putMessageResult = this.brokerController.getMessageStore().putMessage(msgInner);
            }
            handlePutMessageResult(putMessageResult, response, request, msgInner, responseHeader, 
                    sendMessageContext, ctx, queueIdInt, beginTimeMillis, mappingContext,
                    BrokerMetricsManager.getMessageType(requestHeader));
            sendMessageCallback.onComplete(sendMessageContext, response);
            return response;
        }
    }
}    

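5.2.2 Pre-storing the Transactional Message

TransactionalMessageServiceImpl#prepareMessage calls TransactionalMessageBridge#putHalfMessage to pre-store the transactional message. The message's original Topic and QueueId are backed up into its properties (for use at commit time), and its Topic is changed to RMQ_SYS_TRANS_HALF_TOPIC, which has a single queue, queue 0.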

//org.apache.rocketmq.broker.transaction.queue;
public class TransactionalMessageServiceImpl implements TransactionalMessageService {

    //...
    
    private TransactionalMessageBridge transactionalMessageBridge;
    
    @Override
    public PutMessageResult prepareMessage(MessageExtBrokerInner messageInner) {
        return transactionalMessageBridge.putHalfMessage(messageInner);
    }
    
}
//org.apache.rocketmq.broker.transaction.queue;
public class TransactionalMessageBridge {
    
    //...

    private final MessageStore store;
    
    public PutMessageResult putHalfMessage(MessageExtBrokerInner messageInner) {
        return store.putMessage(parseHalfMessageInner(messageInner));
    }
}

By default this calls DefaultMessageStore#putMessage. From here on the message is stored exactly like an ordinary message.

// org.apache.rocketmq.store;
public class DefaultMessageStore implements MessageStore {

    //...
    
    @Override
    public PutMessageResult putMessage(MessageExtBrokerInner msg) {
        return waitForPutResult(asyncPutMessage(msg));
    }

    private PutMessageResult waitForPutResult(
            CompletableFuture<PutMessageResult> putMessageResultFuture) {
        try {
            int putMessageTimeout =
                    Math.max(this.messageStoreConfig.getSyncFlushTimeout(),
                            this.messageStoreConfig.getSlaveTimeout()) + 5000;
            return putMessageResultFuture.get(putMessageTimeout, TimeUnit.MILLISECONDS);
        } catch (ExecutionException | InterruptedException e) {
            return new PutMessageResult(PutMessageStatus.UNKNOWN_ERROR, null);
        } catch (TimeoutException e) {
            log.error("usually it will never timeout, putMessageTimeout" +
                    " is much bigger than slaveTimeout and "
                    + "flushTimeout so the result can be got anyway, " +
                    "but in some situations timeout will happen like full gc "
                    + "process hangs or other unexpected situations.");
            return new PutMessageResult(PutMessageStatus.UNKNOWN_ERROR, null);
        }
    }
}

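As shown above, a transactional prepare message is stored in much the same way as an ordinary message. So how does the Broker make sure a prepare message is not consumed by a Consumer? Mainly in the following ways:

  • The old implementation (deprecated)
    1. When writing a message to the CommitLog, the Broker checks the message type. For a prepare or rollback transactional message, the queueOffset of the ConsumeQueue is not increased (normally queueOffset is incremented for every appended message).
    2. When building the ConsumeQueue, the Broker checks for prepare and rollback messages and does not write them to the ConsumeQueue. Since the message never appears in the ConsumeQueue, the Consumer does not consume it.
  • The new implementation: the original Topic is replaced, and the message is written to its original Topic only after it has been committed. This guarantees that a Consumer cannot consume the message before the Commit.

5.2.3 Transaction-related Checks

CommitLog.DefaultAppendMessageCallback#doAppend contains the transaction-related checks. Because the Topic has already been replaced with the transaction Topic and the transaction flag has been cleared, the flag seen here for a prepare message is always TRANSACTION_NOT_TYPE. This logic is presumably left over from the earlier transaction implementation (the one that did not change the Topic).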

class DefaultAppendMessageCallback implements AppendMessageCallback {

    public AppendMessageResult doAppend(final long fileFromOffset,
                                        final ByteBuffer byteBuffer, 
                                        final int maxBlank,
                                        final MessageExtBrokerInner msgInner) {
        
        //...省略...
        
        // Record ConsumeQueue information
        keyBuilder.setLength(0);
        keyBuilder.append(msgInner.getTopic());
        keyBuilder.append('-');
        keyBuilder.append(msgInner.getQueueId());
        String key = keyBuilder.toString();
        Long queueOffset = CommitLog.this.topicQueueTable.get(key);
        if (null == queueOffset) {
            queueOffset = 0L;
            CommitLog.this.topicQueueTable.put(key, queueOffset);
        }

        // For transactional Prepare and Rollback messages the queue offset is set to 0
        final int tranType = MessageSysFlag.getTransactionValue(msgInner.getSysFlag());
        switch (tranType) {
            // Prepared and Rollback messages are not consumed and will not enter the
            // consume queue
            case MessageSysFlag.TRANSACTION_PREPARED_TYPE:
            case MessageSysFlag.TRANSACTION_ROLLBACK_TYPE:
                queueOffset = 0L;
                break;
            case MessageSysFlag.TRANSACTION_NOT_TYPE:
            case MessageSysFlag.TRANSACTION_COMMIT_TYPE:
            default:
                break;
        }

        //...省略...
        
        // The queue offset is advanced only for TRANSACTION_COMMIT_TYPE and TRANSACTION_NOT_TYPE messages
        switch (tranType) {
            case MessageSysFlag.TRANSACTION_PREPARED_TYPE:
            case MessageSysFlag.TRANSACTION_ROLLBACK_TYPE:
                break;
            case MessageSysFlag.TRANSACTION_NOT_TYPE:
            case MessageSysFlag.TRANSACTION_COMMIT_TYPE:
                // The next update ConsumeQueue information
                CommitLog.this.topicQueueTable.put(key, ++queueOffset);
                break;
            default:
                break;
        }
        return result;
    }
}

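5.2.4 Asynchronous ConsumeQueue Construction

DefaultMessageStore.CommitLogDispatcherBuildConsumeQueue#dispatch is where the ConsumeQueue is built asynchronously. As the code shows, nothing is built for Prepare and Rollback messages, so a Consumer cannot consume them.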

class CommitLogDispatcherBuildConsumeQueue implements CommitLogDispatcher {

    //...
    
    @Override
    public void dispatch(DispatchRequest request) {
        final int tranType = MessageSysFlag.getTransactionValue(request.getSysFlag());
        switch (tranType) {
            case MessageSysFlag.TRANSACTION_NOT_TYPE:
            case MessageSysFlag.TRANSACTION_COMMIT_TYPE:
                DefaultMessageStore.this.putMessagePositionInfo(request);
                break;
            // No ConsumeQueue entry is built for prepare and rollback messages
            case MessageSysFlag.TRANSACTION_PREPARED_TYPE:
            case MessageSysFlag.TRANSACTION_ROLLBACK_TYPE:
                break;
        }
    }
}

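5.3 How the Broker Handles Commit/Rollback Messages

5.3.1 Handling the Result

The Producer sends the local result to the Broker with the RequestCode.END_TRANSACTION command, which the Broker handles in EndTransactionProcessor. The logic of EndTransactionProcessor#processRequest is as follows:

  • Only the Master node may handle the request. The relevant log entries are printed, and only commit or rollback requests proceed further.
  • For a rollback, the Broker looks up the prepare message in RMQ_SYS_TRANS_HALF_TOPIC by offset and checks that it is valid. It then deletes the message, which in practice means writing a message with tag d to the RMQ_SYS_TRANS_OP_HALF_TOPIC topic to mark the prepare message as deleted.
  • For a commit, the Broker looks up the prepare message in RMQ_SYS_TRANS_HALF_TOPIC by offset and checks that it is valid. It restores the original message, including the original Topic and Queue, clears the transaction properties, and stores the original message in the CommitLog. Once that succeeds it deletes the prepare message, again by writing a message with tag d to the RMQ_SYS_TRANS_OP_HALF_TOPIC topic.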
// org.apache.rocketmq.broker.processor;

public class EndTransactionProcessor implements NettyRequestProcessor {
    
    //...
    
    @Override
    public RemotingCommand processRequest(ChannelHandlerContext ctx, RemotingCommand request) 
            throws RemotingCommandException {
        final RemotingCommand response = RemotingCommand.createResponseCommand(null);
        final EndTransactionRequestHeader requestHeader = (EndTransactionRequestHeader) 
                request.decodeCommandCustomHeader(EndTransactionRequestHeader.class);
        
        log.info("Transaction request:{}", requestHeader);
        // Slave nodes are not allowed to handle transaction messages
        if (BrokerRole.SLAVE == brokerController.getMessageStoreConfig().getBrokerRole()) {
            response.setCode(ResponseCode.SLAVE_NOT_AVAILABLE);
            log.warn("Message store is slave mode, so end transaction is forbidden. ");
            return response;
        }
        // fromTransactionCheck marks a reply to a check-back (this block only logs);
        // only commit or rollback requests proceed further
        if (requestHeader.getFromTransactionCheck()) {
            switch (requestHeader.getCommitOrRollback()) {
                case MessageSysFlag.TRANSACTION_NOT_TYPE: {
                    log.warn("Check producer[{}] transaction state, but it's pending status."
                                    + "RequestHeader: {} Remark: {}",
                            RemotingHelper.parseChannelRemoteAddr(ctx.channel()),
                            requestHeader.toString(),
                            request.getRemark());
                    return null;
                }

                case MessageSysFlag.TRANSACTION_COMMIT_TYPE: {
                    log.warn("Check producer[{}] transaction state, the producer commit the message."
                                    + "RequestHeader: {} Remark: {}",
                            RemotingHelper.parseChannelRemoteAddr(ctx.channel()),
                            requestHeader.toString(),
                            request.getRemark());
                    break;
                }

                case MessageSysFlag.TRANSACTION_ROLLBACK_TYPE: {
                    log.warn("Check producer[{}] transaction state, the producer rollback the message."
                                    + "RequestHeader: {} Remark: {}",
                            RemotingHelper.parseChannelRemoteAddr(ctx.channel()),
                            requestHeader.toString(),
                            request.getRemark());
                    break;
                }
                default:
                    return null;
            }
        } else {
            switch (requestHeader.getCommitOrRollback()) {
                case MessageSysFlag.TRANSACTION_NOT_TYPE: {
                    log.warn("The producer[{}] end transaction in sending message,  "
                                    + "and it's pending status. RequestHeader: {} Remark: {}",
                            RemotingHelper.parseChannelRemoteAddr(ctx.channel()),
                            requestHeader.toString(),
                            request.getRemark());
                    return null;
                }

                case MessageSysFlag.TRANSACTION_COMMIT_TYPE: {
                    break;
                }

                case MessageSysFlag.TRANSACTION_ROLLBACK_TYPE: {
                    log.warn("The producer[{}] end transaction in sending message,"
                                    + " rollback the message. RequestHeader: {} Remark: {}",
                            RemotingHelper.parseChannelRemoteAddr(ctx.channel()),
                            requestHeader.toString(),
                            request.getRemark());
                    break;
                }
                default:
                    return null;
            }
        }
        OperationResult result = new OperationResult();
        if (MessageSysFlag.TRANSACTION_COMMIT_TYPE == requestHeader.getCommitOrRollback()) {
            // Look up the half message by the offset at which it was stored in the internal transaction topic
            result = this.brokerController.getTransactionalMessageService()
                    .commitMessage(requestHeader);
            if (result.getResponseCode() == ResponseCode.SUCCESS) {
                // Verify that the message found is the expected one
                RemotingCommand res = checkPrepareMessage(result.getPrepareMessage(), requestHeader);
                if (res.getCode() == ResponseCode.SUCCESS) {
                    // Restore the original message
                    MessageExtBrokerInner msgInner = endMessageTransaction(result.getPrepareMessage());
                    msgInner.setSysFlag(MessageSysFlag.resetTransactionValue(msgInner.getSysFlag(),
                            requestHeader.getCommitOrRollback()));
                    msgInner.setQueueOffset(requestHeader.getTranStateTableOffset());
                    msgInner.setPreparedTransactionOffset(requestHeader.getCommitLogOffset());
                    msgInner.setStoreTimestamp(result.getPrepareMessage().getStoreTimestamp());
                    // Store it in the CommitLog; on success, delete the half message
                    RemotingCommand sendResult = sendFinalMessage(msgInner);
                    if (sendResult.getCode() == ResponseCode.SUCCESS) {
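                        // Delete the prepare message, i.e. write a message with tag d to RMQ_SYS_TRANS_OP_HALF_TOPIC
                        // RocketMQ only appends messages and supports neither update nor delete,
                        // so a deletion is a new message under a dedicated topic
                        // Whether committed or rolled back, the op record can be found, which shows the
                        // transaction has ended; if there is none, the state is unknown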
                        this.brokerController.getTransactionalMessageService()
                                .deletePrepareMessage(result.getPrepareMessage());
                    }
                    return sendResult;
                }
                return res;
            }
        } else if (MessageSysFlag.TRANSACTION_ROLLBACK_TYPE == requestHeader.getCommitOrRollback()) {
            result = this.brokerController.getTransactionalMessageService()
                     .rollbackMessage(requestHeader);
            if (result.getResponseCode() == ResponseCode.SUCCESS) {
                RemotingCommand res = checkPrepareMessage(result.getPrepareMessage(), requestHeader);
                if (res.getCode() == ResponseCode.SUCCESS) {
                    this.brokerController.getTransactionalMessageService()
                            .deletePrepareMessage(result.getPrepareMessage());
                }
                return res;
            }
        }
        response.setCode(result.getResponseCode());
        response.setRemark(result.getResponseRemark());
        return response;
    }
}
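
To make the commit and rollback paths easier to follow, here is a toy in-memory model of the same idea on an append-only log. It is only a sketch under simplifying assumptions: EndTransactionSketch, Entry, putHalf and visible are hypothetical names, the list index stands in for the real queue offset, and validation, persistence and replication are left out.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of END_TRANSACTION handling on an append-only log. All names are illustrative. */
final class EndTransactionSketch {
    static final String HALF_TOPIC = "RMQ_SYS_TRANS_HALF_TOPIC";
    static final String OP_TOPIC = "RMQ_SYS_TRANS_OP_HALF_TOPIC";
    static final String REMOVE_TAG = "d";

    /** One stored entry; realTopic is only set on half messages. */
    static final class Entry {
        final String topic;
        final String tag;
        final String body;
        final String realTopic;

        Entry(String topic, String tag, String body, String realTopic) {
            this.topic = topic;
            this.tag = tag;
            this.body = body;
            this.realTopic = realTopic;
        }
    }

    /** The append-only log: entries are added, never changed or removed. */
    final List<Entry> log = new ArrayList<>();

    /** Phase one: park the message in the half topic and return its offset (the list index here). */
    int putHalf(String realTopic, String body) {
        log.add(new Entry(HALF_TOPIC, null, body, realTopic));
        return log.size() - 1;
    }

    /** Commit: re-append under the original topic, then mark the half message as deleted. */
    void commit(int halfOffset) {
        Entry half = findHalf(halfOffset);
        log.add(new Entry(half.realTopic, null, half.body, null));
        markDeleted(halfOffset);
    }

    /** Rollback: only mark the half message as deleted; nothing reaches the real topic. */
    void rollback(int halfOffset) {
        findHalf(halfOffset);
        markDeleted(halfOffset);
    }

    /** Bodies of the messages stored under the given topic, in log order. */
    List<String> visible(String topic) {
        List<String> bodies = new ArrayList<>();
        for (Entry e : log) {
            if (e.topic.equals(topic)) {
                bodies.add(e.body);
            }
        }
        return bodies;
    }

    private Entry findHalf(int halfOffset) {
        Entry e = log.get(halfOffset);
        if (!HALF_TOPIC.equals(e.topic)) {
            throw new IllegalArgumentException("offset " + halfOffset + " is not a half message");
        }
        return e;
    }

    /** "Deletion" is itself an append: an op entry whose body is the half message's offset. */
    private void markDeleted(int halfOffset) {
        log.add(new Entry(OP_TOPIC, REMOVE_TAG, String.valueOf(halfOffset), null));
    }
}
```

In this model, committing one half message and rolling back another leaves a single message under the real topic and two op entries, while both half entries stay in the log untouched.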

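5.3.2 Deleting messages

TransactionalMessageServiceImpl#deletePrepareMessage deletes the message. This is not a physical deletion but an append: deleting essentially means appending a message with a specific tag to a queue of the RMQ_SYS_TRANS_OP_HALF_TOPIC topic.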

//TransactionalMessageServiceImpl#deletePrepareMessage
@Override
public boolean deletePrepareMessage(MessageExt msgExt) {
    if (this.transactionalMessageBridge.putOpMessage(msgExt, TransactionalMessageUtil.REMOVETAG)) {
        log.info("Transaction op message write successfully. messageId={}, queueId={} msgExt:{}",
        msgExt.getMsgId(), msgExt.getQueueId(), msgExt);
        return true;
    } else {
        log.error("Transaction op message write failed. messageId is {}, queueId is {}",
        msgExt.getMsgId(), msgExt.getQueueId());
        return false;
    }
}

// TransactionalMessageBridge#putOpMessage 
// Appends a message to RMQ_SYS_TRANS_OP_HALF_TOPIC
public boolean putOpMessage(MessageExt messageExt, String opType) {
    // messageExt is the Prepare message
    // Build a message queue
    MessageQueue messageQueue = new MessageQueue(messageExt.getTopic(),
        this.brokerController.getBrokerConfig().getBrokerName(), messageExt.getQueueId());
    if (TransactionalMessageUtil.REMOVETAG.equals(opType)) {
        return addRemoveTagInTransactionOp(messageExt, messageQueue);
    }
    return true;
}

//TransactionalMessageBridge#addRemoveTagInTransactionOp
private boolean addRemoveTagInTransactionOp(MessageExt messageExt, MessageQueue messageQueue) {
    Message message = new Message(TransactionalMessageUtil.buildOpTopic(),
        TransactionalMessageUtil.REMOVETAG,
        String.valueOf(messageExt.getQueueOffset()).getBytes(TransactionalMessageUtil.charset));
    writeOp(message, messageQueue);
    return true;
}

//TransactionalMessageBridge#writeOp 
// Calls the store's putMessage
private void writeOp(Message message, MessageQueue mq) {
    // Here mq is the Prepare message queue (of topic RMQ_SYS_TRANS_HALF_TOPIC)
    // opQueueMap caches key = RMQ_SYS_TRANS_HALF_TOPIC queue, value = RMQ_SYS_TRANS_OP_HALF_TOPIC queue
    MessageQueue opQueue;
    if (opQueueMap.containsKey(mq)) {
        opQueue = opQueueMap.get(mq);
    } else {
        // Create a message queue for topic RMQ_SYS_TRANS_OP_HALF_TOPIC
        opQueue = getOpQueueByHalf(mq);
        // putIfAbsent does not overwrite an existing value; it returns the existing one
        MessageQueue oldQueue = opQueueMap.putIfAbsent(mq, opQueue);
        if (oldQueue != null) {
            opQueue = oldQueue;
        }
    }
    // TODO by jannal: why would this be null here?
    if (opQueue == null) {
        opQueue = new MessageQueue(TransactionalMessageUtil.buildOpTopic(),
                mq.getBrokerName(), mq.getQueueId());
    }
    putMessage(makeOpMessageInner(message, opQueue));
}
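
The op message body written above is just the half message's queue offset as text, and the check-back pass later reads it back to learn which half messages have already ended. The sketch below illustrates that round trip. OpRemoveMapSketch and its methods are hypothetical, the charset is assumed to be UTF-8, and the split between doneOpOffset and removeMap follows my reading of fillOpRemoveMap rather than a verified copy of it.

```java
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.Map;

/** Sketch of encoding and decoding op message bodies. Names are illustrative, not RocketMQ's. */
final class OpRemoveMapSketch {

    /** Body of an op message: the half message's queue offset as a decimal string. */
    static byte[] encodeOpBody(long halfQueueOffset) {
        return String.valueOf(halfQueueOffset).getBytes(StandardCharsets.UTF_8);
    }

    /** Reads the half message's queue offset back out of an op message body. */
    static long decodeOpBody(byte[] body) {
        return Long.parseLong(new String(body, StandardCharsets.UTF_8));
    }

    /**
     * Sorts op messages into two buckets. opBodies.get(k) is assumed to be the body of the
     * op message at op-queue offset opStartOffset + k.
     * - Half offset below the half queue's consume offset: the half message is already behind
     *   the checker, so the op message is finished (doneOpOffset).
     * - Otherwise: remember halfOffset -> opOffset so the checker can skip that half message.
     */
    static void fill(Map<Long, Long> removeMap, List<Long> doneOpOffset,
                     List<byte[]> opBodies, long opStartOffset, long halfConsumeOffset) {
        for (int k = 0; k < opBodies.size(); k++) {
            long opOffset = opStartOffset + k;
            long halfOffset = decodeOpBody(opBodies.get(k));
            if (halfOffset < halfConsumeOffset) {
                doneOpOffset.add(opOffset);
            } else {
                removeMap.put(halfOffset, opOffset);
            }
        }
    }
}
```

Keeping the op offset as the map value lets the checker work out afterwards how far the op queue's consume progress can advance.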

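5.4 Broker transaction check-back

5.4.1 Periodic check-back

Normally the client returns the resulting state to the Broker once its callback has finished. But suppose the Producer has crashed by then, or the call to the Broker failed because of a network problem. That is when the Broker's periodic transaction check-back is needed.

TransactionalMessageCheckService is a service thread that is started when the Broker starts. By default it runs a check every 60 s, a half message becomes eligible for check-back once it is older than the transaction timeout (6 s by default), and a message is checked back at most 15 times. Each run calls TransactionalMessageService#check to do the actual work.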

//BrokerController#start
 
//...omitted....
if (BrokerRole.SLAVE != messageStoreConfig.getBrokerRole()) {
    if (this.transactionalMessageCheckService != null) {
        log.info("Start transaction service!");
        this.transactionalMessageCheckService.start();
    }
} 

//TransactionalMessageCheckService#run
@Override
public void run() {
    log.info("Start transaction check service thread!");
    // 60 s by default
    long checkInterval = brokerController.getBrokerConfig().getTransactionCheckInterval();
    while (!this.isStopped()) {
        this.waitForRunning(checkInterval);
    }
    log.info("End transaction check service thread!");
}

@Override
protected void onWaitEnd() {
    // 6000 ms by default
    long timeout = brokerController.getBrokerConfig().getTransactionTimeOut();
    // At most 15 check-backs by default
    int checkMax = brokerController.getBrokerConfig().getTransactionCheckMax();
    long begin = System.currentTimeMillis();
    log.info("Begin to check prepare message, begin time:{}", begin);
    this.brokerController.getTransactionalMessageService().check(timeout, 
        checkMax, this.brokerController.getTransactionalMessageCheckListener());
    log.info("End to check prepare message, consumed time:{}", System.currentTimeMillis() - begin);
}
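
The pattern above (wait for an interval, then run one check pass) can be sketched with the JDK alone. Note that this example swaps in a ScheduledExecutorService for RocketMQ's own ServiceThread, so it illustrates the idea rather than the Broker's implementation. PeriodicCheckSketch, start and ranAtLeast are hypothetical names.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/** Sketch of a periodic background check built on ScheduledExecutorService. */
final class PeriodicCheckSketch {

    /** Runs check every intervalMillis on a daemon thread until the returned scheduler is shut down. */
    static ScheduledExecutorService start(Runnable check, long intervalMillis) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "transaction-check-sketch");
            t.setDaemon(true);
            return t;
        });
        scheduler.scheduleWithFixedDelay(() -> {
            try {
                check.run();
            } catch (RuntimeException e) {
                // An exception escaping the task would cancel all later runs, so it is contained here.
                System.err.println("check failed: " + e);
            }
        }, intervalMillis, intervalMillis, TimeUnit.MILLISECONDS);
        return scheduler;
    }

    /** Demo helper: reports whether the check ran at least `times` times within maxWaitMillis. */
    static boolean ranAtLeast(int times, long intervalMillis, long maxWaitMillis) {
        CountDownLatch latch = new CountDownLatch(times);
        ScheduledExecutorService scheduler = start(latch::countDown, intervalMillis);
        try {
            return latch.await(maxWaitMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            scheduler.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println(ranAtLeast(3, 10, 5_000));
    }
}
```

A fixed delay rather than a fixed rate means a slow pass pushes the next one back instead of letting runs pile up, which matches the wait-then-run loop shown above.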

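5.4.2 Main flow of check

The main flow of TransactionalMessageService#check:

  • Look up the queues of RMQ_SYS_TRANS_HALF_TOPIC; currently there is only one, queue 0
  • Iterate over the MessageQueues of RMQ_SYS_TRANS_HALF_TOPIC; processing of each MessageQueue is capped at 60 s
  • Use the RMQ_SYS_TRANS_HALF_TOPIC MessageQueue as the key to fetch the matching RMQ_SYS_TRANS_OP_HALF_TOPIC MessageQueue from the cache, creating it if it does not exist
  • Pull messages from the op queue with the consumer group CID_RMQ_SYS_TRANS, 32 at a time
  • Compare each prepare message with the op messages: if an op message refers to it, no check-back is needed. If it has been checked back more than 15 times or has expired, skip it
  • If the message has waited longer than the transaction check-back time, check it back; otherwise keep pulling op messages
  • Append the message to the prepare queue again and update the offset
  • Send the check-back request to the Producer
  • Update the consume progress of the prepare queue and the op queue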
@Override
public void check(long transactionTimeout, int transactionCheckMax,
    AbstractTransactionalMessageCheckListener listener) {
    try {
        String topic = MixAll.RMQ_SYS_TRANS_HALF_TOPIC;
        // RMQ_SYS_TRANS_HALF_TOPIC has only one queue
        Set<MessageQueue> msgQueues = transactionalMessageBridge.fetchMessageQueues(topic);
        if (msgQueues == null || msgQueues.size() == 0) {
            log.warn("The queue of topic is empty :" + topic);
            return;
        }
        log.info("Check topic={}, queues={}", topic, msgQueues);
        for (MessageQueue messageQueue : msgQueues) {
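            // Time at which processing of this queue started
            long startTime = System.currentTimeMillis();
            // Each prepare queue maps to one op queue (written on commit or rollback); in practice there is just one
            MessageQueue opQueue = getOpQueue(messageQueue);
            // Latest consume offset of the prepare queue
            long halfOffset = transactionalMessageBridge.fetchConsumeOffset(messageQueue);
            // Latest consume offset of the op queue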
            long opOffset = transactionalMessageBridge.fetchConsumeOffset(opQueue);
            log.info("Before check, the queue={} msgOffset={} opOffset={}", messageQueue,
                     halfOffset, opOffset);
            if (halfOffset < 0 || opOffset < 0) {
                log.error("MessageQueue: {} illegal offset read: {}, op offset: {},skip this queue",
                    messageQueue, halfOffset, opOffset);
                continue;
            }
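            // Offsets of op messages that have already been processed
            List<Long> doneOpOffset = new ArrayList<>();
            // Prepare messages that have already been deleted
            HashMap<Long, Long> removeMap = new HashMap<>();
            // Find out which prepare messages are already deleted, mainly to avoid duplicate check-back requests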
            PullResult pullResult = fillOpRemoveMap(removeMap, opQueue, opOffset,
                    halfOffset, doneOpOffset);
            if (null == pullResult) {
                log.error("The queue={} check msgOffset={} with opOffset={} failed,"
                    + " pullResult is null", messageQueue, halfOffset, opOffset);
                continue;
            }
            // single thread
            // Number of times an empty message was fetched
            int getMessageNullCount = 1;
            long newOffset = halfOffset;
            // Logical offset
            long i = halfOffset;
            while (true) {
                // Processing of each MessageQueue is limited to 60 s
                if (System.currentTimeMillis() - startTime > MAX_PROCESS_TIME_LIMIT) {
                    log.info("Queue={} process time reach max={}", 
                        messageQueue, MAX_PROCESS_TIME_LIMIT);
                    break;
                }
                // If the Prepare message has already been committed or rolled back, just remove it from the map
                if (removeMap.containsKey(i)) {
                    log.info("Half offset {} has been committed/rolled back", i);
                    removeMap.remove(i);
                } else {
                    // Fetch the prepare message to process
                    GetResult getResult = getHalfMsg(messageQueue, i);
                    MessageExt msgExt = getResult.getMsg();
                    // Message not found: leave the loop, or skip ahead if the offset was illegal
                    if (msgExt == null) {
                        if (getMessageNullCount++ > MAX_RETRY_COUNT_WHEN_HALF_NULL) {
                            break;
                        }
                        if (getResult.getPullResult().getPullStatus() == PullStatus.NO_NEW_MSG) {
                            log.info("No new msg, the miss offset={} in={}, continue check={},"
                                + " pull result={}", i,
                                messageQueue, getMessageNullCount, getResult.getPullResult());
                            break;
                        } else {
                            log.info("Illegal offset, the miss offset={} in={}, continue check={},"
                                + " pull result={}", i,
                                messageQueue, getMessageNullCount, getResult.getPullResult());
                            i = getResult.getPullResult().getNextBeginOffset();
                            newOffset = i;
                            continue;
                        }
                    }
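                    // Discard after more than 15 check-backs, or skip if the message has expired
                    // (older than the configured file retention time, 72 hours by default)
                    if (needDiscard(msgExt, transactionCheckMax) || needSkip(msgExt)) {
                        // By default this just prints a log line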
                        listener.resolveDiscardMsg(msgExt);
                        newOffset = i + 1;
                        i++;
                        continue;
                    }
                    // Messages stored after this check run started are not processed; this is a defensive check
                    if (msgExt.getStoreTimestamp() >= startTime) {
                        log.info("Fresh stored. the miss offset={}, check it later, store={}", i,
                            new Date(msgExt.getStoreTimestamp()));
                        break;
                    }
                    // Time elapsed since the message was born
                    long valueOfCurrentMinusBorn = System.currentTimeMillis() - msgExt.getBornTimestamp();
                    // Timeout, 6 s by default
                    long checkImmunityTime = transactionTimeout;
                    // Currently this property is only used in test cases
                    String checkImmunityTimeStr = msgExt.getUserProperty(MessageConst.PROPERTY_CHECK_IMMUNITY_TIME_IN_SECONDS);
                    if (null != checkImmunityTimeStr) {
                        // If checkImmunityTimeStr is -1, transactionTimeout is used
                        checkImmunityTime = getImmunityTime(checkImmunityTimeStr, transactionTimeout);
                        // Leave the transaction time to commit; no check-back should happen yet
                        if (valueOfCurrentMinusBorn < checkImmunityTime) {
                            // Still within the immunity time: write the message back to the prepare queue
                            if (checkPrepareQueueOffset(removeMap, doneOpOffset, msgExt, checkImmunityTime)) {
                                newOffset = i + 1;
                                i++;
                                continue;
                            }
                        }
                    } else {
                        // A newly stored prepare message is not processed yet; its commit may still be on the way
                        if ((0 <= valueOfCurrentMinusBorn) && (valueOfCurrentMinusBorn < checkImmunityTime)) {
                            log.info("New arrived, the miss offset={}, check it later checkImmunity={}, born={}", i,
                                checkImmunityTime, new Date(msgExt.getBornTimestamp()));
                            break;
                        }
                    }
                    List<MessageExt> opMsg = pullResult.getMsgFoundList();
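                    // checkImmunityTime is 6 s by default: the earliest age at which a message may be checked back
                    // A check-back is needed when the age exceeds the transaction timeout, or when the last op
                    // message was born more than the timeout after this run started
                    // TODO: when would valueOfCurrentMinusBorn be <= -1? (Possibly when the producer's clock
                    // is ahead of the broker's, since bornTimestamp is set on the client.)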
                    boolean isNeedCheck = (opMsg == null && valueOfCurrentMinusBorn > checkImmunityTime)
                        || (opMsg != null && (opMsg.get(opMsg.size() - 1).getBornTimestamp() - startTime > transactionTimeout))
                        || (valueOfCurrentMinusBorn <= -1);

                    if (isNeedCheck) {
                        // Write this message back to the prepare queue and update the offset
                        if (!putBackHalfMsgQueue(msgExt, i)) {
                            continue;
                        }
                        // Transaction check-back (sent from an asynchronous thread): send the request to the Producer
                        listener.resolveHalfMsg(msgExt);
                    } else {
                        // Not timed out yet: keep pulling op messages
                        pullResult = fillOpRemoveMap(removeMap, opQueue, pullResult.getNextBeginOffset(),
                                    halfOffset, doneOpOffset);
                        log.info("The miss offset:{} in messageQueue:{} need to get more opMsg, result is:{}", i,
                            messageQueue, pullResult);
                        continue;
                    }
                }
                // Advance the consume offset by 1
                newOffset = i + 1;
                i++;
            }
            // Messages were consumed, so the offset must be updated
            if (newOffset != halfOffset) {
                transactionalMessageBridge.updateConsumeOffset(messageQueue, newOffset);
            }

            long newOpOffset = calculateOpOffset(doneOpOffset, opOffset);
            if (newOpOffset != opOffset) {
                transactionalMessageBridge.updateConsumeOffset(opQueue, newOpOffset);
            }
        }
    } catch (Exception e) {
        e.printStackTrace();
        log.error("Check error", e);
    }
}
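
The loop above packs several decisions about a single half message into one place. The sketch below reduces them to a pure function so the ordering is easier to see. It is deliberately simplified: CheckDecisionSketch, Action and decide are hypothetical names, and the file-expiry test (needSkip) and the op-message timestamp condition in isNeedCheck are left out.

```java
/** Simplified sketch of the per-message decision in the check-back loop. Names are illustrative. */
final class CheckDecisionSketch {

    enum Action { SKIP_ALREADY_ENDED, DISCARD, WAIT, CHECK_BACK }

    /**
     * @param hasOpRecord    an op message refers to this half message (already committed or rolled back)
     * @param checkTimes     how many times this message has been checked back so far
     * @param checkMax       maximum number of check-backs (15 by default)
     * @param ageMillis      current time minus the message's born timestamp; may be negative
     * @param immunityMillis time a message is left alone before the first check-back (6000 by default)
     */
    static Action decide(boolean hasOpRecord, int checkTimes, int checkMax,
                         long ageMillis, long immunityMillis) {
        if (hasOpRecord) {
            return Action.SKIP_ALREADY_ENDED;
        }
        if (checkTimes >= checkMax) {
            return Action.DISCARD;
        }
        if (ageMillis >= 0 && ageMillis < immunityMillis) {
            return Action.WAIT; // the commit or rollback may still be on its way
        }
        // Old enough, or a negative age (treated like the <= -1 branch above): ask the Producer
        return Action.CHECK_BACK;
    }

    public static void main(String[] args) {
        System.out.println(decide(false, 1, 15, 1_000, 6_000)); // WAIT
        System.out.println(decide(false, 1, 15, 7_000, 6_000)); // CHECK_BACK
    }
}
```

Testing for an op record first is what keeps an already committed or rolled-back message from triggering a needless check-back.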

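6. Summary

Transactional messages can solve the distributed transaction problem mainly because they are based on two-phase commit. Before running the local transaction code, the Producer first sends a half message to the Broker (the half message is not delivered to the target queue but to the system transaction queue), so consumers cannot consume it yet. The Producer then runs the local transaction code and reports the outcome to the Broker. On receiving the outcome, the Broker decides whether to commit or roll back: on commit, the message held in the system transaction topic is redelivered to the target Topic and becomes consumable; on rollback, the half message is deleted and no Consumer will ever consume it. The steps above can all be carried out within one JVM process, which is what allows them to be tied to the same local transaction.

After receiving the message, the Consumer runs its own code, which can likewise be kept within one transaction. Even if an exception occurs during consumption, the retry mechanism and the dead-letter queue let us assume that the Consumer will eventually succeed.

If the Producer takes too long to run the local transaction and never reports the outcome to the Broker, the periodic check-back mechanism kicks in. Check-back simply makes the Producer run the check code of its listener class and then send a message telling the Broker the current state.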
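
As a rough illustration of the Producer side, the sketch below records the outcome of each local transaction so that a later check-back can be answered. It mirrors the shape of TransactionListener (executeLocalTransaction and checkLocalTransaction) without depending on RocketMQ: LocalTxSketch, State and both method names are hypothetical, and the in-memory map is an assumption made to keep the example self-contained. A real application would more likely record the outcome in the same database transaction as the business data.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Sketch of a Producer-side listener that remembers local transaction outcomes. Names are illustrative. */
final class LocalTxSketch {

    enum State { COMMIT, ROLLBACK, UNKNOWN }

    /** Outcome per transaction id; in-memory only, so it is lost if the Producer restarts. */
    private final Map<String, State> outcomes = new ConcurrentHashMap<>();

    /** Runs right after the half message is stored: execute the business logic and record the result. */
    State executeLocal(String txId, Runnable businessLogic) {
        try {
            businessLogic.run();
            outcomes.put(txId, State.COMMIT);
            return State.COMMIT;
        } catch (RuntimeException e) {
            outcomes.put(txId, State.ROLLBACK);
            return State.ROLLBACK;
        }
    }

    /** Runs on check-back: answer from the recorded outcome; UNKNOWN asks the Broker to check again later. */
    State checkLocal(String txId) {
        return outcomes.getOrDefault(txId, State.UNKNOWN);
    }

    public static void main(String[] args) {
        LocalTxSketch listener = new LocalTxSketch();
        listener.executeLocal("tx-1", () -> { });
        System.out.println(listener.checkLocal("tx-1")); // COMMIT
        System.out.println(listener.checkLocal("tx-2")); // UNKNOWN
    }
}
```

Returning UNKNOWN for an unrecorded transaction keeps the half message parked rather than guessing, at the cost of one more check-back round.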


posted @ 2023-05-06 18:52  夏尔_717