Notes on a gp expansion that failed and could not be rolled back

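Problem description

gp (Greenplum) version: 6.9.0

Error: after replacing the gp version and running an expansion, the gpexpand -r rollback failed with the message: Catalog has been changed, the cluster can not rollback.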

Solution

Troubleshooting approach

The relevant code in the gpexpand script:

    def rollback(self, dburl):
        """Rolls back and expansion setup that didn't successfully complete"""
        status_history = self.statusLogger.get_status_history()
        if not status_history:
            raise ExpansionError('No status history to rollback.')

        if (status_history[-1])[0] == 'EXPANSION_PREPARE_DONE':
            raise ExpansionError('Expansion preparation complete.  Nothing to rollback')

        for status in reversed(status_history):
            # This is where the rollback fails
            if not self.statusLogger.can_rollback(status[0]):
                raise ExpansionError('Catalog has been changed, the cluster can not rollback.')

    ...
    # The check called from rollback()
    def can_rollback(self, status):
        """Return if it can rollback under current status"""
        # Any status >= UPDATE_CATALOG_DONE cannot be rolled back
        if int(self._status_values[status]) >= int(self._status_values['UPDATE_CATALOG_DONE']):
            return False
        return True


    ...
    def __init__(self, logger, coordinator_data_directory, coordinator_mirror=None):
        self.logger = logger
        # Status values; 8 (UPDATE_CATALOG_DONE) or higher cannot be rolled back
        self._status_values = {'UNINITIALIZED': 1,
                               'EXPANSION_PREPARE_STARTED': 2,
                               'BUILD_SEGMENT_TEMPLATE_STARTED': 3,
                               'BUILD_SEGMENT_TEMPLATE_DONE': 4,
                               'BUILD_SEGMENTS_STARTED': 5,
                               'BUILD_SEGMENTS_DONE': 6,
                               'UPDATE_CATALOG_STARTED': 7,
                               'UPDATE_CATALOG_DONE': 8,
                               'SETUP_EXPANSION_SCHEMA_STARTED': 9,
                               'SETUP_EXPANSION_SCHEMA_DONE': 10,
                               'PREPARE_EXPANSION_SCHEMA_STARTED': 11,
                               'PREPARE_EXPANSION_SCHEMA_DONE': 12,
                               'EXPANSION_PREPARE_DONE': 13
                               }
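
To see why the rollback is refused, the check can be reproduced outside gpexpand. The following is only a minimal sketch: the status names and numbers are copied from the excerpt above, while STATUS_VALUES, first_blocking_status, and the sample history are hypothetical names and data made up for illustration, not part of gpexpand.

```python
# Status names and ordinals copied from the gpexpand excerpt.
STATUS_VALUES = {
    'UNINITIALIZED': 1,
    'EXPANSION_PREPARE_STARTED': 2,
    'BUILD_SEGMENT_TEMPLATE_STARTED': 3,
    'BUILD_SEGMENT_TEMPLATE_DONE': 4,
    'BUILD_SEGMENTS_STARTED': 5,
    'BUILD_SEGMENTS_DONE': 6,
    'UPDATE_CATALOG_STARTED': 7,
    'UPDATE_CATALOG_DONE': 8,
    'SETUP_EXPANSION_SCHEMA_STARTED': 9,
    'SETUP_EXPANSION_SCHEMA_DONE': 10,
    'PREPARE_EXPANSION_SCHEMA_STARTED': 11,
    'PREPARE_EXPANSION_SCHEMA_DONE': 12,
    'EXPANSION_PREPARE_DONE': 13,
}


def can_rollback(status):
    """Same rule as the excerpt: statuses below UPDATE_CATALOG_DONE are reversible."""
    return STATUS_VALUES[status] < STATUS_VALUES['UPDATE_CATALOG_DONE']


def first_blocking_status(history):
    """Hypothetical helper: scan newest to oldest, as rollback() does, and
    return the first status that would raise the error (None if none does)."""
    for status in reversed(history):
        if not can_rollback(status):
            return status
    return None


# Made-up history of an expansion that died while setting up the schema.
history = [
    'UNINITIALIZED',
    'EXPANSION_PREPARE_STARTED',
    'BUILD_SEGMENTS_DONE',
    'UPDATE_CATALOG_STARTED',
    'UPDATE_CATALOG_DONE',
    'SETUP_EXPANSION_SCHEMA_STARTED',
]
print(first_blocking_status(history))
```

Any history that contains UPDATE_CATALOG_DONE or a later status trips the check, because the loop walks the whole history rather than only the latest entry.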

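Digging further, the expansion status turns out to come from the gpexpand.status file in the master data directory. After deleting the UPDATE_CATALOG_DONE:None line and everything after it, re-running gpexpand -r rolls back successfully.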
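
The manual edit of gpexpand.status could also be scripted. The sketch below assumes the file holds one NAME:value entry per line, which is inferred only from the UPDATE_CATALOG_DONE:None line quoted above. truncate_status_file is a hypothetical helper, not a gpexpand function, and the .bak copy is an added precaution rather than part of the original fix.

```python
import shutil
from pathlib import Path


def truncate_status_file(path, marker='UPDATE_CATALOG_DONE'):
    """Drop the first line whose status name equals `marker` and every line
    after it. Returns the number of lines removed (0 if marker not found).

    Assumption: each line looks like 'STATUS_NAME:value'.
    """
    path = Path(path)
    lines = path.read_text().splitlines(keepends=True)
    for index, line in enumerate(lines):
        name = line.split(':', 1)[0].strip()
        if name == marker:
            # Keep a copy of the untouched file next to the original.
            shutil.copy2(path, path.with_name(path.name + '.bak'))
            path.write_text(''.join(lines[:index]))
            return len(lines) - index
    return 0
```

It would be called with the path to gpexpand.status under the master data directory, before re-running gpexpand -r.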

Other issues

If gp has already been stopped and cannot be started, first start it in master-only mode:

gpstart -m

Once that succeeds, run the following command to roll back:

PGOPTIONS="-c gp_session_role=utility" gpexpand -r

Exit master-only mode:

gpstop -m 

Start the gp cluster:

gpstart -a

 

posted @ 2024-02-23 11:49  李煜.YN