Games, Work, Investing, Zen

Work is practice


Last night I ran a catalog (metadata) check on a gpdb cluster. After running gpcheckcat, the persistent test reported a problem. The summary output was:

SUMMARY REPORT
===================================================================
Total runtime for 15 test(s): 0:00:11.36
Failed test(s) that are not reported here: persistent
See /home/gpadmin/gpAdminLogs/gpcheckcat_20171122.log for detail

Looking further into the gpcheckcat_20171122.log file, I found the following error messages:

20171122:16:21:10:035636 gpcheckcat:gpmdw:gpadmin-[INFO]:-[FAIL] gp_persistent_relation_node   <=> filesystem
20171122:16:21:10:035636 gpcheckcat:gpmdw:gpadmin-[ERROR]:-gp_persistent_relation_node   <=> filesystem found 4 issue(s)
20171122:16:21:10:035636 gpcheckcat:gpmdw:gpadmin-[ERROR]:-
    SELECT coalesce(a.tablespace_oid, b.tablespace_oid) as tablespace_oid,
       coalesce(a.database_oid, b.database_oid) as database_oid,
       coalesce(a.relfilenode_oid, b.relfilenode_oid) as relfilenode_oid,
       coalesce(a.segment_file_num, b.segment_file_num) as segment_file_num,
       a.relfilenode_oid is null as filesystem,
       b.relfilenode_oid is null as persistent,
       b.relkind, b.relstorage
    FROM   gp_persistent_relation_node a
    FULL OUTER JOIN (
      SELECT p.*, c.relkind, c.relstorage
      FROM   gp_persistent_relation_node_check() p
        LEFT OUTER JOIN pg_class c
          ON (p.relfilenode_oid = c.relfilenode)
      WHERE (p.segment_file_num = 0 or c.relstorage != 'h')
    ) b ON (a.tablespace_oid   = b.tablespace_oid    and
            a.database_oid     = b.database_oid      and
            a.relfilenode_oid  = b.relfilenode_oid   and
            a.segment_file_num = b.segment_file_num)
    WHERE (a.relfilenode_oid is null OR
           (a.persistent_state = 2 and b.relfilenode_oid is null))  and
      coalesce(a.database_oid, b.database_oid) in (
        SELECT oid FROM pg_database WHERE datname = current_database()
        UNION ALL
        SELECT 0
      );

20171122:16:21:10:035636 gpcheckcat:gpmdw:gpadmin-[ERROR]:---------
20171122:16:21:10:035636 gpcheckcat:gpmdw:gpadmin-[ERROR]:-gpmdw:40000:/data/pri/gpseg0
20171122:16:21:10:035636 gpcheckcat:gpmdw:gpadmin-[ERROR]:-  tablespace_oid | database_oid | relfilenode_oid | segment_file_num | filesystem | persistent | relkind | relstorage
20171122:16:21:10:035636 gpcheckcat:gpmdw:gpadmin-[ERROR]:-  1663 | 17146 | 9991234 | 0 | t | f | None | None
20171122:16:21:10:035636 gpcheckcat:gpmdw:gpadmin-[ERROR]:-  1663 | 17146 | 9998763 | 0 | t | f | None | None
20171122:16:21:10:035636 gpcheckcat:gpmdw:gpadmin-[ERROR]:-  1663 | 17146 | 9998764 | 0 | t | f | None | None
20171122:16:21:10:035636 gpcheckcat:gpmdw:gpadmin-[ERROR]:-  1663 | 17146 | 9998765 | 0 | t | f | None | None

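The files on the filesystem do not match what is recorded in gp_persistent_relation_node. In the rows above, filesystem = t and persistent = f means the file exists on disk but has no entry in the persistent table. Further checking showed that these are leftover files: this segment instance had crashed at some point earlier, and some on-disk files were not cleaned up when the transactions were rolled back.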
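To avoid copying the OIDs by hand, the affected rows can be pulled out of the log with a small filter. This is only a sketch: the function name `list_orphan_relfiles` is made up for illustration, and it assumes the log rows keep the pipe-separated column layout shown above (tablespace_oid | database_oid | relfilenode_oid | segment_file_num | filesystem | persistent | ...).

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Greenplum): print
# "database_oid relfilenode_oid segment_file_num" for every row in a
# gpcheckcat log that is flagged filesystem = t and persistent = f,
# i.e. a file that exists on disk but is missing from
# gp_persistent_relation_node.
list_orphan_relfiles() {
    awk -F'|' '
        NF >= 6 {
            # Trim the padding around each column.
            for (i = 1; i <= NF; i++) gsub(/^[ \t]+|[ \t]+$/, "", $i)
            # The header row carries column names here, so it never matches.
            if ($5 == "t" && $6 == "f") print $2, $3, $4
        }
    ' "$1"
}

# Example (path taken from the summary report above):
#   list_orphan_relfiles /home/gpadmin/gpAdminLogs/gpcheckcat_20171122.log
```

The first column is deliberately skipped because it still carries the log prefix (timestamp, host, user) in front of tablespace_oid.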

 

Fix:

Go to the /data/pri/gpseg0/base/17146 directory on the affected segment, locate the extra files 9991234, 9998763, 9998764, and 9998765, and move them to a backup directory.
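The move itself can be scripted so that nothing is deleted outright and missing files are reported rather than silently ignored. A minimal sketch follows. The function name `quarantine_relfiles` and the backup path in the example are assumptions for illustration. It handles only the base file of each relfilenode, which matches the segment_file_num = 0 rows in the report above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: move the named relation files out of a database
# directory into a backup directory instead of deleting them.
# Usage: quarantine_relfiles DB_DIR BACKUP_DIR RELFILENODE...
quarantine_relfiles() {
    local db_dir=$1 backup_dir=$2 node
    shift 2
    mkdir -p "$backup_dir" || return 1
    for node in "$@"; do
        if [ -f "$db_dir/$node" ]; then
            mv -- "$db_dir/$node" "$backup_dir/" && echo "moved: $node"
        else
            echo "skip: $db_dir/$node not found" >&2
        fi
    done
}

# Example, run on the segment host (the backup path is an assumed location):
#   quarantine_relfiles /data/pri/gpseg0/base/17146 \
#       /data/backup/gpseg0_orphans 9991234 9998763 9998764 9998765
```

Moving rather than removing keeps the files available in case one of them turns out to be needed after all.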

Then I reran the catalog check, and this time the result was clean.

[gpadmin@gpmdw gpcheckcat_log]$ cat gpcheckcat2.log 

Connected as user 'gpadmin' to database 'testdb1', port '5432', gpdb version '4.3'
-------------------------------------------------------------------
Performing test 'unique_index_violation'
Total runtime for test 'unique_index_violation': 0:00:01.13
Performing test 'duplicate'
Total runtime for test 'duplicate': 0:00:01.67
Performing test 'missing_extraneous'
Total runtime for test 'missing_extraneous': 0:00:03.33
Performing test 'inconsistent'
Total runtime for test 'inconsistent': 0:00:02.65
Performing test 'foreign_key'
Total runtime for test 'foreign_key': 0:00:01.47
Performing test 'acl'
Total runtime for test 'acl': 0:00:00.05
Performing test 'persistent'
Total runtime for test 'persistent': 0:00:00.18
Performing test 'pgclass'
Total runtime for test 'pgclass': 0:00:00.02
Performing test 'namespace'
Total runtime for test 'namespace': 0:00:00.02
Performing test 'distribution_policy'
Total runtime for test 'distribution_policy': 0:00:00.00
Performing test 'dependency'
Total runtime for test 'dependency': 0:00:00.58
Performing test 'owner'
Total runtime for test 'owner': 0:00:00.06
Performing test 'part_integrity'
Total runtime for test 'part_integrity': 0:00:00.04
Performing test 'part_constraint'
Total runtime for test 'part_constraint': 0:00:00.08
Performing test 'duplicate_persistent'
Total runtime for test 'duplicate_persistent': 0:00:00.05

SUMMARY REPORT
===================================================================
Total runtime for 15 test(s): 0:00:11.38
Found no catalog issue
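
If the check is rerun from a script, the summary line can be tested directly instead of being read by eye. This is a sketch only: the function name `catalog_is_clean` is made up, and it simply looks for the "Found no catalog issue" line that gpcheckcat printed in the report above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if a saved gpcheckcat report contains
# the "Found no catalog issue" summary line.
catalog_is_clean() {
    grep -qF 'Found no catalog issue' "$1"
}

# Example. The -R option (run a single named test) is assumed to be
# available in your gpcheckcat version; the database name comes from the
# report above:
#   gpcheckcat -R persistent testdb1 > gpcheckcat2.log 2>&1
#   catalog_is_clean gpcheckcat2.log && echo "catalog OK"
```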

 

posted on 2017-11-23 08:38 by 爱玩游戏的码农