KingbaseES V8R6数据库运维案例之---sys_resetwal案例

案例说明:
如下图所示,KingbaseES R6数据库启动失败,通过sys_resetwal重建checkpoint启动数据库。

适用版本:
KingbaseES V8R6

一、问题现象

二、问题分析

1、查看reset前的控制文件的信息

[kingbase@node1 bin]$ ./sys_controldata -D /data/kingbase/v8r6_021/data
sys_control version number:            1201
Catalog version number:               201909212
Database system identifier:           6934557703176943302
Database cluster state:               shut down
sys_control last modified:             Mon 01 Mar 2021 01:04:46 PM CST
Latest checkpoint location:           0/23000028
Latest checkpoint's REDO location:    0/23000028
Latest checkpoint's REDO WAL file:    000000040000000000000023
Latest checkpoint's TimeLineID:       4
Latest checkpoint's PrevTimeLineID:   4
Latest checkpoint's full_page_writes: on
Latest checkpoint's NextXID:          0:1454
Latest checkpoint's NextOID:          57607
Latest checkpoint's NextMultiXactId:  1
Latest checkpoint's NextMultiOffset:  0
Latest checkpoint's oldestXID:        858
Latest checkpoint's oldestXID's DB:   1
Latest checkpoint's oldestActiveXID:  0
Latest checkpoint's oldestMultiXid:   1
Latest checkpoint's oldestMulti's DB: 1
Latest checkpoint's oldestCommitTsXid:0
Latest checkpoint's newestCommitTsXid:0
Time of latest checkpoint:            Mon 01 Mar 2021 01:04:46 PM CST
Fake LSN counter for unlogged rels:   0/3E8
......

2、执行wal日志reset

sys_resetwal帮助信息:

[kingbase@node1 bin]$ ./sys_resetwal --help
sys_resetwal resets the Kingbase write-ahead log.

Usage:
  sys_resetwal [OPTION]... DATADIR

Options:
  -c, --commit-timestamp-ids=XID,XID
                                 set oldest and newest transactions bearing
                                 commit timestamp (zero means no change)
 [-D, --kingbase-data=]DATADIR          data directory
  -e, --epoch=XIDEPOCH           set next transaction ID epoch
  -f, --force                    force update to be done
  -g, --dbmode                   set database mode
  -l, --next-wal-file=WALFILE    set minimum starting location for new WAL
  -m, --multixact-ids=MXID,MXID  set next and oldest multitransaction ID
  -n, --dry-run                  no update, just show what would be done
  -o, --next-oid=OID             set next OID
  -O, --multixact-offset=OFFSET  set next multitransaction offset
  -V, --version                  output version information, then exit
  -x, --next-transaction-id=XID  set next transaction ID
      --wal-segsize=SIZE         size of WAL segments, in megabytes
  -?, --help                     show this help, then exit

Report bugs to <kingbase-bugs@kingbase.com.cn>.

-l walfile

手工设置 WAL 开始地址。
WAL 起始地址应该比当前存在于数据目录下pg_wal 目录中的任意 WAL 段文件名更大。这些名称也是十六进制的并且有三个部分。
第一部分是“时间线 ID”并且通常应该被保持相同。例如,如果00000001000000320000004A是pg_wal中最大的项, 则使用-l 00000001000000320000004B或更高的值。
注意
pg_resetwal本身查看pg_wal 中的文件并选择一个超出最新现存文件名的默认-l设置。因此, 只有当你知道 WAL 段文件当前不在pg_wal中时,或者当pg_wal的内容完全丢失时,才需要对-l 的手工调整,例如一个离线归档中的项。

查看当前数据库wal日志信息:

[kingbase@node1 sys_wal]$ ls -lh
total 81M
-rw-r----- 1 kingbase kingbase  43 Mar  1  2021 00000002.history
-rw-r----- 1 kingbase kingbase  87 Mar  1  2021 00000003.history
-rw-r----- 1 kingbase kingbase 16M Mar  1  2021 000000040000000000000022
-rw-r----- 1 kingbase kingbase 16M Mar  1  2021 000000040000000000000023
-rw-r----- 1 kingbase kingbase 16M Mar  1  2021 000000040000000000000024
-rw------- 1 kingbase kingbase 16M Mar  1  2021 000000040000000000000025
-rw------- 1 kingbase kingbase 16M Mar  1  2021 000000040000000000000026
-rw------- 1 kingbase kingbase 131 Mar  1  2021 00000004.history
drwx------ 2 kingbase kingbase 127 Mar  1  2021 archive_status

三、问题解决

如下所示,执行reset wal更新控制文件:

1、执行reset wal

[kingbase@node1 bin]$ ./sys_resetwal -D /data/kingbase/v8r6_021/data -l 000000040000000000000026
Write-ahead log reset

2、查看reset后的controlfile信息

[kingbase@node1 bin]$ ./sys_controldata  -D /data/kingbase/v8r6_021/data
sys_control version number:            1201
Catalog version number:               201909212
Database system identifier:           6934557703176943302
Database cluster state:               shut down
sys_control last modified:             Wed 09 Mar 2022 11:21:25 AM CST
Latest checkpoint location:           0/27000028
Latest checkpoint's REDO location:    0/27000028
Latest checkpoint's REDO WAL file:    000000040000000000000027
Latest checkpoint's TimeLineID:       4
Latest checkpoint's PrevTimeLineID:   4
Latest checkpoint's full_page_writes: on
Latest checkpoint's NextXID:          0:1454
Latest checkpoint's NextOID:          57607
Latest checkpoint's NextMultiXactId:  1
Latest checkpoint's NextMultiOffset:  0
Latest checkpoint's oldestXID:        858
Latest checkpoint's oldestXID's DB:   1
Latest checkpoint's oldestActiveXID:  0
Latest checkpoint's oldestMultiXid:   1
Latest checkpoint's oldestMulti's DB: 1
Latest checkpoint's oldestCommitTsXid:0
Latest checkpoint's newestCommitTsXid:0
Time of latest checkpoint:            Wed 09 Mar 2022 11:21:25 AM CST
Fake LSN counter for unlogged rels:   0/3E8
......

3、启动数据库服务

 [kingbase@node1 bin]$ ./sys_ctl start -D /data/kingbase/v8r6_021/data
 waiting for server to start....2022-03-09 11:26:08.988 CST [14400] LOG:  sepapower extension initialized
 2022-03-09 11:26:09.222 CST [14400] LOG:  starting KingbaseES V008R006C004B0021 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.1.2 20080704 (Red Hat 4.1.2-46), 64-bit
 2022-03-09 11:26:09.223 CST [14400] LOG:  listening on IPv6 address "::1", port 54321
 2022-03-09 11:26:09.223 CST [14400] LOG:  listening on IPv4 address "127.0.0.1", port 54321
 2022-03-09 11:26:09.387 CST [14400] LOG:  listening on Unix socket "/tmp/.s.KINGBASE.54321"
 2022-03-09 11:26:09.659 CST [14400] LOG:  redirecting log output to logging collector process
 2022-03-09 11:26:09.659 CST [14400] HINT:  Future log output will appear in directory "sys_log".
 . done
 server started

 [kingbase@node1 sys_log]$ netstat -an |grep 54321
 tcp        0      0 127.0.0.1:54321         0.0.0.0:*               LISTEN     
 tcp6       0      0 ::1:54321               :::*                    LISTEN     
 unix  2      [ ACC ]     STREAM     LISTENING     407954   /tmp/.s.KINGBASE.54321

四、总结:
这个命令不能在服务器正在运行时被使用。如果在数据目录中发现一个服务器锁文件,sys_resetwal将拒绝启动。如果服务器崩溃那么一个锁文件可能会被留下, 在那种情况下你能移除该锁文件来让sys_resetwal运行。 但是在你那样做之前,再次确认没有服务器进程仍然存活。sys_resetwal仅适用于相同主要版本的服务器。

posted @   天涯客1224  阅读(41)  评论(0编辑  收藏  举报
相关博文:
阅读排行:
· 全程不用写代码,我用AI程序员写了一个飞机大战
· MongoDB 8.0这个新功能碉堡了,比商业数据库还牛
· 记一次.NET内存居高不下排查解决与启示
· 白话解读 Dapr 1.15:你的「微服务管家」又秀新绝活了
· DeepSeek 开源周回顾「GitHub 热点速览」
点击右上角即可分享
微信分享提示