KingbaseES V8R6数据库运维案例之---sys_resetwal案例
案例说明:
如下图所示,KingbaseES R6数据库启动失败,通过sys_resetwal重建checkpoint启动数据库。
适用版本:
KingbaseES V8R6
一、问题现象
二、问题分析
1、查看reset前的控制文件的信息
[kingbase@node1 bin]$ ./sys_controldata -D /data/kingbase/v8r6_021/data
sys_control version number: 1201
Catalog version number: 201909212
Database system identifier: 6934557703176943302
Database cluster state: shut down
sys_control last modified: Mon 01 Mar 2021 01:04:46 PM CST
Latest checkpoint location: 0/23000028
Latest checkpoint's REDO location: 0/23000028
Latest checkpoint's REDO WAL file: 000000040000000000000023
Latest checkpoint's TimeLineID: 4
Latest checkpoint's PrevTimeLineID: 4
Latest checkpoint's full_page_writes: on
Latest checkpoint's NextXID: 0:1454
Latest checkpoint's NextOID: 57607
Latest checkpoint's NextMultiXactId: 1
Latest checkpoint's NextMultiOffset: 0
Latest checkpoint's oldestXID: 858
Latest checkpoint's oldestXID's DB: 1
Latest checkpoint's oldestActiveXID: 0
Latest checkpoint's oldestMultiXid: 1
Latest checkpoint's oldestMulti's DB: 1
Latest checkpoint's oldestCommitTsXid:0
Latest checkpoint's newestCommitTsXid:0
Time of latest checkpoint: Mon 01 Mar 2021 01:04:46 PM CST
Fake LSN counter for unlogged rels: 0/3E8
......
2、执行wal日志reset
sys_resetwal帮助信息:
[kingbase@node1 bin]$ ./sys_resetwal --help
sys_resetwal resets the Kingbase write-ahead log.
Usage:
sys_resetwal [OPTION]... DATADIR
Options:
-c, --commit-timestamp-ids=XID,XID
set oldest and newest transactions bearing
commit timestamp (zero means no change)
[-D, --kingbase-data=]DATADIR data directory
-e, --epoch=XIDEPOCH set next transaction ID epoch
-f, --force force update to be done
-g, --dbmode set database mode
-l, --next-wal-file=WALFILE set minimum starting location for new WAL
-m, --multixact-ids=MXID,MXID set next and oldest multitransaction ID
-n, --dry-run no update, just show what would be done
-o, --next-oid=OID set next OID
-O, --multixact-offset=OFFSET set next multitransaction offset
-V, --version output version information, then exit
-x, --next-transaction-id=XID set next transaction ID
--wal-segsize=SIZE size of WAL segments, in megabytes
-?, --help show this help, then exit
Report bugs to <kingbase-bugs@kingbase.com.cn>.
-l walfile
手工设置 WAL 开始地址。
WAL 起始地址应该比当前存在于数据目录下pg_wal 目录中的任意 WAL 段文件名更大。这些名称也是十六进制的并且有三个部分。
第一部分是“时间线 ID”并且通常应该被保持相同。例如,如果00000001000000320000004A是pg_wal中最大的项, 则使用-l 00000001000000320000004B或更高的值。
注意
pg_resetwal本身查看pg_wal 中的文件并选择一个超出最新现存文件名的默认-l设置。因此, 只有当你知道 WAL 段文件当前不在pg_wal中时,或者当pg_wal的内容完全丢失时,才需要对-l 的手工调整,例如一个离线归档中的项。
查看当前数据库wal日志信息:
[kingbase@node1 sys_wal]$ ls -lh
total 81M
-rw-r----- 1 kingbase kingbase 43 Mar 1 2021 00000002.history
-rw-r----- 1 kingbase kingbase 87 Mar 1 2021 00000003.history
-rw-r----- 1 kingbase kingbase 16M Mar 1 2021 000000040000000000000022
-rw-r----- 1 kingbase kingbase 16M Mar 1 2021 000000040000000000000023
-rw-r----- 1 kingbase kingbase 16M Mar 1 2021 000000040000000000000024
-rw------- 1 kingbase kingbase 16M Mar 1 2021 000000040000000000000025
-rw------- 1 kingbase kingbase 16M Mar 1 2021 000000040000000000000026
-rw------- 1 kingbase kingbase 131 Mar 1 2021 00000004.history
drwx------ 2 kingbase kingbase 127 Mar 1 2021 archive_status
三、问题解决
如下所示,执行reset wal更新控制文件:
1、执行reset wal
[kingbase@node1 bin]$ ./sys_resetwal -D /data/kingbase/v8r6_021/data -l 000000040000000000000026
Write-ahead log reset
2、查看reset后的controlfile信息
[kingbase@node1 bin]$ ./sys_controldata -D /data/kingbase/v8r6_021/data
sys_control version number: 1201
Catalog version number: 201909212
Database system identifier: 6934557703176943302
Database cluster state: shut down
sys_control last modified: Wed 09 Mar 2022 11:21:25 AM CST
Latest checkpoint location: 0/27000028
Latest checkpoint's REDO location: 0/27000028
Latest checkpoint's REDO WAL file: 000000040000000000000027
Latest checkpoint's TimeLineID: 4
Latest checkpoint's PrevTimeLineID: 4
Latest checkpoint's full_page_writes: on
Latest checkpoint's NextXID: 0:1454
Latest checkpoint's NextOID: 57607
Latest checkpoint's NextMultiXactId: 1
Latest checkpoint's NextMultiOffset: 0
Latest checkpoint's oldestXID: 858
Latest checkpoint's oldestXID's DB: 1
Latest checkpoint's oldestActiveXID: 0
Latest checkpoint's oldestMultiXid: 1
Latest checkpoint's oldestMulti's DB: 1
Latest checkpoint's oldestCommitTsXid:0
Latest checkpoint's newestCommitTsXid:0
Time of latest checkpoint: Wed 09 Mar 2022 11:21:25 AM CST
Fake LSN counter for unlogged rels: 0/3E8
......
3、启动数据库服务
[kingbase@node1 bin]$ ./sys_ctl start -D /data/kingbase/v8r6_021/data
waiting for server to start....2022-03-09 11:26:08.988 CST [14400] LOG: sepapower extension initialized
2022-03-09 11:26:09.222 CST [14400] LOG: starting KingbaseES V008R006C004B0021 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.1.2 20080704 (Red Hat 4.1.2-46), 64-bit
2022-03-09 11:26:09.223 CST [14400] LOG: listening on IPv6 address "::1", port 54321
2022-03-09 11:26:09.223 CST [14400] LOG: listening on IPv4 address "127.0.0.1", port 54321
2022-03-09 11:26:09.387 CST [14400] LOG: listening on Unix socket "/tmp/.s.KINGBASE.54321"
2022-03-09 11:26:09.659 CST [14400] LOG: redirecting log output to logging collector process
2022-03-09 11:26:09.659 CST [14400] HINT: Future log output will appear in directory "sys_log".
. done
server started
[kingbase@node1 sys_log]$ netstat -an |grep 54321
tcp 0 0 127.0.0.1:54321 0.0.0.0:* LISTEN
tcp6 0 0 ::1:54321 :::* LISTEN
unix 2 [ ACC ] STREAM LISTENING 407954 /tmp/.s.KINGBASE.54321
四、总结:
这个命令不能在服务器正在运行时被使用。如果在数据目录中发现一个服务器锁文件,sys_resetwal将拒绝启动。如果服务器崩溃那么一个锁文件可能会被留下, 在那种情况下你能移除该锁文件来让sys_resetwal运行。 但是在你那样做之前,再次确认没有服务器进程仍然存活。sys_resetwal仅适用于相同主要版本的服务器。
【推荐】国内首个AI IDE,深度理解中文开发场景,立即下载体验Trae
【推荐】编程新体验,更懂你的AI,立即体验豆包MarsCode编程助手
【推荐】抖音旗下AI助手豆包,你的智能百科全书,全免费不限次数
【推荐】轻量又高性能的 SSH 工具 IShell:AI 加持,快人一步
· 全程不用写代码,我用AI程序员写了一个飞机大战
· MongoDB 8.0这个新功能碉堡了,比商业数据库还牛
· 记一次.NET内存居高不下排查解决与启示
· 白话解读 Dapr 1.15:你的「微服务管家」又秀新绝活了
· DeepSeek 开源周回顾「GitHub 热点速览」