Oracle集群RAC DG日常检查指令
操作系统进程检查
Pmon检查
- pmon(Process Monitor process)用于监控其他后台进程。负责在连接出现异常中止后进行清理工作。例如,一个专用服务器进程崩溃或者出于某种原因被结束掉,就要由PMON进程负责善后(恢复或者撤销工作),并释放资源。PMON会回滚未提交的工作,释放锁,并释放之前为失败进程分配的SGA资源。
PMON还负责监视其他Oracle后台进程,并在必要时重启这些后台进程。 - 因此在操作系统侧,可以首先检查pmon进程
092755.221: [oracle ~ ]$ ps -ef |grep -i pmon
092755.266: oracle 22589 22483 0 09:27 pts/1 00:00:00 grep --color=auto -i pmon
092755.271: grid 41673 1 0 Nov03 ? 00:00:19 asm_pmon_+ASM1
092755.271: oracle 43062 1 0 Nov03 ? 00:00:55 ora_pmon_oracle1
负载检查
[oracle@ ~ ]$ top
094246.285: top - 09:42:46 up 3 days, 13:45, 2 users, load average: 0.44, 0.49, 0.60
094246.290: Tasks: 1179 total, 2 running, 1177 sleeping, 0 stopped, 0 zombie
094246.290: %Cpu(s): 2.7 us, 0.6 sy, 0.0 ni, 96.7 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
094246.290: KiB Mem: 26347395+total, 24792364+used, 15550296 free, 1356312 buffers
094246.290: KiB Swap: 33554428 total, 8960 used, 33545468 free. 71150048 cached Mem
094246.295:
094246.295: PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
094246.295: 35858 oracle 20 0 0.148t 88408 83244 R 100.00 0.034 0:21.06 oracle_35858_or
094246.295: 36340 oracle 20 0 0.148t 85412 80980 S 11.111 0.032 0:00.04 oracle_36340_or
094246.295: 36341 oracle 20 0 16348 3500 2224 R 11.111 0.001 0:00.05 top
094246.295: 35766 root 20 0 5509468 156808 31652 S 5.556 0.060 23:24.58 java
094246.295: 36338 oracle 20 0 0.148t 85392 80960 S 5.556 0.032 0:00.04 oracle_36338_or
094246.295: 1 root 20 0 181660 6184 4180 S 0.000 0.002 2:08.17 systemd
094246.295: 2 root 20 0 0 0 0 S 0.000 0.000 0:00.12 kthreadd
094246.295: 4 root 0 -20 0 0 0 S 0.000 0.000 0:00.00 kworker/0:0H
094246.295: 6 root 0 -20 0 0 0 S 0.000 0.000 0:00.00 mm_percpu_wq
094246.295: 7 root 20 0 0 0 0 S 0.000 0.000 0:13.67 ksoftirqd/0
094246.295: 8 root 20 0 0 0 0 S 0.000 0.000 6:47.36 rcu_sched
094246.295: 9 root 20 0 0 0 0 S 0.000 0.000 0:00.00 rcu_bh
094246.295: 10 root rt 0 0 0 0 S 0.000 0.000 0:00.00 migration/0
094246.295: 11 root rt 0 0 0 0 S 0.000 0.000 0:00.37 watchdog/0
094246.295: 12 root 20 0 0 0 0 S 0.000 0.000 0:00.00 cpuhp/0
094246.295: 13 root 20 0 0 0 0 S 0.000 0.000 0:00.00 cpuhp/1
094246.295: 14 root rt 0 0 0 0 S 0.000 0.000 0:00.36 watchdog/1
094246.295: 15 root rt 0 0 0 0 S 0.000 0.000 0:00.01 migration/1
094246.295: 16 root 20 0 0 0 0 S 0.000 0.000 0:03.20 ksoftirqd/1
094246.295: 18 root 0 -20 0 0 0 S 0.000 0.000 0:00.00 kworker/1:0H
094246.295: 19 root 20 0 0 0 0 S 0.000 0.000 0:00.00 cpuhp/2
094246.295: 20 root rt 0 0 0 0 S 0.000 0.000 0:00.37 watchdog/2
094246.295: 21 root rt 0 0 0 0 S 0.000 0.000 0:00.01 migration/2
094246.300: 22 root 20 0 0 0 0 S 0.000 0.000 0:02.63 ksoftirqd/2
094246.300: 24 root 0 -20 0 0 0 S 0.000 0.000 0:00.00 kworker/2:0H
094246.300: 25 root 20 0 0 0 0 S 0.000 0.000 0:00.00 cpuhp/3
094246.300: 26 root rt 0 0 0 0 S 0.000 0.000 0:00.38 watchdog/3
094246.300: 27 root rt 0 0 0 0 S 0.000 0.000 0:00.01 migration/3
094246.300: 28 root 20 0 0 0 0 S 0.000 0.000 0:01.19 ksoftirqd/3
094246.300: 30 root 0 -20 0 0 0 S 0.000 0.000 0:00.00 kworker/3:0H
094246.320: 31 root 20 0 0 0 0 S 0.000 0.000 0:00.00 cpuhp/4
094246.320: 32 root rt 0 0 0 0 S 0.000 0.000 0:00.39 watchdog/4
094246.320: 33 root rt 0 0 0 0 S 0.000 0.000 0:00.00 migration/4
094246.320: 34 root 20 0 0 0 0 S 0.000 0.000 0:02.01 ksoftirqd/4
094246.320: 36 root 0 -20 0 0 0 S 0.000 0.000 0:00.00 kworker/4:0H
094246.320: 37 root 20 0 0 0 0 S 0.000 0.000 0:00.00 cpuhp/5
094246.320: 38 root rt 0 0 0 0 S 0.000 0.000 0:00.40 watchdog/5
094246.320: 39 root rt 0 0 0 0 S 0.000 0.000 0:00.00 migration/5
094246.320: 40 root 20 0 0 0 0 S 0.000 0.000 0:03.99 ksoftirqd/5
094246.320: 42 root 0 -20 0 0 0 S 0.000 0.000 0:00.00 kworker/5:0H
094246.320: 43 root 20 0 0 0 0 S 0.000 0.000 0:00.00 cpuhp/6
094246.325: 44 root rt 0 0 0 0 S 0.000 0.000 0:00.39 watchdog/6
094246.325: 45 root rt 0 0 0 0 S 0.000 0.000 0:00.00 migration/6
094246.325: 46 root 20 0 0 0 0 S 0.000 0.000 0:02.34 ksoftirqd/6
094246.325: 48 root 0 -20 0 0 0 S 0.000 0.000 0:00.00 kworker/6:0H
094246.325: 49 root 20 0 0 0 0 S 0.000 0.000 0:00.00 cpuhp/7
094246.325: 50 root rt 0 0 0 0 S 0.000 0.000 0:00.40 watchdog/7
094246.325: 51 root rt 0 0 0 0 S 0.000 0.000 0:00.00 migration/7
094246.325: 52 root 20 0 0 0 0 S 0.000 0.000 0:01.66 ksoftirqd/7
094246.325: 54 root 0 -20 0 0 0 S 0.000 0.000 0:00.00 kworker/7:0H
094246.325: 55 root 20 0 0 0 0 S 0.000 0.000 0:00.00 cpuhp/8
094246.325: 56 root rt 0 0 0 0 S 0.000 0.000 0:00.37 watchdog/8
数据库检查
查看数据库打开状态和相关信息
- 确定当前库为主库
093045.590: SQL> select inst_id,open_mode,database_role from gv$database;
093045.610:
093045.610: INST_ID OPEN_MODE DATABASE_ROLE
093045.610: ---------- -------------------- ----------------
093045.610: 2 READ WRITE PRIMARY
093045.610: 1 READ WRITE PRIMARY
查找主库
show parameter config
show parameter unique (主库唯一名)
show parameter log_archive_config(主库 备库唯一名都能看到,由此确认备库唯一名)
判断集群正常与否
判断会话等待
检查以下会话等待,主要最多的会话等待没有以下几项即可
SQL>
094456.693: 1 select inst_id,event,count(*)
094456.693: 2 from gv$session
094456.693: 3 where WAIT_CLASS!='IDLE'
094456.693: 4* group by inst_id,event
094456.698:
094456.698: INST_ID EVENT COUNT(*)
094456.698: ---------- ---------------------------------------------------------------- ----------
094456.698: 2 Space Manager: slave idle wait 52
094456.698: 2 gcs yield cpu 3
094456.698: 2 Data Guard: Timer 1
094456.723: 2 GCR sleep 2
094456.723: 2 wait for unread message on broadcast channel 3
094456.723: 2 pman timer 1
094456.723: 2 gcs remote message 3
094456.723: 2 REPL Capture/Apply: RAC AQ qmn coordinator 1
094456.723: 2 lreg timer 1
094456.723: 1 lreg timer 1
094456.723: 2 PX Deq: Execution Msg 1
094456.723: 1 Space Manager: slave idle wait 51
094456.723: 1 pmon timer 7
094456.728: 1 VKTM Logical Idle Wait 1
094456.728: 1 DIAG idle wait 2
094456.728: 1 SCM slave idle 1
094456.728: 1 gcs remote message 3
094456.728: 1 heartbeat redo informer 1
094456.728: 1 smon timer 1
094456.728: 2 class slave wait 6
094456.728: 2 VKTM Logical Idle Wait 1
094456.728: 2 Data Guard: Gap Manager 1
094456.728: 2 ges remote message 2
094456.728: 1 OFS idle 1
094456.728: 1 PING 1
094456.728: 1 pman timer 1
094456.728: 1 ges remote message 2
094456.728: 1 Streams AQ: qmn coordinator idle wait 1
094456.728: 2 LMS CR slave timer 3
094456.733: 2 DIAG idle wait 2
094456.733: 2 PING 1
094456.733: 2 smon timer 1
094456.733: 1 Streams AQ: waiting for time management or cleanup tasks 1
094456.733: 1 Data Guard: Timer 1
094456.733: 2 Streams AQ: qmn coordinator idle wait 1
094456.733: 1 watchdog main loop 5
094456.733: 1 wait for unread message on broadcast channel 3
094456.733: 2 ASM cluster membership changes 1
094456.733: 2 OFS idle 1
094456.733: 2 VKRM Idle 1
094456.733: 2 Streams AQ: waiting for time management or cleanup tasks 1
094456.733: 1 rdbms ipc message 32
094456.733: 1 ASM background timer 1
094456.733: 1 GCR sleep 2
094456.733: 1 class slave wait 6
094456.733: 1 Data Guard: Gap Manager 1
094456.733: 2 rdbms ipc message 32
094456.733: 2 pmon timer 7
094456.733: 2 watchdog main loop 5
094456.733: 2 heartbeat redo informer 1
094456.733: 2 AQPC idle 1
094456.733: 2 SQL*Net message from client 6
094456.733: 1 ASM cluster membership changes 1
094456.733: 1 LMS CR slave timer 3
094456.733: 1 gcs yield cpu 3
094456.733: 1 VKRM Idle 1
094456.733: 2 ASM background timer 1
094456.733: 2 SCM slave idle 1
094456.733: 2 Streams AQ: qmn slave idle wait 1
094456.733: 1 SQL*Net message from client 187
094456.733: 1 Streams AQ: qmn slave idle wait 1
094456.733: 1 PX Deq: Execution Msg 1
094456.738: 1 PX Deq: Execute Reply 1
094456.738: 1 AQPC idle 1
094456.738: 1 REPL Capture/Apply: RAC AQ qmn coordinator 1
094456.738:
查看连接数 并与数据库配置对比
select inst_id,status,count(*) from gv$session
group by inst_id,status order by 1,3;
show parameter process
判断集群和DG状态
RAC
su - grid
crsctl check crs 判断几个核心任务是否online
crsctl stat res -t 查看所有资源是否online
crsctl query css votedisk 看所有投票盘是否在线
asmcmd lsdg 看freeMB 和 offline disk
DG
select * from gv$dataguard_stats order by inst_id 查看应用时间和传输时间
select process,status form v$managed_standby; 看是否有MRO0进程(该进程负责同步日志)
select database_role,to_char(current_scn) from gv$database; 查看两个库的scn