[Repost] CRS-9520: The storage of Grid Infrastructure Management Repository is 93% full

1 The following alert was found during a routine database inspection:

2021-08-16 02:13:59.207: 

[crflogd(6575)]CRS-9520:The storage of Grid Infrastructure Management Repository is 92% full. The storage location is '/data/grid/11.2.0/crf/db/host1'.

2021-08-16 02:19:04.207: 

[crflogd(6575)]CRS-9520:The storage of Grid Infrastructure Management Repository is 93% full. The storage location is '/data/grid/11.2.0/crf/db/host1'.

 

2 Check host filesystem usage; disk usage is normal

[root@host1 ~]# df -h

Filesystem                    Size  Used Avail Use% Mounted on

/dev/mapper/VolGroup-lv_root   50G   12G   36G  24% /

tmpfs                         7.9G  297M  7.6G   4% /dev/shm

/dev/sda2                     485M   62M  398M  14% /boot

/dev/sda1                     200M  260K  200M   1% /boot/efi

/dev/mapper/VolGroup-lv_home   77G   47G   27G  64% /home

 

3 Check the cluster status; everything is normal

[grid@host1 host1]$ crsctl stat res -t

--------------------------------------------------------------------------------

NAME           TARGET  STATE        SERVER                   STATE_DETAILS       

--------------------------------------------------------------------------------

Local Resources

--------------------------------------------------------------------------------

ora.ARCHIVELOG.dg

               ONLINE  ONLINE       host1                                      

               ONLINE  ONLINE       host2                                      

ora.DATA.dg

               ONLINE  ONLINE       host1                                      

               ONLINE  ONLINE       host2                                      

ora.FLASHBAK.dg

               ONLINE  ONLINE       host1                                      

               ONLINE  ONLINE       host2                                      

ora.LISTENER.lsnr

               ONLINE  ONLINE       host1                                      

               ONLINE  ONLINE       host2                                      

ora.DATA1.dg

               ONLINE  ONLINE       host1                                      

               ONLINE  ONLINE       host2                                      

ora.DATA2.dg

               ONLINE  ONLINE       host1                                      

               ONLINE  ONLINE       host2                                      

ora.asm

               ONLINE  ONLINE       host1                  Started             

               ONLINE  ONLINE       host2                  Started             

ora.gsd

               OFFLINE OFFLINE      host1                                      

               OFFLINE OFFLINE      host2                                      

ora.net1.network

               ONLINE  ONLINE       host1                                      

               ONLINE  ONLINE       host2                                      

ora.ons

               ONLINE  ONLINE       host1                                      

               ONLINE  ONLINE       host2                                      

ora.registry.acfs

               ONLINE  ONLINE       host1                                      

               ONLINE  ONLINE       host2                                      

--------------------------------------------------------------------------------

Cluster Resources

--------------------------------------------------------------------------------

ora.LISTENER_SCAN1.lsnr

      1        ONLINE  ONLINE       host1                                      

ora.cvu

      1        ONLINE  ONLINE       host1                                      

ora.oc4j

      1        OFFLINE OFFLINE                                                   

ora.scan1.vip

      1        ONLINE  ONLINE       host1                                      

ora.ywtest.db

      1        ONLINE  ONLINE       host2                  Open                

      2        ONLINE  ONLINE       host1                  Open                

ora.host1.vip

      1        ONLINE  ONLINE       host1                                      

ora.host2.vip

      1        ONLINE  ONLINE       host2       

      

4 Check the status of the ora.crf resource: it is ONLINE, and the repository directory uses only about 290 MB

      

 [grid@host1 host1]$ crsctl stat res ora.crf -init -t

--------------------------------------------------------------------------------

NAME           TARGET  STATE        SERVER                   STATE_DETAILS       

--------------------------------------------------------------------------------

Cluster Resources

--------------------------------------------------------------------------------

ora.crf

      1        ONLINE  ONLINE       host1    

 

Check the CHM-related processes (osysmond.bin and ologgerd); note that the -d argument of ologgerd points to the repository directory named in the alert

      [grid@host1 host1]$ ps -ef|grep osysmond

root      5329     1  4 Apr10 ?        5-10:43:26 /data/grid/11.2.0/bin/osysmond.bin

grid     15537 12181  0 14:43 pts/25   00:00:00 grep osysmond

 

[grid@host1 host1]$ ps -ef|grep ologgerd

root      6575     1  1 Apr10 ?        2-08:01:02 /data/grid/11.2.0/bin/ologgerd -m host2 -r -d 

/data/grid/11.2.0/crf/db/host1

grid     16077 12181  0 14:45 pts/25   00:00:00 grep ologgerd    

 

 

Check the CHM repository directory; usage is low, only about 290 MB in total

[root@host1 host1]# du -k 

290396  .

 

Use the following command to check how much data CHM collects and retains, reported in seconds: 61511 seconds is about 17 hours, which is not long.

[grid@host1 ~]$ oclumon

 

query>  manage -get repsize

 

CHM Repository Size = 61511

 

 Done 
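
Per the CHM FAQ excerpt quoted in step 7 below, the value reported by "oclumon manage -get repsize" is expressed in seconds of retained data, so the 17-hour figure above is simply:

61511 s / 3600 ≈ 17.1 hours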

5 Clean up the directory as follows

Check the status of ora.crf:

 /data/grid/11.2.0/bin/crsctl stat res ora.crf -init -t    

Stop ora.crf:

/data/grid/11.2.0/bin/crsctl stop res ora.crf -init 

Check the status of ora.crf again:

 /data/grid/11.2.0/bin/crsctl stat res ora.crf -init -t  

 

Delete the repository (.bdb) files:

rm crf*.bdb

 

Start ora.crf:

/data/grid/11.2.0/bin/crsctl start res ora.crf -init   
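
For reference, the steps above combined into a single sequence for host1 (a sketch only; the actual execution, run as root, is shown below). The .bdb files are recreated automatically once ora.crf is restarted:

/data/grid/11.2.0/bin/crsctl stop res ora.crf -init
cd /data/grid/11.2.0/crf/db/host1
rm *.bdb
/data/grid/11.2.0/bin/crsctl start res ora.crf -init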

 

 

 

[root@host1 /]# /data/grid/11.2.0/bin/crsctl stat res ora.crf -init -t 

--------------------------------------------------------------------------------

NAME           TARGET  STATE        SERVER                   STATE_DETAILS       

--------------------------------------------------------------------------------

Cluster Resources

--------------------------------------------------------------------------------

ora.crf

      1        ONLINE  ONLINE       host1                                      

[root@host1 /]# /data/grid/11.2.0/bin/crsctl stop res ora.crf -init

CRS-2673: Attempting to stop 'ora.crf' on 'host1'

CRS-2677: Stop of 'ora.crf' on 'host1' succeeded

 

[root@host1 host1]# pwd

/data/grid/11.2.0/crf/db/host1

[root@host1 host1]# rm *.bdb

rm: remove regular file `crfalert.bdb'? y

rm: remove regular file `crfclust.bdb'? y

rm: remove regular file `crfconn.bdb'? y

rm: remove regular file `crfcpu.bdb'? y

rm: remove regular file `crfhosts.bdb'? y

rm: remove regular file `crfloclts.bdb'? y

rm: remove regular file `crfts.bdb'? y

rm: remove regular file `repdhosts.bdb'? y

 

[root@host1 host1]# /data/grid/11.2.0/bin/crsctl start res ora.crf -init  

CRS-2672: Attempting to start 'ora.crf' on 'host1'

CRS-2676: Start of 'ora.crf' on 'host1' succeeded

[root@host1 host1]# /data/grid/11.2.0/bin/crsctl stat res ora.crf -init -t  

--------------------------------------------------------------------------------

NAME           TARGET  STATE        SERVER                   STATE_DETAILS       

--------------------------------------------------------------------------------

Cluster Resources

--------------------------------------------------------------------------------

ora.crf

      1        ONLINE  ONLINE       host1   

      

 

[root@host1 host1]# du -sk

64732   .  

 

6 The next day the same warning appeared again; the directory was using about 226 MB

 

2021-08-17 02:23:33.856: 

[crflogd(1019)]CRS-9520:The storage of Grid Infrastructure Management Repository is 91% full. 

The storage location is '/data/grid/11.2.0/crf/db/host1'.

2021-08-17 02:28:38.855: 

[crflogd(1019)]CRS-9520:The storage of Grid Infrastructure Management Repository is 92% full. 

The storage location is '/data/grid/11.2.0/crf/db/host1'.

2021-08-17 02:33:43.855: 

[crflogd(1019)]CRS-9520:The storage of Grid Infrastructure Management Repository is 93% full. 

The storage location is '/data/grid/11.2.0/crf/db/host1'.

 

 

[root@host1 host1]# du -sk

226504  .

 

7 According to the official Oracle documentation, this service can be stopped without affecting normal use of the cluster. Based on reports found online, if this directory reaches 100% full the cluster may be affected, so the service was stopped outright to avoid any cluster problems.

The following is an excerpt from the Oracle support document Cluster Health Monitor (CHM) FAQ (Doc ID 1328466.1):

 

What is the Cluster Health Monitor?

The Cluster Health Monitor collects OS statistics (system metrics) such as memory and swap space usage, 

processes, IO usage, and network related data. The Cluster Health Monitor collects information in real time 

and usually once a second. The Cluster Health Monitor collects OS statistics using OS APIs to gain performance

and reduce CPU usage overhead. It collects as much system metric data as is feasible within an acceptable level

of resource consumption by the tool.

What is the purpose of the Cluster Health Monitor?

The Cluster Health Monitor is developed to provide system metrics and data for troubleshooting many different 

types of problems such as node reboot and hang, instance eviction and hang, severe performance degradation, 

and any other problems that need the system metrics and data.

 

By monitoring the data constantly, users can use the Cluster Health Monitor to detect potential problem areas

such as CPU load, memory constraints, and spinning processes before the problem causes an unwanted outage.

 

Does stopping/starting ora.crf affect clusterware function or cluster database function?

No. Stopping/starting the ora.crf resource stops and starts the Cluster Health Monitor and its data collection; it will not affect clusterware or database functionality.

 

How much overhead does the Cluster Health Monitor cause?

In today's server environment, the Cluster Health Monitor uses less than approximately 3% of the server's CPU capacity, so its overhead is minimal. However, CHM on a server with a large number of disks or IO devices and more CPUs/memory will use more CPU than CHM on a server that does not have many disks and CPUs/memory.

 

How much disk space is needed for the Cluster Health Monitor?

The Cluster Health Monitor takes up 1 GB of space by default on all nodes in the cluster. The approximate amount of

data collected is 0.5 GB per node per day. The size of the repository can be increased to collect and save data for up to 3

days, and this will increase the disk usage accordingly.

How do I find out the size of data collected and saved by the Cluster Health Monitor in my system?

“oclumon manage -get repsize” will show the size in seconds.

To estimate the space required, use the following formula:

 

# of nodes * 720MB * 3 = Size required for 3 days retention 

eg. for 4 node cluster: 4 * 720 * 3 = 8,640MB (8.4GB)

How can I increase the size of the Cluster Health Monitor repository?

“oclumon manage -repos resize <number in seconds less than 259200>”. Setting the value to 259200 will

collect and save the data for 72 hours (3 days). It is recommended to set 72 hours of retention based on the above

formula. This space needs to be available on all nodes in the cluster. Please resize the repository, or relocate it

if necessary, in order to achieve 72 hours of retention.
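
Applying the same formula to the two-node cluster in this case (host1/host2), a full 3-day retention would need roughly:

2 * 720 * 3 = 4,320MB (about 4.2GB)

This matches the 4525424640 bytes reported for the two-node example in the resize output quoted below, and it is more space than this system has to spare, which is why the repository was not resized here.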

 

 

Per the following document, the size of the CHM repository can be adjusted; however, since this system does not have enough free space, no change was made.

How to Relocate Cluster Health Monitor (CHM) Repository and Increase Retention Time (Doc ID 2062234.1)

 

11.2

In 11.2, the repository of CHM is in the Grid home; to change the retention time: 

$ <GRID_HOME>/bin/oclumon manage -repos resize 259200

racnode1 --> retention check successful

racnode2 --> retention check successful

New retention is 259200 and will use 4525424640 bytes of disk space

CRS-9115-Cluster Health Monitor repository size change completed on all nodes.

Done

Note: the command specifies how many seconds of data to retain, and it is recommended to be at least 259200 seconds, which is 3 days.

 

If there is an insufficient amount of space in the Grid home, relocate the CHM data with the following command:

$ <GRID_HOME>/bin/oclumon manage -repos reploc /home/grid/chm

racnode1 --> Ready to commit new location

racnode2 --> Ready to commit new location

New retention is 259200 and will use 4525424640 bytes of disk space

CRS-9113-Cluster Health Monitor repository location change completed on all nodes. Restarting Loggerd.
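
For this cluster, the equivalent resize command (not run here, since the roughly 4.2 GB required is not available under /data) would presumably be:

/data/grid/11.2.0/bin/oclumon manage -repos resize 259200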

 

8 Shut down the CRF (ora.crf) resource

[root@host1 11.2.0]# /data/grid/11.2.0/bin/crsctl stop res ora.crf -init

CRS-2673: Attempting to stop 'ora.crf' on 'host1'

CRS-2677: Stop of 'ora.crf' on 'host1' succeeded

[root@host1 11.2.0]# /data/grid/11.2.0/bin/crsctl stat res ora.crf -init -t 

--------------------------------------------------------------------------------

NAME           TARGET  STATE        SERVER                   STATE_DETAILS       

--------------------------------------------------------------------------------

Cluster Resources

--------------------------------------------------------------------------------

ora.crf

      1        OFFLINE OFFLINE  
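
Note that "crsctl stop res ora.crf -init" typically keeps CHM down only until the clusterware stack restarts on the node. If CHM should stay disabled permanently, a commonly described follow-up (a sketch only; verify the exact syntax for your GI version against My Oracle Support) is to disable the init resource so it is not started with the stack:

/data/grid/11.2.0/bin/crsctl modify res ora.crf -attr "ENABLED=0" -init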

 

Reposted from:

http://blog.itpub.net/69996316/viewspace-2787409/
