[Repost] CRS-9520: The storage of Grid Infrastructure Management Repository is 93% full

1 The following alert was found during a routine database inspection:

2021-08-16 02:13:59.207: 

[crflogd(6575)]CRS-9520:The storage of Grid Infrastructure Management Repository is 92% full. The storage location is '/data/grid/11.2.0/crf/db/host1'.

2021-08-16 02:19:04.207: 

[crflogd(6575)]CRS-9520:The storage of Grid Infrastructure Management Repository is 93% full. The storage location is '/data/grid/11.2.0/crf/db/host1'.

 

2 Check host filesystem usage; disk usage is normal

[root@host1 ~]# df -h

Filesystem                    Size  Used Avail Use% Mounted on

/dev/mapper/VolGroup-lv_root   50G   12G   36G  24% /

tmpfs                         7.9G  297M  7.6G   4% /dev/shm

/dev/sda2                     485M   62M  398M  14% /boot

/dev/sda1                     200M  260K  200M   1% /boot/efi

/dev/mapper/VolGroup-lv_home   77G   47G   27G  64% /home

 

3 Check the cluster status; everything is normal

[grid@host1 host1]$ crsctl stat res -t

--------------------------------------------------------------------------------

NAME           TARGET  STATE        SERVER                   STATE_DETAILS       

--------------------------------------------------------------------------------

Local Resources

--------------------------------------------------------------------------------

ora.ARCHIVELOG.dg

               ONLINE  ONLINE       host1                                      

               ONLINE  ONLINE       host2                                      

ora.DATA.dg

               ONLINE  ONLINE       host1                                      

               ONLINE  ONLINE       host2                                      

ora.FLASHBAK.dg

               ONLINE  ONLINE       host1                                      

               ONLINE  ONLINE       host2                                      

ora.LISTENER.lsnr

               ONLINE  ONLINE       host1                                      

               ONLINE  ONLINE       host2                                      

ora.DATA1.dg

               ONLINE  ONLINE       host1                                      

               ONLINE  ONLINE       host2                                      

ora.DATA2.dg

               ONLINE  ONLINE       host1                                      

               ONLINE  ONLINE       host2                                      

ora.asm

               ONLINE  ONLINE       host1                  Started             

               ONLINE  ONLINE       host2                  Started             

ora.gsd

               OFFLINE OFFLINE      host1                                      

               OFFLINE OFFLINE      host2                                      

ora.net1.network

               ONLINE  ONLINE       host1                                      

               ONLINE  ONLINE       host2                                      

ora.ons

               ONLINE  ONLINE       host1                                      

               ONLINE  ONLINE       host2                                      

ora.registry.acfs

               ONLINE  ONLINE       host1                                      

               ONLINE  ONLINE       host2                                      

--------------------------------------------------------------------------------

Cluster Resources

--------------------------------------------------------------------------------

ora.LISTENER_SCAN1.lsnr

      1        ONLINE  ONLINE       host1                                      

ora.cvu

      1        ONLINE  ONLINE       host1                                      

ora.oc4j

      1        OFFLINE OFFLINE                                                   

ora.scan1.vip

      1        ONLINE  ONLINE       host1                                      

ora.ywtest.db

      1        ONLINE  ONLINE       host2                  Open                

      2        ONLINE  ONLINE       host1                  Open                

ora.host1.vip

      1        ONLINE  ONLINE       host1                                      

ora.host2.vip

      1        ONLINE  ONLINE       host2       

      

4 Check the status of the ora.crf resource: it is ONLINE, and the repository directory uses only about 290 MB

      

 [grid@host1 host1]$ crsctl stat res ora.crf -init -t

--------------------------------------------------------------------------------

NAME           TARGET  STATE        SERVER                   STATE_DETAILS       

--------------------------------------------------------------------------------

Cluster Resources

--------------------------------------------------------------------------------

ora.crf

      1        ONLINE  ONLINE       host1    

 

Check the CHM-related processes (osysmond.bin and ologgerd); note that the -d argument of ologgerd points to the repository directory named in the alert

      [grid@host1 host1]$ ps -ef|grep osysmond

root      5329     1  4 Apr10 ?        5-10:43:26 /data/grid/11.2.0/bin/osysmond.bin

grid     15537 12181  0 14:43 pts/25   00:00:00 grep osysmond

 

[grid@host1 host1]$ ps -ef|grep ologgerd

root      6575     1  1 Apr10 ?        2-08:01:02 /data/grid/11.2.0/bin/ologgerd -m host2 -r -d 

/data/grid/11.2.0/crf/db/host1

grid     16077 12181  0 14:45 pts/25   00:00:00 grep ologgerd    

 

 

Check the CHM repository directory; usage is low, only about 290 MB in total

[root@host1 host1]# du -k 

290396  .

 

Use the following command to check how much data CHM collects and retains, reported in seconds: 61511 seconds is about 17 hours, which is not long.

[grid@host1 ~]$ oclumon

 

query>  manage -get repsize

 

CHM Repository Size = 61511

 

 Done 
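
Per the CHM FAQ excerpt quoted in step 7 below, the value reported by "oclumon manage -get repsize" is expressed in seconds of retained data, so the 17-hour figure above is simply:

61511 s / 3600 ≈ 17.1 hours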

5 Clean up the directory as follows

Check the status of ora.crf:

 /data/grid/11.2.0/bin/crsctl stat res ora.crf -init -t    

Stop ora.crf:

/data/grid/11.2.0/bin/crsctl stop res ora.crf -init 

Check the status of ora.crf again:

 /data/grid/11.2.0/bin/crsctl stat res ora.crf -init -t  

 

Delete the repository (.bdb) files:

rm crf*.bdb

 

Start ora.crf:

/data/grid/11.2.0/bin/crsctl start res ora.crf -init   
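
For reference, the steps above combined into a single sequence for host1 (a sketch only; the actual execution, run as root, is shown below). The .bdb files are recreated automatically once ora.crf is restarted:

/data/grid/11.2.0/bin/crsctl stop res ora.crf -init
cd /data/grid/11.2.0/crf/db/host1
rm *.bdb
/data/grid/11.2.0/bin/crsctl start res ora.crf -init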

 

 

 

[root@host1 /]# /data/grid/11.2.0/bin/crsctl stat res ora.crf -init -t 

--------------------------------------------------------------------------------

NAME           TARGET  STATE        SERVER                   STATE_DETAILS       

--------------------------------------------------------------------------------

Cluster Resources

--------------------------------------------------------------------------------

ora.crf

      1        ONLINE  ONLINE       host1                                      

[root@host1 /]# /data/grid/11.2.0/bin/crsctl stop res ora.crf -init

CRS-2673: Attempting to stop 'ora.crf' on 'host1'

CRS-2677: Stop of 'ora.crf' on 'host1' succeeded

 

[root@host1 host1]# pwd

/data/grid/11.2.0/crf/db/host1

[root@host1 host1]# rm *.bdb

rm: remove regular file `crfalert.bdb'? y

rm: remove regular file `crfclust.bdb'? y

rm: remove regular file `crfconn.bdb'? y

rm: remove regular file `crfcpu.bdb'? y

rm: remove regular file `crfhosts.bdb'? y

rm: remove regular file `crfloclts.bdb'? y

rm: remove regular file `crfts.bdb'? y

rm: remove regular file `repdhosts.bdb'? y

 

[root@host1 host1]# /data/grid/11.2.0/bin/crsctl start res ora.crf -init  

CRS-2672: Attempting to start 'ora.crf' on 'host1'

CRS-2676: Start of 'ora.crf' on 'host1' succeeded

[root@host1 host1]# /data/grid/11.2.0/bin/crsctl stat res ora.crf -init -t  

--------------------------------------------------------------------------------

NAME           TARGET  STATE        SERVER                   STATE_DETAILS       

--------------------------------------------------------------------------------

Cluster Resources

--------------------------------------------------------------------------------

ora.crf

      1        ONLINE  ONLINE       host1   

      

 

[root@host1 host1]# du -sk

64732   .  

 

6 The next day the same warning appeared again; the directory was using about 226 MB

 

2021-08-17 02:23:33.856: 

[crflogd(1019)]CRS-9520:The storage of Grid Infrastructure Management Repository is 91% full. 

The storage location is '/data/grid/11.2.0/crf/db/host1'.

2021-08-17 02:28:38.855: 

[crflogd(1019)]CRS-9520:The storage of Grid Infrastructure Management Repository is 92% full. 

The storage location is '/data/grid/11.2.0/crf/db/host1'.

2021-08-17 02:33:43.855: 

[crflogd(1019)]CRS-9520:The storage of Grid Infrastructure Management Repository is 93% full. 

The storage location is '/data/grid/11.2.0/crf/db/host1'.

 

 

[root@host1 host1]# du -sk

226504  .

 

7 According to the official Oracle documentation, this service can be stopped without affecting normal use of the cluster. Based on reports found online, if this directory reaches 100% full the cluster may be affected, so the service was stopped outright to avoid any cluster problems.

The following is an excerpt from the Oracle support document Cluster Health Monitor (CHM) FAQ (Doc ID 1328466.1):

 

What is the Cluster Health Monitor?

The Cluster Health Monitor collects OS statistics (system metrics) such as memory and swap space usage, 

processes, IO usage, and network related data. The Cluster Health Monitor collects information in real time 

and usually once a second. The Cluster Health Monitor collects OS statistics using OS APIs to gain performance

and reduce CPU usage overhead. It collects as much system metric data as is feasible within an acceptable level

of resource consumption by the tool.

What is the purpose of the Cluster Health Monitor?

The Cluster Health Monitor is developed to provide system metrics and data for troubleshooting many different 

types of problems such as node reboot and hang, instance eviction and hang, severe performance degradation, 

and any other problems that need the system metrics and data.

 

By monitoring the data constantly, users can use the Cluster Health Monitor to detect potential problem areas

such as CPU load, memory constraints, and spinning processes before the problem causes an unwanted outage.

 

Does stopping/starting ora.crf affect clusterware function or cluster database function?

No. Stopping/starting the ora.crf resource stops and starts the Cluster Health Monitor and its data collection; it will not affect clusterware or database functionality.

 

How much overhead does the Cluster Health Monitor cause?

In today's server environment, the Cluster Health Monitor uses less than approximately 3% of the server's CPU capacity, so its overhead is minimal. However, CHM on a server with a large number of disks or IO devices and more CPUs/memory will use more CPU than CHM on a server that does not have many disks and CPUs/memory.

 

How much disk space is needed for the Cluster Health Monitor?

The Cluster Health Monitor takes up 1 GB of space by default on all nodes in the cluster. The approximate amount of

data collected is 0.5 GB per node per day. The size of the repository can be increased to collect and save data for up to 3

days, and this will increase the disk usage accordingly.

How do I find out the size of data collected and saved by the Cluster Health Monitor in my system?

“oclumon manage -get repsize” will show the size in seconds.

To estimate the space required, use the following formula:

 

# of nodes * 720MB * 3 = Size required for 3 days retention 

eg. for 4 node cluster: 4 * 720 * 3 = 8,640MB (8.4GB)

How can I increase the size of the Cluster Health Monitor repository?

“oclumon manage -repos resize <number in seconds less than 259200>”. Setting the value to 259200 will

collect and save the data for 72 hours (3 days). It is recommended to set 72 hours of retention based on the above

formula. This space needs to be available on all nodes in the cluster. Please resize the repository, or relocate it

if necessary, in order to achieve 72 hours of retention.
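
Applying the same formula to the two-node cluster in this case (host1/host2), a full 3-day retention would need roughly:

2 * 720 * 3 = 4,320MB (about 4.2GB)

This matches the 4525424640 bytes reported for the two-node example in the resize output quoted below, and it is more space than this system has to spare, which is why the repository was not resized here.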

 

 

Per the following document, the size of the CHM repository can be adjusted; however, since this system does not have enough free space, no change was made.

How to Relocate Cluster Health Monitor (CHM) Repository and Increase Retention Time (Doc ID 2062234.1)

 

11.2

In 11.2, the repository of CHM is in the Grid home; to change the retention time: 

$ <GRID_HOME>/bin/oclumon manage -repos resize 259200

racnode1 --> retention check successful

racnode2 --> retention check successful

New retention is 259200 and will use 4525424640 bytes of disk space

CRS-9115-Cluster Health Monitor repository size change completed on all nodes.

Done

Note: the command specifies how many seconds of data to retain, and it is recommended to be at least 259200 seconds, which is 3 days.

 

If there is an insufficient amount of space in the Grid home, relocate the CHM data with the following command:

$ <GRID_HOME>/bin/oclumon manage -repos reploc /home/grid/chm

racnode1 --> Ready to commit new location

racnode2 --> Ready to commit new location

New retention is 259200 and will use 4525424640 bytes of disk space

CRS-9113-Cluster Health Monitor repository location change completed on all nodes. Restarting Loggerd.
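
For this cluster, the equivalent resize command (not run here, since the roughly 4.2 GB required is not available under /data) would presumably be:

/data/grid/11.2.0/bin/oclumon manage -repos resize 259200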

 

8 Shut down the CRF (ora.crf) resource

[root@host1 11.2.0]# /data/grid/11.2.0/bin/crsctl stop res ora.crf -init

CRS-2673: Attempting to stop 'ora.crf' on 'host1'

CRS-2677: Stop of 'ora.crf' on 'host1' succeeded

[root@host1 11.2.0]# /data/grid/11.2.0/bin/crsctl stat res ora.crf -init -t 

--------------------------------------------------------------------------------

NAME           TARGET  STATE        SERVER                   STATE_DETAILS       

--------------------------------------------------------------------------------

Cluster Resources

--------------------------------------------------------------------------------

ora.crf

      1        OFFLINE OFFLINE  
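
Note that "crsctl stop res ora.crf -init" typically keeps CHM down only until the clusterware stack restarts on the node. If CHM should stay disabled permanently, a commonly described follow-up (a sketch only; verify the exact syntax for your GI version against My Oracle Support) is to disable the init resource so it is not started with the stack:

/data/grid/11.2.0/bin/crsctl modify res ora.crf -attr "ENABLED=0" -init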

 

Reposted from:

http://blog.itpub.net/69996316/viewspace-2787409/
