[Repost] CRS-9520: The storage of Grid Infrastructure Management Repository is 93% full
1 A routine database health check turned up the following alerts:
2021-08-16 02:13:59.207:
[crflogd(6575)]CRS-9520:The storage of Grid Infrastructure Management Repository is 92% full. The storage location is '/data/grid/11.2.0/crf/db/host1'.
2021-08-16 02:19:04.207:
[crflogd(6575)]CRS-9520:The storage of Grid Infrastructure Management Repository is 93% full. The storage location is '/data/grid/11.2.0/crf/db/host1'.
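When the alert log is long, it can help to pull out just the percentage from each CRS-9520 line to see how fast the figure is climbing. The following is a minimal sketch, not an Oracle tool: the function name crs9520_percent is made up for illustration, and the sample input is copied from the two alert lines above.

```shell
# Hypothetical helper: read alert-log text on stdin and print the
# "NN" from every "CRS-9520 ... is NN% full" line, one value per line.
crs9520_percent() {
    sed -n 's/.*CRS-9520:.* is \([0-9][0-9]*\)% full.*/\1/p'
}

# Sample input: the two alert lines shown above.
printf '%s\n' \
  "[crflogd(6575)]CRS-9520:The storage of Grid Infrastructure Management Repository is 92% full. The storage location is '/data/grid/11.2.0/crf/db/host1'." \
  "[crflogd(6575)]CRS-9520:The storage of Grid Infrastructure Management Repository is 93% full. The storage location is '/data/grid/11.2.0/crf/db/host1'." \
  | crs9520_percent
# prints 92, then 93
```

In practice the function would be fed the clusterware alert log instead of the inline sample.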
2 Check filesystem usage on the host; disk usage is normal
[root@host1 ~]# df -h
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/VolGroup-lv_root 50G 12G 36G 24% /
tmpfs 7.9G 297M 7.6G 4% /dev/shm
/dev/sda2 485M 62M 398M 14% /boot
/dev/sda1 200M 260K 200M 1% /boot/efi
/dev/mapper/VolGroup-lv_home 77G 47G 27G 64% /home
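The same check can be scripted instead of read by eye. This is only a sketch: fs_over_threshold is a hypothetical helper name, the 60% threshold is an arbitrary value chosen for the example, and the input is the df listing above pasted in as sample text.

```shell
# Hypothetical helper: $1 is a threshold in percent. Reads df-style output
# on stdin (header on line 1, Use% in column 5, mount point in column 6)
# and prints the mount points at or above the threshold.
fs_over_threshold() {
    awk -v limit="$1" 'NR > 1 {
        use = $5
        sub(/%/, "", use)
        if (use + 0 >= limit) print $6, $5
    }'
}

# Sample input: the df output from this post.
fs_over_threshold 60 <<'EOF'
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/VolGroup-lv_root 50G 12G 36G 24% /
tmpfs 7.9G 297M 7.6G 4% /dev/shm
/dev/sda2 485M 62M 398M 14% /boot
/dev/sda1 200M 260K 200M 1% /boot/efi
/dev/mapper/VolGroup-lv_home 77G 47G 27G 64% /home
EOF
# prints: /home 64%
```

On a live host, piping `df -P` into the function keeps each filesystem on one line, which this column-based parsing relies on.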
3 Check the cluster status; it is normal
[grid@host1 host1]$ crsctl stat res -t
--------------------------------------------------------------------------------
NAME TARGET STATE SERVER STATE_DETAILS
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.ARCHIVELOG.dg
ONLINE ONLINE host1
ONLINE ONLINE host2
ora.DATA.dg
ONLINE ONLINE host1
ONLINE ONLINE host2
ora.FLASHBAK.dg
ONLINE ONLINE host1
ONLINE ONLINE host2
ora.LISTENER.lsnr
ONLINE ONLINE host1
ONLINE ONLINE host2
ora.DATA1.dg
ONLINE ONLINE host1
ONLINE ONLINE host2
ora.DATA2.dg
ONLINE ONLINE host1
ONLINE ONLINE host2
ora.asm
ONLINE ONLINE host1 Started
ONLINE ONLINE host2 Started
ora.gsd
OFFLINE OFFLINE host1
OFFLINE OFFLINE host2
ora.net1.network
ONLINE ONLINE host1
ONLINE ONLINE host2
ora.ons
ONLINE ONLINE host1
ONLINE ONLINE host2
ora.registry.acfs
ONLINE ONLINE host1
ONLINE ONLINE host2
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.LISTENER_SCAN1.lsnr
1 ONLINE ONLINE host1
ora.cvu
1 ONLINE ONLINE host1
ora.oc4j
1 OFFLINE OFFLINE
ora.scan1.vip
1 ONLINE ONLINE host1
ora.ywtest.db
1 ONLINE ONLINE host2 Open
2 ONLINE ONLINE host1 Open
ora.host1.vip
1 ONLINE ONLINE host1
ora.host2.vip
1 ONLINE ONLINE host2
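With this many resources it is easy to miss an OFFLINE entry, so a small filter can list only the resources that report OFFLINE somewhere. This is a rough sketch that assumes the layout shown above (a resource name starting with "ora." on its own line, followed by its state lines); offline_resources is a hypothetical name, and the sample is an excerpt of the listing above.

```shell
# Hypothetical helper: read "crsctl stat res -t" style text on stdin and
# print each resource name that has at least one OFFLINE state line.
offline_resources() {
    awk '/^ora\./ { name = $1; next }
         /OFFLINE/ && name != last { print name; last = name }'
}

# Sample input: an excerpt of the listing above.
offline_resources <<'EOF'
ora.asm
ONLINE ONLINE host1 Started
ONLINE ONLINE host2 Started
ora.gsd
OFFLINE OFFLINE host1
OFFLINE OFFLINE host2
ora.oc4j
1 OFFLINE OFFLINE
ora.scan1.vip
1 ONLINE ONLINE host1
EOF
# prints ora.gsd, then ora.oc4j
```

In the listing above only ora.gsd and ora.oc4j are OFFLINE, and the post treats the cluster as healthy.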
4 Check the status of the CRF resource: it is ONLINE, and it uses only about 290 MB of space, which is not much
[grid@host1 host1]$ crsctl stat res ora.crf -init -t
--------------------------------------------------------------------------------
NAME TARGET STATE SERVER STATE_DETAILS
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.crf
1 ONLINE ONLINE host1
Check the CRF-related processes
[grid@host1 host1]$ ps -ef|grep osysmond
root 5329 1 4 Apr10 ? 5-10:43:26 /data/grid/11.2.0/bin/osysmond.bin
grid 15537 12181 0 14:43 pts/25 00:00:00 grep osysmond
[grid@host1 host1]$ ps -ef|grep ologgerd
root 6575 1 1 Apr10 ? 2-08:01:02 /data/grid/11.2.0/bin/ologgerd -m host2 -r -d /data/grid/11.2.0/crf/db/host1
grid 16077 12181 0 14:45 pts/25 00:00:00 grep ologgerd
Check the CRF repository directory: usage is low, about 290 MB in total
[root@host1 host1]# du -k
290396 .
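Use the following command to check how much data CRF retains. The value is in seconds; 61511 seconds is about 17 hours, which is not long.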
[grid@host1 ~]$ oclumon
query> manage -get repsize
CHM Repository Size = 61511
Done
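The conversion from the reported repsize to hours is simple integer arithmetic. A minimal sketch, where secs_to_hms is a hypothetical helper name and 61511 is the value reported above:

```shell
# Hypothetical helper: convert a number of seconds to "Hh MMm SSs".
secs_to_hms() {
    secs=$1
    printf '%dh %02dm %02ds\n' $((secs / 3600)) $((secs % 3600 / 60)) $((secs % 60))
}

secs_to_hms 61511
# prints: 17h 05m 11s
```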
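5 Clean up the directory as follows
Check the status of ora.crf
/data/grid/11.2.0/bin/crsctl stat res ora.crf -init -t
Stop ora.crf
/data/grid/11.2.0/bin/crsctl stop res ora.crf -init
Check the status of ora.crf
/data/grid/11.2.0/bin/crsctl stat res ora.crf -init -t
Delete the log data (the .bdb files in the repository directory /data/grid/11.2.0/crf/db/host1):
rm crf*.bdb
Start ora.crf
/data/grid/11.2.0/bin/crsctl start res ora.crf -init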
[root@host1 /]# /data/grid/11.2.0/bin/crsctl stat res ora.crf -init -t
--------------------------------------------------------------------------------
NAME TARGET STATE SERVER STATE_DETAILS
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.crf
1 ONLINE ONLINE host1
[root@host1 /]# /data/grid/11.2.0/bin/crsctl stop res ora.crf -init
CRS-2673: Attempting to stop 'ora.crf' on 'host1'
CRS-2677: Stop of 'ora.crf' on 'host1' succeeded
[root@host1 host1]# pwd
/data/grid/11.2.0/crf/db/host1
[root@host1 host1]# rm *.bdb
rm: remove regular file `crfalert.bdb'? y
rm: remove regular file `crfclust.bdb'? y
rm: remove regular file `crfconn.bdb'? y
rm: remove regular file `crfcpu.bdb'? y
rm: remove regular file `crfhosts.bdb'? y
rm: remove regular file `crfloclts.bdb'? y
rm: remove regular file `crfts.bdb'? y
rm: remove regular file `repdhosts.bdb'? y
[root@host1 host1]# /data/grid/11.2.0/bin/crsctl start res ora.crf -init
CRS-2672: Attempting to start 'ora.crf' on 'host1'
CRS-2676: Start of 'ora.crf' on 'host1' succeeded
[root@host1 host1]# /data/grid/11.2.0/bin/crsctl stat res ora.crf -init -t
--------------------------------------------------------------------------------
NAME TARGET STATE SERVER STATE_DETAILS
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.crf
1 ONLINE ONLINE host1
[root@host1 host1]# du -sk
64732 .
6 Checking again the next day, the same alert had come back, with about 226 MB of space in use
2021-08-17 02:23:33.856:
[crflogd(1019)]CRS-9520:The storage of Grid Infrastructure Management Repository is 91% full. The storage location is '/data/grid/11.2.0/crf/db/host1'.
2021-08-17 02:28:38.855:
[crflogd(1019)]CRS-9520:The storage of Grid Infrastructure Management Repository is 92% full. The storage location is '/data/grid/11.2.0/crf/db/host1'.
2021-08-17 02:33:43.855:
[crflogd(1019)]CRS-9520:The storage of Grid Infrastructure Management Repository is 93% full. The storage location is '/data/grid/11.2.0/crf/db/host1'.
[root@host1 host1]# du -sk
226504 .
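To put the three du readings side by side: the cleanup freed roughly 220 MB, and the directory then grew back by roughly 158 MB before the next check. A small sketch of that arithmetic follows; kb_diff_mb is a hypothetical helper name, the inputs are the du values (in KB) from this post, and integer division truncates the result.

```shell
# Hypothetical helper: difference between two sizes in KB, in whole MB.
kb_diff_mb() {
    echo $(( ($1 - $2) / 1024 ))
}

kb_diff_mb 290396 64732   # space freed by the cleanup; prints 220
kb_diff_mb 226504 64732   # regrowth by the next check; prints 157 (truncated)
```

The exact time between the cleanup and the second reading is not recorded in the post, so this is only a rough indication of the growth rate.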
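7 According to the official documentation, this service can be stopped without affecting normal cluster operation. From what I found online, the cluster may misbehave if this directory reaches 100%, so I decided to stop the service to avoid that risk.
The following is an excerpt from the official Oracle document Cluster Health Monitor (CHM) FAQ (Doc ID 1328466.1):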
What is the Cluster Health Monitor?
The Cluster Health Monitor collects OS statistics (system metrics) such as memory and swap space usage,
processes, IO usage, and network related data. The Cluster Health Monitor collects information in real time
and usually once a second. The Cluster Health Monitor collects OS statistics using OS API to gain performance
and reduce the CPU usage overhead. The Cluster Health Monitor collects as much of system metrics
and data as feasible that is restricted by the acceptable level of resource consumption by the tool.
What is the purpose of the Cluster Health Monitor?
The Cluster Health Monitor is developed to provide system metrics and data for troubleshooting many different
types of problems such as node reboot and hang, instance eviction and hang, severe performance degradation,
and any other problems that need the system metrics and data.
By monitoring the data constantly, users can use the Cluster Health Monitor to detect potential problem areas
such as CPU load, memory constraints, and spinning processes before the problem causes an unwanted outage.
Is stop/start ora.crf affecting clusterware function or cluster database function?
No, stop/start ora.crf resource will stop and start Cluster Health Monitor and its data collection, it will not affect clusterware or database functionality.
How much of overhead does the Cluster Health Monitor cause?
In today's server environment, the Cluster Health Monitor uses approximately less than 3% of the server's capacity for CPU. The overhead of using the Cluster Health Monitor is minimal. However, CHM on a server with a large number of disks or IO devices and more CPUs/memory would use more CPU than CHM on a server that does not have many disks and CPUs/memory.
How much of disk space is needed for the Cluster Health Monitor?
The Cluster Health Monitor takes up 1GB space by default on all nodes in the cluster. The approximate amount of
data collected is 0.5 GB per node per day. The size of the repository can increase to collect and save data up to 3
days, and this will increase the disk usage appropriately.
How do I find out the size of data collected and saved by the Cluster Health Monitor in my system?
“oclumon manage -get repsize” will show the size in seconds.
To estimate the space required, use the following formula:
# of nodes * 720MB * 3 = Size required for 3 days retention
eg. for 4 node cluster: 4 * 720 * 3 = 8,640MB (8.4GB)
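The formula above is easy to apply to any cluster size. A minimal sketch, where chm_space_mb is a hypothetical helper name and the 720 MB per node per day and 3-day retention figures are taken directly from the FAQ formula:

```shell
# Hypothetical helper: estimated repository space in MB for 3 days of
# retention, using the FAQ formula (# of nodes * 720MB * 3).
chm_space_mb() {
    echo $(( $1 * 720 * 3 ))
}

chm_space_mb 4   # the FAQ example; prints 8640
chm_space_mb 2   # a two-node cluster such as host1/host2; prints 4320
```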
How can I increase the size of the Cluster Health Monitor repository ?
“oclumon manage -repos resize <number in seconds less than 259200>”. Setting the value to 259200 will
collect and save the data for 72 hours (3 days). It is recommended to set 72 hours of retention based on above
formula. This space needs to be available on all nodes in the cluster. Please resize the repositories or move them
if necessary in order to achieve 72 hours of retention.
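Because the resize argument is given in seconds, it is worth deriving it from hours rather than typing it by hand. The sketch below only prints the command and does not run oclumon; retention_secs is a hypothetical helper name, and the command text is the one quoted in the FAQ above.

```shell
# Hypothetical helper: convert a retention period in hours to seconds.
retention_secs() {
    echo $(( $1 * 3600 ))
}

# Print (do not execute) the resize command for 72 hours of retention.
echo "oclumon manage -repos resize $(retention_secs 72)"
# prints: oclumon manage -repos resize 259200
```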
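According to the document below, the size of the CHM repository can be adjusted, but because this system is short of disk space, no change was made.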
How to Relocate Cluster Health Monitor (CHM) Repository and Increase Retention Time (Doc ID 2062234.1)
11.2
In 11.2, the repository of CHM is in Grid home, to change the retention time:
$ <GRID_HOME>/bin/oclumon manage -repos resize 259200
racnode1 --> retention check successful
racnode2 --> retention check successful
New retention is 259200 and will use 4525424640 bytes of disk space
CRS-9115-Cluster Health Monitor repository size change completed on all nodes.
Done
Note: the command line specifies for how many seconds to retain the data and it's recommended to be at least 259200 which is 3 days.
In case there's insufficient amount of space in Grid home, relocate CHM data with the following command:
$ <GRID_HOME>/bin/oclumon manage -repos reploc /home/grid/chm
racnode1 --> Ready to commit new location
racnode2 --> Ready to commit new location
New retention is 259200 and will use 4525424640 bytes of disk space
CRS-9113-Cluster Health Monitor repository location change completed on all nodes. Restarting Loggerd.
8 Stop CRF
[root@host1 11.2.0]# /data/grid/11.2.0/bin/crsctl stop res ora.crf -init
CRS-2673: Attempting to stop 'ora.crf' on 'host1'
CRS-2677: Stop of 'ora.crf' on 'host1' succeeded
[root@host1 11.2.0]# /data/grid/11.2.0/bin/crsctl stat res ora.crf -init -t
--------------------------------------------------------------------------------
NAME TARGET STATE SERVER STATE_DETAILS
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.crf
1 OFFLINE OFFLINE
Reposted from:
http://blog.itpub.net/69996316/viewspace-2787409/