Common OCFS2 File System Problems and Solutions

Symptom 1:
mount -t ocfs2 -o datavolume,nointr /dev/sdb1 /webdata

mount.ocfs2: Transport endpoint is not connected while mounting /dev/sdb1 on /webdata. Check 'dmesg' for more information on this error.

 

 

Possible causes:

1: The firewall is still enabled and is blocking the heartbeat port.

2: The values set through /etc/init.d/o2cb configure differ between nodes.

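3: One node already has the volume mounted while the other node has only just been configured and had its ocfs2 service restarted. In that case, restarting the service on both nodes allows the mount to complete.

4: SELinux has not been disabled.

Here is an example case: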

[root@test02 ~]# mount -t ocfs2 /dev/vg_ocfs/lv_u02 /u02
mount.ocfs2: Transport endpoint is not connected while mounting /dev/vg_ocfs/lv_u02 on /u02. Check 'dmesg' for more information on this error.

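This error occurs when the O2CB_HEARTBEAT_THRESHOLD value differs between nodes. I had in fact already set the same value on every node with /etc/init.d/o2cb configure, but I forgot to restart o2cb on the first node, and it took me a long time to track that down. The next step was naturally to unmount the OCFS directory that was already mounted, and that failed as well: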

[root@test01 u02]# umount -f /u02
umount2: Device or resource busy
umount: /u02: device is busy
umount2: Device or resource busy
umount: /u02: device is busy

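At this point, stop OCFS2 and O2CB with /etc/init.d/ocfs2 stop and /etc/init.d/o2cb stop, and only then unmount. After OCFS2 and O2CB have been started again, the other nodes can mount the OCFS volume without trouble.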
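A mismatch like the one in this case can be caught by comparing the configured threshold from each node. The following is a minimal sketch, assuming the values written by /etc/init.d/o2cb configure are stored in /etc/sysconfig/o2cb (the usual location on Red Hat style systems) and that a copy of that file from each node has been collected locally. The function names and the paths in the usage comment are hypothetical.

```shell
# Print the O2CB_HEARTBEAT_THRESHOLD value stored in an o2cb config file.
# Prints nothing if the variable is not set in the file.
o2cb_threshold() {
    sed -n 's/^O2CB_HEARTBEAT_THRESHOLD=//p' "$1" | tail -n 1
}

# Succeed only if every file given carries the same threshold value.
same_threshold() {
    first=$(o2cb_threshold "$1")
    for f in "$@"; do
        [ "$(o2cb_threshold "$f")" = "$first" ] || return 1
    done
    return 0
}

# Usage (hypothetical paths, one copy of the file per node):
#   same_threshold /tmp/o2cb.test01 /tmp/o2cb.test02 || echo "threshold mismatch"
```

Matching files are not sufficient on their own: as the case above shows, o2cb must also be restarted on every node before a changed value takes effect.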

 

Symptom 2:
# /etc/init.d/o2cb online ocfs2

Starting cluster ocfs2: Failed

Cluster ocfs2 created

o2cb_ctl: Configuration error discovered while populating cluster ocfs2. None of its nodes were considered local. A node is considered local when its node name in the configuration matches this machine's host name.

Stopping cluster ocfs2: OK

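This is a host name problem. Check /etc/ocfs2/cluster.conf (for example with more) and /etc/hosts, and correct the host names accordingly.

Note: for the ocfs2 file system to be mounted automatically at boot, adding the mount entry to /etc/fstab is not enough. /etc/hosts must also contain the host-name-to-IP mappings for both nodes, and the host names must be identical to the ones configured in /etc/ocfs2/cluster.conf.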
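The "None of its nodes were considered local" check can be reproduced by hand before starting the cluster. The sketch below assumes the usual cluster.conf layout, in which each node: stanza has an indented "name = ..." line; the function name is hypothetical.

```shell
# Succeed if NAME appears as the "name" of a node: stanza in a cluster.conf file.
# The cluster: stanza also has a "name" key, so stanzas are tracked explicitly.
cluster_has_node() {
    awk -v want="$2" '
        /^node:/   { in_node = 1; next }
        /^[a-z]+:/ { in_node = 0; next }
        in_node && $1 == "name" && $3 == want { found = 1 }
        END        { exit !found }
    ' "$1"
}

# Usage on a cluster node:
#   cluster_has_node /etc/ocfs2/cluster.conf "$(hostname)" || echo "local host name not in cluster.conf"
```

If the check fails, either the node name in cluster.conf or the machine's host name needs to change so that the two agree.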

Symptom 3

1: Starting O2CB cluster ocfs2: Failed
After installing ocfs2, configuring o2cb fails:
[root@rac1 ocfs2]# /etc/init.d/o2cb configure
Configuring the O2CB driver.

This will configure the on-boot properties of the O2CB driver.
The following questions will determine whether the driver is loaded on
boot. The current values will be shown in brackets ('[]'). Hitting
<ENTER> without typing an answer will keep that current value. Ctrl-C
will abort.

Load O2CB driver on boot (y/n) [y]:
Cluster to start on boot (Enter "none" to clear) [ocfs2]:
Specify heartbeat dead threshold (>=7) [7]:
Writing O2CB configuration: OK
Starting O2CB cluster ocfs2: Failed
Cluster ocfs2 created
     o2cb_ctl: Configuration error discovered while populating cluster ocfs2. None of its nodes were considered local. A node is considered local when its node name in the configuration matches this machine's host name.
Stopping O2CB cluster ocfs2: OK

 

 

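When this happens, OCFS has most likely not been configured yet. There is a graphical ocfs configuration tool; configure the cluster with it first, and preferably use IP addresses rather than host names!

In other words, the ocfs node configuration file must be set up correctly before ocfs2 is started, otherwise an error is reported. Also, when configuring through the graphical interface, /etc/ocfs2/cluster.conf should preferably be an empty file, or an error will be reported as well!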


Symptom 4
Mounting an ocfs2 file system fails with:
mount.ocfs2: Error when attempting to run /sbin/ocfs2_hb_ctl: "Operation not permitted"
mount -t ocfs2 -o datavolume /dev/sdb1 /u02/oradata/orcl
ocfs2_hb_ctl: Bad magic number in superblock while reading uuid
mount.ocfs2: Error when attempting to run /sbin/ocfs2_hb_ctl: "Operation not permitted"


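This error occurs because the partition intended for the ocfs2 file system has not been formatted. The partition must be formatted before the ocfs2 file system can be mounted.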
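Whether a partition already carries an ocfs2 file system can be checked before mounting. This is a minimal sketch, assuming the blkid utility on the system recognises ocfs2 superblocks; the function name and the label in the comment are hypothetical.

```shell
# Succeed if the given block device already carries an ocfs2 file system,
# judged by the TYPE that blkid reports for it.
is_ocfs2() {
    [ "$(blkid -o value -s TYPE "$1" 2>/dev/null)" = "ocfs2" ]
}

# Usage (run as root; /dev/sdb1 is the partition from the example above):
#   is_ocfs2 /dev/sdb1 || echo "/dev/sdb1 is not formatted as ocfs2"
# Formatting is destructive and is done once, from one node only, for example:
#   mkfs.ocfs2 -L oradata /dev/sdb1     # "oradata" is a hypothetical label
```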

 

Symptom 5:
 Configuration assistant "Oracle Cluster Verification Utility" failed
Question about a 10g RAC installation: Oracle 10.2.0.1 on Solaris 5.9, two nodes. The last step of the CRS installation fails and I do not know how to fix it.

Log output:
INFO: Configuration assistant "Oracle Cluster Verification Utility" failed
-----------------------------------------------------------------------------
*** Starting OUICA ***
Oracle Home set to /orabase/product/10.2
Configuration directory is set to /orabase/product/10.2/cfgtoollogs. All xml files under the directory will be processed
INFO: The "/orabase/product/10.2/cfgtoollogs/configToolFailedCommands" script contains all commands that failed, were skipped or were cancelled. This file may be used to run these configuration assistants outside of OUI. Note that you may have to update this script with passwords (if any) before executing the same.
-----------------------------------------------------------------------------
SEVERE: OUI-25031:Some of the configuration assistants failed. It is strongly recommended that you retry the configuration assistants at this time. Not successfully running any "Recommended" assistants means your system will not be correctly configured.
1. Check the Details panel on the Configuration Assistant Screen to see the errors resulting in the failures.
2. Fix the errors causing these failures.
3. Select the failed assistants and click the 'Retry' button to retry them.
INFO: User Selected: Yes/OK

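The cause is that the VIP address has not been started. After running orainstRoot.sh and root.sh, open a new window and run vipca, wait until all CRS services are up, and then run the final verify step. This is worth trying.

Run crs_stat -t from the CRS bin directory to see whether all services are up. In this situation the VIP is most likely not running.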
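To pick out the resources that are not up, the crs_stat -t output can be filtered. The sketch below assumes the 10g tabular layout (two header lines, then the columns Name, Type, Target, State, Host); the function name is hypothetical.

```shell
# Read "crs_stat -t" output on stdin and print the name of every resource
# whose State column (4th field) is not ONLINE.
offline_resources() {
    awk 'NR > 2 && NF >= 4 && $4 != "ONLINE" { print $1 }'
}

# Usage:
#   crs_stat -t | offline_resources
```

An empty result means every listed resource is ONLINE. If a name containing "vip" is printed, that matches the cause described above.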

 

Symptom 6:
Failed to upgrade Oracle Cluster Registry configuration
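While installing CRS, running ./root.sh on the second node produced the output below, although it ran normally on the first node. Any pointers would be much appreciated. Thanks!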
[root@RACtest2 crs]# ./root.sh
WARNING: directory '/app/oracle/product/10.2.0' is not owned by root
WARNING: directory '/app/oracle/product' is not owned by root
WARNING: directory '/app/oracle' is not owned by root
WARNING: directory '/app' is not owned by root
Checking to see if Oracle CRS stack is already configured

Setting the permissions on OCR backup directory
Setting up NS directories
PROT-1: Failed to initialize ocrconfig
Failed to upgrade Oracle Cluster Registry configuration

 

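Cause:

The permissions on the devices used by CRS are wrong. For example, I keep the OCR and the voting disk on raw devices, so the permissions on those devices and on the links pointing to them must be set correctly. This is my environment: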

[root@rac2 oracrs]#

lrwxrwxrwx  1 root root 13 Jan 27 12:49 ocr.crs -> /dev/raw/raw1

lrwxrwxrwx  1 root root 13 Jan 26 13:31 vote.crs -> /dev/raw/raw2

 

 

chown root:oinstall /dev/raw/raw1

chown root:oinstall /dev/raw/raw2

chmod 660 /dev/raw/raw1

chmod 660 /dev/raw/raw2

Here /dev/sdb1 holds the OCR and /dev/sdb2 holds the voting disk.

[root@rac2 oracrs]# service rawdevices reload

Assigning devices:

           /dev/raw/raw1  -->   /dev/sdb1

/dev/raw/raw1:  bound to major 8, minor 17

           /dev/raw/raw2  -->   /dev/sdb2

/dev/raw/raw2:  bound to major 8, minor 18

Done

Then running root.sh again succeeds.

[root@rac2 oracrs]# /oracle/app/oracle/product/crs/root.sh

WARNING: directory '/oracle/app/oracle/product' is not owned by root

WARNING: directory '/oracle/app/oracle' is not owned by root

Checking to see if Oracle CRS stack is already configured

 

Setting the permissions on OCR backup directory

Setting up NS directories

Oracle Cluster Registry configuration upgraded successfully

WARNING: directory '/oracle/app/oracle/product' is not owned by root

WARNING: directory '/oracle/app/oracle' is not owned by root

clscfg: EXISTING configuration version 3 detected.

clscfg: version 3 is 10G Release 2.

assigning default hostname rac1 for node 1.

assigning default hostname rac2 for node 2.

Successfully accumulated necessary OCR keys.

Using ports: CSS=49895 CRS=49896 EVMC=49898 and EVMR=49897.

node <nodenumber>: <nodename> <private interconnect name> <hostname>

node 1: rac1 priv1 rac1

node 2: rac2 priv2 rac2

clscfg: Arguments check out successfully.
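The ownership and mode set in Symptom 6 can be verified before root.sh is run again. This is a small sketch that relies on GNU stat; the function name is hypothetical, and the expected value in the usage comment is simply the one from the chown and chmod commands above.

```shell
# Print "owner:group mode" for a path, following symbolic links so that a link
# such as ocr.crs reports the permissions of the raw device it points to.
# Uses GNU stat (-L follows links, -c sets the output format).
perm_of() {
    stat -L -c '%U:%G %a' "$1"
}

# Usage with the values from the example above:
#   [ "$(perm_of /dev/raw/raw1)" = "root:oinstall 660" ] || echo "raw1 needs chown/chmod"
```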

 

Symptom 7
 Startup will be queued to init within 90 seconds
Running root.sh on node A of the installation gives the following:
[root@rac2 OraHome1]# ./root.sh
WARNING: directory '/oracle' is not owned by root
Checking to see if Oracle CRS stack is already configured
/etc/oracle does not exist. Creating it now.

Setting the permissions on OCR backup directory
Setting up NS directories
Oracle Cluster Registry configuration upgraded successfully
WARNING: directory '/oracle' is not owned by root
clscfg: EXISTING configuration version 3 detected.
clscfg: version 3 is 10G Release 2.
assigning default hostname rac1 for node 1.
assigning default hostname rac2 for node 2.
Successfully accumulated necessary OCR keys.
Using ports: CSS=49895 CRS=49896 EVMC=49898 and EVMR=49897.
node <nodenumber>: <nodename> <private interconnect name> <hostname>
node 1: rac1 vip1 rac1
node 2: rac2 vip2 rac2
clscfg: Arguments check out successfully.

NO KEYS WERE WRITTEN. Supply -force parameter to override.
-force is destructive and will destroy any previous cluster
configuration.
Oracle Cluster Registry for cluster has already been initialized
Startup will be queued to init within 90 seconds.
At this point it hangs. The log file ocrconfig_7758.log shows:
::::::::::::::
Oracle Database 10g CRS Release 10.2.0.1.0 Production Copyright 1996, 2005 Oracle. All rights reserved.
2006-10-29 22:47:09.537: [ OCRCONF][3086919360]ocrconfig starts...
2006-10-29 22:47:09.541: [ OCRCONF][3086919360]Upgrading OCR data
2006-10-29 22:47:09.649: [ OCRRAW][3086919360]propriogid:1: INVALID FORMAT
2006-10-29 22:47:09.660: [ OCRRAW][3086919360]ibctx:1:ERROR: INVALID FORMAT
2006-10-29 22:47:09.660: [ OCRRAW][3086919360]proprinit:problem reading the bootblock or superbloc 22

2006-10-29 22:47:09.661: [ default][3086919360]a_init:7!: Backend init unsuccessful : [22]
2006-10-29 22:47:09.662: [ OCRCONF][3086919360]Exporting OCR data to [OCRUPGRADEFILE]
2006-10-29 22:47:09.663: [ OCRAPI][3086919360]a_init:7!: Backend init unsuccessful : [33]
2006-10-29 22:47:09.663: [ OCRCONF][3086919360]There was no previous version of OCR. error:[PROC-33: Oracle Cluster Registry is not
configured]
2006-10-29 22:47:09.666: [ OCRRAW][3086919360]propriogid:1: INVALID FORMAT
2006-10-29 22:47:09.668: [ OCRRAW][3086919360]ibctx:1:ERROR: INVALID FORMAT
2006-10-29 22:47:09.668: [ OCRRAW][3086919360]proprinit:problem reading the bootblock or superbloc 22

2006-10-29 22:47:09.668: [ default][3086919360]a_init:7!: Backend init unsuccessful : [22]
2006-10-29 22:47:09.672: [ OCRRAW][3086919360]propriogid:1: INVALID FORMAT
2006-10-29 22:47:09.673: [ OCRRAW][3086919360]ibctx:1:ERROR: INVALID FORMAT
2006-10-29 22:47:09.673: [ OCRRAW][3086919360]proprinit:problem reading the bootblock or superbloc 22

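First check whether the firewall is turned off:

Check for and turn off UDP ICMP rejections

During the Linux installation I chose not to configure the firewall option. By default the installer selects the option to configure a firewall. This has bitten me several times, so I double-check that the firewall option is not configured and make sure UDP ICMP filtering is turned off.

If UDP ICMP is blocked or rejected by the firewall, the Oracle Clusterware software will crash after running for several minutes. When the Oracle Clusterware process fails, your <machine_name>_evmocr.log file will contain something like the following: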

08/29/2005 22:17:19
 oac_init:2: Could not connect to server, clsc retcode = 9
 08/29/2005 22:17:19
 a_init:12!: Client init unsuccessful : [32]
 ibctx:1:ERROR: INVALID FORMAT
 proprinit:problem reading the bootblock or superbloc 22

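If you run into this type of error, the fix is to remove the UDP ICMP (iptables) rejection rule, or simply to turn the firewall option off. Oracle Clusterware will then start working normally and will not crash. Run the following commands as root: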

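1. Check that the firewall option is turned off. If the firewall is already stopped (as in the example below), you do not need to carry out the remaining steps.
# /etc/rc.d/init.d/iptables status
Firewall is stopped.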

2. If the firewall option is enabled, first disable UDP ICMP rejections manually:
# /etc/rc.d/init.d/iptables stop
Flushing firewall rules: [ OK ]
Setting chains to policy ACCEPT: filter [ OK ]
Unloading iptables modules: [ OK ]

3. Then turn UDP ICMP rejections off for the next server reboot (they should always be off):
# chkconfig iptables off
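Before stopping the firewall, it can be useful to confirm that REJECT rules are actually loaded. This is a rough sketch that counts them in the output of iptables -L -n, assuming the standard listing format in which each rule line starts with its target; the function name is hypothetical.

```shell
# Read "iptables -L -n" output on stdin and print the number of REJECT rules.
# grep -c exits non-zero when the count is 0, so "|| true" keeps the function's
# own exit status at 0 in that case.
count_reject_rules() {
    grep -c '^REJECT' || true
}

# Usage (as root):
#   iptables -L -n | count_reject_rules
```

A result of 0 suggests the firewall is not what is rejecting the cluster traffic, and the next suggestion is worth trying.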

If the problem is not the one above, I suggest first wiping the OCR and voting disk with dd, then setting the permissions again, and then running root.sh.
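The dd step could look like the sketch below. The original does not say which devices to wipe or how much to clear, so the device names and the 100 MB size in the usage comment are assumptions borrowed from the raw-device example in Symptom 6, and the function name is hypothetical.

```shell
# Overwrite the first MB mebibytes of TARGET with zeros.
# DESTRUCTIVE: everything stored in that range is lost.
wipe_head() {
    target=$1
    mb=$2
    dd if=/dev/zero of="$target" bs=1048576 count="$mb" 2>/dev/null
}

# Usage (as root, with CRS stopped on all nodes; names and size are assumptions):
#   wipe_head /dev/raw/raw1 100    # OCR
#   wipe_head /dev/raw/raw2 100    # voting disk
```

After the wipe, reapply the ownership and mode on the devices before running root.sh again.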

For comparison, here is the output of root.sh on my first node:
[root@node1 crs10.2.0]# ./root.sh
WARNING: directory '/ora10g/product' is not owned by root
WARNING: directory '/ora10g' is not owned by root
Checking to see if Oracle CRS stack is already configured
/etc/oracle does not exist. Creating it now.

Setting the permissions on OCR backup directory
Setting up NS directories
Oracle Cluster Registry configuration upgraded successfully
WARNING: directory '/ora10g/product' is not owned by root
WARNING: directory '/ora10g' is not owned by root
assigning default hostname node1 for node 1.
assigning default hostname node2 for node 2.
Successfully accumulated necessary OCR keys.
Using ports: CSS=49895 CRS=49896 EVMC=49898 and EVMR=49897.
node <nodenumber>: <nodename> <private interconnect name> <hostname>
node 1: node1 privnode1 node1
node 2: node2 privnode2 node2
Creating OCR keys for user 'root', privgrp 'root'..
Operation successful.
Now formatting voting device: /ocfs/votedisk/votedisk.dat
Format of 1 voting devices complete.
Startup will be queued to init within 90 seconds.
Adding daemons to inittab
Expecting the CRS daemons to be up within 600 seconds.
CSS is active on these nodes.
node1
CSS is inactive on these nodes.
node2
Local node checking complete.
Run root.sh on remaining nodes to start CRS daemons.
[root@node1 crs10.2.0]#

Symptom 8:
 CRS-0215: Could not start resource 'ora.orcl.orcl1.inst'.
$ srvctl start instance -d orcl -i orcl1
PRKP-1001 : Error starting instance orcl1 on node znawdb1
CRS-0215: Could not start resource 'ora.orcl.orcl1.inst'.

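This happens because the ocfs2 file system that holds the database data files, or the ASM disk group, is not mounted. For example, my environment is raw + ASM. ASM was not mounted, so I started it with srvctl start asm -n rac1, but that only started the ASM instance and did not mount the disk group. As a result, running srvctl start instance -d orcl -i orcl1 again still produced CRS-0215: Could not start resource 'ora.orcl.orcl1.inst'.

After executing ALTER DISKGROUP dgroup1 MOUNT; and then running srvctl start instance -d orcl -i orcl1 once more, the instance started successfully.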
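The mount statement can also be issued from the shell. This is a minimal sketch, assuming the ASM instance on the node is named +ASM1 and that the oracle user can connect with "/ as sysdba"; neither detail is stated in the original, and the function name is hypothetical.

```shell
# Print the SQL statement that mounts one ASM disk group.
# Names that are not plain identifiers are rejected, so the output can be
# piped into sqlplus without surprises.
mount_dg_sql() {
    case "$1" in
        ''|*[!A-Za-z0-9_]*) echo "invalid disk group name: $1" >&2; return 1 ;;
    esac
    printf 'ALTER DISKGROUP %s MOUNT;\n' "$1"
}

# Usage (as the oracle user; +ASM1 is an assumed ASM instance name):
#   mount_dg_sql dgroup1 | ORACLE_SID=+ASM1 sqlplus -S "/ as sysdba"
#   srvctl start instance -d orcl -i orcl1
```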

 

Symptom 9
CRS-0223: Resource 'ora.rac1.LISTENER_RAC1.lsnr' has placement error.
The error usually looks like this:

[oracle@rac1 admin]$ srvctl start nodeapps -n rac1

CRS-1028: Dependency analysis failed because of:

CRS-0223: Resource 'ora.rac1.gsd' has placement error.

CRS-1028: Dependency analysis failed because of:

CRS-0223: Resource 'ora.rac1.vip' has placement error.

CRS-1028: Dependency analysis failed because of:

CRS-0223: Resource 'ora.rac1.ons' has placement error.

CRS-1028: Dependency analysis failed because of:

CRS-0223: Resource 'ora.rac1.LISTENER_RAC1.lsnr' has placement error.

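Cause:

This problem is mainly caused by resource contention: the resources of two instances have ended up on the same node, so the other node cannot obtain the resources it needs.

Solution:

When this happens, it is best to start the relevant CRS services manually from the command line and see exactly which errors are reported.
When starting the services, be sure to stop the services on all nodes first, then start one node and watch its status with crs_stat. Once all services on that node are normal, start the other node. Finally, use crs_stat to check the status of all nodes.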
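The staged restart described above can be written out as a command plan. The sketch below only prints the commands so they can be reviewed and run one at a time. It covers nodeapps only (the gsd, vip, ons and listener resources named in the error), which is an assumption about what "services" means here, and the function name is hypothetical.

```shell
# Print (do not run) the commands for the staged restart:
# stop node applications on every node, then bring the nodes up one at a
# time, checking crs_stat after each one.
restart_plan() {
    for n in "$@"; do
        echo "srvctl stop nodeapps -n $n"
    done
    for n in "$@"; do
        echo "srvctl start nodeapps -n $n"
        echo "crs_stat -t"
    done
}

# Usage:
#   restart_plan rac1 rac2
```

Run the printed lines by hand, and move on to the next node only after crs_stat -t shows the current node's resources ONLINE.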

posted on 2009-01-14 16:12  一江水