opatchauto fails to apply a patch on a 19c RAC running Kylin V10 SP3
1. On a freshly installed 19c RAC on Kylin V10 SP3, applying a patch with opatchauto failed with the following output.
[root@db01 soft]# /u01/app/19.3.0/grid/OPatch/opatchauto apply /soft/35037840/
OPatchauto session is initiated at Tue Oct 10 11:04:45 2023
System initialization log file is /u01/app/19.3.0/grid/cfgtoollogs/opatchautodb/systemconfig2023-10-10 11-04-45AM.log
OPATCHAUTO-72050: System instance creation failed.
OPATCHAUTO-72050: Failed while retrieving system information.
OPATCHAUTO-72050: Please check log file for more details.
OPatchauto session completed at Tue Oct 10 11:05:20 2023
Time taken to complete the session 0 minute, 35 seconds
Topology creation failed.
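2. According to the command-line message, opatchauto failed while retrieving system information, so the corresponding log file has to be checked. The error entries extracted from the log generated by opatchauto are shown below.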
2023-10-10 11:04:56,298 SEVERE [1] com.oracle.glcm.patch.auto.db.integration.model.productsupport.topology.TopologyCreator - Not able to retrieve system instance details :: Unable to determine if "/u01/app/19.3.0/grid" is a shared oracle home. Failed: Verification of shared storage accessibility was unsuccessful on all the specified nodes. NODE_STATUS::db02:EFAIL The result of cluvfy command contain EFAIL NODE_STATUS::db02:EFAIL
……
2023-10-10 11:04:56,298 SEVERE [1] com.oracle.glcm.patch.auto.db.integration.model.productsupport.topology.TopologyCreator - Failure reason::java.lang.Exception: The result of cluvfy command contain EFAIL NODE_STATUS::db02:EFAIL
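The log shows why opatchauto failed: opatchauto automatically runs cluvfy to check the state of the whole cluster, and the shared storage accessibility check failed. cluvfy could not determine whether GRID_HOME is a shared home, and that ultimately caused opatchauto to abort.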
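Pulling the relevant entries out of the opatchauto log can be sketched as below. This is a minimal illustration, not part of the original procedure: the function name `severe_lines` is hypothetical, and it assumes the log keeps the ` SEVERE ` level marker seen in the excerpt above.

```shell
# Print only the SEVERE entries of an opatchauto log file.
# Assumption: each entry carries the level as " SEVERE " after the
# timestamp, as in the excerpt above. The function name is hypothetical.
severe_lines() {
    grep -F ' SEVERE ' -- "$1"
}
```

For example, `severe_lines "/u01/app/19.3.0/grid/cfgtoollogs/opatchautodb/systemconfig2023-10-10 11-04-45AM.log"` would list the two entries quoted above. The path needs quoting because the file name contains a space.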
3. Running cluvfy manually to check the cluster's shared storage gave the following result.
[grid@db01 ~]$ cluvfy comp ssa -n all -verbose
Verification of shared storage accessibility was unsuccessful on all the specified nodes.
CVU operation performed: shared storage accessibility
Date: Oct 10, 2023 11:43:32 AM
CVU home: /u01/app/19.3.0/grid/
User: grid
[grid@db01 ~]$
4. At this point the only option is to enable DEBUG mode for cluvfy to obtain more detailed logs. For the current 19c RAC, the procedure is as follows.
[grid@db01 ~]$ rm -rf /tmp/cvutrace
[grid@db01 ~]$ mkdir /tmp/cvutrace
[grid@db01 ~]$ export CV_TRACELOC=/tmp/cvutrace
[grid@db01 ~]$ export SRVM_TRACE=true
[grid@db01 ~]$ export SRVM_TRACE_LEVEL=1
[grid@db01 ~]$ cluvfy comp ssa -n all -verbose
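5. Open the generated cvutrace.log.0 file and search for the keyword "failed". The trace contains a large number of entries for failed scp remote copies, as shown below.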
[Worker 3] [ 2023-10-10 11:47:24.701 CST ] [NativeSystem.isCmdScv:599] isCmdScv: cmd=[/usr/bin/scp -p /tmp/CVU_19.0.0.0.0_grid/check_vip_restart_attempt.sh db02:'/tmp/CVU_19.0.0.0.0_grid//check_vip_restart_attempt.sh']
[Worker 3] [ 2023-10-10 11:47:24.701 CST ] [NativeSystem.isCmdScv:649] isCmdScv: /usr/bin/scp is present.
[Worker 3] [ 2023-10-10 11:47:24.701 CST ] [NativeSystem.isCmdScv:651] isCmdScv: /usr/bin/scp is a file.
[Worker 3] [ 2023-10-10 11:47:24.701 CST ] [NativeSystem.isCmdScv:668] isCmdScv: returned true
[Worker 3] [ 2023-10-10 11:47:24.701 CST ] [NativeSystem.rununixcmd:1345] NativeSystem.rununixcmd: RetString 1|Authorized users only. All activities may be monitored and reported. :successful
[Worker 3] [ 2023-10-10 11:47:24.701 CST ] [CopyCommand.execute:171] CopyCommand.execute: native copyFile returns `1|Authorized users only. All activities may be monitored and reported. :successful'
[Worker 3] [ 2023-10-10 11:47:24.701 CST ] [NativeResult.<init>:93] NativeResult: The String obtained is1|Authorized users only. All activities may be monitored and reported. :successful
[Worker 3] [ 2023-10-10 11:47:24.701 CST ] [NativeResult.<init>:101] The status string is: 1
[Worker 3] [ 2023-10-10 11:47:24.701 CST ] [NativeResult.<init>:114] The result string is: Authorized users only. All activities may be monitored and reported. :successful 1
[Worker 3] [ 2023-10-10 11:47:24.701 CST ] [CopyCommand.execute:179] The copy command failed. Details: Authorized users only. All activities may be monitored and reported. :successful
The trace shows that after the scp command ran, the returned status code was 1 and the returned result string was "Authorized users only. All activities may be monitored and reported.".
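Since the trace holds many such entries, it can help to pair each failure with the scp command that produced it. The following is a rough sketch under stated assumptions: the trace has one entry per line, each entry starts with a bracketed worker tag such as `[Worker 3]`, and the function name `failed_copies` is made up for illustration.

```shell
# For every "The copy command failed" entry in a CVU trace, print the
# most recent scp command logged by the same worker, then the failure.
# Assumptions: one trace entry per line, each starting with a tag like
# "[Worker 3]". The function name is hypothetical.
failed_copies() {
    awk '
        { worker = substr($0, 1, index($0, "]")) }      # leading "[Worker N]" tag
        /isCmdScv: cmd=/ { last_cmd[worker] = $0 }      # remember the command
        /The copy command failed/ {
            if (worker in last_cmd) print last_cmd[worker]
            print
        }
    ' "$1"
}
```

A possible invocation is `failed_copies /tmp/cvutrace/cvutrace.log.0`, using the trace location set through CV_TRACELOC earlier.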
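Running the same scp command by hand copies the file successfully, yet it also prints this message. This raised the suspicion that the extra line was what made the result check go wrong.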
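One way to test that suspicion is to check whether a non-interactive SSH session to the remote node prints anything at all. The sketch below is an assumption-laden illustration rather than the author's method: it assumes the extra text arrives on stdout or stderr of `ssh <host> true`, and `SSH_CMD`, `remote_noise`, and `check_quiet_ssh` are hypothetical names (`SSH_CMD` exists only so the check can be dry-run without a real host).

```shell
# Capture everything a no-op remote command prints (stdout and stderr).
# SSH_CMD is a hypothetical override hook; by default plain ssh is used.
remote_noise() {
    ${SSH_CMD:-ssh} -o BatchMode=yes "$1" true 2>&1
}

# Return 0 when the session is silent, 1 (and show the text) otherwise.
# Note that connection errors also count as output here.
check_quiet_ssh() {
    _noise=$(remote_noise "$1")
    if [ -n "$_noise" ]; then
        printf 'unexpected output from %s:\n%s\n' "$1" "$_noise"
        return 1
    fi
    return 0
}
```

On the cluster in this post, `check_quiet_ssh db02` run as the grid user would be expected to report the "Authorized users only" text while the hardening is in place.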
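6. The message "Authorized users only. All activities may be monitored and reported." printed after the command completes actually comes from security hardening applied by the Kylin operating system. After that hardening policy was removed, however, running cluvfy manually to check the cluster's shared storage still reported an error.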
7. With cluvfy DEBUG mode enabled again, the newly generated trace file contained the following error entries.
[VerificationLogData.traceLogData:259] ERROR: [sVerificationUtil.getUniqueDistributionID:548] /bin/sh
[VerificationLogData.traceLogData:259] ERROR: [sVerificationUtil.getUniqueDistributionID:552] no package provides oraclelinux-release
[VerificationLogData.traceLogData:259] ERROR: [sVerificationUtil.getUniqueDistributionID:548] /bin/sh
[VerificationLogData.traceLogData:259] ERROR: [sVerificationUtil.getUniqueDistributionID:552] no package provides enterprise-release
[VerificationLogData.traceLogData:259] ERROR: [sVerificationUtil.getUniqueDistributionID:548] /bin/sh
[VerificationLogData.traceLogData:259] ERROR: [sVerificationUtil.getUniqueDistributionID:552] no package provides redhat-release
[VerificationLogData.traceLogData:259] ERROR: [sVerificationUtil.getUniqueDistributionID:548] /bin/sh
[VerificationLogData.traceLogData:259] ERROR: [sVerificationUtil.getUniqueDistributionID:552] package SLES-for-VMware-release is not installed
[VerificationLogData.traceLogData:259] ERROR: [sVerificationUtil.getUniqueDistributionID:548] /bin/sh
[VerificationLogData.traceLogData:259] ERROR: [sVerificationUtil.getUniqueDistributionID:552] package sles-release is not installed
[VerificationLogData.traceLogData:259] ERROR: [sVerificationUtil.getUniqueDistributionID:548] /bin/sh
[VerificationLogData.traceLogData:259] ERROR: [sVerificationUtil.getUniqueDistributionID:552] package asianux-release is not installed
[VerificationLogData.traceLogData:259] ERROR: [Result.addErrorDescription:760] PRVG-0282 : failed to retrieve the operating system distribution ID
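The errors show that cluvfy cannot determine the operating system's distribution ID: it probes for the release packages of the distributions it recognizes, and none of them exists on Kylin. Because this is a Kylin system, the only option is to set the CV_ASSUME_DISTID environment variable and assign the current system a value.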
export CV_ASSUME_DISTID=OL7
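If the same profile or wrapper script is shared across hosts, the variable can be set conditionally. The sketch below is only an illustration: it assumes Kylin identifies itself with `ID="kylin"` in /etc/os-release, which should be verified on the target system, and the function names `os_id` and `assume_distid_on_kylin` are hypothetical. The value OL7 is the one used above.

```shell
# Read the ID= field from an os-release style file
# (defaults to /etc/os-release).
os_id() {
    sed -n 's/^ID=//p' "${1:-/etc/os-release}" | tr -d '"'
}

# Export CV_ASSUME_DISTID=OL7 only on Kylin and only when the variable
# is not already set. Assumption: Kylin reports ID="kylin".
assume_distid_on_kylin() {
    if [ "$(os_id "${1:-/etc/os-release}")" = "kylin" ] &&
       [ -z "${CV_ASSUME_DISTID:-}" ]; then
        export CV_ASSUME_DISTID=OL7
    fi
}
```

Calling `assume_distid_on_kylin` in the shell that later runs cluvfy or opatchauto keeps an existing CV_ASSUME_DISTID value untouched and does nothing on other distributions.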
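8. Running cluvfy manually once more to check the cluster's shared storage finally succeeded.
9. Re-running opatchauto to apply the patch then produced a new error. The command-line output is shown below.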
[root@db01 soft]# export CV_ASSUME_DISTID=OL7
[root@db01 soft]# /u01/app/19.3.0/grid/OPatch/opatchauto apply /soft/35037840/
OPatchauto session is initiated at Tue Oct 10 16:36:20 2023
System initialization log file is /u01/app/19.3.0/grid/cfgtoollogs/opatchautodb/systemconfig2023-10-10 04-36-25PM.log
OPATCHAUTO-72035: Failed to create System Instance XML file.
OPATCHAUTO-72035: File creation failed due to permission.
OPATCHAUTO-72035: Check user has permission to create the file.
OPatchauto session completed at Tue Oct 10 16:37:47 2023
Time taken to complete the session 1 minute, 22 seconds
Topology creation failed.
[root@db01 soft]#
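This time opatchauto reports that a file could not be created because of insufficient permissions.
10. The systemconfig2023-10-10 04-36-25PM.log file does not state a clear cause. It only contains the following error at the very end.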
2023-10-10 12:44:54,498 SEVERE [1] com.oracle.glcm.patch.auto.db.integration.model.productsupport.topology.TopologyCreator - Not able to write system instance details
11. At this point the only option is to trace the execution of opatchauto with strace.
# strace -f -T -tt -o /tmp/opatchauto.out /u01/app/19.3.0/grid/OPatch/opatchauto apply /soft/35037840/
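In the end, the cause turned out to be wrong permissions on the /tmp directory. After the permissions were changed to 777, opatchauto ran successfully.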
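The post does not show how the strace output was searched, so the following is only one plausible way to narrow it down: filter the trace for system calls that returned a permission error, then inspect the mode of the directory involved. The function names `denied_calls` and `dir_mode` are hypothetical, and `stat -c` assumes GNU coreutils.

```shell
# List system calls in an strace output file that failed with a
# permission-related errno (EACCES or EPERM).
denied_calls() {
    grep -E 'EACCES|EPERM' -- "$1"
}

# Print the octal permission bits of a directory (GNU stat).
dir_mode() {
    stat -c '%a' -- "$1"
}
```

With the trace file from the command above, that would be `denied_calls /tmp/opatchauto.out` followed by `dir_mode /tmp`. On most Linux systems /tmp is mode 1777 (world-writable with the sticky bit), so `chmod 1777 /tmp` is the more conventional setting than the plain 777 used here.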