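Bug in patchmgr, the Exadata Compute Node Upgrade Tool
An Exadata image upgrade covers three main components: the storage nodes, the InfiniBand or RoCE converged Ethernet switches, and the compute nodes.
All three components are upgraded mainly with the patchmgr tool, but the compute node patchmgr is not interchangeable with the patchmgr used for the other two. The storage node and switch versions of patchmgr ship inside their respective upgrade patches, whereas the compute node patchmgr is a separate patch (Patch ID: 21634633), completely independent of the compute node upgrade patch.
During an image upgrade of a customer's Exadata, I found that the compute node patchmgr released at the time had a bug. This article records the whole incident and how it was resolved.
1. Before upgrading the compute nodes, I downloaded the latest patchmgr release then available on MOS (p21634633_231100_Linux-x86-64.zip). The plan was to upgrade the compute node image from 11.2.3.3.1 to 18.1.34. The main commands are shown below.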
[root@dm01dbadm01 18.1.34]# cd db
[root@dm01dbadm01 db]# ls -ltr
total 1736632
-rwxrwxr-x 1 grid oinstall 1214059228 Mar 10 21:44 p32743378_181000_Linux-x86-64.zip
-rwxrwxr-x 1 grid oinstall 562496506 May 11 07:14 p21634633_231100_Linux-x86-64.zip
[root@dm01dbadm01 db]# unzip p21634633_231100_Linux-x86-64.zip
Archive: p21634633_231100_Linux-x86-64.zip
creating: dbserver_patch_230418/
inflating: dbserver_patch_230418/patchmgr
inflating: dbserver_patch_230418/dbserver_backup_mandatory.list
inflating: dbserver_patch_230418/ExadataSendNotification.pm
creating: dbserver_patch_230418/linux.db.rpms/
inflating: dbserver_patch_230418/README.txt
inflating: dbserver_patch_230418/install.sh.sample
inflating: dbserver_patch_230418/exa_json_utils.py
inflating: dbserver_patch_230418/error_code_json_functions
inflating: dbserver_patch_230418/ExaXMLNode.pm
inflating: dbserver_patch_230418/ExadataImageNotification.pl
inflating: dbserver_patch_230418/imageLogger
inflating: dbserver_patch_230418/timing_record.sh
inflating: dbserver_patch_230418/get_hardware_property.sh
inflating: dbserver_patch_230418/patchmgr_timing_report
inflating: dbserver_patch_230418/dbserver_backup_excludes.sample
inflating: dbserver_patch_230418/master_error_code.json
inflating: dbserver_patch_230418/md5sum_files.lst
inflating: dbserver_patch_230418/dcli
extracting: dbserver_patch_230418/dbnodeupdate.zip
inflating: dbserver_patch_230418/exadata.img.hw
inflating: dbserver_patch_230418/patchReport.py
inflating: dbserver_patch_230418/exadata.img.env
inflating: dbserver_patch_230418/patchmgr_functions
inflating: dbserver_patch_230418/cellboot_usb_pci_path
[root@dm01dbadm01 db]# ls -ltr
total 1736636
-rwxrwxr-x 1 grid oinstall 1214059228 Mar 10 21:44 p32743378_181000_Linux-x86-64.zip
drwxrwxr-x 3 root root 4096 Apr 19 03:11 dbserver_patch_230418
-rwxrwxr-x 1 grid oinstall 562496506 May 11 07:14 p21634633_231100_Linux-x86-64.zip
[root@dm01dbadm01 db]# cd dbserver_patch_230418/
[root@dm01dbadm01 dbserver_patch_230418]# ls -ltr
total 550904
-rwx------ 1 root root 1393 Apr 19 02:42 cellboot_usb_pci_path
-r--r--r-- 1 root root 48746 Apr 19 02:56 imageLogger
-r-xr-x--- 1 root root 5824 Apr 19 02:56 get_hardware_property.sh
-r-xr-x--- 1 root root 65182 Apr 19 02:56 exadata.img.hw
-r--r--r-- 1 root root 11482 Apr 19 02:56 exadata.img.env
-r--r----- 1 root root 6133 Apr 19 02:56 ExaXMLNode.pm
-r-xr-xr-x 1 root root 6776 Apr 19 02:56 exa_json_utils.py
-r-xr-xr-x 1 root root 1570 Apr 19 02:56 ExadataSendNotification.pm
-r-xr-xr-x 1 root root 63754 Apr 19 02:56 ExadataImageNotification.pl
-r-xr-xr-x 1 root root 10180 Apr 19 02:56 error_code_json_functions
-r-xr-xr-x 1 root root 56396 Apr 19 02:56 dcli
-r-xr-xr-x 1 root root 13593 Apr 19 02:56 timing_record.sh
-r-xr-xr-x 1 root root 111545 Apr 19 02:56 patchReport.py
-r-xr-xr-x 1 root root 41549 Apr 19 02:56 patchmgr_timing_report
-r-xr-xr-x 1 root root 16884 Apr 19 02:56 patchmgr_functions
-r-xr-xr-x 1 root root 40614 Apr 19 02:56 master_error_code.json
drwxrwxr-x 2 root root 4096 Apr 19 03:05 linux.db.rpms
-r-xr-xr-x 1 root root 618296 Apr 19 03:05 patchmgr
-r-xr-xr-x 1 root root 123767 Apr 19 03:10 dbserver_backup_mandatory.list
-rw-rw-r-- 1 root root 562243165 Apr 19 03:10 dbnodeupdate.zip
-rw-rw-r-- 1 root root 578 Apr 19 03:11 README.txt
-rw-rw-r-- 1 root root 1008 Apr 19 03:11 md5sum_files.lst
-r-xr-xr-x 1 root root 677 Apr 19 03:11 install.sh.sample
-r-xr-xr-x 1 root root 270 Apr 19 03:11 dbserver_backup_excludes.sample
[root@dm01dbadm01 dbserver_patch_230418]# vi db_group
"db_group" [New] 1L, 12C written
[root@dm01dbadm01 dbserver_patch_230418]#
[root@dm01dbadm01 dbserver_patch_230418]# ./patchmgr --dbnodes db_group --repo /u01/soft/18.1.34/db/p32743378_181000_Linux-x86-64.zip --target_version 18.1.34.0.0.210717 --backup
./patchmgr: line 10291: declare: -A: invalid option
declare: usage: declare [-afFirtx] [-p] [name[=value] ...]
*****************************************************************************************
NOTE patchmgr release: 23.230418 (always check MOS 1553103.1 for the latest release of dbserver.patch.zip)
NOTE
WARNING Do not interrupt the patchmgr session.
WARNING Do not resize the screen. It may disturb the screen layout.
WARNING Do not reboot database nodes during update or rollback.
WARNING Do not open logfiles in write mode and do not try to alter them.
*****************************************************************************************
/u01/soft/18.1.34/db/dbserver_patch_230418/error_code_json_functions: line 25: declare: -A: invalid option
declare: usage: declare [-afFirtx] [-p] [name[=value] ...]
/u01/soft/18.1.34/db/dbserver_patch_230418/error_code_json_functions: line 26: declare: -A: invalid option
declare: usage: declare [-afFirtx] [-p] [name[=value] ...]
/u01/soft/18.1.34/db/dbserver_patch_230418/error_code_json_functions: line 54: EXAUPG-00008: value too great for base (error token is "00008")
2023-05-13 18:16:29 +0800 :SUCCESS: Completed run of command: ./patchmgr --dbnodes db_group --repo /u01/soft/18.1.34/db/p32743378_181000_Linux-x86-64.zip --target_version 18.1.34.0.0.210717 --backup
2023-05-13 18:16:29 +0800 :INFO : Backup performed on dbnode(s) in file db_group: [dm01dbadm02]
2023-05-13 18:16:29 +0800 :INFO : For details, check the following files in /u01/soft/18.1.34/db/dbserver_patch_230418:
2023-05-13 18:16:29 +0800 :INFO : - <dbnode_name>_dbnodeupdate.log
2023-05-13 18:16:29 +0800 :INFO : - patchmgr.log
2023-05-13 18:16:29 +0800 :INFO : - patchmgr.trc
2023-05-13 18:16:29 +0800 :INFO : Exit status:0
2023-05-13 18:16:29 +0800 :INFO : Exiting.
[root@dm01dbadm01 dbserver_patch_230418]#
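The db_group file passed to --dbnodes is a plain list of compute node hostnames, one per line. Judging by vi's "1L, 12C written" and the patchmgr summary, it held the single entry dm01dbadm02 (11 characters plus a newline). A minimal sketch for creating it without an interactive editor, assuming bash; write_node_list is a hypothetical helper name, not part of patchmgr:

```shell
#!/bin/bash
# Hypothetical helper (not part of patchmgr): write the node list used by --dbnodes.
set -euo pipefail

# Write each remaining argument as one line of the given file.
write_node_list() {
  local file=$1
  shift
  printf '%s\n' "$@" > "$file"
}

write_node_list db_group dm01dbadm02
wc -lc db_group   # 1 line, 12 bytes, matching vi's "1L, 12C"
```

Passing several hostnames writes one line per node.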
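The patchmgr output shows that the run did not work, even though it ended with a SUCCESS message and exit status 0: the declare commands in patchmgr and in its helper script error_code_json_functions use the -A option, and the shell rejects it as an invalid option. The declare usage message lists only [-afFirtx], which means the bash on this node (still on Oracle Linux 5 with image 11.2.3.3.1) is older than bash 4.0, the release that introduced associative arrays (declare -A). The later "value too great for base" error is a knock-on effect: once declare -A fails, the array behaves as an ordinary indexed array, so the key EXAUPG-00008 is evaluated as arithmetic and 00008 is parsed as an invalid octal number.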
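Both errors can be reproduced on any Linux host with bash 4.0 or later. The sketch below is illustrative only and is not patchmgr code: it checks whether the current shell accepts declare -A, then shows how a key such as EXAUPG-00008 breaks once the array falls back to an indexed array. The array name codes is hypothetical.

```shell
#!/bin/bash
# Illustration only; not taken from patchmgr or error_code_json_functions.

# 1) Does this shell support associative arrays? bash < 4.0 prints
#    "declare: -A: invalid option", as seen on the OL5 compute node.
if ( declare -A probe ) 2>/dev/null; then
  echo "bash ${BASH_VERSION}: declare -A supported"
else
  echo "bash ${BASH_VERSION}: declare -A NOT supported"
fi

# 2) An indexed array evaluates its subscript as arithmetic:
#    EXAUPG (unset, so 0) minus 00008, which is an invalid octal literal.
indexed_err=$(bash -c 'declare -a codes; codes[EXAUPG-00008]=x' 2>&1)
echo "$indexed_err"

# 3) With a working declare -A the same key is just a string.
assoc_val=$(bash -c 'declare -A codes; codes[EXAUPG-00008]=ok; echo "${codes[EXAUPG-00008]}"')
echo "$assoc_val"
```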
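2. Next, I tried upgrading with the dbnodeupdate.sh script bundled with this patchmgr release. dbnodeupdate.sh was originally distributed as a separate patch (Patch 16486998). Note that starting with release 12.2.1.1.0, patch 16486998 is no longer updated or offered as a separate download; the script has been merged into the dbserver.patch.zip bundle, that is, into the patchmgr tool. The dbnodeupdate.sh contained in dbserver.patch.zip can still be used to upgrade each compute node manually.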
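The manual per-node flow that dbnodeupdate.sh itself recommends (in the usage banner and the closing Note of its summary, both shown in step 4) is: a prerequisite check with -v, the update with -u -l <repo>, and, after the reboot, the post steps with -c. The wrapper below is a hypothetical sketch of that order, not an Oracle script. It uses only flags printed in this article's output and, by default (DRY_RUN=1), only prints the commands. The -M flag is deliberately left out.

```shell
#!/bin/bash
# Hypothetical wrapper; run from the unzipped dbnodeupdate directory on each compute node.
# Flags are the ones dbnodeupdate.sh prints itself: -u -l <repo> [-v], and -c for post steps.
# DRY_RUN=1 (default) prints the commands instead of running them.

DRY_RUN=${DRY_RUN:-1}

run() {
  if [ "$DRY_RUN" = 1 ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

dbnodeupdate_phase() {
  local phase=$1 repo=$2
  case $phase in
    prereq) run ./dbnodeupdate.sh -u -l "$repo" -v ;;  # prerequisite check
    update) run ./dbnodeupdate.sh -u -l "$repo" ;;     # the node reboots during this phase
    post)   run ./dbnodeupdate.sh -c ;;                # after the upgrade and reboot
    *) echo "unknown phase: $phase" >&2; return 2 ;;
  esac
}

# Example (dry run): the prerequisite check against the 18.1.34 repository used here.
dbnodeupdate_phase prereq /u01/soft/18.1.34/db/p32743378_181000_Linux-x86-64.zip
```

In practice each phase is run on its own, and post only after logging back in to the rebooted node.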
[root@dm01dbadm02 db]# ls -ltr
-rwxr-xr-x 1 grid oinstall 562496506 May 13 18:05 p21634633_231100_Linux-x86-64.zip
-rwxr-xr-x 1 grid oinstall 1214059228 May 13 18:05 p32743378_181000_Linux-x86-64.zip
[root@dm01dbadm02 db]# unzip p21634633_231100_Linux-x86-64.zip
[root@dm01dbadm02 db]# cd dbserver_patch_230418/
[root@dm01dbadm02 dbserver_patch_230418]# unzip dbnodeupdate.zip -d dbnodeupdate
[root@dm01dbadm02 dbserver_patch_230418]# cd dbnodeupdate
[root@dm01dbadm02 dbnodeupdate]# ./dbnodeupdate.sh -h
(*) 2023-05-13 20:38:33: Initializing logfile /var/log/cellos/dbnodeupdate.log
/u01/soft/18.1.34/db/dbserver_patch_230418/dbnodeupdate/image_functions: line 1464: /opt/oracle.cellos/exadata.img.hw: No such file or directory
/u01/soft/18.1.34/db/dbserver_patch_230418/dbnodeupdate/error_code_json_functions: line 25: declare: -A: invalid option
declare: usage: declare [-afFirtx] [-p] [name[=value] ...]
/u01/soft/18.1.34/db/dbserver_patch_230418/dbnodeupdate/error_code_json_functions: line 26: declare: -A: invalid option
declare: usage: declare [-afFirtx] [-p] [name[=value] ...]
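The dbnodeupdate.sh help output shows the same problem: the declare commands in the scripts it loads (error_code_json_functions) also use the -A option, which is again rejected as invalid. The latest patchmgr release therefore had to be abandoned.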
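Because the failure depends only on the bundle's scripts and the node's bash, it can be caught before any patching starts. A hypothetical pre-flight check, assuming GNU grep and bash (check_bundle and its helpers are illustrative names, not Oracle tooling): it lists bundle scripts that use declare -A and fails if the given bash is older than 4.0. Run it on the target compute node itself, after unzipping dbnodeupdate.zip so that the dbnodeupdate scripts are scanned as well.

```shell
#!/bin/bash
# Hypothetical pre-flight check (not Oracle tooling). Run it on the target compute node.

# Print the files under $1 that declare associative arrays (declare -A, -gA, -rA, ...).
bundle_needs_assoc_arrays() {
  grep -rlE 'declare[[:space:]]+-[a-zA-Z]*A' "$1" 2>/dev/null
}

# Print the major version of a bash binary (default: the bash found on PATH).
bash_major_of() {
  "${1:-bash}" -c 'echo "${BASH_VERSINFO[0]}"'
}

# Print OK unless the bundle needs declare -A and the bash is older than 4.0.
check_bundle() {
  local dir=$1 bash_bin=${2:-bash} files major
  [ -d "$dir" ] || { echo "no such directory: $dir" >&2; return 2; }
  files=$(bundle_needs_assoc_arrays "$dir")
  major=$(bash_major_of "$bash_bin")
  if [ -n "$files" ] && [ "$major" -lt 4 ]; then
    echo "INCOMPATIBLE: bash $major cannot run declare -A in:"
    echo "$files"
    return 1
  fi
  echo "OK"
}

# Usage on the compute node, after unzipping dbnodeupdate.zip inside the bundle:
#   check_bundle /u01/soft/18.1.34/db/dbserver_patch_230418
```

Given the errors above, this check would be expected to report INCOMPATIBLE for dbserver_patch_230418 on the OL5 node.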
3. I then tried upgrading the compute node with an older patchmgr release (p21634633_221000_Linux-x86-64.zip).
[root@dm01dbadm02 db]# ls -ltr
-rwxr-xr-x 1 grid oinstall 562496506 May 13 18:05 p21634633_231100_Linux-x86-64.zip
-rwxr-xr-x 1 grid oinstall 1214059228 May 13 18:05 p32743378_181000_Linux-x86-64.zip
-rwxr-xr-x 1 grid oinstall 447499625 May 13 19:32 p21634633_221000_Linux-x86-64.zip
[root@dm01dbadm02 db]#
[root@dm01dbadm02 db]# unzip p21634633_221000_Linux-x86-64.zip
[root@dm01dbadm02 db]# cd dbserver_patch_220728/
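The older patchmgr release also failed when upgrading the compute node, with different errors, so it could not be used either.
4. Finally, I tried upgrading with the dbnodeupdate.sh script bundled with the older patchmgr release. The commands are shown below.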
[root@dm01dbadm02 db]# rm -rf dbserver_patch_230418
[root@dm01dbadm02 db]# unzip p21634633_221000_Linux-x86-64.zip
[root@dm01dbadm02 db]# cd dbserver_patch_220728/
[root@dm01dbadm02 dbserver_patch_220728]# unzip dbnodeupdate.zip -d dbnodeupdate
[root@dm01dbadm02 dbserver_patch_220728]# cd dbnodeupdate
[root@dm01dbadm02 dbnodeupdate]# ./dbnodeupdate.sh -u -l /u01/soft/18.1.34/db/p32743378_181000_Linux-x86-64.zip
(*) 2023-05-13 20:56:41: Initializing logfile /var/log/cellos/dbnodeupdate.log
/u01/soft/18.1.34/db/dbserver_patch_220728/dbnodeupdate/image_functions: line 1431: /opt/oracle.cellos/exadata.img.hw: No such file or directory
#########################################################################################
# #
# Guidelines for using dbnodeupdate.sh (rel. 21.220728): #
# #
# - Prerequisites for usage: #
# 1. Refer to dbnodeupdate.sh options. See MOS 1553103.1 #
# 2. Always use the latest release of dbnodeupdate.sh. See patch 21634633 #
# 3. Run the prereq check using the '-v' flag. #
# 4. Carefully run prereq check with the '-M' flag to allow conflicting rpms being removed or preupdated. #
# #
# I.e.: ./dbnodeupdate.sh -u -l /u01/my-iso-repo.zip -v (may see rpm conflicts) #
# ./dbnodeupdate.sh -u -l http://my-yum-repo -v -M (resolves known rpm comflicts) #
# #
# - Prerequisite rpm dependency check failures can happen due to customization: #
# - The prereq check detects dependency issues that need to be addressed prior to running a successful update. #
# - Customized rpm packages may fail the built-in dependency check and system updates cannot proceed until resolved. #
# - Prereq check may fail because -M flag was not used and known conflicting rpms were not removed. #
# #
# - As part of the update, a number of rpms will be removed. #
# #
# - In case of any problem when filing an SR, upload the following: #
# - /var/log/cellos/dbnodeupdate.log #
# - /var/log/cellos/dbnodeupdate.<runid>.diag #
# - where <runid> is the unique number of the failing run. #
# #
# *** This is an update run, changes will be made. *** #
# #
#########################################################################################
WARNING: Your dbnodeupdate.sh is more than 60 days old. Check MOS 1553103.1 and make sure you have the most recent version.
Continue ? [y/n] y
(*) 2023-05-13 20:56:44: Unzipping helpers (/u01/soft/18.1.34/db/dbserver_patch_220728/dbnodeupdate/dbupdate-helpers.zip) to /opt/oracle.SupportTools/dbnodeupdate_helpers
(*) 2023-05-13 20:56:46: Collecting system configuration settings. This may take up to 5 seconds.
(*) 2023-05-13 20:57:11: Validating system settings for known issues and best practices. This may take a while...
(*) 2023-05-13 20:57:11: Checking free space in /u01/soft/18.1.34/db/iso.stage
(*) 2023-05-13 20:57:11: Unzipping /u01/soft/18.1.34/db/p32743378_181000_Linux-x86-64.zip to /u01/soft/18.1.34/db/iso.stage, this may take a while
(*) 2023-05-13 20:57:20: Generating Exadata repository file /etc/yum.repos.d/Exadata-computenode.repo
(*) 2023-05-13 20:57:21: Service status and file attribute report created in the background and will become available in: /etc/exadata/reports
(*) 2023-05-13 20:58:04: Searching for custom rpms
(*) 2023-05-13 20:58:04: Verifying the current boot loader is valid for an OS upgrade.
(*) 2023-05-13 20:58:05: Checking for custom rpms for Oracle Linux 5 to Oracle Linux 6 update.
(*) 2023-05-13 20:58:05: Validating the specified source location.
(*) 2023-05-13 20:58:06: Cleaning up the yum cache.
(*) 2023-05-13 20:58:10: Checking installed packages for missing files.
Active Image version : 11.2.3.3.1.140529.1
Active Kernel version : 2.6.39-400.128.17.el5uek
Active LVM Name : /dev/mapper/VGExaDb-LVDbSys1
Inactive Image version : n/a
Inactive LVM Name : /dev/mapper/VGExaDb-LVDbSys2
Current user id : root
Action : upgrade (validate image status, fix known issues,
Upgrading to : 18.1.34.0.0.210717 - Oracle Linux 5->6 upgrade
Baseurl : file:///var/www/html/yum/unknown/EXADATA/dbserver/130523205641/x86_64/ (iso)
Iso file : /u01/soft/18.1.34/db/iso.stage/exadata_ol6_base_repo_18.1.34.0.0.210717.iso
Create a backup : Yes (backup at update mandatory for major OS upgrades)
Shutdown EM agents : Yes
Shutdown stack : No (Currently stack is down)
Missing package files : Succeeded. Installed packages verified for having no missing files.
RPM exclusion list : Function not available for major OS upgrades
RPM obsolete list : Function not available for major OS upgrades
Exact dependencies : Function not available for major OS upgrades
Minimum dependencies : Function not available for major OS upgrades
Custom rpms to remove : None
Logfile : /var/log/cellos/dbnodeupdate.log (runid: 130523205641)
Diagfile : /var/log/cellos/dbnodeupdate.130523205641.diag
Server model : SUN SERVER X4-2
dbnodeupdate.sh rel. : 21.220728 (always check MOS 1553103.1 for the latest release of dbnodeupdate.sh)
Note : After upgrading and rebooting run './dbnodeupdate.sh -c' to finish post steps.
WARNING: This Exadata update includes an Oracle Linux 5 to Oracle Linux 6 update. This is a major update which requires proper validation before attemting this on a critical system.
WARNING: Although the system was analyzed for custom rpms, any other custom installed software (such as tar-balls) cannot be detected. When such software is installed remove it before performing a major OS update.
WARNING: After updating to another major OS it may be necessary to update any script used for mounting DBFS so that it can correctly determine whether or not DBFS has been successfully mounted since the expected output from the 'stat' command, which is used to determine the status of the DBFS mount point, has changed. If an early version of mount-dbfs.sh from Note 1054431.1 has been used or extended, download and use/extend the new version that checks for 'fuseblk' as the output of the 'stat' command. Failure to update the script will lead to incorrect status being returned for DBFS, with the script reporting that DBFS has not been mounted successfully when it has. Any CRS resource that makes use of such a script will also show the incorrect status!
The following known issues will be checked for but require manual follow-up:
(*) - Yum rolling update requires fix for 11768055 when Grid Infrastructure is below 11.2.0.2 BP12
(*) - DBFS mount scripts may fail after a major os upgrade
Continue ? [y/n] y
(*) 2023-05-13 20:58:23: Verifying GI and DB's are shutdown
(*) 2023-05-13 20:58:26: Unmount of /boot successful
(*) 2023-05-13 20:58:26: Check for /dev/sda1 successful
(*) 2023-05-13 20:58:26: Mount of /boot successful
(*) 2023-05-13 20:58:29: Searching for custom rpms
(*) 2023-05-13 20:58:29: Verifying the current boot loader is valid for an OS upgrade.
(*) 2023-05-13 20:58:29: Checking for custom rpms for Oracle Linux 5 to Oracle Linux 6 update.
(*) 2023-05-13 20:58:30: Disabling stack from starting
(*) 2023-05-13 20:58:30: Performing filesystem backup to /dev/mapper/VGExaDb-LVDbSys2. Average 30 minutes (maximum of 120) depends per environment...................................................
(*) 2023-05-13 21:03:01: Backup successful
(*) 2023-05-13 21:03:11: ExaWatcher stopped successful
(*) 2023-05-13 21:03:18: EM agent in /home/oracle/agent1/core/12.1.0.2.0 stopped
(*) 2023-05-13 21:03:18: Auto-start of EM agents disabled
WARNING: Unable to remove a jdk from the system. Investigate and manually remove:
Continue ? [y/n] y
(*) 2023-05-13 21:06:31: Validating the specified source location.
(*) 2023-05-13 21:06:32: Cleaning up the yum cache.
(*) 2023-05-13 21:06:34: Executing OL5->OL6 upgrade steps, system is expected to reboot multiple times.
(*) 2023-05-13 21:08:12: Initialize of Oracle Linux 6 Upgrade successful. Rebooting now...
Read from remote host dm01dbadm02: No route to host
Connection to dm01dbadm02 closed.
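As shown, when the compute node was upgraded with the dbnodeupdate.sh bundled in the older patchmgr release, the script printed the warning "WARNING: Your dbnodeupdate.sh is more than 60 days old. Check MOS 1553103.1 and make sure you have the most recent version.", but the upgrade itself proceeded normally.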
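The 60-day warning is easy to anticipate. In this article the bundle directory names carry the release date as a YYMMDD suffix (dbserver_patch_220728 contains dbnodeupdate.sh rel. 21.220728, and dbserver_patch_230418 reports patchmgr release 23.230418). Assuming that convention holds, the hypothetical helper below estimates a bundle's age in days with GNU date. It only approximates the check dbnodeupdate.sh performs internally; it does not reproduce it.

```shell
#!/bin/bash
# Hypothetical helper: approximate age in days of a dbserver_patch_YYMMDD bundle.
# Assumption: the YYMMDD suffix is the release date (20YY-MM-DD). Requires GNU date.

bundle_age_days() {
  local name=$1 ref=${2:-$(date -u +%F)} ymd rel
  ymd=${name##*_}                                   # e.g. 220728
  [[ $ymd =~ ^[0-9]{6}$ ]] || { echo "unexpected name: $name" >&2; return 2; }
  rel="20${ymd:0:2}-${ymd:2:2}-${ymd:4:2}"          # e.g. 2022-07-28
  # Both dates are taken as midnight UTC, so the difference is a whole number of days.
  echo $(( ( $(date -u -d "$ref" +%s) - $(date -u -d "$rel" +%s) ) / 86400 ))
}

# Age of the older bundle on the day of this upgrade (2023-05-13).
age=$(bundle_age_days dbserver_patch_220728 2023-05-13)
echo "age: $age days"   # age: 289 days
if [ "$age" -gt 60 ]; then
  echo "more than 60 days old: expect the dbnodeupdate.sh age warning"
fi
```

By the same measure, dbserver_patch_230418 was 25 days old on 2023-05-13.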