Linux Monitoring Tools Series: OSWatcher Black Box
2016-12-22 12:00 潇湘隐者
Introduction to OSWatcher Black Box
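OSWatcher Black Box (oswbb) is a small but practical and powerful system tool developed and provided by Oracle. It captures operating system performance metrics to help monitor system resource usage. Installing, deploying and uninstalling it are all very simple, it consumes few resources, and its principle is straightforward: it calls standard OS commands (such as vmstat and iostat) to collect and store CPU, memory, swap, disk I/O and network data. Installing and running oswbb, together with its analyzer oswbba, gives you a rich variety of performance data and graphical reports when diagnosing performance problems.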
In version 4.0, OSWatcher was renamed OSWatcher Black Box (oswbb) and gained a data analysis feature: OSWatcher Black Box Analyzer (oswbba), a graphing and analysis tool bundled with OSWatcher Black Box that replaces the earlier OSWg. In other words, before 4.0 the two tools were OSWatcher and OSWg; from 4.0 on they are oswbb and oswbba.
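OSWatcher Black Box (oswbb) supports several operating systems and comes in separate Linux and Windows versions, which differ somewhat; this article covers only the Linux version. oswbb consists of two parts:
1. oswbb: a collection of Unix shell scripts that collect and archive data to help locate problems.
2. oswbba: a Java tool that analyzes the data automatically, makes recommendations, and generates an HTML document containing graphs.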
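The official downloads and related documents for oswbb are listed below (documents on Oracle Metalink, now My Oracle Support, require an account):
How To Start OSWatcher Black Box (OSWBB) Every System Boot Using RPM oswbb-service (Doc ID 580513.1)
OSWatcher Analyzer User Guide (Doc ID 461053.1)
The official documentation describes oswbb and oswbba as follows: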
OSWatcher (oswbb) is a collection of UNIX shell scripts intended to collect and archive operating system and network metrics to aid support in diagnosing performance issues. OSWatcher operates as a set of background processes on the server and gathers OS data on a regular basis, invoking such Unix utilities as vmstat, netstat and iostat. OSWatcher can be downloaded from this note. OSWatcher is also included in the RAC-DDT script file, but is not installed by RAC-DDT. For more information on RAC-DDT see RAC-DDT User Guide. OSWatcher is installed on each node where data is to be collected. Installation instructions for OSWatcher are provided in this user guide.
The OSWatcher Analyzer (oswbba) is a graphing and analysis utility which comes bundled with OSWatcher v4.0 and higher. oswbba allows the user to graphically display data collected, generate reports containing these graphs and provides a built in analyzer to analyze the data and provide details on any performance problems it detects. The ability to graph and analyze this information relieves the user of manually inspecting all the files.
NOTE: oswbba replaces the utility OSWg. This was done to eliminate the confusion caused by having multiple tools in support named OSWatcher. oswbba is only supported for data collected by oswbb and no other tool.
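Installing OSWatcher
Installation on Linux could hardly be simpler: as shown below, extracting the tarball creates the oswbb directory. Preferably, extract it into (or move it to) a suitable directory.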
[oracle@DB-Server monitoring]$ tar -xvf oswbb734.tar
[oracle@DB-Server monitoring]$ ls -lrt
total 6196
drwxr-xr-x 6 oracle oinstall 4096 Jul 25 22:22 oswbb
-rw-r--r-- 1 oracle oinstall 6318080 Nov 8 02:33 oswbb734.tar
[oracle@DB-Server monitoring]$ cd oswbb
[oracle@DB-Server oswbb]$ ls -lrt
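If you install oswbb on several servers, the extraction step is easy to script. A minimal sketch, not part of oswbb: `osw_install` is a hypothetical helper name, and the target directory in the usage comment is simply the one used in this article.

```shell
# osw_install TARBALL DEST  (hypothetical helper, not shipped with oswbb)
# Extracts an oswbb tarball into DEST and prints the resulting oswbb directory.
osw_install() (
    tarball=$1
    dest=$2
    [ -f "$tarball" ] || { echo "tarball not found: $tarball" >&2; exit 1; }
    mkdir -p "$dest" || exit 1
    # The tarball holds a top-level oswbb/ directory, so this creates DEST/oswbb.
    tar -xf "$tarball" -C "$dest" || exit 1
    # Sanity check: the start script must be present after extraction.
    [ -f "$dest/oswbb/startOSWbb.sh" ] || { echo "startOSWbb.sh not found under $dest/oswbb" >&2; exit 1; }
    echo "$dest/oswbb"
)

# Usage: osw_install ./oswbb734.tar /home/oracle/monitoring
```

The function body runs in a subshell, so its variables and any failure exit stay local to the call.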
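Uninstalling OSWatcher
Uninstalling OSWatcher is just as simple; anyone with basic Linux skills can do it.
1: Stop OSWatcher before uninstalling it
./stopOSWbb.sh
2: Delete the OSWatcher directory, oswbb
rm -fr oswbb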
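Both steps can be wrapped in one guarded function so that an empty or mistyped path never reaches `rm`. A sketch under stated assumptions: `osw_uninstall` is a hypothetical name, and treating the presence of `startOSWbb.sh` as proof that a directory is an oswbb installation is only a heuristic.

```shell
# osw_uninstall DIR  (hypothetical helper)
# Stops OSWatcher via its own stop script, then removes the installation directory.
osw_uninstall() (
    dir=$1
    # Refuse empty or root paths outright.
    case $dir in
        ''|/) echo "refusing to remove '$dir'" >&2; exit 1 ;;
    esac
    # Heuristic: only treat DIR as an oswbb install if the start script is present.
    [ -f "$dir/startOSWbb.sh" ] || { echo "$dir does not look like oswbb" >&2; exit 1; }
    # Step 1: stop the collectors (stopOSWbb.sh kills the OSWatcher processes).
    if [ -f "$dir/stopOSWbb.sh" ]; then
        (cd "$dir" && sh ./stopOSWbb.sh)
    fi
    # Step 2: delete the directory, archive/ included.
    rm -rf -- "$dir"
)

# Usage: osw_uninstall /home/oracle/monitoring/oswbb
```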
Starting OSWatcher
Starting OSWatcher is also simple; just run the startOSWbb.sh script:
./startOSWbb.sh 10 2
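The arguments mean: take a snapshot every 10 seconds and keep only the most recent 2 hours of data in the archive files. It is worth checking the log of the first start for Warning messages. In the test below, for example, the check showed that the ifconfig command could not be found (the oracle account cannot run ifconfig):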
[oracle@DB-Server oswbb]$ ./startOSWbb.sh 10 2
[oracle@DB-Server oswbb]$ Setting the archive log directory to/home/oracle/monitoring/oswbb/archive
Testing for discovery of OS Utilities...
VMSTAT found on your system.
IOSTAT found on your system.
MPSTAT found on your system.
Warning... IFCONFIG not found on your system. No IFCONFIG data will be collected.
NETSTAT found on your system.
TOP found on your system.
Testing for discovery of OS CPU COUNT
oswbb is looking for the CPU COUNT on your system
CPU COUNT will be used by oswbba to automatically look for cpu problems
CPU COUNT found on your system.
CPU COUNT = 4
Discovery completed.
Starting OSWatcher v7.3.3 on Sun Dec 4 08:01:57 EST 2016
With SnapshotInterval = 10
With ArchiveInterval = 2
OSWatcher - Written by Carl Davis, Center of Expertise,
Oracle Corporation
For questions on install/usage please go to MOS (Note:301137.1)
If you need further assistance or have comments or enhancement
requests you can email me Carl.Davis@Oracle.com
Data is stored in directory: /home/oracle/monitoring/oswbb/archive
Starting Data Collection...
oswbb heartbeat:Sun Dec 4 08:02:02 EST 2016
oswbb heartbeat:Sun Dec 4 08:02:12 EST 2016
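The two arguments are easy to turn into numbers: with an interval of S seconds each utility is sampled 3600/S times per hour, and with a retention of H hours roughly 3600/S × H snapshots per utility stay in the archive (roughly, since the archive is kept as hourly files). The helper below only illustrates that arithmetic; its name is made up.

```shell
# osw_snapshot_math INTERVAL_SECONDS RETENTION_HOURS  (illustrative helper)
# Prints how many snapshots each utility takes per hour and roughly how many are kept.
osw_snapshot_math() (
    interval=$1
    hours=$2
    per_hour=$((3600 / interval))
    kept=$((per_hour * hours))
    echo "per_hour=$per_hour kept=$kept"
)

# osw_snapshot_math 10 2   -> per_hour=360 kept=720
# osw_snapshot_math 30 48  -> per_hour=120 kept=5760
```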
./startOSWbb.sh
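If no arguments are given, the defaults are 30 and 48: one snapshot every 30 seconds, keeping only the most recent 48 hours of data in the archive files. startOSWbb.sh actually accepts four arguments:
Argument 1: the interval, in seconds, between snapshots.
Argument 2: how many hours of collected data to keep in the archive directory.
Argument 3 (optional): a compression utility, such as gzip, that OSW uses to compress the archive files once they are complete.
Argument 4 (optional): the output directory for the archived data; the default is the value of the environment variable OSWBB_ARCHIVE_DEST.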
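The defaulting rules above can be written down as a dry-run helper that only prints the command it would run. A sketch that assumes omitted arguments fall back to 30 and 48 as described; `osw_start_cmd` is hypothetical and `gzip` is just one example of a compression utility.

```shell
# osw_start_cmd [INTERVAL] [HOURS] [COMPRESS] [ARCHIVE_DIR]  (hypothetical, dry run)
# Prints the startOSWbb.sh command line, filling in the defaults 30 and 48.
osw_start_cmd() (
    interval=${1:-30}   # argument 1: seconds between snapshots
    hours=${2:-48}      # argument 2: hours of archive to keep
    compress=$3         # argument 3 (optional): e.g. gzip
    dest=$4             # argument 4 (optional): archive directory
    # Arguments are positional, so an archive directory also needs argument 3.
    if [ -n "$dest" ] && [ -z "$compress" ]; then
        echo "argument 4 requires argument 3" >&2
        exit 1
    fi
    cmd="./startOSWbb.sh $interval $hours"
    [ -n "$compress" ] && cmd="$cmd $compress"
    [ -n "$dest" ] && cmd="$cmd $dest"
    echo "$cmd"
)

# osw_start_cmd             -> ./startOSWbb.sh 30 48
# osw_start_cmd 60 24 gzip  -> ./startOSWbb.sh 60 24 gzip
```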
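Started this way, OSWatcher keeps flooding the terminal with its output, so this form is rarely used. Usually it is started with nohup, which keeps OSW running in the background after the current session ends. It still stops when the system reboots, though.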
nohup ./startOSWbb.sh 30 48 &
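A slightly more careful version of this command first changes into the oswbb directory (the `ps` output later in this article shows the scripts running as `./OSWatcher.sh`, so they appear to call each other by relative path), redirects the console output to a log file, and prints the PID of the background job. A sketch; `osw_start_bg` is a hypothetical helper and the log name `osw_console.log` is an arbitrary choice.

```shell
# osw_start_bg OSWBB_DIR [INTERVAL] [HOURS]  (hypothetical helper)
# Starts startOSWbb.sh in the background with nohup, logging console output
# to OSWBB_DIR/osw_console.log, and prints the background job's PID.
osw_start_bg() (
    cd "$1" || exit 1
    # Run from the oswbb directory because the scripts are invoked as ./<name>.sh.
    nohup ./startOSWbb.sh "${2:-30}" "${3:-48}" > osw_console.log 2>&1 &
    echo "$!"
)

# Usage: osw_start_bg /home/oracle/monitoring/oswbb 30 48
```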
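The first start creates the gif, archive, tmp and locks directories under oswbb, plus an osw<utility name> subdirectory under archive for each utility.
All data collected by OSWatcher is stored under the archive directory, which holds 10 subdirectories such as oswiostat, oswmeminfo and oswmpstat: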
[oracle@DB-Server archive]$ tree
.
|-- oswifconfig
|-- oswiostat
| |-- DB-Server.localdomain_iostat_16.12.09.1300.dat
| |-- DB-Server.localdomain_iostat_16.12.09.1400.dat
| `-- DB-Server.localdomain_iostat_16.12.09.1500.dat
|-- oswmeminfo
| |-- DB-Server.localdomain_meminfo_16.12.09.1300.dat
| |-- DB-Server.localdomain_meminfo_16.12.09.1400.dat
| `-- DB-Server.localdomain_meminfo_16.12.09.1500.dat
|-- oswmpstat
| |-- DB-Server.localdomain_mpstat_16.12.09.1300.dat
| |-- DB-Server.localdomain_mpstat_16.12.09.1400.dat
| `-- DB-Server.localdomain_mpstat_16.12.09.1500.dat
|-- oswnetstat
| |-- DB-Server.localdomain_netstat_16.12.09.1300.dat
| |-- DB-Server.localdomain_netstat_16.12.09.1400.dat
| `-- DB-Server.localdomain_netstat_16.12.09.1500.dat
|-- oswprvtnet
|-- oswps
| |-- DB-Server.localdomain_ps_16.12.09.1300.dat
| |-- DB-Server.localdomain_ps_16.12.09.1400.dat
| `-- DB-Server.localdomain_ps_16.12.09.1500.dat
|-- oswslabinfo
| |-- DB-Server.localdomain_slabinfo_16.12.09.1300.dat
| |-- DB-Server.localdomain_slabinfo_16.12.09.1400.dat
| `-- DB-Server.localdomain_slabinfo_16.12.09.1500.dat
|-- oswtop
| |-- DB-Server.localdomain_top_16.12.09.1300.dat
| |-- DB-Server.localdomain_top_16.12.09.1400.dat
| `-- DB-Server.localdomain_top_16.12.09.1500.dat
`-- oswvmstat
|-- DB-Server.localdomain_vmstat_16.12.09.1300.dat
|-- DB-Server.localdomain_vmstat_16.12.09.1400.dat
`-- DB-Server.localdomain_vmstat_16.12.09.1500.dat
10 directories, 24 files
OSWatcher names its output files <node name>_<OS utility name>_YY.MM.DD.HH24MI.dat, one file per hour, which is why every timestamp in the listing above ends in 00.
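Because the name encodes host, utility and hour, you can locate the file for a given incident without opening anything. A sketch of a parser for that pattern; `osw_parse_name` is hypothetical, it parses from the right so host names may contain underscores, and it assumes the utility name contains none (true for every file listed above) and that the two-digit year is in the 2000s.

```shell
# osw_parse_name FILE  (hypothetical helper)
# Splits <host>_<utility>_YY.MM.DD.HHMM.dat into its parts, parsing from the right
# so that a host name containing "_" still works (the utility name must not).
osw_parse_name() (
    name=${1##*/}          # drop any leading directories
    name=${name%.dat}      # drop the extension
    stamp=${name##*_}      # YY.MM.DD.HHMM
    rest=${name%_*}
    tool=${rest##*_}       # utility, e.g. iostat
    host=${rest%_*}        # everything before the utility
    set -f                 # no globbing while splitting the stamp
    IFS=.
    set -- $stamp          # -> YY MM DD HHMM
    hhmm=$4
    # Assumes a 21st-century two-digit year.
    echo "host=$host tool=$tool time=20$1-$2-$3 ${hhmm%??}:${hhmm#??}"
)

# osw_parse_name archive/oswiostat/DB-Server.localdomain_iostat_16.12.09.1300.dat
#   -> host=DB-Server.localdomain tool=iostat time=2016-12-09 13:00
```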
Configuring OSWatcher to start at boot
OSWatcher does not restart by itself after a reboot. To start it at boot, install and configure the oswbb-service RPM package, which can be downloaded from How To Start OSWatcher Black Box (OSWBB) Every System Boot Using RPM oswbb-service (Doc ID 580513.1).
[root@DB-Server tmp]# rpm -ivh oswbb-service-7.2.0-1.noarch.rpm
Preparing... ########################################### [100%]
1:oswbb-service ########################################### [100%]
[root@DB-Server tmp]#
After installing the oswbb-service RPM, edit the /etc/oswbb.conf file, then enable and start the oswbb service:
[root@DB-Server ~]#
[root@DB-Server ~]# /sbin/chkconfig oswbb on
[root@DB-Server ~]# /sbin/service oswbb start
Starting OSWatcher: [ OK ]
[root@DB-Server ~]#
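Stopping OSWatcher
Stopping OSWatcher is just as simple. Some articles online claim that stopOSWbb.sh is the only Oracle-supported way to stop OSW, but in practice there are two methods:
1: Run the stopOSWbb.sh script
./stopOSWbb.sh
2: Kill the processes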
[root@DB-Server ~]# ps -ef | grep -i OSW
oracle 24863 1 0 16:02 pts/1 00:00:00 /bin/sh ./OSWatcher.sh 30 48
oracle 24904 24863 0 16:03 pts/1 00:00:00 /bin/sh ./OSWatcherFM.sh 48 /home/oracle/monitoring/oswbb/archive
root 25330 18699 0 16:05 pts/2 00:00:00 grep -i osw
[root@DB-Server ~]# kill 24863
[root@DB-Server ~]# kill 24904
[root@DB-Server ~]# ps -ef | grep -i OSW
root 25342 18699 0 16:05 pts/2 00:00:00 grep -i osw
[root@DB-Server ~]#
If you read stopOSWbb.sh, you will see that it, too, stops OSWatcher by killing the processes of the OSWatcher program (with kill -15, i.e. SIGTERM):
[oracle@DB-Server oswbb]$ more stopOSWbb.sh
#!/bin/sh
######################################################################
# stopOSW.sh
# This is the script which terminates all processes associated with
# the OSWatcher program.
######################################################################
# Kill the OSWatcher processes
######################################################################
PLATFORM=`/bin/uname`
case $PLATFORM in
AIX)
kill -15 `ps -ef | grep OSWatch | grep -v grep | awk '{print $2}'`
;;
*)
kill -15 `ps -e | grep OSWatch | awk '{print $1}'`
;;
esac
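The manual `kill` shown earlier and the `case` branches in stopOSWbb.sh boil down to the same idea, which can be condensed into a filter that extracts OSWatcher PIDs from `ps -ef` output. Reading from standard input makes the filter easy to check against captured output. A sketch; `osw_pids` and `osw_stop` are hypothetical names, and like stopOSWbb.sh the stop function sends SIGTERM (15).

```shell
# osw_pids  (hypothetical helper)
# Reads `ps -ef` output on stdin and prints the PID (column 2) of every
# OSWatcher*.sh process. The pattern OSWatche[r] does not match its own
# text, and lines containing "grep" are skipped.
osw_pids() {
    awk '/OSWatche[r]/ && !/grep/ { print $2 }'
}

# osw_stop  (hypothetical helper)
# Sends SIGTERM to every OSWatcher process found in the live process table.
osw_stop() {
    pids=$(ps -ef | osw_pids)
    [ -n "$pids" ] || { echo "OSWatcher is not running" >&2; return 1; }
    # $pids is left unquoted on purpose so each PID becomes its own argument.
    kill -15 $pids
}
```

For the `ps -ef` listing in the "kill" example above, `osw_pids` prints 24863 and 24904.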
Using OSWatcher bba (oswbba)
OSWatcher now provides an analysis tool oswbba which analyzes the log files produced by OSWatcher. This tool allows OSWatcher to be self-analyzing.
This tool also provides a graphing capability to graph the data and to produce a html profile. See the "Graphing and Analyzing the Output" section below.
oswbba is written in java and requires as a minimum java version 1.4.2 or higher. oswbba can run on any Unix X Windows or PC Windows platform. An X Windows environment is required because oswbba uses Oracle Chartbuilder which requires it.
So before running oswbba, check that Java 1.4.2 or later is available:
[root@DB-Server oswbb]# java -version
java version "1.8.0_111"
Java(TM) SE Runtime Environment (build 1.8.0_111-b14)
Java HotSpot(TM) 64-Bit Server VM (build 25.111-b14, mixed mode)
[root@DB-Server oswbb]# java -jar oswbba.jar -i /home/oracle/monitoring/oswbb/archive/
or
[root@DB-Server oswbb]# java -jar -Xmx512M oswbba.jar -i /home/oracle/monitoring/oswbb/archive/
To generate a report for a specific time window only, use the -B and -E options, as in this example:
[root@DB-Server oswbb]# java -jar -Xmx256m oswbba.jar -i /home/oracle/scripts/oswbb/archive -B Dec 7 15:30:00 2016 -E Dec 7 17:00:00 2016
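When you want "the last N hours" rather than a fixed window, the -B/-E arguments can be generated from the clock. A sketch that relies on GNU `date` (`-d @EPOCH` and the `%-d` no-padding flag) and on the `Mon D HH:MM:SS YYYY` form used in the command above; `oswbba_window` is a hypothetical helper, and `LC_ALL=C` keeps the month abbreviation in English.

```shell
# oswbba_window START_EPOCH END_EPOCH  (hypothetical helper, needs GNU date)
# Prints "-B <start> -E <end>" in the same form as the example command.
oswbba_window() {
    fmt='+%b %-d %H:%M:%S %Y'    # e.g. "Dec 7 15:30:00 2016"; %-d drops the leading zero
    b=$(LC_ALL=C date -d "@$1" "$fmt") || return 1
    e=$(LC_ALL=C date -d "@$2" "$fmt") || return 1
    printf '%s\n' "-B $b -E $e"
}

# Last two hours, rendered in the local time zone of the machine running the command:
#   now=$(date +%s)
#   java -jar -Xmx256m oswbba.jar -i /home/oracle/scripts/oswbb/archive $(oswbba_window $((now - 7200)) "$now")
```

The substitution is left unquoted in the usage line so that, as in the original command, each word becomes a separate argument.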
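oswbba offers a number of options while processing the data; choose the ones you need. Once the HTML report has been generated, download it to your own machine and analyze it there.
What makes OSWatcher stand out is its wealth of graphs, rather than a pile of dry numbers.
oswbba can also produce a very detailed analysis report in plain text.
Using OSWatcher is not difficult in itself. The hard part is understanding what the metrics in the graphs and the analysis report mean, and combining them with AWR, ASH and other database diagnostics to analyze and diagnose problems. The metrics are explained in detail in the OSWatcher Analyzer User Guide (Doc ID 461053.1), so I won't repeat them here.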
Common OSWatcher questions
1: How can I tell whether OSWatcher is running?
That is easy: use ps to check whether the OSWatcher processes exist:
[oracle@DB-Server ~]$ ps -ef | grep -i OSWatcher
oracle 23532 1 0 08:01 pts/2 00:00:14 /bin/sh ./OSWatcher.sh 10 2
oracle 23587 23532 0 08:02 pts/2 00:00:00 /bin/sh ./OSWatcherFM.sh 2 /home/oracle/monitoring/oswbb/archive
oracle 25808 24564 0 09:22 pts/3 00:00:00 grep --color=auto -i OSWatcher
[oracle@DB-Server ~]$
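To script this check, for example from a monitoring job, the same `ps` listing can be turned into an exit status plus the interval and retention that OSWatcher.sh was started with (its first two arguments, as the output above shows). A sketch; `osw_status` is hypothetical and reads `ps -ef` output from standard input.

```shell
# osw_status  (hypothetical helper)
# Reads `ps -ef` output on stdin. Prints the running collector's interval and
# retention and returns 0, or prints "not running" and returns 1.
osw_status() {
    awk '
        !/grep/ {
            for (i = 1; i < NF; i++)
                # Match the main collector script, not OSWatcherFM.sh.
                if ($i == "OSWatcher.sh" || $i ~ /\/OSWatcher\.sh$/) {
                    print "interval=" $(i + 1) " retention=" $(i + 2)
                    found = 1
                }
        }
        END { if (!found) { print "not running"; exit 1 } }
    '
}

# Usage: ps -ef | osw_status
```

For the listing above it prints `interval=10 retention=2`.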
2: If Java is not set up (no java on the PATH), oswbba fails with the following error:
[oracle@mylnx02 oswbb]$ java -jar oswbba.jar -i /home/oracle/oswbb/archive -B Dec 21 09:00:00 2016 -E Dec 21 10:00:00 2016
Validating times in the archive...
Starting OSW Analyzer V7.3.3
OSWatcher Analyzer Written by Oracle Center of Expertise
Copyright (c) 2014 by Oracle Corporation
Parsing Data. Please Wait...
ERROR. You do not have a legitimate version of java in your PATH.
Linux users please download and install java from java.sun.com or
see the oswbba README for instructions on how to use the version of java
that comes shipped with the Oracle database.
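After putting a Java runtime on the PATH (here the JRE shipped with the Oracle database; the exact path under $ORACLE_HOME depends on the database release), the oswbba command runs normally: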
[oracle@mylnx02 oswbb]$ export PATH=$ORACLE_HOME/jre/1.4.2/bin:$PATH
[oracle@mylnx02 oswbb]$ java -jar oswbba.jar -i /home/oracle/oswbb/archive -B Dec 21 09:00:00 2016 -E Dec 21 10:00:00 2016
Validating times in the archive...
Starting OSW Analyzer V7.3.3
OSWatcher Analyzer Written by Oracle Center of Expertise
Copyright (c) 2014 by Oracle Corporation
Parsing Data. Please Wait...
Scanning file headers for version and platform info...
Parsing file getlnx14.gfg1.esquel.com_iostat_16.12.21.0800.dat ...
Parsing file getlnx14.gfg1.esquel.com_iostat_16.12.21.0900.dat ...
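A small helper can automate this choice: prefer a `java` already on the PATH, otherwise fall back to one under `$ORACLE_HOME`. The fallback locations are assumptions: `$ORACLE_HOME/jre/1.4.2/bin` comes from the example above, and `$ORACLE_HOME/jdk/bin` is where newer database homes usually ship a JDK. `osw_find_java` is a hypothetical name.

```shell
# osw_find_java  (hypothetical helper)
# Prints the path of a usable java binary, or returns 1 if none is found.
osw_find_java() {
    if command -v java >/dev/null 2>&1; then
        command -v java
        return 0
    fi
    # Fallbacks inside the database home (assumed locations; adjust to your release).
    for candidate in "$ORACLE_HOME/jdk/bin/java" "$ORACLE_HOME/jre/1.4.2/bin/java"; do
        if [ -n "$ORACLE_HOME" ] && [ -x "$candidate" ]; then
            echo "$candidate"
            return 0
        fi
    done
    echo "no java found; install one or set ORACLE_HOME" >&2
    return 1
}

# Usage: JAVA=$(osw_find_java) && "$JAVA" -jar oswbba.jar -i /home/oracle/oswbb/archive
```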
3: Running "java -jar oswbba.jar ..." fails with "Exception in thread "main" java.lang.OutOfMemoryError"
This happens because the Java heap is too small for the amount of data loaded into memory. Increase it with -Xmx, for example -Xmx256m or -Xmx512m:
java -jar -Xmx256m oswbba.jar -i /home/oracle/scripts/oswbb/archive
oswbba parses all the archive files in memory prior to generating graphs or performing an analysis. If you have a large amount of files to parse you may need to allocate more memory in the java heap. If you experience any error messages regarding out of memory such as java.lang.OutOfMemoryError, you may have to increase the size of the java heap. To increase the size of the java heap use the -Xmx flag.
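If you run oswbba against archives of very different sizes, a dry-run helper that always spells out the heap limit keeps the command consistent and makes it easy to raise the value after an OutOfMemoryError. A sketch; `oswbba_cmd` is hypothetical, the 512m default is an arbitrary choice, and the flag order simply mirrors the commands in this article.

```shell
# oswbba_cmd ARCHIVE_DIR [HEAP]  (hypothetical helper, dry run)
# Prints an oswbba command line with an explicit -Xmx heap limit (default 512m).
oswbba_cmd() {
    if [ -z "$1" ]; then
        echo "usage: oswbba_cmd ARCHIVE_DIR [HEAP]" >&2
        return 1
    fi
    printf '%s\n' "java -jar -Xmx${2:-512m} oswbba.jar -i $1"
}

# oswbba_cmd /home/oracle/scripts/oswbb/archive 1024m
#   -> java -jar -Xmx1024m oswbba.jar -i /home/oracle/scripts/oswbb/archive
```

Printing instead of executing lets you review or paste the command; it is a design choice, not something oswbba requires.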
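4: Can oswbba generate the HTML report from a command-line session? Yes, although you may run into assorted problems along the way. Here are some I encountered.
1: "No X11 DISPLAY variable was set, but this program performed an operation which requires it"
As root, run export DISPLAY=:0.0, after which the report files can be generated from SecureCRT or a terminal window: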
[root@DB-Server oswbb]# export DISPLAY=:0.0
[root@DB-Server oswbb]# java -jar oswbba.jar -i /home/oracle/monitoring/oswbb/archive/
Starting OSW Analyzer V7.3.3
OSWatcher Analyzer Written by Oracle Center of Expertise
Copyright (c) 2014 by Oracle Corporation
Parsing Data. Please Wait...
Scanning file headers for version and platform info...
Parsing file DB-Server.localdomain_iostat_16.12.04.0800.dat ...
Parsing file DB-Server.localdomain_iostat_16.12.04.0900.dat ...
Parsing file DB-Server.localdomain_vmstat_16.12.04.0800.dat ...
Parsing file DB-Server.localdomain_vmstat_16.12.04.0900.dat ...
Parsing file DB-Server.localdomain_netstat_16.12.04.0800.dat ...
Parsing file DB-Server.localdomain_netstat_16.12.04.0900.dat ...
Parsing file DB-Server.localdomain_top_16.12.04.0800.dat ...
Parsing file DB-Server.localdomain_top_16.12.04.0900.dat ...
Parsing file DB-Server.localdomain_ps_16.12.04.0800.dat ...
Parsing file DB-Server.localdomain_ps_16.12.04.0900.dat ...
Parsing Completed.
Enter 1 to Display CPU Process Queue Graphs
Enter 2 to Display CPU Utilization Graphs
Enter 3 to Display CPU Other Graphs
Enter 4 to Display Memory Graphs
Enter 5 to Display Disk IO Graphs
Enter 6 to Generate All CPU Gif Files
Enter 7 to Generate All Memory Gif Files
Enter 8 to Generate All Disk Gif Files
Enter L to Specify Alternate Location of Gif Directory
Enter T to Alter Graph Time Scale Only (Does not change analysis dataset)
Enter D to Return to Default Graph Time Scale
Enter R to Remove Currently Displayed Graphs
Enter A to Analyze Data
Enter S to Analyze Subset of Data(Changes analysis dataset including graph time scale)
Enter P to Generate A Profile
Enter X to Export Parsed Data to File
Enter Q to Quit Program
Please Select an Option:1
Enter 1 to Display CPU Process Queue Graphs
Enter 2 to Display CPU Utilization Graphs
Enter 3 to Display CPU Other Graphs
Enter 4 to Display Memory Graphs
Enter 5 to Display Disk IO Graphs
Enter 6 to Generate All CPU Gif Files
Enter 7 to Generate All Memory Gif Files
Enter 8 to Generate All Disk Gif Files
Enter L to Specify Alternate Location of Gif Directory
Enter T to Alter Graph Time Scale Only (Does not change analysis dataset)
Enter D to Return to Default Graph Time Scale
Enter R to Remove Currently Displayed Graphs
Enter A to Analyze Data
Enter S to Analyze Subset of Data(Changes analysis dataset including graph time scale)
Enter P to Generate A Profile
Enter X to Export Parsed Data to File
Enter Q to Quit Program
Please Select an Option:P
Enter a unique profile directory name or enter <CR> to accept default name:kkk.html
Generating file profile/DB-Server.localdomain_kkk.html/OSW_profile_files/OSWg_OS_Run_Queue.gif
Generating file profile/DB-Server.localdomain_kkk.html/OSW_profile_files/OSWg_OS_Block_Queue.gif
Generating file profile/DB-Server.localdomain_kkk.html/OSW_profile_files/OSWg_OS_Cpu_Idle.gif
Generating file profile/DB-Server.localdomain_kkk.html/OSW_profile_files/OSWg_OS_Cpu_System.gif
Generating file profile/DB-Server.localdomain_kkk.html/OSW_profile_files/OSWg_OS_Cpu_User.gif
Generating file profile/DB-Server.localdomain_kkk.html/OSW_profile_files/OSWg_OS_Cpu_Wa.gif
Generating file profile/DB-Server.localdomain_kkk.html/OSW_profile_files/OSWg_OS_Cpu_Interrupts.gif
Generating file profile/DB-Server.localdomain_kkk.html/OSW_profile_files/OSWg_OS_Context_Switches.gif
Generating file profile/DB-Server.localdomain_kkk.html/OSW_profile_files/OSWg_OS_Memory_Swap.gif
Generating file profile/DB-Server.localdomain_kkk.html/OSW_profile_files/OSWg_OS_Memory_Free.gif
Generating file profile/DB-Server.localdomain_kkk.html/OSW_profile_files/OSWg_OS_Memory_Page_In_Rate.gif
Generating file profile/DB-Server.localdomain_kkk.html/OSW_profile_files/OSWg_OS_IO_ST.gif
Generating file profile/DB-Server.localdomain_kkk.html/OSW_profile_files/OSWg_OS_IO_RPS.gif
Generating file profile/DB-Server.localdomain_kkk.html/OSW_profile_files/OSWg_OS_IO_WPS.gif
Generating file profile/DB-Server.localdomain_kkk.html/OSW_profile_files/OSWg_OS_IO_PB.gif
Generating file profile/DB-Server.localdomain_kkk.html/OSW_profile_files/OSWg_OS_IO_PBTP_1.gif
Generating file profile/DB-Server.localdomain_kkk.html/OSW_profile_files/OSWg_OS_IO_PBTP_2.gif
Generating file profile/DB-Server.localdomain_kkk.html/OSW_profile_files/OSWg_OS_IO_PBTP_3.gif
Generating file profile/DB-Server.localdomain_kkk.html/OSW_profile_files/OSWg_OS_IO_TPS.gif
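The commands above were run as root. As the oracle user, you may still be denied access to the X display.
In that case switch to root, run the commands below, then switch back to the oracle user and run oswbba: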
[oracle@DB-Server oswbb]$ su - root
Password:
[root@DB-Server ~]# export DISPLAY=:0.0
[root@DB-Server ~]# xhost local:oracle
non-network local connections being added to access control list
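Note that, as its output says, this xhost command admits all non-network local connections, not only the oracle user. Even so, do not choose options 1-5 in SecureCRT: they open graph windows and fail again with Can't connect to X11 window server using ':0.0'. Those options only work in a graphical session, for example over VNC. The two commands below run the same time window without and with -Djava.awt.headless=true; the headless flag tells the JVM not to expect a display, which is intended to let it draw images without an X server, while interactive graph windows remain unavailable.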
java -jar -Xmx256m oswbba.jar -i /home/oracle/oswbb/archive -B Mar 5 16:00:00 2016 -E Mar 5 16:30:00 2016
java -jar -Djava.awt.headless=true -Xmx256m oswbba.jar -i /home/oracle/oswbb/archive -B Mar 5 16:00:00 2016 -E Mar 5 16:30:00 2016
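When the same command has to work both from a VNC desktop and from a plain SSH or SecureCRT session, the headless flag can be added only when no display is set. A sketch; `oswbba_headless_opt` is hypothetical, and it treats an unset or empty DISPLAY as "no X server", which is an assumption (a DISPLAY that points at an unreachable server is not detected).

```shell
# oswbba_headless_opt  (hypothetical helper)
# Prints -Djava.awt.headless=true when DISPLAY is unset or empty, otherwise nothing.
oswbba_headless_opt() {
    if [ -z "${DISPLAY:-}" ]; then
        printf '%s\n' "-Djava.awt.headless=true"
    fi
}

# Usage (the unquoted substitution disappears when nothing is printed):
#   java -jar $(oswbba_headless_opt) -Xmx256m oswbba.jar -i /home/oracle/oswbb/archive
```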
References:
OSWatcher (Includes: [Video]) (Doc ID 1526578.1)
OSWatcher (Includes: [Video]) (Doc ID 301137.1)
How To Start OSWatcher Black Box (OSWBB) Every System Boot Using RPM oswbb-service (Doc ID 580513.1)
OSWatcher Analyzer User Guide (Doc ID 461053.1)