Oracle性能分析(一) DB Time和DB CPU

请关注个人小站:http://sqlhis.com/

分析数据库的负载高低,通常可以通过CPU利用率,磁盘响应速度等进行分析,Oracle数据库提供AWR报告,报告的内容很多很多,但是却没有一个定性的指标,能回答:

1. 到底数据库当前的负载是高还是低?

2. 我感觉我数据库很慢,我的数据库硬件是是不是应该升级了?

 

先看一段典型的AWR报告,截取部分片段

 

Host NamePlatformCPUsCoresSocketsMemory (GB)
ABCD Linux x86 64-bit 12 12 12 94.15
 Snap IdSnap TimeSessionsCursors/SessionInstances
Begin Snap: 13647 24-Aug-18 00:00:40 575 4.1 2
End Snap: 13695 25-Aug-18 00:00:06 558 4.6 2
Elapsed:   1,439.43 (mins)      
DB Time:   2,762.39 (mins)      

这里有三个关键信息:

1. 首先,我们的服务器是12个CPU,12核

2. 这个AWR报告的区间是从  24-Aug-18 00:00:40 到 25-Aug-18 00:00:06,整个区间是1,439.43 (mins), 24小时。

3. 然后是DB TIME: 2,762.39 (mins), 这是一个关键的性能指标,就是所有数据库请求的运行时间加总。

 

举个栗子,假设一家理发店24小时(1,439.43 (mins))营业,一共有12个理发师(12 CPU),顾客在理发店的累计逗留时间就是 DB TIME:2,762.39 (mins)。

如果拿顾客累计逗留时间除以营业时间,2762.30/1439.43=1.92, 也就是说,在营业时间内,店内平均有1.92个顾客。

顾客的累计逗留时间(DB TIME)不是理发师(CPU)的累计工作时间,如果两个顾客都要烫发,正巧只有一个烫发机,则其中一个顾客需要等待(阻塞),这段时间理发师没有干活,

所以:

DB CPU 是一定会小于 DB Time的, 理发师剪头发的时间一定小于顾客在店内逗留的时间。

 

继续看AWR报告,给出了理发师的工作时间,DB CPUS(S)=0.9 ,我们有12个理发师,也就是全天CPU的利用率是 0.9/12=7.5%

Load Profile

 

 

 

 Per SecondPer TransactionPer ExecPer Call
DB Time(s): 1.9 0.1 0.00 0.00
DB CPU(s): 0.9 0.0 0.00 0.00
Redo size (bytes): 2,286,644.3 69,730.7    
Logical read (blocks): 117,205.0 3,574.1    
Block changes: 9,588.5 292.4    
Physical read (blocks): 1,768.0 53.9    
Physical write (blocks): 287.0 8.8    
Read IO requests: 201.7 6.2    
Write IO requests: 75.0 2.3    
Read IO (MB): 13.8 0.4    
Write IO (MB): 2.2 0.1    
Global Cache blocks received: 29.1 0.9    
Global Cache blocks served: 37.5 1.1    
User calls: 3,122.8 95.2    
Parses (SQL): 473.3 14.4    
Hard parses (SQL): 9.5 0.3    
SQL Work Area (MB): 15.8 0.5    
Logons: 0.1 0.0    
Executes (SQL): 11,292.7 344.4    
Rollbacks: 0.7 0.0    
Transactions: 32.8      

 

全天CPU利用率只有7.5%,整体看比较低,但是不意味着数据库的负荷很低,有两种情况:

1. 负荷可能集中在某个时间段,这就需要我们观察每个时间段的数据库负荷。

2. 数据库的瓶颈可能不在CPU,而在磁盘或者网络

1)数据库内存较少,大部分时间消耗在磁盘的读取上,这个可以从磁盘的响应速度判断

2)网络带宽不足,数据库要发送大量数据到客户端,在这里形成了等待。

  

所以现在我们想分时间段获得DB TIME和DB CPU这两个指标,如果需要按每个时间段导出一个AWR报告,这是一件令人抓狂的事,还好,AWR报告实际上就是通过一些数据库视图生成的,这些视图就是数据库定时的快照,所以我们可以利用SQL语句很方便的生成。

首先看下快照的主表:

SELECT * FROM DBA_HIST_SNAPSHOT ORDER BY SNAP_ID

    SNAP_ID DBID INSTANCE_NUMBER STARTUP_TIME BEGIN_INTERVAL_TIME END_INTERVAL_TIME FLUSH_ELAPSED SNAP_LEVEL ERROR_COUNT SNAP_FLAG SNAP_TIMEZONE
1 13648 1647988888 2 10-8月 -18   05.20.05.000 下午 24-8月 -18   12.00.40.204 上午 24-8月 -18   12.30.25.506 上午 +00000 00:00:00.5 1 0 0 +0 08:00:00
2 13648 1647988888 1 28-7月 -18   08.01.15.000 下午 24-8月 -18   12.00.40.229 上午 24-8月 -18   12.30.25.524 上午 +00000 00:00:02.5 1 0 0 +0 08:00:00
3 13649 1647988888 1 28-7月 -18   08.01.15.000 下午 24-8月 -18   12.30.25.524 上午 24-8月 -18   01.00.16.839 上午 +00000 00:00:02.2 1 0 0 +0 08:00:00
4 13649 1647988888 2 10-8月 -18   05.20.05.000 下午 24-8月 -18   12.30.25.506 上午 24-8月 -18   01.00.16.820 上午 +00000 00:00:01.2 1 0 0 +0 08:00:00
5 13650 1647948888 1 28-7月 -18   08.01.15.000 下午 24-8月 -18   01.00.16.839 上午 24-8月 -18   01.30.08.125 上午 +00000 00:00:02.9 1 0 0 +0 08:00:00
6 13650 1647988888 2 10-8月 -18   05.20.05.000 下午 24-8月 -18   01.00.16.820 上午 24-8月 -18   01.30.08.072 上午 +00000 00:00:00.9 1 0 0 +0 08:00:00
7 13651 1647988888 1 28-7月 -18   08.01.15.000 下午 24-8月 -18   01.30.08.125 上午 24-8月 -18   02.00.02.352 上午 +00000 00:00:01.6 1 0 0 +0 08:00:00
8 13651 1647988888 2 10-8月 -18   05.20.05.000 下午 24-8月 -18   01.30.08.072 上午 24-8月 -18   02.00.02.334 上午 +00000 00:00:00.7 1 0 0 +0 08:00:00

 通过这个表我们可以判断如下:

1. 半小时一个快照(这个是配置值,可改),快照并不是完全的整点,存在一定的误差,即快照的区间接近半小时,可能会多几十秒或者少几十秒。

2. 这里有两个INSTANCE_NUMBER,因为这个是集群,所以实际上有两个物理机。

 

DBA_HIST_SYS_TIME_MODEL 表里面有我们需要的DB CPU和DB time,这个值是累计值,即需要算一个区间的数,需要那结束时间的值减去开始时间的值。

SELECT * FROM DBA_HIST_SYS_TIME_MODEL
WHERE SNAP_ID='13648'
AND STAT_NAME IN ('DB CPU','DB time')

    SNAP_ID DBID INSTANCE_NUMBER STAT_ID STAT_NAME VALUE
1 13648 1647988888 2 2748282437 DB CPU 41061757271
2 13648 1647988888 1 2748282437 DB CPU 2214287108077
3 13648 1647988888 2 3649082374 DB time 105530922248
4 13648 1647988888 1 3649082374 DB time 3992370209915

 

 

创建一个表存储原始数据,

CREATE TABLE SYSTEMLOAD AS 
WITH TEMP AS(
  SELECT a.INSTANCE_NUMBER,
         to_number(to_char(A.BEGIN_INTERVAL_TIME,'yyyymmdd')) AS BEGIN_DATE,
         TRUNC(((A.BEGIN_INTERVAL_TIME+0)-trunc(A.BEGIN_INTERVAL_TIME+0))*48) AS SNAPORDER,
         SUBSTR('0' || TRUNC(TRUNC(((A.BEGIN_INTERVAL_TIME+0)-trunc(A.BEGIN_INTERVAL_TIME+0))*48)/2) ||
         (CASE WHEN mod(TRUNC(((A.BEGIN_INTERVAL_TIME+0)-trunc(A.BEGIN_INTERVAL_TIME+0))*48),2)=1 THEN '30' ELSE '00' END),-4)
         AS BEGIN_TIME,
         A.BEGIN_INTERVAL_TIME+0 AS BEGIN_INTERVAL_TIME,
         A.END_INTERVAL_TIME+0 AS END_INTERVAL_TIME,
         ((A.END_INTERVAL_TIME+0)-(A.BEGIN_INTERVAL_TIME+0))*60*60*24 AS TIMEINTERVAL,
         B.SNAP_ID AS STARTSNAP_ID,
         A.SNAP_ID AS ENDSNAP_ID
  FROM DBA_HIST_SNAPSHOT A
  INNER JOIN DBA_HIST_SNAPSHOT B  /*寻找前一个SNAP_ID*/
  ON A.SNAP_ID-1=B.SNAP_ID
  AND A.INSTANCE_NUMBER=B.INSTANCE_NUMBER
  ORDER BY A.SNAP_ID
)
SELECT A.*,
       DBTIME_E.VALUE-DBTIME_S.VALUE AS DBTIME,
       DBCPU_E.VALUE-DBCPU_S.VALUE AS DBCPU,  
       REDOSIZE_E.VALUE-REDOSIZE_S.VALUE AS REDOSIZE,
       PHYSICALREAD_E.VALUE-PHYSICALREAD_S.VALUE AS PHYSICALREAD,
       PHYSICALWRITE_E.VALUE-PHYSICALWRITE_S.VALUE AS PHYSICALWRITE      
FROM TEMP A
LEFT OUTER JOIN DBA_HIST_SYS_TIME_MODEL  DBTIME_S 
ON a.STARTSNAP_ID=DBTIME_S.SNAP_ID
AND A.INSTANCE_NUMBER=DBTIME_S.INSTANCE_NUMBER
AND DBTIME_S.STAT_NAME='DB time'
LEFT OUTER JOIN DBA_HIST_SYS_TIME_MODEL  DBTIME_E
ON a.ENDSNAP_ID=DBTIME_E.SNAP_ID
AND A.INSTANCE_NUMBER=DBTIME_E.INSTANCE_NUMBER
AND DBTIME_E.STAT_NAME='DB time'
LEFT OUTER JOIN DBA_HIST_SYS_TIME_MODEL  DBCPU_S 
ON a.STARTSNAP_ID=DBCPU_S.SNAP_ID
AND A.INSTANCE_NUMBER=DBCPU_S.INSTANCE_NUMBER
AND DBCPU_S.STAT_NAME='DB CPU'
LEFT OUTER JOIN DBA_HIST_SYS_TIME_MODEL  DBCPU_E
ON a.ENDSNAP_ID=DBCPU_E.SNAP_ID
AND A.INSTANCE_NUMBER=DBCPU_E.INSTANCE_NUMBER
AND DBCPU_E.STAT_NAME='DB CPU'
LEFT OUTER JOIN DBA_HIST_SYSSTAT REDOSIZE_S
ON A.STARTSNAP_ID=REDOSIZE_S.SNAP_ID
AND A.INSTANCE_NUMBER=REDOSIZE_S.INSTANCE_NUMBER
AND REDOSIZE_S.STAT_NAME='redo size'
LEFT OUTER JOIN DBA_HIST_SYSSTAT REDOSIZE_E
ON A.ENDSNAP_ID=REDOSIZE_E.SNAP_ID
AND A.INSTANCE_NUMBER=REDOSIZE_E.INSTANCE_NUMBER
AND REDOSIZE_E.STAT_NAME='redo size'
LEFT OUTER JOIN DBA_HIST_SYSSTAT PHYSICALREAD_S
ON A.STARTSNAP_ID=PHYSICALREAD_S.SNAP_ID
AND A.INSTANCE_NUMBER=PHYSICALREAD_S.INSTANCE_NUMBER
AND PHYSICALREAD_S.STAT_NAME='physical read bytes'
LEFT OUTER JOIN DBA_HIST_SYSSTAT PHYSICALREAD_E
ON A.ENDSNAP_ID=PHYSICALREAD_E.SNAP_ID
AND A.INSTANCE_NUMBER=PHYSICALREAD_E.INSTANCE_NUMBER
AND PHYSICALREAD_E.STAT_NAME='physical read bytes'
LEFT OUTER JOIN DBA_HIST_SYSSTAT PHYSICALWRITE_S
ON A.STARTSNAP_ID=PHYSICALWRITE_S.SNAP_ID
AND A.INSTANCE_NUMBER=PHYSICALWRITE_S.INSTANCE_NUMBER
AND PHYSICALWRITE_S.STAT_NAME='physical write bytes'
LEFT OUTER JOIN DBA_HIST_SYSSTAT PHYSICALWRITE_E
ON A.ENDSNAP_ID=PHYSICALWRITE_E.SNAP_ID
AND A.INSTANCE_NUMBER=PHYSICALWRITE_E.INSTANCE_NUMBER
AND PHYSICALWRITE_E.STAT_NAME='physical write bytes'
AND DBTIME_E.VALUE-DBTIME_S.VALUE>0 /*重启可能导致重新累积,不加入这些异常数据*/
--WHERE A.INSTANCE_NUMBER=1
ORDER BY A.STARTSNAP_ID

 

输出单日的运行指标:

SELECT BEGIN_DATE,
       MAX(BEGIN_TIME) AS BEGIN_TIME,
       ROUND(SUM(DBTIME)/SUM(TIMEINTERVAL)/1000000,2) AS AVGDBTIME,
       ROUND(SUM(DBCPU)/SUM(TIMEINTERVAL)/1000000,2) AS AVGDBCPU,
       ROUND(SUM(REDOSIZE)/SUM(TIMEINTERVAL)/1024/1024,2) AS AVGREDOSIZE ,
       ROUND(SUM(PHYSICALREAD)/SUM(TIMEINTERVAL)/1024/1024,2) AS AVGPHYSICALREAD,
       ROUND(SUM(PHYSICALWRITE)/SUM(TIMEINTERVAL)/1024/1024,2) AS AVGPHYSICALWRITE
FROM SYSTEMLOAD
WHERE INSTANCE_NUMBER=1
AND BEGIN_DATE='20180831'
GROUP BY BEGIN_DATE,SNAPORDER
ORDER BY BEGIN_DATE,SNAPORDER

 

以下为输出,可以看出DB TIME和DB CPU其实都不高,如果算CPU的利用率,将AVGDBCPU除以12即可,大概不到20%

    BEGIN_DATE BEGIN_TIME AVGDBTIME AVGDBCPU AVGREDOSIZE AVGPHYSICALREAD AVGPHYSICALWRITE
1 20180831 0 1.71 1.16 0.98 1.37 1.19
2 20180831 30 0.98 0.89 0.31 0.14 0.44
3 20180831 100 1.31 0.88 0.28 0.12 0.43
4 20180831 130 1.19 1.06 0.47 0.05 0.55
5 20180831 200 1.79 1.38 0.49 4.81 0.69
6 20180831 230 1.75 1.62 0.67 0.58 0.75
7 20180831 300 1.42 1.33 0.53 0.73 0.73
8 20180831 330 1.44 1.28 0.59 5.46 0.67
9 20180831 400 0.96 0.92 0.2 0.02 0.37
10 20180831 430 1.03 0.95 0.4 0.06 0.38
11 20180831 500 1.16 1.02 1.03 1.47 1.32
12 20180831 530 1.11 1 0.4 0.65 0.4
13 20180831 600 1.74 1.21 2.08 5.51 1.46
14 20180831 630 1.37 1.13 0.51 1.5 0.57
15 20180831 700 1.24 1.12 0.41 0.66 0.64
16 20180831 730 1.29 1.06 0.62 2.06 0.66
17 20180831 800 5.89 2.02 3.64 43.82 3.93
18 20180831 830 2.28 1.66 1.82 11.51 2.21
19 20180831 900 3.29 2.08 3.69 12.46 3.15
20 20180831 930 3.15 1.7 7.08 27.65 7.41
21 20180831 1000 2.37 1.36 6.3 20.12 5.92
22 20180831 1030 1.93 1.1 5.46 25.37 5.04
23 20180831 1100 2.54 1.3 5.55 32.08 5.92
24 20180831 1130 1.76 1.03 4.82 17.46 5.66
25 20180831 1200 1.03 0.57 2.19 18.68 2.47
26 20180831 1230 1.22 0.89 2.67 17.34 3.04
27 20180831 1300 1.68 1.14 4.73 21.53 4.47
28 20180831 1330 1.92 1.05 4.83 21.42 5.06
29 20180831 1400 1.88 1.23 5.21 22.94 5.14
30 20180831 1430 1.18 0.65 4.03 10.04 4.96
31 20180831 1500 1.32 0.62 2.97 6.28 2.83
32 20180831 1530 1.68 0.91 3.85 14.67 5.12
33 20180831 1600 5.56 1.05 5.9 41.36 5.46
34 20180831 1630 2.82 0.86 3.34 7.82 4.21
35 20180831 1700 2.85 1.31 3.24 37.8 4.28
36 20180831 1730 2.7 1.16 3.43 19.22 4.11
37 20180831 1800 2.36 0.73 2.57 25.6 3.81
38 20180831 1830 1.82 1.12 1.98 17.04 6.45
39 20180831 1900 1.63 0.83 2.36 15.45 3.15
40 20180831 1930 0.9 0.48 1.59 4.32 2.08
41 20180831 2000 1.09 0.28 1.01 9.93 2.01
42 20180831 2030 0.61 0.21 0.85 0.21 1.08
43 20180831 2100 0.49 0.21 0.77 0.21 1.03
44 20180831 2130 0.41 0.2 0.96 1.19 1.06
45 20180831 2200 2.35 0.72 2.79 22.78 4.89
46 20180831 2230 0.78 0.39 0.82 8.1 1.54
47 20180831 2300 0.2 0.12 0.66 0.53 0.87
48 20180831 2330 0.19 0.1 0.53 0.07 0.77

 

总结:

统计了DB CPU和DB TIME两个关键指标,但是DB CPU是数据库层面的数据,并不是服务器层面的数据,有可能服务器上面还运行了其他程序,占用了大量的CPU,导致DB CPU这个值很低,但是实际上服务器的CPU非常繁忙。下一篇文章将获得服务器层面的CPU利用率。

 

posted @ 2018-09-01 13:43  artmouse  阅读(8597)  评论(0编辑  收藏  举报