ASM disk group out of space -- ORA-15041: diskgroup "DATA" space exhausted (production database case)

Original work from the "深蓝的blog" blog: http://blog.csdn.net/huangyanlong/article/details/47277715

I recently handled a problem caused by an ASM disk group running out of space.

A brief record follows:

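I. Problem report

Feedback from the on-site engineer:

The on-site engineer reported the problem by email and stressed how urgently it needed to be fixed.

The description was roughly as follows: during a routine inspection, a tablespace used for photos was found to be full, and the attempt to extend it failed with ORA-15041: diskgroup "DATA" space exhausted. Because the previous month's data is assessed at the start of each month and the customer needed to upload photos, the matter was urgent and had to be resolved immediately.

The email attachment included the following query output: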
SQL> select group_number,name,total_mb,free_mb from v$ASM_DISKGROUP;

GROUP_NUMBER NAME                             TOTAL_MB    FREE_MB
------------ ------------------------------ ---------- ----------
           1 ARCH                               860159     405817
           2 CRS                                 30717      29791
           3 DATA                              1638394        238

SQL> select name,group_number,state,redundancy,total_mb,free_mb,path from v$asm_disk;

NAME       GROUP_NUMBER STATE    REDUNDA   TOTAL_MB    FREE_MB PATH
---------- ------------ -------- ------- ---------- ---------- ------------------------------
ARCH_0000             1 NORMAL   UNKNOWN     860159     405817 /dev/oracleasm/disks/ARCH
CRS_0002              2 NORMAL   UNKNOWN      10239       9931 /dev/oracleasm/disks/VOTE_CRS3
CRS_0001              2 NORMAL   UNKNOWN      10239       9930 /dev/oracleasm/disks/VOTE_CRS2
DATA_0001             3 NORMAL   UNKNOWN     819197        112 /dev/oracleasm/disks/DATA2
DATA_0000             3 NORMAL   UNKNOWN     819197        126 /dev/oracleasm/disks/DATA1
CRS_0000              2 NORMAL   UNKNOWN      10239       9930 /dev/oracleasm/disks/VOTE_CRS1

6 rows selected.


II. Emergency handling

I connected to the production database and confirmed that ASM space was indeed critically low.
ASMCMD> lsdg

State    Type    Rebal  Sector  Block       AU  Total_MB  Free_MB  Req_mir_free_MB  Usable_file_MB  Offline_disks  Voting_files  Name
MOUNTED  EXTERN  N         512   4096  1048576    860159   405780                0          405780              0             N  ARCH/
MOUNTED  NORMAL  N         512   4096  1048576     30717    29791            10239            9776              0             Y  CRS/
MOUNTED  EXTERN  N         512   4096  1048576   1638394      238                0             238              0             N  DATA/
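
As a rough illustration of how tight the DATA disk group is, the sketch below turns the Total_MB and Free_MB figures from the lsdg output above into a free-space percentage. The 5% alert threshold is an assumption chosen for the example, not a value from the original case, and the raw Free_MB is used as-is (for a NORMAL redundancy group such as CRS, Usable_file_MB would be the more meaningful figure).

```python
# Figures copied from the lsdg output above: (name, Total_MB, Free_MB).
DISK_GROUPS = [
    ("ARCH", 860159, 405780),
    ("CRS", 30717, 29791),
    ("DATA", 1638394, 238),
]


def percent_free(total_mb, free_mb):
    """Return free space as a percentage of the disk group's total size."""
    return 100.0 * free_mb / total_mb


def nearly_full(groups, threshold_pct=5.0):
    """Return names of groups whose free space is below threshold_pct.

    threshold_pct is an assumed cutoff used only for this illustration.
    """
    return [name for name, total, free in groups
            if percent_free(total, free) < threshold_pct]


for name, total, free in DISK_GROUPS:
    print(f"{name}: {percent_free(total, free):.2f}% free")
print("Nearly full:", nearly_full(DISK_GROUPS))
```

With these figures only DATA falls below the threshold: 238 MB free out of 1638394 MB is well under 0.1%.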

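    To fix the problem quickly and get the application running again, I decided to start with the inability to extend the tablespace.
The idea was to shrink tablespaces with low utilization.

    Checking tablespace usage showed:
1. The undo and temp tablespaces had grown very large and could be shrunk;
2. Some tablespaces had very low utilization, for example GB-sized tablespaces holding only a few MB of data, and were candidates for shrinking;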
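
The selection described in the two points above can be sketched as a simple filter. This is only an illustration: the tablespace names and sizes in `sample` are hypothetical, and the 10% utilization and 1 GB minimum-size cutoffs are assumptions, not values from the original case.

```python
def shrink_candidates(tablespaces, max_used_pct=10.0, min_allocated_mb=1024):
    """Return (name, reclaimable_mb) for large tablespaces that are mostly empty.

    tablespaces: iterable of (name, allocated_mb, used_mb).
    reclaimable_mb is an upper bound; the real figure depends on where the
    data sits inside each file.
    """
    result = []
    for name, allocated_mb, used_mb in tablespaces:
        used_pct = 100.0 * used_mb / allocated_mb
        if allocated_mb >= min_allocated_mb and used_pct <= max_used_pct:
            result.append((name, allocated_mb - used_mb))
    # Largest potential saving first.
    return sorted(result, key=lambda item: item[1], reverse=True)


# Hypothetical figures, for illustration only.
sample = [
    ("APP_DATA", 20480, 18000),   # well used, left alone
    ("REPORT_TMP", 8192, 12),     # 8 GB allocated, a few MB used
    ("AUDIT_OLD", 4096, 300),
]
print(shrink_candidates(sample))
```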

    So I ran a series of commands like the following:

ALTER DATABASE 
  TEMPFILE '+DATA/xcky/xckytmp04.dbf'
 RESIZE 1024M;

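These commands reduce the size of the files in the tablespaces that could be shrunk.

After this round of shrinking, I queried space usage again; there was now enough room, and I extended the tablespace that stores the business photos. The application returned to normal.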
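
Because the same RESIZE command was repeated for several files, a small helper that builds the statement text may be convenient. The sketch below is a hypothetical helper, not the procedure actually used; only the tempfile name and the 1024M size come from the command shown above. Note that a datafile cannot be shrunk below its used data: Oracle rejects such a request with ORA-03297.

```python
def resize_statement(file_name, size_mb, tempfile=False):
    """Build an ALTER DATABASE ... RESIZE statement for one datafile or tempfile."""
    if size_mb <= 0:
        raise ValueError("size_mb must be positive")
    kind = "TEMPFILE" if tempfile else "DATAFILE"
    escaped = file_name.replace("'", "''")  # double any single quotes
    return f"ALTER DATABASE {kind} '{escaped}' RESIZE {size_mb}M;"


# Reproduces the statement shown above.
print(resize_statement("+DATA/xcky/xckytmp04.dbf", 1024, tempfile=True))
```

The helper only produces text; each statement would still be reviewed and run by hand in SQL*Plus.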


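III. Interim feedback

    I promptly reported the status back to the on-site engineer.
 
    Cause of the problem: the ASM disk group ran out of space.

1. As a temporary measure, other tablespaces were shrunk to free space in the DATA disk group (the undo tablespace, the temp tablespace, and other tablespaces with low utilization).

In addition, a new 10G autoextending datafile for the photo tablespace has been created, named photo_info47.dbf.

2. Follow-up recommendations:

(1) Expand the storage.
Under this environment's ASM layout, the DATA disk group has used about 1.5T of its roughly 1.56T total, and about 50G remains free in DATA.

(2) Or re-plan the ASM storage. Tablespaces could temporarily be extended into the ARCH disk group (about 400G currently free), but ARCH holds the archived logs, so this is not recommended: if archiving later surges, the database could hang.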

 
IV. Fixing the root cause

 

I connected to the production database again to look for a better, lasting fix.
First, a look at the current overall space usage.
SQL> conn sys/oracle as sysdba
Connected.
SQL> show user
USER is "SYS"
SQL> select path,total_mb,free_mb from v$asm_disk_stat;

PATH                                                 TOTAL_MB    FREE_MB
-------------------------------------------------- ---------- ----------
/dev/oracleasm/disks/ARCH                              860159     405780
/dev/oracleasm/disks/VOTE_CRS3                          10239       9931
/dev/oracleasm/disks/VOTE_CRS2                          10239       9930
/dev/oracleasm/disks/DATA2                             819197      25466
/dev/oracleasm/disks/DATA1                             819197      25480
/dev/oracleasm/disks/VOTE_CRS1                          10239       9930

6 rows selected.

ASMCMD> lsdg
State    Type    Rebal  Sector  Block       AU  Total_MB  Free_MB  Req_mir_free_MB  Usable_file_MB  Offline_disks  Voting_files  Name
MOUNTED  EXTERN  N         512   4096  1048576    860159   404777                0          404777              0             N  ARCH/
MOUNTED  NORMAL  N         512   4096  1048576     30717    29791            10239            9776              0             Y  CRS/
MOUNTED  EXTERN  N         512   4096  1048576   1638394    49590                0           49590              0             N  DATA/
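
For a quick sense of what the emergency shrinking achieved, the arithmetic below compares the DATA figures from the first lsdg output (238 MB free) with the one just above (49590 MB free). It is a plain worked calculation on numbers already shown; the MB values reported by ASM are treated as binary megabytes.

```python
TOTAL_MB = 1638394        # DATA Total_MB (unchanged between the two listings)
FREE_BEFORE_MB = 238      # DATA Free_MB before shrinking
FREE_AFTER_MB = 49590     # DATA Free_MB after shrinking

reclaimed_mb = FREE_AFTER_MB - FREE_BEFORE_MB
used_mb = TOTAL_MB - FREE_AFTER_MB

print(f"Reclaimed: {reclaimed_mb} MB (about {reclaimed_mb / 1024:.1f} GB)")
print(f"Used: {used_mb / 1024 / 1024:.2f} TB of {TOTAL_MB / 1024 / 1024:.2f} TB")
print(f"Free: {100 * FREE_AFTER_MB / TOTAL_MB:.1f}%")
```

Roughly 48 GB was reclaimed, which still leaves the disk group only about 3% free, so shrinking alone does not solve the capacity problem.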

Check the state of the disk groups:
SQL> select name,state from v$asm_diskgroup;

NAME                           STATE
------------------------------ -----------
ARCH                           CONNECTED
CRS                            MOUNTED
DATA                           CONNECTED


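Checking disk usage at the operating system level turned up some good news.
For some reason, one disk on the storage array was not in use at all. Great: it could be given to ASM.
So first, I located that disk.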

[root@gzxkdb1 ~]# fdisk -l
Disk /dev/emcpoweree: 2147.4 GB, 2147483648000 bytes
255 heads, 63 sectors/track, 261083 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

Disk /dev/emcpoweree doesn't contain a valid partition table
The output above shows that the device /dev/emcpoweree has not been partitioned or used.
 
Partition the device:
[root@gzxkdb1 ~]# fdisk /dev/emcpoweree
Device contains neither a valid DOS partition table, nor Sun, SGI or OSF disklabel
Building a new DOS disklabel. Changes will remain in memory only,
until you decide to write them. After that, of course, the previous
content won't be recoverable.


The number of cylinders for this disk is set to 261083.
There is nothing wrong with that, but this is larger than 1024,
and could in certain setups cause problems with:
1) software that runs at boot time (e.g., old versions of LILO)
2) booting and partitioning software from other OSs
   (e.g., DOS FDISK, OS/2 FDISK)
Warning: invalid flag 0x0000 of partition table 4 will be corrected by w(rite)

Command (m for help): n
Command action
   e   extended
   p   primary partition (1-4)
p
Partition number (1-4): 1
First cylinder (1-261083, default 1): 
Using default value 1
Last cylinder or +size or +sizeM or +sizeK (1-261083, default 261083): +500G

Command (m for help): p

Disk /dev/emcpoweree: 2147.4 GB, 2147483648000 bytes
255 heads, 63 sectors/track, 261083 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

          Device Boot      Start         End      Blocks   Id  System
/dev/emcpoweree1               1       60789   488287611   83  Linux

Command (m for help): n
Command action
   e   extended
   p   primary partition (1-4)
p
Partition number (1-4): 2
First cylinder (60790-261083, default 60790): 
Using default value 60790
Last cylinder or +size or +sizeM or +sizeK (60790-261083, default 261083): +500G

Command (m for help): p

Disk /dev/emcpoweree: 2147.4 GB, 2147483648000 bytes
255 heads, 63 sectors/track, 261083 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

          Device Boot      Start         End      Blocks   Id  System
/dev/emcpoweree1               1       60789   488287611   83  Linux
/dev/emcpoweree2           60790      121578   488287642+  83  Linux

Command (m for help): m
Command action
   a   toggle a bootable flag
   b   edit bsd disklabel
   c   toggle the dos compatibility flag
   d   delete a partition
   l   list known partition types
   m   print this menu
   n   add a new partition
   o   create a new empty DOS partition table
   p   print the partition table
   q   quit without saving changes
   s   create a new empty Sun disklabel
   t   change a partition's system id
   u   change display/entry units
   v   verify the partition table
   w   write table to disk and exit
   x   extra functionality (experts only)

Command (m for help): n
Command action
   e   extended
   p   primary partition (1-4)
p
Partition number (1-4): 3
First cylinder (121579-261083, default 121579): 
Using default value 121579
Last cylinder or +size or +sizeM or +sizeK (121579-261083, default 261083): +500G

Command (m for help): p

Disk /dev/emcpoweree: 2147.4 GB, 2147483648000 bytes
255 heads, 63 sectors/track, 261083 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

          Device Boot      Start         End      Blocks   Id  System
/dev/emcpoweree1               1       60789   488287611   83  Linux
/dev/emcpoweree2           60790      121578   488287642+  83  Linux
/dev/emcpoweree3          121579      182367   488287642+  83  Linux

Command (m for help): n
Command action
   e   extended
   p   primary partition (1-4)
p
Selected partition 4
First cylinder (182368-261083, default 182368): 
Using default value 182368
Last cylinder or +size or +sizeM or +sizeK (182368-261083, default 261083): 
Using default value 261083

Command (m for help): p

Disk /dev/emcpoweree: 2147.4 GB, 2147483648000 bytes
255 heads, 63 sectors/track, 261083 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

          Device Boot      Start         End      Blocks   Id  System
/dev/emcpoweree1               1       60789   488287611   83  Linux
/dev/emcpoweree2           60790      121578   488287642+  83  Linux
/dev/emcpoweree3          121579      182367   488287642+  83  Linux
/dev/emcpoweree4          182368      261083   632286270   83  Linux

Command (m for help): w
The partition table has been altered!

Calling ioctl() to re-read partition table.
Syncing disks.

 
    Partitioning of the disk is now complete: four primary partitions, three of 500G each, with the fourth taking the remaining space.
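
To see what these partitions amount to in the units ASM reports, the sketch below converts the Blocks column of the final fdisk table (1 KiB blocks) into MiB and GiB. The values are copied from the output above; the trailing "+" that fdisk prints for an odd sector count is dropped.

```python
# Blocks column of the final partition table above, in 1 KiB blocks.
PARTITION_BLOCKS = {
    "/dev/emcpoweree1": 488287611,
    "/dev/emcpoweree2": 488287642,
    "/dev/emcpoweree3": 488287642,
    "/dev/emcpoweree4": 632286270,
}


def blocks_to_mib(blocks_1k):
    """Whole MiB contained in a count of 1 KiB blocks."""
    return blocks_1k // 1024


def blocks_to_gib(blocks_1k):
    """Size in GiB of a count of 1 KiB blocks."""
    return blocks_1k / (1024 * 1024)


for device, blocks in PARTITION_BLOCKS.items():
    print(f"{device}: {blocks_to_mib(blocks)} MiB ({blocks_to_gib(blocks):.1f} GiB)")

total_mib = sum(blocks_to_mib(b) for b in PARTITION_BLOCKS.values())
print("Total:", total_mib, "MiB")
```

Each "+500G" partition works out to about 500 GB in decimal units, or roughly 465.7 GiB, and the four partitions together provide 2047996 MiB.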

List the ASM disks:


[root@gzxkdb1 ~]# service oracleasm listdisks
ARCH
DATA1
DATA2
VOTE_CRS1
VOTE_CRS2
VOTE_CRS3
Create the ASM disks:
[root@gzxkdb1 ~]# service oracleasm createdisk DATA3 /dev/emcpoweree1
Marking disk "DATA3" as an ASM disk:                       [  OK  ]
[root@gzxkdb1 ~]# service oracleasm createdisk DATA4 /dev/emcpoweree2
Marking disk "DATA4" as an ASM disk:                       [  OK  ]
[root@gzxkdb1 ~]# service oracleasm createdisk DATA5 /dev/emcpoweree3
Marking disk "DATA5" as an ASM disk:                       [  OK  ]
[root@gzxkdb1 ~]# service oracleasm createdisk DATA6 /dev/emcpoweree4
Marking disk "DATA6" as an ASM disk:                       [  OK  ]

On the other node, scan for the newly added disks (the scan is run on node 2):
[root@gzxkdb2 ~]# service oracleasm scandisks
[root@gzxkdb2 ~]# service oracleasm listdisks
ARCH
DATA4
DATA1
DATA2
DATA3
DATA5
DATA6
VOTE_CRS1
VOTE_CRS2
VOTE_CRS3 
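
Before adding the disks to a disk group in a RAC environment, it is worth confirming that every node sees all of them. The sketch below compares the expected disk labels (the original six plus the four just created) with the node 2 listing above. The helper name `missing_disks` is hypothetical; the lists are copied from the session output.

```python
NODE1_BEFORE = ["ARCH", "DATA1", "DATA2", "VOTE_CRS1", "VOTE_CRS2", "VOTE_CRS3"]
NEW_DISKS = ["DATA3", "DATA4", "DATA5", "DATA6"]
NODE2_AFTER_SCAN = [
    "ARCH", "DATA4", "DATA1", "DATA2", "DATA3", "DATA5", "DATA6",
    "VOTE_CRS1", "VOTE_CRS2", "VOTE_CRS3",
]


def missing_disks(expected, visible):
    """Return the expected disk labels that a node does not list, sorted."""
    return sorted(set(expected) - set(visible))


expected = NODE1_BEFORE + NEW_DISKS
print("Missing on node 2:", missing_disks(expected, NODE2_AFTER_SCAN))
```

An empty result means node 2 can see every disk; the order in which listdisks prints the labels does not matter.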
 
On node 1, connect to the ASM instance with the SYSASM privilege:
[grid@gzxkdb1 ~]$ sqlplus '/as sysasm'

SQL*Plus: Release 11.2.0.3.0 Production on Mon Aug 3 17:48:58 2015

Copyright (c) 1982, 2011, Oracle.  All rights reserved.


Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
With the Real Application Clusters and Automatic Storage Management options

Check the ASM disks:
SQL> set linesize 200
SQL> set pagesize 200
SQL> col NAME for a30
SQL> col PATH for a50
SQL> r
  1* select name,path,mode_status,state,disk_number,failgroup from v$asm_disk

NAME                           PATH                                               MODE_ST STATE    DISK_NUMBER FAILGROUP
------------------------------ -------------------------------------------------- ------- -------- ----------- ------------------------------
                               /dev/oracleasm/disks/DATA6                         ONLINE  NORMAL             0
                               /dev/oracleasm/disks/DATA5                         ONLINE  NORMAL             1
                               /dev/oracleasm/disks/DATA4                         ONLINE  NORMAL             2
                               /dev/oracleasm/disks/DATA3                         ONLINE  NORMAL             3
ARCH_0000                      /dev/oracleasm/disks/ARCH                          ONLINE  NORMAL             0 ARCH_0000
CRS_0002                       /dev/oracleasm/disks/VOTE_CRS3                     ONLINE  NORMAL             2 CRS_0002
CRS_0001                       /dev/oracleasm/disks/VOTE_CRS2                     ONLINE  NORMAL             1 CRS_0001
DATA_0001                      /dev/oracleasm/disks/DATA2                         ONLINE  NORMAL             1 DATA_0001
DATA_0000                      /dev/oracleasm/disks/DATA1                         ONLINE  NORMAL             0 DATA_0000
CRS_0000                       /dev/oracleasm/disks/VOTE_CRS1                     ONLINE  NORMAL             0 CRS_0000

10 rows selected.


Add the new disks to the DATA disk group:
SQL> alter diskgroup DATA add disk '/dev/oracleasm/disks/DATA3' rebalance power 10;

Diskgroup altered.

SQL> alter diskgroup DATA add disk '/dev/oracleasm/disks/DATA4' rebalance power 10;

Diskgroup altered.

SQL> alter diskgroup DATA add disk '/dev/oracleasm/disks/DATA5' rebalance power 10;

Diskgroup altered.

SQL> alter diskgroup DATA add disk '/dev/oracleasm/disks/DATA6' rebalance power 10;

Diskgroup altered.
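
The four statements above each request a rebalance. ALTER DISKGROUP also accepts several disks in one ADD DISK clause, so the data is rebalanced once rather than restarted after every addition. The sketch below builds such a combined statement as text; `add_disks_statement` is a hypothetical helper shown as an alternative, not the method used in this case.

```python
def add_disks_statement(diskgroup, disk_paths, power=10):
    """Build one ALTER DISKGROUP statement that adds all disks at once."""
    if not disk_paths:
        raise ValueError("at least one disk path is required")
    if power < 0:
        raise ValueError("power must not be negative")
    quoted = ", ".join("'" + path.replace("'", "''") + "'" for path in disk_paths)
    return f"ALTER DISKGROUP {diskgroup} ADD DISK {quoted} REBALANCE POWER {power};"


new_disks = [f"/dev/oracleasm/disks/DATA{n}" for n in range(3, 7)]
print(add_disks_statement("DATA", new_disks))
```

The generated text would be run in SQL*Plus on the ASM instance in place of the four separate commands.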

 

SQL> select * from v$asm_operation;

GROUP_NUMBER OPERA STAT      POWER     ACTUAL      SOFAR   EST_WORK   EST_RATE EST_MINUTES ERROR_CODE
------------ ----- ---- ---------- ---------- ---------- ---------- ---------- ----------- --------------------------------------------
           3 REBAL RUN          10         10      59949     634963       5143         111

When a query against v$asm_operation returns no rows, the automatic rebalance has finished.


SQL> select * from v$asm_operation;
no rows selected
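
The EST_MINUTES value in the v$asm_operation row shown earlier can be checked by hand. The sketch below assumes the estimate is simply the remaining allocation units divided by the rate per minute, rounded down; that assumption reproduces the 111 minutes in the output, using the SOFAR, EST_WORK and EST_RATE values from that row.

```python
def estimated_minutes(sofar, est_work, est_rate):
    """Remaining rebalance time in whole minutes.

    sofar:    allocation units moved so far
    est_work: allocation units that have to be moved in total
    est_rate: allocation units moved per minute
    """
    if est_rate <= 0:
        raise ValueError("est_rate must be positive")
    return (est_work - sofar) // est_rate


def percent_done(sofar, est_work):
    """Share of the estimated work already completed, in percent."""
    return 100.0 * sofar / est_work


print(estimated_minutes(59949, 634963, 5143), "minutes remaining")
print(f"{percent_done(59949, 634963):.1f}% done")
```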


           
Check the disk group space again:
ASMCMD> lsdg
State    Type    Rebal  Sector  Block       AU  Total_MB  Free_MB  Req_mir_free_MB  Usable_file_MB  Offline_disks  Voting_files  Name
MOUNTED  EXTERN  N         512   4096  1048576    860159   404170                0          404170              0             N  ARCH/
MOUNTED  NORMAL  N         512   4096  1048576     30717    29791            10239            9776              0             Y  CRS/
MOUNTED  EXTERN  Y         512   4096  1048576   3686390  2097561                0         2097561              0             N  DATA/          
DATA has been expanded and now has about 2T of free space, enough to meet business needs for some time.
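
As a sanity check on the expansion, the arithmetic below compares the growth of the DATA disk group with the size of the disk that was added. All figures come from the lsdg and fdisk output earlier in this record.

```python
TOTAL_BEFORE_MB, FREE_BEFORE_MB = 1638394, 49590
TOTAL_AFTER_MB, FREE_AFTER_MB = 3686390, 2097561
DISK_BYTES = 2147483648000  # size of /dev/emcpoweree reported by fdisk -l

added_mb = TOTAL_AFTER_MB - TOTAL_BEFORE_MB
disk_mb = DISK_BYTES // (1024 * 1024)
free_pct_after = 100 * FREE_AFTER_MB / TOTAL_AFTER_MB

print(f"Capacity added to DATA: {added_mb} MB")
print(f"Raw size of the new disk: {disk_mb} MB")
print(f"Free space after expansion: {free_pct_after:.1f}%")
```

The disk group grew by 2047996 MB, within a few MB of the 2048000 MB raw size of the device, and DATA went from about 3% free to roughly 57% free.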


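V. Final feedback

Message left for the engineer:
    Regarding yesterday's "ASM disk group out of space" problem in Guizhou: I later found about 2T of unused space on the disk array and have added it to ASM.
This should cover disk space needs for some time.

    The on-site engineer expressed his thanks.

 

This completes the record of this task.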

posted on 2016-11-14 22:33 by 张冲andy
