笔记287 数据库系统异常排查之 DMV 2013-5-23
笔记287 数据库系统异常排查之 DMV 2013-5-23
1 --数据库系统异常排查之 DMV 2013-5-23 2 3 -- 一 从数据库连接情况来判断异常: 4 -- 5 --1. 首先我们来看一下目前数据库系统所有请求情况 6 --request info 7 select s. session_id , s . status, db_name (r . database_id) as database_name , 8 s .login_name , s. login_time , s . host_name, 9 c .client_net_address , c. client_tcp_port ,s . program_name, 10 r .cpu_time , r. reads , r . writes, c .num_reads , c. num_writes , 11 s .client_interface_name , 12 s .last_request_start_time , s. last_request_end_time , 13 c .connect_time , c. net_transport , c . net_packet_size, 14 r .start_time , r. status , r . command, 15 r .blocking_session_id , r. wait_type , 16 r .wait_time , r. last_wait_type , r . wait_resource, r. open_transaction_count, 17 r .percent_complete , r. granted_query_memory 18 from Sys.dm_exec_requests r with( nolock ) 19 right outer join Sys.dm_exec_sessions s with ( nolock) 20 on r. session_id = s . session_id 21 right outer join Sys.dm_exec_connections c with ( nolock) 22 on s. session_id = c . session_id 23 where s. session_id >50 24 order by s .session_id 25 26 -- 这个查询将目前数据库中的所有请求都显示出来了,其中比较重要的有 Status 、Login_name 、 Host_Name, 27 --Client_Net_Address 、 Program_name等,但是信息比较多,我们很难查看有什么异常, 28 -- 初步判断连接数是否超过了平时的标准,很这个查询返回的记录数 29 -- (很多时候系统异常是连接数过多造成的,而连接数过多又是因为其他原因影响的)。 30 31 ---------------------------------------------- 32 --2. 哪个用户连接数最多: 33 34 --request info by user 35 select login_name, COUNT (0 ) user_count 36 from Sys.dm_exec_requests r with( nolock ) 37 right outer join Sys.dm_exec_sessions s with ( nolock) 38 on r. session_id = s . session_id 39 right outer join Sys.dm_exec_connections c with ( nolock) 40 on s. session_id = c . session_id 41 where s. session_id >50 42 group by login_name 43 order by 2 DESC 44 45 -- 从图中我们可以很方便的看出用户连接数情况, 46 -- 如果我们的不同的功能是使用不同的的数据库账号的话, 47 -- 就能初步判断是哪部分功能可能出现了异常 48 49 ---------------------------------------------------- 50 --3. 哪台机器发起到数据库的连接数最多 51 --request info by hostname 52 select s. host_name ,c . client_net_address, COUNT (0 ) host_count 53 from Sys.dm_exec_requests r with( nolock ) 54 right outer join Sys.dm_exec_sessions s with ( nolock) 55 on r. session_id = s . session_id 56 right outer join Sys.dm_exec_connections c with ( nolock) 57 on s. session_id = c . session_id 58 where s. session_id >50 59 group by host_name, client_net_address 60 order by 3 DESC 61 62 -- 这个查询能够一下就帮我们找出来哪些机器发起了对数据库的链接,它们的链接数量是否有异常;这个其实对调查某些问题非常有用,我有一次就遇 63 -- 64 -- 到一个case : 65 -- 66 -- 用户反映,过一两个星期,系统就会出现一次异常,出问题时数据库连接数量很高,大量的访问被数据库拒绝,过半个小时左右,系统又自动恢复了,但是 67 -- 68 -- 在数据库里面查看,并没有发现有异常的进程和错误的信息,问题一时很棘手,很难定位,系统不稳定领导不满, DBA 顶着压力一时不知道如何是好;后面 69 -- 70 -- 转换方向,通过调查问题发生时,为什么会产生这么多连接,这些连接是那些机器发过来的,这些连接发过来正常吗,是数据库不砍业务的重负,还是业务 71 -- 72 -- 在某个时间段内会出现暴涨等一系列原因,最终找出是一台 Web 因为开发人员代码写的有问题,内存出现内存泄露,导致大量的连接不能释放,出问题是, 73 -- 74 -- 发出的数据库连接数比平时高倍,最终影响到了数据库,问题压根和数据库没关系(从这个事实看出, DBA 真是的炮灰角色,不是自己的问题,也得顶 75 -- 76 -- 着压力调查出原因呀);如果在类似问题发生时,我们能通过这个查询及早知道问题是出在某台 Web 机器上,那就不用费尽心力来调查数据库了。 77 78 79 --------------------------------------------------------------------------------------------- 80 --4. 哪些连接在访问哪个库 81 --request info by databases 82 select db_name ( r. database_id ) as database_name, COUNT (0 ) host_count 83 from Sys.dm_exec_requests r with( nolock ) 84 right outer join Sys.dm_exec_sessions s with ( nolock) 85 on r. session_id = s . session_id 86 right outer join Sys.dm_exec_connections c with ( nolock) 87 on s. session_id = c . session_id 88 where s. session_id >50 89 group by r .database_id 90 order by 2 DESC 91 92 -- 结果(为NULL 的估计是没办法定位库) 93 94 ------------------------------------------------------------------------------------- 95 --5. 进程状态: 96 --request info by status 97 select s. status ,COUNT ( 0) host_count 98 from Sys.dm_exec_requests r with( nolock ) 99 right outer join Sys.dm_exec_sessions s with ( nolock) 100 on r. session_id = s . session_id 101 right outer join Sys.dm_exec_connections c with ( nolock) 102 on s. session_id = c . session_id 103 where s. session_id >50 104 group by s .status 105 order by 2 DESC 106 107 -- 结果(running 数比较多,表示数据库压力比较大): 108 109 -------------------------------------------------------------------------------- 110 -- 判断阻塞 111 --1. 查看数据库阻塞情况: 112 ----------------------------------------Blocked Info---------------------------------- 113 -- 记录当前阻塞信息 114 select t1. resource_type as [lock type] , db_name( resource_database_id ) as [database] 115 , t1. resource_associated_entity_id as [blk object] 116 , t1. request_mode as [lock req] -- lock requested 117 , t1. request_session_id as [waiter sid] -- spid of waiter 118 , t2. wait_duration_ms as [wait time] 119 ,( select text from sys.dm_exec_requests as r with ( nolock) --- get sql for waiter 120 cross apply sys.dm_exec_sql_text (r . sql_handle) 121 where r. session_id = t1 . request_session_id) as waiter_batch 122 ,( select substring (qt . text, r .statement_start_offset / 2, 123 ( case when r. statement_end_offset = - 1 then len( convert (nvarchar ( max), qt. text)) * 2 124 else r. statement_end_offset end - r. statement_start_offset )/2 + 1) 125 from sys.dm_exec_requests as r with (nolock ) 126 cross apply sys.dm_exec_sql_text (r . sql_handle) as qt 127 where r. session_id = t1 . request_session_id) as waiter_stmt --- statement executing now 128 , t2. blocking_session_id as [blocker sid] --- spid of blocker 129 ,( select text from sys.sysprocesses as p with ( nolock) --- get sql for blocker 130 cross apply sys.dm_exec_sql_text (p . sql_handle) 131 where p. spid = t2 . blocking_session_id) as blocker_stmt , getdate() time 132 from sys.dm_tran_locks as t1 with (nolock ) , sys.dm_os_waiting_tasks as t2 with (nolock ) 133 where t1. lock_owner_address = t2 . resource_address 134 135 -------------------------------------------------------------------- 136 --2. 查看阻塞其他进程的进程(阻塞源头): 137 select t2. blocking_session_id ,COUNT ( 0) counts 138 from sys.dm_tran_locks as t1 with (nolock ) , sys.dm_os_waiting_tasks as t2 with (nolock ) 139 where t1. lock_owner_address = t2 . resource_address 140 group by blocking_session_id 141 order by 2 142 143 144 145 146 ----------------------------------------------------------------------- 147 --3. 被阻塞时间最长的进程: 148 -- 被阻塞时间最长的 session 149 select top 10 t1 .resource_type as [lock type] , db_name( resource_database_id ) as [database] 150 , t1. resource_associated_entity_id as [blk object] 151 , t1. request_mode as [lock req] -- lock requested 152 , t1. request_session_id as [waiter sid] -- spid of waiter 153 , t2. wait_duration_ms as [wait time] 154 ,( select text from sys.dm_exec_requests as r with ( nolock) --- get sql for waiter 155 cross apply sys.dm_exec_sql_text (r . sql_handle) 156 where r. session_id = t1 . request_session_id) as waiter_batch 157 ,( select substring (qt . text, r .statement_start_offset / 2, 158 ( case when r. statement_end_offset = - 1 then len( convert (nvarchar ( max), qt. text)) * 2 159 else r. statement_end_offset end - r. statement_start_offset )/2 + 1) 160 from sys.dm_exec_requests as r with (nolock ) 161 cross apply sys.dm_exec_sql_text (r . sql_handle) as qt 162 where r. session_id = t1 . request_session_id) as waiter_stmt --- statement executing now 163 , t2. blocking_session_id as [blocker sid] --- spid of blocker 164 ,( select text from sys.sysprocesses as p with ( nolock) --- get sql for blocker 165 cross apply sys.dm_exec_sql_text (p . sql_handle) 166 where p. spid = t2 . blocking_session_id) as blocker_stmt , getdate() time 167 from sys.dm_tran_locks as t1 with (nolock ) , sys.dm_os_waiting_tasks as t2 with (nolock ) 168 where t1. lock_owner_address = t2 . resource_address 169 order by t2 .wait_duration_ms desc