人人都是 DBA(XI)I/O 信息收集脚本汇编
什么?有个 SQL 执行了 8 秒!
哪里出了问题?臣妾不知道啊,得找 DBA 啊。
DBA 人呢?离职了!!擦!!!
程序员在无处寻求帮助时,就得想办法自救,努力让自己变成 "伪 DBA"。
索引
- 数据文件和日志文件位置和大小
- 查看指定数据库文件的大小和可用空间
- 服务器 Disk 容量和挂载信息
- 查看 Disk 剩余空间
- 查询数据库设置的 Recovery Model
- 查看最近的 Full Backup 信息
- 获取所有数据库的 VLF 数量
- SQL Server 的错误日志位置
- 查询近期的 Error Log 信息
- 在错误日志中查询 I/O 超过 15s 的请求
- 查询 Disk 的性能指标
- 查看哪个数据库文件 I/O 瓶颈最严重
- 按照 Write I/O 进行排名
- 获取数据库的 I/O 使用率
- 查看指定数据库文件的 I/O 状况
- 找出 I/O 平均使用最多的语句
- 查询正在等待 I/O 的请求等待时间
数据文件和日志文件位置和大小
SELECT DB_NAME([database_id]) AS [Database Name] ,[file_id] ,[name] ,physical_name ,type_desc ,state_desc ,is_percent_growth ,growth ,CONVERT(BIGINT, growth / 128.0) AS [Growth in MB] ,CONVERT(BIGINT, size / 128.0) AS [Total Size in MB] FROM sys.master_files WITH (NOLOCK) WHERE [database_id] > 4 AND [database_id] <> 32767 OR [database_id] = 2 ORDER BY DB_NAME([database_id]) OPTION (RECOMPILE);
通常会查看:
- 数据文件是否与日志文件放到了不同的磁盘上?
- 难道都装到了 C 盘上?
- tempdb 是否指定了独立的磁盘?
- 有几个 tempdb 文件?
- 这些 tempdb 都多大了?
- 数据库是否包含多个文件?
- 数据文件的增长步长合适吗?
查看指定数据库文件的大小和可用空间
需要指定数据库,或使用 use 指定,例如 use TEST。
SELECT f.[name] AS [File Name] ,f.physical_name AS [Physical Name] ,CAST((f.size / 128.0) AS DECIMAL(15, 2)) AS [Total Size in MB] ,CAST(f.size / 128.0 - CAST(FILEPROPERTY(f.[name], 'SpaceUsed') AS INT) / 128.0 AS DECIMAL(15, 2)) AS [Available Space In MB] ,[file_id] ,fg.[name] AS [Filegroup Name] FROM sys.database_files AS f WITH (NOLOCK) LEFT OUTER JOIN sys.data_spaces AS fg WITH (NOLOCK) ON f.data_space_id = fg.data_space_id OPTION (RECOMPILE);
服务器 Disk 容量和挂载信息
SELECT DISTINCT vs.volume_mount_point ,vs.file_system_type ,vs.logical_volume_name ,CONVERT(DECIMAL(18, 2), vs.total_bytes / 1073741824.0) AS [Total Size (GB)] ,CONVERT(DECIMAL(18, 2), vs.available_bytes / 1073741824.0) AS [Available Size (GB)] ,CAST(CAST(vs.available_bytes AS FLOAT) / CAST(vs.total_bytes AS FLOAT) AS DECIMAL(18, 2)) * 100 AS [Space Free %] FROM sys.master_files AS f WITH (NOLOCK) CROSS APPLY sys.dm_os_volume_stats(f.database_id, f.[file_id]) AS vs OPTION (RECOMPILE);
SELECT db_name(vs.database_id) AS DatabaseName ,vs.file_id ,vs.volume_mount_point ,vs.volume_id ,vs.logical_volume_name ,vs.file_system_type ,(vs.total_bytes / 1024 / 1024 / 1024) AS [TotalSize(GB)] ,(vs.available_bytes / 1024 / 1024 / 1024) AS [AvailableSize(GB)] ,vs.supports_compression ,vs.supports_alternate_streams ,vs.supports_sparse_files ,vs.is_read_only ,vs.is_compressed FROM sys.master_files mf CROSS APPLY sys.dm_os_volume_stats(mf.database_id, mf.file_id) vs;
查看 Disk 剩余空间
EXEC master.dbo.xp_fixeddrives
SELECT DISTINCT SUBSTRING(volume_mount_point, 1, 1) AS Volume_mount_point ,total_bytes / 1024 / 1024 AS Total_MB ,available_bytes / 1024 / 1024 AS Available_MB FROM sys.master_files AS f CROSS APPLY sys.dm_os_volume_stats(f.database_id, f.file_id);
查询数据库设置的 Recovery Model
SELECT db.[name] AS [Database Name] ,db.recovery_model_desc AS [Recovery Model] ,db.state_desc ,db.log_reuse_wait_desc AS [Log Reuse Wait Description] ,CONVERT(DECIMAL(18, 2), ls.cntr_value / 1024.0) AS [Log Size (MB)] ,CONVERT(DECIMAL(18, 2), lu.cntr_value / 1024.0) AS [Log Used (MB)] ,CAST(CAST(lu.cntr_value AS FLOAT) / CAST(ls.cntr_value AS FLOAT) AS DECIMAL(18, 2)) * 100 AS [Log Used %] ,db.[compatibility_level] AS [DB Compatibility Level] ,db.page_verify_option_desc AS [Page Verify Option] ,db.is_auto_create_stats_on ,db.is_auto_update_stats_on ,db.is_auto_update_stats_async_on ,db.is_parameterization_forced ,db.snapshot_isolation_state_desc ,db.is_read_committed_snapshot_on ,db.is_auto_close_on ,db.is_auto_shrink_on ,db.target_recovery_time_in_seconds ,db.is_cdc_enabled FROM sys.databases AS db WITH (NOLOCK) INNER JOIN sys.dm_os_performance_counters AS lu WITH (NOLOCK) ON db.NAME = lu.instance_name INNER JOIN sys.dm_os_performance_counters AS ls WITH (NOLOCK) ON db.NAME = ls.instance_name WHERE lu.counter_name LIKE N'Log File(s) Used Size (KB)%' AND ls.counter_name LIKE N'Log File(s) Size (KB)%' AND ls.cntr_value > 0 OPTION (RECOMPILE);
通常会关注:
- 数据库实例(Instance)上建立了多少个数据库?
- 它们都分别使用了什么恢复模型(Recovery Model)?
- Log 重用是如何设置的?
- 事务日志(Transaction Log)现在多大了?
- 兼容等级(Compatibility Level)是怎么配置的?
- 页验证选项(Page Verify Option)的设置是什么?通常为 CHECKSUM。
- 是否设置了 Auto Update Statistics Asynchronously 选项?
- 确保未开启 auto_shrink 和 auto_close 选项。
查看最近的 Full Backup 信息
SELECT TOP (30) bs.machine_name ,bs.server_name ,bs.database_name AS [Database Name] ,bs.recovery_model ,CONVERT(BIGINT, bs.backup_size / 1048576) AS [Uncompressed Backup Size (MB)] ,CONVERT(BIGINT, bs.compressed_backup_size / 1048576) AS [Compressed Backup Size (MB)] ,CONVERT(NUMERIC(20, 2), (CONVERT(FLOAT, bs.backup_size) / CONVERT(FLOAT, bs.compressed_backup_size))) AS [Compression Ratio] ,DATEDIFF(SECOND, bs.backup_start_date, bs.backup_finish_date) AS [Backup Elapsed Time (sec)] ,bs.backup_finish_date AS [Backup Finish Date] FROM msdb.dbo.backupset AS bs WITH (NOLOCK) WHERE DATEDIFF(SECOND, bs.backup_start_date, bs.backup_finish_date) > 0 AND bs.backup_size > 0 AND bs.type = 'D' -- Change to L if you want Log backups AND database_name = DB_NAME(DB_ID()) ORDER BY bs.backup_finish_date DESC OPTION (RECOMPILE);
获取所有数据库的 VLF 数量
VLF :Virtual Log File
SQL Server 将日志文件 LDF 划分为多个 VLF 以提高日志处理效率。VLF 的数量和大小不能通过配置指定,SQL Server 会按情况自行判断处理,而如果生成了过多的 VLF 则会产生性能问题。通过指定合理的日志文件初始大小和增长步长,可以有效的防止过多 VLF 的产生。
- 1M-64M 4
- 64M-1GB 8
- >1GB 16
CREATE TABLE #VLFInfo ( RecoveryUnitID INT ,FileID INT ,FileSize BIGINT ,StartOffset BIGINT ,FSeqNo BIGINT ,[Status] BIGINT ,Parity BIGINT ,CreateLSN NUMERIC(38) ); CREATE TABLE #VLFCountResults ( DatabaseName SYSNAME ,VLFCount INT ); EXEC sp_MSforeachdb N'Use [?]; INSERT INTO #VLFInfo EXEC sp_executesql N''DBCC LOGINFO([?])''; INSERT INTO #VLFCountResults SELECT DB_NAME(), COUNT(*) FROM #VLFInfo; TRUNCATE TABLE #VLFInfo;' SELECT DatabaseName ,VLFCount FROM #VLFCountResults ORDER BY VLFCount DESC; DROP TABLE #VLFInfo; DROP TABLE #VLFCountResults;
较高的 VLF 数量会影响写入性能,并且会使数据库的恢复过程变慢,通常需要保持 VLF Counts 在 200 以下。
SQL Server 的错误日志位置
SELECT is_enabled ,[path] ,max_size ,max_files FROM sys.dm_os_server_diagnostics_log_configurations WITH (NOLOCK) OPTION (RECOMPILE);
查询近期的 Error Log 信息
DECLARE @Time_Start DATETIME; DECLARE @Time_End DATETIME; SET @Time_Start = getdate() - 2; SET @Time_End = getdate(); -- Create the temporary table CREATE TABLE #ErrorLog ( logdate DATETIME ,processinfo VARCHAR(255) ,Message VARCHAR(500) ) -- Populate the temporary table INSERT #ErrorLog ( logdate ,processinfo ,Message ) EXEC master.dbo.xp_readerrorlog 0 ,1 ,NULL ,NULL ,@Time_Start ,@Time_End ,N'desc'; -- Filter the temporary table SELECT LogDate ,Message FROM #ErrorLog WHERE ( Message LIKE '%error%' OR Message LIKE '%failed%' ) AND processinfo NOT LIKE 'logon' ORDER BY logdate DESC -- Drop the temporary table DROP TABLE #ErrorLog
在错误日志中查询 I/O 超过 15s 的请求
CREATE TABLE #IOWarningResults ( LogDate DATETIME ,ProcessInfo SYSNAME ,LogText NVARCHAR(1000) ); INSERT INTO #IOWarningResults EXEC xp_readerrorlog 0 ,1 ,N'taking longer than 15 seconds'; INSERT INTO #IOWarningResults EXEC xp_readerrorlog 1 ,1 ,N'taking longer than 15 seconds'; INSERT INTO #IOWarningResults EXEC xp_readerrorlog 2 ,1 ,N'taking longer than 15 seconds'; INSERT INTO #IOWarningResults EXEC xp_readerrorlog 3 ,1 ,N'taking longer than 15 seconds'; INSERT INTO #IOWarningResults EXEC xp_readerrorlog 4 ,1 ,N'taking longer than 15 seconds'; SELECT LogDate ,ProcessInfo ,LogText FROM #IOWarningResults ORDER BY LogDate DESC; DROP TABLE #IOWarningResults;
如果能够查询出结果,可以说明 I/O 性能存在问题,但是哪里引起的还需进一步探索。
查询 Disk 的性能指标
SELECT [Drive] ,CASE WHEN num_of_reads = 0 THEN 0 ELSE (io_stall_read_ms / num_of_reads) END AS [Read Latency (ms)] ,CASE WHEN io_stall_write_ms = 0 THEN 0 ELSE (io_stall_write_ms / num_of_writes) END AS [Write Latency (ms)] ,CASE WHEN ( num_of_reads = 0 AND num_of_writes = 0 ) THEN 0 ELSE (io_stall / (num_of_reads + num_of_writes)) END AS [Overall Latency (ms)] ,CASE WHEN num_of_reads = 0 THEN 0 ELSE (num_of_bytes_read / num_of_reads) END AS [Avg Bytes/Read] ,CASE WHEN io_stall_write_ms = 0 THEN 0 ELSE (num_of_bytes_written / num_of_writes) END AS [Avg Bytes/Write] ,CASE WHEN ( num_of_reads = 0 AND num_of_writes = 0 ) THEN 0 ELSE ((num_of_bytes_read + num_of_bytes_written) / (num_of_reads + num_of_writes)) END AS [Avg Bytes/Transfer] FROM ( SELECT LEFT(UPPER(mf.physical_name), 2) AS Drive ,SUM(num_of_reads) AS num_of_reads ,SUM(io_stall_read_ms) AS io_stall_read_ms ,SUM(num_of_writes) AS num_of_writes ,SUM(io_stall_write_ms) AS io_stall_write_ms ,SUM(num_of_bytes_read) AS num_of_bytes_read ,SUM(num_of_bytes_written) AS num_of_bytes_written ,SUM(io_stall) AS io_stall FROM sys.dm_io_virtual_file_stats(NULL, NULL) AS vfs INNER JOIN sys.master_files AS mf WITH (NOLOCK) ON vfs.database_id = mf.database_id AND vfs.file_id = mf.file_id GROUP BY LEFT(UPPER(mf.physical_name), 2) ) AS tab ORDER BY [Overall Latency (ms)] OPTION (RECOMPILE);
通常 Latency 的值大于 20-25 ms 时可考虑有性能问题。
查看哪个数据库文件 I/O 瓶颈最严重
SELECT DB_NAME(fs.database_id) AS [Database Name] ,CAST(fs.io_stall_read_ms / (1.0 + fs.num_of_reads) AS NUMERIC(10, 1)) AS [avg_read_stall_ms] ,CAST(fs.io_stall_write_ms / (1.0 + fs.num_of_writes) AS NUMERIC(10, 1)) AS [avg_write_stall_ms] ,CAST((fs.io_stall_read_ms + fs.io_stall_write_ms) / (1.0 + fs.num_of_reads + fs.num_of_writes) AS NUMERIC(10, 1)) AS [avg_io_stall_ms] ,CONVERT(DECIMAL(18, 2), mf.size / 128.0) AS [File Size (MB)] ,mf.physical_name ,mf.type_desc ,fs.io_stall_read_ms ,fs.num_of_reads ,fs.io_stall_write_ms ,fs.num_of_writes ,fs.io_stall_read_ms + fs.io_stall_write_ms AS [io_stalls] ,fs.num_of_reads + fs.num_of_writes AS [total_io] FROM sys.dm_io_virtual_file_stats(NULL, NULL) AS fs INNER JOIN sys.master_files AS mf WITH (NOLOCK) ON fs.database_id = mf.database_id AND fs.[file_id] = mf.[file_id] ORDER BY avg_io_stall_ms DESC OPTION (RECOMPILE);
考虑将数据库文件移动到不同的磁盘上,或更快的磁盘阵列上以改进性能。
按照 Write I/O 进行排名
SELECT [ReadLatency] = CASE WHEN [num_of_reads] = 0 THEN 0 ELSE ([io_stall_read_ms] / [num_of_reads]) END ,[WriteLatency] = CASE WHEN [num_of_writes] = 0 THEN 0 ELSE ([io_stall_write_ms] / [num_of_writes]) END ,[Latency] = CASE WHEN ( [num_of_reads] = 0 AND [num_of_writes] = 0 ) THEN 0 ELSE ([io_stall] / ([num_of_reads] + [num_of_writes])) END ,[AvgBytesPerRead] = CASE WHEN [num_of_reads] = 0 THEN 0 ELSE ([num_of_bytes_read] / [num_of_reads]) END ,[AvgBytesPerWrite] = CASE WHEN [num_of_writes] = 0 THEN 0 ELSE ([num_of_bytes_written] / [num_of_writes]) END ,[AvgBytesPerTransfer] = CASE WHEN ( [num_of_reads] = 0 AND [num_of_writes] = 0 ) THEN 0 ELSE (([num_of_bytes_read] + [num_of_bytes_written]) / ([num_of_reads] + [num_of_writes])) END ,LEFT([mf].[physical_name], 2) AS [Drive] ,DB_NAME([vfs].[database_id]) AS [DB] ,[mf].[physical_name] ,[mf].file_id FROM sys.dm_io_virtual_file_stats(NULL, NULL) AS [vfs] JOIN sys.master_files AS [mf] ON [vfs].[database_id] = [mf].[database_id] AND [vfs].[file_id] = [mf].[file_id] ORDER BY [WriteLatency] DESC;
获取数据库的 I/O 使用率
WITH Aggregate_IO_Statistics AS ( SELECT DB_NAME(database_id) AS [Database Name] ,CAST(SUM(num_of_bytes_read + num_of_bytes_written) / 1048576 AS DECIMAL(12, 2)) AS io_in_mb FROM sys.dm_io_virtual_file_stats(NULL, NULL) AS [DM_IO_STATS] GROUP BY database_id ) SELECT ROW_NUMBER() OVER ( ORDER BY io_in_mb DESC ) AS [I/O Rank] ,[Database Name] ,io_in_mb AS [Total I/O (MB)] ,CAST(io_in_mb / SUM(io_in_mb) OVER () * 100.0 AS DECIMAL(5, 2)) AS [I/O Percent] FROM Aggregate_IO_Statistics ORDER BY [I/O Rank] OPTION (RECOMPILE);
查看指定数据库文件的 I/O 状况
需要指定数据库,或使用 use 指定,例如 use TEST。
SELECT DB_NAME(DB_ID()) AS [Database Name] ,df.[name] AS [Logical Name] ,vfs.[file_id] ,df.physical_name AS [Physical Name] ,vfs.num_of_reads ,vfs.num_of_writes ,vfs.io_stall_read_ms ,vfs.io_stall_write_ms ,CAST(100. * vfs.io_stall_read_ms / (vfs.io_stall_read_ms + vfs.io_stall_write_ms) AS DECIMAL(10, 1)) AS [IO Stall Reads Pct] ,CAST(100. * vfs.io_stall_write_ms / (vfs.io_stall_write_ms + vfs.io_stall_read_ms) AS DECIMAL(10, 1)) AS [IO Stall Writes Pct] ,(vfs.num_of_reads + vfs.num_of_writes) AS [Writes + Reads] ,CAST(vfs.num_of_bytes_read / 1048576.0 AS DECIMAL(10, 2)) AS [MB Read] ,CAST(vfs.num_of_bytes_written / 1048576.0 AS DECIMAL(10, 2)) AS [MB Written] ,CAST(100. * vfs.num_of_reads / (vfs.num_of_reads + vfs.num_of_writes) AS DECIMAL(10, 1)) AS [# Reads Pct] ,CAST(100. * vfs.num_of_writes / (vfs.num_of_reads + vfs.num_of_writes) AS DECIMAL(10, 1)) AS [# Write Pct] ,CAST(100. * vfs.num_of_bytes_read / (vfs.num_of_bytes_read + vfs.num_of_bytes_written) AS DECIMAL(10, 1)) AS [Read Bytes Pct] ,CAST(100. * vfs.num_of_bytes_written / (vfs.num_of_bytes_read + vfs.num_of_bytes_written) AS DECIMAL(10, 1)) AS [Written Bytes Pct] FROM sys.dm_io_virtual_file_stats(DB_ID(), NULL) AS vfs INNER JOIN sys.database_files AS df WITH (NOLOCK) ON vfs.[file_id] = df.[file_id] OPTION (RECOMPILE);
协助从 I/O 角度来观察数据库文件所承载的压力。
找出 I/O 平均使用最多的语句
SELECT TOP (50) OBJECT_NAME(qt.objectid, dbid) AS [SP Name] ,(qs.total_logical_reads + qs.total_logical_writes) / qs.execution_count AS [Avg IO] ,qs.execution_count AS [Execution Count] ,SUBSTRING(qt.[text], qs.statement_start_offset / 2 + 1, ( CASE WHEN qs.statement_end_offset = - 1 THEN LEN(CONVERT(NVARCHAR(max), qt.[text])) * 2 ELSE qs.statement_end_offset END - qs.statement_start_offset ) / 2) AS [Query Text] FROM sys.dm_exec_query_stats AS qs WITH (NOLOCK) CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS qt WHERE qt.[dbid] = DB_ID() ORDER BY [Avg IO] DESC OPTION (RECOMPILE);
参考资料:
查询正在等待 I/O 的请求等待时间
SELECT DB_NAME(database_id) AS [DBNAME] ,file_id ,io_stall ,io_pending_ms_ticks ,scheduler_address FROM sys.dm_io_virtual_file_stats(NULL, NULL) iovfs ,sys.dm_io_pending_io_requests AS iopior WHERE iovfs.file_handle = iopior.io_handle
《人人都是 DBA》系列文章索引:
本系列文章《人人都是 DBA》由 Dennis Gao 发表自博客园,未经作者本人同意禁止任何形式的转载,任何自动或人为的爬虫转载行为均为耍流氓。