读懂sqliosim的测试结果

重点看红字部分

 

Target IO Duration (ms)

 

 

A threshold--SQLIOSim tries to hammer the system yet tries to stay under that threshold

 

 

If the average I/O duration exceeds the target I/O duration, the SQLIOSim utility throttles the number of outstanding I/O operations to decrease the load and to improve I/O completion time.

 

 

Pasted from <http://support.microsoft.com/kb/231619>

 

 

 

 

 

Running Average IO Duration (ms)

 

 

Low values are good

 

 

[Bob Dorr] Characterized as "Response time for the stress test"

 

 

[Bob Dorr] Target IO is 100ms; since ideal I/O is 4-8ms, anything higher is not a preferred outcome

 

 

[JWM] Though strictly speaking this isn't "disk latency" (Avg. Disk sec/[Read|Write|Transfer], acceptable  values are similar:

 

 

Data files:  0-5ms or so are excellent

 

 

Log files:  0-2ms are acceptable

 

 

Warnings for I/Os exceeding 15s (à la SP4 diags)

 

 

Algorithm is designed to mitigate big swings, esp. because of one very large I/O (similar to SQL Server Checkpoint logic)

 

 

Used to control the number of outstanding I/O requests

 

 

 

 

 

IO request blocks

 

 

High numbers are good

 

 

The more requests a system can handle, the more I/Os are issued by the tool

 

 

Characterized as "Concurrent IOs"

 

 

Capped by MaxIO; not uncommon to have 100 or more, esp. for Enterprise Edition for which read-aheads & such are attempted

 

 

 

 

 

Accumulators:  Reads, Scatter Reads, Writes, Gather Writes

 

High numbers are good

 

 

Simple Accumulators

 

 

Each time an operation is posted, the category for the specific request type is incremented

 

 

 

 

 

Total IO Time (ms)

 

 

Desirable numbers are relative

 

 

Useful for comparing drives, especially on the same system. 

 

 

If the values for the Accumulators are similar for two drives, but one drive has significantly less Total IO Time,

 

 

then the latter is more performant because it was able to service the same I/O in less time.

 

 

Characterized as "Accumulated total duration time for the I/O in ms"

 

 

The same duration is used to track the "Running Average IO Duration (ms)"

 

 

Sum of accumulators * running average = ~Running Average IO Duration

 

 

 

 

 

Number of times IO throttled

 

 

Low numbers are better

 

 

Number of times a request was removed because the duration was exceeded

 

 

 

 

 

Additional Documentation

 

 

Additional information will be based on demand & will be added to KB article & Wiki

 

 

 

 

 

Architecture

 

 

SQLIOSim is built with lightweight UMS/SQLOS-like layer to better simulate SQL Server, incl. logical CPUs

 

 

Output

 

 

Output Example #1:

 

 

Data

 

 

********** Final Summary for file S:\sqliosim.mdx **********

 

 

Target IO Duration (ms) = 100, Running Average IO Duration (ms) = 17, Number of times IO throttled = 9006, IO request blocks = 33

 

 

Reads = 75414, Scatter Reads = 88286, Writes = 3295, Gather Writes = 54845, Total IO Time (ms) = 44936446

 

 

********** Final Summary for file T:\sqliosim.mdx **********

 

 

Target IO Duration (ms) = 100, Running Average IO Duration (ms) = 0, Number of times IO throttled = 4135, IO request blocks = 106

 

 

Reads = 67091, Scatter Reads = 81249, Writes = 2501, Gather Writes = 53652, Total IO Time (ms) = 13033268

 

 

Interpretation

 

 

[George Reynya] The running average of 17ms for S: does not look all that bad as a target as it means that over the life of the test you maintained 17ms response time for the stress test.  Your test shows that the number of concurrent IOs has dropped to 33 which is not so good.  Especially when you compare the same basic work load against the T: drive--0ms w/ 106 I/O requests.   I would look at the S drive IO path a bit to determine the differences.

 

 

[JWM] Note:  The trial lasted 300s.  The interpretation is consistent with physical layout:  Disks were SAS DAS 73GB 15K RAID 10; S: was comprised of 6 disks, T: of 8 disks.

 

 

[JWM] Note:  George's comment regarding 17ms vs. 33ms was relative; in fact, a neither value would not be acceptable to me—I’m shooting for <5ms for my data drives.

 

 

Output Example #2:

 

 

Data

 

 

Target IO Duration (ms) = 100, Running Average IO Duration (ms) = 121, Number of times IO throttled = 1167, IO request blocks = 13

 

 

Interpretation

 

 

[George Reynya]  Customer disk system exceeded Target IO Duration by 21ms for this file and had to throttle 1167 times in an attempt to keep the IO within the target.    With the number of IOs we do for a file this is a bit concerning.  We tried to throttle several times and they could not keep the average below the 100ms target.

 

 

Output Example #3:

 

 

Data

 

 

04/27/07 14:24:55 ********** Final Summary for file M:\sqliosim.mdx ********** 

 

 

04/27/07 14:24:55 Target IO Duration (ms) = 100, Running Average IO Duration (ms) = 15, Number of times IO throttled = 382, IO request blocks = 108

 

 

04/27/07 14:24:55 Reads = 187508, Scatter Reads = 259268, Writes = 15286, Gather Writes = 130498, Total IO Time (ms) = 5460993

 

 

04/27/07 14:24:55 DRIVE LEVEL: Sector size = 512, Cylinders = 163177, Media type = 12, Sectors per track = 63, Tracks per Cylinders = 255

 

 

04/27/07 14:24:55 DRIVE LEVEL: Read cache enabled = Yes, Write cache enabled = No

 

 

04/27/07 14:24:55 DRIVE LEVEL: Read count = 886653, Read time = 3333820, Write count = 616951, Write time = 6606954, Idle time = 170616, Bytes read = 74679347200, Bytes written = 74403336192, Split IO Count = 19, Storage number = 4, Storage manager name = FTDISK 

 

 

04/27/07 14:24:55 ********** Final Summary for file M:\sqliosim2.mdx **********

 

 

04/27/07 14:24:55 File Attributes: Compression = No, Encryption = No, Sparse = No

 

 

04/27/07 14:24:55 Target IO Duration (ms) = 100, Running Average IO Duration (ms) = 16, Number of times IO throttled = 762, IO request blocks = 44

 

 

04/27/07 14:24:55 Reads = 183061, Scatter Reads = 256624, Writes = 15038, Gather Writes = 132071, Total IO Time (ms) = 5890665

 

 

04/27/07 14:24:55 DRIVE LEVEL: Sector size = 512, Cylinders = 163177, Media type = 12, Sectors per track = 63, Tracks per Cylinders = 255

 

 

04/27/07 14:24:55 DRIVE LEVEL: Read cache enabled = Yes, Write cache enabled = No

 

 

04/27/07 14:24:55 DRIVE LEVEL: Read count = 886717, Read time = 3333859, Write count = 616967, Write time = 6606962, Idle time = 170630, Bytes read = 74679609344, Bytes written = 74403405824, Split IO Count = 19, Storage number = 4, Storage manager name = FTDISK 

 

 

04/27/07 14:24:55 ********** Final Summary for file M:\sqliosim.ldx **********

 

 

04/27/07 14:24:55 File Attributes: Compression = No, Encryption = No, Sparse = No

 

 

04/27/07 14:24:55 Target IO Duration (ms) = 100, Running Average IO Duration (ms) = 0, Number of times IO throttled = 0, IO request blocks = 8

 

 

04/27/07 14:24:55 Reads = 0, Scatter Reads = 0, Writes = 316141, Gather Writes = 0, Total IO Time (ms) = 1346919

 

 

04/27/07 14:24:55 DRIVE LEVEL: Read cache enabled = Yes, Write cache enabled = No

 

 

04/27/07 14:24:55 DRIVE LEVEL: Read count = 886653, Read time = 3333820, Write count = 616962, Write time = 6606959, Idle time = 170623, Bytes read = 74679347200, Bytes written = 74403381248, Split IO Count = 19, Storage number = 4, Storage manager name = FTDISK 

 

 

Stress Test Attempt Complete

 

 

Interpretation

 

 

[JWM]  This test was done against what was thought by the customer to be a well-architected disk I/O subsystem (EMC Clariion CX700, lots of 146GB 15K disks, RAID 5 (RAID5--when will  they learn?)).  The system is intended to support a system from which very high performance is required.

 

 

The results of this test belie this assumption.  I would want Average IO Duration to be less than 5ms, no more than 10ms; however both data files have values exceeding this:  15ms & 16ms, respectively.

 

 

The IO Request Blocks aren't nearly as high as I'd expect, especially compared to the 8 disk RAID 10 drive T: in Output Example #1.

 

 

The far less respectable values for M:\sqliosim2.mdx compared to the first file will prompt further investigation.

 

 

Note also that I/O was throttled 382 & 762 times, respectively.

 

 

The Average IO Duration for the log file is excellent.  I don't have enough experience interpreting output for log files, so I don't know whether to be concerned about the log number of IO Request Blocks.

  

 

posted @ 2011-12-04 20:42  鸽子飞扬  阅读(1707)  评论(0编辑  收藏  举报