Libata Error Message Analysis
Libata error messages
Overview
All libata error messages produced by the kernel use a standard format:
ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
ata3.00: cmd ea/00:00:00:00:00/00:00:00:00:00/a0 tag 0
         res 40/00:00:01:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
ata3.00: status: { DRDY }
Prefix
The prefix
ata3.00:
decodes as
ata | prefix, indicating this is a libata port or device message |
3 | port number, counting from one (1) |
00 | device number, usually zero unless a Port Multiplier or a PATA master/slave pair is involved |
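The prefix can be split mechanically. The following is a minimal sketch, not kernel code. `parse_prefix` and `PREFIX_RE` are hypothetical names. The sketch assumes that port-level messages (for example `ata3: ...`) simply omit the `.NN` device part.

```python
import re

# Hypothetical pattern: "ata<port>.<device>:" for device messages,
# "ata<port>:" for port-level messages.
PREFIX_RE = re.compile(r"^ata(\d+)(?:\.(\d+))?:")

def parse_prefix(line):
    """Return (port, device) for a libata message, or None if it isn't one.

    device is None for port-level messages that carry no ".NN" part.
    """
    m = PREFIX_RE.match(line)
    if m is None:
        return None
    port = int(m.group(1))  # counts from one
    device = int(m.group(2)) if m.group(2) is not None else None
    return port, device

print(parse_prefix("ata3.00: status: { DRDY }"))  # (3, 0)
```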
Exception line
The exception line gives an overview of the EH (Error Handler) state.
exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
Emask | Error classification bitmask (AC_ERR_xxx in source code) |
SAct | SATA SActive register |
SErr | SATA SError register |
action | ATA_EH_xxx actions, like revalidate, softreset, hardreset (see include/linux/libata.h) |
frozen | if present, indicates the port was frozen for EH |
t<number> | number of retries |
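The exception line consists of a fixed sequence of hex fields followed by two optional trailing words. Below is a rough parser, a sketch only. `parse_exception` is a hypothetical helper. The sketch assumes that `frozen`, when present, appears before the optional `t<number>` field, matching the order in the table above.

```python
import re

# Hypothetical pattern for the exception line. Assumes "frozen" (if present)
# comes before the optional "t<number>" field.
EXC_RE = re.compile(
    r"exception Emask 0x(?P<emask>[0-9a-f]+) "
    r"SAct 0x(?P<sact>[0-9a-f]+) "
    r"SErr 0x(?P<serr>[0-9a-f]+) "
    r"action 0x(?P<action>[0-9a-f]+)"
    r"(?P<frozen> frozen)?"
    r"(?: t(?P<tries>\d+))?",
    re.IGNORECASE,
)

def parse_exception(line):
    """Split an exception line into integer fields plus the frozen flag."""
    m = EXC_RE.search(line)
    if m is None:
        return None
    return {
        "emask": int(m.group("emask"), 16),    # AC_ERR_xxx bitmask
        "sact": int(m.group("sact"), 16),      # SATA SActive register
        "serr": int(m.group("serr"), 16),      # SATA SError register
        "action": int(m.group("action"), 16),  # ATA_EH_xxx actions
        "frozen": m.group("frozen") is not None,
        "tries": int(m.group("tries")) if m.group("tries") else None,
    }

info = parse_exception(
    "ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen")
print(info["action"], info["frozen"])  # 2 True
```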
Input taskfile
The "cmd" line gives the ATA command (taskfile) sent to the device:
cmd ea/00:00:00:00:00/00:00:00:00:00/a0 tag 0
This lists ATA registers in the following order:
ea | Command (FLUSH CACHE EXT EAh, Non-Data) |
/ | (separator) |
00 | Feature |
00 | NSect |
00 | LBA L |
00 | LBA M |
00 | LBA H |
/ | (separator) |
00 | HOB Feature |
00 | HOB NSect |
00 | HOB LBA L |
00 | HOB LBA M |
00 | HOB LBA H |
/ | (separator) |
a0 | Device/Head |
tag | NCQ tag |
0 | NCQ tag number, or listed as zero if NCQ is not active/applicable. |
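Because the registers always appear in the order shown above, a `cmd` line maps directly onto named fields. The following sketch illustrates this; `parse_cmd` and `lba48` are hypothetical helpers.

The `lba48` function combines the six LBA bytes under the standard 48-bit LBA rule: the HOB ("high-order byte") registers supply bits 24 through 47. This combination is only meaningful for 48-bit (EXT) commands that carry an address. FLUSH CACHE EXT, used in the example, does not carry one.

```python
import re

# Register names in the order the "cmd" line lists them.
CMD_FIELDS = [
    "command",
    "feature", "nsect", "lbal", "lbam", "lbah",
    "hob_feature", "hob_nsect", "hob_lbal", "hob_lbam", "hob_lbah",
    "device",
]

HEX5 = r"((?:[0-9a-f]{2}:){4}[0-9a-f]{2})"  # five colon-separated bytes
CMD_RE = re.compile(
    r"cmd ([0-9a-f]{2})/" + HEX5 + "/" + HEX5 + r"/([0-9a-f]{2})(?: tag (\d+))?",
    re.IGNORECASE,
)

def parse_cmd(line):
    """Map each input-taskfile register on a "cmd" line to its integer value."""
    m = CMD_RE.search(line)
    if m is None:
        return None
    raw = [m.group(1), *m.group(2).split(":"), *m.group(3).split(":"), m.group(4)]
    regs = {name: int(byte, 16) for name, byte in zip(CMD_FIELDS, raw)}
    regs["tag"] = int(m.group(5)) if m.group(5) is not None else None
    return regs

def lba48(regs):
    """48-bit LBA: HOB bytes are bits 24-47, the normal LBA bytes bits 0-23."""
    return (regs["hob_lbah"] << 40 | regs["hob_lbam"] << 32 | regs["hob_lbal"] << 24
            | regs["lbah"] << 16 | regs["lbam"] << 8 | regs["lbal"])

regs = parse_cmd("ata3.00: cmd ea/00:00:00:00:00/00:00:00:00:00/a0 tag 0")
print(hex(regs["command"]), hex(regs["device"]), regs["tag"])  # 0xea 0xa0 0
```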
Output taskfile, error summary
The next line contains a current dump of the ATA device's registers, along with an error summary:
res 40/00:00:01:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
In order:
40 | Status |
/ | (separator) |
00 | Error |
00 | NSect |
01 | LBA L |
4f | LBA M |
c2 | LBA H |
/ | (separator) |
00 | HOB Error |
00 | HOB NSect |
00 | HOB LBA L |
00 | HOB LBA M |
00 | HOB LBA H |
/ | (separator) |
00 | Device/Head |
Emask | ATA command's internal error mask (AC_ERR_xxx in source code) |
0x4 (timeout) | The error mask value in hex, followed in parentheses by an English summary of the error, such as "timeout". See below for a full list. |
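The `res` line can be decoded the same way, by splitting on `/` and `:` rather than using one large regular expression. The following is a sketch under two assumptions: the register dump is always four `/`-separated groups, and the line ends with `Emask 0xN (summary)` as in the example. `parse_res` is a hypothetical helper.

```python
# Register names in the order the "res" line lists them.
RES_FIELDS = [
    "status",
    "error", "nsect", "lbal", "lbam", "lbah",
    "hob_error", "hob_nsect", "hob_lbal", "hob_lbam", "hob_lbah",
    "device",
]

def parse_res(line):
    """Decode a "res ... Emask 0xN (summary)" line into registers and error summary."""
    _, found, rest = line.partition("res ")
    if not found:
        return None
    dump, _, tail = rest.partition(" Emask ")
    groups = dump.split("/")  # e.g. ["40", "00:00:01:4f:c2", "00:00:00:00:00", "00"]
    if len(groups) != 4:
        raise ValueError("unexpected register dump: " + dump)
    raw = [groups[0], *groups[1].split(":"), *groups[2].split(":"), groups[3]]
    regs = {name: int(byte, 16) for name, byte in zip(RES_FIELDS, raw)}
    mask, _, summary = tail.strip().partition(" ")
    # int() accepts the "0x" prefix when the base is 16.
    regs["emask"] = int(mask, 16) if mask else None
    regs["summary"] = summary.strip("()") or None
    return regs

res = parse_res("         res 40/00:00:01:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)")
print(hex(res["status"]), res["emask"], res["summary"])  # 0x40 4 timeout
```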
Error classes
These are the possible values for the internal error mask in each error message mentioned above.
The AC_ERR_xxx (ATA completion error) values are defined in include/linux/libata.h.
0x20 | host bus error | Host<->chip bus error (i.e. PCI, if on PCI bus) |
0x10 | ATA bus error | chip<->device bus error |
0x4 | timeout | Controller failed to respond to an active ATA command. This can have any number of causes. Most often it is due to an unrelated interrupt subsystem bug (try booting with 'pci=nomsi', 'acpi=off', or 'noapic') that failed to deliver an interrupt when one was expected from the hardware. |
0x2 | HSM violation | Hardware failed to respond in an expected manner. "HSM" stands for Host State Machine, a software-based finite state machine required by ATA that expects certain hardware behaviors, based on the current ATA command and other hardware-state programming details. |
0x40 | internal error | Hardware flagged an impossible condition, most likely due to software misprogramming. |
0x8 | media error | Software detected a media error |
0x80 | invalid argument | Software marked the ATA command as invalid for some reason |
0x1 | device error | Hardware indicates an error with last command. This error is delivered directly from the ATA device. If you see a lot of these, that is often an indication of a hardware problem. |
0x100 | unknown error | Uncategorized error (should never happen) |
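An Emask value can have several bits set, while the message shows only one English summary. The following is a sketch, not the kernel's code. It assumes the summary is taken from the first set bit in the order of the table above, and that bits missing from the table fall through to "unknown error". The names `AC_ERRORS`, `emask_names`, and `emask_summary` are hypothetical. Check the assumed order against the kernel source before relying on it.

```python
# AC_ERR_xxx values from the table above, kept in the table's order.
# Assumption: that order is also the priority used to pick the summary.
AC_ERRORS = [
    (0x20, "host bus error"),
    (0x10, "ATA bus error"),
    (0x04, "timeout"),
    (0x02, "HSM violation"),
    (0x40, "internal error"),
    (0x08, "media error"),
    (0x80, "invalid argument"),
    (0x01, "device error"),
    (0x100, "unknown error"),
]

def emask_names(emask):
    """Every error class whose bit is set in emask, in table order."""
    return [name for bit, name in AC_ERRORS if emask & bit]

def emask_summary(emask):
    """Assumed summary rule: first matching class in table order."""
    for bit, name in AC_ERRORS:
        if emask & bit:
            return name
    return "unknown error"  # bits not listed in the table

print(emask_names(0x11), emask_summary(0x11))  # ['ATA bus error', 'device error'] ATA bus error
```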
ATA status expansion
The final line
status: { DRDY }
expands the ATA status register returned in the output taskfile into its component bits:
Busy | Device busy (all other bits invalid) |
DRDY | Device ready. Normally set (1) when all is OK. |
DRQ | Data ready to be sent/received via PIO |
DF | Device fault |
ERR | Error (see Error register for more info) |
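The following sketch shows how a Status byte maps onto the `status: { ... }` line. It uses the standard ATA Status bit positions (BSY 0x80, DRDY 0x40, DF 0x20, DRQ 0x08, ERR 0x01). Following the table, it reports only Busy when that bit is set, because all the other bits are then invalid. Other Status bits are ignored. The helper names are hypothetical.

```python
# Standard ATA Status register bits.
ATA_BUSY = 0x80
ATA_DRDY = 0x40
ATA_DF = 0x20
ATA_DRQ = 0x08
ATA_ERR = 0x01

def expand_status(status):
    """List the named Status bits that are set."""
    if status & ATA_BUSY:
        # While busy, every other bit is invalid, so report only Busy.
        return ["Busy"]
    names = []
    for bit, name in ((ATA_DRDY, "DRDY"), (ATA_DF, "DF"),
                      (ATA_DRQ, "DRQ"), (ATA_ERR, "ERR")):
        if status & bit:
            names.append(name)
    return names

def format_status(status):
    return "status: { " + " ".join(expand_status(status)) + " }"

print(format_status(0x40))  # status: { DRDY }
```

With the example's Status value of 40, this reproduces the `status: { DRDY }` line.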
ATA error expansion
If any bits in the Error register are set, the Error register contents are expanded into their component bits, for example:
error: { ICRC ABRT }
ICRC | Interface CRC error during Ultra DMA transfer - often either a bad cable or power problem, though possibly an incorrect Ultra DMA mode setting by the driver |
UNC | Uncorrectable error - often due to bad sectors on the disk |
IDNF | Requested address was not found |
ABRT | Command aborted: the command is not supported, could not be completed, or hit an interface CRC error (in which case ICRC is also set) |
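The Error register expansion works the same way as the Status expansion. The following sketch uses the standard ATA Error bit positions (ICRC 0x80, UNC 0x40, IDNF 0x10, ABRT 0x04). `expand_error` is a hypothetical helper, and the sketch ignores any other bits.

```python
# Standard ATA Error register bits covered by the table above.
ERROR_BITS = (
    (0x80, "ICRC"),  # interface CRC error during Ultra DMA
    (0x40, "UNC"),   # uncorrectable data error
    (0x10, "IDNF"),  # requested address not found
    (0x04, "ABRT"),  # command aborted
)

def expand_error(error):
    """List the named Error bits that are set."""
    return [name for bit, name in ERROR_BITS if error & bit]

print(expand_error(0x84))  # ['ICRC', 'ABRT']
```

The `error: { ICRC ABRT }` example corresponds to an Error register value of 0x84 (0x80 | 0x04).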
SATA SError expansion
If any bits in the SATA SError register are set, the SError register contents are expanded into their component bits, for example:
SError: { PHYRdyChg CommWake }
These bits are set by the SATA host interface in response to error conditions on the SATA link. Unless a drive hotplug or unplug operation occurred, it is generally not normal to see any of these bits set. If they are, it usually points strongly toward a hardware problem (often a bad SATA cable or a bad or inadequate power supply).
RecovData | Data integrity error occurred, but the interface recovered |
RecovComm | Communications between device and host temporarily lost, but regained |
UnrecovData | Data integrity error occurred, interface did not recover |
Persist | Persistent communication or data integrity error |
Proto | SATA protocol violation detected |
HostInt | Host bus adapter internal error |
PHYRdyChg | PhyRdy signal changed state |
PHYInt | PHY internal error |
CommWake | COMWAKE detected by PHY (PHY woken up) |
10B8B | 10b to 8b decoding error occurred |
Dispar | Incorrect disparity detected |
BadCRC | Link layer CRC error occurred |
Handshk | R_ERR handshake response received in response to frame transmission |
LinkSeq | Link state machine error occurred |
TrStaTrns | Transport layer state transition error occurred |
UnrecFIS | Unrecognized FIS (frame information structure) received |
DevExch | Device presence has changed |
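The following sketch decodes an SError value into the names used above. The bit positions are an assumption taken from the SError layout in the SATA specification, which the SERR_* constants in include/linux/libata.h mirror. Verify them there before relying on this table. `expand_serror` is a hypothetical helper.

```python
# Assumed SError bit positions (SATA spec layout; see SERR_* in libata.h).
SERROR_BITS = [
    (0, "RecovData"),
    (1, "RecovComm"),
    (8, "UnrecovData"),
    (9, "Persist"),
    (10, "Proto"),
    (11, "HostInt"),
    (16, "PHYRdyChg"),
    (17, "PHYInt"),
    (18, "CommWake"),
    (19, "10B8B"),
    (20, "Dispar"),
    (21, "BadCRC"),
    (22, "Handshk"),
    (23, "LinkSeq"),
    (24, "TrStaTrns"),
    (25, "UnrecFIS"),
    (26, "DevExch"),
]

def expand_serror(serror):
    """List the named SError bits that are set, lowest bit first."""
    return [name for bit, name in SERROR_BITS if (serror >> bit) & 1]

print(expand_serror(0x50000))  # ['PHYRdyChg', 'CommWake']
```

Under these assumed positions, the `SError: { PHYRdyChg CommWake }` example corresponds to SErr 0x50000 (bits 16 and 18).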