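Some study notes on Linux SCSI
These are notes from recently reading the SCSI handling code. They are fragmentary and for reference only.
Let's start from the most visible output, that of lsscsi: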
[0:0:0:0] disk ATA INTEL SSDSC2BX20 0150 -
[0:0:1:0] disk ATA INTEL SSDSC2BX20 0150 -
[0:1:0:0] disk LSI Logical Volume 3000 /dev/sda
[5:0:0:0] enclosu AIC 12G 4U60: Hub 0c29 -
[5:0:1:0] disk SEAGATE ST4000NM0025 N003 /dev/sdb
[5:0:2:0] disk SEAGATE ST4000NM0025 N004 /dev/sdc
[5:0:3:0] disk SEAGATE ST4000NM0025 N003 /dev/sdd
[5:0:4:0] disk SEAGATE ST4000NM0025 N003 /dev/sde
[5:0:5:0] disk SEAGATE ST4000NM0025 N003 /dev/sdf
[5:0:6:0] disk SEAGATE ST4000NM0025 N003 /dev/sdg
[5:0:7:0] disk SEAGATE ST4000NM0025 N004 /dev/sdh
[5:0:8:0] disk SEAGATE ST4000NM0025 N003 /dev/sdi
[5:0:9:0] disk SEAGATE ST4000NM0025 N004 /dev/sdj
[5:0:10:0] disk SEAGATE ST4000NM0025 N003 /dev/sdk
[5:0:11:0] disk SEAGATE ST4000NM0025 N004 /dev/sdl
[5:0:12:0] disk SEAGATE ST4000NM0025 N004 /dev/sdm
[5:0:13:0] disk SEAGATE ST4000NM0025 N004 /dev/sdn
[5:0:14:0] disk SEAGATE ST4000NM0025 N004 /dev/sdo
[5:0:15:0] disk SEAGATE ST4000NM0025 N003 /dev/sdp
[5:0:16:0] disk SEAGATE ST4000NM0025 N003 /dev/sdq
[5:0:17:0] disk SEAGATE ST4000NM0025 N003 /dev/sdr
[5:0:18:0] disk SEAGATE ST4000NM0025 N004 /dev/sds
[5:0:19:0] disk SEAGATE ST4000NM0025 N003 /dev/sdt
[5:0:20:0] disk SEAGATE ST4000NM0025 N003 /dev/sdu
[5:0:21:0] enclosu AIC 12G 4U60: Edge-C 0c2a -
[5:0:22:0] disk SEAGATE ST4000NM0025 N003 /dev/sdv
[5:0:23:0] disk SEAGATE ST4000NM0025 N003 /dev/sdw
[5:0:24:0] disk SEAGATE ST4000NM0025 N004 /dev/sdx
[5:0:25:0] disk SEAGATE ST4000NM0025 N003 /dev/sdy
[5:0:26:0] disk SEAGATE ST4000NM0025 N003 /dev/sdz
[5:0:27:0] disk SEAGATE ST4000NM0025 N003 /dev/sdaa
[5:0:28:0] disk SEAGATE ST4000NM0025 N003 /dev/sdab
[5:0:29:0] disk SEAGATE ST4000NM0025 N004 /dev/sdac
[5:0:30:0] disk SEAGATE ST4000NM0025 N004 /dev/sdad
[5:0:31:0] disk SEAGATE ST4000NM0025 N003 /dev/sdae
[5:0:32:0] disk SEAGATE ST4000NM0025 N004 /dev/sdaf
[5:0:33:0] disk SEAGATE ST4000NM0025 N003 /dev/sdag
[5:0:34:0] disk SEAGATE ST4000NM0025 N003 /dev/sdah
[5:0:35:0] disk SEAGATE ST4000NM0025 N003 /dev/sdai
[5:0:36:0] disk SEAGATE ST4000NM0025 N003 /dev/sdaj
[5:0:37:0] disk SEAGATE ST4000NM0025 N004 /dev/sdak
[5:0:38:0] disk SEAGATE ST4000NM0025 N003 /dev/sdal
[5:0:39:0] disk SEAGATE ST4000NM0025 N003 /dev/sdam
[5:0:40:0] disk SEAGATE ST4000NM0025 N003 /dev/sdan
[5:0:41:0] disk SEAGATE ST4000NM0025 N004 /dev/sdao
[5:0:42:0] enclosu AIC 12G 4U60: Edge-R 0c2a -
[5:0:43:0] disk SEAGATE ST4000NM0025 N003 /dev/sdap
[5:0:44:0] disk SEAGATE ST4000NM0025 N003 /dev/sdaq
[5:0:45:0] disk SEAGATE ST4000NM0025 N003 /dev/sdar
[5:0:46:0] disk SEAGATE ST4000NM0025 N003 /dev/sdas
[5:0:47:0] disk SEAGATE ST4000NM0025 N003 /dev/sdat
[5:0:48:0] disk SEAGATE ST4000NM0025 N004 /dev/sdau
[5:0:49:0] disk SEAGATE ST4000NM0025 N004 /dev/sdav
[5:0:50:0] disk SEAGATE ST4000NM0025 N004 /dev/sdaw
[5:0:51:0] disk SEAGATE ST4000NM0025 N003 /dev/sdax
[5:0:52:0] disk SEAGATE ST4000NM0025 N004 /dev/sday
[5:0:53:0] disk SEAGATE ST4000NM0025 N003 /dev/sdaz
[5:0:54:0] disk SEAGATE ST4000NM0025 N003 /dev/sdba
[5:0:55:0] disk SEAGATE ST4000NM0025 N003 /dev/sdbb
[5:0:56:0] disk SEAGATE ST4000NM0025 N003 /dev/sdbc
[5:0:57:0] disk SEAGATE ST4000NM0025 N003 /dev/sdbd
[5:0:58:0] disk SEAGATE ST4000NM0025 N003 /dev/sdbe
[5:0:59:0] disk SEAGATE ST4000NM0025 N003 /dev/sdbf
[5:0:60:0] disk SEAGATE ST4000NM0025 N003 /dev/sdbg
[5:0:61:0] disk SEAGATE ST4000NM0025 N003 /dev/sdbh
[5:0:62:0] disk SEAGATE ST4000NM0025 N003 /dev/sdbi
[5:0:63:0] enclosu AIC 12G 4U60: Edge-L 0c2a -
[6:0:0:0] disk HGST SDLL1DLR960GCAA1 W150 /dev/sdbj
[6:0:1:0] disk HGST SDLL1DLR960GCAA1 W150 /dev/sdbk
[6:0:2:0] disk HGST SDLL1DLR960GCAA1 W150 /dev/sdbl
[6:0:3:0] disk HGST SDLL1DLR960GCAA1 W150 /dev/sdbm
[6:0:4:0] disk HGST SDLL1DLR960GCAA1 W150 /dev/sdbn
[6:0:5:0] disk HGST SDLL1DLR960GCAA1 W150 /dev/sdbo
[6:0:6:0] disk HGST SDLL1DLR960GCAA1 W150 /dev/sdbp
[6:0:7:0] disk HGST SDLL1DLR960GCAA1 W150 /dev/sdbq
[7:0:0:0] disk HGST SDLL1DLR960GCAA1 W150 /dev/sdbr
[7:0:1:0] disk HGST SDLL1DLR960GCAA1 W150 /dev/sdbs
[7:0:2:0] disk HGST SDLL1DLR960GCAA1 W150 /dev/sdbt
[7:0:3:0] disk HGST SDLL1DLR960GCAA1 W150 /dev/sdbu
[7:0:4:0] disk HGST SDLL1DLR960GCAA1 W150 /dev/sdbv
[7:0:5:0] disk HGST SDLL1DLR960GCAA1 W150 /dev/sdbw
[7:0:6:0] disk HGST SDLL1DLR960GCAA1 W150 /dev/sdbx
[7:0:7:0] disk HGST SDLL1DLR960GCAA1 W150 /dev/sdby
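What is the first column of numbers? How do the numbers relate to each other? How does the kernel abstract the SCSI layer? How is a SCSI command abstracted?
What happens when a SCSI command hits an error after it is issued, or when it times out? What does the normal completion path look like? Let's read the code with these questions in mind.
What is the first column of numbers?
The first column shown by lsscsi is the multi-level address that the kernel assigns to a SCSI device. The address uniquely identifies one device.
It is a little easier to understand when viewed with cat /proc/scsi/scsi: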
cat /proc/scsi/scsi Attached devices: Host: scsi0 Channel: 01 Id: 00 Lun: 00 Vendor: LSI Model: Logical Volume Rev: 3000 Type: Direct-Access ANSI SCSI revision: 06 Host: scsi0 Channel: 00 Id: 00 Lun: 00 Vendor: ATA Model: INTEL SSDSC2BX20 Rev: 0150 Type: Direct-Access ANSI SCSI revision: 06 Host: scsi0 Channel: 00 Id: 01 Lun: 00 Vendor: ATA Model: INTEL SSDSC2BX20 Rev: 0150 Type: Direct-Access ANSI SCSI revision: 06 Host: scsi5 Channel: 00 Id: 00 Lun: 00 Vendor: AIC 12G Model: 4U60: Hub Rev: 0c29 Type: Enclosure ANSI SCSI revision: 05 Host: scsi5 Channel: 00 Id: 01 Lun: 00 Vendor: SEAGATE Model: ST4000NM0025 Rev: N003 Type: Direct-Access ANSI SCSI revision: 06 Host: scsi5 Channel: 00 Id: 02 Lun: 00 Vendor: SEAGATE Model: ST4000NM0025 Rev: N004 Type: Direct-Access ANSI SCSI revision: 06 Host: scsi5 Channel: 00 Id: 03 Lun: 00 Vendor: SEAGATE Model: ST4000NM0025 Rev: N003 Type: Direct-Access ANSI SCSI revision: 06 Host: scsi5 Channel: 00 Id: 04 Lun: 00 Vendor: SEAGATE Model: ST4000NM0025 Rev: N003 Type: Direct-Access ANSI SCSI revision: 06 Host: scsi5 Channel: 00 Id: 05 Lun: 00 Vendor: SEAGATE Model: ST4000NM0025 Rev: N003 Type: Direct-Access ANSI SCSI revision: 06 Host: scsi5 Channel: 00 Id: 06 Lun: 00 Vendor: SEAGATE Model: ST4000NM0025 Rev: N003 Type: Direct-Access ANSI SCSI revision: 06 Host: scsi5 Channel: 00 Id: 07 Lun: 00 Vendor: SEAGATE Model: ST4000NM0025 Rev: N004 Type: Direct-Access ANSI SCSI revision: 06 Host: scsi5 Channel: 00 Id: 08 Lun: 00 Vendor: SEAGATE Model: ST4000NM0025 Rev: N003 Type: Direct-Access ANSI SCSI revision: 06 Host: scsi5 Channel: 00 Id: 09 Lun: 00 Vendor: SEAGATE Model: ST4000NM0025 Rev: N004 Type: Direct-Access ANSI SCSI revision: 06 Host: scsi5 Channel: 00 Id: 10 Lun: 00 Vendor: SEAGATE Model: ST4000NM0025 Rev: N003 Type: Direct-Access ANSI SCSI revision: 06 Host: scsi5 Channel: 00 Id: 11 Lun: 00 Vendor: SEAGATE Model: ST4000NM0025 Rev: N004 Type: Direct-Access ANSI SCSI revision: 06 Host: scsi5 Channel: 00 Id: 12 Lun: 00 Vendor: 
SEAGATE Model: ST4000NM0025 Rev: N004 Type: Direct-Access ANSI SCSI revision: 06 Host: scsi5 Channel: 00 Id: 13 Lun: 00 Vendor: SEAGATE Model: ST4000NM0025 Rev: N004 Type: Direct-Access ANSI SCSI revision: 06 Host: scsi5 Channel: 00 Id: 14 Lun: 00 Vendor: SEAGATE Model: ST4000NM0025 Rev: N004 Type: Direct-Access ANSI SCSI revision: 06 Host: scsi5 Channel: 00 Id: 15 Lun: 00 Vendor: SEAGATE Model: ST4000NM0025 Rev: N003 Type: Direct-Access ANSI SCSI revision: 06 Host: scsi5 Channel: 00 Id: 16 Lun: 00 Vendor: SEAGATE Model: ST4000NM0025 Rev: N003 Type: Direct-Access ANSI SCSI revision: 06 Host: scsi5 Channel: 00 Id: 17 Lun: 00 Vendor: SEAGATE Model: ST4000NM0025 Rev: N003 Type: Direct-Access ANSI SCSI revision: 06 Host: scsi5 Channel: 00 Id: 18 Lun: 00 Vendor: SEAGATE Model: ST4000NM0025 Rev: N004 Type: Direct-Access ANSI SCSI revision: 06 Host: scsi5 Channel: 00 Id: 19 Lun: 00 Vendor: SEAGATE Model: ST4000NM0025 Rev: N003 Type: Direct-Access ANSI SCSI revision: 06 Host: scsi5 Channel: 00 Id: 20 Lun: 00 Vendor: SEAGATE Model: ST4000NM0025 Rev: N003 Type: Direct-Access ANSI SCSI revision: 06 Host: scsi5 Channel: 00 Id: 21 Lun: 00 Vendor: AIC 12G Model: 4U60: Edge-C Rev: 0c2a Type: Enclosure ANSI SCSI revision: 05 Host: scsi5 Channel: 00 Id: 22 Lun: 00 Vendor: SEAGATE Model: ST4000NM0025 Rev: N003 Type: Direct-Access ANSI SCSI revision: 06 Host: scsi5 Channel: 00 Id: 23 Lun: 00 Vendor: SEAGATE Model: ST4000NM0025 Rev: N003 Type: Direct-Access ANSI SCSI revision: 06 Host: scsi5 Channel: 00 Id: 24 Lun: 00 Vendor: SEAGATE Model: ST4000NM0025 Rev: N004 Type: Direct-Access ANSI SCSI revision: 06 Host: scsi5 Channel: 00 Id: 25 Lun: 00 Vendor: SEAGATE Model: ST4000NM0025 Rev: N003 Type: Direct-Access ANSI SCSI revision: 06 Host: scsi5 Channel: 00 Id: 26 Lun: 00 Vendor: SEAGATE Model: ST4000NM0025 Rev: N003 Type: Direct-Access ANSI SCSI revision: 06 Host: scsi5 Channel: 00 Id: 27 Lun: 00 Vendor: SEAGATE Model: ST4000NM0025 Rev: N003 Type: Direct-Access ANSI SCSI revision: 06 
Host: scsi5 Channel: 00 Id: 28 Lun: 00 Vendor: SEAGATE Model: ST4000NM0025 Rev: N003 Type: Direct-Access ANSI SCSI revision: 06 Host: scsi5 Channel: 00 Id: 29 Lun: 00 Vendor: SEAGATE Model: ST4000NM0025 Rev: N004 Type: Direct-Access ANSI SCSI revision: 06 Host: scsi5 Channel: 00 Id: 30 Lun: 00 Vendor: SEAGATE Model: ST4000NM0025 Rev: N004 Type: Direct-Access ANSI SCSI revision: 06 Host: scsi5 Channel: 00 Id: 31 Lun: 00 Vendor: SEAGATE Model: ST4000NM0025 Rev: N003 Type: Direct-Access ANSI SCSI revision: 06 Host: scsi5 Channel: 00 Id: 32 Lun: 00 Vendor: SEAGATE Model: ST4000NM0025 Rev: N004 Type: Direct-Access ANSI SCSI revision: 06 Host: scsi5 Channel: 00 Id: 33 Lun: 00 Vendor: SEAGATE Model: ST4000NM0025 Rev: N003 Type: Direct-Access ANSI SCSI revision: 06 Host: scsi5 Channel: 00 Id: 34 Lun: 00 Vendor: SEAGATE Model: ST4000NM0025 Rev: N003 Type: Direct-Access ANSI SCSI revision: 06 Host: scsi5 Channel: 00 Id: 35 Lun: 00 Vendor: SEAGATE Model: ST4000NM0025 Rev: N003 Type: Direct-Access ANSI SCSI revision: 06 Host: scsi5 Channel: 00 Id: 36 Lun: 00 Vendor: SEAGATE Model: ST4000NM0025 Rev: N003 Type: Direct-Access ANSI SCSI revision: 06 Host: scsi5 Channel: 00 Id: 37 Lun: 00 Vendor: SEAGATE Model: ST4000NM0025 Rev: N004 Type: Direct-Access ANSI SCSI revision: 06 Host: scsi5 Channel: 00 Id: 38 Lun: 00 Vendor: SEAGATE Model: ST4000NM0025 Rev: N003 Type: Direct-Access ANSI SCSI revision: 06 Host: scsi5 Channel: 00 Id: 39 Lun: 00 Vendor: SEAGATE Model: ST4000NM0025 Rev: N003 Type: Direct-Access ANSI SCSI revision: 06 Host: scsi5 Channel: 00 Id: 40 Lun: 00 Vendor: SEAGATE Model: ST4000NM0025 Rev: N003 Type: Direct-Access ANSI SCSI revision: 06 Host: scsi5 Channel: 00 Id: 41 Lun: 00 Vendor: SEAGATE Model: ST4000NM0025 Rev: N004 Type: Direct-Access ANSI SCSI revision: 06 Host: scsi5 Channel: 00 Id: 42 Lun: 00 Vendor: AIC 12G Model: 4U60: Edge-R Rev: 0c2a Type: Enclosure ANSI SCSI revision: 05 Host: scsi5 Channel: 00 Id: 43 Lun: 00 Vendor: SEAGATE Model: ST4000NM0025 Rev: 
N003 Type: Direct-Access ANSI SCSI revision: 06 Host: scsi5 Channel: 00 Id: 44 Lun: 00 Vendor: SEAGATE Model: ST4000NM0025 Rev: N003 Type: Direct-Access ANSI SCSI revision: 06 Host: scsi5 Channel: 00 Id: 45 Lun: 00 Vendor: SEAGATE Model: ST4000NM0025 Rev: N003 Type: Direct-Access ANSI SCSI revision: 06 Host: scsi5 Channel: 00 Id: 46 Lun: 00 Vendor: SEAGATE Model: ST4000NM0025 Rev: N003 Type: Direct-Access ANSI SCSI revision: 06 Host: scsi5 Channel: 00 Id: 47 Lun: 00 Vendor: SEAGATE Model: ST4000NM0025 Rev: N003 Type: Direct-Access ANSI SCSI revision: 06 Host: scsi5 Channel: 00 Id: 48 Lun: 00 Vendor: SEAGATE Model: ST4000NM0025 Rev: N004 Type: Direct-Access ANSI SCSI revision: 06 Host: scsi5 Channel: 00 Id: 49 Lun: 00 Vendor: SEAGATE Model: ST4000NM0025 Rev: N004 Type: Direct-Access ANSI SCSI revision: 06 Host: scsi5 Channel: 00 Id: 50 Lun: 00 Vendor: SEAGATE Model: ST4000NM0025 Rev: N004 Type: Direct-Access ANSI SCSI revision: 06 Host: scsi5 Channel: 00 Id: 51 Lun: 00 Vendor: SEAGATE Model: ST4000NM0025 Rev: N003 Type: Direct-Access ANSI SCSI revision: 06 Host: scsi5 Channel: 00 Id: 52 Lun: 00 Vendor: SEAGATE Model: ST4000NM0025 Rev: N004 Type: Direct-Access ANSI SCSI revision: 06 Host: scsi5 Channel: 00 Id: 53 Lun: 00 Vendor: SEAGATE Model: ST4000NM0025 Rev: N003 Type: Direct-Access ANSI SCSI revision: 06 Host: scsi5 Channel: 00 Id: 54 Lun: 00 Vendor: SEAGATE Model: ST4000NM0025 Rev: N003 Type: Direct-Access ANSI SCSI revision: 06 Host: scsi5 Channel: 00 Id: 55 Lun: 00 Vendor: SEAGATE Model: ST4000NM0025 Rev: N003 Type: Direct-Access ANSI SCSI revision: 06 Host: scsi5 Channel: 00 Id: 56 Lun: 00 Vendor: SEAGATE Model: ST4000NM0025 Rev: N003 Type: Direct-Access ANSI SCSI revision: 06 Host: scsi5 Channel: 00 Id: 57 Lun: 00 Vendor: SEAGATE Model: ST4000NM0025 Rev: N003 Type: Direct-Access ANSI SCSI revision: 06 Host: scsi5 Channel: 00 Id: 58 Lun: 00 Vendor: SEAGATE Model: ST4000NM0025 Rev: N003 Type: Direct-Access ANSI SCSI revision: 06 Host: scsi5 Channel: 00 Id: 59 
Lun: 00 Vendor: SEAGATE Model: ST4000NM0025 Rev: N003 Type: Direct-Access ANSI SCSI revision: 06 Host: scsi5 Channel: 00 Id: 60 Lun: 00 Vendor: SEAGATE Model: ST4000NM0025 Rev: N003 Type: Direct-Access ANSI SCSI revision: 06 Host: scsi5 Channel: 00 Id: 61 Lun: 00 Vendor: SEAGATE Model: ST4000NM0025 Rev: N003 Type: Direct-Access ANSI SCSI revision: 06 Host: scsi5 Channel: 00 Id: 62 Lun: 00 Vendor: SEAGATE Model: ST4000NM0025 Rev: N003 Type: Direct-Access ANSI SCSI revision: 06 Host: scsi5 Channel: 00 Id: 63 Lun: 00 Vendor: AIC 12G Model: 4U60: Edge-L Rev: 0c2a Type: Enclosure ANSI SCSI revision: 05 Host: scsi6 Channel: 00 Id: 00 Lun: 00 Vendor: HGST Model: SDLL1DLR960GCAA1 Rev: W150 Type: Direct-Access ANSI SCSI revision: 06 Host: scsi6 Channel: 00 Id: 01 Lun: 00 Vendor: HGST Model: SDLL1DLR960GCAA1 Rev: W150 Type: Direct-Access ANSI SCSI revision: 06 Host: scsi6 Channel: 00 Id: 02 Lun: 00 Vendor: HGST Model: SDLL1DLR960GCAA1 Rev: W150 Type: Direct-Access ANSI SCSI revision: 06 Host: scsi6 Channel: 00 Id: 03 Lun: 00 Vendor: HGST Model: SDLL1DLR960GCAA1 Rev: W150 Type: Direct-Access ANSI SCSI revision: 06 Host: scsi6 Channel: 00 Id: 04 Lun: 00 Vendor: HGST Model: SDLL1DLR960GCAA1 Rev: W150 Type: Direct-Access ANSI SCSI revision: 06 Host: scsi6 Channel: 00 Id: 05 Lun: 00 Vendor: HGST Model: SDLL1DLR960GCAA1 Rev: W150 Type: Direct-Access ANSI SCSI revision: 06 Host: scsi6 Channel: 00 Id: 06 Lun: 00 Vendor: HGST Model: SDLL1DLR960GCAA1 Rev: W150 Type: Direct-Access ANSI SCSI revision: 06 Host: scsi6 Channel: 00 Id: 07 Lun: 00 Vendor: HGST Model: SDLL1DLR960GCAA1 Rev: W150 Type: Direct-Access ANSI SCSI revision: 06 Host: scsi7 Channel: 00 Id: 00 Lun: 00 Vendor: HGST Model: SDLL1DLR960GCAA1 Rev: W150 Type: Direct-Access ANSI SCSI revision: 06 Host: scsi7 Channel: 00 Id: 01 Lun: 00 Vendor: HGST Model: SDLL1DLR960GCAA1 Rev: W150 Type: Direct-Access ANSI SCSI revision: 06 Host: scsi7 Channel: 00 Id: 02 Lun: 00 Vendor: HGST Model: SDLL1DLR960GCAA1 Rev: W150 Type: 
Direct-Access ANSI SCSI revision: 06 Host: scsi7 Channel: 00 Id: 03 Lun: 00 Vendor: HGST Model: SDLL1DLR960GCAA1 Rev: W150 Type: Direct-Access ANSI SCSI revision: 06 Host: scsi7 Channel: 00 Id: 04 Lun: 00 Vendor: HGST Model: SDLL1DLR960GCAA1 Rev: W150 Type: Direct-Access ANSI SCSI revision: 06 Host: scsi7 Channel: 00 Id: 05 Lun: 00 Vendor: HGST Model: SDLL1DLR960GCAA1 Rev: W150 Type: Direct-Access ANSI SCSI revision: 06 Host: scsi7 Channel: 00 Id: 06 Lun: 00 Vendor: HGST Model: SDLL1DLR960GCAA1 Rev: W150 Type: Direct-Access ANSI SCSI revision: 06 Host: scsi7 Channel: 00 Id: 07 Lun: 00 Vendor: HGST Model: SDLL1DLR960GCAA1 Rev: W150 Type: Direct-Access ANSI SCSI revision: 06
As the numbering shows, the first level is the host, the second is the channel, the third is the target ID, and the fourth is the LUN:
h == hostadapter id (first one being 0)
c == SCSI channel on hostadapter (first one being 0)
t == ID
l == LUN (first one being 0)
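To make the four-level address concrete, here is a minimal userspace C sketch that splits an address such as 5:0:4:0 into its parts. The struct hctl type and the parse_hctl function are hypothetical names made up for this illustration; they are not kernel or lsscsi APIs.

```c
#include <stdio.h>

/* Hypothetical holder for the four address levels (not a kernel type). */
struct hctl {
	unsigned int host;
	unsigned int channel;
	unsigned int target;
	unsigned long long lun;
};

/*
 * Parse "H:C:T:L". A leading '[' and a closing ']' are tolerated, so the
 * start of an lsscsi line such as "[5:0:4:0] disk ..." can be passed as is.
 * Returns 0 on success, -1 if the string is malformed.
 */
int parse_hctl(const char *s, struct hctl *out)
{
	char tail;
	int n;

	if (*s == '[')
		s++;
	n = sscanf(s, "%u:%u:%u:%llu%c", &out->host, &out->channel,
		   &out->target, &out->lun, &tail);
	if (n == 4)
		return 0;	/* exactly four fields, nothing after them */
	if (n == 5 && tail == ']')
		return 0;	/* four fields followed by the closing bracket */
	return -1;
}
```

Calling parse_hctl("[5:0:4:0] disk", &a) on the /dev/sde line above would fill a with host 5, channel 0, target 4 and LUN 0.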
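How do the numbers relate to each other?
A motherboard may carry several hosts. On the server above, for example, there are several SAS chips, so there are necessarily several hosts. A SAS chip can in turn be divided into several channels (a channel is also called a bus). A channel has multiple targets under it, and a target has multiple LUNs.
If a disk is dual-ported, it shows up at the SCSI layer as two SCSI addresses.
How does the kernel abstract the SCSI layer?
A device is abstracted as scsi_device. Its host member points to the Scsi_Host it belongs to, and its siblings member is linked into the host's __devices list. In addition, the parent of its sdev_gendev member points to the dev member of the corresponding scsi_target.
This is easy to follow once you are familiar with the Linux driver model.
Here is a real scsi_device example: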
crash> scsi_device ffff881fcee44800
struct scsi_device {
host = 0xffff883fd0e38000,-----------------points to the Scsi_Host, described later
request_queue = 0xffff883fc1e28828,--------the request_queue allocated earlier to hold submitted I/O; note the distinction between single-queue and multi-queue
siblings = {-------------------------------all scsi_devices under the same host are chained through this; they are siblings, hence the member name
next = 0xffff881fcece9810, prev = 0xffff881fcee44010 },
same_target_siblings = {------------------chains the scsi_devices under the same target; one issue is that linking here also requires taking the host lock, which could be optimized
next = 0xffff883fc1e21c18, prev = 0xffff883fc1e21c18 }, { device_busy = { counter = 6 }, __UNIQUE_ID_rh_kabi_hide20 = { device_busy = 6 }, {<No data fields>} }, list_lock = { { rlock = { raw_lock = { { head_tail = 1215842424, tickets = { head = 18552, tail = 18552 } } } } } }, cmd_list = { next = 0xffff881f49a2d508, prev = 0xffff883eeccee308 }, starved_entry = { next = 0xffff881fcee44848, prev = 0xffff881fcee44848 }, current_cmnd = 0x0, queue_depth = 254, max_queue_depth = 254, last_queue_full_depth = 0, last_queue_full_count = 0, last_queue_full_time = 0, queue_ramp_up_period = 120000, last_queue_ramp_up = 0,
id = 4,--------------------------------usually set to the target's id
lun = 0,-------------------------------the last level of the four-level address, the LUN
channel = 0,---------------------------channel number
manufacturer = 0, sector_size = 512, hostdata = 0xffff883fca92ed20, type = 0 '\000', scsi_level = 7 '\a', inq_periph_qual = 0 '\000', inquiry_len = 144 '\220', inquiry = 0xffff883fc1e60b40 "", vendor = 0xffff883fc1e60b48 "SEAGATE ST4000NM0025 N003ZC18ASFP", model = 0xffff883fc1e60b50 "ST4000NM0025 N003ZC18ASFP", rev = 0xffff883fc1e60b60 "N003ZC18ASFP", current_tag = 0 '\000',
sdev_target = 0xffff883fc1e21c00,------points to the scsi_target; according to the source comment it is only valid for single-LUN targets, yet the target's single_lun is 0 here, which is odd, so to be safe it is better not to use this member to get the scsi_target
sdev_bflags = 0, eh_timeout = 10000, writeable = 1, removable = 0, changed = 0, busy = 0, lockable = 0, locked = 0, borken = 0, disconnect = 0, soft_reset = 0, sdtr = 0, wdtr = 0, ppr = 1, tagged_supported = 1, simple_tags = 0, ordered_tags = 0, 
was_reset = 0, expecting_cc_ua = 0, use_10_for_rw = 1, use_10_for_ms = 0, no_report_opcodes = 0, no_write_same = 0, use_16_for_rw = 1, skip_ms_page_8 = 0, skip_ms_page_3f = 0, skip_vpd_pages = 0, use_192_bytes_for_3f = 0, no_start_on_add = 0, allow_restart = 0, manage_start_stop = 0, start_stop_pwr_cond = 0, no_uld_attach = 0, select_no_atn = 0, fix_capacity = 0, guess_capacity = 0, retry_hwerror = 0, last_sector_bug = 0, no_read_disc_info = 0, no_read_capacity_16 = 0, try_rc_10_first = 0, is_visible = 1, wce_default_on = 0, no_dif = 0, broken_fua = 0, vpd_reserved = 0, xcopy_reserved = 0, lun_in_cdb = 0, disk_events_disable_depth = { counter = 0 }, supported_events = {0}, pending_events = {0}, event_list = { next = 0xffff881fcee44900, prev = 0xffff881fcee44900 }, event_work = { data = { counter = 68719476704 }, entry = { next = 0xffff881fcee44918, prev = 0xffff881fcee44918 }, func = 0xffffffff814241a0 <scsi_evt_thread> }, { device_blocked = { counter = 0 }, __UNIQUE_ID_rh_kabi_hide21 = { device_blocked = 0 }, {<No data fields>} }, max_device_blocked = 3,
iorequest_cnt = {------------I/Os issued
counter = 4641 },
iodone_cnt = {-----------------I/Os completed
counter = 4635 }, ioerr_cnt = {
counter = 283----------------worth watching: the count of failed I/Os; it is exported through sysfs as the ioerr_cnt attribute of the SCSI device (/proc/diskstats has no error field)
},
sdev_gendev = {----------------device model: the parent of scsi_device's sdev_gendev points to scsi_target's dev member, which reflects the tree structure of the driver model
parent = 0xffff883fc1e21c28, p = 0xffff883fd0cf2b40, kobj = {
name = 0xffff883fc21d9790 "5:0:4:0",----------the four-level name: host_no 5, channel 0, target id 4, lun 0
entry = { next = 0xffff881fcee44c00, prev = 0xffff883fc1e21c40 }, parent = 0xffff883fc1e21c38, kset = 0xffff881fff86b6c0, ktype = 0xffffffff81a14e40 <device_ktype>, sd = 0xffff883fccc5ee70, kref = { refcount = { counter = 25 } }, state_initialized = 1, state_in_sysfs = 1, state_add_uevent_sent = 1, state_remove_uevent_sent = 0, uevent_suppress = 0 }, init_name = 0x0, type = 0xffffffff81a19760 <scsi_dev_type>, mutex = { count = { counter = 1 }, wait_lock = { { rlock = { 
raw_lock = { { head_tail = 0, tickets = { head = 0, tail = 0 } } } } } }, wait_list = { next = 0xffff881fcee449b0, prev = 0xffff881fcee449b0 }, owner = 0x0, { osq = 0x0, __UNIQUE_ID_rh_kabi_hide0 = { spin_mlock = 0x0 }, {<No data fields>} } },
bus = 0xffffffff81a19320 <scsi_bus_type>,---------pointer to the bus type, also inside the sdev_gendev member
driver = 0xffffffffa011e008 <sd_template+8>, platform_data = 0x0, power = { power_state = { event = 0 }, can_wakeup = 0, async_suspend = 1, is_prepared = false, is_suspended = false, ignore_children = false, early_init = true, lock = { { rlock = { raw_lock = { { head_tail = 1310740, tickets = { head = 20, tail = 20 } } } } } }, entry = { next = 0xffff881fcee44c98, prev = 0xffff883fc1e21cd8 }, completion = { done = 2147483647, wait = { lock = { { rlock = { raw_lock = { { head_tail = 131074, tickets = { head = 2, tail = 2 } } } } } }, task_list = { next = 0xffff881fcee44a18, prev = 0xffff881fcee44a18 } } }, wakeup = 0x0, wakeup_path = false, syscore = false, suspend_timer = { entry = { next = 0x0, prev = 0x0 }, expires = 0, base = 0xffff883fd0c94000, function = 0xffffffff81402e90 <pm_suspend_timer_fn>, data = 18446612268929272136, slack = -1, start_pid = -1, start_site = 0x0, start_comm = "\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000" }, timer_expires = 0, work = { data = { counter = 68719476704 }, entry = { next = 0xffff881fcee44a98, prev = 0xffff881fcee44a98 }, func = 0xffffffff81402f10 <pm_runtime_work> }, wait_queue = { lock = { { rlock = { raw_lock = { { head_tail = 0, tickets = { head = 0, tail = 0 } } } } } }, task_list = { next = 0xffff881fcee44ab8, prev = 0xffff881fcee44ab8 } }, usage_count = { counter = 2 }, child_count = { counter = 0 }, disable_depth = 0, idle_notification = 0, request_pending = 0, deferred_resume = 0, run_wake = 0, runtime_auto = 0, no_callbacks = 0, irq_safe = 0, use_autosuspend = 1, timer_autosuspends = 0, memalloc_noio = 1, request = RPM_REQ_NONE, runtime_status = RPM_ACTIVE, runtime_error = 0, 
autosuspend_delay = -1, last_busy = 4295244282, active_jiffies = 0, suspended_jiffies = 0, accounting_timestamp = 4294683149, subsys_data = 0x0, qos = 0x0 }, pm_domain = 0x0, numa_node = 1, dma_mask = 0x0, coherent_dma_mask = 0, dma_parms = 0x0, dma_pools = { next = 0xffff881fcee44b40, prev = 0xffff881fcee44b40 }, dma_mem = 0x0, archdata = { dma_ops = 0x0, iommu = 0x0 }, of_node = 0x0, acpi_node = { { companion = 0x0, __UNIQUE_ID_rh_kabi_hide9 = { handle = 0x0 }, {<No data fields>} } }, devt = 0, id = 0, devres_lock = { { rlock = { raw_lock = { { head_tail = 0, tickets = { head = 0, tail = 0 } } } } } }, devres_head = { next = 0xffff881fcee44b88, prev = 0xffff881fcee44b88 }, knode_class = { n_klist = 0x0, n_node = { next = 0x0, prev = 0x0 }, n_ref = { refcount = { counter = 0 } } }, class = 0x0, groups = 0x0, release = 0x0, iommu_group = 0x0, offline_disabled = false, offline = false, device_rh = 0xffff881fcee33378 }, sdev_dev = { parent = 0xffff881fcee44948, p = 0xffff883fd0cf2cc0, kobj = {
name = 0xffff883fc21d9798 "5:0:4:0",---set in scsi_sysfs_device_initialize to the same name as scsi_device.sdev_gendev
entry = { next = 0xffff883fc1854418, prev = 0xffff881fcee44960 }, parent = 0xffff881fcee4cde0, kset = 0xffff881fff86b6c0, ktype = 0xffffffff81a14e40 <device_ktype>, sd = 0xffff883fcac120e0, kref = { refcount = { counter = 3 } }, state_initialized = 1, state_in_sysfs = 1, state_add_uevent_sent = 1, state_remove_uevent_sent = 0, uevent_suppress = 0 }, init_name = 0x0, type = 0x0, mutex = { count = { counter = 1 }, wait_lock = { { rlock = { raw_lock = { { head_tail = 0, tickets = { head = 0, tail = 0 } } } } } }, wait_list = { next = 0xffff881fcee44c50, prev = 0xffff881fcee44c50 }, owner = 0x0, { osq = 0x0, __UNIQUE_ID_rh_kabi_hide0 = { spin_mlock = 0x0 }, {<No data fields>} } }, bus = 0x0, driver = 0x0, platform_data = 0x0, power = { power_state = { event = 0 }, can_wakeup = 0, async_suspend = 1, is_prepared = false, is_suspended = false, ignore_children = false, 
early_init = true, lock = { { rlock = { raw_lock = { { head_tail = 0, tickets = { head = 0, tail = 0 } } } } } }, entry = { next = 0xffff883fc18544b0, prev = 0xffff881fcee449f8 }, completion = { done = 2147483647, wait = { lock = { { rlock = { raw_lock = { { head_tail = 131074, tickets = { head = 2, tail = 2 } } } } } }, task_list = { next = 0xffff881fcee44cb8, prev = 0xffff881fcee44cb8 } } }, wakeup = 0x0, wakeup_path = false, syscore = false, suspend_timer = { entry = { next = 0x0, prev = 0x0 }, expires = 0, base = 0xffff883fd0c94000, function = 0xffffffff81402e90 <pm_suspend_timer_fn>, data = 18446612268929272808, slack = -1, start_pid = -1, start_site = 0x0, start_comm = "\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000" }, timer_expires = 0, work = { data = { counter = 68719476704 }, entry = { next = 0xffff881fcee44d38, prev = 0xffff881fcee44d38 }, func = 0xffffffff81402f10 <pm_runtime_work> }, wait_queue = { lock = { { rlock = { raw_lock = { { head_tail = 0, tickets = { head = 0, tail = 0 } } } } } }, task_list = { next = 0xffff881fcee44d58, prev = 0xffff881fcee44d58 } }, usage_count = { counter = 0 }, child_count = { counter = 0 }, disable_depth = 1, idle_notification = 0, request_pending = 0, deferred_resume = 0, run_wake = 0, runtime_auto = 1, no_callbacks = 0, irq_safe = 0, use_autosuspend = 0, timer_autosuspends = 0, memalloc_noio = 0, request = RPM_REQ_NONE, runtime_status = RPM_SUSPENDED, runtime_error = 0, autosuspend_delay = 0, last_busy = 0, active_jiffies = 0, suspended_jiffies = 0, accounting_timestamp = 4294680236, subsys_data = 0x0, qos = 0x0 }, pm_domain = 0x0, numa_node = 1, dma_mask = 0x0, coherent_dma_mask = 0, dma_parms = 0x0, dma_pools = { next = 0xffff881fcee44de0, prev = 0xffff881fcee44de0 }, dma_mem = 0x0, archdata = { dma_ops = 0x0, iommu = 0x0 }, of_node = 0x0, acpi_node = { { companion = 0x0, __UNIQUE_ID_rh_kabi_hide9 = { handle = 0x0 }, {<No data fields>} } }, devt = 0, id = 0, devres_lock = { { rlock = { raw_lock = { { 
head_tail = 0, tickets = { head = 0, tail = 0 } } } } } }, devres_head = { next = 0xffff881fcee44e28, prev = 0xffff881fcee44e28 }, knode_class = { n_klist = 0xffff883fcff106a8, n_node = { next = 0xffff881fcece9e40, prev = 0xffff881fcee44640 }, n_ref = { refcount = { counter = 1 } } }, class = 0xffffffff81a193e0 <sdev_class>, groups = 0x0, release = 0x0, iommu_group = 0x0, offline_disabled = false, offline = false, device_rh = 0xffff881fcee33398 }, ew = { work = { data = { counter = 0 }, entry = { next = 0x0, prev = 0x0 }, func = 0x0 } }, requeue_work = { data = { counter = 68719476704 }, entry = { next = 0xffff881fcee44eb0, prev = 0xffff881fcee44eb0 }, func = 0xffffffff814236e0 <scsi_requeue_run_queue> }, scsi_dh_data = 0x0,
sdev_state = SDEV_RUNNING,---------------the device is currently in the running state
{ vpd_pg83 = 0xffff883fc1e62400 "", __UNIQUE_ID_rh_kabi_hide22 = { vpd_reserved1 = 0xffff883fc1e62400 }, {<No data fields>} }, { vpd_pg83_len = 76, __UNIQUE_ID_rh_kabi_hide23 = { vpd_reserved2 = 0x4c }, {<No data fields>} }, { vpd_pg80 = 0xffff883fc1e62300 "", __UNIQUE_ID_rh_kabi_hide24 = { vpd_reserved3 = 0xffff883fc1e62300 }, {<No data fields>} }, { vpd_pg80_len = 24, __UNIQUE_ID_rh_kabi_hide25 = { vpd_reserved4 = 0x18 }, {<No data fields>} }, vpd_reserved5 = 0 '\000', vpd_reserved6 = 0 '\000', vpd_reserved7 = 0 '\000', vpd_reserved8 = 0 '\000', vpd_reserved9 = { { rlock = { raw_lock = { { head_tail = 0, tickets = { head = 0, tail = 0 } } } } } }, rh_reserved1 = 0x0, rh_reserved2 = 0x0, rh_reserved3 = 0x0, rh_reserved4 = 0x0, rh_reserved5 = 0x0, rh_reserved6 = 0x0, scsi_mq_reserved1 = { counter = 0 }, scsi_mq_reserved2 = { counter = 0 }, sdev_data = 0xffff881fcee44f38 }
How do we find the scsi_target that a scsi_device belongs to? Starting from the dump above:
crash> scsi_device.sdev_target ffff881fcee44800
sdev_target = 0xffff883fc1e21c00
crash> struct -xo scsi_device.sdev_gendev ffff881fcee44800
struct scsi_device {
[ffff881fcee44948] struct device sdev_gendev;
}
crash> device.parent ffff881fcee44948
parent = 0xffff883fc1e21c28
crash> struct -xo scsi_target.dev
struct scsi_target {
[0x28] struct device dev;
}
crash> px 0xffff883fc1e21c28-0x28
$4 = 0xffff883fc1e21c00--------------same as the sdev_target read directly, but the second method is still the recommended one
The parent pointer can also be read in one step instead of level by level:
crash> scsi_device.sdev_gendev.parent ffff881fcee44800
sdev_gendev.parent = 0xffff883fc1e21c28,
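The subtraction done by hand in crash (address of dev minus its offset 0x28 inside scsi_target) is the container_of pattern. Below is a minimal userspace sketch of it. The toy_device, toy_target and toy_sdev structs are hypothetical, heavily simplified stand-ins, so their offsets do not match the real kernel layouts; only the pointer arithmetic is the point.

```c
#include <stddef.h>

/* Same idea as the kernel's container_of(): member pointer -> enclosing struct. */
#define toy_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical stand-ins; the real structs have many more fields. */
struct toy_device {
	struct toy_device *parent;
};

struct toy_target {
	void *starget_sdev_user;
	unsigned int id;
	struct toy_device dev;		/* embedded, like scsi_target.dev */
};

struct toy_sdev {
	unsigned int lun;
	struct toy_device sdev_gendev;	/* embedded, like scsi_device.sdev_gendev */
};

/* Follow sdev_gendev.parent and step back to the enclosing target. */
struct toy_target *toy_sdev_to_target(struct toy_sdev *sdev)
{
	return toy_container_of(sdev->sdev_gendev.parent,
				struct toy_target, dev);
}
```

As far as I know, the kernel's own scsi_target(sdev) helper takes this same route through sdev_gendev.parent rather than reading sdev_target, which matches the recommendation above.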
A target is abstracted as scsi_target. Its starget_sdev_user member points to the currently active LUN:
/*
 * scsi_target: representation of a scsi target, for now, this is only
 * used for single_lun devices. If no one has active IO to the target,-------is this comment outdated?
 * starget_sdev_user is NULL, else it points to the active sdev.
 */
struct scsi_target {
	struct scsi_device *starget_sdev_user;---either points to the currently active scsi_device or is NULL; used when the target supports only one LUN
...
Here is a scsi_target example:
crash> scsi_target 0xffff883fc1e21c00----this is the scsi_target that the scsi_device above belongs to
struct scsi_target { starget_sdev_user = 0x0, siblings = {
next = 0xffff883fc1e5f008,-------------this member is linked into the host's __targets list
prev = 0xffff883fcaa4b408 },
devices = {-------------------the list of scsi_devices under this target
next = 0xffff881fcee44820, prev = 0xffff881fcee44820 },
dev = {-----------------------in driver-model terms, the parent of scsi_device's sdev_gendev points to scsi_target's dev
parent = 0xffff883fcaa4fc00, p = 0xffff883fd0cf29c0, kobj = { name = 0xffff883fc1e00660 "target5:0:4", entry = { next = 0xffff881fcee44960, prev = 0xffff883fc1853018 }, parent = 0xffff883fcaa4fc10, kset = 0xffff881fff86b6c0, ktype = 0xffffffff81a14e40 <device_ktype>, sd = 0xffff883fccc5ea10,
......-----------------------------the rest of the long-winded device model structure is omitted
reap_ref = 0, channel = 0, id = 4, create = 0, single_lun = 0, pdt_1f_for_no_lun = 0, no_report_luns = 0, expecting_lun_change = 0, { target_busy = { counter = 0 }, __UNIQUE_ID_rh_kabi_hide19 = { target_busy = 0 }, {<No data fields>} }, can_queue = 0, { target_blocked = { counter = 0 }, __UNIQUE_ID_rh_kabi_hide20 = { target_blocked = 0 }, {<No data fields>} }, max_target_blocked = 3, scsi_level = 7 '\a', ew = { work = { data = { counter = 0 }, entry = { next = 0x0, prev = 0x0 }, func = 0x0 } }, state = STARGET_RUNNING, hostdata = 0xffff883fc1e22000, rh_reserved1 = 0x0, rh_reserved2 = 0x0, rh_reserved3 = 0x0, rh_reserved4 = 0x0, scsi_mq_reserved1 = { counter = 0 }, scsi_mq_reserved2 = { counter = 0 }, starget_data = 0xffff883fc1e21f48 }
When scsi_request_fn dispatches an I/O request to a device, it also checks whether the scsi_target that the device belongs to is busy.
static void scsi_request_fn(struct request_queue *q)
	__releases(q->queue_lock)
	__acquires(q->queue_lock)
{
	...
	if (!scsi_target_queue_ready(shost, sdev))
		goto not_ready;
	...
}
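From the kernel's management point of view, scsi_target to scsi_device is a one-to-many relationship, but what I actually observed on this machine is one-to-one. The starget_sdev_user member points to the active scsi_device, but only as a transient state; most of the time it is NULL.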
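To illustrate what a per-target readiness check can look like, here is a simplified userspace model. It is a sketch based on my reading of the mid-layer, not the kernel's scsi_target_queue_ready: the toy_ types are hypothetical, and locking, atomic counters and starved-list handling are all left out. The real logic also differs between kernel versions.

```c
/* Hypothetical stand-ins for the fields involved; not the kernel definitions. */
struct toy_sdev;

struct toy_target {
	struct toy_sdev *starget_sdev_user;	/* active device, single_lun only */
	int single_lun;
	int can_queue;		/* <= 0 means no per-target limit in this model */
	int target_busy;	/* commands currently outstanding on the target */
	int target_blocked;	/* > 0 while the target is temporarily blocked */
};

struct toy_sdev {
	struct toy_target *starget;
};

/* Returns 1 if a command may be sent to sdev's target now, 0 otherwise. */
int toy_target_queue_ready(struct toy_sdev *sdev)
{
	struct toy_target *t = sdev->starget;

	if (t->single_lun) {
		/* Only one device of a single_lun target may be active. */
		if (t->starget_sdev_user && t->starget_sdev_user != sdev)
			return 0;
		t->starget_sdev_user = sdev;
	}
	if (t->can_queue <= 0)
		return 1;	/* no per-target throttling */
	if (t->target_blocked > 0)
		return 0;
	if (t->target_busy >= t->can_queue)
		return 0;
	t->target_busy++;	/* account for the command about to be sent */
	return 1;
}
```

With single_lun = 0 and can_queue = 0, the values in the scsi_target dump above, this model reports the target as ready immediately and never touches starget_sdev_user.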
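There is no dedicated structure for a bus/channel; it is represented only by an id. The host has a max_channel member holding the largest channel number, which bounds the channels under that host.
A SCSI host is abstracted as Scsi_Host. It links all the scsi_device instances it manages through its __devices member and all the targets it manages through its __targets member. A host is added to the system with the scsi_add_host function.
Here is a host example: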
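The way a host chains its devices through an embedded list node can be sketched in userspace as follows. The toy_host and toy_sdev structs and the toy_ functions are hypothetical simplifications written for this note; the kernel uses its own list_head helpers and takes the host lock around such walks.

```c
#include <stddef.h>

/* Minimal doubly linked list node, in the style of the kernel's list_head. */
struct list_head {
	struct list_head *next, *prev;
};

/* Hypothetical stand-ins: only the fields needed for the walk. */
struct toy_host {
	struct list_head devices;	/* plays the role of Scsi_Host.__devices */
};

struct toy_sdev {
	unsigned int channel, id, lun;
	struct list_head siblings;	/* plays the role of scsi_device.siblings */
};

/* An empty list is a head that points to itself. */
void toy_host_init(struct toy_host *h)
{
	h->devices.next = h->devices.prev = &h->devices;
}

/* Append a device to the tail of the host's device list. */
void toy_host_add(struct toy_host *h, struct toy_sdev *d)
{
	struct list_head *head = &h->devices, *n = &d->siblings;

	n->prev = head->prev;
	n->next = head;
	head->prev->next = n;
	head->prev = n;
}

/* Walk the list and return the device matching channel/id/lun, or NULL. */
struct toy_sdev *toy_lookup(struct toy_host *h, unsigned int channel,
			    unsigned int id, unsigned int lun)
{
	struct list_head *p;

	for (p = h->devices.next; p != &h->devices; p = p->next) {
		/* Step back from the embedded node to the enclosing device. */
		struct toy_sdev *d = (struct toy_sdev *)
			((char *)p - offsetof(struct toy_sdev, siblings));

		if (d->channel == channel && d->id == id && d->lun == lun)
			return d;
	}
	return NULL;
}
```

The same embedded-node idea explains why the siblings.next and siblings.prev values in the crash dumps point into the middle of neighbouring structures rather than at their start.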
crash> struct Scsi_Host 0xffff883fd0e38000
struct Scsi_Host { __devices = { next = 0xffff881fcee42810, prev = 0xffff883fc18a6010 }, __targets = { next = 0xffff883fc1e19008, prev = 0xffff883fc18f2408 }, cmd_pool = 0xffffffff81a18680 <scsi_cmd_pool>, free_list_lock = { { rlock = { raw_lock = { { head_tail = 0, tickets = { head = 0, tail = 0 } } } } } }, free_list = { next = 0xffff883fc8db0008, prev = 0xffff883fc8db0008 }, starved_list = { next = 0xffff883fd0e38040, prev = 0xffff883fd0e38040 }, default_lock = { { rlock = { raw_lock = { { head_tail = 93193614, tickets = { head = 1422, tail = 1422 } } } } } }, host_lock = 0xffff883fd0e38050, scan_mutex = { count = { counter = 1 }, wait_lock = { { rlock = { raw_lock = { { head_tail = 0, tickets = { head = 0, tail = 0 } } } } } }, wait_list = { next = 0xffff883fd0e38068, prev = 0xffff883fd0e38068 }, owner = 0x0, { osq = 0x0, __UNIQUE_ID_rh_kabi_hide1 = { spin_mlock = 0x0 }, {<No data fields>} } }, eh_cmd_q = { next = 0xffff883fd0e38088, prev = 0xffff883fd0e38088 },
ehandler = 0xffff881fcc24a280,----this corresponds to PID: 680 TASK: ffff881fcc24a280 CPU: 37 COMMAND: "scsi_eh_5"
eh_action = 0x0, host_wait = { lock = { { rlock = { raw_lock = { { head_tail = 0, tickets = { head = 0, tail = 0 } } } } } }, task_list = { next = 0xffff883fd0e380b0, prev = 0xffff883fd0e380b0 } },
hostt = 0xffffffffa00d01c0,--------the host's own template
transportt = 0xffff883fcd090000,----this is mpt3sas_transport_template; different host types have different transport templates
{ bqt = 0x0, tag_set = 0x0 }, { host_busy = {
counter = 8----------------8 busy I/Os, i.e. the count of I/Os that have already left the request_queue
}, __UNIQUE_ID_rh_kabi_hide30 = { host_busy = 8 }, {<No data fields>} },
host_failed = 0,--------------no failed commands at the moment
host_eh_scheduled = 0,
host_no = 5,------------------tied to the error-handling kernel thread; there is one error-handling thread per host--680 2 37 ffff881fcc24a280 IN 0.0 0 0 [scsi_eh_5]
eh_deadline = -1, last_reset = 0, max_id = 4294967295, max_lun = 16895, max_channel = 0, unique_id = 1, max_cmd_len = 32, this_id = -1, can_queue = 2936, cmd_per_lun = 7, 
sg_tablesize = 128, sg_prot_tablesize = 0, max_sectors = 32767, dma_boundary = 4294967295, cmd_serial_number = 0, active_mode = 1, unchecked_isa_dma = 0, use_clustering = 1, use_blk_tcq = 0, host_self_blocked = 0, reverse_ordering = 0, ordered_tag = 0, tmf_in_progress = 0, async_scan = 0, eh_noresume = 0, no_write_same = 0, use_blk_mq = 0, /* whether multi-queue (blk-mq) is used */ no_scsi2_lun_in_cdb = 0, work_q_name = "\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000", work_q = 0x0, tmf_work_q = 0xffff881fcef98800, { host_blocked = { counter = 0 }, __UNIQUE_ID_rh_kabi_hide31 = { host_blocked = 0 }, {<No data fields>} }, max_host_blocked = 7, prot_capabilities = 7, prot_guard_type = 3 '\003', uspace_req_q = 0x0, base = 0, io_port = 0, n_io_port = 0 '\000', dma_channel = 255 '\377', irq = 0, shost_state = SHOST_RUNNING, /* the host with 60 disks is also in the running state */ shost_gendev = { parent = 0xffff883fcfded098, p = 0xffff883fd0c8b500, kobj = { name = 0xffff883fcd0661c0 "host5", /* name in the device-driver model */ entry = { next = 0xffff883fd0e38448, prev = 0xffff881fca1cd818 }, parent = 0xffff883fcfded0a8, kset = 0xffff881fff86b6c0, ktype = 0xffffffff81a14e40 <device_ktype>, sd = 0xffff883fccc9e690, kref = { refcount = { counter = 39 } }, state_initialized = 1, state_in_sysfs = 1, state_add_uevent_sent = 1, state_remove_uevent_sent = 0, uevent_suppress = 0 }, init_name = 0x0, type = 0xffffffff81a18a80 <scsi_host_type>, mutex = { count = { counter = 1 }, wait_lock = { { rlock = { raw_lock = { { head_tail = 0, tickets = { head = 0, tail = 0 } } } } } }, wait_list = { next = 0xffff883fd0e381f8, prev = 0xffff883fd0e381f8 }, owner = 0x0, { osq = 0x0, __UNIQUE_ID_rh_kabi_hide1 = { spin_mlock = 0x0 }, {<No data fields>} } }, bus = 0xffffffff81a19320 <scsi_bus_type>, driver = 0x0, platform_data = 0x0, power = { power_state = { event = 0 }, can_wakeup = 0, async_suspend = 1, is_prepared = false, is_suspended = false, ignore_children = false, early_init = true, lock = { { rlock = { raw_lock = { { 
head_tail = 9568402, tickets = { head = 146, tail = 146 } } } } } }, entry = { next = 0xffff883fd0e384e0, prev = 0xffff881fca1cd8b0 }, completion = { done = 2147483647, wait = { lock = { { rlock = { raw_lock = { { head_tail = 131074, tickets = { head = 2, tail = 2 } } } } } }, task_list = { next = 0xffff883fd0e38260, prev = 0xffff883fd0e38260 } } }, wakeup = 0x0, wakeup_path = false, syscore = false, suspend_timer = { entry = { next = 0x0, prev = 0x0 }, expires = 0, base = 0xffff883fd0e24000, function = 0xffffffff81402e90 <pm_suspend_timer_fn>, data = 18446612406401728912, slack = -1, start_pid = -1, start_site = 0x0, start_comm = "\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000" }, timer_expires = 0, work = { data = { counter = 68719476704 }, entry = { next = 0xffff883fd0e382e0, prev = 0xffff883fd0e382e0 }, func = 0xffffffff81402f10 <pm_runtime_work> }, wait_queue = { lock = { { rlock = { raw_lock = { { head_tail = 262148, tickets = { head = 4, tail = 4 } } } } } }, task_list = { next = 0xffff883fd0e38300, prev = 0xffff883fd0e38300 } }, usage_count = { counter = 0 }, child_count = { counter = 0 }, disable_depth = 0, idle_notification = 0, request_pending = 0, deferred_resume = 0, run_wake = 0, runtime_auto = 1, no_callbacks = 0, irq_safe = 0, use_autosuspend = 0, timer_autosuspends = 0, memalloc_noio = 1, request = RPM_REQ_NONE, runtime_status = RPM_SUSPENDED, runtime_error = 0, autosuspend_delay = 0, last_busy = 0, active_jiffies = 9752, suspended_jiffies = 0, accounting_timestamp = 4294683155, subsys_data = 0x0, qos = 0x0 }, pm_domain = 0x0, numa_node = 1, dma_mask = 0x0, coherent_dma_mask = 0, dma_parms = 0x0, dma_pools = { next = 0xffff883fd0e38388, prev = 0xffff883fd0e38388 }, dma_mem = 0x0, archdata = { dma_ops = 0x0, iommu = 0x0 }, of_node = 0x0, acpi_node = { { companion = 0x0, __UNIQUE_ID_rh_kabi_hide7 = { handle = 0x0 }, {<No data fields>} } }, devt = 0, id = 0, devres_lock = { { rlock = { raw_lock = { { head_tail = 0, tickets = { head = 0, 
tail = 0 } } } } } }, devres_head = { next = 0xffff883fd0e383d0, prev = 0xffff883fd0e383d0 }, knode_class = { n_klist = 0x0, n_node = { next = 0x0, prev = 0x0 }, n_ref = { refcount = { counter = 0 } } }, class = 0x0, groups = 0x0, release = 0x0, iommu_group = 0x0, offline_disabled = false, offline = false, device_rh = 0xffff883fccc84738 }, shost_dev = { parent = 0xffff883fd0e38190, p = 0xffff883fd0c8b5c0, kobj = { name = 0xffff883fcd0661c8 "host5", /* name in the device-driver model */ entry = { next = 0xffff881fce507818, prev = 0xffff883fd0e381a8 }, parent = 0xffff883fcd039180, kset = 0xffff881fff86b6c0, ktype = 0xffffffff81a14e40 <device_ktype>, sd = 0xffff883fccc9eb60, kref = { refcount = { counter = 3 } }, state_initialized = 1, state_in_sysfs = 1, state_add_uevent_sent = 1, state_remove_uevent_sent = 0, uevent_suppress = 0 }, init_name = 0x0, type = 0x0, mutex = { count = { counter = 1 }, wait_lock = { { rlock = { raw_lock = { { head_tail = 0, tickets = { head = 0, tail = 0 } } } } } }, wait_list = { next = 0xffff883fd0e38498, prev = 0xffff883fd0e38498 }, owner = 0x0, { osq = 0x0, __UNIQUE_ID_rh_kabi_hide1 = { spin_mlock = 0x0 }, {<No data fields>} } }, bus = 0x0, driver = 0x0, platform_data = 0x0, power = { power_state = { event = 0 }, can_wakeup = 0, async_suspend = 1, is_prepared = false, is_suspended = false, ignore_children = false, early_init = true, lock = { { rlock = { raw_lock = { { head_tail = 0, tickets = { head = 0, tail = 0 } } } } } }, entry = { next = 0xffff881fce5078b0, prev = 0xffff883fd0e38240 }, completion = { done = 2147483647, wait = { lock = { { rlock = { raw_lock = { { head_tail = 131074, tickets = { head = 2, tail = 2 } } } } } }, task_list = { next = 0xffff883fd0e38500, prev = 0xffff883fd0e38500 } } }, wakeup = 0x0, wakeup_path = false, syscore = false, suspend_timer = { entry = { next = 0x0, prev = 0x0 }, expires = 0, base = 0xffff883fd0e24000, function = 0xffffffff81402e90 <pm_suspend_timer_fn>, data = 18446612406401729584, slack = -1, start_pid = -1, 
start_site = 0x0, start_comm = "\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000" }, timer_expires = 0, work = { data = { counter = 68719476704 }, entry = { next = 0xffff883fd0e38580, prev = 0xffff883fd0e38580 }, func = 0xffffffff81402f10 <pm_runtime_work> }, wait_queue = { lock = { { rlock = { raw_lock = { { head_tail = 0, tickets = { head = 0, tail = 0 } } } } } }, task_list = { next = 0xffff883fd0e385a0, prev = 0xffff883fd0e385a0 } }, usage_count = { counter = 0 }, child_count = { counter = 0 }, disable_depth = 1, idle_notification = 0, request_pending = 0, deferred_resume = 0, run_wake = 0, runtime_auto = 1, no_callbacks = 0, irq_safe = 0, use_autosuspend = 0, timer_autosuspends = 0, memalloc_noio = 0, request = RPM_REQ_NONE, runtime_status = RPM_SUSPENDED, runtime_error = 0, autosuspend_delay = 0, last_busy = 0, active_jiffies = 0, suspended_jiffies = 0, accounting_timestamp = 4294671976, subsys_data = 0x0, qos = 0x0 }, pm_domain = 0x0, numa_node = 1, dma_mask = 0x0, coherent_dma_mask = 0, dma_parms = 0x0, dma_pools = { next = 0xffff883fd0e38628, prev = 0xffff883fd0e38628 }, dma_mem = 0x0, archdata = { dma_ops = 0x0, iommu = 0x0 }, of_node = 0x0, acpi_node = { { companion = 0x0, __UNIQUE_ID_rh_kabi_hide7 = { handle = 0x0 }, {<No data fields>} } }, devt = 0, id = 0, devres_lock = { { rlock = { raw_lock = { { head_tail = 0, tickets = { head = 0, tail = 0 } } } } } }, devres_head = { next = 0xffff883fd0e38670, prev = 0xffff883fd0e38670 }, knode_class = { n_klist = 0xffff883fcff102a8, n_node = { next = 0xffff883fd0cb2688, prev = 0xffff881fce2b6688 }, n_ref = { refcount = { counter = 1 } } }, class = 0xffffffff81a18ac0 <shost_class>, groups = 0xffffffff81a19470 <scsi_sysfs_shost_attr_groups>, release = 0x0, iommu_group = 0x0, offline_disabled = false, offline = false, device_rh = 0xffff883fccc84758 }, sht_legacy_list = { next = 0x0, prev = 0x0 }, shost_data = 0xffff883fcd0391e0, dma_dev = 0xffff883fcfded098, rh_reserved1 = 0x0, rh_reserved2 = 0x0, 
rh_reserved3 = 0x0, rh_reserved4 = 0x0, rh_reserved5 = 0x0, rh_reserved6 = 0x0, scsi_mq_reserved1 = 0, scsi_mq_reserved2 = 0, scsi_mq_reserved3 = 0x0, scsi_mq_reserved4 = 0x0, scsi_mq_reserved5 = { counter = 0 }, scsi_mq_reserved6 = { counter = 0 }, hostdata = 0xffff883fd0e38740 /* usually holds controller-private data, e.g. MPT3SAS_ADAPTER or MPT2SAS_ADAPTER */ }
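In the probe routine of a SCSI host adapter driver, the usual sequence is scsi_host_alloc() first, then scsi_add_host(), immediately followed by scsi_scan_host() to scan the SCSI bus.
The purpose of the bus scan is to discover, in a protocol-specific or chip-specific way, the target nodes and logical units attached behind the host adapter, build the corresponding data structures for them in memory, and add them to the system.
The SCSI mid layer builds an INQUIRY command for each possible ID and LUN in turn and submits these INQUIRY commands to the block I/O layer. The block layer eventually calls the mid layer's strategy routine, which extracts the SCSI command again and then calls the queuecommand callback of the low-level SCSI driver. In general, anything in the kernel that involves registration also involves setting up relationships with both the layer above and the layer below.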
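To make the scan loop concrete, here is a minimal user-space sketch, not kernel code. It fills in a standard 6-byte INQUIRY CDB (opcode 0x12, allocation length in bytes 3 and 4) and walks every (ID, LUN) pair. The names build_inquiry_cdb, scan_bus, probe_fn and demo_probe are hypothetical, and the real mid layer is smarter than this brute-force loop.

```c
#include <stdint.h>
#include <string.h>

#define INQUIRY_OPCODE  0x12   /* INQUIRY operation code */
#define INQUIRY_CDB_LEN 6

/* Fill a standard INQUIRY CDB (EVPD = 0, page code = 0). */
void build_inquiry_cdb(uint8_t cdb[INQUIRY_CDB_LEN], uint16_t alloc_len)
{
    memset(cdb, 0, INQUIRY_CDB_LEN);
    cdb[0] = INQUIRY_OPCODE;
    cdb[3] = (uint8_t)(alloc_len >> 8);    /* allocation length, MSB */
    cdb[4] = (uint8_t)(alloc_len & 0xff);  /* allocation length, LSB */
}

/* Hypothetical stand-in for "submit the command and see whether a
 * device answers"; returns non-zero when something responds. */
typedef int (*probe_fn)(unsigned int id, unsigned int lun, const uint8_t *cdb);

/* Simplified scan: send one INQUIRY per (id, lun) and count responders. */
unsigned int scan_bus(unsigned int max_id, unsigned int max_lun, probe_fn probe)
{
    uint8_t cdb[INQUIRY_CDB_LEN];
    unsigned int found = 0;

    for (unsigned int id = 0; id < max_id; id++) {
        for (unsigned int lun = 0; lun < max_lun; lun++) {
            build_inquiry_cdb(cdb, 36);    /* 36 bytes of standard INQUIRY data */
            if (probe(id, lun, cdb))
                found++;
        }
    }
    return found;
}

/* Demo probe for illustration: pretends targets 0 and 1 expose LUN 0 only. */
int demo_probe(unsigned int id, unsigned int lun, const uint8_t *cdb)
{
    return cdb[0] == INQUIRY_OPCODE && id < 2 && lun == 0;
}
```

Calling scan_bus(4, 2, demo_probe) visits eight (ID, LUN) pairs and counts the two that the demo probe answers for.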
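How are the individual Scsi_Hosts related?
From the device-driver-model point of view, each host hangs under its own parent device: in the crash output below, the device embedded at offset 0x190 of each Scsi_Host (shost_gendev, as can be seen in the host5 dump above) has a different parent pointer for each of the four hosts. Apart from that, the hosts are not related to each other.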
crash> device.parent ffff883fd0cb4190
  parent = 0xffff883fcfdef098
crash> device.parent 0xffff883fd0e38190
  parent = 0xffff883fcfded098
crash> device.parent 0xffff883fd0cb2190
  parent = 0xffff883fcfdee098
crash> device.parent 0xffff881fce2b6190
  parent = 0xffff883fcfdb0098
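How the SCSI subsystem handles block access requests
After the generic block layer calls the request-queue handler of the SCSI subsystem, the SCSI mid layer generates, initializes and submits a SCSI command (struct scsi_cmnd) to the SCSI target, based on the content of the block request.
SCSI describes the devices it talks to as a hierarchy: host level, bus level, target level and device level. The scsi_device mentioned earlier is the abstraction at the device level and corresponds to a LUN, which may be a disk, an optical drive, and so on.
For a disk, a scsi_disk object is created as well; for an optical drive, a scsi_cd object is created to go with the scsi_device.
During the SCSI bus scan, scsi_alloc_sdev() is called each time a device is detected, and it in turn calls scsi_alloc_queue(). In other words, once the kernel has recognized a SCSI device it has to set up a request_queue for it, which is done in the function below. How a scsi_device gets recognized involves a long probing sequence that is not covered here.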
struct request_queue *scsi_alloc_queue(struct scsi_device *sdev)
{
    struct request_queue *q;

    /* Allocate an ordinary request_queue and set up its members;
     * scsi_request_fn is the function that processes the requests. */
    q = __scsi_alloc_queue(sdev->host, scsi_request_fn);
    if (!q)
        return NULL;

    blk_queue_prep_rq(q, scsi_prep_fn);    /* scsi_prep_fn prepares the SCSI command */
    blk_queue_unprep_rq(q, scsi_unprep_fn);
    blk_queue_softirq_done(q, scsi_softirq_done);
    blk_queue_rq_timed_out(q, scsi_times_out);
    blk_queue_lld_busy(q, scsi_lld_busy);
    return q;
}
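The abstraction of a SCSI command:
The kernel uses scsi_cmnd to manage a generated SCSI command, including its timestamps, retry count, context pointers and the command body that carries the CDB. A typical scsi_cmnd belonging to a request issued by a filesystem looks like this: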
crash> scsi_cmnd 0xffff881f49a2d500 struct scsi_cmnd { device = 0xffff881fcee44800, /* pointer to the scsi_device this command belongs to */ list = { next = 0xffff881f49a2cfc8, prev = 0xffff881fcee44838 }, eh_entry = { /* links the command into the error-handling list; used when the command fails or times out */ next = 0x0, prev = 0x0 }, abort_work = { /* used when the command times out; queued on a workqueue of the scsi_host */ work = { data = { counter = 68719476704 }, entry = { next = 0xffff881f49a2d530, prev = 0xffff881f49a2d530 }, func = 0xffffffff8141eee0 <scmd_eh_abort_handler> /* handler of the work_struct */ }, timer = { entry = { next = 0x0, prev = 0x0 }, expires = 0, base = 0xffff881fd2d8c002, function = 0xffffffff8109c100 <delayed_work_timer_fn>, data = 18446612266693612840, slack = -1, start_pid = -1, start_site = 0x0, start_comm = "\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000" }, wq = 0x0, cpu = 0 }, eh_eflags = 0, serial_number = 0, /* command serial number */ jiffies_at_alloc = 4298774713, /* timestamp (jiffies) when the command was allocated */ retries = 0, allowed = 5, prot_op = 0 '\000', prot_type = 0 '\000', cmd_len = 16, sc_data_direction = DMA_FROM_DEVICE, cmnd = 0xffff883e9d3f7e98 "\210", sdb = { table = { sgl = 0xffff880cdf50fe00, nents = 1, orig_nents = 1 }, length = 4096, resid = 0 }, prot_sdb = 0x0, underflow = 4096, transfersize = 512, request = 0xffff883e9d3f7d80, /* the block-layer request this command corresponds to */ sense_buffer = 0xffff880168be0f00 "", scsi_done = 0xffffffff81420a90 <scsi_done>, /* callback invoked after the command has executed */ SCp = { ptr = 0x0, this_residual = 0, buffer = 0x0, buffers_residual = 0, dma_handle = 0, Status = 0, Message = 0, have_data_in = 0, sent_command = 0, phase = 0 }, host_scribble = 0x0, result = 0, tag = 255 '\377', rh_reserved1 = 0x0, rh_reserved2 = 0x0, rh_reserved3 = 0x0, rh_reserved4 = 0x0 }
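SCSI command initialization and submission
Besides the SCSI commands issued through the generic block layer, SCSI commands can also be sent through sg.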
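As an illustration of the sg path, the sketch below prepares a TEST UNIT READY command (a 6-byte all-zero CDB with no data phase) for the SG_IO ioctl. struct sg_io_hdr, SG_IO and SG_DXFER_NONE come from <scsi/sg.h>. The helper names prepare_tur and send_tur, the 5000 ms timeout and the 32-byte sense buffer are assumptions made for this example only.

```c
#include <string.h>
#include <sys/ioctl.h>
#include <scsi/sg.h>    /* struct sg_io_hdr, SG_IO, SG_DXFER_NONE */

#define TUR_CDB_LEN 6

/* Fill an sg_io_hdr for TEST UNIT READY (opcode 0x00, no data transfer). */
void prepare_tur(struct sg_io_hdr *hdr, unsigned char cdb[TUR_CDB_LEN],
                 unsigned char *sense, unsigned char sense_len)
{
    memset(cdb, 0, TUR_CDB_LEN);            /* all-zero CDB is TEST UNIT READY */
    memset(hdr, 0, sizeof(*hdr));
    hdr->interface_id = 'S';                /* required by the sg v3 interface */
    hdr->dxfer_direction = SG_DXFER_NONE;   /* no data phase */
    hdr->cmd_len = TUR_CDB_LEN;
    hdr->cmdp = cdb;
    hdr->mx_sb_len = sense_len;             /* room for returned sense data */
    hdr->sbp = sense;
    hdr->timeout = 5000;                    /* milliseconds (example value) */
}

/* Issue the command on an already opened device node.
 * Returns -1 if the ioctl fails, otherwise the SCSI status byte (0 = GOOD). */
int send_tur(int fd)
{
    struct sg_io_hdr hdr;
    unsigned char cdb[TUR_CDB_LEN];
    unsigned char sense[32];

    prepare_tur(&hdr, cdb, sense, sizeof(sense));
    if (ioctl(fd, SG_IO, &hdr) < 0)
        return -1;
    return hdr.status;
}
```

A caller would open a device node such as /dev/sg0 (the path is only an example), pass the descriptor to send_tur() and then close it. Commands sent this way bypass the filesystem entirely.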
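Error handling in the SCSI subsystem
Low-level disk drivers are implemented by the vendors themselves, so they are not discussed here. Apart from that, error handling in the SCSI subsystem is mainly done by the SCSI mid layer. During the first callback, the low-level driver returns the result of the SCSI command and the SCSI status it obtained to the mid layer. The mid layer first evaluates the command result returned by the low-level driver; if that does not lead to a clear conclusion, it goes on to evaluate the returned SCSI status, sense data and so on. Commands judged successful go straight to the second callback; commands judged to need a retry are put back on the block device request queue and processed again. This procedure can be called the mid layer's basic method for judging the result of a SCSI command.
It all looks simple, but in practice it is not: some errors offer no clear basis for a decision, such as sense data errors or TIMEOUT errors. To deal with this, the SCSI subsystem in the Linux kernel introduces a dedicated error-handling thread, and every SCSI command whose error cause cannot be determined is handed to that thread. The thread works closely with two queues: the error-handling queue (eh_work_q) and the error-handling done queue (eh_done_q). The error-handling queue holds the SCSI commands that need error handling; the done queue holds the commands that have been dealt with during error handling. The figure below shows how the thread processes the commands on the error-handling queue.
The error-handling process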
static void scsi_unjam_host(struct Scsi_Host *shost)
{
    unsigned long flags;
    LIST_HEAD(eh_work_q);
    LIST_HEAD(eh_done_q);

    spin_lock_irqsave(shost->host_lock, flags);
    list_splice_init(&shost->eh_cmd_q, &eh_work_q);
    spin_unlock_irqrestore(shost->host_lock, flags);

    SCSI_LOG_ERROR_RECOVERY(1, scsi_eh_prt_fail_stats(shost, &eh_work_q));

    if (!scsi_eh_get_sense(&eh_work_q, &eh_done_q))
        if (!scsi_eh_abort_cmds(&eh_work_q, &eh_done_q))
            scsi_eh_ready_devs(shost, &eh_work_q, &eh_done_q);

    spin_lock_irqsave(shost->host_lock, flags);
    if (shost->eh_deadline != -1)
        shost->last_reset = 0;
    spin_unlock_irqrestore(shost->host_lock, flags);
    scsi_eh_flush_done_q(&eh_done_q);
}
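The whole process can be summarized in four phases:
- Sense data query phase
Sense data is queried to give a fresh basis for judging the SCSI command, which is then evaluated with the basic method described above. If the verdict is success or retry, the command is moved from the error-handling queue to the done queue. If no verdict can be reached, the command stays on the error-handling queue and error handling moves on to the ABORT phase.
- ABORT phase
In this phase the SCSI commands on the error-handling queue are actively aborted. Aborted commands are added to the done queue. If commands are still left unhandled on the error-handling queue when the ABORT pass ends, processing moves on to the START STOP UNIT phase.
- START STOP UNIT phase
In this phase a START STOP UNIT command is sent to the SCSI devices related to the commands on the error-handling queue, in an attempt to recover them. If commands remain on the error-handling queue after this phase, processing moves on to the RESET phase.
- RESET phase
The RESET phase has four levels: DEVICE RESET, TARGET RESET, BUS RESET and HOST RESET. First the SCSI devices related to the commands on the error queue are reset. If a device is back in a normal state after DEVICE RESET, the failed commands on the error-handling queue that belong to that device are added to the done queue. If DEVICE RESET cannot take care of all failed commands, TARGET RESET is tried, and if that fails too, BUS RESET resets the buses related to the commands on the error-handling queue. If BUS RESET still cannot handle all the commands on the queue, HOST RESET resets the hosts related to those commands. HOST RESET may well fail to handle all failed commands too, in which case the SCSI devices related to the remaining commands have to be considered unusable. Those devices are marked as unusable and the related failed commands are all added to the done queue. In scsi_unjam_host() above, the START STOP UNIT and RESET phases are driven by scsi_eh_ready_devs().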
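The escalation order described above can be modelled in a few lines of user-space C. This is only a sketch of the control flow: enum eh_step, eh_try_fn, run_eh_ladder and demo_try are hypothetical names, and each kernel step really works on lists of commands rather than a single yes/no result.

```c
/* Recovery steps in the order the error handler escalates through them. */
enum eh_step {
    EH_GET_SENSE,
    EH_ABORT,
    EH_START_UNIT,
    EH_DEVICE_RESET,
    EH_TARGET_RESET,
    EH_BUS_RESET,
    EH_HOST_RESET,
    EH_OFFLINE          /* nothing helped: devices are taken offline */
};

/* Hypothetical callback: returns non-zero once the error-handling queue
 * is empty, i.e. every failed command has been moved to the done queue. */
typedef int (*eh_try_fn)(enum eh_step step, void *ctx);

/* Try each step in turn and report the step that finished recovery. */
enum eh_step run_eh_ladder(eh_try_fn try_step, void *ctx)
{
    for (int s = EH_GET_SENSE; s < EH_OFFLINE; s++) {
        if (try_step((enum eh_step)s, ctx))
            return (enum eh_step)s;
    }
    return EH_OFFLINE;
}

/* Demo callback: recovery succeeds once the step reaches *ctx. */
int demo_try(enum eh_step step, void *ctx)
{
    const int *threshold = ctx;
    return (int)step >= *threshold;
}
```

With a threshold of EH_TARGET_RESET the ladder stops at the target reset; with a threshold that is never reached it falls through to EH_OFFLINE, which mirrors the "mark the device unusable" outcome described above.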
Some abbreviations:
void blk_rq_timed_out_timer(unsigned long data)
{
    struct request_queue *q = (struct request_queue *) data;
    unsigned long flags, next = 0;
    struct request *rq, *tmp;
    int next_set = 0;

    spin_lock_irqsave(q->queue_lock, flags);

    /* Walk the requests already handed to the driver and check whether
     * they have timed out; these requests are all linked on timeout_list. */
    list_for_each_entry_safe(rq, tmp, &q->timeout_list, timeout_list)
        blk_rq_check_expired(rq, &next, &next_set);

    if (next_set)
        mod_timer(&q->timeout, round_jiffies_up(next));

    spin_unlock_irqrestore(q->queue_lock, flags);
}
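One thing worth noting here: from what I have read online, there used to be one timer per request. That meant a large number of timers, most of which were never used, because timeouts are relatively rare, yet timers still had to be created, inserted and deleted all the time. Now there is one timer per request_queue; this timer is responsible for scanning for expired requests and stays resident for the lifetime of the queue.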
static void blk_rq_check_expired(struct request *rq, unsigned long *next_timeout,
                                 unsigned int *next_set)
{
    if (time_after_eq(jiffies, rq->deadline)) {    /* this request has timed out */
        list_del_init(&rq->timeout_list);          /* take it off the request_queue's timeout_list */

        /*
         * Check if we raced with end io completion
         */
        if (!blk_mark_rq_complete(rq))             /* guard against a concurrent completion */
            blk_rq_timed_out(rq);                  /* handle the timed-out request */
    } else if (!*next_set || time_after(*next_timeout, rq->deadline)) {
        *next_timeout = rq->deadline;
        *next_set = 1;
    }
}
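The one-timer-per-queue scan can be tried out in user space with the sketch below. It expires every request whose deadline has passed and remembers the earliest remaining deadline so that the single timer can be re-armed. The comparison macros follow the same wrap-safe idea as the kernel's time_after() and time_after_eq(); struct pending_req and scan_timeouts are hypothetical names, and an array stands in for timeout_list.

```c
#include <stddef.h>

/* Wrap-safe tick comparisons in the style of time_after()/time_after_eq():
 * the unsigned difference is reinterpreted as signed, so the result stays
 * correct when the tick counter wraps around. */
#define TIME_AFTER(a, b)    ((long)((b) - (a)) < 0)
#define TIME_AFTER_EQ(a, b) ((long)((a) - (b)) >= 0)

struct pending_req {
    unsigned long deadline;   /* tick at which the request times out */
    int expired;              /* set once the request was found expired */
};

/* Scan all pending requests at tick 'now'.
 * Returns how many newly expired; *next receives the earliest deadline
 * still in the future and *next_set tells whether *next is valid. */
size_t scan_timeouts(struct pending_req *reqs, size_t n, unsigned long now,
                     unsigned long *next, int *next_set)
{
    size_t expired = 0;

    *next_set = 0;
    for (size_t i = 0; i < n; i++) {
        if (reqs[i].expired)
            continue;
        if (TIME_AFTER_EQ(now, reqs[i].deadline)) {
            reqs[i].expired = 1;
            expired++;
        } else if (!*next_set || TIME_AFTER(*next, reqs[i].deadline)) {
            *next = reqs[i].deadline;
            *next_set = 1;
        }
    }
    return expired;
}
```

The point of the design is that only one timer has to be re-armed, at the value left in *next, no matter how many requests are in flight.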
static void blk_rq_timed_out(struct request *req)
{
    struct request_queue *q = req->q;
    enum blk_eh_timer_return ret;

    ret = q->rq_timed_out_fn(req);    /* for SCSI this calls scsi_times_out */
    switch (ret) {
    case BLK_EH_HANDLED:
        /* Can we use req->errors here? */
        __blk_complete_request(req);
        break;
    case BLK_EH_RESET_TIMER:
        blk_add_timer(req);
        blk_clear_rq_complete(req);
        break;
    case BLK_EH_NOT_HANDLED:
        /*
         * LLD handles this for now but in the future
         * we can send a request msg to abort the command
         * and we can move more of the generic scsi eh code to
         * the blk layer.
         */
        break;
    default:
        printk(KERN_ERR "block: bad eh return: %d\n", ret);
        break;
    }
}
If all scmds either complete or fail, the number of in-flight scmds
becomes equal to the number of failed scmds - i.e. shost->host_busy ==
shost->host_failed. This wakes up SCSI EH thread. So, once woken up,
SCSI EH thread can expect that all in-flight commands have failed and
are linked on shost->eh_cmd_q.
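A LUN is represented by the mid layer's scsi_device structure, and a node (target) by the mid layer's scsi_target structure; a channel has no structure of its own. If the LUN is a hard disk there is an additional scsi_disk abstraction, and for an optical drive there is a similar scsi_cd structure.
A system may also contain several SCSI controller chips at the same time, i.e. several SCSI hosts, for example in the common setup where a server attaches storage through a JBOD. Locating each LUN therefore requires an addressing scheme. From the topology it follows naturally that the address is host_id:channel_id:node_id:lun_id. How these IDs are generated is not discussed here, but with the four numbers a single LUN device can be located.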
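The host:channel:target:lun tuple is what appears in brackets in listings such as [5:0:1:0]. A small parser makes the four fields explicit; struct scsi_addr and parse_hctl are hypothetical names used only for this sketch.

```c
#include <stdio.h>

/* One SCSI address: host, channel, target (node) and LUN. */
struct scsi_addr {
    unsigned int host;
    unsigned int channel;
    unsigned int target;
    unsigned int lun;
};

/* Parse "5:0:1:0" or "[5:0:1:0]" into *a.
 * Returns 0 on success, -1 if fewer than four fields could be read. */
int parse_hctl(const char *s, struct scsi_addr *a)
{
    if (*s == '[')
        s++;    /* skip the opening bracket of the listing format */
    if (sscanf(s, "%u:%u:%u:%u",
               &a->host, &a->channel, &a->target, &a->lun) != 4)
        return -1;
    return 0;
}
```

For the string "[5:0:10:0]" this yields host 5, channel 0, target 10 and LUN 0.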
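For requests that have been added to the error-handling done queue: if the device is in a good state and the command's retry count is below the allowed number, the command is put back on the block request queue and processed again; otherwise the second callback is invoked directly, which completes the SCSI subsystem's handling of the block request. That concludes the whole error-handling process for a SCSI command.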
static void scsi_softirq_done(struct request *rq)
{
    struct scsi_cmnd *cmd = rq->special;
    unsigned long wait_for = (cmd->allowed + 1) * rq->timeout;
    int disposition;

    INIT_LIST_HEAD(&cmd->eh_entry);

    atomic_inc(&cmd->device->iodone_cnt);
    if (cmd->result)
        atomic_inc(&cmd->device->ioerr_cnt);

    disposition = scsi_decide_disposition(cmd);
    if (disposition != SUCCESS &&
        time_before(cmd->jiffies_at_alloc + wait_for, jiffies)) {
        sdev_printk(KERN_ERR, cmd->device,
                    "timing out command, waited %lus\n",
                    wait_for/HZ);
        disposition = SUCCESS;
    }

    scsi_log_completion(cmd, disposition);

    switch (disposition) {
    case SUCCESS:
        scsi_finish_command(cmd);
        break;
    case NEEDS_RETRY:
        scsi_queue_insert(cmd, SCSI_MLQUEUE_EH_RETRY);
        break;
    case ADD_TO_MLQUEUE:
        scsi_queue_insert(cmd, SCSI_MLQUEUE_DEVICE_BUSY);
        break;
    default:
        if (!scsi_eh_scmd_add(cmd, 0))
            scsi_finish_command(cmd);
    }
}
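The decision logic of scsi_softirq_done() can be isolated into a small user-space model. It keeps two things from the kernel function: the overall time budget of (allowed + 1) * timeout, after which the command is simply finished, and the mapping from disposition to action. All type and function names below are hypothetical. One simplification: the real default branch finishes the command itself when it cannot be handed to the error handler.

```c
/* Verdicts of the "basic judging method" (simplified). */
enum disposition { DISP_SUCCESS, DISP_NEEDS_RETRY, DISP_ADD_TO_MLQUEUE, DISP_FAILED };

/* What the completion path does with the command. */
enum action { ACT_FINISH, ACT_REQUEUE_RETRY, ACT_REQUEUE_BUSY, ACT_ERROR_HANDLER };

struct cmd_model {
    unsigned long alloc_time;  /* tick when the command was allocated */
    unsigned long timeout;     /* per-attempt timeout in ticks */
    int allowed;               /* number of retries allowed */
};

enum action complete_cmd(const struct cmd_model *c, enum disposition d,
                         unsigned long now)
{
    /* Total budget: the first attempt plus every allowed retry. */
    unsigned long wait_for = (unsigned long)(c->allowed + 1) * c->timeout;

    /* Wrap-safe form of "alloc_time + wait_for is before now":
     * the budget is used up, so stop retrying and finish the command. */
    if (d != DISP_SUCCESS && (long)((c->alloc_time + wait_for) - now) < 0)
        d = DISP_SUCCESS;

    switch (d) {
    case DISP_SUCCESS:
        return ACT_FINISH;
    case DISP_NEEDS_RETRY:
        return ACT_REQUEUE_RETRY;
    case DISP_ADD_TO_MLQUEUE:
        return ACT_REQUEUE_BUSY;
    default:
        return ACT_ERROR_HANDLER;
    }
}
```

With allowed = 5 and timeout = 30 the budget is 180 ticks: a NEEDS_RETRY verdict 100 ticks after allocation is requeued, while the same verdict 200 ticks after allocation finishes the command.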
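sd stands for disk (think of it as short for scsi_disk; the module is sd_mod), sr for optical drives, st for tape and sg for generic. When a filesystem accesses files on a disk it goes through sd, whereas the sg kernel driver lets us bypass the filesystem and issue SCSI commands directly from user space. For example, in one crash dump most commands were REQ_TYPE_FS, but one came from dfs accessing the disk directly through ioctl, and its command type was REQ_TYPE_BLOCK_PC. A LUN may correspond to an sd or an sr, or to a cascaded phy port. The SCSI layer in Linux appears to deal only with SCSI commands and does not fully implement the standard SCSI protocol; you can think of it as a control layer that builds, executes and returns commands in a protocol-compliant way.
sd, sr and the others each have to instantiate a scsi_driver object: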
struct scsi_driver {
    struct module *owner;
    struct device_driver gendrv;

    void (*rescan)(struct device *);
    int (*init_command)(struct scsi_cmnd *);
    void (*uninit_command)(struct scsi_cmnd *);
    int (*done)(struct scsi_cmnd *);
    int (*eh_action)(struct scsi_cmnd *, int);
    int (*scsi_mq_reserved1)(struct scsi_cmnd *);
    void (*scsi_mq_reserved2)(struct scsi_cmnd *);
    void (*rh_reserved)(void);
};
For sd, for example, the instance looks like this:
static struct scsi_driver sd_template = {
    .owner          = THIS_MODULE,
    .gendrv = {
        .name       = "sd",
        .probe      = sd_probe,
        .remove     = sd_remove,
        .shutdown   = sd_shutdown,
        .pm         = &sd_pm_ops,
    },
    .rescan         = sd_rescan,
    .init_command   = sd_init_command,
    .uninit_command = sd_uninit_command,
    .done           = sd_done,    /* softirq completion callback */
    .eh_action      = sd_eh_action,
};
On a normal completion:
0xffffffffc0273860 : sd_done+0x0/0x350 [sd_mod]
0xffffffff8147527d : scsi_finish_command+0xcd/0x140 [kernel]
0xffffffff8147f7b2 : scsi_softirq_done+0x142/0x190 [kernel]    <-- this is req->q->softirq_done_fn
0xffffffff8130ec66 : blk_done_softirq+0x96/0xc0 [kernel]    <-- the softirq that handles I/O completion
0xffffffff810960ed : __do_softirq+0xfd/0x290 [kernel]
0xffffffff816cf45c : call_softirq+0x1c/0x30 [kernel]
0xffffffff8102d465 : do_softirq+0x65/0xa0 [kernel]
0xffffffff81096535 : irq_exit+0x175/0x180 [kernel]
0xffffffff810522b9 : smp_call_function_single_interrupt+0x39/0x40 [kernel]
0xffffffff816ceb77 : call_function_single_interrupt+0x87/0x90 [kernel]
scsi_finish_command is a key function: actions such as clearing the timer of the upper-layer request are carried out from within it.
0xffffffff81307d10 : blk_finish_request+0x0/0x100 [kernel]
0xffffffff814800f6 : scsi_end_request+0x116/0x1e0 [kernel]
0xffffffff81480388 : scsi_io_completion+0x168/0x6a0 [kernel]
0xffffffff8147528c : scsi_finish_command+0xdc/0x140 [kernel]
0xffffffff8147f7b2 : scsi_softirq_done+0x142/0x190 [kernel]
0xffffffff8130ec66 : blk_done_softirq+0x96/0xc0 [kernel]
0xffffffff810960ed : __do_softirq+0xfd/0x290 [kernel]
0xffffffff816cf45c : call_softirq+0x1c/0x30 [kernel]
0xffffffff8102d465 : do_softirq+0x65/0xa0 [kernel]
0xffffffff81096535 : irq_exit+0x175/0x180 [kernel]
0xffffffff810522b9 : smp_call_function_single_interrupt+0x39/0x40 [kernel]
0xffffffff816ceb77 : call_function_single_interrupt+0x87/0x90 [kernel]
The hard-IRQ callback:
0xffffffff8147ec70 : scsi_done+0x0/0x60 [kernel]
0xffffffffc0166fd7 : _scsih_io_done+0x117/0x11a0 [mpt3sas]
0xffffffffc0156ad7 : _base_interrupt+0x247/0xc80 [mpt3sas]
0xffffffff81138c74 : __handle_irq_event_percpu+0x44/0x1c0 [kernel]
0xffffffff81138e22 : handle_irq_event_percpu+0x32/0x80 [kernel]
0xffffffff81138eac : handle_irq_event+0x3c/0x60 [kernel]
0xffffffff8113bbaf : handle_edge_irq+0x7f/0x150 [kernel]
0xffffffff8102d321 : handle_irq+0xe1/0x1c0 [kernel]
0xffffffff816d058d : __irqentry_text_start+0x4d/0xf0 [kernel]
0xffffffff816c4287 : ret_from_intr+0x0/0x15 [kernel]
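For SCSI commands used for error recovery, for example in scsi_send_eh_cmnd(), the callback is set with scmd->scsi_done = scsi_eh_done; for normally issued commands it is usually scsi_done.
The upper layer receives a data access request from the generic block layer and turns it into a SCSI command, which is represented by the scsi_cmnd structure. The command is then handed down for processing through the queuecommand interface defined in the scsi_host_template structure. When the command has finished, the upper layer's callback is invoked in softirq context to do the follow-up work for the command and to report its result to the generic block layer.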
root      1007     2  0 Feb26 ?        00:00:00 [scsi_eh_0]
root      1019     2  0 Feb26 ?        00:00:00 [scsi_eh_1]
root      1030     2  0 Feb26 ?        00:00:00 [scsi_eh_2]
root      1036     2  0 Feb26 ?        00:00:00 [scsi_eh_3]
root      1046     2  0 Feb26 ?        00:00:00 [scsi_eh_4]
root      1054     2  0 Feb26 ?        00:00:00 [scsi_eh_5]
response
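The response data for a CDB command is called sense data. It is not necessarily delivered automatically: unless the transport returns it together with the command status (autosense), the initiator, not the device, has to fetch it with a REQUEST SENSE command. So for the side that sends the request, completion of a command has two stages: the command was sent successfully, and the disk actually executed it successfully. The status with which the function call returns only tells whether this machine managed to send the command; it says nothing about what happened on the disk. To learn that, the sense data has to be obtained.
The current Linux SCSI implementation follows these two stages with two callbacks: one handles the local result, and the other sends REQUEST SENSE to query the device's execution result before processing continues.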
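Once sense data has been obtained it still has to be decoded. The sketch below handles only the fixed sense format (response code 0x70 or 0x71), where the sense key sits in the low four bits of byte 2 and the additional sense code and qualifier in bytes 12 and 13. struct sense_info and parse_fixed_sense are hypothetical names, and descriptor-format sense data (0x72/0x73) is deliberately rejected rather than parsed.

```c
#include <stddef.h>
#include <stdint.h>

/* The three fields most often used to classify an error. */
struct sense_info {
    uint8_t key;    /* sense key, e.g. 0x3 = MEDIUM ERROR */
    uint8_t asc;    /* additional sense code */
    uint8_t ascq;   /* additional sense code qualifier */
};

/* Decode fixed-format sense data.
 * Returns 0 on success, -1 if the buffer is too short or not fixed format. */
int parse_fixed_sense(const uint8_t *buf, size_t len, struct sense_info *out)
{
    uint8_t code;

    if (len < 14)
        return -1;                /* need bytes 0..13 */
    code = buf[0] & 0x7f;         /* bit 7 is the VALID bit, not part of the code */
    if (code != 0x70 && code != 0x71)
        return -1;                /* descriptor format or unknown */

    out->key  = buf[2] & 0x0f;
    out->asc  = buf[12];
    out->ascq = buf[13];
    return 0;
}
```

A buffer starting with 0x70 that has 0x03 in byte 2 and 0x11 in byte 12 decodes to sense key 3 (MEDIUM ERROR) with ASC 0x11 (unrecovered read error).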
SCSI Enclosure Services (SES)
References:
Documentation/scsi/scsi_eh.txt
彪哥's blog: http://blog.chinaunix.net/uid-14528823-id-4924157.html