PCI Devices Explained, Part 3

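The previous article analyzed part of the code that probes the PCI bus. That post was getting long, so I'm continuing here in a new one. The focus is the pci_scan_root_bus function.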

2016-10-24


 

The pci_scan_root_bus function

struct pci_bus *pci_scan_root_bus(struct device *parent, int bus,
        struct pci_ops *ops, void *sysdata, struct list_head *resources)
{
    struct pci_host_bridge_window *window;
    bool found = false;
    struct pci_bus *b;
    int max;
    /* Look for a bus-number (IORESOURCE_BUS) window among the resources */
    list_for_each_entry(window, resources, list)
        if (window->res->flags & IORESOURCE_BUS) {
            found = true;
            break;
        }
    /* Create the structures for this root bus */
    b = pci_create_root_bus(parent, bus, ops, sysdata, resources);
    if (!b)
        return NULL;

    if (!found) {
        dev_info(&b->dev,
         "No busn resource found for root bus, will use [bus %02x-ff]\n",
            bus);
        pci_bus_insert_busn_res(b, bus, 255);
    }
    /* Scan the devices and child buses below this bus */
    max = pci_scan_child_bus(b);

    if (!found)
        pci_bus_update_busn_res_end(b, max);

    pci_bus_add_devices(b);
    return b;
}

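The code first looks for a bus-number resource. It was already set up earlier in x86_pci_root_bus_resources, so in theory it should be present, but the kernel checks anyway. The kernel is always meticulous! If no such resource is found, the root bus falls back to the range [bus, 0xff], and after the scan that range is trimmed to the highest bus number actually used. Next, pci_create_root_bus creates the corresponding bus structure, then pci_scan_child_bus walks all child buses under this bus, and finally pci_bus_add_devices adds the devices. That is the whole outline, just a few steps, but understanding them properly is no small amount of work. Let's go one step at a time: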
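To make the bus-number handling concrete, here is a small user-space model of what pci_scan_root_bus does with the resource list. It is only a sketch: `struct window`, `struct busn_range`, `root_busn_range()` and the `MODEL_RES_*` flag values are hypothetical stand-ins, not kernel definitions. Only the logic follows the code above. If there is an IORESOURCE_BUS window, its end is used. Otherwise the range starts as [bus, 0xff], and after the scan its end becomes max.

```c
#include <stddef.h>
#include <assert.h>

/* Hypothetical flag values standing in for IORESOURCE_IO/MEM/BUS;
 * only the bit test matters for this model. */
#define MODEL_RES_IO  0x100u
#define MODEL_RES_MEM 0x200u
#define MODEL_RES_BUS 0x1000u

struct window {            /* simplified pci_host_bridge_window */
    unsigned flags;
    unsigned long start, end;
};

struct busn_range {        /* simplified bus->busn_res */
    unsigned start, end;
};

/*
 * Final bus-number range of a root bus numbered `bus`, given the
 * resource windows and the highest bus number the scan found (max_found).
 */
static struct busn_range root_busn_range(const struct window *w, size_t n,
                                         unsigned bus, unsigned max_found)
{
    struct busn_range r = { bus, 0xff };   /* default [bus, 0xff] */

    assert(bus <= 0xff && max_found <= 0xff);
    for (size_t i = 0; i < n; i++) {
        if (w[i].flags & MODEL_RES_BUS) {
            /* firmware-provided range: pci_bus_insert_busn_res(b, bus, end) */
            r.end = (unsigned)w[i].end;
            return r;
        }
    }
    /* no bus window: pci_bus_update_busn_res_end(b, max) after the scan */
    r.end = max_found;
    return r;
}
```

For example, with only I/O and memory windows and a scan that reaches bus 3, the root bus ends up owning [0, 3]. With a bus window of [0, 0x7f], the root bus keeps [0, 0x7f] regardless of how far the scan got.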

1. The pci_create_root_bus function

struct pci_bus *pci_create_root_bus(struct device *parent, int bus,
        struct pci_ops *ops, void *sysdata, struct list_head *resources)
{
    int error;
    struct pci_host_bridge *bridge;
    struct pci_bus *b, *b2;
    struct pci_host_bridge_window *window, *n;
    struct resource *res;
    resource_size_t offset;
    char bus_addr[64];
    char *fmt;
    /* Allocate a pci_bus structure */
    b = pci_alloc_bus();
    if (!b)
        return NULL;
    /* Basic initialization */
    b->sysdata = sysdata;
    b->ops = ops;
    /* The root bus's number is the start of its bus-number resource range */
    b->number = b->busn_res.start = bus;
    /* Refuse to create a bus that already exists in this PCI domain */
    b2 = pci_find_bus(pci_domain_nr(b), bus);
    if (b2) {
        /* If we already got to this bus through a different bridge, ignore it */
        dev_dbg(&b2->dev, "bus already known\n");
        goto err_out;
    }

    bridge = pci_alloc_host_bridge(b);
    if (!bridge)
        goto err_out;

    bridge->dev.parent = parent;
    bridge->dev.release = pci_release_host_bridge_dev;
    dev_set_name(&bridge->dev, "pci%04x:%02x", pci_domain_nr(b), bus);
    error = pcibios_root_bridge_prepare(bridge);
    if (error) {
        kfree(bridge);
        goto err_out;
    }
    /* The host bridge is itself a device, so register it */
    error = device_register(&bridge->dev);
    if (error) {
        put_device(&bridge->dev);
        goto err_out;
    }
    /* Point the bus at its bridge */
    b->bridge = get_device(&bridge->dev);
    device_enable_async_suspend(b->bridge);
    pci_set_bus_of_node(b);

    if (!parent)
        set_dev_node(b->bridge, pcibus_to_node(b));

    b->dev.class = &pcibus_class;
    b->dev.parent = b->bridge;
    dev_set_name(&b->dev, "%04x:%02x", pci_domain_nr(b), bus);
    error = device_register(&b->dev);
    if (error)
        goto class_dev_reg_err;

    pcibios_add_bus(b);

    /* Create legacy_io and legacy_mem files for this bus */
    pci_create_legacy_files(b);

    if (parent)
        dev_info(parent, "PCI host bridge to bus %s\n", dev_name(&b->dev));
    else
        printk(KERN_INFO "PCI host bridge to bus %s\n", dev_name(&b->dev));

    /* Add initial resources to the bus */
    list_for_each_entry_safe(window, n, resources, list) {
        /* Move the window from the caller's resources list to this bridge's windows list */
        list_move_tail(&window->list, &bridge->windows);

        res = window->res;
        offset = window->offset;
        /* A bus-number resource becomes busn_res; anything else is added to the bus */
        if (res->flags & IORESOURCE_BUS)
            pci_bus_insert_busn_res(b, bus, res->end);
        else
            pci_bus_add_resource(b, res, 0);
        /* If the window has a CPU-to-bus address offset, also print the bus-side range */
        if (offset) {
            if (resource_type(res) == IORESOURCE_IO)
                fmt = " (bus address [%#06llx-%#06llx])";
            else
                fmt = " (bus address [%#010llx-%#010llx])";
            snprintf(bus_addr, sizeof(bus_addr), fmt,
                 (unsigned long long) (res->start - offset),
                 (unsigned long long) (res->end - offset));
        } else
            bus_addr[0] = '\0';
        dev_info(&b->dev, "root bus resource %pR%s\n", res, bus_addr);
    }

    down_write(&pci_bus_sem);
    /* Add to the global list of root buses */
    list_add_tail(&b->node, &pci_root_buses);
    up_write(&pci_bus_sem);

    return b;

class_dev_reg_err:
    put_device(&bridge->dev);
    device_unregister(&bridge->dev);
err_out:
    kfree(b);
    return NULL;
}

 

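This function is noticeably larger than the previous ones. No surprise there: the last stage is usually the complicated one. Ha! It calls pci_alloc_bus to allocate a pci_bus structure and then does some basic initialization. One line worth noting is: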

b->number = b->busn_res.start = bus;

 

The bus-number resource was allocated in advance, and a bus's number is simply the start of its bus-number range.

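Next, pci_find_bus checks whether a bus structure with this number already exists in the domain. If it does, something is wrong (or, as the code comment says, the bus was already reached through a different bridge) and the function bails out. Normally it does not exist.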

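Then pci_alloc_host_bridge allocates a pci_host_bridge structure to act as the host bridge, and the links between the host bridge and the bus are set up. Because a bridge is also a device, it has to be registered with device_register, and so does the bus itself.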
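The two dev_set_name calls in pci_create_root_bus fix the names under which the host bridge and the root bus appear in the driver model. A tiny sketch that reproduces just those two format strings. The helper names `host_bridge_name()` and `bus_dev_name()` are hypothetical; only the formats come from the code.

```c
#include <stdio.h>
#include <string.h>
#include <assert.h>

/* Host bridge device: dev_set_name(&bridge->dev, "pci%04x:%02x", ...) */
static void host_bridge_name(char *buf, size_t len, unsigned domain, unsigned bus)
{
    snprintf(buf, len, "pci%04x:%02x", domain, bus);
}

/* Root bus device: dev_set_name(&b->dev, "%04x:%02x", ...) */
static void bus_dev_name(char *buf, size_t len, unsigned domain, unsigned bus)
{
    snprintf(buf, len, "%04x:%02x", domain, bus);
}
```

For domain 0, bus 0 this gives "pci0000:00" for the host bridge and "0000:00" for the bus, which is why the host bridge typically shows up as /sys/devices/pci0000:00 with the bus beneath it.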

So up to this point the code is verbose but not hard to follow.

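Next the bus needs its resources. Earlier we only initialized the resources; we did not associate them with any bus. It's important to keep that distinction clear. Look at the list_for_each_entry_safe loop: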

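It takes each window off the resources list and appends it to the windows list of the host bridge just created. That effectively hands the resources to the host bridge; recall the bridge windows discussed earlier and this makes sense. Each call deals with just one host bridge, which receives every window in the list passed in, though most systems only have one host bridge anyway. Then each resource is associated with the bus. A bus-number resource (IORESOURCE_BUS) becomes the bus's busn_res through pci_bus_insert_busn_res, and every other resource is attached with pci_bus_add_resource. With that, the bus has its resources.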
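The offset printed in that loop is the difference between the CPU-side (resource) address of a window and the address the same window has on the PCI bus, so bus address = res->start - offset. A minimal sketch of that arithmetic follows. `struct win` and `window_bus_range()` are hypothetical simplifications of pci_host_bridge_window, not kernel API.

```c
#include <stdint.h>
#include <assert.h>

struct win {               /* hypothetical, simplified window */
    uint64_t start, end;   /* CPU-side (resource) addresses */
    uint64_t offset;       /* CPU address minus bus address */
};

/* Bus-side range of a window, as printed in " (bus address [...])". */
static void window_bus_range(const struct win *w, uint64_t *bus_start,
                             uint64_t *bus_end)
{
    assert(w->offset <= w->start && w->start <= w->end);
    *bus_start = w->start - w->offset;
    *bus_end = w->end - w->offset;
}
```

A window at CPU addresses [0x80000000, 0x8fffffff] with offset 0x80000000 is seen as [0x0, 0x0fffffff] on the bus. On x86 the offset is usually 0, so the "bus address" suffix is normally left empty.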

Finally, list_add_tail adds the bus to the global list of root buses, pci_root_buses.

Now for the second function, pci_scan_child_bus. This is where the recursive walk of the bus hierarchy happens.

unsigned int pci_scan_child_bus(struct pci_bus *bus)
{
    unsigned int devfn, pass, max = bus->busn_res.start;
    struct pci_dev *dev;

    dev_dbg(&bus->dev, "scanning bus\n");

    /* Go find them, Rover! Scan every slot on this bus: a bus has 32
     * device numbers and each device has up to 8 functions, so devfn
     * advances by 8 (one slot) per iteration. */
    for (devfn = 0; devfn < 0x100; devfn += 8)
        /* Probe one slot (up to eight functions). Every device found is
         * added to the bus's device list, which is walked again below. */
        pci_scan_slot(bus, devfn);

    /* Reserve buses for SR-IOV capability: add the number of bus
     * numbers reserved for virtual functions. */
    max += pci_iov_bus_range(bus);

    /*
     * After performing arch-dependent fixup of the bus, look behind
     * all PCI-to-PCI bridges on this bus.
     */
    if (!bus->is_added) {
        dev_dbg(&bus->dev, "fixups for bus\n");
        pcibios_fixup_bus(bus);
        bus->is_added = 1;
    }
    /* Two passes over the bridges: pass 0 handles bridges already
     * configured by firmware, pass 1 assigns bus numbers to the rest. */
    for (pass=0; pass < 2; pass++)
        list_for_each_entry(dev, &bus->devices, bus_list) {
            if (dev->hdr_type == PCI_HEADER_TYPE_BRIDGE ||
                dev->hdr_type == PCI_HEADER_TYPE_CARDBUS)
                /* Scan behind the PCI-to-PCI or CardBus bridge */
                max = pci_scan_bridge(bus, dev, max, pass);
        }

    /*
     * We've scanned the bus and so we know all about what's on
     * the other side of any bridges that may be on this bus plus
     * any devices.
     *
     * Return how far we've got finding sub-buses.
     */
    dev_dbg(&bus->dev, "bus scan returning with max=%02x\n", max);
    return max;
}

 

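The work here isn't hard to follow either. First note the max variable. It starts as this bus's own number and tracks the highest bus number discovered so far below this bus; it will matter later.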

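A bus has 32 slots (device numbers), and each slot can hold up to eight functions, i.e. logical devices, so the loop advances devfn by 8. Each iteration calls pci_scan_slot to probe one slot.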
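The devfn value packs the slot and function into one byte: the upper 5 bits are the device (slot) number and the lower 3 bits the function number. The macros below use the same encoding as the kernel's PCI_DEVFN, PCI_SLOT and PCI_FUNC. `count_slots_visited()` is a hypothetical helper that simply replays the loop in pci_scan_child_bus.

```c
#include <assert.h>

/* devfn = slot (5 bits) << 3 | function (3 bits) */
#define PCI_DEVFN(slot, func) ((((slot) & 0x1f) << 3) | ((func) & 0x07))
#define PCI_SLOT(devfn)       (((devfn) >> 3) & 0x1f)
#define PCI_FUNC(devfn)       ((devfn) & 0x07)

/* Replays "for (devfn = 0; devfn < 0x100; devfn += 8)": every
 * iteration lands on function 0 of the next slot. */
static int count_slots_visited(void)
{
    int n = 0;
    for (unsigned devfn = 0; devfn < 0x100; devfn += 8) {
        assert(PCI_FUNC(devfn) == 0);
        assert(PCI_SLOT(devfn) == (unsigned)n);
        n++;
    }
    return n;
}
```

The loop therefore visits exactly 32 slots, and pci_scan_slot reaches the other functions of a slot as devfn + fn.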

int pci_scan_slot(struct pci_bus *bus, int devfn)
{
    unsigned fn, nr = 0;
    struct pci_dev *dev;

    if (only_one_child(bus) && (devfn > 0))
        return 0; /* Already scanned the entire slot */
    /* Probe function 0 first */
    dev = pci_scan_single_device(bus, devfn);
    if (!dev)
        return 0;
    if (!dev->is_added)
        nr++;
    /* Then probe the remaining functions, starting at fn = 1 */
    for (fn = next_fn(bus, dev, 0); fn > 0; fn = next_fn(bus, dev, fn)) {
        dev = pci_scan_single_device(bus, devfn + fn);
        if (dev) {
            /* Count only devices not yet added to the system */
            if (!dev->is_added)
                nr++;
            /* Finding another function means this is a multifunction device */
            dev->multifunction = 1;
        }
    }

    /* only one slot has pcie device */
    if (bus->self && nr)
        pcie_aspm_init_link_state(bus->self);

    return nr;
}

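It starts with a check. If the bus can only have one child device and devfn > 0, there cannot be a device in this slot, so the function returns 0 right away. only_one_child() is true when the bus sits below a PCIe Root Port or Downstream Port, whose link carries only device 0. Otherwise pci_scan_single_device is called for function 0; if function 0 is absent, the slot is treated as empty. After that the remaining functions of the slot are probed the same way. pci_scan_single_device checks whether a device with that devfn is already on the bus's device list, and only if it is not does it go on to call pci_scan_device to probe it.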
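The slot logic can be modelled in user space to see which functions get probed. This is a simplified sketch, not kernel code. `struct slot_model`, `next_trad_fn()` and `scan_slot_model()` are hypothetical, and ARI is ignored. The rule is modelled on the non-ARI behaviour described above: a single-function device (Header Type bit 7 clear on function 0) stops after function 0, while a multifunction device has functions 1 to 7 probed even if some of them are missing.

```c
#include <stdbool.h>
#include <stdint.h>
#include <assert.h>

/* Hypothetical model of one slot: which of its 8 functions respond,
 * and the Header Type byte of function 0 (bit 7 = multifunction). */
struct slot_model {
    bool present[8];
    uint8_t hdr_type_fn0;
};

/* Next function to try; 0 means "stop". */
static unsigned next_trad_fn(bool multifunction, unsigned fn)
{
    if (!multifunction)
        return 0;
    return (fn + 1) % 8;   /* after function 7 this wraps to 0 */
}

/* Simplified pci_scan_slot(): returns the number of functions found. */
static unsigned scan_slot_model(const struct slot_model *s,
                                bool only_one_child, unsigned devfn)
{
    unsigned nr = 0;

    if (only_one_child && devfn > 0)
        return 0;          /* a PCIe link only carries device 0 */
    if (!s->present[0])
        return 0;          /* no function 0: slot treated as empty */
    nr++;

    bool multi = (s->hdr_type_fn0 & 0x80) != 0;
    for (unsigned fn = next_trad_fn(multi, 0); fn > 0; fn = next_trad_fn(multi, fn))
        if (s->present[fn])
            nr++;
    return nr;
}
```

A slot whose functions 0 and 2 respond yields 2 when function 0 reports itself as multifunction, but only 1 when it does not, because function 2 is then never probed.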

 

static struct pci_dev *pci_scan_device(struct pci_bus *bus, int devfn)
{
    struct pci_dev *dev;
    u32 l;
    /* Read the vendor/device ID dword; give up if nothing responds
     * (a Configuration Request Retry is retried for up to 60*1000 ms) */
    if (!pci_bus_read_dev_vendor_id(bus, devfn, &l, 60*1000))
        return NULL;
    /* Allocate a pci_dev structure */
    dev = pci_alloc_dev(bus);
    if (!dev)
        return NULL;

    dev->devfn = devfn;
    dev->vendor = l & 0xffff;
    dev->device = (l >> 16) & 0xffff;

    pci_set_of_node(dev);
    /* Initialize the device from its config space header */
    if (pci_setup_device(dev)) {
        pci_bus_put(dev->bus);
        kfree(dev);
        return NULL;
    }

    return dev;
}

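This is where the real work happens. A device structure is created and filled in with basic information such as the devfn, vendor ID and device ID. Then pci_setup_device initializes the device fully; an important part of that is reading its header type and its address-space (BAR) information. I'll skip those details for now and come back to them later. Afterwards, pci_scan_single_device calls pci_device_add to register the device with the system, which mainly links the device to its bus.

Back in pci_scan_child_bus: once the slot loop is done, every logical device on the current bus has been enumerated. Each existing device has a pci_dev structure and sits on the bus's device list. The code then checks these devices one by one, looking for PCI-to-PCI bridges, i.e. devices with a type 1 header (or CardBus bridges, type 2). Each bridge found is handed to pci_scan_bridge: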
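The first config-space dword of a function holds the vendor ID in its low 16 bits and the device ID in its high 16 bits. The low 7 bits of the Header Type byte (config offset 0x0e) say whether the function is a normal device (0), a PCI-to-PCI bridge (1) or a CardBus bridge (2). Below is a sketch of both decodes. `decode_id()` and `is_bridge()` are hypothetical helpers. The rejected ID patterns are modelled on the "empty slot" check in pci_bus_read_dev_vendor_id of this kernel generation, and the CRS retry loop is left out.

```c
#include <stdbool.h>
#include <stdint.h>
#include <assert.h>

/* Header Type values, matching PCI_HEADER_TYPE_NORMAL/BRIDGE/CARDBUS. */
enum { HDR_NORMAL = 0, HDR_BRIDGE = 1, HDR_CARDBUS = 2 };

/* Decode the dword read from config offset 0 (PCI_VENDOR_ID).
 * Empty slots usually read back as all ones; a few other patterns
 * from broken boards are rejected as well. */
static bool decode_id(uint32_t l, uint16_t *vendor, uint16_t *device)
{
    if (l == 0xffffffffu || l == 0x00000000u ||
        l == 0x0000ffffu || l == 0xffff0000u)
        return false;                     /* no device at this devfn */
    *vendor = (uint16_t)(l & 0xffff);
    *device = (uint16_t)((l >> 16) & 0xffff);
    return true;
}

/* Would pci_scan_child_bus hand this function to pci_scan_bridge? */
static bool is_bridge(uint8_t hdr_type_byte)
{
    uint8_t t = hdr_type_byte & 0x7f;     /* bit 7 is the multifunction flag */
    return t == HDR_BRIDGE || t == HDR_CARDBUS;
}
```

For instance 0x12378086 decodes to vendor 0x8086, device 0x1237, and a Header Type byte of 0x81 is a bridge that is also part of a multifunction device.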

int pci_scan_bridge(struct pci_bus *bus, struct pci_dev *dev, int max, int pass)
{
    struct pci_bus *child;
    int is_cardbus = (dev->hdr_type == PCI_HEADER_TYPE_CARDBUS);
    u32 buses, i, j = 0;
    u16 bctl;
    u8 primary, secondary, subordinate;
    int broken = 0;
    /* First read the bus-number registers from the bridge's config space */
    pci_read_config_dword(dev, PCI_PRIMARY_BUS, &buses);
    primary = buses & 0xFF;             // primary: bus on the upstream side
    secondary = (buses >> 8) & 0xFF;    // secondary: bus directly below the bridge
    subordinate = (buses >> 16) & 0xFF; // subordinate: highest bus number behind the bridge

    dev_dbg(&dev->dev, "scanning [bus %02x-%02x] behind bridge, pass %d\n",
        secondary, subordinate, pass);
    /* primary reads 0 although the parent bus is not bus 0,
     * while secondary and subordinate are both set */
    if (!primary && (primary != bus->number) && secondary && subordinate) {
        /* Some bridges hard-wire Primary Bus to 0. That is harmless on
         * bus 0, but below any other bus the value must be corrected. */
        dev_warn(&dev->dev, "Primary bus is hard wired to 0\n");
        /* Use the parent bus number instead */
        primary = bus->number;
    }

    /* Check if setup is sensible at all */
    if (!pass &&
        (primary != bus->number || secondary <= bus->number ||
         secondary > subordinate)) {
        dev_info(&dev->dev, "bridge configuration invalid ([bus %02x-%02x]), reconfiguring\n",
             secondary, subordinate);
        broken = 1;
    }

    /* Disable MasterAbortMode during probing to avoid reporting
       of bus errors (in some architectures) */
    pci_read_config_word(dev, PCI_BRIDGE_CONTROL, &bctl);
    pci_write_config_word(dev, PCI_BRIDGE_CONTROL,
                  bctl & ~PCI_BRIDGE_CTL_MASTER_ABORT);

    if ((secondary || subordinate) && !pcibios_assign_all_busses() &&
        !is_cardbus && !broken) {
        unsigned int cmax;
        /*
         * Bus already configured by firmware, process it in the first
         * pass and just note the configuration.
         */
        if (pass)
            goto out;

        /*
         * If we already got to this bus through a different bridge,
         * don't re-add it. This can happen with the i450NX chipset.
         *
         * However, we continue to descend down the hierarchy and
         * scan remaining child buses.
         */
        /* Look up (or create) the child bus structure */
        child = pci_find_bus(pci_domain_nr(bus), secondary);
        if (!child) {
            child = pci_add_new_bus(bus, dev, secondary);
            if (!child)
                goto out;
            /* Record the child bus's primary (upstream) bus number */
            child->primary = primary;
            /* Give the child bus its bus-number range [secondary, subordinate] */
            pci_bus_insert_busn_res(child, secondary, subordinate);
            child->bridge_ctl = bctl;
        }
        /* Recurse into the child bus */
        cmax = pci_scan_child_bus(child);
        if (cmax > max)
            max = cmax;
        if (child->busn_res.end > max)
            max = child->busn_res.end;
    } else {
        /*
         * We need to assign a number to this bus which we always
         * do in the second pass.
         */
        if (!pass) {
            if (pcibios_assign_all_busses() || broken)
                /* Temporarily disable forwarding of the
                   configuration cycles on all bridges in
                   this bus segment to avoid possible
                   conflicts in the second pass between two
                   bridges programmed with overlapping
                   bus ranges. */
                pci_write_config_dword(dev, PCI_PRIMARY_BUS,
                               buses & ~0xffffff);
            goto out;
        }

        /* Clear errors */
        pci_write_config_word(dev, PCI_STATUS, 0xffff);

        /* Prevent assigning a bus number that already exists.
         * This can happen when a bridge is hot-plugged, so in
         * this case we only re-scan this bus. */
        child = pci_find_bus(pci_domain_nr(bus), max+1);
        if (!child) {
            child = pci_add_new_bus(bus, dev, ++max);
            if (!child)
                goto out;
            pci_bus_insert_busn_res(child, max, 0xff);
        }
        buses = (buses & 0xff000000)
              | ((unsigned int)(child->primary)     <<  0)
              | ((unsigned int)(child->busn_res.start)   <<  8)
              | ((unsigned int)(child->busn_res.end) << 16);

        /*
         * yenta.c forces a secondary latency timer of 176.
         * Copy that behaviour here.
         */
        if (is_cardbus) {
            buses &= ~0xff000000;
            buses |= CARDBUS_LATENCY_TIMER << 24;
        }

        /*
         * We need to blast all three values with a single write.
         */
        pci_write_config_dword(dev, PCI_PRIMARY_BUS, buses);

        if (!is_cardbus) {
            child->bridge_ctl = bctl;
            /*
             * Adjust subordinate busnr in parent buses.
             * We do this before scanning for children because
             * some devices may not be detected if the bios
             * was lazy.
             */
            /* Extend the subordinate number of the parent buses */
            pci_fixup_parent_subordinate_busnr(child, max);
            /* Now we can scan all subordinate buses... */
            max = pci_scan_child_bus(child);
            /*
             * now fix it up again since we have found
             * the real value of max.
             */
            pci_fixup_parent_subordinate_busnr(child, max);
        } else {
            /*
             * For CardBus bridges, we leave 4 bus numbers
             * as cards with a PCI-to-PCI bridge can be
             * inserted later.
             */
            for (i=0; i<CARDBUS_RESERVE_BUSNR; i++) {
                struct pci_bus *parent = bus;
                if (pci_find_bus(pci_domain_nr(bus),
                            max+i+1))
                    break;
                while (parent->parent) {
                    if ((!pcibios_assign_all_busses()) &&
                        (parent->busn_res.end > max) &&
                        (parent->busn_res.end <= max+i)) {
                        j = 1;
                    }
                    parent = parent->parent;
                }
                if (j) {
                    /*
                     * Often, there are two cardbus bridges
                     * -- try to leave one valid bus number
                     * for each one.
                     */
                    i /= 2;
                    break;
                }
            }
            max += i;
            pci_fixup_parent_subordinate_busnr(child, max);
        }
        /*
         * Set the subordinate bus number to its real value.
         */
        pci_bus_update_busn_res_end(child, max);
        pci_write_config_byte(dev, PCI_SUBORDINATE_BUS, max);
    }

    sprintf(child->name,
        (is_cardbus ? "PCI CardBus %04x:%02x" : "PCI Bus %04x:%02x"),
        pci_domain_nr(bus), child->number);

    /* Has only triggered on CardBus, fixup is in yenta_socket */
    while (bus->parent) {
        if ((child->busn_res.end > bus->busn_res.end) ||
            (child->number > bus->busn_res.end) ||
            (child->number < bus->number) ||
            (child->busn_res.end < bus->number)) {
            dev_info(&child->dev, "%pR %s "
                "hidden behind%s bridge %s %pR\n",
                &child->busn_res,
                (bus->number > child->busn_res.end &&
                 bus->busn_res.end < child->number) ?
                    "wholly" : "partially",
                bus->self->transparent ? " transparent" : "",
                dev_name(&bus->dev),
                &bus->busn_res);
        }
        bus = bus->parent;
    }

out:
    pci_write_config_word(dev, PCI_BRIDGE_CONTROL, bctl);

    return max;
}

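This function walks all buses and devices recursively. pci_scan_child_bus calls it twice for every bridge. In the first pass (pass 0) it handles bridges that the BIOS/firmware has already configured and recurses into them. In the second pass (pass 1) it assigns bus numbers to bridges that were not configured, or whose configuration is invalid, and then scans behind them.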

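It begins by reading the bridge's config space to get its primary, secondary and subordinate bus numbers. Some bridges hard-wire the Primary Bus register to 0. So if primary reads 0 even though the bus the bridge sits on is not bus 0, while secondary and subordinate are both nonzero, the code assumes this hard-wiring and replaces primary with the parent bus number.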
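The three bus numbers live in one dword at PCI_PRIMARY_BUS (offset 0x18 of a type 1 header): byte 0 is primary, byte 1 secondary, byte 2 subordinate, and byte 3 the secondary latency timer. Below is a sketch of the decode, of the packing pci_scan_bridge uses when it writes the dword back (keeping byte 3), and of the pass-0 sanity checks. `struct bridge_buses` and the three helpers are hypothetical names for illustration only.

```c
#include <stdbool.h>
#include <stdint.h>
#include <assert.h>

struct bridge_buses {
    uint8_t primary, secondary, subordinate;
};

static struct bridge_buses decode_buses(uint32_t buses)
{
    struct bridge_buses b = {
        .primary     = buses & 0xff,
        .secondary   = (buses >> 8) & 0xff,
        .subordinate = (buses >> 16) & 0xff,
    };
    return b;
}

/* Same packing as pci_scan_bridge: keep the latency timer in byte 3. */
static uint32_t encode_buses(uint32_t old, struct bridge_buses b)
{
    return (old & 0xff000000u)
         | ((uint32_t)b.primary << 0)
         | ((uint32_t)b.secondary << 8)
         | ((uint32_t)b.subordinate << 16);
}

/* Pass-0 checks for a bridge sitting on bus parent_bus: fix a
 * hard-wired primary of 0, then report whether the setup is broken. */
static bool bridge_config_broken(struct bridge_buses *b, uint8_t parent_bus)
{
    if (!b->primary && b->primary != parent_bus && b->secondary && b->subordinate)
        b->primary = parent_bus;
    return b->primary != parent_bus || b->secondary <= parent_bus ||
           b->secondary > b->subordinate;
}
```

For example 0x40030100 on bus 0 decodes to primary 0, secondary 1, subordinate 3 with latency timer 0x40, which is a valid setup. An all-zero register is reported as broken, and such a bridge gets new numbers in the second pass.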

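Next it checks whether these numbers make sense. Primary must equal the parent bus number, and secondary must be greater than the parent bus number and no greater than subordinate; otherwise the configuration is marked broken. If the configuration is valid (and the kernel is not reassigning all bus numbers, and this is not a CardBus bridge), the child bus is scanned during the first pass of pci_scan_bridge. As before, pci_find_bus first checks whether a bus numbered secondary already exists. Only if it does not does pci_add_new_bus create a new bus structure, initializing some of its fields along the way. Next the child's primary bus number is recorded, and then the child is given its bus-number resource. Under the existing configuration, secondary is the child bus's own number and subordinate is the highest bus number behind it, so [secondary, subordinate] is exactly the child's bus-number range. pci_scan_child_bus is then called on the child bus, and so on, layer by layer, until no bridges remain. Each level returns max, the highest bus number found.

If the bridge was not configured (or its configuration is broken), it has to be configured from scratch. In the first pass pci_scan_bridge does nothing more than clear the bus-number registers in the bridge's config space, and only when all buses are being reassigned or the configuration is broken. This keeps bridges with overlapping ranges from forwarding configuration cycles during the second pass. The second pass is broadly similar to the case above, except that there is no secondary number to use. So the child bus is looked up or created as bus max+1, and the end of its bus-number range is provisionally set to 0xff (255, the maximum). The primary, secondary and subordinate numbers are then written into the bridge's config space with a single dword write. Since a new bus has just been found, the subordinate numbers of the parent buses are extended (pci_fixup_parent_subordinate_busnr), and pci_scan_child_bus scans the buses below the new child. When the recursion returns, the parents are fixed up again. Finally the child's bus-number resource end and the bridge's Subordinate Bus register are set to the real value max, replacing the provisional 255.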
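The two-pass, depth-first numbering can be reproduced with a toy topology. This is a simplified user-space model under clear assumptions. `struct model_bus`, `struct model_bridge`, `scan_bridge()` and `scan_child_bus()` are hypothetical, and the model leaves out pci_find_bus, the parent fix-ups, CardBus reservation, the SR-IOV range and the broken-configuration path. It keeps the core rules: firmware-configured bridges are handled in pass 0 using their own numbers, unconfigured ones get max+1 in pass 1 with a provisional end of 0xff, and the end is set to the real max once the recursion returns.

```c
#include <stddef.h>
#include <assert.h>

#define MAX_BRIDGES 4

struct model_bus;

struct model_bridge {          /* hypothetical PCI-to-PCI bridge */
    int configured;            /* firmware already set the bus numbers */
    unsigned secondary;        /* register values */
    unsigned subordinate;
    struct model_bus *child;   /* bus behind the bridge */
};

struct model_bus {             /* hypothetical bus */
    unsigned number, end;      /* like busn_res.start / busn_res.end */
    size_t nbridges;
    struct model_bridge *bridges[MAX_BRIDGES];
};

static unsigned scan_child_bus(struct model_bus *bus);

/* Simplified pci_scan_bridge(): returns the highest bus number seen. */
static unsigned scan_bridge(struct model_bridge *br, unsigned max, int pass)
{
    struct model_bus *child = br->child;

    if (br->configured) {
        if (pass)                  /* firmware setup: pass 0 only */
            return max;
        child->number = br->secondary;
        child->end = br->subordinate;
        unsigned cmax = scan_child_bus(child);
        if (cmax > max)
            max = cmax;
        if (child->end > max)
            max = child->end;
        return max;
    }

    if (!pass)                     /* new numbers only in pass 1 */
        return max;
    child->number = ++max;         /* next free bus number */
    child->end = 0xff;             /* provisional end */
    max = scan_child_bus(child);   /* recurse */
    child->end = max;              /* real subordinate number */
    br->secondary = child->number; /* "write back" to the bridge */
    br->subordinate = max;
    return max;
}

/* Simplified pci_scan_child_bus(): two passes over the bridges. */
static unsigned scan_child_bus(struct model_bus *bus)
{
    unsigned max = bus->number;
    for (int pass = 0; pass < 2; pass++)
        for (size_t i = 0; i < bus->nbridges; i++)
            max = scan_bridge(bus->bridges[i], max, pass);
    return max;
}
```

Take bus 0 with an unconfigured bridge A (which has another unconfigured bridge C behind it) followed by a bridge B that firmware already set to [5-6]. Pass 0 records B's range and raises max to 6. Pass 1 then numbers A's bus 7 and C's bus 8, so the scan returns 8. Bridges set up by firmware keep their numbers, and new numbers are handed out after them.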

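Once the whole recursion finishes, we know how many buses exist, and the devices on every bus have been configured and added to their bus's device list.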

 

Summary:

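This analysis was hard going. For many experts it may be no big deal, but my everyday research never touched the PCI device layer; I only started digging into PCI devices in order to analyze virtIO in QEMU. There are probably mistakes here, and I would be grateful to anyone who points them out. I've also noticed that when I only take notes without sharing them, I gradually get lazy: picking up good material is easy, but I rarely look at it again afterwards. Writing for others is different. Because I worry about getting things wrong, I have to think hard about every fuzzy point, which also strengthens my own foundations. It benefits others and myself alike!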

 

posted @ 2016-10-25 20:51  jack.chen
