Use _CRS instead of _BBN to get assigned PCI bus
Robert Mustacchi reached out to me re an issue seen on Dilos, where the hostbridge topo enumerator module dumped core:
fffffc7fffdfe300 libc.so.1`_lwp_kill+0xa()
fffffc7fffdfe330 libc.so.1`raise+0x1e(6)
fffffc7fffdfe380 libc.so.1`abort+0x88()
fffffc7fffdfe5f0 0xfffffc7fef21c249()
fffffc7fffdfe640 hostbridge.so`dev_for_hostbridge+0xef(5d7280, 588a88)
fffffc7fffdfe6d0 hostbridge.so`DEVprop_set+0x13b(664dd0, 664e70, 0, fffffc7feb91c9da, fffffc7feb91c813)
fffffc7fffdfe750 hostbridge.so`did_props_set+0x84(664dd0, 664e70, fffffc7feb92dc60, 7)
fffffc7fffdfe7b0 hostbridge.so`pcihostbridge_declare+0x87(5d7280, 5853e0, 5a8a08, 2)
fffffc7fffdfe820 hostbridge.so`hb_process+0x69(5d7280, 5853e0, 2, 5a8a08)
fffffc7fffdfe880 hostbridge.so`pci_hostbridges_find+0x6c(5d7280, 5853e0)
fffffc7fffdfe8c0 hostbridge.so`platform_hb_enum+0x13(5d7280, 5853e0, 5d27e0, 0, fe)
fffffc7fffdfe940 hostbridge.so`hb_enum+0x90(5d7280, 5853e0, 5d27e0, 0, fe, 0)
fffffc7fffdfe9d0 libtopo.so.1`topo_mod_enumerate+0xc8(5d7280, 5853e0, 450ff0, 5d27e0, 0, fe)
fffffc7fffdfea20 libtopo.so.1`enum_run+0x94(585480, 65df60)
The underlying issue was that the system was enumerating empty PCI buses, which broke assumptions made by the hostbridge topo module.
Below is Paul Winder's explanation of why we're enumerating empty buses, along with a description of a possible fix.
This looks like a problem we came across when testing on different platforms. FWIR, when enumerating PCI buses, it tries to match PCI root buses to the ACPI device using the bus from the _BBN method. After it has done that it scans all the _BBN values, and any which do not match an already discovered bus are declared as empty buses. On some BIOSes the _BBN values do *not* match the real PCI bus, and you can end up with ghost/empty PCI buses, which the topology libraries don't like. We solved this by using the ACPI _CRS (current resource settings) to get the correct PCI bus number.

For example, a system with ghost buses:

$ ls /devices/
agpgart           pci@31,0          pci@5c,0          pci@91,0
agpgart:agpgart   pci@31,0:devctl   pci@5c,0:devctl   pci@91,0:devctl
fw                pci@31,0:intr     pci@5c,0:intr     pci@91,0:intr
options           pci@31,0:reg      pci@5c,0:reg      pci@91,0:reg
pci@0,0           pci@3a,0          pci@61,0          pci@98,0
pci@0,0:devctl    pci@3a,0:devctl   pci@61,0:devctl   pci@98,0:devctl
pci@0,0:intr      pci@3a,0:intr     pci@61,0:intr     pci@98,0:intr
pci@0,0:reg       pci@3a,0:reg      pci@61,0:reg      pci@98,0:reg
pci@14,0          pci@3c,0          pci@70,0          pseudo
pci@14,0:devctl   pci@3c,0:devctl   pci@70,0:devctl   pseudo:devctl
pci@14,0:intr     pci@3c,0:intr     pci@70,0:intr     scsi_vhci
pci@14,0:reg      pci@3c,0:reg      pci@70,0:reg      scsi_vhci:devctl

And the same system when using the _CRS method:

# ls /devices/
agpgart           pci@14,0:intr     pci@3c,0          pci@61,0:intr
agpgart:agpgart   pci@14,0:reg      pci@3c,0:devctl   pci@61,0:reg
fw                pci@31,0          pci@3c,0:intr     pci@70,0
options           pci@31,0:devctl   pci@3c,0:reg      pci@70,0:devctl
pci@0,0           pci@31,0:intr     pci@5c,0          pci@70,0:intr
pci@0,0:devctl    pci@31,0:reg      pci@5c,0:devctl   pci@70,0:reg
pci@0,0:intr      pci@3a,0          pci@5c,0:intr     pseudo
pci@0,0:reg       pci@3a,0:devctl   pci@5c,0:reg      pseudo:devctl
pci@14,0          pci@3a,0:intr     pci@61,0          scsi_vhci
pci@14,0:devctl   pci@3a,0:reg      pci@61,0:devctl   scsi_vhci:devctl

The code has been tested on a few platforms which *don't* present the problem, and on the platform where we found the problem.
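To illustrate what "using _CRS to get the bus number" can mean in practice, here is a minimal, self-contained sketch. It is not the attached patch and does not use the ACPICA interfaces the kernel would use. It assumes the caller already has the raw _CRS resource buffer, and it walks the descriptors looking for a Word Address Space descriptor whose resource type is "bus number range". The function name crs_first_bus and the macro names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Descriptor encodings as laid out in the ACPI resource data format. */
#define	CRS_WORD_ADDR_SPACE	0x88	/* large item: Word Address Space */
#define	CRS_END_TAG_MASK	0x78	/* small item name bits (End Tag = 0xF) */
#define	CRS_TYPE_BUS_RANGE	0x02	/* resource type: bus number range */

/*
 * Hypothetical helper: scan a raw _CRS buffer and store the minimum bus
 * number of the first bus-number-range descriptor in *busp.
 * Returns 0 on success, -1 if no such descriptor is found or the buffer
 * is malformed.
 */
int
crs_first_bus(const uint8_t *buf, size_t len, uint16_t *busp)
{
	size_t off = 0;

	while (off < len) {
		uint8_t tag = buf[off];
		size_t hdr, body;

		if (tag & 0x80) {
			/* Large item: tag byte plus 2-byte little-endian length. */
			if (len - off < 3)
				return (-1);
			hdr = 3;
			body = buf[off + 1] | ((size_t)buf[off + 2] << 8);
		} else {
			/* Small item: length lives in the low three tag bits. */
			if ((tag & CRS_END_TAG_MASK) == CRS_END_TAG_MASK)
				return (-1);	/* End Tag: nothing found */
			hdr = 1;
			body = tag & 0x07;
		}
		if (body > len - off - hdr)
			return (-1);	/* descriptor runs past the buffer */

		if (tag == CRS_WORD_ADDR_SPACE && body >= 13 &&
		    buf[off + 3] == CRS_TYPE_BUS_RANGE) {
			/* Range minimum is at bytes 8-9 of the descriptor. */
			*busp = (uint16_t)(buf[off + 8] | (buf[off + 9] << 8));
			return (0);
		}
		off += hdr + body;
	}
	return (-1);
}
```

A caller would compare the returned value with the bus numbers found by scanning the hardware, rather than trusting _BBN, so that a root bridge is only matched against a bus it actually decodes.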
This issue is being filed to track integrating Paul's fix for the above issue into illumos-gate (patch attached).
The work to harden the hostbridge topo module so that it doesn't fall over in this situation is being tracked by illumos#11861.