hostbridge topo module should be hardened to handle empty busses
Robert Mustacchi reached out to me about an issue seen on DilOS, where the hostbridge topo enumerator module dumped core.
The underlying issue was that the system was enumerating empty PCI busses, which broke assumptions in the hostbridge topo module.
Below is Paul Winder's explanation for why we're enumerating empty busses:
This looks like a problem we came across when testing on different platforms.
From what I recall, when enumerating PCI buses, the kernel tries to match PCI root buses to their ACPI devices using the bus number returned by the _BBN method. After it has done that, it scans all the _BBN values, and any which do not match an already discovered bus are declared as empty buses. On some BIOSes the _BBN values do not match the real PCI bus numbers, and you can end up with ghost/empty PCI buses, which the topology libraries don't like.
We solved this by using the ACPI _CRS (Current Resource Settings) method to get the correct PCI bus number.
E.g., a system with ghost buses:

    $ ls /devices/
    agpgart            pci@31,0           pci@5c,0           pci@91,0
    agpgart:agpgart    pci@31,0:devctl    pci@5c,0:devctl    pci@91,0:devctl
    fw                 pci@31,0:intr      pci@5c,0:intr      pci@91,0:intr
    options            pci@31,0:reg       pci@5c,0:reg       pci@91,0:reg
    pci@0,0            pci@3a,0           pci@61,0           pci@98,0
    pci@0,0:devctl     pci@3a,0:devctl    pci@61,0:devctl    pci@98,0:devctl
    pci@0,0:intr       pci@3a,0:intr      pci@61,0:intr      pci@98,0:intr
    pci@0,0:reg        pci@3a,0:reg       pci@61,0:reg       pci@98,0:reg
    pci@14,0           pci@3c,0           pci@70,0           pseudo
    pci@14,0:devctl    pci@3c,0:devctl    pci@70,0:devctl    pseudo:devctl
    pci@14,0:intr      pci@3c,0:intr      pci@70,0:intr      scsi_vhci
    pci@14,0:reg       pci@3c,0:reg       pci@70,0:reg       scsi_vhci:devctl
And the same system when using the _CRS method:

    # ls /devices/
    agpgart            pci@14,0:intr      pci@3c,0           pci@61,0:intr
    agpgart:agpgart    pci@14,0:reg       pci@3c,0:devctl    pci@61,0:reg
    fw                 pci@31,0           pci@3c,0:intr      pci@70,0
    options            pci@31,0:devctl    pci@3c,0:reg       pci@70,0:devctl
    pci@0,0            pci@31,0:intr      pci@5c,0           pci@70,0:intr
    pci@0,0:devctl     pci@31,0:reg       pci@5c,0:devctl    pci@70,0:reg
    pci@0,0:intr       pci@3a,0           pci@5c,0:intr      pseudo
    pci@0,0:reg        pci@3a,0:devctl    pci@5c,0:reg       pseudo:devctl
    pci@14,0           pci@3a,0:intr      pci@61,0           scsi_vhci
    pci@14,0:devctl    pci@3a,0:reg       pci@61,0:devctl    scsi_vhci:devctl
The code has been tested on a few platforms which don't present the problem, and on the platform where we found the problem.
Integrating Paul's fix for the above is tracked by #11860.
Regardless of the underlying issue described above, the hostbridge topo module should be hardened such that it doesn't fall over in this situation. This is what we are tracking with this issue.
Updated by Rob Johnston about 1 year ago
Below are links to a pair of core files and prtconf output from the affected system:
Updated by Robert Mustacchi 8 months ago
- Category set to lib - userland libraries
- Assignee changed from Rob Johnston to Robert Mustacchi
- Priority changed from Low to Normal
- % Done changed from 0 to 90
All of the phantom hostbridges that we encountered had no unit address set up by the kernel. As such, checking for a unit address seemed a reasonable way to distinguish phantom hostbridges from real ones.
With this in place, the affected user no longer saw crashes on their systems. This was verified by running prtdiag, fmtopo, etc.
Updated by Electric Monk 8 months ago
- Status changed from New to Closed
- % Done changed from 90 to 100
    commit 6de7dd38d5e1d12791b18f60b9cad431c7d4b419
    Author: Robert Mustacchi <firstname.lastname@example.org>
    Date:   2020-03-07T09:16:20.000Z

        11861 hostbridge topo module should be hardened to handle empty busses
        Reviewed by: Yuri Pankov <email@example.com>
        Reviewed by: Toomas Soome <firstname.lastname@example.org>
        Reviewed by: Paul Winder <email@example.com>
        Approved by: Joshua M. Clulow <firstname.lastname@example.org>