Project

General

Profile

Bug #11861

hostbridge topo module should be hardened to handle empty busses

Added by Rob Johnston 11 months ago. Updated 6 months ago.

Status:
Closed
Priority:
Normal
Category:
lib - userland libraries
Start date:
Due date:
% Done:

100%

Estimated time:
Difficulty:
Medium
Tags:
Gerrit CR:

Description

Robert Mustacchi reached out to me re an issue seen on Dilos, where the hostbridge topo enumerator module dumped core:

The underlying issue was that the system was enumerating empty pci busses, which was breaking assumptions in the hostbridge topo module.

Below is Paul Winder's explanation for why we're enumerating empty busses:

This looks like a problem we came across when testing on different platforms.

FWIR, when enumerating PCI buses, it tries to match PCI root buses to the ACPI device using the bus from the _BBN method. After it has done that it scans all the _BBN and any which do match an already discovered bus are declared as empty buses. On some BIOS’s the _BBN do not match the real PCI bus, and you can end up with ghost/empty PCI buses - which the topology libraries don’t like.

We solved this by using the ACPI _CRS (current resource) to get the correct PCI bus number.

Eg A system with ghost buses:

$ ls /devices/
agpgart           pci@31,0          pci@5c,0          pci@91,0
agpgart:agpgart   pci@31,0:devctl   pci@5c,0:devctl   pci@91,0:devctl
fw                pci@31,0:intr     pci@5c,0:intr     pci@91,0:intr
options           pci@31,0:reg      pci@5c,0:reg      pci@91,0:reg
pci@0,0           pci@3a,0          pci@61,0          pci@98,0
pci@0,0:devctl    pci@3a,0:devctl   pci@61,0:devctl   pci@98,0:devctl
pci@0,0:intr      pci@3a,0:intr     pci@61,0:intr     pci@98,0:intr
pci@0,0:reg       pci@3a,0:reg      pci@61,0:reg      pci@98,0:reg
pci@14,0          pci@3c,0          pci@70,0          pseudo
pci@14,0:devctl   pci@3c,0:devctl   pci@70,0:devctl   pseudo:devctl
pci@14,0:intr     pci@3c,0:intr     pci@70,0:intr     scsi_vhci
pci@14,0:reg      pci@3c,0:reg      pci@70,0:reg      scsi_vhci:devctl

And the same system when using the _CRS method:

# ls /devices/
agpgart           pci@14,0:intr     pci@3c,0          pci@61,0:intr
agpgart:agpgart   pci@14,0:reg      pci@3c,0:devctl   pci@61,0:reg
fw                pci@31,0          pci@3c,0:intr     pci@70,0
options           pci@31,0:devctl   pci@3c,0:reg      pci@70,0:devctl
pci@0,0           pci@31,0:intr     pci@5c,0          pci@70,0:intr
pci@0,0:devctl    pci@31,0:reg      pci@5c,0:devctl   pci@70,0:reg
pci@0,0:intr      pci@3a,0          pci@5c,0:intr     pseudo
pci@0,0:reg       pci@3a,0:devctl   pci@5c,0:reg      pseudo:devctl
pci@14,0          pci@3a,0:intr     pci@61,0          scsi_vhci
pci@14,0:devctl   pci@3a,0:reg      pci@61,0:devctl   scsi_vhci:devctl

The code has been tested on a few platforms which don’t present the problem, and on the platform we found the problem.

Integrating Paul's fix for above is tracked by #11860

Regardless of the underlying issue described above, the hostbridge topo module should be hardened such that it doesn't fall over in this situation. This is what we are tracking with this issue.


Related issues

Related to illumos gate - Feature #11612: x86 PCI enumeration should not rely on bios max busClosed

Actions

History

#2

Updated by Robert Mustacchi 11 months ago

  • Related to Feature #11612: x86 PCI enumeration should not rely on bios max bus added
#3

Updated by Robert Mustacchi 7 months ago

  • Category set to lib - userland libraries
  • Assignee changed from Rob Johnston to Robert Mustacchi
  • Priority changed from Low to Normal
  • % Done changed from 0 to 90

All of the phantom hostbridges that we encountered had no unit address set up by the kernel. As such, this seemed a reasonable way to try and distinguish between the different cases.

With this in place, the user that was affected by this no longer had crashes with their systems. This was verified by running prtdiag, fmtopo, etc.

#4

Updated by Joshua M. Clulow 7 months ago

  • Description updated (diff)
#5

Updated by Electric Monk 6 months ago

  • Status changed from New to Closed
  • % Done changed from 90 to 100

git commit 6de7dd38d5e1d12791b18f60b9cad431c7d4b419

commit  6de7dd38d5e1d12791b18f60b9cad431c7d4b419
Author: Robert Mustacchi <rm@fingolfin.org>
Date:   2020-03-07T09:16:20.000Z

    11861 hostbridge topo module should be hardened to handle empty busses
    Reviewed by: Yuri Pankov <ypankov@fastmail.com>
    Reviewed by: Toomas Soome <tsoome@me.com>
    Reviewed by: Paul Winder <paul@winders.demon.co.uk>
    Approved by: Joshua M. Clulow <josh@sysmgr.org>

Also available in: Atom PDF