Bug #11867

closed

PCIe expansion slots mis-labelled in topo

Added by Rob Johnston over 2 years ago. Updated over 2 years ago.

Status: Closed
Priority: Normal
Assignee:
Category: -
Start date:
Due date:
% Done: 100%
Estimated time:
Difficulty: Medium
Tags:
Gerrit CR:

Description

I noticed that the labels for PCIe devices in the topo snapshot were incorrect and appeared to be off-by-one on the Joyent-M12G5 in the SF lab.

The system has 7 PCIe expansion slots and has PCIe cards installed in slots 1, 2, 4 and 6. However, in topo these cards are labelled as being in slots 2, 3, 5 and 7. After digging into how topo assigns the labels, I found that it uses logic like the following:

1) Determine the slot id of the device. To do this, it first looks at the slot-names devinfo property, which is derived from the IRQ routing table as defined by IEEE 1275 (see pci_slot_names_prop()), and tries to scrape the slot id off of the encoded string.

If that fails (which isn't the case here), it tries to determine the slot id from the physical-slot# devinfo prop.

2) Once it has a slot id, it cross-references it with the type 0x9 SMBIOS records and then sets the label to the locator string from the SMBIOS record with the same slot id.

This whole mechanism relies on SMBIOS and slot-names agreeing on the slot id numbering. But on this system, the slot-names props number the slots from 1 to 7, which is what we'd expect, as slot id 0 should be reserved for on-board devices. The type 0x9 SMBIOS records, however, use slot ids 0 to 6.

As an example of this mismatch, there is an LSI HBA in the 2nd slot. The string that we scrape the slot id out of in the slot-names property for this device is "Slot2":

        name='slot-names' type=int items=3
            value=00000001.746f6c53.00000032

But that ends up matching to the following SMBIOS record:


ID    SIZE TYPE
15    41   SMB_TYPE_SLOT (type 9) (upgradeable system slot)

  Location Tag: CPU1 SLOT 3 PCI-E 3.0 X8

  Reference Designator: CPU1 SLOT 3 PCI-E 3.0 X8
  Slot ID: 0x2
  Type: 0xb5 (PCI Exp. Gen 3 x8)
  Width: 0xb (8x or x8)
  Usage: 0x3 (available)
  Length: 0x3 (short length)
  Slot Characteristics 1: 0xc
        SMB_SLCH1_33V (provides 3.3V)
        SMB_SLCH1_SHARED (opening shared with other slot)
  Slot Characteristics 2: 0x1
        SMB_SLCH2_PME (slot supports PME# signal)
  Segment Group: 0
  Bus Number: 255
  Device/Function Number: 0
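
To make the mismatch concrete, here is a minimal Python sketch of the lookup described above. It is not the libtopo code. It assumes the slot-names property is a device-number bitmask followed by NUL-terminated names packed into little-endian 32-bit words, and that the slot id is simply the trailing digits of the name. The function names and the 0x1 table entry are hypothetical; only the 0x2 entry is taken from the SMBIOS record above.

```python
import re
import struct


def decode_slot_names(words):
    """Decode a slot-names property given as a list of 32-bit ints.

    Assumed layout: the first word is a bitmask of device numbers and the
    remaining words hold one NUL-terminated name per set bit, packed
    little-endian (as on x86).
    """
    mask = words[0]
    raw = b"".join(struct.pack("<I", w) for w in words[1:])
    names = [n.decode("ascii") for n in raw.split(b"\0") if n]
    devs = [bit for bit in range(32) if mask & (1 << bit)]
    return dict(zip(devs, names))


def scrape_slot_id(name):
    """Pull the trailing number off a slot name (assumed scraping rule)."""
    m = re.search(r"(\d+)$", name)
    return int(m.group(1)) if m else None


# SMBIOS type 9 records keyed by Slot ID.
SMBIOS_SLOTS = {
    0x1: "CPU1 SLOT 2 PCI-E 3.0 X8",  # hypothetical, for illustration only
    0x2: "CPU1 SLOT 3 PCI-E 3.0 X8",  # from the record shown above
}


def label_by_slot_id(slot_id, smbios_slots):
    """Existing behaviour: match purely on the numeric slot id."""
    return smbios_slots.get(slot_id)


names = decode_slot_names([0x00000001, 0x746F6C53, 0x00000032])
slot_id = scrape_slot_id(names[0])
label = label_by_slot_id(slot_id, SMBIOS_SLOTS)
```

Here `names[0]` decodes to "Slot2", the scraped id is 2, and the id-based lookup returns the "CPU1 SLOT 3" locator, which is the off-by-one label seen in topo.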
Actions #1

Updated by Rob Johnston over 2 years ago

The fix will be to add logic similar to prtdiag(1m): on SMBIOS implementations that expose the BDF (bus/device/function) in the Type 9 (Slot) record (SMBIOS 2.6 and later), attempt to match the SMBIOS slot record using the BDF. Systems with older SMBIOS implementations will fall back to the existing logic.

Additionally, there is a one-line bug in pci_slot_label_lookup() that causes us to source the slotname from the parent did_t. Honestly, I'm not sure how any of this ever worked. And indeed almost every system I've checked recently had broken PCIe labels - not just the Joyent-M12G5.
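
The BDF-first matching proposed in this comment could look roughly like the Python sketch below. This is an illustration under stated assumptions, not the committed change: the `SlotRecord` type, the `find_label` helper, the example addresses and the "CPU1 SLOT 2" tag are all hypothetical, and treating bus 0xff as "no valid address" is an assumption based on the empty-slot record in the description. The sketch also assumes that the slot id fallback is used only when no record carries a BDF at all.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple

Bdf = Tuple[int, int, int]  # (bus, device, function)


@dataclass(frozen=True)
class SlotRecord:
    """Hypothetical, simplified view of an SMBIOS type 9 record."""
    slot_id: int
    tag: str
    bdf: Optional[Bdf] = None  # None models SMBIOS older than 2.6


def find_label(records: Sequence[SlotRecord], dev_bdf: Bdf,
               scraped_slot_id: Optional[int]) -> Optional[str]:
    """Prefer a BDF match; use the slot id only if no record has a BDF."""
    if any(rec.bdf is not None for rec in records):
        for rec in records:
            # Assumption: bus 0xff marks a record without a usable address.
            if rec.bdf is not None and rec.bdf[0] != 0xFF and rec.bdf == dev_bdf:
                return rec.tag
        return None
    # Older SMBIOS: existing logic, match on the scraped slot id.
    for rec in records:
        if rec.slot_id == scraped_slot_id:
            return rec.tag
    return None


# Hypothetical data: the HBA sits at bus 0x3b in the physical "SLOT 2".
new_smbios = [
    SlotRecord(0x1, "CPU1 SLOT 2 PCI-E 3.0 X8", (0x3B, 0, 0)),
    SlotRecord(0x2, "CPU1 SLOT 3 PCI-E 3.0 X8", (0xFF, 0, 0)),
]
old_smbios = [
    SlotRecord(0x1, "CPU1 SLOT 2 PCI-E 3.0 X8"),
    SlotRecord(0x2, "CPU1 SLOT 3 PCI-E 3.0 X8"),
]

fixed_label = find_label(new_smbios, (0x3B, 0, 0), 2)
legacy_label = find_label(old_smbios, (0x3B, 0, 0), 2)
```

With the BDF available, the device resolves to the "SLOT 2" record regardless of the scraped slot id; without it, the lookup behaves exactly as before and still depends on slot-names and SMBIOS agreeing on the numbering.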

Actions #2

Updated by Rob Johnston over 2 years ago

Testing

I built a platform image with this change and compared the PCI labels that were generated by libtopo both before and after the change on a variety of platforms. For platforms that were affected by this bug, I verified that the labels were correct after the fix. For the one platform I could find that didn't exhibit this bug, I verified that the fix had no effect.

Below is a list of platforms tested:

PLATFORM                                BEFORE FIX       AFTER FIX
Supermicro SSG-2029P-ACR24L             Affected         Fixed
Supermicro SSG-6049P-E1CR36L            Not Affected     No change
Supermicro SYS-2028U-E1CNRT+            Affected         Fixed
Supermicro SSG-6049P-E1CR60L-JI006      Affected         Fixed
Western Digital H4060-S                 Affected         Fixed
Micro-Star International MS-7b18        Affected         Fixed
Actions #3

Updated by Electric Monk over 2 years ago

  • Status changed from New to Closed
  • % Done changed from 0 to 100

git commit 106e8bd44b02f4b8cd3c825790276c1c7081e67a

commit  106e8bd44b02f4b8cd3c825790276c1c7081e67a
Author: Rob Johnston <rob.johnston@joyent.com>
Date:   2019-10-29T23:40:00.000Z

    11867 PCIe expansion slots mis-labelled in topo
    Reviewed by: Yuri Pankov <yuri.pankov@nexenta.com>
    Approved by: Robert Mustacchi <rm@fingolfin.org>
