Project

General

Profile

Actions

Bug #14193

closed

PCI config space I/O port mechanisms need to check offsets

Added by Robert Mustacchi 7 months ago. Updated 6 months ago.

Status:
Closed
Priority:
Normal
Category:
kernel
Start date:
Due date:
% Done:

100%

Estimated time:
Difficulty:
Medium
Tags:
Gerrit CR:

Description

When booting illumos on an AWS nitro instance, illumos will fail in an infinite loop in pci_xcap_locate which is triggered by pcie_determine_serial(). Ryan Zezeski had done a bunch of analysis here and instrumented things and has a bit on what we saw there:

NOTICE: pci_xcap_locate get32 base: 2054
NOTICE: npe_ctlops entry: path: /pci@0,0, rpath: /pci@0,0/pci1d0f,0@4 ctlop: 26
NOTICE: pci_common_peekpoke /pci@0,0/pci1d0f,0@4 R offset: 2054 size: 4
NOTICE: pci_xcap_locate get32 base: 2054 => 1048775
NOTICE: pci_xcap_locate get32 base: 1
NOTICE: npe_ctlops entry: path: /pci@0,0, rpath: /pci@0,0/pci1d0f,0@4 ctlop: 26
NOTICE: pci_xcap_locate get32 base: 1 => 2153848079
NOTICE: pci_xcap_locate get32 base: 2054
NOTICE: npe_ctlops entry: path: /pci@0,0, rpath: /pci@0,0/pci1d0f,0@4 ctlop: 26
NOTICE: pci_common_peekpoke /pci@0,0/pci1d0f,0@4 R offset: 2054 size: 4
NOTICE: pci_xcap_locate get32 base: 2054 => 1048775
NOTICE: pci_xcap_locate get32 base: 1
NOTICE: npe_ctlops entry: path: /pci@0,0, rpath: /pci@0,0/pci1d0f,0@4 ctlop: 26
NOTICE: pci_xcap_locate get32 base: 1 => 2153848079
NOTICE: pci_xcap_locate get32 base: 2054
NOTICE: npe_ctlops entry: path: /pci@0,0, rpath: /pci@0,0/pci1d0f,0@4 ctlop: 26
NOTICE: pci_common_peekpoke /pci@0,0/pci1d0f,0@4 R offset: 2054 size: 4

With this information I was able to start up a VM and dig into what happened. I set up a bunch of break points and in particular got to the pcie_determine_serial() and then started looking at where we actually were going. In particular the question I had was really what pci mechanism we were hitting. On the system I confirmed a few things that were needed:

  • The PCIe mechanism was set to type 1
  • There was no mcfg table or way to access memory mapped configuration space
  • The devices in question had the PCIe compatibility in normal PCI space (indicating that they had should have access to extended configuration space)

In particular, I basically set a bunch of breakpoints on the various type 1 mechanisms and pci_cfgacc_acc. With that after we were getting the first register, I got the following stack trace:

[0]> $C
fffffe0003d97540 pci_mech1_getw+4()
fffffe0003d97560 npe`pci_config_rd16+0x22(fffffe0375dc86c0, 6)
fffffe0003d975c0 npe`pci_common_ctlops_peek+0x29f(fffffe0003d976e0)
fffffe0003d97600 npe`pci_common_peekpoke+0x25(fffffe03755d6688, fffffe03755d5878, 1a, fffffe0003d976e0, 0)
fffffe0003d97690 npe`npe_ctlops+0x143(fffffe03755d6688, fffffe03755d5878, 1a, fffffe0003d976e0, 0)
fffffe0003d976d0 ddi_ctlops+0x37(fffffe03755d5878, fffffe03755d5878, 1a, fffffe0003d976e0, 0)
fffffe0003d97740 i_ddi_caut_getput_ctlops+0x3e(fffffe0375dc86c0, fffffe0003d9776e, 6, 2, 1, 0, 1a)
fffffe0003d97780 i_ddi_caut_get16+0x2b(fffffe0375dc86c0, 6)
fffffe0003d977b0 pci_config_get16+0x32(fffffe0375dc86c0, 6)
fffffe0003d97800 pci_xcap_locate+0x26(fffffe0375dc86c0, 3, fffffe0003d97816)
fffffe0003d97850 pcie`pcie_determine_serial+0x8d(fffffe03755d5878)
fffffe0003d978b0 pcie`pcie_initchild+0x259(fffffe03755d5878)
fffffe0003d97950 npe`npe_initchild+0x151(fffffe03755d5878)
fffffe0003d979e0 npe`npe_ctlops+0x120(fffffe03755d6688, fffffe03755d6688, 1, fffffe03755d5878, 0)
fffffe0003d97a40 init_node+0x97(fffffe03755d5878)
fffffe0003d97a90 i_ndi_config_node+0x150(fffffe03755d5878, 3, 4004048)
fffffe0003d97ad0 i_ndi_init_hw_children+0x5c(fffffe03755d6688, 4004048)
fffffe0003d97b50 config_immediate_children+0x7d(fffffe03755d6688, 4004048, ffffffff)
fffffe0003d97ba0 devi_config_common+0x69(fffffe03755d6688, 4004048, ffffffff)
fffffe0003d97c00 mt_config_thread+0x13a(fffffe0376006060)
fffffe0003d97c10 thread_start+0xb()

So once we're here, I looked a little closer. But the core here is that there's a basic flaw:

1. These devices are supposed to, by specification, supposed to have extended configuration space registers.
2. The hardware platform in question (the AWS Nitro system) does not actually have any way of accessing extended configuration space.

Fundamentally we actually are using the x86 mechanism 1 I/O ports which have no way of accessing bits beyond normal configuration space; however, those files actually don't do the right thing if someone asks for extended configuration space. It basically just truncates the register you get it. So what ended up happening is that we made an initial request for the capabilities at offset 0x100. However, that gets truncated (via an & 0xfc) to 0. This in turn means we start treating the vid/did as the primary capability and then start walking into space. However, because our addresses always get truncated that ends up not working well.

Here, we need to make sure that if someone gets to a configuration space mechanism that doesn't support extended configuration space we properly return an invalid read. With that present, that does actually ensure that the AWS nitro VM boots, which is great and what we'd expect. There'll be a related ticket on things we can do to further improve the PCI capability functions.


Related issues

Related to illumos gate - Bug #14195: pci_cap_locate and pci_xcap_locate need additional boundsNewRobert Mustacchi

Actions
Related to illumos gate - Feature #13689: Want AWS ENA driverClosedRyan Zezeski

Actions
Actions #1

Updated by Electric Monk 7 months ago

  • Gerrit CR set to 1774
Actions #2

Updated by Robert Mustacchi 7 months ago

  • Related to Bug #14195: pci_cap_locate and pci_xcap_locate need additional bounds added
Actions #3

Updated by Robert Mustacchi 6 months ago

This was initially tested by verifying that the AWS nitro instances which did not boot and ended up in a bad place, now successfully came up.

To regresion test this, I took up a few different classes of systems:

  • An AMD system that had MMIO space and extended I/O space mechanisms
  • An AMD system that had the extended I/O space mechanisms, but no MMIO space. This required #14235 to validate testing.
  • An Intel based virtual machine driven by qemu. This has the basic I/O ports and MMIO space.

For each of these, I tested this by doing the following:

1. Using pcieadm save-cfgspace -old systems2a to save all the files on stock bits
2. Rebooted onto a system with the changes from this and make sure that we had the new versions of the functions present by disassembling with mdb -k.
3. Using pcieadm save-cfgspace -a to save all of configuration space from every device again
4. Used pcieadm show-cfgspace -f to read from every file before and after and diff'd them.

In all case, all of configuration space was identical. This was done to try and make sure that the kernel paths which sometimes use I/O space and sometimes MMIO left devices set up the same way, proving that we didn't incorrectly return an EINVAL.

Actions #4

Updated by Electric Monk 6 months ago

  • Status changed from New to Closed
  • % Done changed from 80 to 100

git commit 1636e047cf2d9d8a7f57e52b03ffd4b265f37081

commit  1636e047cf2d9d8a7f57e52b03ffd4b265f37081
Author: Robert Mustacchi <rm@fingolfin.org>
Date:   2021-11-15T22:18:07.000Z

    14193 PCI config space I/O port mechanisms need to check offsets
    Reviewed by: Hans Rosenfeld <rosenfeld@grumpf.hope-2000.org>
    Reviewed by: Ryan Zezeski <ryan@oxide.computer>
    Reviewed by: Toomas Soome <tsoome@me.com>
    Approved by: Rich Lowe <richlowe@richlowe.net>

Actions #5

Updated by Ryan Zezeski 6 months ago

Actions

Also available in: Atom PDF