Bug #16234

closed

BIOS PCI detection failure results in only bus 0 being enumerated

Added by Joshua M. Clulow 4 months ago. Updated 4 months ago.

Status: Closed
Priority: Normal
Assignee:
Category: kernel
Start date:
Due date:
% Done: 100%
Estimated time:
Difficulty: Medium
Tags:
Gerrit CR:
External Bug:
Description

I created an m7a.2xlarge instance on AWS. It booted up, and I was able to use the serial console and even NMIs worked to get into kmdb, but it did not seem to actually end up with a network interface at all. The metadata agent was obviously then confused and nothing else really happened.

I used the exact same AMI to create an m6a.2xlarge instance, which then worked perfectly. I'm not yet sure what's going on there, but it seems easy enough to reproduce!


Related issues

Related to illumos gate - Feature #11612: x86 PCI enumeration should not rely on bios max bus (Closed, Robert Mustacchi)

Actions #1

Updated by Ryan Zezeski 4 months ago

What's the PCI device ID? It might be that we need LLQ support for those instances.

Actions #2

Updated by Andy Fiddaman 4 months ago

I don't yet know if it's relevant, but these m7a instances also trigger #16213 (the fix was not in the image I tested with).

The PCI device doesn't show up at all, but we do see, during boot:

NOTICE: platform root complex scan returned bus with invalid bus id: 0x1
NOTICE: platform root complex scan returned bus with invalid bus id: 0x24

Actions #3

Updated by Andy Fiddaman 4 months ago

Explicitly setting pci_bios_maxbus to 255, as we generally do because of known problems in BIOS implementations, causes an ena device to at least show up:

[0]> pci_bios_maxbus/zff
pci_bios_maxbus:0               =       0xff
[0]> :c
OmniOS r151049 Version omnios-master-71f55ca7377 64-bit
Copyright (c) 2012-2017 OmniTI Computer Consulting, Inc.
Copyright (c) 2017-2024 OmniOS Community Edition (OmniOSce) Association.
DEBUG enabled
NOTICE: pciex dev 0x0, func 0x0
NOTICE: pciex dev 0x1, func 0x0
NOTICE: pciex dev 0x1, func 0x1
NOTICE: pciex dev 0x1, func 0x2
NOTICE: pciex dev 0x1, func 0x3
NOTICE: pciex dev 0x1, func 0x4
NOTICE: pciex dev 0x1, func 0x5
NOTICE: pciex dev 0x1, func 0x6
NOTICE: pciex dev 0x1, func 0x7
NOTICE: pciex dev 0x2, func 0x0
NOTICE: pciex dev 0x3, func 0x0
NOTICE: pciex dev 0x4, func 0x0
PCI-Express (0,4,0) capability found
Found PCI-Ex in the system
WARNING: illegal PCI request: offset = 806, size = 4
NOTICE: pciex dev 0x0, func 0x0
PCI-Express (1,0,0) capability found
Found PCI-Ex in the system
NOTICE: pciex dev 0x0, func 0x0
PCI-Express (24,0,0) capability found
Found PCI-Ex in the system
WARNING: illegal PCI request: offset = c8a, size = 4
WARNING: illegal PCI request: offset = c8e, size = 4
WARNING: illegal PCI request: offset = ec2, size = 4
WARNING: kmem_alloc(): sleeping allocation with size of 0; see kmem_zerosized_log for details
NOTICE: Bus 0x01, bridges 0x43, buffer mem 0x316f400
NOTICE: Bus 0x24, bridges 0x71, buffer mem 0xc6bc00
Hostname: ip-172-31-7-87

ip-172-31-7-87 console login: root
Password:
Last login: Tue Feb  6 13:38:23 2024 on console
OmniOS r151049  omnios-master-71f55ca7377       February 2024
illumos development build: 2024-Feb-05 [illumos]
root@ip-172-31-7-87:~#
root@ip-172-31-7-87:~# /usr/lib/pci/pcieadm show-devs
BDF     TYPE           INSTANCE       DEVICE
0/0/0   PCI            --             440FX - 82441FX PMC [Natoma]
0/1/0   PCI            isa0           82371SB PIIX3 ISA [Natoma/Triton II]
0/1/3   PCI            --             82371AB/EB/MB PIIX4 ACPI
0/3/0   PCI            --             Unknown device: 0x1111
0/4/0   PCI            nvme0          NVMe EBS Controller
1/0/0   PCIe Gen 4x8   pcieb0         Unknown device: 0x200
2/0/0   PCIe Gen 4x8   pcieb1         Unknown device: 0x200
3/0/0   PCI            pcieb2         Unknown device: 0x200
3/0/1   PCI            pcieb3         Unknown device: 0x200
3/0/2   PCI            pcieb4         Unknown device: 0x200
3/0/3   PCI            pcieb5         Unknown device: 0x200
3/0/4   PCI            pcieb6         Unknown device: 0x200
3/0/5   PCI            pcieb7         Unknown device: 0x200
3/0/6   PCI            pcieb8         Unknown device: 0x200
3/0/7   PCI            pcieb9         Unknown device: 0x200
3/1/0   PCI            pcieb10        Unknown device: 0x200
3/1/1   PCI            pcieb11        Unknown device: 0x200
3/1/2   PCI            pcieb12        Unknown device: 0x200
3/1/3   PCI            pcieb13        Unknown device: 0x200
3/1/4   PCI            pcieb14        Unknown device: 0x200
3/1/5   PCI            pcieb15        Unknown device: 0x200
3/1/6   PCI            pcieb16        Unknown device: 0x200
3/1/7   PCI            pcieb17        Unknown device: 0x200
BDF     TYPE           INSTANCE       DEVICE
3/2/0   PCI            pcieb18        Unknown device: 0x200
3/2/1   PCI            pcieb19        Unknown device: 0x200
3/2/2   PCI            pcieb20        Unknown device: 0x200
3/2/3   PCI            pcieb21        Unknown device: 0x200
3/2/4   PCI            pcieb22        Unknown device: 0x200
3/2/5   PCI            pcieb23        Unknown device: 0x200
3/2/6   PCI            pcieb24        Unknown device: 0x200
3/2/7   PCI            pcieb25        Unknown device: 0x200
3/3/0   PCI            pcieb26        Unknown device: 0x200
3/3/1   PCI            pcieb27        Unknown device: 0x200
3/3/2   PCI            pcieb28        Unknown device: 0x200
3/3/3   PCI            pcieb29        Unknown device: 0x200
3/3/4   PCI            pcieb30        Unknown device: 0x200
3/3/5   PCI            pcieb31        Unknown device: 0x200
3/3/6   PCI            pcieb32        Unknown device: 0x200
3/3/7   PCI            pcieb33        Unknown device: 0x200
24/0/0  PCI            --             Unknown device: 0x200
25/0/0  PCI            --             Unknown device: 0x200
26/0/0  PCI            --             Unknown device: 0x200
27/0/0  PCI            --             Elastic Network Adapter (ENA)
26/0/1  PCI            --             Unknown device: 0x200
26/0/2  PCI            --             Unknown device: 0x200
26/0/3  PCI            --             Unknown device: 0x200
BDF     TYPE           INSTANCE       DEVICE
26/0/4  PCI            --             Unknown device: 0x200
root@ip-172-31-7-87:~# /usr/lib/pci/pcieadm show-devs -o vid,did,rev,subvid,subsys 27/0/0
VID   DID   REV   SUBVID SUBSYS
1d0f  ec20  0     1d0f  ec20
Actions #4

Updated by Andy Fiddaman 4 months ago

If I enable UEFI boot, things are much happier. The NIC comes up and works correctly.

OmniOS r151049 Version omnios-master-71f55ca7377 64-bit
Copyright (c) 2012-2017 OmniTI Computer Consulting, Inc.
Copyright (c) 2017-2024 OmniOS Community Edition (OmniOSce) Association.
WARNING: illegal PCI request: offset = 806, size = 4
WARNING: illegal PCI request: offset = c8a, size = 4
WARNING: illegal PCI request: offset = c8e, size = 4
WARNING: illegal PCI request: offset = ec2, size = 4

ip-172-31-6-11 console login: root
Password:
Last login: Tue Feb  6 14:50:40 2024 on console
OmniOS r151049  omnios-master-71f55ca7377       February 2024
root@ip-172-31-6-11:~#
root@ip-172-31-6-11:~# /usr/lib/pci/pcieadm show-devs
BDF     TYPE           INSTANCE       DEVICE
0/0/0   PCI            --             440FX - 82441FX PMC [Natoma]
0/1/0   PCI            isa0           82371SB PIIX3 ISA [Natoma/Triton II]
0/1/3   PCI            --             82371AB/EB/MB PIIX4 ACPI
0/3/0   PCI            vgatext0       Unknown device: 0x1111
0/4/0   PCI            nvme0          NVMe EBS Controller
1/0/0   PCIe Gen 4x8   pcieb0         Unknown device: 0x200
2/0/0   PCIe Gen 4x8   pcieb1         Unknown device: 0x200
3/0/0   PCI            pcieb2         Unknown device: 0x200
3/0/1   PCI            pcieb3         Unknown device: 0x200
3/0/2   PCI            pcieb4         Unknown device: 0x200
3/0/3   PCI            pcieb5         Unknown device: 0x200
3/0/4   PCI            pcieb6         Unknown device: 0x200
3/0/5   PCI            pcieb7         Unknown device: 0x200
3/0/6   PCI            pcieb8         Unknown device: 0x200
3/0/7   PCI            pcieb9         Unknown device: 0x200
3/1/0   PCI            pcieb10        Unknown device: 0x200
3/1/1   PCI            pcieb11        Unknown device: 0x200
3/1/2   PCI            pcieb12        Unknown device: 0x200
3/1/3   PCI            pcieb13        Unknown device: 0x200
3/1/4   PCI            pcieb14        Unknown device: 0x200
3/1/5   PCI            pcieb15        Unknown device: 0x200
3/1/6   PCI            pcieb16        Unknown device: 0x200
3/1/7   PCI            pcieb17        Unknown device: 0x200
BDF     TYPE           INSTANCE       DEVICE
3/2/0   PCI            pcieb18        Unknown device: 0x200
3/2/1   PCI            pcieb19        Unknown device: 0x200
3/2/2   PCI            pcieb20        Unknown device: 0x200
3/2/3   PCI            pcieb21        Unknown device: 0x200
3/2/4   PCI            pcieb22        Unknown device: 0x200
3/2/5   PCI            pcieb23        Unknown device: 0x200
3/2/6   PCI            pcieb24        Unknown device: 0x200
3/2/7   PCI            pcieb25        Unknown device: 0x200
3/3/0   PCI            pcieb26        Unknown device: 0x200
3/3/1   PCI            pcieb27        Unknown device: 0x200
3/3/2   PCI            pcieb28        Unknown device: 0x200
3/3/3   PCI            pcieb29        Unknown device: 0x200
3/3/4   PCI            pcieb30        Unknown device: 0x200
3/3/5   PCI            pcieb31        Unknown device: 0x200
3/3/6   PCI            pcieb32        Unknown device: 0x200
3/3/7   PCI            pcieb33        Unknown device: 0x200
24/0/0  PCIe Gen 4x8   pcieb34        Unknown device: 0x200
25/0/0  PCIe Gen 4x8   pcieb35        Unknown device: 0x200
26/0/0  PCIe Gen 4x8   pcieb36        Unknown device: 0x200
27/0/0  PCI            ena0           Elastic Network Adapter (ENA)
26/0/1  PCI            pcieb37        Unknown device: 0x200
26/0/2  PCI            pcieb38        Unknown device: 0x200
26/0/3  PCI            pcieb39        Unknown device: 0x200
BDF     TYPE           INSTANCE       DEVICE
26/0/4  PCI            pcieb40        Unknown device: 0x200
root@ip-172-31-6-11:~# ipadm
ADDROBJ           TYPE     STATE        ADDR
lo0/v4            static   ok           127.0.0.1/8
ena0/dhcp         dhcp     ok           172.31.6.11/20
lo0/v6            static   ok           ::1/128
root@ip-172-31-6-11:~# dladm
LINK        CLASS     MTU    STATE    BRIDGE     OVER
ena0        phys      9216   up       --         --
root@ip-172-31-6-11:~# ping 1.1.1.1
1.1.1.1 is alive

root@ip-172-31-6-11:~# tr '\0' '\n' < /system/boot/environment | grep efi
efi-version=2.70
Actions #5

Updated by Andy Fiddaman 4 months ago

It appears that the PCI_BIOS_PRESENT interrupt call is failing on this instance type, and the current code treats this failure as an indication to try PCI_MECHANISM_1, but only to scan bus 0.
Changing this to scan 0-255, as is generally done, causes the PCI devices to be enumerated and the network adapter to come up.

I believe the reason this does not manifest on m6a instance types is just that the ena adapter is on bus 0 there!

From pci_cfgspace.c:

        bzero(&regs, sizeof (regs));
        regs.eax.word.ax = (PCI_FUNCTION_ID << 8) | PCI_BIOS_PRESENT;

        BOP_DOINT(bootops, 0x1a, &regs);
        carryflag = regs.eflags & PS_C;
        ax = regs.eax.word.ax;
        dx = regs.edx.word.dx;

        /* the carry flag must not be set */
        if (carryflag != 0)
                return (PCI_MECHANISM_NONE);

We are seeing the carry flag set (indicating an error has occurred) on m7a EC2 instances.

Doing BIOS call...br.ax is b101
br.bx is 0
br.dx is 0
On exit, br.ax is b101
br.bx is 0
br.dx is 0
carryflag is 1
done
OmniOS r151049 Version omnios-master-6c0323f9ba4 64-bit
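
The request and the failure test in the excerpt above can be checked by hand. The following is a minimal sketch, not code from pci_cfgspace.c: it assumes PCI_FUNCTION_ID is 0xb1 and PCI_BIOS_PRESENT is 0x01 (which is consistent with the `b101` request in the trace) and that PS_C is bit 0 of EFLAGS. The `bios_result` struct and both function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed values; together they give the "b101" request seen in the trace. */
#define	PCI_FUNCTION_ID		0xb1
#define	PCI_BIOS_PRESENT	0x01
#define	PS_C			0x01	/* carry flag, bit 0 of EFLAGS */

/* Hypothetical stand-in for the registers read back after INT 0x1a. */
struct bios_result {
	uint16_t ax;
	uint16_t dx;
	uint32_t eflags;
};

/* Value loaded into AX: function ID in the high byte, subfunction in the low. */
static uint16_t
pci_bios_present_request(void)
{
	return ((uint16_t)((PCI_FUNCTION_ID << 8) | PCI_BIOS_PRESENT));
}

/* The call counts as failed whenever the carry flag is set on return. */
static bool
pci_bios_call_failed(const struct bios_result *r)
{
	return ((r->eflags & PS_C) != 0);
}
```

With the values from the m7a trace (AX still b101, DX 0, carry flag 1), `pci_bios_call_failed()` reports a failure, which is the path that returns PCI_MECHANISM_NONE in the excerpt.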
Actions #6

Updated by Andy Fiddaman 4 months ago

  • Subject changed from no network interface in AWS "m7a.2xlarge" instance? to BIOS PCI detection failure results in only bus 0 being enumerated
  • Category changed from driver - device drivers to kernel
  • Status changed from New to In Progress
  • Assignee set to Andy Fiddaman
Actions #7

Updated by Andy Fiddaman 4 months ago

  • Related to Feature #11612: x86 PCI enumeration should not rely on bios max bus added
Actions #8

Updated by Andy Fiddaman 4 months ago

https://illumos.org/opensolaris/bugdb/bug.html#!6426005
introduced a fallback to mechanism 1 when the BIOS call fails, and #11612 effectively forced the use of 255 for the max bus in all cases apart from this one. If I set the maximum to 255 when the BIOS call fails, the device enumerates and the network interface comes up under BIOS boot on an EC2 m7a instance.
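
The effect of that change on the bus range can be modelled in a few lines. This is a sketch of the behaviour described in this ticket, not the code from the eventual commit: the function names and the `bios_call_ok` parameter are hypothetical, and the model reduces the decision to the single question of which maximum bus number gets scanned.

```c
#include <stdbool.h>

#define	PCI_MAX_BUS	0xff	/* highest possible PCI bus number */

/*
 * Before the change: a failed BIOS call fell back to mechanism 1
 * but left only bus 0 to be scanned.
 */
static int
maxbus_before_fix(bool bios_call_ok)
{
	return (bios_call_ok ? PCI_MAX_BUS : 0);
}

/* After the change: buses 0-255 are scanned whether or not the call worked. */
static int
maxbus_after_fix(bool bios_call_ok)
{
	(void) bios_call_ok;
	return (PCI_MAX_BUS);
}

/* Would a scan of buses 0..maxbus reach a device on the given bus? */
static bool
bus_is_scanned(int bus, int maxbus)
{
	return (bus >= 0 && bus <= maxbus);
}
```

In this model the ENA device at 27/0/0 (bus 0x27) is missed by `maxbus_before_fix(false)` and reached by `maxbus_after_fix(false)`, while a device on bus 0 is reached either way, matching the m6a observation above.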

Actions #9

Updated by Andy Fiddaman 4 months ago

I tested the associated patch on an m7a instance, booting with kbm_debug (as well as pit_is_broken=1 - which #16222 should obviate).

ok boot -B kbm_debug=1

*** Entered Solaris in _start() cmdline is: /platform/i86pc/kernel/amd64/unix -B console=ttya,kbm_debug=1

Doing BIOS call...
br.ax is b101
br.bx is 0
br.dx is 0
done
rp->eflags is 47
br.ax is b101
br.bx is 0
br.dx is 0
...
ip-172-31-1-229 console login: root
Password:
Last login: Tue Feb  6 18:06:07 2024 on console
OmniOS r151049  omnios-master-6c0323f9ba4       February 2024
root@ip-172-31-1-229:~# ipadm
ADDROBJ           TYPE     STATE        ADDR
lo0/v4            static   ok           127.0.0.1/8
ena0/dhcp         dhcp     ok           172.31.1.229/20
lo0/v6            static   ok           ::1/128

root@ip-172-31-1-229:~# tr '\0' '\n' < /system/boot/environment  | grep efi
root@ip-172-31-1-229:~#

root@ip-172-31-1-229:~# mdb -ke pci_bios_maxbus::print
0xff
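
The `rp->eflags is 47` line in the debug output can be decoded with the standard x86 EFLAGS bit positions. A small sketch, assuming the value is printed in hexadecimal like the register values around it; the macro and function names here are illustrative and not taken from the illumos headers.

```c
#include <stdbool.h>
#include <stdint.h>

/* Standard x86 EFLAGS bit positions for the low status flags. */
#define	EFL_CF	0x0001	/* carry */
#define	EFL_PF	0x0004	/* parity */
#define	EFL_AF	0x0010	/* auxiliary carry */
#define	EFL_ZF	0x0040	/* zero */
#define	EFL_SF	0x0080	/* sign */

/* True if the given flag bit is set in an EFLAGS value. */
static bool
eflags_has(uint32_t eflags, uint32_t flag)
{
	return ((eflags & flag) != 0);
}
```

For 0x47 this gives CF, PF and ZF set (plus bit 1, which always reads as 1), so the BIOS call still reports an error on the patched kernel. The difference is that pci_bios_maxbus now ends up as 0xff regardless.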
Actions #10

Updated by Electric Monk 4 months ago

  • Gerrit CR set to 3259
Actions #11

Updated by Electric Monk 4 months ago

  • Status changed from In Progress to Closed
  • % Done changed from 0 to 100

git commit 7a52a8e2ce9d8df39f4fc22d1853e8ddefc97a47

commit  7a52a8e2ce9d8df39f4fc22d1853e8ddefc97a47
Author: Andy Fiddaman <illumos@fiddaman.net>
Date:   2024-02-09T00:12:02.000Z

    16234 BIOS PCI detection failure results in only bus 0 being enumerated
    16237 Add BIOS INT call results to early debug output
    Reviewed by: Bill Sommerfeld <sommerfeld@hamachi.org>
    Reviewed by: Gordon Ross <Gordon.W.Ross@gmail.com>
    Reviewed by: Keith Wesolowski <wesolows@oxide.computer>
    Approved by: Dan McDonald <danmcd@mnx.io>
