Bug #14557

closed

Attempts to map PCI BARs without MMIO ends in panics

Added by Robert Mustacchi 5 months ago. Updated 30 days ago.

Status:
Closed
Priority:
Normal
Category:
kernel
Start date:
Due date:
% Done:

100%

Estimated time:
Difficulty:
Medium
Tags:
Gerrit CR:

Description

On a system where we failed to allocate and assign a PCI BAR, we hit the following panic:

panic[cpu0]/thread=fffffcc269698c20: hat_devload: loading a mapping to free page fffffcc088003168

Warning - stack not written to the dump buffer
fffffcc2696983e0 unix:hat_devload+2bc ()
fffffcc269698440 rootnex:rootnex_map_regspec+1f9 ()
fffffcc269698510 rootnex:rootnex_map+22f ()
fffffcc269698580 genunix:ddi_map+59 ()
fffffcc269698670 npe:npe_bus_map+491 ()
fffffcc2696986f0 pcieb:pcieb_bus_map+92 ()
fffffcc269698760 genunix:ddi_map+59 ()
fffffcc269698820 genunix:ddi_regs_map_setup+ff ()
fffffcc269698910 t4nex:t4_devo_attach+359 ()
fffffcc269698990 genunix:devi_attach+ad ()
fffffcc2696989d0 genunix:attach_node+149 ()
fffffcc269698a20 genunix:i_ndi_config_node+16c ()
fffffcc269698a50 genunix:i_ddi_attachchild+60 ()
fffffcc269698a90 genunix:devi_attach_node+8b ()
fffffcc269698b10 genunix:config_immediate_children+e0 ()
fffffcc269698b60 genunix:devi_config_common+69 ()
fffffcc269698c00 genunix:mt_config_thread+216 ()
fffffcc269698c10 unix:thread_start+b ()

We did find a BAR that we failed to program earlier in the message log:

WARNING: failed to program mem space [85/0/4] BAR@0x18 length 0x1000000

If we go look at the map request we get the following:

[0]> fffffcc269698488::print ddi_map_req_t
{
    map_op = 1 (DDI_MO_MAP_LOCKED)
    map_type = 0x1 (DDI_MT_REGSPEC)
    map_obj = {
        rnumber = 0x69698470
        rp = 0xfffffcc269698470
    }
    map_flags = 0x80000002
    map_prot = 0x3
    map_handlep = 0xfffffe2619ad09c0
    map_vers = 0x1
}
[0]> fffffcc269698488::print ddi_map_req_t map_obj.rp
map_obj.rp = 0xfffffcc269698470
[0]> fffffcc269698488::print ddi_map_req_t map_obj.rp | ::print struct regspec64
{
    regspec_bustype = 0
    regspec_addr = 0
    regspec_size = 0x1000000
}

This makes some sense: the BAR has been left with an invalid address of zero, which is effectively a sentinel in PCIe because it is the reset value. Fundamentally, the PCI nexus drivers (npe and pci) need to check each map request and, if it asks to map address zero, treat that as an invalid BAR and fail the request.
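The check described above can be sketched in ordinary user-space C. This is only an illustration of the idea, not the code from the actual fix: the `sketch_` types, return codes, and function names are all hypothetical, and the structure merely mirrors the three fields shown in the `struct regspec64` output above.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for the kernel's struct regspec64. Only the
 * three fields printed by mdb in this report are modelled here.
 */
typedef struct sketch_regspec64 {
	uint64_t regspec_bustype;
	uint64_t regspec_addr;
	uint64_t regspec_size;
} sketch_regspec64_t;

/* Hypothetical return codes, standing in for the DDI success/failure values. */
#define	SKETCH_MAP_OK		0
#define	SKETCH_MAP_FAILURE	(-1)

/*
 * A BAR that was never programmed still holds its reset value of zero,
 * so an MMIO address of zero is treated as "no valid address assigned".
 */
bool
sketch_bar_addr_valid(const sketch_regspec64_t *rp)
{
	return (rp->regspec_addr != 0);
}

/*
 * Sketch of a nexus map entry point: reject the request up front
 * rather than handing an address of zero to the parent to be mapped.
 */
int
sketch_bus_map(const sketch_regspec64_t *rp)
{
	if (!sketch_bar_addr_valid(rp)) {
		return (SKETCH_MAP_FAILURE);
	}

	/* A real nexus would pass the request on to its parent here. */
	return (SKETCH_MAP_OK);
}
```

With the register specification from the dump (`regspec_addr = 0`, `regspec_size = 0x1000000`), `sketch_bus_map()` returns the failure code, so the caller's attach can fail cleanly instead of the request reaching `hat_devload`.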

Actions #1

Updated by Electric Monk about 2 months ago

  • Gerrit CR set to 2188
Actions #2

Updated by Robert Mustacchi about 1 month ago

I tested this on a system that was tripping over this. Previously it would panic as shown in the original message. With the fix in place, the system booted and, when it attempted to configure the device driver, generated console messages such as:

WARNING: nvme1: regs[1] does not have a valid MMIO address
WARNING: nvme2: regs[1] does not have a valid MMIO address

This shows that we are catching the condition and no longer panicking, which makes subsequent work and diagnosis easier.

I've also tested this on a number of other systems and confirmed that everything comes up as expected.

Actions #3

Updated by Electric Monk 30 days ago

  • Status changed from New to Closed
  • % Done changed from 0 to 100

git commit a2d4222865d0ef80687403e52976bd691ec2faee

commit  a2d4222865d0ef80687403e52976bd691ec2faee
Author: Robert Mustacchi <rm@fingolfin.org>
Date:   2022-07-14T18:01:25.000Z

    14557 Attempts to map PCI BARs without MMIO ends in panics
    Reviewed by: C Fraire <cfraire@me.com>
    Reviewed by: Garrett D'Amore <garrett@damore.org>
    Approved by: Richard Lowe <richlowe@richlowe.net>
