Bug #14557
Closed
Attempts to map PCI BARs without MMIO ends in panics
100%
Description
On a system where we failed to allocate and assign a PCI BAR, we ended up with the following panic:
panic[cpu0]/thread=fffffcc269698c20: hat_devload: loading a mapping to free page fffffcc088003168

Warning - stack not written to the dump buffer
fffffcc2696983e0 unix:hat_devload+2bc ()
fffffcc269698440 rootnex:rootnex_map_regspec+1f9 ()
fffffcc269698510 rootnex:rootnex_map+22f ()
fffffcc269698580 genunix:ddi_map+59 ()
fffffcc269698670 npe:npe_bus_map+491 ()
fffffcc2696986f0 pcieb:pcieb_bus_map+92 ()
fffffcc269698760 genunix:ddi_map+59 ()
fffffcc269698820 genunix:ddi_regs_map_setup+ff ()
fffffcc269698910 t4nex:t4_devo_attach+359 ()
fffffcc269698990 genunix:devi_attach+ad ()
fffffcc2696989d0 genunix:attach_node+149 ()
fffffcc269698a20 genunix:i_ndi_config_node+16c ()
fffffcc269698a50 genunix:i_ddi_attachchild+60 ()
fffffcc269698a90 genunix:devi_attach_node+8b ()
fffffcc269698b10 genunix:config_immediate_children+e0 ()
fffffcc269698b60 genunix:devi_config_common+69 ()
fffffcc269698c00 genunix:mt_config_thread+216 ()
fffffcc269698c10 unix:thread_start+b ()
We did find a BAR that was not programmed successfully earlier in the message log:
WARNING: failed to program mem space [85/0/4] BAR@0x18 length 0x1000000
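To help read the warning above, here is a minimal sketch of the arithmetic, assuming that `BAR@0x18` is a configuration space offset in a standard type 0 PCI header (BARs at 0x10 through 0x24, four bytes each). The function and macro names are hypothetical and are not taken from the illumos source.

```c
#include <stdint.h>

/* Assumption: type 0 PCI header, where BAR slots live at 0x10..0x24. */
#define	EX_PCI_BAR0_OFFSET	0x10u
#define	EX_PCI_BAR_NSLOTS	6u

/*
 * Hypothetical helper: convert a config space offset into a BAR slot
 * index (0..5). Returns -1 if the offset is not a BAR slot.
 */
int
example_bar_slot_from_offset(uint32_t cfg_off)
{
	uint32_t slot;

	if (cfg_off < EX_PCI_BAR0_OFFSET ||
	    (cfg_off - EX_PCI_BAR0_OFFSET) % 4 != 0)
		return (-1);

	slot = (cfg_off - EX_PCI_BAR0_OFFSET) / 4;
	if (slot >= EX_PCI_BAR_NSLOTS)
		return (-1);

	return ((int)slot);
}

/* Hypothetical helper: express a BAR length in whole MiB. */
uint64_t
example_len_to_mib(uint64_t len)
{
	return (len >> 20);
}
```

Under that assumption, offset 0x18 is BAR slot 2 and a length of 0x1000000 is 16 MiB.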
If we look at the map request, we see the following:
[0]> fffffcc269698488::print ddi_map_req_t
{
    map_op = 1 (DDI_MO_MAP_LOCKED)
    map_type = 0x1 (DDI_MT_REGSPEC)
    map_obj = {
        rnumber = 0x69698470
        rp = 0xfffffcc269698470
    }
    map_flags = 0x80000002
    map_prot = 0x3
    map_handlep = 0xfffffe2619ad09c0
    map_vers = 0x1
}
[0]> fffffcc269698488::print ddi_map_req_t map_obj.rp
map_obj.rp = 0xfffffcc269698470
[0]> fffffcc269698488::print ddi_map_req_t map_obj.rp | ::print struct regspec64
{
    regspec_bustype = 0
    regspec_addr = 0
    regspec_size = 0x1000000
}
This makes some sense: we have an invalid BAR setting of address zero, which is the BAR's reset value and so effectively acts as a sentinel in PCIe. Fundamentally, the PCI nexus drivers (npe and pci) need to check for requests to map address zero, treat that as an invalid address (an invalid BAR), and fail the request.
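The check described above can be sketched roughly as follows. This is an illustration only, not the code from the actual fix: the struct is a hypothetical stand-in that mirrors the three regspec64 fields shown in the mdb output, the function name is invented, and the use of `EINVAL` as the failure value is an assumption.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the regspec64 fields seen in the mdb output. */
typedef struct example_regspec64 {
	uint64_t regspec_bustype;
	uint64_t regspec_addr;
	uint64_t regspec_size;
} example_regspec64_t;

/*
 * Hypothetical check a nexus driver could make before passing a memory
 * mapping request further up the tree. An address of zero is treated as
 * an unassigned BAR, so the request is failed instead of being mapped.
 * Returns 0 if the request may proceed and EINVAL (an assumed error
 * value) otherwise.
 */
int
example_check_mmio_regspec(const example_regspec64_t *rs)
{
	assert(rs != NULL);

	if (rs->regspec_addr == 0)
		return (EINVAL);

	return (0);
}
```

With the values from the dump (address 0, size 0x1000000), this sketch rejects the request, so the caller can log a warning and fail attach rather than reaching hat_devload().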
Updated by Robert Mustacchi about 1 month ago
I tested this on a system that was tripping over this. Previously it would panic as shown in the original message. With the fix in place, the system booted and, when it attempted to configure the device driver, generated console messages such as:
WARNING: nvme1: regs[1] does not have a valid MMIO address
WARNING: nvme2: regs[1] does not have a valid MMIO address
This shows that we are catching this case and no longer panicking, which makes subsequent work and diagnosis easier.
I've also tested this on a number of other systems and confirmed that everything comes up as expected.
Updated by Electric Monk 30 days ago
- Status changed from New to Closed
- % Done changed from 0 to 100
git commit a2d4222865d0ef80687403e52976bd691ec2faee
commit a2d4222865d0ef80687403e52976bd691ec2faee
Author: Robert Mustacchi <rm@fingolfin.org>
Date: 2022-07-14T18:01:25.000Z

    14557 Attempts to map PCI BARs without MMIO ends in panics
    Reviewed by: C Fraire <cfraire@me.com>
    Reviewed by: Garrett D'Amore <garrett@damore.org>
    Approved by: Richard Lowe <richlowe@richlowe.net>