Bug #13297
mlxcx kernel panic on debug build when creating an aggregated link
Status: New
Priority: Normal
Assignee: -
Category: -
Start date:
Due date:
% Done: 0%
Estimated time:
Difficulty: Medium
Tags:
Gerrit CR:
Description
The system has two 25G Mellanox cards:
root@Shelf6:~# dladm show-link
LINK CLASS MTU STATE BRIDGE OVER
mlxcx1 phys 1500 up -- --
mlxcx2 phys 1500 up -- --
Creating an aggr link:
root@Shelf6:~# dladm create-aggr -l mlxcx1 -l mlxcx2 -P L2,L3 -L off aggr2
and the system panics:
panic[cpu0]/thread=fffffe003d0f8c20: pageout_deadman: stuck pushing the same page for 90 seconds (freemem is 19984)

fffffe003d0f8a10 genunix:process_type+2127bf ()
fffffe003d0f8a80 genunix:clock+7f3 ()
fffffe003d0f8b60 genunix:cyclic_softint+1df ()
fffffe003d0f8b80 unix:cbe_softclock+19 ()
fffffe003d0f8bd0 unix:av_dispatch_softvect+7d ()
fffffe003d0f8c00 apix:apix_dispatch_softint+35 ()
fffffe003d0059d0 unix:switch_sp_and_call+15 ()
fffffe003d005a30 apix:apix_do_softint+54 ()
fffffe003d005aa0 apix:apix_do_interrupt+3e0 ()
fffffe003d005ab0 unix:_interrupt+1f2 ()
fffffe003d005ba0 unix:i86_mwait+12 ()
fffffe003d005be0 unix:cpu_idle_mwait+127 ()
fffffe003d005c00 unix:idle+a2 ()
fffffe003d005c10 unix:thread_start+b ()

panic: entering debugger (continue to save dump)

Welcome to kmdb
kmdb: unable to determine terminal type: assuming `vt100'
Loaded modules: [ scsi_vhci crypto mac cpc uppc mr_sas neti sd ptm ufs unix mpt zfs krtld sata apix uhci pmcs hook genunix ip logindmux usba xhci specfs pcplusmp random mm cpu.generic arp mpt_sas emlxs sockfs ]
[0]>
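The panic string reports `freemem` as a count of pages, not bytes. As a rough sketch, assuming the 4 KiB base page size used on x86/amd64, the value converts to bytes like this, which suggests the machine had only about 78 MiB of free memory left when the deadman fired:

```python
# Convert the pageout_deadman "freemem" value (a page count) into bytes and MiB.
# Assumption: 4 KiB base pages, as on x86/amd64 illumos.
PAGESIZE = 4096
FREEMEM_PAGES = 19984  # taken from the panic message


def pages_to_mib(pages: int, pagesize: int = PAGESIZE) -> float:
    """Return the size of `pages` pages in MiB."""
    return pages * pagesize / (1024 * 1024)


free_bytes = FREEMEM_PAGES * PAGESIZE

if __name__ == "__main__":
    # Prints: 19984 pages = 81854464 bytes = 78.06 MiB
    print(f"{FREEMEM_PAGES} pages = {free_bytes} bytes = "
          f"{pages_to_mib(FREEMEM_PAGES):.2f} MiB")
```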
prtconf -Dd:
pci15b3,20 (pciex15b3,1017) [Mellanox Technologies MT27800 Family [ConnectX-5]] (driver name: mlxcx)
pci15b3,20 (pciex15b3,1017) [Mellanox Technologies MT27800 Family [ConnectX-5]] (driver name: mlxcx)
prtdiag:
System Configuration: Supermicro SSG-6048R-E1CR36L
BIOS Configuration: American Megatrends Inc. 2.0a 06/30/2016
BMC Configuration: IPMI 2.0 (KCS: Keyboard Controller Style)

==== Processor Sockets ====================================

Version                                    Location Tag
-------------------------------- --------------------------
Intel(R) Xeon(R) CPU E5-2620 v3 @ 2.40GHz  SOCKET 0
Intel(R) Xeon(R) CPU E5-2620 v3 @ 2.40GHz  SOCKET 1

==== Memory Device Sockets ================================

Type        Status Set Device Locator      Bank Locator
----------- ------ --- ------------------- ----------------
DDR4        in use 0   P1_DIMMA1           NODE 1
DDR4        empty  0   P1_DIMMA2           NODE 1
DDR4        in use 0   P1_DIMMB1           NODE 1
DDR4        empty  0   P1_DIMMB2           NODE 1
DDR4        empty  0   P1_DIMMC1           NODE 1
DDR4        empty  0   P1_DIMMC2           NODE 1
DDR4        empty  0   P1_DIMMD1           NODE 1
DDR4        empty  0   P1_DIMMD2           NODE 1
DDR4        in use 0   P2_DIMME1           NODE 3
DDR4        empty  0   P2_DIMME2           NODE 3
DDR4        in use 0   P2_DIMMF1           NODE 3
DDR4        empty  0   P2_DIMMF2           NODE 3
DDR4        empty  0   P2_DIMMG1           NODE 3
DDR4        empty  0   P2_DIMMG2           NODE 3
DDR4        empty  0   P2_DIMMH1           NODE 3
DDR4        empty  0   P2_DIMMH2           NODE 3

==== On-Board Devices =====================================

Onboard Aspeed Video
Onboard Intel Lan

==== Upgradeable Slots ====================================

ID  Status    Type               Description
--- --------- ------------------ ----------------------------
1   available PCI Exp. Gen 3 x8  CPU1 SLOT1
2   in use    PCI Exp. Gen 3 x16 CPU1 SLOT2, LSI Logic / Symbios Logic SAS3008 PCI-Express Fusion-MPT SAS-3 (mpt_sas)
3   in use    PCI Exp. Gen 3 x8  CPU1 SLOT3, Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection (ixgbe)
4   available PCI Exp. Gen 3 x16 CPU2 SLOT4
5   available PCI Exp. Gen 3 x8  CPU2 SLOT5
6   in use    PCI Exp. Gen 3 x16 CPU2 SLOT6, Mellanox Technologies MT27800 Family [ConnectX-5] (mlxcx)
7   available PCI Exp. Gen 3 x16 CPU2 SLOT7
prtconf -v output for the device, taken from a bootable BE:
pci15b3,20 (driver not attached)
    Hardware properties:
        name='pci-msix-capid-pointer' type=int items=1 value=0000009c
        name='pcie-link-supported-speeds' type=int64 items=3 value=000000009502f900.000000012a05f200.00000001dcd65000
        name='pcie-link-maximum-speed' type=int64 items=1 value=00000001dcd65000
        name='pcie-link-current-speed' type=int64 items=1 value=00000001dcd65000
        name='pcie-link-current-width' type=int items=1 value=00000008
        name='pcie-link-maximum-width' type=int items=1 value=00000008
        name='acpi-namespace' type=string items=1 value='\_SB_.PCI1.QR2A.H001'
        name='assigned-addresses' type=int items=5 value=c3810110.00000000.f6000000.00000000.02000000
        name='reg' type=int items=10 value=00810100.00000000.00000000.00000000.00000000.43810110.00000000.00000000.00000000.02000000
        name='compatible' type=string items=15 value='pciex15b3,1017.15b3.20.0' + 'pciex15b3,1017.15b3.20' + 'pciex15b3,1017.0' + 'pciex15b3,1017' + 'pciexclass,020000' + 'pciexclass,0200' + 'pci15b3,1017.15b3.20.0' + 'pci15b3,1017.15b3.20' + 'pci15b3,20,s' + 'pci15b3,20' + 'pci15b3,1017.0' + 'pci15b3,1017,p' + 'pci15b3,1017' + 'pciclass,020000' + 'pciclass,0200'
        name='model' type=string items=1 value='Ethernet controller'
        name='power-consumption' type=int items=2 value=00000001.00000001
        name='devsel-speed' type=int items=1 value=00000000
        name='interrupts' type=int items=1 value=00000002
        name='subsystem-vendor-id' type=int items=1 value=000015b3
        name='subsystem-id' type=int items=1 value=00000020
        name='unit-address' type=string items=1 value='0,1'
        name='class-code' type=int items=1 value=00020000
        name='revision-id' type=int items=1 value=00000000
        name='vendor-id' type=int items=1 value=000015b3
        name='device-id' type=int items=1 value=00001017
        name='vendor-name' type=string items=1 value='Mellanox Technologies'
        name='device-name' type=string items=1 value='MT27800 Family [ConnectX-5]'
        name='subsystem-name' type=string items=1 value='unknown subsystem'
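The `pcie-link-*` properties are printed as hex. The sketch below decodes them, assuming the int64 speed values are link rates in transfers per second (2.5, 5 and 8 GT/s being PCIe Gen1, Gen2 and Gen3). The constant and helper names are illustrative only:

```python
# Decode the hex pcie-link-* property values from the prtconf -v output.
# Assumption: speed values are transfer rates in transfers per second.
SUPPORTED_SPEEDS = "000000009502f900.000000012a05f200.00000001dcd65000"
MAX_SPEED = "00000001dcd65000"
CURRENT_SPEED = "00000001dcd65000"
CURRENT_WIDTH = "00000008"
MAX_WIDTH = "00000008"

# Map raw rates to PCIe generations (2.5 GT/s = Gen1, 5 = Gen2, 8 = Gen3).
GEN_BY_RATE = {2_500_000_000: 1, 5_000_000_000: 2, 8_000_000_000: 3}


def decode_values(prop: str) -> list:
    """Split a dot-separated prtconf value list and parse each item as hex."""
    return [int(item, 16) for item in prop.split(".")]


def describe_speed(rate: int) -> str:
    """Render a raw rate as e.g. '8 GT/s (Gen3)'."""
    gen = GEN_BY_RATE.get(rate)
    label = f"Gen{gen}" if gen else "unknown generation"
    return f"{rate / 1e9:g} GT/s ({label})"


if __name__ == "__main__":
    print("supported:", [describe_speed(r) for r in decode_values(SUPPORTED_SPEEDS)])
    print("maximum:  ", describe_speed(decode_values(MAX_SPEED)[0]))
    print("current:  ", describe_speed(decode_values(CURRENT_SPEED)[0]),
          f"x{decode_values(CURRENT_WIDTH)[0]} (max x{decode_values(MAX_WIDTH)[0]})")
```

Under that assumption the function trained at 8 GT/s with 8 lanes (Gen3 x8), its reported maximum, even though prtdiag lists slot 6 as x16.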
Updated by Denis Kozadaev over 1 year ago
The system has only 32 GB of RAM.
Following Paul Winder's hint, I tuned /kernel/drv/mlxcx.conf:
sq_size_shift = 12;
rq_size_shift = 11;
tx_nrings_per_group = 32;
rx_ngroups_small = 16;
This helped: the system boots and the aggregation is created:
root@Shelf6:~# dladm show-aggr -x aggr2
LINK        PORT           SPEED   DUPLEX STATE ADDRESS            PORTSTATE
aggr2       --             10000Mb full   up    50:6b:4b:c1:ea:74  --
            mlxcx1         10000Mb full   up    50:6b:4b:c1:ea:74  attached
            mlxcx2         10000Mb full   up    50:6b:4b:c1:ea:75  attached
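To make the tuned values more concrete, here is a small sizing sketch. It assumes, as the `_shift` suffix suggests, that `sq_size_shift` and `rq_size_shift` are base-2 logarithms of the per-queue entry counts, so each step down halves a ring. Multiplying by `tx_nrings_per_group` to get a per-group total is an illustrative assumption, not the mlxcx driver's actual memory accounting:

```python
# Rough sizing of the mlxcx.conf values tuned above.
# Assumption: each *_size_shift is log2 of the number of entries per queue.
TUNED = {
    "sq_size_shift": 12,
    "rq_size_shift": 11,
    "tx_nrings_per_group": 32,
}


def entries(shift: int) -> int:
    """Number of queue entries for a log2 size shift."""
    return 1 << shift


def tx_entries_per_group(conf: dict) -> int:
    """Send-queue entries summed over all TX rings in one group (illustrative)."""
    return conf["tx_nrings_per_group"] * entries(conf["sq_size_shift"])


if __name__ == "__main__":
    print("send queue entries:   ", entries(TUNED["sq_size_shift"]))  # 4096
    print("receive queue entries:", entries(TUNED["rq_size_shift"]))  # 2048
    print("TX entries per group: ", tx_entries_per_group(TUNED))      # 131072
```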
Updated by Denis Kozadaev over 1 year ago
Update:
the system has only one Mellanox dual-port 25G card.
A DEBUG build was also tested.