Bug #12890

mlxcx uses excessive stack space causing stack overflow panic

Added by Andy Fiddaman 4 months ago. Updated 4 months ago.

Status: Closed
Priority: Normal
Assignee:
Category: driver - device drivers
Start date:
Due date:
% Done: 100%
Estimated time:
Difficulty: Medium
Tags:
Gerrit CR:

Description

I've had a couple of boot time panics on a system with a pair of mlxcx cards. In both cases it was a double fault with a very long stack trace involving the mlxcx driver.
Here's one of them:

panic message: BAD TRAP: type=8 (#df Double fault) rp=fffffdc5cca88f10 addr=0
dump content: kernel pages and pages from PID -1
> $C
fffffb0450d6c030 tr_pftrap+0x11()
fffffb0450d6c0a0 vmem_alloc+0x1c3(fffffffffbf84b60, 20000, 0)
fffffb0450d6c1b0 vmem_xalloc+0x629(fffffdc4e1c1f000, 20000, 1000, 0, 0, 0, 0, 0)
fffffb0450d6c220 vmem_alloc+0x190(fffffdc4e1c1f000, 20000, 0)
fffffb0450d6c2b0 kmem_slab_create+0x7c(fffffdc4e1c2f008, 0)
fffffb0450d6c310 kmem_slab_alloc+0x10b(fffffdc4e1c2f008, 0)
fffffb0450d6c370 kmem_cache_alloc+0x15b(fffffdc4e1c2f008, 0)
fffffb0450d6c3e0 vmem_alloc+0x1ae(fffffdc4e1c1f000, 2000, 0)
fffffb0450d6c480 segkmem_xalloc+0xfe(fffffdc4e1c1f000, 0, 2000, 0, 0, fffffffffb8a7a50, fffffffffbfa96a0)
fffffb0450d6c4f0 segkmem_alloc_vn+0x3b(fffffdc4e1c1f000, 2000, 0, fffffffffbfa96a0)
fffffb0450d6c520 segkmem_alloc+0x17(fffffdc4e1c1f000, 2000, 0)
fffffb0450d6c630 vmem_xalloc+0x629(fffffdc4e1c20000, 2000, 1000, 0, 0, 0, 0, ffffffff00000000)
fffffb0450d6c6a0 vmem_alloc+0x190(fffffdc4e1c20000, 2000, 0)
fffffb0450d6c730 kmem_slab_create+0x7c(fffffdc551804008, 0)
fffffb0450d6c790 kmem_slab_alloc+0x10b(fffffdc551804008, 0)
fffffb0450d6c7f0 kmem_cache_alloc+0x15b(fffffdc551804008, 0)
fffffb0450d6c870 rootnex_coredma_allochdl+0x3f(fffffdc551081d50, fffffdc5510fd7f8, fffffb0450d6ca48, 1, 0, fffffdc5cfc1eb38)
fffffb0450d6c8f0 rootnex_dma_allochdl+0xd5(fffffdc551081d50, fffffdc5510fd7f8, fffffb0450d6ca48, 1, 0, fffffdc5cfc1eb38)
fffffb0450d6c960 ddi_dma_allochdl+0x62(fffffdc551081008, fffffdc5510fd7f8, fffffb0450d6ca48, 1, 0, fffffdc5cfc1eb38)
fffffb0450d6c9c0 pcieb_dma_allochdl+0x3e(fffffdc551081008, fffffdc5510fd7f8, fffffb0450d6ca48, 1, 0, fffffdc5cfc1eb38)
fffffb0450d6ca30 ddi_dma_allochdl+0x62(fffffdc5510fd7f8, fffffdc5510fd7f8, fffffb0450d6ca48, 1, 0, fffffdc5cfc1eb38)
fffffb0450d6cae0 ddi_dma_alloc_handle+0x7c(fffffdc5510fd7f8, fffffb0450d6cbc8, 1, 0, fffffdc5cfc1eb38)
fffffb0450d6cb90 mlxcx_dma_alloc+0x72(fffffdc5d691e040, fffffdc5cfc1eb18, fffffb0450d6cbc8, fffffb0450d6cbc2, 1, 240, 1)
fffffb0450d6cc70 mlxcx_cmd_mbox_alloc+0xb7(fffffdc5d691e040, fffffb0450d6cd70, 9)
fffffb0450d6cd00 mlxcx_cmd_send+0x79(fffffdc5d691e040, fffffb0450d6cd38, fffffb0450d6cdc8, 1010, fffffb0450d6ddd8, 10)
fffffb0450d6ee40 mlxcx_cmd_give_pages+0x154(fffffdc5d691e040, 1, 200, fffffb0450d6eef8)
fffffb0450d6ff40 mlxcx_give_pages+0x14f(fffffdc5d691e040, 11a9)
fffffb0450d6ff80 mlxcx_init_pages+0x42(fffffdc5d691e040, 2)
fffffb0450d6ffc0 mlxcx_attach+0x17f(fffffdc5510fd7f8, 0)
fffffb0450d70040 devi_attach+0xa1(fffffdc5510fd7f8, 0)
fffffb0450d70080 attach_node+0x8b(fffffdc5510fd7f8)
fffffb0450d700d0 i_ndi_config_node+0x12c(fffffdc5510fd7f8, 6, 0)
fffffb0450d70100 i_ddi_attachchild+0x3a(fffffdc5510fd7f8)
fffffb0450d70140 devi_attach_node+0x5d(fffffdc5510fd7f8, 4080)
fffffb0450d70210 devi_config_one+0x3ae(fffffdc551081008, fffffdc5cdb5fc00, fffffb0450d702e0, 4080, 0)
fffffb0450d70290 ndi_devi_config_one+0xad(fffffdc551081008, fffffdc5cdb5fc00, fffffb0450d702e0, 4080)
fffffb0450d70370 resolve_pathname+0x14f(fffffb0450d707c8, fffffb0450d70380, 0, 0)
fffffb0450d703a0 e_ddi_hold_devi_by_path+0x24(fffffb0450d707c8, 0)
fffffb0450d70c10 ufm_do_getcaps+0x95(8046724, 100001)
fffffb0450d70c70 ufm_ioctl+0xa3(ec00000000, 75666d01, 8046724, 100001, fffffdc5d6a70d20, fffffb0450d70dc8)
fffffb0450d70cb0 cdev_ioctl+0x2b(ec00000000, 75666d01, 8046724, 100001, fffffdc5d6a70d20, fffffb0450d70dc8)
fffffb0450d70d00 spec_ioctl+0x45(fffffdc5cf9b7140, 75666d01, 8046724, 100001, fffffdc5d6a70d20, fffffb0450d70dc8, 0)
fffffb0450d70d90 fop_ioctl+0x5b(fffffdc5cf9b7140, 75666d01, 8046724, 100001, fffffdc5d6a70d20, fffffb0450d70dc8, 0)
fffffb0450d70eb0 ioctl+0x153(b, 75666d01, 8046724)
fffffb0450d70f10 sys_syscall32+0x138()

This appears to be due to stack space exhaustion. Looking at the frame addresses above, roughly three pages of stack are consumed by mlxcx_give_pages() and mlxcx_cmd_give_pages() alone.


Related issues

Related to illumos gate - Bug #12797: mlxcx max flow table limit can be exceeded (Closed)
Related to illumos gate - Bug #12798: mlxcx command interface should allow concurrent commands and be interrupt driven (Closed)
Related to illumos gate - Bug #12799: mlxcx #if defined for MAC_VLAN_UNTAGGED is redundant (Closed)

History

#1

Updated by Andy Fiddaman 4 months ago

  • Category set to driver - device drivers
#2

Updated by Paul Winder 4 months ago

  • Status changed from New to In Progress
  • Assignee set to Paul Winder
  • Gerrit CR set to 705

The changes in change 705 remove the arrays from the structs that were chewing up this stack space.

#3

Updated by Dan McDonald 4 months ago

  • Related to Bug #12797: mlxcx max flow table limit can be exceeded added
#4

Updated by Dan McDonald 4 months ago

  • Related to Bug #12798: mlxcx command interface should allow concurrent commands and be interrupt driven added
#5

Updated by Dan McDonald 4 months ago

  • Related to Bug #12799: mlxcx #if defined for MAC_VLAN_UNTAGGED is redundant added
#6

Updated by Paul Winder 4 months ago

See #12798 for testing info

#7

Updated by Electric Monk 4 months ago

  • Status changed from In Progress to Closed
  • % Done changed from 0 to 100

git commit 5f0e3176f407dfb9d989b5dcc94a6d5384d0b142

commit  5f0e3176f407dfb9d989b5dcc94a6d5384d0b142
Author: Paul Winder <pwinder@racktopsystems.com>
Date:   2020-07-10T15:41:30.000Z

    12797 mlxcx max flow table limit can be exceeded
    12798 mlxcx command interface should allow concurrent commands and be interrupt driven
    12799 mlxcx #if defined for MAC_VLAN_UNTAGGED is redundant
    12890 mlxcx uses excessive stack space causing stack overflow panic
    Reviewed by: Garrett D'Amore <garrett@damore.org>
    Reviewed by: Andy Stormont <astormont@racktopsystems.com>
    Reviewed by: Robert Mustacchi <rm@fingolfin.org>
    Reviewed by: Alex Wilson <alex@cooperi.net>
    Approved by: Dan McDonald <danmcd@joyent.com>
