Bug #13218

closed

"Stack smashing detected" panic when creating vnic over aggr with 4 mlxcx links

Added by Paul Winder almost 3 years ago. Updated almost 3 years ago.

Status: Closed
Priority: Normal
Assignee:
Category: -
Start date:
Due date:
% Done: 100%
Estimated time:
Difficulty: Medium
Tags:
Gerrit CR:
External Bug:

Description

Created an aggr, similar to:

dladm create-aggr -t -l mlxcx0 -l mlxcx1 -l mlxcx2 -l mlxcx3 aggr0

and then a vnic over it:

dladm create-vnic -t -l aggr0 vnic0

and the system panicked with the stack:
> $C
ffffd002e3d25210 vpanic()
ffffd002e3d25220 0xfffffffffb85c822()
ffffd002e3d25490 0xfffffffffb9cb0b6()
ffffd002e3d25540 mac_fanout_setup+0x65(ffffd095a3cef390, ffffd095841650c8, ffffd095841650d4, fffffffffb9d9c00, ffffd095a3cef390, 0, 0)
ffffd002e3d255c0 mac_srs_group_setup+0xf7(ffffd095a3cef390, ffffd095841650c8, 1)
ffffd002e3d256a0 mac_datapath_setup+0x70d(ffffd095a3cef390, ffffd095841650c8, 1)
ffffd002e3d25750 mac_client_datapath_setup+0x298(ffffd095a3cef390, 0, ffffd002e3d25b74, ffffd0953ee26000, 0, ffffd09484a4f8a8)
ffffd002e3d25800 i_mac_unicast_add+0x58e(ffffd095a3cef390, ffffd002e3d25b74, 0, ffffd09a1f574d80, 0, ffffd002e3d258b0)
ffffd002e3d25880 mac_unicast_add+0x6e(ffffd095a3cef390, ffffd002e3d25b74, 0, ffffd09a1f574d80, 0, ffffd002e3d258b0)
ffffd002e3d25920 vnic_unicast_add+0x1ff(ffffd09a1f574c60, 3, ffffd002e3d25b68, 3, ffffd002e3d25b64, ffffd002e3d25b74, 4300000000, ffffd002e3d25b70, ffffd00200000000, 0)
ffffd002e3d25b00 vnic_dev_create+0x3d9(b, a, ffffd002e3d25b6c, ffffd002e3d25b64, ffffd002e3d25b74, ffffd002e3d25b68, ffffffff00000003, 0, ffffd09a00000000, 0, ffffd09595d76044, ffffd09300000000, ffffd002e3d25b70, ffffd0952a46c1d0)
ffffd002e3d25be0 vnic_ioc_create+0xda(ffffd09595d76000, 8041550, 100003, ffffd0952a46c1d0, ffffd002e3d25dd8)
ffffd002e3d25c80 drv_ioctl+0x1ef(1200000000, 1710001, 8041550, 100003, ffffd0952a46c1d0, ffffd002e3d25dd8)
ffffd002e3d25cc0 cdev_ioctl+0x2b(1200000000, 1710001, 8041550, 100003, ffffd0952a46c1d0, ffffd002e3d25dd8)
ffffd002e3d25d10 spec_ioctl+0x45(ffffd0947d365880, 1710001, 8041550, 100003, ffffd0952a46c1d0, ffffd002e3d25dd8, 0)
ffffd002e3d25da0 fop_ioctl+0x5b(ffffd0947d365880, 1710001, 8041550, 100003, ffffd0952a46c1d0, ffffd002e3d25dd8, 0)
ffffd002e3d25ec0 ioctl+0x153(3, 1710001, 8041550)
ffffd002e3d25f10 _sys_sysenter_post_swapgs+0x14f()

Recreated the panic with dtrace -m mac running. The trace buffer recovered from the dump shows:
> ffffd09519a960c0::dtrace -c f
CPU     ID                    FUNCTION:NAME
 15  42345            mac_start_ring:return 
 15  42147          mac_hwring_start:return 
 15  42345            mac_start_ring:return 
.
.
 15  41142          mac_flow_cpu_init:entry 
 15  41136 mac_compute_soft_ring_count:entry 
 15  41888        mac_client_stat_get:entry 
 15  41070         mac_client_ifspeed:entry 
 15  42322               mac_stat_get:entry 
 15  42323              mac_stat_get:return 
 15  41071        mac_client_ifspeed:return 
 15  41889       mac_client_stat_get:return 
 15  41137 mac_compute_soft_ring_count:return 
 15  41130          mac_next_bind_cpu:entry 
 15  41131         mac_next_bind_cpu:return 
.
.
 15  41130          mac_next_bind_cpu:entry 
 15  41131         mac_next_bind_cpu:return 
 15  41138            mac_tx_cpu_init:entry 
 15  41139           mac_tx_cpu_init:return 

It looks like mac_flow_cpu_init() never returned. The relevant code:
static void
mac_flow_cpu_init(flow_entry_t *flent, cpupart_t *cpupart)
{
        mac_soft_ring_set_t *rx_srs;
        processorid_t cpuid;
        int i, j, k, srs_cnt, nscpus, maxcpus, soft_ring_cnt = 0;
        mac_cpus_t *srs_cpu;
        mac_resource_props_t *emrp = &flent->fe_effective_props;
        uint32_t cpus[MRP_NCPUS];
.
.
.
        nscpus = 0;
        for (srs_cnt = 0; srs_cnt < flent->fe_rx_srs_cnt; srs_cnt++) {
                rx_srs = flent->fe_rx_srs[srs_cnt];
                srs_cpu = &rx_srs->srs_cpu;
                for (j = 0; j < srs_cpu->mc_ncpus; j++) {
                        cpus[nscpus++] = srs_cpu->mc_cpus[j];
                }
        }

cpus[] is an array on the stack, and nothing in the loop guards against writing past its end. MRP_NCPUS is defined as 128.
A look at the flow_entry_t:
> ffffd095841650c8::print -at flow_entry_t fe_rx_srs fe_rx_srs_cnt
ffffd0958416b698 void *[128] fe_rx_srs = [ 0xffffd0979de95cc0, 0xffffd0979de95000, 0xffffd09a1f247340, 0xffffd09a1f246680, 0xffffd09a1f2459c0, 0xffffd09a1f244d00, 0xffffd09a1f244040, 0xffffd09a1f243300, 0xffffd09a1f242640, 0xffffd09a1f241980, 0xffffd09a1f240cc0, 0xffffd09a1f240000, 0xffffd09a1f23f340, 0xffffd09a1f23e680, 0xffffd09a1f23d9c0, 0xffffd09a1f23cd00, 0xffffd09a1f23c040, 0xffffd09a1f23b300, 0xffffd09a1f23a640, 0xffffd09a1f239980, 0xffffd09a1f238cc0, 0xffffd09a1f238000, 0xffffd09a1f237340, 0xffffd09a1f236680, 0xffffd09a1f2359c0, 0xffffd09a1f234d00, 0xffffd09a1f234040, 0xffffd09a1f233300, 0xffffd09a1f232640, 0xffffd09a1f231980, 0xffffd09a1f230cc0, 0xffffd09a1f230000, ... ]
ffffd0958416ba98 int fe_rx_srs_cnt = 0x41

fe_rx_srs_cnt is 0x41 (65), and if we look at mc_ncpus for each mc_cpus[] array we see:
> ffffd0958416b698::array "void *" 0x41|::print -t 'void *'|::print -t mac_soft_ring_set_t srs_cpu.mc_ncpus
uint32_t srs_cpu.mc_ncpus = 0x2
uint32_t srs_cpu.mc_ncpus = 0x2
uint32_t srs_cpu.mc_ncpus = 0x2
uint32_t srs_cpu.mc_ncpus = 0x2
.
.
.
uint32_t srs_cpu.mc_ncpus = 0x2
uint32_t srs_cpu.mc_ncpus = 0x2

mc_ncpus is 0x2 for each of the 0x41 array entries, so nscpus in the code snippet ends up as 0x82 = 130. That is greater than MRP_NCPUS (128), hence the array overflow and stack corruption.
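
For illustration only, here is a minimal user-space sketch of one way such a copy loop could be bounded. It is not the change that was committed for this bug. The names sketch_srs_t, SKETCH_NCPUS, sketch_collect_cpus() and sketch_dump_case() are hypothetical stand-ins, not illumos definitions. The only values taken from the analysis above are the limit of 128 and the 0x41 SRSes with 2 CPUs each.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the kernel types; not the illumos definitions. */
#define	SKETCH_NCPUS	128	/* mirrors the MRP_NCPUS value quoted above */

typedef struct sketch_srs {
	uint32_t ncpus;			/* plays the role of srs_cpu.mc_ncpus */
	uint32_t cpus[SKETCH_NCPUS];	/* plays the role of srs_cpu.mc_cpus[] */
} sketch_srs_t;

/*
 * Copy CPU ids from every SRS into out[], never writing past out_len.
 * Returns the number of entries an unguarded loop would have written,
 * so the caller can detect truncation (return value > out_len).
 */
static size_t
sketch_collect_cpus(const sketch_srs_t *srs, size_t srs_cnt,
    uint32_t *out, size_t out_len)
{
	size_t wanted = 0;

	assert(out != NULL || out_len == 0);

	for (size_t i = 0; i < srs_cnt; i++) {
		for (uint32_t j = 0; j < srs[i].ncpus; j++) {
			/* The guard that the loop in the snippet lacks. */
			if (wanted < out_len)
				out[wanted] = srs[i].cpus[j];
			wanted++;
		}
	}
	return (wanted);
}

/* Rebuild the shape seen in the dump: 0x41 SRSes with 2 CPUs each. */
static size_t
sketch_dump_case(uint32_t *out, size_t out_len)
{
	static sketch_srs_t srs[0x41];

	for (size_t i = 0; i < 0x41; i++) {
		srs[i].ncpus = 2;
		srs[i].cpus[0] = (uint32_t)(2 * i);
		srs[i].cpus[1] = (uint32_t)(2 * i + 1);
	}
	return (sketch_collect_cpus(srs, 0x41, out, out_len));
}
```

Calling sketch_dump_case() with a 128-entry buffer returns 130 while storing only the first 128 ids. The two entries that do not fit correspond to the writes that ran past cpus[] in the panic.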


Related issues

Related to illumos gate - Bug #13222: Increase maximum number of fanout CPUs from 128 to 256 (Closed, Paul Winder)

Actions #1

Updated by Electric Monk almost 3 years ago

  • Gerrit CR set to 972
Actions #2

Updated by Paul Winder almost 3 years ago

  • Gerrit CR changed from 972 to 973
Actions #3

Updated by Paul Winder almost 3 years ago

  • Related to Bug #13222: Increase maximum number of fanout CPUs from 128 to 256 added
Actions #4

Updated by Electric Monk almost 3 years ago

  • Status changed from In Progress to Closed
  • % Done changed from 0 to 100

git commit 3714f7be8e09c39a0ea7ce7ef44cb495ce250913

commit  3714f7be8e09c39a0ea7ce7ef44cb495ce250913
Author: Paul Winder <paul@winder.uk.net>
Date:   2020-12-16T14:23:43.000Z

    13218 "Stack smashing detected" panic when creating vnic over aggr with 4 mlxcx links
    13222 Increase maximum number of fanout CPUs from 128 to 256
    Reviewed by: Robert Mustacchi <rm@fingolfin.org>
    Reviewed by: Garrett D'Amore <garrett@damore.org>
    Approved by: Dan McDonald <danmcd@joyent.com>
