Bug #12797

mlxcx max flow table limit can be exceeded

Added by Paul Winder about 2 months ago. Updated 4 days ago.

Status:
Closed
Priority:
Normal
Assignee:
Category:
driver - device drivers
Start date:
Due date:
% Done:

100%

Estimated time:
Difficulty:
Medium
Tags:
Gerrit CR:

Description

During testing with an Atto ConnectX-4 based card, the system panicked after a failure to allocate a flow table.
The panic happened in the recovery path. We need to do two things: prevent the panic, and ensure we don't try to allocate more flow tables than the hardware supports.


Related issues

Related to illumos gate - Bug #12798: mlxcx command interface should allow concurrent commands and be interrupt driven (Closed)
Related to illumos gate - Bug #12799: mlxcx #if defined for MAC_VLAN_UNTAGGED is redundant (Closed)
Related to illumos gate - Bug #12890: mlxcx uses excessive stack space causing stack overflow panic (Closed)

History

#1

Updated by Paul Winder about 2 months ago

  • Gerrit CR set to 705
#2

Updated by Paul Winder 14 days ago

There are two issues: a panic during recovery from an allocation failure, and an attempt to create too many flow tables.

1. I no longer have the original panic, but during flow table/group setup the data structure for the flow table was allocated and then the command to allocate the flow table in the hardware failed. The recovery code noticed the allocated data structure but also assumed the subordinate flow group had been allocated. It had not, so the system panicked attempting to free from a NULL pointer.

2. The setup code assumed there would be enough flow table entries to cover the number of groups being created. On this particular device there were not. In mlxcx_calc_rx_ngroups() the number of groups is already limited based on device capabilities. This has been extended so that we do not create so many groups that the maximum number of flow tables is exceeded. We require one flow table per port and two per group.
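
The two fixes described above can be illustrated with a minimal user-space sketch. This is not the driver code: every name prefixed with `sketch_` or `SKETCH_` is hypothetical, the structures are simplified stand-ins, and only the "one flow table per port and two per group" budget comes from the comment above. The first function clamps a requested group count so the flow table total stays within a device maximum; the second tears down a flow table without assuming its subordinate group was ever allocated.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Budget from the ticket text: one flow table per port, two per group. */
#define	SKETCH_FT_PER_PORT	1
#define	SKETCH_FT_PER_GROUP	2

/*
 * Clamp the requested number of groups so that the flow tables needed
 * (per-port tables plus per-group tables) never exceed max_ft.
 * Hypothetical helper; the real limit lives in mlxcx_calc_rx_ngroups().
 */
static unsigned int
sketch_clamp_ngroups(unsigned int want, unsigned int nports,
    unsigned int max_ft)
{
	unsigned int reserved = nports * SKETCH_FT_PER_PORT;
	unsigned int fit;

	/* Nothing left for groups once the ports are accounted for. */
	if (max_ft <= reserved)
		return (0);

	fit = (max_ft - reserved) / SKETCH_FT_PER_GROUP;
	if (want < fit)
		fit = want;

	/* Invariant: the chosen group count fits the table budget. */
	assert(reserved + fit * SKETCH_FT_PER_GROUP <= max_ft);
	return (fit);
}

/* Simplified stand-ins for the driver's flow group and flow table. */
typedef struct sketch_flow_group {
	int *fg_entries;		/* owned by the group */
} sketch_flow_group_t;

typedef struct sketch_flow_table {
	sketch_flow_group_t *ft_group;	/* NULL until the group exists */
} sketch_flow_table_t;

/*
 * Tear down a flow table after a failed setup. The group is only
 * dereferenced if it was actually allocated. Returns the number of
 * top-level objects released (table and, if present, group).
 */
static int
sketch_teardown_flow_table(sketch_flow_table_t *ft)
{
	int released = 0;

	if (ft == NULL)
		return (0);

	if (ft->ft_group != NULL) {
		free(ft->ft_group->fg_entries);
		free(ft->ft_group);
		ft->ft_group = NULL;
		released++;
	}

	free(ft);
	return (released + 1);
}
```

With these assumed numbers, a device allowing 10 flow tables and exposing 2 ports has room for (10 - 2) / 2 = 4 groups, so a request for 16 groups would be reduced to 4.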

#3

Updated by Dan McDonald 5 days ago

  • Related to Bug #12798: mlxcx command interface should allow concurrent commands and be interrupt driven added
#4

Updated by Dan McDonald 5 days ago

  • Related to Bug #12799: mlxcx #if defined for MAC_VLAN_UNTAGGED is redundant added
#5

Updated by Dan McDonald 5 days ago

  • Related to Bug #12890: mlxcx uses excessive stack space causing stack overflow panic added
#6

Updated by Paul Winder 4 days ago

Testing:
  • The changes are from Racktop's production version of the driver, and as such have been through Racktop QA.
  • Specifically for this ticket, the fix was confirmed against the Atto card that induced the fault.
#7

Updated by Electric Monk 4 days ago

  • Status changed from In Progress to Closed
  • % Done changed from 80 to 100

git commit 5f0e3176f407dfb9d989b5dcc94a6d5384d0b142

commit  5f0e3176f407dfb9d989b5dcc94a6d5384d0b142
Author: Paul Winder <pwinder@racktopsystems.com>
Date:   2020-07-10T15:41:30.000Z

    12797 mlxcx max flow table limit can be exceeded
    12798 mlxcx command interface should allow concurrent commands and be interrupt driven
    12799 mlxcx #if defined for MAC_VLAN_UNTAGGED is redundant
    12890 mlxcx uses excessive stack space causing stack overflow panic
    Reviewed by: Garrett D'Amore <garrett@damore.org>
    Reviewed by: Andy Stormont <astormont@racktopsystems.com>
    Reviewed by: Robert Mustacchi <rm@fingolfin.org>
    Reviewed by: Alex Wilson <alex@cooperi.net>
    Approved by: Dan McDonald <danmcd@joyent.com>
