Project

General

Profile

Bug #13207

Creating an aggr with more than 128 Tx or Rx rings panics

Added by Paul Winder 22 days ago. Updated 22 days ago.

Status:
New
Priority:
Normal
Assignee:
Category:
networking
Start date:
Due date:
% Done:

0%

Estimated time:
Difficulty:
Medium
Tags:
Gerrit CR:

Description

The easiest way to trigger this is to create an aggr with more than 2 links, and at least 2 of those links are mlxcx. Eg

dladm create-aggr -l mlxcx0 -l mlxcx1 -l mlxcx2 aggr0

When you do this the system panic with an assertion failure:
assertion failed: aggr_add_pseudo_tx_group(port, &grp->lg_tx_group) == 0 (0x5 == 0x0), file: ../../common/io/aggr/aggr_grp.c, line: 1550

And then if the link was permanent (ie not created with the "-t" option) the system goes into panic<->reboot loop.


Related issues

Related to illumos gate - Bug #13208: Create aggr fails when underlying links have more than 128 Tx ringsNewPaul Winder

Actions
#1

Updated by Paul Winder 22 days ago

The error is caused by EIO returned by aggr_add_pseudo_tx_ring(). EIO is returned when the MAX_RINGS_PER_GROUP limit (128) is reached. Two issues:
  1. Why is the limit reached?
  2. The system does not need to panic.

The limit is reached because mlxcx has a default of 64 rings per group, so after 2 links in the aggr MAX_RINGS_PER_GROUP is reached. That default is too high and will be addressed in another bug report.

The code in aggr_grp_create() needs to be fixed to remove the VERIFY's.
This snippet:

        /*
         * Attach each port if necessary.
         */
        for (port = grp->lg_ports; port != NULL; port = port->lp_next) {
                /*
                 * Create the pseudo ring for each HW ring of the
                 * underlying port. Note that this is done after the
                 * aggr registers its MAC.
                 */
                VERIFY3S(aggr_add_pseudo_tx_group(port, &grp->lg_tx_group),
                    ==, 0);

                for (i = 0; i < grp->lg_rx_group_count; i++) {
                        VERIFY3S(aggr_add_pseudo_rx_group(port,
                            &grp->lg_rx_groups[i]), ==, 0);
                }

                if (aggr_port_notify_link(grp, port))
                        link_state_changed = B_TRUE;

                /*
                 * Initialize the callback functions for this port.
                 */
                aggr_port_init_callbacks(port);
        }

#2

Updated by Paul Winder 22 days ago

  • Related to Bug #13208: Create aggr fails when underlying links have more than 128 Tx rings added
#3

Updated by Paul Winder 22 days ago

I tested by creating an aggr with too many mlxcx links and (before fixing #13208), dladm now fails:

# dladm create-aggr -t  -l mlxcx0 -l mlxcx1 -l mlxcx2 aggr0
dladm: create operation failed: I/O error

I also hard code an error to simulate the failure at what was the 2nd VERIFY to confirm that failure path worked.

Creating aggr with 2 mlxcx was succesful, and aggr with more than 2 of other NIC varieties was successful

#4

Updated by Electric Monk 22 days ago

  • Gerrit CR set to 962

Also available in: Atom PDF