Project

General

Profile

Actions

Bug #13207

closed

Creating an aggr with more than 128 Tx or Rx rings panics

Added by Paul Winder almost 3 years ago. Updated almost 3 years ago.

Status:
Closed
Priority:
Normal
Assignee:
Category:
networking
Start date:
Due date:
% Done:

100%

Estimated time:
Difficulty:
Medium
Tags:
Gerrit CR:
External Bug:

Description

The easiest way to trigger this is to create an aggr with more than 2 links, and at least 2 of those links are mlxcx. Eg

dladm create-aggr -l mlxcx0 -l mlxcx1 -l mlxcx2 aggr0

When you do this the system panic with an assertion failure:
assertion failed: aggr_add_pseudo_tx_group(port, &grp->lg_tx_group) == 0 (0x5 == 0x0), file: ../../common/io/aggr/aggr_grp.c, line: 1550

And then if the link was permanent (ie not created with the "-t" option) the system goes into panic<->reboot loop.


Related issues

Related to illumos gate - Bug #13208: Create aggr fails when underlying links have more than 128 Tx ringsClosedPaul Winder

Actions
Actions #1

Updated by Paul Winder almost 3 years ago

The error is caused by EIO returned by aggr_add_pseudo_tx_ring(). EIO is returned when the MAX_RINGS_PER_GROUP limit (128) is reached. Two issues:
  1. Why is the limit reached?
  2. The system does not need to panic.

The limit is reached because mlxcx has a default of 64 rings per group, so after 2 links in the aggr MAX_RINGS_PER_GROUP is reached. That default is too high and will be addressed in another bug report.

The code in aggr_grp_create() needs to be fixed to remove the VERIFY's.
This snippet:

        /*
         * Attach each port if necessary.
         */
        for (port = grp->lg_ports; port != NULL; port = port->lp_next) {
                /*
                 * Create the pseudo ring for each HW ring of the
                 * underlying port. Note that this is done after the
                 * aggr registers its MAC.
                 */
                VERIFY3S(aggr_add_pseudo_tx_group(port, &grp->lg_tx_group),
                    ==, 0);

                for (i = 0; i < grp->lg_rx_group_count; i++) {
                        VERIFY3S(aggr_add_pseudo_rx_group(port,
                            &grp->lg_rx_groups[i]), ==, 0);
                }

                if (aggr_port_notify_link(grp, port))
                        link_state_changed = B_TRUE;

                /*
                 * Initialize the callback functions for this port.
                 */
                aggr_port_init_callbacks(port);
        }

Actions #2

Updated by Paul Winder almost 3 years ago

  • Related to Bug #13208: Create aggr fails when underlying links have more than 128 Tx rings added
Actions #3

Updated by Paul Winder almost 3 years ago

I tested by creating an aggr with too many mlxcx links and (before fixing #13208), dladm now fails:

# dladm create-aggr -t  -l mlxcx0 -l mlxcx1 -l mlxcx2 aggr0
dladm: create operation failed: I/O error

I also hard code an error to simulate the failure at what was the 2nd VERIFY to confirm that failure path worked.

Creating aggr with 2 mlxcx was succesful, and aggr with more than 2 of other NIC varieties was successful

Actions #4

Updated by Electric Monk almost 3 years ago

  • Gerrit CR set to 962
Actions #5

Updated by Electric Monk almost 3 years ago

  • Status changed from New to Closed
  • % Done changed from 0 to 100

git commit adc528899caad05412c9b8d397e277900adf786b

commit  adc528899caad05412c9b8d397e277900adf786b
Author: Paul Winder <paul@winder.uk.net>
Date:   2020-12-08T18:36:43.000Z

    13207 Creating an aggr with more than 128 Tx or Rx rings panics
    Reviewed by: Robert Mustacchi <rm@fingolfin.org>
    Reviewed by: Garrett D'Amore <garrett@damore.org>
    Approved by: Dan McDonald <danmcd@joyent.com>

Actions

Also available in: Atom PDF