Project

General

Profile

Actions

Bug #14766

closed

panic when destroying a vnic

Added by Garrett D'Amore 5 months ago. Updated 5 months ago.

Status:
Closed
Priority:
High
Category:
networking
Start date:
Due date:
% Done:

100%

Estimated time:
Difficulty:
Bite-size
Tags:
Gerrit CR:
External Bug:

Description

Our QA team discovered a panic condition when tearing down a VNIC, if the operation is aborted.

Essentially, they did control-C of the dladm command to destroy the VNIC. This was a lot easier on an i40e instance with a lot of rings, because the tear-down takes a while.

The problem is a duplicate free of one of the mac address handles. This leads to a kernel panic.


Related issues

Related to illumos gate - Feature #5030: VNICs should support a variable MTUClosedRobert Mustacchi2014-07-19

Actions
Actions #1

Updated by Garrett D'Amore 5 months ago

The relevant fix I made for this is here:

+++ b/usr/src/uts/common/io/vnic/vnic_dev.c
@@ -23,6 +23,7 @@
  * Copyright 2018 Joyent, Inc.
  * Copyright 2016 OmniTI Computer Consulting, Inc. All rights reserved.
  * Copyright 2020 OmniOS Community Edition (OmniOSce) Association.
+ * Copyright 2022 RackTop Systems, Inc.
  */

 #include <sys/types.h>
@@ -571,8 +572,6 @@ vnic_dev_create(datalink_id_t vnic_id, datalink_id_t linkid,

        err = dls_devnet_create(vnic->vn_mh, vnic->vn_id, crgetzoneid(credp));
        if (err != 0) {
-               VERIFY(is_anchor || mac_margin_remove(vnic->vn_lower_mh,
-                   vnic->vn_margin) == 0);
                if (!is_anchor) {
                        VERIFY(mac_mtu_remove(vnic->vn_lower_mh,
                            vnic->vn_mtu) == 0);
Actions #2

Updated by Garrett D'Amore 5 months ago

Some of our internal commentary:

All
Comments
Work Log
History
Activity
Ascending order - Click to sort in descending order
Permalink Edit Delete
gdamore
Garrett D'Amore added a comment - 05/Mar/22 11:19 AM
This looks like a duplicate free of resources on an error path:

VERIFY(is_anchor || mac_margin_remove(vnic->vn_lower_mh,
vnic->vn_margin) 0);
if (!is_anchor) {
VERIFY(mac_mtu_remove(vnic->vn_lower_mh,
vnic->vn_mtu) 0);
VERIFY(mac_margin_remove(vnic->vn_lower_mh,
vnic->vn_margin) == 0);
}
Permalink Edit Delete
gdamore
Garrett D'Amore added a comment - 05/Mar/22 11:20 AM
So the problem is that the first VERIFY means that the code inside the anchor check is duplicate, so we wind up calling mac_margin_remove() twice.
Permalink Edit Delete
gdamore
Garrett D'Amore added a comment - 05/Mar/22 11:21 AM
I don't think this is i40e specific, but it might be easier to trigger there – we don't know why the error actually occurred.
Actions #3

Updated by Electric Monk 5 months ago

  • Gerrit CR set to 2205
Actions #4

Updated by Garrett D'Amore 5 months ago

  • % Done changed from 0 to 90
Actions #5

Updated by Dan McDonald 5 months ago

  • Related to Feature #5030: VNICs should support a variable MTU added
Actions #6

Updated by Garrett D'Amore 5 months ago

  • Status changed from In Progress to Pending RTI
Actions #7

Updated by Joshua M. Clulow 5 months ago

Testing Notes (from RTI mail)

This has been part of our shipping product for a bit, and was tested and verified by formal QA here at RackTop. (Easily reproduced on i40e by control-C during a vnic teardown.)

Actions #8

Updated by Electric Monk 5 months ago

  • Status changed from Pending RTI to Closed
  • % Done changed from 90 to 100

git commit d5d1522f61a203e5d41826580ab94c4ffe4c8a37

commit  d5d1522f61a203e5d41826580ab94c4ffe4c8a37
Author: Garrett D'Amore <garrett@damore.org>
Date:   2022-06-29T23:18:49.000Z

    14766 panic when destroying a vnic
    Reviewed by: Dan McDonald <danmcd@mnx.io>
    Reviewed by: Toomas Soome <tsoome@me.com>
    Reviewed by: Jerry Jelinek <gjelinek@gmail.com>
    Reviewed by: Andrew Stormont <andyjstormont@gmail.com>
    Reviewed by: Jim Johnson <jjohnson@racktopsystems.com>
    Approved by: Gordon Ross <gordon.w.ross@gmail.com>

Actions

Also available in: Atom PDF