Project

General

Profile

Actions

Bug #13644

closed

ixgbe_ring_tx() has an invalid VERIFY under memory pressure

Added by Dan McDonald 9 months ago. Updated 9 months ago.

Status:
Closed
Priority:
Normal
Assignee:
Category:
driver - device drivers
Start date:
Due date:
% Done:

100%

Estimated time:
Difficulty:
Bite-size
Tags:
Gerrit CR:

Description

Consider: https://github.com/illumos/illumos-gate/blob/b2761fb273089c452ca34297d7ab4a1d1c1f1012/usr/src/uts/common/io/ixgbe/ixgbe_tx.c#L416-L420

The VERIFY will dereference a NULL pointer (tcb) IF AND ONLY IF the very first call to ixgbe_tx_copy() fails because of low-memory allocation failures. If it fails the very first time, "tcb" will never get assigned a non-NULL value and therefore the VERIFY will dereference a NULL pointer.

Introduced in #12849 and the fix is to either update the VERIFY or remove it.


Related issues

Related to illumos gate - Bug #12849: ixgbe can do unnecessary allocations during txClosedJason King

Actions
Actions #1

Updated by Jason King 9 months ago

I think we can safely remove the VERIFY altogether. It was added early-on when trying to refactor the TX path in #12849 due to all the inscrutable pointer manipulations that were occurring. In the revised code, there is exactly one place where tcp->mp is assigned (just a few lines above it), so it's not really as big a concern now.

Actions #2

Updated by Dan McDonald 9 months ago

USE WITH CAUTION AND ONLY ON AN ixgbe you know to include 12849's fix...
A hotpatch using mdb -kw that circumvents the VERIFY altogether:

> ixgbe_ring_tx+0x1b7::write -l 2 2977
ixgbe_ring_tx+0x1b7:            0x8b48  =       0x2977

As a single command:

echo "ixgbe_ring_tx+0x1b7::write -l 2 2977" | mdb -kw
Actions #3

Updated by Dan McDonald 9 months ago

  • Related to Bug #12849: ixgbe can do unnecessary allocations during tx added
Actions #4

Updated by Electric Monk 9 months ago

  • Gerrit CR set to 1349
Actions #5

Updated by Joshua M. Clulow 9 months ago

Or, even:

mdb -kwe 'ixgbe_ring_tx+0x1b7::write -l 2 2977'
Actions #6

Updated by Jason King 9 months ago

A customer that was experiencing this problem was given the workaround which disables the VERIFY check being removed by this change. With that in place, they are no longer experiencing any panics.

We did try to see if we could determine a specific type of workload that could consistently trigger this particular pathology, but unfortunately there was nothing that stood out other than just dumb luck.

Actions #7

Updated by Electric Monk 9 months ago

  • Status changed from New to Closed
  • % Done changed from 0 to 100

git commit 1dd392eb86f73b4fcf06627115882e9b0bf27e8c

commit  1dd392eb86f73b4fcf06627115882e9b0bf27e8c
Author: Jason King <jason.king@joyent.com>
Date:   2021-03-19T16:51:02.000Z

    13644 ixgbe_ring_tx() has an invalid VERIFY under memory pressure
    Reviewed by: Dan McDonald <danmcd@joyent.com>
    Reviewed by: Toomas Soome <tsoome@me.com>
    Approved by: Robert Mustacchi <rm@fingolfin.org>

Actions

Also available in: Atom PDF