Bug #12980

closed

attempting to change MTU on mlxcx based aggregation can induce FMA event

Added by Paul Winder 10 months ago. Updated 8 months ago.

Status: Closed
Priority: Normal
Assignee:
Category: -
Start date:
Due date:
% Done: 100%
Estimated time:
Difficulty: Medium
Tags:
Gerrit CR:

Description

Aggregations will let you attempt to change the MTU whilst they are in use; to do that, they stop and restart the appropriate rings. If you attempt this on an mlxcx aggr which is under heavy load, you can induce an FMA event and a corresponding syslog message.
The NIC remains online and functional, and is not flagged for retirement, but after looking at the cause I suspect there is also a risk of a panic through bad pointer references.
The FMA event is similar to:

Jul 24 2020 11:50:28.714272905 ereport.io.device.inval_state
nvlist version: 0
        class = ereport.io.device.inval_state
        ena = 0x2d29cd1f7c203c01
        detector = (embedded nvlist)
        nvlist version: 0
                version = 0x0
                scheme = dev
                device-path = /pci@0,0/pci8086,6f08@3/pci117c,b4@0
        (end detector)

        __ttl = 0x1
        __tod = 0x5f1acb04 0x2a92f089

and messages:
2020-07-24T09:45:17.627768+00:00 localhost mlxcx: [ID 887215 kern.warning] WARNING: mlxcx0: got completion on CQ 17 but no buffer matching wqe found: 3c3d (first buffer counter = ffffffff)
2020-07-24T09:45:48.933365+00:00 localhost mlxcx: [ID 887215 kern.warning] WARNING: mlxcx0: got completion on CQ 16 but no buffer matching wqe found: 1598 (first buffer counter = ffffffff)

To induce this you need to create a set of links similar to:
# dladm show-link
mlxcx0      phys      9000   up       --         --
mlxcx1      phys      9000   up       --         --
aggr0       aggr      9000   up       --         mlxcx0,mlxcx1
data0       vnic      8000   up       --         aggr0

Notice the smaller MTU on the data0 vnic. It is this which allows us to attempt the MTU update and causes the rings to be stopped and restarted.
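The precondition on the vnic MTU can be checked mechanically. The following is a minimal sketch, assuming the six-column row layout shown in the dladm show-link output above (link, class, MTU, state, bridge, over); smaller_mtu_vnics is a hypothetical helper name, not a dladm feature.

```shell
# Hypothetical helper: read `dladm show-link`-style rows on stdin and print
# each vnic configured over the named link whose MTU is below that link's MTU.
# Assumes columns: LINK CLASS MTU STATE BRIDGE OVER.
smaller_mtu_vnics() {
    awk -v parent="$1" '
        $1 == parent { pmtu = $3 }                       # MTU of the parent link
        $2 == "vnic" && $6 == parent { mtu[$1] = $3 }    # vnics over the parent
        END {
            for (v in mtu)
                if (pmtu != "" && mtu[v] + 0 < pmtu + 0)
                    print v, mtu[v]
        }'
}
```

For example, dladm show-link | smaller_mtu_vnics aggr0 would list data0 for the layout above.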
I ran iperf as a server on the system whose MTU I was attempting to change, and as a client on another system.
Repeated running of:
dladm set-linkprop -p mtu=8000 aggr0

produced the message and FMA record in about 1 in 3 cases.


Related issues

Related to illumos gate - Bug #12987: devo_power misconfigured in mlxcx (Closed, Paul Winder)
Related to illumos gate - Bug #12988: potential hang in mlxcx when async and ring vectors end up on same CPU (Closed, Paul Winder)

Actions #1

Updated by Paul Winder 10 months ago

To test, I used two servers.
  1. on server 1, I had the aggregation and vnic configured as in the main descriptive text.
  2. on server 1, I started iperf3 -s
  3. on server 2, I started iperf3 -c <server1>
  4. on server 1, whilst iperf3 was running, I executed a (brain dead) script similar to:
    # for i in 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 4; do dladm set-linkprop -p mtu=8000 aggr0; sleep 2; done
    

and it produces output, as expected:

dladm: warning: cannot set link property 'mtu' on 'aggr0': link busy
dladm: warning: cannot set link property 'mtu' on 'aggr0': link busy
dladm: warning: cannot set link property 'mtu' on 'aggr0': link busy
dladm: warning: cannot set link property 'mtu' on 'aggr0': link busy
dladm: warning: cannot set link property 'mtu' on 'aggr0': link busy
dladm: warning: cannot set link property 'mtu' on 'aggr0': link busy
dladm: warning: cannot set link property 'mtu' on 'aggr0': link busy
dladm: warning: cannot set link property 'mtu' on 'aggr0': link busy
dladm: warning: cannot set link property 'mtu' on 'aggr0': link busy
dladm: warning: cannot set link property 'mtu' on 'aggr0': link busy
dladm: warning: cannot set link property 'mtu' on 'aggr0': link busy
dladm: warning: cannot set link property 'mtu' on 'aggr0': link busy
dladm: warning: cannot set link property 'mtu' on 'aggr0': link busy
dladm: warning: cannot set link property 'mtu' on 'aggr0': link busy
dladm: warning: cannot set link property 'mtu' on 'aggr0': link busy
dladm: warning: cannot set link property 'mtu' on 'aggr0': link busy
dladm: warning: cannot set link property 'mtu' on 'aggr0': link busy
dladm: warning: cannot set link property 'mtu' on 'aggr0': link busy
dladm: warning: cannot set link property 'mtu' on 'aggr0': link busy
dladm: warning: cannot set link property 'mtu' on 'aggr0': link busy
dladm: warning: cannot set link property 'mtu' on 'aggr0': link busy
dladm: warning: cannot set link property 'mtu' on 'aggr0': link busy
dladm: warning: cannot set link property 'mtu' on 'aggr0': link busy
dladm: warning: cannot set link property 'mtu' on 'aggr0': link busy
dladm: warning: cannot set link property 'mtu' on 'aggr0': link busy

and no FMA events nor syslog messages.

These tests were carried out with kmem_flags set to 0xf and to zero.
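To confirm that no warnings were logged during such a run, the syslog text can be scanned for the message quoted in the description. This is a small sketch, not part of the fix; count_wqe_warnings is a hypothetical helper name, and the log location on any given system is an assumption left to the caller.

```shell
# Hypothetical helper: count mlxcx "no buffer matching wqe found" warnings
# in log text supplied on stdin. Prints 0 when there are none.
count_wqe_warnings() {
    # grep -c exits non-zero when the count is 0, so mask the status.
    grep -c 'got completion on CQ [0-9]* but no buffer matching wqe found' || true
}
```

Usage might look like count_wqe_warnings < /var/adm/messages (path assumed), taking the count before and after the test loop and comparing the two. Running fmdump -e before and after is a similar check for new error reports.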

Actions #2

Updated by Paul Winder 9 months ago

  • Subject changed from attempting to change MTU on mlxcx based aggregation can induce and FMA event to attempting to change MTU on mlxcx based aggregation can induce FMA event
Actions #3

Updated by Paul Winder 9 months ago

  • Related to Bug #12987: devo_power misconfigured in mlxcx added
Actions #4

Updated by Paul Winder 9 months ago

  • Related to Bug #12988: potential hang in mlxcx when async and ring vectors end up on same CPU added
Actions #5

Updated by Electric Monk 9 months ago

  • Gerrit CR set to 816
Actions #6

Updated by Paul Winder 9 months ago

A walk-through of the circumstances which cause the message to appear when a ring is stopped.

An important factor is that an interrupt vector can (and almost certainly will) have multiple CQs assigned to it. When the vector gets an interrupt, the interrupt handler works through the events, matches each event to a CQ, and then processes the pertinent CQ.

Also, CQs which act as receive queues/rings do have DMA buffers pre-allocated and assigned to entries in the CQ.

With this in mind, consider:
  1. An interrupt vector is scheduled and processing events.
  2. A command is issued which will stop the rings. Each ring is stopped in mlxcx_mac_ring_stop(). mlxcx_mac_ring_stop() will issue a command to the HCA which will stop the send and receive queues. After this no more events will be posted to the CQ tied to the receive queue, but there may still be completions which the interrupt handler in step 1 has yet to process.
  3. mlxcx_mac_ring_stop() frees the DMA buffers tied to the CQ. This is safe because the CQ has been stopped, and the h/w will not post any more completions.
  4. The running interrupt vector sees there are completion events for the CQ; they would have been posted before the ring was stopped. It runs through the completions in the CQ and attempts to match completions to buffers. But the buffer lists have been emptied and the buffers freed, so it doesn't find a matching buffer for the completion and issues a warning message.

The warning messages are disconcerting, but in this case harmless. We need to avoid issuing messages for known situations which are not errors.

But, even though I never encountered this, I could see in the mlxcx_mac_ring_stop() code that the buffer lists, which are traversed during the processing of the CQs, are emptied without holding the CQ lock. This could lead to the interrupt handler traversing a list whilst it is being deconstructed.

Actions #7

Updated by Electric Monk 8 months ago

  • Status changed from In Progress to Closed
  • % Done changed from 0 to 100

git commit e1447ca93391f31609bda487cb922dbff9dcdef5

commit  e1447ca93391f31609bda487cb922dbff9dcdef5
Author: Paul Winder <pwinder@racktopsystems.com>
Date:   2020-08-26T14:21:08.000Z

    12980 attempting to change MTU on mlxcx based aggregation can induce FMA event
    12987 devo_power misconfigured in mlxcx
    12988 potential hang in mlxcx when async and ring vectors end up on same CPU
    Reviewed by: Robert Mustacchi <rm@fingolfin.org>
    Reviewed by: Dan McDonald <danmcd@joyent.com>
    Reviewed by: Igor Kozhukhov <igor@dilos.org>
    Approved by: Garrett D'Amore <garrett@damore.org>
