Project

General

Profile

Feature #12674

want mac rx barrier function

Added by Patrick Mooney 9 months ago. Updated 8 months ago.

Status:
Closed
Priority:
Normal
Category:
kernel
Start date:
Due date:
% Done:

100%

Estimated time:
Difficulty:
Medium
Tags:
bhyve
Gerrit CR:

Description

Upstreaming OS-7727 for bhyve:

As part of ongoing work on the viona subsystem of bhyve, it has become apparent that the ability to impose a barrier on the mac RX path would be valuable. This barrier functionality would ensure that after a call to the function (mac_rx_barrier), any threads executing RX callbacks for the client, both via mac_rx_set and mac_promisc_add, will have completed those calls. It does not prevent new entry to the RX callbacks, but does promise that such entries will have been totally completed or begin fresh, after the barrier returns.

Test notes:

The pending wad for OS-7447, which inspired the need for mac_rx_barrier, was the primary test-bed for this change. A CN was booted up on a platform containing a combination of both proposed changes. A bhyve instance was started and loaded with test network traffic (via iperf3). Using bhyvectl --wrlock-cycle, the viona instances serving the guest were repeatedly forced to relinquish their driver lease in bhyve, requiring mac_rx_barrier calls to ensure all MAC callbacks would either finish or observe the changes to viona state. Visibility of these events was tracked with dtrace, verifying that callback info state like mcbi_barrier_cnt was as expected.

The second focus was to ensure that the new limits on MAC callback walker entry (threads entering the callback list must wait until pending deletes or barrier requests have finished) are functioning as expected. Catching such an occurrence is difficult given the tight timing of promisc callbacks and the relatively slow pace of events which manipulate those lists. I used the following dtrace to force such circumstances to be visible:

dls_rx_promisc:entry
{
    chill(50000)
}
mac_callback_walker_enter:entry
{
    self->t = timestamp;
    self->w = args[0];
}
mac_callback_walker_enter:return
{
    @ = quantize(timestamp - self->t);
    self->t = 0
}
cv_wait:entry /self->t/
{
    stack();
    print(*self->w)
}

With unfiltered promisc enabled in a zone, I repeatedly started and stopped snoop(1):

while true; do snoop -r -d net0 -c 5; done

This caused promisc callbacks to be created and deleted, each time the snoop command ran and exited, resulting in eventual firing of the cv_wait probe, showing the non-zero mcbi_del_cnt:

  9  25074                    cv_wait:entry
              mac`mac_callback_walker_enter+0x33
              mac`mac_promisc_dispatch+0x47
              mac`mac_rx_common+0x45
              mac`mac_rx+0xb6
              mac`mac_rx_ring+0x2b
              igb`igb_intr_rx_work+0x5c
              igb`igb_intr_rx+0x15
              apix`apix_dispatch_by_vector+0x8c
              apix`apix_dispatch_lowlevel+0x25
              unix`switch_sp_and_call+0x13
mac_cb_info_t {
    kmutex_t *mcbi_lockp = 0xfffffe59ff4f7c40
    kcondvar_t mcbi_cv = {
        ushort_t _opaque = 0x1
    }
    uint_t mcbi_del_cnt = 0x1
    uint_t mcbi_walker_cnt = 0x1
    uint_t mcbi_barrier_cnt = 0
}
^C
  9  25074                    cv_wait:entry
              mac`mac_callback_walker_enter+0x33
              mac`mac_promisc_dispatch+0x47
              mac`mac_rx_common+0x45
              mac`mac_rx+0xb6
              mac`mac_rx_ring+0x2b
              igb`igb_intr_rx_work+0x5c
              igb`igb_intr_rx+0x15
              apix`apix_dispatch_by_vector+0x8c
              apix`apix_dispatch_lowlevel+0x25
              unix`switch_sp_and_call+0x13
mac_cb_info_t {
    kmutex_t *mcbi_lockp = 0xfffffe59ff4f7c40
    kcondvar_t mcbi_cv = {
        ushort_t _opaque = 0x1
    }
    uint_t mcbi_del_cnt = 0x1
    uint_t mcbi_walker_cnt = 0x1
    uint_t mcbi_barrier_cnt = 0
}

           value  ------------- Distribution ------------- count
            1024 |                                         0
            2048 |@@@@@@@@@                                579
            4096 |@@@@@@@@@@@@@@@@@@@@@@@@@@@              1718
            8192 |@@@@                                     243
           16384 |                                         8
           32768 |                                         4
           65536 |                                         0

Using a DEBUG kernel, I hammered several of the test conditions and then forced a dump, checking ::findleaks once it had completed. The result was clean.

#1

Updated by Patrick Mooney 9 months ago

A small change needed, since this is being integrated before the vnic-LSO/loopback wad:

--- a/usr/src/uts/common/io/mac/mac_client.c
+++ b/usr/src/uts/common/io/mac/mac_client.c
@@ -1641,7 +1641,7 @@ mac_rx_barrier(mac_client_handle_t mch)
        i_mac_perim_enter(mip);

        /* If a RX callback is set, quiesce and restart that datapath */
-       if (mcip->mci_rx_fn != mac_rx_def) {
+       if (mcip->mci_rx_fn != mac_pkt_drop) {
                mac_rx_client_quiesce(mch);

#2

Updated by Patrick Mooney 8 months ago

This was tested first on SmartOS bhyve. After integration there, it was side-pulled by OmniOSce.

#3

Updated by Electric Monk 8 months ago

  • Status changed from In Progress to Closed
  • % Done changed from 0 to 100

git commit b237158d576c3f39f35d97c4dd214c07273ddde4

commit  b237158d576c3f39f35d97c4dd214c07273ddde4
Author: Patrick Mooney <pmooney@pfmooney.com>
Date:   2020-05-11T20:55:36.000Z

    12674 want mac rx barrier function
    Reviewed by: Robert Mustacchi <rm@joyent.com>
    Reviewed by: Ryan Zezeski <rpz@joyent.com>
    Approved by: Dan McDonald <danmcd@joyent.com>

Also available in: Atom PDF