Project

General

Profile

Actions

Bug #14032

closed

viona miscounts rx bytes copied

Added by Patrick Mooney 4 months ago. Updated 3 months ago.

Status:
Closed
Priority:
Normal
Category:
bhyve
Start date:
Due date:
% Done:

100%

Estimated time:
Difficulty:
Medium
Tags:
Gerrit CR:

Description

Smithx10 reported via IRC that they had several bhyve VMs misbehavior in their environment after a (SmartOS) platform update. The instances in question were still responsive via the serial console, but would not respond to network traffic. No misbehaving firewall or network configuration was found. There was no apparent NIC errors inside the instance, but the ARP table was empty of valid entries (reporting empty records for IPs such as the default gateway). The next step was to inspect the in-kernel viona state to see if anything there was of interest. The TX ring was normal, with the vr_xfer_outstanding count at zero, ruling out "stuck" packets preventing other transmit activity from occurring. The RX ring was a different story:

> fffffd6b3b8f9110::print viona_vring_t
{
    vr_link = 0xfffffd6b3b8f9100
    vr_lock = {
        _opaque = [ 0 ]
    }
    vr_cv = {
        _opaque = 0x1
    }
    vr_state = 0x3
    vr_state_flags = 0
    vr_xfer_outstanding = 0
    vr_worker_thread = 0xfffffd69c5d314c0
    vr_lease = 0xfffffd6db829e8c0
    vr_txdesb = 0
    vr_txiov = 0
    vr_intr_enabled = 0
    vr_msi_addr = 0xfee00000
    vr_msi_msg = 0x4082
    vr_a_mutex = {
        _opaque = [ 0 ]
    }
    vr_u_mutex = {
        _opaque = [ 0 ]
    }
    vr_pa = 0x1366e0000
    vr_size = 0x400
    vr_mask = 0x3ff
    vr_cur_aidx = 0x579c
    vr_cur_uidx = 0x579c
    vr_map_pages = 0xfffffd6d5c81ba40
    vr_stats = {
        rs_ndesc_too_high = 0
        rs_bad_idx = 0
        rs_indir_bad_len = 0
        rs_indir_bad_nest = 0
        rs_indir_bad_next = 0
        rs_no_space = 0x14f86
        rs_too_many_desc = 0
        rs_desc_bad_len = 0
        rs_bad_ring_addr = 0
        rs_fail_hcksum = 0
        rs_fail_hcksum6 = 0
        rs_fail_hcksum_proto = 0
        rs_bad_rx_frame = 0xef8b7
        rs_rx_merge_overrun = 0
        rs_rx_merge_underrun = 0
        rs_rx_pad_short = 0x102
        rs_rx_mcast_check = 0
        rs_too_short = 0xef8b7
        rs_tx_absent = 0
        rs_rx_hookdrop = 0
        rs_tx_hookdrop = 0
    }
}

The high counts for no_space, bad_rx_frame and too_short drawing particular attention. Those stats continued to climb as we were inspecting the instance, providing a reason for why the host was not responding: it was unable to receive ARP replies (hence the empty table) and was effectively prevented from establishing IP traffic. After further debugging, that instance was restarted. After the (guest) reboot, it was able to communicate on the network once again. The rs_too_short and rs_bad_rx_frame counts continued to tick upward, though. We got some strange results when dtracing the too_short probe:

        if (err == 0 && copied != msz) {
                VIONA_PROBE5(too_short, viona_vring_t *, ring,
                    uint16_t, cookie, mblk_t *, mp, size_t, copied,
                    size_t, msz);
                VIONA_RING_STAT_INCR(ring, too_short);
                err = EINVAL;
        }

The "copied" sizes reported from the probe were in fact larger than msz (the calculated mblk data length). With values like copied = 2014 msz = 1514. This lead me to inspect the tallying for those values more closely. Some speculative dtrace shed further light on the matter:

viona_recv_merged:entry {
    self->spec = speculation();
}
viona_copy_mblk:entry
/self->spec/
{
    speculate(self->spec);
    self->seek = args[1];
    self->len = args[3];
    self->end = args[4];
}
viona_copy_mblk:return
/self->end && self->spec/
{
    speculate(self->spec);
    printf("copy_mblk(%lu, %lu) -> %lu %u\n", self->seek, self->len, (uint64_t)arg1, (uint_t)*self->end);
}

viona-too_short
/self->spec/
{
    commit(self->spec);
    self->spec = 0;
}
viona-too_short
{
    printf("too_short %lu %lu\n", (uint64_t)arg3, (uint64_t)arg4);
}

viona_recv_merged:return
/self->spec/
{
    discard(self->spec);
    self->spec = 0;
}

Many speculations failed, due to probe firings for other instances on the machine, but one that was captured further confirms the suspect tallying:

 35   6088           viona_copy_mblk:return copy_mblk(0, 500) -> 500 0
 35   6088           viona_copy_mblk:return copy_mblk(500, 1024) -> 1014 1
 35   5959 viona_recv_merged:viona-too_short too_short 2014 1514

The return values from viona_copy_mblk indicate that 1514 bytes were copied total, but the too_short probe shows 2014 - an overshoot of 500 (the first transfer size). Looking at the source, the reason is clear:

        do {
                while (i < n && !end) {
                        buf = iov[i].iov_base;
                        len = iov[i].iov_len;

                        chunk += viona_copy_mblk(mp, copied, buf, len, &end);
                        copied += chunk;
                        i++;
                }

                uelem[buf_idx].id = cookie;
                uelem[buf_idx].len = chunk;

For a merged RX operation, viona copies the packet data into the buffers associated with a virtqueue descriptor. If/when that descriptor becomes full, we will attempt to pull another from the queue (as prescribed by the spec for merged rx). This requires us to keep track of how much we copied for each descriptor (the chunk) as well as how much we copied total (copied). We are instead adding the chunk amount for every copied buffer. We should be recording the viona_copy_mblk in an intermediate and adding it (by itself) to both chunk and copied. This logic has been broken in viona for years, so it's a little surprising that it is acting up only how. One possible explanation: Until #13783, guest-physically-contiguous pages would be treated as virtually contiguous memory when consumed by viona, due to how bhyve was implementing guest memory (giant kmem_alloc() regions). After #13783, those previously contiguous regions would be treated as separate pages regardless, since bhyve would make no guarantees about them being virtually contiguous in the host KAS. It is likely that we only ever made one trip through the loop calling viona_copy_mblk (per descriptor), making the erroneous addition harmless. Regardless, this needs to be addressed.

I do not yet have solid evidence that this miscounting is what caused the RX ring to fill in the guest, but it would have implied that we were communicating bad "used" buffer lengths to the guest. It's not clear how that may have effected its interpretation of the results and subsequent behavior.

Actions #1

Updated by Patrick Mooney 4 months ago

Smithx10 has a test platform with the proposed fix to this (and #14026) for testing. He's looking to see if the workload which was seemingly adept at triggering the stuck ring can reproduce the same behavior when the copied byte counts are correct.

Actions #2

Updated by Patrick Mooney 3 months ago

After a weekend of running the test bits, Smithx10 reports that instances running on that CN continue to report zero too_short or bad_rx_frame counts, with no other error conditions.

Actions #3

Updated by Patrick Mooney 3 months ago

Like #14035, this misbehavior was triggered by large (>1500 byte) ethernet frames which would span multiple RX descriptors when being delivered to the guest. That explains why it was not caught until other problems popped up in an environment using jumbo frames.

Actions #4

Updated by Patrick Mooney 3 months ago

In addition to the in-situ testing, where the previously high rs_too_short counts were seen to be zero when running the proposed fix, I also tried the normal battery of guests in a local test setup. None had issue booting up and utilizing their viona NICs as before. I did not see any errors on the involved rings.

Actions #5

Updated by Electric Monk 3 months ago

  • Status changed from In Progress to Closed
  • % Done changed from 0 to 100

git commit 49e0679264dbb4a3cb279b5f200ba140798c0388

commit  49e0679264dbb4a3cb279b5f200ba140798c0388
Author: Patrick Mooney <pmooney@pfmooney.com>
Date:   2021-08-31T19:18:31.000Z

    14032 viona miscounts rx bytes copied
    Reviewed by: Robert Mustacchi <rm@fingolfin.org>
    Reviewed by: Toomas Soome <tsoome@me.com>
    Reviewed by: Dan Cross <cross@oxidecomputer.com>
    Approved by: Gordon Ross <gordon.w.ross@gmail.com>

Actions

Also available in: Atom PDF