Project

General

Profile

Bug #10902

vioif should do TX reclaim without NOTIFY_ON_EMPTY

Added by Joshua M. Clulow 7 months ago. Updated 6 months ago.

Status:
Closed
Priority:
Normal
Category:
driver - device drivers
Start date:
2019-05-02
Due date:
% Done:

100%

Estimated time:
Difficulty:
Medium
Tags:

Description

The vioif driver uses the VIRTIO_F_NOTIFY_ON_EMPTY feature to trigger TX descriptor reclamation. This feature is not widely used in Virtio drivers on other platforms; indeed, perhaps only in the NetBSD driver from which vioif was originally derived. The original Virtio specification draft can be interpreted in different ways and it appears to be possible to end up in a situation where all of the TX descriptors have been used, but no interrupt has arrived (or the interrupt that arrived was poorly timed).

This change seeks to rework our reclamation handling in three ways:

  1. Attempt to reclaim any available descriptors whenever beginning a TX operation.
  2. Schedule a periodic reclaim (when there are outstanding descriptors) so that even in the absence of new transmissions, old descriptors will have an upper bound on their lifetime.
  3. Enable the (normally disabled) TX interrupt when no available descriptors exist in order to uncork the ring as soon as possible.

Note: This bug mirrors Joyent-internal bug OS-7513

History

#1

Updated by Joshua M. Clulow 7 months ago

Performance Notes (from Patrick Mooney)

Here are the results of some performance tests done on bhyve. The "before" platform is joyent_20190103T011318Z.

As a client (with a local zone acting as iperf3 server):

[  4]   0.00-10.00  sec  1.07 GBytes   923 Mbits/sec                  sender
[  4]   0.00-10.00  sec  1.07 GBytes   923 Mbits/sec                  receiver
[  4]   0.00-10.00  sec  1.07 GBytes   922 Mbits/sec                  sender
[  4]   0.00-10.00  sec  1.07 GBytes   922 Mbits/sec                  receiver
[  4]   0.00-10.00  sec  1.10 GBytes   941 Mbits/sec                  sender
[  4]   0.00-10.00  sec  1.10 GBytes   941 Mbits/sec                  receiver

As a server:

[  4]   0.00-10.00  sec  4.19 GBytes  3.60 Gbits/sec                  sender
[  4]   0.00-10.00  sec  4.19 GBytes  3.60 Gbits/sec                  receiver
[  4]   0.00-10.00  sec  4.57 GBytes  3.93 Gbits/sec                  sender
[  4]   0.00-10.00  sec  4.57 GBytes  3.93 Gbits/sec                  receiver
[  4]   0.00-10.00  sec  3.98 GBytes  3.42 Gbits/sec                  sender
[  4]   0.00-10.00  sec  3.98 GBytes  3.42 Gbits/sec                  receiver

After booting to a PI with the change...

As a client:

[  4]   0.00-10.00  sec  2.37 GBytes  2.03 Gbits/sec                  sender
[  4]   0.00-10.00  sec  2.37 GBytes  2.03 Gbits/sec                  receiver
[  4]   0.00-10.00  sec  2.85 GBytes  2.45 Gbits/sec                  sender
[  4]   0.00-10.00  sec  2.85 GBytes  2.45 Gbits/sec                  receiver
[  4]   0.00-10.00  sec  2.78 GBytes  2.39 Gbits/sec                  sender
[  4]   0.00-10.00  sec  2.78 GBytes  2.39 Gbits/sec                  receiver

As a server:

[  4]   0.00-10.00  sec  3.84 GBytes  3.30 Gbits/sec                  sender
[  4]   0.00-10.00  sec  3.84 GBytes  3.30 Gbits/sec                  receiver
[  4]   0.00-10.00  sec  4.01 GBytes  3.45 Gbits/sec                  sender
[  4]   0.00-10.00  sec  4.01 GBytes  3.45 Gbits/sec                  receiver
[  4]   0.00-10.00  sec  4.14 GBytes  3.55 Gbits/sec                  sender
[  4]   0.00-10.00  sec  4.14 GBytes  3.55 Gbits/sec                  receiver
#2

Updated by Joshua M. Clulow 7 months ago

Testing Notes (from Patrick Mooney)

Functional Testing

The most critical area I wanted to verify was the updated vioif_tx_drain() logic which now waits for TX descriptors which are still outstanding (not yet posted to the used ring). This is in contrast to the old vioif_stop() logic which did not perform such a wait. In order to simulate those outstanding TX descriptors, I used some longstanding undesired behavior in bhyve: where packets looped back to a native zone can be stalled in a TCP buffer if the underlying socket is not being read.
My test program to be run in the native zone is as follows:

#include <stdio.h>
#include <sys/socket.h>
#include <netinet/in.h>

int main()
{

        int lfd, fd;
        struct sockaddr_in sin;
        char b;

        lfd = socket(PF_INET, SOCK_STREAM, IPPROTO_TCP);
        inet_aton("<ZONE IP ADDR>", &sin.sin_addr);
        sin.sin_port = htons(8080);

        bind(lfd, (const struct sockaddr *)&sin, sizeof (sin));
        listen(lfd, 1);

        fd = accept(lfd, NULL, NULL);

        printf("accepted conn %d\n", fd);

        read(0, &b, 1);

        printf("closing\n");
        close(fd);
        close(lfd);

        return (0);
}

From the smartos guest, I initiated a connection with nc(1), input some data via the keyboard, and then closed the connect with ^D, leaving the process "hanging" with the lingering connection. I then issued a ifconfig vioif0 unplumb to shutdown that vioif instance. The command hung, as expected. Using mdb -k I observed the vioif_tx_drain() function spinning in the delay/reclaim cycle, waiting for the descriptors to be posted to the used ring. I stopped the test program in the loopback zone and observed the smartos guest immediately finish its reclaim and indicate success of the unplumb.

Periodic Reclaim

Using DTrace, I observed the periodic reclaim being schedule when packets were outstanding due to new transmissions, resulting in its eventual shutdown when they were reclaimed and no frames were outstanding.

Simple periodic reclaim startup/shutdown

With the system in a steady state (with no active network traffic), I issued one ping to a local address while tracing the relevant functions in vioif. It shows the transmit and scheduling of the periodic reclaim, the first instance of the periodic check reclaiming the used descriptor, and the subsequent check where it de-schedules itself due to being idle:

  1  61311                   vioif_tx:entry
  1  61305      vioif_reclaim_restart:entry
  0  61303     vioif_reclaim_periodic:entry
  0  61302     vioif_reclaim_used_tx:return                 1
  0  61305      vioif_reclaim_restart:entry
  1  61303     vioif_reclaim_periodic:entry
  1  61302     vioif_reclaim_used_tx:return                 0
#3

Updated by Joshua M. Clulow 7 months ago

Final Testing Notes

I built this change against stock illumos-gate and installed it in my OpenIndiana virtual machine. I was able to use iperf3 in both directions (in and out of the guest) to perform a basic smoke test to confirm that the new driver version still works.

#4

Updated by Electric Monk 6 months ago

  • Status changed from In Progress to Closed
  • % Done changed from 0 to 100

git commit 9e0bf232630ca5ae16a56613041b276f8a1c9740

commit  9e0bf232630ca5ae16a56613041b276f8a1c9740
Author: Patrick Mooney <pmooney@pfmooney.com>
Date:   2019-05-12T14:31:24.000Z

    10902 vioif should do TX reclaim without NOTIFY_ON_EMPTY
    Reviewed by: Jason King <jbk@joyent.com>
    Reviewed by: Robert Mustacchi <rm@joyent.com>
    Reviewed by: Gordon Ross <gwr@nexenta.com>
    Reviewed by: Garrett D'Amore <garrett@damore.org>
    Approved by: Richard Lowe <richlowe@richlowe.net>
    Approved by: Garrett D'Amore <garrett@damore.org>

Also available in: Atom PDF