Project

General

Profile

Actions

Feature #12678

closed

mac_tx() is too eager to emulate hardware offloads

Added by Patrick Mooney about 2 years ago. Updated about 2 years ago.

Status:
Closed
Priority:
Normal
Assignee:
Category:
kernel
Start date:
Due date:
% Done:

100%

Estimated time:
Difficulty:
Medium
Tags:
bhyve
Gerrit CR:

Description

Upstreaming OS-7905 for bhyve:

This was discovered at the same time as OS-7904. As mentioned in that ticket, run the following IP forwarding test and dtrace program.

# NET_TESTS=/opt/net-tests /opt/net-tests/tests/forwarding/ip_forwarding -nfluv cea523d9-5e7e-4043-bbe8-fb920c3f8997 e9a97d65-b1e5-e77c-d573-f85ee5e0aa7c e9d88af9-352e-6419-ba42-88a25f2e967f

# dtrace -x dynvarsize=4m -qn 'mac_tx:entry { self->t = 1; this->mcip = (mac_client_impl_t *)arg0; self->mip = this->mcip->mci_mip; self->mci_name = stringof(this->mcip->mci_name); } mac_hw_emul:entry /self->t/ { this->mp = *args[0]; printf("emul [%s %s 0x%x (mp: 0x%x)] 0x%x %a\n", self->mci_name, self->mip->mi_name, self->mip->mi_tx_cksum_flags, this->mp->b_datap->db_struioun.cksum.flags, args[3], caller); stack(); } mac_tx_send:entry,mac_provider_tx:entry,mac_tx:return /self->t/ { self->t = 0; self->mci_name = 0; self->mip = 0; }' -o /var/tmp/emul.out

In the output you'll see entries like this.

emul [ipft_client_r0 simnet1060 0x0 (mp: 0x108)] 0x0 mac`mac_tx+0x120

              mac`mac_tx+0x120
              dld`str_mdata_fastpath_put+0x53
              ip`ip_xmit+0x82d
              ip`ip_forward_xmit_v4+0x116
              ip`ire_recv_forward_v4+0x5bd
              ip`ill_input_short_v4+0x4ee
              ip`ip_input_common_v4+0x3a7
              ip`ip_input+0x2b
              mac`mac_rx_soft_ring_process+0x19a
              mac`mac_rx_srs_proto_fanout+0x29a
              mac`mac_rx_srs_drain+0x363
              mac`mac_srs_worker+0x326
              unix`thread_start+0x8

OS-7904 already covers the problem with the mi_tx_cksum_flags value, but there's another one. Notice that last 0x0 value, the one before the caller symbol. That's the emul argument passed to mac_hw_emul(). Calling that function with a value of zero makes no sense because it means we won't emulate anything. The reason this is happening is because of the flags in the mblk: 0x108. These flags are HW_LOCAL_MAC and HCK_FULLCKSUM_OK. They are not related to checksum offloads provided by the NIC. Now look at the logic from mac_tx().

            const uint16_t needed =
                (DB_CKSUMFLAGS(mp) ^ mip->mi_tx_cksum_flags) &
                DB_CKSUMFLAGS(mp);

            mp->b_next = NULL;

            if (needed != 0) {
                mac_emul_t emul = 0;

                if (needed & HCK_IPV4_HDRCKSUM)
                    emul |= MAC_IPCKSUM_EMUL;
                if (needed & (HCK_PARTIALCKSUM | HCK_FULLCKSUM))
                    emul |= MAC_HWCKSUM_EMUL;
                if (needed & HW_LSO)
                    emul = MAC_LSO_EMUL;

                mac_hw_emul(&mp, &tail, NULL, emul);

The needed value will be non-zero because it's checking ALL mblk flags against mi_tx_cksum_flags. The later will only include checksum/LSO offload flags, but the former will include all flags. Thus we get a non-zero needed value even though there is nothing to emulate.

To test I ran the network test from the original report. While this test was running I ran the following dtrace script to check for erroneous mac_hw_emul() calls.

Here's the output before the patch. Notice the calls where EMUL FLAGS is 0x0.

<GZ> root@gaia [~]
# dtrace -x dynvarsize=4m -qn 'mac_tx:entry { self->t = 1; this->mcip = (mac_client_impl_t *)arg0; self->mip = this->mcip->mci_mip; self->mci_name = stringof(this->mcip->mci_name); } mac_hw_emul:entry /self->t/ { this->mp = *args[0]; @[self->mci_name, self->mip->mi_name, self->mip->mi_tx_cksum_flags, this->mp->b_datap->db_struioun.cksum.flags, arg3, caller] = count(); } mac_tx_send:entry,mac_provider_tx:entry,mac_tx:return /self->t/ { self->t = 0; self->mci_name = 0; self->mip = 0; } END { printf("%-16s %-16s %-12s %-12s %-12s %-32s %-8s\n", "CLIENT", "MAC", "MIP FLAGS", "MP FLAGS", "EMUL FLAGS", "CALLER", "COUNT"); printa("%-16s %-16s 0x%-10x 0x%-10x 0x%-10x %-32a %-@8u\n", @); }' 
^C
CLIENT           MAC              MIP FLAGS    MP FLAGS     EMUL FLAGS   CALLER                           COUNT   
ipft_server0     simnet1098       0x0          0x15         0x4          mac`mac_tx+0x120                 362     
ipft_client0     simnet1097       0x0          0x15         0x4          mac`mac_tx+0x120                 409     
ipft_client_r0   simnet1098       0x0          0x108        0x0          mac`mac_tx+0x120                 496     
ipft_server0     simnet1098       0x0          0x5          0x3          mac`mac_tx+0x120                 496     
ipft_client_r0   simnet1098       0x0          0x105        0x3          mac`mac_tx+0x120                 980     
ipft_client_r0   simnet1098       0x0          0x100        0x0          mac`mac_tx+0x120                 1502    

Here's the output after the patch. The fact that there are zero rows indicates that things are actually working as they should now (whereas before we were calling mac_hw_emul() in cases when we shouldn't.

# dtrace -x dynvarsize=4m -qn 'mac_tx:entry { self->t = 1; this->mcip = (mac_client_impl_t *)arg0; self->mip = this->mcip->mci_mip; self->mci_name = stringof(this->mcip->mci_name); } mac_hw_emul:entry /self->t/ { this->mp = *args[0]; @[self->mci_name, self->mip->mi_name, self->mip->mi_tx_cksum_flags, this->mp->b_datap->db_struioun.cksum.flags, arg3, caller] = count(); } mac_tx_send:entry,mac_provider_tx:entry,mac_tx:return /self->t/ { self->t = 0; self->mci_name = 0; self->mip = 0; } END { printf("%-16s %-16s %-12s %-12s %-12s %-32s %-8s\n", "CLIENT", "MAC", "MIP FLAGS", "MP FLAGS", "EMUL FLAGS", "CALLER", "COUNT"); printa("%-16s %-16s 0x%-10x 0x%-10x 0x%-10x %-32a %-@8u\n", @); }'
^C
CLIENT           MAC              MIP FLAGS    MP FLAGS     EMUL FLAGS   CALLER                           COUNT   

Finally, I ran the full network test suite to verify no regressions.

# /opt/net-tests/bin/nettest 
Test: /opt/net-tests/tests/forwarding/ip_fwd_no_cksum (run as root) [00:31] [PASS]
Test: /opt/net-tests/tests/forwarding/ip_fwd_partial_cksum (run as root) [00:31] [PASS]
Test: /opt/net-tests/tests/forwarding/ip_fwd_full_cksum (run as root) [00:32] [PASS]
Test: /opt/net-tests/tests/forwarding/ip_fwd_partial_cksum_lso (run as root) [00:32] [PASS]
Test: /opt/net-tests/tests/forwarding/ip_fwd_full_cksum_lso (run as root) [00:32] [PASS]

Results Summary
PASS       5

Running Time:   00:02:40
Percent passed: 100.0%

Related issues

Related to illumos gate - Feature #12677: simnet has bogus mi_tx_cksum_flagsClosedRyan Zezeski

Actions
Actions #1

Updated by Patrick Mooney about 2 years ago

Actions #2

Updated by Electric Monk about 2 years ago

  • Status changed from In Progress to Closed
  • % Done changed from 0 to 100

git commit c61a1653a4d73dbc950dac7d96350fd6cb517486

commit  c61a1653a4d73dbc950dac7d96350fd6cb517486
Author: Ryan Zezeski <rpz@joyent.com>
Date:   2020-05-18T18:37:51.000Z

    12676 want better offloads for vnics
    12677 simnet has bogus mi_tx_cksum_flags
    12678 mac_tx() is too eager to emulate hardware offloads
    Portions contributed by: Patrick Mooney <patrick.mooney@joyent.com>
    Portions contributed by: Robert Mustacchi <rm@joyent.com>
    Reviewed by: Patrick Mooney <pmooney@oxide.computer>
    Reviewed by: Andy Fiddaman <andy@omniosce.org>
    Approved by: Dan McDonald <danmcd@joyent.com>

Actions

Also available in: Atom PDF