aggr needs support for multiple pseudo rx groups
We need multiple pseudo Rx groups for aggr in order to better utilize the underlying hardware.
MAC groups are an abstraction used to group and program hardware rings. Groups are where we place unicast and VLAN filters. The more groups a MAC provider supports, the more MAC clients (e.g. VNICs) it can hardware accelerate through the use of hardware filtering. This hardware filtering relieves the MAC framework from performing software classification and allows the SRS to poll the MAC's hardware rings. In effect, network performance is better when a client has a reserved MAC group with hardware filtering.
An aggr is both a MAC client and a MAC provider. It's a client to the underlying MAC/NIC (aggr_port_t) and a provider of the aggregation (aggr_grp_t). As a provider it must support the groups abstraction to allow clients of the aggr to make use of hardware classification. It does this by creating a "pseudo group": an abstraction that combines one hardware group from each port. E.g., if we aggregate two ixgbe ports, each exposing groups of 4 rings, then one pseudo group will map to one group on each ixgbe port and contain 8 pseudo rings.
The problem is that aggr currently creates only one pseudo group, no matter what the underlying hardware may provide. So, to continue the example, if we aggregate two ixgbe NICs, each with 32 groups, the aggr will make use of only one group and the other 31 will go to waste. The upshot is that an aggr can provide hardware classification for only one client. The moment there are two or more clients, all traffic going over the aggr must be software classified, and performance improvements like polling mode are lost.
The purpose of this ticket is to track the work of adding multiple Rx pseudo group support to aggr.
On my local workstation running a DEBUG kernel I ran the following tests. Each of these tests was run in 3 scenarios: 1) two ixgbe aggrs with active LACP and default (1500) MTU, 2) two ixgbe aggrs with active LACP and 9000 MTU, 3) mixed aggr of ixgbe/igb with LACP off.
- Plumb IP on the primary aggr client and make sure it can still send/receive traffic. This traffic should come over the HW lanes, but I introduced a regression here: the primary MAC client gets placed on the default group and is SW classified. This doesn't affect the correctness of the program, just potentially its performance. I think it's okay to fix this in a follow-up ticket because in Triton we are never going to pass traffic on the primary MAC; it will always be VLAN VNICs on the aggr/overlay.
- Same as above but on a VLAN on the primary MAC client (aka dladm create-vlan).
- Create a VNIC on the aggr, plumb IP on it, and verify traffic is received via HW lanes.
- Create a VLAN VNIC on the aggr, plumb IP on it, and verify traffic is received via HW lanes.
- Create two VNICs on the aggr, plumb IP on both, and verify traffic is received via HW lanes for both (this asserts that the pseudo groups are working and that each client gets its own group).
- Same as above but for VLAN VNICs.
- Perform variations of the previous tests, but with more VNICs, that exercise the VLAN ref counting mechanisms in MAC/ixgbe. It's important that, when multiple clients exist on a VLAN, we keep the HW filters in place until the very last client using that VLAN is deleted.
- Run various non-aggr tests to make sure that the non-aggr Rx path still works.
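For reference, the VNIC tests above were of this general shape. The link and VNIC names (aggr0, vnic0, vnic1), the VID, and the address are made up for illustration; the dladm/ipadm subcommands and flags are standard usage.

```shell
# Names and addresses are illustrative only.
dladm create-vnic -l aggr0 vnic0            # plain VNIC on the aggr
dladm create-vnic -l aggr0 -v 123 vnic1     # VLAN VNIC with VID 123
ipadm create-if vnic0
ipadm create-addr -T static -a 192.0.2.10/24 vnic0/v4
# Pass traffic, then inspect the datapath to confirm the VNIC landed
# on its own pseudo group (HW lanes) rather than being SW classified.
```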
I also ran a test where I would continuously ping a bogus IP to generate L2 broadcast (ARP) traffic. I would do this while booting a server with aggrs to make sure that the broadcast traffic didn't interfere with the creation of the aggr like we saw in OS-6697.
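The ping loop was essentially the following (the address shown is a stand-in; any unused on-link address works):

```shell
# 192.0.2.77 stands in for a bogus, unanswered on-link address; each
# probe forces an ARP request, i.e. L2 broadcast on the wire.
# illumos ping: -s sends probes continuously rather than just once.
ping -s 192.0.2.77
```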
Finally, I ran several tests on a JPC CN. I booted a JPC CN on these aggr bits and then created two VMs: one KVM (KVM_A), one SmartOS container (SOS_A). I put both VMs on the external and fabric (overlay) networks. I then created a SmartOS container (SOS_B) on another CN. Then I ran the following tests:
- iperf3 from SOS_B to KVM_A over external
- iperf3 from SOS_B to KVM_A over fabric
- iperf3 from SOS_B to SOS_A over external
- iperf3 from SOS_B to SOS_A over fabric
- iperf3 from SOS_A to KVM_A over fabric (this exercises the MAC-loopback Tx path)
- iperf3 from SOS_A to KVM_A over external (this exercises the MAC-loopback Tx path)
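Each of the paths above was exercised with stock iperf3; the address below is a placeholder for the target VM's address on the network under test.

```shell
# On the receiver (e.g. KVM_A):
iperf3 -s
# On the sender (e.g. SOS_B), once per network path:
iperf3 -c 192.0.2.20 -t 30
```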