Bug #2800

outbound traffic matching flows restricted to single aggr port

Added by Sebastien Roy over 7 years ago.

Status: New
Priority: Normal
Assignee:
Category: kernel
Start date: 2012-05-29
Due date:
% Done: 0%
Estimated time:
Difficulty: Medium
Tags: needs-triage

Description

Consider the following aggregation with two ports:

    # dladm show-aggr
    LINK    POLICY  ADDRPOLICY  LACPACTIVITY  LACPTIMER  FLAGS
    aggr0   L4      auto        off           short      -----
    # dladm show-aggr -x
    LINK    PORT       SPEED    DUPLEX  STATE  ADDRESS          PORTSTATE
    aggr0   --         10000Mb  full    up     0:c:29:7c:80:45  --
            vmxnet3s0  10000Mb  full    up     0:c:29:7c:80:45  attached
            vmxnet3s1  10000Mb  full    up     0:c:29:7c:80:4f  attached

We have a layer-4 outbound load-spreading policy. One would assume that outgoing packets associated with different connections would generally be load-spread across these two ports. If there are no flows configured on the system, that is more or less true (subject to the quality of the port hashing algorithm). If I have a reasonable number of ssh connections to the system from a few locations, I can see the ports being used reasonably evenly:

    # dlstat show-aggr
    LINK    PORT       IPKTS   RBYTES   OPKTS   OBYTES
    aggr0   --         28.47K  1.98M    18.05K  2.80M
    aggr0   vmxnet3s0  14.54K  1.02M    8.33K   942.11K
    aggr0   vmxnet3s1  13.93K  961.04K  9.72K   1.86M

Now if we associate flows with said ssh sessions, we see a different picture (note that I've rebooted the system in between the two tests to reset kstats so that we can see the impact more clearly):

    # flowadm add-flow -t -l aggr0 -a remote_ip=172.16.100.213 illumos
    # flowadm add-flow -t -l aggr0 -a remote_ip=172.16.100.37 macbook
    <ssh from these systems, generate some output, and wait a little bit>
    # dlstat show-aggr
    LINK    PORT       IPKTS   RBYTES   OPKTS  OBYTES
    aggr0   --         15.56K  1.32M    2.35K  223.31K
    aggr0   vmxnet3s0  8.34K   708.26K  2.35K  223.23K
    aggr0   vmxnet3s1  7.22K   615.02K  1      77

Note that only one of the ports is being used for output.

It turns out that this bug is not unique to flows over aggregations. It also exists for flows over any device that exports multiple transmit rings. It just happens to be the case that the aggr driver exports each underlying port as separate pseudo rings.

The problem is that the sole tx ring in a flow's tx soft ring set is always set to the default tx ring for the MAC, and is never updated (even when the default tx ring is changed). Thus packets associated with flows are always transmitted on the default tx ring. This happens because mac_datapath_setup() never associates a ring group with a flow entry (it sets neither fe_tx_ring_group nor fe_rx_ring_group). When mac_tx_srs_setup() is then called to set up the soft ring set for this flow, it notices that there is no ring group for this flow and simply defaults to assigning the link's default transmit ring to the flow:

    if ((grp = flent->fe_tx_ring_group) == NULL) {
        ring = (mac_ring_t *)mip->mi_default_tx_ring;
        tx_ring_count = (ring == NULL) ? 0 : 1;
    } else {
        tx_ring_count = grp->mrg_cur_count;
        ring = grp->mrg_rings;
        ASSERT(grp->mrg_state == MAC_GROUP_STATE_SHARED ||
            grp->mrg_state == MAC_GROUP_STATE_RESERVED);
    }
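
To make the effect of that branch concrete, here is a minimal user-space sketch of the same decision. It is a toy model, not kernel code: every type and function name in it (ring_t, ring_group_t, tx_rings_for_flow(), pick_tx_ring()) is hypothetical, and the port-XOR fanout is an assumed stand-in for whatever hash the real tx path uses. The point it illustrates is only that a flow without a ring group sees a ring count of 1, so any fanout over that count can only ever choose the default ring.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for the MAC-layer structures. */
typedef struct ring { int id; } ring_t;

typedef enum { GROUP_SHARED, GROUP_RESERVED, GROUP_OTHER } group_state_t;

typedef struct ring_group {
	group_state_t state;	/* models mrg_state */
	size_t cur_count;	/* models mrg_cur_count */
	ring_t *rings;		/* models mrg_rings */
} ring_group_t;

typedef struct mac { ring_t *default_tx_ring; } mac_t;
typedef struct flow { ring_group_t *tx_ring_group; } flow_t;

/*
 * Models the branch quoted above: with no tx ring group, the flow gets
 * only the default tx ring (count 0 or 1); otherwise it gets every ring
 * in its group.
 */
static ring_t *
tx_rings_for_flow(const flow_t *fl, const mac_t *m, size_t *countp)
{
	const ring_group_t *grp = fl->tx_ring_group;

	if (grp == NULL) {
		*countp = (m->default_tx_ring == NULL) ? 0 : 1;
		return (m->default_tx_ring);
	}
	assert(grp->state == GROUP_SHARED || grp->state == GROUP_RESERVED);
	*countp = grp->cur_count;
	return (grp->rings);
}

/*
 * Toy fanout (an assumption, not the real hash): choose a ring by
 * XOR-ing the ports modulo the ring count. Returns the ring id, or -1
 * if the flow has no usable tx ring.
 */
static int
pick_tx_ring(const flow_t *fl, const mac_t *m, uint16_t sport, uint16_t dport)
{
	size_t count;
	ring_t *rings = tx_rings_for_flow(fl, m, &count);

	if (count == 0)
		return (-1);
	return (rings[(size_t)(sport ^ dport) % count].id);
}
```

In this model, a flow whose tx_ring_group is NULL maps every (sport, dport) pair to the default ring, while the same connections on a flow with a two-ring group can land on either ring. That matches the dlstat output above, where nearly all outbound packets for the flows leave through one port.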
