Project

General

Profile

Bug #13157

maxbw can kill link with hires tick

Added by Ryan Zezeski 8 days ago. Updated 8 days ago.

Status:
New
Priority:
Normal
Assignee:
Category:
networking
Start date:
Due date:
% Done:

0%

Estimated time:
Difficulty:
Medium
Tags:
Gerrit CR:

Description

The new default hires tick (#11499) can cause certain maxbw values to freeze a link's Rx SRS.

The maxbw setting is presented to the user as a per-second enforcement, e.g. 5M means 5Mbps. However, the actual enforcement in the kernel is done in terms of bytes-per-tick. This means a conversion must happen from Mbps to Bpt. Set maxbw low enough and your hz rate could bring the kernel's bw limit below the MTU of the link, leading to a frozen link. Because of this there is a check to make sure the user sets maxbw high enough to allow one MTU frame per tick. But this check is hard-coded for both 100hz and 1500 MTU. The simple solution is to fix the check to use the actual hz value and actual MTU value. However, because of this discrepancy between user-maxbw and kernel-maxbw, the higher the hz and MTU, the larger the minimum maxbw must be. I have a patch to modify the kernel enforcement to actually enforce the limit the way the user expects, but its not ready yet and we need a fix to keep people from freezing their links in the interim. The way I see it we have two choices.

1. The technically correct but potentially breaking to users way: fix the user-land check and don't allow any maxbw that would lead to a per-tick enforcement of less than MTU. Given there are potentially users with persistent link props of maxbw below 12M (higher for larger MTUs), then this will break them.

2. Non-breaking way that potentially allows more traffic than maxbw prescribes: implicitly raise kernel-side limit to MTU-frame-per-tick. User's persistent props don't break and we fix the freeze.

I lean towards number 2 because I have a series of changes that fix maxbw as a whole and would allow users to set arbitrarily low limits (like frame-per-second). Otherwise, we have to potentially break current users only to once again allow those lower limits in the future. Since this only affects users of "small" maxbw (anything below hz * MTU * 8), I think this is an acceptable trade off for now.

History

#1

Updated by Ryan Zezeski 8 days ago

  • Assignee changed from Adam Leventhal to Ryan Zezeski
#2

Updated by Electric Monk 8 days ago

  • Gerrit CR set to 929
#3

Updated by Ryan Zezeski 8 days ago

It's also worth noting the technical details of why exactly the Rx SRS stalls.

First, in mac_rx_srs_process(), we were falling thru to the following code because mac_bw_drop_threshold was not big enough to handle the incoming MTU frame.

                if (head != NULL) {
                    /* Drop any packet over the threshold */
                    srs_rx->sr_stat.mrs_sdrops += count;
                    mutex_enter(&mac_bw->mac_bw_lock);
                    mac_bw->mac_bw_drop_bytes += sz;
                    mutex_exit(&mac_bw->mac_bw_lock);
                    freemsgchain(head);
                }

Here's the dtrace proving it.

rpz@thunderhead:~$ pfexec dtrace -qn 'mac_rx_srs_process:entry { this->srs = (mac_soft_ring_set_t *)arg1; } mac_rx_srs_process:entry /this->srs->srs_type & 0x00000200/ { self->t = 1; } mac_rx_srs_process:return /self->t/ { self->t = 0; } freemsgchain:entry /self->t/ { printf("freemsgchain %u\n", msgsize(args[0])); }'
freemsgchain 1514
freemsgchain 1514
freemsgchain 1514
freemsgchain 1514
freemsgchain 1514
freemsgchain 1514
freemsgchain 1514

After fixing that you'll hit another problem, in SRS Rx drain. The mac_srs_pick_chain call will return NULL but also set SRS_BW_ENFORCED, leading us to spin in mac_rx_srs_drain_bw until someone readjusts the maxbw setting. You can see this clearly in this dtrace. Note that I should have instead printed this state on return from mac_srs_pick_chain, but it still gets the point across. It shows that we have all the available bandwidth for this tick, and there is a frame of 1514 to be processed, but SRS_BW_ENFORCED is set so we spin (as indicated by the 2359 times this probe fired in a span of a couple of seconds on a link that is receiving no traffic).

rpz@thunderhead:~$ pfexec dtrace -qn 'mac_srs_pick_chain:entry { @[args[0]->srs_bw->mac_bw_used, args[0]->srs_size, args[0]->srs_bw->mac_bw_state] = count(); }'
^C

                0             1514        0             2359

It was at this point I realized that the easiest thing to do was just raise the mac_bw_limit to allow MTU-per-tick. Otherwise the change was going to start touching multiple places in the data path and effectively would end up at the same result but with more complicated code.

Also available in: Atom PDF