mac_promisc_dispatch CV deadlock when re-entering against mac_promisc_remove
In mac_promisc_dispatch, we call into mac_callback_walker_enter, which amongst other things increments mcbi_walker_cnt to act as a barrier against somebody else destroying the cb while we're using it. The idea is that if someone who wants to destroy it (e.g. mac_promisc_remove) enters while mcbi_walker_cnt is non-zero, they will increment mcbi_del_cnt and then sleep on the mcbi_cv until all the walkers are done. This is basically a slightly hacky reader-writer lock (though some extra features were added with the mcbi_barrier_cnt in pmooney's bhyve changes -- #12674). To prevent a nasty race with another new walker, in mac_callback_walker_enter, if a new walker tries to use the cb while mcbi_del_cnt is >0, then it also sleeps on the CV.
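The counting scheme described above can be sketched as a small single-threaded model. This is only an illustration under stated assumptions, not the illumos implementation: cb_model_t and the model_* functions are hypothetical names, the two fields merely echo mcbi_walker_cnt and mcbi_del_cnt, and a return value of false stands in for "would sleep on the CV".

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical single-threaded model of the barrier described above.
 * This is not the illumos code; "return (false)" stands in for cv_wait().
 */
typedef struct cb_model {
	unsigned walker_cnt;	/* walkers currently inside the list */
	unsigned del_cnt;	/* removers waiting for walkers to drain */
} cb_model_t;

/* A walker is admitted only while no remover is waiting. */
static bool
model_walker_enter(cb_model_t *m)
{
	if (m->del_cnt > 0)
		return (false);		/* would sleep on the CV */
	m->walker_cnt++;
	return (true);
}

/* A walker leaves; in the model this is what lets a remover resume. */
static void
model_walker_exit(cb_model_t *m)
{
	assert(m->walker_cnt > 0);
	m->walker_cnt--;
}

/* A remover proceeds at once only if no walker is active. */
static bool
model_remove_begin(cb_model_t *m)
{
	if (m->walker_cnt == 0)
		return (true);
	m->del_cnt++;
	return (false);			/* would sleep on the CV */
}

/* Called when a sleeping remover is woken; true once it may finish. */
static bool
model_remove_resume(cb_model_t *m)
{
	if (m->walker_cnt != 0)
		return (false);
	assert(m->del_cnt > 0);
	m->del_cnt--;
	return (true);
}
```

In this model, a remover that arrives while a walker is active blocks, and every later walker blocks behind the remover until the first walker has exited.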
Unfortunately, it's possible to re-enter mac_promisc_dispatch from under mac_promisc_dispatch_one, for example as in this thread stack:
fffffbe36151ec20 SLEEP CV 1
    swtch+0x86
    cv_wait+0x68
    mac_callback_walker_enter+0x2b
    mac_promisc_dispatch+0x37
    mac_provider_tx+0x57
    mac_hwring_send_priv+0x1a
    aggr_ring_tx+0x1a
    mac_hwring_tx+0x19
    mac_ring_tx+0x1e
    mac_provider_tx+0x85
    mac_tx_send+0x288
    mac_tx_soft_ring_process+0x89
    mac_tx_aggr_mode+0x8c
    mac_tx+0x1a9
    str_mdata_fastpath_put+0x8e
    ip_xmit+0x841
    ire_send_wire_v4+0x345
    conn_ip_output+0x1d4
    tcp_send_data+0x58
    tcp_input_data+0x1f3a
    squeue_enter+0x3f9
    ip_fanout_v4+0xbca
    ip_input_local_v4+0xc6
    ire_recv_local_v4+0x131
    ill_input_short_v4+0x3ff
    ip_input_common_v4+0x23f
    ip_input+0x1f
    mac_rx_soft_ring_process+0x1be
    mac_rx_srs_fanout+0x395
    mac_rx_srs_drain+0x22b
    mac_rx_srs_process+0x123
    mac_rx_classify+0x88
    mac_rx_flow+0x58
    mac_rx_common+0x23e
    mac_rx+0xc6
    aggr_mac_rx+0x25
    aggr_recv_path_cb+0x11f
    aggr_recv_promisc_cb+0x13
    mac_promisc_dispatch_one+0x9c
    mac_promisc_dispatch+0x82
    mac_rx_common+0x47
    mac_rx+0xc6
    mac_rx_ring+0x1f
    ixgbe_intr_rx_work+0x5c
    ixgbe_intr_msix+0x3f
    apix_dispatch_by_vector+0x8c
    apix_dispatch_lowlevel+0x1c
If we manage to enter the callback walker section and then, while we're looping back around to call ourselves via a stack like this one, another thread enters mac_promisc_remove, the second, looped-back call will go to sleep on the mcbi_cv waiting for mac_promisc_remove, on the only thread which could signal that same CV (at the end of the original mac_promisc_dispatch call, when it calls into
This deadlocks MAC and any thread which attempts to transmit network traffic after this point. It usually ends in a full-system hang, as other threads pile up behind those already blocked on network TX.
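One way to see why nothing can make progress is to write the scenario down as a wait-for relation and check it for a cycle. The sketch below is purely illustrative: in_wait_cycle, NO_WAIT and the numbering of the three parties (0 = the outer mac_promisc_dispatch call, 1 = the re-entrant call on the same thread, 2 = the thread in mac_promisc_remove) are assumptions made for this example, not anything present in MAC.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define	NO_WAIT	(-1)

/*
 * waits_for[i] is the index of the party that must make progress before
 * party i can continue, or NO_WAIT if party i is runnable. Returns true
 * if following the chain from 'start' leads back to 'start'.
 */
static bool
in_wait_cycle(const int *waits_for, size_t n, size_t start)
{
	size_t cur = start;

	assert(start < n);
	for (size_t steps = 0; steps < n; steps++) {
		int next = waits_for[cur];

		if (next == NO_WAIT)
			return (false);	/* chain ends at a runnable party */
		if ((size_t)next == start)
			return (true);	/* came back around: deadlock */
		cur = (size_t)next;
	}
	return (false);
}
```

With the remover present, the relation is { 1, 2, 0 }: the outer call cannot finish until the nested call returns, the nested call waits for the remover because mcbi_del_cnt is non-zero, and the remover waits for the outer call to leave the walker section. Without the remover, the relation is { 1, NO_WAIT, NO_WAIT } and the nested call simply runs to completion.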
Updated by Alex Wilson about 1 year ago
I have a dump of a system thus frozen, if anybody needs it to get further details. My original debugging sessions on it: