Bug #12481
attempting to change MTU size of aggregate over mlxcx can cause dladm to hang
Start date:
Due date:
% Done:
100%
Estimated time:
Difficulty:
Medium
Tags:
Gerrit CR:
Description
If you create an aggr over mlxcx devices and then issue the command:
dladm set-linkprop -p mtu=1500 aggr1
and the link is in use, you will get one of two different hangs.
Either
> ffffd0328837dba0::findstack -v stack pointer for thread ffffd0328837dba0 (dladm/1): ffffd000f63ff690 [ ffffd000f63ff690 _resume_from_idle+0x12b() ] ffffd000f63ff6c0 swtch+0x133() ffffd000f63ff700 cv_wait+0x68(ffffd0323fb00568, ffffd0323fb00520) ffffd000f63ff760 mlxcx_mac_ring_stop+0x11c(ffffd0328d401000) ffffd000f63ff790 mac_stop_ring+0x29(ffffd03277726720) ffffd000f63ff7c0 mac_stop_group_and_rings+0x38(ffffd03234f3c000) ffffd000f63ff7f0 mac_stop+0x33(ffffd031f41fc5c0) ffffd000f63ff850 mac_client_datapath_teardown+0x1c9(ffffd031f365c718, ffffd0328f7a1798, 0) ffffd000f63ff8a0 mac_unicast_remove+0x29b(ffffd031f365c718, ffffd0328f7a1798) ffffd000f63ff920 aggr_set_port_sdu+0x47(ffffd03cf66a5128, ffffd03345ccb308, 5dc, ffffd0328f2dbb08) ffffd000f63ff980 aggr_sdu_update+0x91(ffffd03cf66a5128, 5dc) ffffd000f63ff9e0 aggr_m_setprop+0x6c(ffffd03cf66a5128, fffffffffba08cdb, 6, 4, ffffd000f63ff9fc) ffffd000f63ffa50 mac_set_mtu+0xc4(ffffd03c89987138, 5dc, 0) ffffd000f63ffad0 mac_set_prop+0x136(ffffd03c89987138, 6, ffffd03c265e6010, ffffd03c265e6114, 4) ffffd000f63ffba0 drv_ioc_prop_common+0x4dd(ffffd03ea565aa80, 807e3c0, 1, ffffd0334ce09860, 100003) ffffd000f63ffbe0 drv_ioc_setprop+0x1b(ffffd03ea565aa80, 807e3c0, 100003, ffffd0334ce09860, ffffd000f63ffdd8) ffffd000f63ffc80 drv_ioctl+0x1ef(1200000000, d1d001b, 807e3c0, 100003, ffffd0334ce09860, ffffd000f63ffdd8) ffffd000f63ffcc0 cdev_ioctl+0x2b(1200000000, d1d001b, 807e3c0, 100003, ffffd0334ce09860, ffffd000f63ffdd8) ffffd000f63ffd10 spec_ioctl+0x45(ffffd031f0351500, d1d001b, 807e3c0, 100003, ffffd0334ce09860, ffffd000f63ffdd8, 0) ffffd000f63ffda0 fop_ioctl+0x5b(ffffd031f0351500, d1d001b, 807e3c0, 100003, ffffd0334ce09860, ffffd000f63ffdd8, 0) ffffd000f63ffec0 ioctl+0x153(3, d1d001b, 807e3c0) ffffd000f63fff10 _sys_sysenter_post_swapgs+0x14f()
Some analysis of this particular case:
The function mlxcx_mac_ring_stop() is waiting here:
mutex_enter(&s->mlbs_mtx); while (!list_is_empty(&s->mlbs_busy)) cv_wait(&s->mlbs_free_nonempty, &s->mlbs_mtx); while ((buf = list_head(&s->mlbs_free)) != NULL) { mlxcx_buf_destroy(mlxp, buf); } mutex_exit(&s->mlbs_mtx);
If we take a look at the mlbs_busy list, we see:
> ffffd0323fb00528::walk list|::print -t mlxcx_buffer_t mlb_state mlb_mp mlxcx_buffer_state_t mlb_state = 3 (MLXCX_BUFFER_ON_LOAN) mblk_t *mlb_mp = 0xffffd03cc8d0aec0 mlxcx_buffer_state_t mlb_state = 3 (MLXCX_BUFFER_ON_LOAN) mblk_t *mlb_mp = 0xffffd03c81a2e8c0 mlxcx_buffer_state_t mlb_state = 3 (MLXCX_BUFFER_ON_LOAN) mblk_t *mlb_mp = 0xffffd03e82f7b240
There are 3 buffers "on loan". These are receive buffers which have been passed up the stack, and in this case they are being held on to.
With some judicious kgrep'ing, I found this thread:
> ffffd000fb4a9c20::findstack -v stack pointer for thread ffffd000fb4a9c20 (aggr_lacp_rx_thread()): ffffd000fb4a9a50 [ ffffd000fb4a9a50 _resume_from_idle+0x12b() ] ffffd000fb4a9a80 swtch+0x133() ffffd000fb4a9ac0 cv_wait+0x68(ffffd03c89987224, ffffd03c89987210) ffffd000fb4a9b00 i_mac_perim_enter+0x5b(ffffd03c89987138) ffffd000fb4a9b30 mac_perim_enter_by_mh+0x23(ffffd03c89987138, ffffd000fb4a9b40) ffffd000fb4a9b70 aggr_lacp_rx+0x3d(ffffd03cc8d0aec0) ffffd000fb4a9c00 aggr_lacp_rx_thread+0xd2(ffffd03cf66a5128) ffffd000fb4a9c10 thread_start+0xb()
The argument to aggr_lacp_rx() is a mblk, and it is one of the mlxcx mblk. This thread is blocked in i_mac_perim_enter() because it wants to be the owner.
A closer look at the stack in mlxcx reveals a call to mac_set_mtu() with the same address as the argument to i_mac_perim_enter(). Which is a mac_impl_t:
> ffffd03c89987138::print -t mac_impl_t mi_perim_owner kthread_t *mi_perim_owner = 0xffffd0328837dba0
which is indeed owned by the thread blocked in mlxcx.
So, this is the deadlock: mlxcx thread is the perimeter owner, and is blocked because it has a mblk which is in use. That in use mblk is owned by another aggr thread which wants to be the perimeter owner.
The solution is not to block the mlxcx_mac_ring_stop() because of loaned mblk, but we do need to handle them be returned to use after the ring has stopped.
And another cause:
fffffe63205b2c20 SLEEP MUTEX 1 swtch+0x86 turnstile_block+0x25b mutex_vector_enter+0x358 mlxcx_mac_ring_tx+0x170 mac_hwring_tx+0x1c mac_hwring_send_priv+0x90 aggr_ring_tx+0x1a mac_hwring_tx+0x1c mac_tx_send+0x63c mac_tx_soft_ring_process+0x89 mac_tx_aggr_mode+0x8c mac_tx+0xdb str_mdata_fastpath_put+0x8e ip_xmit+0x841 ire_send_wire_v4+0x345 conn_ip_output+0x1d4 tcp_send_data+0x58 tcp_send+0x8c2 tcp_wput_data+0x68a tcp_input_data+0xab7 squeue_enter+0x3f9 ip_fanout_v4+0xbca ip_input_local_v4+0xc6 ire_recv_local_v4+0x131 ill_input_short_v4+0x3ff ip_input_common_v4+0x23f ip_input+0x1f mac_rx_soft_ring_process+0x1be mac_rx_srs_proto_fanout+0x29a mac_rx_srs_drain+0x2e5 mac_rx_srs_process+0x123 mac_rx_classify+0x7c mac_rx_flow+0x58 mac_rx_common+0x1d6 mac_rx+0xc6 mac_rx_ring+0x1f aggr_mac_rx+0x1c aggr_recv_path_cb+0x12f aggr_recv_cb+0xd mac_rx_common+0x20b mac_rx+0xc6 mac_rx_ring+0x1f mlxcx_intr_n+0x191 apix_dispatch_by_vector+0x8c apix_dispatch_lowlevel+0x1c
In this case the mac_set_mtu() is holding the WQ mutex (amongst others), and waiting for the bufs to drain. But the stack, is waiting for the same mutex, but has just taken a buffer.... deadlock again.
Updated by Paul Winder 10 months ago
The problem is the mlxcx mac stop callback function mlxcx_mac_ring_stop() grabs the completion and work queue mutex and then waits for the buffer queues to drain whilst holding them. Holding the completion mutex blocks the interrupt routine, so we need to shutdown the queues, release the locks and then wait for the buffer lists to drain.
Updated by Electric Monk 9 months ago
- Status changed from In Progress to Closed
- % Done changed from 70 to 100
git commit 19325e87abbf92e73010df2342fc48019027aee1
commit 19325e87abbf92e73010df2342fc48019027aee1 Author: Paul Winder <pwinder@racktopsystems.com> Date: 2020-04-30T07:09:30.000Z 12481 attempting to change MTU size of aggregate over mlxcx can cause dladm to hang Reviewed by: Garrett D'Amore <garrett@damore.org> Reviewed by: Andy Stormont <astormont@racktopsystems.com> Reviewed by: Dan McDonald <danmcd@joyent.com> Reviewed by: Robert Mustacchi <rm@fingolfin.org> Approved by: Garrett D'Amore <garrett@damore.org>