Bug #4302

closed

stack overflow from squeue_drain()

Added by Robert Mustacchi over 8 years ago. Updated over 8 years ago.

Status: Resolved
Priority: Normal
Category: networking
Start date: 2013-11-07
Due date:
% Done: 100%
Estimated time:
Difficulty: Medium
Tags:
Gerrit CR:

Description

A customer doing load testing induced a panic with the following stack trace:

ffffff00b88ad040 ip_xmit+0x23(ffffff1a0ba6d620, ffffff1ad8fede88, 180036060, 34, e3408ec4, 0, 0, ffffff1a04734c68)
ffffff00b88ad270 ire_send_wire_v4+0x345(ffffff19e85fc190, ffffff1a0ba6d620, ffffff19fe980910, ffffff1a04734b00, ffffff19e771b618)
ffffff00b88ad2f0 conn_ip_output+0x250(ffffff1a0ba6d620, ffffff1a04734b00)
ffffff00b88ad320 tcp_send_data+0x84(ffffff1a04762300, ffffff1a0ba6d620)
ffffff00b88ad350 tcp_ack_timer+0x125(ffffff1a04762000)
ffffff00b88ad390 tcp_timer_handler+0x4a(ffffff1a04762000, ffffff1a046ee590, ffffff1992146d00, 0)
ffffff00b88ad5b0 squeue_drain+0x1f8(ffffff1992146d00, 4, ae52030a8b)
ffffff00b88ad640 squeue_enter+0x4fe(ffffff1992146d00, ffffff19f133dcc0, ffffff19f133dcc0, 1, ffffff00b88ad780, 4, 4)
ffffff00b88ad710 ip_fanout_v4+0x48d(ffffff19f133dcc0, ffffff19f115f310, ffffff00b88ad780)
ffffff00b88ad9b0 ire_send_local_v4+0x4ce(ffffff19c463e498, ffffff19f133dcc0, ffffff19f115f310, ffffff1ae5155ac0, ffffff19d5492908)
ffffff00b88ada30 conn_ip_output+0x250(ffffff19f133dcc0, ffffff1ae5155ac0)
ffffff00b88ada60 tcp_send_data+0x84(ffffff1ae5154a80, ffffff19f133dcc0)
ffffff00b88adab0 tcp_timer+0xac3(ffffff1ae5154780)
ffffff00b88adaf0 tcp_timer_handler+0x4a(ffffff1ae5154780, ffffff1a05f898d8, ffffff1992146400, 0)
ffffff00b88add10 squeue_drain+0x1f8(ffffff1992146400, 4, ae5202f0df)
ffffff00b88adda0 squeue_enter+0x4fe(ffffff1992146400, ffffff19eda81540, ffffff19eda81540, 1, ffffff00b88adee0, 4, 4)
ffffff00b88ade70 ip_fanout_v4+0x48d(ffffff19eda81540, ffffff1addecc590, ffffff00b88adee0)
ffffff00b88ae110 ire_send_local_v4+0x4ce(ffffff19d1dc9b18, ffffff19eda81540, ffffff1addecc590, ffffff19f44a7640, ffffff19d54926a8)
ffffff00b88ae190 conn_ip_output+0x250(ffffff19eda81540, ffffff19f44a7640)
ffffff00b88ae1c0 tcp_send_data+0x84(ffffff19f44a3300, ffffff19eda81540)
ffffff00b88ae210 tcp_timer+0xac3(ffffff19f44a3000)
ffffff00b88ae250 tcp_timer_handler+0x4a(ffffff19f44a3000, ffffff19f4487268, ffffff19921467c0, 0)
ffffff00b88ae470 squeue_drain+0x1f8(ffffff19921467c0, 4, ae52027e0d)
ffffff00b88ae500 squeue_enter+0x4fe(ffffff19921467c0, ffffff1adb26a760, ffffff1adb26a760, 1, ffffff00b88ae640, 4, 4)
ffffff00b88ae5d0 ip_fanout_v4+0x48d(ffffff1adb26a760, ffffff1adb28ea10, ffffff00b88ae640)
ffffff00b88ae870 ire_send_local_v4+0x4ce(ffffff19d1dc9b18, ffffff1adb26a760, ffffff1adb28ea10, ffffff19f44d7300, ffffff19d54926a8)
ffffff00b88ae8f0 conn_ip_output+0x250(ffffff1adb26a760, ffffff19f44d7300)
ffffff00b88ae920 tcp_send_data+0x84(ffffff19f450fbc0, ffffff1adb26a760)
ffffff00b88ae970 tcp_timer+0xac3(ffffff19f450f8c0)
ffffff00b88ae9b0 tcp_timer_handler+0x4a(ffffff19f450f8c0, ffffff19f44ed538, ffffff1992146b80, 0)
ffffff00b88aebd0 squeue_drain+0x1f8(ffffff1992146b80, 4, ae520241a1)
ffffff00b88aec60 squeue_enter+0x4fe(ffffff1992146b80, ffffff1a0a7e8540, ffffff1a0a7e8540, 1, ffffff00b88aeda0, 4, 4)
ffffff00b88aed30 ip_fanout_v4+0x48d(ffffff1a0a7e8540, ffffff19edb5bb90, ffffff00b88aeda0)
ffffff00b88aefd0 ire_send_local_v4+0x4ce(ffffff19d0a084b0, ffffff1a0a7e8540, ffffff19edb5bb90, ffffff19e7e29480, ffffff19921320b0)
ffffff00b88af050 conn_ip_output+0x250(ffffff1a0a7e8540, ffffff19e7e29480)
ffffff00b88af080 tcp_send_data+0x84(ffffff19e7e86a80, ffffff1a0a7e8540)
ffffff00b88af0d0 tcp_timer+0xac3(ffffff19e7e86780)
ffffff00b88af110 tcp_timer_handler+0x4a(ffffff19e7e86780, ffffff19e9d69458, ffffff1992146880, 0)
ffffff00b88af330 squeue_drain+0x1f8(ffffff1992146880, 4, ae5201bb5c)
ffffff00b88af3c0 squeue_enter+0x4fe(ffffff1992146880, ffffff19ebb3b720, ffffff19ebb3b720, 1, ffffff00b88af500, 4, 4)
ffffff00b88af490 ip_fanout_v4+0x48d(ffffff19ebb3b720, ffffff19ebb2a890, ffffff00b88af500)
ffffff00b88af730 ire_send_local_v4+0x4ce(ffffff19d1dc9b18, ffffff19ebb3b720, ffffff19ebb2a890, ffffff19f44bf800, ffffff19d54926a8)
ffffff00b88af7b0 conn_ip_output+0x250(ffffff19ebb3b720, ffffff19f44bf800)
ffffff00b88af7e0 tcp_send_data+0x84(ffffff19f44be400, ffffff19ebb3b720)
ffffff00b88af830 tcp_timer+0xac3(ffffff19f44be100)
ffffff00b88af870 tcp_timer_handler+0x4a(ffffff19f44be100, ffffff19f4499480, ffffff1998e31e00, 0)
ffffff00b88afa90 squeue_drain+0x1f8(ffffff1998e31e00, 4, ae5201a634)
ffffff00b88afb20 squeue_enter+0x4fe(ffffff1998e31e00, ffffff19efa45780, ffffff19efa45780, 1, ffffff00b88afc60, 4, 4)
ffffff00b88afbf0 ip_fanout_v4+0x48d(ffffff19efa45780, ffffff19efa485d0, ffffff00b88afc60)
ffffff00b88afe90 ire_send_local_v4+0x4ce(ffffff19d1dc9b18, ffffff19efa45780, ffffff19efa485d0, ffffff19f44bfc80, ffffff19d54926a8)
ffffff00b88aff10 conn_ip_output+0x250(ffffff19efa45780, ffffff19f44bfc80)
ffffff00b88aff40 tcp_send_data+0x84(ffffff19f44beb40, ffffff19efa45780)
ffffff00b88aff90 tcp_timer+0xac3(ffffff19f44be840)
ffffff00b88affd0 tcp_timer_handler+0x4a(ffffff19f44be840, ffffff19f4499588, ffffff19921461c0, 0)
ffffff00b88b01f0 squeue_drain+0x1f8(ffffff19921461c0, 4, ae52018884)
ffffff00b88b0280 squeue_enter+0x4fe(ffffff19921461c0, ffffff1ad9f6e0c0, ffffff1ad9f6e0c0, 1, ffffff00b88b04e0, 4, 4)
ffffff00b88b0350 ip_fanout_v4+0x48d(ffffff1ad9f6e0c0, ffffff1aded57d90, ffffff00b88b04e0)
ffffff00b88b03d0 ire_recv_local_v4+0x366(ffffff19c5f50080, ffffff1ad9f6e0c0, ffffff1aded57d90, ffffff00b88b04e0)
ffffff00b88b04b0 ill_input_short_v4+0x6ce(ffffff1ad9f6e0c0, ffffff1aded57d90, ffffff1aded57da0, ffffff00b88b04e0, ffffff00b88b0670)
ffffff00b88b06e0 ip_input+0x23b(ffffff19b4f617a8, 0, ffffff1ad9f6e0c0, ffffff00b88b0740)
ffffff00b88b07d0 i_dls_link_rx+0x2e7(ffffff19c0125570, 0, ffffff1ad9f6e0c0, 0)
ffffff00b88b0810 mac_rx_deliver+0x5d(ffffff19b38bc010, 0, ffffff1ad9f6e0c0, 0)
ffffff00b88b08a0 mac_rx_soft_ring_process+0x17a(ffffff19b38bc010, ffffff19c044c000, ffffff1ad9f6e0c0, ffffff1ad9f6e0c0, 1, 0)
ffffff00b88b1160 mac_rx_srs_fanout+0x823(ffffff19c04089c0, ffffff1ad9f6e0c0)
ffffff00b88b11e0 mac_rx_srs_drain+0x261(ffffff19c04089c0, 800)
ffffff00b88b1270 mac_rx_srs_process+0x1a9(ffffff1995775008, ffffff19c04089c0, ffffff1ad9f6e0c0, 0)
ffffff00b88b1360 mac_tx_send+0x334(ffffff19b38b74a8, ffffff19ac1e0788, ffffff1ad9f6e0c0, ffffff00b88b1370)
ffffff00b88b13f0 mac_tx_single_ring_mode+0x91(ffffff19c0be3300, ffffff1ad9f6e0c0, e33e89dc, 1, 0)
ffffff00b88b1480 mac_tx+0x35d(ffffff19b38b74a8, ffffff1ad9f6e0c0, e33e89dc, 1, 0)
ffffff00b88b14e0 str_mdata_fastpath_put+0xa4(ffffff19d18bbd38, ffffff1ad9f6e0c0, e33e89dc, 1)
ffffff00b88b1630 ip_xmit+0x7eb(ffffff1ad9f6e0c0, ffffff1ad8fedd80, 180036060, 3c, e33e89dc, 0, 0, ffffff19f44b3168)
ffffff00b88b1860 ire_send_wire_v4+0x345(ffffff19e85fc450, ffffff1ad9f6e0c0, ffffff1aded57d90, ffffff19f44b3000, ffffff19e771b7e0)
ffffff00b88b18e0 conn_ip_output+0x250(ffffff1ad9f6e0c0, ffffff19f44b3000)
ffffff00b88b1910 tcp_send_data+0x84(ffffff19f44ee480, ffffff1ad9f6e0c0)
ffffff00b88b1960 tcp_timer+0xac3(ffffff19f44ee180)
ffffff00b88b19a0 tcp_timer_handler+0x4a(ffffff19f44ee180, ffffff19f44eded8, ffffff1992146ac0, 0)
ffffff00b88b1bc0 squeue_drain+0x1f8(ffffff1992146ac0, 2, ae52011fa8)
ffffff00b88b1c20 squeue_worker+0x132(ffffff1992146ac0)
ffffff00b88b1c30 thread_start+8()

The problem is that this thread is the squeue worker for one squeue, but via squeue_enter() it is trying to drain 7 other squeues in place, each nested inside the previous one. This is clearly out of hand. We need a stack check in squeue_drain() to ensure that when a non-worker calls squeue_drain() without much stack remaining, we defer to the worker instead of draining in place.
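
To illustrate the kind of check proposed here, the following is a minimal user-space sketch, not the actual fix. Every name in it (model_squeue, model_enter, model_drain, model_run_chain) and the stack size and headroom threshold are hypothetical and chosen only for illustration. The per-level cost of 0x760 bytes is taken from the spacing between successive squeue_drain frames in the trace above. Stack usage is modelled as an explicit byte budget rather than measured from real addresses.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical figures for the model; only the frame cost comes from the trace. */
#define MODEL_STACK_SIZE   20480u  /* assumed stack available to the worker thread */
#define MODEL_FRAME_COST   0x760u  /* spacing of nested squeue_drain frames above */
#define MODEL_DRAIN_NEEDED 10240u  /* assumed headroom required to drain in place */

struct model_squeue {
	int pending;                /* items still queued */
	int processed;              /* items drained in place */
	bool deferred;              /* set when work is left for this queue's worker */
	struct model_squeue *next;  /* queue re-entered while processing an item */
};

static void model_enter(struct model_squeue *q, size_t stack_left);

/* Drain q in place; processing each item may enter the next queue. */
static void
model_drain(struct model_squeue *q, size_t stack_left)
{
	assert(stack_left >= MODEL_FRAME_COST);
	while (q->pending > 0) {
		q->pending--;
		q->processed++;
		if (q->next != NULL)
			model_enter(q->next, stack_left - MODEL_FRAME_COST);
	}
}

/* Non-worker entry point: drain in place only if enough stack remains. */
static void
model_enter(struct model_squeue *q, size_t stack_left)
{
	if (stack_left < MODEL_DRAIN_NEEDED) {
		q->deferred = true;  /* leave the items queued for the worker */
		return;
	}
	model_drain(q, stack_left);
}

/*
 * Chain n queues with one pending item each, drain the first one as its
 * worker would, and return how many items were processed in place.
 */
static int
model_run_chain(struct model_squeue *qs, int n, size_t stack_size)
{
	int in_place = 0;

	for (int i = 0; i < n; i++) {
		qs[i].pending = 1;
		qs[i].processed = 0;
		qs[i].deferred = false;
		qs[i].next = (i + 1 < n) ? &qs[i + 1] : NULL;
	}
	model_drain(&qs[0], stack_size);
	for (int i = 0; i < n; i++)
		in_place += qs[i].processed;
	return in_place;
}
```

With these assumed numbers, a chain of eight queues (the worker's own plus seven nested ones, as in the trace) stops nesting once the remaining budget falls below the threshold. The queue reached at that point is marked deferred and keeps its pending item for its own worker, so the nesting depth is bounded by the stack budget rather than by the number of queues involved.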

Actions #1

Updated by Robert Mustacchi over 8 years ago

  • Status changed from New to Resolved

Resolved in 1ddb55e64530b7536b48b3ce8a2a9ad78793aa40.
