Bug #1925

stack overflow from mac code

Added by Robert Mustacchi over 2 years ago. Updated almost 2 years ago.

Status: Resolved
Start date: 2011-12-22
Priority: Urgent
Due date:
Assignee: Robert Mustacchi
% Done: 100%
Category: kernel
Spent time: -
Target version: -
Difficulty: Medium
Tags:

Description

Joyent observed a panic with the following stack trace:

ffffff00f7b2a160 _interrupt+0x4b()
ffffff00f7b2a230 ip_fanout_v4+0x48d(ffffff298c32d960, ffffff246c2c0d90, ffffff00f7b2a3c0)
ffffff00f7b2a2b0 ire_recv_local_v4+0x366(ffffff376ce06008, ffffff298c32d960, ffffff246c2c0d90, ffffff00f7b2a3c0)
ffffff00f7b2a390 ill_input_short_v4+0x6ce(ffffff298c32d960, ffffff246c2c0d90, ffffff246c2c0da0, ffffff00f7b2a3c0, ffffff00f7b2a550)
ffffff00f7b2a5c0 ip_input+0x23b(ffffff3de8c23528, 0, ffffff298c32d960, ffffff00f7b2a620)
ffffff00f7b2a6b0 i_dls_link_rx+0x2e7(ffffff29f93e96f0, 0, ffffff298c32d960, 0)
ffffff00f7b2a6f0 mac_rx_deliver+0x5d(ffffff229fc538d0, 0, ffffff298c32d960, 0)
ffffff00f7b2a780 mac_rx_soft_ring_process+0x17a(ffffff229fc538d0, ffffff23fef56680, ffffff298c32d960, ffffff298c32d960, 1, 0)
ffffff00f7b2a8c0 mac_rx_srs_proto_fanout+0x4e5(ffffff2267e24340, ffffff298c32d960)
ffffff00f7b2a940 mac_rx_srs_drain+0x26e(ffffff2267e24340, 800)
ffffff00f7b2a9d0 mac_rx_srs_process+0x180(ffffff21f94ae018, ffffff2267e24340, ffffff298c32d960, 0)
ffffff00f7b2aac0 mac_tx_send+0x334(ffffff229fc54100, ffffff21f943f788, ffffff298c32d960, ffffff00f7b2aad0)
ffffff00f7b2ab50 mac_tx_single_ring_mode+0x91(ffffff2204f46000, ffffff298c32d960, e49e066f, 1, 0)
ffffff00f7b2abe0 mac_tx+0x35d(ffffff229fc54100, ffffff298c32d960, e49e066f, 1, 0)
ffffff00f7b2ac40 str_mdata_fastpath_put+0xa4(ffffff220d9f8150, ffffff298c32d960, e49e066f, 1)
ffffff00f7b2ad90 ip_xmit+0x7eb(ffffff298c32d960, ffffff3de9087c90, 180036060, 34, e49e066f, 0, 0, ffffff24ef68e628)
ffffff00f7b2afc0 ire_send_wire_v4+0x345(ffffff376cb47010, ffffff298c32d960, ffffff246c2c0d90, ffffff24ef68e4c0, ffffff2801959920)
ffffff00f7b2b040 conn_ip_output+0x250(ffffff298c32d960, ffffff24ef68e4c0)
ffffff00f7b2b0c0 tcp_xmit_end+0x12a(ffffff24f0337b40)
ffffff00f7b2b110 tcp_close_output+0x142(ffffff24f0337840, ffffff24f0337e40, ffffff21d8c0d580, 0)
ffffff00f7b2b330 squeue_drain+0x1f8(ffffff21d8c0d580, 4, 2f69a5de6cca8)
ffffff00f7b2b3c0 squeue_enter+0x4fe(ffffff21d8c0d580, ffffff240508a580, ffffff240508a580, 1, ffffff00f7b2b620, 4, 4)
ffffff00f7b2b490 ip_fanout_v4+0x48d(ffffff240508a580, ffffff298b6f9b10, ffffff00f7b2b620)
ffffff00f7b2b510 ire_recv_local_v4+0x366(ffffff3de86c29d0, ffffff240508a580, ffffff298b6f9b10, ffffff00f7b2b620)
ffffff00f7b2b5f0 ill_input_short_v4+0x6ce(ffffff240508a580, ffffff298b6f9b10, ffffff298b6f9b20, ffffff00f7b2b620, ffffff00f7b2b7b0)
ffffff00f7b2b820 ip_input+0x23b(ffffff3895e0cb28, 0, ffffff240508a580, ffffff00f7b2b880)
ffffff00f7b2b910 i_dls_link_rx+0x2e7(ffffff29f93e9598, 0, ffffff240508a580, 0)
ffffff00f7b2b950 mac_rx_deliver+0x5d(ffffff229fc54100, 0, ffffff240508a580, 0)
ffffff00f7b2b9e0 mac_rx_soft_ring_process+0x17a(ffffff229fc54100, ffffff2403373780, ffffff240508a580, ffffff240508a580, 1, 0)
ffffff00f7b2bb20 mac_rx_srs_proto_fanout+0x4e5(ffffff2267e23680, ffffff240508a580)
ffffff00f7b2bba0 mac_rx_srs_drain+0x26e(ffffff2267e23680, 800)
ffffff00f7b2bc30 mac_rx_srs_process+0x180(ffffff21f94ae018, ffffff2267e23680, ffffff240508a580, 0)
ffffff00f7b2bd20 mac_tx_send+0x334(ffffff229fc538d0, ffffff21f943f788, ffffff240508a580, ffffff00f7b2bd30)
ffffff00f7b2bdb0 mac_tx_single_ring_mode+0x91(ffffff23ffc509c0, ffffff240508a580, e49e21da, 1, 0)
ffffff00f7b2be40 mac_tx+0x35d(ffffff229fc538d0, ffffff240508a580, e49e21da, 1, 0)
ffffff00f7b2bea0 str_mdata_fastpath_put+0xa4(ffffff23af353888, ffffff240508a580, e49e21da, 1)
ffffff00f7b2bff0 ip_xmit+0x7eb(ffffff240508a580, ffffff24286ad290, 180036061, 34, e49e21da, 0, 0, ffffff24f10f6168)
ffffff00f7b2c220 ire_send_wire_v4+0x345(ffffff376ce062c8, ffffff240508a580, ffffff298b6f9b10, ffffff24f10f6000, ffffff28019594f8)
ffffff00f7b2c2a0 conn_ip_output+0x250(ffffff240508a580, ffffff24f10f6000)
ffffff00f7b2c320 tcp_xmit_end+0x12a(ffffff24f10ed400)
ffffff00f7b2c370 tcp_close_output+0x142(ffffff24f10ed100, ffffff24f10ed700, ffffff21d8c0d4c0, 0)
ffffff00f7b2c590 squeue_drain+0x1f8(ffffff21d8c0d4c0, 4, 2f69a5de3e68e)
ffffff00f7b2c620 squeue_enter+0x4fe(ffffff21d8c0d4c0, ffffff24ffe70ac0, ffffff24ffe70ac0, 1, ffffff00f7b2c880, 4, 4)
ffffff00f7b2c6f0 ip_fanout_v4+0x48d(ffffff24ffe70ac0, ffffff24b475c090, ffffff00f7b2c880)
ffffff00f7b2c770 ire_recv_local_v4+0x366(ffffff2987f662d8, ffffff24ffe70ac0, ffffff24b475c090, ffffff00f7b2c880)
ffffff00f7b2c850 ill_input_short_v4+0x6ce(ffffff24ffe70ac0, ffffff24b475c090, ffffff24b475c0a0, ffffff00f7b2c880, ffffff00f7b2ca10)
ffffff00f7b2ca80 ip_input+0x23b(ffffff3894c24728, 0, ffffff24ffe70ac0, ffffff00f7b2cae0)
ffffff00f7b2cb70 i_dls_link_rx+0x2e7(ffffff29f93e99a0, 0, ffffff24ffe70ac0, 0)
ffffff00f7b2cbb0 mac_rx_deliver+0x5d(ffffff23d956f8c8, 0, ffffff24ffe70ac0, 0)

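As the stack above shows, transmitting through mac can deliver the packet straight back into the receive path, and processing that packet can trigger another transmit. This path can therefore be entered again and again without popping any stack frames, and we ended up exhausting the stack and causing a double fault. The solution that we have implemented is similar to what is done in putnext: we have added a check on the amount of stack space remaining and, when too little is left, hand the work to the asynchronous worker instead of continuing inline. The threshold is a tunable, but our experience shows it wants to be around 10k.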
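The check described above can be sketched in userland C. This is a minimal illustration and not the actual mac change: the names `stack_min_free`, `stack_free_bytes`, `choose_delivery` and `simulate_nesting` are hypothetical, the stack is modelled as a plain address range that grows downward, and the 20480-byte stack size used in the note below is an assumption. The 4704-byte cost per cycle is taken from the trace, where consecutive mac_rx_deliver frames sit 0x1260 bytes apart.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical tunable: free stack (bytes) required to keep going inline. */
static size_t stack_min_free = 10 * 1024;

typedef enum { DELIVER_INLINE, DELIVER_DEFERRED } deliver_mode_t;

/*
 * Bytes left between the current stack pointer and the lowest usable
 * stack address, assuming a downward-growing stack.
 */
static size_t
stack_free_bytes(uintptr_t sp, uintptr_t stack_limit)
{
	return (sp > stack_limit ? (size_t)(sp - stack_limit) : 0);
}

/*
 * Decide whether to deliver inline or hand off to an asynchronous
 * worker, based only on how much stack is left.
 */
static deliver_mode_t
choose_delivery(uintptr_t sp, uintptr_t stack_limit)
{
	if (stack_free_bytes(sp, stack_limit) < stack_min_free)
		return (DELIVER_DEFERRED);
	return (DELIVER_INLINE);
}

/*
 * Simulate repeated tx -> rx -> tx nesting, each cycle consuming
 * bytes_per_cycle of stack, and return how many cycles run inline
 * before the check defers the work.
 */
static int
simulate_nesting(size_t stack_size, size_t bytes_per_cycle)
{
	uintptr_t limit = 0x100000;	/* arbitrary simulated address */
	uintptr_t sp = limit + stack_size;
	int depth = 0;

	assert(bytes_per_cycle > 0);
	while (choose_delivery(sp, limit) == DELIVER_INLINE &&
	    stack_free_bytes(sp, limit) >= bytes_per_cycle) {
		sp -= bytes_per_cycle;
		depth++;
	}
	return (depth);
}
```

Under these assumptions, `simulate_nesting(20480, 4704)` allows three nested cycles inline and defers the fourth, leaving 6368 bytes of stack unused at the point of hand-off. Deferring breaks the recursion because the worker starts from a fresh stack.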

History

Updated by Robert Mustacchi over 2 years ago

  • Category set to kernel

Updated by Rich Lowe over 2 years ago

  • Assignee set to Robert Mustacchi
  • % Done changed from 0 to 100
  • Tags deleted (needs-triage)

Resolved in r13552:375fcb299d27

Updated by Rich Lowe almost 2 years ago

  • Status changed from New to Resolved
