Bug #10408
temporarily frozen VNIC
Status: open
Description
Symptom:
Network traffic suddenly stops passing through a VNIC and then resumes, with all pending packets sent out in a burst. The behavior seems to depend on the amount of traffic passing through the interface. On an interface carrying traffic in the MByte/s range, stoppages occur every 2-3 minutes on my production system.
System
2 * Intel(R) Xeon(R) CPU E5-2667 v4 @ 3.20GHz
512 GB RAM
2 * 10-Gigabit X540-AT2
1 * Symbios Logic SAS3008
OmniOS r28
Reproduce
Keep pinging an IP address of the server. At some point it will stop responding to packets. After an interruption of several seconds, all pending ICMP response packets are sent out in quick succession. The following command prints only the replies that took longer than 100 ms, prefixed with the local time:
ping test-machine | perl -n -e '/time=(\S+)/ and $1 > 100 and print localtime(time)." ".$_'
Analysis
Running
dtrace -n 'icmp_send_reply_v4:entry {stack(); printf("%Y %d", walltimestamp, walltimestamp);}'
reveals the following under normal operation
2019 Feb 15 09:45:01 1550220301866123949
  2  49885  icmp_send_reply_v4:entry
      ip`icmp_inbound_v4+0x5fb
      ip`ip_fanout_v4+0xf49
      ip`ip_input_local_v4+0x16e
      ip`ire_recv_local_v4+0x132
      ip`ill_input_short_v4+0x4d6
      ip`ip_input_common_v4+0x3a7
      ip`ip_input+0x2b
      dls`i_dls_link_rx+0x1cd
      mac`mac_rx_deliver+0x37
      mac`mac_rx_soft_ring_process+0x19a
      mac`mac_rx_srs_proto_fanout+0x29a
      mac`mac_rx_srs_drain+0x363
      mac`mac_rx_srs_process+0x3ce
      mac`mac_rx_classify+0x129
      mac`mac_rx_flow+0x63
      mac`mac_rx_common+0x196
      mac`mac_rx+0xb6
      mac`mac_rx_ring+0x2b
      ixgbe`ixgbe_intr_rx_work+0x5c
      ixgbe`ixgbe_intr_msix+0x58
but when ICMP packets are sent out after a blockage, the stack looks like this:
2019 Feb 15 09:45:02 1550220302636098390
  5  49885  icmp_send_reply_v4:entry
      ip`icmp_inbound_v4+0x5fb
      ip`ip_fanout_v4+0xf49
      ip`ip_input_local_v4+0x16e
      ip`ire_recv_local_v4+0x132
      ip`ill_input_short_v4+0x4d6
      ip`ip_input_common_v4+0x3a7
      ip`ip_input+0x2b
      dls`i_dls_link_rx+0x1cd
      mac`mac_rx_deliver+0x37
      mac`mac_rx_soft_ring_drain+0x155
      mac`mac_soft_ring_worker+0x1b9
      unix`thread_start+0x8
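The two stacks above differ in how they bottom out: the normal one ends in ixgbe`ixgbe_intr_msix (interrupt path), while the post-blockage one ends in mac`mac_soft_ring_worker (soft ring worker thread). To get a rough tally of how often each shape shows up, something like the following could be run over saved dtrace output. This is only a sketch: the function name classify_stacks and the file name icmp-stacks.txt are made up for illustration, and it assumes each captured stack contains exactly one of the two frames.

```shell
# classify_stacks (hypothetical helper): count captured stacks by origin.
# Reads dtrace output from the named files, or from stdin if none are given.
# Assumes every stack contains exactly one of the two marker frames.
classify_stacks() {
    awk '
        /ixgbe_intr_msix/      { intr++ }     # reply sent from the interrupt path
        /mac_soft_ring_worker/ { worker++ }   # reply sent from the soft ring worker
        END { printf "interrupt=%d worker=%d\n", intr, worker }
    ' "$@"
}
```

Usage, assuming the dtrace output was redirected to icmp-stacks.txt: `classify_stacks icmp-stacks.txt`. A high worker count clustered around the timestamps of the ping stalls would be consistent with the replies being held back until the soft ring worker drains its queue.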
Updated by Tobias Oetiker over 3 years ago
Additional info
There are several VNICs set up on the physical interface. The blocking behaviour is isolated to individual VNICs: all are affected, but the blocking does not happen on all of them at the same time.
Updated by Tobias Oetiker over 3 years ago
Additional info is available at https://illumos.topicbox.com/groups/developer/T4ecf9c6adb6e4ad7