Bug #7572
vioif panic: qe->qe_indirect_next < qe->qe_queue->vq_indirect_num
Status: Closed
Description
While testing a fix to enable MSI-X interrupts on KVM virtio devices, I ran into the following panic:
panic[cpu0]/thread=ffffff0007acbc40:
assertion failed: qe->qe_indirect_next < qe->qe_queue->vq_indirect_num, file: ../../common/io/virtio/virtio.c, line: 657
The stack trace is:
::stack
vpanic()
0xfffffffffba8bd94()
virtio_ve_add_indirect_buf+0xc5(ffffff026625a680, 109deb7a0, 40, 1)
virtio_ve_add_cookie+0x54()
vioif_send+0x288(ffffff02552fe900, ffffff025381a1a0)
vioif_tx+0x4d(ffffff02552fe900, ffffff025381a1a0)
mac_tx+0x594(ffffff024e9645c8, ffffff025381a1a0, e04b50e9, 1, 0)
str_mdata_fastpath_put+0x53(ffffff0250b4abe8, ffffff025381a1a0, e04b50e9, 1)
ip_xmit+0x9ed(ffffff025381a1a0, ffffff02508fccc0, b80036061, 2888, e04b50e9, 0)
ire_send_wire_v4+0x401(ffffff024f4862d8, ffffff025381a1a0, ffffff025381b510, ffffff0256d6b0c0, ffffff024ece6dc0)
conn_ip_output+0x2bc(ffffff025381a1a0, ffffff0256d6b0c0)
tcp_send_data+0x80(ffffff025a8755c0, ffffff025381a1a0)
tcp_send+0x6d8(ffffff025a8755c0, 51c, 34, 20, 0, ffffff0007acaed4)
tcp_wput_data+0x686(ffffff025a8755c0, 0, 0)
tcp_input_data+0xa54(ffffff025a874dc0, ffffff026c89d360, ffffff024ec5ea40, ffffff0007acb500)
squeue_enter+0x963(ffffff024ec5ea40, ffffff026c89d360, ffffff026c89d360, 1, ffffff0007acb500, 4)
ip_fanout_v4+0xd8f(ffffff026c89d360, ffffff026c89b17e, ffffff0007acb500)
ip_input_local_v4+0x16e(ffffff024f4866f8, ffffff026c89d360, ffffff026c89b17e, ffffff0007acb500)
ire_recv_local_v4+0x172(ffffff024f4866f8, ffffff026c89d360, ffffff026c89b17e, ffffff0007acb500)
ill_input_short_v4+0x568(ffffff026c89d360, ffffff026c89b17e, ffffff026c89b18e, ffffff0007acb500, ffffff0007acb690)
ip_input_common_v4+0x3ba(ffffff024eab91a8, ffffff024f140040, ffffff026c89d360, 0, 0, 0)
ip_input+0x2b(ffffff024eab91a8, ffffff024f140040, ffffff026c89d360, 0)
mac_rx_soft_ring_process+0x19a(ffffff024e9645c8, ffffff025367f040, ffffff026c89d360, ffffff026c89d360, 1, 0)
mac_rx_srs_proto_fanout+0x29a(ffffff0267842300, ffffff026c89d360)
mac_rx_srs_drain+0x34c(ffffff0267842300, 800)
mac_rx_srs_process+0x3ce(ffffff0250b6abb0, ffffff0267842300, ffffff026c89d360, 0)
mac_rx_common+0x143(ffffff0250b6abb0, 0, ffffff026c89d360)
mac_rx+0xb6(ffffff0250b6abb0, 0, ffffff026c89d360)
vioif_process_rx+0x111(ffffff02552fe900)
vioif_rx_handler+0x20(ffffff02552fe908, 0)
av_dispatch_autovect+0x91(1b)
dispatch_hardint+0x36(1b, 0)
ffffff0007a05a70 [stack frame pointer is invalid]
This is readily reproducible with the following test case:
ssh delphix@<ip addr of vioif device>
yes
Something strange is going on in this code:
644 unsigned int
645 virtio_ve_indirect_available(struct vq_entry *qe)
646 {
647         return (qe->qe_queue->vq_indirect_num - (qe->qe_indirect_next - 1));
648 }
In this code, when the two values are equal, the return value is 1, presumably indicating that there is still one free slot in the indirect descriptor array. That is wrong: qe_indirect_next is the next index to be written, and the valid indices run from 0 through vq_indirect_num - 1, so when the two values are equal the array is already full, and using the slot this function reports as free means accessing the array out of bounds. This is precisely the condition the failed assertion guards against.
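To make the boundary case concrete, here is a minimal stand-alone reproduction of the arithmetic (the reduced structs and the main() harness are hypothetical stand-ins; only the calculation itself is taken from virtio.c):

#include <assert.h>
#include <stdio.h>

/* Reduced stand-ins for the two fields involved. */
struct vq { unsigned int vq_indirect_num; };
struct vq_entry { struct vq *qe_queue; unsigned int qe_indirect_next; };

/* The calculation as it appears in virtio.c today. */
static unsigned int
indirect_available(struct vq_entry *qe)
{
        return (qe->qe_queue->vq_indirect_num - (qe->qe_indirect_next - 1));
}

int
main(void)
{
        struct vq q = { .vq_indirect_num = 8 };
        struct vq_entry qe = { .qe_queue = &q, .qe_indirect_next = 8 };

        /*
         * All 8 descriptors (indices 0..7) are in use, yet the function
         * reports one still available; a caller that trusts this will
         * write to index 8.
         */
        printf("reported available: %u\n", indirect_available(&qe));
        assert(qe.qe_indirect_next < q.vq_indirect_num);  /* fires */
        return (0);
}

Compiled and run, this reports one available slot even though all eight are in use, and the assertion, which mirrors the one in virtio_ve_add_indirect_buf(), fires.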
The one place this function is used is vioif_tx_external(), which is inlined into vioif_send() in the stack trace above. The call to virtio_ve_add_cookie() that eventually causes the panic comes immediately after it.
I used mdb to patch the off-by-one error in the running kernel and could no longer reproduce the panic. We need to change virtio_ve_indirect_available() to return one less than it does today, so that it correctly treats vq_indirect_num as the length of the array and qe_indirect_next as the index about to be used.
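In code form, that amounts to dropping the stray "- 1" (a sketch of the change described above, not necessarily the exact patch as committed):

unsigned int
virtio_ve_indirect_available(struct vq_entry *qe)
{
        /*
         * qe_indirect_next is the next index to be filled, so the
         * number of free slots is the plain difference; it reaches 0
         * exactly when the indirect descriptor array is full.
         */
        return (qe->qe_queue->vq_indirect_num - qe->qe_indirect_next);
}

With this change the function returns 0 once qe_indirect_next reaches vq_indirect_num, so a caller that checks availability before adding a cookie can no longer run past the end of the array.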