vmxnet3s rx buffer allocations can result in system unresponsiveness
We've had a number of customer problems related to vmxnet3s devices becoming unreachable over the network. The common factors in these cases are that the vmxnet3s device has jumbo frames enabled and that, as receive throughput stays high for a sustained period, vmxnet3s memory allocations start taking a very long time in interrupt context and ultimately fail.
The issue is that vmxnet3s has a fixed-size pool of pre-allocated receive buffers into which incoming packets are copied. When all of those buffers are in use (i.e., they all hold in-flight data being processed by the networking stack), vmxnet3s starts dynamically allocating memory as needed. Each allocation must be a run of contiguous pages, suitable for DMA, large enough to hold an MTU-sized packet. For a 1500-byte MTU, that's only a single page. For a 9000-byte MTU (jumbo frames), that's 3 contiguous pages. When memory gets fragmented enough, these allocations can take a very long time and ultimately fail.
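To make the page counts above concrete, here is a minimal sketch of the arithmetic. It assumes a 4 KiB page size, which this report does not state, and the names SKETCH_PAGE_SIZE and pages_for_buffer are hypothetical rather than taken from the driver.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: 4 KiB pages. The report does not state the page size. */
#define	SKETCH_PAGE_SIZE	4096u

/*
 * Hypothetical helper: number of whole pages needed to hold one
 * receive buffer of buf_len bytes (a ceiling division).
 */
static size_t
pages_for_buffer(size_t buf_len)
{
	assert(buf_len > 0);
	return ((buf_len + SKETCH_PAGE_SIZE - 1) / SKETCH_PAGE_SIZE);
}
```

With these assumptions, pages_for_buffer(1500) gives 1 and pages_for_buffer(9000) gives 3, so only the jumbo-frame case depends on finding physically contiguous pages.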
vmxnet3s is doing contiguous multi-page allocations in interrupt context, and these allocations can take a long time (many seconds). We should avoid doing these allocations altogether by boosting the size of the buffer pool and dropping packets if we exhaust the pool. Three things we can do to help with this are:
1) Pre-allocate the maximum number of buffers in the buffer pool so that we never have to allocate memory in interrupt context.
2) Increase the size of the buffer pool to increase the likelihood that a pre-allocated buffer will be available even under very heavy load.
3) If a receive interrupt comes in and there are no available buffers in the pool, drop the packet instead of dynamically allocating a new buffer.
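The three points above can be sketched as a fixed pool that is filled once at setup time and never grows. This is an illustration only, not the code from the fix: rx_pool_t and the rx_pool_* functions are hypothetical names, malloc() stands in for the driver's DMA buffer allocation, and the locking a real driver needs is omitted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical fixed-size receive buffer pool. */
typedef struct rx_pool {
	void		**free_bufs;	/* stack of available buffers */
	size_t		nfree;		/* entries currently in free_bufs */
	size_t		capacity;	/* total buffers owned by the pool */
	uint64_t	drops;		/* packets dropped on exhaustion */
} rx_pool_t;

/* Free the buffers currently in the pool; callers return all buffers first. */
static void
rx_pool_fini(rx_pool_t *p)
{
	while (p->nfree > 0)
		free(p->free_bufs[--p->nfree]);
	free(p->free_bufs);
	p->free_bufs = NULL;
}

/* Allocate every buffer up front, outside the receive path. 0 or -1. */
static int
rx_pool_init(rx_pool_t *p, size_t capacity, size_t buf_len)
{
	assert(capacity > 0 && buf_len > 0);
	p->nfree = 0;
	p->capacity = capacity;
	p->drops = 0;
	p->free_bufs = calloc(capacity, sizeof (void *));
	if (p->free_bufs == NULL)
		return (-1);
	for (size_t i = 0; i < capacity; i++) {
		void *buf = malloc(buf_len);
		if (buf == NULL) {
			rx_pool_fini(p);
			return (-1);
		}
		p->free_bufs[p->nfree++] = buf;
	}
	return (0);
}

/*
 * Receive path: take a pre-allocated buffer, or return NULL so the
 * caller drops the packet. This never allocates memory.
 */
static void *
rx_pool_get(rx_pool_t *p)
{
	if (p->nfree == 0) {
		p->drops++;
		return (NULL);
	}
	return (p->free_bufs[--p->nfree]);
}

/* Return a buffer once the networking stack is done with it. */
static void
rx_pool_put(rx_pool_t *p, void *buf)
{
	assert(p->nfree < p->capacity);
	p->free_bufs[p->nfree++] = buf;
}
```

In this sketch the only cost of exhaustion is a counted drop, which the sender's protocol can recover from, rather than a multi-second stall in the interrupt handler. Sizing capacity generously at rx_pool_init() time corresponds to point 2.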
Updated by Electric Monk almost 6 years ago
- Status changed from New to Closed
- % Done changed from 0 to 100
commit 6849994e8263545ed3c0f6f5676e47b38e14f63e
Author: Sebastien Roy <firstname.lastname@example.org>
Date: 2016-11-18T00:22:11.000Z
7582 vmxnet3s MAC callbacks erroneously return DDI_ return values
7583 vmxnet3s rx buffer allocations can result in system unresponsiveness
Reviewed by: Pavel Zakharov <email@example.com>
Reviewed by: Matthew Ahrens <firstname.lastname@example.org>
Reviewed by: Steve Gonczi <email@example.com>
Reviewed by: Robert Mustacchi <firstname.lastname@example.org>
Reviewed by: Dan McDonald <email@example.com>
Approved by: Richard Lowe <firstname.lastname@example.org>