Feature #7865
Allow i40e to use multiple rings
Status: Closed
% Done: 100%
Description
Currently, i40e has a single ring (and interrupt vector) for processing transmit and receive data, and this seems to limit the potential throughput. The best I could achieve was around 30Gb/s with large MTU sizes.
The plan is to keep the implementation as simple as possible, i.e. a single group with multiple rings assigned to a single PF, as sketched below. Multiple VFs (and groups) could be a separate project.
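For illustration, here is roughly what "a single group with multiple rings" looks like when advertised through the MAC rings capability. This is a sketch under assumptions, not the committed change: the i40e_num_trqpairs field and the i40e_fill_* helpers are assumed names.

#include <sys/mac_provider.h>

/*
 * Sketch only, not the committed code.  One static RX group spanning
 * all of the device's transmit/receive queue pairs.
 */
static boolean_t
i40e_m_getcapab(void *arg, mac_capab_t cap, void *cap_data)
{
	i40e_t *i40e = arg;

	switch (cap) {
	case MAC_CAPAB_RINGS: {
		mac_capab_rings_t *cap_rings = cap_data;

		cap_rings->mr_group_type = MAC_GROUP_TYPE_STATIC;
		cap_rings->mr_rnum = i40e->i40e_num_trqpairs;

		if (cap_rings->mr_type == MAC_RING_TYPE_RX) {
			cap_rings->mr_gnum = 1;	/* the single PF group */
			cap_rings->mr_rget = i40e_fill_rx_ring;	/* assumed */
			cap_rings->mr_gget = i40e_fill_rx_group;/* assumed */
		} else {
			cap_rings->mr_gnum = 0;	/* TX rings, no groups */
			cap_rings->mr_rget = i40e_fill_tx_ring;	/* assumed */
			cap_rings->mr_gget = NULL;
		}
		return (B_TRUE);
	}
	default:
		return (B_FALSE);
	}
}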
Related issues
Related to Bug #8318: i40e polling panics on debug after 7865
Updated by Paul Winder over 5 years ago
One of the motivations for this work was the large variability in throughput on the i40e when placed under load: using iperf with multiple connections would invariably lead to at least one of them dropping almost to zero. Initially I still had this problem, but I managed to track it down. I believe it was caused by the interaction between the ring polling code disabling interrupts while the interrupt routine was still active. Serialising the enabling and disabling of interrupts in i40e_rx_ring_intr_[en|dis]able with i40e_ring_rx() stopped the throughput drops and removed a lot of the variability (see the sketch below).
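A minimal sketch of that serialisation, assuming a per-ring itrq_rx_lock mutex, an itrq_intr_poll flag, and i40e_intr_rx_queue_* hardware helpers (all illustrative names); the committed change may differ. The idea is simply that the poll/interrupt mode switch cannot happen while i40e_ring_rx() is mid-walk.

/*
 * Sketch only.  Assumes i40e_ring_rx() takes itrq_rx_lock around its
 * descriptor walk, so these entry points cannot interleave with it.
 */
static int
i40e_rx_ring_intr_disable(mac_intr_handle_t intrh)
{
	i40e_trqpair_t *itrq = (i40e_trqpair_t *)intrh;

	mutex_enter(&itrq->itrq_rx_lock);	/* wait for i40e_ring_rx() */
	itrq->itrq_intr_poll = B_TRUE;		/* MAC will poll this ring */
	i40e_intr_rx_queue_disable(itrq);	/* assumed hardware helper */
	mutex_exit(&itrq->itrq_rx_lock);

	return (0);
}

static int
i40e_rx_ring_intr_enable(mac_intr_handle_t intrh)
{
	i40e_trqpair_t *itrq = (i40e_trqpair_t *)intrh;

	mutex_enter(&itrq->itrq_rx_lock);
	i40e_intr_rx_queue_enable(itrq);	/* assumed hardware helper */
	itrq->itrq_intr_poll = B_FALSE;		/* interrupts deliver again */
	mutex_exit(&itrq->itrq_rx_lock);

	return (0);
}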
In tests using iperf with an MTU of 9000 and a single connection, I got these throughput figures:
iperf -c 172.16.101.2 -t 30 -w 8M -P 1
------------------------------------------------------------
Client connecting to 172.16.101.2, TCP port 5001
TCP window size: 8.00 MByte
------------------------------------------------------------
[  3] local 172.16.101.1 port 43639 connected with 172.16.101.2 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-30.0 sec  77.2 GBytes  22.1 Gbits/sec
With two connections:
iperf -c 172.16.101.2 -t 30 -w 8M -P 2
------------------------------------------------------------
Client connecting to 172.16.101.2, TCP port 5001
TCP window size: 8.00 MByte
------------------------------------------------------------
[  4] local 172.16.101.1 port 60196 connected with 172.16.101.2 port 5001
[  3] local 172.16.101.1 port 52806 connected with 172.16.101.2 port 5001
[ ID] Interval       Transfer     Bandwidth
[  4]  0.0-30.0 sec  69.4 GBytes  19.9 Gbits/sec
[  3]  0.0-30.0 sec  68.5 GBytes  19.6 Gbits/sec
[SUM]  0.0-30.0 sec   138 GBytes  39.5 Gbits/sec
Using an MTU of 1500, the maximum throughput I could get was 24.5Gb/s, which translates to more than 2,000,000 packets/sec (24.5 × 10⁹ bits/sec ÷ (1500 × 8 bits/packet) ≈ 2.04 × 10⁶ packets/sec).
Updated by Robert Mustacchi over 5 years ago
Hmm, if MAC was allowing the ring processing thread to run at the same time as the interrupt thread, that would explain a good deal about what we're seeing with the cyclical performance, as we'd likely end up seeing duplicate packets delivered to TCP, which would end up killing the window at times, or worse.
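To make that failure mode concrete, here is a small self-contained userland C illustration (nothing from the driver or MAC): two unserialised consumers harvesting one ring, with nothing to stop both from delivering the same descriptor. The duplicate count varies from run to run and can be zero, which matches the intermittent behaviour described above.

/*
 * Illustration only.  Compile with: cc -o race race.c -lpthread
 */
#include <pthread.h>
#include <stdio.h>

#define	NDESC	1024

static int head;		/* next descriptor to harvest */
static int delivered[NDESC];	/* times each descriptor was delivered */

static void *
harvest(void *arg)
{
	/*
	 * Unserialised check-then-advance: both threads can read the
	 * same 'head' value and deliver the same descriptor twice.
	 */
	while (head < NDESC) {
		int d = head;	/* racy read */
		delivered[d]++;	/* "deliver" the packet */
		head = d + 1;	/* racy advance */
	}
	return (NULL);
}

int
main(void)
{
	pthread_t t1, t2;
	int i, dups = 0;

	(void) pthread_create(&t1, NULL, harvest, NULL);
	(void) pthread_create(&t2, NULL, harvest, NULL);
	(void) pthread_join(t1, NULL);
	(void) pthread_join(t2, NULL);

	for (i = 0; i < NDESC; i++) {
		if (delivered[i] > 1)
			dups++;
	}
	(void) printf("%d descriptors delivered more than once\n", dups);
	return (0);
}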
Updated by Paul Winder over 5 years ago
- Status changed from New to In Progress
- % Done changed from 0 to 90
Updated by Electric Monk over 5 years ago
- Status changed from In Progress to Closed
git commit 396505af9432aab52f4853cfde77ca834a9cce76
commit 396505af9432aab52f4853cfde77ca834a9cce76
Author: Paul Winder <paul.winder@tegile.com>
Date:   2017-02-23T14:46:40.000Z

7865 Allow i40e to use multiple rings
Reviewed by: Robert Mustacchi <rm@joyent.com>
Reviewed by: Dale Ghent <daleg@omniti.com>
Approved by: Hans Rosenfeld <rosenfeld@grumpf.hope-2000.org>
Updated by Yuri Pankov about 5 years ago
- Related to Bug #8318: i40e polling panics on debug after 7865 added