Bug #12957
closed
Some ipadm and dladm commands are slow on i40e
Added by Paul Winder almost 2 years ago.
Updated almost 2 years ago.
Description
Some network management commands are slow when used against i40e interfaces, for example:
| Command                  | Time  |
| create-aggr              | 37s   |
| set-linkprop -p mtu aggr | 10min |
| delete-aggr              | 20s   |
| create-vnic              | 16s   |
| create-if i40e           | 20s   |
| delete-if i40e           | 10s   |
Most of the time is spent in a misprogrammed delay in ring teardown. The code is similar to:
/* delay 50ms */
delay(50 * drv_usectohz(1000));
This is intended to delay for 50ms, but if the clock tick is 100Hz (i.e. the high-resolution tick is not enabled), drv_usectohz(1000) rounds 1ms up to a single 10ms tick, so the delay becomes 50 ticks, or 500ms. With this delay applied to every ring, and 256 rings, the total is 128s.
- Related to Bug #12958: i40e allocates large amounts of DMA added
Note: Since originally adding this, I realised I was running with an MSI-X vector limit of 16 (the default is 8). Since most of the time is spent in ring setup/teardown, I'd expect the times with the default limit to be a bit over half those stated. The percentage improvement should be similar.
Together with the fix for #12958, the timings before and after are:
Aggregates
| Command              | Before | After |
| create-aggr (2 i/fs) | 37s    | 1.2s  |
| create-if            | 0.01s  | 3.4s  |
| create-addr          | 5      | 1.6s  |
| set linkprop mtu     | 10min  | 5.2s  |
| delete-aggr          | 20s    | 0.6s  |
VNICs
| Command     | Before | After |
| create-vnic | 16s    | 1s    |
| create-if   | 0.1s   | 0.1s  |
| create-addr | 0.5s   | 0.5s  |
i40e
| Command     | Before | After |
| create-if   | 20s    | 2.3s  |
| create-addr | 0.5s   | 1.0s  |
| delete-if   | 10.3s  | 0.6s  |
- Description updated
Testing included:
- At a minimum, all the commands used to produce the tables above.
- Running with different MTUs.
- Using iperf to place load across the NIC.
- The MTU of an aggr can be changed whilst it is in use. When the MTU of an aggr is changed, the rings are stopped, the MTU is changed and the rings are restarted, all whilst traffic is passing over the aggr. This is a good test of the changes that stop the rings and quiesce the interrupt vectors, so a test was written that used iperf to generate traffic while repeatedly changing the MTU on the aggr.
- Related to Bug #12972: Remove reference to deprecated ddi_power from i40e added
iperf was used to place a load on the NIC. Infrastructure restrictions prevented me from applying a full load: with the iperf source being a 2-NIC aggr of 10Gb ixgbe, I had a throughput of 19.7Gb/s (to a 25Gb mlxcx).
The change introduced a mutex that is acquired in the interrupt thread, but it should very rarely be contended. It is only contended while admin programs are being used, and even then it is held only briefly, to set and test flags. There is a secondary benefit. Before the change, when multiple rings were mapped to a vector (as there always would be), handling a ring that had not been started still involved checking its rx and tx rings for work before deciding to do nothing. Now, for a ring that has not been started, it is a very simple check of a boolean.
- Status changed from In Progress to Closed
- % Done changed from 0 to 100
git commit aa2a44afcbfb9d08096ea5af01f0bb30d4b7f9a6
Author: Paul Winder <pwinder@racktopsystems.com>
Date: 2020-07-23T06:41:38.000Z
12957 Some ipadm and dladm commands are slow on i40e
12958 i40e allocates large amounts of DMA
12972 Remove reference to deprecated ddi_power from i40e
Reviewed by: Garrett D'Amore <garrett@damore.org>
Reviewed by: Igor Kozhukhov <igor@dilos.org>
Reviewed by: Robert Mustacchi <rm@fingolfin.org>
Reviewed by: Randy Fishel <randyf@sibernet.com>
Approved by: Dan McDonald <danmcd@joyent.com>