i40e allocates large amounts of DMA
The i40e driver pre-allocates all its DMA for all its rings during the call to mc_start(9E).
Typically one instance will have 32 groups with a total of 512 rings. Each ring will ask for 0x600 DMA buffers for tx and 0x800 DMA buffers for rx. With jumbo frames (9K), if you do the maths, this ends up being ~16GB of DMA.
I have been doing work alongside #12957 to reduce this and improve command line responsiveness.
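As a rough check of the ~16GB figure, the arithmetic can be sketched as below. The ring and buffer counts come from the description above; the buffer size of exactly 9 KiB is an assumption made for illustration, since the driver's real per-buffer size for jumbo frames may differ slightly.

```python
# Back-of-envelope estimate of buffer DMA for one i40e instance.
RINGS = 512            # 32 groups, 512 rings in total (from the description)
TX_BUFS = 0x600        # 1536 tx DMA buffers per ring
RX_BUFS = 0x800        # 2048 rx DMA buffers per ring
BUF_SIZE = 9 * 1024    # ASSUMPTION: 9 KiB per jumbo-frame buffer


def dma_bytes(rings, buf_size=BUF_SIZE):
    """Total bytes of buffer DMA needed for the given number of rings."""
    return rings * (TX_BUFS + RX_BUFS) * buf_size


total = dma_bytes(RINGS)
print(f"{RINGS * (TX_BUFS + RX_BUFS)} buffers, {total / 2**30:.2f} GiB")
# prints "1835008 buffers, 15.75 GiB"
```

Under that assumption the total comes to 15.75 GiB, which matches the ~16GB quoted.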
Updated by Paul Winder 6 months ago
Note: Since I originally added this, I realised I was running with a 16 msix vector limit (the default is 8). The effect is that, with the default, the memory allocations would be ½ of the values below, and since the majority of the time is in ring setup/teardown, I'd expect the times to be a bit over ½ of those stated. The benefits in terms of percentage improvement will be similar.
To address this, I've changed the point at which the DMA is allocated and freed. I have moved it from allocating all rings for all groups in i40e_start() to individual allocation for each ring as i40e_ring_start()/i40e_ring_stop() are called.
Based on this I have produced this table to illustrate how many calls are made into the DMA allocation and free routines and how long the command line utility takes to run the command.
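The difference between the two allocation points can be illustrated with a toy model. This is only a sketch of the idea, not the driver code: the `Ring` and `Nic` classes and their methods are hypothetical names, and only the buffer counts and the i40e_start()/i40e_ring_start()/i40e_ring_stop() roles are taken from the description.

```python
class Ring:
    BUFS = 0x600 + 0x800  # tx + rx buffers per ring, from the description

    def __init__(self):
        self.bufs = 0  # nothing is allocated until the ring is started

    def start(self):
        """Stands in for i40e_ring_start(): allocate this ring only."""
        self.bufs = self.BUFS

    def stop(self):
        """Stands in for i40e_ring_stop(): free this ring only."""
        self.bufs = 0


class Nic:
    def __init__(self, nrings=512):
        self.rings = [Ring() for _ in range(nrings)]

    def start_eager(self):
        """Old behaviour: allocate every ring up front, as i40e_start() did."""
        for ring in self.rings:
            ring.start()

    def allocated(self):
        """Number of DMA buffers currently held across all rings."""
        return sum(ring.bufs for ring in self.rings)


nic = Nic()
nic.rings[0].start()      # new behaviour: pay only for rings actually in use
print(nic.allocated())    # prints 3584
nic.start_eager()
print(nic.allocated())    # prints 1835008
```

With per-ring allocation, the cost scales with the number of rings in use rather than with the number of rings the hardware exposes.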
In each case the tests were started with an un-configured NIC.
|create-aggr (2 i/fs)||3672064||37s||114752||1.2s|
|set linkprop mtu||3672064||3672074||10min||114752||229504||5.2s|
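The create-aggr call counts in the table can be reproduced with a short calculation. This is an inference from the numbers rather than something stated in the change: it assumes each ring makes one call per buffer plus two further allocations, and that 512 rings per interface are set up before the change and 16 after it.

```python
# ASSUMPTION: one allocation call per tx/rx buffer, plus two extra per ring.
CALLS_PER_RING = 0x600 + 0x800 + 2


def alloc_calls(interfaces, rings_per_interface):
    """Estimated DMA allocation calls for a given number of started rings."""
    return interfaces * rings_per_interface * CALLS_PER_RING


print(alloc_calls(2, 512))  # prints 3672064
print(alloc_calls(2, 16))   # prints 114752
```

Both results match the create-aggr (2 i/fs) row, which suggests the reduction comes from starting far fewer rings rather than from cheaper individual allocations.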
In the "Before" case, the large allocation is a one-off; subsequent VNICs created on the same link would not incur the same large allocation. That does not detract from the fact that it would allocate ~16GB (with jumbo frames) of DMA for a single VNIC.
So, if you have a single aggr, you will use 1GB vs 32GB, and for a single VNIC 500MB vs 16GB. Obviously, the more aggr/vnic/addr groups are created, the more the usage will increase per these tables, but in the most common scenarios there are significant memory savings.
Also, note that the memory usage estimates are for a single interface; the memory usage can be multiplied up for dual or quad port PCI adapters.
There are some notable timing improvements with this change alongside #12957.
Updated by Electric Monk 6 months ago
- Status changed from In Progress to Closed
- % Done changed from 0 to 100
commit aa2a44afcbfb9d08096ea5af01f0bb30d4b7f9a6
Author: Paul Winder <email@example.com>
Date: 2020-07-23T06:41:38.000Z

12957 Some ipadm and dladm commands are slow on i40e
12958 i40e allocates large amounts of DMA
12972 Remove reference to deprecated ddi_power from i40e
Reviewed by: Garrett D'Amore <firstname.lastname@example.org>
Reviewed by: Igor Kozhukhov <email@example.com>
Reviewed by: Robert Mustacchi <firstname.lastname@example.org>
Reviewed by: Randy Fishel <email@example.com>
Approved by: Dan McDonald <firstname.lastname@example.org>