Bug #13208
Create aggr fails when underlying links have more than 128 Tx rings
Description
Using:
dladm create-aggr -l mlxcx0 -l mlxcx1 -l mlxcx2 aggr0
currently panics the system (see #13207). Needless to say, we should be able to create an aggr with more than 2 mlxcx links. It fails because the default number of Tx rings per group in mlxcx is 64, so 2 mlxcx links hit the global per-group limit of 128 (MAX_RINGS_PER_GROUP).
The default is set by this constant:
#define MLXCX_TX_NRINGS_PER_GROUP_DFLT 64
This seems far too high; I propose the default be set to 16, the same as the number of rings per Rx group.
It can be overridden in mlxcx.conf using the tx_nrings_per_group driver property.
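For illustration, a sketch of what such an override might look like, assuming standard illumos driver.conf syntax; the property name comes from this ticket and the value matches the proposed new default:

# /kernel/drv/mlxcx.conf (hypothetical override; property name per this ticket)
tx_nrings_per_group = 16;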
Related issues
- Related to Bug #13207: Creating an aggr with more than 128 Tx or Rx rings panics
Updated by Paul Winder 4 months ago
- Related to Bug #13207: Creating an aggr with more than 128 Tx or Rx rings panics added
Updated by Alex Wilson 4 months ago
Setting this default lower seems fine to me. The default of 64 was mainly picked while thinking of large machines (64 or 128 logical CPUs) with single or dual NICs (like our fileserver head nodes) to make sure they could spread out across the CPUs available.
I created an aggr of 4 mlxcxs during testing originally, but it was on a machine with way fewer logical CPUs and it reduced the count of rings per group automatically there. I didn't realise the global limit was so low.
Updated by Paul Winder 3 months ago
- Subject changed from Can't create an aggr with more than 2 mlxcx links to Create aggr fails when underlying links have more than 128 Tx rings
Updated by Paul Winder 3 months ago
I have revised how I intend to address this. Rather than reducing the default in the individual driver (mlxcx) where the fault was found, I have taken a more generic approach.
When an aggr is created, it will now limit the number of rings per port/link, restricting each to MAX_RINGS_PER_GROUP / <number of ports>. If the aggr is created with a single link, we will make sure that link does not use more than half of MAX_RINGS_PER_GROUP.
As an aggr is manipulated (e.g. by adding and removing ports/links), we will maintain the highest number of ports/links the aggr has ever had. When adding links to an existing aggr, we will include the new links in that high link count and ensure the new links never use more than MAX_RINGS_PER_GROUP / <high port count> rings. This can lead to a situation where newly added links have fewer Tx rings; it will only become apparent when the underlying link driver (currently only mlxcx) has a significant number of Tx rings. To avoid this, where possible, an aggr should be created with all its underlying links in the initial dladm create-aggr command. Once the aggr has been created, if a link is removed from the aggr and then re-added, the link will get the same number of rings it previously had.
If new links are added to an aggr and there are no Tx rings left, the operation will fail - as it does now.
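To make the scheme concrete, here is a minimal C sketch of the cap computation described above. The function name aggr_rings_per_port() and its shape are illustrative assumptions, not the actual aggr code; only MAX_RINGS_PER_GROUP and the two rules (divide by the high port count, halve for a single link) come from this ticket.

#include <sys/types.h>

#define	MAX_RINGS_PER_GROUP	128

/*
 * Illustrative only: compute the Tx ring cap for each port of an aggr.
 * The divisor is the highest port count the aggr has ever had (with any
 * newly added ports already included), so re-added ports get the same
 * allocation they had before.  A single-link aggr is capped at half the
 * group limit to leave room for a second link.
 */
static uint_t
aggr_rings_per_port(uint_t high_nports)
{
	if (high_nports <= 1)
		return (MAX_RINGS_PER_GROUP / 2);
	return (MAX_RINGS_PER_GROUP / high_nports);
}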
Updated by Paul Winder 3 months ago
Tested by the following steps (see the transcript below):
- Creating an aggr with 1 link
- Adding another link to it
- Adding a 3rd link which has too many Tx rings
- Deleting what was created and creating a 4-link aggr. Each mlxcx has 64 Tx rings, and the create succeeded by limiting the number used for each link.
- Removing and re-adding a link.
# dladm create-aggr -t -l mlxcx0 aggr0
# dladm show-aggr -L aggr0
LINK     PORT     AGGREGATABLE SYNC COLL DIST DEFAULTED EXPIRED
aggr0    mlxcx0   yes          no   no   no   no        no
# dladm add-aggr -t -l mlxcx1 aggr0
# dladm show-aggr -L aggr0
LINK     PORT     AGGREGATABLE SYNC COLL DIST DEFAULTED EXPIRED
aggr0    mlxcx0   yes          no   no   no   no        no
--       mlxcx1   yes          no   no   no   no        no
# dladm add-aggr -t -l mlxcx2 aggr0
dladm: add operation failed: I/O error
# dladm show-aggr -L aggr0
LINK     PORT     AGGREGATABLE SYNC COLL DIST DEFAULTED EXPIRED
aggr0    mlxcx0   yes          no   no   no   no        no
--       mlxcx1   yes          no   no   no   no        no
# dladm delete-aggr aggr0
# dladm create-aggr -t -l mlxcx0 -l mlxcx1 -l mlxcx2 -l mlxcx3 aggr0
# dladm show-aggr -L aggr0
LINK     PORT     AGGREGATABLE SYNC COLL DIST DEFAULTED EXPIRED
aggr0    mlxcx0   yes          no   no   no   no        no
--       mlxcx1   yes          no   no   no   no        no
--       mlxcx2   yes          no   no   no   no        no
--       mlxcx3   yes          no   no   no   no        no
# dladm remove-aggr -t -l mlxcx0 aggr0
# dladm show-aggr -L aggr0
LINK     PORT     AGGREGATABLE SYNC COLL DIST DEFAULTED EXPIRED
aggr0    mlxcx1   yes          no   no   no   no        no
--       mlxcx2   yes          no   no   no   no        no
--       mlxcx3   yes          no   no   no   no        no
# dladm add-aggr -t -l mlxcx0 aggr0
# dladm show-aggr -L aggr0
LINK     PORT     AGGREGATABLE SYNC COLL DIST DEFAULTED EXPIRED
aggr0    mlxcx1   yes          no   no   no   no        no
--       mlxcx2   yes          no   no   no   no        no
--       mlxcx3   yes          no   no   no   no        no
--       mlxcx0   yes          no   no   no   no        no
# dladm delete-aggr aggr0
Updated by Paul Winder 3 months ago
- Increased the default number of ports to 4. It is common for aggrs to have 4 ports, and we have had no reported issues with other NICs used in 4-port aggrs panicking because of depleted Tx rings. So it seems reasonable that this will have no impact on existing aggrs.
- When adding further links, it will ensure no individual port can have more Tx rings than any other. So, if a 5th link is added to the aggr, none of the links will use more than 1/5th of the maximum supported rings (MAX_RINGS_PER_GROUP). It does this by trimming down existing rings.
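To make the arithmetic concrete, using the limits quoted above: with MAX_RINGS_PER_GROUP at 128, adding a 5th link caps each link at 128 / 5 = 25 Tx rings (integer division), so any existing link holding more than 25 rings is trimmed down to that count.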
To test the code, I used a version which started with a default of 2 ports and created an aggr with 2 mlxcx links. Each mlxcx link has 64 Tx rings, so together they would use up to the MAX_RINGS_PER_GROUP limit. To this I added 2 more links, which forced the trimming of the existing links' rings. This was done whilst sending iperf traffic.
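The exact iperf invocation was not recorded here; assuming iperf3, a representative command for four parallel TCP streams (matching the four connection IDs in the output below) would be:

iperf3 -c <server> -P 4 -t 150 -i 10

(-P sets the number of parallel streams, -t the duration in seconds, -i the reporting interval; <server> is a placeholder.)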
Sample iperf output after adding the 2 links. There was a drop-off on some of the TCP connections, which later recovered:
[  5]  30.00-40.00   sec  6.64 GBytes  5.70 Gbits/sec
[  8]  30.00-40.00   sec  5.97 GBytes  5.12 Gbits/sec
[ 10]  30.00-40.00   sec  6.63 GBytes  5.70 Gbits/sec
[ 12]  30.00-40.00   sec  6.40 GBytes  5.50 Gbits/sec
[SUM]  30.00-40.00   sec  25.6 GBytes  22.0 Gbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[  5]  40.00-50.00   sec  11.6 GBytes  9.93 Gbits/sec
[  8]  40.00-50.00   sec  2.28 GBytes  1.96 Gbits/sec
[ 10]  40.00-50.00   sec  2.32 GBytes  1.99 Gbits/sec
[ 12]  40.00-50.00   sec  11.9 GBytes  10.2 Gbits/sec
[SUM]  40.00-50.00   sec  28.0 GBytes  24.1 Gbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[  5]  50.00-60.00   sec  13.7 GBytes  11.7 Gbits/sec
[  8]  50.00-60.00   sec  0.00 Bytes   0.00 bits/sec
[ 10]  50.00-60.00   sec  0.00 Bytes   0.00 bits/sec
[ 12]  50.00-60.00   sec  13.7 GBytes  11.8 Gbits/sec
[SUM]  50.00-60.00   sec  27.4 GBytes  23.5 Gbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[  5]  60.00-70.00   sec  13.5 GBytes  11.6 Gbits/sec
[  8]  60.00-70.00   sec  0.00 Bytes   0.00 bits/sec
[ 10]  60.00-70.00   sec  0.00 Bytes   0.00 bits/sec
[ 12]  60.00-70.00   sec  13.7 GBytes  11.7 Gbits/sec
[SUM]  60.00-70.00   sec  27.1 GBytes  23.3 Gbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[  5]  70.00-80.00   sec  13.6 GBytes  11.7 Gbits/sec
[  8]  70.00-80.00   sec  0.00 Bytes   0.00 bits/sec
[ 10]  70.00-80.00   sec  0.00 Bytes   0.00 bits/sec
[ 12]  70.00-80.00   sec  13.9 GBytes  12.0 Gbits/sec
[SUM]  70.00-80.00   sec  27.5 GBytes  23.6 Gbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[  5]  80.00-90.00   sec  13.5 GBytes  11.6 Gbits/sec
[  8]  80.00-90.00   sec  0.00 Bytes   0.00 bits/sec
[ 10]  80.00-90.00   sec  0.00 Bytes   0.00 bits/sec
[ 12]  80.00-90.00   sec  13.6 GBytes  11.7 Gbits/sec
[SUM]  80.00-90.00   sec  27.1 GBytes  23.3 Gbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[  5]  90.00-100.00  sec  9.27 GBytes  7.96 Gbits/sec
[  8]  90.00-100.00  sec  4.12 GBytes  3.53 Gbits/sec
[ 10]  90.00-100.00  sec  4.03 GBytes  3.46 Gbits/sec
[ 12]  90.00-100.00  sec  10.0 GBytes  8.62 Gbits/sec
[SUM]  90.00-100.00  sec  27.5 GBytes  23.6 Gbits/sec
I assumed the drop-off was caused by the stopping of some of the rings, so I repeated the test when I knew no rings would be trimmed, and there was a similar effect. I'm not sure what causes this, but it doesn't seem to be a regression:
[  5]  80.00-90.00   sec  4.55 GBytes  3.91 Gbits/sec
[  8]  80.00-90.00   sec  4.51 GBytes  3.87 Gbits/sec
[ 10]  80.00-90.00   sec  11.0 GBytes  9.45 Gbits/sec
[ 12]  80.00-90.00   sec  4.57 GBytes  3.93 Gbits/sec
[SUM]  80.00-90.00   sec  24.6 GBytes  21.2 Gbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[  5]  90.00-100.00  sec  0.00 Bytes   0.00 bits/sec
[  8]  90.00-100.00  sec  0.00 Bytes   0.00 bits/sec
[ 10]  90.00-100.00  sec  22.4 GBytes  19.3 Gbits/sec
[ 12]  90.00-100.00  sec  0.00 Bytes   0.00 bits/sec
[SUM]  90.00-100.00  sec  22.4 GBytes  19.3 Gbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[  5]  100.00-110.00 sec  0.00 Bytes   0.00 bits/sec
[  8]  100.00-110.00 sec  0.00 Bytes   0.00 bits/sec
[ 10]  100.00-110.00 sec  22.9 GBytes  19.6 Gbits/sec
[ 12]  100.00-110.00 sec  0.00 Bytes   0.00 bits/sec
[SUM]  100.00-110.00 sec  22.9 GBytes  19.6 Gbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[  5]  110.00-120.00 sec  0.00 Bytes   0.00 bits/sec
[  8]  110.00-120.00 sec  0.00 Bytes   0.00 bits/sec
[ 10]  110.00-120.00 sec  22.2 GBytes  19.1 Gbits/sec
[ 12]  110.00-120.00 sec  0.00 Bytes   0.00 bits/sec
[SUM]  110.00-120.00 sec  22.2 GBytes  19.1 Gbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[  5]  120.00-130.00 sec  0.00 Bytes   0.00 bits/sec
[  8]  120.00-130.00 sec  0.00 Bytes   0.00 bits/sec
[ 10]  120.00-130.00 sec  22.8 GBytes  19.6 Gbits/sec
[ 12]  120.00-130.00 sec  0.00 Bytes   0.00 bits/sec
[SUM]  120.00-130.00 sec  22.8 GBytes  19.6 Gbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[  5]  130.00-140.00 sec  1.08 GBytes  928 Mbits/sec
[  8]  130.00-140.00 sec  1.09 GBytes  938 Mbits/sec
[ 10]  130.00-140.00 sec  19.6 GBytes  16.9 Gbits/sec
[ 12]  130.00-140.00 sec  1.08 GBytes  927 Mbits/sec
[SUM]  130.00-140.00 sec  22.9 GBytes  19.6 Gbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[  5]  140.00-150.00 sec  6.74 GBytes  5.79 Gbits/sec
[  8]  140.00-150.00 sec  6.83 GBytes  5.87 Gbits/sec
[ 10]  140.00-150.00 sec  6.84 GBytes  5.87 Gbits/sec
[ 12]  140.00-150.00 sec  6.80 GBytes  5.84 Gbits/sec
[SUM]  140.00-150.00 sec  27.2 GBytes  23.4 Gbits/sec
Updated by Electric Monk about 1 month ago
- Status changed from New to Closed
- % Done changed from 0 to 100
git commit 13810335a5a8384eed97a8661536eb5352f0c933
commit 13810335a5a8384eed97a8661536eb5352f0c933
Author: Paul Winder <paul@winder.uk.net>
Date:   2020-12-11T07:51:49.000Z

13208 Create aggr fails when underlying links have more than 128 Tx rings
Reviewed by: Robert Mustacchi <rm@fingolfin.org>
Reviewed by: Garrett D'Amore <garrett@damore.org>
Approved by: Dan McDonald <danmcd@joyent.com>