Bug #13041
i40e_get_available_resources() broken again for X722 part
100%
Description
See bug 9601 for the original fix.
After that, we got some new code via bug 12933:
+ /* OCP cards case: if a mezz is removed the ethernet port is at + * disabled state in PRTGEN_CNF register. Additional NVM read is + * needed in order to check if we are dealing with OCP card. + * Those cards have 4 PFs at minimum, so using PRTGEN_CNF for counting + * physical ports results in wrong partition id calculation and thus + * not supporting WoL. + */ + if (hw->mac.type == I40E_MAC_X722) { + if (i40e_acquire_nvm(hw, I40E_RESOURCE_READ) == I40E_SUCCESS) { + status = i40e_aq_read_nvm(hw, I40E_SR_EMP_MODULE_PTR, + 2 * I40E_SR_OCP_CFG_WORD0, + sizeof(ocp_cfg_word0), + &ocp_cfg_word0, TRUE, NULL); + if (status == I40E_SUCCESS && + (ocp_cfg_word0 & I40E_SR_OCP_ENABLED)) + hw->num_ports = 4; + i40e_release_nvm(hw); + } + } +
The assertion here that PF is at least 4 is clearly not the case as shown
in the above bug, and on my device 0x8086:0x37d2
num_functions is 2, so we end up with:
3951 hw->num_partitions = num_functions / hw->num_ports;
num_partitions as zero, and we go boom.
What's perplexing here is that the upstream code has the same code: it
forces OCP X722 to have 4 ports.
The above comment is talking about the PRTGEN_CNF register, but we
no longer use that, preferring the status one, as described in bug 9601.
I don't have a spec with any X722 or OCP details, but it seems like we might
have a chance at fixing this just by disabling that new code again.
Related issues
Updated by Marcel Telka 7 months ago
- Related to Bug #9601: Divide by zero in i40e_get_available_resources() added
Updated by Marcel Telka 7 months ago
- Related to Bug #12933: Update i40e core code to support NVM v1.7 added
Updated by John Levon 7 months ago
The system I tested this on is based on this:
https://www.supermicro.com/en/products/motherboard/X11DPH-T
which definitely seems to give lie to the idea that the OCP config
has to have four active ports. (Is it a X557-T2 in the system?)
The C620 chipset spec also has a section "38.11.1.1.4 OCP Support"
that seems to explicitly say both quad and dual port config is allowed.
Updated by Randy Fishel 7 months ago
The problem as stated in previous comments is that the driver can find more ports than functions, and that is not a proper configuration per Intel. There is also a secondary problem that it is possible that both of num_ports and num_partitions can be zero, and later in the Illumos driver we expect both to be non-zero, but is possibly something that should be fixed as part of another bug.
The linux (and possibly BSD) driver does not find this to be a problem because because it doesn't use num_partitions, but a failure to have num_ports to be correct is a problem for the Linux driver as partition_id will be incorrect (Illumos does not use the features where an incorrect partition_id becomes an issue, but does expect num_ports and num_partitions to be non-zero).
While removing the OCP will allow the Illumos driver to function on 2 port X722 cards, it does become a deviation from the Intel common code, and it might be worthwhile to either specify that (in the code), or find a fix that works better in the X722 case (which could be a different bug).
Updated by Robert Mustacchi 6 months ago
- Category set to driver - device drivers
- Assignee set to John Levon
Updated by Electric Monk 6 months ago
- Status changed from New to Closed
- % Done changed from 0 to 100
git commit 341c5f490806c8b3e6e31512923db4c0e1b464b1
commit 341c5f490806c8b3e6e31512923db4c0e1b464b1 Author: John Levon <john.levon@joyent.com> Date: 2020-08-19T07:30:51.000Z 13041 i40e_get_available_resources() broken again for X722 part Reviewed by: Randy Fishel <randyf@sibernet.com> Approved by: Robert Mustacchi <rm@fingolfin.org>