Feature #13018
open
A virtual function of a ConnectX-4 VDI attached to an illumos-based VM (OI) is not handled correctly by mlxcx
0%
Description
I created an ESXi VM running current OI Hipster with a CX-4 VDI VF. The mlxcx driver tries to handle the device but fails, producing these messages in dmesg:
mlxcx: [ID 989156 kern.warning] WARNING: mlxcx1: command MLXCX_OP_CREATE_EQ 0x301 failed with status code MLXCX_CMD_R_BAD_PARAM (0x3)
mlxcx: [ID 989156 kern.warning] WARNING: mlxcx0: command MLXCX_OP_CREATE_EQ 0x301 failed with status code MLXCX_CMD_R_BAD_PARAM (0x3)
Updated by Paul Winder almost 2 years ago
MLXCX_OP_CREATE_EQ is used to create event queues for asynchronous events and completion events. I suspect the failure happens while creating the event queue for async events, which is set up by this call:
ret = mlxcx_setup_eq(mlxp, 0,
    (1ULL << MLXCX_EVENT_CMD_COMPLETION) |
    (1ULL << MLXCX_EVENT_PAGE_REQUEST) |
    (1ULL << MLXCX_EVENT_PORT_STATE) |
    (1ULL << MLXCX_EVENT_INTERNAL_ERROR) |
    (1ULL << MLXCX_EVENT_PORT_MODULE) |
    (1ULL << MLXCX_EVENT_SENDQ_DRAIN) |
    (1ULL << MLXCX_EVENT_LAST_WQE) |
    (1ULL << MLXCX_EVENT_CQ_ERROR) |
    (1ULL << MLXCX_EVENT_WQ_CATASTROPHE) |
    (1ULL << MLXCX_EVENT_PAGE_FAULT) |
    (1ULL << MLXCX_EVENT_WQ_INVALID_REQ) |
    (1ULL << MLXCX_EVENT_WQ_ACCESS_VIOL) |
    (1ULL << MLXCX_EVENT_NIC_VPORT) |
    (1ULL << MLXCX_EVENT_DOORBELL_CONGEST));
The event of particular concern is MLXCX_EVENT_NIC_VPORT. This event notifies the e-switch manager (i.e. the hypervisor driver) of changes to the vport context. It could be the cause of the BAD_PARAM error, as it is unlikely to be applicable to a VF.
In fact, there are h/w capabilities which can be checked to confirm whether this and some of the other events are supported. Unfortunately, I don't have the h/w to test this hypothesis.
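To illustrate the idea of checking capabilities before requesting events, here is a minimal standalone sketch. It is not the driver code and not the proposed fix: the event numbers, the demo_caps structure and its dc_nic_vport_event field are all hypothetical names made up for this example. It only shows the pattern of building the async event mask conditionally, so that a function without the capability never asks for the vport event.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical event numbers for illustration; NOT the real mlxcx values. */
enum demo_event {
	DEMO_EVENT_CMD_COMPLETION = 0,
	DEMO_EVENT_PAGE_REQUEST = 1,
	DEMO_EVENT_PORT_STATE = 2,
	DEMO_EVENT_NIC_VPORT = 3
};

/*
 * Hypothetical capability summary. The real driver would derive this
 * from the HCA capabilities reported by the device.
 */
struct demo_caps {
	bool dc_nic_vport_event;	/* assumed: NIC_VPORT event is accepted */
};

/* Convert an event number into its bit in the 64-bit event mask. */
static uint64_t
demo_event_bit(enum demo_event ev)
{
	assert((unsigned)ev < 64);
	return (1ULL << ev);
}

/* Build the async event mask, requesting NIC_VPORT only when supported. */
static uint64_t
demo_async_event_mask(const struct demo_caps *caps)
{
	uint64_t mask = demo_event_bit(DEMO_EVENT_CMD_COMPLETION) |
	    demo_event_bit(DEMO_EVENT_PAGE_REQUEST) |
	    demo_event_bit(DEMO_EVENT_PORT_STATE);

	if (caps->dc_nic_vport_event)
		mask |= demo_event_bit(DEMO_EVENT_NIC_VPORT);

	return (mask);
}
```

With these made-up numbers, a caller that passes a demo_caps with dc_nic_vport_event set to false gets a mask without the NIC_VPORT bit, which is the behaviour one would want for a VF if the hypothesis above is right.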
Updated by Florian Manschwetus almost 2 years ago
Is there a way to query or test this, given that I have an illumos-based system running in such an environment?
Or is there some documentation about this?
Updated by Paul Winder almost 2 years ago
- Unload the mlxcx driver. Use modinfo to get the list of drivers and get the 'id' of mlxcx. Then run 'modunload -i <id>'.
- Run this dtrace command:
dtrace -Zn 'fbt:mlxcx:mlxcx_setup_async_eqs:entry {print(args[0]->mlx_caps->mlc_hca_cur); exit(0);}'
- Get the driver to attempt to load again, for example by running 'dladm show-phys'.
In the meantime, I will put together a change which I think will fix it.
Updated by Paul Winder almost 2 years ago
A possible fix is here; you can cherry-pick it and build it yourself.
Otherwise, I can send you a pre-built mlxcx driver compiled from the head of master.
Updated by Florian Manschwetus almost 2 years ago
admin@oi-testref:~$ sudo dtrace -Zn 'fbt:mlxcx:mlxcx_setup_async_eqs:entry {print(args[0]->mlx_caps->mlc_hca_cur); exit(0);}'
dtrace: invalid probe specifier fbt:mlxcx:mlxcx_setup_async_eqs:entry {print(args[0]->mlx_caps->mlc_hca_cur); exit(0);}: in action list: args[ ] may not be referenced because probe description fbt:mlxcx:mlxcx_setup_async_eqs:entry matches an unstable set of probes
Updated by Paul Winder almost 2 years ago
Ok, try
dtrace -Zn 'fbt:mlxcx:mlxcx_setup_async_eqs:entry {print(((mlxcx_t *)arg0)->mlx_caps->mlc_hca_cur); exit(0);}'