Bug #14419
closediprb transmit watchdog somewhat overzealous
100%
Description
The iprb
driver attempts to detect a transmit stall by keeping the transmit time of the last outstanding frame submitted to the controller. Each time we submit a command to the controller, we update tx_wdog
with the current time. We also update tx_wdog
each time we reclaim a command that the controller has completed. In iprb_periodic()
, we check to make sure this value is not more than 15 seconds in the past, and if it is we reset the controller.
Unfortunately, we only seem to trigger the reclaim routine when we're trying to transmit another packet. If the link is not particularly busy, and no data is transmitted for 15 seconds, it seems like we will spuriously reset the controller which causes a variety of other issues on a system that would have otherwise been working correctly.
A report from the field suggests that by pinging the computer with iprb
from another computer, the problem goes away; presumably because we are sending an echo reply each second and keeping the watchdog at bay.
One way to fix this is to call iprb_cmd_reclaim()
in the periodic routine, right before we check the watchdog value, to ensure that things are moving along. Indeed, an iprb
with this patch applied improves the behaviour.
Another question is: in iprb_cmd_submit()
, we set the CB_CMD_I
flag on the submitted command iff we believe it is the last available command; i.e., if we have dispatched 128 commands to the controller and heard nothing back. I can't see how this would trigger a reclaim, though, unless there was already another packet about to be sent out -- which in some cases on idle systems there may not be.
Related issues
Updated by Joshua M. Clulow 7 months ago
- Related to Bug #14078: null pointer dereference crashes from iprb nics added
Updated by Joshua M. Clulow 7 months ago
Testing Notes¶
I built the iprb
module with this change and gave it to a user on IRC who confirms that it makes networking work as they expect on their machine.
I have also done the regular RTI build with the same diff, though I have not installed the bits on a machine with the hardware as it is a rare and old part, and I don't have one available myself.
Updated by Electric Monk 7 months ago
- Status changed from New to Closed
- % Done changed from 0 to 100
git commit ab3f6e90e6b1d8edee27c66beb8e53bc6033fb2a
commit ab3f6e90e6b1d8edee27c66beb8e53bc6033fb2a Author: Joshua M. Clulow <josh@sysmgr.org> Date: 2022-01-25T20:49:19.000Z 14419 iprb transmit watchdog somewhat overzealous Reviewed by: Andy Fiddaman <andy@omnios.org> Reviewed by: Garrett D'Amore <garrett@damore.org> Reviewed by: Robert Mustacchi <rm@fingolfin.org> Approved by: Dan McDonald <danmcd@joyent.com>