Bug #8629
closednvme: rework command abortion
100%
Description
The way timed-out commands are aborted in nvme is a bit odd. It drops the mutex of the command to be aborted very early, which means that there may be a race between abortion and the command. We handle this safely right now but it has a couple of odd side effects that other parts of nvme need deal with. Another issue is that it replaces the command callback with its own version to deal with cleanup, and a timeout needs to be detected by the submitter and be handled specially because of that.
It turns out that this could be improved. When a command is aborted it should go through its regular completion callback, which will deal with the abortion in the error code path and eventually clean up the command. At the same time we can deal better with abort commands that time out themselves. Improving the internal error propagation of nvme will also help.
Updated by Hans Rosenfeld over 6 years ago
Webrev: https://grumpf.hope-2000.org/illumos-8629/
As command timeouts and aborts don't happen in normal operation, I have tested this with two different test setups:
- On a fully functional Intel P3700 device, which takes about 9 seconds to complete a format operation. I've reduced the format command timeout to 6 seconds, and as a result the driver attempted to abort the format command. The abort returned an error (abort failed) and the format command completed normally within the time the driver waited for the abort command. Reducing the format command timeout to 3s led to a double timeout and subsequent faulting of the NVMe device as expected.
- I've put two known to be broken Intel P3700 in my test system. These devices have the nasty habit of ignoring commands during attach, but at different points in attach. Both caused a double timeout at the expected places in attach and were faulted by FMA.
This change has been in illumos-joyent since July.
Updated by Electric Monk about 6 years ago
- Status changed from New to Closed
- % Done changed from 0 to 100
git commit e984c70bc7d741cd0663924c95c15fed9f645565
commit e984c70bc7d741cd0663924c95c15fed9f645565 Author: Hans Rosenfeld <rosenfeld@grumpf.hope-2000.org> Date: 2017-09-25T09:11:15.000Z 8629 nvme: rework command abortion Reviewed by: Jerry Jelinek <jerry.jelinek@joyent.com> Reviewed by: Jason King <jason.king@joyent.com> Reviewed by: Robert Mustacchi <rm@joyent.com> Approved by: Richard Lowe <richlowe@richlowe.net>