Bug #14159


ahci can deadlock with sata framework

Added by Jason King 8 months ago. Updated 6 months ago.

driver - device drivers
Start date:
Due date:
% Done:


Estimated time:
Gerrit CR:


We had a system that would have the zfs deadman seemingly randomly fire on the systems boot zpool. Attempting to do a crash dump failed since I/O to the mirrored boot drives seemed to be hung. We were able to get kmdb loaded and capture the state of the system after this occurred. Unfortunately, the serial console was not configured, so there are only screenshots of the kmdb output (though I'm attempting to reproduce w/ the serial console configured so I can copy/paste the kmdb analysis and will update if possible).

Looking at threads in ahci (::stacks -m ahci), I saw three threads. One thread was servicing an ahci interrupt and had a call stack containing (I'm abbreviating this w/ the key bits since I'm having to copy it by hand):


The mutex in question is from this (in sata_scsi_init_pkt):


Examining who held the mutex, it was a thread with the following call stack (again abbreviated):


The above mentioned mutex is acquired in sata_txlt_inquiry and is held for effectively the entire function. Note that the mutex associated with the blocked CV is not the cport_mutex. We are blocked due to the following code:

    while (spkt->satapkt_reason == SATA_PKT_BUSY)

So the command is waiting for completion with the same sata framework cport_mutex being held that the ISR is blocked on. Eventually, the ahci watchdog thread attempts to time out the request, it resets the port


In this case the watchdog handler has reset the port, and is blocked on the sata framework on the same cport_mutex trying to notify the sata framework that the port has been reset, so it also cannot make progress and clean things up (i.e. complete the synchronous request as failed/timed out).

I believe the error is sata_txlt_inquiry holding cport_mutex while calling into the HBA (the sata_tran_start method i.e. ahci_tran_start). No other callers in the SATA framework do this -- either they just don't have the mutex held, or they explicitly drop it around the call into the HBA). We (unfortunately) don't have any sort of comments or docs to point at to definitely say it's wrong, but the evidence is highly suggestive that is the correct way to resolve the deadlock.


sata.d (541 Bytes) sata.d D script to widen the deadlock window Jason King, 2021-10-25 11:27 PM
write-file.c (925 Bytes) write-file.c Simple program to continuously write 1MB of data, flush in a loop Jason King, 2021-10-25 11:27 PM
inquiry-ata.c (1.35 KB) inquiry-ata.c Program to issue ATA INFORMATION requests to device Jason King, 2021-10-25 11:27 PM
Actions #1

Updated by Marcel Telka 8 months ago

I think the problem might be missing mutex_exit/mutex_enter calls around this code in ahci_restart_port_wait_till_ready():

8763      sata_hba_event_notify(ahci_ctlp->ahcictl_dip, &sdevice,
8764          SATA_EVNT_DEVICE_RESET);

I tried to add those calls more than 8 years ago when I worked on #3839, but such a change was refused:

Actions #2

Updated by Jason King 8 months ago

  • Assignee set to Jason King

I'm in the process of testing/verifying a fix, but I think the immediate issue is that if something issues an INQUIRY for the ATA INFORMATION page to the device, the sata framework's cport_mutex is held across the call into the HBA. All the other callers of sata_fetch_device_identify_data (or more specifically anyone invoking the HBA's start (sata_hba_start) method invoke the start function without that lock being held -- either they explicitly drop the lock themselves, or their callers do so.

Actions #3

Updated by Jason King 8 months ago

That said, the existing sata event handling code leaves much to be desired and could use improvement.

Actions #4

Updated by Rich Lowe 8 months ago

I strongly recommend fixing this first, and then improving the code.

Actions #5

Updated by Jason King 7 months ago

  • Description updated (diff)
Actions #6

Updated by Jason King 7 months ago

The plan is to fix the deadlock as a part of this change. Any other bits will have their own ticket.

Actions #7

Updated by Garrett D'Amore 7 months ago

So Marcel, I'm the one who blocked the change, and let me be the first to say I was wrong to do so, although I understand why I did.

I've since reviewed the code in rather gory detail, and I'm not super happy about this SATA framework code at all, but this one change (fixing the locks) is actually needed.

I have also written a change (which we may or may not upstream) which moves the notify stuff out of the synchronous path and into a taskq. I think that is more or less what I was suggesting 8 years ago, but I seem to have arrived at the same solution "independently" (of myself?) 8 years later on.

Anyway, the change is in our tree, and so far testing seems to hold up on it.

Actions #8

Updated by Electric Monk 7 months ago

  • Gerrit CR set to 1772
Actions #9

Updated by Jason King 7 months ago

Testing this has been a bit challenging due to just how tight the window is. However using the attached files, I've had some amount of success.

Using sata.d, I attempted to increase the window where a deadlock occurs.
At the same time, I used write-file.c to just write a 1MB block of random data to a file, flush, and start all over (overwriting the 1MB -- the intention is to generate write I/O and not have to worry about exhausting space). While all of that is running, ATA information requests are sent to the device.

With all of that running, I was able to produce the lockup a couple of times over the course of the day on an unpatched system. While it would be nice to be able to replicate it somewhat quicker and easier, it appears that NCQ I/Os (basically reads and writes) have the effect of starving out non-NCQ I/Os (such as the ATA INFORMATION request). As a result, it meant requiring having the AHCI ISR service an NCQ I/O completion (with no other NCQ I/Os outstanding), then in the small window where the ISR marks the slot available and the I/O callbacks are invoked, have the ATA Information go through without getting 'preempted' (for lack of a better term) by another NCQ I/O. Even with various attempts using chill() with dtrace, it proved to be challenging.

In comparison though, with this fix in place, the same setup that produced several hangs within a day ran for 3+ days non-stop without causing a system hang -- system responsiveness wasn't great, but considering it was slowing down I/O servicing on the root pool of the system, while inducing a fair amount of constant write I/O, that's not too surprising.

This is in addition to examining the other callers to sata_fetch_device_identify_data which all call it without cport_mutex being held -- which itself is a strong indicator that the ATA Information request should also follow suit -- obviously the stacks seen in the deadlock provide even stronger proof.

Actions #10

Updated by Electric Monk 6 months ago

  • Status changed from New to Closed
  • % Done changed from 0 to 100

git commit df07a22216251845ccadf35a5acc27bfa0b1a4d1

commit  df07a22216251845ccadf35a5acc27bfa0b1a4d1
Author: Jason King <>
Date:   2021-12-01T01:54:11.000Z

    14159 ahci can deadlock with sata framework
    Reviewed by: Garrett D'Amore <>
    Approved by: Dan McDonald <>


Also available in: Atom PDF