ahci watchdog triggers deadlock with sata layer
Having hit a number of system hangs which eventually trigger the ZFS watchdog timer on the systems boot pool (which reside on an AHCI controller), we were able to examine the system. Since a serial console wasn't configured, the commentary accompanies the screen shots to explain.
When we see that, we would see AHCI watchdog timeouts. So, using ::stacks -m ahci, we saw three stacks, one of which is in the ahci watchdog handler. It is calling into sata_hba_event_notify which is blocked waiting for a mutex.
Looking at the mutex, we see it is currently held by another thread. Looking at the thread that is holding the mutex, we see it is the sd code attempting to retry a SCSI Inquiry command, which is getting translated by the sata layer and then submitted to the ahci driver. The ahci driver is then blocked on a CV. Note that the accompanying mutex with the CV is not the same mutex the watchdog thread is waiting on.
Wading through some disassembly and looking at the source, the mutex being held is sata_cport_info_t->cport_mutex, cport meaning controller port. This is acquired in sata_txlt_inquiry and held as it calls into the ahci driver.
Since the ahci watchdog thread cannot make process because it is blocked called into the SATA layer, it cannot complete the cleanup of the timed out requests. At the same time, the thread attempting the inquiry which holds the lock needed by the watchdog thread cannot make progress because it is waiting for the SATA port to become available, and a deadlock occurs.
Looking at the other thread, it's an ISR that is also blocked on the same mutex.
To get some insight into what command timed out, if we look at the second argument to ahci_timeout_pkts(), this is the acpi_port_t. Any packets for timeout will be in ahci_port_t->ahciport_slot_pkts and will have a timeout of 0 in ahci_port_t->ahciport_slot_timeout.
Looking at that packet (sata_pkt_t), we see it is a synchronous command (satapkt_op_mode == 0x4), the direction is SATA_DIR_READ.
The SATA command (satacmd_cmd_reg is 0x3c – SATAC_ID_DEVICE.
While we don't have the full context, this is probably a timed out INQUIRY.
This was likely created by a bad disk, but a bad drive should not deadlock the kernel.