Project

General

Profile

Actions

Bug #14777

closed

ahci watchdog triggers deadlock with sata layer

Added by Garrett D'Amore 5 months ago. Updated 5 months ago.

Status:
Rejected
Priority:
High
Category:
driver - device drivers
Start date:
Due date:
% Done:

80%

Estimated time:
Difficulty:
Medium
Tags:
Gerrit CR:
External Bug:

Description

Having hit a number of system hangs which eventually trigger the ZFS watchdog timer on the systems boot pool (which reside on an AHCI controller), we were able to examine the system. Since a serial console wasn't configured, the commentary accompanies the screen shots to explain.

When we see that, we would see AHCI watchdog timeouts. So, using ::stacks -m ahci, we saw three stacks, one of which is in the ahci watchdog handler. It is calling into sata_hba_event_notify which is blocked waiting for a mutex.
Looking at the mutex, we see it is currently held by another thread. Looking at the thread that is holding the mutex, we see it is the sd code attempting to retry a SCSI Inquiry command, which is getting translated by the sata layer and then submitted to the ahci driver. The ahci driver is then blocked on a CV. Note that the accompanying mutex with the CV is not the same mutex the watchdog thread is waiting on.

Wading through some disassembly and looking at the source, the mutex being held is sata_cport_info_t->cport_mutex, cport meaning controller port. This is acquired in sata_txlt_inquiry and held as it calls into the ahci driver.
Since the ahci watchdog thread cannot make process because it is blocked called into the SATA layer, it cannot complete the cleanup of the timed out requests. At the same time, the thread attempting the inquiry which holds the lock needed by the watchdog thread cannot make progress because it is waiting for the SATA port to become available, and a deadlock occurs.

Looking at the other thread, it's an ISR that is also blocked on the same mutex.

To get some insight into what command timed out, if we look at the second argument to ahci_timeout_pkts(), this is the acpi_port_t. Any packets for timeout will be in ahci_port_t->ahciport_slot_pkts and will have a timeout of 0 in ahci_port_t->ahciport_slot_timeout.
Looking at that packet (sata_pkt_t), we see it is a synchronous command (satapkt_op_mode == 0x4), the direction is SATA_DIR_READ.
The SATA command (satacmd_cmd_reg is 0x3c – SATAC_ID_DEVICE.
While we don't have the full context, this is probably a timed out INQUIRY.

This was likely created by a bad disk, but a bad drive should not deadlock the kernel.

Actions #1

Updated by Garrett D'Amore 5 months ago

This bug was found and fixed last year at RackTop. I'm working on upstreaming the fix.

Actions #2

Updated by Garrett D'Amore 5 months ago

The solution I devised at RackTop was to move the code called by sata_hba_event_notify() into a separate task, on a taskq.

Actions #3

Updated by Garrett D'Amore 5 months ago

Actually, Jason King's solution in the form of 14159 also addresses this.

His work was developed after mine, and while we run with both approaches at RackTop, 14159 should be sufficient.

I'm going to close this ticket for now.

Actions #4

Updated by Garrett D'Amore 5 months ago

  • Status changed from New to Rejected
Actions

Also available in: Atom PDF