Bug #11952

large USB hard disks experience I/O failures

Added by Joshua M. Clulow about 2 years ago. Updated about 2 years ago.

Status: Closed
Priority: Normal
Category: kernel
% Done: 100%
Difficulty: Hard
Description

Using a large USB hard disk (most likely larger than 2TB) can result in DMA errors and ultimately a stuck device; e.g.,

Nov  9 09:21:36 newcastle rootnex: [ID 561485 kern.warning] WARNING: xhci: coding error detected, the driver is using ddi_dma_attr(9S) incorrectly. There is a small risk of data corruption in particular with large I/Os. The driver should be replaced with a corrected version for proper system operation. To disable this warning, add 'set rootnex:rootnex_bind_warn=0' to /etc/system(4).
Nov  9 09:21:36 newcastle xhci: [ID 197104 kern.info] NOTICE: xhci0: failed to bind DMA memory: -3
Nov  9 09:21:36 newcastle xhci: [ID 902155 kern.info] NOTICE: xhci0: xhci stop endpoint command (2)/slot (3) in wrong state: 19
Nov  9 09:21:36 newcastle xhci: [ID 617155 kern.info] NOTICE: xhci0: endpoint is in state 3
Nov  9 09:21:36 newcastle xhci: [ID 902155 kern.info] NOTICE: xhci0: xhci stop endpoint command (3)/slot (3) in wrong state: 19
Nov  9 09:21:36 newcastle xhci: [ID 617155 kern.info] NOTICE: xhci0: endpoint is in state 3
Nov  9 09:21:36 newcastle scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci8086,2064@14/storage@11/disk@0,0 (sd1):
Nov  9 09:21:36 newcastle       Command failed to complete...Device is gone

This turns out to be somewhat complicated. The device shown above presents 512-byte sectors (whether or not they're emulated under the covers), which means the logical block address (LBA) range spans from 0x0 up to 0x1d1a94a20; critically, this value is larger than the 32-bit LBA field in a SCSI READ (12) or WRITE (12) (aka SCMD_READ_G5 or SCMD_WRITE_G5) command. Block addresses beyond the 32-bit boundary must be encoded in a command that supports a wider LBA, so sd moves up to READ (16) and WRITE (16) (aka SCMD_READ_G4 and SCMD_WRITE_G4).

USB host controllers have a relatively small per-request transfer size limit. As such, scsa2usb_rw_transport() contains logic to break a large SCSI read or write command into a sequence of smaller commands that together satisfy the upstack I/O request. Today this logic covers only the Group 1 and Group 5 SCSI read and write commands, leaving out the Group 4 commands, which only come into effect for larger logical addresses. This has only become an issue with the advent of 512-byte-sector drives larger than about 2TB.

The intermittent nature of the I/O failures (e.g., a pool will import, but heavy I/O will subsequently fail and take the pool offline) seems to be on account of at least two things:

  • ZFS issuing I/O requests that fit within the transfer limit during import, but not later
  • I/O requests for logical addresses below the 2TB boundary appear to be sent as Group 5 commands; only writes to the latter half of a large disk will be Group 4
#1

Updated by Joshua M. Clulow about 2 years ago

  • Description updated (diff)
#2

Updated by Joshua M. Clulow about 2 years ago

  • Description updated (diff)
#4

Updated by Joshua M. Clulow about 2 years ago

Testing Notes

I have a Seagate 4TB expansion drive:

scsa2usb         1   ffffff0cff6eb550 /pci@0,0/pci8086,2064@14/storage@11
    usba_device: ffffff0d08997540

    idVendor: 0x0bc2 idProduct: 0x231a usb_addr: 0x03

    Manufacturer String:        Seagate
    Product String:             Expansion
$ pfexec diskinfo | grep -i seagate
SCSI    c7t0d0                  Seagate  Expansion        3726.02 GiB   no  no 

Prior to this change, I experienced the failures described in the ticket any time I tried to rsync any substantial quantity of data into the pool. After applying the fix, I was able to rsync several hundred gigabytes of data into the pool and have initiated a scrub, which has so far run without error:

  scan: scrub in progress since Sat Nov  9 18:03:33 2019
        604G scanned at 72.3M/s, 594G issued at 71.1M/s, 604G total
        0 repaired, 98.24% done, 0 days 00:02:33 to go

There have been no errors other than the usual illegal requests one expects from a device that isn't a real, full SCSI device:

$ pfexec iostat -En c7t0d0
c7t0d0           Soft Errors: 0 Hard Errors: 0 Transport Errors: 0 
Vendor: Seagate  Product: Expansion        Revision: 0710 Serial No: NAADVDVB 
Size: 4000.79GB <4000787029504 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0 
Illegal Request: 61 Predictive Failure Analysis: 0 

The scrub has now completed:

scan: scrub repaired 0 in 0 days 02:25:24 with 0 errors on Sat Nov  9 20:28:57 2019
#5

Updated by Joshua M. Clulow about 2 years ago

Testing Notes (Supplemental)

I did a full RTI build, booted it, and performed another scrub to confirm that the final version of the change is OK:

November 11, 2019 at 04:22:24 PM PST
  scan: scrub in progress since Mon Nov 11 11:47:13 2019
        1.19T scanned at 75.6M/s, 1.19T issued at 75.5M/s, 1.19T total
        0 repaired, 99.89% done, 0 days 00:00:17 to go

November 11, 2019 at 04:22:54 PM PST
  scan: scrub repaired 0 in 0 days 04:35:35 with 0 errors on Mon Nov 11 16:22:48 2019
newcastle # uname -v
rti-xhci-0-g64d8df1e16
#6

Updated by Electric Monk about 2 years ago

  • Status changed from New to Closed
  • % Done changed from 0 to 100

git commit 10b633f40f61a97f70236c451b22a1ec8368edb2

commit  10b633f40f61a97f70236c451b22a1ec8368edb2
Author: Joshua M. Clulow <josh@sysmgr.org>
Date:   2019-11-12T20:36:11.000Z

    11952 large USB hard disks experience I/O failures
    Reviewed by: Paul Winder <paul@winders.demon.co.uk>
    Reviewed by: Toomas Soome <tsoome@me.com>
    Reviewed by: C Fraire <cfraire@me.com>
    Approved by: Dan McDonald <danmcd@joyent.com>
