
Bug #13109

usb disk dies on high IO and loader hangs on reset

Added by Olaf Bohlen 3 months ago. Updated 3 months ago.

Status: New
Priority: Normal
Assignee: -
Category: -
Start date:
Due date:
% Done: 0%
Estimated time:
Difficulty: Medium
Tags:
Gerrit CR:

Description

I was copying from an external USB disk to an NFS mount, and under a constant ~30 MB/s of IO the disk dies reproducibly:

beast console login: Sep  6 14:24:30 beast scsi: WARNING: /pci@0,0/pci108e,4848@1d,7/storage@4/disk@0,0 (sd7):
Sep  6 14:24:30 beast   Command failed to complete...Device is gone
Sep  6 14:24:33 beast scsi: WARNING: /pci@0,0/pci108e,4848@1d,7/storage@4/disk@0,0 (sd7):
Sep  6 14:24:33 beast   drive offline
Sep  6 14:24:33 beast scsi: WARNING: /pci@0,0/pci108e,4848@1d,7/storage@4/disk@0,0 (sd7):
Sep  6 14:24:33 beast   drive offline
Sep  6 14:24:33 beast scsi: WARNING: /pci@0,0/pci108e,4848@1d,7/storage@4/disk@0,0 (sd7):
Sep  6 14:24:33 beast   drive offline
Sep  6 14:24:34 beast usba: WARNING: /pci@0,0/pci108e,4848@1d,7/storage@4 (scsa2usb1): Reinserted device is accessible again.
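For reference, the failing workload was just a long sequential copy; its shape can be sketched as below. The paths are stand-in files, not the real USB pool and NFS mount, and the block size is an assumption — on the real system the copy ran at a sustained ~30 MB/s for roughly half an hour before the disk dropped off.

```shell
#!/bin/sh
# Sketch of the workload shape only: a long sequential 128k-block copy.
# SRC/DST are placeholder files; on the real system SRC sat on the USB
# disk's pool and DST on the NFS mount.
set -e
SRC=${SRC:-/tmp/usb-src.img}
DST=${DST:-/tmp/nfs-dst.img}

# Tiny stand-in source (the real copy moved far more data).
dd if=/dev/zero of="$SRC" bs=128k count=8 2>/dev/null

# The copy itself: sequential 128k reads and writes.
dd if="$SRC" of="$DST" bs=128k 2>/dev/null

cmp -s "$SRC" "$DST" && echo "copy ok"
```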

beast console login: Sep  6 14:31:00 beast reboot: initiated by localadm on /dev/pts/3

beast console login: 
beast console login: 
beast console login: 
beast console login: 

[ hard reset via ILOM ]

BIOS drive C: is disk0                                                          
BIOS drive D: is disk1                                                          
BIOS drive E: is disk2                                                          
BIOS drive F: is disk3                                                          
BIOS drive G: is disk4                                                          
BIOS drive H: is disk5                                                          
BIOS drive I: is disk6                                                          
ZFS: i/o error - all block copies unavailable                                   
ZFS: can't read MOS of pool backuppool                                          
\                                                                               


Also, after resetting the system, the loader hangs at the point shown above when it probes the disk that had gone away.
#1

Updated by Olaf Bohlen 3 months ago

The system is a Sun X4470 running OpenIndiana hipster:

SunOS beast 5.11 illumos-bd05d33960 i86pc i386 i86pc

The disk is attached via USB2:

root@beast:~# rmformat
Looking for devices...
     1. Logical Node: /dev/rdsk/c8t0d0p0
        Physical Node: /pci@0,0/pci108e,4848@1d,7/storage@4/disk@0,0
        Connected Device: ST2000LM 007-1R8174       0117
        Device Type: Removable
        Bus: USB
        Size: 1907.7 GB
        Label: <Unknown>
        Access permissions: Medium is not write protected.
root@beast:~# grep "storage@4" /etc/path_to_inst
"/pci@0,0/pci108e,4848@1d,7/storage@4" 1 "scsa2usb" 
"/pci@0,0/pci108e,4848@1d,7/storage@4/disk@0,0" 7 "sd" 
root@beast:~# grep pci108e,4848@1d,7 /etc/path_to_inst
"/pci@0,0/pci108e,4848@1d,7" 1 "ehci" 
#2

Updated by Olaf Bohlen 3 months ago

root@beast:~# ls -lh /var/crash/beast
total 40818433
-rw-r--r--   1 root     root           2 Sep  6 18:52 bounds
-rw-r--r--   1 root     root       3.05M Sep  6 18:52 unix.0
-rw-r--r--   1 root     root       15.4G Sep  6 18:54 vmcore.0

I have a crash dump...
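If it helps with analysis, the dump can be opened with mdb; something along these lines pulls the panic state and kernel message buffer (standard mdb dcmds — the ::spa output in the next update came from such a session):

```
root@beast:~# cd /var/crash/beast
root@beast:/var/crash/beast# mdb unix.0 vmcore.0
> ::status
> ::msgbuf
> ::spa -v
```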

#3

Updated by Olaf Bohlen 3 months ago

> ffffff0ab8d94000::spa -v
ADDR                 STATE NAME                                                
ffffff0ab8d94000    ACTIVE backuppool

    ADDR             STATE     AUX          DESCRIPTION                        
    ffffff0a8f6db000 CANT_OPEN NO_REPLICAS  root
    ffffff0aa1ad8000 REMOVED   -              /dev/dsk/c8t0d0s0

#4

Updated by Olaf Bohlen 3 months ago

root@beast:~# rmformat
Looking for devices...
     1. Logical Node: /dev/rdsk/c8t0d0p0
        Physical Node: /pci@0,0/pci108e,4848@1d,7/storage@4/disk@0,0
        Connected Device: ST2000LM 007-1R8174       0117
        Device Type: Removable
        Bus: USB
        Size: 1907.7 GB
        Label: <Unknown>
        Access permissions: Medium is not write protected.
root@beast:~# time dd if=/dev/dsk/c8t0d0p0 of=/localstripe/failingdd bs=128k count=409600
409600+0 records in
409600+0 records out
53687091200 bytes transferred in 2148.347276 secs (23.8MB/sec)

real    35m56.662s
user    0m3.352s
sys     4m20.122s
root@beast:~# ls -lh /localstripe/failingdd
-rw-r--r--   1 root     root         50G Sep  8 21:16 /localstripe/failingdd

So when the disk is accessed through the zpool it gets "lost", but a raw dd read of the whole device works fine. This looks strange.
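One way to narrow this down might be to force real reads through ZFS rather than the raw device, and watch the disk while they run — e.g. with a scrub (standard zpool/iostat commands; whether the scrub alone reproduces the drop-off on this system is an open question):

```
root@beast:~# zpool scrub backuppool
root@beast:~# iostat -xn 5          # watch sd7 while the scrub runs
root@beast:~# zpool status backuppool
```

If the disk only disappears under ZFS-issued IO and not under dd, that would point at the IO pattern (queue depth, flushes) rather than raw throughput.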
