Bug #13109
usb disk dies on high IO and loader hangs on reset
Status: New
Priority: Normal
Assignee: -
Category: -
Start date:
Due date:
% Done: 0%
Estimated time:
Difficulty: Medium
Tags:
Gerrit CR:
Description
I was copying from an external USB disk to an NFS mount, and under constant 30MB/s IO the disk dies reproducibly:
beast console login: Sep 6 14:24:30 beast scsi: WARNING: /pci@0,0/pci108e,4848@1d,7/storage@4/disk@0,0 (sd7):
Sep 6 14:24:30 beast Command failed to complete...Device is gone
Sep 6 14:24:33 beast scsi: WARNING: /pci@0,0/pci108e,4848@1d,7/storage@4/disk@0,0 (sd7):
Sep 6 14:24:33 beast drive offline
Sep 6 14:24:33 beast scsi: WARNING: /pci@0,0/pci108e,4848@1d,7/storage@4/disk@0,0 (sd7):
Sep 6 14:24:33 beast drive offline
Sep 6 14:24:33 beast scsi: WARNING: /pci@0,0/pci108e,4848@1d,7/storage@4/disk@0,0 (sd7):
Sep 6 14:24:33 beast drive offline
Sep 6 14:24:34 beast usba: WARNING: /pci@0,0/pci108e,4848@1d,7/storage@4 (scsa2usb1): Reinserted device is accessible again.
beast console login: Sep 6 14:31:00 beast reboot: initiated by localadm on /dev/pts/3
beast console login:
beast console login:
beast console login:
beast console login:
[ hard reset via ILOM ]
BIOS drive C: is disk0
BIOS drive D: is disk1
BIOS drive E: is disk2
BIOS drive F: is disk3
BIOS drive G: is disk4
BIOS drive H: is disk5
BIOS drive I: is disk6
ZFS: i/o error - all block copies unavailable
ZFS: can't read MOS of pool backuppool
\
Also, after resetting the system the loader hangs here when seeing the disk that was gone.
Updated by Olaf Bohlen 4 months ago
The system is a Sun X4470 running OpenIndiana hipster:
SunOS beast 5.11 illumos-bd05d33960 i86pc i386 i86pc
The disk is attached via USB2:
root@beast:~# rmformat
Looking for devices...
     1. Logical Node: /dev/rdsk/c8t0d0p0
        Physical Node: /pci@0,0/pci108e,4848@1d,7/storage@4/disk@0,0
        Connected Device: ST2000LM 007-1R8174 0117
        Device Type: Removable
        Bus: USB
        Size: 1907.7 GB
        Label: <Unknown>
        Access permissions: Medium is not write protected.
root@beast:~# grep "storage@4" /etc/path_to_inst
"/pci@0,0/pci108e,4848@1d,7/storage@4" 1 "scsa2usb"
"/pci@0,0/pci108e,4848@1d,7/storage@4/disk@0,0" 7 "sd"
root@beast:~# grep pci108e,4848@1d,7 /etc/path_to_inst
"/pci@0,0/pci108e,4848@1d,7" 1 "ehci"
Updated by Olaf Bohlen 4 months ago
root@beast:~# ls -lh /var/crash/beast
total 40818433
-rw-r--r-- 1 root root 2 Sep 6 18:52 bounds
-rw-r--r-- 1 root root 3.05M Sep 6 18:52 unix.0
-rw-r--r-- 1 root root 15.4G Sep 6 18:54 vmcore.0
I have a crash dump...
Updated by Olaf Bohlen 4 months ago
> ffffff0ab8d94000::spa -v
ADDR             STATE  NAME
ffffff0ab8d94000 ACTIVE backuppool

    ADDR             STATE     AUX         DESCRIPTION
    ffffff0a8f6db000 CANT_OPEN NO_REPLICAS root
    ffffff0aa1ad8000 REMOVED   -           /dev/dsk/c8t0d0s0
Updated by Olaf Bohlen 4 months ago
root@beast:~# rmformat
Looking for devices...
     1. Logical Node: /dev/rdsk/c8t0d0p0
        Physical Node: /pci@0,0/pci108e,4848@1d,7/storage@4/disk@0,0
        Connected Device: ST2000LM 007-1R8174 0117
        Device Type: Removable
        Bus: USB
        Size: 1907.7 GB
        Label: <Unknown>
        Access permissions: Medium is not write protected.
root@beast:~# time dd if=/dev/dsk/c8t0d0p0 of=/localstripe/failingdd bs=128k count=409600
409600+0 records in
409600+0 records out
53687091200 bytes transferred in 2148.347276 secs (23.8MB/sec)

real 35m56.662s
user 0m3.352s
sys 4m20.122s
root@beast:~# ls -lh /localstripe/failingdd
-rw-r--r-- 1 root root 50G Sep 8 21:16 /localstripe/failingdd
So when I use the zpool, the disk gets "lost", but with a plain dd read of the raw device everything works fine. This looks strange.
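As a quick sanity check on the dd figures, the transfer size and rate can be recomputed from the block size, count, and elapsed time. This is only a sketch using the numbers printed by dd above; the variable names are illustrative, and it assumes that bs=128k means 128 * 1024 bytes.

```python
# Values copied from the dd output in this comment.
block_size = 128 * 1024       # bs=128k, assumed to mean 128 KiB
count = 409600                # count=409600
elapsed_secs = 2148.347276    # elapsed time reported by dd

total_bytes = block_size * count
rate_mib = total_bytes / elapsed_secs / (1024 * 1024)  # binary megabytes per second
rate_mb = total_bytes / elapsed_secs / 1_000_000       # decimal megabytes per second

print(total_bytes)           # 53687091200, the byte count dd reports
print(round(rate_mib, 1))    # 23.8, the rate dd reports
print(round(rate_mb, 1))     # 25.0
```

Either way the dd read ran somewhat below the roughly 30MB/s seen during the failing copy, so the two runs may not have put exactly the same load on the disk.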