Bug #13357

Hot-Plugging NVMe drive to USB-C triggers system panic

Added by Stephan Althaus 4 months ago. Updated 4 months ago.

Status: New
Priority: Normal
Assignee: -
Category: -
Start date:
Due date:
% Done: 0%
Estimated time:
Difficulty: Medium
Tags:
Gerrit CR:

Description

Hello!

When I plug an NVMe drive into the Thunderbolt / USB-C port of my laptop, the system panics.

fmadm reports that my device "JMICRON-JMS583" has entered an error state,
but the system should not panic because of that, should it?

(https://www.jmicron.com/products/list/13)

What does the panic stack say?
Why does the system panic if it believes that a reset of the XHCI bus would be sufficient?

Greetings, Stephan


Files

prtconf.txt (362 KB), Stephan Althaus, 2020-12-08 11:12 AM
fmdump.txt (7.08 KB), Stephan Althaus, 2020-12-08 11:12 AM
faulty.txt (7.04 KB), Stephan Althaus, 2020-12-08 11:12 AM
fmadm-faulty.txt (5.64 KB), Stephan Althaus, 2020-12-09 09:35 PM
prtconf-with-device.txt (347 KB), Stephan Althaus, 2020-12-09 09:56 PM
debian-dmesg.txt (93.5 KB), "Debian Linux seems to be able to reset the bus, or a part of it", Stephan Althaus, 2020-12-11 08:27 AM
#1

Updated by Stephan Althaus 4 months ago

#2

Updated by Dan McDonald 4 months ago

What would be very helpful is the kernel core dump (vmdump.N), or, for starters, at least the portion of /var/adm/messages* that shows the panic stack in a bit more detail. The one reported by faulty.txt is too short (and the offset into the xhci routine is WAY too large at first glance).

#3

Updated by Stephan Althaus 4 months ago

First, I had to resize my dump volume to enable savecore.

To trigger the panic I have to boot with the drive connected,
unplug it, and then connect it again.

1) UNPLUG

From /var/adm/messages:

Dec 9 22:03:56 dell6510 usba: [ID 691482 kern.warning] WARNING: /pci@0,0/pci8086,a114@1c,4/pci8086,15da@0/pci8086,15da@2/pci1028,7b1@0 (xhci3): Connecting device on port 1 failed
Dec 9 22:03:56 dell6510 usba: [ID 723738 kern.info] /pci@0,0/pci8086,a114@1c,4/pci8086,15da@0/pci8086,15da@2/pci1028,7b1@0 (xhci3): Port1 in over current condition, please check the attached device to clear the condition. The system will try to recover
the port, but if not successful, you need to re-connect the hub or reboot the system to bring the port back to work
Dec 9 22:03:57 dell6510 usba: [ID 691482 kern.warning] WARNING: /pci@0,0/pci8086,a114@1c,4/pci8086,15da@0/pci8086,15da@2/pci1028,7b1@0 (xhci3): Connecting device on port 2 failed
Dec 9 22:03:57 dell6510 usba: [ID 723738 kern.info] /pci@0,0/pci8086,a114@1c,4/pci8086,15da@0/pci8086,15da@2/pci1028,7b1@0 (xhci3): Port2 in over current condition, please check the attached device to clear the condition. The system will try to recover
the port, but if not successful, you need to re-connect the hub or reboot the system to bring the port back to work
Dec 9 22:03:58 dell6510 usba: [ID 691482 kern.warning] WARNING: /pci@0,0/pci8086,a114@1c,4/pci8086,15da@0/pci8086,15da@2/pci1028,7b1@0 (xhci3): Connecting device on port 3 failed
Dec 9 22:03:58 dell6510 usba: [ID 723738 kern.info] /pci@0,0/pci8086,a114@1c,4/pci8086,15da@0/pci8086,15da@2/pci1028,7b1@0 (xhci3): Port3 in over current condition, please check the attached device to clear the condition. The system will try to recover
the port, but if not successful, you need to re-connect the hub or reboot the system to bring the port back to work
Dec 9 22:03:59 dell6510 usba: [ID 691482 kern.warning] WARNING: /pci@0,0/pci8086,a114@1c,4/pci8086,15da@0/pci8086,15da@2/pci1028,7b1@0 (xhci3): Connecting device on port 4 failed
Dec 9 22:03:59 dell6510 usba: [ID 723738 kern.info] /pci@0,0/pci8086,a114@1c,4/pci8086,15da@0/pci8086,15da@2/pci1028,7b1@0 (xhci3): Port4 in over current condition, please check the attached device to clear the condition. The system will try to recover
the port, but if not successful, you need to re-connect the hub or reboot the system to bring the port back to work
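The unplug excerpt above reports an over-current condition on ports 1 through 4 of xhci3 in turn. For longer logs, a small script can summarise which controller and port raised the notice. This is only a sketch: the regular expression is written against the message wording quoted in this report, and the names `OVER_CURRENT`, `over_current_ports`, `DEV` and `sample` are made up for illustration.

```python
import re
from collections import Counter

# Pattern based on the usba notice wording quoted in this report, e.g.
# "(xhci3): Port1 in over current condition". Other releases may word it differently.
OVER_CURRENT = re.compile(
    r"\((?P<ctrl>xhci\d+)\): Port(?P<port>\d+) in over current condition"
)


def over_current_ports(lines):
    """Count over-current notices per (controller, port) pair."""
    counts = Counter()
    for line in lines:
        match = OVER_CURRENT.search(line)
        if match:
            counts[(match.group("ctrl"), int(match.group("port")))] += 1
    return counts


# Device path as it appears in the excerpt above.
DEV = "/pci@0,0/pci8086,a114@1c,4/pci8086,15da@0/pci8086,15da@2/pci1028,7b1@0"

# Three lines taken from the excerpt; the tails of the kern.info messages are trimmed.
sample = [
    f"Dec 9 22:03:56 dell6510 usba: [ID 691482 kern.warning] WARNING: {DEV} (xhci3): Connecting device on port 1 failed",
    f"Dec 9 22:03:56 dell6510 usba: [ID 723738 kern.info] {DEV} (xhci3): Port1 in over current condition, please check the attached device to clear the condition.",
    f"Dec 9 22:03:57 dell6510 usba: [ID 723738 kern.info] {DEV} (xhci3): Port2 in over current condition, please check the attached device to clear the condition.",
]

for (ctrl, port), count in sorted(over_current_ports(sample).items()):
    print(f"{ctrl} port {port}: {count} over-current notice(s)")
```

To run it against the real log, pass an open file instead of `sample`, for example `over_current_ports(open("/var/adm/messages"))`.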

2) RE-CONNECT

From /var/adm/messages:

Dec 9 22:04:06 dell6510 unix: [ID 836849 kern.notice]
Dec 9 22:04:06 dell6510 ^Mpanic[cpu0]/thread=fffffe003d134c20:
Dec 9 22:04:06 dell6510 genunix: [ID 287227 kern.notice] XHCI runtime reset required
Dec 9 22:04:06 dell6510 unix: [ID 100000 kern.notice]
Dec 9 22:04:06 dell6510 genunix: [ID 655072 kern.notice] fffffe003d134b50 xhci:xhci_taskq+393cdf07 ()
Dec 9 22:04:06 dell6510 genunix: [ID 655072 kern.notice] fffffe003d134c00 genunix:taskq_thread+2cd ()
Dec 9 22:04:06 dell6510 genunix: [ID 655072 kern.notice] fffffe003d134c10 unix:thread_start+b ()
Dec 9 22:04:06 dell6510 unix: [ID 100000 kern.notice]
Dec 9 22:04:06 dell6510 genunix: [ID 111219 kern.notice] dumping to /dev/zvol/dsk/rpool/dump, offset 65536, content: kernel
Dec 9 22:04:06 dell6510 ahci: [ID 405573 kern.info] NOTICE: ahci4: ahci_tran_reset_dport port 3 reset port
Dec 9 22:04:19 dell6510 genunix: [ID 100000 kern.notice]
Dec 9 22:04:19 dell6510 genunix: [ID 665016 kern.notice] ^M100% done: 539762 pages dumped,
Dec 9 22:04:19 dell6510 genunix: [ID 851671 kern.notice] dump succeeded

The dump is available for download here:
https://duedinghausen.eu/vmdump.1.gz
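The panic stack in the messages excerpt above can be pulled apart mechanically, which makes the oddly large offset mentioned in #2 easy to spot. The following is a rough sketch, not a replacement for looking at the dump with a debugger: the frame pattern is derived only from the three frame lines quoted here, and `SUSPICIOUS_OFFSET` is an arbitrary threshold chosen for illustration.

```python
import re

# Frame line format as seen in the excerpt:
# "<16 hex digit frame pointer> <module>:<symbol>+<hex offset> ()"
FRAME = re.compile(
    r"(?P<fp>[0-9a-f]{16}) (?P<module>\w+):(?P<symbol>\w+)\+(?P<offset>[0-9a-f]+) \("
)

# Assumption: an offset above 1 MiB is implausible for a single function.
SUSPICIOUS_OFFSET = 0x100000


def parse_frames(lines):
    """Return (module, symbol, offset, suspicious) for every stack frame line."""
    frames = []
    for line in lines:
        match = FRAME.search(line)
        if match:
            offset = int(match.group("offset"), 16)
            frames.append(
                (match.group("module"), match.group("symbol"), offset,
                 offset > SUSPICIOUS_OFFSET)
            )
    return frames


# The three frame lines from the panic excerpt above.
panic_lines = [
    "Dec 9 22:04:06 dell6510 genunix: [ID 655072 kern.notice] fffffe003d134b50 xhci:xhci_taskq+393cdf07 ()",
    "Dec 9 22:04:06 dell6510 genunix: [ID 655072 kern.notice] fffffe003d134c00 genunix:taskq_thread+2cd ()",
    "Dec 9 22:04:06 dell6510 genunix: [ID 655072 kern.notice] fffffe003d134c10 unix:thread_start+b ()",
]

for module, symbol, offset, suspicious in parse_frames(panic_lines):
    flag = "  <-- offset looks too large" if suspicious else ""
    print(f"{module}`{symbol}+{offset:#x}{flag}")
```

With the lines from this report, only the `xhci_taskq` frame is flagged; the `taskq_thread` and `thread_start` offsets are small.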

#4

Updated by Stephan Althaus 4 months ago

After running "fmadm repair" on the devices and "fmadm acquit" on the events,
and finally rebooting, the device is back in the tree;
see "prtconf-with-device.txt".

However, I can't use the device:

$ sudo cfgadm -al
<snip>
usb14/2 unknown empty unconfigured ok
usb14/3 usb-storage connected configured ok
usb14/4 unknown empty unconfigured ok
steven@dell6510:~$ sudo diskinfo
TYPE DISK VID PID SIZE RMV SSD
SATA c27t1d0 Samsung SSD 850 EVO 250GB 232.89 GiB no yes
NVME c8t002538B471B9EFD6d0 PM961 NVMe SAMSUNG 1024GB 953.87 GiB no yes
NVME c9t002538B471B9EFD3d0 PM961 NVMe SAMSUNG 1024GB 953.87 GiB no yes

$ sudo rmformat
Looking for devices...
1. Logical Node: /dev/rdsk/c41t0d0p0
Physical Node: /pci@0,0/pci8086,a114@1c,4/pci8086,15da@0/pci8086,15da@2/pci1028,7b1@0/storage@3/disk@0,0
Connected Device: JMICRON JMS583 0204
Device Type: <Unknown>
Bus: USB
Size: <Unknown>
Label: <Unknown>
Access permissions: <Unknown>

$ sudo rmformat /dev/rdsk/c41t0d0p0
Not a removable media device

$ sudo format /dev/rdsk/c41t0d0p0

Error: can't open disk '/dev/rdsk/c41t0d0p0'.

$ sudo ls -l /dev/rdsk/c41t0d0p0
lrwxrwxrwx 1 root root 108 Apr 21 2020 /dev/rdsk/c41t0d0p0 -> ../../devices/pci@0,0/pci8086,a114@1c,4/pci8086,15da@0/pci8086,15da@2/pci1028,7b1@0/storage@3/disk@0,0:q,raw

$ sudo ls -l /devices/pci@0,0/pci8086,a114@1c,4/pci8086,15da@0/pci8086,15da@2/pci1028,7b1@0/storage@3/disk@0,0:q,raw
cr------- 1 steven lp 220, 3216 Apr 21 2020 /devices/pci@0,0/pci8086,a114@1c,4/pci8086,15da@0/pci8086,15da@2/pci1028,7b1@0/storage@3/disk@0,0:q,raw

#5

Updated by Jason King 4 months ago

Unfortunately, IIUC there is currently no support to reset the entire USB 3 bus on that port, which is why the system panics. The reset requires involvement with the various drivers and other bits of the system to work (and not leave things in a strange state) outside of just telling the chip to reset itself, which hasn't been done yet as far as I know.

If you have the opportunity to try it, it might be useful to know whether other platforms report similar over-current errors, in case those errors are incorrect.

#6

Updated by Stephan Althaus 4 months ago

Yes, Debian Linux seems to be able to reset the bus, or a part of it (?).

See the attached "debian-dmesg.txt" and search for JMICRON in the file; I plugged and unplugged the device 2-3 times.
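For anyone who wants to look at only the relevant parts of the attachment, a case-insensitive search with a few lines of context is enough. The helper below is a minimal sketch; the function name `context_matches` and the default context of three lines are arbitrary choices, and nothing here depends on the actual contents of the dmesg file.

```python
def context_matches(lines, needle, before=3, after=3):
    """Return (line_number, text) for lines containing `needle`
    (case-insensitive) plus surrounding context, each line once, in order."""
    lines = [line.rstrip("\n") for line in lines]
    keep = set()
    for index, line in enumerate(lines):
        if needle.lower() in line.lower():
            start = max(0, index - before)
            stop = min(len(lines), index + after + 1)
            keep.update(range(start, stop))
    # Line numbers are 1-based, as in most editors and pagers.
    return [(index + 1, lines[index]) for index in sorted(keep)]
```

Assuming the attachment has been saved next to the script, it could be used as `for number, text in context_matches(open("debian-dmesg.txt"), "jmicron"): print(number, text)`.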
