Bug #8684

illumos disables a working USB controller

Added by Gary Mills 29 days ago. Updated 5 days ago.

Status: New
Start date: 2017-09-23
Priority: Normal
Due date:
Assignee: -
% Done: 0%

Category: kernel
Target version: -
Difficulty: Medium
Tags: needs-triage

Description

When I boot the OI-hipster-gui-20170502 USB image, I get these messages on the console:

SunOS Release ...
Copyright (c) ...
WARNING: /pci@0,0/pci1043,81c0@b,1 (ehci0): No SOF interrupts have been received, this USB EHCI controller is unusable
WARNING: /pci@0,0/pci1043,81c0@b (ohci0): No SOF interrupts have been received, this USB OHCI controller is unusable
WARNING: /pci@0,0/pci1043,81c0@b (ohci0): No SOF interrupts have been received, this USB OHCI controller is unusable
WARNING: /pci@0,0/pci1043,81c0@b,1 (ehci0): No SOF interrupts have been received, this USB EHCI controller is unusable
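
For triage, it can help to group the repeated warnings above by controller. The following is a minimal sketch in Python, not part of the original report: the regular expression and the function name `summarize_sof_warnings` are hypothetical and only match the message format shown above.

```python
import re
from collections import Counter

# Console lines copied verbatim from the report above.
CONSOLE = """\
WARNING: /pci@0,0/pci1043,81c0@b,1 (ehci0): No SOF interrupts have been received, this USB EHCI controller is unusable
WARNING: /pci@0,0/pci1043,81c0@b (ohci0): No SOF interrupts have been received, this USB OHCI controller is unusable
WARNING: /pci@0,0/pci1043,81c0@b (ohci0): No SOF interrupts have been received, this USB OHCI controller is unusable
WARNING: /pci@0,0/pci1043,81c0@b,1 (ehci0): No SOF interrupts have been received, this USB EHCI controller is unusable
"""

# Hypothetical pattern written for this one message; it is an assumption
# based on the lines above, not something taken from the kernel source.
SOF_WARNING = re.compile(
    r"^WARNING: (?P<path>\S+) \((?P<instance>\w+)\): "
    r"No SOF interrupts have been received"
)


def summarize_sof_warnings(text):
    """Count SOF warnings per (device path, driver instance)."""
    counts = Counter()
    for line in text.splitlines():
        match = SOF_WARNING.match(line)
        if match:
            counts[(match["path"], match["instance"])] += 1
    return counts


summary = summarize_sof_warnings(CONSOLE)
for (path, instance), count in sorted(summary.items()):
    print(f"{instance}: {count} warning(s) at {path}")
```

Grouping by device path and driver instance separates repeated messages for one controller from messages for different controllers.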

When this happened, the keyboard, mouse, and USB stick went dead. I had to hold down the power button to recover the system. The messages come from these kernel files:

usr/src/uts/common/io/usb/hcd/ehci/ehci_util.c
usr/src/uts/common/io/usb/hcd/openhci/ohci.c

The kernel has disabled a USB controller that is working and is used for the keyboard and mouse, as well as for the front panel ports. scanpci shows the USB controllers this way:

pci bus 0x0000 cardnum 0x0b function 0x00: vendor 0x10de device 0x026d
 NVIDIA Corporation MCP51 USB Controller
pci bus 0x0000 cardnum 0x0b function 0x01: vendor 0x10de device 0x026e
 NVIDIA Corporation MCP51 USB Controller
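
To relate the scanpci entries to the device paths in the console warnings, here is a rough Python sketch. It assumes the usual convention that the unit address after the last `@` in a device path is `device[,function]` in hexadecimal, with the function omitted when it is 0. That convention, and the helper names `parse_scanpci` and `unit_address`, are assumptions for illustration and do not come from the report.

```python
import re

# scanpci output copied verbatim from the report above.
SCANPCI = """\
pci bus 0x0000 cardnum 0x0b function 0x00: vendor 0x10de device 0x026d
 NVIDIA Corporation MCP51 USB Controller
pci bus 0x0000 cardnum 0x0b function 0x01: vendor 0x10de device 0x026e
 NVIDIA Corporation MCP51 USB Controller
"""

# Hypothetical pattern for the scanpci header lines shown above.
HEADER = re.compile(
    r"^pci bus (0x[0-9a-f]+) cardnum (0x[0-9a-f]+) function (0x[0-9a-f]+): "
    r"vendor (0x[0-9a-f]+) device (0x[0-9a-f]+)$"
)


def parse_scanpci(text):
    """Return {(cardnum, function): (vendor, device_id, description)}.

    The bus number is ignored, which is only safe here because both
    entries are on bus 0.
    """
    entries = {}
    key = None
    for line in text.splitlines():
        m = HEADER.match(line)
        if m:
            _bus, card, func, vendor, dev = (int(x, 16) for x in m.groups())
            key = (card, func)
            entries[key] = (vendor, dev, "")
        elif key is not None and line.strip():
            # The indented line after a header carries the description.
            vendor, dev, _ = entries[key]
            entries[key] = (vendor, dev, line.strip())
    return entries


def unit_address(path):
    """Decode the trailing '@device[,function]' of a device path (assumed hex)."""
    addr = path.rsplit("@", 1)[1]
    device, _, function = addr.partition(",")
    return int(device, 16), int(function or "0", 16)


controllers = parse_scanpci(SCANPCI)
for path in ("/pci@0,0/pci1043,81c0@b", "/pci@0,0/pci1043,81c0@b,1"):
    vendor, dev, desc = controllers[unit_address(path)]
    print(f"{path} -> {vendor:04x}:{dev:04x} {desc}")
```

If the assumption holds, each device path from the warnings resolves to one of the two scanpci entries, which identifies the PCI function that each message refers to.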

The machine on which this behavior occurs has an ASUS M2NPV-VM motherboard with an AMD Athlon(tm) 64 X2 Dual Core Processor 4400+ CPU. The BIOS version is Revision 1401 08/07/2008. It works correctly with an OS installed from the OI-hipster-gui-20160421 DVD. Sometime between 2016 and 2017, a kernel change occurred that broke this machine.

History

#1 Updated by Jean-Pierre André 29 days ago

I can confirm the issue on a different motherboard (Dell) with a similar CPU (AMD Athlon(tm) 64 X2 Dual Core Processor 4200+), and the same USB controller. So I cannot upgrade any more.

#2 Updated by Gary Mills 28 days ago

I found the same USB errors when booting the OI-hipster-gui-20161030 DVD. This means that the change that introduced the fatal error must have been incorporated into OI sometime between April 2016 and October 2016. Was there an exception list for this USB controller before? Did the time delay waiting for the SOF interrupts change? Why is illumos initializing USB controllers that are in use anyway?

#3 Updated by Toomas Soome 27 days ago

Gary Mills wrote:

I found the same USB errors when booting the OI-hipster-gui-20161030 DVD. This means that the change that introduced the fatal error must have been incorporated into OI sometime between April 2016 and October 2016. Was there an exception list for this USB controller before? Did the time delay waiting for the SOF interrupts change? Why is illumos initializing USB controllers that are in use anyway?

This is actually an old issue that has plagued Solaris going back to at least Solaris 10. It has also been filed as #1857. It's just that there has been no one to investigate the EHCI driver and fix it...

#4 Updated by Gary Mills 27 days ago

I'm aware of #1857. I omitted it because it dealt only with virtualized systems. Something changed in 2016 so that the bug began to affect real hardware. I'm still searching for that change. The two files cited above have not changed in many years. They can't be it. As well, those files don't contain any code to try several more times. They simply print the message and disable the controller.

The curious thing is that even though my motherboard contains two of those USB controllers, the kernel only disables the one that is in use.

#5 Updated by Gary Mills 5 days ago

I found a workaround that enabled me to boot and run the hipster BE that I upgraded last month. The BIOS of this system contained an item called "HPET Support". It was enabled. I disabled it. After that change, the SOF error messages did not appear. The first HPET message also disappeared from the messages log, leaving only the second one:

Oct 17 20:32:12 amd pcplusmp: [ID 801116 kern.info] NOTICE: ACPI HPET table query failed

Also, the USB keyboard and USB mouse worked normally.

The implications of this seemingly unrelated change are curious. The kernel apparently initialized the HPET device incorrectly, resulting in no interrupts from the USB controller. That's why the kernel disabled the USB controller. The incorrect initialization may be the result of an incorrect HPET table in the BIOS, or it may be just bad kernel code. The problem may affect only Nvidia HPET devices or only Nvidia MCP51 HPET devices. Without the HPET support, the interrupts from the USB controller were normal.
