Bug #13059
open
Dell R630 - X2APIC enabled cause boot hang
Description
After purchasing two R630s, I've discovered that this server is not bootable with illumos.
The problem appears when X2APIC is enabled in the BIOS. The boot prints the error "No SOF interrupts have been received this usb ehci hostcontroller is unusable" and then just hangs.
If I turn off X2APIC, the system boots, but I then lose Hyper-Threading, which is bearable but not optimal.
I am seeing the same symptoms when trying to boot FreeBSD and its variants.
Linux is the only OS that boots successfully, and I am not a big fan of it. If there is anything I can do to assist, I'm happy to do so, including lending the hardware if required.
All BIOS/Firmware are up to date.
ISO: omniosce-r151034l.iso
Updated by David Fox almost 3 years ago
This also happens on bleeding edge and on Joyent SmartOS.
Updated by Robert Mustacchi over 2 years ago
Apologies for not seeing this earlier. Something that might be useful here is to boot with kmdb enabled and enter it on an NMI. There are a couple of odd things that we should take apart.
In general, hyper threading and the x2apic historically haven't been related. At least on other Dell RX30 systems we haven't seen that combination happen. In this case, we should probably focus on the x2apic hang first. What would be useful is to figure out if we can drop into the kernel debugger when this happens. There are two approaches to try. The first is to use the IPMI serial console and try to inject a break (usually ~b) or inject an NMI via something like ipmitool chassis power diag. This generally requires using loader to change the kmdb options.
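As a rough sketch of the IPMI side of that suggestion, the commands might look like the following. The host name and user are hypothetical placeholders, and it assumes the iDRAC has IPMI over LAN and Serial over LAN enabled. The script only prints the two commands so they can be reviewed before being run by hand. It does not cover the loader/kmdb setup.

```shell
#!/bin/sh
# Sketch only: BMC_HOST and BMC_USER are hypothetical placeholders for the
# iDRAC address and account; override them in the environment.
BMC_HOST="${BMC_HOST:-idrac.example.com}"
BMC_USER="${BMC_USER:-root}"

# Common ipmitool prefix: -I lanplus selects the IPMI v2.0 LAN interface
# (needed for Serial over LAN), -a prompts for the password interactively.
ipmi_prefix() {
    printf 'ipmitool -I lanplus -H %s -U %s -a' "$BMC_HOST" "$BMC_USER"
}

# Command to attach to the Serial over LAN console (use one terminal for this).
sol_cmd() {
    printf '%s sol activate\n' "$(ipmi_prefix)"
}

# Command to pulse a diagnostic interrupt (NMI) at the host
# (issue it from a second terminal once the boot has hung).
nmi_cmd() {
    printf '%s chassis power diag\n' "$(ipmi_prefix)"
}

# Print the commands instead of executing them, so they can be checked first.
sol_cmd
nmi_cmd
```

Running the printed `sol activate` command in one terminal and the `chassis power diag` command in another keeps the console output visible when the NMI arrives.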
Updated by David Fox over 2 years ago
Hi Robert,
I discovered I can trigger an NMI via the iDRAC interface on the server. I will try this weekend and see if I can get this to work.
Updated by Robert Mustacchi over 2 years ago
OK, that's good to hear. I'd recommend that you try to do this over the Dell Serial over LAN console if possible, as I've found that smoother than the Dell VGA console. Though it shouldn't matter which way you inject the NMI.
Updated by David Fox over 2 years ago
Hi Robert,
I've attached the output captured via iDRAC. It's not much, but it's a start at least. This is all the information I could gather.
If you could advise on what I should do next, I'm happy to proceed.
Attached: nmi.jpg -- output of NMI
Updated by Jürgen Bereuter over 2 years ago
The error appears to be when X2APIC is enabled in BIOS; the error: No SOF interrupts have been received this usb ehci hostcontroller is unusable and then just hangs.
I am experiencing nearly the same behaviour on an HPE DL20 Gen10 server. It does not matter whether X2APIC or HPET is disabled. The internal RAID controller (E208i-a SR Gen10) shows no disks during the installation of OmniOS (bloody). I tried another HBA in the PCIe slot, and the 2 disks show up there, but after installation it won't boot, and the SOF warnings/errors are in the log. The internal USB port is not recognised when there is a USB stick in it.
The server never freezes, only the SOF messages appear on the display (and no boot, or disk)
Updated by Gary Mills over 2 years ago
I reported a similar bug in 2017. It's still open. The bug report is here:
https://www.illumos.org/issues/8684
Here's my workaround:
I found a workaround that enabled me to boot and run the hipster BE that I upgraded last month. The BIOS of this system contained an item called `HPET Support'. It was enabled. I disabled it. After that change, the SOF error messages did not appear. The USB keyboard and USB mouse worked normally.
Updated by Jürgen Bereuter over 2 years ago
Unfortunately, switching HPET off in the BIOS makes no difference on the DL20: still no USB memory stick (internal USB port). Keyboard and mouse are working (with HPET on); they are attached to the front USB.
Updated by Peter Kelm over 2 years ago
Not sure whether this helps but same here on our new Microservers (Gen 10+).
Updated by Malte Graebner over 1 year ago
The same thing happens here as for Peter Kelm:
HP MicroServer G10 Plus
As soon as I have some time, I can take a look at this and post some kmdb dumps here.
Let me know if you want to see any specific information from kmdb.