Bug #13059

open

Dell R630 - X2APIC enabled cause boot hang

Added by David Fox almost 3 years ago. Updated over 1 year ago.

Status: New
Priority: High
Assignee: -
Category: -
Start date:
Due date:
% Done: 0%
Estimated time:
Difficulty: Medium
Tags:
Gerrit CR:
External Bug:

Description

After purchasing two Dell R630 servers, I've discovered that they cannot boot illumos.

The failure occurs when X2APIC is enabled in the BIOS. The console shows "No SOF interrupts have been received, this USB EHCI host controller is unusable" and then the system just hangs.
If I turn off X2APIC, the system boots, but I then lose Hyper-Threading, which is bearable but not optimal.

I am seeing the same symptoms when trying to boot into FreeBSD and variants.
Linux is the only OS that boots successfully, and I'm not a big fan of it. If there is anything I can do to assist, I'm happy to do so, including lending the hardware if required.

All BIOS/Firmware are up to date.
ISO: omniosce-r151034l.iso


Files

nmi.jpg (61.6 KB), David Fox, 2020-09-26 07:08 PM

Actions #1

Updated by David Fox almost 3 years ago

This also happens on bleeding edge and on Joyent SmartOS.

Actions #2

Updated by Robert Mustacchi over 2 years ago

Apologies for not seeing this earlier. Something that might be useful here is to boot with kmdb enabled and enter it on an NMI. There are a couple of odd things that we should take apart.

In general, Hyper-Threading and the x2APIC historically haven't been related. At least on other Dell RX30 systems we haven't seen that combination happen. In this case, we should probably focus on the x2APIC hang first. What would be useful is to figure out if we can drop into the kernel debugger when this happens. There are two approaches to try. The first is to use the IPMI serial console and inject a break (usually ~b); the second is to inject an NMI via something like ipmitool chassis power diag. This generally requires using loader to change the kmdb options.
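
As a rough sketch of the two approaches above, the small wrapper below builds the ipmitool invocations for the serial-over-LAN console and for the diagnostic interrupt. The host name, user name, and the DRY_RUN switch are placeholders made up for illustration; the sketch assumes the BMC has IPMI over LAN enabled and that the password is supplied in the IPMI_PASSWORD environment variable (ipmitool's -E option). It does not cover the loader/kmdb settings, which still have to be configured separately.

```shell
#!/bin/sh
# Hypothetical defaults: replace with your own iDRAC/BMC address and user.
BMC_HOST="${BMC_HOST:-idrac.example.net}"
BMC_USER="${BMC_USER:-root}"

# Run an ipmitool subcommand against the BMC over the lanplus interface.
# -E makes ipmitool read the password from the IPMI_PASSWORD variable.
# With DRY_RUN=1 the command line is printed instead of being executed.
bmc() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo ipmitool -I lanplus -H "$BMC_HOST" -U "$BMC_USER" -E "$@"
    else
        ipmitool -I lanplus -H "$BMC_HOST" -U "$BMC_USER" -E "$@"
    fi
}

# Terminal 1: attach to the serial-over-LAN console and watch the boot.
#   bmc sol activate
# Terminal 2: once the boot hangs, ask the BMC to raise a diagnostic
# interrupt (NMI) on the host.
#   bmc chassis power diag
```

Setting DRY_RUN=1 first shows the exact command lines without touching the hardware. Whether the break or the NMI actually lands in kmdb depends on the kernel having been booted with kmdb loaded and set up to enter it on an NMI.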

Actions #3

Updated by David Fox over 2 years ago

Hi Robert,

I discovered that I can trigger an NMI via the iDRAC interface on the server. I will try this over the weekend and see if I can get it to work.

Actions #4

Updated by Robert Mustacchi over 2 years ago

OK, that's good to hear. I'd recommend that you try to do this over the Dell serial-over-LAN console if possible, as I've found that smoother than the Dell VGA console. Though it shouldn't matter which way you inject the NMI.

Actions #5

Updated by David Fox over 2 years ago

Hi Robert,

I've attached the output captured via iDRAC. It's not much, but it's a start at least; this is all the information I could gather.
If you could advise on what I should try next, I'm happy to do so.

Attached: nmi.jpg -- output of NMI

Actions #6

Updated by Jürgen Bereuter over 2 years ago

The failure occurs when X2APIC is enabled in the BIOS. The console shows "No SOF interrupts have been received, this USB EHCI host controller is unusable" and then the system just hangs.

I am experiencing nearly the same behaviour on an HPE DL20 Gen10 server. It does not matter whether X2APIC or HPET is enabled or disabled. The internal RAID controller (E208i-a SR Gen10) shows no disks during the installation of OmniOS (bloody). I tried another HBA in the PCIe slot: the two disks show up there, but after installation the system won't boot, and the SOF warnings/errors appear in the log. The internal USB port is not recognised when there is a USB stick in it.
The server never freezes; only the SOF messages appear on the display (and there is no boot, or disk).

Actions #7

Updated by Gary Mills over 2 years ago

I reported a similar bug in 2017. It's still open. The bug report is here:

    https://www.illumos.org/issues/8684

Here's my workaround:

I found a workaround that enabled me to boot and run the hipster BE
that I upgraded last month.  The BIOS of this system contained an item
called `HPET Support'.  It was enabled.  I disabled it.  After that
change, the SOF error messages did not appear.  The USB keyboard
and USB mouse worked normally.

Actions #8

Updated by Jürgen Bereuter over 2 years ago

Unfortunately, switching HPET off in the BIOS makes no difference on the DL20: the USB memory stick in the internal USB port is still not detected. The keyboard and mouse work (with HPET on); they are attached to the front USB ports.

Actions #9

Updated by Peter Kelm over 2 years ago

Not sure whether this helps, but we see the same thing on our new MicroServers (Gen 10+).

Actions #10

Updated by Malte Graebner over 1 year ago

The same thing happens here as for Peter Kelm, on an HP MicroServer G10 Plus.
As soon as I have some time, I can take a look at this and post some kmdb dumps here.
Let me know if you want to see any specific information from kmdb.
