Bug #1439

closed

Panic when booting on IBM SystemX

Added by Jens Rosenboom over 12 years ago. Updated almost 12 years ago.

Status: Resolved
Priority: Normal
Assignee: -
Category: kernel
Start date: 2011-09-01
Due date:
% Done: 100%
Estimated time:
Difficulty: Medium
Tags:
Gerrit CR:
External Bug:

Description

Right after the kernel loads, the system panics with "vmem_hash_delete: bad free", called from fm_smb_fmacompat (see the attached screenshot). The code in uts/intel/os/fmsmb.c:562 seems to do more allocs than frees anyway. Can someone please clean this up?


Files

Bildschirmfoto_2011-08-31_um_15.11.05.png (26.7 KB): Screenshot; Jens Rosenboom, 2011-09-01 09:04 AM
sm.patch (1.15 KB); Jens Rosenboom, 2011-09-05 07:05 PM
sm.patch2 (1.15 KB); Jens Rosenboom, 2011-09-05 07:33 PM
issue_00_similar_panic_messages_and_stack.jpg (212 KB): same error on console; Jon Strabala, 2012-01-04 01:47 AM
issue_UEFI__1_13_also_fails.jpg (104 KB): it happens on UEFI 1.13; Jon Strabala, 2012-01-04 01:47 AM
issue_work_around.jpg (106 KB): up and running on an oi_151a image using a kmdb trick; Jon Strabala, 2012-01-04 02:21 AM
Actions #1

Updated by Jens Rosenboom over 12 years ago

Some more findings:

The problem only occurs with recent UEFI versions, 1.11 and 1.12 for my x3650M3. After a downgrade to 1.10, the system runs without a problem.

After inserting a "goto bad" right at the start of fm_smb_fmacompat, the machine also works fine with the 1.12 version.

Looking at the smbios data, I can even guess where the crash comes from: The final type 11 record doesn't have a string attached, leaving cnt=0 when the "for i" loop finishes and thus giving bad parameters for the kmem_free call.
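
The suspected failure mode can be modeled in user space. The following is a minimal sketch and not the code from fmsmb.c: `mock_heap_t`, `mock_alloc`, `mock_free`, `scan_buggy`, and `scan_fixed` are hypothetical names, and the mock heap only stands in for an allocator that requires the free size to match the allocation size. It assumes the loop shape described above, where the count from the last record is reused for a single free after the loop.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_SLOTS 8

/* Hypothetical mock heap: remembers the size of every live allocation. */
typedef struct {
	size_t size[MAX_SLOTS];	/* recorded size of each live allocation */
	int live[MAX_SLOTS];	/* 1 while the slot is allocated */
	int bad_frees;		/* frees whose slot or size did not match */
} mock_heap_t;

/* Returns a slot index that plays the role of a pointer, or -1 if full. */
static int
mock_alloc(mock_heap_t *h, size_t size)
{
	for (int i = 0; i < MAX_SLOTS; i++) {
		if (!h->live[i]) {
			h->live[i] = 1;
			h->size[i] = size;
			return (i);
		}
	}
	return (-1);
}

/* A size-checked free: a mismatch is counted where a kernel would panic. */
static void
mock_free(mock_heap_t *h, int slot, size_t size)
{
	if (slot < 0 || slot >= MAX_SLOTS || !h->live[slot] ||
	    h->size[slot] != size) {
		h->bad_frees++;
		return;
	}
	h->live[slot] = 0;
}

/* Number of allocations that were never freed. */
static int
mock_live_count(const mock_heap_t *h)
{
	int n = 0;

	for (int i = 0; i < MAX_SLOTS; i++)
		n += h->live[i];
	return (n);
}

/* Assumed buggy shape: allocate per record, free once after the loop. */
static void
scan_buggy(mock_heap_t *h, const int *counts, int nrec)
{
	int slot = -1;
	int cnt = 0;

	assert(nrec >= 0);
	for (int i = 0; i < nrec; i++) {
		cnt = counts[i];	/* overwritten on every record */
		if (cnt > 0)
			slot = mock_alloc(h, sizeof (char *) * (size_t)cnt);
	}
	/* cnt now belongs to the last record, not to the last allocation. */
	if (slot >= 0)
		mock_free(h, slot, sizeof (char *) * (size_t)cnt);
}

/* One possible repair: free each table with the count it was sized by. */
static void
scan_fixed(mock_heap_t *h, const int *counts, int nrec)
{
	assert(nrec >= 0);
	for (int i = 0; i < nrec; i++) {
		int cnt = counts[i];
		int slot;

		if (cnt <= 0)
			continue;	/* record without strings */
		slot = mock_alloc(h, sizeof (char *) * (size_t)cnt);
		if (slot < 0)
			continue;
		/* ... the strings would be inspected here ... */
		mock_free(h, slot, sizeof (char *) * (size_t)cnt);
	}
}
```

With per-record string counts of {2, 3, 0} on a zero-initialized heap, `scan_buggy` records one mismatched free and leaves two allocations live, while `scan_fixed` leaves both figures at zero. That matches both observations in this report: the bad free and the surplus of allocs over frees.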

Actions #2

Updated by Jens Rosenboom over 12 years ago

I made a patch for this. Can someone please check it? It works for me, but I can make no guarantees beyond that. It would be good to have it tested on a board that has the SUNW-PRMS-1 string, too.

Actions #3

Updated by Jens Rosenboom over 12 years ago

New version of the patch; thanks to richlowe for spotting the mistake.

Actions #4

Updated by Jon Strabala almost 12 years ago

I hit the same issue on a brand-new IBM x3550 M3 with a newer BIOS, UEFI 1.13 (build date 9/23/2011), when doing a fresh install.

Since I cannot install oi_151a, I cannot apply the patch. It looks like I need to make my own ISO (sort of a pain) until there is a new ISO release (hopefully with the patch in it).

Actions #5

Updated by Jon Strabala almost 12 years ago

Looking at the file ./usr/src/uts/intel/os/fmsmb.c, it occurred to me that x86gentopo_legacy can be set in /etc/system. This led me to ask further questions in the #illumos IRC channel. With help from Daemar, I determined that I could indeed boot from grub with -kd added, which drops into a kmdb prompt early on, and then type the following:

x86gentopo_legacy /W 1

to disable the path that causes the panic, followed immediately by

:c 

to leave kmdb and continue execution. This got me to an installer! I have attached an image of the system up and running.

I can use the kmdb trick until /etc/system is set up with the x86gentopo_legacy setting, and then eventually build and boot into a new BE with the patch described above, where neither the kmdb trick nor a modified /etc/system is needed.
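
How a tunable like this can keep the faulting code from running at all is sketched below. This is a minimal model under stated assumptions, not the kernel source: only the variable name `x86gentopo_legacy` comes from this thread, while `topo_mode_t`, `choose_topo_mode`, `model_scan`, and `scan_calls` are hypothetical. It assumes the tunable is checked before the SMBIOS scan is attempted.

```c
#include <assert.h>
#include <stddef.h>

/* Models the tunable; writing 1 from kmdb or /etc/system is assumed to
 * have the same effect as assigning 1 here. */
static int x86gentopo_legacy = 0;

typedef enum { TOPO_LEGACY, TOPO_SMBIOS } topo_mode_t;

/* Hypothetical stand-in for the SMBIOS scan; counts how often it runs. */
static int scan_calls = 0;

static int
model_scan(void)
{
	scan_calls++;
	return (1);	/* pretend the SMBIOS data was usable */
}

/* The scan is only reached when the legacy tunable is left at zero. */
static topo_mode_t
choose_topo_mode(int (*scan)(void))
{
	assert(scan != NULL);
	if (x86gentopo_legacy != 0)
		return (TOPO_LEGACY);	/* scan is skipped entirely */
	return (scan() ? TOPO_SMBIOS : TOPO_LEGACY);
}
```

In this model, setting `x86gentopo_legacy` to 1 before calling `choose_topo_mode(model_scan)` returns `TOPO_LEGACY` and leaves `scan_calls` at zero, so a defect inside the scan is never exercised. That is why the workaround avoids the panic without fixing its cause.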

Actions #6

Updated by Jon Strabala almost 12 years ago

As discussed, the oi_151a live CD installs and runs on my IBM x3550 M3 with UEFI (BIOS) rev 1.13 in the default oi_151a BE (openindiana), as long as I have the following setting in my /etc/system:

set x86gentopo_legacy=1

Once I built and booted into a new BE (nightly-2012-01-04-uefi) with the above patch applied (https://www.illumos.org/issues/1439#note-3), I could remove the above setting from my /etc/system file, and the system still worked just fine.

root@systemx:~# beadm list
BE                      Active Mountpoint Space Policy Created
nightly-2012-01-04-uefi NR     /          9.00G static 2012-01-04 17:03
openindiana             -      -          58.8M static 2012-01-03 19:06

So, for what it's worth, the patch works fine on another system (albeit another IBM System X).

Actions #7

Updated by Jon Strabala almost 12 years ago

I have applied the patch https://www.illumos.org/issues/1439#note-3 to both an IBM SystemX (above) and a non-IBM system (a Supermicro X9SCA-F, nightly BE nightly-2012-01-17-uefi). In both cases the resulting BE seems to work just fine.

I do not have other machines available to test. Is there anything else I can do to help get this patch integrated?

Actions #8

Updated by Rich Lowe almost 12 years ago

  • Category set to kernel
  • Status changed from New to Resolved
  • % Done changed from 0 to 100
  • Tags deleted (needs-triage)

Resolved in r13613 commit:8abd7b12d92f
