Bug #1439

Panic when booting on IBM SystemX

Added by Jens Rosenboom over 3 years ago. Updated over 3 years ago.

Status: Resolved
Start date: 2011-09-01
Priority: Normal
Due date:
Assignee: -
% Done: 100%
Category: kernel
Target version: -
Difficulty: Medium
Tags:

Description

Right after loading the kernel, there is a panic "vmem_hash_delete: bad free" called from fm_smb_fmacompat; see the attached screenshot. The code in uts/intel/os/fmsmb.c:562 seems to do more allocs than frees anyway. Can someone please clean this up?

Bildschirmfoto_2011-08-31_um_15.11.05.png - Screenshot (26.7 KB) Jens Rosenboom, 2011-09-01 09:04 AM

sm.patch (1.15 KB) Jens Rosenboom, 2011-09-05 07:05 PM

sm.patch2 (1.15 KB) Jens Rosenboom, 2011-09-05 07:33 PM

issue_00_similar_panic_messages_and_stack.jpg - same error on console (212 KB) Jon Strabala, 2012-01-04 01:47 AM

issue_UEFI__1_13_also_fails.jpg - it happens on UEFI 1.13 (104 KB) Jon Strabala, 2012-01-04 01:47 AM

issue_work_around.jpg - up and running on an oi_151a image using a kmdb trick (106 KB) Jon Strabala, 2012-01-04 02:21 AM

History

#1 Updated by Jens Rosenboom over 3 years ago

Some more findings:

The problem only occurs with recent UEFI versions, 1.11 and 1.12 for my x3650M3. After a downgrade to 1.10, the system runs without a problem.

After inserting a "goto bad" right at the start of fm_smb_fmacompat, the machine also works fine with the 1.12 version.

Looking at the smbios data, I can even guess where the crash comes from: The final type 11 record doesn't have a string attached, leaving cnt=0 when the "for i" loop finishes and thus giving bad parameters for the kmem_free call.
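To illustrate the failure mode described above, here is a minimal userland sketch in C. It is not the code from fmsmb.c and not the attached patch. The names `oem_record_t`, `sized_alloc`, `sized_free` and `scan_oem_records` are hypothetical stand-ins, with `sized_free` mimicking an allocator that, like kmem_free, must be given the size of the original allocation. The point is only that a record with zero strings must never reach the free call.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for one SMBIOS type 11 (OEM Strings) record. */
typedef struct oem_record {
	size_t nstrings;	/* strings attached to this record, may be 0 */
	const char **strings;	/* NULL when nstrings == 0 */
} oem_record_t;

/* Bytes currently allocated through the sized allocator below. */
static size_t live_bytes;

/* Stand-in allocator: rejects zero-sized requests. */
static void *
sized_alloc(size_t size)
{
	void *buf;

	assert(size > 0);
	buf = malloc(size);
	assert(buf != NULL);
	live_bytes += size;
	return (buf);
}

/* Stand-in for a free that needs the original size, as kmem_free does. */
static void
sized_free(void *buf, size_t size)
{
	assert(buf != NULL && size > 0);
	live_bytes -= size;
	free(buf);
}

/*
 * Walk all records, copying each record's string pointers into a
 * temporary array. Records without strings are skipped, so the free
 * is only ever called with the exact size that was allocated.
 * Returns the total number of strings seen.
 */
static size_t
scan_oem_records(const oem_record_t *recs, size_t nrecs)
{
	size_t total = 0;

	for (size_t i = 0; i < nrecs; i++) {
		size_t cnt = recs[i].nstrings;
		size_t len;
		const char **tmp;

		if (cnt == 0)
			continue;	/* nothing allocated, nothing to free */

		len = cnt * sizeof (const char *);
		tmp = sized_alloc(len);
		for (size_t j = 0; j < cnt; j++)
			tmp[j] = recs[i].strings[j];
		total += cnt;
		sized_free(tmp, len);
	}
	return (total);
}
```

Calling `scan_oem_records` on a list whose final record has `nstrings == 0` leaves `live_bytes` at zero and never trips the assertions in `sized_free`, which is the behaviour one would want from the kernel code path as well.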

#2 Updated by Jens Rosenboom over 3 years ago

I made a patch for this. Can someone please check it? It works for me, but no guarantees otherwise. It might be good to test it on a board that has this SUNW-PRMS-1 string, too.

#3 Updated by Jens Rosenboom over 3 years ago

New version, thanks to richlowe for spotting the mistake

#4 Updated by Jon Strabala over 3 years ago

Hit the same issue on a brand new IBM x3550 M3 but on a newer BIOS, e.g. UEFI 1.13 (build date 9/23/2011) when doing a fresh install.

Since I cannot install oi_151a, I cannot apply the patch. It looks like I need to make my own ISO (sort of a pain) until there is a new ISO release (hopefully with the patch in it).

#5 Updated by Jon Strabala over 3 years ago

Looking at the file ./usr/src/uts/intel/os/fmsmb.c, it occurred to me that x86gentopo_legacy can be set in /etc/system, and this led me to ask further questions in the #illumos IRC channel. With help from Daemar, I determined that I could indeed boot from grub by adding -kd to drop into a kmdb prompt early on and then type the following:

x86gentopo_legacy /W 1

to disable the path that causes the 'panic' followed immediately by

:c 

to leave kmdb and continue execution. This got me to an installer! I have attached an image of the system up and running.

I can then use the kmdb trick until I have an /etc/system that sets x86gentopo_legacy, and eventually build and boot into a new BE with the patch described above, where neither the kmdb trick nor a modified /etc/system is needed.

#6 Updated by Jon Strabala over 3 years ago

As discussed, the oi_151a live CD installs and runs on my IBM x3550 M3 with UEFI (BIOS) rev 1.13 in the default oi_151a BE (openindiana), as long as I have the following setting in my /etc/system:

set x86gentopo_legacy=1

Once I built and booted into a new BE (nightly-2012-01-04-uefi) with the above patch applied, e.g. https://www.illumos.org/issues/1439#note-3 , I could then remove the above setting from my /etc/system file and the system still worked just fine.

root@systemx:~# beadm list
BE                      Active Mountpoint Space Policy Created
nightly-2012-01-04-uefi NR     /          9.00G static 2012-01-04 17:03
openindiana             -      -          58.8M static 2012-01-03 19:06

Thus, for what it's worth, the patch works fine on another system (albeit another IBM System X).

#7 Updated by Jon Strabala over 3 years ago

I can apply patch https://www.illumos.org/issues/1439#note-3 to both an IBM SystemX (above) and a non-IBM system (e.g. Supermicro X9SCA-F, nightly BE nightly-2012-01-17-uefi). In both cases the resulting BE seems to work just fine.

I do not have other machines available to test. Is there anything else I can do to help get this patch integrated?

#8 Updated by Rich Lowe over 3 years ago

  • Category set to kernel
  • Status changed from New to Resolved
  • % Done changed from 0 to 100
  • Tags deleted (needs-triage)

Resolved in r13613 commit:8abd7b12d92f
