Bug #9977

Unable to boot from new BE since end of October 2018

Added by r a 5 months ago. Updated 3 months ago.

Status: New
Priority: Normal
Assignee: -
Category: -
Target version:
Start date: 2018-11-12
Due date:
% Done: 0%
Estimated time:
Difficulty: Medium
Tags: needs-triage

Description

I am currently running OpenIndiana Hipster illumos-229852ddf2, created on 11 August 2018. I successfully updated in September and October to newer versions of OpenIndiana, but as they can be used to successfully run VirtualBox I have deleted them. When I completed a successful upgrade in November, I found that upon attempting to boot OpenIndiana the computer rebooted.
I have mounted the BE and looked for any log file that had been updated, but none gives any indication that the BE actually attempted to start, and no core files are created. I have repeated the update; the update process completes normally but the BE fails to boot. When I perform an update I always activate the last known good BE and then select the new BE via the Grub menu. If the new BE is okay, I then activate it.


Files

201812261131_screenshot.jpg (564 KB): Screenshot of frozen screen (r a, 2018-12-26 11:41 AM)
crash.0 (581 KB): crash (r a, 2019-01-04 10:04 PM)

History

#1

Updated by Toomas Soome 5 months ago

r a wrote:

I am currently running OpenIndiana Hipster illumos-229852ddf2, created on 11 August 2018. I successfully updated in September and October to newer versions of OpenIndiana, but as they can be used to successfully run VirtualBox I have deleted them. When I completed a successful upgrade in November, I found that upon attempting to boot OpenIndiana the computer rebooted.
I have mounted the BE and looked for any log file that had been updated, but none gives any indication that the BE actually attempted to start, and no core files are created. I have repeated the update; the update process completes normally but the BE fails to boot. When I perform an update I always activate the last known good BE and then select the new BE via the Grub menu. If the new BE is okay, I then activate it.

Boot with the -k option and hopefully you can get the panic message.

#2

Updated by r a 5 months ago

Attempted boot with -k, system locked up during boot and required a power cycle in order to regain control.
Mounted BE and attempted to look for error messages but the logs were not updated.

#3

Updated by Toomas Soome 5 months ago

r a wrote:

Attempted boot with -k, system locked up during boot and required a power cycle in order to regain control.
Mounted BE and attempted to look for error messages but the logs were not updated.

Boot with flags:

-k -B prom_debug=true,kbm_flags=true

and post the screenshot where it is hung.

#4

Updated by r a 4 months ago

I modified the boot flags to include -k -B prom_debug=true,kbm_flags=true. During the boot no error was displayed and the system rebooted automatically. I tried adding -v, but still no error message was displayed. Thinking that I had written down kbm_flags incorrectly, I used the flags -k -B prom_debug=true,kdm_flags=true,kbm_flags=true. This resulted in the system hanging at the point where it had previously restarted automatically, and it had to be powered off.

I mounted the BE in /tmp/broken and looked for core or other files which could have been modified but I could not find any file with a timestamp during the time of my repeated attempts to boot the new environment.
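
For anyone repeating this search, a minimal sketch of how files changed during the failed boot attempts could be listed. The helper name changed_since, the marker file path and the example timestamp are assumptions for illustration, not part of any illumos tooling.

```shell
#!/bin/sh
# Sketch: list regular files under a mounted BE that were modified after
# a reference file. changed_since is a hypothetical helper name.
#   $1 = mount point of the BE (for example /tmp/broken)
#   $2 = marker file whose modification time marks the first boot attempt
changed_since() {
    find "$1" -type f -newer "$2" -print
}

# Usage (marker path and time are illustrative only):
#   touch -t 201812261100 /tmp/boot_start
#   changed_since /tmp/broken /tmp/boot_start
```

An empty result would match what is described above: nothing in the BE was written during the boot attempts.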

#5

Updated by Toomas Soome 4 months ago

r a wrote:

I modified the boot flags to include -k -B prom_debug=true,kbm_flags=true. During the boot no error was displayed and the system rebooted automatically. I tried adding -v, but still no error message was displayed. Thinking that I had written down kbm_flags incorrectly, I used the flags -k -B prom_debug=true,kdm_flags=true,kbm_flags=true. This resulted in the system hanging at the point where it had previously restarted automatically, and it had to be powered off.

I mounted the BE in /tmp/broken and looked for core or other files which could have been modified but I could not find any file with a timestamp during the time of my repeated attempts to boot the new environment.

Whoops, sorry, it was kbm_debug=true.

Anyhow, it would be good to get a screenshot from the point of the crash.

#6

Updated by r a 4 months ago

Modified the boot command; instead of automatically rebooting, the screen froze. See attached picture.

#7

Updated by r a 4 months ago

r a wrote:

Modified the boot command; instead of automatically rebooting, the screen froze. See attached picture.

#8

Updated by r a 4 months ago

In order to recover, the computer had to be powered off and then powered back on.

#9

Updated by r a 3 months ago

I configured dumpadm as follows:

  # dumpadm
    Dump content: kernel pages
    Dump device: /dev/zvol/dsk/rpool/dump (dedicated)
    Savecore directory: /var/crash/openindiana
    Savecore enabled: yes
    Save compressed: on

Then I deleted the latest non-booting boot environment and created a new one, which inherited the dumpadm settings, making sure the good BE remained activated. I rebooted and selected the newest boot environment, which failed to boot successfully. After the computer rebooted into the working BE I checked the /var/crash/openindiana directory and found two files:
-rw-r--r-- 1 root root 2 Jan 4 21:40 bounds
-rw-r--r-- 1 root root 578945024 Jan 4 21:40 vmdump.0

#10

Updated by r a 3 months ago

After reading how to process the vmdump.0 file, I created the unix.0 and vmcore.0 files then processed them with

echo '::panicinfo\n::cpuinfo -v\n::threadlist -v 10\n::msgbuf\n*panic_thread::findstack -v\n::stacks' | mdb 0 > ~/crash.0

crash.0 attached
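
A hedged sketch of the same processing, for reference. It assumes the compressed dump is expanded with savecore -vf, which is the usual way to produce unix.0 and vmcore.0 from vmdump.0. It uses printf rather than echo because echo's handling of "\n" differs between shells. The helper name crash_dcmds is hypothetical.

```shell
#!/bin/sh
# Sketch: emit the mdb dcmds from the comment above, one per line.
# crash_dcmds is a hypothetical helper name, not an illumos command.
crash_dcmds() {
    printf '%s\n' \
        '::panicinfo' \
        '::cpuinfo -v' \
        '::threadlist -v 10' \
        '::msgbuf' \
        '*panic_thread::findstack -v' \
        '::stacks'
}

# Usage on the affected system (not executed by this sketch):
#   cd /var/crash/openindiana
#   savecore -vf vmdump.0            # expands into unix.0 and vmcore.0
#   crash_dcmds | mdb 0 > ~/crash.0
```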

#11

Updated by Michal Nowak 3 months ago

The trace looks to me similar to #9761, which was solved by porting VirtualBox to OI.

Remove the VirtualBox provided by Oracle, including its kernel modules, update to the latest OpenIndiana, and install system/virtualbox.

#12

Updated by r a 3 months ago

I thought the same thing on Friday afternoon, so I created a clone of my current BE, booted from the cloned environment, removed VirtualBox 5.2.22, then performed a package refresh and package update, creating a new BE. I rebooted and selected the new BE, without VirtualBox and with the latest code.

Unfortunately the new BE failed to boot and rebooted in the same way it had with VirtualBox present; no /var/crash/openindiana/vmdump file was created.

#13

Updated by Michal Nowak 3 months ago

If all VirtualBox kernel modules were removed, it ought to be a different crash.

Maybe some remnants of Oracle-shipped VirtualBox were still present? Try to search the filesystem again, e.g. /opt, is my only suggestion... Or get crash info about the (new) crash.

#14

Updated by r a 3 months ago

I made sure that VirtualBox was removed because I thought it was the cause of the problem.
The settings to generate a crash dump were in place when I cloned the current BE. The clone inherited the same settings, which I checked before performing a pkg update to generate a new BE.
The new BE does not generate a vmdump.0 file in /var/crash/openindiana in its own BE, in the BE with VirtualBox or the original BE. Not sure why the vmdump.0 is not being created.

#15

Updated by Michal Nowak 3 months ago

Maybe "Gathering Information from a Running System" (https://wiki.illumos.org/display/illumos/How+To+Report+Problems) is the way to go?

#16

Updated by r a 3 months ago

Following the guidance in "How to Report Problems", I deleted the BE of the latest update and booted into the clone of my working BE from which I had uninstalled VirtualBox. I verified that all VirtualBox components were removed and then ran pkg update -v --no-be-activate --be-name openindiana201901-01 to create the new BE.

Upon rebooting and selecting the openindiana201901-01 environment, the system crashed and rebooted. As no crash dump was generated automatically, I tried to generate one with savecore, and it was identical to the one I had created 15 days earlier.

So I proceeded to follow the additional instructions to edit the Grub boot entry: deleting the splashimage line, then editing the kernel line by removing console=graphics and adding -kvd -m, and booting. Once at the kmdb> prompt I followed the instructions: moddebug/W 80000000 and Enter, snooping/W 1 and Enter, then :c and Enter.

My system booted to the login prompt, then started lightdm, and I could log in normally. I cloned the openindiana201901-01 environment to create openindiana201901-02. I then attempted to boot openindiana201901-02, initially removing only the splashimage line in the Grub boot settings, and the system crashed and restarted on boot. I then modified the Grub entry to also remove "console=graphics" from the kernel line, and the system booted to the login screen.

#17

Updated by r a 3 months ago

As a hack to get the BE to boot without editing the entry each time, I modified /rpool/boot/grub/menu.lst and removed the ",console=graphics" entry at the end of the kernel$ line.
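
The edit described above could be sketched roughly as follows. The helper name strip_console_graphics and the backup file name are assumptions for illustration. The function only prints the modified text, so the original file is left untouched until the result has been checked.

```shell
#!/bin/sh
# Sketch: drop the ",console=graphics" boot property from kernel$ lines.
# strip_console_graphics is a hypothetical helper; it writes the result to
# stdout and does not modify the input file.
strip_console_graphics() {
    sed '/^kernel\$/s/,console=graphics//' "$1"
}

# Usage (as root; the .bak name is illustrative only):
#   cp /rpool/boot/grub/menu.lst /rpool/boot/grub/menu.lst.bak
#   strip_console_graphics /rpool/boot/grub/menu.lst.bak > /rpool/boot/grub/menu.lst
```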

#18

Updated by Michal Nowak 3 months ago

Great, so this is actually about the console=graphics option and is unrelated to VirtualBox.

If you want to gather information on the crash, you may want to temporarily boot to kmdb with the console=graphics option present (if kmdb actually appears, of which I am not sure).

#19

Updated by r a 3 months ago

Tried to boot kmdb with console=graphics. The system booted, but all I had was a flashing _ in the upper left-hand corner. Eventually I had to power cycle my computer.

Not sure how to proceed to collect meaningful data to understand why the console=graphics is causing a problem.
Could the problem be related to changes made to support EFI booting?

#20

Updated by r a 3 months ago

Performed a "pkg refresh" , then a "pkg update -v --be-name 201902-01" which worked perfectly and it correctly inherited the settings from the current BE.

BE                 Active Mountpoint Space Policy Created
oihipster201902-01 NR     /          178G  static 2019-02-01 11:11

With root privilege I modified the /rpool/boot/grub/menu.lst file to add ",console=graphics" to see if it now works

title oihipster201902-01
findroot (pool_rpool,0,a)
bootfs rpool/ROOT/oihipster201902-01
splashimage /boot/splashimage.xpm
foreground d25f00
background 115d93
kernel$ /platform/i86pc/kernel/$ISADIR/unix -B $ZFS-BOOTFS,console=graphics
module$ /platform/i86pc/$ISADIR/boot_archive
#============ End of LIBBE entry =============

Unfortunately the splash screen image appears and the orange block starts its first sweep, but before it gets to the end there is a beep and the computer restarts.

It is clear that changes introduced in the package updates from the end of September 2018 through October 2018 have broken the console=graphics functionality; as long as it is not present my computer does boot.
