Bug #5638

open

SYSTEM hang

Added by Frank B. over 8 years ago. Updated over 8 years ago.

Status:
New
Priority:
Normal
Assignee:
-
Category:
-
Start date:
2015-02-19
Due date:
% Done:

0%

Estimated time:
Difficulty:
Medium
Tags:
needs-triage
Gerrit CR:
External Bug:

Description

System:
SUN MICROSYSTEMS SUN FIRE X4275 SERVER
BIOS Configuration: American Megatrends Inc. 07060309 07/10/2013
BMC Configuration: IPMI 1.5 (KCS: Keyboard Controller Style)
Processor:
CPU 1: Intel(R) Xeon(R) CPU E5540 2.53GHz
CPU 2: Intel(R) Xeon(R) CPU E5540 2.53GHz

Memory size: 49144 Megabytes

Version:
OmniOS v11 r151012

We have the above machine in use.

Unfortunately, there are problems with the use of a USB stick or CF card.

The machine crashes, regardless of whether the USB bus is only being read from or also written to.

For testing, the installation was done on a USB stick.

The actual installation is meant to be on a CF card.

The problem does not occur under Linux kernel 3.10 or Solaris 11.1.

Type of CF card: SanDisk Extreme CompactFlash Card 32GB

Filesystem                      Size  Used  Avail  Use%  Mounted on
/usr/lib/libc/libc_hwcap1.so.1   31G  1.7G    29G    6%  /lib/libc.so.1
swap                             47G  292K    47G    1%  /etc/svc/volatile
swap                             47G     0    47G    0%  /tmp
swap                             47G   32K    47G    1%  /var/run
rpool/export                     29G   19K    29G    1%  /export
rpool/export/home                29G   19K    29G    1%  /export/home
rpool                            29G   24K    29G    1%  /rpool


Files

nmi.png (83.4 KB), Frank B., 2015-02-19 03:23 PM
storage_prtconf-pv.txt (51 KB), Frank B., 2015-02-19 03:23 PM
DSC_0532.png (803 KB), Frank B., 2015-02-19 04:16 PM
cpuinfo1.png (87.4 KB), Frank B., 2015-02-20 08:19 AM
cpuinfo2.png (86 KB), Frank B., 2015-02-20 08:19 AM
cpuinfo3.png (81.5 KB), Frank B., 2015-02-20 08:19 AM
cpuinfo4.png (81.5 KB), Frank B., 2015-02-20 08:19 AM
cpuinfo5.png (86.3 KB), Frank B., 2015-02-20 08:19 AM
cpuinfo6.png (70.9 KB), Frank B., 2015-02-20 08:19 AM
stack.png (167 KB), Frank B., 2015-02-20 08:19 AM
status.png (63.8 KB), Frank B., 2015-02-20 08:19 AM
msgbuf1.png (119 KB), Frank B., 2015-02-20 08:19 AM
msgbuf2.png (116 KB), Frank B., 2015-02-20 08:19 AM
msgbuf3.png (117 KB), Frank B., 2015-02-20 08:21 AM
msgbuf4.png (118 KB), Frank B., 2015-02-20 08:21 AM
msgbuf5.png (116 KB), Frank B., 2015-02-20 08:21 AM
msgbuf6.png (116 KB), Frank B., 2015-02-20 08:21 AM
msgbuf7.png (115 KB), Frank B., 2015-02-20 08:21 AM

Related issues

Related to illumos gate - Bug #5991: System Crash (New, 2015-06-08)

Actions #1

Updated by Frank B. over 8 years ago

Actions #2

Updated by Rich Lowe over 8 years ago

The debugger not working is worrying.

You say in the synopsis that the system hangs, but in the description that it crashes. If it crashes, do you get a dump, or the ability to get any info from the debugger?

If it hangs, have you tried 'set snooping=1' in /etc/system to enable the deadman timer, to see whether that fires (and perhaps has more luck than the NMI...)
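
A minimal sketch of the suggestion above, assuming a POSIX shell and root access. The function name enable_deadman is hypothetical; only the 'set snooping=1' line and the /etc/system path come from the comment. The setting is read at boot, so a reboot is needed afterwards.

```shell
# Hypothetical helper: add the deadman-timer setting to a system file
# unless an uncommented 'set snooping=1' line is already present.
# The default path /etc/system is taken from the comment above.
enable_deadman() {
    system_file="${1:-/etc/system}"
    # Nothing to do when the setting already exists.
    if grep -q '^set snooping=1' "$system_file" 2>/dev/null; then
        return 0
    fi
    printf 'set snooping=1\n' >> "$system_file"
}
```

Calling enable_deadman with no argument targets /etc/system; passing a path lets you try it on a copy first.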

Actions #4

Updated by Frank B. over 8 years ago

Snooping is enabled.

Here is the output of ::msgbuf, ::status, ::stack and ::cpuinfo (see the attached screenshots).

Actions #5

Updated by Frank B. over 8 years ago

Hi,

Could someone comment on this?

Please let me know if you need more info.

regards
Frank

Actions #6

Updated by Dan McDonald over 8 years ago

Can you upload the system dump?

Actions #7

Updated by Frank B. over 8 years ago

Hi,

Unfortunately I cannot create a system dump, because the card or stick reports a timeout and is no longer accessible.

"If you cannot save a crash dump, and are on a serial console, enter ::msgbuf, ::panicinfo, ::cpuinfo -v and ::threadlist -v 10 and record the output. (Or take photos if you're unlucky enough to be at a VGA console)."
Unfortunately my options here are limited. I have already attached what I was able to capture.

regards
Frank

Actions #8

Updated by Frank B. over 8 years ago

Hi,

What I found out is that the error also already exists in version 5.11 OmniOS omnios-6de5e81.

regards
Frank

Actions #9

Updated by Frank B. over 8 years ago

Hi,

Is there any news on this yet?

Or is there anything else I can do?

This is really annoying!

Actions #10

Updated by Frank B. over 8 years ago

Hi,

I have updated my server to OmniOS r151014.

On the USB stick, the system has now been running for 1 week, rather than failing within 3 days as before. On a CF card, it has been running for 3 days.

Is there a changelog for the kernel?
And were changes made to the USB drivers?

I cannot find any information on this in the system changelog.

regards
frank

Actions #11

Updated by Dan McDonald over 8 years ago

So wait. Are you now running with r151014, where '012 was not running?

As for the changelog, the best thing to do is to pull illumos-omnios, checkout the r151014 branch and do "git whatchanged origin/r151012.." with the r151014 branch checked out. There's a lot there.
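
A rough sketch of narrowing that history to USB-related commits, to be run inside an existing clone of illumos-omnios. It uses git log with a path filter instead of git whatchanged. The function name changes_between is hypothetical, and the driver path in the usage note is an assumption about the source layout, not something stated in this thread.

```shell
# Hypothetical helper: list commits that are in new_ref but not in old_ref
# and that touch the given path, one line per commit.
changes_between() {
    old_ref="$1"
    new_ref="$2"
    path="$3"
    git log --oneline "$old_ref..$new_ref" -- "$path"
}
```

For example, changes_between origin/r151012 origin/r151014 usr/src/uts/common/io/usb would list only the commits between the two release branches that touch that directory, assuming the USB drivers live there.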

Actions #12

Updated by Frank B. over 8 years ago

Hi,
With version r151012 there were problems both with USB and with the installation on CF cards.
With version r151014 it now runs, apparently.

THX

Actions #13

Updated by Frank B. over 8 years ago

Dan McDonald wrote:

So wait. Are you now running with r151014, where '012 was not running?

As for the changelog, the best thing to do is to pull illumos-omnios, checkout the r151014 branch and do "git whatchanged origin/r151012.." with the r151014 branch checked out. There's a lot there.

Hi,

But that would show the changes made to the whole system.

I would like a changelog for the kernel only,
to determine whether there were any changes to the USB bus.
All previous versions I tested gave errors on the USB bus.
These seem to have been resolved.

regards
frank

Actions #14

Updated by Frank B. over 8 years ago

Hi,

Unfortunately, I have realized that the bug still exists!

The system just runs a little longer now.
As I already wrote, the problem exists neither under Linux kernel 3.10 nor under Solaris 11.1.

This is a freshly installed OmniOS omnios-170cea2.

Actions #15

Updated by Dan McDonald over 8 years ago

Two silly questions -- do you have USB3 enabled on this guy's BIOS? And do you have C-states enabled? Both need to be turned off.

Actions #16

Updated by Frank B. over 8 years ago

The server does not have USB3.
C-states cannot be adjusted in the BIOS, only SpeedStep,
which I have now disabled.

Actions #17

Updated by Frank B. over 8 years ago

The error is still there, even after disabling SpeedStep.

Any other ideas?

cat /etc/release
OmniOS v11 r151014
Copyright 2015 OmniTI Computer Consulting, Inc. All rights reserved.
Use is subject to license terms.
uname -a
SunOS sts01 5.11 omnios-170cea2 i86pc i386 i86pc
Actions #18

Updated by Frank B. over 8 years ago

Summary:

We have some issues with the current version of OmniOS concerning the installation on the internal CF-card in a Sun Fire X4275.

The card disconnects (after a while of normal use) with the following error:

"SCSI: WARNING: /pci@0,0/pci108e,4845@1d,7/storage@1/disk@0,0 (sd0):
SCSI transport failed: reason 'timeout': retrying command
SCSI: WARNING: /pci@0,0/pci108e,4845@1d,7/storage@1/disk@0,0 (sd0):
SCSI transport failed: reason 'timeout': giving up"

This doesn't seem to be a hardware issue, as we tried it on several servers of the same type and the error occurred on all of them.

Unfortunately we cannot get a system dump as the CF-card is not accessible anymore when the error occurs.

The problem does not occur with an installation of Debian or Solaris 11.1 on the CF-card, nor with an installation of OmniOS on hard disks.

Deactivating USB3 and C-states was not possible, as the server has neither USB3 nor a C-state setting, only SpeedStep. Disabling SpeedStep didn't solve the problem.

Actions #19

Updated by Frank B. over 8 years ago
