Bug #5638
openSYSTEM hang
Description
System:
SUN MICROSYSTEMS SUN FIRE X4275 SERVER
BIOS Configuration: American Megatrends Inc. 07060309 07/10/2013
BMC Configuration: IPMI 1.5 (KCS: Keyboard Controller Style)
Processor:
CPU 1: Intel(R) Xeon(R) CPU E5540 2.53GHz
CPU 2: Intel(R) Xeon(R) CPU E5540 2.53GHz
Memory size: 49144 Megabytes
Version:
OmniOS v11 r151012
We have the above machine in use.
Unfortunately, there are problems when using a USB stick or CF card.
The machine crashes, regardless of whether the device on the USB bus is only read from or also written to.
As a test, the installation was done on a USB stick.
The actual installation is intended to be on a CF card.
The problem does not occur under Linux kernel 3.10.x or Solaris 11.1.
Type of CF card: SanDisk Extreme CompactFlash Card 32GB
Filesystem                      Size  Used  Avail  Use%  Mounted on
/usr/lib/libc/libc_hwcap1.so.1   31G  1.7G    29G    6%  /lib/libc.so.1
swap                             47G  292K    47G    1%  /etc/svc/volatile
swap                             47G     0    47G    0%  /tmp
swap                             47G   32K    47G    1%  /var/run
rpool/export                     29G   19K    29G    1%  /export
rpool/export/home                29G   19K    29G    1%  /export/home
rpool                            29G   24K    29G    1%  /rpool
Updated by Rich Lowe over 8 years ago
The debugger not working is worrying.
You say in the synopsis that the system hangs, but in the description that it crashes. If it crashes, do you get a dump, or the ability to get any info from the debugger?
If it hangs, have you tried 'set snooping=1' in /etc/system to enable the deadman timer, to see whether that fires (and perhaps has more luck than the NMI...)?
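For reference, a minimal sketch of checking whether the deadman timer setting mentioned above is already present. The function name snooping_enabled is hypothetical; the setting itself is the 'set snooping=1' line from the comment. Lines starting with '*' are comments in /etc/system, and changes to that file only take effect after a reboot.

```shell
# Hypothetical helper: succeed if an /etc/system style file enables the
# deadman timer with an uncommented "set snooping=1" line.
# Argument 1: file to check (defaults to /etc/system).
snooping_enabled() {
    grep -Eq '^[[:space:]]*set[[:space:]]+snooping[[:space:]]*=[[:space:]]*1([[:space:]]|$)' \
        "${1:-/etc/system}"
}

# Example use:
#   if snooping_enabled; then echo "deadman timer configured"; fi
```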
Updated by Frank B. over 8 years ago
- File cpuinfo1.png added
- File cpuinfo2.png added
- File cpuinfo3.png added
- File cpuinfo4.png added
- File cpuinfo5.png added
- File cpuinfo6.png added
- File stack.png added
- File status.png added
- File msgbuf1.png added
- File msgbuf2.png added
Updated by Frank B. over 8 years ago
- File msgbuf3.png added
- File msgbuf4.png added
- File msgbuf5.png added
- File msgbuf6.png added
- File msgbuf7.png added
Snooping is enabled.
Attached is the output of ::msgbuf, ::status, ::stack and ::cpuinfo.
Updated by Frank B. over 8 years ago
Hi,
Could someone comment on this?
Please let me know if you need more info.
Regards
Frank
Updated by Frank B. over 8 years ago
Hi,
Unfortunately I cannot create a system dump, because the card or stick reports a timeout and is no longer accessible.
"If you cannot save a crash dump, and are on a serial console, enter ::msgbuf, ::panicinfo, ::cpuinfo -v and ::threadlist -v 10 and record the output. (Or take photos if you're unlucky enough to be at a VGA console)."
Unfortunately this is only possible to a limited extent. I have already attached what I was able to capture.
regards
Frank
Updated by Frank B. over 8 years ago
Hi,
What I found out is that the error also already exists in version 5.11 OmniOS omnios-6de5e81.
regards
Frank
Updated by Frank B. over 8 years ago
Hi,
Is there any news on this yet?
Or is there anything else I can do?
This is really annoying!
Updated by Frank B. over 8 years ago
Hi,
I have updated my server to OmniOS r151014.
On a USB stick, the system has now been running for 1 week, instead of failing within 3 days as before. On a CF card, it has been running for 3 days.
Is there a changelog for the kernel?
And were changes made to the USB drivers?
I cannot find any information about this in the system changelog.
regards
frank
Updated by Dan McDonald over 8 years ago
So wait. Are you now running with r151014, where '012 was not running?
As for the changelog, the best thing to do is to pull illumos-omnios, checkout the r151014 branch and do "git whatchanged origin/r151012.." with the r151014 branch checked out. There's a lot there.
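A rough sketch of narrowing that history to one part of the tree. This uses git log with a path filter rather than git whatchanged, and the function name changes_between is hypothetical. The example path usr/src/uts/common/io/usb is an assumption about where the USB drivers live in the illumos-omnios tree; adjust it to whatever the checkout actually contains.

```shell
# Hypothetical helper: list commits reachable from the new ref but not the
# old ref that touch the given path.
# Arguments: repository directory, old ref, new ref, path inside the tree.
changes_between() {
    repo=$1; old=$2; new=$3; path=$4
    git -C "$repo" log --oneline "$old..$new" -- "$path"
}

# Example use (branch names from the comment above; the path is an assumption):
#   changes_between illumos-omnios origin/r151012 origin/r151014 usr/src/uts/common/io/usb
```

An empty result would suggest that no commit in that range touched the given path, which would point the search toward other parts of the kernel.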
Updated by Frank B. over 8 years ago
Hi,
With version r151012 there were problems both with USB and with the installation on CF cards.
With version r151014 it now runs, apparently.
Thanks
Updated by Frank B. over 8 years ago
Dan McDonald wrote:
So wait. Are you now running with r151014, where '012 was not running?
As for the changelog, the best thing to do is to pull illumos-omnios, checkout the r151014 branch and do "git whatchanged origin/r151012.." with the r151014 branch checked out. There's a lot there.
Hi,
But that would show the changes made to the whole system.
I would like a changelog for the kernel only, to determine whether there were any changes to the USB stack.
All previous versions I tested produced errors on the USB bus.
These seem to have been resolved.
regards
frank
Updated by Frank B. over 8 years ago
Hi,
Unfortunately, I realized that the bug still exists!
The system just runs a little longer now.
As I already wrote, the problem occurs neither under Linux kernel 3.10.x nor under Solaris 11.1.
This is a freshly installed OmniOS omnios-170cea2.
Updated by Dan McDonald over 8 years ago
Two silly questions -- do you have USB3 enabled on this guy's BIOS? And do you have C-states enabled? Both need to be turned off.
Updated by Frank B. over 8 years ago
The server does not have USB3.
C-states cannot be adjusted in the BIOS, only SpeedStep, which I have now disabled.
Updated by Frank B. over 8 years ago
The error is still there, even after disabling SpeedStep.
Any other ideas?
cat /etc/release
OmniOS v11 r151014
Copyright 2015 OmniTI Computer Consulting, Inc. All rights reserved.
Use is subject to license terms.
uname -a
SunOS sts01 5.11 omnios-170cea2 i86pc i386 i86pc
Updated by Frank B. over 8 years ago
Summary:
We have some issues with the current version of OmniOS concerning the installation on the internal CF-card in a Sun Fire X4275.
The card disconnects (after a while of normal use) with the following error:
"SCSI: WARNING: /pci@0,0/pci108e,4845@1d,7/storage@1/disk@0,0 (sd0):
SCSI transport failed: reason 'timeout': retrying command
SCSI: WARNING: /pci@0,0/pci108e,4845@1d,7/storage@1/disk@0,0 (sd0):
SCSI transport failed: reason 'timeout': giving up"
This doesn't seem to be a hardware issue, as we tried it on several servers of the same type and the error occurred on all of them.
Unfortunately we cannot get a system dump, as the CF-card is not accessible anymore when the error occurs.
The problem does not occur with an installation of Debian or Solaris 11.1 on the CF-card, nor with an installation of OmniOS on hard disks.
Disabling USB3 and C-states was not possible, as the server has neither, only SpeedStep. Disabling SpeedStep didn't solve the problem.
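To compare runs on different servers or releases, a small sketch like the following could tally the timeout warnings quoted in the summary. The function name count_scsi_timeouts is hypothetical, and it assumes the log contains the message as "SCSI transport failed: reason 'timeout'" followed by either "retrying command" or "giving up". The log file is passed in by the caller; /var/adm/messages is the usual location on illumos, provided it sits on a device that is still readable.

```shell
# Hypothetical helper: count SCSI transport timeout warnings in a log file,
# split into retries and final failures.
# Argument 1: path to the log file.
count_scsi_timeouts() {
    # "." stands in for the single quotes around 'timeout', which cannot
    # appear inside the single-quoted awk program.
    awk '
        /SCSI transport failed: reason .timeout.: retrying command/ { retry++ }
        /SCSI transport failed: reason .timeout.: giving up/        { giveup++ }
        END { printf "retrying=%d giving_up=%d\n", retry, giveup }
    ' "$1"
}

# Example use:
#   count_scsi_timeouts /var/adm/messages
```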