Bug #3281

frequent freezes on oi_151a.1.7

Added by Richard PALO over 7 years ago. Updated almost 7 years ago.

Status: Closed
Priority: Normal
Assignee: -
Category: -
Target version: -
Start date: 2012-10-17
Due date:
% Done: 0%
Estimated time:
Difficulty: Medium
Tags: needs-triage

Description

Not sure where to file this, so I'll start here.
Since upgrading to a7 on an AMD Athlon(tm) 64 X2 Dual Core Processor 3800+
running in 64-bit mode, I've been noticing frequent freezes that last a moment or two
and then go away.
Thinking it might be related to the nvidia driver, I upgraded to Nvidia 304.51,
but that doesn't really change anything (and I still get the following sort of messages):

Oct 17 07:31:28 x3200 nvidia: [ID 702911 kern.notice] NOTICE: NVRM: nv_segmap: ddi_devmap_segmap() failed (6)
Oct 17 07:31:28 x3200 last message repeated 3 times
Oct 17 07:33:27 x3200 nvidia: [ID 702911 kern.notice] NVRM: Xid (0000:02:00): 8, Channel 00000000

My gut feeling is that it's a graphics system problem, but I'm not sure. What can I use to get more useful debugging statistics? I've attached an nvidia bug report log.

Anybody else experiencing this, and/or any tips on how to get more useful information to further isolate the problem?
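
One way to put numbers on the messages quoted above is to tally them per message text. The following is a minimal Python sketch, assuming log lines in the same shape as the three shown in this report; the function name `count_nvrm` and the `SAMPLE` list are illustrative only, and the regular expressions have only been checked against those three lines.

```python
import re
from collections import Counter

# Matches the text after "NVRM: " in lines shaped like the ones quoted above.
NVRM_RE = re.compile(r"NVRM: (?P<msg>.+)$")
# Matches syslog's "last message repeated N times" follow-up line.
REPEAT_RE = re.compile(r"last message repeated (?P<n>\d+) times?$")


def count_nvrm(lines):
    """Count NVRM messages, expanding 'last message repeated N times'."""
    counts = Counter()
    last = None  # most recent NVRM message that a repeat line may refer to
    for line in lines:
        line = line.rstrip()
        m = NVRM_RE.search(line)
        if m:
            last = m.group("msg")
            counts[last] += 1
            continue
        r = REPEAT_RE.search(line)
        if r and last is not None:
            counts[last] += int(r.group("n"))
        else:
            last = None  # an unrelated line ends the repeat context
    return counts


# The three lines from this report, copied verbatim.
SAMPLE = [
    "Oct 17 07:31:28 x3200 nvidia: [ID 702911 kern.notice] NOTICE: NVRM: nv_segmap: ddi_devmap_segmap() failed (6)",
    "Oct 17 07:31:28 x3200 last message repeated 3 times",
    "Oct 17 07:33:27 x3200 nvidia: [ID 702911 kern.notice] NVRM: Xid (0000:02:00): 8, Channel 00000000",
]

if __name__ == "__main__":
    for msg, n in count_nvrm(SAMPLE).most_common():
        print(n, msg)
```

To run it against a real log, pass an open file instead of `SAMPLE`; comparing the timestamps of the counted lines with the times of the freezes would show whether the two actually coincide.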


Files

nvidia-x3200-bug-report.log.gz (37.6 KB) - nvidia bug report - Richard PALO, 2012-10-17 05:55 AM

History

#1

Updated by Richard PALO over 7 years ago

Thought I would try powertop to catch a couple of freezes; not sure if it's really useful.

                                                                   OpenIndiana PowerTOP version 1.2                                                                   

C-states (idle power)    Avg    Residency                                                  P-states (frequencies)
C0 (cpu    running)        (6.2%)                                                     2009    Mhz       100.0%
C1            1.8ms    (93.8%) 

Wakeups-from-idle per second: 1290.0   interval: 5.0s
no ACPI    power usage estimate available

Top causes for wakeups:                                                                                                                                               
17.3% (223.4)               <kernel> :  genunix`callout_realtime                                                                                                      
15.9% (205.2)                  sched :  <xcalls> unix`dtrace_xcall_func                                                                                               
15.7% (202.8)               <kernel> :  genunix`cv_wakeup                                                                                                             
 7.8% (100.0)               <kernel> :  unix`cbe_hres_tick                                                                                                            
 7.8% (100.0)               <kernel> :  genunix`clock                                                                                                                 
 7.8% (100.0)               <kernel> :  genunix`cyclic_timer                                                                                                          
 6.5% ( 83.4)               <kernel> :  genunix`callout_normal                                                                                                        
 4.7% ( 61.0)            <interrupt> :  nvidia#0                                                                                                                      
 3.9% ( 50.0)               <kernel> :  SDC`sysdc_update                                                                                                              
 2.9% ( 37.6)               <kernel> :  genunix`lwp_timer_timeout                                                                                                     
 2.9% ( 37.4)            <interrupt> :  ohci#0                                                                                                                        
 2.6% ( 33.2)               <kernel> :  ohci`ohci_handle_root_hub_status_change                                                                                       
 1.2% ( 16.0)            <interrupt> :  nv_sata#0                                                                                                                     
 0.8% ( 10.0)               <kernel> :  ata`ghd_timeout                                                                                                               
 0.6% (  8.0)                  sched :  <xcalls> unix`hati_demap_func                                                                                                 
 0.3% (  4.0)               <kernel> :  ehci`ehci_handle_root_hub_status_change                                                                                       
 0.3% (  4.0)               <kernel> :  genunix`schedpaging                                                                                                           
 0.2% (  2.4)               <kernel> :  genunix`lbolt_cyclic                                                                                                          
 0.2% (  2.0)               <kernel> :  genunix`deadman                                                                                                               
 0.1% (  1.4)               <kernel> :  sd`sd_start_retry_command                                                                                                     
 0.1% (  1.2)               <kernel> :  sd`sd_pm_idletimeout_handler                                                                                                  
 0.1% (  1.2)            <interrupt> :  nge#0                                                                                                                         
 0.1% (  1.0)               <kernel> :  TS`ts_update                                                                                                                  
 0.1% (  1.0)               <kernel> :  nvidia`nvidia_rc_timer                                                                                                        
 0.1% (  1.0)               <kernel> :  genunix`clock_realtime_fire                                                                                                   
 0.1% (  1.0)            <interrupt> :  ehci#0                                                                                                                        
 0.0% (  0.4)               <kernel> :  cpu.generic`gcpu_ntv_mca_poll_cyclic                                                                                          
 0.0% (  0.4)            <interrupt> :  ata#0                                                                                                                         
 0.0% (  0.2)               <kernel> :  kcf`rnd_handler                                                                                                               
 0.0% (  0.2)               <kernel> :  swrand`rnd_handler                                                                                                            
 0.0% (  0.2)               <kernel> :  ip`igmp_slowtimo                                                                                                              
 0.0% (  0.2)               <kernel> :  nv_sata`nv_timeout                                                                                                            
 0.0% (  0.2)               <kernel> :  ip`mld_slowtimo                                                                                                               

Q - Quit R - Refresh
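
To see how the wakeups above split between kernel timers, cross-calls and device interrupts, the rows can be summed per source column. This is a rough Python sketch that assumes rows shaped exactly like the PowerTOP listing above; `wakeups_by_source` and `SAMPLE` are names made up for the example, and only five of the rows are copied in to keep it short.

```python
import re
from collections import defaultdict

# One "Top causes for wakeups" row: percentage, (rate per second), source, detail.
ROW_RE = re.compile(
    r"^\s*(?P<pct>\d+(?:\.\d+)?)%\s*\(\s*(?P<rate>\d+(?:\.\d+)?)\)\s+"
    r"(?P<source>\S+)\s+:\s+(?P<detail>.+?)\s*$"
)


def wakeups_by_source(lines):
    """Sum wakeups per second for each source column value."""
    totals = defaultdict(float)
    for line in lines:
        m = ROW_RE.match(line)
        if m:  # silently skip headers and anything else that is not a row
            totals[m.group("source")] += float(m.group("rate"))
    return dict(totals)


# A subset of the rows from the listing above, copied verbatim.
SAMPLE = [
    "17.3% (223.4)               <kernel> :  genunix`callout_realtime",
    "15.9% (205.2)                  sched :  <xcalls> unix`dtrace_xcall_func",
    "15.7% (202.8)               <kernel> :  genunix`cv_wakeup",
    " 4.7% ( 61.0)            <interrupt> :  nvidia#0",
    " 2.9% ( 37.4)            <interrupt> :  ohci#0",
]

if __name__ == "__main__":
    totals = wakeups_by_source(SAMPLE)
    for source in sorted(totals, key=totals.get, reverse=True):
        print(f"{source:>12}  {totals[source]:7.1f}/s")
```

Feeding it the full listing instead of the subset gives the per-source totals for the whole 5.0s interval.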

#2

Updated by Richard PALO over 7 years ago

I became suspicious when I noticed some font problems in a terminal window, so I took out the GeForce 7300GS card that had been installed for years and reverted to the mainboard's integrated VGA.

I haven't noticed the "freezes" for a while so I'm optimistic that it was a hardware problem.

Too bad that neither nvidia nor Xorg seems to use the fault management diagnostic subsystem;
at least nothing shows up in the fm dump or statistics along these lines.

This report can probably be closed as a false alarm due to hardware problems.

#3

Updated by Richard PALO over 7 years ago

  • Status changed from New to Feedback

#4

Updated by Marion Hakanson about 7 years ago

I've had similar problems, though I'm not sure it is an nVidia or Xorg problem directly. I think consensus is heading toward some problem with devices which share the same interrupt (IRQ), or perhaps sharing of interrupts with the nVidia device.

See the Closed bug #1625 for my report. On one of my systems with the problem, I was able to reduce (and almost eliminate) the number of occurrences by disabling some of the USB ports which happened to share the same IRQ as the nVidia card. I do have a forced crash dump taken while a machine was in the "console unresponsive" state, if someone would find it useful.

You could look at interrupt assignments using:
echo "::interrupts -d" | mdb -k
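
Once the IRQ and driver columns have been read off that output, spotting shared interrupts is a simple grouping step. The Python sketch below assumes the pairs are transcribed by hand, since it makes no attempt to parse the mdb output format. The helper name `shared_irqs` and the IRQ numbers in `EXAMPLE` are hypothetical; only the driver instance names are borrowed from the PowerTOP listing earlier in this thread.

```python
from collections import defaultdict


def shared_irqs(assignments):
    """Return {irq: sorted driver list} for IRQs used by more than one driver."""
    by_irq = defaultdict(set)
    for irq, driver in assignments:
        by_irq[irq].add(driver)
    return {irq: sorted(drivers) for irq, drivers in by_irq.items() if len(drivers) > 1}


# HYPOTHETICAL (irq, driver) pairs for illustration only; replace them with the
# values read from the "::interrupts -d" output on the affected machine.
EXAMPLE = [
    (16, "nvidia#0"),
    (16, "ohci#0"),
    (17, "ehci#0"),
    (18, "nge#0"),
]

if __name__ == "__main__":
    for irq, drivers in sorted(shared_irqs(EXAMPLE).items()):
        print(f"IRQ {irq} is shared by: {', '.join(drivers)}")
```

Any USB controller that turns up on the same IRQ as the nVidia device would be a candidate for the port-disabling experiment described above.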

#5

Updated by Richard PALO about 7 years ago

Marion Hakanson wrote:

I've had similar problems, though I'm not sure it is an nVidia or Xorg problem directly. I think consensus is heading toward some problem with devices which share the same interrupt (IRQ), or perhaps sharing of interrupts with the nVidia device.

See the Closed bug #1625 for my report. On one of my systems with the problem, I was able to reduce (and almost eliminate) the number of occurrences by disabling some of the USB ports which happened to share the same IRQ as the nVidia card. I do have a forced crash dump taken while a machine was in the "console unresponsive" state, if someone would find it useful.

You could look at interrupt assignments using:
echo "::interrupts -d" | mdb -k

Yes, I've experienced this interrupt problem since the OpenSolaris days. Since I took out the nvidia video card and went back to the on-board VGA, I haven't noticed any real freezes, whereas before it was quite frustrating.
I seem to recall a Sun engineer mentioning to me that nvidia chipsets and MSI interrupt handling were quite a challenge and that they weren't really going to spend any more time on dealing with it, as it was apparently a moving target. Oh well.

#6

Updated by Luca De Pandis about 7 years ago

Richard PALO wrote:

Not sure where to file this, so I'll start here.
Since upgrading to a7 on an AMD Athlon(tm) 64 X2 Dual Core Processor 3800+
running in 64-bit mode, I've been noticing frequent freezes that last a moment or two
and then go away.
Thinking it might be related to the nvidia driver, I upgraded to Nvidia 304.51,
but that doesn't really change anything (and I still get the following sort of messages):
[...]

My gut feeling is that it's a graphics system problem, but I'm not sure. What can I use to get more useful debugging statistics? I've attached an nvidia bug report log.

Anybody else experiencing this, and/or any tips on how to get more useful information to further isolate the problem?

Me too, Richard.
nvidia: [ID 702911 kern.notice] NOTICE: NVRM: nv_devmap: devmap_devmem_setup() failed (22)
nvidia: [ID 702911 kern.notice] NOTICE: NVRM: nv_segmap: ddi_devmap_segmap() failed (6)

SunOS xxxxxx 5.11 oi_151a7 i86pc i386 i86pc Solaris

This message has appeared since I upgraded my graphics driver from 173.14.32 to 173.14.35.
The latest version (173.14.36) also shows the same message.

When I use 173.14.32, the message disappears.

#7

Updated by Ken Mays almost 7 years ago

  • Status changed from Feedback to Closed

Sun engineering mentioned a while ago that the workaround is to downgrade the Nvidia graphics driver, as another user noted above. There was a specific crash issue with the newer Nvidia driver releases, which Oracle may have fixed, but the fix was not made in OpenSolaris before the kernel gate closure. As a workaround, just use an older Nvidia driver release (Nvidia <= 173.14.32) as needed for now.
See: ftp://download.nvidia.com/solaris/173.14.32/NVIDIA-Solaris-x86-173.14.32.run
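
The "<= 173.14.32" rule of thumb from this thread can be checked with a numeric version comparison rather than a string comparison. This is a minimal Python sketch; `parse_version`, `LAST_GOOD` and `reported_ok` are names invented for the example, and the threshold only reflects what was reported in this thread, not anything stated by Nvidia.

```python
def parse_version(text):
    """Turn a dotted driver version such as '173.14.32' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


# Last release reported in this thread as not producing the NVRM messages.
LAST_GOOD = parse_version("173.14.32")


def reported_ok(version):
    """True if the version is at or below the thread's suggested cutoff."""
    return parse_version(version) <= LAST_GOOD


if __name__ == "__main__":
    # Versions mentioned in this thread.
    for v in ("173.14.32", "173.14.35", "173.14.36", "304.51"):
        status = "at or below cutoff" if reported_ok(v) else "above cutoff"
        print(f"{v}: {status}")
```

Comparing tuples of integers avoids the ordering mistakes that plain string comparison can make when the components have different numbers of digits.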
