Bug #7716

drm hang with ring dumps in messages

Added by Marcel Telka 8 months ago. Updated 6 months ago.

Status: Closed
Start date: 2017-01-02
Priority: Normal
Due date:
Assignee: -
% Done: 0%
Category: -
Target version: -
Difficulty: Medium
Tags: needs-triage

Description

I am seeing a drm hang accompanied by a flood of ring dumps in /var/adm/messages (attached). After the hang, the screen shows a distorted image and Xorg dumps core. The only way out I have found is to reboot.

Here is the beginning of the messages log:

Jan  2 11:41:32 telcontar drm: [ID 300979 kern.warning] WARNING: [drm:ring_dump:881] Dump render ring ring
Jan  2 11:41:32 telcontar drm: [ID 834546 kern.warning] WARNING: [drm:ring_dump:882] HEAD 0x14d04 TAIL 0x14e28
Jan  2 11:41:32 telcontar drm: [ID 628728 kern.warning] WARNING: [drm:ring_dump:883] seq 3742159
Jan  2 11:41:32 telcontar drm: [ID 994420 kern.warning] WARNING: [drm:ring_dump:890] render ring[0x14a34]: 0x12044
Jan  2 11:41:32 telcontar drm: [ID 994420 kern.warning] WARNING: [drm:ring_dump:890] render ring[0x14a38]: 0x3919c9
Jan  2 11:41:32 telcontar drm: [ID 994420 kern.warning] WARNING: [drm:ring_dump:890] render ring[0x14a3c]: 0x0
Jan  2 11:41:32 telcontar drm: [ID 994420 kern.warning] WARNING: [drm:ring_dump:890] render ring[0x14a40]: 0x11000001
Jan  2 11:41:32 telcontar drm: [ID 994420 kern.warning] WARNING: [drm:ring_dump:890] render ring[0x14a44]: 0x22040
Jan  2 11:41:32 telcontar drm: [ID 994420 kern.warning] WARNING: [drm:ring_dump:890] render ring[0x14a48]: 0x3919c9
Jan  2 11:41:32 telcontar drm: [ID 994420 kern.warning] WARNING: [drm:ring_dump:890] render ring[0x14a4c]: 0x0
Jan  2 11:41:32 telcontar drm: [ID 994420 kern.warning] WARNING: [drm:ring_dump:890] render ring[0x14a50]: 0x10800001
Jan  2 11:41:32 telcontar drm: [ID 994420 kern.warning] WARNING: [drm:ring_dump:890] render ring[0x14a54]: 0x80
Jan  2 11:41:32 telcontar drm: [ID 994420 kern.warning] WARNING: [drm:ring_dump:890] render ring[0x14a58]: 0x3919c9
Jan  2 11:41:32 telcontar drm: [ID 994420 kern.warning] WARNING: [drm:ring_dump:890] render ring[0x14a5c]: 0x1000000
Jan  2 11:41:32 telcontar drm: [ID 994420 kern.warning] WARNING: [drm:ring_dump:890] render ring[0x14a60]: 0xb160001
Jan  2 11:41:32 telcontar drm: [ID 994420 kern.warning] WARNING: [drm:ring_dump:890] render ring[0x14a64]: 0x3919c9
Jan  2 11:41:32 telcontar drm: [ID 994420 kern.warning] WARNING: [drm:ring_dump:890] render ring[0x14a68]: 0x0
Jan  2 11:41:32 telcontar drm: [ID 994420 kern.warning] WARNING: [drm:ring_dump:890] render ring[0x14a6c]: 0x0
Jan  2 11:41:32 telcontar drm: [ID 994420 kern.warning] WARNING: [drm:ring_dump:890] render ring[0x14a70]: 0x7a000003
Jan  2 11:41:32 telcontar drm: [ID 994420 kern.warning] WARNING: [drm:ring_dump:890] render ring[0x14a74]: 0x100002
Jan  2 11:41:32 telcontar drm: [ID 994420 kern.warning] WARNING: [drm:ring_dump:890] render ring[0x14a78]: 0x261084
Jan  2 11:41:32 telcontar drm: [ID 994420 kern.warning] WARNING: [drm:ring_dump:890] render ring[0x14a7c]: 0x0
Jan  2 11:41:32 telcontar drm: [ID 994420 kern.warning] WARNING: [drm:ring_dump:890] render ring[0x14a80]: 0x0
Jan  2 11:41:32 telcontar drm: [ID 994420 kern.warning] WARNING: [drm:ring_dump:890] render ring[0x14a84]: 0x0
Jan  2 11:41:32 telcontar drm: [ID 994420 kern.warning] WARNING: [drm:ring_dump:890] render ring[0x14a88]: 0x7a000003
Jan  2 11:41:32 telcontar drm: [ID 994420 kern.warning] WARNING: [drm:ring_dump:890] render ring[0x14a8c]: 0x4000
Jan  2 11:41:32 telcontar drm: [ID 994420 kern.warning] WARNING: [drm:ring_dump:890] render ring[0x14a90]: 0x261084
Jan  2 11:41:32 telcontar drm: [ID 994420 kern.warning] WARNING: [drm:ring_dump:890] render ring[0x14a94]: 0x0

...
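For anyone who wants to work through the attached messages file, the excerpt above can be reduced to its HEAD/TAIL values and (offset, value) pairs with a short script. This is only a minimal sketch: the function name `parse_ring_dump` and the regular expressions are illustrative, written against the line format shown in this report, and the byte count it prints assumes TAIL is ahead of HEAD with no wrap-around in the ring.

```python
import re

# Summary line, e.g. "... [drm:ring_dump:882] HEAD 0x14d04 TAIL 0x14e28"
HEAD_TAIL_RE = re.compile(
    r"ring_dump:\d+\] HEAD (0x[0-9a-fA-F]+) TAIL (0x[0-9a-fA-F]+)")
# Dword line, e.g. "... [drm:ring_dump:890] render ring[0x14a34]: 0x12044"
DWORD_RE = re.compile(
    r"ring_dump:\d+\] .+? ring\[(0x[0-9a-fA-F]+)\]: (0x[0-9a-fA-F]+)")


def parse_ring_dump(lines):
    """Collect HEAD, TAIL and (offset, value) pairs from ring_dump lines."""
    result = {"head": None, "tail": None, "dwords": []}
    for line in lines:
        m = HEAD_TAIL_RE.search(line)
        if m:
            result["head"] = int(m.group(1), 16)
            result["tail"] = int(m.group(2), 16)
            continue
        m = DWORD_RE.search(line)
        if m:
            result["dwords"].append((int(m.group(1), 16), int(m.group(2), 16)))
    return result


# Three lines copied from the log excerpt in this report.
sample = [
    "Jan  2 11:41:32 telcontar drm: [ID 834546 kern.warning] WARNING: "
    "[drm:ring_dump:882] HEAD 0x14d04 TAIL 0x14e28",
    "Jan  2 11:41:32 telcontar drm: [ID 994420 kern.warning] WARNING: "
    "[drm:ring_dump:890] render ring[0x14a34]: 0x12044",
    "Jan  2 11:41:32 telcontar drm: [ID 994420 kern.warning] WARNING: "
    "[drm:ring_dump:890] render ring[0x14a38]: 0x3919c9",
]

dump = parse_ring_dump(sample)
# Bytes between HEAD and TAIL, assuming no wrap-around.
print(hex(dump["tail"] - dump["head"]))  # 0x124
```

To run it over the full log, pass the open file instead of `sample`, for example `parse_ring_dump(open("/var/adm/messages"))`.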

The hang usually happens less than 24 hours after boot, but sometimes the machine survives for several days. The machine is a Lenovo T520 with this graphics controller (scanpci):

pci bus 0x0000 cardnum 0x02 function 0x00: vendor 0x8086 device 0x0126
 Intel Corporation 2nd Generation Core Processor Family Integrated Graphics Controller

IIRC, this started to happen once I updated from older drm binaries (made by Martin Bochnig) to the gfx-drm work available in recent OI Hipster.

messages (695 KB) Marcel Telka, 2017-01-02 10:58 AM

History

#1 Updated by Marcel Telka 8 months ago

Happened again, about 20.5 hours after boot.

#3 Updated by Gordon Ross 8 months ago

BTW, the reason this didn't happen with the older DRM port is most likely that you weren't actually using any DRM driver stuff, just plain Xorg. If you have an old BE around, you can check whether Xorg has any /dev/dri/* device nodes open.

As for getting this actually working: I think we need some later driver code from Linux (or one of the derivatives). That's a work in progress...
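The check for open /dev/dri/* nodes would normally be done by running pfiles on the Xorg process. As a rough scripted alternative, the sketch below resolves each /dev/dri entry to its target and compares it with the files the process has open. It assumes that /proc/<pid>/path holds one symlink per open file and that the caller has permission to read it; the function name `open_drm_nodes` and its parameters are hypothetical.

```python
import os


def open_drm_nodes(pid, dri_dir="/dev/dri", proc_root="/proc"):
    """Return the dri_dir entries whose resolved target the process has open."""
    # Map each resolved device path back to its /dev/dri/* name.
    targets = {}
    for name in sorted(os.listdir(dri_dir)):
        link = os.path.join(dri_dir, name)
        targets[os.path.realpath(link)] = link

    # Assumption: every entry under /proc/<pid>/path is a symlink to an
    # object the process has open or mapped.
    path_dir = os.path.join(proc_root, str(pid), "path")
    found = set()
    for entry in os.listdir(path_dir):
        opened = os.path.realpath(os.path.join(path_dir, entry))
        if opened in targets:
            found.add(targets[opened])
    return sorted(found)
```

Calling `open_drm_nodes(xorg_pid)` with the PID of the running Xorg returns an empty list when no DRM node is open, which would correspond to the "just plain Xorg" case described above.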

#4 Updated by Marcel Telka 7 months ago

Gordon Ross wrote:

BTW, the reason this didn't happen with the older DRM port is most likely that you weren't actually using any DRM driver stuff, just plain Xorg. If you have an old BE around, you can check whether Xorg has any /dev/dri/* device nodes open.

I booted a BE from Oct 22nd, 2016 and found that the Xorg process has /devices/pci@0,0/display@2:drm0 open, which is the target of the /dev/dri/card1 symlink.

BTW, the gfx-drm integration fix in the oi-userland gate (mentioned above) didn't help.

#5 Updated by Gordon Ross 6 months ago

Can anyone with one of the systems affected by this hang try setting
i915_try_reset = true
(you can use mdb -kw to set it)
and report back with the result? Thanks!

#6 Updated by Marcel Telka 6 months ago

Almost 5 days of running with i915_try_reset = 1 and no hang. Also, the "vblank_mode=0 glxgears" test showed that the workaround makes things far better.

#7 Updated by Gordon Ross 6 months ago

OK, so it appears both gen6 and gen7 need the "try reset" work-around, at least for now.
https://github.com/illumos/gfx-drm/pull/5

#8 Updated by Gordon Ross 6 months ago

  • Status changed from New to Closed
commit 7164db27a8fbf944d9aef7d0a7d321ec51999ef8
Author: Gordon Ross <gordon.w.ross@gmail.com>
Date:   Tue Feb 14 21:32:43 2017 -0500

    7716 drm hang with ring dumps in messages
    Reviewed by: Aurélien Larcher <aurelien.larcher@gmail.com>
    Reviewed by: Richard Lowe <richlowe@richlowe.net>

1       1       usr/src/uts/intel/io/i915/i915_dma.c
