Bug #12017

Assertion failure in kstat_waitq_to_runq from blkdev

Added by Paul Winder almost 2 years ago. Updated almost 2 years ago.

Status: Closed
Priority: High
Assignee:
Category: kernel
Start date:
Due date:
% Done: 100%
Estimated time:
Difficulty: Medium
Tags:
Gerrit CR:

Description

Assertion failure when booting a DEBUG kernel in a bhyve VM (using vioblk):

panic[cpu2]/thread=fffffe001f21cc20: assertion failed: (int)wcnt > 0, file: ../../common/os/kstat_fr.c, line: 1398

fffffe001f21c930 genunix:process_type+188a0d ()
fffffe001f21c970 unix:kstat_waitq_to_runq+a0 ()
fffffe001f21c9d0 blkdev:bd_sched+7e ()
fffffe001f21ca30 blkdev:bd_runq_exit+b0 ()
fffffe001f21ca90 blkdev:bd_xfer_done+14f ()
fffffe001f21cad0 vioblk:vioblk_complete+6c ()
fffffe001f21cb30 vioblk:vioblk_poll+eb ()
fffffe001f21cb60 vioblk:vioblk_int_handler+2f ()
fffffe001f21cbc0 unix:av_dispatch_autovect+83 ()
fffffe001f21cc00 unix:dispatch_hardint+36 ()
fffffe001f1c7a50 unix:switch_sp_and_call+15 ()
fffffe001f1c7ab0 unix:do_interrupt+f5 ()
fffffe001f1c7ac0 unix:_interrupt+1f2 ()
fffffe001f1c7bb0 unix:mach_cpu_idle+b ()
fffffe001f1c7be0 unix:cpu_idle+10f ()
fffffe001f1c7c00 unix:idle+ae ()
fffffe001f1c7c10 unix:thread_start+b ()

kstat_waitq_to_runq+0xa0(fffffe170a042220)
blkdev`bd_sched+0x7e(fffffe17059d1b80, fffffe170b6c2918)
blkdev`bd_runq_exit+0xb0(fffffe171ab42c58, 0)
blkdev`bd_xfer_done+0x14f(fffffe171ab42c58, 0)
vioblk`vioblk_complete+0x6c(fffffe1709ea9380, fffffe170ea6e038)
vioblk`vioblk_poll+0xeb(fffffe1709ea9380)
vioblk`vioblk_int_handler+0x2f(fffffe1709ea9380, fffffe170b8fba28)

[2]> fffffe170a042220::print kstat_io_t wcnt
wcnt = 0xffffffff


Related issues

Related to illumos gate - Feature #11827: Increase concurrency through blkdev (Closed, Paul Winder)

Actions #1

Updated by Paul Winder almost 2 years ago

  • Related to Feature #11827: Increase concurrency through blkdev added
Actions #2

Updated by Paul Winder almost 2 years ago

#11827 changed some locking and reordered the kstat updates with respect to moving I/Os from the wait queue to the run queue. It looks like this has introduced a race condition in which some kstat counters can have a transient negative value, which triggered this ASSERT.
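
To make the transient negative value concrete, here is a minimal userland sketch of the counter arithmetic. It is not the illumos source: the model_* names and the struct are hypothetical stand-ins for kstat_io_t, kstat_waitq_enter and kstat_waitq_to_runq, and the assumption (based on the panic message and the wcnt value printed above) is that the ASSERT inspects the wait count as it was just before the decrement.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the two kstat_io_t counters involved. */
typedef struct model_kstat_io {
	uint32_t wcnt;	/* requests accounted on the wait queue */
	uint32_t rcnt;	/* requests accounted on the run queue */
} model_kstat_io_t;

/* Models kstat_waitq_enter(): account for a newly queued request. */
static void
model_waitq_enter(model_kstat_io_t *k)
{
	k->wcnt++;
}

/*
 * Models kstat_waitq_to_runq(): move one request's accounting from the
 * wait queue to the run queue. Returns the pre-decrement wcnt as a signed
 * value, which is assumed to be what the kernel ASSERT checks (> 0).
 */
static int32_t
model_waitq_to_runq(model_kstat_io_t *k)
{
	uint32_t wcnt = k->wcnt--;	/* unsigned wrap if wcnt was 0 */

	k->rcnt++;
	return ((int32_t)wcnt);
}

/* Expected order: enter is accounted before the move to the run queue. */
static int32_t
model_fixed_order(model_kstat_io_t *k)
{
	int32_t seen;

	model_waitq_enter(k);
	seen = model_waitq_to_runq(k);
	assert(seen > 0);		/* always holds in this order */
	return (seen);
}

/*
 * Racy order: the request is moved to the run queue before its
 * wait-queue entry has been accounted. The value seen is 0, which is
 * the condition a DEBUG kernel would trip on.
 */
static int32_t
model_racy_order(model_kstat_io_t *k)
{
	int32_t seen = model_waitq_to_runq(k);	/* wcnt: 0 -> 0xffffffff */

	model_waitq_enter(k);			/* wcnt: 0xffffffff -> 0 */
	return (seen);
}
```

In the racy order the counters end up correct again once both updates have run, which is why the negative value is only transient; on a DEBUG kernel the panic happens in between, leaving wcnt at 0xffffffff as seen in the dump.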

Actions #3

Updated by Paul Winder almost 2 years ago

  • Status changed from New to In Progress
  • % Done changed from 0 to 50
Actions #4

Updated by Paul Winder almost 2 years ago

Review is: https://code.illumos.org/c/illumos-gate/+/200

Before #11827, moving requests to and from the wait and run queues was atomic with the kstat adjustment. This change reinstates that atomicity.

I tested the failure described by this ticket, and ran regression tests by running concurrent "dd" commands against the drive with a DEBUG kernel. There were no panics, and the iostat output seemed reasonable.
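
As an illustration of what "atomic with the kstat adjustment" means here, the following userland sketch updates the queue membership and the counters under one lock. It is an assumption-laden model, not the actual blkdev change in the review: model_queue_t, model_submit and model_dispatch are hypothetical names, and a pthread mutex stands in for whatever kernel lock the driver uses.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

/* Hypothetical model: queue occupancy and kstat counters share one lock. */
typedef struct model_queue {
	pthread_mutex_t lock;
	uint32_t waiting;	/* requests actually on the wait queue */
	uint32_t running;	/* requests actually on the run queue */
	uint32_t wcnt;		/* kstat-style wait count */
	uint32_t rcnt;		/* kstat-style run count */
} model_queue_t;

static void
model_queue_init(model_queue_t *q)
{
	(void) pthread_mutex_init(&q->lock, NULL);
	q->waiting = q->running = 0;
	q->wcnt = q->rcnt = 0;
}

/* Enqueue a request and account for it in the same critical section. */
static void
model_submit(model_queue_t *q)
{
	(void) pthread_mutex_lock(&q->lock);
	q->waiting++;
	q->wcnt++;
	(void) pthread_mutex_unlock(&q->lock);
}

/*
 * Move one request from the wait queue to the run queue, adjusting the
 * counters before the lock is dropped. Returns 1 if a request was moved,
 * 0 if the wait queue was empty.
 */
static int
model_dispatch(model_queue_t *q)
{
	int moved = 0;

	(void) pthread_mutex_lock(&q->lock);
	if (q->waiting > 0) {
		/* Counters cannot lag the queue, so this always holds. */
		assert(q->wcnt > 0);
		q->waiting--;
		q->running++;
		q->wcnt--;
		q->rcnt++;
		moved = 1;
	}
	(void) pthread_mutex_unlock(&q->lock);
	return (moved);
}
```

Because no other thread can observe a request on the wait queue before wcnt has been incremented, the dispatch path never sees a zero wait count for a request it is about to move.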

Actions #5

Updated by Electric Monk almost 2 years ago

  • Status changed from In Progress to Closed
  • % Done changed from 50 to 100

git commit 19687f06a4a23ca82910eccd0f898c90ecd3cf62

commit  19687f06a4a23ca82910eccd0f898c90ecd3cf62
Author: Paul Winder <paul@winders.demon.co.uk>
Date:   2019-11-27T14:42:09.000Z

    12017 Assertion failure in kstat_waitq_to_runq from blkdev
    Reviewed by: Igor Kozhukhov <igor@dilos.org>
    Reviewed by: Matthias Scheler <mscheler@tintri.com>
    Reviewed by: Andy Fiddaman <andy@omniosce.org>
    Approved by: Dan McDonald <danmcd@joyent.com>
