Bug #12017
Assertion failure in kstat_waitq_to_runq from blkdev
Status: Closed
% Done: 100%
Description
Assertion failure when booting a DEBUG kernel in a bhyve VM (using vioblk):
panic[cpu2]/thread=fffffe001f21cc20: assertion failed: (int)wcnt > 0, file: ../../common/os/kstat_fr.c, line: 1398

fffffe001f21c930 genunix:process_type+188a0d ()
fffffe001f21c970 unix:kstat_waitq_to_runq+a0 ()
fffffe001f21c9d0 blkdev:bd_sched+7e ()
fffffe001f21ca30 blkdev:bd_runq_exit+b0 ()
fffffe001f21ca90 blkdev:bd_xfer_done+14f ()
fffffe001f21cad0 vioblk:vioblk_complete+6c ()
fffffe001f21cb30 vioblk:vioblk_poll+eb ()
fffffe001f21cb60 vioblk:vioblk_int_handler+2f ()
fffffe001f21cbc0 unix:av_dispatch_autovect+83 ()
fffffe001f21cc00 unix:dispatch_hardint+36 ()
fffffe001f1c7a50 unix:switch_sp_and_call+15 ()
fffffe001f1c7ab0 unix:do_interrupt+f5 ()
fffffe001f1c7ac0 unix:_interrupt+1f2 ()
fffffe001f1c7bb0 unix:mach_cpu_idle+b ()
fffffe001f1c7be0 unix:cpu_idle+10f ()
fffffe001f1c7c00 unix:idle+ae ()
fffffe001f1c7c10 unix:thread_start+b ()

kstat_waitq_to_runq+0xa0(fffffe170a042220)
blkdev`bd_sched+0x7e(fffffe17059d1b80, fffffe170b6c2918)
blkdev`bd_runq_exit+0xb0(fffffe171ab42c58, 0)
blkdev`bd_xfer_done+0x14f(fffffe171ab42c58, 0)
vioblk`vioblk_complete+0x6c(fffffe1709ea9380, fffffe170ea6e038)
vioblk`vioblk_poll+0xeb(fffffe1709ea9380)
vioblk`vioblk_int_handler+0x2f(fffffe1709ea9380, fffffe170b8fba28)

[2]> fffffe170a042220::print kstat_io_t wcnt
wcnt = 0xffffffff
Updated by Paul Winder over 3 years ago
- Related to Feature #11827: Increase concurrency through blkdev added
Updated by Paul Winder over 3 years ago
#11827 changed some locking and re-ordered the kstat updates with respect to moving I/Os from the wait queue to the run queue. It looks like this introduced a race condition in which some kstat counters can hold a transient negative value, which triggered this ASSERT.
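To make the failure mode concrete, here is a minimal single-threaded sketch that replays one interleaving which would produce a transient negative count. It is an assumption about the kind of ordering involved, not the actual blkdev or kstat code: `model_kstat_t` and the `model_*` functions are hypothetical stand-ins. If the wait-to-run move is accounted before the matching enqueue has been accounted, the unsigned counter wraps from 0 to 0xffffffff, the value printed in the description.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the wait/run counters of a kstat_io_t. */
typedef struct {
	uint32_t wcnt;	/* requests accounted on the wait queue */
	uint32_t rcnt;	/* requests accounted on the run queue */
} model_kstat_t;

/* Account for a request entering the wait queue. */
static void
model_waitq_enter(model_kstat_t *k)
{
	k->wcnt++;
}

/*
 * Account for a request moving from the wait queue to the run queue.
 * Returns false where a check of the form (int)wcnt > 0 would fail.
 */
static bool
model_waitq_to_runq(model_kstat_t *k)
{
	uint32_t wcnt = k->wcnt--;	/* value before the decrement */

	k->rcnt++;
	return ((int32_t)wcnt > 0);
}

/*
 * Replay the assumed bad ordering: the move is accounted before the
 * enqueue. Returns the transient wcnt value observed in between.
 */
static uint32_t
model_bad_ordering(model_kstat_t *k, bool *check_ok)
{
	uint32_t transient;

	assert(k->wcnt == 0);		/* start from an empty wait queue */
	*check_ok = model_waitq_to_runq(k);
	transient = k->wcnt;		/* wrapped: 0 - 1 as unsigned */
	model_waitq_enter(k);		/* the late enqueue accounting */
	return (transient);
}
```

After both steps the counters are consistent again, which is why the negative value is only transient and only a DEBUG kernel with the assertion enabled notices it.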
Updated by Paul Winder over 3 years ago
- Status changed from New to In Progress
- % Done changed from 0 to 50
Updated by Paul Winder over 3 years ago
Review is: https://code.illumos.org/c/illumos-gate/+/200
Before #11827, moving requests to and from the wait and run queues was atomic with the kstat adjustment. This change reinstates that atomicity.
andy@omniosce.org tested the failure described by this ticket. I ran regression tests by running concurrent "dd" commands to the drive with a DEBUG kernel. No panics and iostat output seemed reasonable.
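The atomicity described in this update can be sketched as follows. This is a simplified user-space model under stated assumptions, not the change under review: `model_queue_t` and the `model_*` functions are hypothetical, and a POSIX mutex stands in for whatever lock the driver uses. The point is only that the queue move and the counter adjustment happen inside the same critical section, so no other thread can observe one without the other.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model: queue lengths and their statistics share one lock. */
typedef struct {
	pthread_mutex_t lock;
	uint32_t waitq_len;	/* requests actually on the wait queue */
	uint32_t runq_len;	/* requests actually on the run queue */
	uint32_t wcnt;		/* statistic mirroring waitq_len */
	uint32_t rcnt;		/* statistic mirroring runq_len */
} model_queue_t;

static void
model_init(model_queue_t *q)
{
	pthread_mutex_init(&q->lock, NULL);
	q->waitq_len = q->runq_len = 0;
	q->wcnt = q->rcnt = 0;
}

/* Enqueue a request and account for it in one critical section. */
static void
model_submit(model_queue_t *q)
{
	pthread_mutex_lock(&q->lock);
	q->waitq_len++;
	q->wcnt++;
	pthread_mutex_unlock(&q->lock);
}

/*
 * Move one request from the wait queue to the run queue, updating the
 * statistics under the same lock. Returns 1 if a request was moved,
 * 0 if the wait queue was empty.
 */
static int
model_sched(model_queue_t *q)
{
	int moved = 0;

	pthread_mutex_lock(&q->lock);
	if (q->waitq_len > 0) {
		q->waitq_len--;
		q->runq_len++;
		assert((int32_t)q->wcnt > 0);	/* holds: counts track queues */
		q->wcnt--;
		q->rcnt++;
		moved = 1;
	}
	pthread_mutex_unlock(&q->lock);
	return (moved);
}

/* Complete a running request and account for it atomically. */
static void
model_done(model_queue_t *q)
{
	pthread_mutex_lock(&q->lock);
	assert(q->runq_len > 0);
	q->runq_len--;
	q->rcnt--;
	pthread_mutex_unlock(&q->lock);
}
```

Because `wcnt` changes only while the lock that guards `waitq_len` is held, the two always agree at every point another thread could look, and the check inside `model_sched` cannot see a wrapped value.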
Updated by Electric Monk over 3 years ago
- Status changed from In Progress to Closed
- % Done changed from 50 to 100
git commit 19687f06a4a23ca82910eccd0f898c90ecd3cf62
commit 19687f06a4a23ca82910eccd0f898c90ecd3cf62
Author: Paul Winder <paul@winders.demon.co.uk>
Date: 2019-11-27T14:42:09.000Z

    12017 Assertion failure in kstat_waitq_to_runq from blkdev
    Reviewed by: Igor Kozhukhov <igor@dilos.org>
    Reviewed by: Matthias Scheler <mscheler@tintri.com>
    Reviewed by: Andy Fiddaman <andy@omniosce.org>
    Approved by: Dan McDonald <danmcd@joyent.com>