
Bug #9403

assertion failed in arc_buf_destroy() when concurrently reading block with checksum error

Added by Brad Lewis over 1 year ago. Updated over 1 year ago.

Status: Closed
Priority: Normal
Assignee: -
Category: zfs - Zettabyte File System
Start date: 2018-03-28
Due date:
% Done: 100%
Estimated time:
Difficulty: Medium
Tags: needs-triage

Description

This assertion (VERIFY) failure was reported when reading a block. It turns out that if we get an i/o error (ECKSUM in this case) while there are multiple concurrent ARC reads of the same block (from different clones), the ARC puts multiple bufs on the same ANON hdr, which is not supposed to happen, and we then panic when we try to arc_buf_destroy() one of those bufs.

Jan 12 18:39:29 TFGELSVMLXDEL01 ^Mpanic[cpu12]/thread=fffffe4229738c40:
Jan 12 18:39:29 TFGELSVMLXDEL01 genunix: [ID 603766 kern.notice] assertion failed: remove_reference(hdr, 0L, tag) == 0 (0x2 == 0x0), file: ../../common/fs/zfs/arc.c, line: 3125
Jan 12 18:39:29 TFGELSVMLXDEL01 unix: [ID 100000 kern.notice]
Jan 12 18:39:29 TFGELSVMLXDEL01 genunix: [ID 655072 kern.notice] fffffe42297388b0 genunix:strlog+0 ()
Jan 12 18:39:29 TFGELSVMLXDEL01 genunix: [ID 655072 kern.notice] fffffe42297388f0 zfs:arc_buf_destroy+d7 ()
Jan 12 18:39:29 TFGELSVMLXDEL01 genunix: [ID 655072 kern.notice] fffffe4229738940 zfs:dbuf_read_done+93 ()
Jan 12 18:39:29 TFGELSVMLXDEL01 genunix: [ID 655072 kern.notice] fffffe42297389a0 zfs:arc_read_done+198 ()
Jan 12 18:39:29 TFGELSVMLXDEL01 genunix: [ID 655072 kern.notice] fffffe4229738a40 zfs:zio_done+3f2 ()
Jan 12 18:39:29 TFGELSVMLXDEL01 genunix: [ID 655072 kern.notice] fffffe4229738a70 zfs:zio_execute+7f ()
Jan 12 18:39:29 TFGELSVMLXDEL01 genunix: [ID 655072 kern.notice] fffffe4229738b30 genunix:taskq_thread+2d0 ()
Jan 12 18:39:29 TFGELSVMLXDEL01 genunix: [ID 655072 kern.notice] fffffe4229738b40 unix:thread_start+8 ()

fffffe422a5b2c40::findstack -v

stack pointer for thread fffffe422a5b2c40: fffffe422a5b24f0
[ fffffe422a5b24f0 _resume_from_idle+0xf4() ]
fffffe422a5b2520 swtch+0x141()
fffffe422a5b2560 cv_wait+0x70(fffffea8a7bc13e0, fffffea8a7bc13d8)
fffffe422a5b25a0 zio_wait+0x6b(fffffea8a7bc1068)
fffffe422a5b2650 dmu_buf_hold_array_by_dnode+0x163(fffffeaac781a780, 853d5dc0000, 2000, 1, fffffffff7ac7650, fffffe422a5b269c, fffffe422a5b2690, 0)
fffffe422a5b26f0 dmu_read_uio_dnode+0x5a(fffffeaac781a780, fffffe422a5b2940, 2000)
fffffe422a5b2750 dmu_read_uio+0x5b(fffffea4cd1f0ac0, 1, fffffe422a5b2940, 2000)
fffffe422a5b27c0 zvol_read+0x10a(1130000016e, fffffe422a5b2940, fffffea323cacdb0)
fffffe422a5b27f0 cdev_read+0x2d(1130000016e, fffffe422a5b2940, fffffea323cacdb0)
fffffe422a5b2890 spec_read+0x2b9(fffffea4d53a6580, fffffe422a5b2940, 0, fffffea323cacdb0, 0)
fffffe422a5b2910 fop_read+0x5b(fffffea4d53a6580, fffffe422a5b2940, 0, fffffea323cacdb0, 0)
fffffe422a5b29f0 vn_rdwr+0x10a(0, fffffea4d53a6580, fffffea4cccb6000, 2000, 853d5dc0000, 1, 0, fffffffffffffffd, fffffea323cacdb0, fffffe422a5b2a48)
fffffe422a5b2ab0 sbd_data_read+0x153(fffffeab1ea9d998, fffffea3ca9fc800, 853d5dc0000, 2000, fffffea4cccb6000)
fffffe422a5b2b20 sbd_handle_read+0x1ce(fffffea3ca9fc800, 0)
fffffe422a5b2b80 sbd_new_task+0x60b(fffffea3ca9fc800, 0)
fffffe422a5b2c20 stmf_worker_task+0x2fd(fffffea43f5d80f0)
fffffe422a5b2c30 thread_start+8()

fffffe422ce63c40::findstack -v

stack pointer for thread fffffe422ce63c40: fffffe422ce634f0
[ fffffe422ce634f0 _resume_from_idle+0xf4() ]
fffffe422ce63520 swtch+0x141()
fffffe422ce63560 cv_wait+0x70(fffffea8ad269f00, fffffea8ad269ef8)
fffffe422ce635a0 zio_wait+0x6b(fffffea8ad269b88)
fffffe422ce63650 dmu_buf_hold_array_by_dnode+0x163(fffffeaac666ac50, 853d5dc0000, 2000, 1, fffffffff7ac7650, fffffe422ce6369c, fffffe422ce63690, 0)
fffffe422ce636f0 dmu_read_uio_dnode+0x5a(fffffeaac666ac50, fffffe422ce63940, 2000)
fffffe422ce63750 dmu_read_uio+0x5b(fffffea4cd07ae00, 1, fffffe422ce63940, 2000)
fffffe422ce637c0 zvol_read+0x10a(11300000162, fffffe422ce63940, fffffea323cacdb0)
fffffe422ce637f0 cdev_read+0x2d(11300000162, fffffe422ce63940, fffffea323cacdb0)
fffffe422ce63890 spec_read+0x2b9(fffffea3616f9e80, fffffe422ce63940, 0, fffffea323cacdb0, 0)
fffffe422ce63910 fop_read+0x5b(fffffea3616f9e80, fffffe422ce63940, 0, fffffea323cacdb0, 0)
fffffe422ce639f0 vn_rdwr+0x10a(0, fffffea3616f9e80, fffffea4c9c9d000, 2000, 853d5dc0000, 1, 0, fffffffffffffffd, fffffea323cacdb0, fffffe422ce63a48)
fffffe422ce63ab0 sbd_data_read+0x153(fffffea4c1af5998, fffffea52401f000, 853d5dc0000, 2000, fffffea4c9c9d000)
fffffe422ce63b20 sbd_handle_read+0x1ce(fffffea52401f000, 0)
fffffe422ce63b80 sbd_new_task+0x60b(fffffea52401f000, 0)
fffffe422ce63c20 stmf_worker_task+0x2fd(fffffea43f5d8230)
fffffe422ce63c30 thread_start+8()

Note that both threads are reading the same offset (0x853d5dc0000) of different dnodes (both object #1 of different ZVOLs).

The typical code path is that the 2nd arc_read() of this BP will go through:

if (hdr != NULL && HDR_HAS_L1HDR(hdr) && hdr->b_l1hdr.b_pabd != NULL) {
    ...
    if (HDR_IO_IN_PROGRESS(hdr)) {
        ...
        if (done) {
            ...
            /* add new acb to existing hdr */
            hdr->b_l1hdr.b_acb = acb;
and then when the i/o completes, arc_read_done() will do:
for (acb = callback_list; acb != NULL; acb = acb->acb_next) {
    ...
    int error = arc_buf_alloc_impl(hdr, acb->acb_private,
        acb->acb_compressed, no_zio_error, &acb->acb_buf);
    ...
    if (no_zio_error) {
        ...
    } else {
        ...
        arc_change_state(arc_anon, hdr, hash_lock);
...
while ((acb = callback_list) != NULL) {
    if (acb->acb_done)
        acb->acb_done(zio, acb->acb_buf, acb->acb_private);
The acb_done() callback is dbuf_read_done(), which calls arc_buf_destroy() if there was an i/o error.

There is also a similar code path in arc_read() that should be able to hit this, though the race window is narrower. In arc_read_done(), we do:

arc_hdr_clear_flags(hdr, ARC_FLAG_IO_IN_PROGRESS);
...
cv_broadcast(&hdr->b_l1hdr.b_cv);
if (hash_lock != NULL) {
    mutex_exit(hash_lock);
So a concurrent arc_read() can see the hdr as not IO_IN_PROGRESS, but the hdr is still ANON. arc_read() can do:
if (hdr != NULL && HDR_HAS_L1HDR(hdr) && hdr->b_l1hdr.b_pabd != NULL) {
    ...
    if (HDR_IO_IN_PROGRESS(hdr)) { /* FALSE */
        ...
        if (done) {
            ...
            VERIFY0(arc_buf_alloc_impl(hdr, private,
                compressed_read, B_TRUE, &buf));
arc_read_done() then calls acb_done(), i.e. dbuf_read_done(), which calls arc_buf_destroy() because there was an i/o error; but now there are two bufs on the hdr, and arc_buf_destroy() panics.

Possible ways to address this:

1. pass another flag to arc_buf_alloc_impl() to indicate whether there was an error

2. have arc_read()/arc_read_done() allocate a whole new anon hdr if there was an error

3. change the interface so that no buf is passed to the callback (to the DMU) when there's an i/o error

History

#1

Updated by Marcel Telka over 1 year ago

  • Subject changed from assertion failed in arc_buf_destroy() when concurrently reading block with checksum error to assertion failed in arc_buf_destroy() when concurrently reading block with checksum error
  • Category set to zfs - Zettabyte File System
#2

Updated by Electric Monk over 1 year ago

  • Status changed from New to Closed
  • % Done changed from 0 to 100

git commit fa98e487a9619b7902f218663be219e787a57dad

commit  fa98e487a9619b7902f218663be219e787a57dad
Author: Matthew Ahrens <mahrens@delphix.com>
Date:   2018-05-07T16:26:45.000Z

    9403 assertion failed in arc_buf_destroy() when concurrently reading block with checksum error
    Reviewed by: George Wilson <george.wilson@delphix.com>
    Reviewed by: Paul Dagnelie <pcd@delphix.com>
    Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com>
    Approved by: Matt Ahrens <mahrens@delphix.com>
