Bug #13736

ZFS Panic - Assertion failed during scrub

Added by Nick Hall about 2 months ago. Updated about 1 month ago.

Status: New
Priority: Normal
Assignee: -
Category: zfs - Zettabyte File System
Start date:
Due date:
% Done: 0%
Estimated time:
Difficulty: Hard
Tags:
Gerrit CR:

Description

I have a server running SmartOS which is panicking during a ZFS scrub with:

panic[cpu3]/thread=fffffe003df7dc20: assertion failed: size <= (1ULL << 24) (0x19b8a00 <= 0x1000000), file: ../../common/fs/zfs/abd.c, line: 312

fffffe003df7d840 fffffffffba43521 ()
fffffe003df7d890 zfs:abd_alloc+170 ()
fffffe003df7d8f0 zfs:arc_get_data_abd+6d ()
fffffe003df7d920 zfs:arc_hdr_alloc_pabd+35 ()
fffffe003df7d9a0 zfs:arc_hdr_alloc+186 ()
fffffe003df7dac0 zfs:arc_read+31f ()
fffffe003df7db50 zfs:dsl_scan_prefetch_thread+1d6 ()
fffffe003df7dc00 genunix:taskq_thread+2cd ()
fffffe003df7dc10 unix_thread_start+b ()

It panics 100% of the time at a certain point in the scrub, when it is around 85% complete.

I imported the pool onto a Linux distribution running ZFS 2.0 and it scrubs without a problem, repeatedly. I do not see any errors under Linux and ZFS 2.0, but if I come back into Illumos then ZFS always gives this assertion during the scrub.

I put the hard drives into another server to eliminate the chance of certain hardware issues, and the same thing happened.

I eventually did a ZFS send | recv of the pool onto another system, as I thought that perhaps there was somehow some strange ZFS metadata corruption that scrub couldn't fix, and I figured the new pool would then work fine.

So the new pool was on a different server with a different disk controller, different hard disks, etc. There is no hardware in common between the two, and the pool has been recreated from scratch with zfs send | recv. But this pool has the same issue! The ZFS panic assertion now happens during its scrub after it has been filled with the data from zfs recv.

So since I've eliminated any chance of a hardware problem, what remains is that there must be a bug in the ZFS code in Illumos that is triggering on some data in my pool, and it must be data that can be sent over in the zfs send stream. And this bug must not exist in the ZFS 2.0 implementation on Linux, as I've had no problems there.

I needed to get this server up and running so I migrated what I needed to Linux so that I could get it running, however, I still have the original pool and can do some tests with it in Illumos/SmartOS to narrow down the bug if someone can assist with what I can run to debug this further. Thanks.


Files

dsl_scan.txt.gz (1.12 MB) dsl_scan.txt.gz 2nd pipeline output, gzipped. Dan McDonald, 2021-05-01 04:24 AM

Related issues

Related to illumos gate - Bug #13033: assertion failed: size <= (1ULL << 24) (0x1189000 <= 0x1000000), file: ../../common/fs/zfs/abd.c, line: 309 (New)

Actions #1

Updated by Nick Hall about 2 months ago

  • Related to Bug #13033: assertion failed: size <= (1ULL << 24) (0x1189000 <= 0x1000000), file: ../../common/fs/zfs/abd.c, line: 309 added
Actions #2

Updated by Dan McDonald about 2 months ago

1.) Providing a core dump would be helpful (one was also asked for in #13033)

2.) Please include the SmartOS PI version on the report.

Actions #3

Updated by Nick Hall about 2 months ago

This was SmartOS 20210325T002528Z

Actions #4

Updated by Dan McDonald about 2 months ago

Thank you for confirming it's recent. Happen to have the vmdump.N file handy?

Actions #5

Updated by Nick Hall about 2 months ago

Dan McDonald wrote in #note-4:

Thank you for confirming it's recent. Happen to have the vmdump.N file handy?

Yes I'll send you an email about this.

Actions #6

Updated by Dan McDonald about 2 months ago

I received a vmdump with a DIFFERENT stack, but one that I'm going to guess might involve the same blkptr as the aforementioned one.

> $C

fffffe00218e9740 vpanic()
fffffe00218e97d0 0xfffffffffba43521()
fffffe00218e9800 zio_buf_alloc+0x47(0) /* XXX KEBE SAYS LENGTH OF 0!!! */
fffffe00218e9840 abd_alloc_linear+0x77(0, 1) /* XXX KEBE SAYS LENGTH OF 0!!! */
fffffe00218e9890 abd_alloc+0x12a(0, 1) /* XXX KEBE SAYS LENGTH OF 0!!! */
fffffe00218e98f0 arc_get_data_abd+0x6d(fffffe178b34bc30, 0, fffffe178b34bc30) /* XXX KEBE SAYS LENGTH OF 0!!! */
fffffe00218e9920 arc_hdr_alloc_pabd+0x35(fffffe178b34bc30, 0) /* XXX KEBE SAYS LOOK HERE */
fffffe00218e99a0 arc_hdr_alloc+0x106(9f7c4122c82b9011, 0, 0, 0, 31, 2, fffffe1700000000) /* XXX KEBE SAYS OH NO, SIZES OF 0 TOO! */
fffffe00218e9ac0 arc_read+0x31f(fffffe177dbd5ad0, fffffe16e3bbc000, fffffe17061d59e0, fffffffff7d0e100, fffffe1735029e30, 4, fffffe16000000a0, fffffe00218e9b04, fffffe17061d5a60)
fffffe00218e9b50 dsl_scan_prefetch_thread+0x1d6(fffffe16e37ce900)
fffffe00218e9c00 taskq_thread+0x2cd(fffffe16e6459958)
fffffe00218e9c10 thread_start+0xb()

...the ARC is getting sizes of 0 from the block pointer in arc_read():

> fffffe17061d59e0::print -at blkptr_t
fffffe17061d59e0 blkptr_t {
    fffffe17061d59e0 dva_t [3] blk_dva [
        fffffe17061d59e0 dva_t {
            fffffe17061d59e0 uint64_t [2] dva_word [ 0x99cd2afa800d52b3, 0x4b1d0699bd57d987 ]
        },
        fffffe17061d59f0 dva_t {
            fffffe17061d59f0 uint64_t [2] dva_word [ 0x38cb48404a39bceb, 0x536e9dc7725610a5 ]
        },
        fffffe17061d5a00 dva_t {
            fffffe17061d5a00 uint64_t [2] dva_word [ 0xeec9883556a04ed3, 0x7ffb0b1c1675c3ce ]
        },
    ]
    fffffe17061d5a10 uint64_t blk_prop 0x6e5251b161084baf
    fffffe17061d5a18 uint64_t [2] blk_pad [ 0x951227bc15bd6ab0, 0x95e62f4d1569c0e ]
    fffffe17061d5a28 uint64_t blk_phys_birth 0xfd71a304dd811644
    fffffe17061d5a30 uint64_t blk_birth 0x5f5e37f25489e6e3
    fffffe17061d5a38 uint64_t blk_fill 0xa4510e31f14f832e
    fffffe17061d5a40 zio_cksum_t blk_cksum {
        fffffe17061d5a40 uint64_t [4] zc_word = [ 0xd1e37409aeef358c, 0x335ad1a77d5fe3ea, 0x69652c8fe057fabe, 0xfd3fde8977c08336 ]
    }
}

ZFS experts might want to decode the blk_prop to see what's wrong.

Actions #7

Updated by Jason King about 2 months ago

Does ::blkptr show anything more interesting?

Actions #8

Updated by Dan McDonald about 2 months ago

> fffffe17061d59e0::blkptr
EMBEDDED [L14 ? (82)] et=81 ? (49) size=1084bb0L/31P birth=6871991595353892579L
> 
Actions #9

Updated by Jason King about 2 months ago

That seems... wrong....

What's interesting is that OpenZFS does have this bit of code a bit ahead of the code that would call arc_hdr_alloc:

                /*
                 * Gracefully handle a damaged logical block size as a
                 * checksum error.
                 */
                if (lsize > spa_maxblocksize(spa)) {
                        rc = SET_ERROR(ECKSUM);
                        if (hash_lock != NULL)
                                mutex_exit(hash_lock);
                        goto out;
                }

What's confusing me, though, is that if this were getting triggered on OpenZFS, I would expect it to show up in a scrub, which it's not. Unfortunately, each of those lines in OpenZFS seems to have a separate commit, so trying to tie it back to a specific change is challenging -- part of it was part of the compressed ARC work George Wilson did, so I'm not sure.

Actions #10

Updated by Igor Kozhukhov about 2 months ago

Could you try to test this ISO:
http://apt2.dilos.org/dilos/isos/dilos-core-21-05-01_01-08-2.0.2.85.iso

No need to install it; just boot it, import the pool, try to run a scrub, and let me know the result.
I have ported many OpenZFS updates to DilOS.

Actions #11

Updated by Jason King about 2 months ago

Something that might be interesting:

fffffe17061d59c0::print scan_prefetch_issue_ctx_t

(Addr based on ::offsetof scan_prefetch_issue_ctx_t spic_bp, since the bp address passed to arc_read() in dsl_scan_prefetch_thread appears to come from that.)

Something else that might be interesting as well:

fffffe16e37ce900::print dsl_scan_t scn_prefetch_queue | ::walk avl | ::print scan_prefetch_issue_ctx_t spic_bp | ::blkptr

and see how sane those blkptrs look; that should be all the other block pointers queued for prefetch but not yet read.

Actions #12

Updated by Dan McDonald about 2 months ago

First one:

> fffffe17061d59c0::print scan_prefetch_issue_ctx_t
{
    spic_avl_node = {
        avl_child = [ 0, 0 ]
        avl_pcb = 0xfffffe17061d7c41
    }
    spic_spc = 0xfffffe1735029e30
    spic_bp = {
        blk_dva = [
            {
                dva_word = [ 0x99cd2afa800d52b3, 0x4b1d0699bd57d987 ]
            },
            {
                dva_word = [ 0x38cb48404a39bceb, 0x536e9dc7725610a5 ]
            },
            {
                dva_word = [ 0xeec9883556a04ed3, 0x7ffb0b1c1675c3ce ]
            },
        ]
        blk_prop = 0x6e5251b161084baf
        blk_pad = [ 0x951227bc15bd6ab0, 0x95e62f4d1569c0e ]
        blk_phys_birth = 0xfd71a304dd811644
        blk_birth = 0x5f5e37f25489e6e3
        blk_fill = 0xa4510e31f14f832e
        blk_cksum = {                 
            zc_word = [ 0xd1e37409aeef358c, 0x335ad1a77d5fe3ea, 0x69652c8fe057fabe, 0xfd3fde8977c08336 ]
        }
    }
    spic_zb = {
        zb_objset = 0xe2b2
        zb_object = 0x1b29c
        zb_level = 0
        zb_blkid = 0xfffffffffffffffe
    }
}

Second one is attached as a .gz file because it's 4.7MB uncompressed.

Actions #13

Updated by Dan McDonald about 1 month ago

I was provided another dump, with a completely different panic-stack, and a completely different iffy-block-pointer:

> $C
fffffe004061e7a0 vpanic()
fffffe004061e7f0 vcmn_err+0x42(3, fffffffff7e8f9d0, fffffe004061e800)
fffffe004061e860 zfs_panic_recover+0x65(fffffffff7e8f9d0)
fffffe004061e8c0 zfs_blkptr_verify+0x122(fffffe2cff9c2000, fffffe2db7781da0)
fffffe004061e9a0 zio_read+0x5a(fffffe2d0f5ade88, fffffe2cff9c2000, fffffe2db7781da0, fffffe2de2a97000, 696000, fffffffff7d466b0, fffffe2e44d2bd28, ffffffff00000004, fffffe00008000a0, fffffe2db7781e20)
fffffe004061eac0 arc_read+0x820(fffffe2d0f5ade88, fffffe2cff9c2000, fffffe2db7781da0, fffffffff7d9a100, fffffe2dba8f5338, 4, fffffe2d008000a0, fffffe004061eb04, fffffe2db7781e20)
fffffe004061eb50 dsl_scan_prefetch_thread+0x1d6(fffffe2cffa52c40)
fffffe004061ec00 taskq_thread+0x2cd(fffffe2d08a40298)
fffffe004061ec10 thread_start+0xb()
> ::status
debugging crash dump vmcore.10 (64-bit) from nsvs.selltothem.com
operating system: 5.11 joyent_20210325T002528Z (i86pc)
git branch: release-20210325
git rev: 0eb9f0cca48b7b34380838b2e9020d2d44e7a27c
image uuid: (not set)
panic message: blkptr at fffffe2db7781da0 has invalid TYPE 206
dump content: kernel pages only
> fffffe2db7781da0::blkptr
DVA[0]=<11892525:4a44f886fce44a00:f60d1200>
DVA[1]=<5764843:62b0a2640b5e2800:14e020200>
DVA[2]=<3500807:34b3be6c05da9e00:18b16b600>
[L8 ? (206)] ? (19) ? (24) unencrypted BE gang dedup triple
size=1eebe00L/696000P birth=8719749725896471672L/3501575714749938660P fill=12317951490937222069
cksum=d7640c4088c7c5bc:afd4985656e8c109:7da33b6d8b6da3f4:42e888f721c91c2f
> fffffe2db7781da0::print blkptr_t
{
    blk_dva = [
        {
            dva_word = [ 0x7eb5772d427b0689, 0xb2a5227c437e7225 ]
        },
        {
            dva_word = [ 0x3d57f6ebc0a70101, 0xdf3158513205af14 ]
        },
        {
            dva_word = [ 0xa6356b07a8c58b5b, 0x1c9a59df3602ed4f ]
        },
    ]
    blk_prop = 0x48ce131834aff75e
    blk_pad = [ 0xe38a77efe92a78f6, 0x55163f6729539b1e ]
    blk_phys_birth = 0x3098188f77cdfbe4
    blk_birth = 0x7902c62d2e5e7478
    blk_fill = 0xaaf2279e21ce73b5
    blk_cksum = {
        zc_word = [ 0xd7640c4088c7c5bc, 0xafd4985656e8c109, 0x7da33b6d8b6da3f4, 0x42e888f721c91c2f ]
    }
}
> 

Dumb question for the filer: Do you have ECC RAM on this machine?

Dumb question for ZFS experts: Do the errors in the blkptr correspond to single-bit errors, in case the answer to the prior question is, "No ECC RAM"?

Actions #14

Updated by Nick Hall about 1 month ago

Dan McDonald wrote in #note-13:

I was provided another dump, with a completely different panic-stack, and a completely different iffy-block-pointer:

[...]

Dumb question for the filer: Do you have ECC RAM on this machine?

Yes, this server has ECC RAM and I have never seen it detect a memory error. I also ran a memory diagnostic after this first happened as I also was wondering, but no problems were detected.

Actions #15

Updated by Nick Hall about 1 month ago

I will also note, in case this was missed in my original post, that I can do a zfs send | recv from this pool into a brand new pool on a different computer, and that computer will then panic in SmartOS while doing a zpool scrub on the newly received pool data. So the panic can't be caused by a transient issue; there must be something in a ZFS structure that survives zfs send which is causing the panic. Of course the initial corruption could have been a transient issue, I'm not sure, but a bit flip would be strange, as the ECC RAM should correct it, or detect it in the case of a rare double bit-flip, I believe.

Actions #16

Updated by Jason King about 1 month ago

It's still the prefetch thread though.

Actions #17

Updated by Jason King about 1 month ago

It looks like both that blkptr_t and the next blkptr_t in the prefetch queue look bad... the others seem reasonable (looking at the DVAs, the first value is the vdev # IIRC, so low numbers such as 0, 1, 2 would be expected).

Maybe a bit later I can try to conjure up some mdb to tie those blocks back to datasets.

Actions #18

Updated by Jason King about 1 month ago

Just as another bit of data -- the blkid in the zbookmark for that bad blkptr is DMU_SPILL_BLKID.

Actions #19

Updated by Jason King about 1 month ago

On that second dump, fffffe2db7781d90::print scan_prefetch_issue_ctx_t -- it'd be interesting to see if it's also a spill ptr or not.

Actions #20

Updated by Dan McDonald about 1 month ago

(Corrected version)

> fffffe2db7781d80::print scan_prefetch_issue_ctx_t
{
    spic_avl_node = {
        avl_child = [ 0, 0 ]
        avl_pcb = 0xfffffe2db7893801
    }
    spic_spc = 0xfffffe2dba8f5338
    spic_bp = {
        blk_dva = [
            {
                dva_word = [ 0x7eb5772d427b0689, 0xb2a5227c437e7225 ]
            },
            {
                dva_word = [ 0x3d57f6ebc0a70101, 0xdf3158513205af14 ]
            },
            {
                dva_word = [ 0xa6356b07a8c58b5b, 0x1c9a59df3602ed4f ]
            },
        ]
        blk_prop = 0x48ce131834aff75e
        blk_pad = [ 0xe38a77efe92a78f6, 0x55163f6729539b1e ]
        blk_phys_birth = 0x3098188f77cdfbe4
        blk_birth = 0x7902c62d2e5e7478
        blk_fill = 0xaaf2279e21ce73b5
        blk_cksum = {                 
            zc_word = [ 0xd7640c4088c7c5bc, 0xafd4985656e8c109, 0x7da33b6d8b6da3f4, 0x42e888f721c91c2f ]
        }
    }
    spic_zb = {
        zb_objset = 0xe2b2
        zb_object = 0x1bf86
        zb_level = 0
        zb_blkid = 0xfffffffffffffffe
    }
}
> 

Actions #21

Updated by Jason King about 1 month ago

Oops... my hex math was off: fffffe2db7781d80::print scan_prefetch_issue_ctx_t

Actions #22

Updated by Dan McDonald about 1 month ago

Corrected the above one per your request.
