Bug #13736
OpenZFS Panic - Assertion failed during scrub
Description
I have a server running SmartOS which is panicking during a ZFS scrub with:
panic[cpu3]/thread=fffffe003df7dc20: assertion failed: size <= (1ULL << 24) (0x19b8a00 <= 0x1000000), file: ../../common/fs/zfs/abd.c, line: 312
fffffe003df7d840 fffffffffba43521 ()
fffffe003df7d890 zfs:abd_alloc+170 ()
fffffe003df7d8f0 zfs:arc_get_data_abd+6d ()
fffffe003df7d920 zfs:arc_hdr_alloc_pabd+35 ()
fffffe003df7d9a0 zfs:arc_hdr_alloc+186 ()
fffffe003df7dac0 zfs:arc_read+31f ()
fffffe003df7db50 zfs:dsl_scan_prefetch_thread+1d6 ()
fffffe003df7dc00 genunix:taskq_thread+2cd ()
fffffe003df7dc10 unix:thread_start+b ()
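(For context: the failing check caps a single ABD allocation at 1ULL << 24 bytes -- 16 MiB, i.e. SPA_MAXBLOCKSIZE in the ZFS sources -- while the size being requested here, 0x19b8a00, is roughly 25.7 MiB. A trivial stand-alone illustration of that comparison, not code from the dump:)

#include <stdio.h>

int
main(void)
{
	unsigned long long size = 0x19b8a00ULL;	/* size reported in the panic message */
	unsigned long long cap = 1ULL << 24;	/* 0x1000000, the asserted limit (16 MiB) */

	/* ~25.7 MiB requested against a 16 MiB cap, so the assertion fires. */
	printf("size = %llu, cap = %llu, size <= cap? %s\n",
	    size, cap, size <= cap ? "yes" : "no");
	return (0);
}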
It panics 100% of the time at a certain point in the scrub, when it is around 85% complete.
I imported the pool on a Linux distribution running ZFS 2.0 and it scrubs without a problem, repeatedly. I do not see any errors under Linux with ZFS 2.0, but if I come back to Illumos, ZFS always hits this assertion during the scrub.
I put the hard drives into another server to eliminate the chance of certain hardware issues, and the same thing happened.
I eventually did a zfs send | recv of the pool onto another system, as I thought that perhaps there was some strange ZFS metadata corruption that scrub couldn't fix, and I figured the new pool would then work fine.
So the new pool was on a different server with a different disk controller, different hard disks, etc. There is no hardware in common between the two, and the pool was recreated from scratch with zfs send | recv. But this pool has the same issue! The ZFS panic assertion now happens during its scrub, after it has been filled with the data from zfs recv.
Since I've eliminated any chance of a hardware problem, what remains is that there must be a bug in the ZFS code in Illumos that is triggered by some data in my pool, and it must be data that can be carried over in the zfs send stream. This bug must not exist in the ZFS 2.0 implementation on Linux, as I've had no problems there.
I needed to get this server up and running, so I migrated what I needed to Linux. However, I still have the original pool and can run some tests with it in Illumos/SmartOS to narrow down the bug, if someone can suggest what I should run to debug this further. Thanks.
Updated by Nick Hall about 2 years ago
- Related to Bug #13033: assertion failed: size <= (1ULL << 24) (0x1189000 <= 0x1000000), file: ../../common/fs/zfs/abd.c, line: 309 added
Updated by Dan McDonald about 2 years ago
1.) Providing a core dump would be helpful (one was also asked for in #13033)
2.) Please include the SmartOS PI version on the report.
Updated by Dan McDonald about 2 years ago
Thank you for confirming it's recent. Happen to have the vmdump.N file handy?
Updated by Nick Hall about 2 years ago
Dan McDonald wrote in #note-4:
Thank you for confirming it's recent. Happen to have the vmdump.N file handy?
Yes I'll send you an email about this.
Updated by Dan McDonald about 2 years ago
I received a vmdump with a DIFFERENT stack, but one that I'm going to guess has the same blkptr as the aforementioned one.
$C
fffffe00218e9740 vpanic()
fffffe00218e97d0 0xfffffffffba43521()
fffffe00218e9800 zio_buf_alloc+0x47(0) /* XXX KEBE SAYS LENGTH OF 0!!! */
fffffe00218e9840 abd_alloc_linear+0x77(0, 1) /* XXX KEBE SAYS LENGTH OF 0!!! */
fffffe00218e9890 abd_alloc+0x12a(0, 1) /* XXX KEBE SAYS LENGTH OF 0!!! */
fffffe00218e98f0 arc_get_data_abd+0x6d(fffffe178b34bc30, 0, fffffe178b34bc30) /* XXX KEBE SAYS LENGTH OF 0!!! */
fffffe00218e9920 arc_hdr_alloc_pabd+0x35(fffffe178b34bc30, 0) /* XXX KEBE SAYS LOOK HERE */
fffffe00218e99a0 arc_hdr_alloc+0x106(9f7c4122c82b9011, 0, 0, 0, 31, 2, fffffe1700000000) /* XXX KEBE SAYS OH NO, SIZES OF 0 TOO! */
fffffe00218e9ac0 arc_read+0x31f(fffffe177dbd5ad0, fffffe16e3bbc000, fffffe17061d59e0, fffffffff7d0e100, fffffe1735029e30, 4, fffffe16000000a0, fffffe00218e9b04, fffffe17061d5a60)
fffffe00218e9b50 dsl_scan_prefetch_thread+0x1d6(fffffe16e37ce900)
fffffe00218e9c00 taskq_thread+0x2cd(fffffe16e6459958)
fffffe00218e9c10 thread_start+0xb()
...the arc is getting sizes of 0 from the block pointer in arc_read():
fffffe17061d59e0::print -at blkptr_t
fffffe17061d59e0 blkptr_t {
fffffe17061d59e0 dva_t [3] blk_dva [
fffffe17061d59e0 dva_t {
fffffe17061d59e0 uint64_t [2] dva_word [ 0x99cd2afa800d52b3, 0x4b1d0699bd57d987 ]
},
fffffe17061d59f0 dva_t {
fffffe17061d59f0 uint64_t [2] dva_word [ 0x38cb48404a39bceb, 0x536e9dc7725610a5 ]
},
fffffe17061d5a00 dva_t {
fffffe17061d5a00 uint64_t [2] dva_word [ 0xeec9883556a04ed3, 0x7ffb0b1c1675c3ce ]
},
]
fffffe17061d5a10 uint64_t blk_prop 0x6e5251b161084baf
fffffe17061d5a18 uint64_t [2] blk_pad [ 0x951227bc15bd6ab0, 0x95e62f4d1569c0e ]
fffffe17061d5a28 uint64_t blk_phys_birth 0xfd71a304dd811644
fffffe17061d5a30 uint64_t blk_birth 0x5f5e37f25489e6e3
fffffe17061d5a38 uint64_t blk_fill 0xa4510e31f14f832e
fffffe17061d5a40 zio_cksum_t blk_cksum {
fffffe17061d5a40 uint64_t [4] zc_word = [ 0xd1e37409aeef358c, 0x335ad1a77d5fe3ea, 0x69652c8fe057fabe, 0xfd3fde8977c08336 ]
}
}
ZFS experts might want to decode the blk_prop to see what's wrong.
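For anyone who wants to decode it by hand, here is a minimal user-space sketch that pulls the level/type/embedded/compression fields out of that blk_prop word, assuming the blkptr_t bit layout documented in the spa.h block comment (an illustration only, not output from this dump):

#include <stdio.h>
#include <inttypes.h>

/* Extract 'len' bits starting at bit 'off' of a 64-bit word. */
static uint64_t
bits(uint64_t word, int off, int len)
{
	return ((word >> off) & ((1ULL << len) - 1));
}

int
main(void)
{
	uint64_t blk_prop = 0x6e5251b161084bafULL;	/* blk_prop from the dump above */

	/* Field positions per the blkptr_t layout comment in spa.h. */
	printf("level    = %" PRIu64 "\n", bits(blk_prop, 56, 5));
	printf("type     = %" PRIu64 "\n", bits(blk_prop, 48, 8));
	printf("cksum/et = %" PRIu64 "\n", bits(blk_prop, 40, 8));
	printf("embedded = %" PRIu64 "\n", bits(blk_prop, 39, 1));
	printf("comp     = %" PRIu64 "\n", bits(blk_prop, 32, 7));
	return (0);
}

For this word the level decodes to 14, the type to 82 (a value ::blkptr can't even name), and the embedded bit is set, which is consistent with the ::blkptr output below; nothing about it looks like a plausible block pointer.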
Updated by Jason King about 2 years ago
Does ::blkptr show anything more interesting?
Updated by Dan McDonald about 2 years ago
> fffffe17061d59e0::blkptr
EMBEDDED [L14 ? (82)] et=81 ? (49)
	size=1084bb0L/31P birth=6871991595353892579L
>
Updated by Jason King about 2 years ago
That seems... wrong....
What's interesting is that OpenZFS has this bit of code a little before the code that would call arc_hdr_alloc:
/*
 * Gracefully handle a damaged logical block size as a
 * checksum error.
 */
if (lsize > spa_maxblocksize(spa)) {
	rc = SET_ERROR(ECKSUM);
	if (hash_lock != NULL)
		mutex_exit(hash_lock);
	goto out;
}
What's confusing me, though, is that if this were getting triggered on OpenZFS, I would expect it to show up in a scrub, which it's not. Unfortunately, each of those lines in OpenZFS seems to come from a separate commit, so tying it back to a specific change is challenging -- part of it was part of the compressed ARC work George Wilson did, so I'm not sure.
Updated by Igor Kozhukhov about 2 years ago
Could you try to test this ISO:
http://apt2.dilos.org/dilos/isos/dilos-core-21-05-01_01-08-2.0.2.85.iso
No need to install it -- just boot it, import the pool, try to run a scrub, and let me know the result.
I have ported many OpenZFS updates to DilOS.
Updated by Jason King about 2 years ago
Something that might be interesting:
fffffe17061d59c0::print scan_prefetch_issue_ctx_t
(Addr based on ::offsetof scan_prefetch_issue_ctx_t spic_bp, since the bp address passed to arc_read() in dsl_scan_prefetch_thread appears to come from that.)
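For illustration, the arithmetic behind that address: the bp handed to arc_read() lives inside its scan_prefetch_issue_ctx_t, so subtracting the offset of spic_bp recovers the containing structure. A user-space sketch with hypothetical, cut-down stand-in types (the real definitions live in the ZFS code, presumably dsl_scan.c and the headers):

#include <stdio.h>
#include <stddef.h>
#include <stdint.h>
#include <inttypes.h>

/* Hypothetical stand-ins, sized like the real 64-bit types, purely to */
/* illustrate the offset arithmetic.                                   */
typedef struct { uintptr_t avl_child[2]; uintptr_t avl_pcb; } avl_node_t;
typedef struct { uint64_t dva_word[2]; } dva_t;
typedef struct {
	dva_t blk_dva[3];
	uint64_t blk_prop, blk_pad[2], blk_phys_birth, blk_birth, blk_fill;
	uint64_t blk_cksum[4];
} blkptr_t;
typedef struct {
	avl_node_t spic_avl_node;
	void *spic_spc;
	blkptr_t spic_bp;
	/* the zbookmark follows in the real structure */
} scan_prefetch_issue_ctx_t;

int
main(void)
{
	uint64_t bp = 0xfffffe17061d59e0ULL;	/* bp passed to arc_read() in the dump */
	size_t off = offsetof(scan_prefetch_issue_ctx_t, spic_bp);

	/* With this layout off == 0x20, so the containing ctx is at ...59c0. */
	printf("offset %#zx -> ctx at %#" PRIx64 "\n", off, bp - (uint64_t)off);
	return (0);
}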
Something else that might be interesting:
fffffe16e37ce900::print dsl_scan_t scn_prefetch_queue | ::walk avl | ::print scan_prefetch_issue_ctx_t spic_bp | ::blkptr
and see how sane those blkptrs look. Those should be all the other block pointers queued for prefetch but not yet read.
Updated by Dan McDonald about 2 years ago
- File dsl_scan.txt.gz dsl_scan.txt.gz added
First one:
> fffffe17061d59c0::print scan_prefetch_issue_ctx_t
{
    spic_avl_node = {
        avl_child = [ 0, 0 ]
        avl_pcb = 0xfffffe17061d7c41
    }
    spic_spc = 0xfffffe1735029e30
    spic_bp = {
        blk_dva = [
            {
                dva_word = [ 0x99cd2afa800d52b3, 0x4b1d0699bd57d987 ]
            },
            {
                dva_word = [ 0x38cb48404a39bceb, 0x536e9dc7725610a5 ]
            },
            {
                dva_word = [ 0xeec9883556a04ed3, 0x7ffb0b1c1675c3ce ]
            },
        ]
        blk_prop = 0x6e5251b161084baf
        blk_pad = [ 0x951227bc15bd6ab0, 0x95e62f4d1569c0e ]
        blk_phys_birth = 0xfd71a304dd811644
        blk_birth = 0x5f5e37f25489e6e3
        blk_fill = 0xa4510e31f14f832e
        blk_cksum = {
            zc_word = [ 0xd1e37409aeef358c, 0x335ad1a77d5fe3ea, 0x69652c8fe057fabe, 0xfd3fde8977c08336 ]
        }
    }
    spic_zb = {
        zb_objset = 0xe2b2
        zb_object = 0x1b29c
        zb_level = 0
        zb_blkid = 0xfffffffffffffffe
    }
}
Second one is attached as a .gz file because it's 4.7MB uncompressed.
Updated by Dan McDonald about 2 years ago
I was provided another dump, with a completely different panic stack and a completely different iffy block pointer:
> $C
fffffe004061e7a0 vpanic()
fffffe004061e7f0 vcmn_err+0x42(3, fffffffff7e8f9d0, fffffe004061e800)
fffffe004061e860 zfs_panic_recover+0x65(fffffffff7e8f9d0)
fffffe004061e8c0 zfs_blkptr_verify+0x122(fffffe2cff9c2000, fffffe2db7781da0)
fffffe004061e9a0 zio_read+0x5a(fffffe2d0f5ade88, fffffe2cff9c2000, fffffe2db7781da0, fffffe2de2a97000, 696000, fffffffff7d466b0, fffffe2e44d2bd28, ffffffff00000004, fffffe00008000a0, fffffe2db7781e20)
fffffe004061eac0 arc_read+0x820(fffffe2d0f5ade88, fffffe2cff9c2000, fffffe2db7781da0, fffffffff7d9a100, fffffe2dba8f5338, 4, fffffe2d008000a0, fffffe004061eb04, fffffe2db7781e20)
fffffe004061eb50 dsl_scan_prefetch_thread+0x1d6(fffffe2cffa52c40)
fffffe004061ec00 taskq_thread+0x2cd(fffffe2d08a40298)
fffffe004061ec10 thread_start+0xb()
> ::status
debugging crash dump vmcore.10 (64-bit) from nsvs.selltothem.com
operating system: 5.11 joyent_20210325T002528Z (i86pc)
git branch: release-20210325
git rev: 0eb9f0cca48b7b34380838b2e9020d2d44e7a27c
image uuid: (not set)
panic message: blkptr at fffffe2db7781da0 has invalid TYPE 206
dump content: kernel pages only
> fffffe2db7781da0::blkptr
DVA[0]=<11892525:4a44f886fce44a00:f60d1200>
DVA[1]=<5764843:62b0a2640b5e2800:14e020200>
DVA[2]=<3500807:34b3be6c05da9e00:18b16b600>
[L8 ? (206)] ? (19) ? (24) unencrypted BE gang dedup triple
size=1eebe00L/696000P birth=8719749725896471672L/3501575714749938660P fill=12317951490937222069
cksum=d7640c4088c7c5bc:afd4985656e8c109:7da33b6d8b6da3f4:42e888f721c91c2f
> fffffe2db7781da0::print blkptr_t
{
    blk_dva = [
        {
            dva_word = [ 0x7eb5772d427b0689, 0xb2a5227c437e7225 ]
        },
        {
            dva_word = [ 0x3d57f6ebc0a70101, 0xdf3158513205af14 ]
        },
        {
            dva_word = [ 0xa6356b07a8c58b5b, 0x1c9a59df3602ed4f ]
        },
    ]
    blk_prop = 0x48ce131834aff75e
    blk_pad = [ 0xe38a77efe92a78f6, 0x55163f6729539b1e ]
    blk_phys_birth = 0x3098188f77cdfbe4
    blk_birth = 0x7902c62d2e5e7478
    blk_fill = 0xaaf2279e21ce73b5
    blk_cksum = {
        zc_word = [ 0xd7640c4088c7c5bc, 0xafd4985656e8c109, 0x7da33b6d8b6da3f4, 0x42e888f721c91c2f ]
    }
}
>
Dumb question for the filer: Do you have ECC RAM on this machine?
Dumb question for ZFS experts: Do the errors in the blkptr correspond to single-bit errors, in case the answer to the prior question is, "No ECC RAM"?
Updated by Nick Hall about 2 years ago
Dan McDonald wrote in #note-13:
I was provided another dump, with a completely different panic-stack, and a completely different iffy-block-pointer:
[...]
Dumb question for the filer: Do you have ECC RAM on this machine?
Yes, this server has ECC RAM and I have never seen it detect a memory error. I also ran a memory diagnostic after this first happened, since I was wondering the same thing, but no problems were detected.
Updated by Nick Hall about 2 years ago
I will also note, in case this was missed in my original post, that I can do a zfs send | recv from this pool into a brand new pool on a different computer, and that computer will then panic in SmartOS while doing a zpool scrub on the newly received pool data. So the panic can't be caused by a transient issue; there must be something in a ZFS structure that survives zfs send which is causing it. Of course the initial corruption could have been a transient issue, I'm not sure, but a bit flip would be strange since the ECC RAM should correct it, or detect it in the case of a rare double bit flip, I believe.
Updated by Jason King about 2 years ago
It looks like that blkptr_t, as well as the next blkptr_t in the prefetch queue, looks bad; the others seem reasonable (looking at the DVAs, the first value is the vdev # IIRC -- so low numbers like 0, 1, 2, etc. would be expected).
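As a worked illustration of the vdev point, here is a minimal sketch that decodes dva_word[0] of DVA[0] from the bad blkptr, assuming the DVA word layout described in spa.h, with the vdev id starting at bit 32 and the allocated size (in 512-byte sectors) in the low 24 bits; again just an illustration, not dump output:

#include <stdio.h>
#include <inttypes.h>

int
main(void)
{
	/* dva_word[0] of DVA[0] from the bad blkptr in the second dump. */
	uint64_t w0 = 0x7eb5772d427b0689ULL;

	uint64_t vdev = (w0 >> 32) & ((1ULL << 24) - 1);	/* vdev id */
	uint64_t asize = (w0 & ((1ULL << 24) - 1)) << 9;	/* allocated bytes */

	/* A healthy pool has small vdev ids (0, 1, 2, ...); this gives 11892525, */
	/* matching the absurd DVA[0]=<11892525:...> that ::blkptr printed.       */
	printf("vdev = %" PRIu64 ", asize = %" PRIu64 " bytes\n", vdev, asize);
	return (0);
}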
Maybe a bit later I can try to conjure up some mdb to tie those blocks back to datasets.
Updated by Jason King about 2 years ago
Just as another bit of data -- the blkid in the zbookmark for that bad blkptr is DMU_SPILL_BLKID.
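For reference, that zb_blkid of 0xfffffffffffffffe matches the spill-block sentinel as defined in the ZFS dmu.h header:

#define	DMU_BONUS_BLKID		(-1ULL)	/* 0xffffffffffffffff */
#define	DMU_SPILL_BLKID		(-2ULL)	/* 0xfffffffffffffffe -- the zb_blkid in the bookmarks above */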
Updated by Jason King about 2 years ago
On that second dump, fffffe2db7781d90::print scan_prefetch_issue_ctx_t -- it'd be interesting to see if it's also a spill ptr or not.
Updated by Dan McDonald about 2 years ago
(Corrected version)
> fffffe2db7781d80::print scan_prefetch_issue_ctx_t
{
    spic_avl_node = {
        avl_child = [ 0, 0 ]
        avl_pcb = 0xfffffe2db7893801
    }
    spic_spc = 0xfffffe2dba8f5338
    spic_bp = {
        blk_dva = [
            {
                dva_word = [ 0x7eb5772d427b0689, 0xb2a5227c437e7225 ]
            },
            {
                dva_word = [ 0x3d57f6ebc0a70101, 0xdf3158513205af14 ]
            },
            {
                dva_word = [ 0xa6356b07a8c58b5b, 0x1c9a59df3602ed4f ]
            },
        ]
        blk_prop = 0x48ce131834aff75e
        blk_pad = [ 0xe38a77efe92a78f6, 0x55163f6729539b1e ]
        blk_phys_birth = 0x3098188f77cdfbe4
        blk_birth = 0x7902c62d2e5e7478
        blk_fill = 0xaaf2279e21ce73b5
        blk_cksum = {
            zc_word = [ 0xd7640c4088c7c5bc, 0xafd4985656e8c109, 0x7da33b6d8b6da3f4, 0x42e888f721c91c2f ]
        }
    }
    spic_zb = {
        zb_objset = 0xe2b2
        zb_object = 0x1bf86
        zb_level = 0
        zb_blkid = 0xfffffffffffffffe
    }
}
>
Updated by Jason King about 2 years ago
Oops... my hex math was off: fffffe2db7781d80::print scan_prefetch_issue_ctx_t
Updated by Dan McDonald about 2 years ago
Corrected the above one per your request.