panic when adding a duplicate dbuf to dn_dbufs
We hit a panic in dbuf_compare when testing Illumos recently (this was initially reported to us by Joyent, but we reproduced in house on Illumos):
> ::status debugging crash dump vmcore.0 (64-bit) from oi-151 operating system: 5.11 zfs-precommit-build-430 (i86pc) image uuid: 3647cb5c-7baf-6f82-fe77-afd204183a92 panic message: assertion failed: d1 == d2 (0xffffff0235debd70 == 0xffffff022b48a368), file: ../../common/fs/zfs/dnode.c, line: 94 dump content: kernel pages only > ::stack vpanic() 0xfffffffffbe0b97d() dbuf_compare+0x83(ffffff0235debd70, ffffff022b48a368) avl_find+0x7b(ffffff01f4d9f6e8, ffffff0235debd70, ffffff00091d87c8) avl_add+0x27(ffffff01f4d9f6e8, ffffff0235debd70) dbuf_create+0x2d8(ffffff01f4d9f470, 0, cb, ffffff01ed7d3220, ffffff01f7655580) dbuf_hold_impl+0x192(ffffff01f4d9f470, 0, cb, 0, fffffffff7917c90, ffffff00091d8978) dbuf_hold+0x27(ffffff01f4d9f470, cb, fffffffff7917c90) dmu_buf_hold_array_by_dnode+0x129(ffffff01f4d9f470, 1960000, 800, 0, fffffffff7917c90, ffffff00091d8a8c) dmu_write_uio_dnode+0x5f(ffffff01f4d9f470, ffffff00091d8e00, 800, ffffff01eeaed800) dmu_write_uio_dbuf+0x5d(ffffff01ebcf8700, ffffff00091d8e00, 800, ffffff01eeaed800) zfs_write+0xcf1(ffffff01e8c80700, ffffff00091d8e00, 0, ffffff01f214db88, 0) fop_write+0x5b(ffffff01e8c80700, ffffff00091d8e00, 0, ffffff01f214db88, 0) pwrite64+0x276(7, 8c5d000, c00, 195fc00, 0) _sys_sysenter_post_swapgs+0x237()
Upon further inspection, we see that we have violated the assumption that db_created is unique for every dbuf. However, given that the timestamp in this case comes from the dbuf_cache kmem_cache, it seems that it's possible to have two kmem cache entries be created with the same time stamp in say different CPUs magazines and get extremely unlucky (especially if we decided to prefill a slab, it could only come up much later). The assumption that the db_created timestamp is unique was thus fundamentally flawed.
The underlying issue here is that previously it was acceptable to have two dbufs of the same id in dn_dbufs that were both DB_EVICTING. This is no longer the case as of the recent commit "4873 zvol unmap calls can take a very long time for larger datasets", where we turned dn_dbufs from a linked list into an AVL tree and introduced the constraint that all dbufs in dn_dbufs be unique.
Unfortunately, we need to have the ability to have multiple dbufs in dn_dbufs with the same block id and offset. Looking at dbuf_clear, we see that it marks a dbuf as DB_EVICTING but doesn't actually remove it from dn_dbufs (if it doesn't hold the dnode lock). All functions that interact w/ the dbuf hash table (dbuf_find, dbuf_hash_insert, etc) silently ignore dbufs marked as DB_EVICTING, so dbuf_clear does the moral equivalent of removing the from the dbuf hash table. With this, dbuf_hold is unable to find the old dbuf via dbuf_find and calls dbuf_create to create a new dbuf. We now have two dbufs of the same id in the dn_dbufs - the newly created one, and the old one that is marked DB_EVICTING.
To allow us to continue to use an avl tree for dn_bufs but also all for multiple entries with the same (level, blockid), we must additionally order the dbufs by some arbitrary and unique value. We do this by ordering dbufs by their address in dbuf_compare. We also ensure that dbuf with state==DB_SEARCH compares before any other dbufs of that blkid, so that we can search for the first dbuf with that blkid.
Updated by Electric Monk over 5 years ago
- % Done changed from 0 to 100
- Status changed from In Progress to Closed
commit 86bb58aec7165f8a0303564575c65e5a2ad58bf1 Author: Alex Reece <email@example.com> Date: 2014-08-19T18:20:31.000Z 5095 panic when adding a duplicate dbuf to dn_dbufs Reviewed by: Adam Leventhal <firstname.lastname@example.org> Reviewed by: George Wilson <email@example.com> Reviewed by: Mattew Ahrens <firstname.lastname@example.org> Reviewed by: Dan Kimmel <email@example.com> Reviewed by: Dan McDonald <firstname.lastname@example.org> Reviewed by: Josef Sipek <email@example.com> Approved by: Robert Mustacchi <firstname.lastname@example.org>