scan code should check the return value of zfs_btree_first
Gernot reported on smartos-discuss that he has systems reliably panicking when running a zpool scrub. The message thread is here: https://smartos.topicbox.com/groups/smartos-discuss/T7028875b79636021/crash-while-scrubing
Gernot provided a dump with the given stack:
> ::stack
scan_io_queue_fetch_ext+0xdc(fffffe0bc70e8880)
scan_io_queues_run_one+0xb1(fffffe0bc70e8880)
taskq_thread+0x2cd(fffffe0ba85d7270)
thread_start+0xb()
At this point scan_io_queue_fetch_ext has just called zfs_btree_first(), which returned NULL. That return value is valid because the height of the btree is -1:
> fffffe0bc70e8880::print dsl_scan_io_queue_t q_exts_by_size.bt_height | =D
-1
zfs_btree_first returns NULL when the height is -1. Passing NULL into the following range tree inline functions leads to the panic we see in this dump.
It turns out that ZoL already has a fix for this in commit 'Function name and comment updates,' 516a83f8861269de9f795a96c471623e3bd67009. We should pull this commit into illumos.
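The failure mode and the kind of check the title asks for can be illustrated with a minimal sketch. This is not the code from the ZoL commit: the type toy_btree_t and the functions toy_btree_first() and toy_fetch_first() are hypothetical stand-ins, and only the "return NULL when the height is -1" behavior is taken from the analysis above.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for a btree. This is NOT the real zfs_btree_t;
 * it only models the height field discussed in this issue.
 */
typedef struct toy_btree {
	int bt_height;			/* -1 is the height seen in the dump */
	unsigned long *bt_elems;	/* sorted elements, smallest first */
	size_t bt_count;		/* number of valid elements */
} toy_btree_t;

/*
 * Models the behavior described for zfs_btree_first(): return NULL
 * when the height is -1, otherwise a pointer to the first element.
 */
static unsigned long *
toy_btree_first(toy_btree_t *t)
{
	if (t->bt_height == -1)
		return (NULL);
	return (&t->bt_elems[0]);
}

/*
 * Hypothetical caller. Returns 0 and stores the first element in *out,
 * or returns -1 when the tree has nothing to hand back. Without the
 * NULL check, the dereference below would fault, which is the analogue
 * of the panic in the dump.
 */
static int
toy_fetch_first(toy_btree_t *t, unsigned long *out)
{
	unsigned long *first;

	assert(out != NULL);

	first = toy_btree_first(t);
	if (first == NULL)
		return (-1);	/* nothing queued; caller must handle this */

	*out = *first;
	return (0);
}
```

A caller would treat a -1 return from toy_fetch_first() as "no extent available" and skip further processing instead of passing the NULL pointer on to other helpers.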
Updated by Kody Kantor about 1 month ago
Gernot (the original reporter) said via email that he tested this change on his affected systems and it has resolved the issue.
Additionally, I ran a few manual zpool detach and zpool scrub commands on a zpool on my OpenIndiana machine and didn't experience any panics or errors.
Updated by Electric Monk about 1 month ago
- Status changed from New to Closed
- % Done changed from 0 to 100
commit bfb9edc9bd178b0ce7fa2fbe1fc66e18e316af4e
Author: Paul Dagnelie <email@example.com>
Date: 2020-01-06T19:44:07.000Z

    12143 scan code should check the return value of zfs_btree_first
    Portions contributed by: Kody Kantor <firstname.lastname@example.org>
    Reviewed by: Sara Hartse <email@example.com>
    Reviewed by: Brian Behlendorf <firstname.lastname@example.org>
    Reviewed by: Matt Ahrens <email@example.com>
    Reviewed by: Jason King <firstname.lastname@example.org>
    Reviewed by: Toomas Soome <email@example.com>
    Reviewed by: Jerry Jelinek <firstname.lastname@example.org>
    Approved by: Dan McDonald <email@example.com>