10405 Implement ZFS sorted scans
Review Request #1226 - Created Oct. 8, 2018 and updated
This originated from ZFS On Linux, as https://github.com/zfsonlinux/zfs/commit/d4a72f23863382bdf6d0ae33196f5b5decbc48fd https://svnweb.freebsd.org/base?view=revision&revision=334844 During scans (scrubs or resilvers), it sorts the blocks in each transaction group by block offset; the result can be a significant improvement. (On my test system just now, which I put some effort to introduce fragmentation into the pool since I set it up yesterday, a scrub went from 1h2m to 33.5m with the changes.) I've seen similar rations on production systems. FreeNAS has had these changes since Oct 2017. Scrub & rebuild pools. Note times for performance analysis. The pools are compatible with systems without the changes, so bouncing back and forth between two versions is possible, and I've used that for correctness-checking.
zpool offline rpool disk and then later zpool online rpool disk
In my test systems the speedup on phydical disks is very noticeable, 4x 4TB raidz1 (7200RPM WD Black SATA), the scrub/resilver is down to 2 hours from 4 hours (3.5TB allocated data). The virtual disks under vmware fusion (on top of SSD) are not that drastic, but still significant and overall impression is consistent with notes from ZoL and FreeBSD.
I think this is similar to, but supersedes https://github.com/openzfs/openzfs/pull/648/, is that right?
There were a few follow on commits which George Wilson or Tom Caputi can point you at, including a8b2e30685c9214ccfd0181977540e080340df4e "Support re-prioritizing asynchronous prefetches"
add following patches (as suggested):
a76f3d0437e5e974f0f748f8735af3539443b388: Fix deadlock in IO pipeline
c26cf0966d131b722c32f8ccecfe5791a789d975: Fix zio->io_priority failed (7 < 6) assert
Revision 2 (+3215 -860)
I know you didn't write this code so I only concentrated on looking for possible bugs or problems. Overall this looks good. I assume this was already reviewed by people who are more knowledgeable about zfs than me. Here are the issues I noticed.
348-349 This comment seems wrong now
5069 At first I thought this should still be wrapped in VERIFY0, but after
looking at the next change, I think we're missing the assignment of
the return to 'rc'.
2445 Why was this block added? It looks redundant with the existing block
137 and 160 either change to boolean or use 0
159 this tunable seems worthy of an explanatory comment
384 I'm not sure why the fill_weight code is duplicated here from scan_init.
1757 Unless I'm misreading the code, it appears that we'll leak bp_toread if
dsl_scan_recurse returns an error.
3857 I don't see how the rest of the code in this function will block the
caller. Is something missing here?
210 This function seems kind of problematic. If rtop_remove is set but
rtop_add is NULL, we lose the entry. It seems like it would be better
to do the checks up front before we remove/adjust/add, although based
on how this is used, maybe it should just assert the ops? I don't see
any of these with only one op and not the other. I'm not sure why the
extra checks were added for the individual ops throughout the code.
136 Trailing whilespace. cstyle should have caught this.
Rebase on large dnode update, to make sure we are on the same page.
Revision 3 (+3215 -860)
I went back through my old review comments and cross checked them with the latest ZoL code. These are my comments that I still think we need to address:
5069 assign return value to 'rc'
2445 Why was this block added? It looks redundant with the existing block at 2406 and doesn't exist in ZoL.
99 s/limitted/limited/ this is spelled correctly in ZoL.
384 I'm not sure why the fill_weight code is duplicated here from scan_init, which matches ZoL. This function doesn't exists in ZoL. We would also need to remove call from spa_init.
1757 We'll leak bp_toread if dsl_scan_recurse returns an error. In ZoL, code does goto out.
136 Trailing whilespace, doesn't exist in ZoL.