10405 Implement ZFS sorted scans
Review Request #1226 - Created Oct. 8, 2018 and submitted
| Information | |
|---|---|
| Toomas Soome | |
| illumos-gate | |
| 10405 | |
| 1572 | |
| 645d0e2... | |
| Reviewers | |
| general | |
This originated from ZFS On Linux, as https://github.com/zfsonlinux/zfs/commit/d4a72f23863382bdf6d0ae33196f5b5decbc48fd https://svnweb.freebsd.org/base?view=revision&revision=334844 During scans (scrubs or resilvers), it sorts the blocks in each transaction group by block offset; the result can be a significant improvement. (On my test system just now, which I put some effort to introduce fragmentation into the pool since I set it up yesterday, a scrub went from 1h2m to 33.5m with the changes.) I've seen similar rations on production systems. FreeNAS has had these changes since Oct 2017. Scrub & rebuild pools. Note times for performance analysis. The pools are compatible with systems without the changes, so bouncing back and forth between two versions is possible, and I've used that for correctness-checking.
zpool scrub
zpool offline rpool disk and then later zpool online rpool disk
zpool replaceIn my test systems the speedup on phydical disks is very noticeable, 4x 4TB raidz1 (7200RPM WD Black SATA), the scrub/resilver is down to 2 hours from 4 hours (3.5TB allocated data). The virtual disks under vmware fusion (on top of SSD) are not that drastic, but still significant and overall impression is consistent with notes from ZoL and FreeBSD.
I think this is similar to, but supersedes https://github.com/openzfs/openzfs/pull/648/, is that right?
There were a few follow on commits which George Wilson or Tom Caputi can point you at, including a8b2e30685c9214ccfd0181977540e080340df4e "Support re-prioritizing asynchronous prefetches"
Change Summary:
add following patches (as suggested):
a76f3d0437e5e974f0f748f8735af3539443b388: Fix deadlock in IO pipeline
c26cf0966d131b722c32f8ccecfe5791a789d975: Fix zio->io_priority failed (7 < 6) assert
Commit: |
|
||||
|---|---|---|---|---|---|
Diff: |
Revision 2 (+3215 -860) |
I know you didn't write this code so I only concentrated on looking for possible bugs or problems. Overall this looks good. I assume this was already reviewed by people who are more knowledgeable about zfs than me. Here are the issues I noticed.
usr/src/uts/common/fs/zfs/arc.c
348-349 This comment seems wrong now
5069 At first I thought this should still be wrapped in VERIFY0, but after
looking at the next change, I think we're missing the assignment of
the return to 'rc'.usr/src/uts/common/fs/zfs/dbuf.c
2445 Why was this block added? It looks redundant with the existing block
at 2406.usr/src/uts/common/fs/zfs/dsl_scan.c
99 s/limitted/limited/
137 and 160 either change to boolean or use 0
159 this tunable seems worthy of an explanatory comment
384 I'm not sure why the fill_weight code is duplicated here from scan_init.
1757 Unless I'm misreading the code, it appears that we'll leak bp_toread if
dsl_scan_recurse returns an error.
3857 I don't see how the rest of the code in this function will block the
caller. Is something missing here?usr/src/uts/common/fs/zfs/range_tree.c
210 This function seems kind of problematic. If rtop_remove is set but
rtop_add is NULL, we lose the entry. It seems like it would be better
to do the checks up front before we remove/adjust/add, although based
on how this is used, maybe it should just assert the ops? I don't see
any of these with only one op and not the other. I'm not sure why the
extra checks were added for the individual ops throughout the code.usr/src/uts/common/fs/zfs/sys/dsl_scan.h
136 Trailing whilespace. cstyle should have caught this.
Change Summary:
Rebase on large dnode update, to make sure we are on the same page.
Commit: |
|
||||
|---|---|---|---|---|---|
Diff: |
Revision 3 (+3215 -860) |
I went back through my old review comments and cross checked them with the latest ZoL code. These are my comments that I still think we need to address:
usr/src/uts/common/fs/zfs/arc.c
5069 assign return value to 'rc'usr/src/uts/common/fs/zfs/dbuf.c
2445 Why was this block added? It looks redundant with the existing block at 2406 and doesn't exist in ZoL.usr/src/uts/common/fs/zfs/dsl_scan.c
99 s/limitted/limited/ this is spelled correctly in ZoL.
384 I'm not sure why the fill_weight code is duplicated here from scan_init, which matches ZoL. This function doesn't exists in ZoL. We would also need to remove call from spa_init.
1757 We'll leak bp_toread if dsl_scan_recurse returns an error. In ZoL, code does goto out.usr/src/uts/common/fs/zfs/sys/dsl_scan.h
136 Trailing whilespace, doesn't exist in ZoL.
Change Summary:
rebase on 10343.
Commit: |
|
||||
|---|---|---|---|---|---|
Diff: |
Revision 4 (+3245 -893) |
Change Summary:
complete rebase and revert fbsd changes.
Commit: |
|
||||
|---|---|---|---|---|---|
Diff: |
Revision 5 (+3236 -924) |
you'll also want the fix for https://github.com/zfsonlinux/zfs/pull/8453 (once finalized).
Change Summary:
rebase on 9433
Commit: |
|
||||
|---|---|---|---|---|---|
Diff: |
Revision 6 (+3236 -924) |
Change Summary:
whitespace cleanup (pbchk is happy).
Commit: |
|
||||
|---|---|---|---|---|---|
Diff: |
Revision 7 (+3235 -923) |
I went back through the diffs that have changed since the last last time I reviewed this, and it looks good to me, as long as the separate "Multiple DVA Scrubbing Fix" which you ported from ZoL is also included in the same integration.
Change Summary:
update tests.
Commit: |
|
||||
|---|---|---|---|---|---|
Diff: |
Revision 8 (+3265 -937) |
Thanks for updating the tests, sorry I missed that earlier.
-
usr/src/uts/common/fs/zfs/dsl_scan.c (Diff revision 8) -
It seems to me like it would be more correct if these were hrtime_t. I see scn_sync_start_time is a uint64_t, so maybe this isn't such a big deal.
-
usr/src/uts/common/fs/zfs/dsl_scan.c (Diff revision 8) -
IIUC, Linux has much more stringent stack requirements than illumos. I would vote to keep this in, but I know we're trying to minimize our diff from ZoL. The ZoL issue states that this saves up to 390 bytes with a recursion depth of around 13.
Also, if it matters, it doesn't look like this was added to ZoL as part of the sequential scan/scrub change.
Change Summary:
use hrtime_t
Commit: |
|
||||
|---|---|---|---|---|---|
Diff: |
Revision 9 (+3265 -937) |
Commit: |
|
||||
|---|---|---|---|---|---|
Diff: |
Revision 10 (+3264 -937) |
Thanks for fixing that up, this looks good to me now.
Change Summary:
rebase on 10592
Commit: |
|
||||
|---|---|---|---|---|---|
Diff: |
Revision 11 (+3264 -937) |
