incremental replication of 'holey' file systems is slow
From Matt Ahrens's bug report at Delphix:
A customer is seeing their incremental replication take much longer than expected. We observed that they are using 100% of one CPU, with the stack typically:

  list_next+0x18
  dbuf_free_range+0xaa
  dnode_free_range+0x53f
  dmu_free_long_range_impl+0x143
  dmu_free_long_range+0x5a
  restore_free+0x71
  dmu_recv_stream+0x3dd
  zfs_ioc_recv+0x35b
  zfsdev_ioctl+0x347

We found that each dbuf_free_range() call covers about 48KB (6 blocks), but takes about half a second to complete. The problem is that dbuf_free_range() walks the list of all dbufs associated with this object to find the dbufs in the range. In this case there are 1.2 million dbufs on the list.

"zfs send" must generate a free record for every hole in an object, even if it was not zeroed since the previous snapshot. This is because we (unfortunately) don't store the birth time of hole blocks, so we don't know whether the hole was created since the previous snapshot or not. This would seem to be a problem for zvols storing other file systems such as NTFS, which necessarily have holes, and we have not characterized the pattern of holes typically seen.

To temporarily work around this issue, you can run "zinject -a", which will flush the cache, thus reducing the list of dbufs to walk. On one customer system, this improved things by about 10x, to about 100,000 dbufs on the list and 50ms to process each record.
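To make the cost described in the report concrete, here is a minimal toy model in C of a range free that has to walk an unordered per-object list. It is a sketch under stated assumptions, not the actual ZFS code: toy_dbuf_t, toy_build_list(), toy_free_range() and toy_destroy_list() are hypothetical names invented for illustration. The point it shows is that the number of nodes visited equals the total number of cached dbufs, regardless of how few blocks the freed range covers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for a cached dbuf; not the real ZFS structure. */
typedef struct toy_dbuf {
	uint64_t blkid;		/* block id within the object */
	int evicted;		/* set once the range free has hit this dbuf */
	struct toy_dbuf *next;	/* next dbuf on the per-object list */
} toy_dbuf_t;

/* Build an unordered per-object list holding block ids 0..n-1. */
static toy_dbuf_t *
toy_build_list(uint64_t n)
{
	toy_dbuf_t *head = NULL;

	for (uint64_t i = 0; i < n; i++) {
		toy_dbuf_t *db = malloc(sizeof (*db));
		assert(db != NULL);
		db->blkid = i;
		db->evicted = 0;
		db->next = head;
		head = db;
	}
	return (head);
}

/*
 * Mark every dbuf with start <= blkid <= end as evicted.
 * Returns the number of list nodes visited; *evicted receives the
 * number of dbufs that were actually in the range.
 */
static uint64_t
toy_free_range(toy_dbuf_t *head, uint64_t start, uint64_t end,
    uint64_t *evicted)
{
	uint64_t visited = 0;

	assert(start <= end);
	*evicted = 0;
	for (toy_dbuf_t *db = head; db != NULL; db = db->next) {
		visited++;	/* every node is touched, in range or not */
		if (db->blkid >= start && db->blkid <= end && !db->evicted) {
			db->evicted = 1;
			(*evicted)++;
		}
	}
	return (visited);
}

/* Release every node on the list. */
static void
toy_destroy_list(toy_dbuf_t *head)
{
	while (head != NULL) {
		toy_dbuf_t *next = head->next;
		free(head);
		head = next;
	}
}
```

In this model, freeing a 6-block range on a list of N dbufs visits all N nodes, which is why shrinking the list (as the "zinject -a" workaround does) shortens each free record proportionally.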
Because of the way we receive file systems, it is guaranteed that there will not be any dbufs for any freed range, so the simplest solution is to skip the dbuf eviction step when processing free records. See the comments in the code for more details.
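The idea of skipping the eviction step can be sketched as follows. This is an illustration only and does not reproduce the committed change: sketch_dbuf_t, sketch_object_t, the receiving flag and sketch_free_range() are all hypothetical names, and how the real code detects that an object is being received is an assumption left out here. The sketch only shows the shape of the shortcut: when the object is known to be the target of a receive, the list walk is bypassed entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a cached dbuf. */
typedef struct sketch_dbuf {
	uint64_t blkid;
	bool evicted;
	struct sketch_dbuf *next;
} sketch_dbuf_t;

/* Hypothetical stand-in for an object and its list of cached dbufs. */
typedef struct sketch_object {
	sketch_dbuf_t *dbufs;
	bool receiving;		/* assumed flag: object is a receive target */
} sketch_object_t;

/*
 * Evict cached dbufs in [start, end] and return the number of list
 * nodes visited. When the object is being received, the report's
 * guarantee is that no dbufs exist in the freed range, so the walk
 * is skipped and nothing is visited.
 */
static uint64_t
sketch_free_range(sketch_object_t *obj, uint64_t start, uint64_t end)
{
	uint64_t visited = 0;

	assert(start <= end);
	if (obj->receiving)
		return (0);	/* nothing to evict; avoid the O(n) walk */

	for (sketch_dbuf_t *db = obj->dbufs; db != NULL; db = db->next) {
		visited++;
		if (db->blkid >= start && db->blkid <= end)
			db->evicted = true;
	}
	return (visited);
}
```

With the flag set, the cost of a free record no longer depends on how many dbufs are cached for the object. The shortcut is only safe because of the receive-side guarantee stated above; outside that context the walk is still needed.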
Updated by Christopher Siden about 6 years ago
- Status changed from In Progress to Closed
commit 2f3d878
Author: Matthew Ahrens <email@example.com>
Date: Mon Jul 29 11:58:53 2013

    3834 incremental replication of 'holey' file systems is slow
    Reviewed by: Adam Leventhal <firstname.lastname@example.org>
    Reviewed by: George Wilson <email@example.com>
    Approved by: Richard Lowe <firstname.lastname@example.org>