l2arc_write_buffers() may write beyond target_sz
I noticed that on some of our systems we were getting a clearly abnormal number of l2arc checksum errors accounted in l2_cksum_bad. The hardware appeared to be in good health. Using DTrace I noticed that the data seemed to be overwritten with other data. After more DTrace analysis I observed that sometimes l2arc_write_buffers() would advance l2ad_hand by more than target_sz. This meant that l2arc_write_buffers() would write beyond a region cleared by
l2arc_evict() and thus overwrite data belonging to non-evicted buffers. Havoc ensues.
The cache devices in question are all SSDs with physical sector size of 4KB. I am not sure about other ZFS platforms, but on FreeBSD this fact is detected and ashift of 12 is used for the cache vdevs.
Looking at l2arc_write_buffers() code you can see that it takes into account vdev ashift when actually writing buffers and advancing l2ad_hand (so that buffers always start at nicely aligned offsets):
/* * Keep the clock hand suitably device-aligned. */ buf_p_sz = vdev_psize_to_asize(dev->l2ad_vdev, buf_sz); write_psize += buf_p_sz; dev->l2ad_hand += buf_p_sz;
But the same is not done when selecting buffers to be written and checking that target_sz is not exceeded. So, if ARC contains a lot of buffers smaller than 4KB that means that on-disk size of the L2ARC buffers could be quite larger than their logical size.
The problem was introduced with the following change: https://github.com/illumos/illumos-gate/commit/e14bb3258d05c1b1077e2db7cf77088924e56919#diff-dc835266d68d5a3711f1bae8d024d796R4189
Updated by Andriy Gapon over 4 years ago
I actually was wrong about which commit introduced the problem.
Indeed, commit e14bb3258d05c1b1077e2db7cf77088924e56919 added alignment of l2ad_head value by rounding up
vdev_psize_to_asize(), but at the same time
write_size became a sum of the rounded up buffer sizes.
It was commit aad02571bc59671aa3103bb070ae365f531b0b62 (L2ARC compression) that split the buffer writing loop into the buffer selection loop and the buffer compression + writing loop. After that split
write_size became just a sum of
Updated by Electric Monk about 4 years ago
- Status changed from New to Closed
- % Done changed from 0 to 100
commit d7d9a6d919f92d74ea0510a53f8441396048e800 Author: Andriy Gapon <email@example.com> Date: 2015-10-07T12:13:27.000Z 5219 l2arc_write_buffers() may write beyond target_sz Reviewed by: Matthew Ahrens <firstname.lastname@example.org> Reviewed by: Saso Kiselkov <email@example.com> Reviewed by: George Wilson <firstname.lastname@example.org> Reviewed by: Steven Hartland <email@example.com> Reviewed by: Justin Gibbs <gibbs@FreeBSD.org> Approved by: Matthew Ahrens <firstname.lastname@example.org>