l2arc_write_buffers() may write beyond target_sz
I noticed that on some of our systems we were getting a clearly abnormal number of l2arc checksum errors accounted in l2_cksum_bad. The hardware appeared to be in good health. Using DTrace I noticed that the data seemed to be overwritten with other data. After more DTrace analysis I observed that sometimes l2arc_write_buffers() would advance l2ad_hand by more than target_sz. This meant that l2arc_write_buffers() would write beyond a region cleared by
l2arc_evict() and thus overwrite data belonging to non-evicted buffers. Havoc ensues.
The cache devices in question are all SSDs with physical sector size of 4KB. I am not sure about other ZFS platforms, but on FreeBSD this fact is detected and ashift of 12 is used for the cache vdevs.
Looking at l2arc_write_buffers() code you can see that it takes into account vdev ashift when actually writing buffers and advancing l2ad_hand (so that buffers always start at nicely aligned offsets):
/* * Keep the clock hand suitably device-aligned. */ buf_p_sz = vdev_psize_to_asize(dev->l2ad_vdev, buf_sz); write_psize += buf_p_sz; dev->l2ad_hand += buf_p_sz;
But the same is not done when selecting buffers to be written and checking that target_sz is not exceeded. So, if ARC contains a lot of buffers smaller than 4KB that means that on-disk size of the L2ARC buffers could be quite larger than their logical size.
The problem was introduced with the following change: https://github.com/illumos/illumos-gate/commit/e14bb3258d05c1b1077e2db7cf77088924e56919#diff-dc835266d68d5a3711f1bae8d024d796R4189