Project

General

Profile

Bug #5219

l2arc_write_buffers() may write beyond target_sz

Added by Andriy Gapon almost 5 years ago. Updated almost 4 years ago.

Status:
Closed
Priority:
Normal
Assignee:
-
Category:
-
Start date:
2014-10-08
Due date:
% Done:

100%

Estimated time:
Difficulty:
Medium
Tags:
needs-triage

Description

I noticed that on some of our systems we were getting a clearly abnormal number of l2arc checksum errors accounted in l2_cksum_bad. The hardware appeared to be in good health. Using DTrace I noticed that the data seemed to be overwritten with other data. After more DTrace analysis I observed that sometimes l2arc_write_buffers() would advance l2ad_hand by more than target_sz. This meant that l2arc_write_buffers() would write beyond a region cleared by
l2arc_evict() and thus overwrite data belonging to non-evicted buffers. Havoc ensues.

The cache devices in question are all SSDs with physical sector size of 4KB. I am not sure about other ZFS platforms, but on FreeBSD this fact is detected and ashift of 12 is used for the cache vdevs.

Looking at l2arc_write_buffers() code you can see that it takes into account vdev ashift when actually writing buffers and advancing l2ad_hand (so that buffers always start at nicely aligned offsets):

                        /*
                         * Keep the clock hand suitably device-aligned.
                         */
                        buf_p_sz = vdev_psize_to_asize(dev->l2ad_vdev, buf_sz);
                        write_psize += buf_p_sz;
                        dev->l2ad_hand += buf_p_sz;

But the same is not done when selecting buffers to be written and checking that target_sz is not exceeded. So, if ARC contains a lot of buffers smaller than 4KB that means that on-disk size of the L2ARC buffers could be quite larger than their logical size.

The problem was introduced with the following change: https://github.com/illumos/illumos-gate/commit/e14bb3258d05c1b1077e2db7cf77088924e56919#diff-dc835266d68d5a3711f1bae8d024d796R4189

History

#2

Updated by Andriy Gapon over 4 years ago

I actually was wrong about which commit introduced the problem.
Indeed, commit e14bb3258d05c1b1077e2db7cf77088924e56919 added alignment of l2ad_head value by rounding up buf_sz using vdev_psize_to_asize(), but at the same time write_size became a sum of the rounded up buffer sizes.
It was commit aad02571bc59671aa3103bb070ae365f531b0b62 (L2ARC compression) that split the buffer writing loop into the buffer selection loop and the buffer compression + writing loop. After that split write_size became just a sum of b_asize-s.

#3

Updated by Electric Monk almost 4 years ago

  • Status changed from New to Closed
  • % Done changed from 0 to 100

git commit d7d9a6d919f92d74ea0510a53f8441396048e800

commit  d7d9a6d919f92d74ea0510a53f8441396048e800
Author: Andriy Gapon <avg@freebsd.org>
Date:   2015-10-07T12:13:27.000Z

    5219 l2arc_write_buffers() may write beyond target_sz
    Reviewed by: Matthew Ahrens <mahrens@delphix.com>
    Reviewed by: Saso Kiselkov <skiselkov@gmail.com>
    Reviewed by: George Wilson <george@delphix.com>
    Reviewed by: Steven Hartland <steven.hartland@multiplay.co.uk>
    Reviewed by: Justin Gibbs <gibbs@FreeBSD.org>
    Approved by: Matthew Ahrens <mahrens@delphix.com>

Also available in: Atom PDF