Project

General

Profile

Feature #3525

Persistent L2ARC

Added by Sašo Kiselkov over 7 years ago. Updated 14 days ago.

Status:
Closed
Priority:
Normal
Assignee:
Category:
zfs - Zettabyte File System
Start date:
2013-02-04
Due date:
% Done:

100%

Estimated time:
Difficulty:
Hard
Tags:
needs-triage
Gerrit CR:

Description

This feature implements a light-weight persistent L2ARC metadata structure that allows L2ARC contents to be recovered after a reboot or L2ARC device remove/add. This significantly eases the impact a reboot has on read performance on systems with large caches.

Webrev: http://cr.illumos.org/~webrev/skiselkov/l2arc_persist

The implementation is essentially feature-complete, and should only need some more tuning and testing to make sure it performs adequately.


Related issues

Related to illumos gate - Bug #7341: Pool halt due to checksum error after rebootNew2016-08-30

Actions
Related to illumos gate - Bug #833: ZPOOL cache device is emptied after reboootNew2011-03-18

Actions
Related to illumos gate - Feature #13013: Port OpenZFS zpool label clear improvementsClosedJason King

Actions
Related to illumos gate - Bug #13012: zpool_read_label semantics should match OpenZFSClosedJason King

Actions
#1

Updated by Sašo Kiselkov over 7 years ago

  • Status changed from New to Feedback
  • % Done changed from 80 to 90

Implementation is now feature-complete, new webrev at:
http://cr.illumos.org/~webrev/skiselkov/3525/

#2

Updated by Sašo Kiselkov over 7 years ago

I've written a wiki post about the design at: http://wiki.illumos.org/display/illumos/Persistent+L2ARC

#3

Updated by Eric Smith over 7 years ago

Perhaps use SHA256 rather than Fletcher, or offer it as an option when an L2ARC device is created? SHA256 has instruction set acceleration in upcoming processors such as ARMv7 (Aarch64), and is likely to show up in future x86 processors. As far as I can tell, hardware accelerated Flatcher-4 support isn't likely to show up in any processor architectures any time soon, though admittedly it is less needed.

#4

Updated by Sašo Kiselkov over 7 years ago

On a modern CPU fletcher-4 achieves in excess of 4GB/s/core in hashing performance, whereas SHA-256 (in hand-written assembly) manages less than 1/10th of that. Unless the performance gain from hardware acceleration is likely to far exceed 10x, I see no reason to introduce this.
Also, in this patch the checksum is used simply as a data consistency check, not as a security mechanism or de-duplication control. Another thing to note is that L2ARC writes are few and far between (and happen asynchronously anyway). Even if the checksum used was slower than optimum (and it's not), the performance impact would likely be negligible and probably wouldn't outweigh introducing further fragility through obscure tunables.

#5

Updated by Sašo Kiselkov about 7 years ago

New version up for review: http://cr.illumos.org/~webrev/skiselkov/3525_simplified
Simplified, faster, cleaner.

#6

Updated by Yuxuan Shui almost 7 years ago

arc.c line 5849,
bzero(&dev->l2ad_ublk, sizeof (&dev->l2ad_ublk));

Shouldn't that '&' be removed?

#7

Updated by Arne Jansen about 5 years ago

This is a really interesting feature which seems to be nearly done. Is it still being worked on? If not, are the any fundamental issues with the design?

#8

Updated by Sašo Kiselkov about 5 years ago

Arne Jansen wrote:

This is a really interesting feature which seems to be nearly done. Is it still being worked on? If not, are the any fundamental issues with the design?

We have this working in the upcoming Nexenta 5.0 release, which should be available "soon" (can't give any more precise info, since I don't have it either). Together with that, I'll be working on upstreaming this into Illumos.

#9

Updated by Vitaliy Gusev about 5 years ago

Sašo Kiselkov wrote:

We have this working in the upcoming Nexenta 5.0 release, which should be available "soon" (can't give any more precise info, since I don't have it either). Together with that, I'll be working on upstreaming this into Illumos.

Just suggestion: Make this functionality as separate l2arc.c file. Original arc.c seems overcrowded.

#10

Updated by Sašo Kiselkov about 5 years ago

Vitaliy Gusev wrote:

Just suggestion: Make this functionality as separate l2arc.c file. Original arc.c seems overcrowded.

It's a little late for that. IMO L2ARC needs to be either rewritten from scratch, or dropped entirely and the idea reimplemented by other means.

#11

Updated by Sašo Kiselkov almost 5 years ago

Finalized code review at: https://reviews.csiden.org/r/267/

#12

Updated by Arne Jansen about 4 years ago

  • Related to Bug #7341: Pool halt due to checksum error after reboot added
#13

Updated by Gergő Mihály Doma about 3 years ago

  • Related to Bug #833: ZPOOL cache device is emptied after rebooot added
#14

Updated by Till Wegmüller 7 months ago

Link to merged zfsonlinx pr:

https://github.com/openzfs/zfs/pull/9582

#15

Updated by Jason King 3 months ago

  • Assignee changed from Sašo Kiselkov to Jason King

Porting the OpenZFS change (which was based off the original work here that wasn't integrated), as well as some subsequent bug fixes (as well as two small prerequisites), the following OpenZFS commits were used:

commit 5ff2249fa579c20c12bce627e284415720fd4412
Author: Alexander Motin <mav@FreeBSD.org>
Date:   Tue Dec 3 12:59:30 2019 -0500

    Fix use-after-free in case of L2ARC prefetch failure

    In case L2ARC read failed, l2arc_read_done() creates _different_ ZIO
    to read data from the original storage device.  Unfortunately pointer
    to the failed ZIO remains in hdr->b_l1hdr.b_acb->acb_zio_head, and if
    some other read try to bump the ZIO priority, it will crash.

    The problem is reproducible by corrupting L2ARC content and reading
    some data with prefetch if l2arc_noprefetch tunable is changed to 0.
    With the default setting the issue is probably not reproducible now.

    Reviewed-by: Tom Caputi <tcaputi@datto.com>
    Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
    Signed-off-by: Alexander Motin <mav@FreeBSD.org>
    Sponsored-By: iXsystems, Inc.
    Closes #9648
From 37c22948e5b1ce9f9f11c66676ecca15caf1f264 Mon Sep 17 00:00:00 2001
From: George Amanakis <gamanakis@gmail.com>
Date: Tue, 31 Mar 2020 13:46:48 -0400
Subject: [PATCH] Reset l2ad_hand and l2ad_first in l2arc_evict

Increasing l2arc_write_size or l2arc_write_boost can result in
l2arc_write_buffers() not having enough space to perform its writes and
panic zio_write_phys().

Instead of resetting l2ad_hand to l2ad_start at the end of
l2arc_write_buffers() and not taking into account a possible
user-mediated increase of l2arc_write_max, we do this in l2arc_evict(),
right after l2arc_write_size() has run. If there is not enough space to
evict (ie we will exceed l2ad_end) we evict to the end of the device,
reset l2ad_hand to l2ad_start, set l2ad_first to 0 and iterate
l2arc_evict(). We avoid infinite iteration of l2arc_evict() by making
sure in l2arc_write_size() that l2ad_start + size does not exceed
l2ad_end.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: George Amanakis <gamanakis@gmail.com>
Closes #10154
From ca6c7a94c994b9979824f8fccf1c6ddb00ee5598 Mon Sep 17 00:00:00 2001
From: Brian Behlendorf <behlendorf1@llnl.gov>
Date: Fri, 15 Mar 2019 14:17:38 -0700
Subject: [PATCH] Fix l2arc_evict() destroy race

When destroying an arc_buf_hdr_t its identity cannot be discarded
until it is entirely undiscoverable.  This not only includes being
unhashed, but also being removed from the l2arc header list.
Discarding the header's identify prematurely renders the hash
lock useless because it will always hash to bucket zero.

This change resolves a race with l2arc_evict() by discarding the
identity after it has been removed from the l2arc header list.
This ensures either the header is not on the list or contains
the correct identify.

Reviewed-by: Tom Caputi <tcaputi@datto.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #7688
Closes #8144
From 77f6826b83b7e27f0996f6d192202c36f65e41fd Mon Sep 17 00:00:00 2001
From: George Amanakis <gamanakis@gmail.com>
Date: Fri, 10 Apr 2020 13:33:35 -0400
Subject: [PATCH] Persistent L2ARC

This commit makes the L2ARC persistent across reboots. We implement
a light-weight persistent L2ARC metadata structure that allows L2ARC
contents to be recovered after a reboot. This significantly eases the
impact a reboot has on read performance on systems with large caches.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: George Wilson <gwilson@delphix.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Co-authored-by: Saso Kiselkov <skiselkov@gmail.com>
Co-authored-by: Jorgen Lundman <lundman@lundman.net>
Co-authored-by: George Amanakis <gamanakis@gmail.com>
Ported-by: Yuxuan Shui <yshuiv7@gmail.com>
Signed-off-by: George Amanakis <gamanakis@gmail.com>
Closes #925
Closes #1823
Closes #2672
Closes #3744
Closes #9582

From 9249f1272ec08316f4482cba145b42ba935d3b02 Mon Sep 17 00:00:00 2001
From: George Amanakis <gamanakis@gmail.com>
Date: Fri, 17 Apr 2020 12:27:40 -0400
Subject: [PATCH] Persistent L2ARC minor fixes

Minor fixes on persistent L2ARC improving code readability and fixing
a typo in zdb.c when byte-swapping a log block. It also improves the
pesist_l2arc_007_pos.ksh test by giving it more time to retrieve log
blocks on the cache device.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Adam D. Moss <c@yotes.com>
Signed-off-by: George Amanakis <gamanakis@gmail.com>
Closes #10210
From 657fd33bcff17e44ad55dffdf294d7c107b4bf5d Mon Sep 17 00:00:00 2001
From: George Amanakis <gamanakis@gmail.com>
Date: Thu, 7 May 2020 19:34:03 -0400
Subject: [PATCH] Improvements on persistent L2ARC

Functional changes:

We implement refcounts of log blocks and their aligned size on the
cache device along with two corresponding arcstats. The refcounts are
reflected in the header of the device and provide valuable information
as to whether log blocks are accounted for correctly. These are
dynamically adjusted as log blocks are committed/evicted. zdb also uses
this information in the device header and compares it to the
corresponding values as reported by dump_l2arc_log_blocks() which
emulates l2arc_rebuild(). If the refcounts saved in the device header
report higher values, zdb exits with an error. For this feature to work
correctly there should be no active writes on the device. This is also
employed in the tests of persistent L2ARC. We extend the structure of
the cache device header by adding the two new variables mirroring the
refcounts after the existing variables to preserve backward
compatibility in terms of persistent L2ARC.

1) a new arcstat "l2_log_blk_asize" and refcount "l2ad_lb_asize" which
   reflect the total aligned size of log blocks on the device. This is
   also reflected in the header of the cache device as "dh_lb_asize".
2) a new arcstat "l2arc_log_blk_count" and refcount "l2ad_lb_count" 
   which reflect the total number of L2ARC log blocks present on cache
   devices.  It is also reflected in the header of the cache device as
   "dh_lb_count".

In l2arc_rebuild_vdev() if the amount of committed log entries in a log
block is 0 and the device header is valid we update the device header.
This will facilitate trimming of the whole device in this case when
TRIM for L2ARC is implemented.

Improve loop protection in l2arc_rebuild() by using the starting offset
of the payload of each log block instead of the starting offset of the
log block.

If the zio in l2arc_write_buffers() fails, restore the lbps array in the
header of the device to its previous state in l2arc_write_done().

If l2arc_rebuild() ends the rebuild process without restoring any L2ARC
log blocks in ARC and without any other error, this means that the lbps
array in the header is pointing to non-existent or invalid log blocks.
Reset the device header in this case.

In l2arc_rebuild() change the zfs_dbgmsg messages to
spa_history_log_internal() making them user visible with zpool history
command.

Non-functional changes:

Make the first test in persistent L2ARC use `zdb -lll` to increase
coverage in `zdb.c`.

Rename psize with asize when referring to log blocks, since
L2ARC_SET_PSIZE stores the vdev aligned size for log blocks. Also
rename dh_log_blk_entries to dh_log_entries to make it clear that
it is a mirror of l2ad_log_entries. Added comments for both changes.

Fix inaccurate comments for example in l2arc_log_blk_restore().

Add asserts at the end in l2arc_evict() and l2arc_write_buffers().

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Amanakis <gamanakis@gmail.com>
Closes #10228
From 2054f35e564b59e23950ee573e91dbae3fd2ccd9 Mon Sep 17 00:00:00 2001
From: George Amanakis <gamanakis@gmail.com>
Date: Fri, 10 Jul 2020 17:10:03 -0400
Subject: [PATCH] Fix a persistent L2ARC bug in l2arc_write_done()

In case l2arc_write_done() handles a zio that was not successful check
that the list of log block pointers is not empty when restoring them
in the device header. Otherwise zero them out. In any case perform the
actual write updating the device header after the zio of
l2arc_write_buffers() completes as l2arc_write_done() may have touched
the memory holding the log block pointers in the device header.

Reviewed-by: Serapheim Dimitropoulos <serapheim@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Amanakis <gamanakis@gmail.com>
Closes #10540
Closes #10543
#16

Updated by Jason King 3 months ago

  • Related to Feature #13013: Port OpenZFS zpool label clear improvements added
#17

Updated by Jason King 3 months ago

  • Related to Bug #13012: zpool_read_label semantics should match OpenZFS added
#18

Updated by Electric Monk 3 months ago

  • Gerrit CR set to 848
#19

Updated by Jason King 21 days ago

To test, I ran the zfs test suite (including the new persistent L2ARC tests introduced with this change). All the expected tests pass, and the failures are all known issues with existing bugs filed for them.

#20

Updated by Electric Monk 14 days ago

  • Status changed from Feedback to Closed
  • % Done changed from 90 to 100

git commit f0a052391861a2b96cf28973c3b7f2854591aa79

commit  f0a052391861a2b96cf28973c3b7f2854591aa79
Author: George Amanakis <gamanakis@gmail.com>
Date:   2020-10-16T16:10:02.000Z

    3525 Persistent L2ARC
    Portions contributed by: Saso Kiselkov <skiselkov@gmail.com>
    Portions contributed by: Jorgen Lundman <lundman@lundman.net>
    Portions contributed by: Brian Behlendorf <behlendorf1@llnl.gov>
    Portions contributed by: Alexander Motin <mav@FreeBSD.org>
    Portions contributed by: Jason King <jason.king@joyent.com>
    Reviewed by: C Fraire <cfraire@me.com>
    Reviewed by: Toomas Soome <tsoome@me.com>
    Approved by: Dan McDonald <danmcd@joyent.com>

Also available in: Atom PDF