Project

General

Profile

Actions

Bug #13374

closed

Port L2ARC Improvements from OpenZFS

Added by Jason King 10 months ago. Updated 5 months ago.

Status:
Closed
Priority:
Normal
Assignee:
Category:
zfs - Zettabyte File System
Start date:
Due date:
% Done:

100%

Estimated time:
Difficulty:
Medium
Tags:
Gerrit CR:

Description

After the initial port of persistent L2ARC, there's been several improvements we should port:

From da60484db5960f2b94cf3302dd08fe7fb673eaf3 Mon Sep 17 00:00:00 2001
From: George Amanakis <gamanakis@gmail.com>
Date: Sat, 1 Aug 2020 14:17:18 -0400
Subject: [PATCH] Fix logging in l2arc_rebuild()

In case the L2ARC rebuild was canceled, do not log to spa history
log as the pool may be in the process of being removed and a panic
may occur:

BUG: kernel NULL pointer dereference, address: 0000000000000018
RIP: 0010:spa_history_log_internal+0xb1/0x120 [zfs]
Call Trace:
 l2arc_rebuild+0x464/0x7c0 [zfs]
 l2arc_dev_rebuild_start+0x2d/0x130 [zfs]
 ? l2arc_rebuild+0x7c0/0x7c0 [zfs]
 thread_generic_wrapper+0x78/0xb0 [spl]
 kthread+0xfb/0x130
 ? IS_ERR+0x10/0x10 [spl]
 ? kthread_park+0x90/0x90
 ret_from_fork+0x35/0x40

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Amanakis <gamanakis@gmail.com>
Closes #10659 
From fc34dfba8e8238683e90e3fa83d16be3343886f6 Mon Sep 17 00:00:00 2001
From: Allan Jude <allanjude@freebsd.org>
Date: Fri, 14 Aug 2020 02:31:20 -0400
Subject: [PATCH] Fix L2ARC reads when compressed ARC disabled

When reading compressed blocks from the L2ARC, with
compressed ARC disabled, arc_hdr_size() returns
LSIZE rather than PSIZE, but the actual read is PSIZE.
This causes l2arc_read_done() to compare the checksum
against the wrong size, resulting in checksum failure.

This manifests as an increase in the kstat l2_cksum_bad
and the read being retried from the main pool, making the
L2ARC ineffective.

Add new L2ARC tests with Compressed ARC enabled/disabled

Blocks are handled differently depending on the state of the
zfs_compressed_arc_enabled tunable.

If a block is compressed on-disk, and compressed_arc is enabled:
- the block is read from disk
- It is NOT decompressed
- It is added to the ARC in its compressed form
- l2arc_write_buffers() may write it to the L2ARC (as is)
- l2arc_read_done() compares the checksum to the BP (compressed)

However, if compressed_arc is disabled:
- the block is read from disk
- It is decompressed
- It is added to the ARC (uncompressed)
- l2arc_write_buffers() will use l2arc_apply_transforms() to
  recompress the block, before writing it to the L2ARC
- l2arc_read_done() compares the checksum to the BP (compressed)
- l2arc_read_done() will use l2arc_untransform() to uncompress it

This test writes out a test file to a pool consisting of one disk
and one cache device, then randomly reads from it. Since the arc_max
in the tests is low, this will feed the L2ARC, and result in reads
from the L2ARC.

We compare the value of the kstat l2_cksum_bad before and after
to determine if any blocks failed to survive the trip through the
L2ARC.

Sponsored-by: The FreeBSD Foundation
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Allan Jude <allanjude@freebsd.org>
Closes #10693
From 523e1295fe8fb15d37b2459ce48a32b9454d448d Mon Sep 17 00:00:00 2001
From: Alexander Motin <mav@FreeBSD.org>
Date: Tue, 25 Aug 2020 17:33:36 -0400
Subject: [PATCH] Introduce limit on size of L2ARC headers

Since L2ARC buffers are not evicted on memory pressure, too large
amount of headers on system with irrationally large L2ARC can render
it slow or even unusable.  This change limits L2ARC writes and
rebuild if unevictable L2ARC-only headers reach dangerous level.

While there, call arc_adapt() on L2ARC rebuild, so that it could
properly grow arc_c, reflecting potentially significant ARC size
increase and avoiding slow growth with hopeless eviction attempts
later when "overflow" is detected.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reported-by: Richard Elling <Richard.Elling@RichardElling.com>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Closes #10765
From feb3a7eef114b6d23bc1aaf5c550b2abeffaca94 Mon Sep 17 00:00:00 2001
From: George Amanakis <gamanakis@gmail.com>
Date: Tue, 8 Sep 2020 14:44:37 -0400
Subject: [PATCH] Introduce ZFS module parameter l2arc_mfuonly

In certain workloads it may be beneficial to reduce wear of L2ARC
devices by not caching MRU metadata and data into L2ARC. This commit
introduces a new tunable l2arc_mfuonly for this purpose.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Richard Elling <Richard.Elling@RichardElling.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Amanakis <gamanakis@gmail.com>
Closes #10710
From 085321621e79a75bea41c2b6511da6ebfbf2ba0a Mon Sep 17 00:00:00 2001
From: George Amanakis <gamanakis@gmail.com>
Date: Mon, 14 Sep 2020 13:10:44 -0400
Subject: [PATCH] Add L2ARC arcstats for MFU/MRU buffers and buffer content
 type

Currently the ARC state (MFU/MRU) of cached L2ARC buffer and their
content type is unknown. Knowing this information may prove beneficial
in adjusting the L2ARC caching policy.

This commit adds L2ARC arcstats that display the aligned size
(in bytes) of L2ARC buffers according to their content type
(data/metadata) and according to their ARC state (MRU/MFU or
prefetch). It also expands the existing evict_l2_eligible arcstat to
differentiate between MFU and MRU buffers.

L2ARC caches buffers from the MRU and MFU lists of ARC. Upon caching a
buffer, its ARC state (MRU/MFU) is stored in the L2 header
(b_arcs_state). The l2_m{f,r}u_asize arcstats reflect the aligned size
(in bytes) of L2ARC buffers according to their ARC state (based on
b_arcs_state). We also account for the case where an L2ARC and ARC
cached MRU or MRU_ghost buffer transitions to MFU. The l2_prefetch_asize
reflects the alinged size (in bytes) of L2ARC buffers that were cached
while they had the prefetch flag set in ARC. This is dynamically updated
as the prefetch flag of L2ARC buffers changes.

When buffers are evicted from ARC, if they are determined to be L2ARC
eligible then their logical size is recorded in
evict_l2_eligible_m{r,f}u arcstats according to their ARC state upon
eviction.

Persistent L2ARC:
When committing an L2ARC buffer to a log block (L2ARC metadata) its
b_arcs_state and prefetch flag is also stored. If the buffer changes
its arcstate or prefetch flag this is reflected in the above arcstats.
However, the L2ARC metadata cannot currently be updated to reflect this
change.
Example: L2ARC caches an MRU buffer. L2ARC metadata and arcstats count
this as an MRU buffer. The buffer transitions to MFU. The arcstats are
updated to reflect this. Upon pool re-import or on/offlining the L2ARC
device the arcstats are cleared and the buffer will now be counted as an
MRU buffer, as the L2ARC metadata were not updated.
Bug fix:
- If l2arc_noprefetch is set, arc_read_done clears the L2CACHE flag of
  an ARC buffer. However, prefetches may be issued in a way that
  arc_read_done() is bypassed. Instead, move the related code in
  l2arc_write_eligible() to account for those cases too.

Also add a test and update manpages for l2arc_mfuonly module parameter,
and update the manpages and code comments for l2arc_noprefetch.
Move persist_l2arc tests to l2arc.

Reviewed-by: Ryan Moeller <freqlabs@FreeBSD.org>
Reviewed-by: Richard Elling <Richard.Elling@RichardElling.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Amanakis <gamanakis@gmail.com>
Closes #10743 
From af20b97078af7bf4ba7552dff04cc40b643ab72c Mon Sep 17 00:00:00 2001
From: Matthew Macy <mmacy@freebsd.org>
Date: Sun, 27 Sep 2020 17:08:38 -0700
Subject: [PATCH] zfetch: Don't issue new streams when old have not completed

The current dmu_zfetch code implicitly assumes that I/Os complete
within min_sec_reap seconds. With async dmu and a readonly workload
(and thus no exponential backoff in operations from the "write
throttle") such as L2ARC rebuild it is possible to saturate the drives
with I/O requests. These are then effectively compounded with prefetch
requests.

This change reference counts streams and prevents them from being
recycled after their min_sec_reap timeout if they still have
outstanding I/Os.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #10900
From a76e4e676180acad9e12f5a1e03f5ff48ec85983 Mon Sep 17 00:00:00 2001
From: George Amanakis <gamanakis@gmail.com>
Date: Mon, 5 Oct 2020 18:29:05 -0400
Subject: [PATCH] Make L2ARC tests more robust

Instead of relying on arbitrary timers after pool export/import or cache
device off/online rely on arcstats. This makes the L2ARC tests more
robust. Also cleanup some functions related to persistent L2ARC.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Adam Moss <c@yotes.com>
Signed-off-by: George Amanakis <gamanakis@gmail.com>
Closes #10983
commit 4aafab91c5033e35217209d121f4c2fb83a8f690
Author: GeLiXin <ge.lixin@zte.com.cn>
Date:   Tue Nov 1 07:04:01 2016 +0800

    Fix coverity defects: CID 147509

    CID 147509: Explicit null dereferenced
    - l2arc_sublist_lock is fragile as relied on caller too much.

    Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
    Signed-off-by: GeLiXin <ge.lixin@zte.com.cn>
    Closes #5319
Actions #1

Updated by Jason King 10 months ago

  • Description updated (diff)
Actions #2

Updated by Electric Monk 6 months ago

  • Gerrit CR set to 1471
Actions #3

Updated by Jason King 6 months ago

Just to note, while the L2ARC tests are improved, they still can fail on occasion (and require re-running). I've talked to George Amanakis about this, and it just seems to be due to some of the inherent non-determinism of even a test system (due to all the other background stuff). Rather than continue to hold this up any further, since it should still be an improvement it seemed worth getting it out to review.

Actions #4

Updated by Jason King 6 months ago

For testing, I ran the zfs test suite. The only failures were known failures with existing tickets.

Actions #5

Updated by Electric Monk 5 months ago

  • Status changed from New to Closed
  • % Done changed from 0 to 100

git commit 9e3493cb8a0cfe96c9aef9b7da42c6c9b5c24b43

commit  9e3493cb8a0cfe96c9aef9b7da42c6c9b5c24b43
Author: Jason King <jason.brian.king@gmail.com>
Date:   2021-05-13T16:49:05.000Z

    13374 Port L2ARC Improvements from OpenZFS
    Portions contributed by: George Amanakis <gamanakis@gmail.com>
    Portions contributed by: Allan Jude <allanjude@freebsd.org>
    Portions contributed by: Alexander Motin <mav@FreeBSD.org>
    Portions contributed by: Matthew Macy <mmacy@freebsd.org>
    Portions contributed by: GeLiXin <ge.lixin@zte.com.cn>
    Portions contributed by: Brian Behlendorf <behlendorf1@llnl.gov>
    Reviewed by: Adam Moss <c@yotes.com>
    Reviewed by: Richard Elling <Richard.Elling@RichardElling.com>
    Reviewed by: Ryan Moeller <freqlabs@FreeBSD.org>
    Reviewed by: Matthew Ahrens <mahrens@delphix.com>
    Reviewed by: Andy Fiddaman <andy@omnios.org>
    Reviewed by: Gordon Ross <gordon.w.ross@gmail.com>
    Approved by: Dan McDonald <danmcd@joyent.com>

Actions

Also available in: Atom PDF