Feature #3525

Persistent L2ARC

Added by Sašo Kiselkov almost 7 years ago. Updated about 4 years ago.

Status: Feedback
Priority: Normal
Category: zfs - Zettabyte File System
Start date: 2013-02-04
Due date:
% Done: 90%
Estimated time:
Difficulty: Hard
Tags: needs-triage

Description

This feature implements a lightweight persistent L2ARC metadata structure that allows L2ARC contents to be recovered after a reboot or after an L2ARC device is removed and re-added. This significantly reduces the impact a reboot has on read performance on systems with large caches.

Webrev: http://cr.illumos.org/~webrev/skiselkov/l2arc_persist

The implementation is essentially feature-complete, and should only need some more tuning and testing to make sure it performs adequately.


Related issues

Related to illumos gate - Bug #7341: Pool halt due to checksum error after reboot (New, 2016-08-30)
Related to illumos gate - Bug #833: ZPOOL cache device is emptied after rebooot (New, 2011-03-18)

History

#1

Updated by Sašo Kiselkov almost 7 years ago

  • Status changed from New to Feedback
  • % Done changed from 80 to 90

Implementation is now feature-complete, new webrev at:
http://cr.illumos.org/~webrev/skiselkov/3525/

#2

Updated by Sašo Kiselkov almost 7 years ago

I've written a wiki post about the design at: http://wiki.illumos.org/display/illumos/Persistent+L2ARC

#3

Updated by Eric Smith over 6 years ago

Perhaps use SHA256 rather than Fletcher, or offer it as an option when an L2ARC device is created? SHA256 has instruction set acceleration in upcoming processors such as ARMv8 (AArch64), and is likely to show up in future x86 processors. As far as I can tell, hardware-accelerated Fletcher-4 support isn't likely to show up in any processor architecture any time soon, though admittedly it is less needed.

#4

Updated by Sašo Kiselkov over 6 years ago

On a modern CPU fletcher-4 achieves in excess of 4GB/s/core in hashing performance, whereas SHA-256 (in hand-written assembly) manages less than 1/10th of that. Unless the performance gain from hardware acceleration is likely to far exceed 10x, I see no reason to introduce this.
Also, in this patch the checksum is used simply as a data consistency check, not as a security mechanism or de-duplication control. Another thing to note is that L2ARC writes are few and far between (and happen asynchronously anyway). Even if the checksum used was slower than optimum (and it's not), the performance impact would likely be negligible and probably wouldn't outweigh introducing further fragility through obscure tunables.
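To illustrate why Fletcher-4 is so cheap per byte, here is a minimal sketch of the algorithm as it is generally described: four 64-bit running sums updated once per 32-bit input word. The names `fletcher4_sketch` and `fletcher4_sketch_t` are hypothetical and exist only for this example; this is not the code from the webrev, and it ignores byte-swapping and any vectorized variants.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical result type for this sketch: four 64-bit checksum words. */
typedef struct {
	uint64_t word[4];
} fletcher4_sketch_t;

/*
 * Fletcher-4 over native-endian 32-bit words. Each input word costs four
 * additions: a accumulates the data, b accumulates a, c accumulates b,
 * and d accumulates c. Overflow simply wraps modulo 2^64.
 */
static fletcher4_sketch_t
fletcher4_sketch(const void *buf, size_t size)
{
	const uint32_t *ip = buf;
	const uint32_t *ipend = ip + (size / sizeof (uint32_t));
	uint64_t a = 0, b = 0, c = 0, d = 0;

	/* This sketch only handles buffers made of whole 32-bit words. */
	assert(size % sizeof (uint32_t) == 0);

	for (; ip < ipend; ip++) {
		a += *ip;
		b += a;
		c += b;
		d += c;
	}

	return ((fletcher4_sketch_t){ { a, b, c, d } });
}
```

The loop has no table lookups, rotations, or rounds, which is consistent with the point above that it suits a consistency check far better than a cryptographic hash does.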

#5

Updated by Sašo Kiselkov about 6 years ago

New version up for review: http://cr.illumos.org/~webrev/skiselkov/3525_simplified
Simplified, faster, cleaner.

#6

Updated by Yuxuan Shui about 6 years ago

arc.c line 5849,
bzero(&dev->l2ad_ublk, sizeof (&dev->l2ad_ublk));

Shouldn't that '&' be removed?
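A small sketch of the concern, using hypothetical stand-in types (`ublk_sketch_t` and `dev_sketch_t` are made up for this example; the real structures are in arc.c): applying `sizeof` to `&dev->l2ad_ublk` yields the size of a pointer, so only the first few bytes of the structure would be cleared.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for illustration only. */
typedef struct {
	unsigned char bytes[64];
} ublk_sketch_t;

typedef struct {
	ublk_sketch_t l2ad_ublk;
} dev_sketch_t;

/* Length used by the call as quoted: the size of a pointer. */
static size_t
len_as_written(dev_sketch_t *dev)
{
	return (sizeof (&dev->l2ad_ublk));
}

/* Length that was presumably intended: the size of the structure. */
static size_t
len_as_intended(dev_sketch_t *dev)
{
	return (sizeof (dev->l2ad_ublk));
}

/* Clears the whole structure by taking sizeof of the object itself. */
static void
clear_ublk(dev_sketch_t *dev)
{
	assert(dev != NULL);
	memset(&dev->l2ad_ublk, 0, sizeof (dev->l2ad_ublk));
}
```

On a typical 64-bit build `len_as_written` returns 8 while `len_as_intended` returns 64 for this stand-in type, so dropping the `&` inside `sizeof` is what makes the call clear the full structure.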

#7

Updated by Arne Jansen about 4 years ago

This is a really interesting feature which seems to be nearly done. Is it still being worked on? If not, are there any fundamental issues with the design?

#8

Updated by Sašo Kiselkov about 4 years ago

Arne Jansen wrote:

This is a really interesting feature which seems to be nearly done. Is it still being worked on? If not, are there any fundamental issues with the design?

We have this working in the upcoming Nexenta 5.0 release, which should be available "soon" (can't give any more precise info, since I don't have it either). Together with that, I'll be working on upstreaming this into Illumos.

#9

Updated by Vitaliy Gusev about 4 years ago

Sašo Kiselkov wrote:

We have this working in the upcoming Nexenta 5.0 release, which should be available "soon" (can't give any more precise info, since I don't have it either). Together with that, I'll be working on upstreaming this into Illumos.

Just a suggestion: move this functionality into a separate l2arc.c file. The original arc.c seems overcrowded.

#10

Updated by Sašo Kiselkov about 4 years ago

Vitaliy Gusev wrote:

Just a suggestion: move this functionality into a separate l2arc.c file. The original arc.c seems overcrowded.

It's a little late for that. IMO L2ARC needs to be either rewritten from scratch, or dropped entirely and the idea reimplemented by other means.

#11

Updated by Sašo Kiselkov about 4 years ago

Finalized code review at: https://reviews.csiden.org/r/267/

#12

Updated by Arne Jansen about 3 years ago

  • Related to Bug #7341: Pool halt due to checksum error after reboot added

#13

Updated by Gergő Mihály Doma about 2 years ago

  • Related to Bug #833: ZPOOL cache device is emptied after rebooot added
