Feature #848


Want ZFS support for caching any writes to separate media or dataset

Added by Jim Klimov about 12 years ago.


I have a home NAS with limited hardware resources; in particular, CPU and RAM have proved to be the bottleneck for dedup combined with compression.

I can send files into the top-level (uncompressed, non-deduplicated) dataset of the raidz2 pool at 25 to 50 MB/s (locally and over a CIFS share from my old computers), but writes to the deduped+compressed dataset happen at 1 to 5 MB/s.

I do see space-saving benefits from these CPU-intensive technologies and I want to use them, but I also want to fill up this machine in a more realistic time frame.

Since much home-NAS usage is bursty (e.g. I upload 8 GB of automated smartphone backups and then the machine sits idle), I have to upload my data quickly to one "quick" dataset and then move it over into the dedup+compressed dataset.

This is tedious and not transparent to applications (the file ends up in a different directory from the one it was written to).
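For illustration, the manual workaround described above could be scripted roughly as follows. This is only a sketch: the function name `drain_staging` and both directory arguments are hypothetical, and it assumes the "quick" and the dedup+compressed datasets are mounted as ordinary directories.

```python
import shutil
from pathlib import Path


def drain_staging(staging: Path, final: Path) -> list[Path]:
    """Move every regular file from the fast staging directory into the
    slow (dedup+compressed) directory, preserving relative paths.

    Both paths are hypothetical mount points chosen by the caller.
    """
    moved = []
    # Materialize the listing first so the tree is not modified while walking it.
    for src in sorted(p for p in staging.rglob("*") if p.is_file()):
        dst = final / src.relative_to(staging)
        dst.parent.mkdir(parents=True, exist_ok=True)
        # Between two separate filesystems this is a copy followed by a delete,
        # so the data is rewritten by the destination filesystem.
        shutil.move(str(src), str(dst))
        moved.append(dst)
    return moved
```

Even when automated like this, the files still change location, which is exactly the lack of transparency this request is about.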

I propose that ZFS allow caching such writes for post-processing on a faster, reliable medium chosen by the user, such as a dedicated dataset on the same pool or a separate partition, disk or pool: idle boot drives, a USB stick, an SSD or DDR disk, etc.

As a result, the (file) data should be written and "committed" to the pool as quickly as the fast-write media allows (until it fills up), and then the blocks would be processed and moved to final storage. Any "overflowing" blocks which don't fit in the write cache would have to stream directly into the final storage dataset, at whatever speed that dataset can sustain.

This should be transparent to the user: subsequent reads of data that has not yet been finally committed to "slow" storage should simply work, much like the RAM cache for async writes works now.
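The intended behaviour can be modelled with a small toy sketch. Nothing here is ZFS code: the class `WriteCacheModel`, its methods and the byte-counting capacity check are all assumptions made purely to illustrate the fill, overflow, transparent-read and post-processing steps described above.

```python
class WriteCacheModel:
    """Toy model of the proposed write cache; not real ZFS code."""

    def __init__(self, cache_capacity: int):
        self.cache_capacity = cache_capacity  # bytes available on the fast media
        self.cache = {}  # block id -> data, committed quickly, not yet processed
        self.final = {}  # block id -> data on the slow dedup+compressed storage

    def _cache_used(self) -> int:
        return sum(len(data) for data in self.cache.values())

    def write(self, block_id: str, data: bytes) -> str:
        """Store a block and return the tier that accepted it."""
        # Drop any stale staged copy so a rewrite never leaves old data behind.
        self.cache.pop(block_id, None)
        if self._cache_used() + len(data) <= self.cache_capacity:
            self.cache[block_id] = data  # fast path: fits in the write cache
            return "cache"
        # Overflow: the block streams straight into the slow final storage.
        self.final[block_id] = data
        return "final"

    def read(self, block_id: str) -> bytes:
        """Reads are transparent: a staged copy, if present, is the newest."""
        if block_id in self.cache:
            return self.cache[block_id]
        return self.final[block_id]

    def flush(self) -> int:
        """Background post-processing: move staged blocks to final storage."""
        count = len(self.cache)
        self.final.update(self.cache)
        self.cache.clear()
        return count
```

In this model a caller never needs to know which tier holds a block; `flush()` stands in for the background processing that would run while the machine is otherwise idle.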

Possible configuration might look like one of these:

  zfs set writecache=/dev/dsk/c4t0d0p3 pool/compdedup/DUMP
  zfs set writecache=pool/fastwrite pool/compdedup/DUMP
  zfs set writecache=fastpool/fastwrite pool/compdedup/DUMP

I guess it is okay to require a separate pool if the user wants to mirror "fastwrite" cache devices for reliability.

It may also be useful to allow specifying the writecache device's minimum block size (e.g. 4 KB for newer HDDs and many SSDs) in another attribute, like this:

  zfs set writecacheblocksize=4k pool/compdedup/DUMP

An alternative is to configure the write cache the way log/cache devices are currently configured for a pool (with writecache then being a per-dataset attribute like the current primarycache/secondarycache), but this would require that the cache device be dedicated to only one pool, and I think this would be a drawback.

I myself have several pools on one system and would like to speed them all up with one large, shared "fast-write media" cache.

Alternatively, though, it might make sense (in the future) to allow the L2ARC device to also serve as a fast-write cache, if the user chooses to do so. But for many users like myself, caching in a dedicated dataset of the same pool or on a different device would suffice.

//Jim Klimov

PS: Apparently, a bit more love and this would pave the way into HSM systems with differently-tiered devices ;)
