Fix for #6111 breaks recursive send when a snapshot is intentionally deleted
I posted this as a comment on #6111 some time ago, but it does not appear to have been noticed.
I only discovered this change when I updated to FreeBSD 11, and I have to say that it breaks behaviour I previously relied upon: datasets without the `snap2` snapshot used to be skipped (with a warning), but the send would still complete. An example of this behaviour is given in the official Oracle documentation at https://docs.oracle.com/cd/E26502_01/html/E29007/recover-3.html (see step 3).
With this change, the send is now aborted in its entirety, whereas previously it would complete and merely return a non-zero exit status.
I would argue that the issue described in that bug report was not a bug at all, but correct behaviour: the user was warned that a filesystem could not be sent with the requested snapshot range because the snapshots do not exist. Suppressing that warning (and returning a clean exit status) when the filesystem did not exist at the time the snapshot was created could be considered a convenience, but it also hides information that might be relevant.
Worse, the change now makes a legitimate workflow abort. Previously you could snapshot a system, delete the snapshots on the filesystems you did not want replicated, and the send would produce a stream without those filesystems, with only a warning. See the referenced Oracle documentation for this use case.
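For readers unfamiliar with this use case, it can be sketched roughly as follows, using the `-R -I` form from the #6111 example. This is a minimal sketch, not taken from the Oracle documentation: the function name `send_without`, the dataset names, and the `ZFS` override variable (which lets the commands be echoed instead of executed) are all hypothetical. Under the pre-#6111 behaviour described above, the final send skips the pruned datasets with a warning and exits non-zero.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helper): replicate a dataset tree while excluding
# some children by destroying their copy of the new snapshot before a
# recursive incremental send. Set ZFS="echo zfs" to print the commands
# instead of running them ($ZFS is deliberately left unquoted so that
# "echo zfs" splits into a command and an argument).
ZFS=${ZFS:-zfs}

# Usage: send_without ROOT FROMSNAP TOSNAP EXCLUDED_DATASET...
# The stream is written to stdout; stops at the first failing command.
send_without() {
    local root=$1 fromsnap=$2 tosnap=$3
    shift 3

    # 1. Snapshot the whole tree with the new snapshot name.
    $ZFS snapshot -r "${root}@${tosnap}" || return

    # 2. Remove the new snapshot from every dataset that should not be sent.
    local ds
    for ds in "$@"; do
        $ZFS destroy "${ds}@${tosnap}" || return
    done

    # 3. Recursive incremental send. The excluded datasets now lack TOSNAP,
    #    which is exactly the situation this report is about.
    $ZFS send -R -I "${root}@${fromsnap}" "${root}@${tosnap}"
}
```

A hypothetical invocation would be `send_without tank snap1 snap2 tank/scratch tank/tmp > /backup/tank.zstream`; with `ZFS="echo zfs"` it only prints the four commands it would run.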
I believe that #6111 should be reverted.
Updated by Duane Leslie over 2 years ago
Further experimentation with the issue described in #6111 suggests that relying on the creation time of the snapshot is a misleading and irrelevant distinction, since I can create the snapshots at any time and only their names determine which data is sent. Consider the example given in #6111:
    root@zfs:/# zfs create pool/test
    root@zfs:/# zfs snapshot pool/test@snap1
    root@zfs:/# zfs snapshot pool/test@snap2
    root@zfs:/# zfs create pool/test/child
    root@zfs:/# zfs send -R -I pool/test@snap1 pool/test@snap2 > /dev/null
    WARNING: could not send pool/test/child@snap2: does not exist
    WARNING: could not send pool/test/child@snap2: does not exist
Manually creating the `snap2` snapshot on the child after the child filesystem is created, but before the send, gives the following:
    root@zfs:/# zfs create pool/test
    root@zfs:/# zfs snapshot pool/test@snap1
    root@zfs:/# zfs snapshot pool/test@snap2
    root@zfs:/# zfs create pool/test/child
    root@zfs:/# zfs snapshot pool/test/child@snap2
    root@zfs:/# zfs send -R -I pool/test@snap1 pool/test@snap2 > /dev/null
    root@zfs:/# echo $?
    0
The send now succeeds without error. The creation time of the snapshot on the parent filesystem is therefore completely irrelevant, and it is a mistake to assume that it means something.
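To make the argument concrete, here is a deliberately simplified model of the name-based selection described above. It is not libzfs code: the `plan_send` function, its snapshot-table input format, and the rule it applies (a dataset under ROOT is included if and only if it has a snapshot named TOSNAP; otherwise it is skipped with a warning and the exit status becomes non-zero) are assumptions chosen to reproduce the two transcripts, ignoring creation times entirely. It also ignores FROMSNAP and does not mimic the real output exactly (the real transcript prints the warning twice).

```shell
#!/usr/bin/env bash
# Simplified model (hypothetical, not libzfs): decide which datasets a
# recursive "send -R -I ROOT@FROM ROOT@TO" would include, by name only.
# Input on stdin: one "dataset snapshot-name" pair per line; a dataset
# with no snapshots is listed as a bare dataset name.

# Usage: plan_send ROOT TOSNAP < snapshot-table
# Prints "send DATASET" on stdout, or a warning on stderr for each dataset
# lacking TOSNAP; returns 1 if any warning was printed. Requires bash 4+.
plan_send() {
    local root=$1 tosnap=$2
    local -A has_tosnap=()   # datasets that carry @TOSNAP
    local -A seen=()         # every dataset under ROOT in the table
    local ds snap status=0

    while read -r ds snap; do
        [ -n "$ds" ] || continue
        # Only ROOT and its descendants take part in a recursive send.
        case $ds in
            "$root" | "$root"/*) ;;
            *) continue ;;
        esac
        seen[$ds]=1
        if [ "$snap" = "$tosnap" ]; then
            has_tosnap[$ds]=1
        fi
    done

    # Visit datasets in byte order so parents come before their children.
    while read -r ds; do
        [ -n "$ds" ] || continue
        if [ -n "${has_tosnap[$ds]:-}" ]; then
            echo "send $ds"
        else
            echo "WARNING: could not send $ds@$tosnap: does not exist" >&2
            status=1
        fi
    done < <(printf '%s\n' "${!seen[@]}" | LC_ALL=C sort)

    return "$status"
}
```

Fed the first transcript's datasets (`pool/test` with `snap1` and `snap2`, plus a bare `pool/test/child`), the model sends only `pool/test` and warns about the child; adding a `pool/test/child snap2` line makes it send both and return 0, regardless of when that snapshot was taken.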