Bug #5273
open
Incremental zfs send unmounts underlying snapshots, causing applications that have files open in those snapshots to get I/O errors.
0%
Description
This issue was probably introduced by the fix for issue 3740 (https://www.illumos.org/issues/3740). Here is how to reproduce:

1. Suppose there's a zpool named zp01. In terminal window A, run the following commands:

    zfs create -o mountpoint=/zp01/src zp01/src
    echo abcdefghijklmnopqrstuvwxyz > /zp01/src/file
    zfs snapshot zp01/src@snap01
    zfs snapshot zp01/src@snap02
    zfs snapshot zp01/src@snap03
    zfs snapshot zp01/src@snap04
    ls -l /zp01/src/.zfs/snapshot/* > /dev/null
    zfs create zp01/dst

2. In terminal window B, run the following dtrace command:

    dtrace -n 'fbt:zfs:zfsvfs_teardown:entry
    /strstr(stringof(((struct zfsvfs *)arg0)->z_vfs->vfs_mntpt->rs_string), "snapshot") != 0/
    {
        printf("pid %d, execname %s, unmounting %d, %s, %Y",
            pid, execname != 0 ? execname : "NULL", arg1,
            stringof(((struct zfsvfs *)arg0)->z_vfs->vfs_mntpt->rs_string),
            walltimestamp);
        stack();
        ustack();
    }'

3. In terminal window C, compile the attached C file, and then run it to open and read /zp01/src/.zfs/snapshot/snap02/file:

    -> cat read-file.c
    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/types.h>
    #include <sys/stat.h>
    #include <fcntl.h>
    #include <unistd.h>

    int main(int argc, char *argv[])
    {
        char *filename = NULL;
        int fd;
        int ret;
        char buf[16];
        int seconds = 300;    /* default: keep reading for 300 seconds */

        if(argc >= 2) {
            filename = argv[1];
            if(argc >= 3)
                seconds = atoi(argv[2]);
        } else {
            printf("%s <file name>\n", argv[0]);
            exit(1);
        }

        fd = open(filename, O_RDONLY);
        if(fd == -1) {
            perror("open");
            exit(1);
        }

        /* Keep the file open and re-read the first 16 bytes once per second. */
        while(seconds--) {
            ret = pread(fd, buf, 16, 0);
            if(ret == -1) {
                perror("pread");
                break;
            }
            sleep(1);
        }

        ret = close(fd);
        if(ret == -1) {
            perror("close");
            exit(1);
        }

        return(0);
    }
    -> gcc -o read-file read-file.c
    -> ./read-file /zp01/src/.zfs/snapshot/snap02/file

4. Go to terminal window A and run the following commands:

    -> zfs send zp01/src@snap01 | zfs recv -e zp01/dst
    -> zfs send -I zp01/src@snap01 zp01/src@snap04 | zfs recv -e -F zp01/dst

You will see the following error messages in terminal window C:

    pread: I/O error
    close: I/O error

and in terminal window B, you will see something like the following:

    CPU     ID                    FUNCTION:NAME
      5  49994            zfsvfs_teardown:entry pid 6058, execname zfs, unmounting 1, /zp01/src/.zfs/snapshot/snap01, 2014 Oct 28 23:51:14
                  zfs`zfs_umount+0xfc
                  genunix`fsop_unmount+0x1b
                  genunix`dounmount+0x57
                  zfs`zfs_unmount_snap+0x63
                  zfs`dsl_dataset_user_release_impl+0xc3
                  zfs`dsl_dataset_user_release_tmp+0x1d
                  zfs`dsl_dataset_user_release_onexit+0x8b
                  zfs`zfs_onexit_destroy+0x43
                  zfs`zfs_ctldev_destroy+0x18
                  zfs`zfsdev_close+0x89
                  genunix`dev_close+0x31
                  specfs`device_close+0xd8
                  specfs`spec_close+0x17b
                  genunix`fop_close+0x61
                  genunix`closef+0x5e
                  genunix`closeandsetf+0x398
                  genunix`close+0x13
                  unix`_sys_sysenter_post_swapgs+0x149

                  libc.so.1`__close+0x15
                  libzfs.so.1`zfs_send+0x96a
                  zfs`zfs_do_send+0x598
                  zfs`main+0x22c
                  zfs`_start+0x83

      5  49994            zfsvfs_teardown:entry pid 6058, execname zfs, unmounting 1, /zp01/src/.zfs/snapshot/snap02, 2014 Oct 28 23:51:14
                  zfs`zfs_umount+0xfc
                  genunix`fsop_unmount+0x1b
                  genunix`dounmount+0x57
                  zfs`zfs_unmount_snap+0x63
                  zfs`dsl_dataset_user_release_impl+0xc3
                  zfs`dsl_dataset_user_release_tmp+0x1d
                  zfs`dsl_dataset_user_release_onexit+0x8b
                  zfs`zfs_onexit_destroy+0x43
                  zfs`zfs_ctldev_destroy+0x18
                  zfs`zfsdev_close+0x89
                  genunix`dev_close+0x31
                  specfs`device_close+0xd8
                  specfs`spec_close+0x17b
                  genunix`fop_close+0x61
                  genunix`closef+0x5e
                  genunix`closeandsetf+0x398
                  genunix`close+0x13
                  unix`_sys_sysenter_post_swapgs+0x149

                  libc.so.1`__close+0x15
                  libzfs.so.1`zfs_send+0x96a
                  zfs`zfs_do_send+0x598
                  zfs`main+0x22c
                  zfs`_start+0x83

      5  49994            zfsvfs_teardown:entry pid 6058, execname zfs, unmounting 1, /zp01/src/.zfs/snapshot/snap03, 2014 Oct 28 23:51:14
                  zfs`zfs_umount+0xfc
                  genunix`fsop_unmount+0x1b
                  genunix`dounmount+0x57
                  zfs`zfs_unmount_snap+0x63
                  zfs`dsl_dataset_user_release_impl+0xc3
                  zfs`dsl_dataset_user_release_tmp+0x1d
                  zfs`dsl_dataset_user_release_onexit+0x8b
                  zfs`zfs_onexit_destroy+0x43
                  zfs`zfs_ctldev_destroy+0x18
                  zfs`zfsdev_close+0x89
                  genunix`dev_close+0x31
                  specfs`device_close+0xd8
                  specfs`spec_close+0x17b
                  genunix`fop_close+0x61
                  genunix`closef+0x5e
                  genunix`closeandsetf+0x398
                  genunix`close+0x13
                  unix`_sys_sysenter_post_swapgs+0x149

                  libc.so.1`__close+0x15
                  libzfs.so.1`zfs_send+0x96a
                  zfs`zfs_do_send+0x598
                  zfs`main+0x22c
                  zfs`_start+0x83

      5  49994            zfsvfs_teardown:entry pid 6058, execname zfs, unmounting 1, /zp01/src/.zfs/snapshot/snap04, 2014 Oct 28 23:51:14
                  zfs`zfs_umount+0xfc
                  genunix`fsop_unmount+0x1b
                  genunix`dounmount+0x57
                  zfs`zfs_unmount_snap+0x63
                  zfs`dsl_dataset_user_release_impl+0xc3
                  zfs`dsl_dataset_user_release_tmp+0x1d
                  zfs`dsl_dataset_user_release_onexit+0x8b
                  zfs`zfs_onexit_destroy+0x43
                  zfs`zfs_ctldev_destroy+0x18
                  zfs`zfsdev_close+0x89
                  genunix`dev_close+0x31
                  specfs`device_close+0xd8
                  specfs`spec_close+0x17b
                  genunix`fop_close+0x61
                  genunix`closef+0x5e
                  genunix`closeandsetf+0x398
                  genunix`close+0x13
                  unix`_sys_sysenter_post_swapgs+0x149

                  libc.so.1`__close+0x15
                  libzfs.so.1`zfs_send+0x96a
                  zfs`zfs_do_send+0x598
                  zfs`main+0x22c
                  zfs`_start+0x83
Files
Updated by Jan Schlien almost 7 years ago
Any progress on this?
I'm experiencing the same problem. My application traverses a snapshot and suffers from EIO errors. Shouldn't an open file within the snapshot prevent unmounting (EBUSY)? Why does zfsctl_snapdir_lookup() do VFS_RELE()? I understand the nfs related comment on snapdirs, but that does not imply the kernel cannot hold a reference on the snapshot.
As a workaround, is it possible to manually hold and release a reference like VFS_HOLD/VFS_RELE do from user mode for the duration of a custom application's snapshot operations?
Being able to remount the snapshot right away is no solution. It would be quite a mess to deal with intermittent EIOs in all applications just because zfs decides to force-unmount snapshots once in a while.
Updated by Matthew Ahrens over 6 years ago
In this use case, the unmounting is not really necessary. The issue here is:
- "zfs send -I" (and "zfs send -R") put holds (a la "zfs hold") on the snapshots that they send
- they also release the holds. When the hold is released, we unmount the snapshot because we might need to destroy the snapshot (if it was marked for defer destroy, and this is the last hold).
As you can see in the stack trace, the unmount happens in dsl_dataset_user_release_impl(). We do the unmount even when it isn't necessary (e.g. when the snapshot is not marked for defer destroy) because we don't have locking to ensure that it isn't marked defer destroy after we make a check and determine that we don't need to unmount it.
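To make the behaviour described above concrete, here is a minimal toy model in C. It is only a sketch of the logic as explained in this comment, not the actual ZFS code: `struct toy_snap` and `toy_release_current()` are hypothetical names invented for illustration, and real holds, mounts, and destroys involve far more state.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-in for a snapshot (not a real ZFS struct). */
struct toy_snap {
    int  holds;          /* user holds, e.g. from "zfs hold" or "zfs send -I" */
    bool defer_destroy;  /* snapshot is marked for deferred destroy */
    bool mounted;        /* snapshot is mounted under .zfs/snapshot */
    bool destroyed;      /* snapshot has been destroyed */
};

/*
 * Models the current behaviour: the snapshot is unmounted on every release,
 * because the defer-destroy flag cannot be checked safely without a lock.
 */
static void toy_release_current(struct toy_snap *s)
{
    assert(s->holds > 0);

    s->mounted = false;              /* unconditional unmount */
    s->holds--;
    if (s->holds == 0 && s->defer_destroy)
        s->destroyed = true;         /* last hold gone: deferred destroy runs */
}
```

In this model, releasing the only hold on a mounted snapshot that is not marked for deferred destroy still leaves it unmounted, which corresponds to the I/O errors seen by the reader in terminal window C.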
It might be possible to add some sort of locking between checking if we should unmount, and doing the defer destroy. But there's no obvious way to do this today. (It would be great if we could move the unmount into syncing context where we hold the dp_config_rwlock, but that doesn't work today.)
Another "fix" would be to accept the racy-ness and do the unmount only when we think we need to destroy the snapshot, and if we were wrong, then the release will fail with EBUSY.
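The racy alternative can be sketched with the same kind of toy model. Again, this is an illustration under assumptions rather than a proposed patch: every name here (`toy_snap`, `toy_should_unmount()`, `toy_release_racy()`, `toy_mark_defer_destroy()`) is hypothetical, and the `race` callback simply stands in for another thread changing the snapshot between the check and the release.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a snapshot (not a real ZFS struct). */
struct toy_snap {
    int  holds;          /* user holds on the snapshot */
    bool defer_destroy;  /* marked for deferred destroy */
    bool mounted;        /* mounted under .zfs/snapshot */
    bool destroyed;      /* snapshot has been destroyed */
};

/* Unlocked check: do we think this release will destroy the snapshot? */
static bool toy_should_unmount(const struct toy_snap *s)
{
    return s->defer_destroy && s->holds == 1;
}

/* Stand-in for a concurrent "mark for deferred destroy" operation. */
static void toy_mark_defer_destroy(struct toy_snap *s)
{
    s->defer_destroy = true;
}

/*
 * Racy release: unmount only when the unlocked check says so. If the state
 * changed in between and a destroy is now required while the snapshot is
 * still mounted, fail with EBUSY and leave the hold in place.
 */
static int toy_release_racy(struct toy_snap *s, void (*race)(struct toy_snap *))
{
    assert(s->holds > 0);

    if (toy_should_unmount(s))
        s->mounted = false;

    if (race != NULL)
        race(s);                     /* window in which the guess can go stale */

    if (s->holds == 1 && s->defer_destroy) {
        if (s->mounted)
            return EBUSY;            /* guessed wrong: release fails */
        s->holds = 0;
        s->destroyed = true;
        return 0;
    }

    s->holds--;
    return 0;
}
```

With no concurrent change, an ordinary snapshot keeps its mount across the release, so open files stay readable. If the snapshot is marked for deferred destroy inside the window, the release returns EBUSY instead of destroying a mounted snapshot, and the caller would have to retry.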