Bug #4588

zfs send blocked until device of zpool is taken offline

Added by Franz Schober over 6 years ago.

Status:
New
Priority:
High
Assignee:
-
Category:
zfs - Zettabyte File System
Start date:
2014-02-10
Due date:
% Done:

0%

Estimated time:
Difficulty:
Medium
Tags:
needs-triage
Gerrit CR:

Description

A zfs send blocks, and an FC device hangs, when a COMSTAR target provided by the same host is in use.

To build a synchronous mirror on iSCSI/FC devices, we import an FC device from a remote system together with a second device provided by the same system,
as described in #4454.
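For reference, the layered setup can be sketched roughly as follows (a dry run that only prints the commands; the zvol name and LU GUIDs are taken from this report, but the size and any options are assumptions, not the reporter's exact invocation):

```shell
#!/bin/sh
# Dry-run sketch: print each command instead of executing it.
run() { echo "+ $*"; }

# Export a zvol from the local pool norddisk over FC via COMSTAR:
run zfs create -V 31t norddisk/nord1_nordp_vol1
run stmfadm create-lu /dev/zvol/rdsk/norddisk/nord1_nordp_vol1
run stmfadm add-view 600144F09082C029000052CC68420010

# Mirror the looped-back LU with the LU imported from the remote system:
run zpool create nordp mirror \
    c0t600144F09082C029000052CC68420010d0 \
    c0t600144F09082C021000052CC68740004d0
```

The key point is that the first vdev of nordp is backed, through COMSTAR, by a zvol on the same host.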

  pool: nordp
 state: ONLINE
  scan: resilvered 1,50K in 0h0m with 0 errors on Fri Feb  7 22:04:18 2014
config:

        NAME                                       STATE     READ WRITE CKSUM
        nordp                                      ONLINE       0     0     0
          mirror-0                                 ONLINE       0     0     0
            c0t600144F09082C029000052CC68420010d0  ONLINE       0     0     0
            c0t600144F09082C021000052CC68740004d0  ONLINE       0     0     0

stmfadm list-lu -v
LU Name: 600144F09082C029000052CC68420010
    Operational Status: Online
    Provider Name     : sbd
    Alias             : /dev/zvol/rdsk/norddisk/nord1_nordp_vol1
    View Entry Count  : 1
    Data File         : /dev/zvol/rdsk/norddisk/nord1_nordp_vol1
    Meta File         : not set
    Size              : 34634616274944
    Block Size        : 512
    Management URL    : not set
    Vendor ID         : SUN
    Product ID        : COMSTAR
    Serial Num        : not set
    Write Protect     : Disabled
    Writeback Cache   : Enabled
    Access State      : Active
....

zfs list

norddisk                   63,4T   794G  63,9K  /norddisk
....
norddisk/nord1_nordp_vol1  31,6T  31,1T  1,23T  -
....

When running a zfs send from a dataset of the pool to another backup system, the send transfers most of the data and then blocks for a while, until zfs
takes the device c0t600144F09082C029000052CC68420010d0 offline in the pool nordp.
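The blocked send is of the usual snapshot-replication shape; a hypothetical example (dataset and snapshot names are placeholders, not taken from the report, and again printed as a dry run):

```shell
#!/bin/sh
# Dry-run sketch of an incremental send to a backup host.
show() { echo "+ $*"; }

show zfs snapshot nordp/data@now
show "zfs send -i nordp/data@prev nordp/data@now | ssh backuphost zfs recv -F tank/nordp"
```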

dmesg:
Feb  8 09:34:13 00-25-90-82-c0-0c scsi: [ID 243001 kern.warning] WARNING: /scsi_vhci (scsi_vhci0):
Feb  8 09:34:13 00-25-90-82-c0-0c       /scsi_vhci/disk@g600144f09082c021000052cc68740004 (sd2): Command Timeout on path fp4/disk@w10000090fa454d51,0
Feb  8 09:34:17 00-25-90-82-c0-0c scsi: [ID 243001 kern.warning] WARNING: /scsi_vhci (scsi_vhci0):
Feb  8 09:34:17 00-25-90-82-c0-0c       /scsi_vhci/disk@g600144f09082c021000052cc68740004 (sd2): Command Timeout on path fp0/disk@w10000090fa454abd,0
Feb  8 09:34:42 00-25-90-82-c0-0c scsi: [ID 107833 kern.warning] WARNING: /scsi_vhci/disk@g600144f09082c021000052cc68740004 (sd2):
Feb  8 09:34:42 00-25-90-82-c0-0c       SYNCHRONIZE CACHE command failed (5)
Feb  8 09:35:10 00-25-90-82-c0-0c scsi: [ID 107833 kern.warning] WARNING: /scsi_vhci/disk@g600144f09082c021000052cc68740004 (sd2):
Feb  8 09:35:10 00-25-90-82-c0-0c       SYNCHRONIZE CACHE command failed (5)
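The warnings above can be counted directly from the system log; a small sketch (/var/adm/messages is the usual illumos location, passed here as an optional argument):

```shell
#!/bin/sh
# Count mpxio command timeouts and failed cache flushes in the system log.
log=${1:-/var/adm/messages}
[ -r "$log" ] || { echo 0; exit 0; }   # log not readable on this host
grep -Ec 'Command Timeout on path|SYNCHRONIZE CACHE command failed' "$log"
```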

Stack of the blocked zfs send command:

[ ffffff01e82d97c0 _resume_from_idle+0xf4() ]
  ffffff01e82d97f0 swtch+0x141()
  ffffff01e82d9830 cv_wait+0x70(ffffff43280adbba, ffffff43280adb78)
  ffffff01e82d9880 txg_wait_synced+0x83(ffffff43280ada00, 7d1a3)
  ffffff01e82d9990 dsl_sync_task+0x187(ffffff43a3395080, fffffffff7e8af10, fffffffff7e8b180, ffffff01e82d9ab0, 10)
  ffffff01e82d9b30 dsl_dataset_user_release_impl+0x1b2(ffffff445d79b6c0, 0, ffffff43280ada00)
  ffffff01e82d9b50 dsl_dataset_user_release_tmp+0x1d(ffffff43280ada00, ffffff445d79b6c0)
  ffffff01e82d9b90 dsl_dataset_user_release_onexit+0x8b(ffffff43859d77c0)
  ffffff01e82d9bd0 zfs_onexit_destroy+0x43(ffffff43d76c36f8)
  ffffff01e82d9c00 zfs_ctldev_destroy+0x18(ffffff43d76c36f8, 5c)
  ffffff01e82d9c60 zfsdev_close+0x89(5a0000005c, 403, 2, ffffff43b2f9fd80)
  ffffff01e82d9c90 dev_close+0x31(5a0000005c, 403, 2, ffffff43b2f9fd80)
  ffffff01e82d9ce0 device_close+0xd8(ffffff449b978b40, 403, ffffff43b2f9fd80)
  ffffff01e82d9d70 spec_close+0x17b(ffffff449b978b40, 403, 1, 0, ffffff43b2f9fd80, 0)
  ffffff01e82d9df0 fop_close+0x61(ffffff449b978b40, 403, 1, 0, ffffff43b2f9fd80, 0)
  ffffff01e82d9e30 closef+0x5e(ffffff44951c00c8)
  ffffff01e82d9ea0 closeandsetf+0x398(e, 0)
  ffffff01e82d9ec0 close+0x13(e)
  ffffff01e82d9f10 _sys_sysenter_post_swapgs+0x149()
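A kernel stack like the one above can be gathered with the illumos kernel debugger; a sketch that prints the mdb pipeline for a given pid (the pid is a placeholder, and the command must be fed to mdb -k as root, so it is only printed here):

```shell
#!/bin/sh
# Print the mdb -k pipeline that walks the threads of a process and dumps
# their kernel stacks (::pid2proc, ::walk thread, ::findstack are mdb dcmds).
pid=${1:-12345}   # pid of the blocked zfs send; 12345 is a placeholder
echo "0t${pid}::pid2proc | ::walk thread | ::findstack -v"   # pipe into: mdb -k
```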

After the device is taken offline in the pool nordp, the zfs send completes, but the mirror is broken and has to be resilvered.
Might this issue be related to the changes in #3740 and #4322?
