Bug #12036 (closed)
iscsi: zpool create pool fails to create label
100%
Description
I have 2 LUNs:
AVAILABLE DISK SELECTIONS:
       0. c0t600144F07A6B316A00005DCFD2FD0001d0 <SUN-COMSTAR-1.0-10.00GB>
          /scsi_vhci/disk@g600144f07a6b316a00005dcfd2fd0001
       1. c0t600144F07A6B316A00005DDD73FA0001d0 <SUN-COMSTAR-1.0-10.00GB>
          /scsi_vhci/disk@g600144f07a6b316a00005ddd73fa0001
Attempting to create pool:
root@test:/home/tsoome# zpool create tank1 c0t600144F07A6B316A00005DDD73FA0001d0
cannot label 'c0t600144F07A6B316A00005DDD73FA0001d0': try using fdisk(1M) and then provide a specific slice
root@test:/home/tsoome#
This command also resulted in log spam:
Nov 26 20:54:47 test scsi: [ID 107833 kern.warning] WARNING: /scsi_vhci/disk@g600144f07a6b316a00005ddd73fa0001 (sd18):
Nov 26 20:54:47 test     SYNCHRONIZE CACHE command failed (5)
The LUNs are created as:
root@beastie:/code/illumos-gate# stmfadm list-lu -v
LU Name: 600144F07A6B316A00005DCFD2FD0001
    Operational Status: Online
    Provider Name     : sbd
    Alias             : /dev/zvol/dsk/rpool/iscsi/disk1
    View Entry Count  : 1
    Data File         : /dev/zvol/dsk/rpool/iscsi/disk1
    Meta File         : not set
    Size              : 10737418240
    Block Size        : 4096
    Management URL    : not set
    Vendor ID         : SUN
    Product ID        : COMSTAR
    Serial Num        : not set
    Write Protect     : Disabled
    Writeback Cache   : Disabled
    Access State      : Active
LU Name: 600144F07A6B316A00005DDD73FA0001
    Operational Status: Online
    Provider Name     : sbd
    Alias             : /dev/zvol/dsk/rpool/iscsi/disk2
    View Entry Count  : 1
    Data File         : /dev/zvol/dsk/rpool/iscsi/disk2
    Meta File         : not set
    Size              : 10737418240
    Block Size        : 512
    Management URL    : not set
    Vendor ID         : SUN
    Product ID        : COMSTAR
    Serial Num        : not set
    Write Protect     : Disabled
    Writeback Cache   : Disabled
    Access State      : Active
root@beastie:/code/illumos-gate#
However, the label actually was created:
format> ver
Volume name = < >
ascii name = <SUN-COMSTAR-1.0-10.00GB>
bytes/sector = 512
sectors = 20971520
accessible sectors = 20971486
Part      Tag    Flag     First Sector        Size     Last Sector
  0        usr    wm               256      9.99GB        20955102
  1 unassigned    wm                 0           0               0
  2 unassigned    wm                 0           0               0
  3 unassigned    wm                 0           0               0
  4 unassigned    wm                 0           0               0
  5 unassigned    wm                 0           0               0
  6 unassigned    wm                 0           0               0
  8   reserved    wm          20955103      8.00MB        20971486
format>
Updated by Jerry Jelinek over 3 years ago
I wanted to see if the recent upstreaming of the Nexenta iscsi/comstar fixes was causing this problem.
I tried to reproduce this failure but so far I cannot.
I first used two VMs running SmartOS on a platform without the recent Nexenta iscsi/comstar changes. I was able to create the iscsi target using a zvol and I was able to access that target on the initiator to create a zpool. Here are the commands I ran (which will need to be modified to use the correct LUN, iqn name and IP address on each run).
iscsi target machine
---------------------------
# zfs create -V 1g zones/idsk
# svcadm enable system/stmf
# svcadm enable network/iscsi/target
# stmfadm create-lu /dev/zvol/rdsk/zones/idsk
Logical unit created: 600144F0D9D1DA6400005DDD95120001
# itadm create-target
Target iqn.2010-08.org.illumos:02:d8ff8cac-e751-c1ea-81a4-d6297ad4e1a9 successfully created
# stmfadm add-view 600144F0D9D1DA6400005DDD95120001

iscsi initiator machine
-----------------------------
# iscsiadm add static-config iqn.2010-08.org.illumos:02:d8ff8cac-e751-c1ea-81a4-d6297ad4e1a9,10.88.88.133
# iscsiadm modify discovery --static enable
# iscsiadm list target -S
Target: iqn.2010-08.org.illumos:02:d8ff8cac-e751-c1ea-81a4-d6297ad4e1a9
        Alias: -
        TPGT: 1
        ISID: 4000002a0000
        Connections: 1
        LUN: 0
             Vendor:  SUN
             Product: COMSTAR
             OS Device Name: /dev/rdsk/c0t600144F0D9D1DA6400005DDD95120001d0s2
# zpool create foo c0t600144F0D9D1DA6400005DDD95120001d0
# zpool list foo
NAME   SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
foo    960M   140K   960M        -         -     0%     0%  1.00x  ONLINE  -
I then rebooted the initiator (still on the pre-Nexenta changes platform), rebooted the target to a post-Nexenta changes platform, deleted the zvol (to ensure there was no preexisting label), and reran the above steps. This still worked fine.
Finally, I rebooted the initiator onto a post-Nexenta changes platform, rebooted the target (still on the post-Nexenta changes platform), deleted the zvol and reran the above steps again. This still worked fine.
It is easy to confirm whether the system is running the Nexenta iscsi/comstar code with the following command:
# mdb -ke 'door_max_upcall_reply/D'
This will show 1MB for a platform without the Nexenta iscsi/comstar code and 4MB for a platform with the Nexenta code.
At this point I don't think there is any problem with the Nexenta iscsi/comstar changes interacting badly with zvols. I am not sure what else is different in the platform that was tested which might cause a problem.
Updated by Igor Kozhukhov over 3 years ago
Try creating the LUN with:
stmfadm create-lu /dev/zvol/rdsk/zones/idsk -pwcd=true
and then check it on the initiator side.
In Toomas's example, the LU has 'Writeback Cache : Disabled'.
The issue is: if you created the LUN with an older version of stmf, its metadata was saved to the zvol and the writeback cache property can be 'disabled'. If you then operate on that older LUN with a newer stmf, this property behaves like 'write protect' (or otherwise incorrectly) and you cannot update it to 'enabled'.
Updated by Jerry Jelinek over 3 years ago
One thing I do notice in the commands I used vs. what appears to be in the commands used on the failing sequence is the following.
I have:
# stmfadm list-lu -v
LU Name: 600144F0D9D1DA6400005DDD95120001
...
    Data File         : /dev/zvol/rdsk/zones/idsk
That is, I configured the raw zvol.
The original bug report shows:
# stmfadm list-lu -v
...
    Data File         : /dev/zvol/dsk/rpool/iscsi/disk1
The target is using the block device. Normally writing a label will require a raw device, so that may be the problem here. I'll run another test to see if we could previously write a label onto an iscsi target configured with a block device prior to the recent Nexenta changes.
Updated by Jerry Jelinek over 3 years ago
I first tried defining the iscsi target using the block device, but that did not have any effect and I was still able to write the label on the initiator and create a zpool.
I then rebooted everything fresh again and tried what Igor suggested. This was the one difference from the steps I originally used:
# stmfadm create-lu -p wcd=true /dev/zvol/rdsk/zones/idsk
However, setting wcd=true did not make any difference and I am still able to label the disk and create the zpool on the initiator. Looking at the original LU output, it looks like that has the default "Writeback Cache : Disabled" setting. In my testing, I have tried it both ways but it doesn't cause a problem either way.
Updated by Igor Kozhukhov over 3 years ago
Could you try creating a small ramdisk and creating a LUN on it?
ramdiskadm -a rd1g1 1g
stmfadm create-lu -pwcd=true /dev/ramdisk/rd1g1
stmfadm add-view <LUNID>
In my case, the LUN works fine without '-pwcd=true'.
If I use this property, I am not able to create a zpool on this target from the initiator side.
Updated by Toomas Soome over 3 years ago
Jerry Jelinek wrote:
I wanted to see if the recent upstreaming of the Nexenta iscsi/comstar fixes was causing this problem.
I tried to reproduce this failure but so far I cannot.
I first used two VMs running SmartOS on a platform without the recent Nexenta iscsi/comstar changes. I was able to create the iscsi target using a zvol and I was able to access that target on the initiator to create a zpool. Here is the commands I ran (which will need to be modified to use the correct LUN, iqn name and IP address on each run).
[...]
I then rebooted the initiator (still on the pre-Nexenta changes platform), rebooted the target to a post-Nexenta changes platform, deleted the zvol (to ensure there was no preexisting label), and reran the above steps. This still worked fine.
Finally, I rebooted the initiator onto a post-Nexenta changes platform, rebooted the target (still on the post-Nexenta changes platform), deleted the zvol and reran the above steps again. This still worked fine.
It is easy to confirm if the system is running the Nexenta iscsi/comstar code with the following command
[...]
This will show 1MB for a platform without the Nexenta iscsi/comstar code and 4MB for a platform with the Nexenta code.
At this point I don't think there is any problem with the Nexenta iscsi/comstar changes interacting badly with zvols. I am not sure what else is different in the platform that was tested which might cause a problem.
server:
$ sudo mdb -ke 'door_max_upcall_reply/D'
door_max_upcall_reply:
door_max_upcall_reply:          4194304
and client:
root@test:/home/tsoome# mdb -ke 'door_max_upcall_reply/D'
door_max_upcall_reply:
door_max_upcall_reply:          4194304
root@test:/home/tsoome#
However, with dtrace on the server side:
$ sudo dtrace -n ':::sbd_flush_data_cache: {stack(); printf("value: %llx\n", arg1);}'
dtrace: description ':::sbd_flush_data_cache: ' matched 2 probes
CPU     ID                    FUNCTION:NAME
  3  68641       sbd_flush_data_cache:entry
              stmf_sbd`sbd_handle_sync_cache+0xb5
              stmf_sbd`sbd_new_task+0x741
              stmf`stmf_worker_task+0x370
              unix`thread_start+0xb
value: 0
  3  68642      sbd_flush_data_cache:return
              stmf_sbd`sbd_handle_sync_cache+0xb5
              stmf_sbd`sbd_new_task+0x741
              stmf`stmf_worker_task+0x370
              unix`thread_start+0xb
value: 3000000700000000
and this error code is:
#define SBD_FILEIO_FAILURE (SBD_FAILURE | STMF_FSC(7))
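The value dtrace printed on return, 3000000700000000, matches that definition if the failure class sits in the top bits and the function-specific code is shifted left by 32. A minimal C sketch of the arithmetic follows. Only SBD_FILEIO_FAILURE is quoted in this report; the other macro values are assumed to mirror the STMF headers, and fsc_of() is a hypothetical helper that does not exist in illumos.

```c
#include <stdint.h>

/*
 * ASSUMPTION: these values are believed to mirror the illumos STMF/sbd
 * headers. Only the SBD_FILEIO_FAILURE line is quoted in this report.
 */
#define	STMF_LU_FAILURE		((uint64_t)0x3000000000000000)
#define	STMF_FSC(x)		(((uint64_t)(x)) << 32)
#define	SBD_FAILURE		STMF_LU_FAILURE
#define	SBD_FILEIO_FAILURE	(SBD_FAILURE | STMF_FSC(7))

/*
 * Hypothetical helper (not an illumos function): recover the
 * function-specific code, assuming it lives in the bits directly
 * above bit 32 and below the top class nibble.
 */
unsigned int
fsc_of(uint64_t status)
{
	return ((unsigned int)((status >> 32) & 0x0FFFFFFFu));
}

/* Hypothetical helper: does this status equal the file I/O failure? */
int
is_fileio_failure(uint64_t status)
{
	return (status == SBD_FILEIO_FAILURE);
}
```

Under those assumptions, passing the traced return value 0x3000000700000000 to is_fileio_failure() yields true, and fsc_of() extracts the 7 from STMF_FSC(7).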
OK, I know what is wrong there. In sbd_flush_data_cache(), when we enter with fsync_done == 1, we can end up returning an uninitialized ret. The '11820 upstream Nexenta iSCSI and COMSTAR fixes' change did touch this function, and it seems the behavior of ret was not properly reviewed. While there, the variable r is also a bit dubious.
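To illustrate the class of bug described here (this is not the actual illumos code), the following C sketch shows a flush routine whose return variable is only assigned on the path that does real work. All names in it (sketch_flush_cache, the SKETCH_* constants, the callbacks) are hypothetical. The sketch shows the safe form, with ret initialized up front; the comment marks where the uninitialized variant would go wrong.

```c
#include <stdint.h>

#define	SKETCH_SUCCESS	((uint64_t)0)
/* Same numeric value as the status seen in the dtrace output above. */
#define	SKETCH_FAILURE	((uint64_t)0x3000000700000000)

/*
 * Hypothetical stand-in for a cache flush routine. backend_flush is a
 * hypothetical callback that returns 0 on success.
 */
uint64_t
sketch_flush_cache(int fsync_done, int (*backend_flush)(void))
{
	/*
	 * Initializing ret is the whole point: with a bare
	 * "uint64_t ret;" the fsync_done path below would skip every
	 * assignment and return whatever happened to be on the stack.
	 */
	uint64_t ret = SKETCH_SUCCESS;

	if (!fsync_done) {
		/* Only this path ever assigns a failure status. */
		if (backend_flush() != 0)
			ret = SKETCH_FAILURE;
	}

	return (ret);
}

/* Hypothetical backends used to exercise both outcomes. */
int sketch_flush_ok(void) { return (0); }
int sketch_flush_fail(void) { return (-1); }
```

With the initializer in place, a call with fsync_done set returns success without invoking the backend, while a failing backend is still reported when fsync_done is clear.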
Also, yes, there seems to be a difference in how character and block zvols are handled; I remapped my LUNs to rdsk and enabled the write cache, and zpool create no longer fails.
Updated by Jerry Jelinek over 3 years ago
I also need to add a clarification to one of my previous comments. When I said that the zpool creation worked when I used the block device on the iscsi target, I had inadvertently booted that machine back to the pre-Nexenta code. I re-ran that sequence today, making sure that the iscsi target machine was booted on the new Nexenta code and using the block device to define the target, and I am able to reproduce the failure on the initiator when I try to create the zpool. This appears to have nothing to do with the 'wcd' option but is a regression when defining a target using a block device.
This is the target configuration that fails the zfs labeling:
# stmfadm list-lu -v
LU Name: 600144F0D9D1DA6400005DDE711E0001
...
    Data File         : /dev/zvol/dsk/zones/idsk
Just to confirm everything else, I re-ran the test sequence using the raw device on the target machine and also using the 'wcd' option and I have no problem creating the zpool on that target, so it is definitely the raw vs. block device in the target definition that is the regression. However, I do also see the additional problem where the 'wcd' argument is not being applied correctly.
Updated by Jerry Jelinek over 3 years ago
- Assignee set to Jerry Jelinek
I'll take this bug and see if I can fix the two regressions.
Updated by Jerry Jelinek over 3 years ago
I was trying to track down why the labeling no longer worked with a block device and I saw a lot more STMF_SAA_WRITE_ERROR and STMF_SAA_POR errors than on the old code. After looking at this for a while, it seemed like it could all be due to the bug in sbd_flush_data_cache that Toomas found where we can return garbage on the stack.
I made just that one fix and tested on a new build and now I am able to create a zpool on a target defined on a block device. It seems likely that the error returns from sbd_flush_data_cache were causing the consistent failures here. I'm going to do further testing on this before I get a code review out.
Updated by Jerry Jelinek over 3 years ago
For testing I have created an iscsi target using both a raw and block device zvol and confirmed that creating the zpool on the initiator works for both cases.
I have also tested both 'wcd' true/false settings on both the 'stmfadm create-lu' and 'stmfadm modify-lu' commands. I have confirmed that the behavior works as it did before the new Nexenta iscsi/comstar code. That is, 'wcd' can be set either way when using a raw device, but cannot be set to false when the iscsi target is a block device.
Updated by Electric Monk over 3 years ago
- Status changed from New to Closed
- % Done changed from 90 to 100
git commit ec38b3f5fb92a39a55897ec0e3e4eeb0e84fefcc
commit ec38b3f5fb92a39a55897ec0e3e4eeb0e84fefcc
Author: Jerry Jelinek <jerry.jelinek@joyent.com>
Date: 2019-12-02T18:06:34.000Z

    12036 iscsi: zpool create pool fails to create label
    Reviewed by: Rick McNeal <rick.mcneal@nexenta.com>
    Reviewed by: C Fraire <cfraire@me.com>
    Reviewed by: Toomas Soome <tsoome@me.com>
    Reviewed by: John Levon <john.levon@joyent.com>
    Reviewed by: Kody Kantor <kody.kantor@joyent.com>
    Approved by: Dan McDonald <danmcd@joyent.com>