Bug #12036

iscsi: zpool create pool fails to create label

Added by Toomas Soome 11 months ago. Updated 11 months ago.

Status: Closed
Priority: Normal
Assignee:
Category: comstar - iSCSI/FC/SAS target
Start date:
Due date:
% Done: 100%
Estimated time:
Difficulty: Medium
Tags:
Gerrit CR:

Description

I have 2 luns:

AVAILABLE DISK SELECTIONS:
       0. c0t600144F07A6B316A00005DCFD2FD0001d0 <SUN-COMSTAR-1.0-10.00GB>
          /scsi_vhci/disk@g600144f07a6b316a00005dcfd2fd0001
       1. c0t600144F07A6B316A00005DDD73FA0001d0 <SUN-COMSTAR-1.0-10.00GB>
          /scsi_vhci/disk@g600144f07a6b316a00005ddd73fa0001

Attempting to create pool:

root@test:/home/tsoome# zpool create tank1 c0t600144F07A6B316A00005DDD73FA0001d0
cannot label 'c0t600144F07A6B316A00005DDD73FA0001d0': try using fdisk(1M) and then provide a specific slice
root@test:/home/tsoome# 

This command also produced log spam:

Nov 26 20:54:47 test scsi: [ID 107833 kern.warning] WARNING: /scsi_vhci/disk@g600144f07a6b316a00005ddd73fa0001 (sd18):
Nov 26 20:54:47 test    SYNCHRONIZE CACHE command failed (5)

The lun is created as:

root@beastie:/code/illumos-gate# stmfadm list-lu -v
LU Name: 600144F07A6B316A00005DCFD2FD0001
    Operational Status: Online
    Provider Name     : sbd
    Alias             : /dev/zvol/dsk/rpool/iscsi/disk1
    View Entry Count  : 1
    Data File         : /dev/zvol/dsk/rpool/iscsi/disk1
    Meta File         : not set
    Size              : 10737418240
    Block Size        : 4096
    Management URL    : not set
    Vendor ID         : SUN     
    Product ID        : COMSTAR         
    Serial Num        : not set
    Write Protect     : Disabled
    Writeback Cache   : Disabled
    Access State      : Active
LU Name: 600144F07A6B316A00005DDD73FA0001
    Operational Status: Online
    Provider Name     : sbd
    Alias             : /dev/zvol/dsk/rpool/iscsi/disk2
    View Entry Count  : 1
    Data File         : /dev/zvol/dsk/rpool/iscsi/disk2
    Meta File         : not set
    Size              : 10737418240
    Block Size        : 512
    Management URL    : not set
    Vendor ID         : SUN     
    Product ID        : COMSTAR         
    Serial Num        : not set
    Write Protect     : Disabled
    Writeback Cache   : Disabled
    Access State      : Active
root@beastie:/code/illumos-gate# 

However, the label was actually created:

format> ver

Volume name = <        >
ascii name  = <SUN-COMSTAR-1.0-10.00GB>
bytes/sector    =  512
sectors = 20971520
accessible sectors = 20971486
Part      Tag    Flag     First Sector        Size        Last Sector
  0        usr    wm               256       9.99GB         20955102    
  1 unassigned    wm                 0          0              0    
  2 unassigned    wm                 0          0              0    
  3 unassigned    wm                 0          0              0    
  4 unassigned    wm                 0          0              0    
  5 unassigned    wm                 0          0              0    
  6 unassigned    wm                 0          0              0    
  8   reserved    wm          20955103       8.00MB         20971486    

format> 

#1

Updated by Toomas Soome 11 months ago

  • Description updated

#2

Updated by Jerry Jelinek 11 months ago

I wanted to see if the recent upstreaming of the Nexenta iscsi/comstar fixes was causing this problem.

I tried to reproduce this failure but so far I cannot.

I first used two VMs running SmartOS on a platform without the recent Nexenta iscsi/comstar changes. I was able to create the iscsi target using a zvol and I was able to access that target on the initiator to create a zpool. Here are the commands I ran (which will need to be modified to use the correct LUN, iqn name and IP address on each run).

iscsi target machine
---------------------------
# zfs create -V 1g zones/idsk
# svcadm enable system/stmf
# svcadm enable network/iscsi/target
# stmfadm create-lu /dev/zvol/rdsk/zones/idsk
Logical unit created: 600144F0D9D1DA6400005DDD95120001
# itadm create-target
Target iqn.2010-08.org.illumos:02:d8ff8cac-e751-c1ea-81a4-d6297ad4e1a9 successfully created
# stmfadm add-view 600144F0D9D1DA6400005DDD95120001

iscsi initiator machine
-----------------------------
# iscsiadm add static-config iqn.2010-08.org.illumos:02:d8ff8cac-e751-c1ea-81a4-d6297ad4e1a9,10.88.88.133
# iscsiadm modify discovery --static enable
# iscsiadm list target -S
Target: iqn.2010-08.org.illumos:02:d8ff8cac-e751-c1ea-81a4-d6297ad4e1a9
        Alias: -
        TPGT: 1
        ISID: 4000002a0000
        Connections: 1
        LUN: 0
             Vendor:  SUN     
             Product: COMSTAR         
             OS Device Name: /dev/rdsk/c0t600144F0D9D1DA6400005DDD95120001d0s2

# zpool create foo c0t600144F0D9D1DA6400005DDD95120001d0
# zpool list foo
NAME   SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
foo    960M   140K   960M        -         -     0%     0%  1.00x  ONLINE  -

I then rebooted the initiator (still on the pre-Nexenta changes platform), rebooted the target to a post-Nexenta changes platform, deleted the zvol (to ensure there was no preexisting label), and reran the above steps. This still worked fine.

Finally, I rebooted the initiator onto a post-Nexenta changes platform, rebooted the target (still on the post-Nexenta changes platform), deleted the zvol and reran the above steps again. This still worked fine.

It is easy to confirm whether the system is running the Nexenta iscsi/comstar code with the following command:

# mdb -ke 'door_max_upcall_reply/D'

This will show 1MB for a platform without the Nexenta iscsi/comstar code and 4MB for a platform with the Nexenta code.

At this point I don't think there is any problem with the Nexenta iscsi/comstar changes interacting badly with zvols. I am not sure what else is different in the platform that was tested which might cause a problem.

#3

Updated by Igor Kozhukhov 11 months ago

Try creating the LUN with:
stmfadm create-lu /dev/zvol/rdsk/zones/idsk -pwcd=true

and then check it on the initiator side.

In Toomas's example the LU shows ' Writeback Cache : Disabled'.

The issue is this: if the LUN was created by an older version of stmf, its metadata was saved to the ZVOL, and the write back cache property can be 'disabled'. If you then operate on that older LUN with a newer stmf, this property behaves like 'write protect', or otherwise misbehaves, and you can't update it to 'enabled'.

#4

Updated by Jerry Jelinek 11 months ago

One thing I do notice in the commands I used vs. what appears to be in the commands used on the failing sequence is the following.

I have:

# stmfadm list-lu -v
LU Name: 600144F0D9D1DA6400005DDD95120001
...
    Data File         : /dev/zvol/rdsk/zones/idsk

That is, I configured the raw zvol.

The original bug report shows:

# stmfadm list-lu -v
...
    Data File         : /dev/zvol/dsk/rpool/iscsi/disk1

The target is using the block device. Normally writing a label will require a raw device, so that may be the problem here. I'll run another test to see if we could previously write a label onto an iscsi target configured with a block device prior to the recent Nexenta changes.

#5

Updated by Jerry Jelinek 11 months ago

I first tried defining the iscsi target using the block device, but that did not have any effect and I was still able to write the label on the initiator and create a zpool.

I then rebooted everything fresh again and tried what Igor suggested. This was the one difference from the steps I originally used:

# stmfadm create-lu -p wcd=true /dev/zvol/rdsk/zones/idsk

However setting wcd=true did not make any difference and I am still able to label the disk and create the zpool on the initiator. Looking at the original lu output, it looks like that has the default "Writeback Cache : Disabled" setting. In my testing, I have tried it both ways but it doesn't cause a problem either way.

#6

Updated by Igor Kozhukhov 11 months ago

Could you try creating a small ramdisk and creating a LUN on it?

ramdiskadm -a rd1g1 1g
stmfadm create-lu  -pwcd=true /dev/ramdisk/rd1g1
stmfadm add-view <LUNID>

In my case, the LUN works fine without '-pwcd=true'.
If I use this property, I am not able to create a zpool on this target from the initiator side.

#7

Updated by Toomas Soome 11 months ago

Jerry Jelinek wrote:

I wanted to see if the recent upstreaming of the Nexenta iscsi/comstar fixes was causing this problem.

I tried to reproduce this failure but so far I cannot.

I first used two VMs running SmartOS on a platform without the recent Nexenta iscsi/comstar changes. I was able to create the iscsi target using a zvol and I was able to access that target on the initiator to create a zpool. Here are the commands I ran (which will need to be modified to use the correct LUN, iqn name and IP address on each run).

[...]
I then rebooted the initiator (still on the pre-Nexenta changes platform), rebooted the target to a post-Nexenta changes platform, deleted the zvol (to ensure there was no preexisting label), and reran the above steps. This still worked fine.

Finally, I rebooted the initiator onto a post-Nexenta changes platform, rebooted the target (still on the post-Nexenta changes platform), deleted the zvol and reran the above steps again. This still worked fine.

It is easy to confirm whether the system is running the Nexenta iscsi/comstar code with the following command:
[...]
This will show 1MB for a platform without the Nexenta iscsi/comstar code and 4MB for a platform with the Nexenta code.

At this point I don't think there is any problem with the Nexenta iscsi/comstar changes interacting badly with zvols. I am not sure what else is different in the platform that was tested which might cause a problem.

server:

$ sudo mdb -ke 'door_max_upcall_reply/D'
door_max_upcall_reply:
door_max_upcall_reply:          4194304         

and client:

root@test:/home/tsoome# mdb -ke 'door_max_upcall_reply/D'
door_max_upcall_reply:
door_max_upcall_reply:          4194304         
root@test:/home/tsoome# 

However, with dtrace on the server side:

$ sudo dtrace -n ':::sbd_flush_data_cache: {stack(); printf("value: %llx\n", arg1);}'
dtrace: description ':::sbd_flush_data_cache: ' matched 2 probes
CPU     ID                    FUNCTION:NAME
  3  68641       sbd_flush_data_cache:entry 
              stmf_sbd`sbd_handle_sync_cache+0xb5
              stmf_sbd`sbd_new_task+0x741
              stmf`stmf_worker_task+0x370
              unix`thread_start+0xb
value: 0

  3  68642      sbd_flush_data_cache:return 
              stmf_sbd`sbd_handle_sync_cache+0xb5
              stmf_sbd`sbd_new_task+0x741
              stmf`stmf_worker_task+0x370
              unix`thread_start+0xb
value: 3000000700000000

and this error code is:

#define    SBD_FILEIO_FAILURE    (SBD_FAILURE | STMF_FSC(7))
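
To make the match between the traced return value and this macro explicit, here is a minimal C sketch. Only the SBD_FILEIO_FAILURE definition and the traced value 3000000700000000 come from this report. The definitions of STMF_LU_FAILURE, STMF_FSC() and SBD_FAILURE below are assumptions intended to mirror the stmf headers, the 28-bit width used for the FSC field is a guess, and status_class() and status_fsc() are hypothetical helpers added for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed definitions, modeled on the stmf headers (not copied from them). */
#define	STMF_LU_FAILURE		((uint64_t)0x3000000000000000)
#define	STMF_FSC(x)		(((uint64_t)(x)) << 32)
#define	SBD_FAILURE		STMF_LU_FAILURE

/* This definition is the one quoted in the report. */
#define	SBD_FILEIO_FAILURE	(SBD_FAILURE | STMF_FSC(7))

/* Compile-time check that the macro expands to the value dtrace printed. */
static_assert(SBD_FILEIO_FAILURE == 0x3000000700000000ULL,
    "SBD_FILEIO_FAILURE should equal the traced return value");

/* Hypothetical helper: failure class, taken from the top four bits. */
static inline unsigned
status_class(uint64_t st)
{
	return ((unsigned)(st >> 60));
}

/* Hypothetical helper: failure sub-code, assumed to occupy bits 32..59. */
static inline uint32_t
status_fsc(uint64_t st)
{
	return ((uint32_t)((st >> 32) & 0x0fffffff));
}
```

Under these assumptions, status_class(0x3000000700000000) is 3 and status_fsc(0x3000000700000000) is 7, which is how the traced value maps back to SBD_FAILURE with sub-code 7.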

OK, I know what is wrong there. In sbd_flush_data_cache(), when we enter with fsync_done == 1, we can end up returning an uninitialized ret. The '11820 upstream Nexenta iSCSI and COMSTAR fixes' change did touch this function, and it seems the behavior of ret was not properly reviewed. While there, the variable r is also a bit dubious.
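
The pattern described here can be illustrated with a small, self-contained C sketch. This is not the stmf_sbd code and not necessarily the shape of the eventual fix: every DEMO_ and demo_ name is hypothetical, and the real function does considerably more. It only shows why a status variable that is assigned on just one branch needs a defined initial value when the other branch (here, fsync_done != 0) skips the work entirely.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical status values for this sketch only. */
#define	DEMO_SUCCESS		((uint64_t)0)
#define	DEMO_FILEIO_FAILURE	((uint64_t)0x3000000700000000)

/* Hypothetical stand-in for the backing-store flush: 0 means success. */
typedef int (*demo_flush_fn)(void *arg);

int
demo_flush_ok(void *arg)
{
	(void) arg;
	return (0);
}

int
demo_flush_fail(void *arg)
{
	(void) arg;
	return (-1);
}

uint64_t
demo_flush_data_cache(int fsync_done, demo_flush_fn flush, void *arg)
{
	/*
	 * Without this initializer, the fsync_done != 0 path would return
	 * whatever happened to be on the stack, which a caller could read
	 * as a failure code.
	 */
	uint64_t ret = DEMO_SUCCESS;

	/* Only flush when the caller has not already synced the data. */
	if (!fsync_done && flush(arg) != 0)
		ret = DEMO_FILEIO_FAILURE;

	return (ret);
}
```

With the initializer in place, demo_flush_data_cache(1, demo_flush_fail, NULL) returns DEMO_SUCCESS because the flush is skipped, while demo_flush_data_cache(0, demo_flush_fail, NULL) returns DEMO_FILEIO_FAILURE.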

Also, yes, there does seem to be a difference in how char and block zvols are handled: I remapped my luns to rdsk and enabled the write cache, and zpool create no longer fails.

#8

Updated by Jerry Jelinek 11 months ago

I also need to add a clarification to one of my previous comments. When I said that the zpool creation worked when I used the block device on the iscsi target, I had inadvertently booted that machine back to the pre-Nexenta code. I re-ran that sequence today, making sure that the iscsi target machine was booted on the new Nexenta code and that the target was defined using the block device, and I am able to reproduce the failure on the initiator when I try to create the zpool. This appears to have nothing to do with the 'wcd' option but is a regression when defining a target using a block device.

This is the target configuration that fails the zfs labeling:

# stmfadm list-lu -v
LU Name: 600144F0D9D1DA6400005DDE711E0001
 ...
    Data File         : /dev/zvol/dsk/zones/idsk

Just to confirm everything else, I re-ran the test sequence using the raw device on the target machine and also using the 'wcd' option and I have no problem creating the zpool on that target, so it is definitely the raw vs. block device in the target definition that is the regression. However, I do also see the additional problem where the 'wcd' argument is not being applied correctly.

#9

Updated by Jerry Jelinek 11 months ago

  • Assignee set to Jerry Jelinek

I'll take this bug and see if I can fix the two regressions.

#10

Updated by Jerry Jelinek 11 months ago

I was trying to track down why the labeling no longer worked with a block device and I saw a lot more STMF_SAA_WRITE_ERROR and STMF_SAA_POR errors than on the old code. After looking at this for a while, it seemed like it could all be due to the bug in sbd_flush_data_cache that Toomas found where we can return garbage on the stack.

I made just that one fix and tested on a new build and now I am able to create a zpool on a target defined on a block device. It seems likely that the error returns from sbd_flush_data_cache were causing the consistent failures here. I'm going to do further testing on this before I get a code review out.

#11

Updated by Jerry Jelinek 11 months ago

For testing I have created an iscsi target using both a raw and block device zvol and confirmed that creating the zpool on the initiator works for both cases.

I have also tested both 'wcd' true/false settings on both the 'stmfadm create-lu' and 'stmfadm modify-lu' commands. I have confirmed that the behavior works as it did before the new Nexenta iscsi/comstar code. That is, 'wcd' can be set either way when using a raw device, but cannot be set to false when the iscsi target is a block device.
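
Because 'wcd' means "write cache disable", the rule above is easy to read backwards. The following C sketch restates it as a predicate. It is only an illustration of the behavior described in this comment: wcd_setting_allowed() is a hypothetical function, not part of stmfadm or sbd, and it assumes the rule depends on nothing but the device type and the requested 'wcd' value.

```c
#include <stdbool.h>

/*
 * Hypothetical predicate restating the observed rule.
 * wcd == true  requests the write-back cache to be disabled.
 * wcd == false requests the write-back cache to be enabled.
 */
bool
wcd_setting_allowed(bool data_file_is_block_dev, bool wcd)
{
	/* A block-device backing store cannot have wcd set to false. */
	if (data_file_is_block_dev && !wcd)
		return (false);

	/* A raw device accepts either setting. */
	return (true);
}
```

So wcd_setting_allowed(true, false) is the only combination that is rejected, which matches the original listing where both LUs on /dev/zvol/dsk paths show "Writeback Cache : Disabled".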

#12

Updated by Jerry Jelinek 11 months ago

  • % Done changed from 0 to 90

#13

Updated by Electric Monk 11 months ago

  • Status changed from New to Closed
  • % Done changed from 90 to 100

git commit ec38b3f5fb92a39a55897ec0e3e4eeb0e84fefcc

commit  ec38b3f5fb92a39a55897ec0e3e4eeb0e84fefcc
Author: Jerry Jelinek <jerry.jelinek@joyent.com>
Date:   2019-12-02T18:06:34.000Z

    12036 iscsi: zpool create pool fails to create label
    Reviewed by: Rick McNeal <rick.mcneal@nexenta.com>
    Reviewed by: C Fraire <cfraire@me.com>
    Reviewed by: Toomas Soome <tsoome@me.com>
    Reviewed by: John Levon <john.levon@joyent.com>
    Reviewed by: Kody Kantor <kody.kantor@joyent.com>
    Approved by: Dan McDonald <danmcd@joyent.com>
