Bug #3621

ZFS LU stuck in the offlining state

Added by Alexander Shvayakov over 6 years ago. Updated over 4 years ago.

Status:
Feedback
Priority:
Normal
Assignee:
-
Category:
comstar - iSCSI/FC/SAS target
Start date:
2013-03-12
Due date:
% Done:
100%
Estimated time:
Difficulty:
Medium
Tags:
needs-triage

Description

Hi,

Houston, we have a problem: a ZFS Logical Unit gets stuck in the offlining state with COMSTAR.

The problem only reproduces while there is I/O load on the Logical Unit.


stmfadm create-lu /dev/zvol/rdsk/GIGO/zvol200g
stmfadm add-view 600144F054A281000000513DB9770005

I mounted the iSCSI device on the initiator and started writing to it with dd.

At this point, run either of the following commands:


stmfadm delete-lu -k 600144F054A281000000513DB9770005

or

stmfadm offline-lu 600144F054A281000000513DB9770005

The Logical Unit will get stuck in the offlining state:


stmfadm list-lu -v
LU Name: 600144F054A281000000513DB9770005

    Operational Status: Offlining
    Provider Name     : sbd
    Alias             : /dev/zvol/rdsk/GIGO/zvol200g
    View Entry Count  : 0
    Data File         : /dev/zvol/rdsk/GIGO/zvol200g
    Meta File         : not set
    Size              : 214748364800
    Block Size        : 512
    Management URL    : not set
    Vendor ID         : SUN
    Product ID        : COMSTAR
    Serial Num        : not set
    Write Protect     : Disabled
    Writeback Cache   : Enabled
    Access State      : Active
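
The stuck state shows up in the Operational Status field of the listing above. As a rough sketch (not part of the original report), a small filter over `stmfadm list-lu -v` output could list every LU that is currently in the Offlining state. The function name `stuck_lus` is hypothetical, and the parsing assumes the field layout shown above.

```shell
#!/bin/sh
# stuck_lus: read `stmfadm list-lu -v` output on stdin and print the GUID
# of every LU whose Operational Status is "Offlining".
# Assumes the "LU Name: <guid>" / "Operational Status: <state>" layout
# shown in this report.
stuck_lus() {
    awk '
        # Remember the GUID of the LU block we are currently inside.
        $1 == "LU" && $2 == "Name:" { guid = $3 }
        # Report the GUID when that block is in the Offlining state.
        $1 == "Operational" && $2 == "Status:" && $3 == "Offlining" { print guid }
    '
}
```

On a target host this would be used as `stmfadm list-lu -v | stuck_lus`; any GUID it prints belongs to an LU that would still block `zpool export`.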

As a consequence, the LU cannot be unloaded from the COMSTAR framework:


stmfadm delete-lu 600144F054A281000000513DB9770005
stmfadm: resource busy

Hence the GIGO pool cannot be exported:

zpool export GIGO
cannot export 'GIGO': pool is busy


As a result, the cluster aborts the failover.
The only way out at this point is a reboot.

I tested it on OmniOS, OpenIndiana and Illumianos.
The problem occurs on all of them.
Can you fix this?

I found this bug while testing RSF-1.
RSF-1 support says that this bug is fixed in NexentaStor.

Best regards,
Alexander Shvayakov

History

#1

Updated by Alexander Shvayakov over 6 years ago

I upgraded OmniOS to the bloody version.
pkg:,5.11-0.151005:20130311T155835Z
The problem still persists.

#2

Updated by Sašo Kiselkov over 6 years ago

  • Status changed from New to In Progress
  • % Done changed from 0 to 80

Confirmed and found a fix for this:
http://cr.illumos.org/~webrev/skiselkov/3621/
Webrev submitted to the mailing lists for review.

#3

Updated by Christopher Siden over 6 years ago

  • Subject changed from ZFS LU stuck in the offlining state to ZFS LU stuck in the offlining state

#4

Updated by Christopher Siden over 6 years ago

  • Status changed from In Progress to Closed
  • % Done changed from 80 to 100

commit a49dc89
Author: Saso Kiselkov <skiselkov@gmail.com>
Date:   Thu May 23 09:52:46 2013

    3621 ZFS LU stuck in the offlining state
    Reviewed by: Sebastien Roy <sebastien.roy@delphix.com>
    Reviewed by: Jeff Biseda <jeff.biseda@delphix.com>
    Reviewed by: Dan McDonald <danmcd@nexenta.com>
    Approved by: Christopher Siden <christopher.siden@delphix.com>

#5

Updated by Bob Lu over 5 years ago

Hello All,

What is the expected result of this fix? I verified the fix on OI and found that the LU no longer gets stuck with delete-lu, but deleting it takes some time. The first couple of delete attempts still return "resource busy"; the delete only completes once the LU state has changed to offline.

My concerns are:
1. Is this the intended result of the fix?
2. Can we shorten the offlining time? It's important for failover.

The time varies from 20ms to 39s:
dtrace: script 'stmf.d' matched 1 probe
CPU ID FUNCTION:NAME
1 6480 stmf_wait_ilu_tasks_finish:deadman-timeout-wait time:20000

stmf`stmf_svc+0x142
genunix`taskq_thread+0x285
unix`thread_start+0x8
5   6480 stmf_wait_ilu_tasks_finish:deadman-timeout-wait time:29480000
stmf`stmf_svc+0x142
genunix`taskq_thread+0x285
unix`thread_start+0x8
thanks.
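
The behaviour described above (delete-lu returns "resource busy" until the LU reaches the offline state) suggests polling before deleting in a failover script. The following is only a sketch under assumptions: the helper names `get_lu_status` and `delete_lu_when_offline` and the `POLL_INTERVAL` variable are hypothetical, and the final state string is assumed to be `Offline`, matching the `Operational Status` field format in the report.

```shell
#!/bin/sh
# Seconds between status polls (hypothetical tunable, not an stmfadm setting).
POLL_INTERVAL=${POLL_INTERVAL:-1}

# Print the Operational Status of the LU with GUID $1,
# parsed from `stmfadm list-lu -v`.
get_lu_status() {
    stmfadm list-lu -v | awk -v guid="$1" '
        $1 == "LU" && $2 == "Name:" { current = $3 }
        /Operational Status/ && current == guid {
            sub(/^[^:]*: */, "")   # strip the "Operational Status: " label
            print
            exit
        }
    '
}

# delete_lu_when_offline GUID [MAX_TRIES]
# Request offline, poll until the LU reports "Offline" (assumed state name),
# then delete it. Returns 1 if the LU is not offline after MAX_TRIES polls.
delete_lu_when_offline() {
    guid=$1
    max_tries=${2:-60}
    # The result is ignored on purpose: the LU may already be offlining,
    # and the poll below decides whether it is safe to delete.
    stmfadm offline-lu "$guid"
    i=0
    while [ "$i" -lt "$max_tries" ]; do
        if [ "$(get_lu_status "$guid")" = "Offline" ]; then
            stmfadm delete-lu "$guid"
            return $?
        fi
        sleep "$POLL_INTERVAL"
        i=$((i + 1))
    done
    echo "LU $guid still not offline after $max_tries polls" >&2
    return 1
}
```

For example, `delete_lu_when_offline 600144F054A281000000513DB9770005 45` would give the LU roughly 45 seconds to go offline before giving up, so the caller can decide whether to abort the failover.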
#6

Updated by Bob Lu over 5 years ago

Plus, Nexenta is OK with this.

#7

Updated by Markus Kovero about 5 years ago

  • Status changed from Closed to Feedback

Problem still persists on OI 151a9.

#8

Updated by Markus Kovero about 5 years ago

Problem seems to be fixed in OI Hipster with illumos-fe2e029.

#9

Updated by Yuri Pankov about 5 years ago

  • Category set to comstar - iSCSI/FC/SAS target
  • Status changed from Feedback to Closed

#10

Updated by Bob Lu about 5 years ago

Markus Kovero wrote:

Problem seems to be fixed in OI Hipster with illumos-fe2e029.

Hi Markus,

You mentioned illumos-fe2e029, is this the one?
commit fe2e029eea29fd49d0d9058dbd5b79a252667e6b
Author: Robert Mustacchi <>
Date: Mon Oct 6 13:07:51 2014 -0700

5202 want ctf(4)
Reviewed by: Keith M Wesolowski <>
Reviewed by: Jerry Jelinek <>
Reviewed by: Garrett D'Amore <>
Approved by: Dan McDonald <>

Thanks.

#11

Updated by Markus Kovero over 4 years ago

  • Status changed from Closed to Feedback

I was referring to the illumos kernel version in Hipster. However, even though the problem has become much less frequent, it still exists on illumos-f8554bb, where it causes a kernel panic.
Since the problem is no longer a stuck LU but a system crash, I'll continue this at https://www.illumos.org/issues/5079 once I get the crash dump uploaded.

#12

Updated by Markus Kovero over 4 years ago

The latest crash dump can be found at http://www.mui.fi/vmdump.0
