Bug #3621
openZFS LU stuck in the offlining state
100%
Description
Hi,
Houston, we have a problem: a ZFS Logical Unit gets stuck in the offlining state with COMSTAR.
To reproduce the problem, there must be I/O load on the Logical Unit.
stmfadm create-lu /dev/zvol/rdsk/GIGO/zvol200g
stmfadm add-view 600144F054A281000000513DB9770005
I mounted the iSCSI device on the initiator and began writing to it with dd.
At this point, run either of these commands:
stmfadm delete-lu -k 600144F054A281000000513DB9770005
or
stmfadm offline-lu 600144F054A281000000513DB9770005
The Logical Unit will get stuck in the offlining state:
stmfadm list-lu -v
LU Name: 600144F054A281000000513DB9770005
    Operational Status: Offlining
    Provider Name     : sbd
    Alias             : /dev/zvol/rdsk/GIGO/zvol200g
    View Entry Count  : 0
    Data File         : /dev/zvol/rdsk/GIGO/zvol200g
    Meta File         : not set
    Size              : 214748364800
    Block Size        : 512
    Management URL    : not set
    Vendor ID         : SUN
    Product ID        : COMSTAR
    Serial Num        : not set
    Write Protect     : Disabled
    Writeback Cache   : Enabled
    Access State      : Active
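For scripted checks (for example from failover tooling), the operational status can be extracted from the list-lu output above. This is only a minimal sketch: it assumes stmfadm prints one field per line, and the function name lu_status is made up for illustration. On older illumos systems, where /usr/bin/awk lacks -v, nawk may be needed instead of awk.

```shell
#!/bin/sh
# lu_status GUID
# Reads `stmfadm list-lu -v` output on stdin and prints the Operational
# Status of the LU whose name matches GUID. Prints nothing if not found.
# (lu_status is a hypothetical helper, not part of stmfadm.)
lu_status() {
    awk -v guid="$1" '
        # Remember which LU the following fields belong to.
        /^[ \t]*LU Name:/ { current = $NF }
        # Print the value after the colon for the requested LU only.
        /^[ \t]*Operational Status/ {
            if (current == guid) {
                sub(/^[^:]*:[ \t]*/, "")
                print
                exit
            }
        }
    '
}

# Usage on a COMSTAR target (not run here):
#   stmfadm list-lu -v | lu_status 600144F054A281000000513DB9770005
```

A script can compare the printed value against Offlining to detect the stuck state described in this report.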
As a consequence, it cannot be unloaded from the COMSTAR framework
stmfadm delete-lu 600144F054A281000000513DB9770005
stmfadm: resource busy
Hence the GIGO pool cannot be exported:
zpool export GIGO
cannot export 'GIGO': pool is busy
This results in the cluster aborting failover.
The only way out now is a reboot...
I tested it on OmniOS, OpenIndiana and Illumianos.
The problem occurs on all of them.
Can you fix this?
I found this bug while testing rsf-1.
rsf-1 support says that this bug is fixed in NexentaStor.
Best regards,
Alexander Shvayakov
Updated by Alexander Shvayakov over 10 years ago
I upgraded OmniOS to the bloody version.
pkg://omnios/storage/stmf@0.5.11,5.11-0.151005:20130311T155835Z
The problem still persists.
Updated by Sašo Kiselkov over 10 years ago
- Status changed from New to In Progress
- % Done changed from 0 to 80
Confirmed and found a fix for this:
http://cr.illumos.org/~webrev/skiselkov/3621/
Webrev submitted to the mailing lists for review.
Updated by Christopher Siden over 10 years ago
- Subject changed from ZFS LU stuck in the offlining state to ZFS LU stuck in the offlining state
Updated by Christopher Siden over 10 years ago
- Status changed from In Progress to Closed
- % Done changed from 80 to 100
commit a49dc89
Author: Saso Kiselkov <skiselkov@gmail.com>
Date: Thu May 23 09:52:46 2013
3621 ZFS LU stuck in the offlining state
Reviewed by: Sebastien Roy <sebastien.roy@delphix.com>
Reviewed by: Jeff Biseda <jeff.biseda@delphix.com>
Reviewed by: Dan McDonald <danmcd@nexenta.com>
Approved by: Christopher Siden <christopher.siden@delphix.com>
Updated by Bob Lu over 9 years ago
Hello All,
What is the expected result of this fix? I verified the fix on OI and found that the LU no longer gets stuck on delete-lu, but deleting it takes some time: the first couple of delete attempts still return "resource busy", and the delete only completes once the LU state has changed to offline.
My concerns are:
1. Is this the intended result of the fix?
2. Can we shorten the offlining time? It's important for failover.
The time varies from 20ms to 39s:
dtrace: script 'stmf.d' matched 1 probe
CPU ID FUNCTION:NAME
1 6480 stmf_wait_ilu_tasks_finish:deadman-timeout-wait time:20000
stmf`stmf_svc+0x142
genunix`taskq_thread+0x285
unix`thread_start+0x8
5 6480 stmf_wait_ilu_tasks_finish:deadman-timeout-wait time:29480000
stmf`stmf_svc+0x142
genunix`taskq_thread+0x285
unix`thread_start+0x8
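The behaviour described above (delete-lu returning "resource busy" until the LU reaches the offline state) can be handled on the caller's side with a small retry loop. This is a sketch only: the function name retry_until_ok, the attempt count and the delay are illustrative assumptions, and it assumes stmfadm exits non-zero when it reports "resource busy".

```shell
#!/bin/sh
# retry_until_ok MAX_TRIES DELAY_SECS CMD [ARGS...]
# Runs CMD until it exits 0 or MAX_TRIES attempts have been made.
# Returns 0 on success, 1 if every attempt failed.
# (retry_until_ok is a hypothetical helper, not part of stmfadm.)
retry_until_ok() {
    max_tries=$1
    delay=$2
    shift 2
    try=1
    while [ "$try" -le "$max_tries" ]; do
        if "$@"; then
            return 0
        fi
        try=$((try + 1))
        # Sleep only if another attempt will follow.
        if [ "$try" -le "$max_tries" ]; then
            sleep "$delay"
        fi
    done
    return 1
}

# Example (not run here): up to 60 attempts, 1 second apart.
#   retry_until_ok 60 1 stmfadm delete-lu 600144F054A281000000513DB9770005
```

The total wait is bounded by MAX_TRIES times DELAY_SECS, so failover tooling can pick a limit that covers the offlining times observed above.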
thanks.
Updated by Markus Kovero about 9 years ago
- Status changed from Closed to Feedback
Problem still persists on OI 151a9.
Updated by Markus Kovero about 9 years ago
Problem seems to be fixed in OI Hipster with illumos-fe2e029.
Updated by Yuri Pankov about 9 years ago
- Category set to comstar - iSCSI/FC/SAS target
- Status changed from Feedback to Closed
Updated by Bob Lu about 9 years ago
Markus Kovero wrote:
Problem seems to be fixed in OI Hipster with illumos-fe2e029.
Hi Markus,
You mentioned illumos-fe2e029, is this the one?
commit fe2e029eea29fd49d0d9058dbd5b79a252667e6b
Author: Robert Mustacchi <rm@joyent.com>
Date: Mon Oct 6 13:07:51 2014 -0700
5202 want ctf(4)
Reviewed by: Keith M Wesolowski <wesolows@foobazco.org>
Reviewed by: Jerry Jelinek <jerry.jelinek@joyent.com>
Reviewed by: Garrett D'Amore <garrett@damore.org>
Approved by: Dan McDonald <danmcd@omniti.com>
thanks.
Updated by Markus Kovero almost 9 years ago
- Status changed from Closed to Feedback
I was referring to the illumos kernel version in Hipster. However, even though the problem has become much less frequent, it still exists on illumos-f8554bb, where it causes a kernel panic.
Since the problem is no longer a stuck LU but a system crash, I'll continue this at https://www.illumos.org/issues/5079 once I get the crash dump uploaded.
Updated by Markus Kovero almost 9 years ago
The latest crash dump can be found at http://www.mui.fi/vmdump.0