Bug #2233

zfs mount hangs system

Added by Gleb Panov over 8 years ago. Updated about 8 years ago.

Status:
Feedback
Priority:
High
Assignee:
-
Category:
zfs - Zettabyte File System
Start date:
2012-03-04
Due date:
% Done:

0%

Estimated time:
Difficulty:
Medium
Tags:
needs-triage
Gerrit CR:

Description

During normal operation, an OI-148 server rebooted. Its boot environment never booted again; it freezes on startup.
I managed to boot the boot environment left over from the fresh install of the system, upgraded it to 151a, and installed napp-it. All actions were performed on this BE.

'zpool import' freezes the system.
'zpool import -N' or 'zpool import -readonly' works OK.
If I try to zfs mount any dataset after importing with -N (without mounting), the system freezes. Sometimes the system reboots instead; setting 'zpool import -o failmode=panic' makes a reboot more probable than a freeze, but both happen.

The pool had ~8 datasets; 1 had dedup enabled for testing. I ran 'zpool import -N', then destroyed that dataset. zdb says no dedup tables are present now.
The pool was ~93% full, and the reboot happened during or after deletion of some data.
The hardware is an Intel 5520 board, Xeon CPU, 16 GB RAM, LSI 9260-8i SAS board (imr_sas driver), Chenbro case with SAS expander, and 24 Hitachi Ultrastar 7K3000 disks.


Related issues

Related to illumos gate - Bug #5833: kernel crash on zfs import (mount) (Closed, 2015-04-13)

#1

Updated by Gleb Panov over 8 years ago

Update:

console output:

panic[cpu3]/thread=ffffff04e66df440: BAD TRAP: type=e (#pf Page fault) rp=ffffff001f456340 addr=20 occurred in module "zfs" due to a NULL pointer dereference

zpool: #pf Page fault
Bad kernel fault at addr=0x20
pid=1676, pc=0xfffffffff7a16998, sp=0xffffff001f456438, eflags=0x10203
cr0: 8005003b<pg,wp,ne,et,ts,mp,pe> cr4: 6f8<xmme,fxsr,pge,mce,pae,pse,de>
cr2: 20 cr3: 3d6e0b000 cr8: c

rdi: ffffff04f7c607d0 rsi:                0 rdx:                0
rcx: f92175db r8: 6de8a20 r9: ffffff001f4564a0
rax: 7ffffff rbx: 0 rbp: ffffff001f456480
r10: ffff r11: 0 r12: ffffff04f7c607d0
r13: ffffff04f7c607d0 r14: ffffff001f4565c0 r15: ffffff001f456600
fsb: 0 gsb: ffffff04e557e040 ds: 4b
es: 4b fs: 0 gs: 1c3
trp: e err: 0 rip: fffffffff7a16998
cs: 30 rfl: 10203 rsp: ffffff001f456438
ss: 38

ffffff001f456220 unix:die+dd ()
ffffff001f456330 unix:trap+1799 ()
ffffff001f456340 unix:cmntrap+e6 ()
ffffff001f456480 zfs:zap_leaf_lookup_closest+40 ()
ffffff001f456510 zfs:fzap_cursor_retrieve+c9 ()
ffffff001f4565a0 zfs:zap_cursor_retrieve+188 ()
ffffff001f456780 zfs:zfs_purgedir+4c ()
ffffff001f4567d0 zfs:zfs_rmnode+50 ()
ffffff001f456810 zfs:zfs_zinactive+b5 ()
ffffff001f456860 zfs:zfs_inactive+11c ()
ffffff001f4568b0 genunix:fop_inactive+af ()
ffffff001f4568d0 genunix:vn_rele+5f ()
ffffff001f456ac0 zfs:zfs_unlinked_drain+af ()
ffffff001f456af0 zfs:zfsvfs_setup+102 ()
ffffff001f456b50 zfs:zfs_domount+17c ()
ffffff001f456c70 zfs:zfs_mount+1e2 ()
ffffff001f456ca0 genunix:fsop_mount+21 ()
ffffff001f456e00 genunix:domount+afc ()
ffffff001f456e80 genunix:mount+121 ()
ffffff001f456ec0 genunix:syscall_ap+8c ()
ffffff001f456f10 unix:brand_sys_sysenter+1c9 ()

syncing file systems... done
dumping to /dev/zvol/dsk/rpool/dump, offset 65536, content: kernel

#2

Updated by Dan Kruchinin over 8 years ago

Could you attach the kernel crash dump to the issue?

#3

Updated by Gleb Panov over 8 years ago

  • Status changed from New to Feedback

Hardly. 1) The system gave "magic number 0", which I believe means no dump was found. 2) The system is important and is back in production, in RO mode.
PS: Yes, the pool is accessible either with zpool import -readonly or (the way I managed to share read-only datasets) with zpool import -N, then zfs mount of the pool folder, changing the share options, then zfs mount of every dataset in read-only mode.
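
The workaround sequence described above might be sketched as follows. This is only a dry-run sketch: it prints the commands instead of running them, the pool name "tank" and the dataset names are placeholders rather than values from this report, and the share-option step is left out because the report does not say which options were changed. That a read-only mount avoids the zfs_unlinked_drain path seen in the panic stack is consistent with the report but is an assumption here.

```shell
#!/bin/sh
# Print (do not run) the commands for a no-mount import followed by
# read-only mounts. Pool and dataset names are hypothetical placeholders.
ro_mount_plan() {
    pool=$1
    shift
    # Import the pool without mounting any of its datasets.
    echo "zpool import -N $pool"
    for ds in "$@"; do
        # Mount each dataset read-only instead of read-write.
        echo "zfs mount -o ro $pool/$ds"
    done
}

ro_mount_plan tank data home
```

Printing the plan first makes it possible to review each command before typing it on a production system where a read-write mount is known to panic.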

#4

Updated by Gary Mills over 8 years ago

That's what I always get with kernel panics on oi_151a too.

Executing start method ("/lib/svc/method/svc-dumpadm"). ]
savecore: bad magic number 0

#5

Updated by Rich Ercolani over 8 years ago

Can I suggest you add to /etc/system:
set dump_plat_mincpu=0
per https://www.illumos.org/issues/1110 so that, if it tries to dump again, you're far more likely to get a dump/reboot than an infinite hang? (Presuming your hang isn't an unrelated issue.)
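
One way to add that line without duplicating it on repeated runs is sketched below. The function name ensure_dump_tunable is made up for illustration, the file path is a parameter so the sketch can be tried on a scratch file before touching /etc/system, and it assumes a grep that accepts the POSIX -q, -x and -F options.

```shell
#!/bin/sh
# Append the tunable to the given file only if the exact line is not
# already there. ensure_dump_tunable is a hypothetical helper name.
ensure_dump_tunable() {
    file=$1
    line='set dump_plat_mincpu=0'
    if grep -qxF "$line" "$file" 2>/dev/null; then
        echo "already set"
    else
        printf '%s\n' "$line" >> "$file"
        echo "added"
    fi
}

# Try it on a scratch file rather than the real /etc/system.
scratch=$(mktemp)
ensure_dump_tunable "$scratch"
ensure_dump_tunable "$scratch"
cat "$scratch"
rm -f "$scratch"
```

Entries in /etc/system are read at boot, so a line added this way takes effect only after the next reboot.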

#6

Updated by Rich Ercolani over 8 years ago

Per IRC logs from September 2011:

# mdb -kw

dump_plat_mincpu/W 0

dump_plat_mincpu: 0 = 0x0

This sets it in the running system; do it in addition to the /etc/system entry above, which sets it persistently.

#7

Updated by Dan McDonald about 8 years ago

I notice you're using the "imr_sas" driver, which is a downloadable, unsupported, binary from LSI. Why aren't you using the mr_sas driver, for which we have source? The 9260-8i uses the 2108 chipset, which should be already supported by the mr_sas driver.

Also, there's an update to mr_sas coming, which includes mostly new-device support, but it is possible LSI fixed some 2108 issues too. Either way, the source will be in Illumos itself, and people can help debug better (modulo, of course, a crash dump).

#8

Updated by Pavel Zakharov almost 5 years ago

  • Related to Bug #5833: kernel crash on zfs import (mount) added
