Bug #2233
openzfs mount hangs system
0%
Description
During normal work, the OI-148 server rebooted. Its boot environment never booted again; it freezes on startup.
I managed to boot the boot environment that was created fresh after the system install, upgraded it to 151a, and installed napp-it. All actions were performed on this BE.
'zpool import' freezes system.
'zpool import -N' or 'zpool import -readonly' works ok.
If I try to zfs mount any dataset after importing with -N (without mounting), the system freezes. Sometimes the system reboots instead; setting 'zpool import -o failmode=panic' makes a reboot more probable than a freeze, but both happen.
The pool had ~8 datasets, one of which had dedup enabled for testing. I ran 'zpool import -N', then destroyed that dataset. zdb now reports no dedup tables present.
The pool was ~93% full, and the reboot happened during or after deletion of some data.
The hardware is an Intel 5520 board, Xeon CPU, 16 GB RAM, LSI 9260-8i SAS board (imr_sas driver), Chenbro case with SAS expander, and 24 Hitachi Ultrastar 7K3000 disks.
Related issues
Updated by Gleb Panov over 11 years ago
upd:
console output:
panic[cpu3]/thread=ffffff04e66df440: BAD TRAP: type=e (#pf Page fault) rp=ffffff001f456340 addr=20 occurred in module "zfs" due to a NULL pointer dereference
zpool: #pf Page fault
Bad kernel fault at addr=0x20
pid=1676, pc=0xfffffffff7a16998, sp=0xffffff001f456438, eflags=0x10203
cr0: 8005003b<pg,wp,ne,et,ts,mp,pe> cr4: 6f8<xmme,fxsr,pge,mce,pae,pse,de>
cr2: 20 cr3: 3d6e0b000 cr8: c
rdi: ffffff04f7c607d0 rsi: 0 rdx: 0
rcx: f92175db r8: 6de8a20 r9: ffffff001f4564a0
rax: 7ffffff rbx: 0 rbp: ffffff001f456480
r10: ffff r11: 0 r12: ffffff04f7c607d0
r13: ffffff04f7c607d0 r14: ffffff001f4565c0 r15: ffffff001f456600
fsb: 0 gsb: ffffff04e557e040 ds: 4b
es: 4b fs: 0 gs: 1c3
trp: e err: 0 rip: fffffffff7a16998
cs: 30 rfl: 10203 rsp: ffffff001f456438
ss: 38
ffffff001f456220 unix:die+dd ()
ffffff001f456330 unix:trap+1799 ()
ffffff001f456340 unix:cmntrap+e6 ()
ffffff001f456480 zfs:zap_leaf_lookup_closest+40 ()
ffffff001f456510 zfs:fzap_cursor_retrieve+c9 ()
ffffff001f4565a0 zfs:zap_cursor_retrieve+188 ()
ffffff001f456780 zfs:zfs_purgedir+4c ()
ffffff001f4567d0 zfs:zfs_rmnode+50 ()
ffffff001f456810 zfs:zfs_zinactive+b5 ()
ffffff001f456860 zfs:zfs_inactive+11c ()
ffffff001f4568b0 genunix:fop_inactive+af ()
ffffff001f4568d0 genunix:vn_rele+5f ()
ffffff001f456ac0 zfs:zfs_unlinked_drain+af ()
ffffff001f456af0 zfs:zfsvfs_setup+102 ()
ffffff001f456b50 zfs:zfs_domount+17c ()
ffffff001f456c70 zfs:zfs_mount+1e2 ()
ffffff001f456ca0 genunix:fsop_mount+21 ()
ffffff001f456e00 genunix:domount+afc ()
ffffff001f456e80 genunix:mount+121 ()
ffffff001f456ec0 genunix:syscall_ap+8c ()
ffffff001f456f10 unix:brand_sys_sysenter+1c9 ()
syncing file systems... done
dumping to /dev/zvol/dsk/rpool/dump, offset 65536, content: kernel
Updated by Dan Kruchinin about 11 years ago
Could you attach a kernel crash dump to the issue?
Updated by Gleb Panov about 11 years ago
- Status changed from New to Feedback
Hardly. 1) The system gave "magic number 0", which I believe means no dump was found. 2) The system is important and is back in production, in RO mode.
PS: yes, the pool is accessible either with zpool import -readonly, or (the way I managed to share read-only datasets) with zpool import -N, then zfs mount of the pool folder, changing the share options, then zfs mount of every dataset in read-only mode.
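For anyone hitting the same panic, the second workaround can be sketched roughly as below. This is only a sketch under assumptions, not a verified recovery procedure: the pool name `tank`, the dataset names, and the helper names `run` and `import_and_mount_ro` are placeholders, and the script only prints the commands unless `DRY_RUN=0` is set.

```shell
#!/bin/sh
# Sketch of the read-only workaround: import the pool without mounting
# anything, then mount each dataset read-only.
# POOL and the dataset names are hypothetical placeholders.
POOL="${POOL:-tank}"
DRY_RUN="${DRY_RUN:-1}"

# Print the command; execute it only when DRY_RUN=0.
run() {
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

# Import without mounting (-N), then mount the listed datasets read-only.
import_and_mount_ro() {
    run zpool import -N "$POOL"
    for ds in "$@"; do
        run zfs mount -o ro "$ds"
    done
}

import_and_mount_ro "$POOL/data1" "$POOL/data2"
```

Keeping the default as a dry run is deliberate: on a pool that panics the kernel on a read-write mount, it seems safer to review the exact commands before running them with `DRY_RUN=0`.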
Updated by Gary Mills about 11 years ago
That's what I always get with kernel panics on oi_151a too.
Executing start method ("/lib/svc/method/svc-dumpadm"). ]
savecore: bad magic number 0
Updated by Rich Ercolani about 11 years ago
Can I suggest you add to /etc/system:
set dump_plat_mincpu=0
per https://www.illumos.org/issues/1110 so that, if it tries to dump again, you're far more likely to get a dump/reboot than an infinite hang? (Presuming your hang isn't an unrelated issue.)
Updated by Rich Ercolani about 11 years ago
Per IRC logs from September 2011:
- mdb -kw
dump_plat_mincpu/W 0
dump_plat_mincpu: 0 = 0x0
That sets it in the running system, in addition to the /etc/system line above, which sets it persistently.
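A quick way to check that the persistent half is in place might look like the following. This is a minimal sketch: the helper name `has_mincpu_setting` is hypothetical, and it only reads the file, never the kernel.

```shell
#!/bin/sh
# Return 0 if the given file (default /etc/system) contains an active
# "set dump_plat_mincpu=0" line. Comment lines in /etc/system start
# with "*", so a line that begins with "set" is not commented out.
has_mincpu_setting() {
    file="${1:-/etc/system}"
    grep -Eq '^[[:space:]]*set[[:space:]]+dump_plat_mincpu[[:space:]]*=[[:space:]]*0[[:space:]]*$' "$file"
}

if has_mincpu_setting "${1:-/etc/system}" 2>/dev/null; then
    echo "dump_plat_mincpu=0 is set persistently"
else
    echo "dump_plat_mincpu=0 not found"
fi
```

The live value can be read without write access by running `echo 'dump_plat_mincpu/D' | mdb -k`.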
Updated by Dan McDonald over 10 years ago
I notice you're using the "imr_sas" driver, which is a downloadable, unsupported, binary from LSI. Why aren't you using the mr_sas driver, for which we have source? The 9260-8i uses the 2108 chipset, which should be already supported by the mr_sas driver.
Also, there's an update to mr_sas coming, which includes mostly new-device support, but it is possible LSI fixed some 2108 issues too. Either way, the source will be in Illumos itself, and people can help debug better (modulo, of course, a crash dump).
Updated by Pavel Zakharov over 7 years ago
- Related to Bug #5833: kernel crash on zfs import (mount) added