Bug #6510
Kernel panic while trying to automount filesystem
Description
Hi All,
I have an old server running OpenIndiana 151a8 whose OS I needed to upgrade to OmniOS R151014. This server was used as a zfs recv target for my production server, which runs OmniOS R151014. Before starting the upgrade, I exported the data pool on this server and then installed the new version of the OS. After the installation was complete, I tried to import the exported pool back, but I hit a kernel panic, which caused the server to core dump and reboot. After the server boots, it core dumps again once it reaches a certain dataset. I went back to the original OS disk running OpenIndiana 151a8 and the same thing happened again.
I moved the original set of disks to another server running OmniOS R151014 and tried importing the pool there; the result was the same. Next I booted into single-user mode and imported with the -N option (import without mounting), and the server did not core dump and reboot. But if I exit single-user mode, the system tries to auto-mount the filesystems and the problem surfaces again.
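For reference, the whole sequence went roughly like this (a sketch of the commands, not a verbatim transcript; pool name as in my layout below):

zpool export tankA      # on the old OS, before reinstalling
zpool import tankA      # after the reinstall -- this import panics the box
# boot single-user instead, then:
zpool import -N tankA   # import without mounting anything -- no panic
# exiting single-user mode triggers the auto-mount, and the panic comes back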
My filesystem structure is as follows:
tankA
tankA/datashares
tankA/datashares/data01
tankA/datashares/data01/DS_SG_ITCC
tankA/datashares/data01/DS_SG_BE001   <--- This one has the problem
tankA/datashares/data01/DS_SG_ADMIN
...
I have tried moving the problematic dataset to another filesystem under the same pool (e.g. tankA/datashares/data04/), and that also caused the server to core dump and reboot. Even worse, booting into single-user mode and mounting it read/write caused the server to core dump and reboot again. Mounting the dataset read-only is safe and stable, and that is how I managed to copy the data out of that dataset.
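The read-only mount that stays stable looks roughly like this (a sketch; -o ro is the standard temporary mount option, and the dataset name is from my layout above):

zpool import -N tankA                                  # single-user, nothing auto-mounted
zfs mount -o ro tankA/datashares/data01/DS_SG_BE001    # read-only mount does not panic
# copy the data out, then unmount:
zfs unmount tankA/datashares/data01/DS_SG_BE001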
To narrow down the problematic datasets in this pool, I had to zfs send and recv the datasets one by one. I managed to isolate three datasets causing the issue. I have also tried taking a snapshot of the snapshot and transferring that, and cloning a snapshot and then transferring the clone; either way the result is the same: the receiving server core dumps and reboots when it tries to auto-mount the dataset.
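The per-dataset isolation went roughly like this (a sketch; the @probe snapshot name and the tankB scratch pool are made up for illustration):

zfs snapshot tankA/datashares/data01/DS_SG_BE001@probe
zfs send tankA/datashares/data01/DS_SG_BE001@probe | \
    zfs recv tankB/test/DS_SG_BE001
# repeat per dataset; the receiver panics on auto-mount only for the three bad ones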
My boss managed to help with the problem: using a test server with the same OS, he got the core dump details as follows:
root@sgtestnas:/export/home/phing# savecore -f /dev/zvol/dsk/rpool/dump
root@sgtestnas:/export/home/phing# cd /var/crash/unknown
root@sgtestnas:/var/crash/unknown# savecore -f vmdump.0
savecore: System dump time: Sat Dec 12 09:52:48 2015
savecore: saving system crash dump in /var/crash/unknown/{unix,vmcore}.0
Constructing namelist /var/crash/unknown/unix.0
Constructing corefile /var/crash/unknown/vmcore.0
0:01 100% done: 83712 of 83712 pages saved
root@sgtestnas:/var/crash/unknown# mdb unix.0 vmcore.0
Loading modules: [ unix genunix specfs dtrace mac cpu.generic uppc pcplusmp scsi_vhci zfs mpt sd ip hook neti sockfs arp usba uhci s1394 stmf stmf_sbd md lofs random idm cpc crypto kvm ufs logindmux nsmb ptm smbsrv nfs ]
> ::status
debugging crash dump vmcore.0 (64-bit) from sgtestnas
operating system: 5.11 omnios-8322307 (i86pc)
image uuid: dcffa3b0-d0b1-c9ed-a5e6-f4c4b76581b7
panic message:
BAD TRAP: type=e (#pf Page fault) rp=ffffff00088b1290 addr=20 occurred in module "zfs" due to a NULL pointer dereference
dump content: kernel pages only
> ::stack
zap_leaf_lookup_closest+0x45(ffffff01d5fc7c88, 0, 0, ffffff00088b13f0)
fzap_cursor_retrieve+0xbb(ffffff01d5fc7c88, ffffff00088b1650, ffffff00088b1530)
zap_cursor_retrieve+0x11e(ffffff00088b1650, ffffff00088b1530)
zfs_purgedir+0x67(ffffff01d5fb9ad0)
zfs_rmnode+0x202(ffffff01d5fb9ad0)
zfs_zinactive+0xe8(ffffff01d5fb9ad0)
zfs_inactive+0x75(ffffff01d5fc1440, ffffff01cd03a178, 0)
fop_inactive+0x76(ffffff01d5fc1440, ffffff01cd03a178, 0)
vn_rele+0x82(ffffff01d5fc1440)
zfs_unlinked_drain+0xaa(ffffff01cfb4b800)
zfsvfs_setup+0xe8(ffffff01cfb4b800, 1)
zfs_domount+0x131(ffffff01d4d12b70, ffffff01d364bb40)
zfs_mount+0x24f(ffffff01d4d12b70, ffffff01ce60aa00, ffffff00088b1e00, ffffff01cd03a178)
fsop_mount+0x1e(ffffff01d4d12b70, ffffff01ce60aa00, ffffff00088b1e00, ffffff01cd03a178)
domount+0x86b(0, ffffff00088b1e00, ffffff01ce60aa00, ffffff01cd03a178, ffffff00088b1e40)
mount+0x167(ffffff01ce384af8, ffffff00088b1e90)
syscall_ap+0x94()
_sys_sysenter_post_swapgs+0x149()
> ::msgbuf
MESSAGE
sd4 at mpt0: unit-address 4,0: target 4 lun 0
sd4 is /pci@0,0/pci15ad,1976@10/sd@4,0
/pci@0,0/pci15ad,1976@10/sd@4,0 (sd4) online
This Solaris instance has UUID dcffa3b0-d0b1-c9ed-a5e6-f4c4b76581b7
dump on /dev/zvol/dsk/rpool/dump size 2048 MB
dumping to /dev/zvol/dsk/rpool/dump, offset 65536, content: kernel
100% done: 77280 pages dumped, dump succeeded
pcplusmp: fdc (fdc) instance 0 irq 0x6 vector 0x43 ioapic 0x2 intin 0x6 is bound to cpu 1
ISA-device: fdc0
fd0 at fdc0
fd0 is /pci@0,0/isa@7/fdc@1,3f0/fd@0,0
pcplusmp: lp (ecpp) instance 0 irq 0x7 vector 0x44 ioapic 0x2 intin 0x7 is bound to cpu 0
ISA-device: ecpp0
ecpp0 is /pci@0,0/isa@7/lp@1,378
pcplusmp: asy (asy) instance 1 irq 0x3 vector 0xb0 ioapic 0x2 intin 0x3 is bound to cpu 1
ISA-device: asy1
asy1 is /pci@0,0/isa@7/asy@1,2f8
pcplusmp: asy (asy) instance 0 irq 0x4 vector 0xb1 ioapic 0x2 intin 0x4 is bound to cpu 0
ISA-device: asy0
asy0 is /pci@0,0/isa@7/asy@1,3f8
pseudo-device: ucode0
ucode0 is /pseudo/ucode@0
iscsi0 at root
iscsi0 is /iscsi
acpinex: L1M0@16, acpinex2
acpinex2 is /fw/sb@0/L1M0@16
acpinex: L0M0@32, acpinex3
acpinex3 is /fw/sb@0/L1M0@16/L0M0@32
ISA-device: pit_beep0
pit_beep0 is /pci@0,0/isa@7/pit_beep
pseudo-device: pseudo1
pseudo1 is /pseudo/zconsnex@1
pcplusmp: ide (ata) instance 0 irq 0xe vector 0x45 ioapic 0x2 intin 0xe is bound to cpu 1
pcplusmp: ide (ata) instance 1 irq 0xf vector 0x45 ioapic 0x2 intin 0xf is bound to cpu 0
ATAPI device at targ 0, lun 0 lastlun 0x0
model VMware Virtual IDE CDROM Drive
ATA/ATAPI-4 supported, majver 0x1e minver 0x17
PCI Express-device: ide@1, ata1
ata1 is /pci@0,0/pci-ide@7,1/ide@1
UltraDMA mode 2 selected
sd5 at ata1: target 0 lun 0
sd5 is /pci@0,0/pci-ide@7,1/ide@1/sd@0,0
pseudo-device: dcpc0
dcpc0 is /pseudo/dcpc@0
pseudo-device: dtrace0
dtrace0 is /pseudo/dtrace@0
pseudo-device: fasttrap0
fasttrap0 is /pseudo/fasttrap@0
pseudo-device: fbt0
fbt0 is /pseudo/fbt@0
pseudo-device: fcp0
fcp0 is /pseudo/fcp@0
pseudo-device: fcsm0
fcsm0 is /pseudo/fcsm@0
pseudo-device: fct0
fct0 is /pseudo/fct@0
WARNING: kvm: no hardware support
pseudo-device: llc10
llc10 is /pseudo/llc1@0
pseudo-device: lockstat0
lockstat0 is /pseudo/lockstat@0
pseudo-device: lofi0
lofi0 is /pseudo/lofi@0
pseudo-device: power0
power0 is /pseudo/power@0
pseudo-device: profile0
profile0 is /pseudo/profile@0
pseudo-device: ramdisk1024
ramdisk1024 is /pseudo/ramdisk@1024
pseudo-device: sdt0
sdt0 is /pseudo/sdt@0
pseudo-device: srn0
srn0 is /pseudo/srn@0
pseudo-device: stmf0
stmf0 is /pseudo/stmf@0
pseudo-device: systrace0
systrace0 is /pseudo/systrace@0
pseudo-device: bpf0
bpf0 is /pseudo/bpf@0
pseudo-device: fssnap0
fssnap0 is /pseudo/fssnap@0
IP Filter: v4.1.9, running.
pseudo-device: nsmb0
nsmb0 is /pseudo/nsmb@0
pseudo-device: pm0
pm0 is /pseudo/pm@0
pseudo-device: pool0
pool0 is /pseudo/pool@0

panic[cpu0]/thread=ffffff01cfa42840:
BAD TRAP: type=e (#pf Page fault) rp=ffffff00088b1290 addr=20 occurred in module "zfs" due to a NULL pointer dereference

zfs:
#pf Page fault
Bad kernel fault at addr=0x20
pid=373, pc=0xfffffffff7a11e95, sp=0xffffff00088b1380, eflags=0x10202
cr0: 8005003b<pg,wp,ne,et,ts,mp,pe> cr4: 406b8<osxsav,xmme,fxsr,pge,pae,pse,de>
cr2: 20 cr3: 13fd16000 cr8: c
rdi: ffffff01d5fc7c88 rsi: 0 rdx: 8
rcx: d5fad1c3 r8: ffffff00088b13f0 r9: ffffff00088b11c8
rax: d5fad1c8 rbx: ffffff00088b1650 rbp: ffffff00088b13d0
r10: fffffffffb854358 r11: 0 r12: 8
r13: 0 r14: ffffff00088b13f0 r15: ffffff00088b1530
fsb: 0 gsb: fffffffffbc30c40 ds: 4b
es: 4b fs: 0 gs: 1c3
trp: e err: 0 rip: fffffffff7a11e95
cs: 30 rfl: 10202 rsp: ffffff00088b1380
ss: 38

ffffff00088b1170 unix:die+df ()
ffffff00088b1280 unix:trap+db3 ()
ffffff00088b1290 unix:cmntrap+e6 ()
ffffff00088b13d0 zfs:zap_leaf_lookup_closest+45 ()
ffffff00088b1470 zfs:fzap_cursor_retrieve+bb ()
ffffff00088b1510 zfs:zap_cursor_retrieve+11e ()
ffffff00088b1700 zfs:zfs_purgedir+67 ()
ffffff00088b1750 zfs:zfs_rmnode+202 ()
ffffff00088b1790 zfs:zfs_zinactive+e8 ()
ffffff00088b17f0 zfs:zfs_inactive+75 ()
ffffff00088b1850 genunix:fop_inactive+76 ()
ffffff00088b1880 genunix:vn_rele+82 ()
ffffff00088b1a70 zfs:zfs_unlinked_drain+aa ()
ffffff00088b1ab0 zfs:zfsvfs_setup+e8 ()
ffffff00088b1b10 zfs:zfs_domount+131 ()
ffffff00088b1c40 zfs:zfs_mount+24f ()
ffffff00088b1c70 genunix:fsop_mount+1e ()
ffffff00088b1df0 genunix:domount+86b ()
ffffff00088b1e80 genunix:mount+167 ()
ffffff00088b1ec0 genunix:syscall_ap+94 ()
ffffff00088b1f10 unix:brand_sys_sysenter+1c9 ()

syncing file systems... done
dumping to /dev/zvol/dsk/rpool/dump, offset 65536, content: kernel
> ::quit
I do not know what is causing the problem and I would like to find out. I still have the dump files; if anyone is interested in analysing the issue, please let me know and I can share them.
Rsyncing the data out, re-creating the filesystem, and rsyncing the data back works: the re-created filesystem can be auto-mounted without any problem.
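That workaround, roughly (a sketch; the /backup staging path and the default mountpoints are assumptions, not my exact invocation):

zfs mount -o ro tankA/datashares/data01/DS_SG_BE001                   # safe read-only mount
rsync -a /tankA/datashares/data01/DS_SG_BE001/ /backup/DS_SG_BE001/   # stage the data elsewhere
zfs unmount tankA/datashares/data01/DS_SG_BE001
zfs destroy tankA/datashares/data01/DS_SG_BE001                       # drop the damaged filesystem
zfs create tankA/datashares/data01/DS_SG_BE001                        # fresh, empty filesystem
rsync -a /backup/DS_SG_BE001/ /tankA/datashares/data01/DS_SG_BE001/   # copy the data back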
Thanks & Regards.
Updated by Dan McDonald about 5 years ago
This looks an awful lot like #5833. Perhaps this is a duplicate?