
Bug #6510

Kernel panic while trying to automount filesystem

Added by Lawrence Giam almost 5 years ago. Updated over 4 years ago.

Status:
New
Priority:
High
Assignee:
-
Category:
-
Start date:
2015-12-14
Due date:
% Done:
0%

Estimated time:
Difficulty:
Medium
Tags:
needs-triage
Gerrit CR:

Description

Hi All,

I have an old server running OpenIndiana 151a8 whose OS I had to upgrade to OmniOS R151014. This server was used as a zfs recv target for my production server, which runs OmniOS R151014. Before starting the upgrade, I exported the data pool on this server and then proceeded to install the new OS. After the installation was complete, I tried to import the exported pool back, but I hit a kernel panic, which caused the server to core dump and reboot. After the server booted, it core dumped again when it reached a certain dataset. I tried going back to the original OS disk running OpenIndiana 151a8, and the same thing happened again.

I moved the original set of disks to another server running OmniOS R151014 and tried importing the pool there, with the same result. Next I booted into single-user mode and imported with the -N option, and the server did not core dump or reboot. However, as soon as I exit single-user mode, the system tries to auto-mount the filesystems and the problem surfaces again.
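For reference, the single-user workaround above amounts to something like the following. This is only a sketch, using the pool name tankA from this report; the `run` wrapper records and echoes each command instead of executing it, so the sketch is safe to run anywhere. Drop the wrapper to run the commands for real on the affected host.

```shell
#!/bin/sh
# Dry-run sketch of the single-user-mode import workaround.
CMDS=""
run() { CMDS="$CMDS$*; "; echo "+ $*"; }

run zpool import -N tankA               # -N imports the pool without mounting any dataset
run zfs list -r -o name,mounted tankA   # inspect what would normally be auto-mounted
run zfs mount tankA/datashares          # datasets can then be mounted selectively,
                                        # skipping the ones that trigger the panic
```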

My filesystem structure is as follows:

tankA
tankA/datashares
tankA/datashares/data01
tankA/datashares/data01/DS_SG_ITCC
tankA/datashares/data01/DS_SG_BE001   <--- This one has problem
tankA/datashares/data01/DS_SG_ADMIN
...

I have tried moving the problematic dataset to another filesystem under the same pool (e.g. tankA/datashares/data04/), and that also caused the server to core dump and reboot. Even worse, booting into single-user mode and mounting the dataset read/write causes the server to core dump and reboot again. Mounting the dataset in read-only mode is safe and stable, and that is how I managed to copy the data out of that dataset.
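The read-only rescue path can be sketched as follows (dataset name taken from the listing above; again the `run` wrapper echoes instead of executing, so this is a dry run):

```shell
#!/bin/sh
# Dry-run sketch: mount the damaged dataset read-only to copy data out.
# A read/write mount panics the box; read-only is reported stable.
CMDS=""
run() { CMDS="$CMDS$*; "; echo "+ $*"; }

run zfs set readonly=on tankA/datashares/data01/DS_SG_BE001
run zfs mount tankA/datashares/data01/DS_SG_BE001
# ...then copy the data out (cp -rp, rsync, etc.) before recreating the dataset.
```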

To narrow the problem down to the affected datasets in this pool, I had to zfs send and recv the datasets one by one. I managed to isolate three datasets causing the issue. I have also tried taking a snapshot of the snapshot and transferring it, and cloning a snapshot and then transferring that; in every case the result is the same: the receiving server core dumps and reboots when it tries to auto-mount the dataset.
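The dataset-by-dataset bisection described above looks roughly like this. The test-server hostname sgtestnas is from this report, but the receiving pool name tankB and the snapshot name @xfer are hypothetical placeholders; the `run` wrapper echoes instead of executing:

```shell
#!/bin/sh
# Dry-run sketch: send datasets one at a time to a test box; the
# receiver panics on auto-mount only for the damaged datasets.
CMDS=""
run() { CMDS="$CMDS$*; "; echo "+ $*"; }

for ds in DS_SG_ITCC DS_SG_BE001 DS_SG_ADMIN; do
    run zfs snapshot tankA/datashares/data01/$ds@xfer
    run "zfs send tankA/datashares/data01/$ds@xfer | ssh sgtestnas zfs recv tankB/$ds"
done
```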

My boss helped reproduce the problem on a test server running the same OS and managed to capture the core dump details as follows:

root@sgtestnas:/export/home/phing# savecore -f /dev/zvol/dsk/rpool/dump

    root@sgtestnas:/export/home/phing# cd /var/crash/unknown
    root@sgtestnas:/var/crash/unknown# savecore -f vmdump.0
    savecore: System dump time: Sat Dec 12 09:52:48 2015
    savecore: saving system crash dump in /var/crash/unknown/{unix,vmcore}.0

    Constructing namelist /var/crash/unknown/unix.0
    Constructing corefile /var/crash/unknown/vmcore.0
     0:01 100% done: 83712 of 83712 pages saved

    root@sgtestnas:/var/crash/unknown# mdb unix.0 vmcore.0 

    Loading modules: [ unix genunix specfs dtrace mac
      cpu.generic uppc pcplusmp scsi_vhci zfs mpt sd ip hook neti sockfs
      arp usba uhci s1394 stmf stmf_sbd md lofs random idm cpc crypto
      kvm ufs logindmux nsmb ptm smbsrv nfs ]

    > ::status

      debugging crash dump vmcore.0 (64-bit) from sgtestnas
      operating system: 5.11 omnios-8322307 (i86pc)
      image uuid: dcffa3b0-d0b1-c9ed-a5e6-f4c4b76581b7
      panic message:
      BAD TRAP: type=e (#pf Page fault) rp=ffffff00088b1290 addr=20 occurred in module "zfs" due to a NULL pointer dereference

      dump content: kernel pages only

      > ::stack

    zap_leaf_lookup_closest+0x45(ffffff01d5fc7c88, 0, 0, ffffff00088b13f0)
    fzap_cursor_retrieve+0xbb(ffffff01d5fc7c88, ffffff00088b1650, ffffff00088b1530)
    zap_cursor_retrieve+0x11e(ffffff00088b1650, ffffff00088b1530)
    zfs_purgedir+0x67(ffffff01d5fb9ad0)
    zfs_rmnode+0x202(ffffff01d5fb9ad0)
    zfs_zinactive+0xe8(ffffff01d5fb9ad0)
    zfs_inactive+0x75(ffffff01d5fc1440, ffffff01cd03a178, 0)
    fop_inactive+0x76(ffffff01d5fc1440, ffffff01cd03a178, 0)
    vn_rele+0x82(ffffff01d5fc1440)
    zfs_unlinked_drain+0xaa(ffffff01cfb4b800)
    zfsvfs_setup+0xe8(ffffff01cfb4b800, 1)
    zfs_domount+0x131(ffffff01d4d12b70, ffffff01d364bb40)
    zfs_mount+0x24f(ffffff01d4d12b70, ffffff01ce60aa00, ffffff00088b1e00, ffffff01cd03a178)
    fsop_mount+0x1e(ffffff01d4d12b70, ffffff01ce60aa00, ffffff00088b1e00, ffffff01cd03a178)
    domount+0x86b(0, ffffff00088b1e00, ffffff01ce60aa00, ffffff01cd03a178, ffffff00088b1e40)
    mount+0x167(ffffff01ce384af8, ffffff00088b1e90)
    syscall_ap+0x94()
    _sys_sysenter_post_swapgs+0x149()

      > ::msgbuf

      MESSAGE                                                              

      sd4 at mpt0: unit-address 4,0: target 4 lun 0
      sd4 is /pci@0,0/pci15ad,1976@10/sd@4,0
      /pci@0,0/pci15ad,1976@10/sd@4,0 (sd4) online

      This Solaris instance has UUID dcffa3b0-d0b1-c9ed-a5e6-f4c4b76581b7

      dump on /dev/zvol/dsk/rpool/dump size 2048 MB
      dumping to /dev/zvol/dsk/rpool/dump, offset 65536, content: kernel

      100% done: 77280 pages dumped, dump succeeded

      pcplusmp: fdc (fdc) instance 0 irq 0x6 vector 0x43 ioapic 0x2 intin 0x6 is bound to cpu 1

      ISA-device: fdc0
      fd0 at fdc0
      fd0 is /pci@0,0/isa@7/fdc@1,3f0/fd@0,0

      pcplusmp: lp (ecpp) instance 0 irq 0x7 vector 0x44 ioapic 0x2 intin 0x7 is bound to cpu 0

      ISA-device: ecpp0
      ecpp0 is /pci@0,0/isa@7/lp@1,378

      pcplusmp: asy (asy) instance 1 irq 0x3 vector 0xb0 ioapic 0x2 intin 0x3 is bound to cpu 1

      ISA-device: asy1
      asy1 is /pci@0,0/isa@7/asy@1,2f8

      pcplusmp: asy (asy) instance 0 irq 0x4 vector 0xb1 ioapic 0x2 intin 0x4 is bound to cpu 0

      ISA-device: asy0
      asy0 is /pci@0,0/isa@7/asy@1,3f8
      pseudo-device: ucode0
      ucode0 is /pseudo/ucode@0
      iscsi0 at root
      iscsi0 is /iscsi
      acpinex: L1M0@16, acpinex2
      acpinex2 is /fw/sb@0/L1M0@16
      acpinex: L0M0@32, acpinex3
      acpinex3 is /fw/sb@0/L1M0@16/L0M0@32
      ISA-device: pit_beep0
      pit_beep0 is /pci@0,0/isa@7/pit_beep
      pseudo-device: pseudo1
      pseudo1 is /pseudo/zconsnex@1
      pcplusmp: ide (ata) instance 0 irq 0xe vector 0x45 ioapic 0x2 intin 0xe is bound to cpu 1

      pcplusmp: ide (ata) instance 1 irq 0xf vector 0x45 ioapic 0x2 intin 0xf is bound to cpu 0

              ATAPI device at targ 0, lun 0 lastlun 0x0
              model VMware Virtual IDE CDROM Drive
              ATA/ATAPI-4 supported, majver 0x1e minver 0x17

      PCI Express-device: ide@1, ata1       
      ata1 is /pci@0,0/pci-ide@7,1/ide@1
              UltraDMA mode 2 selected

      sd5 at ata1: target 0 lun 0
      sd5 is /pci@0,0/pci-ide@7,1/ide@1/sd@0,0
      pseudo-device: dcpc0
      dcpc0 is /pseudo/dcpc@0
      pseudo-device: dtrace0
      dtrace0 is /pseudo/dtrace@0
      pseudo-device: fasttrap0
      fasttrap0 is /pseudo/fasttrap@0
      pseudo-device: fbt0
      fbt0 is /pseudo/fbt@0
      pseudo-device: fcp0
      fcp0 is /pseudo/fcp@0
      pseudo-device: fcsm0
      fcsm0 is /pseudo/fcsm@0
      pseudo-device: fct0
      fct0 is /pseudo/fct@0

      WARNING: kvm: no hardware support

      pseudo-device: llc10
      llc10 is /pseudo/llc1@0
      pseudo-device: lockstat0              
      lockstat0 is /pseudo/lockstat@0
      pseudo-device: lofi0
      lofi0 is /pseudo/lofi@0
      pseudo-device: power0
      power0 is /pseudo/power@0
      pseudo-device: profile0
      profile0 is /pseudo/profile@0
      pseudo-device: ramdisk1024
      ramdisk1024 is /pseudo/ramdisk@1024
      pseudo-device: sdt0
      sdt0 is /pseudo/sdt@0
      pseudo-device: srn0
      srn0 is /pseudo/srn@0
      pseudo-device: stmf0
      stmf0 is /pseudo/stmf@0
      pseudo-device: systrace0
      systrace0 is /pseudo/systrace@0
      pseudo-device: bpf0
      bpf0 is /pseudo/bpf@0
      pseudo-device: fssnap0
      fssnap0 is /pseudo/fssnap@0
      IP Filter: v4.1.9, running.
      pseudo-device: nsmb0                  
      nsmb0 is /pseudo/nsmb@0
      pseudo-device: pm0
      pm0 is /pseudo/pm@0
      pseudo-device: pool0
      pool0 is /pseudo/pool@0

      panic[cpu0]/thread=ffffff01cfa42840:
      BAD TRAP: type=e (#pf Page fault) rp=ffffff00088b1290 addr=20 occurred in module "zfs" due to a NULL pointer dereference

      zfs:
      #pf Page fault
      Bad kernel fault at addr=0x20
      pid=373, pc=0xfffffffff7a11e95, sp=0xffffff00088b1380, eflags=0x10202
      cr0: 8005003b<pg,wp,ne,et,ts,mp,pe> cr4: 406b8<osxsav,xmme,fxsr,pge,pae,pse,de>
      cr2: 20 cr3: 13fd16000 cr8: c
              rdi: ffffff01d5fc7c88 rsi:                0 rdx:                8
              rcx:         d5fad1c3  r8: ffffff00088b13f0  r9: ffffff00088b11c8
              rax:         d5fad1c8 rbx: ffffff00088b1650 rbp: ffffff00088b13d0
              r10: fffffffffb854358 r11:                0 r12:                8
              r13:                0 r14: ffffff00088b13f0 r15: ffffff00088b1530
              fsb:                0 gsb: fffffffffbc30c40  ds:               4b
               es:               4b  fs:                0  gs:              1c3
              trp:                e err:                0 rip: fffffffff7a11e95
               cs:               30 rfl:            10202 rsp: ffffff00088b1380
               ss:               38

      ffffff00088b1170 unix:die+df ()
      ffffff00088b1280 unix:trap+db3 ()
      ffffff00088b1290 unix:cmntrap+e6 ()
      ffffff00088b13d0 zfs:zap_leaf_lookup_closest+45 ()
      ffffff00088b1470 zfs:fzap_cursor_retrieve+bb ()
      ffffff00088b1510 zfs:zap_cursor_retrieve+11e ()
      ffffff00088b1700 zfs:zfs_purgedir+67 ()
      ffffff00088b1750 zfs:zfs_rmnode+202 ()
      ffffff00088b1790 zfs:zfs_zinactive+e8 ()
      ffffff00088b17f0 zfs:zfs_inactive+75 ()
      ffffff00088b1850 genunix:fop_inactive+76 ()
      ffffff00088b1880 genunix:vn_rele+82 ()
      ffffff00088b1a70 zfs:zfs_unlinked_drain+aa ()
      ffffff00088b1ab0 zfs:zfsvfs_setup+e8 ()
      ffffff00088b1b10 zfs:zfs_domount+131 ()
      ffffff00088b1c40 zfs:zfs_mount+24f ()
      ffffff00088b1c70 genunix:fsop_mount+1e ()
      ffffff00088b1df0 genunix:domount+86b ()
      ffffff00088b1e80 genunix:mount+167 ()
      ffffff00088b1ec0 genunix:syscall_ap+94 ()
      ffffff00088b1f10 unix:brand_sys_sysenter+1c9 ()

      syncing file systems...
       done

      dumping to /dev/zvol/dsk/rpool/dump, offset 65536, content: kernel
    > ::quit

I do not know what is causing the problem and would like to find out. I still have the dump files; if anyone is interested in analysing the issue, please let me know and I can share them with you.

Rsyncing the data out, re-creating the filesystem, and rsyncing the data back works: the re-created filesystem can be auto-mounted without any problem.
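This file-level workaround can be sketched as follows. The staging path /rescue is hypothetical, and the mountpoint is assumed to follow the default dataset path; as before, the `run` wrapper echoes instead of executing:

```shell
#!/bin/sh
# Dry-run sketch: copy data out at file level, recreate the dataset,
# and copy the data back in.
CMDS=""
run() { CMDS="$CMDS$*; "; echo "+ $*"; }

SRC=tankA/datashares/data01/DS_SG_BE001   # problematic dataset (mounted read-only)
run rsync -a /tankA/datashares/data01/DS_SG_BE001/ /rescue/DS_SG_BE001/
run zfs destroy -r $SRC
run zfs create $SRC                       # a fresh dataset auto-mounts cleanly
run rsync -a /rescue/DS_SG_BE001/ /tankA/datashares/data01/DS_SG_BE001/
```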

Thanks & Regards.

History

#1

Updated by Dan McDonald over 4 years ago

This looks an awful lot like #5833. Perhaps this is a duplicate?
