Bug #3532
Aborted zfs recv causes NULL pointer kernel panic
Description
Aborting a zfs recv operation causes the system to panic. This reproduced easily, 5 out of 5 times. The system is running illumos-gate from about 2 weeks ago.
crash dump at http://patrickdk.com/crash0.tar.gz
::status
debugging crash dump vmcore.0 (64-bit) from stor3
operating system: 5.11 oi_151a7c (i86pc)
image uuid: 5e96e3ce-73f5-e683-c532-ec021616a724
panic message: BAD TRAP: type=e (#pf Page fault) rp=ffffff0010375530 addr=0 occurred in module "unix" due to a NULL pointer dereference
dump content: kernel pages only
::stack
strncpy+0x28(ffffff02ee0e1000, 0, 2000)
dodefault+0xca(ffffff02e1bc14a0, 1, 2000, ffffff02ee0e1000)
dsl_prop_get_dd+0x1bf(ffffff02d583a000, ffffff02e1bc14a0, 1, 2000, ffffff02ee0e1000, 0)
dsl_prop_get_ds+0x152()
dsl_prop_set_sync+0x37a()
dsl_props_set_sync+0x9b(ffffff02e27da700, ffffff001089b860, ffffff02ead4b500)
dsl_sync_task_group_sync+0x146(ffffff02eb692198, ffffff02ead4b500)
dsl_pool_sync+0x443(ffffff02e0de3a00, 58898)
spa_sync+0x33f(ffffff02d5f52040, 58898)
txg_sync_thread+0x1fc(ffffff02e0de3a00)
thread_start+8()
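In the ::stack output above, strncpy is entered with 0 as its second argument (the source pointer; rsi is also 0 in the register dump) and 2000 as the length, which is consistent with the page fault at addr=0. The following is only a minimal userland sketch of that failure mode and of a defensive guard. The name copy_default_string and its return codes are hypothetical, chosen for illustration; this is not the illumos dodefault() source and not necessarily how the bug should be fixed.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not illumos code): copy a default string value
 * into buf. A NULL source is rejected instead of being handed to
 * strncpy(), which would dereference it and fault at address 0.
 */
static int
copy_default_string(char *buf, size_t buflen, const char *defval)
{
	if (buf == NULL || buflen == 0)
		return (EINVAL);

	if (defval == NULL) {
		/* No default available: return an empty string and an error. */
		buf[0] = '\0';
		return (ENOENT);
	}

	(void) strncpy(buf, defval, buflen);
	buf[buflen - 1] = '\0';	/* strncpy does not guarantee termination */
	return (0);
}
```

Whether the real problem is a missing NULL check or a property lookup that should never have reached the default path cannot be determined from the stack alone; the sketch only shows why a NULL source pointer produces this trap.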
::msgbuf
MESSAGE
open version 5000 pool aggr0 using 5000
This Solaris instance has UUID 5e96e3ce-73f5-e683-c532-ec021616a724
dump on /dev/zvol/dsk/rpool/dump size 8192 MB
pseudo-device: pm0
pm0 is /pseudo/pm@0
pseudo-device: power0
power0 is /pseudo/power@0
pseudo-device: srn0
srn0 is /pseudo/srn@0
iscsi0 at root
iscsi0 is /iscsi
fcoe0 at root
fcoe0 is /fcoe
pseudo-device: pseudo1
pseudo1 is /pseudo/zconsnex@1
pcplusmp: asy (asy) instance 0 irq 0x4 vector 0xb0 ioapic 0x8 intin 0x4 is bound to cpu 3
pcplusmp: ide (ata) instance 0 irq 0xe vector 0x43 ioapic 0x8 intin 0xe is bound to cpu 0
acpinex: sb@0, acpinex1
acpinex1 is /fw/sb@0
ATAPI device at targ 0, lun 0 lastlun 0x0
model TEAC CD-ROM CD-224E
PCI Express-device: ide@0, ata0
ata0 is /pci@0,0/pci-ide@1f,1/ide@0
ISA-device: asy0
asy0 is /pci@0,0/isa@1f/asy@1,3f8
pcplusmp: ide (ata) instance 1 irq 0xf vector 0x44 ioapic 0x8 intin 0xf is bound to cpu 1
UltraDMA mode 2 selected
sd0 at ata0: target 0 lun 0
sd0 is /pci@0,0/pci-ide@1f,1/ide@0/sd@0,0
ISA-device: pit_beep0
pit_beep0 is /pci@0,0/isa@1f/pit_beep
pseudo-device: llc10
llc10 is /pseudo/llc1@0
pseudo-device: lofi0
lofi0 is /pseudo/lofi@0
pseudo-device: ramdisk1024
ramdisk1024 is /pseudo/ramdisk@1024
pseudo-device: ucode0
ucode0 is /pseudo/ucode@0
pseudo-device: dcpc0
dcpc0 is /pseudo/dcpc@0
pseudo-device: dtrace0
dtrace0 is /pseudo/dtrace@0
pseudo-device: fasttrap0
fasttrap0 is /pseudo/fasttrap@0
pseudo-device: fbt0
fbt0 is /pseudo/fbt@0
pseudo-device: lockstat0
lockstat0 is /pseudo/lockstat@0
pseudo-device: profile0
profile0 is /pseudo/profile@0
pseudo-device: sdt0
sdt0 is /pseudo/sdt@0
pseudo-device: systrace0
systrace0 is /pseudo/systrace@0
pseudo-device: fcsm0
fcsm0 is /pseudo/fcsm@0
NOTICE: fcsm(0): attached to path /pci@0,0/pci8086,3595@2/pci8086,32a@0,2/pci1077,101@7/fp@0,0
NOTICE: fcsm(2): attached to path /pci@0,0/pci8086,3595@2/pci8086,32a@0,2/pci1077,101@7,1/fp@0,0
pseudo-device: nvidia255
nvidia255 is /pseudo/nvidia@255
pseudo-device: fcp0
fcp0 is /pseudo/fcp@0
pseudo-device: fct0
fct0 is /pseudo/fct@0
pseudo-device: stmf0
stmf0 is /pseudo/stmf@0
pseudo-device: pool0
pool0 is /pseudo/pool@0
pseudo-device: fssnap0
fssnap0 is /pseudo/fssnap@0
IP Filter: v4.1.9, running.
pseudo-device: nsmb0
nsmb0 is /pseudo/nsmb@0
pseudo-device: bpf0
bpf0 is /pseudo/bpf@0
pseudo-device: ii0
ii0 is /pseudo/ii@0
sv (revision 11.11, SunOS 5.11, None)
pseudo-device: sv0
sv0 is /pseudo/sv@0
pseudo-device: rdc0
rdc0 is /pseudo/rdc@0
pseudo-device: ncall0
ncall0 is /pseudo/ncall@0
pseudo-device: nsctl0
nsctl0 is /pseudo/nsctl@0
pseudo-device: nsctl0
nsctl0 is /pseudo/nsctl@0
pseudo-device: sdbc0
sdbc0 is /pseudo/sdbc@0
device pciclass,030000@d(display#0) keeps up device sd@0,0(sd#0), but the former is not power managed
pcplusmp: ide (ata) instance 1 irq 0xf vector 0x44 ioapic 0x8 intin 0xf is bound to cpu 2
pcplusmp: ide (ata) instance 1 irq 0xf vector 0x44 ioapic 0x8 intin 0xf is bound to cpu 3
WARNING: constraints forbid retire: /pci@0,0/pci8086,3595@2/pci8086,32a@0,2
pseudo-device: devinfo0
devinfo0 is /pseudo/devinfo@0

panic[cpu2]/thread=ffffff0010375c40:
BAD TRAP: type=e (#pf Page fault) rp=ffffff0010375530 addr=0 occurred in module "unix" due to a NULL pointer dereference

sched:
#pf Page fault
Bad kernel fault at addr=0x0
pid=0, pc=0xfffffffffb887e98, sp=0xffffff0010375620, eflags=0x10206
cr0: 8005003b<pg,wp,ne,et,ts,mp,pe> cr4: 6f8<xmme,fxsr,pge,mce,pae,pse,de>
cr2: 0
cr3: 8c00000
cr8: c

rdi: ffffff02ee0e1000 rsi: 0 rdx: 2000
rcx: ffffff02ee0e1000 r8: 3e r9: ffffff0010375438
rax: ffffff02ee0e1000 rbx: 0 rbp: ffffff0010375640
r10: 1 r11: 0 r12: ffffff02ee0e1000
r13: 2000 r14: 32 r15: ffffff02ee0e1000
fsb: 0 gsb: ffffff02d7cea080 ds: 4b
es: 4b fs: 0 gs: 1c3
trp: e err: 0 rip: fffffffffb887e98
cs: 30 rfl: 10206 rsp: ffffff0010375620
ss: 38

ffffff0010375410 unix:die+df ()
ffffff0010375520 unix:trap+db3 ()
ffffff0010375530 unix:cmntrap+e6 ()
ffffff0010375640 unix:strncpy+28 ()
ffffff0010375690 zfs:dodefault+ca ()
ffffff0010375750 zfs:dsl_prop_get_dd+1bf ()
ffffff0010375810 zfs:dsl_prop_get_ds+152 ()
ffffff00103758e0 zfs:dsl_prop_set_sync+37a ()
ffffff0010375990 zfs:dsl_props_set_sync+9b ()
ffffff00103759e0 zfs:dsl_sync_task_group_sync+146 ()
ffffff0010375aa0 zfs:dsl_pool_sync+443 ()
ffffff0010375b70 zfs:spa_sync+33f ()
ffffff0010375c20 zfs:txg_sync_thread+1fc ()
ffffff0010375c30 unix:thread_start+8 ()

syncing file systems...
done
dumping to /dev/zvol/dsk/rpool/dump, offset 65536, content: kernel
Updated by Christopher Siden over 10 years ago
- Category set to zfs - Zettabyte File System
Updated by Christopher Siden over 10 years ago
I can't reproduce this by aborting a zfs recv with Ctrl+C. Do you have more specific information about the commands you ran to cause this reliably?
Updated by Patrick Domack over 10 years ago
I'll try to work out what kinds of things cause it. In this case, I was moving a zvol from an existing location to a new location, and the zvol didn't exist on the destination yet.
From the source location:
NAME              PROPERTY              VALUE                 SOURCE
aggr0/luns2/esx3  type                  volume                -
aggr0/luns2/esx3  creation              Wed May 2 10:29 2012  -
aggr0/luns2/esx3  used                  279G                  -
aggr0/luns2/esx3  available             2.19T                 -
aggr0/luns2/esx3  referenced            215G                  -
aggr0/luns2/esx3  compressratio         1.95x                 -
aggr0/luns2/esx3  reservation           none                  default
aggr0/luns2/esx3  volsize               800G                  local
aggr0/luns2/esx3  volblocksize          16K                   -
aggr0/luns2/esx3  checksum              on                    default
aggr0/luns2/esx3  compression           lzjb                  local
aggr0/luns2/esx3  readonly              off                   default
aggr0/luns2/esx3  copies                1                     default
aggr0/luns2/esx3  refreservation        none                  default
aggr0/luns2/esx3  primarycache          all                   default
aggr0/luns2/esx3  secondarycache        all                   default
aggr0/luns2/esx3  usedbysnapshots       63.9G                 -
aggr0/luns2/esx3  usedbydataset         215G                  -
aggr0/luns2/esx3  usedbychildren        0                     -
aggr0/luns2/esx3  usedbyrefreservation  0                     -
aggr0/luns2/esx3  logbias               latency               default
aggr0/luns2/esx3  dedup                 off                   default
aggr0/luns2/esx3  mlslabel              none                  default
aggr0/luns2/esx3  sync                  standard              default
aggr0/luns2/esx3  refcompressratio      1.83x                 -
aggr0/luns2/esx3  written               5.57G                 -
And the sending command: zfs send -v -R aggr0/luns2/esx3@repsnap
And the receiving command: zfs recv -v -F -d aggr0
It might not have been a Ctrl+C on the recv side but on the sending end, causing an incomplete transmission?
I'll see if I can narrow it down.