Bug #3532

open

Aborted zfs recv causes NULL pointer kernel panic

Added by Patrick Domack over 10 years ago. Updated over 10 years ago.

Status: New
Priority: Normal
Assignee: -
Category: zfs - Zettabyte File System
Start date: 2013-02-06
Due date:
% Done: 0%
Estimated time:
Difficulty: Medium
Tags: needs-triage
Gerrit CR:
External Bug:

Description

Aborting a zfs recv operation causes the system to panic. I reproduced this easily, 5 out of 5 times, running an illumos-gate build from about 2 weeks ago.

crash dump at http://patrickdk.com/crash0.tar.gz

::status
debugging crash dump vmcore.0 (64-bit) from stor3
operating system: 5.11 oi_151a7c (i86pc)
image uuid: 5e96e3ce-73f5-e683-c532-ec021616a724
panic message: 
BAD TRAP: type=e (#pf Page fault) rp=ffffff0010375530 addr=0 occurred in module "unix" due to a NULL pointer dereference
dump content: kernel pages only

::stack
strncpy+0x28(ffffff02ee0e1000, 0, 2000)
dodefault+0xca(ffffff02e1bc14a0, 1, 2000, ffffff02ee0e1000)
dsl_prop_get_dd+0x1bf(ffffff02d583a000, ffffff02e1bc14a0, 1, 2000, ffffff02ee0e1000, 0)
dsl_prop_get_ds+0x152()
dsl_prop_set_sync+0x37a()
dsl_props_set_sync+0x9b(ffffff02e27da700, ffffff001089b860, ffffff02ead4b500)
dsl_sync_task_group_sync+0x146(ffffff02eb692198, ffffff02ead4b500)
dsl_pool_sync+0x443(ffffff02e0de3a00, 58898)
spa_sync+0x33f(ffffff02d5f52040, 58898)
txg_sync_thread+0x1fc(ffffff02e0de3a00)
thread_start+8()
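
Reading the stack above: the second argument to strncpy is 0, which matches rsi=0 and cr2=0 in the register dump, so the copy appears to read from a NULL source into a buffer of 2000 bytes (most likely hex, as mdb prints arguments). The following is a minimal C sketch of a defensive copy under that assumption. The helper name `copy_default_string` and its return codes are hypothetical and are not taken from the illumos source, and the sketch does not explain why the default string was NULL in the first place.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, not illumos code: copy a property's default
 * string into buf. A missing (NULL) default is treated as an empty
 * string instead of being passed to strncpy(), which would fault.
 *
 * Returns 0 if a default was copied, 1 if there was no default.
 */
static int
copy_default_string(char *buf, size_t buflen, const char *strdefault)
{
	assert(buf != NULL && buflen > 0);

	if (strdefault == NULL) {
		/* Assumption: an empty value is an acceptable fallback. */
		buf[0] = '\0';
		return (1);
	}

	(void) strncpy(buf, strdefault, buflen);
	/* strncpy() does not terminate on truncation, so do it here. */
	buf[buflen - 1] = '\0';
	return (0);
}
```

With a guard like this, a caller that passes NULL as the default gets an empty string and a return value of 1 instead of a page fault at address 0. Whether an empty value is the right fallback for the property involved here is an open question.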

::msgbuf
MESSAGE                                                               
open version 5000 pool aggr0 using 5000
This Solaris instance has UUID 5e96e3ce-73f5-e683-c532-ec021616a724
dump on /dev/zvol/dsk/rpool/dump size 8192 MB
pseudo-device: pm0
pm0 is /pseudo/pm@0
pseudo-device: power0
power0 is /pseudo/power@0
pseudo-device: srn0
srn0 is /pseudo/srn@0
iscsi0 at root
iscsi0 is /iscsi
fcoe0 at root
fcoe0 is /fcoe
pseudo-device: pseudo1
pseudo1 is /pseudo/zconsnex@1
pcplusmp: asy (asy) instance 0 irq 0x4 vector 0xb0 ioapic 0x8 intin 0x4 is bound to cpu 3
pcplusmp: ide (ata) instance 0 irq 0xe vector 0x43 ioapic 0x8 intin 0xe is bound to cpu 0
acpinex: sb@0, acpinex1
acpinex1 is /fw/sb@0
        ATAPI device at targ 0, lun 0 lastlun 0x0
        model TEAC CD-ROM CD-224E
PCI Express-device: ide@0, ata0
ata0 is /pci@0,0/pci-ide@1f,1/ide@0
ISA-device: asy0
asy0 is /pci@0,0/isa@1f/asy@1,3f8
pcplusmp: ide (ata) instance 1 irq 0xf vector 0x44 ioapic 0x8 intin 0xf is bound to cpu 1
        UltraDMA mode 2 selected
sd0 at ata0: target 0 lun 0
sd0 is /pci@0,0/pci-ide@1f,1/ide@0/sd@0,0
ISA-device: pit_beep0
pit_beep0 is /pci@0,0/isa@1f/pit_beep
pseudo-device: llc10
llc10 is /pseudo/llc1@0
pseudo-device: lofi0
lofi0 is /pseudo/lofi@0
pseudo-device: ramdisk1024
ramdisk1024 is /pseudo/ramdisk@1024
pseudo-device: ucode0
ucode0 is /pseudo/ucode@0
pseudo-device: dcpc0
dcpc0 is /pseudo/dcpc@0
pseudo-device: dtrace0
dtrace0 is /pseudo/dtrace@0
pseudo-device: fasttrap0
fasttrap0 is /pseudo/fasttrap@0
pseudo-device: fbt0
fbt0 is /pseudo/fbt@0
pseudo-device: lockstat0
lockstat0 is /pseudo/lockstat@0
pseudo-device: profile0
profile0 is /pseudo/profile@0
pseudo-device: sdt0
sdt0 is /pseudo/sdt@0
pseudo-device: systrace0
systrace0 is /pseudo/systrace@0
pseudo-device: fcsm0                  
fcsm0 is /pseudo/fcsm@0
NOTICE: fcsm(0): attached to path /pci@0,0/pci8086,3595@2/pci8086,32a@0,2/pci1077,101@7/fp@0,0
NOTICE: fcsm(2): attached to path /pci@0,0/pci8086,3595@2/pci8086,32a@0,2/pci1077,101@7,1/fp@0,0
pseudo-device: nvidia255
nvidia255 is /pseudo/nvidia@255
pseudo-device: fcp0
fcp0 is /pseudo/fcp@0
pseudo-device: fct0
fct0 is /pseudo/fct@0
pseudo-device: stmf0
stmf0 is /pseudo/stmf@0
pseudo-device: pool0
pool0 is /pseudo/pool@0
pseudo-device: fssnap0
fssnap0 is /pseudo/fssnap@0
IP Filter: v4.1.9, running.
pseudo-device: nsmb0
nsmb0 is /pseudo/nsmb@0
pseudo-device: bpf0
bpf0 is /pseudo/bpf@0
pseudo-device: ii0
ii0 is /pseudo/ii@0
sv (revision 11.11, SunOS 5.11, None)
pseudo-device: sv0
sv0 is /pseudo/sv@0
pseudo-device: rdc0
rdc0 is /pseudo/rdc@0
pseudo-device: ncall0
ncall0 is /pseudo/ncall@0
pseudo-device: nsctl0
nsctl0 is /pseudo/nsctl@0
pseudo-device: nsctl0
nsctl0 is /pseudo/nsctl@0
pseudo-device: sdbc0
sdbc0 is /pseudo/sdbc@0
device pciclass,030000@d(display#0) keeps up device sd@0,0(sd#0), but the former is not power managed
pcplusmp: ide (ata) instance 1 irq 0xf vector 0x44 ioapic 0x8 intin 0xf is bound to cpu 2
pcplusmp: ide (ata) instance 1 irq 0xf vector 0x44 ioapic 0x8 intin 0xf is bound to cpu 3
WARNING: constraints forbid retire: /pci@0,0/pci8086,3595@2/pci8086,32a@0,2
pseudo-device: devinfo0
devinfo0 is /pseudo/devinfo@0

panic[cpu2]/thread=ffffff0010375c40: 
BAD TRAP: type=e (#pf Page fault) rp=ffffff0010375530 addr=0 occurred in module "unix" due to a NULL pointer dereference

sched: 
#pf Page fault
Bad kernel fault at addr=0x0
pid=0, pc=0xfffffffffb887e98, sp=0xffffff0010375620, eflags=0x10206
cr0: 8005003b<pg,wp,ne,et,ts,mp,pe> cr4: 6f8<xmme,fxsr,pge,mce,pae,pse,de>
cr2: 0
cr3: 8c00000
cr8: c

        rdi: ffffff02ee0e1000 rsi:                0 rdx:             2000
        rcx: ffffff02ee0e1000  r8:               3e  r9: ffffff0010375438
        rax: ffffff02ee0e1000 rbx:                0 rbp: ffffff0010375640
        r10:                1 r11:                0 r12: ffffff02ee0e1000
        r13:             2000 r14:               32 r15: ffffff02ee0e1000
        fsb:                0 gsb: ffffff02d7cea080  ds:               4b
         es:               4b  fs:                0  gs:              1c3
        trp:                e err:                0 rip: fffffffffb887e98
         cs:               30 rfl:            10206 rsp: ffffff0010375620
         ss:               38

ffffff0010375410 unix:die+df ()
ffffff0010375520 unix:trap+db3 ()
ffffff0010375530 unix:cmntrap+e6 ()
ffffff0010375640 unix:strncpy+28 ()
ffffff0010375690 zfs:dodefault+ca ()
ffffff0010375750 zfs:dsl_prop_get_dd+1bf ()
ffffff0010375810 zfs:dsl_prop_get_ds+152 ()
ffffff00103758e0 zfs:dsl_prop_set_sync+37a ()
ffffff0010375990 zfs:dsl_props_set_sync+9b ()
ffffff00103759e0 zfs:dsl_sync_task_group_sync+146 ()
ffffff0010375aa0 zfs:dsl_pool_sync+443 ()
ffffff0010375b70 zfs:spa_sync+33f ()
ffffff0010375c20 zfs:txg_sync_thread+1fc ()
ffffff0010375c30 unix:thread_start+8 ()

syncing file systems...
 done
dumping to /dev/zvol/dsk/rpool/dump, offset 65536, content: kernel
Actions #1

Updated by Christopher Siden over 10 years ago

  • Category set to zfs - Zettabyte File System
Actions #2

Updated by Christopher Siden over 10 years ago

I can't reproduce this by aborting a zfs recv with Ctrl-C. Do you have more specific information about the commands you ran to trigger this reliably?

Actions #3

Updated by Patrick Domack over 10 years ago

I'll try to work out what triggers it. In this case, I was moving a zvol from its existing location to a new one, and it didn't exist on the destination yet.

From the source location:

NAME              PROPERTY              VALUE                 SOURCE
aggr0/luns2/esx3  type                  volume                -
aggr0/luns2/esx3  creation              Wed May 2 10:29 2012  -
aggr0/luns2/esx3  used                  279G                  -
aggr0/luns2/esx3  available             2.19T                 -
aggr0/luns2/esx3  referenced            215G                  -
aggr0/luns2/esx3  compressratio         1.95x                 -
aggr0/luns2/esx3  reservation           none                  default
aggr0/luns2/esx3  volsize               800G                  local
aggr0/luns2/esx3  volblocksize          16K                   -
aggr0/luns2/esx3  checksum              on                    default
aggr0/luns2/esx3  compression           lzjb                  local
aggr0/luns2/esx3  readonly              off                   default
aggr0/luns2/esx3  copies                1                     default
aggr0/luns2/esx3  refreservation        none                  default
aggr0/luns2/esx3  primarycache          all                   default
aggr0/luns2/esx3  secondarycache        all                   default
aggr0/luns2/esx3  usedbysnapshots       63.9G                 -
aggr0/luns2/esx3  usedbydataset         215G                  -
aggr0/luns2/esx3  usedbychildren        0                     -
aggr0/luns2/esx3  usedbyrefreservation  0                     -
aggr0/luns2/esx3  logbias               latency               default
aggr0/luns2/esx3  dedup                 off                   default
aggr0/luns2/esx3  mlslabel              none                  default
aggr0/luns2/esx3  sync                  standard              default
aggr0/luns2/esx3  refcompressratio      1.83x                 -
aggr0/luns2/esx3  written               5.57G                 -

And the sending command: zfs send -v -R aggr0/luns2/esx3@repsnap
And the receiving command: zfs recv -v -F -d aggr0

It might not have been a Ctrl-C on the recv side but on the sending end, causing an incomplete transmission?

I'll attempt to see if I can narrow it down.
