Bug #3320

ZFS: kernel panic running "zfs diff" on oi_151a7

Added by Alexander Winston about 5 years ago. Updated almost 4 years ago.

Status: New
Start date: 2012-10-29
Priority: Urgent
Due date:
Assignee: -
% Done: 0%
Category: zfs - Zettabyte File System
Target version: -
Difficulty: Medium
Tags: needs-triage

Description

I got this kernel panic on oi_151a7 when running "zfs diff" between two snapshots of the same dataset. The dataset has been upgraded to filesystem version 5 on a raidz1 pool upgraded to pool version 28.

All commodity hardware. CPU is an Intel Q6600. Total RAM is 4 GiB non-ECC. Motherboard is based on the NVIDIA nForce 680i LT SLI series, with the BIOS flashed to the latest version. The raidz1 pool is made from 3 SATA HDDs. "zpool scrub" comes back clean and "zpool status" shows the pool as ONLINE with zero read, write, checksum, or data errors.

  • ::status
    debugging crash dump vmcore.0 (64-bit) from openindiana
    operating system: 5.11 oi_151a7 (i86pc)
    image uuid: (not set)
    panic message: 
    BAD TRAP: type=e (#pf Page fault) rp=ffffff0009061680 addr=20 occurred in module
     "zfs" due to a NULL pointer dereference
    dump content: kernel pages only
    
  • ::msgbuf
    panic[cpu1]/thread=ffffff01d3b048a0: 
    BAD TRAP: type=e (#pf Page fault) rp=ffffff0009061680 addr=20 occurred in module
     "zfs" due to a NULL pointer dereference
    
    zfs: 
    #pf Page fault
    Bad kernel fault at addr=0x20
    pid=2558, pc=0xfffffffff87c04e8, sp=0xffffff0009061778, eflags=0x10213
    cr0: 8005003b<pg,wp,ne,et,ts,mp,pe> cr4: 6f8<xmme,fxsr,pge,mce,pae,pse,de>
    cr2: 20
    cr3: 14b756000
    cr8: c
    
            rdi: ffffff01d4036610 rsi:                0 rdx:                0
            rcx:         d75c9393  r8:         28a36c68  r9: ffffff00090617e0
            rax:            7ffff rbx:                0 rbp: ffffff00090617c0
            r10:             ffff r11:                0 r12: ffffff01d4036610
            r13: ffffff01d4036610 r14: ffffff0009061900 r15: ffffff01cdf7ebc0
            fsb:                0 gsb: ffffff01cd5e3000  ds:               4b
             es:               4b  fs:                0  gs:              1c3
            trp:                e err:                0 rip: fffffffff87c04e8
             cs:               30 rfl:            10213 rsp: ffffff0009061778
             ss:               38
    
    ffffff0009061550 unix:die+dd ()
    ffffff0009061670 unix:trap+17db ()
    ffffff0009061680 unix:cmntrap+e6 ()
    ffffff00090617c0 zfs:zap_leaf_lookup_closest+40 ()
    ffffff0009061850 zfs:fzap_cursor_retrieve+c9 ()
    ffffff00090618e0 zfs:zap_cursor_retrieve+188 ()
    ffffff0009061990 zfs:zap_value_search+7c ()
    ffffff0009061b60 zfs:zfs_obj_to_path_impl+11d ()
    ffffff0009061be0 zfs:zfs_obj_to_stats+ae ()
    ffffff0009061c20 zfs:zfs_ioc_obj_to_stats+7b ()
    ffffff0009061cc0 zfs:zfsdev_ioctl+347 ()
    ffffff0009061d00 genunix:cdev_ioctl+45 ()
    ffffff0009061d40 specfs:spec_ioctl+5a ()
    ffffff0009061dc0 genunix:fop_ioctl+7b ()
    ffffff0009061ec0 genunix:ioctl+18e ()
    ffffff0009061f10 unix:brand_sys_sysenter+1c9 ()
    
    syncing file systems...
     done
    dumping to /dev/dsk/c4t1d0, offset 65536, content: kernel
    
  • ::stack
    zap_leaf_lookup_closest+0x40(ffffff01d4036610, 0, 0, ffffff00090617e0)
    fzap_cursor_retrieve+0xc9(ffffff01d4036610, ffffff0009061900, ffffff01cdf7ebc0)
    zap_cursor_retrieve+0x188(ffffff0009061900, ffffff01cdf7ebc0)
    zap_value_search+0x7c(ffffff01d77786c0, 1006, 1007, ffffffffffff, 
    ffffff00090619f1)
    zfs_obj_to_path_impl+0x11d(ffffff01d77786c0, 1007, ffffff01d7d35820, 
    ffffff01d3146088, ffffff01cdc5c430, 800)
    zfs_obj_to_stats+0xae(ffffff01d77786c0, 1007, ffffff01cdc5d570, ffffff01cdc5c430
    , 800)
    zfs_ioc_obj_to_stats+0x7b(ffffff01cdc5c000)
    zfsdev_ioctl+0x347(b500000000, 5a38, fe88d0c0, 100003, ffffff01cf7df5d0, 
    ffffff0009061de4)
    cdev_ioctl+0x45(b500000000, 5a38, fe88d0c0, 100003, ffffff01cf7df5d0, 
    ffffff0009061de4)
    spec_ioctl+0x5a(ffffff01cee2fa00, 5a38, fe88d0c0, 100003, ffffff01cf7df5d0, 
    ffffff0009061de4)
    fop_ioctl+0x7b(ffffff01cee2fa00, 5a38, fe88d0c0, 100003, ffffff01cf7df5d0, 
    ffffff0009061de4)
    ioctl+0x18e(3, 5a38, fe88d0c0)
    _sys_sysenter_post_swapgs+0x149()
    
  • ::panicinfo
                 cpu                1
              thread ffffff01d3b048a0
             message 
    BAD TRAP: type=e (#pf Page fault) rp=ffffff0009061680 addr=20 occurred in module
     "zfs" due to a NULL pointer dereference
                 rdi ffffff01d4036610
                 rsi                0
                 rdx                0
                 rcx         d75c9393
                  r8         28a36c68
                  r9 ffffff00090617e0
                 rax            7ffff
                 rbx                0
                 rbp ffffff00090617c0
                 r10             ffff
                 r10             ffff
                 r11                0
                 r12 ffffff01d4036610
                 r13 ffffff01d4036610
                 r14 ffffff0009061900
                 r15 ffffff01cdf7ebc0
              fsbase                0
              gsbase ffffff01cd5e3000
                  ds               4b
                  es               4b
                  fs                0
                  gs              1c3
              trapno                e
                 err                0
                 rip fffffffff87c04e8
                  cs               30
              rflags            10213
                 rsp ffffff0009061778
                  ss               38
              gdt_hi                0
              gdt_lo         700001ef
              idt_hi                0
              idt_lo         d0000fff
                 ldt                0
                task               70
                 cr0         8005003b
                 cr2               20
                 cr3        14b756000
                 cr4              6f8
    
  • ::cpuinfo -v
     ID ADDR             FLG NRUN BSPL PRI RNRN KRNRN SWITCH THREAD           PROC
      0 fffffffffbc30520  1f    0    0  -1   no    no t-1    ffffff0007a05c40
     (idle)
                           |    
                RUNNING <--+    
                  READY         
               QUIESCED         
                 EXISTS         
                 ENABLE         
    
     ID ADDR             FLG NRUN BSPL PRI RNRN KRNRN SWITCH THREAD           PROC
      1 fffffffffbc3ad20  1b    0    0  60   no    no t-0    ffffff01d3b048a0 zfs
                           |    
                RUNNING <--+    
                  READY         
                 EXISTS         
                 ENABLE         
    
     ID ADDR             FLG NRUN BSPL PRI RNRN KRNRN SWITCH THREAD           PROC
      2 ffffff01cddccac0  1f    1    0  -1   no    no t-0    ffffff0007cf6c40
     (idle)
                           |    |
                RUNNING <--+    +-->  PRI THREAD           PROC
                  READY                60 ffffff0007b19c40 sched
               QUIESCED         
                 EXISTS         
                 ENABLE         
    
     ID ADDR             FLG NRUN BSPL PRI RNRN KRNRN SWITCH THREAD           PROC
      3 ffffff01cddcb580  1f    0    0  -1   no    no t-1    ffffff0007d92c40
     (idle)
                           |    
                RUNNING <--+    
                  READY         
               QUIESCED         
                 EXISTS         
                 ENABLE         
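
A note on reading the trap above: with a NULL pointer dereference, the fault address (addr=20, cr2: 20) is usually the offset of the field that was read through the NULL base pointer. The sketch below is only an illustration and is not taken from the dump. struct mock_leaf_hdr is a hypothetical stand-in whose field names and sizes are assumed from memory of the ZAP leaf header in zap_leaf.h, so it should be checked against the actual source. Under that assumption, the 16-bit field lh_prefix_len sits at offset 0x20.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * MOCK of a ZAP leaf header. The field names and sizes are an
 * assumption for illustration and must be verified against the
 * real zap_leaf.h before drawing conclusions from them.
 */
struct mock_leaf_hdr {
	uint64_t lh_block_type;
	uint64_t lh_pad1;
	uint64_t lh_prefix;
	uint32_t lh_magic;
	uint16_t lh_nfree;
	uint16_t lh_nentries;
	uint16_t lh_prefix_len;
	uint16_t lh_freelist;
	uint8_t  lh_flags;
	uint8_t  lh_pad2[11];
};

/* The mock is laid out without padding, so it should be 48 bytes. */
static_assert(sizeof (struct mock_leaf_hdr) == 48,
    "mock header should be 48 bytes");

/* Address the CPU touches when reading a field through a base pointer. */
static uintptr_t
field_address(uintptr_t base, size_t field_offset)
{
	return (base + field_offset);
}

/* Fault address expected if lh_prefix_len is read through a NULL base. */
static uintptr_t
null_base_fault_address(void)
{
	return (field_address(0,
	    offsetof(struct mock_leaf_hdr, lh_prefix_len)));
}
```

If the real header matches this layout, a read of lh_prefix_len through a NULL leaf-contents pointer would fault at 0x20, which would be consistent with the trap. The crash dump is still needed to confirm which pointer was actually NULL.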
    
    

Related issues

Related to illumos gate - Bug #3391: ZFS: kernel panic running "zfs diff" on oi_151a5 New 2012-11-27

History

#1 Updated by Christopher Siden about 5 years ago

Are you willing/able to provide a crash dump for this failure? It's hard to tell from the information provided what caused the NULL pointer dereference.

#2 Updated by sham pavman over 4 years ago

I just tried this on my oi_151a7 system and everything seems to work fine.

light@openindiana:~/illumos$ zfs list -t snapshot
NAME                                           USED  AVAIL  REFER  MOUNTPOINT
rpool/ROOT/nightly-733@2011-09-08-13:36:54    98.0M      -  10.8G  -
rpool/ROOT/openindiana-1@install               152M      -  2.73G  -
rpool/ROOT/openindiana-1@2011-09-08-13:05:13  4.11G      -  10.7G  -
rpool/ROOT/openindiana-1@2011-09-27-07:09:02   127M      -  11.2G  -
rpool/ROOT/openindiana-1@2012-07-27-06:41:27  39.1M      -  11.3G  -
rpool/ROOT/openindiana-1@2012-09-27-06:41:44  31.0M      -  11.3G  -
rpool/ROOT/openindiana-1@2013-03-21-15:10:46  58.3M      -  12.9G  -
tank/a@first                                    19K      -    31K  -
tank/a@sec                                        0      -  31.5K  -
light@openindiana:~/illumos$ zfs diff tank/a@first tank/a@sec
Unable to obtain diffs: 
   The sys_mount privilege or diff delegated permission is needed
   to execute the diff ioctl
light@openindiana:~/illumos$ sudo zfs diff tank/a@first tank/a@sec
M       /tank/a/
+       /tank/a/myfile
light@openindiana:~/illumos$ 

Can you verify this on another independent setup and let us know the result, please?

#3 Updated by John Huang almost 4 years ago

Modify the function zap_get_leaf_byblk(zap_t *zap, uint64_t blkid, dmu_tx_t *tx, krw_t lt, zap_leaf_t **lp) in zap.c so that it checks for a NULL l_phys after the dmu_buf_get_user() call:

    l = dmu_buf_get_user(db);

    /* l->l_phys may be NULL here; release the buffer and fail with EIO */
    if ((l != NULL) && (l->l_phys == NULL)) {
        dmu_buf_rele(db, NULL);
        return (EIO);
    }
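
To make the shape of that check easier to try outside the kernel, here is a minimal user-space sketch. Everything in it is a mock: mock_dbuf, mock_leaf, mock_get_user(), mock_rele(), and mock_get_leaf() are hypothetical stand-ins for dmu_buf_t, zap_leaf_t, dmu_buf_get_user(), dmu_buf_rele(), and zap_get_leaf_byblk(), and they do not reproduce the real types or locking. It only shows the control flow: release the hold and return EIO when a cached leaf has no l_phys.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* MOCK types: simplified stand-ins, not the real kernel structures. */
struct mock_dbuf {
	void *db_user;   /* user data attached to the buffer, may be NULL */
	int   db_holds;  /* hold count, to make the release observable */
};

struct mock_leaf {
	void *l_phys;    /* leaf contents; NULL is the bad case */
};

/* Stand-in for dmu_buf_get_user(): returns the attached leaf, if any. */
static struct mock_leaf *
mock_get_user(struct mock_dbuf *db)
{
	return (db->db_user);
}

/* Stand-in for dmu_buf_rele(): drops one hold on the buffer. */
static void
mock_rele(struct mock_dbuf *db)
{
	assert(db->db_holds > 0);
	db->db_holds--;
}

/*
 * Mirrors the proposed check: if a cached leaf exists but its l_phys
 * is NULL, release the buffer and fail with EIO rather than handing
 * the leaf to a caller that would dereference l_phys.
 */
static int
mock_get_leaf(struct mock_dbuf *db, struct mock_leaf **lp)
{
	struct mock_leaf *l = mock_get_user(db);

	if (l != NULL && l->l_phys == NULL) {
		mock_rele(db);
		return (EIO);
	}
	*lp = l;
	return (0);
}
```

With this guard the caller gets an I/O error instead of a panic. The sketch says nothing about why l_phys would be NULL in the first place, which remains the open question in this report.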
