Bug #8565

zpool dumps core trying to destroy or export an unavail/faulted pool

Added by Igor Kozhukhov 3 months ago. Updated about 1 month ago.

Status: New
Start date: 2017-08-07
Priority: Normal
Due date:
Assignee: -
% Done: 0%
Category: zfs - Zettabyte File System
Target version: -
Difficulty: Medium
Tags: needs-triage

Description

DilOS with the latest merged changes from illumos.
I'm using zfs/zpool with 64-bit builds.

create a zpool
remove the drives - the pool faults
try to destroy the zpool - it fails and dumps core

root@con3:~# mdb core
Loading modules: [ libumem.so.1 libc.so.1 libavl.so.1 libnvpair.so.1 libuutil.so.1 ld.so.1 ]
> $C
ffffbf7fffdfb8e0 libumem.so.1`free+0xe()
ffffbf7fffdfb900 libshare.so.1`sa_init_arg+0xb(4, ffffbf7fffdfb9d0)
ffffbf7fffdfb950 libzfs.so.1`zfs_init_libshare_impl+0xac(4be550, 4, ffffbf7fffdfb9d0)
ffffbf7fffdfb980 libzfs.so.1`zfs_init_libshare_arg+0x10(4be550, 4, ffffbf7fffdfb9d0)
ffffbf7fffdfba50 libzfs.so.1`zpool_disable_datasets+0x1af(4c1dd0, 1)
ffffbf7fffdfba90 zpool_do_destroy+0xc5(3, ffffbf7fffdffb40)
ffffbf7fffdffae0 main+0x15e(4, ffffbf7fffdffb38)
ffffbf7fffdffb10 _start_crt+0x83()
ffffbf7fffdffb20 _start+0x18()
> 

Probably related to this change:
https://github.com/openzfs/openzfs/commit/d8eca23abd748aa0da443ec0816009f71ac10870

specifically the free() calls at lines 1055 and 1057.
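
For context, the traces point at free() being called on array slots that were never initialized: an error partway through share enumeration leaves some entries of a path array unset, and the cleanup code free()s every slot anyway (the 0xbaddcafe argument to process_free() in the trace in comment #2 below is libumem's fill pattern for uninitialized memory). A minimal standalone sketch of that failure mode, purely hypothetical and not the actual libshare code:

#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical, simplified sketch of the suspected failure mode, not
 * the actual libshare code.  The enumerator bails out partway through
 * on error, leaving the remaining paths[] slots uninitialized; the
 * cleanup loop then free()s every slot, handing garbage pointers to
 * free(), which is what libumem aborts on.
 */
static int
get_share_paths(char **paths, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        if (i == 1)             /* simulate a mid-loop system error */
            return (-1);        /* slots i..count-1 never filled in */
        paths[i] = strdup("/data/fs");
    }
    return (0);
}

int
main(void)
{
    size_t count = 4;
    /* malloc(), not calloc(): unfilled slots contain heap junk */
    char **paths = malloc(count * sizeof (char *));

    (void) get_share_paths(paths, count);

    for (size_t i = 0; i < count; i++)
        free(paths[i]);         /* free() of uninitialized pointers */
    free(paths);
    return (0);
}

Run under libumem, a free() of such an uninitialized slot typically ends up aborting in process_free(), matching the traces above.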

Example of a core dump from 'zpool export':

root@con1:~# mdb core 
Loading modules: [ libumem.so.1 libc.so.1 libavl.so.1 libnvpair.so.1 libuutil.so.1 ld.so.1 ]
> $C
ffffbf7fffdfb210 libc.so.1`_lwp_kill+0xa()
ffffbf7fffdfb240 libc.so.1`raise+0x1e(6)
ffffbf7fffdfb250 libumem.so.1`umem_do_abort+0x44()
ffffbf7fffdfb350 0xffffbf7fff0288f9()
ffffbf7fffdfb3a0 libumem.so.1`process_free+0xa5(482a58, 1, 0)
ffffbf7fffdfb3c0 libumem.so.1`umem_malloc_free+0x1a(482a58)
ffffbf7fffdfb8d0 libshare.so.1`sa_init_impl+0x44b(4, ffffbf7fffdfb9c0)
ffffbf7fffdfb8f0 libshare.so.1`sa_init_arg+0xb(4, ffffbf7fffdfb9c0)
ffffbf7fffdfb940 libzfs.so.1`zfs_init_libshare_impl+0xac(484550, 4, ffffbf7fffdfb9c0)
ffffbf7fffdfb970 libzfs.so.1`zfs_init_libshare_arg+0x10(484550, 4, ffffbf7fffdfb9c0)
ffffbf7fffdfba40 libzfs.so.1`zpool_disable_datasets+0x1af(487dd0, 0)
ffffbf7fffdfbaa0 zpool_do_export+0xbb(2, ffffbf7fffdffb50)
ffffbf7fffdffaf0 main+0x15e(3, ffffbf7fffdffb48)
ffffbf7fffdffb20 _start_crt+0x83()
ffffbf7fffdffb30 _start+0x18()
>  

root@con1:~# pstack core 
core 'core' of 1716:    zpool export tstpool3
 ffffbf7fff2ab1fa _lwp_kill () + a
 ffffbf7fff240d6e raise (6) + 1e
 ffffbf7fff0286c4 umem_do_abort () + 44
 ffffbf7fff0288f9 ???????? ()
 ffffbf7fff02b525 process_free (482a58, 1, 0) + a5
 ffffbf7fff02b66a umem_malloc_free (482a58) + 1a
 ffffbf7ffe3fefeb sa_init_impl (4, ffffbf7fffdfb9c0) + 44b
 ffffbf7ffe3ff17b sa_init_arg (4, ffffbf7fffdfb9c0) + b
 ffffbf7ffe4b8d0c zfs_init_libshare_impl (484550, 4, ffffbf7fffdfb9c0) + ac
 ffffbf7ffe4b8d60 zfs_init_libshare_arg (484550, 4, ffffbf7fffdfb9c0) + 10
 ffffbf7ffe4b9bff zpool_disable_datasets (487dd0, 0) + 1af
 000000000040a0fb zpool_do_export (2, ffffbf7fffdffb50) + bb
 0000000000411f2e main (3, ffffbf7fffdffb48) + 15e
 00000000004084c3 _start_crt () + 83
 0000000000408428 _start () + 18

History

#1 Updated by Igor Kozhukhov 3 months ago

  • Description updated (diff)

#2 Updated by Yuri Pankov 3 months ago

Yeah, it's reproducible; here's a slightly better trace:

# zpool status
  pool: data
 state: UNAVAIL
status: One or more devices has been removed by the administrator.
        Sufficient replicas exist for the pool to continue functioning in a
        degraded state.
action: Online the device using 'zpool online' or replace the device with
        'zpool replace'.
  scan: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        data        UNAVAIL      0     0     0  insufficient replicas
          c4t0d0    REMOVED      0     0     0

# zpool destroy data
Segmentation Fault (core dumped)
# mdb /var/cores/core.zpool.100653.1502125643
Loading modules: [ libumem.so.1 libc.so.1 libtopo.so.1 libavl.so.1 libnvpair.so.1 libuutil.so.1 ld.so.1 ]
> $C
08043588 libumem.so.1`process_free+0x22(baddcafe, 1, 0, fec58000)
080435a8 libumem.so.1`umem_malloc_free+0x1a(baddcafe, 815eaa0, 81993c8, 1, 0, 0)
08043a88 libshare.so.1`sa_init_impl+0x576(4, 8043b44, 8174f98, fedbc172)
08043aa8 libshare.so.1`sa_init_arg+0x11(4, 8043b44, 400, fedbc1bc, 0, fede0000)
08043ad8 libzfs.so.1`zfs_init_libshare_impl+0x5d(81410c8, 4, 8043b44, 8043b44)
08043af8 libzfs.so.1`zfs_init_libshare_arg+0x14(81410c8, 4, 8043b44, 80528f2, fef70548, fef70548)
08043b78 libzfs.so.1`zpool_disable_datasets+0x1ee(8143dc8, 0, 806730c, 8055344)
08043bb8 zpool_do_destroy+0x140(2, 8047c48, 80787c0, 801, 0, 0)
08047c08 main+0x12c(feed8147, fef53328, 8047c38, 80551a3, 3, 8047c44)
08047c38 _start+0x83(3, 8047d48, 8047d4e, 8047d56, 0, 8047d5b)
>

#3 Updated by Yuri Pankov 3 months ago

  • Category set to zfs - Zettabyte File System
  • Subject changed from zpool destroy produce segmentation fault to zpool dumps core trying to destroy unavail/faulted pool

#4 Updated by Igor Kozhukhov 3 months ago

  • Description updated (diff)

#5 Updated by Matthew Ahrens 3 months ago

It looks like sa_get_one_zfs_share() doesn't fill in all of paths[] when it gets an SA_SYSTEM_ERR error.
We probably need to free all the paths and set `*path_len = 0` in the error case.
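
A minimal sketch of the kind of error-path cleanup being suggested (the function names, signature, and error-code values below are illustrative assumptions, not the actual libshare interface):

#include <stdlib.h>
#include <string.h>

/*
 * Illustrative sketch only: the function names, signature, and
 * error-code values are assumptions, not the actual libshare
 * interface.  The point is the error-path contract: on failure, free
 * everything collected so far and leave *path_len at 0, so the caller
 * never free()s slots that were never filled in.
 */
#define SA_OK           0
#define SA_SYSTEM_ERR   7       /* placeholder value */

static char *next_share_path(size_t);   /* hypothetical enumerator */

static int
collect_share_paths(char **paths, size_t *path_len, size_t max)
{
    *path_len = 0;
    for (size_t i = 0; i < max; i++) {
        char *p = next_share_path(i);
        if (p == NULL) {
            /* error case: unwind and report zero valid entries */
            while (*path_len > 0) {
                free(paths[--(*path_len)]);
                paths[*path_len] = NULL;
            }
            return (SA_SYSTEM_ERR);
        }
        paths[i] = p;
        (*path_len)++;
    }
    return (SA_OK);
}

/* stub that fails partway through, like the unavail-pool case */
static char *
next_share_path(size_t i)
{
    return (i < 2 ? strdup("/data/fs") : NULL);
}

int
main(void)
{
    char *paths[4];
    size_t n;

    if (collect_share_paths(paths, &n, 4) != SA_OK) {
        /* n is 0 here; there is nothing left to free */
    }
    return (0);
}

Unwinding through *path_len keeps the reported count and the freed slots in sync, so a caller that checks the return value never free()s entries that were not filled in.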

Thanks for reporting this, Igor. We'll get on it.

#6 Updated by Igor Kozhukhov about 1 month ago

  • Description updated (diff)
  • Subject changed from zpool dumps core trying to destroy unavail/faulted pool to zpool dumps core trying to destroy or export an unavail/faulted pool
