Bug #9682

page fault in dsl_async_clone_destroy() while opening pool

Added by Brad Lewis about 1 year ago. Updated 12 months ago.

Status: Closed
Priority: Normal
Assignee:
Category: -
Start date: 2018-07-27
Due date:
% Done: 100%
Estimated time:
Difficulty: Medium
Tags: needs-triage

Description

While running the ZFS test suite for an unrelated change, I observed a crash. The problem is that when the pool is being opened, spa_load_impl() calls dmu_objset_find(dsl_destroy_inconsistent) before it calls spa_spawn_aux_threads(). dsl_destroy_inconsistent() may destroy a clone. That destroy runs as a sync task, which calls dsl_async_clone_destroy(), which in turn calls zthr_wakeup(spa->spa_livelist_delete_zthr). But spa_livelist_delete_zthr has not been initialized yet (it is only initialized by the later call to spa_spawn_aux_threads()), so the wakeup dereferences a NULL pointer.

> ::status
debugging crash dump vmcore.0 (64-bit) from
operating system: 5.11
origin/os-platform/dlpx-os-gate/___master___/no-check/mahrens/devops-gate/master/zfs-precommit/zap_prefetch/7fc3634023cde2fa7188cab6c091e62bf4434c31-0-g7fc36340 (i86pc)
image uuid: 0767ca03-f223-eeca-c337-c33e6ef95a44
panic message:
BAD TRAP: type=e (#pf Page fault) rp=ffffff000df776f0 addr=8 occurred in module "unix" due to a NULL pointer dereference
dump content: kernel pages only

> ::stack
mutex_enter+0xb()
dsl_async_clone_destroy+0x1a2(ffffff03ab53a800, ffffff03ab1d5900)
dsl_destroy_head_sync_impl+0x2e1(ffffff03ab53a800, ffffff03ab1d5900)
dsl_destroy_head_sync+0x73(ffffff000e08d400, ffffff03ab1d5900)
dsl_sync_task_sync+0x10a(ffffff000e08d2b0, ffffff03ab1d5900)
dsl_pool_sync+0x3a3(ffffff0390be0d00, e8)
spa_sync_iterate_to_convergence+0xf7(ffffff038954c000, ffffff03ab1d5a00)
spa_sync+0x272(ffffff038954c000, e8)
txg_sync_thread+0x297(ffffff0390be0d00)
thread_start+8()

History

#1

Updated by Electric Monk 12 months ago

  • % Done changed from 0 to 100
  • Status changed from New to Closed

git commit ade2c82828f0dca1f46919aa1bd936ea1a5a0047

commit  ade2c82828f0dca1f46919aa1bd936ea1a5a0047
Author: Serapheim Dimitropoulos <serapheim@delphix.com>
Date:   2018-10-04T14:28:59.000Z

    9682 page fault in dsl_async_clone_destroy() while opening pool
    Reviewed by: Brad Lewis <brad.lewis@delphix.com>
    Reviewed by: Matt Ahrens <matt@delphix.com>
    Reviewed by: Sara Hartse <sara.hartse@delphix.com>
    Approved by: Robert Mustacchi <rm@joyent.com>
