Bug #13023

open

Changing dump device during savecore causes panic

Added by Jason King over 3 years ago.

Status: New
Priority: Normal
Assignee: -
Category: zfs - Zettabyte File System
% Done: 0%
Difficulty: Medium

Description

Discovered by Jörgen Lundman: if savecore is running and an administrator runs dumpadm to change the dump device, the system panics at zvol_dumpio+c4 due to a NULL pointer dereference (in this instance the dump device was being removed via dumpadm -d none...

Looking at zvol_dumpio+c4:

zvol_dumpio+0xb5:               call   +0x44bdf66       <list_head>
zvol_dumpio+0xba:               movq   %rax,%r14
zvol_dumpio+0xbd:               movq   0x408(%rbx),%rax
zvol_dumpio+0xc4:               imulq  0x20(%r14),%rax
zvol_dumpio+0xc9:               cmpq   %rax,%r12
zvol_dumpio+0xcc:               jb     +0x25    <zvol_dumpio+0xf3>

It appears to be part of this loop:

    /* Locate the extent this belongs to */
    ze = list_head(&zv->zv_extents);
    while (offset >= ze->ze_nblks * zv->zv_volblocksize) {
        offset -= ze->ze_nblks * zv->zv_volblocksize;
        ze = list_next(&zv->zv_extents, ze);
    }

    if (ze == NULL)
        return (SET_ERROR(EINVAL));

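Specifically, the fault is the multiply in the while() condition. %r14 holds the value returned by list_head() (that is, ze), so the imulq memory operand 0x20(%r14) is the load of ze->ze_nblks, which faults when ze is NULL.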

The problem appears to be that dumpadm -d none causes zvol_dump_fini() to be called. Among other things, this calls zvol_free_extents(), which removes the dump zvol's extents from zv->zv_extents. Neither the removal nor the walk above is done under a lock, so once the extents are removed, whether before the loop starts or while it is running, ze ends up NULL and the system panics.
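Note that the existing NULL check after the loop cannot catch this case: the loop condition reads ze->ze_nblks before that check is ever reached. The following is a minimal user-space sketch of the same walk with the NULL test folded into the loop condition. The extent_t type and locate_extent() function are hypothetical stand-ins (a singly linked list in place of the kernel's list_t, with nblks playing the role of ze_nblks), not the illumos code. On its own this only turns the NULL dereference into EINVAL; it does not stop zvol_free_extents() from freeing nodes that a concurrent walk is still reading.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for zvol_extent_t: a singly linked list node
 * instead of the kernel's list_t/list_node_t.
 */
typedef struct extent {
	struct extent *next;
	uint64_t nblks;		/* blocks in this extent (cf. ze_nblks) */
} extent_t;

/*
 * Find the extent containing byte offset *offp and make *offp relative
 * to the start of that extent. Returns NULL and sets *errp to EINVAL if
 * the list is empty (e.g. the extents were already freed) or the offset
 * lies past the last extent.
 */
extent_t *
locate_extent(extent_t *head, uint64_t volblocksize, uint64_t *offp,
    int *errp)
{
	uint64_t offset = *offp;
	extent_t *ze = head;

	assert(volblocksize != 0);

	/* Test ze before reading ze->nblks, so a NULL ze is never dereferenced. */
	while (ze != NULL && offset >= ze->nblks * volblocksize) {
		offset -= ze->nblks * volblocksize;
		ze = ze->next;
	}

	if (ze == NULL) {
		*errp = EINVAL;
		return (NULL);
	}

	*offp = offset;
	*errp = 0;
	return (ze);
}
```

With an empty list, or an offset beyond the last extent, the function returns NULL with EINVAL instead of faulting; on failure *offp is left unchanged.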

There are a few possible approaches to a fix. The simplest would be to add a NULL check to the while loop above so that it aborts and returns EINVAL. Since undumpifying a zvol appears to clear it (zvol_dump_fini() calls dmu_free_long_range() over the entire size of the zvol), this would effectively kill any savecore process that is writing out a dump when the dump volume is changed. Another idea would be to have dumpadm -d block until savecore finishes (if it is already running) or is killed. A further idea would be to defer undumpifying the zvol until the savecore process completes, though this has the disadvantage that the command returns while the zvol does not reflect its new state until savecore finishes.
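The second idea, having dumpadm -d wait for savecore, amounts to a drain: count the active readers of the dump device, refuse new ones once a change has been requested, and wait for the count to reach zero before tearing down the extents. A minimal sketch under those assumptions follows, using POSIX threads in place of the kernel's kmutex_t/kcondvar_t. The names dump_state_t, dump_reader_enter(), dump_reader_exit() and dump_retire_wait() are hypothetical, and ENXIO for refused readers is an arbitrary choice. It also assumes enter/exit are tied to savecore opening and closing the dump device, so a killed savecore releases its count when its descriptors are closed.

```c
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical per-dump-device state; these names do not exist in illumos. */
typedef struct dump_state {
	pthread_mutex_t lock;
	pthread_cond_t cv;	/* broadcast when readers drops to zero */
	unsigned readers;	/* savecore readers currently using the device */
	bool retiring;		/* set once dumpadm asks to release the device */
} dump_state_t;

#define	DUMP_STATE_INIT \
	{ PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, 0, false }

/* savecore side: register as a reader unless the device is being retired. */
int
dump_reader_enter(dump_state_t *ds)
{
	int err = 0;

	pthread_mutex_lock(&ds->lock);
	if (ds->retiring)
		err = ENXIO;	/* arbitrary choice of error */
	else
		ds->readers++;
	pthread_mutex_unlock(&ds->lock);
	return (err);
}

/* savecore side: only valid after a successful dump_reader_enter(). */
void
dump_reader_exit(dump_state_t *ds)
{
	pthread_mutex_lock(&ds->lock);
	if (--ds->readers == 0)
		pthread_cond_broadcast(&ds->cv);
	pthread_mutex_unlock(&ds->lock);
}

/*
 * dumpadm side: refuse new readers, then block until existing readers
 * have exited. On return no reader can be walking the extents, so the
 * caller may tear them down.
 */
void
dump_retire_wait(dump_state_t *ds)
{
	pthread_mutex_lock(&ds->lock);
	ds->retiring = true;
	while (ds->readers > 0)
		pthread_cond_wait(&ds->cv, &ds->lock);
	pthread_mutex_unlock(&ds->lock);
}
```

Once dump_retire_wait() returns, the caller would run the zvol_dump_fini() teardown. The third idea (deferring) could reuse the same counter, with the last dump_reader_exit() performing the teardown instead of dumpadm waiting for it.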
