Project

General

Profile

Bug #9681

ztest failure in spa_history_log_internal due to spa_rename()

Added by Brad Lewis about 1 year ago. Updated 12 months ago.

Status:
Closed
Priority:
Normal
Assignee:
Category:
-
Start date:
2018-07-26
Due date:
% Done:

100%

Estimated time:
Difficulty:
Medium
Tags:
needs-triage

Description

A recent test failed while running ztest, and produced the following result:

05/01 10:25:22 ztest -VVVVV -m 2 -r 0 -R 1 -v 2 -a 9 -T 43 -P 11 -s 128m -f /var/tmp/os-ztest
zloop finished, 1 crashes found
running: cat ztest.cores
  • core @ ./zloop-180501-093007/core:
    debugging core file of ztest (64-bit) from zloop-2472-bc7947b1-0.dcenter
    file: /usr/bin/amd64/ztest
    initial argv: /usr/bin/amd64/ztest
    threading model: native threads
    status: process terminated by SIGABRT (Abort), pid=7585 uid=1025 code=-1
    libc.so.1`_lwp_kill+0xa()
    libc.so.1`raise+0x20(6)
    libumem.so.1`umem_do_abort+0x44()
    0xfffffd7ffe1583f5()
    libumem.so.1`umem_error+0x1aa(7, 4599e8, 4a2feb0)
    libumem.so.1`umem_free+0x11b(4a2feb0, 2e)
    libzpool.so.1`log_internal+0xd2(4656070, fffffd7ff9bcd2d0, 1d42000, e93c700,
    fffffd7ff9bd280e, fffffd7fd6738e20)
    libzpool.so.1`spa_history_log_internal+0x162(1d42000, fffffd7ff9bcd2d0, 0,
    fffffd7ff9bd280e)
    libzpool.so.1`spa_async_thread+0x318(1d42000)
    libc.so.1`_thrp_setup+0x8a(fffffd7ffdc0c240)
    libc.so.1`_lwp_start()

It looks like vsnprintf(NULL, 0) returned a shorter length than we actually wrote. Incidentally, we should probably implement a kmem_vasprintf(), like kmem_asprintf(), and use it here rather than rolling our own.

::umem_status

Status: ready and active
Concurrency: 4
Logs: transaction=64k content=64k (inactive)
Message buffer:
umem allocator: bad free: free size (46) != alloc size (42)
buffer=4a2feb0 bufctl=4a310e0 cache: umem_alloc_48
previous transaction on buffer 4a2feb0:
thread=532 time=T-0.000005259 slab=49f0c40 cache: umem_alloc_48
libumem.so.1'umem_cache_alloc_debug+0xfd
libumem.so.1'umem_cache_alloc+0xb3
libumem.so.1'umem_alloc+0x64
libzpool.so.1'log_internal+0x97
libzpool.so.1'spa_history_log_internal+0x162
libzpool.so.1'spa_async_thread+0x318
libc.so.1'_thrp_setup+0x8a
libc.so.1'_lwp_start+0x0
umem: heap corruption detected
stack trace:
libumem.so.1'umem_err_recoverable+0xcd
libumem.so.1'umem_error+0x1aa
libumem.so.1'umem_free+0x11b
libzpool.so.1'log_internal+0xd2
libzpool.so.1'spa_history_log_internal+0x162
libzpool.so.1'spa_async_thread+0x318
libc.so.1'_thrp_setup+0x8a
libc.so.1'_lwp_start+0x0

fffffd7ff9bd280e/s

0xfffffd7ff9bd280e: pool '%s' size: %llu(+%llu)

4a2feb0/s

0x4a2feb0: pool 'ztest_tmp' size: 2281701376(+134217728)

the calling code is:

uint64_t old_space, new_space;
...
spa_history_log_internal(spa, "vdev online", NULL,
"pool '%s' size: %llu(+%llu)",
spa_name(spa), new_space, new_space - old_space);

It looks like spa_name() changed while we were running, between the two calls to vsnprintf(). It looks like this can happen when ztest_spa_rename() calls spa_rename(). I'm not sure what the point of this is, since spa_rename() appears to be dead code, aside from the test case. The obvious way to fix this would be for everyone using spa_name to hold the spa_namespace_lock, but that is probably impractical. We should consider removing spa_rename() instead.

History

#1

Updated by Electric Monk 12 months ago

  • % Done changed from 0 to 100
  • Status changed from New to Closed

git commit 6aee0ad76969eb0027131b3a338f2d94ae86f728

commit  6aee0ad76969eb0027131b3a338f2d94ae86f728
Author: Matthew Ahrens <mahrens@delphix.com>
Date:   2018-10-04T14:29:45.000Z

    9681 ztest failure in spa_history_log_internal due to spa_rename()
    Reviewed by: Prakash Surya <prakash.surya@delphix.com>
    Reviewed by: Serapheim Dimitropoulos <serapheim.dimitro@delphix.com>
    Approved by: Robert Mustacchi <rm@joyent.com>

Also available in: Atom PDF