Bug #9681
closedztest failure in spa_history_log_internal due to spa_rename()
100%
Description
A recent test failed while running ztest, and produced the following result:
05/01 10:25:22 ztest -VVVVV -m 2 -r 0 -R 1 -v 2 -a 9 -T 43 -P 11 -s 128m -f /var/tmp/os-ztestzloop finished, 1 crashes found
running: cat ztest.cores
- core @ ./zloop-180501-093007/core:
debugging core file of ztest (64-bit) from zloop-2472-bc7947b1-0.dcenter
file: /usr/bin/amd64/ztest
initial argv: /usr/bin/amd64/ztest
threading model: native threads
status: process terminated by SIGABRT (Abort), pid=7585 uid=1025 code=-1
libc.so.1`_lwp_kill+0xa()
libc.so.1`raise+0x20(6)
libumem.so.1`umem_do_abort+0x44()
0xfffffd7ffe1583f5()
libumem.so.1`umem_error+0x1aa(7, 4599e8, 4a2feb0)
libumem.so.1`umem_free+0x11b(4a2feb0, 2e)
libzpool.so.1`log_internal+0xd2(4656070, fffffd7ff9bcd2d0, 1d42000, e93c700,
fffffd7ff9bd280e, fffffd7fd6738e20)
libzpool.so.1`spa_history_log_internal+0x162(1d42000, fffffd7ff9bcd2d0, 0,
fffffd7ff9bd280e)
libzpool.so.1`spa_async_thread+0x318(1d42000)
libc.so.1`_thrp_setup+0x8a(fffffd7ffdc0c240)
libc.so.1`_lwp_start()
It looks like vsnprintf(NULL, 0) returned a shorter length than we actually wrote. Incidentally, we should probably implement a kmem_vasprintf(), like kmem_asprintf(), and use it here rather than rolling our own.
::umem_status
Status: ready and active
Concurrency: 4
Logs: transaction=64k content=64k (inactive)
Message buffer:
umem allocator: bad free: free size (46) != alloc size (42)
buffer=4a2feb0 bufctl=4a310e0 cache: umem_alloc_48
previous transaction on buffer 4a2feb0:
thread=532 time=T-0.000005259 slab=49f0c40 cache: umem_alloc_48
libumem.so.1'umem_cache_alloc_debug+0xfd
libumem.so.1'umem_cache_alloc+0xb3
libumem.so.1'umem_alloc+0x64
libzpool.so.1'log_internal+0x97
libzpool.so.1'spa_history_log_internal+0x162
libzpool.so.1'spa_async_thread+0x318
libc.so.1'_thrp_setup+0x8a
libc.so.1'_lwp_start+0x0
umem: heap corruption detected
stack trace:
libumem.so.1'umem_err_recoverable+0xcd
libumem.so.1'umem_error+0x1aa
libumem.so.1'umem_free+0x11b
libzpool.so.1'log_internal+0xd2
libzpool.so.1'spa_history_log_internal+0x162
libzpool.so.1'spa_async_thread+0x318
libc.so.1'_thrp_setup+0x8a
libc.so.1'_lwp_start+0x0
fffffd7ff9bd280e/s
0xfffffd7ff9bd280e: pool '%s' size: %llu(+%llu)
4a2feb0/s
0x4a2feb0: pool 'ztest_tmp' size: 2281701376(+134217728)
the calling code is:
uint64_t old_space, new_space;
...
spa_history_log_internal(spa, "vdev online", NULL,
"pool '%s' size: %llu(+%llu)",
spa_name(spa), new_space, new_space - old_space);
It looks like spa_name() changed while we were running, between the two calls to vsnprintf(). It looks like this can happen when ztest_spa_rename() calls spa_rename(). I'm not sure what the point of this is, since spa_rename() appears to be dead code, aside from the test case. The obvious way to fix this would be for everyone using spa_name to hold the spa_namespace_lock, but that is probably impractical. We should consider removing spa_rename() instead.
Updated by Electric Monk over 3 years ago
- Status changed from New to Closed
- % Done changed from 0 to 100
git commit 6aee0ad76969eb0027131b3a338f2d94ae86f728
commit 6aee0ad76969eb0027131b3a338f2d94ae86f728 Author: Matthew Ahrens <mahrens@delphix.com> Date: 2018-10-04T14:29:45.000Z 9681 ztest failure in spa_history_log_internal due to spa_rename() Reviewed by: Prakash Surya <prakash.surya@delphix.com> Reviewed by: Serapheim Dimitropoulos <serapheim.dimitro@delphix.com> Approved by: Robert Mustacchi <rm@joyent.com>