Bug #6091
closedavl_add doesn't assert on non-debug builds
100%
Description
Long story short, avl_add's use of ASSERT(0)
can cause really strange looking crashes on non-debug builds of libavl.so because ASSERTs
turn into no-ops.
We've seen a box where zpool import
segfaulted repeatedly. In all cases, the stacktrace was:
> $c libavl.so.1`avl_insert+0x47(8043764, 8123cb0, fede39d8, fee02000) libavl.so.1`avl_add+0x33(8043764, 8123cb0, 28, 14) libzfs.so.1`zpool_find_import_impl+0x25a(811d548, 8043c2c, fefa0518, fedca87e) libzfs.so.1`zpool_search_import+0x9f(811d548, 8043c2c, 1, ffffffff, fef70a80, 0) zpool_do_import+0x645(1, 8047d00, 8079800, 801, 0, 0) main+0x165(8047cbc, fef5f888, 8047cf0, 805581b, 2, 8047cfc) _start+0x83(2, 8047dd8, 8047dde, 0, 8047de5, 8047e03)
The very strange thing was the where
argument (3rd) to avl_insert
:
> fede39d8::whatis fede39d8 is libzfs.so.1`zfs_strdup+0x1c, in /lib/libzfs.so.1 [fedb0000,fedf2000)
This immediately sounds like some sort of memory corruption. It turns out that it's a bug (or a feature?) in avl_add
:
void avl_add(avl_tree_t *tree, void *new_node) { avl_index_t where; /* * This is unfortunate. We want to call panic() here, even for * non-DEBUG kernels. In userland, however, we can't depend on anything * in libc or else the rtld build process gets confused. So, all we can * do in userland is resort to a normal ASSERT(). */ if (avl_find(tree, new_node, &where) != NULL) #ifdef _KERNEL panic("avl_find() succeeded inside avl_add()"); #else ASSERT(0); #endif avl_insert(tree, new_node, where); }
avl_add
does an avl_find
to obtain a value of where
. If it found an item, it panics/asserts. Finally, it proceeds to avl_insert
the new item in the position indicated by where
. On non-debug builds of libavl.so, the ASSERT turns into a no op and therefore avl_add
ends up calling avl_insert
even when the tree already has a node with that key.
Recall that on non-debug builds, avl_find
does not fill in where
if a node was found. So, on non-debug builds, avl_add
ends up calling avl_insert
when the node already exists with uninitialized value of where. (This is where our zfs_strdup+0x1c
came from.) Disassembly of avl_add
confirms this:
> avl_add::dis libavl.so.1`avl_add: pushl %ebp libavl.so.1`avl_add+1: movl %esp,%ebp libavl.so.1`avl_add+3: pushl %edi libavl.so.1`avl_add+4: pushl %esi libavl.so.1`avl_add+5: pushl %ebx libavl.so.1`avl_add+6: subl $0x20,%esp libavl.so.1`avl_add+9: call +0x0 <libavl.so.1`avl_add+0xe> libavl.so.1`avl_add+0xe: popl %ebx libavl.so.1`avl_add+0xf: addl $0x10f72,%ebx libavl.so.1`avl_add+0x15: movl 0x8(%ebp),%edi libavl.so.1`avl_add+0x18: movl 0xc(%ebp),%esi libavl.so.1`avl_add+0x1b: leal -0x1c(%ebp),%eax libavl.so.1`avl_add+0x1e: pushl %eax libavl.so.1`avl_add+0x1f: pushl %esi libavl.so.1`avl_add+0x20: pushl %edi libavl.so.1`avl_add+0x21: call -0x522 <PLT=libavl.so.1`avl_find> libavl.so.1`avl_add+0x26: addl $0xc,%esp libavl.so.1`avl_add+0x29: pushl -0x1c(%ebp) libavl.so.1`avl_add+0x2c: pushl %esi libavl.so.1`avl_add+0x2d: pushl %edi libavl.so.1`avl_add+0x2e: call -0x53f <PLT=libavl.so.1`avl_insert> libavl.so.1`avl_add+0x33: addl $0x10,%esp libavl.so.1`avl_add+0x36: leal -0xc(%ebp),%esp libavl.so.1`avl_add+0x39: popl %ebx libavl.so.1`avl_add+0x3a: popl %esi libavl.so.1`avl_add+0x3b: popl %edi libavl.so.1`avl_add+0x3c: leave libavl.so.1`avl_add+0x3d: ret
There are no branches between the call to avl_find
at offset 0x21 and the call to avl_insert
at 0x2e and therefore avl_insert
will be always called regardless of avl_find
's return value.
Updated by Electric Monk over 7 years ago
- Status changed from New to Closed
- % Done changed from 30 to 100
git commit faa2b6be2fc102adf9ed584fc1a667b4ddf50d78
commit faa2b6be2fc102adf9ed584fc1a667b4ddf50d78 Author: Josef 'Jeff' Sipek <josef.sipek@nexenta.com> Date: 2015-08-06T04:57:58.000Z 6091 avl_add doesn't assert on non-debug builds Reviewed by: Andy Stormont <astormont@racktopsystems.com> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Steve Dougherty <steve@asksteved.com> Approved by: Dan McDonald <danmcd@omniti.com>