Bug #4229
closedmdb hangs on exit when long umem cache names exist
100%
Description
libumem allows caches to have an ASCII name - up to 31 chars. If a 31-char long name is used, mdb will hang on exit. (A shorter cache name works as expected.)
$ <download attached test file> $ gcc -Wall -O1 -lumem -o test test.c $ ./test 30 making cache of length 30: 123456789012345678901234567890 Segmentation Fault (core dumped) $ echo '' | mdb core $ ./test 31 making cache of length 31: 1234567890123456789012345678901 Segmentation Fault (core dumped) $ echo '' | mdb core <hang>
The hang is mdb pegging a CPU looping inside the teardown code:
nv_hashstring+0x2c(81b2e9d, 8102c68, 8046828, 8074053) mdb_nv_lookup+0x10(8102c68, 81b2e9d, 81038b0, 81b09dd) mdb_module_remove_walker+0x1b(8102c50) mdb_module_unload_common+0x11e(81038b0, 8139f28, 80468c8, 809af3f, 0, 54) mdb_module_unload+0xe(81038b0, 2, 80468d8, 2) mdb_module_unload_all+0x20(2) mdb_destroy+0x3a(8103e98, 8139f10, 80476b8, 8083b76, 0, 8139f10) terminate+0x10(0, 8139f10, 80476b8, 8083ad9) main+0x14d2(2, 80476e8, 80476f4, 80476dc) _start+0x7d(2, 8047830, 8047834, 0, 804783e, 804784a)
Profiling stacks using DTrace shows that we're looping somewhere in:
mdb`mdb_module_unload+0xe mdb`mdb_module_unload_all+0x20 mdb`mdb_destroy+0x3a mdb`terminate+0x10 mdb`main+0x14d2 mdb`_start+0x7d
The culprit seems to be libumem:
$ sudo dtrace -n 'pid$target::mdb_module_unload:entry{ trace(copyinstr(arg0)) }' -p `pgrep mdb` dtrace: description 'pid$target::mdb_module_unload:entry' matched 1 probe CPU ID FUNCTION:NAME 0 77990 mdb_module_unload:entry ld.so.1 0 77990 mdb_module_unload:entry libc.so.1 0 77990 mdb_module_unload:entry libumem.so.1
Files
Updated by Josef Sipek almost 10 years ago
Quick update:
Changing the MDB_NV_NAMELEN #define from 31 to 32 makes mdb happy. In other words, the issue is with mdb_var_t's cap on internally stored variable names.
Updated by Josef Sipek almost 10 years ago
- Category set to mdb - modular debugger
Updated by Josef Sipek almost 10 years ago
Ok, so summary so far...
libumem has cache names. The names can be up to 32 bytes long (including the trailing null). The libumem mdb dmod instantiates a walker for each cache and uses the cache's name for the walker's name. mdb keeps track of walker names using a mdb_nv_t & mdb_var_t. mdb_var_t can store the name either inline or externally (MDB_NV_EXTNAME). There is a limit on the maximum length of an inline name - MDB_NV_NAMELEN (31). The libumem dmod always tries to use inline names.
mdb_var_alloc() gets the name, and if inline names are supposed to be used, it strncpy's it in. If the name was longer than possible, it is silently truncated. Then on mdb shutdown, the code tries to unregister all the walkers but it ends up looping forever because the name of the walker that it expects does not match the name of the mdb_var_t that's keeping track of it.
As a sidenote, mdb_var_alloc always allocates sizeof(mdb_var_t) + 31 if inline names are to be used - regardless of the length of the name.
Updated by Josef Sipek almost 10 years ago
Here are the possible solutions...
- update every user of mdb_nv_insert to check the length and strdup as necessary - this is error prone and causes unnecessary code duplication
- change mdb_var_alloc to strdup long names when MDB_NV_EXTNAME wasn't specified, and automatically enables MDB_NV_EXTNAME
- change mdb_var_alloc to allow any length name inline
Of these solutions, the third one seems to be the cleanest.
The second and third are really the same, but the second splits the allocation into two and then stashes the pointer in mdb_var_t. Additionally, the second approach will also conflict with MDB_NV_OVERLOAD since the external name pointer is shared by MDB_NV_OVERLOAD and MDB_NV_EXTNAME. (The two could be decoupled and that's why I'm including it as a valid approach.)
* A variable may have multiple definitions (v_ndef chain), but this feature * is mutually exclusive with MDB_NV_EXTNAME in order to save space.
Updated by Rich Lowe almost 10 years ago
- Status changed from New to Resolved
- % Done changed from 0 to 100
- Tags deleted (
needs-triage)
Resolved in 065c692