Bug #6338
cannot use umem tools on mdb itself
100%
Description
I have this core file of mdb itself:
$ mdb core.88456
Loading modules: [ libumem.so.1 libc.so.1 libproc.so.1 libumem.so libavl.so.1 libc.so ld.so ld.so.1 ]
> ::status
debugging core file of mdb (32-bit) from b44c74d6
file: /usr/bin/i86/mdb
initial argv: mdb /home/dap/node-testing/extracores-x86/savedcore.jobsupervisor
threading model: native threads
status: process core file generated with gcore(1)
Note that mdb loads the libumem.so dmod, because MDB itself is linked against libumem. However, you can't actually use any of the umem tools:
> ::findleaks
mdb: findleaks: umem is not loaded in the address space
> ::umastat
mdb: couldn't find umem_null_cache: unknown symbol name
mdb: can't walk 'umem_cache': failed to initialize walk
I've gotten as far as seeing that "findleaks" is looking for "umem_ready", which is non-zero (as it should be):
> umem_ready/p
libumem.so.1`umem_ready:
libumem.so.1`umem_ready:        3
It may be related to the fact that the libumem dmod is named the same as the library itself, and both are loaded:
> $m ! grep libumem
fe540000 fe555000    15000 /usr/lib/mdb/proc/libumem.so
fe565000 fe566000     1000 /usr/lib/mdb/proc/libumem.so
feaa0000 fead1000    31000 /lib/libumem.so.1
feae0000 feae2000     2000 /lib/libumem.so.1
feaf2000 feb05000    13000 /lib/libumem.so.1
feb05000 feb0b000     6000 /lib/libumem.so.1
The dmod also defines a bunch of global symbols whose names match symbols in the library itself (like "umem_ready"):
> libumem.so`umem_ready$m
    BASE    LIMIT     SIZE NAME
fe565000 fe566000     1000 /usr/lib/mdb/proc/libumem.so
> libumem.so`umem_ready/d
libumem.so`umem_ready:
libumem.so`umem_ready:          3
> libumem.so.1`umem_ready$m
    BASE    LIMIT     SIZE NAME
feaf2000 feb05000    13000 /lib/libumem.so.1
> libumem.so.1`umem_ready/d
libumem.so.1`umem_ready:
libumem.so.1`umem_ready:        3
That's about as far as I've gotten in debugging this.
Updated by David Pacheco over 5 years ago
As I suspected, I think the presence of a dmod with the same name is confusing things. I stopped MDB (with DTrace) while the libumem dmod was loading itself, then attached MDB to that, and stepped through. I made it this far:
> $C
08046078 libproc.so.1`Pxlookup_by_name+0x8f(8148008, ffffffff, fd9612f0, fd9616b0, 80460a4, 80460bc)
080460e8 pt_lookup_cb+0x28(804611c, 814cf58, fd9612f0, 8048000)
08046158 pt_lookup_by_name_thr+0x107(8132f10, fd9612f0, fd9616b0, 80461c0, 80461d8, 1)
08046188 pt_lookup_by_name+0x33(8132f10, fd9612f0, fd9616b0, 80461c0, 80461d8, 1c)
080461f8 mdb_tgt_lookup_by_name+0x6f(8132f10, fd9612f0, fd9616b0, 8046248, 0, 811d625)
08046228 mdb_lookup_by_obj+0x1c(fd9612f0, fd9616b0, 8046248, 8122bf8)
08046268 libumem.so`umem_set_standalone+0x2a(22, 0, 80462b8, fd960ab4, 8046298, fd975000)
08046288 libumem.so`umem_update_variables+0x18(8122be0, 8122bf8, fd960e3c, fd962049, fd95beeb, fd95a5bf)
080462b8 libumem.so`umem_init+0x7c(804632c, 2, 80462f8, 80822f5, 804632c, fecae6a0)
080462c8 libumem.so`_mdb_init+0x18(804632c, fecae6a0, 2, 8122be0, fed21aa8, 804632c)
080462f8 mdb_module_create+0x147(80467b1, 804632c, 2, 0)
08046748 mdb_module_load+0x227(80467b1, 2, feb04d50, 82ee000)
08046788 module_load+0x26(8046cd8, 80467ac, 80467ac, feacdd63)
08046be8 pt_map_apply+0x2d(8046c74, 814dd58, 811f9a8, 80751fe)
08046c28 libproc.so.1`i_Pobject_iter+0x6e(8148008, 0, 808c4b3, 8046c74)
08046c48 libproc.so.1`Pobject_iter+0x16(8148008, 808c4b3, 8046c74, 0, 0, 0)
08046c98 pt_object_iter+0x75(8132f10, 8082a9f, 8046cd8, 0)
08046cb8 mdb_tgt_object_iter+0x16(8132f10, 8082a9f, 8046cd8, 0)
08046ce8 mdb_module_load_all+0x4b(0, 1, 80a624f, 8047f0a)
08047a08 main+0x1506(80479fc, fedc6728, 8047a38, 8063f4b, 4, 8047a44)
08047a38 _start+0x83(4, 8047b79, 8047b7d, 8047b80, 8047b89, 0)
The object being looked up is "libumem.so", and the symbol is "umem_alloc" (because that's what the libumem dmod explicitly looks up in order to see if libumem is present in the target):
> fd9612f0/s
0xfd9612f0:     libumem.so
> fd9616b0/s
0xfd9616b0:     umem_alloc
The file_info_t returned by the call to build_map_symtab() inside Pxlookup_by_name() refers to the dmod, not the libumem library:
> 0x8261008::print file_info_t file_pname
file_pname = [ "/usr/lib/mdb/proc/libumem.so" ]
I haven't examined all the surrounding code to know this definitely won't work at this point (i.e., that it doesn't handle this later by searching again with a different object), but this is not promising.
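To make the suspected name clash concrete, here is a minimal standalone sketch of how a lookup by object name could land on the dmod. It is a model, not libproc code: model_resolve_object() is a hypothetical helper, and the matching rule (an exact name wins over a versioned name such as "libumem.so.1") is an assumption. The trace above only establishes that the dmod was the object found.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical model (not the libproc implementation): resolve an object
 * name against the basenames of the objects loaded in the target.
 *
 * ASSUMPTION: an exact match is preferred over a versioned match, i.e.
 * "libumem.so" matches "libumem.so.1" only if nothing is named exactly
 * "libumem.so".
 *
 * Returns the index of the chosen object, or -1 if nothing matches.
 */
static int
model_resolve_object(const char *const *loaded, size_t nloaded,
    const char *name)
{
	size_t len, i;

	assert(loaded != NULL && name != NULL);
	len = strlen(name);

	/* Pass 1: exact basename match. */
	for (i = 0; i < nloaded; i++) {
		if (strcmp(loaded[i], name) == 0)
			return ((int)i);
	}

	/* Pass 2: versioned match, e.g. "libumem.so" vs "libumem.so.1". */
	for (i = 0; i < nloaded; i++) {
		if (strncmp(loaded[i], name, len) == 0 &&
		    loaded[i][len] == '.')
			return ((int)i);
	}

	return (-1);
}
```

Under that assumption, a target that contains both "libumem.so.1" and a dmod named "libumem.so" resolves "libumem.so" to the dmod, while a target with only "libumem.so.1" resolves it to the library.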
Updated by David Pacheco over 5 years ago
Fleshing that out a little: umem_init() inside the dmod attempts to determine whether libumem is enabled inside the target process. It does this by calling umem_update_variables(), which first calls umem_set_standalone() to determine whether the target is linked against the standalone libumem that's used by kmdb. umem_set_standalone() first attempts to look up the symbol "umem_alloc" in the object named "libumem.so". If it finds it, then we're not standalone. Otherwise, it looks up "umem_alloc" in the executable itself. If that works, then we're looking at the standalone libumem. If both of these fail, it returns -1. Back in umem_update_variables(), this triggers us to set umem_ready = 0 and return 0, causing initialization to complete, but thinking that libumem is not loaded. It may be easier to see this a little more graphically:
mdb loads the "libumem" dmod:
-> _mdb_init()
  -> umem_init()
    -> umem_update_variables()
      -> umem_set_standalone()
        -> mdb_lookup_by_obj("libumem.so", "umem_alloc")
        <- mdb_lookup_by_obj returns -1 (because "libumem.so" finds the libumem dmod loaded into the target MDB process, rather than the libumem library loaded into the target MDB process)
        -> mdb_lookup_by_obj(MDB_OBJ_EXEC, "umem_alloc")
        <- mdb_lookup_by_obj returns -1 (because this symbol does not appear in MDB's a.out)
      <- umem_set_standalone() returns -1
    <- umem_update_variables() sets umem_ready = 0 and returns 0
  -> umem_init() completes
We end up going through umem_update_variables() again immediately through a call to umem_statechange_cb(), but nothing's changed, and we're left with the same result. We also end up going through _mdb_init() for this dmod again as well (for reasons I don't know), but again, with the same result.
To really sum it up: when the libumem dmod initializes itself, it looks for the symbol "umem_alloc" in the target process to determine if libumem is really ready there. When the target is mdb itself with the libumem dmod loaded, the lookup of "libumem.so" resolves to the dmod rather than the library, "umem_alloc" is not found there, and the dmod ends up deciding that libumem is not loaded.
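The decision described above can be restated as a small standalone sketch. This is a model of the flow as described in this report, not the dmod source: struct model_target and both functions are hypothetical, and the two lookups are reduced to booleans that say whether each one would succeed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical summary of what the dmod can observe in the target. */
struct model_target {
	int obj_has_umem_alloc;		/* "umem_alloc" found via "libumem.so" */
	int exec_has_umem_alloc;	/* "umem_alloc" found in the executable */
	int umem_ready;			/* value of umem_ready in the target */
};

/*
 * Model of the standalone check: 0 means regular libumem, 1 means the
 * standalone libumem, -1 means neither lookup succeeded.
 */
static int
model_set_standalone(const struct model_target *t)
{
	assert(t != NULL);

	if (t->obj_has_umem_alloc)
		return (0);
	if (t->exec_has_umem_alloc)
		return (1);
	return (-1);
}

/*
 * Model of the outcome of initialization: the umem_ready value the dmod
 * ends up believing. If neither lookup worked, it is forced to 0 no
 * matter what the target's own umem_ready says.
 */
static int
model_dmod_umem_ready(const struct model_target *t)
{
	if (model_set_standalone(t) == -1)
		return (0);
	return (t->umem_ready);
}
```

For the core of mdb in this report, both lookups fail, so the model yields 0 even though umem_ready in the target is 3. For an ordinary target where the first lookup succeeds, it yields the target's own value.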
Just to be really clear about what symbols are where:
> ::nm libumem.so ! grep umem_alloc
0xfe54b4a8|0x000000af|FUNC |GLOB |0x0 |14 |get_umem_alloc_sizes
> 0xfe54b4a8$m
    BASE    LIMIT     SIZE NAME
fe540000 fe555000    15000 /usr/lib/mdb/proc/libumem.so
> ::nm libumem.so.1 ! grep umem_alloc
0xfeac78b4|0x0000012d|FUNC |LOCL |0x2 |15 |umem_alloc_sizes_add
0xfeaca6f5|0x00000136|FUNC |LOCL |0x2 |15 |_umem_alloc
0xfeac77ed|0x000000c7|FUNC |LOCL |0x2 |15 |umem_alloc_sizes_remove
0xfeac71ba|0x00000034|FUNC |LOCL |0x2 |15 |umem_alloc_sizes_clear
0xfeac9e65|0x000000d9|FUNC |LOCL |0x2 |15 |_umem_alloc_align
0xfeac5f06|0x000000fb|FUNC |LOCL |0x0 |15 |umem_allocator_process
0xfeaf43c0|0x00000124|OBJT |LOCL |0x0 |24 |umem_alloc_sizes
0xfeaf4ba0|0x00010000|OBJT |LOCL |0x0 |28 |umem_alloc_table
0xfeac9d9f|0x000000c6|FUNC |LOCL |0x0 |15 |umem_alloc_retry
0xfeac9e65|0x000000d9|FUNC |WEAK |0x0 |15 |umem_alloc_align
0xfeaca6f5|0x00000136|FUNC |WEAK |0x0 |15 |umem_alloc
> 0xfeaf42fc$m
    BASE    LIMIT     SIZE NAME
feaf2000 feb05000    13000 /lib/libumem.so.1
Updated by Robert Mustacchi over 5 years ago
- Category set to mdb - modular debugger
- Assignee set to Robert Mustacchi
- % Done changed from 0 to 90
- Tags deleted (needs-triage)
Updated by David Pacheco over 5 years ago
A workaround: in the first mdb session (the one you will later take a core file of), run "::unload libumem.so" before you take the core file. Then the dmod won't be in the core file of mdb, and the initialization process won't be confused. So the flow is:
$ mdb your/core/file
...
(run commands to test your dmod)
> ::unload libumem.so
then gcore that mdb process, run mdb on that core, and run whatever umem tools you want (e.g., "::findleaks").
Updated by Electric Monk over 5 years ago
- Status changed from New to Closed
- % Done changed from 90 to 100
git commit 422418808a6580456c11a891d69016d29dae1440
commit 422418808a6580456c11a891d69016d29dae1440
Author: Robert Mustacchi <rm@joyent.com>
Date: 2015-10-26T15:43:09.000Z

6338 cannot use umem tools on mdb itself
Reviewed by: Dave Pacheco <dap@joyent.com>
Reviewed by: Gordon Ross <gordon.ross@nexenta.com>
Reviewed by: Josef 'Jeff' Sipek <jeffpc@josefsipek.net>
Approved by: Garrett D'Amore <garrett@damore.org>