Bug #6338

cannot use umem tools on mdb itself

Added by David Pacheco about 4 years ago. Updated about 4 years ago.

Status:
Closed
Priority:
Normal
Category:
mdb - modular debugger
Start date:
2015-10-15
Due date:
% Done:

100%

Estimated time:
Difficulty:
Medium
Tags:

Description

I have this core file of mdb itself:

$ mdb core.88456 
Loading modules: [ libumem.so.1 libc.so.1 libproc.so.1 libumem.so libavl.so.1 libc.so ld.so ld.so.1 ]
> ::status
debugging core file of mdb (32-bit) from b44c74d6
file: /usr/bin/i86/mdb
initial argv: mdb /home/dap/node-testing/extracores-x86/savedcore.jobsupervisor
threading model: native threads
status: process core file generated with gcore(1)

Note that mdb loads the libumem.so dmod, because MDB itself is linked against libumem. However, you can't actually use any of the umem tools:

> ::findleaks
mdb: findleaks: umem is not loaded in the address space
> ::umastat
mdb: couldn't find umem_null_cache: unknown symbol name
mdb: can't walk 'umem_cache': failed to initialize walk

I've gotten as far as seeing that "findleaks" is looking for "umem_ready", which is non-zero (as it should be):

> umem_ready/p
libumem.so.1`umem_ready:
libumem.so.1`umem_ready:        3               

It may be related to the fact that the libumem dmod is named the same as the library itself, and both are loaded:

> $m ! grep libumem     
fe540000 fe555000    15000 /usr/lib/mdb/proc/libumem.so
fe565000 fe566000     1000 /usr/lib/mdb/proc/libumem.so
feaa0000 fead1000    31000 /lib/libumem.so.1
feae0000 feae2000     2000 /lib/libumem.so.1
feaf2000 feb05000    13000 /lib/libumem.so.1
feb05000 feb0b000     6000 /lib/libumem.so.1

The dmod also defines a number of global symbols whose names match symbols in the library itself (like "umem_ready"):

> libumem.so`umem_ready$m
    BASE    LIMIT     SIZE NAME
fe565000 fe566000     1000 /usr/lib/mdb/proc/libumem.so
> libumem.so`umem_ready/d
libumem.so`umem_ready:
libumem.so`umem_ready:          3       

> libumem.so.1`umem_ready$m
    BASE    LIMIT     SIZE NAME
feaf2000 feb05000    13000 /lib/libumem.so.1
> libumem.so.1`umem_ready/d
libumem.so.1`umem_ready:
libumem.so.1`umem_ready:        3   
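
One way to picture the suspected collision is a toy model of name-based object lookup. This is only a sketch: the find_object() helper and its matching rule (an exact basename match wins over a versioned name such as "libumem.so.1") are assumptions made for illustration, not libproc's actual algorithm. The mappings are the ones from the $m output above.

```python
import posixpath

# (base, limit, path) tuples taken from the "$m ! grep libumem" output above.
MAPPINGS = [
    (0xfe540000, 0xfe555000, "/usr/lib/mdb/proc/libumem.so"),
    (0xfe565000, 0xfe566000, "/usr/lib/mdb/proc/libumem.so"),
    (0xfeaa0000, 0xfead1000, "/lib/libumem.so.1"),
    (0xfeae0000, 0xfeae2000, "/lib/libumem.so.1"),
    (0xfeaf2000, 0xfeb05000, "/lib/libumem.so.1"),
    (0xfeb05000, 0xfeb0b000, "/lib/libumem.so.1"),
]


def find_object(mappings, name):
    """Hypothetical lookup: return the path of the object that `name` selects.

    Assumed rule: an exact basename match is preferred; only when there is
    none does a versioned name (for example "libumem.so.1") match.
    """
    # Collect each mapped object once, in mapping order.
    paths = []
    for _base, _limit, path in mappings:
        if path not in paths:
            paths.append(path)

    for path in paths:
        if posixpath.basename(path) == name:
            return path
    for path in paths:
        if posixpath.basename(path).startswith(name + "."):
            return path
    return None


# With the dmod mapped, "libumem.so" selects the dmod.
print(find_object(MAPPINGS, "libumem.so"))
# With only the library mapped, the same name selects the library.
print(find_object(MAPPINGS[2:], "libumem.so"))
```

Under that assumed rule, the dmod shadows the library whenever both are mapped, which is consistent with the libumem.so`umem_ready$m result above.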

That's about as far as I've gotten in debugging this.

History

#1

Updated by David Pacheco about 4 years ago

As I suspected, I think the presence of a dmod with the same name is confusing things. I stopped MDB (with DTrace) while the libumem dmod was loading itself, then attached MDB to that, and stepped through. I made it this far:

> $C                                  
08046078 libproc.so.1`Pxlookup_by_name+0x8f(8148008, ffffffff, fd9612f0, fd9616b0, 
80460a4, 80460bc)
080460e8 pt_lookup_cb+0x28(804611c, 814cf58, fd9612f0, 8048000)
08046158 pt_lookup_by_name_thr+0x107(8132f10, fd9612f0, fd9616b0, 80461c0, 80461d8, 1
)
08046188 pt_lookup_by_name+0x33(8132f10, fd9612f0, fd9616b0, 80461c0, 80461d8, 1c)
080461f8 mdb_tgt_lookup_by_name+0x6f(8132f10, fd9612f0, fd9616b0, 8046248, 0, 811d625
)
08046228 mdb_lookup_by_obj+0x1c(fd9612f0, fd9616b0, 8046248, 8122bf8)
08046268 libumem.so`umem_set_standalone+0x2a(22, 0, 80462b8, fd960ab4, 8046298, 
fd975000)
08046288 libumem.so`umem_update_variables+0x18(8122be0, 8122bf8, fd960e3c, fd962049, 
fd95beeb, fd95a5bf)
080462b8 libumem.so`umem_init+0x7c(804632c, 2, 80462f8, 80822f5, 804632c, fecae6a0)
080462c8 libumem.so`_mdb_init+0x18(804632c, fecae6a0, 2, 8122be0, fed21aa8, 804632c)
080462f8 mdb_module_create+0x147(80467b1, 804632c, 2, 0)
08046748 mdb_module_load+0x227(80467b1, 2, feb04d50, 82ee000)
08046788 module_load+0x26(8046cd8, 80467ac, 80467ac, feacdd63)
08046be8 pt_map_apply+0x2d(8046c74, 814dd58, 811f9a8, 80751fe)
08046c28 libproc.so.1`i_Pobject_iter+0x6e(8148008, 0, 808c4b3, 8046c74)
08046c48 libproc.so.1`Pobject_iter+0x16(8148008, 808c4b3, 8046c74, 0, 0, 0)
08046c98 pt_object_iter+0x75(8132f10, 8082a9f, 8046cd8, 0)
08046cb8 mdb_tgt_object_iter+0x16(8132f10, 8082a9f, 8046cd8, 0)
08046ce8 mdb_module_load_all+0x4b(0, 1, 80a624f, 8047f0a)
08047a08 main+0x1506(80479fc, fedc6728, 8047a38, 8063f4b, 4, 8047a44)
08047a38 _start+0x83(4, 8047b79, 8047b7d, 8047b80, 8047b89, 0)

The object is "libumem.so" and the symbol is "umem_alloc" (because that's what the libumem dmod explicitly looks up in order to see if libumem is present in the target):

> fd9612f0/s
0xfd9612f0:     libumem.so
> fd9616b0/s
0xfd9616b0:     umem_alloc

The file_info_t returned by the call to build_map_symtab() inside Pxlookup_by_name() refers to the dmod, not the libumem library:

> 0x8261008::print file_info_t file_pname
file_pname = [ "/usr/lib/mdb/proc/libumem.so" ]

I haven't examined enough of the surrounding code to be sure that the lookup fails for good at this point (i.e., that nothing later retries the search with a different object), but this is not promising.

#2

Updated by David Pacheco about 4 years ago

Fleshing that out a little: umem_init() inside the dmod attempts to determine whether libumem is enabled inside the target process. It does this by calling umem_update_variables(), which first calls umem_set_standalone() to determine whether the target is linked against the standalone libumem that's used by kmdb. umem_set_standalone() looks up the symbol "umem_alloc" in the object named "libumem.so". If it finds it, then we're not standalone. If not, it looks up "umem_alloc" in the executable itself. If that works, then we're looking at the standalone libumem. If both of these lookups fail, it returns -1. Back in umem_update_variables(), the -1 causes us to set umem_ready = 0 and return 0, so initialization completes, but the dmod believes that libumem is not loaded. It may be easier to see this a little more graphically:

mdb loads the "libumem" dmod:
    -> _mdb_init()
        -> umem_init()
            -> umem_update_variables()
                -> umem_set_standalone()
                    -> mdb_lookup_by_obj("libumem.so", "umem_alloc")
                    <- mdb_lookup_by_obj returns -1
                       (because "libumem.so" finds the libumem dmod loaded into
                       the target MDB process, rather than the libumem library
                       loaded into the target MDB process)
                    -> mdb_lookup_by_obj(MDB_OBJ_EXEC, "umem_alloc")
                    <- mdb_lookup_by_obj returns -1
                       (because this symbol does not appear in MDB's a.out)
                <- umem_set_standalone() returns -1
            <- umem_update_variables() sets umem_ready = 0 and returns 0
        <- umem_init() completes
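
The same flow can be sketched as a small Python model. This is an illustration of the control flow described above, not the dmod's C code: the dictionaries standing in for the target, the `state` dictionary, the 0 return value on success, and the step that copies the target's umem_ready on success are all assumptions. Only the two lookups, the -1 return, and the "umem_ready = 0, return 0" outcome come from this ticket.

```python
MDB_OBJ_EXEC = "a.out"  # stand-in for mdb's MDB_OBJ_EXEC constant


def lookup_by_obj(target, obj, sym):
    """Return 0 if `sym` exists in object `obj` of the target, else -1."""
    return 0 if sym in target.get(obj, {}) else -1


def set_standalone(target, state):
    """Model of umem_set_standalone(): probe for "umem_alloc"."""
    if lookup_by_obj(target, "libumem.so", "umem_alloc") == 0:
        state["standalone"], state["obj"] = False, "libumem.so"
        return 0
    if lookup_by_obj(target, MDB_OBJ_EXEC, "umem_alloc") == 0:
        state["standalone"], state["obj"] = True, MDB_OBJ_EXEC
        return 0
    return -1


def update_variables(target, state):
    """Model of umem_update_variables()."""
    if set_standalone(target, state) == -1:
        state["umem_ready"] = 0  # libumem is treated as not loaded
        return 0
    # Assumption: on success the dmod picks up the target's umem_ready.
    state["umem_ready"] = target[state["obj"]]["umem_ready"]
    return 0


# An ordinary target: the name "libumem.so" leads to the library.
ordinary = {"libumem.so": {"umem_alloc": None, "umem_ready": 3}, MDB_OBJ_EXEC: {}}
# A core of mdb: "libumem.so" leads to the dmod, which has no umem_alloc.
mdb_core = {
    "libumem.so": {"get_umem_alloc_sizes": None, "umem_ready": 3},
    MDB_OBJ_EXEC: {},
}

for name, target in (("ordinary", ordinary), ("mdb core", mdb_core)):
    state = {}
    update_variables(target, state)
    print(name, state["umem_ready"])
```

In the model, the ordinary target ends up with umem_ready = 3, while the mdb core ends up with umem_ready = 0 even though the library's own umem_ready is non-zero.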

We end up going through umem_update_variables() again immediately through a call to umem_statechange_cb(), but nothing's changed, and we're left with the same result. We also end up going through _mdb_init() for this dmod again as well (for reasons I don't know), but again, with the same result.

To really sum it up: when the libumem dmod initializes itself, it looks for the symbol "umem_alloc" in the target process to determine if libumem is really ready there. When the target is mdb itself with the libumem dmod loaded, it finds this "libumem.so", tries to look up "umem_alloc" there, does not find it, and ends up deciding that libumem is not loaded.

Just to be really clear about what symbols are where:

> ::nm libumem.so ! grep umem_alloc
0xfe54b4a8|0x000000af|FUNC |GLOB |0x0  |14      |get_umem_alloc_sizes
> 0xfe54b4a8$m
    BASE    LIMIT     SIZE NAME
fe540000 fe555000    15000 /usr/lib/mdb/proc/libumem.so

> ::nm libumem.so.1 ! grep umem_alloc
0xfeac78b4|0x0000012d|FUNC |LOCL |0x2  |15      |umem_alloc_sizes_add
0xfeaca6f5|0x00000136|FUNC |LOCL |0x2  |15      |_umem_alloc
0xfeac77ed|0x000000c7|FUNC |LOCL |0x2  |15      |umem_alloc_sizes_remove
0xfeac71ba|0x00000034|FUNC |LOCL |0x2  |15      |umem_alloc_sizes_clear
0xfeac9e65|0x000000d9|FUNC |LOCL |0x2  |15      |_umem_alloc_align
0xfeac5f06|0x000000fb|FUNC |LOCL |0x0  |15      |umem_allocator_process
0xfeaf43c0|0x00000124|OBJT |LOCL |0x0  |24      |umem_alloc_sizes
0xfeaf4ba0|0x00010000|OBJT |LOCL |0x0  |28      |umem_alloc_table
0xfeac9d9f|0x000000c6|FUNC |LOCL |0x0  |15      |umem_alloc_retry
0xfeac9e65|0x000000d9|FUNC |WEAK |0x0  |15      |umem_alloc_align
0xfeaca6f5|0x00000136|FUNC |WEAK |0x0  |15      |umem_alloc
> 0xfeaf42fc$m
    BASE    LIMIT     SIZE NAME
feaf2000 feb05000    13000 /lib/libumem.so.1
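
Note that grep matches substrings, while the dmod's lookup needs a symbol named exactly "umem_alloc". A short sketch makes the difference explicit by parsing a few of the ::nm rows shown above; the symbol_names() helper is hypothetical and assumes the seven pipe-separated columns seen in that output.

```python
def symbol_names(nm_lines):
    """Return the set of symbol names from pipe-separated ::nm rows."""
    names = set()
    for line in nm_lines:
        fields = [field.strip() for field in line.split("|")]
        if len(fields) != 7:
            continue  # skip anything that is not a symbol row
        names.add(fields[6])
    return names


# Row from "::nm libumem.so ! grep umem_alloc" (the dmod).
DMOD_NM = [
    "0xfe54b4a8|0x000000af|FUNC |GLOB |0x0  |14      |get_umem_alloc_sizes",
]

# A few rows from "::nm libumem.so.1 ! grep umem_alloc" (the library).
LIB_NM = [
    "0xfeaca6f5|0x00000136|FUNC |LOCL |0x2  |15      |_umem_alloc",
    "0xfeac9e65|0x000000d9|FUNC |WEAK |0x0  |15      |umem_alloc_align",
    "0xfeaca6f5|0x00000136|FUNC |WEAK |0x0  |15      |umem_alloc",
]

print("umem_alloc" in symbol_names(DMOD_NM))  # dmod: no exact match
print("umem_alloc" in symbol_names(LIB_NM))   # library: exact match
```

The dmod's only grep hit is "get_umem_alloc_sizes", so an exact lookup of "umem_alloc" in that object fails, while the same lookup in libumem.so.1 would succeed.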

#3

Updated by Robert Mustacchi about 4 years ago

  • Category set to mdb - modular debugger
  • Assignee set to Robert Mustacchi
  • % Done changed from 0 to 90
  • Tags deleted (needs-triage)

#4

Updated by David Pacheco about 4 years ago

A workaround: in the first mdb session (the one you intend to take a core of), run "::unload libumem.so" before you take the core file. Then the dmod won't be in the core file of mdb, and the initialization process won't be confused. So the flow is:

$ mdb your/core/file
... (run commands to test your dmod)
> ::unload libumem.so

then gcore that mdb process, run mdb on that core, and run whatever umem tools you want (e.g., "::findleaks").

#5

Updated by Electric Monk about 4 years ago

  • Status changed from New to Closed
  • % Done changed from 90 to 100

git commit 422418808a6580456c11a891d69016d29dae1440

commit  422418808a6580456c11a891d69016d29dae1440
Author: Robert Mustacchi <rm@joyent.com>
Date:   2015-10-26T15:43:09.000Z

    6338 cannot use umem tools on mdb itself
    Reviewed by: Dave Pacheco <dap@joyent.com>
    Reviewed by: Gordon Ross <gordon.ross@nexenta.com>
    Reviewed by: Josef 'Jeff' Sipek <jeffpc@josefsipek.net>
    Approved by: Garrett D'Amore <garrett@damore.org>
