Pfindexec() finds wrong binary, wreaking havoc
In thoth, some mdb processes were found to have died in v8.so:
    % thoth info 6b829fcde8b7e4140acddd8a21c4e0d0 | json stack
    [
      "mdb: debugger failed with error: fatal signal received",
      "v8.so`jsobj_properties+0x333()",
      "v8.so`jsobj_print_jsobject+0x82()",
      "v8.so`jsobj_print+0x1ec()",
      "v8.so`dcmd_jsprint+0x102()",
      "dcmd_invoke+0x40()",
      "mdb_call_idcmd+0x128()",
      "mdb_call+0x325()",
      "yyparse+0x3f7()",
      "mdb_run+0x26d()",
      "main+0x153a()",
      "_start+0x83()"
    ]
The death in v8.so is due to a divide-by-zero, but that should be impossible based on the code: we divide by V8_PROP_DESC_SIZE, which is set either to the (non-zero) value of v8dbg_prop_desc_size, if that symbol exists, or to 3, if it doesn't. As it turns out, though, this particular symbol appears to be in something of a netherworld with respect to the dump being debugged. Take one such dump, /thoth/stor/thoth/9a6966f1fc94f9565b5a05d7ea900824/core.node.24731:
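To see why the fallback to 3 doesn't protect against this, it may help to model the lookup. The sketch below is hypothetical (sym_lookup_t, prop_desc_size() and nprops() are illustrative names, not v8.so's actual code): the default only applies when the lookup fails, so a lookup that "succeeds" against the wrong binary and reads back zero-filled memory from the dump yields 0.

```c
#include <assert.h>

/* Hypothetical default used when the v8dbg_* symbol is absent. */
#define PROP_DESC_SIZE_DEFAULT 3

/* Result of a hypothetical symbol lookup against the target. */
typedef struct {
	int found;	/* did the lookup succeed? */
	long value;	/* value read from the target at the symbol's address */
} sym_lookup_t;

/*
 * The fallback only covers the "not found" case.  If the lookup resolves
 * the name via the wrong binary, the address may point at zero-filled
 * memory in the dump, and 0 is returned as if it were legitimate.
 */
static long
prop_desc_size(sym_lookup_t lk)
{
	return (lk.found ? lk.value : PROP_DESC_SIZE_DEFAULT);
}

/*
 * Number of property descriptors in an array of the given length.
 * Returns -1 instead of dividing by a non-positive size.  A guard like
 * this would only paper over the underlying lookup bug.
 */
static long
nprops(long array_len, sym_lookup_t lk)
{
	long size = prop_desc_size(lk);

	assert(array_len >= 0);
	if (size <= 0)
		return (-1);
	return (array_len / size);
}
```

With `found = 1, value = 0`, an unguarded `array_len / size` is exactly the divide-by-zero seen in jsobj_properties().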
    bcantrill@manta # mdb $MANTA_INPUT_FILE
    Loading modules: [ libumem.so.1 libc.so.1 ld.so.1 ]
    > ::nm ! grep v8dbg_prop_desc_size
    > ::nm -D ! grep v8dbg_prop_desc_size
    > v8dbg_prop_desc_size/D
    0x8828754:      0
    bcantrill@manta # elfdump $MANTA_INPUT_FILE | grep v8dbg_prop_desc_size
    bcantrill@manta # strings $MANTA_INPUT_FILE | grep v8dbg_prop_desc_size
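That is, the symbol appears in neither the symbol table (::nm) nor the dynamic symbol table (::nm -D), and its name doesn't appear anywhere in the dump, not even as a string; yet somehow mdb still resolves it to an address.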
As it turns out, the problem is in the way libproc finds executable names. Pfindexec() goes through an ugly heuristic to try to find a local binary that matches the executable from the core (even though such a binary is broadly unnecessary for a dump that includes text), and in this case it finds /opt/local/bin/node in the Manta environment, which does define v8dbg_prop_desc_size. Now, libproc is smart enough to later discard the binary that it found (file_differs(), called from Pbuild_file_symtab(), reports that the files don't match), but libproc never clears out the execname. mdb, for its part, opens whatever Pexecname() returns and processes it as an ELF file. This means that for symbols that are not found in the actual symbol table, mdb will try to look them up in the symbol table of the file named by Pexecname(), which in this case is entirely wrong.
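The failure mode can be reduced to a toy model. Everything below is hypothetical and heavily simplified (toy_handle_t, toy_lookup() and friends are not libproc or mdb interfaces): lookups consult the core's own symbols first and then fall back to whatever file execname names, so a stale execname lets a symbol that exists only in the local /opt/local/bin/node "resolve".

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* A toy symbol table entry: name and address. */
typedef struct {
	const char *name;
	unsigned long addr;
} toy_sym_t;

/* A toy symbol table. */
typedef struct {
	const toy_sym_t *syms;
	size_t nsyms;
} toy_symtab_t;

/*
 * Hypothetical stand-in for a process handle: the symbols actually in
 * the core, plus the symbols of the file named by execname (if any).
 */
typedef struct {
	toy_symtab_t core_symtab;
	const char *execname;		/* NULL if no executable is known */
	toy_symtab_t exec_symtab;
} toy_handle_t;

/* Linear search of one table; returns 0 and sets *addrp on success. */
static int
toy_symtab_lookup(const toy_symtab_t *st, const char *name,
    unsigned long *addrp)
{
	for (size_t i = 0; i < st->nsyms; i++) {
		if (strcmp(st->syms[i].name, name) == 0) {
			*addrp = st->syms[i].addr;
			return (0);
		}
	}
	return (-1);
}

/*
 * Mirrors the behavior described above: try the core's symbols first,
 * then fall back to the execname file's symbols if an execname is set.
 */
static int
toy_lookup(const toy_handle_t *h, const char *name, unsigned long *addrp)
{
	assert(h != NULL);
	if (toy_symtab_lookup(&h->core_symtab, name, addrp) == 0)
		return (0);
	if (h->execname != NULL)
		return (toy_symtab_lookup(&h->exec_symtab, name, addrp));
	return (-1);
}
```

If execname still names a binary that was already judged not to match, the second branch returns an address from that unrelated file; with execname set to NULL, the same lookup correctly fails.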
The fix is for libproc to clean up its mess once it's realized it's made one: if file_differs() indicates that the file that corresponds to Pexecname() does not match the information in-core, the execname should be explicitly NULL'd out to avoid further confusion.
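A minimal sketch of the shape of that fix, assuming a simplified handle whose only relevant field is a heap-allocated execname. The names (toy_proc_t, toy_build_file_symtab(), toy_execname()) are hypothetical stand-ins for the real libproc code paths, and the `differs` argument stands in for the result of a file_differs()-style check.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified process handle; only execname matters here. */
typedef struct {
	char *execname;		/* heap-allocated path, or NULL */
} toy_proc_t;

/* Record a copy of path as the execname; returns -1 on allocation failure. */
static int
toy_set_execname(toy_proc_t *P, const char *path)
{
	char *copy = malloc(strlen(path) + 1);

	if (copy == NULL)
		return (-1);
	(void) strcpy(copy, path);
	free(P->execname);
	P->execname = copy;
	return (0);
}

/*
 * When the file found via the execname heuristic does not match the
 * core, discard it AND forget its name, so that nothing downstream
 * (e.g. a debugger calling Pexecname()) goes on to open it.
 */
static void
toy_build_file_symtab(toy_proc_t *P, int differs)
{
	assert(P != NULL);
	if (differs) {
		free(P->execname);
		P->execname = NULL;
		return;
	}
	/* ... otherwise go on to load the file's symbol tables ... */
}

/* Analogue of Pexecname(): NULL when no trustworthy executable is known. */
static const char *
toy_execname(const toy_proc_t *P)
{
	return (P->execname);
}
```

The key design point is that rejecting the file and clearing the name happen together, so there is no state in which the handle advertises an executable it has already decided not to trust.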