Bug #3921


Pfindexec() finds wrong binary, wreaking havoc

Added by Robert Mustacchi almost 9 years ago. Updated almost 9 years ago.

lib - userland libraries
In thoth, some mdb processes were found to have died:

% thoth info 6b829fcde8b7e4140acddd8a21c4e0d0 | json stack
  "mdb: debugger failed with error: fatal signal received",

The death is due to a divide-by-zero, but that should be impossible based on the code: we are dividing by V8_PROP_DESC_SIZE, which should be set to a non-zero value from v8dbg_prop_desc_size (if it exists) or to the value 3 (if it doesn't). But as it turns out, this particular symbol appears to be in something of a netherworld with respect to the dump being debugged. Take one such dump as an example, /thoth/stor/thoth/9a6966f1fc94f9565b5a05d7ea900824/core.node.24731:

bcantrill@manta # mdb $MANTA_INPUT_FILE
Loading modules: [ ]
> ::nm ! grep v8dbg_prop_desc_size
> ::nm -D ! grep v8dbg_prop_desc_size
> v8dbg_prop_desc_size/D
0x8828754:      0

Most vexing:

bcantrill@manta # elfdump $MANTA_INPUT_FILE | grep v8dbg_prop_desc_size
bcantrill@manta # strings $MANTA_INPUT_FILE | grep v8dbg_prop_desc_size

That is, there is no string representation for this symbol in the dump, yet somehow mdb ends up finding something for it.
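
To make the failure mode concrete, here is a minimal C sketch of a "use the symbol if it exists, otherwise default to 3" lookup. The table type and the helper names (sym_t, lookup_int(), prop_desc_size()) are hypothetical and are not the mdb V8 module's actual code; only the symbol name and the default of 3 come from the description above. The point it illustrates is that the default is applied only when the lookup fails, so a lookup that succeeds and reads 0 leaves a zero divisor.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical symbol-table entry: a name and the integer read at it. */
typedef struct sym {
	const char *name;
	int value;
} sym_t;

/*
 * Hypothetical lookup: returns 0 and stores the value on success,
 * -1 if the name is not present in the table.
 */
static int
lookup_int(const sym_t *tab, size_t n, const char *name, int *valp)
{
	assert(tab != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		if (strcmp(tab[i].name, name) == 0) {
			*valp = tab[i].value;
			return (0);
		}
	}
	return (-1);
}

/*
 * Use the symbol's value if the symbol exists, else fall back to 3.
 * The fallback keys off of the symbol's existence, not off of its value.
 */
static int
prop_desc_size(const sym_t *tab, size_t n)
{
	int size;

	if (lookup_int(tab, n, "v8dbg_prop_desc_size", &size) != 0)
		size = 3;
	return (size);
}
```

With a table that lacks the symbol, prop_desc_size() returns 3. With a table holding a stray v8dbg_prop_desc_size entry whose value reads as 0, it returns 0, and any division by the result faults.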

As it turns out, the problem here is the way libproc finds executable names: Pfindexec() goes through an ugly heuristic to try to find a local binary to match the executable from the core (even though such a thing is broadly unnecessary in a dump that has included text), and in this case, it is finding /opt/local/bin/node in the Manta environment, which has v8dbg_prop_desc_size defined. Now, libproc is smart enough to later discard the binary that it finds (file_differs() from Pbuild_file_symtab() tells it that the files don't match), but libproc never clears out the execname. mdb, for its part, opens anything returned by Pexecname() and processes it as an ELF file. This means that for symbols that are not found in the actual symbol table, mdb will try to look them up in the symbol table that corresponds to Pexecname(), which is entirely wrong in this case.

The fix is for libproc to clean up its mess once it's realized it's made one: if file_differs() indicates that the file that corresponds to Pexecname() does not match the information in-core, the execname should be explicitly NULL'd out to avoid further confusion.
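
The shape of that fix can be sketched in a few lines of C. This is a simplified model under stated assumptions, not the libproc change itself: proc_model_t and the model_*() functions are hypothetical stand-ins, and the boolean argument stands in for whatever file_differs() reports. The only behavior taken from the description above is that a mismatch causes the recorded execname to be freed and set to NULL.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the part of the handle that matters here. */
typedef struct proc_model {
	char *execname;		/* path guessed by the exec-finding heuristic */
} proc_model_t;

/* Record a guessed executable path (copied). Returns 0 on success. */
static int
model_set_execname(proc_model_t *p, const char *path)
{
	char *copy = malloc(strlen(path) + 1);

	if (copy == NULL)
		return (-1);
	strcpy(copy, path);
	free(p->execname);
	p->execname = copy;
	return (0);
}

/*
 * Called once the on-disk file has been compared with the in-core
 * information. If they differ, drop the guess so that later consumers
 * see no execname at all rather than a wrong one.
 */
static void
model_validate_execname(proc_model_t *p, bool differs)
{
	assert(p != NULL);

	if (differs) {
		free(p->execname);
		p->execname = NULL;
	}
}

/* What a consumer sees: NULL means "no trustworthy executable". */
static const char *
model_execname(const proc_model_t *p)
{
	return (p->execname);
}
```

In this model, a consumer that checks model_execname() for NULL before opening the file never touches the mismatched binary, so unresolved symbols stay unresolved instead of being looked up in the wrong symbol table.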

Actions #1

Updated by Robert Mustacchi almost 9 years ago

  • Status changed from New to Resolved
  • % Done changed from 90 to 100

Resolved in 8f68126c6dc4f2178929233f0c7ccce877e973ca.

