Bug #6344

mdb choked on core file from newer platform

Added by Robert Mustacchi over 4 years ago. Updated about 4 years ago.

Status:
Closed
Priority:
Normal
Category:
mdb - modular debugger
Start date:
2015-10-16
Due date:
% Done:

100%

Estimated time:
Difficulty:
Medium
Tags:

Description

Steps to reproduce:

dap@b426724c marlin $ uname -v
joyent_20140126T003934Z

dap@b426724c marlin $ gcore $$
gcore: core.34733 dumped

dap@b426724c marlin $ mdb core.34733
Loading modules: [ libc.so.1 ld.so.1 ]
> $C
08047b28 libc.so.1`__waitid+0x15(7, 0, 8047b40, f, 0, 0)
08047bc8 libc.so.1`waitpid+0x75(ffffffff, 8047c1c, c, 0, 3, fed42bcc)
08047c38 waitchld+0x7d(4783, 1, 0, 8, 80f3cec, 8124de8)
08047ca8 wait_for+0x1ed(4783, 0, 4776, 4776, 0, 0)
08047d18 execute_command_internal+0x1610(8136aa8, 0, ffffffff, ffffffff, 81392e8, 81392e8)
08047d48 execute_command+0x45(8136aa8, 0, 8047d88, 8070f16)
08047d88 reader_loop+0x7e(fefa0500, fefa0500, 3, d7acdc4, 0, 8047e28)
08047de8 main+0xd16(8047ddc, fef00888, 8047e14, 806e8e3, 1, 8047e20)
08047e14 _start+0x83(1, 8047ed0, 0, 8047ed6, 8047edf, 8047eeb)

Now, I've uploaded it to thoth, but the MDB on that system can't find the thread stack:

dap@b426724c marlin $ thoth upload core.34733
thoth: creating f16fbb63e6a26614778eaabb2526ae11
thoth: uploading core.34733 to f16fbb63e6a26614778eaabb2526ae11
thoth: core.34733     [===========================>] 100%   4.08MB                  
thoth: creating job to uncompress f16fbb63e6a26614778eaabb2526ae11
thoth: adding key to job 065a1f3d-40d3-6a63-a8b3-8788364e254b
thoth: processing job 065a1f3d-40d3-6a63-a8b3-8788364e254b
thoth: waiting for completion of job 065a1f3d-40d3-6a63-a8b3-8788364e254b
thoth: job 065a1f3d-40d3-6a63-a8b3-8788364e254b completed in 0h0m5s
thoth: creating job to process f16fbb63e6a26614778eaabb2526ae11
thoth: adding key to job 6430a045-83a0-ec14-cda2-d61d81e27e84
thoth: processing f16fbb63e6a26614778eaabb2526ae11
thoth: waiting for completion of job 6430a045-83a0-ec14-cda2-d61d81e27e84
thoth: job 6430a045-83a0-ec14-cda2-d61d81e27e84 completed in 0h0m21s
dap@b426724c marlin $ thoth debug f16fbb63e6a26614778eaabb2526ae11
thoth: debugging f16fbb63e6a26614778eaabb2526ae11
 * created interactive job -- 8863c7d5-2f2a-651b-c93e-d016b4fe010b
 * waiting for session... / established
thoth: dump info is in $THOTH_INFO
Loading modules: [ libc.so.1 ld.so.1 ]
> $C
mdb: failed to get current register set: libthread_db call failed unexpectedly
> !uname -a
SunOS 967f24c4-e2df-4d1f-95c0-f0784d57ef61 5.11 joyent_20140109T230431Z i86pc i386 i86pc Solaris

The workaround is to use "::tmodel":

> $C
mdb: failed to get current register set: libthread_db call failed unexpectedly
> ::tmodel lwp
> $C
08047b28 libc.so.1`__waitid+0x15(7, 0, 8047b40, f, 0, 0)
08047bc8 libc.so.1`waitpid+0x75(ffffffff, 8047c1c, c, 0, 3, fed42bcc)
08047c38 waitchld+0x7d(4783, 1, 0, 8, 80f3cec, 8124de8)
08047ca8 wait_for+0x1ed(4783, 0, 4776, 4776, 0, 0)
08047d18 execute_command_internal+0x1610(8136aa8, 0, ffffffff, ffffffff, 81392e8, 81392e8)
08047d48 execute_command+0x45(8136aa8, 0, 8047d88, 8070f16)
08047d88 reader_loop+0x7e(fefa0500, fefa0500, 3, d7acdc4, 0, 8047e28)
08047de8 main+0xd16(8047ddc, fef00888, 8047e14, 806e8e3, 1, 8047e20)
08047e14 _start+0x83(1, 8047ed0, 0, 8047ed6, 8047edf, 8047eeb)

I originally saw this with a core file from Node's calling abort().

This was broken by the addition of SIGINFO, which increased NSIG from
73 to 74. This, in turn, increased the size of the uberdata_t structure
(its siguaction member is an array sized by NSIG), which shifted the offset
of every member after siguaction
and prevented uberdata from being meaningfully read by td_read_uberdata().
Worse, because td_read_uberdata() does very little to verify the
uberdata that it reads, it doesn't fail explicitly – it just silently
fails to initialize.
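
To illustrate the offset shift described above, here is a minimal sketch. The structures are hypothetical stand-ins, not the real uberdata_t layout: only the idea of an NSIG-sized array followed by other members is taken from the description, and every name below (fake_sigaction, uberdata_old, uberdata_new, offset_shift) is invented for illustration.

```c
#include <assert.h>	/* assert() is handy when experimenting with this sketch */
#include <stddef.h>

/* The two NSIG values named in the bug description. */
#define	OLD_NSIG	73
#define	NEW_NSIG	74

/* Hypothetical per-signal element; the real element type differs. */
struct fake_sigaction {
	void	*handler;
	int	flags;
};

/* Layout as an older reader was compiled to expect it. */
struct uberdata_old {
	int			primary_flag;
	struct fake_sigaction	siguaction[OLD_NSIG];
	int			nthreads;	/* any member after the array */
};

/* Layout actually written by the newer platform. */
struct uberdata_new {
	int			primary_flag;
	struct fake_sigaction	siguaction[NEW_NSIG];
	int			nthreads;
};

/*
 * Number of bytes by which a member after the array moves when the
 * array grows by one element.
 */
size_t
offset_shift(void)
{
	return (offsetof(struct uberdata_new, nthreads) -
	    offsetof(struct uberdata_old, nthreads));
}
```

A reader built against the old layout that copies the raw bytes of the new structure would find every member after the array displaced by offset_shift() bytes, which in this sketch equals the size of one array element.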

There are several ways to fix this. First, the binary dependency of
libc_db on the uberdata structure is clearly brittle and moronic – it
should use CTF to extract the fields that it needs. This may be easier
said than done, however, as the libc_db context is delicate with respect to
other libraries.

Absent this solution (or any other solution that properly versions the
structure), the current situation can still be improved by performing more
verification in td_read_uberdata() – and failing explicitly if that
structure is obviously wrong. This will allow debuggers (which is to say,
mdb) to recover from initialization failure – which mdb will do by
switching to the LWP thread model.
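
The verify-and-fail-explicitly approach can be sketched roughly as follows. This is not the code from the actual fix: the fields checked, the bounds, and all names (uberdata_view_t, uberdata_view_check, choose_thread_model) are assumptions made for illustration. The point is only that an implausible value yields an explicit error, which lets the caller fall back to the LWP model instead of continuing with garbage.

```c
#include <assert.h>	/* assert() is handy when experimenting with this sketch */
#include <stdint.h>

/* Arbitrary plausibility bounds, chosen only for this sketch. */
#define	MAX_PLAUSIBLE_HASH	(1u << 20)
#define	MAX_PLAUSIBLE_THREADS	(1u << 20)

/* Hypothetical subset of the fields a debugger library might read. */
typedef struct uberdata_view {
	uint32_t	hash_size;	/* assumed to be a power of two */
	uint32_t	nthreads;	/* assumed to be at least one */
} uberdata_view_t;

typedef enum { MODEL_THREAD, MODEL_LWP } thread_model_t;

static int
is_power_of_two(uint32_t v)
{
	return (v != 0 && (v & (v - 1)) == 0);
}

/*
 * Return 0 if the view looks sane, -1 if it is obviously wrong (for
 * example, because it was read at stale offsets).
 */
int
uberdata_view_check(const uberdata_view_t *u)
{
	if (!is_power_of_two(u->hash_size) ||
	    u->hash_size > MAX_PLAUSIBLE_HASH)
		return (-1);
	if (u->nthreads == 0 || u->nthreads > MAX_PLAUSIBLE_THREADS)
		return (-1);
	return (0);
}

/* A caller recovers from the explicit failure by using the LWP model. */
thread_model_t
choose_thread_model(const uberdata_view_t *u)
{
	return (uberdata_view_check(u) == 0 ? MODEL_THREAD : MODEL_LWP);
}
```

Checks like these cannot prove that the structure was read correctly; they only catch values that are obviously wrong, which is the limited improvement the paragraph above describes.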

History

#1

Updated by Electric Monk about 4 years ago

  • Status changed from New to Closed

git commit cd8037ee05b2c5598728fdcd498596fc17bfdfa3

commit  cd8037ee05b2c5598728fdcd498596fc17bfdfa3
Author: Bryan Cantrill <bryan@joyent.com>
Date:   2015-10-27T00:00:53.000Z

    6344 mdb choked on core file from newer platform
    Reviewed by: Robert Mustacchi <rm@joyent.com>
    Reviewed by: Gordon Ross <gordon.ross@nexenta.com>
    Approved by: Matthew Ahrens <mahrens@delphix.com>
