Bug #6344
mdb choked on core file from newer platform
Status: Closed
Description
Steps to reproduce:
    dap@b426724c marlin $ uname -v
    joyent_20140126T003934Z
    dap@b426724c marlin $ gcore $$
    gcore: core.34733 dumped
    dap@b426724c marlin $ mdb core.34733
    Loading modules: [ libc.so.1 ld.so.1 ]
    > $C
    08047b28 libc.so.1`__waitid+0x15(7, 0, 8047b40, f, 0, 0)
    08047bc8 libc.so.1`waitpid+0x75(ffffffff, 8047c1c, c, 0, 3, fed42bcc)
    08047c38 waitchld+0x7d(4783, 1, 0, 8, 80f3cec, 8124de8)
    08047ca8 wait_for+0x1ed(4783, 0, 4776, 4776, 0, 0)
    08047d18 execute_command_internal+0x1610(8136aa8, 0, ffffffff, ffffffff, 81392e8, 81392e8)
    08047d48 execute_command+0x45(8136aa8, 0, 8047d88, 8070f16)
    08047d88 reader_loop+0x7e(fefa0500, fefa0500, 3, d7acdc4, 0, 8047e28)
    08047de8 main+0xd16(8047ddc, fef00888, 8047e14, 806e8e3, 1, 8047e20)
    08047e14 _start+0x83(1, 8047ed0, 0, 8047ed6, 8047edf, 8047eeb)
Now I've uploaded it to thoth, but mdb on that system, which runs an older platform, can't find the thread stack:
    dap@b426724c marlin $ thoth upload core.34733
    thoth: creating f16fbb63e6a26614778eaabb2526ae11
    thoth: uploading core.34733 to f16fbb63e6a26614778eaabb2526ae11
    thoth: core.34733 [===========================>] 100% 4.08MB
    thoth: creating job to uncompress f16fbb63e6a26614778eaabb2526ae11
    thoth: adding key to job 065a1f3d-40d3-6a63-a8b3-8788364e254b
    thoth: processing job 065a1f3d-40d3-6a63-a8b3-8788364e254b
    thoth: waiting for completion of job 065a1f3d-40d3-6a63-a8b3-8788364e254b
    thoth: job 065a1f3d-40d3-6a63-a8b3-8788364e254b completed in 0h0m5s
    thoth: creating job to process f16fbb63e6a26614778eaabb2526ae11
    thoth: adding key to job 6430a045-83a0-ec14-cda2-d61d81e27e84
    thoth: processing f16fbb63e6a26614778eaabb2526ae11
    thoth: waiting for completion of job 6430a045-83a0-ec14-cda2-d61d81e27e84
    thoth: job 6430a045-83a0-ec14-cda2-d61d81e27e84 completed in 0h0m21s
    dap@b426724c marlin $ thoth debug f16fbb63e6a26614778eaabb2526ae11
    thoth: debugging f16fbb63e6a26614778eaabb2526ae11
     * created interactive job -- 8863c7d5-2f2a-651b-c93e-d016b4fe010b
     * waiting for session... / established
    thoth: dump info is in $THOTH_INFO
    Loading modules: [ libc.so.1 ld.so.1 ]
    > $C
    mdb: failed to get current register set: libthread_db call failed unexpectedly
    > !uname -a
    SunOS 967f24c4-e2df-4d1f-95c0-f0784d57ef61 5.11 joyent_20140109T230431Z i86pc i386 i86pc Solaris
The workaround is to switch mdb to the LWP thread model with "::tmodel lwp":
    > $C
    mdb: failed to get current register set: libthread_db call failed unexpectedly
    > ::tmodel lwp
    > $C
    08047b28 libc.so.1`__waitid+0x15(7, 0, 8047b40, f, 0, 0)
    08047bc8 libc.so.1`waitpid+0x75(ffffffff, 8047c1c, c, 0, 3, fed42bcc)
    08047c38 waitchld+0x7d(4783, 1, 0, 8, 80f3cec, 8124de8)
    08047ca8 wait_for+0x1ed(4783, 0, 4776, 4776, 0, 0)
    08047d18 execute_command_internal+0x1610(8136aa8, 0, ffffffff, ffffffff, 81392e8, 81392e8)
    08047d48 execute_command+0x45(8136aa8, 0, 8047d88, 8070f16)
    08047d88 reader_loop+0x7e(fefa0500, fefa0500, 3, d7acdc4, 0, 8047e28)
    08047de8 main+0xd16(8047ddc, fef00888, 8047e14, 806e8e3, 1, 8047e20)
    08047e14 _start+0x83(1, 8047ed0, 0, 8047ed6, 8047edf, 8047eeb)
I originally saw this with a core file from Node calling abort().
This was broken by the addition of SIGINFO, which increased NSIG from
73 to 74. This, in turn, increased the size of the uberdata_t structure,
which changed the offsets of every member after the siguaction array
(which is sized by NSIG) and prevented uberdata from being meaningfully
read by td_read_uberdata(). Worse, because td_read_uberdata() does very
little to verify the uberdata that it reads, it doesn't fail explicitly;
it just silently fails to initialize.
There are several ways to fix this. First, the binary dependency of
libc_db on the uberdata structure is clearly brittle and moronic; libc_db
should use CTF to extract the fields that it needs. This may be easier
said than done, however, as libc_db runs in a context that is delicate
with respect to its dependencies on other libraries.
Absent this solution (or any other solution that properly versions the
structure), the current situation can still be improved by performing more
verification in td_read_uberdata() and failing explicitly if that
structure is obviously wrong. This will allow debuggers (which is to say,
mdb) to recover from initialization failure, which mdb will do by
switching to the LWP thread model.
Updated by Electric Monk almost 7 years ago
- Status changed from New to Closed
git commit cd8037ee05b2c5598728fdcd498596fc17bfdfa3
    commit cd8037ee05b2c5598728fdcd498596fc17bfdfa3
    Author: Bryan Cantrill <bryan@joyent.com>
    Date:   2015-10-27T00:00:53.000Z

        6344 mdb choked on core file from newer platform
        Reviewed by: Robert Mustacchi <rm@joyent.com>
        Reviewed by: Gordon Ross <gordon.ross@nexenta.com>
        Approved by: Matthew Ahrens <mahrens@delphix.com>