Bug #1406

open

build failed after update to OI-151a and illumos-gate changeset:13438:084a34091b25

Added by Igor Kozhukhov almost 12 years ago. Updated almost 12 years ago.

Status: New
Priority: High
Assignee:
Category: kernel
Start date: 2011-08-23
Due date:
% Done: 0%
Estimated time:
Difficulty: Medium
Tags: needs-triage
Gerrit CR:
External Bug:

Description

The build failed after updating to OI-151a and illumos-gate changeset:13438:084a34091b25:

igor@srv248:/nfs/builds/illumos-gate/illumos-gate/usr/src/lib$ dmake -m serial install
/nfs/builds/illumos-gate/illumos-gate/usr/src/lib/common
/nfs/builds/illumos-gate/illumos-gate/usr/src/lib/common/i386
/nfs/builds/illumos-gate/illumos-gate/usr/src/lib/common/amd64
/nfs/builds/illumos-gate/illumos-gate/usr/src/cmd/sgs/libconv
/nfs/builds/illumos-gate/illumos-gate/usr/src/cmd/sgs/libconv/i386
/nfs/builds/illumos-gate/illumos-gate/usr/src/cmd/sgs/tools/i386
/nfs/builds/illumos-gate/illumos-gate/usr/src/cmd/sgs/libconv/i386
/nfs/builds/illumos-gate/illumos-gate/usr/src/cmd/sgs/libconv/amd64
/nfs/builds/illumos-gate/illumos-gate/usr/src/cmd/sgs/tools/i386
/nfs/builds/illumos-gate/illumos-gate/usr/src/cmd/sgs/libconv/amd64
/nfs/builds/illumos-gate/illumos-gate/usr/src/cmd/sgs/libdl
/nfs/builds/illumos-gate/illumos-gate/usr/src/cmd/sgs/libdl/i386
/nfs/builds/illumos-gate/illumos-gate/usr/src/cmd/sgs/libdl/amd64
/nfs/builds/illumos-gate/illumos-gate/usr/src/lib/libc
/nfs/builds/illumos-gate/illumos-gate/usr/src/lib/libc/i386
`libc.so.1' is up to date.
/nfs/builds/illumos-gate/illumos-gate/usr/src/lib/libc/i386
`llib-lc.ln' is up to date.
/nfs/builds/illumos-gate/illumos-gate/usr/src/lib/libc/i386
`libc_pic.a' is up to date.
/nfs/builds/illumos-gate/illumos-gate/usr/src/lib/libc/i386_hwcap1
`libc_hwcap1.so.1' is up to date.
/nfs/builds/illumos-gate/illumos-gate/usr/src/lib/libc/i386_hwcap2
`libc_hwcap2.so.1' is up to date.
/nfs/builds/illumos-gate/illumos-gate/usr/src/lib/libc/i386_hwcap3
`libc_hwcap3.so.1' is up to date.
/nfs/builds/illumos-gate/illumos-gate/usr/src/lib/libc/amd64
/usr/sbin/dtrace -G -64 -C -xlazyload -s ../port/threads/plockstat.d -o pics/plockstat.o pics/alloc.o pics/assfail.o pics/cancel.o pics/door_calls.o pics/pthr_attr.o pics/pthr_barrier.o pics/pthr_cond.o pics/pthr_mutex.o pics/pthr_rwlock.o pics/pthread.o pics/rwlock.o pics/scalls.o pics/sema.o pics/sigaction.o pics/spawn.o pics/synch.o pics/tdb_agent.o pics/thr.o pics/thread_interface.o pics/tls.o pics/tsd.o
dtrace: failed to compile script ../port/threads/plockstat.d: "/usr/lib/dtrace/scsi.d", line 46: translator member ic_cdb definition uses incompatible types: "uint8_t *" = "struct prasmap32" 
*** Error code 1
dmake: Fatal error: Command failed for target `pics/plockstat.o'
Current working directory /nfs/builds/illumos-gate/illumos-gate/usr/src/lib/libc/amd64
*** Error code 1
The following command caused the error:
cd amd64; pwd; VERSION='libm' dmake libc.so.1
dmake: Fatal error: Command failed for target `amd64/libc.so.1'
Current working directory /nfs/builds/illumos-gate/illumos-gate/usr/src/lib/libc
*** Error code 1
The following command caused the error:
if [ -f libc/Makefile  ]; then \
        cd libc; pwd; dmake install; \
else \
        true; \
fi
dmake: Fatal error: Command failed for target `libc'
Current working directory /nfs/builds/illumos-gate/illumos-gate/usr/src/lib
igor@srv248:/nfs/builds/illumos-gate/illumos-gate/usr/src/lib$ 

#1

Updated by Rich Lowe almost 12 years ago

This is almost always a mishap caused by the (necessarily) out-of-date CTF information in closed kernel modules.

A workaround is to strip the .SUNW_ctf sections from such modules (and hope that you never need to debug them). I think that mcs -d -n .SUNW_ctf <module> will do the trick. Given how annoying it is to find the offending module, though, you may just want to remove that section from all closed kernel modules (make copies first, in case you need them back).
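The workaround above, including the advice to keep copies, could be scripted roughly as follows. This is a minimal sketch, not a tested procedure: it assumes an illumos host with mcs(1) on PATH, the function name `strip_ctf` and its backup-directory argument are hypothetical, and `MCS` can be overridden only so the function can be exercised against a stub instead of real modules.

```sh
#!/bin/sh
# Minimal sketch: strip .SUNW_ctf from the given kernel modules, saving each
# original under a backup directory first so it can be restored later.
# Assumes mcs(1) is on PATH; MCS may be overridden (e.g. with a stub for testing).
MCS=${MCS:-mcs}

# Usage: strip_ctf BACKUP_DIR MODULE...
strip_ctf() {
    backup=$1
    shift
    for mod in "$@"; do
        if [ ! -f "$mod" ]; then
            echo "strip_ctf: skipping $mod (not a regular file)" >&2
            continue
        fi
        # Mirror the module's path under the backup directory, so that
        # e.g. kernel/drv/mpt and kernel/drv/amd64/mpt do not collide.
        mkdir -p "$backup/$(dirname "$mod")" || return 1
        # Only save the first copy, so re-running never overwrites the original.
        if [ ! -e "$backup/$mod" ]; then
            cp -p "$mod" "$backup/$mod" || return 1
        fi
        # Delete the CTF section in place.
        "$MCS" -d -n .SUNW_ctf "$mod" || return 1
    done
}
```

For example, `strip_ctf /var/tmp/ctf-backup /kernel/drv/amd64/mpt` would save the module under /var/tmp/ctf-backup/kernel/drv/amd64/mpt before stripping it (both paths are only illustrations). Taking the backup directory as an argument lets the copies live outside /kernel and /platform, away from the module directories.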

#2

Updated by Igor Kozhukhov almost 12 years ago

Richard, thank you very much for your help!
I have fixed my problem now.

My steps:
1. Run this command on /kernel:
find /kernel -type f -name "*" | xargs mcs -d -n .SUNW_ctf

2. Run this command on /platform/i86pc/kernel:
find /platform/i86pc/kernel -type f -name "*" | xargs mcs -d -n .SUNW_ctf

3. Rebuild the boot_archive.

4. Reboot the machine.

After that, the illumos build in a zone works well.
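The steps above, expressed as a script, might look like the sketch below. It assumes an illumos host with mcs(1) and bootadm(1M) on PATH; the function names are hypothetical, and `MCS`/`BOOTADM` are overridable only so the functions can be tried out without touching a live system. Like the original commands it keeps no copies of the modules, and it skips driver `.conf` files on the assumption that those are the only non-ELF files in these trees.

```sh
#!/bin/sh
# Sketch of the steps above: strip .SUNW_ctf from every module under the
# given directories, then rebuild the boot archive. No copies are kept.
# Assumes an illumos host with mcs(1) and bootadm(1M) on PATH; MCS and
# BOOTADM may be overridden (e.g. with stubs for testing).
MCS=${MCS:-mcs}
BOOTADM=${BOOTADM:-bootadm}

# Usage: strip_ctf_tree DIR...
strip_ctf_tree() {
    for dir in "$@"; do
        # Driver .conf files are plain text, so they are skipped; -exec ... {} +
        # batches names like xargs but handles spaces in file names.
        find "$dir" -type f ! -name '*.conf' \
            -exec "$MCS" -d -n .SUNW_ctf {} + || return 1
    done
}

# Steps 1 to 3 from the list above; step 4 (reboot) is left to the operator.
strip_all_and_rebuild() {
    strip_ctf_tree /kernel /platform/i86pc/kernel || return 1
    "$BOOTADM" update-archive || return 1
    echo "Boot archive updated; reboot to load the stripped modules."
}
```

Leaving the reboot out of the script is a deliberate choice, so the operator can check the output before committing to it.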

#3

Updated by Rich Lowe almost 12 years ago

Wow. I'd have expected stripping them from all modules to break more things. You only needed to strip the ones that come from closed-bins.

#4

Updated by Jason King almost 12 years ago

  • Category set to tools - gate/build tools
  • Assignee set to Jason King

A preliminary webrev to fix this for all closed bins is here: http://dev1.illumos.org/~jbk/ctfstrip

#5

Updated by Rich Lowe almost 12 years ago

  • Category changed from tools - gate/build tools to kernel

For the sake of the bug, an evaluation.

To keep the amount of CTF data the kernel holds in memory small, we remove from each module any type information that is already present in genunix or in ip (home to the vast majority of types). This unfortunately weds each module tightly to the genunix against which this was done, as the module now refers, by identifier, to the type in genunix.

Build-to-build, the type identifiers are stable. However, the introduction or removal of a type in genunix will alter the identifiers, and any old modules then contain invalid type data.
Some folks may remember that the OpenSolaris distribution used to be a frequent victim of this, as it would ignore .SUNW_ctf when making upgrade choices.
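To make the identifier problem concrete, here is a toy model in shell. It is not the real CTF encoding: it assumes a "type table" is a text file with one type name per line and that a type's identifier is simply its line number. The type names are borrowed from the error message in the description, but their ordering is invented for illustration, and the helper names are hypothetical.

```sh
#!/bin/sh
# Toy model only, NOT the real CTF format: a type table is a text file with
# one type name per line, and a type's identifier is its line number.

# Usage: type_id TABLE NAME -> prints the identifier of NAME in TABLE
type_id() {
    grep -n -x -F -- "$2" "$1" | cut -d: -f1
}

# Usage: resolve_type TABLE ID -> prints the name stored at identifier ID
resolve_type() {
    sed -n "${2}p" "$1"
}

# A module built against the old parent records only the identifier; adding
# one type to the parent shifts every later identifier by one.
demo_shift() {
    dir=$(mktemp -d) || return 1
    printf '%s\n' 'int' 'struct proc' 'uint8_t *' > "$dir/genunix.old"
    printf '%s\n' 'int' 'struct proc' 'struct prasmap32' 'uint8_t *' \
        > "$dir/genunix.new"
    id=$(type_id "$dir/genunix.old" 'uint8_t *')
    echo "module recorded ID $id for: $(resolve_type "$dir/genunix.old" "$id")"
    echo "new parent resolves ID $id to: $(resolve_type "$dir/genunix.new" "$id")"
    rm -rf "$dir"
}
```

Running `demo_shift` reports that the module recorded ID 3 for `uint8_t *`, while the updated parent resolves ID 3 to `struct prasmap32`: in this model the module's own data never changed, only what its identifier points at.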

We can verify that this is the cause in any given situation by using ctfdump (from onbld): look at the definition of a given type in every module, and at the types to which it refers. One will, no doubt, dangle off into space (and that one will, almost certainly, be a closed-bin). I have seen this most frequently with klm, as it contains NFS-relevant types; I would suspect that current victims are seeing it in mpt, or perhaps scsi_vhci, since SCSI-relevant types are at fault here. I haven't done the investigation with ctfdump for this specific instance, but it has reached a point where nonsensical type errors immediately lead (correctly, it turns out) to suspecting this problem.
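The first half of that check, collecting each module's definition of a type so the definitions can be compared, might be scripted as below. This is only a sketch: it assumes onbld's ctfdump is on PATH (or named by `CTFDUMP`) and that its `-t` option prints the type section, which you should confirm against your ctfdump's usage message; the `ctf_grep` name is hypothetical. Following each definition to the types it refers to is still done by eye.

```sh
#!/bin/sh
# Sketch: print every line of each module's CTF type dump that mentions a
# given type name, prefixed with the module path.
# Assumes onbld's ctfdump is on PATH (override with CTFDUMP) and that
# "ctfdump -t FILE" prints the type section (an assumption to verify).
CTFDUMP=${CTFDUMP:-ctfdump}

# Usage: ctf_grep TYPENAME MODULE...
ctf_grep() {
    name=$1
    shift
    for mod in "$@"; do
        # A module with no CTF data, or one ctfdump cannot read, is reported
        # and skipped rather than aborting the whole scan.
        if ! out=$("$CTFDUMP" -t "$mod" 2>/dev/null); then
            echo "ctf_grep: could not dump $mod" >&2
            continue
        fi
        printf '%s\n' "$out" | grep -F -- "$name" |
            while IFS= read -r line; do
                printf '%s: %s\n' "$mod" "$line"
            done
    done
}
```

For instance, `ctf_grep scsi_pkt /kernel/drv/amd64/*` would list matching lines from every 64-bit driver, each prefixed with the module path (the type and directory are illustrations only).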

Fortunately (or unfortunately, if you care about debugging our software) the only times this fails loudly are when it affects a type about which DTrace cares. Other CTF consumers, especially mdb, will trust the bogus type information and have nothing to type-check to clue them in that something is amiss.

Unfortunately, there are few avenues to fix these problems. I've reached the conclusion that the only sensible thing we can do is to strip the .SUNW_ctf section from the closed binaries as we copy them into the proto area, on the principle that the section will likely do more harm than good. This, of course, makes these modules considerably more difficult to debug, rendering their unique types invisible to DTrace and to mdb. I think this is worth it since, as described, the information that is there is untrustworthy at best, and since we are changing $ROOT but not $CLOSEDROOT, the original binary remains available should it be necessary.
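The proposed fix could be sketched along these lines. It assumes the closed binaries sit in a directory tree mirroring the proto area, that mcs(1) is on PATH (or named by `MCS`), and that kernel modules are whatever lives under `kernel/`, `usr/kernel/` or `platform/*/kernel/`; those directory patterns and the function name are assumptions, not the gate's actual build rules.

```sh
#!/bin/sh
# Sketch: copy every file from a closed-binaries tree into the proto area,
# then strip .SUNW_ctf from the copied kernel modules only.
# The source tree is never modified, so its CTF data stays available.
# Assumes mcs(1) on PATH; MCS may be overridden (e.g. with a stub for testing).
MCS=${MCS:-mcs}

# Usage: install_closed_stripped CLOSEDROOT ROOT
install_closed_stripped() {
    src=$1
    dst=$2
    if [ ! -d "$src" ]; then
        echo "install_closed_stripped: $src is not a directory" >&2
        return 1
    fi
    # The explicit subshell on the right keeps "exit" local to the loop in
    # every shell, and its status becomes the function's return value.
    (cd "$src" && find . -type f) | (
        while IFS= read -r rel; do
            mkdir -p "$dst/$(dirname "$rel")" || exit 1
            cp -p "$src/$rel" "$dst/$rel" || exit 1
            case $rel in
                *.conf)
                    ;;  # driver configuration files are text: copy only
                ./kernel/*|./usr/kernel/*|./platform/*/kernel/*)
                    "$MCS" -d -n .SUNW_ctf "$dst/$rel" || exit 1
                    ;;
            esac
        done
    )
}
```

Because only the copies in `$ROOT` are stripped, the originals in `$CLOSEDROOT` keep their CTF data, which matches the reasoning above.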
