Bug #14722

closed

ld should keep group members in separate output sections

Added by Alan Wu over 1 year ago. Updated over 1 year ago.

Status: Closed
Priority: Normal
Assignee: Rich Lowe
Category: tools - gate/build tools
% Done: 100%
Difficulty: Medium
Gerrit CR: 2176

Description

Hello! I ran into a linker crash while trying to build a development version of
Ruby. I'm on OmniOS and the linker looks to be from illumos. Sorry if this is
already fixed; I'm new to the system and don't know how to get a newer version
of ld.

$ ld --version
ld: Software Generation Utilities - Solaris Link Editors: 5.11-1.1763 (illumos)

The crash happens when linking ruby:

generating a glommed object with DTrace probes for static library
linking static-library libruby-static.a
linking ruby
collect2: fatal error: ld terminated with signal 11 [Segmentation Fault], core dumped

Ruby's configure script prefers -ggdb3 when working with gcc. If I override
it to use -g instead, the crash goes away. For DTrace, Ruby uses partial
linking to create an archive that contains just one object file based on an
[old suggestion from Bryan Cantrill][1]. The linker also crashes if I pass gcc
the object file instead of the archive that contains it. I'm able to reproduce
the crash with a simple `ld ruby-glommed.o`. The file is available for
[download][2] along with a core dump.

To reproduce on OmniOS v11 r151034m:

$ git clone https://github.com/ruby/ruby src
$ mkdir build
$ cd src
$ git checkout 9d18661e1de053a9fecae7f4ab4ed41300537cec
$ ./autogen.sh
$ cd ../build
$ ../src/configure MAKE=gmake AR=ar
$ gmake -j ruby

Here is some info from `mdb`:

$ mdb ld-core
Loading modules: [ ld libc.so.1 ld.so.1 ]
> ::status
debugging core file of ld (64-bit) from ip-172-31-16-42
file: /usr/bin/amd64/ld
initial argv: /bin/ld -Y P,/usr/gcc/9/lib/amd64:/lib/amd64:/usr/lib/amd64 -R /usr/gcc/9/lib/a
threading model: native threads
status: process terminated by SIGSEGV (Segmentation Fault), addr=4c
> $c
libld.so.4`ld64_group_process+0x175(fffffc7fea443b60, fffffc7fea400020)
libld.so.4`process_elf+0x823(fffffc7fea441860, 10b41c0, fffffc7fea400020)
libld.so.4`ld64_process_ifl+0x751(fffffc7fea441830, 0, ffffffff, 10b41c0, a, fffffc7fea400020)
libld.so.4`ar_input+0x84(ffffffff, fffffc7fea434cb0, fffffc7fea400020, 10b41c0, fffffc7fea441830, fffffc7fffdf1440)
libld.so.4`ar_extract_bysym+0x187(fffffc7fea434c90, ffffffff, fffffc7fea434cb0, fffffc7fea400020, fffffc7fffdf143c, fffffc7fffdf1440)
libld.so.4`ld64_process_archive+0x84(fffffc7fea434c90, ffffffff, fffffc7fea434cb0, fffffc7fea400020)
libld.so.4`ld64_process_ifl+0x5ac(fffffc7fea434c90, fffffc7fea434c92, 4, 10a0770, 2, fffffc7fea400020)
libld.so.4`ld64_process_open+0xcb(fffffc7fea434c90, fffffc7fea434c92, fffffc7fffdf171c, fffffc7fea400020, 2, fffffc7fffdf1bc0)
libld.so.4`find_lib_name+0x134(fffffc7fffdf2506, fffffc7fffdf260f, fffffc7fea400020, fffffc7fffdf1bc0, 0)
libld.so.4`ld64_find_library+0xaa(fffffc7fffdf260f, fffffc7fea400020)
libld.so.4`parseopt_pass2+0x158(fffffc7fea400020, 40, fffffc7fffdf1f68)
libld.so.4`process_files_com+0xb6(fffffc7fea400020, 40, fffffc7fffdf1f68)
libld.so.4`ld64_process_files+0x60(fffffc7fea400020, 40, fffffc7fffdf1f68)
libld.so.4`ld64_main+0x3e3(40, fffffc7fffdf1f68, 3e)
main+0xd4()
_start_crt+0x83()
_start+0x18()

[1]: https://marc.info/?l=opensolaris-dtrace-discuss&m=114761203110734
[2]: https://github.com/XrXr/illumos-ld-bug-report/releases/tag/v1


Related issues

Related to illumos gate - Bug #15260: ld crashes if group members are stripped by -s (Closed, Rich Lowe)

Actions #1

Updated by Rich Lowe over 1 year ago

I think this is because we're setting the comdat flag on a NULL input section descriptor. I'll give it a proper look.

Actions #2

Updated by Rich Lowe over 1 year ago

I think this is a typo introduced in 2008, whereby in

        if (gd.gd_data[0] & GRP_COMDAT)
            gifl->ifl_isdesc[gndx]->is_flags |= FLG_IS_COMDAT;

that gndx should be ndx

Actions #3

Updated by Rich Lowe over 1 year ago

Ignore comment #2, I got myself all twisted around.

So we're seeing this group:

Group Section:  .group
     index    flags / section         signature symbol
       [0]   [ COMDAT ]               wm4.config.h.2.340b74a62c4a665ee97a6d14594312d6
       [1]   .debug_macro [28]
       [2]   (null) [0]

and choking because of that NULL section within it. Do you have the input objects that produced ruby-glommed.o?
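For reference, the data of an SHT_GROUP section like the one above is an array of 32-bit words: word 0 holds the group flags (GRP_COMDAT here), and each following word is the section header index of a member. The minimal sketch below shows the kind of validation that would reject this group instead of crashing on it. It assumes the raw words are already in memory; `check_group_members` is a hypothetical helper, not ld's actual code, and the constants are defined locally with their ELF gABI values rather than taken from a system header.

```c
#include <stddef.h>
#include <stdint.h>

/* ELF gABI values, defined locally so the sketch needs no <elf.h>. */
#define GRP_COMDAT 0x1u
#define SHN_UNDEF  0u

/*
 * Hypothetical helper: validate the member list of an SHT_GROUP section.
 * words[0] is the group flag word; words[1..nwords-1] are section header
 * indices. shnum is the number of section headers in the object.
 * Returns the number of members, or -1 if any member is section 0
 * (SHN_UNDEF) or an index past the end of the section header table.
 */
static int
check_group_members(const uint32_t *words, size_t nwords, uint32_t shnum)
{
	int members = 0;

	/* Start at 1: word 0 holds flags, not a section index. */
	for (size_t i = 1; i < nwords; i++) {
		uint32_t ndx = words[i];

		if (ndx == SHN_UNDEF || ndx >= shnum)
			return -1;
		members++;
	}
	return members;
}
```

With the group from this report ({GRP_COMDAT, 28, 0}) and an assumed 40-section object, the check returns -1 because of the trailing index 0; with that entry removed it returns 1.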

Actions #4

Updated by Alan Wu over 1 year ago

Do you have the input objects that produced ruby-glommed.o?

I ran `gmake ruby-glommed.o V=1` and made a tarball of the result and its parts here: https://github.com/XrXr/illumos-ld-bug-report/releases/download/v1/glommed-with-parts.tgz

Command used to generate ruby-glommed.o in glommed-with-parts.tgz:

$ /usr/ccs/bin/ld -r -o ruby-glommed.o dln.o localeinit.o loadpath.o array.o ast.o bignum.o class.o compar.o compile.o complex.o cont.o debug.o debug_counter.o dir.o dln_find.o encoding.o enum.o enumerator.o error.o eval.o file.o gc.o hash.o inits.o io.o io_buffer.o iseq.o load.o marshal.o math.o memory_view.o mjit.o mjit_compile.o node.o numeric.o object.o pack.o parse.o proc.o process.o ractor.o random.o range.o rational.o re.o regcomp.o regenc.o regerror.o regexec.o regparse.o regsyntax.o ruby.o scheduler.o signal.o sprintf.o st.o strftime.o string.o struct.o symbol.o thread.o time.o transcode.o transient_heap.o util.o variable.o version.o vm.o vm_backtrace.o vm_dump.o vm_sync.o vm_trace.o  coroutine/ucontext/Context.o probes.o enc/ascii.o enc/us_ascii.o enc/unicode.o enc/utf_8.o enc/trans/newline.o setproctitle.o addr2line.o  builtin.o
Actions #5

Updated by Rich Lowe over 1 year ago

  • Category set to tools - gate/build tools
  • Status changed from New to In Progress
  • Assignee set to Rich Lowe

I'm pretty sure I've got to the bottom of this -- we're shooting ourselves in the foot. I just have to decide on which not-foot place we need to shoot instead.

Is using plain '-g' a good enough workaround for you for right now? Or should I hurry?

Actions #6

Updated by Alan Wu over 1 year ago

No rush!

Actions #7

Updated by Rich Lowe over 1 year ago

  • Subject changed from ld crashes with gcc -ggdb3 inputs to ld crashes processing groups with already discarded sections
Actions #8

Updated by Rich Lowe over 1 year ago

  • Subject changed from ld crashes processing groups with already discarded sections to ld should keep group members in separate output sections

It turns out that this is happening because of 3 bugs.

The original, important, bug is that we merge sections which are members of groups into the same output section.
You can see this in your object:

Group Section:  .group
     index    flags / section         signature symbol
       [0]   [ COMDAT ]               wm4.transcode_data.h.18.21742e1335e8c8e0e79ecd5a137d9f33
       [1]   .debug_macro [28]
       [2]   (null) [0]

Group Section:  .group
     index    flags / section         signature symbol
       [0]   [ COMDAT ]               wm4.elf.h.43.8672e8b9cd963266f2c492b110844465
       [1]   .debug_macro [28]
       [2]   (null) [0]

Note that .debug_macro [28] is in both groups here; ignore the null for now.

This has several possible repercussions, which I'll quote from the README for the tests I'm writing for this fix:

If we place an input section which is in a group in the same output section as
another input section, this leaves us with problems:

1. If the other input section was not previously a member of a group, its data
   becomes part of a group and we may now discard it along with that group.
2. If the other input section _was_ a member of a group we now have two groups
   containing the same section, where discarding one will corrupt the other.
3. ... and if that section had associated relocations, which must have been
   part of the group, we will now associate those relocations with the merged
   output section, further corrupting the group, as there is now no mapping
   between input and output relocation sections.
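Repercussion 2 is a violated invariant that is easy to state: no output section may be a member of more than one group. The sketch below is a hypothetical checker for it, assuming each group has already been reduced to the list of output section indices it contains (flag word excluded). The `struct group` type and `groups_share_section` are illustrative only and do not reflect ld's internal representation.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified output group: its member section indices. */
struct group {
	const uint32_t *members;
	size_t nmembers;
};

/*
 * Return true if any section index appears in more than one group.
 * When that happens, discarding one group (for example a duplicate
 * COMDAT group) also removes data the other group still refers to.
 */
static bool
groups_share_section(const struct group *groups, size_t ngroups)
{
	for (size_t a = 0; a < ngroups; a++) {
		for (size_t b = a + 1; b < ngroups; b++) {
			for (size_t i = 0; i < groups[a].nmembers; i++) {
				for (size_t j = 0; j < groups[b].nmembers; j++) {
					if (groups[a].members[i] ==
					    groups[b].members[j])
						return true;
				}
			}
		}
	}
	return false;
}
```

Applied to the two groups shown earlier, which both list .debug_macro [28], it returns true. The quadratic scan is chosen for readability; a real tool would more likely record an owning group per section in a table and flag the second claim.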

Number 3 there is this bug.

The 2nd bug is that, as a consequence, when we come to emit the group section we find a member with no output section and, for reasons lost to history, we just emit section 0 as a member of the group. Section 0 is not a normal section and was perhaps intended as a sentinel, but it is definitely not a valid thing to have in a group and may lead to problems with other tools.

The 3rd bug is that, when processing the group, we don't handle the possibility that section 0 has been used that way (despite probably intending it as a sentinel). Rather than skipping it, we attempt to process it like any other section, crashing as in the original bug report.
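The 2nd bug, and item 2 of the fix, come down to how the input-to-output section mapping is consulted when the output group is written. The sketch below assumes that mapping is a plain array in which 0 means "this input section has no output section"; `emit_group` and its parameters are hypothetical and are not ld's actual interfaces.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical sketch of filling in an output SHT_GROUP section.
 * in_members: input section indices of the group (flag word excluded).
 * in_to_out:  maps an input section index to its output section index;
 *             0 means the input section has no output section.
 * out:        receives the flag word followed by the output indices,
 *             so it must have room for nmembers + 1 words.
 * Returns 0 on success, or -1 if a member has no output section, so the
 * caller can report an error instead of writing section 0 into the group.
 */
static int
emit_group(uint32_t flags, const uint32_t *in_members, size_t nmembers,
    const uint32_t *in_to_out, size_t map_len, uint32_t *out)
{
	out[0] = flags;
	for (size_t i = 0; i < nmembers; i++) {
		uint32_t in = in_members[i];

		/* Missing or unmapped member: refuse rather than emit 0. */
		if (in >= map_len || in_to_out[in] == 0)
			return -1;
		out[i + 1] = in_to_out[in];
	}
	return 0;
}
```

Replacing the `return -1` with `out[i + 1] = 0` would reproduce the `(null) [0]` member seen in the elfdump output above.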

The fix for this is:
1. Don't merge sections which are members of groups (the SHF_GROUP flag is set)
2. If we're emitting a group and one of its sections is missing, raise an error
3. Treat section 0 as invalid for group processing (both in ld and, for parity, in elfdump)
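Item 1 of the fix amounts to a placement rule: an input section with SHF_GROUP set never shares an output section with other input, so each group member maps 1:1 to its own output section. The sketch below is a minimal predicate under that assumption. `struct sect` and `may_share_output_section` are hypothetical, SHF_GROUP is defined locally with its gABI value, and the name/type comparison is only a stand-in for ld's real, more involved placement rules.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* sh_flags bit from the ELF gABI, defined locally for self-containment. */
#define SHF_GROUP 0x200u

/* Hypothetical, simplified record of a section's placement attributes. */
struct sect {
	const char *name;
	uint32_t type;
	uint64_t flags;
};

/*
 * May input section `in` be placed in the same output section as
 * `existing`? Refuse if either is a group member: merging a group member
 * with anything else is what produced repercussions 1 to 3 above.
 */
static bool
may_share_output_section(const struct sect *in, const struct sect *existing)
{
	if ((in->flags & SHF_GROUP) != 0 || (existing->flags & SHF_GROUP) != 0)
		return false;
	/* Stand-in for ld's normal placement rules. */
	return in->type == existing->type &&
	    strcmp(in->name, existing->name) == 0;
}
```

Checking both sides matters: rejecting only when `in` is a group member would still let an ungrouped section be merged into a grouped one, which is repercussion 1.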

Actions #9

Updated by Rich Lowe over 1 year ago

An interesting symptom of this bug, which I guess I should preserve for the record, is that if you have an object file which exhibits this bug (ruby-glommed.o, for example), and the GNU binutils are used to put it into an archive or to rebuild the archive symbol table, that symbol table will become corrupt or empty. Your link may then fail in deeply mysterious ways, with unresolved symbols that are nevertheless present in the archive. This is made worse because nm(1) reads the objects' symbol tables, not the archive symbol table, and will show you the symbols that ld then tells you don't exist.

Actions #10

Updated by Electric Monk over 1 year ago

  • Gerrit CR set to 2176
Actions #11

Updated by Electric Monk over 1 year ago

  • Status changed from In Progress to Closed
  • % Done changed from 0 to 100

git commit a196c3ffe35a9a1589267b660ce6c2ff2ccab5e9

commit  a196c3ffe35a9a1589267b660ce6c2ff2ccab5e9
Author: Richard Lowe <richlowe@richlowe.net>
Date:   2022-06-16T20:19:10.000Z

    14722 ld should keep group members in separate output sections
    Reviewed by: Gordon Ross <Gordon.W.Ross@gmail.com>
    Reviewed by: Toomas Soome <tsoome@me.com>
    Reviewed by: Igor Kozhukhov <igor@dilos.org>
    Approved by: Dan McDonald <danmcd@mnx.io>

Actions #12

Updated by Rich Lowe 12 months ago

  • Related to Bug #15260: ld crashes if group members are stripped by -s added