Bug #14722
closed
ld should keep group members in separate output sections
100%
Description
Hello! I ran into a linker crash while trying to build a development version of
Ruby. I'm on OmniOS and the linker looks to be from illumos. Sorry if this is
already fixed; I'm new to the system and don't know how to get a newer version
of `ld`.
    $ ld --version
    ld: Software Generation Utilities - Solaris Link Editors: 5.11-1.1763 (illumos)
The crash happens when linking `ruby`:
    generating a glommed object with DTrace probes for static library
    linking static-library libruby-static.a
    linking ruby
    collect2: fatal error: ld terminated with signal 11 [Segmentation Fault], core dumped
Ruby's configure script prefers `-ggdb3` when working with gcc. If I override
it to use `-g` instead, the crash goes away. For DTrace, Ruby uses partial
linking to create an archive that contains just one object file, based on an
[old suggestion from Bryan Cantrill][1]. The linker also crashes if I pass gcc
the object file instead of the archive that contains it. I'm able to reproduce
the crash with a simple `ld ruby-glommed.o`. The file is available for
[download][2] along with a core dump.
To reproduce on OmniOS v11 r151034m:
    $ git clone https://github.com/ruby/ruby src
    $ mkdir build
    $ cd src
    $ git checkout 9d18661e1de053a9fecae7f4ab4ed41300537cec
    $ ./autogen.sh
    $ cd ../build
    $ ../src/configure MAKE=gmake AR=ar
    $ gmake -j ruby
Here is some info from `mdb`:
    $ mdb ld-core
    Loading modules: [ ld libc.so.1 ld.so.1 ]
    > ::status
    debugging core file of ld (64-bit) from ip-172-31-16-42
    file: /usr/bin/amd64/ld
    initial argv: /bin/ld -Y P,/usr/gcc/9/lib/amd64:/lib/amd64:/usr/lib/amd64 -R /usr/gcc/9/lib/a
    threading model: native threads
    status: process terminated by SIGSEGV (Segmentation Fault), addr=4c
    > $c
    libld.so.4`ld64_group_process+0x175(fffffc7fea443b60, fffffc7fea400020)
    libld.so.4`process_elf+0x823(fffffc7fea441860, 10b41c0, fffffc7fea400020)
    libld.so.4`ld64_process_ifl+0x751(fffffc7fea441830, 0, ffffffff, 10b41c0, a, fffffc7fea400020)
    libld.so.4`ar_input+0x84(ffffffff, fffffc7fea434cb0, fffffc7fea400020, 10b41c0, fffffc7fea441830, fffffc7fffdf1440)
    libld.so.4`ar_extract_bysym+0x187(fffffc7fea434c90, ffffffff, fffffc7fea434cb0, fffffc7fea400020, fffffc7fffdf143c, fffffc7fffdf1440)
    libld.so.4`ld64_process_archive+0x84(fffffc7fea434c90, ffffffff, fffffc7fea434cb0, fffffc7fea400020)
    libld.so.4`ld64_process_ifl+0x5ac(fffffc7fea434c90, fffffc7fea434c92, 4, 10a0770, 2, fffffc7fea400020)
    libld.so.4`ld64_process_open+0xcb(fffffc7fea434c90, fffffc7fea434c92, fffffc7fffdf171c, fffffc7fea400020, 2, fffffc7fffdf1bc0)
    libld.so.4`find_lib_name+0x134(fffffc7fffdf2506, fffffc7fffdf260f, fffffc7fea400020, fffffc7fffdf1bc0, 0)
    libld.so.4`ld64_find_library+0xaa(fffffc7fffdf260f, fffffc7fea400020)
    libld.so.4`parseopt_pass2+0x158(fffffc7fea400020, 40, fffffc7fffdf1f68)
    libld.so.4`process_files_com+0xb6(fffffc7fea400020, 40, fffffc7fffdf1f68)
    libld.so.4`ld64_process_files+0x60(fffffc7fea400020, 40, fffffc7fffdf1f68)
    libld.so.4`ld64_main+0x3e3(40, fffffc7fffdf1f68, 3e)
    main+0xd4()
    _start_crt+0x83()
    _start+0x18()
[1]: https://marc.info/?l=opensolaris-dtrace-discuss&m=114761203110734
[2]: https://github.com/XrXr/illumos-ld-bug-report/releases/tag/v1
Related issues
Updated by Rich Lowe over 1 year ago
I think this is because we're setting the comdat flag on a NULL input section descriptor. I'll give it a proper look.
Updated by Rich Lowe over 1 year ago
I think this is a typo introduced in 2008, whereby in
    if (gd.gd_data[0] & GRP_COMDAT)
        gifl->ifl_isdesc[gndx]->is_flags |= FLG_IS_COMDAT;
that `gndx` should be `ndx`.
Updated by Rich Lowe over 1 year ago
Ignore comment #2, I got myself all twisted around.
So we're seeing this group:
    Group Section:  .group
        index    flags / section         signature symbol
          [0]   [ COMDAT ]               wm4.config.h.2.340b74a62c4a665ee97a6d14594312d6
          [1]   .debug_macro [28]
          [2]   (null) [0]
and choking because of that NULL section within it. Do you have the input objects that produced ruby-glommed.o?
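The condition described here can be checked mechanically. The following is a minimal sketch, not the illumos code: it treats the contents of a group section as an array of 32-bit words, where word 0 holds the flags and the remaining words are member section indices, and reports the first member that is section 0 or out of range. The function name `first_bad_group_member` and its `shnum` parameter are hypothetical; `GRP_COMDAT` is defined locally with its ELF value so the sketch needs no system header.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* ELF group flag, defined locally so the sketch needs no <elf.h>. */
#define GRP_COMDAT 0x1u

/*
 * Hypothetical helper, not part of libld.
 *
 * `words` is the content of an SHT_GROUP section: words[0] is the flags
 * word and words[1..nwords-1] are member section indices. `shnum` is the
 * number of section headers in the object.
 *
 * Returns the position (>= 1) of the first invalid member, or 0 if every
 * member is valid.
 */
static size_t
first_bad_group_member(const uint32_t *words, size_t nwords, uint32_t shnum)
{
    assert(words != NULL && nwords >= 1);

    for (size_t i = 1; i < nwords; i++) {
        /* Section 0 (SHN_UNDEF) is never a legitimate group member. */
        if (words[i] == 0 || words[i] >= shnum)
            return (i);
    }
    return (0);
}
```

For the group shown above, the words `{ GRP_COMDAT, 28, 0 }` yield position 2, which corresponds to the `(null) [0]` entry at index `[2]`.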
Updated by Alan Wu over 1 year ago
> Do you have the input objects that produced ruby-glommed.o?
I ran `gmake ruby-glommed.o V=1` and made a tarball out of the
result and its parts here: https://github.com/XrXr/illumos-ld-bug-report/releases/download/v1/glommed-with-parts.tgz
Command used to generate ruby-glommed.o in glommed-with-parts.tgz:
$ /usr/ccs/bin/ld -r -o ruby-glommed.o dln.o localeinit.o loadpath.o array.o ast.o bignum.o class.o compar.o compile.o complex.o cont.o debug.o debug_counter.o dir.o dln_find.o encoding.o enum.o enumerator.o error.o eval.o file.o gc.o hash.o inits.o io.o io_buffer.o iseq.o load.o marshal.o math.o memory_view.o mjit.o mjit_compile.o node.o numeric.o object.o pack.o parse.o proc.o process.o ractor.o random.o range.o rational.o re.o regcomp.o regenc.o regerror.o regexec.o regparse.o regsyntax.o ruby.o scheduler.o signal.o sprintf.o st.o strftime.o string.o struct.o symbol.o thread.o time.o transcode.o transient_heap.o util.o variable.o version.o vm.o vm_backtrace.o vm_dump.o vm_sync.o vm_trace.o coroutine/ucontext/Context.o probes.o enc/ascii.o enc/us_ascii.o enc/unicode.o enc/utf_8.o enc/trans/newline.o setproctitle.o addr2line.o builtin.o
Updated by Rich Lowe over 1 year ago
- Category set to tools - gate/build tools
- Status changed from New to In Progress
- Assignee set to Rich Lowe
I'm pretty sure I've got to the bottom of this -- we're shooting ourselves in the foot. I just have to decide on which not-foot place we need to shoot instead.
Is using plain '-g' a good enough workaround for you for right now? Or should I hurry?
Updated by Rich Lowe over 1 year ago
- Subject changed from ld crashes with gcc -ggdb3 inputs to ld crashes processing groups with already discarded sections
Updated by Rich Lowe over 1 year ago
- Subject changed from ld crashes processing groups with already discarded sections to ld should keep group members in separate output sections
It turns out that this is happening because of 3 bugs.
The original, important, bug is that we merge sections which are members of groups into the same output section.
You can see this in your object:
    Group Section:  .group
        index    flags / section         signature symbol
          [0]   [ COMDAT ]               wm4.transcode_data.h.18.21742e1335e8c8e0e79ecd5a137d9f33
          [1]   .debug_macro [28]
          [2]   (null) [0]
    Group Section:  .group
        index    flags / section         signature symbol
          [0]   [ COMDAT ]               wm4.elf.h.43.8672e8b9cd963266f2c492b110844465
          [1]   .debug_macro [28]
          [2]   (null) [0]
Note that .debug_macro [28] is in both groups here; ignore the null for now.
This has several possible repercussions, which I'll take from the README for the tests I'm writing for the fix:
If we place an input section which is in a group in the same output section as another input section this leaves us with problems:
1. If the other input section was not previously a member of a group, its data becomes part of a group and we may now discard it along with that group.
2. If the other input section _was_ a member of a group we now have two groups containing the same section, where discarding one will corrupt the other.
3. ... and if that section had associated relocations, which must have been part of the group, we will now associate those relocations with the merged output section, further corrupting the group, as there is now no mapping between input and output relocation sections.
Number 3 there is this bug.
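To make case 2 of the README excerpt concrete, here is a toy model, assuming a linker that places input sections purely by name. It is not how libld is structured; `struct toy_sec`, `place_by_name` and `groups_share_output` are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_OUT 16

/* Toy input section: a name and the group it belongs to (0 = no group). */
struct toy_sec {
    const char *name;
    int group;
};

/*
 * Naive placement: every input section goes to the output section with
 * the same name, ignoring group membership. out_idx[i] receives the
 * output section index chosen for secs[i]. Returns the number of output
 * sections created.
 */
static size_t
place_by_name(const struct toy_sec *secs, size_t n, size_t *out_idx)
{
    const char *out_names[MAX_OUT];
    size_t nout = 0;

    for (size_t i = 0; i < n; i++) {
        size_t j;

        for (j = 0; j < nout; j++) {
            if (strcmp(out_names[j], secs[i].name) == 0)
                break;
        }
        if (j == nout) {
            /* No output section with this name yet: create one. */
            assert(nout < MAX_OUT);
            out_names[nout++] = secs[i].name;
        }
        out_idx[i] = j;
    }
    return (nout);
}

/* Returns 1 if members of two different groups share an output section. */
static int
groups_share_output(const struct toy_sec *secs, const size_t *out_idx,
    size_t n)
{
    for (size_t i = 0; i < n; i++) {
        for (size_t j = i + 1; j < n; j++) {
            if (secs[i].group != 0 && secs[j].group != 0 &&
                secs[i].group != secs[j].group &&
                out_idx[i] == out_idx[j])
                return (1);
        }
    }
    return (0);
}
```

Feeding this two `.debug_macro` sections from groups 1 and 2 puts both in one output section, so `groups_share_output` returns 1: discarding either group would take the other group's data with it.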
The 2nd bug follows from that: when we come to emit the group section we find a section with no output section and -- for reasons lost to history -- we just emit a 0 section as a member of the group. Section 0 is not a normal section, and was perhaps intended as a sentinel, but is definitely not a valid thing to have in a group and may lead to problems with other tools.
The 3rd bug is that, when processing the group -- despite intending section 0 as a sentinel, probably -- we don't handle the possibility that it has been used that way, and rather than skipping it we attempt to process it as with any other section, crashing as in the original bug report.
The fix for this is:
1. Don't merge sections which are members of groups (the SHF_GROUP flag is set)
2. If we're emitting a group and one of its sections is missing, raise an error
3. Treat section 0 as invalid for group processing (both in ld and, for parity, in elfdump)
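The first two rules above can be sketched as follows. This is an illustration under stated assumptions, not the code from the actual commit: `struct in_sec`, `may_merge` and `emit_group` are hypothetical names, and `SHF_GROUP` is defined locally with its ELF value. Rule 3 amounts to rejecting a member index of 0 when a group is read back in.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* ELF section flag marking a group member, defined locally. */
#define SHF_GROUP 0x200u

/* Toy input section; the field names are hypothetical, not libld's. */
struct in_sec {
    uint64_t flags;     /* sh_flags of the input section */
    uint32_t out_shndx; /* output section index, 0 if none was assigned */
};

/* Rule 1: a section carrying SHF_GROUP is never merged with another. */
static int
may_merge(const struct in_sec *a, const struct in_sec *b)
{
    return (((a->flags | b->flags) & SHF_GROUP) == 0);
}

/*
 * Rule 2: write the flags word followed by each member's output section
 * index into `out`, which must have room for n + 1 words. Returns 0 on
 * success, or -1 if a member has no output section, rather than silently
 * writing a 0 entry into the group.
 */
static int
emit_group(uint32_t grp_flags, const struct in_sec *const *members,
    size_t n, uint32_t *out)
{
    assert(out != NULL);

    out[0] = grp_flags;
    for (size_t i = 0; i < n; i++) {
        if (members[i] == NULL || members[i]->out_shndx == 0)
            return (-1);
        out[i + 1] = members[i]->out_shndx;
    }
    return (0);
}
```

Returning an error from `emit_group` keeps a malformed group from reaching the output file, where it would otherwise surface later as a crash or as confusion in other tools.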
Updated by Rich Lowe over 1 year ago
An interesting symptom of this bug, which I guess I should preserve for the record: if you have an object file which exhibits this bug (ruby-glommed.o, for example), and the GNU binutils are used to put it into an archive or to rebuild the archive symbol table, that symbol table will become corrupt or empty, and your link may fail deeply mysteriously with unresolved symbols that are, nevertheless, present in the archive. This is made worse because nm(1) reads the object symtabs, not the archive table, and will show you the symbols that ld will then tell you don't exist.
Updated by Electric Monk over 1 year ago
- Status changed from In Progress to Closed
- % Done changed from 0 to 100
git commit a196c3ffe35a9a1589267b660ce6c2ff2ccab5e9
    commit a196c3ffe35a9a1589267b660ce6c2ff2ccab5e9
    Author: Richard Lowe <richlowe@richlowe.net>
    Date: 2022-06-16T20:19:10.000Z
        14722 ld should keep group members in separate output sections
        Reviewed by: Gordon Ross <Gordon.W.Ross@gmail.com>
        Reviewed by: Toomas Soome <tsoome@me.com>
        Reviewed by: Igor Kozhukhov <igor@dilos.org>
        Approved by: Dan McDonald <danmcd@mnx.io>
Updated by Rich Lowe 12 months ago
- Related to Bug #15260: ld crashes if group members are stripped by -s added