Bug #308

ld may misalign sections only preceded by empty sections

Added by Jason King about 9 years ago. Updated over 8 years ago.

Status: Resolved
Priority: High
Assignee:
Category: cmd - userland programs
Start date: 2010-10-06
Due date:
% Done: 0%
Estimated time:
Difficulty:
Tags:

Description

Copy of OpenSolaris bug 6988300

When objects containing COMMON (bss) data are linked into
an executable, and a separate BSS segment is established
via a mapfile, the symbols that reference that segment end
up with addresses outside the section boundaries.

% sh -x build.sh 
+ cc -M /usr/lib/ld/map.pagealign -M /usr/lib/ld/map.bssalign -M /usr/lib/ld/map.noexbss -B direct -o bad.xterm button.o cachedGCs.o charproc.o charsets.o cursor.o data.o doublechr.o fontutils.o input.o linedata.o main.o menu.o misc.o print.o ptydata.o scrollback.o screen.o scrollbar.o tabs.o util.o xstrings.o xtermcap.o VTPrsTbl.o TekPrsTbl.o Tekproc.o charclass.o precompose.o wcwidth.o -R/usr/lib -lXft -lfontconfig -R/usr/lib -lXaw7 -lXmu -R/usr/lib -lXt -lX11 -lICE -ltermcap

% elfdump bad.xterm > /dev/null
bad.xterm: .dynsym: index[224]: bad symbol entry: smeBSBObjectClass: section[25] size: 0x26c8: symbol (address 0x84026ec, size 0x4) lies outside of containing section
bad.xterm: .dynsym: index[251]: bad symbol entry: sessionShellWidgetClass: section[25] size: 0x26c8: symbol (address 0x8402700, size 0x4) lies outside of containing section
bad.xterm: .dynsym: index[560]: bad symbol entry: scrollbarWidgetClass: section[25] size: 0x26c8: symbol (address 0x84026f4, size 0x4) lies outside of containing section
bad.xterm: .dynsym: index[579]: bad symbol entry: simpleMenuWidgetClass: section[25] size: 0x26c8: symbol (address 0x84026e8, size 0x4) lies outside of containing section
bad.xterm: .dynsym: index[605]: bad symbol entry: max_plus1: section[25] size: 0x26c8: symbol (address 0x84026f8, size 0x4) lies outside of containing section
bad.xterm: .dynsym: index[676]: bad symbol entry: my_wcwidth: section[25] size: 0x26c8: symbol (address 0x84026f0, size 0x4) lies outside of containing section
bad.xterm: .dynsym: index[821]: bad symbol entry: resource: section[25] size: 0x26c8: symbol (address 0x84026a8, size 0x40) lies outside of containing section
bad.xterm: .dynsym: index[855]: bad symbol entry: first_widechar: section[25] size: 0x26c8: symbol (address 0x84026fc, size 0x4) lies outside of containing section
bad.xterm: .dynsym: index[874]: bad symbol entry: tekshellwidget: section[25] size: 0x26c8: symbol (address 0x8402704, size 0x4) lies outside of containing section
bad.xterm: .symtab: index[865]: bad symbol entry: smeBSBObjectClass: section[25] size: 0x26c8: symbol (address 0x84026ec, size 0x4) lies outside of containing section
bad.xterm: .symtab: index[892]: bad symbol entry: sessionShellWidgetClass: section[25] size: 0x26c8: symbol (address 0x8402700, size 0x4) lies outside of containing section
bad.xterm: .symtab: index[1201]: bad symbol entry: scrollbarWidgetClass: section[25] size: 0x26c8: symbol (address 0x84026f4, size 0x4) lies outside of containing section
bad.xterm: .symtab: index[1220]: bad symbol entry: simpleMenuWidgetClass: section[25] size: 0x26c8: symbol (address 0x84026e8, size 0x4) lies outside of containing section
bad.xterm: .symtab: index[1246]: bad symbol entry: max_plus1: section[25] size: 0x26c8: symbol (address 0x84026f8, size 0x4) lies outside of containing section
bad.xterm: .symtab: index[1317]: bad symbol entry: my_wcwidth: section[25] size: 0x26c8: symbol (address 0x84026f0, size 0x4) lies outside of containing section
bad.xterm: .symtab: index[1462]: bad symbol entry: resource: section[25] size: 0x26c8: symbol (address 0x84026a8, size 0x40) lies outside of containing section
bad.xterm: .symtab: index[1496]: bad symbol entry: first_widechar: section[25] size: 0x26c8: symbol (address 0x84026fc, size 0x4) lies outside of containing section
bad.xterm: .symtab: index[1515]: bad symbol entry: tekshellwidget: section[25] size: 0x26c8: symbol (address 0x8402704, size 0x4) lies outside of containing section
bad.xterm: .dynsym: index[821]: bad symbol entry: resource: section[25] size: 0x26c8: symbol (address 0x84026a8, size 0x40) lies outside of containing section
bad.xterm: .dynsym: index[579]: bad symbol entry: simpleMenuWidgetClass: section[25] size: 0x26c8: symbol (address 0x84026e8, size 0x4) lies outside of containing section
bad.xterm: .dynsym: index[224]: bad symbol entry: smeBSBObjectClass: section[25] size: 0x26c8: symbol (address 0x84026ec, size 0x4) lies outside of containing section
bad.xterm: .dynsym: index[676]: bad symbol entry: my_wcwidth: section[25] size: 0x26c8: symbol (address 0x84026f0, size 0x4) lies outside of containing section
bad.xterm: .dynsym: index[560]: bad symbol entry: scrollbarWidgetClass: section[25] size: 0x26c8: symbol (address 0x84026f4, size 0x4) lies outside of containing section
bad.xterm: .dynsym: index[605]: bad symbol entry: max_plus1: section[25] size: 0x26c8: symbol (address 0x84026f8, size 0x4) lies outside of containing section
bad.xterm: .dynsym: index[855]: bad symbol entry: first_widechar: section[25] size: 0x26c8: symbol (address 0x84026fc, size 0x4) lies outside of containing section
bad.xterm: .dynsym: index[251]: bad symbol entry: sessionShellWidgetClass: section[25] size: 0x26c8: symbol (address 0x8402700, size 0x4) lies outside of containing section
bad.xterm: .dynsym: index[874]: bad symbol entry: tekshellwidget: section[25] size: 0x26c8: symbol (address 0x8402704, size 0x4) lies outside of containing section
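
The complaint above can be reproduced by hand. The following is a minimal sketch, assuming elfdump's check amounts to a plain range test of the symbol against its section; the function name is hypothetical. The section size and symbol values are copied from the output above, and sh_addr is taken from the Section Header[25] dump quoted later in this report.

```python
# Values from this report. SH_ADDR comes from the .bss section header
# (Section Header[25]) quoted further down; SH_SIZE from the elfdump errors.
SH_ADDR = 0x8400004
SH_SIZE = 0x26c8

# (name, address, size) for three of the symbols elfdump complained about.
SYMBOLS = [
    ("resource", 0x84026a8, 0x40),
    ("simpleMenuWidgetClass", 0x84026e8, 0x4),
    ("tekshellwidget", 0x8402704, 0x4),
]


def outside_section(value, size, sh_addr, sh_size):
    """True if [value, value+size) is not fully inside the section.

    Assumption: this simple range test is what "lies outside of
    containing section" means; it is not copied from elfdump's source.
    """
    return value < sh_addr or value + size > sh_addr + sh_size


section_end = SH_ADDR + SH_SIZE  # 0x84026cc
for name, value, size in SYMBOLS:
    flag = outside_section(value, size, SH_ADDR, SH_SIZE)
    print(f"{name}: ends at {value + size:#x}, "
          f"section ends at {section_end:#x}, outside={flag}")
```

Note that "resource" starts inside the section but ends past it, while the other symbols start beyond the section end altogether.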

Many of these symbols have copy relocations, which 'ldd -r' reveals are out of
bounds, and ignored by the runtime linker:

% ldd -r bad.xterm 
        libXft.so.2 =>   /usr/lib/libXft.so.2
        libfontconfig.so.1 =>    /usr/lib/libfontconfig.so.1
        libXaw7.so.7 =>  /usr/lib/libXaw7.so.7
        libXmu.so.4 =>   /usr/lib/libXmu.so.4
        libXt.so.4 =>    /usr/lib/libXt.so.4
        libX11.so.4 =>   /usr/lib/libX11.so.4
        libICE.so.6 =>   /usr/lib/libICE.so.6
        libcurses.so.1 =>        /usr/lib/libcurses.so.1
        libc.so.1 =>     /usr/lib/libc.so.1
        libfreetype.so.6 =>      /usr/lib/libfreetype.so.6
        libXrender.so.1 =>       /usr/lib/libXrender.so.1
        libexpat.so.1 =>         /usr/lib/libexpat.so.1
        libXext.so.0 =>  /usr/lib/libXext.so.0
        libSM.so.6 =>    /usr/lib/libSM.so.6
        libXpm.so.4 =>   /usr/lib/libXpm.so.4
        libXmuu.so.1 =>  /usr/lib/libXmuu.so.1
        libm.so.2 =>     /usr/lib/libm.so.2
        libnsl.so.1 =>   /usr/lib/libnsl.so.1
        libsocket.so.1 =>        /usr/lib/libsocket.so.1
        libXau.so.6 =>   /usr/lib/libXau.so.6
        libXdmcp.so.6 =>         /usr/lib/libXdmcp.so.6
        libz.so.1 =>     /usr/lib/libz.so.1
        libmp.so.2 =>    /lib/libmp.so.2
        libmd.so.1 =>    /lib/libmd.so.1
        relocation R_386_COPY offset invalid: simpleMenuWidgetClass: offset=0x84026e8 lies outside memory image; relocation discarded
        relocation R_386_COPY offset invalid: smeBSBObjectClass: offset=0x84026ec lies outside memory image; relocation discarded
        relocation R_386_COPY offset invalid: scrollbarWidgetClass: offset=0x84026f4 lies outside memory image; relocation discarded
        relocation R_386_COPY offset invalid: sessionShellWidgetClass: offset=0x8402700 lies outside memory image; relocation discarded
        libXevie.so.1 =>         /usr/lib/libXevie.so.1
        libXss.so.1 =>   /usr/lib/libXss.so.1

Since the copy relocations are not carried out, global data in the
program is uninitialized, leading to the runtime failure:

% ./bad.xterm 
./bad.xterm Xt error: XtAppCreateShell requires non-NULL widget class

Related issues

Related to illumos gate - Bug #1301: ld crashes with '-z ignore' due to a null data descriptor (Resolved, 2011-07-29)

History

#1

Updated by Jason King about 9 years ago

A workaround is to not use the /usr/lib/ld/map.bssalign and /usr/lib/ld/map.noexbss mapfiles.

#2

Updated by Jason King about 9 years ago

Doing a little digging, one of the issues is that the .bss section's sh_size value is too small. Using elfedit shdr:sh_size .bss 0x2504 (for xterm) resolves the elfdump issues; however, the R_386_COPY errors remain.

Or rather, the issue seems to be confined to the .bss segment itself.

#3

Updated by Jason King about 9 years ago

  • Assignee set to Jason King

#4

Updated by Jason King about 9 years ago

The program header for the bss segment is also too small; adjusting that resolves the R_386_COPY issues. So it appears the linker is somehow ignoring the last 3-4 objects when calculating the segment sizes.

#5

Updated by Rich Lowe over 8 years ago

  • Category set to cmd - userland programs
  • Status changed from New to In Progress
  • Assignee changed from Jason King to Rich Lowe

Taking this, on the basis that I have a likely fix etc. Jason, I'm assuming you're ok with this?

#6

Updated by Rich Lowe over 8 years ago

  • Subject changed from "separate bss segment causes symbol/section mismatch" to "ld may misalign sections only preceded by empty sections"

If a mapfile which places .bss in a separate segment (such as /usr/lib/ld/map.bssalign or /usr/lib/ld/map.noexbss) is used, and sections for thread-local storage are created [1], the linker may misalign .bss, possibly causing symbols to be placed outside its bounds.

This has mostly been seen with builds of Xorg from the xnv source tree, which use map.bssalign.

Symptoms

    % elfdump bad.xterm > /dev/null
    bad.xterm: .dynsym: index[226]: bad symbol entry: smeBSBObjectClass: section[25] size: 0x26c8: symbol (address 0x84026ec, size 0x4) lies outside of containing section
    bad.xterm: .dynsym: index[252]: bad symbol entry: sessionShellWidgetClass: section[25] size: 0x26c8: symbol (address 0x8402700, size 0x4) lies outside of containing section
    ...

And xterm fails to run:

    % ./bad.xterm 
    ./bad.xterm Xt error: XtAppCreateShell requires non-NULL widget class

Anecdotally, there are binaries which suffer from this problem yet don't end up with symbols hanging outside their section. One of my test builds showed vncconfig in this state.

    % elfdump root_i386/usr/bin/i86/vncconfig > /dev/null  
    % elfdump root_i386/usr/bin/i86/vncconfig | ggrep -A5 'sh_name: \.bss$'
    Section Header[31]:  sh_name: .bss
        sh_addr:      0x8400024       sh_flags:   [ SHF_WRITE SHF_ALLOC ]
        sh_size:      0x474           sh_type:    [ SHT_NOBITS ]
        sh_offset:    0x1c340         sh_entsize: 0
        sh_link:      0               sh_info:    0
        sh_addralign: 0x40

A simple test over an xnv tree finds 101 invalid binaries (with bss symbols straying outside the section), and about 15 more with .bss sections not honouring their sh_addralign.
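
The vncconfig case above shows no elfdump errors, but its .bss still violates sh_addralign. A minimal sketch of that check follows, assuming the elfdump text layout shown above; the helper names are hypothetical and this is not the test that was actually run over the xnv tree.

```python
import re

# The vncconfig .bss header exactly as quoted above.
ELFDUMP_TEXT = """\
Section Header[31]:  sh_name: .bss
    sh_addr:      0x8400024       sh_flags:   [ SHF_WRITE SHF_ALLOC ]
    sh_size:      0x474           sh_type:    [ SHT_NOBITS ]
    sh_offset:    0x1c340         sh_entsize: 0
    sh_link:      0               sh_info:    0
    sh_addralign: 0x40
"""


def field(text, name):
    """Return the integer value of a 'name: value' field in the text."""
    match = re.search(rf"\b{name}:\s+(\S+)", text)
    if match is None:
        raise ValueError(f"{name} not found")
    return int(match.group(1), 0)  # base 0 accepts both "0x40" and "0"


def is_misaligned(sh_addr, sh_addralign):
    """True if the address is not a multiple of the required alignment."""
    return sh_addralign > 1 and sh_addr % sh_addralign != 0


addr = field(ELFDUMP_TEXT, "sh_addr")
align = field(ELFDUMP_TEXT, "sh_addralign")
print(f"sh_addr={addr:#x} sh_addralign={align:#x} "
      f"remainder={addr % align:#x} misaligned={is_misaligned(addr, align)}")
```

Here 0x8400024 leaves a remainder of 0x24 against the required 0x40, so the section is misaligned even though no symbol has yet spilled past its end.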

Why it's misaligned

map.bssalign and map.noexbss create a new loadable segment in which to place the bss and, in the case of the former, heavily align it. Into this new segment are placed .tbss, .tdata [2], and .bss.

The alignment of the first section's sh_offset is controlled solely by the alignment of that initial section -- in our case this will be .tdata -- an empty section requiring no alignment:

  Section Header[23]:  sh_name: .tdata
     sh_addr:      0x8400000       sh_flags:   [ SHF_WRITE SHF_ALLOC SHF_TLS ]
     sh_size:      0               sh_type:    [ SHT_PROGBITS ]
     sh_offset:    0x5fc7c         sh_entsize: 0
     sh_link:      0               sh_info:    0
     sh_addralign: 0x1

followed by .tbss, also empty and similarly lacking in required alignment:

  Section Header[24]:  sh_name: .tbss
      sh_addr:      0x8400000       sh_flags:   [ SHF_WRITE SHF_ALLOC SHF_TLS ]
      sh_size:      0               sh_type:    [ SHT_NOBITS ]
      sh_offset:    0x5fc7c         sh_entsize: 0
      sh_link:      0               sh_info:    0
      sh_addralign: 0x1

The .bss, though, does have alignment requirements (0x40), so its offset is moved to accommodate them, exposing our problem: the linker assumes that address and offset have congruent alignment, such that moving a section through one (offset) to achieve alignment also guarantees it in the other (address). See the loop at source:usr/src/cmd/sgs/libld/common/update.c lines 4056,4081.

In this instance, aligning the offset of .bss moves it 4 bytes forward to 0x5fc80, which moves the address the same 4 bytes forward to 0x8400004 and misaligns it:

  Section Header[25]:  sh_name: .bss
     sh_addr:      0x8400004       sh_flags:   [ SHF_WRITE SHF_ALLOC ]
     sh_size:      0x26c8          sh_type:    [ SHT_NOBITS ]
     sh_offset:    0x5fc80         sh_entsize: 0
     sh_link:      0               sh_info:    0
     sh_addralign: 0x40      
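
The arithmetic behind these numbers can be worked through as follows. This is only an illustration of the described behaviour using the values from the section headers above; align_up is a hypothetical helper, not a function from libld.

```python
def align_up(value, alignment):
    """Round value up to the next multiple of alignment (a power of two)."""
    return (value + alignment - 1) & ~(alignment - 1)


# Values from the .tbss and .bss section headers above.
tbss_offset = 0x5fc7c
tbss_addr = 0x8400000
bss_align = 0x40

# The offset is aligned first, and the address is moved by the same amount.
bss_offset = align_up(tbss_offset, bss_align)   # 0x5fc80
delta = bss_offset - tbss_offset                # 4
bss_addr = tbss_addr + delta                    # 0x8400004

print(f"offset {bss_offset:#x} (mod 0x40 = {bss_offset % bss_align:#x})")
print(f"addr   {bss_addr:#x} (mod 0x40 = {bss_addr % bss_align:#x})")

# The assumption only holds if offset and address were congruent to begin
# with; here they were not (0x3c versus 0x0 modulo 0x40).
print(f"before: offset mod = {tbss_offset % bss_align:#x}, "
      f"addr mod = {tbss_addr % bss_align:#x}")
```

The offset ends up aligned while the address ends up 4 bytes past a 0x40 boundary, because the empty leading sections never forced the two to be congruent.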

Why that causes us to overflow the section

The size of the bss is calculated from the size and alignment of the symbols in the input section, assuming that each is correctly aligned beginning at the start of the section, which in our case it is not (see source:usr/src/cmd/sgs/libld/common/syms.c lines 1517,1536).

Symbols are placed in the output section at their actual correct alignment, with no assumption as to the initial section alignment. This pushes their location within the section forward to the next aligned address (see source:usr/src/cmd/sgs/libelf/common/update.c lines 234,252), eventually causing us to spill over the section end.
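
The mismatch between the two calculations can be shown with a small model. This is a sketch under stated assumptions: the symbol list is invented for illustration (it is not xterm's), and section_size and place are hypothetical stand-ins for the sizing and placement steps described above, not the code in syms.c or update.c.

```python
def align_up(value, alignment):
    """Round value up to the next multiple of alignment (a power of two)."""
    return (value + alignment - 1) & ~(alignment - 1)


# Hypothetical COMMON symbols as (size, alignment); not taken from xterm.
SYMBOLS = [(0x10, 0x40), (0x4, 0x4), (0x20, 0x40)]


def section_size(symbols):
    """Size needed if the section start satisfies every alignment."""
    offset = 0
    for size, alignment in symbols:
        offset = align_up(offset, alignment) + size
    return offset


def place(symbols, sh_addr):
    """Place each symbol at the next address meeting its own alignment."""
    addr = sh_addr
    placed = []
    for size, alignment in symbols:
        addr = align_up(addr, alignment)
        placed.append((addr, size))
        addr += size
    return placed


size = section_size(SYMBOLS)  # 0x60
for sh_addr in (0x8400000, 0x8400004):
    last_addr, last_size = place(SYMBOLS, sh_addr)[-1]
    overflow = (last_addr + last_size) - (sh_addr + size)
    print(f"sh_addr={sh_addr:#x}: section ends {sh_addr + size:#x}, "
          f"last symbol ends {last_addr + last_size:#x}, "
          f"overflow={overflow:#x}")
```

With an aligned start the last symbol ends exactly at the section end. With the start 4 bytes off, the first 0x40-aligned symbol is pushed forward by 0x3c bytes and everything after it is pushed with it, so the tail of the symbol list lands past the section.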

Footnotes

[1] This happens much more frequently since the integration of PSARC 2010/299 "GNU/Linux/BSD compatibility functions" and 6960818 "add get_nprocs(), getline(), strdupa(), strndup() to libc" in commit:48f2dbca79a2, which added magical incantations for strdupa() and strndupa() to string.h that make use of TLS.

[2] .tdata ends up in the bss segment to retain its spatial relationship with .tbss; see 6910387 ".tdata and .tbss separation invalidates TLS program header information".

#7

Updated by Jason King over 8 years ago

By all means, feel free to take the lead on this... no objections here.

#8

Updated by Rich Lowe over 8 years ago

The easiest way to deal with this is to alter the alignment requirement of empty sections at the beginning of a segment to match the requirement of the first non-empty section in that segment (if one exists). This keeps .tdata, .tbss, and .bss at the same virtual address and file offset, and allows the assumptions elsewhere in the link editor to remain valid.
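
The approach can be sketched as follows. This is a simplified model, not the integrated change: the function names and dictionary layout are hypothetical, the layout loop assumes the first section's address is fixed by the segment while later addresses move with the offset, and it ignores the fact that SHT_NOBITS sections occupy no file space. The section sizes, offsets and addresses are the xterm values quoted earlier in this report.

```python
def align_up(value, alignment):
    """Round value up to the next multiple of alignment (a power of two)."""
    return (value + alignment - 1) & ~(alignment - 1)


def inherit_alignment(sections):
    """Give leading empty sections the alignment of the first non-empty one."""
    first = next((s for s in sections if s["size"] > 0), None)
    if first is None:
        return list(sections)  # nothing to inherit from
    fixed, leading = [], True
    for sec in sections:
        if sec["size"] > 0:
            leading = False
        if leading:
            # max() never lowers an existing requirement; that detail is a
            # choice of this sketch rather than something stated above.
            sec = {**sec, "align": max(sec["align"], first["align"])}
        fixed.append(sec)
    return fixed


def layout(sections, start_offset, seg_addr):
    """Assign (offset, address) pairs; the address follows the offset."""
    offset, addr = start_offset, seg_addr
    result = {}
    for index, sec in enumerate(sections):
        aligned = align_up(offset, sec["align"])
        if index > 0:
            addr += aligned - offset
        offset = aligned
        result[sec["name"]] = (offset, addr)
        offset += sec["size"]
        addr += sec["size"]
    return result


SEGMENT = [
    {"name": ".tdata", "size": 0, "align": 0x1},
    {"name": ".tbss", "size": 0, "align": 0x1},
    {"name": ".bss", "size": 0x26c8, "align": 0x40},
]

before = layout(SEGMENT, 0x5fc7c, 0x8400000)
after = layout(inherit_alignment(SEGMENT), 0x5fc7c, 0x8400000)
for name in (".tdata", ".tbss", ".bss"):
    print(name, [hex(v) for v in before[name]], [hex(v) for v in after[name]])
```

In this model the unfixed layout reproduces the misaligned .bss at 0x8400004, while the fixed layout puts all three sections at offset 0x5fc80 and address 0x8400000.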

A fixed xterm binary suggests that this is also the tack taken by Oracle's linker.

#9

Updated by Rich Lowe over 8 years ago

  • Status changed from In Progress to Resolved

Integrated in r13296 commit:2d47a00dfb9b
