Bug #3436

relocatable objects also need sloppy relocation

Added by Alasdair Lumsden over 7 years ago. Updated over 7 years ago.

Status: Resolved
Priority: Normal
Assignee:
Category: cmd - userland programs
Start date: 2012-12-31
Due date:
% Done: 100%
Estimated time:
Difficulty: Medium
Tags:
Gerrit CR:

Description

Hi,

We've been using gcc 4.4 with ec-userland over the past 2 years with no issue. Over Christmas I started work to update it to gcc 4.7.2. I got it all built, but unfortunately some software just wouldn't link with the code generated by the updated gcc, such as pcre:

CXXLD  libpcrecpp.la
ld: fatal: relocation error: R_386_GOTOFF: file .libs/libpcrecpp_la-pcre_scanner.o: section [170].rel.eh_frame: symbol .LFB930: symbol has been discarded with discarded section: [68].text._ZN7pcrecpp11StringPieceC2ERKSs

I get the same issue with gcc 4.6.2. I haven't tried gcc 4.5, but I have rebuilt gcc 4.4 and there are no issues with that. I've tried two different GNU as/binutils versions (2.20 and 2.23) and that made no difference, so it seems to be gcc-specific.

It seems to be related to c++ code, and indeed gcc can't even build itself with the updated version:

/bin/sh ../libtool --tag CXX   --mode=link /ws/ec-userland/components/gcc46/build/i86/./gcc/xgcc -shared-libgcc -B/ws/ec-userland/components/gcc46/build/i86/./gcc -nostdinc++ -L/ws/ec-userland/components/gcc46/build/i86/i386-pc-solaris2.11/libstdc++-v3/src -L/ws/ec-userland/components/gcc46/build/i86/i386-pc-solaris2.11/libstdc++-v3/src/.libs -B/usr/i386-pc-solaris2.11/bin/ -B/usr/i386-pc-solaris2.11/lib/ -isystem /usr/i386-pc-solaris2.11/include -isystem /usr/i386-pc-solaris2.11/sys-include      -fno-implicit-templates -Wall -Wextra -Wwrite-strings -Wcast-qual  -fdiagnostics-show-location=once  -ffunction-sections -fdata-sections   -o libstdc++.la -version-info 6:16:0 -Wl,-M,libstdc++-symbols.ver-sun -lm -rpath /usr/lib atomic.lo bitmap_allocator.lo pool_allocator.lo mt_allocator.lo codecvt.lo compatibility.lo compatibility-c++0x.lo compatibility-debug_list.lo compatibility-debug_list-2.lo compatibility-list.lo compatibility-list-2.lo complex_io.lo ctype.lo debug.lo functexcept.lo functional.lo globals_io.lo hash_c++0x.lo hash_tr1.lo hashtable_c++0x.lo hashtable_tr1.lo ios.lo ios_failure.lo ios_init.lo ios_locale.lo limits.lo list.lo locale.lo locale_init.lo locale_facets.lo localename.lo math_stubs_float.lo math_stubs_long_double.lo stdexcept.lo strstream.lo system_error.lo tree.lo istream.lo placeholders.lo regex.lo shared_ptr.lo streambuf.lo mutex.lo condition_variable.lo chrono.lo thread.lo future.lo valarray.lo atomicity.lo codecvt_members.lo collate_members.lo ctype_members.lo messages_members.lo monetary_members.lo numeric_members.lo time_members.lo basic_file.lo c++locale.lo allocator-inst.lo concept-inst.lo ext-inst.lo fstream-inst.lo ios-inst.lo iostream-inst.lo istream-inst.lo locale-inst.lo misc-inst.lo ostream-inst.lo sstream-inst.lo streambuf-inst.lo string-inst.lo wlocale-inst.lo wstring-inst.lo  parallel_settings.lo compatibility-parallel_list.lo compatibility-parallel_list-2.lo  ../libsupc++/libsupc++convenience.la 
libtool: link: /ws/ec-userland/components/gcc46/build/i86/./gcc/collect-ld -r -o .libs/libstdc++.la-1.o .libs/atomic.o .libs/bitmap_allocator.o .libs/pool_allocator.o .libs/mt_allocator.o .libs/codecvt.o .libs/compatibility.o .libs/compatibility-c++0x.o .libs/compatibility-debug_list.o .libs/compatibility-debug_list-2.o .libs/compatibility-list.o .libs/compatibility-list-2.o .libs/complex_io.o .libs/ctype.o .libs/debug.o .libs/functexcept.o .libs/functional.o .libs/globals_io.o .libs/hash_c++0x.o .libs/hash_tr1.o
ld: fatal: relocation error: R_386_32: file .libs/pool_allocator.o: section [113].rel.debug_frame: symbol .text._ZNK9__gnu_cxx24__concurrence_lock_error4whatEv (section): symbol has been discarded with discarded section: [40].text._ZNK9__gnu_cxx24__concurrence_lock_error4whatEv
ld: fatal: relocation error: R_386_32: file .libs/pool_allocator.o: section [113].rel.debug_frame: symbol .text._ZNK9__gnu_cxx26__concurrence_unlock_error4whatEv (section): symbol has been discarded with discarded section: [42].text._ZNK9__gnu_cxx26__concurrence_unlock_error4whatEv
ld: fatal: relocation error: R_386_32: file .libs/pool_allocator.o: section [113].rel.debug_frame: symbol .text._ZN9__gnu_cxx24__concurrence_lock_errorD2Ev (section): symbol has been discarded with discarded section: [44].text._ZN9__gnu_cxx24__concurrence_lock_errorD2Ev
ld: fatal: relocation error: R_386_32: file .libs/pool_allocator.o: section [113].rel.debug_frame: symbol .text._ZN9__gnu_cxx26__concurrence_unlock_errorD2Ev (section): symbol has been discarded with discarded section: [46].text._ZN9__gnu_cxx26__concurrence_unlock_errorD2Ev
ld: fatal: relocation error: R_386_32: file .libs/pool_allocator.o: section [113].rel.debug_frame: symbol .text._ZN9__gnu_cxx24__concurrence_lock_errorD0Ev (section): symbol has been discarded with discarded section: [50].text._ZN9__gnu_cxx24__concurrence_lock_errorD0Ev
ld: fatal: relocation error: R_386_32: file .libs/pool_allocator.o: section [113].rel.debug_frame: symbol .text._ZN9__gnu_cxx26__concurrence_unlock_errorD0Ev (section): symbol has been discarded with discarded section: [52].text._ZN9__gnu_cxx26__concurrence_unlock_errorD0Ev
ld: fatal: relocation error: R_386_32: file .libs/pool_allocator.o: section [113].rel.debug_frame: symbol .text._ZN9__gnu_cxx13__scoped_lockD2Ev (section): symbol has been discarded with discarded section: [54].text._ZN9__gnu_cxx13__scoped_lockD2Ev
make[5]: *** [libstdc++.la] Error 1
make[5]: Leaving directory `/ws/ec-userland/components/gcc46/build/i86/i386-pc-solaris2.11/libstdc++-v3/src'
make[4]: *** [all-recursive] Error 1
make[4]: Leaving directory `/ws/ec-userland/components/gcc46/build/i86/i386-pc-solaris2.11/libstdc++-v3'
make[3]: *** [all] Error 2
make[3]: Leaving directory `/ws/ec-userland/components/gcc46/build/i86/i386-pc-solaris2.11/libstdc++-v3'
make[2]: *** [all-target-libstdc++-v3] Error 2
make[2]: Leaving directory `/ws/ec-userland/components/gcc46/build/i86'
make[1]: *** [all] Error 2
make[1]: Leaving directory `/ws/ec-userland/components/gcc46/build/i86'
gmake: *** [/ws/ec-userland/components/gcc46/build/i86/.built] Error 2

The only thing I could really find on the subject that made any sense was this post by Rainer Orth about COMDAT:

http://gcc.gnu.org/ml/gcc-patches/2011-05/msg01384.html

The post seems to be about Sun as, but we're using GNU as so I'm slightly confused, but it did mention this:

"After some discussion with the Sun assembler and linker maintainers, it was decided to work around this in Sun ld instead and relax a few checks there, too, to allow COMDAT with Sun as to work. The necessary support is included in ld 2.2267 (Solaris 11, build 164)."

I've attempted this with the Solaris 11 linker from build 175 (making sure to LD_PRELOAD libld.so.4 and libelf.so.1 from Solaris 11 too) and sure enough it links stuff just fine, and I can successfully build and link pcre and gcc with the newer gcc. GNU ld also works fine.

So it looks like our linker needs a prod.

Of course, I could also be doing something horribly stupid or wrong, so I'd be interested in hearing from anyone using a newer gcc with illumos (such as the OmniOS folk).

Cheers,

Alasdair


Files

pcre.txt (13.2 KB): here is a successful build of pkgsrc/devel/pcre. Richard PALO, 2012-12-31 06:08 PM

Related issues

Related to illumos gate - Bug #3439: discarded sections shouldn't end up on output lists (Resolved, 2013-01-01)

History

#1

Updated by Alasdair Lumsden over 7 years ago

Here are my gcc's:

gcc 4.7.2:

build pcre (sngl): gcc -v
Using built-in specs.
COLLECT_GCC=gcc
Target: i386-pc-solaris2.11
Configured with: /ws/ec-userland/components/gcc47/source/gcc-4.7.2/configure --prefix=/usr --infodir=/usr/share/info --mandir=/usr/share/man --bindir=/usr/bin --libdir=/usr/lib --sbindir=/usr/bin --sysconfdir=/etc --localstatedir=/var --includedir=/usr/include --libexecdir=/usr/lib --build=i386-pc-solaris2.11 --with-mpc=/usr --with-mpfr=/usr --with-gmp=/usr --without-ppl --without-cloog --without-gnu-ld --with-gnu-as --with-as=/usr/bin/as --with-ld=/usr/bin/ld --enable-stage1-languages=c --enable-languages=c,c++,objc,fortran --enable-objc-gc --enable-libssp --disable-libitm --disable-libada
Thread model: posix
gcc version 4.7.2 (GCC) 

gcc 4.6:

build pcre (sngl): /usr/gcc/4.6/bin/gcc -v
Using built-in specs.
COLLECT_GCC=/usr/gcc/4.6/bin/gcc
Target: i386-pc-solaris2.11
Configured with: /ws/ec-userland/components/gcc46/source/gcc-4.6.3/configure --prefix=/usr/gcc/4.6 --infodir=/usr/gcc/4.6/share/info --mandir=/usr/gcc/4.6/share/man --bindir=/usr/gcc/4.6/bin --libdir=/usr/gcc/4.6/lib --sbindir=/usr/gcc/4.6/bin --sysconfdir=/etc --localstatedir=/var --includedir=/usr/include --libexecdir=/usr/gcc/4.6/lib --build=i386-pc-solaris2.11 --with-mpc=/usr --with-mpfr=/usr --with-gmp=/usr --without-ppl --without-cloog --without-gnu-ld --with-gnu-as --with-as=/usr/bin/as --with-ld=/usr/bin/ld --enable-stage1-languages=c --enable-languages=c,c++,objc,fortran --enable-objc-gc --enable-libssp --disable-libitm --disable-libada
Thread model: posix
gcc version 4.6.3 (GCC) 

gcc 4.4:

build pcre (sngl): /usr/gcc/4.4/bin/gcc -v
Using built-in specs.
Target: i386-pc-solaris2.11
Configured with: /ws/ec-userland/components/gcc44-old/source/gcc-4.4.5/configure --prefix=/usr/gcc/4.4 --infodir=/usr/gcc/4.4/share/info --mandir=/usr/gcc/4.4/share/man --bindir=/usr/gcc/4.4/bin --libdir=/usr/gcc/4.4/lib --sbindir=/usr/gcc/4.4/bin --sysconfdir=/etc --localstatedir=/var --includedir=/usr/include --libexecdir=/usr/gcc/4.4/lib --build=i386-pc-solaris2.11 --with-mpc=/usr --with-mpfr=/usr --with-gmp=/usr --without-ppl --without-cloog --without-gnu-ld --with-gnu-as --with-as=/usr/bin/as --with-ld=/usr/bin/ld --enable-stage1-languages=c --enable-languages=c,c++,objc,fortran --enable-objc-gc --enable-libssp --disable-libitm --disable-libada
Thread model: posix
gcc version 4.4.5 (GCC) 

This is on SmartOS, a fairly recent build from December.

Interestingly, igork isn't seeing this issue with his gcc 4.6, which he says is using Sun ld. He is however using some richlowe patches, so I'm going to try rebuilding my gcc 4.6 with those patches and see how I get on.

#2

Updated by Alasdair Lumsden over 7 years ago

Well, I have some news.

It turns out that the difference between "environments where things seem to work fine" and my environment is the CFLAGS/CXXFLAGS that were being passed in.

In my environment, I was passing in:

CFLAGS="-m32" 
CXXFLAGS="-m32" 
LDFLAGS="-m32"

and for 64bit:

CFLAGS="-m64" 
CXXFLAGS="-m64" 
LDFLAGS="-m64"

PCRE builds fine if I instead pass this in, leaving CFLAGS/CXXFLAGS unset:

CC="gcc -m32" 
CXX="g++ -m32" 
LDFLAGS="-m32"

What's happening here is that PCRE has defaults for CFLAGS/CXXFLAGS of -O2. If I set my own values then -O2 doesn't get passed through, and that causes it to break.
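
To make the mechanism concrete, here is a minimal sketch in Python of how a package default gets displaced. It is a model only, not PCRE's actual configure logic: the function names are hypothetical, and the default of -O2 is taken from the observation above.

```python
def effective_cxxflags(env, package_default="-O2"):
    """Return the user's CXXFLAGS if set, otherwise the package default.

    Hypothetical model: a user-supplied value replaces the default
    outright, it is not appended to it.
    """
    return env["CXXFLAGS"] if "CXXFLAGS" in env else package_default


def compile_command(env, source="pcre_scanner.cc"):
    """Build the argv for compiling one source file under this model."""
    cxx = env.get("CXX", "g++")
    return f"{cxx} {effective_cxxflags(env)} -c {source}".split()


# Setting CXXFLAGS drops the default optimisation level.
print(compile_command({"CXXFLAGS": "-m32"}))
# Folding -m32 into CXX leaves the default in place.
print(compile_command({"CXX": "g++ -m32"}))
```

With CXXFLAGS="-m32" the default is lost, while folding -m32 into CXX leaves it in place.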

I have confirmed this by manually invoking g++ and the linker. The first attempt fails, whilst the second, which differs only in adding -O2, works:

build i86 (sngl): g++ -DHAVE_CONFIG_H -I. -I/ws/ec-userland/components/pcre-old/source/pcre-8.12 -m32 -MT pcre_scanner.lo -MD -MP -MF .deps/pcre_scanner.Tpo -c /ws/ec-userland/components/pcre-old/source/pcre-8.12/pcre_scanner.cc  -fPIC -DPIC -o .libs/pcre_scanner.o
build i86 (sngl): /usr/bin/ld -r -o .libs/libpcrecpp.la-1.o .libs/pcrecpp.o .libs/pcre_scanner.o .libs/pcre_stringpiece.o
ld: fatal: relocation error: R_386_GOTOFF: file .libs/pcre_scanner.o: section [176].rel.eh_frame: symbol .LFB965: symbol has been discarded with discarded section: [70].text._ZN7pcrecpp11StringPieceC2ERKSs
build i86 (sngl): 
build i86 (sngl): g++ -DHAVE_CONFIG_H -I. -I/ws/ec-userland/components/pcre-old/source/pcre-8.12 -m32 -O2 -MT pcre_scanner.lo -MD -MP -MF .deps/pcre_scanner.Tpo -c /ws/ec-userland/components/pcre-old/source/pcre-8.12/pcre_scanner.cc  -fPIC -DPIC -o .libs/pcre_scanner.o
build i86 (sngl): /usr/bin/ld -r -o .libs/libpcrecpp.la-1.o .libs/pcrecpp.o .libs/pcre_scanner.o .libs/pcre_stringpiece.o
build i86 (sngl): 

Note that gcc 4.4 does not trigger this; if I invoke the same command omitting -O2 with /ec/bin/g++ (gcc 4.4), it works:

build i86 (sngl): /ec/bin/g++ -DHAVE_CONFIG_H -I. -I/ws/ec-userland/components/pcre-old/source/pcre-8.12 -m32 -MT pcre_scanner.lo -MD -MP -MF .deps/pcre_scanner.Tpo -c /ws/ec-userland/components/pcre-old/source/pcre-8.12/pcre_scanner.cc  -fPIC -DPIC -o .libs/pcre_scanner.o
build i86 (sngl): /usr/bin/ld -r -o .libs/libpcrecpp.la-1.o .libs/pcrecpp.o .libs/pcre_scanner.o .libs/pcre_stringpiece.o
build i86 (sngl): 

So we can determine that gcc 4.6 and newer can, when C++ optimisations are off, generate code that Sun ld is unhappy with. Simply adding -O2 (in this case) works around it. If I encounter more cases, I'll post more info.

I have uploaded the two pcre_scanner.o files so you can prod/poke them:

http://al.cloud.ec/pcre_scanner.o.broken
http://al.cloud.ec/pcre_scanner.o.working

I will do more experimenting by altering my build environment (ec-userland) to use CC="gcc -m32" / CXX="g++ -m32" instead of the CFLAGS/CXXFLAGS option and see how I get on.

#3

Updated by Richard PALO over 7 years ago

Hmm, using pkgsrc trunk on oi_151a7 in a NG zone I haven't noticed this particular problem.
I seem to have survived building gcc4.7.2 using 4.7.0 (except for java, which rich knows about, I believe it involves a different nightmare)

I am using 64-bit mode; is this on 32-bit?

I do notice that this version seems to default to '-O' for CFLAGS/CXXFLAGS...

#4

Updated by Alasdair Lumsden over 7 years ago

I've just confirmed exactly the same behaviour with gcc 4.6.

Strangely igork is not seeing this issue at all, even when PCRE is built without optimisations. Here is his pcre_scanner.o:

http://al.cloud.ec/pcre_scanner.o.igork

Unfortunately he built pcre 8.21 instead of 8.12 and is refusing to do 8.12, instead asking me to do 8.21, so the object might not be so useful.

Here is his build line:

/usr/gcc/4.6/bin/g++ -DHAVE_CONFIG_H -I. -I/myshare/builds/dilos-userland/dilos-userland-review/components/pcre/pcre-8.21 -m32 -MT pcre_scanner.lo -MD -MP -MF .deps/pcre_scanner.Tpo -c /myshare/builds/dilos-userland/dilos-userland-review/components/pcre/pcre-8.21/pcre_scanner.cc  -fPIC -DPIC -o .libs/pcre_scanner.o

I was rather hoping when igork disabled optimisations in his PCRE build that it would break too, but sadly that would be too simple.

#5

Updated by Rich Lowe over 7 years ago

  • Status changed from New to In Progress
  • % Done changed from 0 to 50
  • Tags deleted (needs-triage)

There's a couple of intertwined issues here, but it comes down to the sloppy relocations which GCC necessitates not being accepted when output is a relocatable object (and then some other bugs that would make it break even if they were).

Alasdair, if I give you test bits, are you willing to give them a good going over to see how things behave? It'd probably be best if your victim software had a comprehensive test suite.

#6

Updated by Alasdair Lumsden over 7 years ago

Yes, I can certainly test stuff!

gmp, mpc and mpfr all have test suites that are easily run, and PHP has an enormous test suite I can also put through its paces. I can do 32bit and 64bit builds and tests.

Hook me up and I'll put things through their paces.

#7

Updated by Rich Lowe over 7 years ago

What matters is that it's code which previously failed to link, that you can then test. It's probably important it failed to link, it's probably also important it's C++ (though, of course, that's probably the case anyway).

I'll get back to you on IRC, or via direct mail. Doing a bit more research (and becoming far more bemused)

#8

Updated by Rich Lowe over 7 years ago

  • % Done changed from 50 to 70

#9

Updated by Rich Lowe over 7 years ago

  • Subject changed from "Linker doesn't like gcc 4.6 c++ code and newer (COMDAT related?)" to "relocatable objects also need sloppy relocation"

#10

Updated by Alasdair Lumsden over 7 years ago

For those watching at home, I figured out why pcre was breaking in my build environment.

I was using a chroot I made that was vaguely attempting to emulate Joyent's "sngl" brand, which has a writable /usr, symlinking important bits from /system/usr (itself a lofs mount of /usr from the global zone).

I had /system/usr/bin at the end of my path, and was populating /usr with software from ec-userland, but ec-userland didn't have /system/usr/bin at the end of its build path.

GNU autoconf uses getconf to determine the maximum length of command line arguments. This was not in the PATH, so it was defaulting to 512 characters. libtool took this value, and some of the link lines exceeded that length, so libtool was changing how it linked the objects.

This is why my environment was producing different objects, which were triggering this linker issue.
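
A rough sketch of that fallback, written in Python for illustration. This is not the real autoconf/libtool probe: the function name is hypothetical, and the 512 fallback is simply the value reported above.

```python
import shutil
import subprocess

# Value the build fell back to when getconf could not be found (per the thread).
FALLBACK_MAX_CMD_LEN = 512


def probe_max_cmd_len(search_path):
    """Return ARG_MAX from a getconf found on search_path, else the fallback.

    search_path is a PATH-style string; None means "use the current PATH".
    """
    getconf = shutil.which("getconf", path=search_path)
    if getconf is None:
        # getconf is not on the PATH: assume a very short command line.
        return FALLBACK_MAX_CMD_LEN
    try:
        result = subprocess.run(
            [getconf, "ARG_MAX"], capture_output=True, text=True, check=True
        )
        return int(result.stdout.strip())
    except (OSError, subprocess.CalledProcessError, ValueError):
        return FALLBACK_MAX_CMD_LEN


# A PATH that does not contain getconf yields the tiny fallback limit.
print(probe_max_cmd_len("/nonexistent-dir-for-sketch"))
```

The point is that a missing tool does not fail the build; it silently changes the limit, and with it the way the objects get linked.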

Rich explained that S11 ld was enhanced to automatically choose "-zrelaxreloc" when it encounters sloppy GNUisms, and that the illumos "-zrelaxreloc" is broken and segfaults (I can confirm that's the case). Rich is going to enhance illumos ld to do the same thing.

In the meantime I have fixed up my build environment so getconf works, and pcre builds happily now.

#11

Updated by Rich Lowe over 7 years ago

There's a bunch of history here, but I'll get to that in a full evaluation so it doesn't get lost.

Basically:

  • We handle certain relocations "sloppily", because there's no alternative
  • We only do this for dynamic objects (executables, shared libraries).
  • When libtool thinks it can only use 512-byte command lines, it links input objects into relocatable objects, and then links those composites with each other (so each command line is always short).
  • Because of that, it turns out that we need automatic sloppy relocs for relocatable objects too (.o files), and to fix any bugs encountered on the way.
  • I've done this
  • It turns out Oracle have also done this

I've given Alasdair a SUNWonld with which to test my changes in his specific circumstance and (most importantly) to make sure that the resulting pcre bits all work correctly.
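
The piecewise linking described in the list above can be sketched roughly as follows. This is a simplified Python model, not libtool's implementation: the function name, the <output>-<k>.o naming (modelled on the libstdc++.la-1.o seen in the log), and the plain character count used as the length limit are assumptions for illustration.

```python
def plan_piecewise_link(objects, output, max_cmd_len, linker="ld"):
    """Plan chained `ld -r` steps whose command lines stay within max_cmd_len.

    Each step links its inputs into an intermediate relocatable object named
    <output>-<k>.o, and the next step takes that intermediate as its first
    input. A step always takes at least one new object, even if that alone
    pushes it past the limit.
    """
    steps = []
    carried = []  # intermediate object from the previous step, if any
    batch = []    # new objects collected for the current step

    def command(inputs):
        return [linker, "-r", "-o", f"{output}-{len(steps) + 1}.o", *inputs]

    def flush():
        nonlocal carried, batch
        cmd = command(carried + batch)
        steps.append(cmd)
        carried, batch = [cmd[3]], []  # cmd[3] is the step's output file

    for obj in objects:
        candidate = command(carried + batch + [obj])
        if batch and len(" ".join(candidate)) > max_cmd_len:
            flush()
        batch.append(obj)
    if batch:
        flush()
    return steps


for step in plan_piecewise_link(["a.o", "b.o", "c.o", "d.o"], "lib", 30):
    print(" ".join(step))
```

Every step here runs ld -r, so its output is a relocatable object. That is why the sloppy relocation handling has to apply to relocatable output as well as to executables and shared libraries.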

#12

Updated by Alasdair Lumsden over 7 years ago

I can confirm that the SUNWonld ld successfully links the .o files with the sloppy relocs. I've posted the resultant objects to: http://al.cloud.ec/fun/onld/

I have also tested the linker with a gcc 4.7.2 build (works fine), gmp, mpfr and mpc. The gmp, mpfr and mpc test suites pass with flying colours.

I've set this linker as the default in my ec-userland build zone and will report back any issues as I'm doing a lot of builds, but so far it's been flawless.

#13

Updated by Rich Lowe over 7 years ago

  • Status changed from In Progress to Resolved
  • % Done changed from 70 to 100

Resolved in 396f90f
