Bug #3858
Can't build nightly over NFSv4
Status: open
Description
Trying to build the gate over NFS fails with:
$ cat mail_msg
==== Nightly distributed build started: Mon Jul 1 13:55:35 UTC 2013 ====
==== Nightly distributed build completed: Mon Jul 1 16:48:54 UTC 2013 ====
==== Total build time ====
real 2:53:18
==== Build environment ====
/usr/bin/uname
SunOS illumos-build 5.11 oi_151a8 i86pc i386 i86pc
/opt/SUNWspro/bin/dmake
dmake: Sun Distributed Make 7.8 SunOS_i386 Patch 126504-01 2007/07/19
number of concurrent jobs = 8
32-bit compiler
/opt/onbld/bin/i386/cw -_gcc
cw version 1.29 (SHADOW MODE DISABLED)
primary: /opt/gcc/4.4.4/bin/gcc
gcc (Illumos gcc-4.4.4-il-3) 4.4.4
64-bit compiler
/opt/onbld/bin/i386/cw -_gcc
cw version 1.29 (SHADOW MODE DISABLED)
primary: /opt/gcc/4.4.4/bin/gcc
gcc (Illumos gcc-4.4.4-il-3) 4.4.4
/usr/java/bin/javac
java full version "1.6.0_26-b03"
/usr/ccs/bin/as
as: Sun Compiler Common 12 SunOS_i386 snv_121 08/03/2009
/usr/ccs/bin/ld
ld: Software Generation Utilities - Solaris Link Editors: 5.11-1.1739 (illumos)
Build project: default
Build taskid: 183
==== Nightly argument issues ====
==== Build version ====
/home/jeffpc/illumos-dev
==== Make clobber ERRORS ====
==== Make tools clobber ERRORS ====
==== Tools build errors ====
==== Build errors (non-DEBUG) ====
dmake: Warning: Target `install' not remade because of errors
The following command caused the error:
dmake: Warning: Command failed for target `5.10.0'
dmake: Warning: Target `install' not remade because of errors
The following command caused the error:
dmake: Warning: Command failed for target `perl'
dmake: Warning: Target `install' not remade because of errors
The following command caused the error:
dmake: Warning: Target `install' not remade because of errors
==== Build warnings (non-DEBUG) ====
dmake: Warning: Target `install' not remade because of errors
dmake: Warning: Command failed for target `5.10.0'
dmake: Warning: Target `install' not remade because of errors
dmake: Warning: Command failed for target `perl'
dmake: Warning: Target `install' not remade because of errors
dmake: Warning: Target `install' not remade because of errors
==== Elapsed build time (non-DEBUG) ====
real 2:23:26.0
user 2:24:27.6
sys 1:13:18.5
==== Build noise differences (non-DEBUG) ====
11c11
< LD_RUN_PATH="/usr/lib:/lib" /home/jeffpc/illumos-dev/upstream/usr/src/tools/proto/root_i386-nd/opt/onbld/bin/i386/cw -_cc -G Project.o -o blib/arch/auto/Sun/Solaris/Project/Project.so \\
---
> LD_RUN_PATH="/usr/lib:/lib" /home/jeffpc/illumos-dev/upstream/usr/src/tools/proto/root_i386-nd/opt/onbld/bin/i386/cw -_gcc -G Project.o -o blib/arch/auto/Sun/Solaris/Project/Project.so \\
==== cstyle/hdrchk errors ====
==== Find core files ====
==== Check lists of files ====
nightly.log shows:
illumos-build --> 1 job
pax: /home/jeffpc/illumos-dev/upstream/proto/root_i386/usr : unable to preserve owner/group
pax: /home/jeffpc/illumos-dev/upstream/proto/root_i386/usr/perl5 : unable to preserve owner/group
pax: /home/jeffpc/illumos-dev/upstream/proto/root_i386/usr/perl5/5.10.0 : unable to preserve owner/group
pax: /home/jeffpc/illumos-dev/upstream/proto/root_i386/usr/perl5/5.10.0/lib : unable to preserve owner/group
pax: /home/jeffpc/illumos-dev/upstream/proto/root_i386/usr/perl5/5.10.0/lib/i86pc-solaris-64int : unable to preserve owner/group
pax: /home/jeffpc/illumos-dev/upstream/proto/root_i386/usr/perl5/5.10.0/lib/i86pc-solaris-64int/auto : unable to preserve owner/group
pax: /home/jeffpc/illumos-dev/upstream/proto/root_i386/usr/perl5/5.10.0/lib/i86pc-solaris-64int/auto/Sun : unable to preserve owner/group
pax: /home/jeffpc/illumos-dev/upstream/proto/root_i386/usr/perl5/5.10.0/lib/i86pc-solaris-64int/auto/Sun/Solaris : unable to preserve owner/group
pax: /home/jeffpc/illumos-dev/upstream/proto/root_i386/usr/perl5/5.10.0/lib/i86pc-solaris-64int/auto/Sun/Solaris/Utils : unable to preserve owner/group
pax: /home/jeffpc/illumos-dev/upstream/proto/root_i386/usr/perl5/5.10.0/lib/i86pc-solaris-64int/auto/Sun/Solaris/Utils/Utils.bs : unable to preserve owner/group
pax: /home/jeffpc/illumos-dev/upstream/proto/root_i386/usr/perl5/5.10.0/lib/i86pc-solaris-64int/auto/Sun/Solaris/Utils/Utils.so : unable to preserve owner/group
pax: /home/jeffpc/illumos-dev/upstream/proto/root_i386/usr/perl5/5.10.0/lib/i86pc-solaris-64int/Sun : unable to preserve owner/group
pax: /home/jeffpc/illumos-dev/upstream/proto/root_i386/usr/perl5/5.10.0/lib/i86pc-solaris-64int/Sun/Solaris : unable to preserve owner/group
pax: /home/jeffpc/illumos-dev/upstream/proto/root_i386/usr/perl5/5.10.0/lib/i86pc-solaris-64int/Sun/Solaris/Utils.pm : unable to preserve owner/group
*** Error code 1
dmake: Warning: Target `install' not remade because of errors
Current working directory /home/jeffpc/illumos-dev/upstream/usr/src/cmd/perl/5.10.0
*** Error code 1
The following command caused the error:
cd 5.10.0; pwd; PERL_VERSION=5.10.0 dmake install
dmake: Warning: Command failed for target `5.10.0'
Updated by Marcel Telka almost 10 years ago
- Category set to nfs - NFS server and client
Updated by Josef Sipek almost 10 years ago
My guess is that this is a build system issue not an NFS client/server issue.
Updated by Marcel Telka almost 10 years ago
Josef Sipek wrote:
My guess is that this is a build system issue not an NFS client/server issue.
IIRC, there were a few (one or two) bugs fixed in Solaris (since August 2010) related to building over NFS. I think they were fixed on the NFS client side.
Updated by Piotr Jasiukajtis almost 10 years ago
I build ON on an NFSv3 mountpoint without any issues.
Updated by Jim Klimov almost 10 years ago
My guess would be some kind of race condition between parallel make jobs. At least, I have seen similar behavior when building other projects over NFS, which was resolved by doing a sequential make with only one job allowed. I've detailed my guesses on the mailing list ( http://www.listbox.com/member/archive/182179/2013/07/sort/time_rev/page/1/entry/5:39/20130702161827:8AFB08E6-E354-11E2-94A3-CC978D77862B/ ) and just wanted to leave a note here ;)
For those of you who report successes - do your NFS servers include fast SLOG devices (SSD or DDR-drives), or do they disable sync on the build datasets, so as to reduce or eliminate the window of opportunity for race conditions like those I think I faced while building on "slow-sync" NFS servers? Another possibility noted here and in the thread - NFSv3 vs. NFSv4...
Updated by Marcel Telka almost 10 years ago
- Subject changed from can't build nightly over NFS to Can't build nightly over NFSv4
Updated by Ian Collins almost 10 years ago
Attempting to build SmartOS over NFS, my failure (same on two runs) was here:
(cd /mnt/smartos-live/proto.strap && gtar xzf /mnt/smartos-live/illumos-adjunct.tgz)
gtar: usr/include/trousers/trousers.h: Cannot change ownership to uid 503, gid 10: Not owner
....
That step wouldn't have been running any parallel builds. The server has decent log devices and is connected over 10GE. The only parts of the build that excite the log devices are the gtar calls. I don't know where uid 503 comes from.
Updated by Marcel Telka almost 10 years ago
Ian Collins wrote:
Attempting to build SmartOS over NFS, my failure (same on two runs) was here:
(cd /mnt/smartos-live/proto.strap && gtar xzf /mnt/smartos-live/illumos-adjunct.tgz)
gtar: usr/include/trousers/trousers.h: Cannot change ownership to uid 503, gid 10: Not owner
Is it over NFSv3 or NFSv4?
Updated by Sebastian Feld almost 10 years ago
It happens for NFSv4.
Wouldn't it be a good idea to disable NFSv4 support by default until it becomes stable enough to build Illumos?
Updated by Sebastian Feld almost 10 years ago
Bug #3879 has been filed to disable NFSv4 support until it becomes stable.
Updated by Marcel Telka almost 10 years ago
Sebastian Feld wrote:
Wouldn't it be a good idea to disable NFSv4 support by default until it becomes stable enough to build Illumos?
No.
Updated by Marcel Telka almost 10 years ago
A note by Irek Szczesniak:
As a follow-up, to keep the discussion warm, an update: If ON is being built, other activity on the same NFSv4 share may fail randomly with a permission error. I tried this by building ON while running a script with nice -n 19 that creates a directory, unpacks the illumos sources into that directory, deletes them, and starts over. Occasionally the unpack fails with a permission error. The type of archive utility doesn't matter; we tried gtar, star and AST pax.
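The unpack/delete loop described in the note can be sketched roughly as follows. This is not Irek's actual script, which is not shown in this report; the function name unpack_loop, its arguments and the run directory name are hypothetical, and plain tar stands in for gtar, star or AST pax.

```shell
#!/bin/sh
# Hypothetical stress loop: repeatedly unpack an archive into a scratch
# directory on the share under test and count how many unpacks fail.
# Usage: unpack_loop /absolute/path/to/archive.tar /path/on/share rounds
unpack_loop() {
    archive=$1      # must be an absolute path, because we cd before unpacking
    workdir=$2      # scratch directory on the NFS share being tested
    rounds=$3       # number of unpack/delete cycles to run
    failures=0
    i=0
    while [ "$i" -lt "$rounds" ]; do
        mkdir -p "$workdir/run" || return 2
        # A non-zero exit from tar is what would surface a permission error.
        if ! (cd "$workdir/run" && tar xf "$archive"); then
            failures=$((failures + 1))
        fi
        rm -rf "$workdir/run"
        i=$((i + 1))
    done
    # Print the number of failed unpacks.
    echo "$failures"
}
```

To mimic the setup in the note, the function would be called from a script started with nice -n 19 while a nightly build runs on the same share; any result other than 0 means at least one unpack failed.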
Updated by Garrett D'Amore almost 10 years ago
It would generally be good to know if this problem is specific to NFS client or server functionality.
Updated by Marcel Telka almost 10 years ago
Garrett D'Amore wrote:
It would generally be good to know if this problem is specific to NFS client or server functionality.
I believe this is an NFSv4 client issue.
Updated by Garrett D'Amore over 9 years ago
So, the last update with a uid/gid problem is highly indicative of a problem with credentials. NFSv4 uses a different mechanism to ensure that credentials are properly configured, and in particular the issue of "root trust" is handled differently.
Are you sure this is properly configured?
Updated by Jan Kryl over 9 years ago
For those who can reproduce the pax failure every time they run the build, I would suggest testing the NFSv4 configuration. Apparently pax/tar tries to change the owner of the extracted files to the user who runs pax/tar. With NFSv3 there is no problem because it uses numeric UIDs/GIDs over the wire (no translation between UID and username on the server side). With NFSv4 the owner of a file created on the server is still governed by the numeric UID specified in the RPC packet; however, when file attributes are obtained by NFSv4 GETATTR, UIDs/GIDs are translated to 'name'@'domain' form. This holds only for the AUTH_SYS RPC authentication mechanism, the default on Solaris. This can lead to strange results: a file created over NFSv4 has, on the server, the UID/GID of the client process, yet ls -l on the client reports a different owner and group. This can happen when IDs are not kept in sync between the client and server. As a test, I would suggest creating a file on the server and checking the reported owner/group:
touch /mnt/file
ls -l /mnt/file
If the owner is not the same as reported by "id" then naming is misconfigured and needs to be fixed.
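The check above could be wrapped in a small helper, sketched here under a few assumptions: the function name check_owner and the probe file name are hypothetical, the comparison uses numeric IDs (ls -ln) rather than names, and the id in use is assumed to accept -u.

```shell
#!/bin/sh
# Hypothetical helper: create a probe file in the given directory and compare
# the owner this client sees for it with the UID of the calling process.
check_owner() {
    dir=$1
    probe="$dir/.owner-probe.$$"
    touch "$probe" || return 2
    # Field 3 of `ls -ln` is the numeric owner as reported to this client.
    seen_uid=$(ls -ln "$probe" | awk '{print $3}')
    rm -f "$probe"
    my_uid=$(id -u)
    if [ "$seen_uid" = "$my_uid" ]; then
        echo "ok: owner $seen_uid matches id"
        return 0
    fi
    echo "mismatch: file owner $seen_uid, process uid $my_uid"
    return 1
}
```

Calling check_owner /mnt on the NFSv4 mount should report a match when ID mapping is set up consistently; a mismatch points at the naming configuration rather than at pax or tar.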
If the problem is seen only sometimes and cannot be reproduced in a deterministic way, it can have one of two root causes:
- bug in NFSv4 auth code itself
- temporary outages of naming service
Note that a temporary outage of the naming service can affect translation of UIDs even after naming is working correctly again, because translation results are cached in the kernel for quite a long time (IIRC 5 minutes).
Updated by Andrew Stormont over 9 years ago
Is it only the perl modules that are using pax?