Bug #3858

Can't build nightly over NFSv4

Added by Josef Sipek over 7 years ago. Updated over 6 years ago.

Status: New
Priority: Normal
Assignee: -
Category: nfs - NFS server and client
Start date: 2013-07-01
Due date:
% Done: 0%
Estimated time:
Difficulty: Medium
Tags: needs-triage
Gerrit CR:

Description

Trying to build the gate over NFS fails with:

$ cat mail_msg 

==== Nightly distributed build started:   Mon Jul  1 13:55:35 UTC 2013 ====
==== Nightly distributed build completed: Mon Jul  1 16:48:54 UTC 2013 ====

==== Total build time ====

real    2:53:18

==== Build environment ====

/usr/bin/uname
SunOS illumos-build 5.11 oi_151a8 i86pc i386 i86pc

/opt/SUNWspro/bin/dmake
dmake: Sun Distributed Make 7.8 SunOS_i386 Patch 126504-01 2007/07/19
number of concurrent jobs = 8

32-bit compiler
/opt/onbld/bin/i386/cw -_gcc
cw version 1.29 (SHADOW MODE DISABLED)
primary: /opt/gcc/4.4.4/bin/gcc
gcc (Illumos gcc-4.4.4-il-3) 4.4.4

64-bit compiler
/opt/onbld/bin/i386/cw -_gcc
cw version 1.29 (SHADOW MODE DISABLED)
primary: /opt/gcc/4.4.4/bin/gcc
gcc (Illumos gcc-4.4.4-il-3) 4.4.4

/usr/java/bin/javac
java full version "1.6.0_26-b03" 

/usr/ccs/bin/as
as: Sun Compiler Common 12 SunOS_i386 snv_121 08/03/2009

/usr/ccs/bin/ld
ld: Software Generation Utilities - Solaris Link Editors: 5.11-1.1739 (illumos)

Build project:  default
Build taskid:   183

==== Nightly argument issues ====

==== Build version ====

/home/jeffpc/illumos-dev

==== Make clobber ERRORS ====

==== Make tools clobber ERRORS ====

==== Tools build errors ====

==== Build errors (non-DEBUG) ====

dmake: Warning: Target `install' not remade because of errors
The following command caused the error:
dmake: Warning: Command failed for target `5.10.0'
dmake: Warning: Target `install' not remade because of errors
The following command caused the error:
dmake: Warning: Command failed for target `perl'
dmake: Warning: Target `install' not remade because of errors
The following command caused the error:
dmake: Warning: Target `install' not remade because of errors

==== Build warnings (non-DEBUG) ====

dmake: Warning: Target `install' not remade because of errors
dmake: Warning: Command failed for target `5.10.0'
dmake: Warning: Target `install' not remade because of errors
dmake: Warning: Command failed for target `perl'
dmake: Warning: Target `install' not remade because of errors
dmake: Warning: Target `install' not remade because of errors

==== Elapsed build time (non-DEBUG) ====

real  2:23:26.0
user  2:24:27.6
sys   1:13:18.5

==== Build noise differences (non-DEBUG) ====

11c11
< LD_RUN_PATH="/usr/lib:/lib" /home/jeffpc/illumos-dev/upstream/usr/src/tools/proto/root_i386-nd/opt/onbld/bin/i386/cw -_cc  -G Project.o  -o blib/arch/auto/Sun/Solaris/Project/Project.so     \\
---
> LD_RUN_PATH="/usr/lib:/lib" /home/jeffpc/illumos-dev/upstream/usr/src/tools/proto/root_i386-nd/opt/onbld/bin/i386/cw -_gcc  -G Project.o  -o blib/arch/auto/Sun/Solaris/Project/Project.so     \\

==== cstyle/hdrchk errors ====

==== Find core files ====

==== Check lists of files ====

nightly.log shows:

illumos-build --> 1 job
pax: /home/jeffpc/illumos-dev/upstream/proto/root_i386/usr : unable to preserve owner/group
pax: /home/jeffpc/illumos-dev/upstream/proto/root_i386/usr/perl5 : unable to preserve owner/group
pax: /home/jeffpc/illumos-dev/upstream/proto/root_i386/usr/perl5/5.10.0 : unable to preserve owner/group
pax: /home/jeffpc/illumos-dev/upstream/proto/root_i386/usr/perl5/5.10.0/lib : unable to preserve owner/group
pax: /home/jeffpc/illumos-dev/upstream/proto/root_i386/usr/perl5/5.10.0/lib/i86pc-solaris-64int : unable to preserve owner/group
pax: /home/jeffpc/illumos-dev/upstream/proto/root_i386/usr/perl5/5.10.0/lib/i86pc-solaris-64int/auto : unable to preserve owner/group
pax: /home/jeffpc/illumos-dev/upstream/proto/root_i386/usr/perl5/5.10.0/lib/i86pc-solaris-64int/auto/Sun : unable to preserve owner/group
pax: /home/jeffpc/illumos-dev/upstream/proto/root_i386/usr/perl5/5.10.0/lib/i86pc-solaris-64int/auto/Sun/Solaris : unable to preserve owner/group
pax: /home/jeffpc/illumos-dev/upstream/proto/root_i386/usr/perl5/5.10.0/lib/i86pc-solaris-64int/auto/Sun/Solaris/Utils : unable to preserve owner/group
pax: /home/jeffpc/illumos-dev/upstream/proto/root_i386/usr/perl5/5.10.0/lib/i86pc-solaris-64int/auto/Sun/Solaris/Utils/Utils.bs : unable to preserve owner/group
pax: /home/jeffpc/illumos-dev/upstream/proto/root_i386/usr/perl5/5.10.0/lib/i86pc-solaris-64int/auto/Sun/Solaris/Utils/Utils.so : unable to preserve owner/group
pax: /home/jeffpc/illumos-dev/upstream/proto/root_i386/usr/perl5/5.10.0/lib/i86pc-solaris-64int/Sun : unable to preserve owner/group
pax: /home/jeffpc/illumos-dev/upstream/proto/root_i386/usr/perl5/5.10.0/lib/i86pc-solaris-64int/Sun/Solaris : unable to preserve owner/group
pax: /home/jeffpc/illumos-dev/upstream/proto/root_i386/usr/perl5/5.10.0/lib/i86pc-solaris-64int/Sun/Solaris/Utils.pm : unable to preserve owner/group
*** Error code 1
dmake: Warning: Target `install' not remade because of errors
Current working directory /home/jeffpc/illumos-dev/upstream/usr/src/cmd/perl/5.10.0
*** Error code 1
The following command caused the error:
cd 5.10.0; pwd; PERL_VERSION=5.10.0 dmake install
dmake: Warning: Command failed for target `5.10.0'

Related issues

Related to illumos gate - Bug #3879: Disable NFSv4 support until it becomes usable (Closed, Marcel Telka, 2013-07-11)

#1

Updated by Marcel Telka over 7 years ago

  • Category set to nfs - NFS server and client
#2

Updated by Josef Sipek over 7 years ago

My guess is that this is a build system issue not an NFS client/server issue.

#3

Updated by Marcel Telka over 7 years ago

Josef Sipek wrote:

My guess is that this is a build system issue not an NFS client/server issue.

IIRC, a few (one or two) bugs related to building over NFS were fixed in Solaris since August 2010. I think they were fixed on the NFS client side.

#4

Updated by Piotr Jasiukajtis over 7 years ago

I build ON on an NFSv3 mount point without any issues.

#5

Updated by Jim Klimov over 7 years ago

My guess would be race conditions between parallel make jobs. At least, I have seen similar behavior when building other projects over NFS, and it was resolved by running make sequentially with only one job allowed. I've detailed my guesses on the mailing list (http://www.listbox.com/member/archive/182179/2013/07/sort/time_rev/page/1/entry/5:39/20130702161827:8AFB08E6-E354-11E2-94A3-CC978D77862B/) and just wanted to leave a note here ;)

For those of you who report successes: do your NFS servers include fast SLOG devices (SSDs or DDR drives), or do they disable sync on the build datasets? Either would reduce or eliminate the window of opportunity for race conditions like those I think I hit while building on "slow-sync" NFS servers. Another possibility, noted here and in the thread, is NFSv3 vs. NFSv4...
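One way to try the "only one job" workaround without touching any Makefiles is to cap dmake's job count in the nightly environment file. This is a minimal sketch under assumptions: it assumes an illumos-gate style env file (for example a copy of usr/src/tools/env/illumos.sh) that computes and exports DMAKE_MAX_JOBS, the variable dmake reads as its per-instance job limit; the file name and its exact contents in your workspace may differ.

```sh
# Hypothetical excerpt from a nightly environment file (assumed to be a copy
# of illumos-gate's usr/src/tools/env/illumos.sh).
# Put this AFTER the line that computes the default job count so it wins.
# With at most one job per dmake instance, targets no longer overlap, which
# removes the parallel-job window suspected of causing the race.
export DMAKE_MAX_JOBS=1
```

Expect a much longer build; the point is only to see whether the failure goes away when jobs cannot overlap.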

#6

Updated by Marcel Telka over 7 years ago

  • Subject changed from can't build nightly over NFS to Can't build nightly over NFSv4
#7

Updated by Ian Collins over 7 years ago

Attempting to build SmartOS over NFS, my failure (same on two runs) was here:

(cd /mnt/smartos-live/proto.strap && gtar xzf /mnt/smartos-live/illumos-adjunct.tgz)
gtar: usr/include/trousers/trousers.h: Cannot change ownership to uid 503, gid 10: Not owner
....

That wouldn't have been running any parallel builds. The server has decent log devices and is connected over 10GE. The only parts of the build that exercise the log devices are the gtar calls. I don't know where uid 503 comes from.
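One way to see where uid 503 comes from is to look at the ownership recorded inside the archive itself: when run as root, GNU tar by default tries to restore the owner stored in the archive. A minimal sketch, assuming GNU tar (selected through a hypothetical TAR variable, e.g. TAR=gtar on illumos) and that the member name is given exactly as stored in the archive (it may carry a leading ./):

```sh
# archive_owner TARBALL MEMBER
# Hypothetical helper: print the numeric "uid/gid" that a gzip-compressed
# tarball records for MEMBER. Relies on GNU tar's verbose listing, which
# with --numeric-owner puts "uid/gid" in the second column.
archive_owner() {
    "${TAR:-tar}" --numeric-owner -tvzf "$1" "$2" | awk 'NR == 1 { print $2 }'
}
```

For example, `TAR=gtar archive_owner /mnt/smartos-live/illumos-adjunct.tgz usr/include/trousers/trousers.h`. If that prints 503/10, the IDs were recorded when the archive was created rather than assigned by the NFS server, and the chown failure is root on the client being refused the right to set them on the share.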

#8

Updated by Marcel Telka over 7 years ago

Ian Collins wrote:

Attempting to build SmartOS over NFS, my failure (same on two runs) was here:

(cd /mnt/smartos-live/proto.strap && gtar xzf /mnt/smartos-live/illumos-adjunct.tgz)
gtar: usr/include/trousers/trousers.h: Cannot change ownership to uid 503, gid 10: Not owner

Is it over NFSv3 or NFSv4?

#9

Updated by Sebastian Feld over 7 years ago

It happens for NFSv4.

Wouldn't it be a good idea to disable NFSv4 support by default until it becomes stable enough to build Illumos?

#10

Updated by Sebastian Feld over 7 years ago

Bug #3879 has been filed to disable NFSv4 support until it becomes stable.

#11

Updated by Marcel Telka over 7 years ago

Sebastian Feld wrote:

Wouldn't it be a good idea to disable NFSv4 support by default until it becomes stable enough to build Illumos?

No.

#12

Updated by Marcel Telka over 7 years ago

A note by Irek Szczesniak:

As a follow-up, to keep the discussion warm, an update: while ON is being
built, other activity on the same NFSv4 share may fail randomly with a
permission error. I tried this by building ON while running a script
under nice -n 19 that creates a directory, unpacks the illumos sources
into it, deletes them, and starts over. Occasionally the unpack fails
with a permission error. The archive utility doesn't matter; we tried
gtar, star, and AST pax.

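The background load described above can be approximated with a small loop. This is a minimal sketch, not the script Irek used: the function name, the work-directory naming, and the choice of plain tar are assumptions (the note reports the same failure with gtar, star, and AST pax), and TARBALL must be an absolute path because extraction happens after a cd.

```sh
# stress_unpack DIR TARBALL [ITERATIONS]
# Hypothetical reconstruction of the load described above: repeatedly create
# a directory under DIR (e.g. on the NFSv4 share), unpack TARBALL into it at
# the lowest priority, then delete it. ITERATIONS=0 (the default) loops
# until an unpack fails.
stress_unpack() {
    dir=$1 tarball=$2 max=${3:-0}
    i=0
    while [ "$max" -eq 0 ] || [ "$i" -lt "$max" ]; do
        i=$((i + 1))
        work="$dir/unpack.$$"
        mkdir "$work" || return 1
        # A permission error here is the intermittent failure being hunted.
        if ! (cd "$work" && nice -n 19 tar -xf "$tarball"); then
            echo "iteration $i: unpack failed; leaving $work for inspection" >&2
            return 1
        fi
        rm -rf "$work"
    done
    echo "completed $i iterations"
}
```

Started in the background on the share (for instance `stress_unpack /mnt/share /var/tmp/src.tar &`, both paths hypothetical) while nightly runs, it should stop and report the iteration at which an unpack first failed.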
#13

Updated by Garrett D'Amore over 7 years ago

It would generally be good to know if this problem is specific to NFS client or server functionality.

#14

Updated by Marcel Telka over 7 years ago

Garrett D'Amore wrote:

It would generally be good to know if this problem is specific to NFS client or server functionality.

I believe this is an NFSv4 client issue.

#15

Updated by Garrett D'Amore almost 7 years ago

So, the last update with a uid/gid problem is highly indicative of a problem with credentials. NFSv4 uses a different mechanism to ensure that credentials are properly configured, and in particular the issue of "root trust" is handled differently.

Are you sure this is properly configured?

#16

Updated by Jan Kryl almost 7 years ago

For those who can reproduce the pax failure every time they run the build, I would suggest testing the NFSv4 configuration. Apparently pax/tar tries to change the owner of extracted files to the user who runs pax/tar.

With NFSv3 there is no problem, because it uses numeric UIDs/GIDs over the wire (there is no translation between UID and user name on the server side). With NFSv4, the owner of a file created on the server is still governed by the numeric UID in the RPC request; however, when file attributes are obtained with NFSv4 GETATTR, UIDs/GIDs are translated to the 'name'@'domain' form. This holds only for the AUTH_SYS RPC authentication mechanism, which is the default on Solaris.

This can lead to strange results: a file created over NFSv4 gets the UID/GID of the client process on the server, yet ls -l on the client reports a different owner and group. This can happen when IDs are not kept in sync between the client and the server. As a test, I would suggest creating a file on the share and checking the reported owner/group:

touch /mnt/file
ls -l /mnt/file

If the owner is not the same as the one reported by "id", then naming is misconfigured and needs to be fixed.
If the problem appears only sometimes and cannot be reproduced deterministically, it can have two root causes:

  • a bug in the NFSv4 auth code itself
  • temporary outages of the naming service

Note that a temporary outage of the naming service can affect UID translation even after naming is working correctly again, because translation results are cached in the kernel for quite a long time (IIRC, 5 minutes).
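The touch/ls/id test above can be scripted so that it reports a mismatch explicitly. A minimal sketch, assuming a POSIX id that supports -un and -gn (on illumos this may need to be /usr/xpg4/bin/id) and that ls -l prints the owner and group names in its third and fourth columns; the function name and temporary file name are hypothetical.

```sh
# check_idmap DIR
# Hypothetical helper: create a file in DIR (e.g. an NFSv4 mount such as
# /mnt), then compare the owner/group that ls -l reports with what id says.
# Returns 0 when both match, 1 on a mismatch, 2 if the file can't be created.
check_idmap() {
    f="$1/.idmap-check.$$"
    touch "$f" || return 2
    # Owner and group as this client sees them after ID translation.
    owner=$(ls -l "$f" | awk '{ print $3 }')
    group=$(ls -l "$f" | awk '{ print $4 }')
    rm -f "$f"
    want_owner=$(id -un)
    want_group=$(id -gn)
    if [ "$owner" = "$want_owner" ] && [ "$group" = "$want_group" ]; then
        echo "ok: $owner:$group"
        return 0
    fi
    echo "mismatch: ls reports $owner:$group, id reports $want_owner:$want_group" >&2
    return 1
}
```

Run it as the build user against the build mount (e.g. `check_idmap /mnt`). Because translation results are cached, a single passing run does not rule out a transient naming-service outage; repeating the check during a build is more telling.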

#17

Updated by Andrew Stormont over 6 years ago

Is it only the perl modules that use pax?
