Project

General

Profile

Bug #2757

32bit sparc direct system calls should not rely on compiler mannerisms

Added by Rich Lowe over 8 years ago. Updated over 1 year ago.

Status:
Feedback
Priority:
Normal
Assignee:
-
Category:
kernel
Start date:
2012-05-21
Due date:
% Done:

50%

Estimated time:
Difficulty:
Expert
Tags:
Gerrit CR:

Description

Background

Most illumos system calls take their arguments via the normal C ABI calling
conventions, rather than the traditional UNIX 'uap' method.

When making a system call across an ABI boundary, things are thus momentarily
complicated, as the caller is passing arguments according to one ABI, but the
callee is expecting them via another.

On SPARC negative numbers are always two's complement in the full register
width (32bits on v7, 64bits on v9). When a system call is made from a v7
(32bit) process to a v9 (64bit) kernel, we are currently relying on the
compiler to sign-extend the integer arguments such that they remain negative
even as the register logically doubles in size.

Problem

From the point of view of a compiler, sign-extension of function arguments is
not necessary, consider:

  • Any negative number is negative across the whole register width
  • Any function argument is already contained in an GP register

Thus any argument is necessarily already appropriately extended, probably at
the time of the load into the register via (ldsw, ldsh, etc.). From the
point of view of the compiler, a function call from a completely separate ABI
is a thing that (quite logically) cannot happen. GCC since 4.2 has noticed
that this sign-extension is wasteful, and ceased doing it. It is unreasonable
to expect that any other compiler would continue to do something which, to its
own analysis, is unnecessary.

Given a compiler which does not do this sign-extension, negative numbers
passed to system calls from 32bit processes will be massively positive as seen
by the kernel (they'll be treated as if unsigned, basically).

A simple example, using hex literals to make the distinctions clearer:

A 32 bit process does:


lseek(fd, 0xfffffffb, SEEK_CUR);

/* That is ... */
lseek(fd, -5, SEEK_CUR);

The (64 bit) kernel sees:


lseek32(fd, 0xfffffffb, SEEK_CUR);

/* However, that's now ... */
lseek32(fd, 4294967291, SEEK_CUR);

Correctness requires that the 2nd argument be sign-extended to
0xfffffffffffffffb, and thus remain -5.

Solution

We need to sign-extend the arguments as the system call is made, prior to the
function itself being called. It is unfortunately not possible to use wrapper
functions or the like to do this, as loadable system calls give us no
mechanism to force the generation of a wrapper, or the matching of a wrapper
with the needs of an arbitrary call.

Instead, we've chosen to use 6 of the free bits in sy_flags on 64bit kernels
as a mask to indicate which arguments must be extended, and to honour those
flags in syscall_trap32 (the main system call trap), loadable_syscall
through which the first call of a loadable syscall, and all calls of an
unloadable syscall are vectored, and indir the indirect system call
retained for binary compatibility with... something (I have been unable to
determine what)

Further changes

As part of this, syscall_trap32 will be adjusted to only load those
arguments which the sy_narg field indicates are used to minimize performance
impact as much as we can. This has repercussions:

  • Before a loadable system call is loaded it is declared to be of 0 arguments (to mean "any number of..."), this is somewhat baked into the system, so a call of legitimately no arguments (such as `getpid`) will unfortunately load all 6 potential arguments.
  • The indirect system call, indir (SYS_syscall) is exceptionally confusingly declared of 1 argument (the actual call), this must be changed to declare it as taking 6 arguments, as the number of arguments of the eventual call is unknowable. (Much like with loadable syscalls, above).

The current code reloads syscall arguments separately for both the normal case
and the pre_syscall case. We've adjusted syscall_trap32 such that we jump
back and use the same code to do the reloads, rather than doing the reloads
separately and jumping to just after.

Interactions

traptrace

traptrace records are created prior to any extension taking place. This
should not be a problem, as they are also created prior to arguments being
potentially reloaded.

DTrace

The DTrace syscall provider works by interposing in the sysent and
sysent32 tables and wrapping any system calls present. This means it will
see the extended values where previously it saw non-extended values.

The current semi-prototype code is truncating these arguments to the syscall
provider to maintain precise compatibility. ahl points out that that's not
strictly necessary, as the syscall provider's arguments are private
details:

   ID   PROVIDER            MODULE                          FUNCTION NAME
  188    syscall                                               lseek entry

    Probe Description Attributes
        Identifier Names: Private
        Data Semantics:   Private
        Dependency Class: ISA

    Argument Attributes
        Identifier Names: Private
        Data Semantics:   Private
        Dependency Class: ISA

    Argument Types
        None

And further any realistic use of them should have included casts to the
appropriate type.

procfs

When a process is stopped on system call entry (PCSENTRY, libproc(4)'s
Psysentry()), procfs is specified to stop after the system call has begun
but before the arguments have been retrieved from the lwp, such that the
controlling process may change the arguments to the system call.

This happily works out just fine, as we stop via pre_syscall and do the
argument loading and extension after that has completed in that path through
syscall_trap32.

#1

Updated by Rich Lowe over 8 years ago

  • Status changed from New to In Progress
  • % Done changed from 40 to 50
#2

Updated by Rich Lowe over 8 years ago

  • Status changed from In Progress to Feedback
  • Difficulty changed from Hard to Expert

Because of some implementation quirks, we're going to be doing this in the compilers for 4.4.4, at the least.

#3

Updated by Rich Lowe over 8 years ago

The implementation quirk referenced above, but not described is this:

Given types such as intptr_t, most importantly used by ioctl, we have a type which must be sign extended (it is not unsigned), but where a 32bit pointer is always going to thus be corrupted (in that the extension now makes it incorrect).

This leaves us with a couple of options:

  • Every user/kernel copy implementation (we have a surprising number), needs to learn that copies from a 32bit process should have the high bits of the address 0'd such that only user memory is referenced (currently, this does not occur)
  • Every implementation of ioctl, where a pointer is expected and a process is 32bit, needs to be casting through caddr32_t to the same effect.

Both of these will then require extensive testing.

If anyone who cares about SPARC a lot wants to take this bug from me, email me, and I can provide:

  • all the syscall trap changes, etc, including information on who code reviewed them, how they were very thoroughly tested, etc.
  • A probably better description of the work needed to deal with the problem outlined in this comment
  • A vague prototype of the changes to copyin/copyout and all of its friends, probably.
#4

Updated by Rich Lowe over 1 year ago

  • Assignee deleted (Rich Lowe)
  • Parent task deleted (#1450)

Also available in: Atom PDF