32bit sparc direct system calls should not rely on compiler mannerisms
Most illumos system calls take their arguments via the normal C ABI calling
conventions, rather than the traditional UNIX '
When making a system call across an ABI boundary, things are thus momentarily
complicated, as the caller is passing arguments according to one ABI, but the
callee is expecting them via another.
On SPARC negative numbers are always two's complement in the full register
width (32bits on v7, 64bits on v9). When a system call is made from a v7
(32bit) process to a v9 (64bit) kernel, we are currently relying on the
compiler to sign-extend the integer arguments such that they remain negative
even as the register logically doubles in size.
From the point of view of a compiler, sign-extension of function arguments is
not necessary, consider:
- Any negative number is negative across the whole register width
- Any function argument is already contained in an GP register
Thus any argument is necessarily already appropriately extended, probably at
the time of the load into the register via (
ldsh, etc.). From the
point of view of the compiler, a function call from a completely separate ABI
is a thing that (quite logically) cannot happen. GCC since 4.2 has noticed
that this sign-extension is wasteful, and ceased doing it. It is unreasonable
to expect that any other compiler would continue to do something which, to its
own analysis, is unnecessary.
Given a compiler which does not do this sign-extension, negative numbers
passed to system calls from 32bit processes will be massively positive as seen
by the kernel (they'll be treated as if unsigned, basically).
A simple example, using hex literals to make the distinctions clearer:
A 32 bit process does:
lseek(fd, 0xfffffffb, SEEK_CUR); /* That is ... */ lseek(fd, -5, SEEK_CUR);
The (64 bit) kernel sees:
lseek32(fd, 0xfffffffb, SEEK_CUR); /* However, that's now ... */ lseek32(fd, 4294967291, SEEK_CUR);
Correctness requires that the 2nd argument be sign-extended to
0xfffffffffffffffb, and thus remain
We need to sign-extend the arguments as the system call is made, prior to the
function itself being called. It is unfortunately not possible to use wrapper
functions or the like to do this, as loadable system calls give us no
mechanism to force the generation of a wrapper, or the matching of a wrapper
with the needs of an arbitrary call.
Instead, we've chosen to use 6 of the free bits in
sy_flags on 64bit kernels
as a mask to indicate which arguments must be extended, and to honour those
syscall_trap32 (the main system call trap),
through which the first call of a loadable syscall, and all calls of an
unloadable syscall are vectored, and
indir the indirect system call
retained for binary compatibility with... something (I have been unable to
As part of this,
syscall_trap32 will be adjusted to only load those
arguments which the
sy_narg field indicates are used to minimize performance
impact as much as we can. This has repercussions:
- Before a loadable system call is loaded it is declared to be of 0 arguments (to mean "any number of..."), this is somewhat baked into the system, so a call of legitimately no arguments (such as `getpid`) will unfortunately load all 6 potential arguments.
- The indirect system call,
SYS_syscall) is exceptionally confusingly declared of 1 argument (the actual call), this must be changed to declare it as taking 6 arguments, as the number of arguments of the eventual call is unknowable. (Much like with loadable syscalls, above).
The current code reloads syscall arguments separately for both the normal case
and the pre_syscall case. We've adjusted
syscall_trap32 such that we jump
back and use the same code to do the reloads, rather than doing the reloads
separately and jumping to just after.
traptrace records are created prior to any extension taking place. This
should not be a problem, as they are also created prior to arguments being
The DTrace syscall provider works by interposing in the
sysent32 tables and wrapping any system calls present. This means it will
see the extended values where previously it saw non-extended values.
The current semi-prototype code is truncating these arguments to the syscall
provider to maintain precise compatibility. ahl points out that that's not
strictly necessary, as the syscall provider's arguments are private
ID PROVIDER MODULE FUNCTION NAME 188 syscall lseek entry Probe Description Attributes Identifier Names: Private Data Semantics: Private Dependency Class: ISA Argument Attributes Identifier Names: Private Data Semantics: Private Dependency Class: ISA Argument Types None
And further any realistic use of them should have included casts to the
When a process is stopped on system call entry (
procfs is specified to stop after the system call has begun
but before the arguments have been retrieved from the lwp, such that the
controlling process may change the arguments to the system call.
This happily works out just fine, as we stop via
pre_syscall and do the
argument loading and extension after that has completed in that path through
Updated by Rich Lowe about 9 years ago
The implementation quirk referenced above, but not described is this:
Given types such as intptr_t, most importantly used by ioctl, we have a type which must be sign extended (it is not unsigned), but where a 32bit pointer is always going to thus be corrupted (in that the extension now makes it incorrect).
This leaves us with a couple of options:
- Every user/kernel copy implementation (we have a surprising number), needs to learn that copies from a 32bit process should have the high bits of the address 0'd such that only user memory is referenced (currently, this does not occur)
- Every implementation of ioctl, where a pointer is expected and a process is 32bit, needs to be casting through caddr32_t to the same effect.
Both of these will then require extensive testing.
If anyone who cares about SPARC a lot wants to take this bug from me, email me, and I can provide:
- all the syscall trap changes, etc, including information on who code reviewed them, how they were very thoroughly tested, etc.
- A probably better description of the work needed to deal with the problem outlined in this comment
- A vague prototype of the changes to copyin/copyout and all of its friends, probably.