Bug #2757
open32bit sparc direct system calls should not rely on compiler mannerisms
50%
Description
Background¶
Most illumos system calls take their arguments via the normal C ABI calling
conventions, rather than the traditional UNIX 'uap
' method.
When making a system call across an ABI boundary, things are thus momentarily
complicated, as the caller is passing arguments according to one ABI, but the
callee is expecting them via another.
On SPARC negative numbers are always two's complement in the full register
width (32bits on v7, 64bits on v9). When a system call is made from a v7
(32bit) process to a v9 (64bit) kernel, we are currently relying on the
compiler to sign-extend the integer arguments such that they remain negative
even as the register logically doubles in size.
Problem¶
From the point of view of a compiler, sign-extension of function arguments is
not necessary, consider:
- Any negative number is negative across the whole register width
- Any function argument is already contained in an GP register
Thus any argument is necessarily already appropriately extended, probably at
the time of the load into the register via (ldsw
, ldsh
, etc.). From the
point of view of the compiler, a function call from a completely separate ABI
is a thing that (quite logically) cannot happen. GCC since 4.2 has noticed
that this sign-extension is wasteful, and ceased doing it. It is unreasonable
to expect that any other compiler would continue to do something which, to its
own analysis, is unnecessary.
Given a compiler which does not do this sign-extension, negative numbers
passed to system calls from 32bit processes will be massively positive as seen
by the kernel (they'll be treated as if unsigned, basically).
A simple example, using hex literals to make the distinctions clearer:
A 32 bit process does:
lseek(fd, 0xfffffffb, SEEK_CUR);
/* That is ... */
lseek(fd, -5, SEEK_CUR);
The (64 bit) kernel sees:
lseek32(fd, 0xfffffffb, SEEK_CUR);
/* However, that's now ... */
lseek32(fd, 4294967291, SEEK_CUR);
Correctness requires that the 2nd argument be sign-extended to0xfffffffffffffffb
, and thus remain -5
.
Solution¶
We need to sign-extend the arguments as the system call is made, prior to the
function itself being called. It is unfortunately not possible to use wrapper
functions or the like to do this, as loadable system calls give us no
mechanism to force the generation of a wrapper, or the matching of a wrapper
with the needs of an arbitrary call.
Instead, we've chosen to use 6 of the free bits in sy_flags
on 64bit kernels
as a mask to indicate which arguments must be extended, and to honour those
flags in syscall_trap32
(the main system call trap), loadable_syscall
through which the first call of a loadable syscall, and all calls of an
unloadable syscall are vectored, and indir
the indirect system call
retained for binary compatibility with... something (I have been unable to
determine what)
Further changes¶
As part of this, syscall_trap32
will be adjusted to only load those
arguments which the sy_narg
field indicates are used to minimize performance
impact as much as we can. This has repercussions:
- Before a loadable system call is loaded it is declared to be of 0 arguments (to mean "any number of..."), this is somewhat baked into the system, so a call of legitimately no arguments (such as `getpid`) will unfortunately load all 6 potential arguments.
- The indirect system call,
indir
(SYS_syscall
) is exceptionally confusingly declared of 1 argument (the actual call), this must be changed to declare it as taking 6 arguments, as the number of arguments of the eventual call is unknowable. (Much like with loadable syscalls, above).
The current code reloads syscall arguments separately for both the normal case
and the pre_syscall case. We've adjusted syscall_trap32
such that we jump
back and use the same code to do the reloads, rather than doing the reloads
separately and jumping to just after.
Interactions¶
traptrace¶
traptrace records are created prior to any extension taking place. This
should not be a problem, as they are also created prior to arguments being
potentially reloaded.
DTrace¶
The DTrace syscall provider works by interposing in the sysent
andsysent32
tables and wrapping any system calls present. This means it will
see the extended values where previously it saw non-extended values.
The current semi-prototype code is truncating these arguments to the syscall
provider to maintain precise compatibility. ahl points out that that's not
strictly necessary, as the syscall provider's arguments are private
details:
ID PROVIDER MODULE FUNCTION NAME 188 syscall lseek entry Probe Description Attributes Identifier Names: Private Data Semantics: Private Dependency Class: ISA Argument Attributes Identifier Names: Private Data Semantics: Private Dependency Class: ISA Argument Types None
And further any realistic use of them should have included casts to the
appropriate type.
procfs
¶
When a process is stopped on system call entry (PCSENTRY
, libproc(4)'sPsysentry()
), procfs
is specified to stop after the system call has begun
but before the arguments have been retrieved from the lwp, such that the
controlling process may change the arguments to the system call.
This happily works out just fine, as we stop via pre_syscall
and do the
argument loading and extension after that has completed in that path throughsyscall_trap32
.
Related issues
Updated by Rich Lowe almost 11 years ago
- Status changed from New to In Progress
- % Done changed from 40 to 50
Updated by Rich Lowe almost 11 years ago
- Status changed from In Progress to Feedback
- Difficulty changed from Hard to Expert
Because of some implementation quirks, we're going to be doing this in the compilers for 4.4.4, at the least.
Updated by Rich Lowe over 10 years ago
The implementation quirk referenced above, but not described is this:
Given types such as intptr_t, most importantly used by ioctl, we have a type which must be sign extended (it is not unsigned), but where a 32bit pointer is always going to thus be corrupted (in that the extension now makes it incorrect).
This leaves us with a couple of options:
- Every user/kernel copy implementation (we have a surprising number), needs to learn that copies from a 32bit process should have the high bits of the address 0'd such that only user memory is referenced (currently, this does not occur)
- Every implementation of ioctl, where a pointer is expected and a process is 32bit, needs to be casting through caddr32_t to the same effect.
Both of these will then require extensive testing.
If anyone who cares about SPARC a lot wants to take this bug from me, email me, and I can provide:
- all the syscall trap changes, etc, including information on who code reviewed them, how they were very thoroughly tested, etc.
- A probably better description of the work needed to deal with the problem outlined in this comment
- A vague prototype of the changes to copyin/copyout and all of its friends, probably.
Updated by Rich Lowe almost 4 years ago
- Assignee deleted (
Rich Lowe) - Parent task deleted (
#1450)
Updated by Klaus Ziegler over 1 year ago
- Status changed from Feedback to In Progress
Updated by Toomas Soome over 1 year ago
- Related to Feature #14141: build utmpd as 64-bit binary added