Bug #3463
closed
agent lwp clobbers amd64 abi stack redzone
100%
Description
NB: this bug mirrors Joyent's OS-1804.
Overview
Using libproc, one process may compel another to run some code. By way of example, let us consider pr_getrctl. This function allows us to issue the getrctl system call in the context of the subject (victim) process.
pr_getrctl calls into Psyscall, which in turn calls Pcreate_agent. Here, we use the proc(4) filesystem to stop the subject process and request the creation of an "agent" LWP. This agent LWP is created with its registers set to the same values as the "Representative LWP", chosen (at random?) from the set of LWPs in the subject process.
At this point we cross from user mode into the kernel and procfs handles the request to create the agent LWP. In pr_agent, we create an LWP and copy in the register values we were handed by libproc:
    static int
    pr_control(long cmd, arg_t *argp, prnode_t *pnp, cred_t *cr)
    {
    ...
        case PCAGENT:   /* create the /proc agent lwp in the target process */
            error = pr_agent(pnp, argp->prgregset, &unlocked);
    ...
    static int
    pr_agent(prnode_t *pnp, prgregset_t prgregset, int *unlocked)
    {
    ...
        clwp = lwp_create(lwp_rtt, NULL, 0, p, TS_STOPPED,
            t->t_pri, &smask, NOCLASS, 0);
        if (clwp == NULL) {
            mutex_enter(&p->p_lock);
            return (ENOMEM);
        }
        prsetprregs(clwp, prgregset, 1);
    ...
Note that prsetprregs calls setgregs, which simply copies the new value for %rsp into place:
    void
    setgregs(ulwp_t *ulwp, gregset_t rs)
    {
    ...
        status.pr_reg[REG_RSP] = rs[REG_RSP];
Later, when Psyscall is arranging for the injected system call, it configures a stack based at the same %rsp:
        if (model == PR_MODEL_LP64) {
            sp = P->status.pr_lwp.pr_reg[R_SP] + STACK_BIAS;
            sp = PSTACK_ALIGN64(sp);
    ...
        ap = Psyscall_setup(P, nargs, sysindex, sp);
The code path becomes more and more exciting from here on, but we've already made the critical error of using the red zone for our system call arguments.
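The arithmetic behind this error can be sketched in a few lines of C. This is a minimal illustration and not the libproc source: the names RED_ZONE_SIZE, align_down_16, naive_agent_sp and redzone_safe_agent_sp are hypothetical, and the sketch assumes that PSTACK_ALIGN64 rounds down to a 16-byte boundary and that STACK_BIAS is zero on amd64. The only point is that the agent's stack has to begin at least 128 bytes below the interrupted %rsp.

```c
#include <stdint.h>

/* Size of the amd64 ABI red zone below %rsp (128 bytes, as noted above). */
#define	RED_ZONE_SIZE	128

/* Hypothetical helper: round an address down to a 16-byte boundary. */
uint64_t
align_down_16(uint64_t addr)
{
	return (addr & ~(uint64_t)15);
}

/* What the buggy path amounts to: align %rsp and build the stack there. */
uint64_t
naive_agent_sp(uint64_t rsp)
{
	return (align_down_16(rsp));
}

/* Red-zone-aware variant (an assumption): skip the red zone, then align. */
uint64_t
redzone_safe_agent_sp(uint64_t rsp)
{
	return (align_down_16(rsp - RED_ZONE_SIZE));
}
```

For example, with %rsp at 0x1008 the naive base is 0x1000, so anything pushed there lands directly on the interrupted function's red zone, while the red-zone-aware base is 0xf80.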
Issue Reproduction
This behaviour is easily reproducible with the following C:
    #include <stdlib.h>
    #include <stdio.h>
    #include <sys/int_types.h>

    void
    leafroutine(uint64_t arg0)
    {
        /* Sentinel; the compiler places it in the red zone (see below). */
        volatile unsigned int i = 0xaaf0cf0c;

        /* Spin until something else overwrites the sentinel. */
        for (; i == 0xaaf0cf0c; );
        printf("and out -- %u\n", i);
    }

    int
    main(int argc, char **argv)
    {
        leafroutine(0x1111111122222222);
    }
Compile thus:
gcc -O2 -mno-omit-leaf-frame-pointer -m64 -o awesome awesome.c
Visually inspect the disassembly to ensure the code tickles the bug:
    $ dis -F leafroutine awesome
    disassembly for awesome

    leafroutine()
        leafroutine:      55                 pushq  %rbp
        leafroutine+0x1:  48 89 e5           movq   %rsp,%rbp
        leafroutine+0x4:  c7 45 fc 0c cf f0  movl   $0xaaf0cf0c,-0x4(%rbp)
                          aa
        ...
(Note that the function does not advance the stack pointer; it relies instead on the 128-byte red zone below %rsp, as described in the ABI.)
We can easily trigger the problem thus:
prctl -n zone.max-physical-memory $(pgrep awesome)
If we pause the process after it breaks out of the for loop, we can inspect the stack...
    > $C
    fffffd7fffdff9a0 leafroutine+0x23()
    fffffd7fffdff9b0 main+0x13()
    fffffd7fffdff9c0 _start+0x6c()
    > fffffd7fffdff9a0-20,20::dump
                       \/ 1 2 3  4 5 6 7  8 9 a b  c d e f  v123456789abcdef
    fffffd7fffdff980:  00000000 0000007a 6f6e652e 6d61782d  .......zone.max-
    fffffd7fffdff990:  0a933aff 7ffdffff 2d6d656d 6f727900  ..:.....-memory.
...and there we see not the sentinel value that we expect, but rather corruption on the stack, evidently left behind by the induced getrctl system call.
Updated by Joshua M. Clulow almost 11 years ago
Webrev for fix is here: http://cr.illumos.org/~webrev/jclulow/3463/
Replacing /usr/lib/amd64/libproc.so.1 (via lofs(7FS)) with the fixed version allows correct operation of the reproduction code in the issue description. If we inspect the repro program after trying to tickle the bug, we see it still running:
    > $C
    fffffd7fffdff9a0 leafroutine+0x10()
    fffffd7fffdff9b0 main+0x13()
    fffffd7fffdff9c0 _start+0x6c()
And the sentinel value is on the stack where we left it:
    > leafroutine+0x10::dis
    leafroutine:        pushq  %rbp
    leafroutine+1:      movq   %rsp,%rbp
    leafroutine+4:      movl   $0xaaf0cf0c,-0x4(%rbp)
    > fffffd7fffdff9a0-4/X
    0xfffffd7fffdff99c: aaf0cf0c
And 128 bytes down the stack -- past the red zone -- is the agent LWP stack data that would otherwise have become our corruption:
    > fffffd7fffdff9a0-a0,a0::dump
                       \/ 1 2 3  4 5 6 7  8 9 a b  c d e f  v123456789abcdef
    fffffd7fffdff900:  00000000 0000007a 6f6e652e 6d61782d  .......zone.max-
    fffffd7fffdff910:  0a933aff 7ffdffff 2d6d656d 6f727900  ..:.....-memory.
    fffffd7fffdff920:  00000000 00000000 00000000 00000000  ................
    fffffd7fffdff930:  a00e4000 00000000 808233ff 7ffdffff  ..@.......3.....
    fffffd7fffdff940:  00000000 00000000 00000000 00000000  ................
    fffffd7fffdff950:  005a33ff 7ffdffff a00e4000 00000000  .Z3.......@.....
    fffffd7fffdff960:  00000000 00000000 00000000 00000000  ................
    fffffd7fffdff970:  b0f9dfff 7ffdffff 48a13fff 7ffdffff  ........H.?.....
    fffffd7fffdff980:  90f9dfff 7ffdffff 9e0e4000 00000000  ..........@.....
    fffffd7fffdff990:  c0f9dfff 7ffdffff f30c4000 0ccff0aa  ..........@.....
    fffffd7fffdff9a0:  b0f9dfff 7ffdffff 430e4000 00000000  ........C.@.....
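The two dumps in this report can be cross-checked with a little interval arithmetic. The sketch below is illustrative only: the helper overlaps_red_zone and the constant names are hypothetical, and the 32-byte extent of the agent's data is simply what the two dump rows show. It takes %rsp to be fffffd7fffdff9a0 (the frame pointer of leafroutine, which never moves %rsp after `movq %rsp,%rbp`) and asks whether a written range intersects the 128 bytes below it.

```c
#include <stdbool.h>
#include <stdint.h>

#define	AMD64_RED_ZONE	128	/* bytes below %rsp reserved by the ABI */

/*
 * Hypothetical helper: does the half-open range [start, start + len)
 * intersect the red zone [rsp - 128, rsp)?
 */
bool
overlaps_red_zone(uint64_t rsp, uint64_t start, uint64_t len)
{
	uint64_t zone_lo = rsp - AMD64_RED_ZONE;

	return (start < rsp && start + len > zone_lo);
}

/* Addresses taken from the mdb output in this report. */
const uint64_t leaf_rsp = 0xfffffd7fffdff9a0ULL;
const uint64_t buggy_data = 0xfffffd7fffdff980ULL;	/* before the fix */
const uint64_t fixed_data = 0xfffffd7fffdff900ULL;	/* after the fix */
const uint64_t data_len = 0x20;				/* two dump rows */
```

Here overlaps_red_zone(leaf_rsp, buggy_data, data_len) is true, because that range ends exactly at %rsp and so covers the sentinel at fffffd7fffdff99c. The same call with fixed_data is false, because that range ends exactly where the red zone begins.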
Updated by Robert Mustacchi almost 11 years ago
- Status changed from New to Resolved
- % Done changed from 90 to 100
- Tags deleted (needs-triage)
Resolved in f6dcd367bb2ab63785ea4839111eb4846066404d.