Bug #13508
closeddoor_layout() should align the stack to 16 bytes for i386 processes
100%
Description
We recently came across an issue where a door server was crashing with a GP fault when attempting to perform an SSE instruction on a stack variable:
libfreebl3.so`gcm_HashMult_hw+0x19e: movaps %xmm3,0x20(%esp) > $r ! grep esp %kesp = 0x00000000 %esp = 0xfd9f7ff8
We can see that the stack is not aligned to 16 bytes, and so the memory address is not 16-byte aligned, but, according to https://www.felixcloutier.com/x86/movaps, movaps requires memory locations to be 16-byte aligned.
Now, why is the stack unaligned, and to what should it be aligned? http://src.illumos.org/source/xref/illumos-gate/usr/src/lib/libc/i386/gen/makectxt.c#48 says that the IA32 ABI supplement 1.0 tells us the stack should be aligned to 16-bytes prior to any 'call' instruction, so that those functions may make use of SSE instructions without needing to re-align the stack. GCC realigns the stack at any 'entry' function (e.g. program start or thread start, or any function explicitly marked as requiring realignment), and then assumes that the stack is aligned on lower functions in the call stack, maintaining the existing alignment on further calls. By default, GCC aligns the stack to 16-bytes on both i386 and amd64.
So what's the stack aligned to on the various threads?
> 0xfd9f8090::stackregs ! tail -3 fd9fdcd0 lookup+0x123(fd9fdd48, b8, 0, 1) fd9fdd20 switcher+0xfd(deadbeed, fd9fdd48, b8, 0, 0, 806e4e4) 00000000 libc.so.1`__door_return+0x4b() > 6::findstack -v ! tail -3 fd1fdcd8 lookup+0x180(fd1fdd50, b0, 0, 1) fd1fdd28 switcher+0xfd(deadbeed, fd1fdd50, b0, 0, 0, 806e4e4) 00000000 libc.so.1`__door_return+0x4b()
%ebp (which is %esp on function entry + 8 (4-byte return address + 4-byte saved frame pointer) ends in 0x8 for all other threads in the system - including one other thread with the same stack head as ours. However, the crashing thread has an %ebp aligned to 16 - which means the stack is aligned to 8. The unaligned stack also has an arg_size 8 bytes larger than the aligned one; therefore, the door code isn't properly aligning the stack to 16.
This issue has previously been fixed in #6507 "i386 makecontext(3c) needs to 16-byte align the stack", which fixed it for makecontext, and "6881217 32bit stack frames should be aligned on 16-byte boundaries (for sse2 code)", which fixed it for thread create functions, but neither of these fixed this for door calls.
The door code does not begin execution in a new thread in the same manner that a process or thread does; instead, it determines the layout of the stack (on return from the door_return syscall) here: http://src.illumos.org/source/xref/illumos-gate/usr/src/uts/common/fs/doorfs/door_sys.c#1121
/* * Writes the stack layout for door_return() into the door_server_t of the * server thread. */ static int door_layout(kthread_t *tp, size_t data_size, uint_t ndesc, int info_needed) { ... size_t align = STACK_ALIGN; size_t results_sz = sizeof (struct door_results); model_t datamodel = lwp_getdatamodel(ttolwp(tp)); ASSERT(!st->d_layout_done);#ifndef _STACK_GROWS_DOWNWARD #error stack does not grow downward, door_layout() must change #endif#ifdef _SYSCALL32_IMPL if (datamodel != DATAMODEL_NATIVE) { align = STACK_ALIGN32; results_sz = sizeof (struct door_results32); } #endif ... resultsp = P2ALIGN(infop - results_sz, align);
Once the door_results struct is written to the stack, the first four values in door_results double as the first four arguments to the door function (as %sp points to the beginning of the structure on return to userland). Since this is a 32-bit program, the door result gets 4-byte aligned (http://src.illumos.org/source/xref/illumos-gate/usr/src/uts/intel/ia32/sys/stack.h#64); it should instead align it to 16 bytes for i386 targets.
Attached is a test program to reproduce the issue. Compile with "/opt/gcc-4.4.4/bin/gcc -O2 -msse2 test.c -o test-unaligned". With GCC-4.4.4, the program crashes; with GCC7, the function with an __m128i stack variable gets its stack explicitly aligned to 16 bytes, but one can still verify that the stack is not aligned to 16 bytes prior to this.
Files
Updated by Gordon Ross over 2 years ago
One thing I noticed while looking at this was that STACK_ALIGN
http://src.illumos.org/source/s?defs=STACK_ALIGN&project=illumos-gate
seems to be redefined most places it's used.
Should we just change it to 16 and stop redefining?
Updated by Joshua M. Clulow over 2 years ago
Updated by Matt Barden about 2 years ago
Updated by Dan McDonald about 2 years ago
Additional testing:
I peformed "ikeadm dump" operations on various things (algorithms, rules, certificate-cache) which use doors between ikeadm(1M) and the closed-binary in.iked(1M). No noticeable behavior differences between pre-this-fix, and after-this-fix platform images I used on SmartOS.
Updated by Electric Monk about 2 years ago
- Status changed from New to Closed
- % Done changed from 0 to 100
git commit 915894ef19890baaed00080f85f6b69e225cda98
commit 915894ef19890baaed00080f85f6b69e225cda98 Author: Matt Barden <mbarden@tintri.com> Date: 2021-03-12T19:31:59.000Z 13508 door_layout() should align the stack to 16 bytes for i386 processes Reviewed by: Robert Mustacchi <rm+illumos@fingolfin.org> Reviewed by: Andy Fiddaman <andy@omniosce.org> Reviewed by: Gordon Ross <gordon.ross@tintri.com> Approved by: Dan McDonald <danmcd@joyent.com>