Project

General

Profile

Bug #13508

Updated by Matt Barden 6 months ago

We recently came across an issue where a door server was crashing with a GP fault when attempting to perform an SSE instruction on a stack variable: 

 <pre> 
 libfreebl3.so`gcm_HashMult_hw+0x19e: movaps %xmm3,0x20(%esp) 
 > $r ! grep esp 
 %kesp = 0x00000000 
    %esp = 0xfd9f7ff8 
 </pre> 

 We can see that the stack is not aligned to 16 bytes, and so the memory address is not 16-byte aligned, but, according to https://www.felixcloutier.com/x86/movaps, movaps requires memory locations to be 16-byte aligned. 

 Now, why is the stack unaligned, and to what should it be aligned? http://src.illumos.org/source/xref/illumos-gate/usr/src/lib/libc/i386/gen/makectxt.c#48 says that the IA32 ABI supplement 1.0 tells us the stack should be aligned to 16-bytes prior to any 'call' instruction, so that those functions may make use of SSE instructions without needing to re-align the stack. GCC realigns the stack at any 'entry' function (e.g. program start or thread start, or any function explicitly marked as requiring realignment), and then assumes that the stack is aligned on lower functions in the call stack, maintaining the existing alignment on further calls. By default, GCC aligns the stack to 16-bytes on both i386 and amd64. 

 So what's the stack aligned to on the various threads? 

 <pre> 
 > 0xfd9f8090::stackregs ! tail -3 
 fd9fdcd0 lookup+0x123(fd9fdd48, b8, 0, 1) 
 fd9fdd20 switcher+0xfd(deadbeed, fd9fdd48, b8, 0, 0, 806e4e4) 
 00000000 libc.so.1`__door_return+0x4b() 
 > 6::findstack -v ! tail -3 
   fd1fdcd8 lookup+0x180(fd1fdd50, b0, 0, 1) 
   fd1fdd28 switcher+0xfd(deadbeed, fd1fdd50, b0, 0, 0, 806e4e4) 
   00000000 libc.so.1`__door_return+0x4b() 
 </pre> 

 %ebp (which is %esp on function entry + 8 (4-byte return address + 4-byte saved frame pointer) ends in 0x8 for all other threads in the system - including one other thread with the same stack head as ours. However, the crashing thread has an %ebp aligned to 16 - which means the stack is aligned to 8. The unaligned stack also has an arg_size 8 bytes larger than the aligned one; therefore, the door code isn't properly aligning the stack to 16. 

 This issue has previously been fixed in #6507 "i386 makecontext(3c) needs to 16-byte align the stack", which fixed it for makecontext, and "6881217 32bit stack frames should be aligned on 16-byte boundaries (for sse2 code)", which fixed it for thread create functions, but neither of these fixed this for door calls. 

 The door code does not begin execution in a new thread in the same manner that a process or thread does; instead, it determines the layout of the stack (on return from the door_return syscall) here:    http://src.illumos.org/source/xref/illumos-gate/usr/src/uts/common/fs/doorfs/door_sys.c#1121 

 <pre> 
 /* 
  * Writes the stack layout for door_return() into the door_server_t of the 
  * server thread. 
  */ 
 static int 
 door_layout(kthread_t *tp, size_t data_size, uint_t ndesc, int info_needed) 
 { 
 ... 
         size_t align = STACK_ALIGN; 
         size_t results_sz = sizeof (struct door_results); 
         model_t datamodel = lwp_getdatamodel(ttolwp(tp));          ASSERT(!st->d_layout_done);#ifndef _STACK_GROWS_DOWNWARD 
 #error stack does not grow downward, door_layout() must change 
 #endif#ifdef _SYSCALL32_IMPL 
         if (datamodel != DATAMODEL_NATIVE) { 
                 align = STACK_ALIGN32; 
                 results_sz = sizeof (struct door_results32); 
         } 
 #endif 
 ... 

         resultsp = P2ALIGN(infop - results_sz, align); 
 </pre> 

 Once the door_results struct is written to the stack, the first four values in door_results double as the first four arguments to the door function (as %sp points to the beginning of the structure on return to userland).    Since this is a 32-bit program, the door result gets 4-byte aligned (http://src.illumos.org/source/xref/illumos-gate/usr/src/uts/intel/ia32/sys/stack.h#64); it should instead align it to 16 bytes for i386 targets. 

 Attached is a test program to reproduce the issue. Compile with "gcc -O2 -msse2 test.c -o test-unaligned". With GCC-4.4.4, the program crashes; with GCC7, the function with an __m128i stack variable gets its stack explicitly aligned to 16 bytes, but one can still verify that the stack is not aligned to 16 bytes prior to this.

Back