Project

General

Profile

Actions

Bug #13508

closed

door_layout() should align the stack to 16 bytes for i386 processes

Added by Matt Barden 6 months ago. Updated 5 months ago.

Status:
Closed
Priority:
Normal
Assignee:
Category:
kernel
Start date:
Due date:
% Done:

100%

Estimated time:
Difficulty:
Medium
Tags:
Gerrit CR:

Description

We recently came across an issue where a door server was crashing with a GP fault when attempting to perform an SSE instruction on a stack variable:

libfreebl3.so`gcm_HashMult_hw+0x19e: movaps %xmm3,0x20(%esp)
> $r ! grep esp
%kesp = 0x00000000
   %esp = 0xfd9f7ff8

We can see that the stack is not aligned to 16 bytes, and so the memory address is not 16-byte aligned, but, according to https://www.felixcloutier.com/x86/movaps, movaps requires memory locations to be 16-byte aligned.

Now, why is the stack unaligned, and to what should it be aligned? http://src.illumos.org/source/xref/illumos-gate/usr/src/lib/libc/i386/gen/makectxt.c#48 says that the IA32 ABI supplement 1.0 tells us the stack should be aligned to 16-bytes prior to any 'call' instruction, so that those functions may make use of SSE instructions without needing to re-align the stack. GCC realigns the stack at any 'entry' function (e.g. program start or thread start, or any function explicitly marked as requiring realignment), and then assumes that the stack is aligned on lower functions in the call stack, maintaining the existing alignment on further calls. By default, GCC aligns the stack to 16-bytes on both i386 and amd64.

So what's the stack aligned to on the various threads?

> 0xfd9f8090::stackregs ! tail -3
fd9fdcd0 lookup+0x123(fd9fdd48, b8, 0, 1)
fd9fdd20 switcher+0xfd(deadbeed, fd9fdd48, b8, 0, 0, 806e4e4)
00000000 libc.so.1`__door_return+0x4b()
> 6::findstack -v ! tail -3
  fd1fdcd8 lookup+0x180(fd1fdd50, b0, 0, 1)
  fd1fdd28 switcher+0xfd(deadbeed, fd1fdd50, b0, 0, 0, 806e4e4)
  00000000 libc.so.1`__door_return+0x4b()

%ebp (which is %esp on function entry + 8 (4-byte return address + 4-byte saved frame pointer) ends in 0x8 for all other threads in the system - including one other thread with the same stack head as ours. However, the crashing thread has an %ebp aligned to 16 - which means the stack is aligned to 8. The unaligned stack also has an arg_size 8 bytes larger than the aligned one; therefore, the door code isn't properly aligning the stack to 16.

This issue has previously been fixed in #6507 "i386 makecontext(3c) needs to 16-byte align the stack", which fixed it for makecontext, and "6881217 32bit stack frames should be aligned on 16-byte boundaries (for sse2 code)", which fixed it for thread create functions, but neither of these fixed this for door calls.

The door code does not begin execution in a new thread in the same manner that a process or thread does; instead, it determines the layout of the stack (on return from the door_return syscall) here: http://src.illumos.org/source/xref/illumos-gate/usr/src/uts/common/fs/doorfs/door_sys.c#1121

/*
 * Writes the stack layout for door_return() into the door_server_t of the
 * server thread.
 */
static int
door_layout(kthread_t *tp, size_t data_size, uint_t ndesc, int info_needed)
{
...
        size_t align = STACK_ALIGN;
        size_t results_sz = sizeof (struct door_results);
        model_t datamodel = lwp_getdatamodel(ttolwp(tp));        ASSERT(!st->d_layout_done);#ifndef _STACK_GROWS_DOWNWARD
#error stack does not grow downward, door_layout() must change
#endif#ifdef _SYSCALL32_IMPL
        if (datamodel != DATAMODEL_NATIVE) {
                align = STACK_ALIGN32;
                results_sz = sizeof (struct door_results32);
        }
#endif
...

        resultsp = P2ALIGN(infop - results_sz, align);

Once the door_results struct is written to the stack, the first four values in door_results double as the first four arguments to the door function (as %sp points to the beginning of the structure on return to userland). Since this is a 32-bit program, the door result gets 4-byte aligned (http://src.illumos.org/source/xref/illumos-gate/usr/src/uts/intel/ia32/sys/stack.h#64); it should instead align it to 16 bytes for i386 targets.

Attached is a test program to reproduce the issue. Compile with "/opt/gcc-4.4.4/bin/gcc -O2 -msse2 test.c -o test-unaligned". With GCC-4.4.4, the program crashes; with GCC7, the function with an __m128i stack variable gets its stack explicitly aligned to 16 bytes, but one can still verify that the stack is not aligned to 16 bytes prior to this.


Files

test.c (1.07 KB) test.c Reproduction program Matt Barden, 2021-03-12 05:31 PM
Actions #1

Updated by Gordon Ross 6 months ago

One thing I noticed while looking at this was that STACK_ALIGN
http://src.illumos.org/source/s?defs=STACK_ALIGN&project=illumos-gate
seems to be redefined most places it's used.
Should we just change it to 16 and stop redefining?

Actions #2

Updated by Matt Barden 6 months ago

  • File test.c added
  • File deleted (test.c)
Actions #3

Updated by Matt Barden 6 months ago

  • Description updated (diff)
Actions #4

Updated by Electric Monk 6 months ago

  • Gerrit CR set to 1229
Actions #6

Updated by Matt Barden 5 months ago

Actions #7

Updated by Dan McDonald 5 months ago

Additional testing:

I peformed "ikeadm dump" operations on various things (algorithms, rules, certificate-cache) which use doors between ikeadm(1M) and the closed-binary in.iked(1M). No noticeable behavior differences between pre-this-fix, and after-this-fix platform images I used on SmartOS.

Actions #8

Updated by Electric Monk 5 months ago

  • Status changed from New to Closed
  • % Done changed from 0 to 100

git commit 915894ef19890baaed00080f85f6b69e225cda98

commit  915894ef19890baaed00080f85f6b69e225cda98
Author: Matt Barden <mbarden@tintri.com>
Date:   2021-03-12T19:31:59.000Z

    13508 door_layout() should align the stack to 16 bytes for i386 processes
    Reviewed by: Robert Mustacchi <rm+illumos@fingolfin.org>
    Reviewed by: Andy Fiddaman <andy@omniosce.org>
    Reviewed by: Gordon Ross <gordon.ross@tintri.com>
    Approved by: Dan McDonald <danmcd@joyent.com>

Actions

Also available in: Atom PDF