Bug #13254
SEGV due to misaligned stack in Unwind_RaiseException
Status: Closed
% Done: 100%
Description
An OmniOS user reported that since upgrading from r151030 to r151034, the IBM TSM client now core dumps.
stack pointer for thread 1: fffffc7fffdfc0f8
[ fffffc7fffdfc0f8 libc.so.1`_Unw_Rollback_Registers+0xa0() ]
  fffffc7fffdfc178 libc.so.1`complete_context+0xaf()
  fffffc7fffdfc1b8 libc.so.1`finish_capture+0x7e()
  fffffc7fffdfc368 libc.so.1`_Unwind_RaiseException_Body+0x59()
  fffffc7fffdfc4e8 libc.so.1`_SUNW_Unwind_RaiseException+0x4d()
  fffffc7fffdfc500 libCrun.so.1`__1cG__CrunIex_throw6Fpvpkn0AQstatic_type_info_pF1_v_v_+0x14()
  fffffc7fffdfc820 libgsk8cms_64.so`__1cKGSKUtilityNparseOIDTable6FrnUGSKPasswordEncryptor_rknJGSKString__v_+0x2b5()
  fffffc7fffdfcca0 libgsk8cms_64.so`__1cKGSKUtilityNparseOIDTable6FrknJGSKString__knUGSKPasswordEncryptor__+0x6f()
  fffffc7fffdfdb40 0xfffffc7fddb25f1b()
  fffffc7fffdfdc20 libgsk8km_64.so`GSKKM_OpenKeyDbMode+0x83()
  fffffc7fffdfdc90 libgsk8km_64.so`GSKKM_OpenKeyDb+0x72()
  fffffc7fffdfe300 __1cIGSKKMlibOgskkmOpenKeyDb6MpkcrI_i_+0x1cc()
  fffffc7fffdfe3f0 __1cNGSKKeymanagerJopenKeyDb6M_i_+0x135()
  fffffc7fffdfe9c0 __1cGSess_oIsessInit6M_i_+0x532()
  fffffc7fffdfed40 __1cFLogon6FpnGSess_o__i_+0x2eb()
  fffffc7fffdfeda0 __1cMCheckSession6FpnGSess_o_nQsessLoadPolicy_t__i_+0x14a()
  fffffc7fffdffca0 __1cHdscInit6FippcnJcliType_t__i_+0x754()
  fffffc7fffdffce0 __1cHdscmain6Fippc_i_+0xaf()
  fffffc7fffdffd10 main+0x23()
  fffffc7fffdffd20 0x4f341b()
stack pointer for thread 2: fffffc7feeedee40
[ fffffc7feeedee40 libc.so.1`__sigtimedwait+0xa() ]
  fffffc7feeedee60 libc.so.1`sigwait+0x12()
  fffffc7feeedeea0 libc.so.1`__posix_sigwait+0x34()
  fffffc7feeedef00 __1cOpsWaitShutdown6Fpv_i_+0x28()
  fffffc7feeedef20 __1cPdscSignalThread6Fpv_i_+0x1e()
  fffffc7feeedef70 __1cMstartThread26FpnLThreadstart__pv_+0xf2()
  fffffc7feeedefb0 __1cLstartThread6FpnLThreadstart__pv_+0x51()
  fffffc7feeedefe0 libc.so.1`_thrp_setup+0x6c()
  fffffc7feeedeff0 libc.so.1`_lwp_start()
Investigation shows the crash is occurring in libc.so.1`_Unw_Rollback_Registers, on a movdqa instruction.
libc.so.1`_Unw_Rollback_Registers+0x77: movl %eax,0x18(%rsp)
libc.so.1`_Unw_Rollback_Registers+0x7b: movl 0x18(%r14),%eax
libc.so.1`_Unw_Rollback_Registers+0x7f: movl %eax,0x10(%rsp)
libc.so.1`_Unw_Rollback_Registers+0x83: movl 0x1c(%r14),%eax
libc.so.1`_Unw_Rollback_Registers+0x87: movq $0x0,(%rsp)
libc.so.1`_Unw_Rollback_Registers+0x8f: movl %eax,0x8(%rsp)
libc.so.1`_Unw_Rollback_Registers+0x93: movq (%r14),%rdi
libc.so.1`_Unw_Rollback_Registers+0x96: call +0x645 <libc.so.1`interpret_ops>
libc.so.1`_Unw_Rollback_Registers+0x9b: xorl %edx,%edx
libc.so.1`_Unw_Rollback_Registers+0x9d: nopl (%rax)
libc.so.1`_Unw_Rollback_Registers+0xa0: movdqa (%r12,%rdx),%xmm0 <-------------------------------
libc.so.1`_Unw_Rollback_Registers+0xa6: movaps %xmm0,(%rbx,%rdx)
libc.so.1`_Unw_Rollback_Registers+0xaa: addq $0x10,%rdx
libc.so.1`_Unw_Rollback_Registers+0xae: cmpq $0x120,%rdx
libc.so.1`_Unw_Rollback_Registers+0xb5: jne -0x17 <libc.so.1`_Unw_Rollback_Registers+0xa0>
libc.so.1`_Unw_Rollback_Registers+0xb7: movl 0x20(%r14),%ecx
libc.so.1`_Unw_Rollback_Registers+0xbb: movq 0x38(%r14),%rdx
libc.so.1`_Unw_Rollback_Registers+0xbf: movq %rbx,%r9
libc.so.1`_Unw_Rollback_Registers+0xc2: movq 0x30(%r14),%rsi
libc.so.1`_Unw_Rollback_Registers+0xc6: movq 0x28(%r14),%rdi
libc.so.1`_Unw_Rollback_Registers+0xca: movq 0x108(%r13),%r8
The instruction is working on %r12:
%rax = 0xfffffc7fdd4b31a0    %r8  = 0x0000000000000001
%rbx = 0xfffffc7fffdfbe88    %r9  = 0x000000000000001b
%rcx = 0xfffffc7fdf43b4ec    %r10 = 0xfffffc7fdd0a6628
%rdx = 0x0000000000000000    %r11 = 0xfffffc7fdf43b45c
%rsi = 0x0000000000000000    %r12 = 0xfffffc7fffdfbfa8
%rdi = 0xfffffc7fdd0a6628    %r13 = 0xfffffc7fffdfc388
                             %r14 = 0xfffffc7fffdfc118
                             %r15 = 0xfffffc7fdd81f620
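%r12 holds 0xfffffc7fffdfbfa8, a pointer that is 8-byte aligned but not 16-byte aligned. Since movdqa requires a 16-byte-aligned memory operand, the load faults and the process receives a SEGV. Looking at the C source which corresponds to this, %r12 points to a local stack variable, so its alignment depends on the stack pointer the function was entered with.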
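The alignment claim can be checked by hand from the register dump. The following is a minimal Python sketch of that arithmetic; the helper name is_aligned is made up for illustration, and the register values are copied from the dump above.

```python
# Check the operands of the faulting copy loop against the 16-byte
# alignment that movdqa/movaps require for memory operands.
# Register values are copied from the dump above; is_aligned is a
# hypothetical helper name used only for this illustration.

R12 = 0xfffffc7fffdfbfa8  # source pointer used by movdqa (%r12,%rdx)
RBX = 0xfffffc7fffdfbe88  # destination pointer used by movaps (%rbx,%rdx)
RDX = 0x0                 # loop offset at the time of the fault


def is_aligned(addr: int, boundary: int = 16) -> bool:
    """Return True when addr is a multiple of boundary (a power of two)."""
    return addr & (boundary - 1) == 0


src = R12 + RDX
dst = RBX + RDX

print(f"source      {src:#x}: offset {src & 0xf} within its 16-byte block")
print(f"destination {dst:#x}: offset {dst & 0xf} within its 16-byte block")
print("source 16-byte aligned:     ", is_aligned(src))
print("destination 16-byte aligned:", is_aligned(dst))
```

Both pointers sit 8 bytes past a 16-byte boundary, which is what one would expect if the whole stack frame is shifted by 8 bytes rather than a single variable being misplaced.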
From the stack trace above, the last function with an aligned stack is libCrun.so.1`__1cG__CrunIex_throw6Fpvpkn0AQstatic_type_info_pF1_v_v_, and disassembly of this function shows that it does not preserve stack alignment when calling into libc's _SUNW_Unwind_RaiseException: the pushq %rdi at +0xc leaves %rsp 8 bytes off a 16-byte boundary when the call at +0xf is made.
libCrun.so.1`__1cG__CrunIex_throw6Fpvpkn0AQstatic_type_info_pF1_v_v_: pushq %rbp
libCrun.so.1`__1cG__CrunIex_throw6Fpvpkn0AQstatic_type_info_pF1_v_v_+1: movq %rsp,%rbp
libCrun.so.1`__1cG__CrunIex_throw6Fpvpkn0AQstatic_type_info_pF1_v_v_+4: call -0x529 <libCrun.so.1`__ex_throw_body>
libCrun.so.1`__1cG__CrunIex_throw6Fpvpkn0AQstatic_type_info_pF1_v_v_+9: movq %rax,%rdi
libCrun.so.1`__1cG__CrunIex_throw6Fpvpkn0AQstatic_type_info_pF1_v_v_+0xc: pushq %rdi
libCrun.so.1`__1cG__CrunIex_throw6Fpvpkn0AQstatic_type_info_pF1_v_v_+0xd: xorl %eax,%eax
libCrun.so.1`__1cG__CrunIex_throw6Fpvpkn0AQstatic_type_info_pF1_v_v_+0xf: call -0x1f7c <PLT=libc.so.1`_SUNW_Unwind_RaiseException>
libCrun.so.1`__1cG__CrunIex_throw6Fpvpkn0AQstatic_type_info_pF1_v_v_+0x14: movq %rax,%rsi
libCrun.so.1`__1cG__CrunIex_throw6Fpvpkn0AQstatic_type_info_pF1_v_v_+0x17: popq %rdi
libCrun.so.1`__1cG__CrunIex_throw6Fpvpkn0AQstatic_type_info_pF1_v_v_+0x18: xorl %eax,%eax
libCrun.so.1`__1cG__CrunIex_throw6Fpvpkn0AQstatic_type_info_pF1_v_v_+0x1a: call -0x42f <libCrun.so.1`__1cU_ex_debug_handshake16FpnH__CimplGxstack_nT_Unwind_Reason_Code__v_>
libCrun.so.1`__1cG__CrunIex_throw6Fpvpkn0AQstatic_type_info_pF1_v_v_+0x1f: xorl %eax,%eax
libCrun.so.1`__1cG__CrunIex_throw6Fpvpkn0AQstatic_type_info_pF1_v_v_+0x21: call -0x227e <PLT=libCrun.so.1`__1cH__CimplMex_terminate6F_v_>
libCrun.so.1`__1cG__CrunIex_throw6Fpvpkn0AQstatic_type_info_pF1_v_v_+0x26: movq %rbp,%rsp
libCrun.so.1`__1cG__CrunIex_throw6Fpvpkn0AQstatic_type_info_pF1_v_v_+0x29: popq %rbp
libCrun.so.1`__1cG__CrunIex_throw6Fpvpkn0AQstatic_type_info_pF1_v_v_+0x2a: ret
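To make the misalignment concrete, the sketch below replays the stack pointer through the instructions above in Python. It starts from the frame pointer that the original stack trace reports for this function (fffffc7fffdfc500) and relies on the amd64 ABI rule that %rsp must be 16-byte aligned at the point of a call. It also assumes, since that prologue is not shown in this report, that the pre-fix _SUNW_Unwind_RaiseException begins with pushq %rbp; movq %rsp,%rbp. The helper names are made up for illustration.

```python
# Replay %rsp through the libCrun ex_throw disassembly above.
# Assumption (labeled): the pre-fix _SUNW_Unwind_RaiseException starts
# with "pushq %rbp; movq %rsp,%rbp"; its prologue is not shown here.

MASK64 = (1 << 64) - 1


def push(rsp: int) -> int:
    """pushq: the stack grows down by one 8-byte slot."""
    return (rsp - 8) & MASK64


def call(rsp: int) -> int:
    """call: pushes the 8-byte return address."""
    return push(rsp)


# Frame pointer of ex_throw, taken from the stack trace in the description.
ex_throw_rbp = 0xfffffc7fffdfc500

rsp = ex_throw_rbp                 # state after pushq %rbp; movq %rsp,%rbp
rsp_at_first_call = rsp            # +4: call __ex_throw_body
rsp = push(rsp)                    # +0xc: pushq %rdi
rsp_at_second_call = rsp           # +0xf: call _SUNW_Unwind_RaiseException
rsp = call(rsp)                    # return address pushed by the call
rsp = push(rsp)                    # callee's assumed pushq %rbp
callee_rbp = rsp                   # callee's assumed movq %rsp,%rbp

print(f"rsp at call __ex_throw_body:              {rsp_at_first_call:#x}"
      f" (mod 16 = {rsp_at_first_call % 16})")
print(f"rsp at call _SUNW_Unwind_RaiseException:  {rsp_at_second_call:#x}"
      f" (mod 16 = {rsp_at_second_call % 16})")
print(f"frame pointer in _SUNW_Unwind_RaiseException: {callee_rbp:#x}")
```

Under those assumptions the computed callee frame pointer is fffffc7fffdfc4e8, which agrees with the address shown for _SUNW_Unwind_RaiseException in the stack trace, and every frame below it inherits the 8-byte shift.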
Updated by Andy Fiddaman over 1 year ago
It turns out that this worked on OmniOS r151030 because the primary compiler for that release is gcc4. gcc4 emits the struct copy instructions as:
_Unw_Rollback_Registers+0xb0: movq (%rbx,%rdx),%rcx
_Unw_Rollback_Registers+0xb4: movq %rcx,(%r12,%rdx)
_Unw_Rollback_Registers+0xb8: movq 0x8(%rbx,%rdx),%rcx
_Unw_Rollback_Registers+0xbd: movq %rcx,0x8(%r12,%rdx)
Versus gcc7's:
_Unw_Rollback_Registers+0xa0: movdqa (%r12,%rdx),%xmm0
_Unw_Rollback_Registers+0xa6: movaps %xmm0,(%rbx,%rdx)
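The 8-byte movq copies have no alignment requirement, whereas movdqa and movaps fault on a memory operand that is not 16-byte aligned. Therefore a misaligned stack did not cause a crash on the previous version.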
Updated by Andy Fiddaman over 1 year ago
Testing this change shows that the new wrapper function, _SUNW_Unwind_RaiseException, is being called and that it re-aligns the stack before calling the backend, __Unwind_RaiseException_Backend. The previously failing closed-source application now works.
root@lipkg:~# mdb `which dsmc`
>
> ::bp complete_context
> ::run
IBM Spectrum Protect
Command Line Backup-Archive Client Interface
Client Version 8, Release 1, Level 10.0
Client date/time: 10/27/20 22:03:26
(c) Copyright by IBM Corporation and other(s) 1990, 2020. All Rights Reserved.
Node Name: LIPKG
mdb: stop at complete_context
mdb: target stopped at:
libc.so.1`complete_context: pushq %rbp
mdb: You've got symbols!
Loading modules: [ ld.so.1 libc.so.1 ]
> $C
fffffc7fffdf15b0 libc.so.1`complete_context()
fffffc7fffdf1760 libc.so.1`_Unwind_RaiseException_Body+0x59(fffffc7fcd130d80, fffffc7fffdf1780, 1)
fffffc7fffdf18e0 libc.so.1`__Unwind_RaiseException_Backend+0x4d(fffffc7fcd130d80)
fffffc7fffdf18f8 libc.so.1`_SUNW_Unwind_RaiseException+0xd()
fffffc7fffdf1910 libCrun.so.1`__1cG__CrunIex_throw6Fpvpkn0AQstatic_type_info_pF1_v_v_+0x14()
fffffc7fffdfa5d0 libgsk8cms_64.so`__1cPGSKPemDataStoreSGSKPemDataStore_is6MrknUGSKPasswordEncryptor_rnDstdNbasic_istream4Ccn0CLchar_traits4Cc____pkc_v_+0x3e25()
fffffc7fffdfc030 libgsk8cms_64.so`__1cPGSKPemDataStore2t5B6MrknUGSKPasswordEncryptor_pkc5bnNGSKCountedPtr4CknWGSKKRYAlgorithmFactory____v_+0x26a()
fffffc7fffdfcbf0 libgsk8cms_64.so`__1cKGSKUtilityNopenDataStore6FrknUGSKPasswordEncryptor_pkcbnNGSKCountedPtr4CknWGSKKRYAlgorithmFactory____pnMGSKDataStore__+0x6c1()
fffffc7fffdfcca0 libgsk8cms_64.so`__1cKGSKUtilityNopenDataStore6FrknUGSKPasswordEncryptor_pkcb_pnMGSKDataStore__+0xd9()
fffffc7fffdfdb40 0xfffffc7fcd1e6333()
fffffc7fffdfdc20 libgsk8km_64.so`GSKKM_OpenKeyDbMode+0x83()
fffffc7fffdfdc90 libgsk8km_64.so`GSKKM_OpenKeyDb+0x72()
fffffc7fffdfe300 __1cIGSKKMlibOgskkmOpenKeyDb6MpkcrI_i_+0x1cc()
fffffc7fffdfe3f0 __1cNGSKKeymanagerJopenKeyDb6M_i_+0x135()
fffffc7fffdfe9c0 __1cGSess_oIsessInit6M_i_+0x532()
fffffc7fffdfed40 __1cFLogon6FpnGSess_o__i_+0x2eb()
fffffc7fffdfeda0 __1cMCheckSession6FpnGSess_o_nQsessLoadPolicy_t__i_+0x14a()
fffffc7fffdffca0 __1cHdscInit6FippcnJcliType_t__i_+0x754()
fffffc7fffdffce0 __1cHdscmain6Fippc_i_+0xaf()
fffffc7fffdffd10 main+0x23()
fffffc7fffdffd20 0x4f341b()
> _SUNW_Unwind_RaiseException::dis
libc.so.1`_SUNW_Unwind_RaiseException: pushq %rbp
libc.so.1`_SUNW_Unwind_RaiseException+1: movq %rsp,%rbp
libc.so.1`_SUNW_Unwind_RaiseException+4: andq $0xfffffffffffffff0,%rsp
libc.so.1`_SUNW_Unwind_RaiseException+8: call -0x539 <libc.so.1`__Unwind_RaiseException_Backend>
libc.so.1`_SUNW_Unwind_RaiseException+0xd: leave
libc.so.1`_SUNW_Unwind_RaiseException+0xe: ret
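The effect of the wrapper's andq can be checked against the $C output above. The Python sketch below starts from the frame pointer reported for _SUNW_Unwind_RaiseException (fffffc7fffdf18f8), applies the mask from the disassembly, and then follows the call into the backend. It assumes, since the backend's prologue is not shown here, that __Unwind_RaiseException_Backend begins with pushq %rbp; movq %rsp,%rbp.

```python
# Replay %rsp through the new _SUNW_Unwind_RaiseException wrapper.
# Assumption (labeled): __Unwind_RaiseException_Backend starts with
# "pushq %rbp; movq %rsp,%rbp"; its prologue is not shown in this report.

ALIGN_MASK = 0xfffffffffffffff0   # immediate operand of the wrapper's andq

# Frame pointer of the wrapper, taken from the $C output above.
wrapper_rbp = 0xfffffc7fffdf18f8

rsp = wrapper_rbp                 # after pushq %rbp; movq %rsp,%rbp
rsp_before_andq = rsp
rsp &= ALIGN_MASK                 # +4: andq $0xfffffffffffffff0,%rsp
rsp_at_call = rsp                 # +8: call __Unwind_RaiseException_Backend
rsp -= 8                          # return address pushed by the call
rsp -= 8                          # backend's assumed pushq %rbp
backend_rbp = rsp                 # backend's assumed movq %rsp,%rbp

print(f"rsp before andq: {rsp_before_andq:#x} (mod 16 = {rsp_before_andq % 16})")
print(f"rsp at the call: {rsp_at_call:#x} (mod 16 = {rsp_at_call % 16})")
print(f"backend frame pointer: {backend_rbp:#x} (mod 16 = {backend_rbp % 16})")
```

Under that assumption the backend's frame pointer comes out as fffffc7fffdf18e0, matching the $C output, and it is 16-byte aligned even though the caller in libCrun still arrives with a stack that is 8 bytes off. The leave instruction then restores %rsp from %rbp, so the caller's stack is unaffected by the realignment.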
Updated by Electric Monk over 1 year ago
- Status changed from In Progress to Closed
- % Done changed from 0 to 100
git commit 1ba82a13d050678604bdd3fae596131469ca10e6
commit 1ba82a13d050678604bdd3fae596131469ca10e6
Author: Andy Fiddaman <omnios@citrus-it.co.uk>
Date: 2020-10-29T11:52:00.000Z

    13254 SEGV due to misaligned stack in Unwind_RaiseException
    Reviewed by: Robert Mustacchi <rm@fingolfin.org>
    Reviewed by: Toomas Soome <tsoome@me.com>
    Reviewed by: Jason King <jason.king@joyent.com>
    Approved by: Dan McDonald <danmcd@joyent.com>