Project

General

Profile

Actions

Bug #15328

open

FPU_CW_INIT mistreats reserved bit

Added by Patrick Mooney 21 days ago. Updated 14 days ago.

Status:
In Progress
Priority:
Normal
Category:
kernel
Start date:
Due date:
% Done:

0%

Estimated time:
Difficulty:
Medium
Tags:
Gerrit CR:
External Bug:

Description

As part of testing for unrelated FPU bits, it was found that bit 6 of the FPU Control Word (fcw) was being cleared in some situations. This could be seen by walking the xsave cache:

> ::walk xsave_cache | ::print struct fxsave_state fx_fcw ! sort | uniq -c
 305 fx_fcw = 0
 126 fx_fcw = 0x133f
  65 fx_fcw = 0x137f

The entries with fx_fcw are strange here, since they are clearly not empty, but are also missing bit 6. The Intel SDM (and AMD's equivalent) both list that bit as Reserved. One might jump to the conclusion that it must therefore be unset, but the default value(s) for the fcw tell a different story:

When the x87 FPU is initialized with either an FINIT/FNINIT or FSAVE/FNSAVE instruction, the x87 FPU control

word is set to 037FH...

The table for the power-up state (and post-reset) state for that register is 0x0040, with the bit also set.

If the bit is Reserved as set, then we should be preserving that value when manipulating the FPU or its save states. Some cursory investigation as to why the bit may have been cleared leads us to an old initialization constant:

uts/intel/sys/fp.h:

/*
 * Initial value of FPU control word as per 4th ed. ABI document
 * - affine infinity
 * - round to nearest or even
 * - 64-bit double precision
 * - all exceptions masked
 */
#define FPU_CW_INIT     0x133f

This constant (and one site where that same raw value is used) forms the basis for how we've inappropriately been clearing that bit in FPU save states.
While it is awfully nice that neither Intel nor AMD CPUs have thrown an error in the face of our fumbling of this reserved bit, we should probably fix it.

Actions #1

Updated by Patrick Mooney 21 days ago

As part of the research for this, especially in the confusion around bit 6 (and how its 1-value is not documented in a particularly verbose manner), it was noted that the bochs i386 emulator explicitly keeps that bit set when the fcw is manipulated. Like Intel and AMD, it just asserts the bit if missing, but makes no fuss about it.

Actions #2

Updated by Robert Mustacchi 21 days ago

To give some additional context here, I was working on #15254 and test cases. One of them basically dumped had a program set the fpu registers and get them when we hit a breakpoint we'd dump them via mdb. One notable thing about the test was that it had both a 32-bit and 64-bit variant and then we'd dump the xregs twice, once with ::tmodel lwp and ::tmodel thread set. At first, this provided consistent results. However, eventually I began to see the following maybe 5% of the time:

rm@iliad /tmp $ diff -u xsave/data/mdb_xregs.ymm.64 mdb_xregs.102372/fpregs.ymm.64.lwp
--- xsave/data/mdb_xregs.ymm.64 Tue Jan 17 17:32:53 2023
+++ mdb_xregs.102372/fpregs.ymm.64.lwp  Tue Jan 17 19:57:51 2023
@@ -53,7 +53,7 @@
 %xmm15 [127:0]   0x119004d1 119004d0 119004cf 119004ce

 387 and FP Control State
-cw     0x137f (IM|DM|ZM|OM|UM|PM|SIG64|RTN|A)
+cw     0x133f (IM|DM|ZM|OM|UM|PM|SIG64|RTN|A)
 sw     0x0000 (TOP=0t0) (0)
 xcp sw 0x0000 (0)

When I saw this it was disconcerting as I began to ask myself why was it changing and what was going on. I suspected the xregs implementation here. However, it was weird. It only seemed to happen for 64-bit processes and not 32-bit processes in the test I ran (which only had a single victim I wrote). From here I went through and started questioning a lot of different things and had a lot of different DTrace pieces that were all a lot more than one liners. The question I was trying to answer at first was when does this change. While I don't have the output from it, here are some of the different pieces:

pfexec dtrace -n 'fbt::fpu_proc_xregs_xsave_fill:entry/execname == "mdb" && strstr(curpsinof->pr_args, "64") != NULL/{ trace(curpsinfo->pr_args); }'

pfexec dtrace -n 'fbt::fpu_proc_xregs_xsave_fill:entry/execname == "mdb" && strstr(curpsinfo->pr_psargs, "64") != NULL/{ print(args[0]->fpu_regs.kfpu_u.kfpu_fx->fx_fcw); }' -n 'fbt::fpsetcw:entry/execname == "xregs_set.64"/{ printf("0x%x\n", arg0); }' -n 'fbt::fpu_signal_copyout:entry/execname == "xregs_set.64"/{ self->t = args[0]; }' -n 'fbt::fpu_signal_copyout:return/self->t/{ print(args[0]->fpu_regs.kfpu_u.kfpu_fx->fx_fcw); self->t = NULL; }' -n 'fbt::fpu_signal_copyin_xmm:entry/execname == "xregs_set.64"/{ print(args[1]->uc_mcontext.fpregs->fp_reg_set.fpchip_state.cw); }'

A particularly desperate case was when i started enabling all the syscalls to track when this would first change. There were a few that seemed deceptively plausible:

$ pfexec dtrace -n 'fbt::fpsetcw:entry/execname == "xregs_set.64"/{ self->t = 1; }' -n 'fbt::fp_save:entry/self->t/{ fpu = args[0]; self->fpu = args[0];  print(self->fpu->fpu_regs.kfpu_u.kfpu_fx->fx_fcw); }' -n 'fbt::fp_save:return/self->fpu/{ print(self->fpu->fpu_regs.kfpu_u.kfpu_fx->fx_fcw); }' -n 'fbt::fpsetcw:return/self->t/{ print(self->fpu->fpu_regs.kfpu_u.kfpu_fx->fx_fcw); self->sys = 1; }' -n 'syscall:::/fpu != NULL && fpu->fpu_regs.kfpu_u.kfpu_fx->fx_fcw == 0x137f/{ stack(); ustack(); trace(execname); exit(0); }'
dtrace: description 'fbt::fpsetcw:entry' matched 1 probe
dtrace: description 'fbt::fp_save:entry' matched 1 probe
dtrace: description 'fbt::fp_save:return' matched 1 probe
dtrace: description 'fbt::fpsetcw:return' matched 1 probe
dtrace: description 'syscall:::' matched 468 probes
CPU     ID                    FUNCTION:NAME
  0   7761                    fp_save:entry uint16_t 0x137f

  0   7762                   fp_save:return uint16_t 0x137f

  0   9528                   fpsetcw:return uint16_t 0x133f

 56    335                     pread:return 
              unix`sys_syscall+0x283

              libc.so.1`__pread+0xa
              libproc.so.1`Pstopstatus+0x3c4
              libproc.so.1`Pwait+0x12
              mdb`pt_setrun+0x16a
              mdb`pt_continue+0xd
              mdb`tgt_continue+0x51e
              mdb`tgt_request_continue+0x5b
              mdb`mdb_tgt_continue+0x1b
              mdb`cmd_cont_common+0x93
              mdb`cmd_cont+0x19
              mdb`cmd_run+0x3b
              mdb`dcmd_invoke+0x60
              mdb`mdb_call_idcmd+0x116
              mdb`mdb_call+0x3a1
              mdb`yyparse+0x76d
              mdb`mdb_run+0x308
              mdb`main+0xc11
              mdb`_start_crt+0x87
              mdb`_start+0x18
  mdb                              

But ultimately a bunch of these weren't leading anywhere as when I started getting other things that saw this like sshd. Looking back on it now, I suspect some of it.

Eventually I started taking a different tact. While I had realized this was a reserved bit (and searched everywhere I could to figure out what it might have meant, but I could not find a historical meaning unlike other reserved bits), I critically misread the CPU default of 37fh as 37h. So the next place I began going was to step through a bunch of the program to try and see if I could pinpoint when it changed. At various points I thought I had managed to get DTrace to inject enough noise that it didn't occur, but that seems not to have been accurate.

One of the more interesting discoveries was that this was set at the start of the process -- not at main, but in the CRT startup. This is part of what prompted the mdb -k bit that Patrick included above and I had started looking at on all manners of systems (AMD physical systems, Intel based VMs wih AVX, etc.). I stepped through at different parts looking at a combination of ::fpregs, using mdb -k and a ::pgrep xregs | ::walk thread | ::print kthread_t t_lwp->lwp_pcb.pcb_fpu.fpu_regs.kfpu_u.kfpu_fx->fx_fcw. It was not adding up. So the CRT changing to what we expected of 133f in main made some sense because of what fpstart did. However, I found that this was quickly reverted the moment we stepped over and resolved something in the PLT. I stepped through to an old friend from #9595 and found myself back in the amd64 boot_elf.s code for elf_rtbndr(). It's worth recalling that after #9595, we actually do an xsave and xrstor there! And it was at that xrstor (and due to single stepping) that I began to see the kernel save sate reflect the 0x137f value.

Now, this started leading to a lot more questions than answers (again because I misread the hardware defaults). The FPU control word actually weirdly will get inherited by a new LWP even though the initial state has it set to 0x133f. I took a VM and set a breakpoint on fp_new_lwp setting a breakpoint to see when the first time the copied state would have 0x137f set. Because of the earlier observation I had mistakenly believed this was 64-bit specific, but had never actually characterized the processes that do and don't have this set. Lo and behold, this turned up that the second time we hit this, it was a problem. And what process did that mean had this bad value set, but init!!

Thankfully, Patrick was able to put together the bit that I had been missing all this time: that we were actually clearing this reserved bit that defaulted to 1! That breakthrough was the missing link that I needed as all of the other debugging while a possibly interesting journey was not going anywhere good.

Actions #3

Updated by Electric Monk 14 days ago

  • Gerrit CR set to 2607
Actions

Also available in: Atom PDF