Bug #13286
closedbhyve ins/outs emulation misuses %rax
100%
Description
I was debugging a guest issue where a data transfer to an emulated device via repz insb
appeared to be performing one fewer transfers than expected. In this specific case, it was expected to be copying an ID string "QEMU", but the copy was completing with "EMU\0". From the perspective of userspace, where this device was being emulated, the 4 inb
transfers were occurring with the expected results. It was somewhere inside the kernel logic that the handling was improper. For ins
/outs
operations, the bhyve instruction emulation logic uses the non-str logic to perform the actual transfer:
if (!in) { vm_copyin(vm, vcpuid, copyinfo, &vie->inout.eax, bytes); } err = vie_emulate_inout_port(vie, vm, vcpuid); if (err == 0 && in) { vm_copyout(vm, vcpuid, &vie->inout.eax, copyinfo, bytes); }
Where
vie_emulate_inout_port
will attempt to perform the access via in-kernel emulation, falling back to a transfer request (and fulfillment) via userspace if required. It's notable that it doesn't return or set the value of the transfer (eax
) but expects it to be properly populated on entry/return. I was drawn to look closer at it after dtracing indicated that inout.eax
was not being updated as expected after the userspace fulfillment:if (!in) { val = vie->inout.eax & mask; } if (vie->inout_req_state != VR_DONE) { err = vm_ioport_access(vm, vcpuid, in, vie->inout.port, vie->inout.bytes, &val); } else { /* * This port access was handled in userspace and the result was * injected in to be handled now. */ val = vie->inout_req_val; vie->inout_req_state = VR_NONE; err = 0; } if (err == ESRCH) { vie->status |= VIES_PENDING_INOUT; vie->inout_req_state = VR_PENDING; return (err); } else if (err != 0) { return (err); } if (in) { val &= mask; val |= (vie->inout.eax & ~mask); err = vm_set_register(vm, vcpuid, VM_REG_GUEST_RAX, val); KASSERT(err == 0, ("emulate_ioport: error %d setting guest " "rax register", err)); }
Looking specifically at the code-path for a successful in
operation, the problem now becomes more clear. Because this is the non-str logic, it's simply setting rax
directly with the result, leaving inout.eax
alone. While that's correct for a simple non-str in
/out
, it's a problem for ins
:
1. The transferred value is never placed in inout.eax
, meaning it will not be copied out during the vie_emulate_inout_str
. Instead, it's left in the guest rax
, only to be placed in inout.eax
by happenstance during the next round of emulation.
2. The ins
instructions are not expected to manipulate %rax
at all.
If the logic in vie_emulate_inout_port
is to be reused in this fashion, it should be made to act on an explicit intermediate value, rather than storing to the guest %rax
. This would correct the transfer result for ins
, and avoid manipulating %rax
unless the instruction calls for it.