Project

General

Profile

Actions

Bug #13286

closed

bhyve ins/outs emulation misuses %rax

Added by Patrick Mooney over 1 year ago. Updated over 1 year ago.

Status:
Closed
Priority:
Normal
Category:
bhyve
Start date:
Due date:
% Done:

100%

Estimated time:
Difficulty:
Medium
Tags:
bhyve
Gerrit CR:

Description

I was debugging a guest issue where a data transfer to an emulated device via repz insb appeared to be performing one fewer transfers than expected. In this specific case, it was expected to be copying an ID string "QEMU", but the copy was completing with "EMU\0". From the perspective of userspace, where this device was being emulated, the 4 inb transfers were occurring with the expected results. It was somewhere inside the kernel logic that the handling was improper. For ins/outs operations, the bhyve instruction emulation logic uses the non-str logic to perform the actual transfer:

        if (!in) {
                vm_copyin(vm, vcpuid, copyinfo, &vie->inout.eax, bytes);
        }

        err = vie_emulate_inout_port(vie, vm, vcpuid);

        if (err == 0 && in) {
                vm_copyout(vm, vcpuid, &vie->inout.eax, copyinfo, bytes);
        }

Where vie_emulate_inout_port will attempt to perform the access via in-kernel emulation, falling back to a transfer request (and fulfillment) via userspace if required. It's notable that it doesn't return or set the value of the transfer (eax) but expects it to be properly populated on entry/return. I was drawn to look closer at it after dtracing indicated that inout.eax was not being updated as expected after the userspace fulfillment:
        if (!in) {
                val = vie->inout.eax & mask;
        }

        if (vie->inout_req_state != VR_DONE) {
                err = vm_ioport_access(vm, vcpuid, in, vie->inout.port,
                    vie->inout.bytes, &val);
        } else {
                /*
                 * This port access was handled in userspace and the result was
                 * injected in to be handled now.
                 */
                val = vie->inout_req_val;
                vie->inout_req_state = VR_NONE;
                err = 0;
        }

        if (err == ESRCH) {
                vie->status |= VIES_PENDING_INOUT;
                vie->inout_req_state = VR_PENDING;
                return (err);
        } else if (err != 0) {
                return (err);
        }

        if (in) {
                val &= mask;
                val |= (vie->inout.eax & ~mask);
                err = vm_set_register(vm, vcpuid, VM_REG_GUEST_RAX, val);
                KASSERT(err == 0, ("emulate_ioport: error %d setting guest " 
                    "rax register", err));
        }

Looking specifically at the code-path for a successful in operation, the problem now becomes more clear. Because this is the non-str logic, it's simply setting rax directly with the result, leaving inout.eax alone. While that's correct for a simple non-str in/out, it's a problem for ins:
1. The transferred value is never placed in inout.eax, meaning it will not be copied out during the vie_emulate_inout_str. Instead, it's left in the guest rax, only to be placed in inout.eax by happenstance during the next round of emulation.
2. The ins instructions are not expected to manipulate %rax at all.

If the logic in vie_emulate_inout_port is to be reused in this fashion, it should be made to act on an explicit intermediate value, rather than storing to the guest %rax. This would correct the transfer result for ins, and avoid manipulating %rax unless the instruction calls for it.

Actions

Also available in: Atom PDF