Bug #13645

bhyve queues wrong SIPI

Added by Patrick Mooney 5 months ago. Updated 4 months ago.

Status: Closed
Priority: Normal
Category: bhyve
% Done: 100%
Difficulty: Medium

Description

While testing propolis on a wider range of gear, I found that it refused to boot a guest with >1 vCPUs while running on Intel CPUs. Looking at the info printed to the qemu debug port (from the OVMF ROM) and at what was visible in mdb (while tracing the guest), the guest appeared to be waiting for an AP to start up during boot. Another user observed the guest halt with a triple-fault on a seemingly random AP. This led me to scrutinize the INIT/SIPI sequences. In the OVMF logs, we can see it select memory to use for the AP startup code:

AP Loop Mode is 1
WakeupBufferStart = 9F000, WakeupBufferSize = 1000
CpuMpPei: 5-Level Paging = 0
APIC MODE is 1
MpInitLib: Find 2 processors in system.

Where 0x9f000 is the buffer holding the startup code, resulting in a SIPI vector of 0x9f to cause the APs to jump there when they start. With dtrace we can see that behavior:

dtrace -n 'vcpu_vector_sipi:entry { trace(arg1); print(arg2) } vm_inject_init:entry { trace(arg1) } vm_inject_sipi:entry { trace(arg1); print(arg2) }'
dtrace: description 'vcpu_vector_sipi:entry ' matched 3 probes
CPU     ID                    FUNCTION:NAME
 19  68150             vm_inject_init:entry                 1
 19  68152             vm_inject_sipi:entry                 1int64_t 0x9f

  1  68467           vcpu_vector_sipi:entry                 1int64_t 0x9f

 19  68152             vm_inject_sipi:entry                 1int64_t 0x9f

The ROM takes several laps through this INIT/SIPI logic for various parts of startup. It's on the final sequence that we see something strange:

 19  68150             vm_inject_init:entry                 1
  1  68467           vcpu_vector_sipi:entry                 1int64_t 0x9f

 19  68152             vm_inject_sipi:entry                 1int64_t 0x87

 19  68152             vm_inject_sipi:entry                 1int64_t 0x87

The INIT IPI is sent first, as expected. Surprisingly, a SIPI with vector 0x9f is processed immediately after that. While that matches the vector from before, the log tells a different story:

AP Loop Mode is 1
GetMicrocodePatchInfoFromHob: MicrocodeBase = 0x0, MicrocodeSize = 0x0
WakeupBufferStart = 87000, WakeupBufferSize = 239
CpuDxe: 5-Level Paging = 0

The SIPI requests we see after the vcpu_vector_sipi call match that address, but why did we vector immediately into what looks like the old SIPI address? It turns out that the OVMF ROM issues double-SIPIs to start APs on Intel. That second SIPI is queued on the vCPU and not cleared during a subsequent INIT:

int
vm_inject_init(struct vm *vm, int vcpuid)
{
        struct vcpu *vcpu;

        if (vcpuid < 0 || vcpuid >= vm->maxcpus)
                return (EINVAL);

        vcpu = &vm->vcpu[vcpuid];
        vcpu_lock(vcpu);
        /* Note: a previously queued VRS_PEND_SIPI is left pending here */
        vcpu->run_state |= VRS_PEND_INIT;
        vcpu_notify_event_locked(vcpu, VCPU_NOTIFY_EXIT);
        vcpu_unlock(vcpu);
        return (0);
}

int
vm_inject_sipi(struct vm *vm, int vcpuid, uint8_t vector)
{
        struct vcpu *vcpu;

        if (vcpuid < 0 || vcpuid >= vm->maxcpus)
                return (EINVAL);

        vcpu = &vm->vcpu[vcpuid];
        vcpu_lock(vcpu);
        vcpu->run_state |= VRS_PEND_SIPI;
        vcpu->sipi_vector = vector;
        /* SIPI is only actionable if the CPU is waiting in INIT state */
        if ((vcpu->run_state & (VRS_INIT | VRS_RUN)) == VRS_INIT) {
                vcpu_notify_event_locked(vcpu, VCPU_NOTIFY_EXIT);
        }
        vcpu_unlock(vcpu);
        return (0);
}

That SIPI-queuing behavior was originally designed to avoid missing a SIPI sent while an INIT was still waiting to be processed by the VMM, but it is clearly not working here. It seems reasonable to clear any pending SIPI when an INIT is queued, so that a subsequent SIPI (with the then-correct vector) is required to start the AP.


Related issues

Related to illumos gate - Bug #13275: bhyve needs richer INIT/SIPI support (Closed, Patrick Mooney)

Actions #1

Updated by Patrick Mooney 5 months ago

  • Related to Bug #13275: bhyve needs richer INIT/SIPI support added
Actions #2

Updated by Electric Monk 5 months ago

  • Gerrit CR set to 1351
Actions #3

Updated by Patrick Mooney 4 months ago

This was tested on both AMD (to check for regressions) and Intel hardware. On AMD, there was no observed behavior change with either the bhyve-specific ROM (using c-bhyve) or the OVMF ROM (using Propolis). On Intel, the bhyve-specific ROM behaved the same, while the OVMF ROM was now able to boot successfully. The normal array of test guests was also able to boot under typical (c-bhyve with bhyve ROM) circumstances on both machines.

Actions #4

Updated by Electric Monk 4 months ago

  • Status changed from In Progress to Closed
  • % Done changed from 0 to 100

git commit 867228adfb0b4bbab1ff33e31ec607f5671c9047

commit  867228adfb0b4bbab1ff33e31ec607f5671c9047
Author: Patrick Mooney <pmooney@pfmooney.com>
Date:   2021-04-02T17:21:31.000Z

    13645 bhyve queues wrong SIPI
    Reviewed by: Yuri Pankov <yuripv@yuripv.dev>
    Reviewed by: Hans Rosenfeld <rosenfeld@grumpf.hope-2000.org>
    Approved by: Robert Mustacchi <rm@fingolfin.org>
