VMX event injection can race in bhyve
As part of porting bhyve to illumos, all of its mutex usage was switched to using adaptive mutexes rather than spinlocks. The latter were used in FreeBSD due to OS constraints around sleeping inside
critical_exit() regions. No such limitation was present in illumos, primarily due to the fact that we had to use ctxops as extra protection inside those regions (implemented with
kpreempt_enable instead), since we could not guarantee an absence of operations causing lock-related sleep/swtch activity. As a result of the locks being converted to adaptive, VMX exception/interrupt injection was moved from a region where interrupts were disabled, to one where they remained enabled. This is because certain exception/interrupt code paths required lock activity to completed. The FreeBSD code was depending on the host vCPU doing this injection to have interrupts disabled, since external notifications (IPIs) for new events would be queued, causing an immediate VM exit once it attempted entry into guest context. Any injection done outside of this interrupts-disabled guard was at risk of taking the notification, processing the effectively-noop IPI, and then continuing into VM entry without knowing an event was pending. The delay on this event would then depend on how long the CPU spent in guest context before reevaluating the events available for injections.
This has not been a reported (in the wild) problem so far, probably due to the relative prevalence of APICv support among the machines running illumos-bhyve. That's because APICv event injection is performed without locks, and is therefore done after host CPU interrupts are disabled, safe from the potential race. Only events such as NMIs or ExtINT (source from the AT-PIC or IOAPIC) would be caught by this race, given the current codebase. Certain restructuring for #13007 could have exacerbated it by changing how/when APIC interrupts were injected.
The fix for this, while perhaps a bit involved, should be relatively straightforward. The existing NMI/ExtINT event injection can remain outside the protection of having interrupts disabled on the host CPU. Once interrupts are disabled, however, we must re-check for NMI/ExtINT events which may have fallen into that race window. If found, we can take another trip through the event injection logic.
Updated by Patrick Mooney about 2 years ago
As noted in the description, instances of this race causing a problem have not been observed in the wild or during testing, for that matter. It was discovered while overhauling the event injection system for #13007. Testing to verify that injection still function as before is covered under that ticket.
Updated by Electric Monk about 2 years ago
- Status changed from In Progress to Closed
- % Done changed from 0 to 100
commit c74a40a584c9d875009f725565896fd7e8ee38d6 Author: Patrick Mooney <firstname.lastname@example.org> Date: 2020-11-24T20:30:25.000Z 13007 bhyve vlapic should set TMR on intr accept 13106 clarify PPR transitions in bhyve vLAPIC 13132 VMX event injection can race in bhyve 13259 SVM event injection can race in bhyve Reviewed by: Robert Mustacchi <email@example.com> Reviewed by: Toomas Soome <firstname.lastname@example.org> Approved by: Dan McDonald <email@example.com>