Project

General

Profile

Actions

Bug #16393

closed

bhyve should take more care with MSR_AMD_TSC_RATIO

Added by Patrick Mooney 3 months ago. Updated 3 months ago.

Status:
Closed
Priority:
Normal
Category:
bhyve
Start date:
Due date:
% Done:

100%

Estimated time:
Difficulty:
Medium
Tags:
Gerrit CR:
External Bug:

Description

A SmartOS user reported bhyve panicking the host when run on an older AMD machine. They noted that the issue began when updating to a PI with the #15467 change included. Dan gathered the stack trace from the dump:

> $C
fffffe0010a90760 svm_msr_guest_exit+0x74(fffffe0c049a9000, 1)
fffffe0010a90800 svm_vmrun+0x353(fffffe0c049a9000, 1, fff0)
fffffe0010a90880 vm_run+0x17c(fffffe0bd3d29000, 1, fffffe0010a909c0)
fffffe0010a90c40 vmmdev_do_ioctl+0xb03(fffffe0c02346380, 767001, fffffc7fee7ffe20, 202003, fffffe0c026b4198, fffffe0010a90e18)
fffffe0010a90cc0 vmm_ioctl+0x12f(13100000001, 767001, fffffc7fee7ffe20, 202003, fffffe0c026b4198, fffffe0010a90e18)
fffffe0010a90d00 cdev_ioctl+0x3f(13100000001, 767001, fffffc7fee7ffe20, 202003, fffffe0c026b4198, fffffe0010a90e18)
fffffe0010a90d50 spec_ioctl+0x55(fffffe0be3ab6200, 767001, fffffc7fee7ffe20, 202003, fffffe0c026b4198, fffffe0010a90e18, 0)
fffffe0010a90de0 fop_ioctl+0x40(fffffe0be3ab6200, 767001, fffffc7fee7ffe20, 202003, fffffe0c026b4198, fffffe0010a90e18, 0)
fffffe0010a90f00 ioctl+0x144(3, 767001, fffffc7fee7ffe20)
fffffe0010a90f10 sys_syscall+0x1a8()
> 

> svm_msr_guest_exit+0x74::dis
svm_msr_guest_exit+0x50:        shrq   $0x20,%rdx
svm_msr_guest_exit+0x54:        wrmsr  
svm_msr_guest_exit+0x56:        movq   0x18(%rsi),%rax
svm_msr_guest_exit+0x5a:        movl   $0xc0000084,%ecx <keystringtab+0x84>
svm_msr_guest_exit+0x5f:        movq   %rax,%rdx
svm_msr_guest_exit+0x62:        shrq   $0x20,%rdx
svm_msr_guest_exit+0x66:        wrmsr  
svm_msr_guest_exit+0x68:        xorl   %eax,%eax
svm_msr_guest_exit+0x6a:        movl   $0x1,%edx
svm_msr_guest_exit+0x6f:        movl   $0xc0000104,%ecx <kb_compose_map+0x64>
svm_msr_guest_exit+0x74:        wrmsr  
svm_msr_guest_exit+0x76:        leave  
svm_msr_guest_exit+0x77:        ret    

The faulting instruction is a wrmsr of MSR_AMD_TSC_RATIO. The TSC scaling logic in #15467 does check if the feature is present...:

static __inline bool
has_tsc_freq_ctl(void)
{
        return ((svm_feature & CPUID_AMD_EDX_TSC_RATE_MSR) != 0);
}

...

static freqratio_res_t
svm_freq_ratio(uint64_t guest_hz, uint64_t host_hz, uint64_t *mult)
{
        /*
         * Check whether scaling is needed at all before potentially erroring
         * out for other reasons.
         */
        if (guest_hz == host_hz) {
                return (FR_SCALING_NOT_NEEDED);
        }

        /*
         * Confirm that scaling is available.
         */
        if (!has_tsc_freq_ctl()) {
                return (FR_SCALING_NOT_SUPPORTED);
        }
...

With that said, a fault lies in how MSRs are restored during an exit from guest context:

void
svm_msr_guest_exit(struct svm_softc *sc, int vcpu)
{
        uint64_t *host_msrs = sc->host_msrs[vcpu];
...
        /* Reset frequency multiplier MSR */
        wrmsr(MSR_AMD_TSC_RATIO, AMD_TSCM_RESET_VAL);
...
}

In contrast to the conditional logic for loading the MSR on guest entry:

void
svm_msr_guest_enter(struct svm_softc *sc, int vcpu)
{
        uint64_t *host_msrs = sc->host_msrs[vcpu];
...
        /*
         * Set the frequency multiplier MSR to enable guest TSC scaling if
         * needed.
         */
        uint64_t mult = vm_get_freq_multiplier(sc->vm);
        if (mult != VM_TSCM_NOSCALE) {
                wrmsr(MSR_AMD_TSC_RATIO, mult);
        }
}

Clearly we need a similar conditional in svm_msr_guest_exit so that older CPUs which lack this functionality are not tripped up.


Related issues

Related to illumos gate - Bug #15467: bhyve needs TSC frequency controlClosedJordan Hendricks

Actions
Actions #1

Updated by Patrick Mooney 3 months ago

  • Related to Bug #15467: bhyve needs TSC frequency control added
Actions #2

Updated by Electric Monk 3 months ago

  • Gerrit CR set to 3369
Actions #3

Updated by Patrick Mooney 3 months ago

I ran the bhyve test suite using the proposed fix on both AMD (with TSC scaling support) and Intel (where such support is missing). In both cases, all of the tests passed:

Test: /opt/bhyve-tests/tests/mevent/vnode_zvol (run as root)      [00:02] [PASS]
Test: /opt/bhyve-tests/tests/inst_emul/cpuid (run as root)        [00:00] [PASS]
Test: /opt/bhyve-tests/tests/inst_emul/imul (run as root)         [00:00] [PASS]
Test: /opt/bhyve-tests/tests/inst_emul/rdmsr (run as root)        [00:00] [PASS]
Test: /opt/bhyve-tests/tests/inst_emul/wrmsr (run as root)        [00:00] [PASS]
Test: /opt/bhyve-tests/tests/inst_emul/triple_fault (run as root) [00:00] [PASS]
Test: /opt/bhyve-tests/tests/inst_emul/exit_paging (run as root)  [00:00] [PASS]
Test: /opt/bhyve-tests/tests/inst_emul/page_dirty (run as root)   [00:00] [PASS]
Test: /opt/bhyve-tests/tests/inst_emul/exit_consistent (run as root) [00:00] [PASS]
Test: /opt/bhyve-tests/tests/inst_emul/suspend_info (run as root) [00:00] [PASS]
Test: /opt/bhyve-tests/tests/inst_emul/vcpu_barrier (run as root) [00:00] [PASS]
Test: /opt/bhyve-tests/tests/kdev/vatpit_freq (run as root)       [00:00] [PASS]
Test: /opt/bhyve-tests/tests/kdev/vhpet_freq (run as root)        [00:00] [PASS]
Test: /opt/bhyve-tests/tests/kdev/vlapic_freq (run as root)       [00:00] [PASS]
Test: /opt/bhyve-tests/tests/kdev/vlapic_freq_periodic (run as root) [00:00] [PASS]
Test: /opt/bhyve-tests/tests/kdev/vlapic_mmio_access (run as root) [00:00] [PASS]
Test: /opt/bhyve-tests/tests/kdev/vlapic_msr_access (run as root) [00:00] [PASS]
Test: /opt/bhyve-tests/tests/kdev/vpmtmr_freq (run as root)       [00:00] [PASS]
Test: /opt/bhyve-tests/tests/kdev/vrtc_ops (run as root)          [00:04] [PASS]
Test: /opt/bhyve-tests/tests/kdev/wrmsr_tsc (run as root)         [00:00] [PASS]
Test: /opt/bhyve-tests/tests/kdev/rdmsr_tsc (run as root)         [00:00] [PASS]
Test: /opt/bhyve-tests/tests/mevent/lists_delete (run as root)    [00:00] [PASS]
Test: /opt/bhyve-tests/tests/mevent/read_disable (run as root)    [00:00] [PASS]
Test: /opt/bhyve-tests/tests/mevent/read_pause (run as root)      [00:00] [PASS]
Test: /opt/bhyve-tests/tests/mevent/read_requeue (run as root)    [00:00] [PASS]
Test: /opt/bhyve-tests/tests/mevent/vnode_file (run as root)      [00:09] [PASS]
Test: /opt/bhyve-tests/tests/viona/interface_version (run as root) [00:00] [PASS]
Test: /opt/bhyve-tests/tests/vmm/auto_destruct (run as root)      [00:00] [PASS]
Test: /opt/bhyve-tests/tests/vmm/cpuid_ioctl (run as root)        [00:00] [PASS]
Test: /opt/bhyve-tests/tests/vmm/datarw_constraints (run as root) [00:00] [PASS]
Test: /opt/bhyve-tests/tests/vmm/datarw_msrs (run as root)        [00:00] [PASS]
Test: /opt/bhyve-tests/tests/vmm/datarw_vcpu (run as root)        [00:00] [PASS]
Test: /opt/bhyve-tests/tests/vmm/default_capabs (run as root)     [00:00] [PASS]
Test: /opt/bhyve-tests/tests/vmm/drv_hold (run as root)           [00:00] [PASS]
Test: /opt/bhyve-tests/tests/vmm/fpu_getset (run as root)         [00:00] [PASS]
Test: /opt/bhyve-tests/tests/vmm/import_vlapic (run as root)      [00:00] [PASS]
Test: /opt/bhyve-tests/tests/vmm/interface_version (run as root)  [00:00] [PASS]
Test: /opt/bhyve-tests/tests/vmm/legacy_destruct (run as root)    [00:00] [PASS]
Test: /opt/bhyve-tests/tests/vmm/maxcpu (run as root)             [00:00] [PASS]
Test: /opt/bhyve-tests/tests/vmm/mem_devmem (run as root)         [00:00] [PASS]
Test: /opt/bhyve-tests/tests/vmm/mem_partial (run as root)        [00:00] [PASS]
Test: /opt/bhyve-tests/tests/vmm/mem_seg_map (run as root)        [00:00] [PASS]
Test: /opt/bhyve-tests/tests/vmm/npt_ops (run as root)            [00:00] [PASS]
Test: /opt/bhyve-tests/tests/vmm/pause_resume (run as root)       [00:00] [PASS]
Test: /opt/bhyve-tests/tests/vmm/self_destruct (run as root)      [00:00] [PASS]

Results Summary
PASS      45

Running Time:   00:00:19
Percent passed: 100.0%

Actions #4

Updated by Patrick Mooney 3 months ago

Using a PI built from the proposed fix (thanks danmcd), the community member confirmed that the issue was solved on their specific hardware.

Actions #5

Updated by Patrick Mooney 3 months ago

I also ran the Propolis test suite, which exercises the vmm_data paths via live migration (between instances on the machine under test). All of those tests remained passing.

Actions #6

Updated by Electric Monk 3 months ago

  • Status changed from In Progress to Closed
  • % Done changed from 0 to 100

git commit 4bd36be41e0f25c6061bb4934a8c1048dbbd938e

commit  4bd36be41e0f25c6061bb4934a8c1048dbbd938e
Author: Patrick Mooney <pmooney@pfmooney.com>
Date:   2024-03-13T02:44:44.000Z

    16393 bhyve should take more care with MSR_AMD_TSC_RATIO
    Reviewed by: Dan McDonald <danmcd@mnx.io>
    Reviewed by: Jordan Paige Hendricks <jordan@oxide.computer>
    Approved by: Richard Lowe <richlowe@richlowe.net>

Actions

Also available in: Atom PDF