Bug #13361

closed

bhyve should mask RDT cpuid info

Added by Patrick Mooney 10 months ago. Updated 10 months ago.

Status:
Closed
Priority:
Normal
Category:
bhyve
Start date:
Due date:
% Done:

100%

Estimated time:
Difficulty:
Medium
Tags:
Gerrit CR:

Description

While bhyve currently masks the RDT_A bit (Resource Director Technology) from leaf 0x7 EBX, it has no such filter for the detailed information in leaf 0x10. An OS which ignores the absence of RDT_A could still find seemingly actionable capabilities advertised in leaf 0x10 and then try to act on them. I believe this is the cause of some dmesg noise seen on CentOS 8 running on newer (Rome) AMD CPUs under bhyve:

[    1.122722] unchecked MSR access error: WRMSR to 0xc8f (tried to write 0x0000000000000000) at rIP: 0xffffffffa9c64f74 (native_write_msr+0x4/0x20)
[    1.125304] Call Trace:
[    1.125810]  clear_closid_rmid.isra.4+0x32/0x40
[    1.126740]  resctrl_online_cpu+0xcd/0x4c0
[    1.127574]  ? __switch_to_asm+0x41/0x70
[    1.128387]  ? __switch_to_asm+0x35/0x70
[    1.129207]  ? __switch_to_asm+0x41/0x70
[    1.130024]  ? cat_wrmsr+0x60/0x60
[    1.130710]  ? sort_range+0x20/0x20
[    1.131427]  cpuhp_invoke_callback+0x8d/0x500
[    1.132318]  ? sort_range+0x20/0x20
[    1.133047]  cpuhp_thread_fun+0xb0/0x110
[    1.133841]  smpboot_thread_fn+0xc5/0x160
[    1.134669]  kthread+0x112/0x130
[    1.135340]  ? kthread_flush_work_fn+0x10/0x10
[    1.136256]  ret_from_fork+0x35/0x40
[    1.137612] resctrl: L3 monitoring detected

MSR 0xc8f is IA32_PQR_ASSOC, used for some of those resource partitioning tasks.

Since we are not going to give guests access to influence host cache allocation, the related capability bits in cpuid should be hidden.
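As a rough illustration of the masking described above (not the actual illumos change), a CPUID filter could clear RDT_A in leaf 0x7 and return zeros for every sub-leaf of leaf 0x10. The struct, macro, and function names here are hypothetical; the bit position (EBX bit 15 of leaf 0x7, sub-leaf 0) follows the Intel SDM.

```c
#include <stdint.h>

/* Hypothetical names for illustration; not taken from the bhyve source. */
#define	CPUID_LEAF_EXT_FEATURES	0x07u
#define	CPUID_LEAF_RDT_ALLOC	0x10u
#define	CPUID_7_0_EBX_RDT_A	(1u << 15)

struct cpuid_regs {
	uint32_t eax, ebx, ecx, edx;
};

/*
 * Filter host CPUID output before it is presented to a guest:
 * hide the RDT_A feature bit and blank the RDT enumeration leaf.
 */
static void
filter_rdt_cpuid(uint32_t leaf, uint32_t subleaf, struct cpuid_regs *regs)
{
	switch (leaf) {
	case CPUID_LEAF_EXT_FEATURES:
		/* RDT_A lives in sub-leaf 0 only. */
		if (subleaf == 0)
			regs->ebx &= ~CPUID_7_0_EBX_RDT_A;
		break;
	case CPUID_LEAF_RDT_ALLOC:
		/* No sub-leaf of 0x10 should advertise anything. */
		regs->eax = regs->ebx = regs->ecx = regs->edx = 0;
		break;
	default:
		/* All other leaves pass through untouched. */
		break;
	}
}
```

Zeroing the whole leaf, rather than picking out individual capability bits, means a guest that skips the RDT_A check still finds nothing to act on.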


Related issues

Related to illumos gate - Bug #13369: bhyve should mask PQoS bits from CPUID (New)

Actions #1

Updated by Patrick Mooney 10 months ago

Linux also lists the RDT-related flags in cpuinfo on that system (prior to a fix): cat_l3 cdp_l3

Actions #2

Updated by Patrick Mooney 10 months ago

  • Subject changed from bhyve should mask RDT leaf to bhyve should mask RDT cpuid info
Actions #3

Updated by Patrick Mooney 10 months ago

An initial test of the fix shows the guest no longer attempting the wrmsr to IA32_PQR_ASSOC, and dmesg is free of the aforementioned error.

Actions #4

Updated by Patrick Mooney 10 months ago

My most modern Intel lab machine (Ivy Bridge) is too old to possess the features in question. Its CPUID leaves stop at 0xd. All the same, I booted a few guests there to make sure there wasn't something horribly wrong.

On my Rome machine, where the RDT pieces are present, the fix prevented Linux from attempting unhandled access to the IA32_PQR_ASSOC MSR. Additionally, the RDT-related capabilities (rdt_a, cat_l3, cdp_l3) were all now absent from /proc/cpuinfo in a Linux guest. The other standard smoke-test guests ran without issues. (It's possible that only Linux was reaching for those RDT features as they were erroneously exposed.)
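For anyone wanting to repeat the /proc/cpuinfo check programmatically, here is a minimal sketch that tests a flags line for the three RDT-related names mentioned above. The helper names are hypothetical, and the caller is assumed to have already read the "flags" line from /proc/cpuinfo inside the guest.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Return true if `flag` appears as a whole word in `flags`. */
static bool
has_cpu_flag(const char *flags, const char *flag)
{
	size_t len = strlen(flag);
	const char *p = flags;

	if (len == 0)
		return (false);

	while ((p = strstr(p, flag)) != NULL) {
		/* Require a separator (or string boundary) on both sides. */
		bool start_ok = (p == flags) || p[-1] == ' ' ||
		    p[-1] == '\t' || p[-1] == ':';
		char end = p[len];
		bool end_ok = (end == '\0' || end == ' ' ||
		    end == '\t' || end == '\n');

		if (start_ok && end_ok)
			return (true);
		p += len;
	}
	return (false);
}

/* True when none of the RDT-related flags are visible to the guest. */
static bool
rdt_flags_absent(const char *flags)
{
	return (!has_cpu_flag(flags, "rdt_a") &&
	    !has_cpu_flag(flags, "cat_l3") &&
	    !has_cpu_flag(flags, "cdp_l3"));
}
```

Whole-word matching matters here because flag names can be substrings of one another, so a plain strstr() could report a false positive.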

Actions #5

Updated by Patrick Mooney 10 months ago

  • Related to Bug #13369: bhyve should mask PQoS bits from CPUID added
Actions #6

Updated by Electric Monk 10 months ago

  • Status changed from In Progress to Closed
  • % Done changed from 0 to 100

git commit 1a5f1879be09d3de900b2510692dd12003784d84

commit  1a5f1879be09d3de900b2510692dd12003784d84
Author: Patrick Mooney <pmooney@pfmooney.com>
Date:   2020-12-16T20:02:23.000Z

    13361 bhyve should mask RDT cpuid info
    Reviewed by: Andy Fiddaman <andy@omnios.org>
    Reviewed by: Toomas Soome <tsoome@me.com>
    Approved by: Robert Mustacchi <rm@fingolfin.org>
