Project

General

Profile

Actions

Bug #14367

closed

bhyve gpt mishandles small mappings

Added by Patrick Mooney 5 months ago. Updated 5 months ago.

Status:
Closed
Priority:
Normal
Category:
bhyve
Start date:
Due date:
% Done:

100%

Estimated time:
Difficulty:
Medium
Tags:
Gerrit CR:

Description

While andyf was testing VARS.fd support from upstream FreeBSD, he hit this panic:

panic[cpu1]/thread=fffffe59dceda440:
assertion failed: vmsm != NULL, file: ../../i86pc/io/vmm/vmm_vm.c, line: 749

fffffe007b8222a0 vpanic()
fffffe007b8222f0 0xfffffffffbdcd955()
fffffe007b8223b0 vmspace_lookup_map+0x227(fffffe7c5ce506c0, ffe00000, 1, fffffe007b8223c0, fffffe007b8223c8)
fffffe007b822420 vmc_hold+0x8f(fffffe59d6463f68, ffe00000, 1)
fffffe007b8224e0 vm_copy_setup+0x165(fffffea88c400000, 0, fffffe59e2358440, ffe00000, 8, 1, fffffe007b822580, fffffe5900000002, fffffe007b822534)
fffffe007b822620 vie_emulate_movs+0x17c(fffffe59e2358400, fffffea88c400000, 0, ffe00000)
fffffe007b822650 vie_emulate_mmio+0x105(fffffe59e2358400, fffffea88c400000, 0)
fffffe007b8226c0 vm_handle_mmio_emul+0x145(fffffea88c400000, 0)
fffffe007b822780 vm_run+0x3ea(fffffea88c400000, 0, fffffe007b8228b8)
fffffe007b822c30 vmmdev_do_ioctl+0xdd3(fffffe59ee861880, 767001, fffffc7fef00ae40, 202003, fffffe7b9247ca20, fffffe007b822e08)
fffffe007b822cb0 vmm_ioctl+0xec(11c00000008, 767001, fffffc7fef00ae40, 202003, fffffe7b9247ca20, fffffe007b822e08)
fffffe007b822cf0 cdev_ioctl+0x2b(11c00000008, 767001, fffffc7fef00ae40, 202003, fffffe7b9247ca20, fffffe007b822e08)
fffffe007b822d40 spec_ioctl+0x45(fffffe5a1b6df200, 767001, fffffc7fef00ae40, 202003, fffffe7b9247ca20, fffffe007b822e08, 0)
fffffe007b822dd0 fop_ioctl+0x5b(fffffe5a1b6df200, 767001, fffffc7fef00ae40, 202003, fffffe7b9247ca20, fffffe007b822e08, 0)
fffffe007b822ef0 ioctl+0x153(3, 767001, fffffc7fef00ae40)
fffffe007b822f00 sys_syscall+0x283()

That address, 0xffe00000, corresponds to the top of the 2MB region occupied by the bootrom. It is specifically part of the writable region exposed by the persistent VARS.fd handling. We can confirm that by looking at the mapping list in the vmspace_t:

{
    vmsm_node = {
        list_next = 0xfffffe622cd4bb48
        list_prev = 0xfffffe7c5ce50708
    }
    vmsm_object = 0xfffffe5a2c8ecc40
    vmsm_addr = 0
    vmsm_len = 0x8000000
    vmsm_offset = 0
    vmsm_prot = 0x7
}
{
    vmsm_node = {
        list_next = 0xfffffe7c5ce50708
        list_prev = 0xfffffe622cd4e828
    }
    vmsm_object = 0xfffffe627f567d60
    vmsm_addr = 0xffe20000
    vmsm_len = 0x1e0000
    vmsm_offset = 0xe20000
    vmsm_prot = 0x5
}

Here, 0xffe00000 - 0xffe20000 are not covered by any valid mapping, meaning accesses to those physical addresses will result in an exit to userspace for MMIO emulation. Why then did we panic on an assert? Let's look at the relevant portion of vmm_lookup_map:

...

        vmm_gpt_walk(gpt, gpa, entries, MAX_GPT_LEVEL);
        leaf = entries[LEVEL1];
        if (leaf == NULL) {
                /*
                 * Since we populated the intermediate tables for any regions
                 * mapped in the GPT, an empty leaf entry indicates there is no
                 * mapping, populated or not, at this GPT.
                 */
                return (FC_NOMAP);
        }

        if (vmm_gpt_is_mapped(gpt, leaf, &pfn, &prot)) {
                if ((req_prot & prot) != req_prot) {
                        return (FC_PROT);
                }
        } else {
                vmspace_mapping_t *vmsm;
                vm_object_t *vmo;

                /*
                 * Because of the prior leaf check, we should be confident that
                 * _some_ mapping covers this GPA
                 */
                vmsm = vm_mapping_find(vms, gpa, PAGESIZE);
                VERIFY(vmsm != NULL);
...

Here, vmm_lookup_map is operating under the assumption that if a leaf pagetable (one with entries describing 4k page mappings) is present, that the GPA in question should be covered by a mapping. This is due to the GPT logic enforcing page table pre-creation (leaf, and all higher levels) for the gpa range covering each new mapping. This breaks down if a mapping covers not aligned (in both size and position) to 2MB, where it covers only part of a leaf table.

Several things must change for this to be fixed. First, the vmsm != NULL assert must be removed, and be replaced with a more appropriate FC_NOMAP bailout instead. That will solve the immediate issue, but we must still ensure that mappings which are not 2MB aligned cannot cause other issues. In particular if 2+ mappings were to "share" a leaf page. An unmap of one, must not result in clean-up of that leaf page, provided the other still exists. Expectations and invariants around page table pre-population and clean-up must be checked for correctness.


Files

bhyve (651 KB) bhyve Test binary Patrick Mooney, 2022-01-06 04:17 AM
BHYVE_VARS.fd (128 KB) BHYVE_VARS.fd bootrom vars Patrick Mooney, 2022-01-06 04:17 AM
README.txt (240 Bytes) README.txt repro instructions Patrick Mooney, 2022-01-06 04:17 AM

Related issues

Related to illumos gate - Bug #13896: bhyve VM interfaces should be better fitClosedPatrick Mooney

Actions
Related to illumos gate - Bug #14398: consolidate bhyve testsClosed

Actions
Blocks illumos gate - Feature #14372: bhyve upstream sync 2022 JanuaryClosedAndy Fiddaman

Actions
Actions #1

Updated by Patrick Mooney 5 months ago

  • Related to Bug #13896: bhyve VM interfaces should be better fit added
Actions #2

Updated by Electric Monk 5 months ago

  • Gerrit CR set to 1919
Actions #3

Updated by Patrick Mooney 5 months ago

Actions #4

Updated by Andy Fiddaman 5 months ago

Actions #5

Updated by Patrick Mooney 5 months ago

With the fix in place, the bhyve tests suites pass:

root@carbide:~# /opt/bhyve-tests/bin/bhyvetest
Test: /opt/bhyve-tests/tests/memmap (run as root)                 [00:00] [PASS]

Results Summary
PASS       1

Running Time:   00:00:00
Percent passed: 100.0%
Log directory:  /var/tmp/test_results/20220111T195008
root@carbide:~# /opt/bhyvetest/bin/bhyvetest -a
Starting tests...
output directory: /root/bhyvetest.100524
Executing test /opt/bhyvetest/tst/mevent/lists.delete.exe ... passed
Executing test /opt/bhyvetest/tst/mevent/read.disable.exe ... passed
Executing test /opt/bhyvetest/tst/mevent/read.pause.exe ... passed
Executing test /opt/bhyvetest/tst/mevent/read.requeue.exe ... passed

-------------
Results
-------------

Tests passed: 4
Tests failed: 0
Tests ran:    4

Congrats, some tiny parts of bhyve aren't completely broken, the tests pass.

The specific memmap test fails (as expected) when run on vanilla gate bits without the fix.

I ran the reproducer from Andy and it successfully made it to the UEFI shell, rather than panicking the machine.

In smoke-testing the standard battery of guests, all booted and ran as expected.

Actions #6

Updated by Patrick Mooney 5 months ago

  • Related to Bug #14398: consolidate bhyve tests added
Actions #7

Updated by Electric Monk 5 months ago

  • Status changed from In Progress to Closed
  • % Done changed from 0 to 100

git commit 7daa540591d7332190b10fc3a818bfd5e7d536fe

commit  7daa540591d7332190b10fc3a818bfd5e7d536fe
Author: Patrick Mooney <pmooney@pfmooney.com>
Date:   2022-01-11T20:49:23.000Z

    14367 bhyve gpt mishandles small mappings
    Reviewed by: Dan Cross <cross@oxidecomputer.com>
    Reviewed by: Andy Fiddaman <andy@omnios.org>
    Approved by: Dan McDonald <danmcd@joyent.com>

Actions

Also available in: Atom PDF