Feature #12608

want memory arena for vmm applications

Added by Patrick Mooney about 2 years ago. Updated about 2 years ago.

Status: Closed
Priority: Normal
Assignee:
Category: kernel
Start date:
Due date:
% Done: 100%
Estimated time:
Difficulty: Medium
Tags: bhyve
Gerrit CR:

Description

Upstreaming OS-6606 as a bhyve prereq:

For the time being, bhyve on illumos is directly allocating kernel memory to furnish as physical memory for hosted guests. Allocating out of the normal kmem arena would mean guest memory would be included by default in OS crash dumps. For initial testing, the zio arena was used since it isn't included in crash dumps. It would be nice to have an independent arena so ::memstat accounting was clear and future control over potential largepage allocation was easy.

First, a historical note: the current code (since Illumos bug #6914, which introduced segzio) figures on 1.5x physmem for kernelheap, and 1.5x physmem for segzio. Ignoring loose change, this means that we have space for around (COREHEAP_BASE-VALLOC_BASE==1Tb) / 3 physmem. The current code actually uses 256Gb for this figure. (KPM lives below VALLOC_BASE.)
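
As a purely illustrative aside (not code from the gate), the arithmetic behind that figure is just the 1Tb of VA divided by the 3x physmem of demand:

    /*
     * Illustrative arithmetic only, using the figures quoted above:
     * kernelheap (1.5x) plus segzio (1.5x) means 3x physmem of VA must
     * fit into the 1Tb between VALLOC_BASE and COREHEAP_BASE.
     */
    #include <stdio.h>

    int
    main(void)
    {
            unsigned long long va = 1ULL << 40;     /* 1Tb of VA */

            /* ~341Gb; the code rounds this down to the 256Gb figure. */
            printf("max physmem ~= %llu Gb\n", (va / 3) >> 30);
            return (0);
    }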

Past this limit, startup_memlist() is supposed to move things down to make room. But it actually seems to miscalculate the adjustment:

+               segkpm_base = -(P2ROUNDUP((4 * kpm_resv_amount),
+                   KERNEL_REDZONE_SIZE));

This worked when the physmem limit was 1Tb. But now, at e.g. 512Gb physmem, this comes out as higher than the original SEGKPM_BASE (0xffffff8000000000 > 0xfffffe0000000000).
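
For concreteness, here is a small user-space sketch that reproduces that arithmetic; the kpm_resv_amount and redzone values below are assumptions picked only so the expression yields the 0xffffff8000000000 quoted above, not the kernel's actual inputs:

    /*
     * User-space sketch of the miscalculation; P2ROUNDUP is the usual
     * illumos macro, and the input values are assumed for illustration
     * rather than taken from startup.c.
     */
    #include <stdio.h>
    #include <stdint.h>

    #define P2ROUNDUP(x, align)     (-(-(x) & -(align)))
    #define GB                      (1ULL << 30)

    int
    main(void)
    {
            uint64_t orig_segkpm_base = 0xfffffe0000000000ULL;
            uint64_t kpm_resv_amount = 128 * GB;            /* assumed */
            uint64_t redzone = 1 * GB;                      /* assumed */
            uint64_t segkpm_base =
                -(P2ROUNDUP((4 * kpm_resv_amount), redzone));

            /*
             * Prints 0xffffff8000000000: the "adjusted" base lands above
             * the original SEGKPM_BASE, so no room has been made at all.
             */
            printf("adjusted 0x%llx vs original 0x%llx\n",
                (unsigned long long)segkpm_base,
                (unsigned long long)orig_segkpm_base);
            return (0);
    }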

We only get away with this because a) the code will ensure the new base is no higher than the original immediately after and b) at least on my machine, plat_dr_physmax is huge and the adjustment will work out OK.

Regardless, the funny maths here seems to have no purpose at all, so we'll just rewrite this to make the adjustment normally.

We'll introduce a new segkvmm arena into the kernel address space as described in i86pc/os/startup.c. Note that bhyve currently uses this arena, backed by segkmem_alloc, to build a (4Kb) set of kernel mappings for all the VM memory. This is in addition to segvmm, which maps the same physical pages into the bhyve userspace.

We'll reserve 4x physmem for this arena: we don't have any quantum caches, as we're expecting only relatively large VA allocations, so we want to over-provision the VA space to avoid fragmentation issues. It's also a convenient number as the default layout accounts for up to 256Gb physmem: 4x that is 1Tb.
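
As a rough sketch of what a VA arena with no quantum caches looks like through the illumos vmem interface (the base, size, and init-function names here are placeholders for illustration, not the committed startup.c code):

    #include <sys/types.h>
    #include <sys/param.h>          /* PAGESIZE */
    #include <sys/vmem.h>

    /* Placeholder names for this sketch, not the committed symbols. */
    extern uintptr_t segkvmm_base;  /* base VA chosen for the segment */
    extern size_t segkvmm_size;     /* sized at 4x physmem */

    static vmem_t *kvmm_arena;

    /*
     * Hedged sketch: a VA-only arena with qcache_max of 0, i.e. no
     * quantum caches, since only relatively large allocations are
     * expected from it.
     */
    void
    kvmm_arena_init(void)
    {
            kvmm_arena = vmem_create("kvmm_arena",
                (void *)segkvmm_base, segkvmm_size,
                PAGESIZE,                   /* quantum */
                NULL, NULL, NULL,           /* no backing alloc/free/source */
                0,                          /* qcache_max: no quantum caches */
                VM_SLEEP);
    }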

To account for the new space, we will permanently move down SEGKPM_BASE by 1Tb (and hence KERNELBASE and USERLIMIT).

We also need to fix up the adjustment code in startup_memlist(). We will now have room for 2Tb/7 physmem - which we'll still call 256Gb - but when we adjust we need to make sure there's enough extra VA at 8x physmem (1.5x heap, 1.5x segzio, 1x kpm, 4x vmm).
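
Sketching that adjustment (hypothetical names throughout; the 8x factor is just 1.5 + 1.5 + 1 + 4 from above, and the 1Gb rounding granularity is an assumption):

    /*
     * Hedged sketch, not the committed startup_memlist() change. Only
     * SEGKPM_BASE and P2ROUNDUP are real names; everything else here is
     * hypothetical.
     */
    #define DEFAULT_LAYOUT_PHYSMEM  (256ULL << 30)  /* layout covers 256Gb */

    if (physmem_bytes > DEFAULT_LAYOUT_PHYSMEM) {
            /* 1.5x heap + 1.5x segzio + 1x kpm + 4x vmm = 8x extra VA */
            uint64_t extra = 8 * (physmem_bytes - DEFAULT_LAYOUT_PHYSMEM);

            /* Slide the whole layout (and hence KERNELBASE) down. */
            segkpm_base = SEGKPM_BASE - P2ROUNDUP(extra, 1ULL << 30);
    }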

This increase actually means we need to officially drop support for DR memory: on my test machine, plat_dr_physmax is 16Tb, and we actually don't have enough VA above the hole for this calculation!


Related issues

Related to illumos gate - Feature #12609: x86 memory DR should be disabled (Closed) - John Levon
Related to illumos gate - Bug #12718: LDT overlap with TSS reintroduced in 12608 (Closed) - Patrick Mooney
Related to illumos gate - Bug #12727: unix: 'zvp' is undeclared (Closed) - Toomas Soome
#1

Updated by Patrick Mooney about 2 years ago

In the original ticket, jlevon has these notes about testing:

This has been tested on two machines: a smaller (vmware) instance, and a machine large enough to cause an adjustment of kernelbase. Verified that the VA layout looks sensible on both systems.

Both DEBUG and non-DEBUG were taken for a spin, and KVM was sanity tested.

bhyve VMs of various sizes were booted to verify that the ::vmem / ::vmem_seg output was as expected, and dropped back to zero on de-allocation.

A bunch of random start/stop load was also placed on the system, though this regularly hits LAB-253.

#2

Updated by Patrick Mooney about 2 years ago

  • Tags set to bhyve
#3

Updated by Patrick Mooney about 2 years ago

#4

Updated by Patrick Mooney about 2 years ago

I booted up machines with this change (one VM, one physical) and observed the creation and sizing of the kvmm_arena. Since there is no consumer yet, it remains empty, so vmm usage is absent from ::memstat. Both machines seemed to be operating as expected.

#5

Updated by Electric Monk about 2 years ago

  • Status changed from In Progress to Closed
  • % Done changed from 0 to 100

git commit 04909c8c9ef61a86dd44bdaf341a1d9a2f0206e5

commit  04909c8c9ef61a86dd44bdaf341a1d9a2f0206e5
Author: John Levon <john.levon@joyent.com>
Date:   2020-05-08T15:21:40.000Z

    12608 want memory arena for vmm applications
    12609 x86 memory DR should be disabled
    Reviewed by: Jerry Jelinek <jerry.jelinek@joyent.com>
    Reviewed by: Patrick Mooney <patrick.mooney@joyent.com>
    Reviewed by: Toomas Soome <tsoome@me.com>
    Approved by: Gordon Ross <gordon.w.ross@gmail.com>

#6

Updated by Andy Fiddaman about 2 years ago

  • Related to Bug #12718: LDT overlap with TSS reintroduced in 12608 added
#7

Updated by Toomas Soome about 2 years ago

  • Related to Bug #12727: unix: 'zvp' is undeclared added
