want memory arena for vmm applications
Upstreaming OS-6606 as a bhyve prereq:
For the time being, bhyve on illumos is directly allocating kernel memory to furnish as physical memory for hosted guests. Allocating out of the normal kmem arena would mean guest memory would be included by default in OS crash dumps. For initial testing, the zio arena was used since it isn't included in crash dumps. It would be nice to have an independent arena so that ::memstat accounting is clear and future control over potential largepage allocation is easy.
First, a historical note: the current code (since illumos bug #6914, which introduced segzio) figures on 1.5x physmem for kernelheap, and 1.5x physmem for segzio. Ignoring loose change, this means that we have space for around (COREHEAP_BASE - VALLOC_BASE == 1Tb) / 3 physmem. The current code actually uses 256Gb for this figure. (KPM lives below
Past this limit, startup_memlist() is supposed to move things down to make room. But it actually seems to mis-calculate the adjustment:
+		segkpm_base = -(P2ROUNDUP((4 * kpm_resv_amount),
+		    KERNEL_REDZONE_SIZE));
This worked when the physmem limit was 1Tb. But now, at e.g. 512Gb physmem, this comes out as higher than the original (0xffffff8000000000 > 0xfffffe0000000000).
We only get away with this because a) the code will ensure the new base is no higher than the original immediately after and b) at least on my machine, plat_dr_physmax is huge and the adjustment will work out OK.
Regardless, the funny maths here seems to have no purpose at all, so we'll just rewrite this to make the adjustment normally.
We'll introduce a new segkvmm arena into the kernel address space as described in i86pc/os/startup.c. Note that bhyve currently uses this arena, backed by segkmem_alloc, to build a (4Kb) set of kernel mappings for all the VM memory. This is in addition to segvmm, which maps the same physical pages into the bhyve userspace.
We'll reserve 4x physmem for this arena: we don't have any quantum caches, as we're expecting only relatively large VA allocations, so we want to over-provision the VA space to avoid fragmentation issues. It's also a convenient number as the default layout accounts for up to 256Gb physmem: 4x that is 1Tb.
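The sizing above is simple enough to check by hand, but a small sketch makes the arithmetic explicit. The names here (GB, TB, VMM_VA_MULTIPLIER, vmm_arena_va) are hypothetical and are not taken from the illumos source; only the 4x multiplier and the 256Gb default limit come from the text.

```c
#include <stdint.h>

/* Hypothetical helper constants for illustration only. */
#define	GB			(1ULL << 30)
#define	TB			(1ULL << 40)
#define	VMM_VA_MULTIPLIER	4	/* 4x physmem, per the text */

/*
 * VA to reserve for the vmm arena for a given amount of physical
 * memory, in bytes. Over-provisioned because there are no quantum
 * caches and allocations are expected to be large.
 */
uint64_t
vmm_arena_va(uint64_t physmem_bytes)
{
	return (VMM_VA_MULTIPLIER * physmem_bytes);
}
```

With the default layout limit of 256Gb, vmm_arena_va(256 * GB) comes to exactly 1Tb, which is why the number is convenient.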
To account for the new space, we will permanently move down SEGKPM_BASE by 1Tb (and hence
We also need to fix up the adjustment code in startup_memlist(). We will now have room for 2Tb/7 physmem - which we'll still call 256Gb - but when we adjust we want to make sure there's enough extra VA: x8 physmem (1.5x heap, 1.5x segzio, 1x kpm, 4x vmm).
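As a rough check of those two figures, the budget can be written out as plain arithmetic. This is an illustrative sketch and not the startup_memlist() code: the function and macro names are hypothetical, and the multipliers are kept in units of half-physmem so that 1.5x stays exact in integer arithmetic. The 2Tb window and the per-segment multipliers are taken from the text.

```c
#include <stdint.h>

/* Hypothetical helper constants for illustration only. */
#define	GB		(1ULL << 30)
#define	TB		(1ULL << 40)

/* Multipliers in units of half-physmem, so 1.5x is the integer 3. */
#define	HEAP_HALVES	3	/* 1.5x kernel heap */
#define	ZIO_HALVES	3	/* 1.5x segzio */
#define	KPM_HALVES	2	/* 1x kpm */
#define	VMM_HALVES	8	/* 4x vmm arena */

/*
 * Total VA wanted when the layout has to be adjusted: 8x physmem.
 * Assumes physmem_bytes is even (it is page-aligned in practice).
 */
uint64_t
adjusted_va_needed(uint64_t physmem_bytes)
{
	uint64_t halves = HEAP_HALVES + ZIO_HALVES + KPM_HALVES + VMM_HALVES;

	return ((physmem_bytes / 2) * halves);
}

/*
 * Largest physmem the default 2Tb window can hold. Only heap, segzio
 * and the vmm arena share that window, hence the divisor of 7.
 */
uint64_t
default_layout_capacity(void)
{
	uint64_t halves = HEAP_HALVES + ZIO_HALVES + VMM_HALVES;

	return ((2 * TB * 2) / halves);
}
```

Here default_layout_capacity() works out to a little over 292Gb, so rounding down to 256Gb leaves some slack, and adjusted_va_needed(512 * GB) is 4Tb.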
This increase actually means we need to officially drop support for DR memory: on my test machine, plat_dr_physmax is 16Tb, and we actually don't have enough VA above the hole for this calculation!