kmem_reap does one xcall per page
We see performance problems caused by kmem_reap() of ZFS's zio buf caches. Reaping these caches generates a large number of cross-calls (xcalls) to tear down TLB entries, which is slow. It can also cause latency bubbles in concurrent zfs_read/zfs_write operations.
The problem is reliably triggered when a workload switches between accessing files of different block sizes. kmem then has to free many buffers of the old size in order to allocate buffers of the new size.
We can improve on this in some cases by issuing one xcall per slab, which tears down all of the virtually contiguous pages backing the slab at once. The performance benefit will be most noticeable when reaping caches with larger buffers (e.g. 128KB).
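To give a rough sense of scale, the difference in xcall count can be sketched as a small counting model. This is not the kmem code. The function names, the 4KB page size, and the assumption that a slab is backed by a whole number of virtually contiguous pages are all hypothetical and only illustrate the arithmetic.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed base page size; 4KB is common, but other platforms differ. */
#define	MODEL_PAGESIZE	4096UL

/* Number of pages needed to back one slab of slab_bytes, rounded up. */
static size_t
model_pages_per_slab(size_t slab_bytes)
{
	assert(slab_bytes > 0);
	return ((slab_bytes + MODEL_PAGESIZE - 1) / MODEL_PAGESIZE);
}

/* Behavior described in this issue: one xcall for every page freed. */
static size_t
model_xcalls_per_page(size_t nslabs, size_t slab_bytes)
{
	return (nslabs * model_pages_per_slab(slab_bytes));
}

/*
 * Proposed behavior: one xcall per slab, covering the slab's whole
 * virtually contiguous range, so the slab size no longer matters.
 */
static size_t
model_xcalls_per_slab(size_t nslabs, size_t slab_bytes)
{
	(void) slab_bytes;
	return (nslabs);
}
```

Under these assumptions, reaping 1000 slabs of 128KB each costs model_xcalls_per_page(1000, 128 * 1024) = 32000 xcalls today, versus model_xcalls_per_slab(1000, 128 * 1024) = 1000 with the proposed change. For a slab backed by a single page the two counts are the same, which is why the gain is expected only in some cases.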