Feature #11827: Increase concurrency through blkdev
Status: Closed
% Done: 100%
Description
All I/Os which go through blkdev end up on a single waitq/runq pair before being pushed through the parent driver.
The nvme driver in particular uses blkdev, and one premise of NVMe is its ability to support high levels of concurrency. This single queue acts as a pinch point before the I/Os get spread out again into separate submission queues.
It would be much better to allow the blkdev parent to specify how many queues it wants. blkdev will then create the requested number of queues and assign I/O requests to them using a simple round-robin algorithm. When a request is passed to the parent, it will carry the number of the queue it was assigned to, and the parent can schedule it as it wishes.
In particular for nvme, we would request the same number of blkdev queues as there are submission queues, and a request queued to blkdev queue i will be submitted to submission queue i.
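As a rough illustration only (the struct and function names below, such as my_bd_t and my_bd_queue_select, are invented for this note and are not the actual blkdev symbols), the per-queue state and the round-robin selection could look something like this:

```c
#include <sys/types.h>
#include <sys/ksynch.h>
#include <sys/list.h>
#include <sys/atomic.h>

/* One waitq/runq pair; blkdev would allocate d_qcount of these. */
typedef struct my_bd_queue {
	kmutex_t	q_iomutex;	/* protects this queue's lists */
	list_t		q_waitq;	/* I/Os waiting to be submitted */
	list_t		q_runq;		/* I/Os handed to the parent */
	uint32_t	q_qnum;		/* queue index the parent sees with each request */
} my_bd_queue_t;

typedef struct my_bd {
	uint32_t	d_qcount;	/* number of queues the parent asked for */
	uint32_t	d_qnext;	/* round-robin cursor */
	my_bd_queue_t	*d_queues;	/* array of d_qcount queues */
} my_bd_t;

/*
 * Pick a queue for a new transfer using simple round-robin.  For nvme,
 * the parent would map q_qnum directly onto submission queue q_qnum.
 */
static my_bd_queue_t *
my_bd_queue_select(my_bd_t *bd)
{
	uint32_t idx;

	idx = atomic_inc_32_nv(&bd->d_qnext) % bd->d_qcount;
	return (&bd->d_queues[idx]);
}
```

The point of the round-robin choice is that queue selection takes no global lock, so contention is confined to the individual waitq/runq pairs rather than a single pinch point.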
Related issues
Related to Bug #11847: The nvme cmd completion queue is contentious
Related to Bug #9291: NVMe driver is slower than expected
Related to Bug #12017: Assertion failure in kstat_waitq_to_runq from blkdev
Updated by Paul Winder over 3 years ago
During vdbench testing I came across #11847.
I used the following settings for vdbench:
```
hd=localhost,jvms=4
sd=sd1,host=$host,lun=/dev/rdsk/c9t000CCA0C01192080d0s0,openflags=directio
* Workload Definitions (WD)
wd=seqWr,sd=sd1,rdpct=0,rhpct=0,seekpct=0
wd=seqRd,sd=sd1,rdpct=100,rhpct=0,seekpct=0
wd=rndWr,sd=sd1,rdpct=0,rhpct=0,seekpct=100
wd=rndRd,sd=sd1,rdpct=100,rhpct=0,seekpct=100
wd=rndRW,sd=sd1,rdpct=50,rhpct=0,seekpct=100
* Run Definitions (RD)
rd=default,el=300,warmup=300,in=5,forxfersize=(4k),forth=(8,16,32,64),iorate=max,pause=3
rd=runRndRW,wd=rndRW
```
This is a 50/50 random read/write workload at a 4k block size, run with 8, 16, 32 and 64 threads.
IOPS
| | 8 threads | 16 threads | 32 threads | 64 threads |
|---|---|---|---|---|
| before | 70691 | 100434 | 128122 | 163194 |
| after | 81217 | 135867 | 205450 | 247028 |
| % increase | 14 | 35 | 60 | 51 |
Response Time (ms)
| | 8 threads | 16 threads | 32 threads | 64 threads |
|---|---|---|---|---|
| before | 0.107 | 0.153 | 0.244 | 0.388 |
| after | 0.093 | 0.112 | 0.152 | 0.256 |
| % reduction | 13 | 27 | 47 | 44 |
CPU Busy (sys)
| | 8 threads | 16 threads | 32 threads | 64 threads |
|---|---|---|---|---|
| before | 4.5 | 12.1 | 20.2 | 24.9 |
| after | 3.7 | 5.8 | 6.5 | 7.5 |
| % reduction | 18 | 51 | 67 | 70 |
This looks like a win all round!
Updated by Paul Winder over 3 years ago
- Related to Bug #11847: The nvme cmd completion queue is contentious added
Updated by Adam Stylinski over 3 years ago
- Related to Bug #9291: NVMe driver is slower than expected added
Updated by Paul Winder over 3 years ago
Ran up a DEBUG kernel, modunload'ed the nvme driver, and forced a crash dump.
Results of ::findleaks
```
> ::findleaks
findleaks: using cached results (use '-f' to force a full run)
CACHE             LEAKED           BUFCTL CALLER
ffffff0186027a08      22 ffffff018d3a1630 AcpiOsAllocate+0x1c
ffffff0186027a08       2 ffffff018c92b440 AcpiOsAllocate+0x1c
ffffff0186027a08       1 ffffff018c941890 AcpiOsAllocate+0x1c
ffffff0186027a08      88 ffffff018d3a2dd0 AcpiOsAllocate+0x1c
ffffff0186027a08       3 ffffff0193bfece8 AcpiOsAllocate+0x1c
ffffff0186027a08       1 ffffff018c4c6b60 AcpiOsAllocate+0x1c
ffffff0186025a08       2 ffffff018c9d4490 AcpiOsAllocate+0x1c
ffffff0186025a08       1 ffffff018f2b1570 AcpiOsAllocate+0x1c
ffffff0186027a08       2 ffffff018c92b368 AcpiOsAllocate+0x1c
ffffff0186027a08       1 ffffff018c92b518 AcpiOsAllocate+0x1c
ffffff0186027a08     187 ffffff019b3ae480 AcpiOsAllocate+0x1c
ffffff0186027a08       2 ffffff018c4d1978 AcpiOsAllocate+0x1c
ffffff0186027a08      17 ffffff0193bfedc0 AcpiOsAllocate+0x1c
ffffff0186027a08       3 ffffff018c92b290 AcpiOsAllocate+0x1c
ffffff0186027a08     190 ffffff01adb50628 AcpiOsAllocate+0x1c
ffffff0186027a08       2 ffffff018c92b6c8 AcpiOsAllocate+0x1c
ffffff0186025a08     510 ffffff018c67fd20 AcpiOsAllocate+0x1c
ffffff0186027a08       1 ffffff018c9ab2d0 AcpiOsAllocate+0x1c
ffffff0186025a08       4 ffffff018c67eec8 AcpiOsAllocate+0x1c
ffffff0186025a08       2 ffffff01ab2d9650 bd_errstats_setstr+0x6e
ffffff0186027288       2 ffffff01925646c0 bd_errstats_setstr+0x6e
ffffff0186025788       2 ffffff01acbeb018 bd_errstats_setstr+0x6e
ffffff0186027288       1 ffffff01905d7ec8 bd_errstats_setstr+0x6e
ffffff0186025788       1 ffffff01ab6374a8 bd_errstats_setstr+0x6e
ffffff0186025a08       1 ffffff01ab2d99b0 bd_errstats_setstr+0x6e
ffffff018602a788       1 ffffff01a72c7c00 ufm_cache_fill+0x1c0
------------------------------------------------------------------------
           Total    1049 buffers, 54944 bytes
```
No leaks caused by my change. I will log a ticket for bd_errstats_setstr leaks.
Created #12009
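For anyone wanting to repeat the check, the sketch below is one possible workflow rather than the exact commands used here; `<id>` and the dump file names are placeholders, and ::findleaks only gives useful results when kmem debugging is enabled (for example on a DEBUG kernel).

```
# modinfo | grep nvme      # note the id of the loaded nvme module
# modunload -i <id>        # unload it, so anything it failed to free shows as a leak
# reboot -d                # force a crash dump on the way down;
                           # savecore writes the dump under /var/crash after reboot
# mdb <unix file> <vmcore file>
> ::findleaks -f           # -f forces a full scan instead of using cached results
```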
Updated by Electric Monk over 3 years ago
- Status changed from In Progress to Closed
- % Done changed from 80 to 100
git commit 4d95620bc3105916e69c40cff8e2e3d55bd6c4ae
```
commit 4d95620bc3105916e69c40cff8e2e3d55bd6c4ae
Author: Paul Winder <paul@winders.demon.co.uk>
Date:   2019-11-21T16:46:27.000Z

    11827 Increase concurrency through blkdev
    11847 The nvme cmd completion queue is contentious
    Reviewed by: Robert Mustacchi <rm@fingolfin.org>
    Reviewed by: Hans Rosenfeld <rosenfeld@grumpf.hope-2000.org>
    Reviewed by: Matthias Scheler <matthias.scheler@wdc.com>
    Approved by: Dan McDonald <danmcd@joyent.com>
```
Updated by Paul Winder over 3 years ago
- Related to Bug #12017: Assertion failure in kstat_waitq_to_runq from blkdev added