Feature #11827

Increase concurrency through blkdev

Added by Paul Winder over 2 years ago. Updated about 2 years ago.

Status: Closed
Priority: Normal
Assignee:
Category: driver - device drivers
Start date:
Due date:
% Done: 100%
Estimated time:
Difficulty: Medium
Tags:
Gerrit CR:

Description

All I/Os which go through blkdev end up on a single waitq/runq pair before being pushed through the parent driver.

The nvme driver in particular uses blkdev, and one premise of NVMe is its ability to support high levels of concurrency. This single queue acts as a pinch point before the I/Os get spread out again into separate submission queues.

It would be much better to allow the blkdev parent to specify how many queues it wants. blkdev will then create the requested number of queues and assign I/O requests to a queue using a simple round-robin algorithm. When a request is passed to the parent, it will carry the number of the queue it was assigned, and the parent can schedule it as it wishes.

In particular for nvme, we would request the same number of blkdev queues as there are submission queues, and a request queued to blkdev queue i will be submitted to submission queue i.


Related issues

Related to illumos gate - Bug #11847: The nvme cmd completion queue is contentious (Closed, Paul Winder)
Related to illumos gate - Bug #9291: NVMe driver is slower than expected (New, 2018-03-16)
Related to illumos gate - Bug #12017: Assertion failure in kstat_waitq_to_runq from blkdev (Closed, Paul Winder)
#1

Updated by Paul Winder over 2 years ago

During vdbench testing I came across #11847.

I used the following settings for vdbench:

hd=localhost,jvms=4
sd=sd1,host=$host,lun=/dev/rdsk/c9t000CCA0C01192080d0s0,openflags=directio

* Workload Definitions (WD)
wd=seqWr,sd=sd1,rdpct=0,rhpct=0,seekpct=0
wd=seqRd,sd=sd1,rdpct=100,rhpct=0,seekpct=0
wd=rndWr,sd=sd1,rdpct=0,rhpct=0,seekpct=100
wd=rndRd,sd=sd1,rdpct=100,rhpct=0,seekpct=100
wd=rndRW,sd=sd1,rdpct=50,rhpct=0,seekpct=100

* Run Definitions (RD)
rd=default,el=300,warmup=300,in=5,forxfersize=(4k),forth=(8,16,32,64),iorate=max,pause=3

rd=runRndRW,wd=rndRW

That is a 50/50 read/write workload at 4k block size with 8, 16, 32 and 64 threads.

IOPS

             8 threads   16 threads   32 threads   64 threads
before           70691       100434       128122       163194
after            81217       135867       205450       247028
% increase          14           35           60           51

Response Time

             8 threads   16 threads   32 threads   64 threads
before           0.107        0.153        0.244        0.388
after            0.093        0.112        0.152        0.256
% reduction         13           27           47           44

CPU Busy (sys)

             8 threads   16 threads   32 threads   64 threads
before             4.5         12.1         20.2         24.9
after              3.7          5.8          6.5          7.5
% reduction         18           51           67           70

This looks like a win all round!

#2

Updated by Paul Winder over 2 years ago

  • Related to Bug #11847: The nvme cmd completion queue is contentious added
#3

Updated by Adam Stylinski about 2 years ago

  • Related to Bug #9291: NVMe driver is slower than expected added
#4

Updated by Paul Winder about 2 years ago

Ran up a DEBUG kernel, modunload'ed the nvme driver, and forced a dump.
Results of ::findleaks:

> ::findleaks   
findleaks: using cached results (use '-f' to force a full run)
CACHE             LEAKED           BUFCTL CALLER
ffffff0186027a08      22 ffffff018d3a1630 AcpiOsAllocate+0x1c
ffffff0186027a08       2 ffffff018c92b440 AcpiOsAllocate+0x1c
ffffff0186027a08       1 ffffff018c941890 AcpiOsAllocate+0x1c
ffffff0186027a08      88 ffffff018d3a2dd0 AcpiOsAllocate+0x1c
ffffff0186027a08       3 ffffff0193bfece8 AcpiOsAllocate+0x1c
ffffff0186027a08       1 ffffff018c4c6b60 AcpiOsAllocate+0x1c
ffffff0186025a08       2 ffffff018c9d4490 AcpiOsAllocate+0x1c
ffffff0186025a08       1 ffffff018f2b1570 AcpiOsAllocate+0x1c
ffffff0186027a08       2 ffffff018c92b368 AcpiOsAllocate+0x1c
ffffff0186027a08       1 ffffff018c92b518 AcpiOsAllocate+0x1c
ffffff0186027a08     187 ffffff019b3ae480 AcpiOsAllocate+0x1c
ffffff0186027a08       2 ffffff018c4d1978 AcpiOsAllocate+0x1c
ffffff0186027a08      17 ffffff0193bfedc0 AcpiOsAllocate+0x1c
ffffff0186027a08       3 ffffff018c92b290 AcpiOsAllocate+0x1c
ffffff0186027a08     190 ffffff01adb50628 AcpiOsAllocate+0x1c
ffffff0186027a08       2 ffffff018c92b6c8 AcpiOsAllocate+0x1c
ffffff0186025a08     510 ffffff018c67fd20 AcpiOsAllocate+0x1c
ffffff0186027a08       1 ffffff018c9ab2d0 AcpiOsAllocate+0x1c
ffffff0186025a08       4 ffffff018c67eec8 AcpiOsAllocate+0x1c
ffffff0186025a08       2 ffffff01ab2d9650 bd_errstats_setstr+0x6e
ffffff0186027288       2 ffffff01925646c0 bd_errstats_setstr+0x6e
ffffff0186025788       2 ffffff01acbeb018 bd_errstats_setstr+0x6e
ffffff0186027288       1 ffffff01905d7ec8 bd_errstats_setstr+0x6e
ffffff0186025788       1 ffffff01ab6374a8 bd_errstats_setstr+0x6e
ffffff0186025a08       1 ffffff01ab2d99b0 bd_errstats_setstr+0x6e
ffffff018602a788       1 ffffff01a72c7c00 ufm_cache_fill+0x1c0
------------------------------------------------------------------------
           Total    1049 buffers, 54944 bytes

No leaks caused by my change. I will log a ticket for the bd_errstats_setstr leaks.
Created #12009.

#5

Updated by Electric Monk about 2 years ago

  • Status changed from In Progress to Closed
  • % Done changed from 80 to 100

git commit 4d95620bc3105916e69c40cff8e2e3d55bd6c4ae

commit  4d95620bc3105916e69c40cff8e2e3d55bd6c4ae
Author: Paul Winder <paul@winders.demon.co.uk>
Date:   2019-11-21T16:46:27.000Z

    11827 Increase concurrency through blkdev
    11847 The nvme cmd completion queue is contentious
    Reviewed by: Robert Mustacchi <rm@fingolfin.org>
    Reviewed by: Hans Rosenfeld <rosenfeld@grumpf.hope-2000.org>
    Reviewed by: Matthias Scheler <matthias.scheler@wdc.com>
    Approved by: Dan McDonald <danmcd@joyent.com>

#6

Updated by Paul Winder about 2 years ago

  • Related to Bug #12017: Assertion failure in kstat_waitq_to_runq from blkdev added