Bug #12798

mlxcx command interface should allow concurrent commands and be interrupt driven

Added by Paul Winder 5 months ago. Updated 3 months ago.

Status: Closed
Priority: Normal
Assignee:
Category: driver - device drivers
Start date:
Due date:
% Done: 100%
Estimated time:
Difficulty: Medium
Tags:
Gerrit CR:

Description

At the moment only a single command can be submitted to the HCA at a time, and then that command is polled for completion.

The HCA supports up to 32 concurrent commands with interrupt-driven completions; we should move to that.
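As a rough illustration of what tracking up to 32 in-flight commands could look like, here is a minimal single-threaded sketch. It is not the mlxcx code: the type `cmd_queue_t`, the functions `cmd_slot_alloc`, `cmd_intr` and `cmd_slot_free`, and the idea that the completion interrupt delivers a bit vector of finished slots are all assumptions made for the example. A real driver would also need locking and a way for submitters to sleep until their slot completes.

```c
#include <stdint.h>

#define	CMD_SLOTS	32	/* per the issue: up to 32 concurrent commands */

/* Hypothetical bookkeeping for the command slots. */
typedef struct cmd_queue {
	uint32_t cq_busy;	/* bit n set: slot n holds an in-flight command */
	uint32_t cq_done;	/* bit n set: slot n has completed */
} cmd_queue_t;

/* Claim the lowest free slot. Returns its index, or -1 if all are busy. */
int
cmd_slot_alloc(cmd_queue_t *q)
{
	for (int i = 0; i < CMD_SLOTS; i++) {
		uint32_t bit = UINT32_C(1) << i;

		if ((q->cq_busy & bit) == 0) {
			q->cq_busy |= bit;
			q->cq_done &= ~bit;
			return (i);
		}
	}
	return (-1);
}

/*
 * Completion "interrupt" handler. The vector names the slots the hardware
 * reports as finished (an assumption of this sketch). Marks them done and
 * returns how many were newly completed; idle or already-done slots are
 * ignored.
 */
int
cmd_intr(cmd_queue_t *q, uint32_t vector)
{
	uint32_t fin = vector & q->cq_busy & ~q->cq_done;
	int n = 0;

	q->cq_done |= fin;
	for (; fin != 0; fin &= fin - 1)
		n++;
	return (n);
}

/* Release a completed slot. Returns 0 on success, -1 if it is not done. */
int
cmd_slot_free(cmd_queue_t *q, int slot)
{
	uint32_t bit;

	if (slot < 0 || slot >= CMD_SLOTS)
		return (-1);
	bit = UINT32_C(1) << slot;
	if ((q->cq_done & bit) == 0)
		return (-1);
	q->cq_busy &= ~bit;
	q->cq_done &= ~bit;
	return (0);
}
```

In this model a submitter calls `cmd_slot_alloc`, writes its command into the claimed slot, and later calls `cmd_slot_free` once `cmd_intr` has marked the slot done, instead of polling a single slot.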

Alongside this, the method for giving pages to the hardware had to be modified slightly.
  1. The maximum number of pages in a single command was increased from 512 to 4096.
  2. Previously, when the hardware required pages, the driver would loop inside the interrupt thread, giving pages in chunks of 512 until the request was satisfied. When this work was moved to a taskq, the interrupt was no longer blocked, which caused a storm of page request interrupts because each 512-page chunk did not satisfy the hardware's request. The solution is to remove the loop, give pages to the hardware in chunks of up to 4096, and let the h/w raise another interrupt until its needs have been satisfied.
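The effect of the larger chunk size can be shown with a small model. This is a simplified sketch, not the driver's logic: it assumes the hardware raises one page request event per chunk until its need is met, and the names `pages_to_give` and `page_request_events` are hypothetical.

```c
#include <stdint.h>

#define	PAGES_PER_CMD	4096	/* new per-command limit from the issue */

/* Pages to hand over in response to a single page request event. */
uint32_t
pages_to_give(uint32_t requested)
{
	return (requested < PAGES_PER_CMD ? requested : PAGES_PER_CMD);
}

/*
 * Simplified model: the hardware needs `needed` pages and raises a new
 * request event after each chunk until it is satisfied. Returns the number
 * of events required for a given chunk size (0 if chunk is 0).
 */
uint32_t
page_request_events(uint32_t needed, uint32_t chunk)
{
	uint32_t events = 0;

	if (chunk == 0)
		return (0);
	while (needed > 0) {
		uint32_t give = needed < chunk ? needed : chunk;

		needed -= give;
		events++;
	}
	return (events);
}
```

Under this model a request for 10000 pages takes 20 events with 512-page chunks but only 3 with 4096-page chunks, which is the kind of reduction the change is aiming for.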

Related issues

Related to illumos gate - Bug #12797: mlxcx max flow table limit can be exceeded (Closed)
Related to illumos gate - Bug #12799: mlxcx #if defined for MAC_VLAN_UNTAGGED is redundant (Closed)
Related to illumos gate - Bug #12890: mlxcx uses excessive stack space causing stack overflow panic (Closed)

History

#1

Updated by Paul Winder 5 months ago

When Robert released his work-in-progress in 2019, I started investigating and this is one of the first changes I made.

Since then Alex finished off the driver. At Racktop I pulled my changes into it again, and we have been running with them since we started using the driver in production.

#2

Updated by Paul Winder 5 months ago

  • Description updated (diff)
#3

Updated by Paul Winder 5 months ago

  • Subject changed from mlxcx command interface should allow concurrent commands and be interrupt driven to mlxcx command interface should allow concurrent commands and be interrupt drive
#4

Updated by Paul Winder 5 months ago

  • Subject changed from mlxcx command interface should allow concurrent commands and be interrupt drive to mlxcx command interface should allow concurrent commands and be interrupt driven
#5

Updated by Paul Winder 5 months ago

  • Subject changed from mlxcx command interface should allow concurrent commands and be interrupt driven to mlxcx command interface should allow concurrent commands and be interrupt driven.
#6

Updated by Paul Winder 5 months ago

  • Subject changed from mlxcx command interface should allow concurrent commands and be interrupt driven. to mlxcx command interface should allow concurrent commands and be interrupt driven
#7

Updated by Paul Winder 5 months ago

  • Gerrit CR set to 705
#8

Updated by Dan McDonald 3 months ago

  • Related to Bug #12797: mlxcx max flow table limit can be exceeded added
#9

Updated by Dan McDonald 3 months ago

  • Related to Bug #12799: mlxcx #if defined for MAC_VLAN_UNTAGGED is redundant added
#10

Updated by Dan McDonald 3 months ago

  • Related to Bug #12890: mlxcx uses excessive stack space causing stack overflow panic added
#11

Updated by Paul Winder 3 months ago

Testing:
  • The changes are from Racktop's production version of the driver, and as such have been through Racktop QA.
  • The command interface in particular is stressed during driver load and unload. The changes were tested by repeatedly loading and unloading the driver.
  • Basic functionality was tested using iperf to provide load, creating and using aggrs and vnics, and changing MTUs.
#12

Updated by Electric Monk 3 months ago

  • Status changed from In Progress to Closed
  • % Done changed from 80 to 100

git commit 5f0e3176f407dfb9d989b5dcc94a6d5384d0b142

commit  5f0e3176f407dfb9d989b5dcc94a6d5384d0b142
Author: Paul Winder <pwinder@racktopsystems.com>
Date:   2020-07-10T15:41:30.000Z

    12797 mlxcx max flow table limit can be exceeded
    12798 mlxcx command interface should allow concurrent commands and be interrupt driven
    12799 mlxcx #if defined for MAC_VLAN_UNTAGGED is redundant
    12890 mlxcx uses excessive stack space causing stack overflow panic
    Reviewed by: Garrett D'Amore <garrett@damore.org>
    Reviewed by: Andy Stormont <astormont@racktopsystems.com>
    Reviewed by: Robert Mustacchi <rm@fingolfin.org>
    Reviewed by: Alex Wilson <alex@cooperi.net>
    Approved by: Dan McDonald <danmcd@joyent.com>
