Project

General

Profile

Actions

Bug #14570

closed

t4nex uses the wrong DDI register attributes

Added by Robert Mustacchi 5 months ago. Updated 5 months ago.

Status:
Closed
Priority:
Normal
Category:
driver - device drivers
Start date:
Due date:
% Done:

100%

Estimated time:
Difficulty:
Medium
Tags:
Gerrit CR:

Description

While working on a system, I found myself quite perplexed that we found the following messages while trying to use a T6. In particular, we were failing to get the initial flash parameters and the following code was returning EAGAIN:

10070         ret = t4_get_flash_params(adapter);
10071         if (ret < 0) {
10072                 CH_ERR(adapter,
10073                        "Unable to retrieve Flash parameters ret = %d\n", -ret);
10074                 return ret;
10075         }

I was able to verify with a logic analyzer that we were successfully seeing the I/O the NOR SPI flash on the device. While working through this we had a series of other observations:

  • Using kmdb to step through this often made this work (though things would fail later)
  • Enabling a general DTrace probe in the driver, e.g. dtrace -Z -n 'fbt:t4nex::' caused us to progress further, but fail later on

Initially as this timeout was coming from the t4_wait_op_done() operation, I surmised that it was possible that we had a timing issue with TSC calibration ala (#14454); however, we notably weren't using the HPET on this system and by trying to manually validate this as much as a human can, it seemed more or less in line.

After we had ruled out timing, for bad reasons, we turned out attention to questions of page tables and memory mappings. Here we found something fascinating: the main T6 BAR requests that the data order be DDI_UNORDERED_OK_ACC. This (somewhat incorrectly and the subject of its own, longer bug) translates into us allowing this register to be cached! And in fact, when we think about what the side effects of DTrace and kmdb traps and single stepping would be, is they would translate into more cache pressure and in particular with kdmb, %cr3 changes that cause further invalidations.

The problem here is that setting DDI_UNORDERED_OK_ACC means that in theory we can change the order of the requests being issued; however, the strict design of the driver interface definitely does not allow that. In this model it explicitly sets register in a particular order and then polls for completion. The driver really should be specifying DDI_STRICTORDER_ACC. This would give it the semantics that we'd expect.

Now, the there are two obvious follow up questions from this. The first is: But if this is broken in the driver how does this work at all?

The biggest issue between this system i was debugging and others was actually the use of MTRRs to set attributes on regions of memory. In general, the way that x86 works is that it takes the more restrictive (uncacheable being more restrictive, cachable being less) and applies that based on a combination of the page table entries and the values in the MTRRs. This system in question did not have any MTRRs set for this purpose, while others may traditionally end up using those and mark all of the regions of MMIO as uncachable. This is something that we currently end up relying on; however, most other drivers all set this correctly (in fact this is basically one of the only drivers in the entire system that ask for this register access ordering). And while one could argue that all systems should have MTRRs set up, there is no true expectation here and in fact other OSes happily do not rely on them as a backup. We should always be setting what we mean in the driver.

The next bit here is what to do with the user bar, which uses the DDI_MERGING_OK_ACC attribute. While there is discussion in the T6 documentation that there is some write-combining support in this BAR, this generally is not going to be doing what is expected, we believe, as this is meant to be for a set of doorbells. There are a bunch of gotchas around programming the priority register for combining reads and writes that lead me to question what this should be. Based on our experiences above and the question of whether or not there should actually be write-combining given what happens (and the fact that on every system to date that has not been write-combining), we should map change that as well while we're here.

Actions #1

Updated by Electric Monk 5 months ago

  • Gerrit CR set to 2068
Actions #2

Updated by Robert Mustacchi 5 months ago

To test this we first used this on the system that was having issues attaching the device driver and writing the firmware. At this point it now works correctly and we can attach and enumerate the device. It has successfully written and updated its flash as well as gone through and able to use the Ethernet lanes, etc.

To perform additional regression testing, a colleague went and set up two T6s back to back and used netperf, via netperf -P 16 -t 60 -c fe80::207:43ff:fe65:bd60 and netperf -P 16 -t 600 -s. We saw no change in the average throughput on that system, approximately 58 Gbit/s.

Actions #3

Updated by Electric Monk 5 months ago

  • Status changed from New to Closed
  • % Done changed from 0 to 100

git commit fe07c600e1e51d75e87d45aef001e9d6131ecdb8

commit  fe07c600e1e51d75e87d45aef001e9d6131ecdb8
Author: Robert Mustacchi <rm@fingolfin.org>
Date:   2022-03-28T15:16:07.000Z

    14570 t4nex uses the wrong DDI register attributes
    Reviewed by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com>
    Reviewed by: Gordon Ross <Gordon.W.Ross@gmail.com>
    Approved by: Dan McDonald <danmcd@joyent.com>

Actions

Also available in: Atom PDF