11281 XHCI polled mode support for USB keyboards

Review Request #2028 — Created June 28, 2019 and submitted


These changes add limited polled mode support to the XHCI driver:
- only input is supported
- only USB interrupt transfers are supported

This is good enough to get a USB keyboard working to be able to use the kernel debugger.

I've tested this on a Celistica Athena. Without my change "mdb -K" on the IPMI remote console will result in a hung system.

With my change the kernel debugger prompt appears and I'm able to request a stack trace with "$C" and exit the debugger with ":c".

  • 0
  • 0
  • 19
  • 6
  • 25
Description From Last Updated
  1. Thanks for working on this. A general question, did you consider ever spinning up a second event ring associated with an interrupt that was never enabled? That might get us out of a lot of the issues where we're trying to share things and process them in a way that they conflict, especially as there's nothing that stops someone from dropping into the system console while other USB devices are being used.

    This is a good start.

    1. Yes, I've considered spinning up a second ring. But my understanding of the USB protocol and the xHCI is still very limited. And spinning up a second ring felt like a massive change that possibly require changes to the existing code path which would involve considerably more risk and testing.

    2. OK. I understand that perspective. It looks like we've changed things such that we won't actually drop events on the ground any more, which is good. Thanks for doing that.
  2. I don't think this comment makes sense any more. We should add a note of this file in xhci.c in the big theory statement. I suspect we likely want to discuss polling, so can you please add a section on that there?
    1. I've added a section and added xhci_polled.c to the list of files.

  3. usr/src/uts/common/io/usb/hcd/xhci/xhci_polled.c (Diff revision 2)
    Why is the forward declaration requiered? Is it because we can't do the __KVPRINTFLIKE on the normal function?
    1. Because you can only put these decorators on function prototypes and not the actual definition of a function.

  4. Can this please be formatted the same as everything else, so we're not lining up the arguments, but concatentated like the other functions.
    1. Fixed (Copy & Pasta)

  5. Would you mind making this file look like all the other files in the driver and not tabbing out between the variable type and declaration. If possible, I'd prefer that the driver looks as similar as possible.
    1. The layout was based on "ehci_polled.c". I'm happy to fix it.

  6. Please do explicit comparisons, as in != NULL.
    1. Yes, I should (Copy & Pasta again)

  7. Shouldn't we sanity check that this is non-NULL and return failure if not?
  8. Shouldn't this be static?
  9. If we have short data at this point in the transfer, why does that represent an underrun? Could we have a comment here that explains the why?
    1. I'll be honest: I don't know whether this is correct or not.

      This routine was pieced together from xhci_endpoint_norm_callback() and I left everything in that seemed necessary.

    2. OK. It seems like what's happens is we've paired down everything and we're assuming that we'll never have USB_ATTRS_SHORT_XFER_OK set. Is that accurate? If that's the case, then it makes sense that we're basically saying that then we'll have ended up with an underrun.
  10. Why is it OK to ignore the return value here?
    1. Because I cannot think of way to handle this. Here we've successfully received a USB scan code so we should return USB_SUCCESS. If we cannot reschedule a transfer we will probably not be able to receive further scan codes. The only way to handle this would be to panic the machine. But that might only re-enter the kernel debugger. And the is always the chance that the USB scan code complets a KMDB command the user gets something out of the machine.

      I've added a comment explaining this and a mechanism for persisting this error to make further reads fail.

    2. Thanks for writing the comment and expanding on what we're doing there.
  11. What has guaranteed that we are only ever going to have events in the ring that relate just to the console usage here? It's not clear to me that this is a safe assumption and dropping to the console after boot wouldn't lead to us breaking something.
    1. Non polled I/O related events are no forwarded to the normal driver code, see below.

  12. Why is it OK to just drop the non-polled transfers?
    1. The problem is how to handle them. I've now added code which passes everything that we cannot handle in polled mode to the normal driver code. And that seems to work fine.

      The idea behind the previous behaviour was to minise the risk of doing something that wouldn't work under KMDB.

    2. Thanks for making the change. With this it's probably less important to look at the second ring.
  13. usr/src/uts/common/io/usb/hcd/xhci/xhci_polled.c (Diff revision 2)
    Why exactly are interrupts relevant to this initialization? I understand why you want to hold the xhci_lock during this whole bit, but it's not clear to me why interrupts matter here.
    1. The comments are copied & pasted. I've removed them.

  14. usr/src/uts/common/io/usb/hcd/xhci/xhci_polled.c (Diff revision 2)
    Please verify that we have a non-NULL value before just passing it to kmem_free().
  15. Lint directives are unnecessary.
  16. Do we need to be disabling interrupts across entry to and from this function? A lot of the comments around here suggest that we're trying to avoid racing with interrupts, so that might be a good thing to do. It also seems like the general endpoint scheduling code should note that we have polled mode on and not or in the bit to generate an interrupt on completion.
    1. As far as I understand the kernel debugger has disabled interrupt handling anyway. So there's nothing to do for us in that regard.

  17. Why do we want to try and enter? Is there a reason we don't want to block for other activity to be set?
    1. My assumption (did I mention that kmdb's internals are underdocumented?) is that we are basically singe threaded at this point. If the lock is taken we would wait forever and the machine would hang. If the call however fails you would hope that kmdb realizes that it cannot initiate console I/O and handles this by e.g. dropping back to the operating system on in case of asynchronous entry reboots the machine.

    2. OK. If the context for this is that we're being called form kmdb context, then that's reasonable. Thanks for the explanation.
  18. I feel like we should get this from xep_type.
    1. I've changed this and added an extra check to xhci_polled_init().

  19. Can we add an additional state bit to the xhci_endpoint_state_t enumeration and make sure that we update when we're in polling mode on an endpoint in the driver state tracking?
  20. Why are we only doing a tryenter here?
    1. The same reason that we used mutex_tryenter() above: to avoid a permanent hang

      It's of course possible to change those two calls. But I think the current solution is better because it at least has a chance of some sensible error handling.

  21. So, I feel like the interrupt management here is unclear. Shouldn't none of this be generating interrupts? I don't understand why we need to manipulate this.
    1. Without this extra code I experienced hangs after entering one character. I also means that the register is in the correct state when we leave the kernel debugger.

      As far as I understand interrupt generation is still enabled but they are simply not served.

    2. I guess the theory here is that if an Interrupt was pending, we're going to clear it as we've basically done all the logical work of doing so anyways. I guess that makes sense.
  22. usr/src/uts/common/io/usb/hcd/xhci/xhci_polled.c (Diff revision 2)
    We aren't currently ever setting this. What kinds of ignorned events do you see as being critical here?
    1. This mechanism got defunct when I made some changes based on the testing on the Athena system because it guaranteed a kernel panic when existing the kernel debugger (see above). I've now added one appropriate check back which sets it.

  23. Why can't all of this be a part of xhci.h?

    1. The other USB HCD drivers have separate include files. But I'm happy to move this into "xhci.h" which I didn't want to clutter too much.

  24. usr/src/uts/common/sys/usb/hcd/xhci/xhci_polled.h (Diff revision 2)
    Conventionally argument names are left out of forward declarations.
  25. Where does 8 come from, please comment this with an explanation like the other constants in xhci.h.
    1. This all came for the EHCI polled code. It's amount of space we need to store a USB HID scan code packet.

      I got rid off the symbol and expanded the comment above the buffer member in xhci_polled_t.

  26. usr/src/uts/common/sys/usb/hcd/xhci/xhci_polled.h (Diff revision 2)
    I don't see this value ever set to B_TRUE in this change. Is it actually necessary?
    1. I've replaced this with a persistent error.

  1. Thanks for the follow ups here, I appreciate it.

  1. Just one note, (c) is not really needed in copyright notice, isnt it so?

    1. Possibly not. I'll keep that in mind for future changes.

  1. Ship It!
Review request changed

Status: Closed (submitted)