Bug #13691

closed

mlxcx panicked in mac_link_update()

Added by Robert Mustacchi over 2 years ago. Updated over 2 years ago.

Status: Duplicate
Priority: Normal
Category: driver - device drivers
Start date:
Due date:
% Done: 0%
Estimated time:
Difficulty: Medium
Tags:
Gerrit CR:
External Bug:

Description

A SmartOS user hit a panic and shared the following screenshot: https://pasteboard.co/JVjZIOP.jpg. It showed the system panicking while dereferencing address 0x9c in mac_link_update(), called from mlxcx. As Dan McDonald noted, the faulting address matches the displacement in the instruction at mac_link_update+1, movl %esi,0x9c(%rdi), which suggests that we had a NULL mac_handle_t. Looking at the mlxcx driver source, the key observation is that the mac handle is the last thing created and registered in attach and the first thing to go in detach. With that, there are two ways this could have happened (it is hard to say which, due to the lack of a crash dump):

1. Because we create the taskq and enable interrupts before we call mac_register(), it is possible to take an asynchronous interrupt first and thus reach mlxcx_update_link_state() with a NULL handle.

2. Because the link state event is processed asynchronously on the taskq, an event could have been dispatched and then detach() called. In this case, the window is between detach unregistering with mac and the call to taskq_destroy(), with the queued task running in between.
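The NULL-handle reasoning in the description can be checked with a little arithmetic: the effective address of movl %esi,0x9c(%rdi) is %rdi plus 0x9c, so a NULL %rdi faults at exactly 0x9c. The sketch below is only an illustration. The structure, its field names, and the padding are hypothetical stand-ins and do not reflect the real layout behind a mac_handle_t.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for the structure behind a mac_handle_t.
 * Only the 0x9c displacement comes from the bug report; the names
 * and the padding are assumptions for illustration.
 */
struct fake_mac_impl {
	uint8_t		fm_pad[0x9c];	/* whatever precedes the field */
	uint32_t	fm_field;	/* 32-bit field written by the movl */
};

/* The assumed layout puts the field at the displacement from the panic. */
static_assert(offsetof(struct fake_mac_impl, fm_field) == 0x9c,
    "field must sit at displacement 0x9c");

/*
 * Effective address of `movl %esi,0x9c(%rdi)` for a given %rdi value:
 * base register plus displacement.
 */
static uintptr_t
store_address(uintptr_t base)
{
	return (base + offsetof(struct fake_mac_impl, fm_field));
}
```

With a NULL handle the base is 0, so store_address(0) is 0x9c, the same address reported in the panic.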

To fix both of these, we need two things:

1. We should make mlxcx_update_link_state() gate itself on having a valid mac handle.
2. The teardown path should issue a taskq_wait() to make sure that all outstanding entries on the asynchronous taskq have finished processing. Just doing (1) is insufficient because the mlxcx_t itself could already be invalid by the time the task runs (though we'd panic a bit sooner).
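The two fixes above can be sketched as a small single-threaded userland model. This is not the mlxcx code: every model_* name is hypothetical, the "taskq" is just an array that is drained on demand, and the exact placement of the wait within teardown is an assumption. It shows only the shape of the idea: the link state handler skips work when the handle is NULL, and teardown drains queued work before anything is freed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define	MODEL_MAXTASKS	8

/* Hypothetical stand-ins for the mac handle and the mlxcx_t. */
typedef struct model_mac {
	int	mm_linkstate;
} model_mac_t;

typedef struct model_dev {
	model_mac_t	*md_mac_hdl;	/* NULL before register, after unregister */
	int		md_pending[MODEL_MAXTASKS];	/* queued link states */
	size_t		md_npending;
	unsigned	md_skipped;	/* events dropped by the NULL gate */
} model_dev_t;

/* Fix 1: gate the update on having a valid handle. */
static bool
model_update_link_state(model_dev_t *dev, int state)
{
	if (dev->md_mac_hdl == NULL) {
		dev->md_skipped++;
		return (false);
	}
	dev->md_mac_hdl->mm_linkstate = state;
	return (true);
}

/* Interrupt path: queue the event instead of handling it inline. */
static bool
model_dispatch(model_dev_t *dev, int state)
{
	if (dev->md_npending == MODEL_MAXTASKS)
		return (false);
	dev->md_pending[dev->md_npending++] = state;
	return (true);
}

/* Stands in for taskq_wait(): run everything that is queued. */
static void
model_taskq_wait(model_dev_t *dev)
{
	for (size_t i = 0; i < dev->md_npending; i++)
		(void) model_update_link_state(dev, dev->md_pending[i]);
	dev->md_npending = 0;
}

/* Fix 2: teardown drains outstanding work before tearing anything down. */
static void
model_detach(model_dev_t *dev)
{
	dev->md_mac_hdl = NULL;		/* models unregistering with mac first */
	model_taskq_wait(dev);		/* stragglers run now and hit the gate */
	assert(dev->md_npending == 0);
	/* Only past this point would destroying the taskq and freeing dev be safe. */
}
```

In this model an event queued before registration, or one still queued when detach starts, is counted in md_skipped instead of being written through a NULL handle, and nothing remains queued once model_detach() returns.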

Actions #1

Updated by Electric Monk over 2 years ago

  • Gerrit CR set to 1382

Actions #2

Updated by Robert Mustacchi over 2 years ago

  • Status changed from New to Duplicate
  • % Done changed from 70 to 0

Duplicate of #13359.
