6916 Race between nlm_unexport() and nlm_svc_stopping() can cause panic

Review Request #185 — Created April 16, 2016 and updated

marcel
illumos-gate
6916
general

webrev: http://cr.illumos.org/~webrev/marcel/il-nlm_unexport-panic/

There is a bace between nlm_unexport() and nlm_svc_stopping() which can cause panic.

The problem is in nlm_svc_stopping() where the manipulations with nlm_hosts_tree are not protected by nlm_globals->lock; probably in a hope that we are the only thread touching the nlm_hosts_tree. This is obviously not true, and the nlm_svc_stopping() author was aware about that because this is in the comment above the nlm_svc_stopping():

2432 * NOTE: NFS code can call NLM while it's
2433 * stopping or even if it's shut down. Any attempt
2434 * to lock file either on client or on the server
2435 * will fail if NLM isn't in NLM_ST_UP state.

The fix just makes sure the nlm_hosts_tree is properly protected by nlm_globals->lock as all other functions does.

I ran the NFS server for a while with the fix installed. I tried several scenarios similar to the reproduction steps to make sure the fix works properly and no regression is introduced.

  • 0
  • 0
  • 0
  • 1
  • 1
Description From Last Updated
danmcd
  1. 
      
  2. usr/src/uts/common/klm/nlm_impl.c (Diff revision 1)
     
     

    Before g->lock is reacquired, is there any chance that hostp or next_h become invalid? I don't see any obvious reference-holding mechanism on these structures. Pointers to other code can comprise a good answer.

    1. Very good question! :-)

      This works with an idea that there is only single "destroyer" at a time. Under normal conditions
      such "destroyer" is nlm_gc(). In nlm_gc() there is similar code used as I suggest here in nlm_svc_stopping(). The nlm_gc() works with the same assumption: I'm the only "destroyer" so no fear some nlm_host structure might be destroyed under my hands. Before the nlm_svc_stopping() starts its job it first asks the nlm_gc() thread to exit so there is no other concurrent "destroyer" there.

      Yes, this design is a bit strange, but it was here even before my fix. My fix just makes sure that we (the nlm_svc_stopping() function) do not destroy the hostp structure under hands of some other thread - nlm_unexport() in this case.

      Thank you.

    2. I want comments to that effect. This is very hinky-looking to an outsider reviewing it. Please spell it out somewhere among your changes. I can help if need be.

    3. I added some wording. Please let me know if you would like something different. Thank you.

  3. 
      
marcel
Review request changed

Change Summary:

Added a comment as suggested by Dan McDonald.

Diff:

Revision 2 (+45 -40)

Show changes

danmcd
  1. Thank you. That really helped.

  2. 
      
kmays
  1. Ship It!
  2. 
      
Loading...