6124 NLM server holds vnodes too long

Review Request #84 — Created Aug. 14, 2015 and submitted

marcel
illumos-gate
6124
general

webrev: http://cr.illumos.org/~webrev/marcel/il-NEX-2966-nlm-vhold/

The NLM server creates an nlm_vhold structure for every vnode it touches to make sure the vnode is properly held while there are any active locks for that vnode. There is one nlm_vhold structure allocated for every host/vnode pair, even when the number of locks on the particular vnode is higher than one. The nlm_vhold structure is not used to track all of the locks; its purpose is just to mark a vnode as "touched" and held for the particular host.
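
For illustration, a minimal sketch of such a per-host vhold record is shown below. The field names nv_vp and nv_refcnt are taken from the code snippets quoted later in this review; the include, the list link, and the exact layout are assumptions, not the actual illumos definition.

    #include <sys/vnode.h>

    /*
     * Illustrative sketch only -- not the real illumos nlm_vhold.
     * One such record exists per host/vnode pair; it marks the vnode as
     * held for that host instead of tracking the individual locks.
     */
    struct nlm_vhold_sketch {
            vnode_t *nv_vp;                    /* the held vnode (name from review) */
            int     nv_refcnt;                 /* active users of the vhold (name from review) */
            struct nlm_vhold_sketch *nv_next;  /* hypothetical per-host list link */
    };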

Once a lock is released for the particular vnode, the NLM implementation does not free the nlm_vhold structure, since we do not know whether that lock was the last one (leaving the vnode no longer locked by the particular host) or not. Determining this requires some expensive tests, so we defer them to nlm_gc() - the NLM garbage collector.

In nlm_gc(), nlm_vhold structures are scanned to see whether they are still needed; if not, they are freed. The problem is that nlm_gc() checks nlm_vhold structs only for some idle hosts. The non-idle hosts are rarely checked for unused nlm_vhold structs, so such vholds are released never or too late. This causes the affected vnodes to be held for too long, even after the files are removed.
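
With the fix, nlm_gc() decides whether a vhold can be freed using the test quoted verbatim in the review comments below. A rough sketch of that decision (the surrounding loop over hosts/vholds and the locking are omitted, so this is illustrative only):

    /*
     * Sketch of the GC decision; the condition is taken from the diff
     * quoted in the review below, iteration and locking are omitted.
     */
    if (nvp->nv_refcnt == 0 &&
        nvp->nv_vp->v_filocks == NULL &&
        nvp->nv_vp->v_shrlocks == NULL) {
            /* nobody uses the vhold and the vnode has no locks or shares */
            nlm_vhold_destroy(hostp, nvp);
    }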

I ran the test from the bug report to make sure the problem is no longer reproducible. With the fix, the NLM garbage collector properly releases the vhold structure, the vnode reference count is decreased, and the space is properly reclaimed.

pcd
  1. Overall, looks good! Just one question and a couple nits.

  2. usr/src/uts/common/klm/nlm_impl.c (Diff revision 1)

    What's the performance of this operation like? Checking the fields is fast, obviously, but what about the freeing? Is that something we want to do while holding the global lock? If it's not, we could remove them from their lists and stick them on a to-free list, and then free them while the global lock isn't being held (though we'll have to re-hold the host locks to do that).

    1. This is a very good question. I do not know. I just know that we have similarly complex functions running with the global lock held, for example nlm_unexport(). If we find in real life that this piece of code generates some lock contention (or some other performance issue), we could improve it later.

      I just tried to measure nlm_vhold_destroy() and found that it is about 25-35 times slower than AVL_NEXT() on this particular platform (average values from 10k+ nlm_vhold_destroy() runs and 400k+ AVL_NEXT() runs). I'm not sure it is worth making the freeing of vholds more complex (using the to-free list, for example) to save this time. As I said: once it is proven we need to improve it, we can do it later as a separate project.

      Thanks.

    2. That's a fair point. I agree we probably don't need to worry about it right now.
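
      For reference, the deferred-free pattern suggested above would look roughly like the sketch below: unlink the unused vholds while the lock is held, then free them after dropping it. This is only an illustration of the idea, not what the final change does; the list and helper names (g_lock, host_vholds_first, vhold_is_unused, vhold_free) are hypothetical.

        /*
         * Hypothetical sketch of the "to-free list" idea; identifiers are
         * illustrative, not actual illumos names.  Re-acquiring any
         * per-host locks needed for the actual free is glossed over.
         */
        struct vhold_sketch *nvp, *next, *to_free = NULL;

        mutex_enter(&g_lock);
        for (nvp = host_vholds_first(hostp); nvp != NULL; nvp = next) {
                next = host_vholds_next(hostp, nvp);
                if (vhold_is_unused(nvp)) {
                        host_vholds_unlink(hostp, nvp); /* detach under the lock */
                        nvp->nv_next = to_free;         /* push on a private list */
                        to_free = nvp;
                }
        }
        mutex_exit(&g_lock);

        /* The actual freeing happens with the global lock dropped. */
        while (to_free != NULL) {
                nvp = to_free;
                to_free = nvp->nv_next;
                vhold_free(nvp);
        }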

  3. usr/src/uts/common/klm/nlm_impl.c (Diff revision 1)

    [nit]: for if statements that are this many lines, adding braces can improve readability. Just personal preference, though, feel free to disregard.

    1. My personal preference for this particular condition is without the braces, because the condition is already formatted in such a way that the readability should be good enough. (And for me: fewer tokens, less confusion, usually :-)

      Consider the difference:

        395 +                                if (nvp->nv_refcnt == 0 &&
        396 +                                    nvp->nv_vp->v_filocks == NULL &&
        397 +                                    nvp->nv_vp->v_shrlocks == NULL)
      
        395 +                                if ((nvp->nv_refcnt == 0) &&
        396 +                                    (nvp->nv_vp->v_filocks == NULL) &&
        397 +                                    (nvp->nv_vp->v_shrlocks == NULL))
      

      I think it is almost the same.

      But, OTOH, I have no strong preference for going without the braces, so in case you (or somebody else) really want to see the braces there, I'll add them. Just let me know.

    2. Sorry, I meant adding curly braces around the body of the if statement:

      395 +                                if (nvp->nv_refcnt == 0 &&
      396 +                                    nvp->nv_vp->v_filocks == NULL &&
      397 +                                    nvp->nv_vp->v_shrlocks == NULL) {
      398 +                                        nlm_vhold_destroy(hostp, nvp);
      399 +                                }
      

      But, again, not a big deal.

    3. Ah, that one. Yes, this definitely makes sense. Braces added.

  4. usr/src/uts/common/klm/nlm_impl.c (Diff revision 1)

    [nit]: see above

marcel
gwr
  1. For some background: the old (closed source) NLM apparently never dropped its vnode holds until the NFS export went away (i.e. unmount).

    When designing the GC functionality Dan K. and I assumed it would be sufficient to GC those holds in "lazy" fashion, looking at the holds under a given NLM host only when it has been idle for a while.
    The recent use case where someone would like deleted files to recover space faster raises an interesting new requirement that the previous design does not satisfy.

    The proposed fix makes this GC function much more "aggressive" (I wouldn't call it "lazy" anymore, as it now operates on every host).
    I'm not sure that's the best approach. That could introduce quite a lot of activity in these data structures that wasn't there before.
    (right? It's also just very different from the original "lazy" design)

    What other approaches were considered for fixing this?

    Another couple of ideas (thinking as I write this):
    Implement "last used" times for vnode holds like we do for hosts (update them in vhold get), and only touch those vholds that have not been used recently (a rough sketch of this idea follows after this comment).
    Implement a notification method, i.e. some nlm_vnode_deleted() function to be called by file-system implementations (like ZFS) to tell NLM it might want to drop its vhold, if it has one.
    (that might eliminate the need to make GC of vholds more aggressive)
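
    A rough sketch of the "last used" idea, assuming a timestamp field is added to the vhold and refreshed on every vhold get; the field (nv_last_used), the threshold (vhold_idle_ticks) and the placement are hypothetical, not part of the actual change:

      /*
       * Hypothetical "last used" tracking for vholds; names are made up
       * for illustration.  Both fragments belong inside existing code
       * paths, so declarations and locking are omitted.
       */

      /* in the vhold-get path: remember when the vhold was last used */
      nvp->nv_last_used = ddi_get_lbolt();

      /* in the nlm_gc() vhold loop: skip vholds used recently */
      if (ddi_get_lbolt() - nvp->nv_last_used < vhold_idle_ticks)
              continue;               /* still warm, leave it alone */

      if (nvp->nv_refcnt == 0 &&
          nvp->nv_vp->v_filocks == NULL &&
          nvp->nv_vp->v_shrlocks == NULL)
              nlm_vhold_destroy(hostp, nvp);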

    1. I just tested OpenIndiana 151a8 with the closed source NLM implementation and bug 6124 is not reproducible there. It looks like the old closed source NLM dropped the vnode hold immediately once the file was deleted, because ZFS reclaimed the space occupied by the file in less than 10 seconds after it was deleted (the space reclaim is even faster than with the new open source NLM with my fix).

      Yes, the proposed fix makes the vhold release more aggressive than before, but it could still be considered lazy. The previous "lazy" implementation was too lazy, so lazy that some vholds were never released. My new implementation is a bit less lazy, but it still usually releases vholds only after many minutes of waiting - the nlm_gc() idle period is 5 minutes by default, IIRC.

      The "last used" idea: The vhold/vnode release is cheap, so I do not see any gain with such proposal/implementation.

      The "notification" idea: This would need a modification in all other filesystems (or in the generic vnode manipulation/VOP routines) for unknown gain. That's hardly justifiable. But maybe I do not see something obvious...

    2. I don't like that the GC thread now visits every vhold in the collection on each pass (which it didn't do before).
      However, given that it's not doing anything expensive with each vhold, I'll "hold my nose".

gwr
  1. Ship It!
marcel
Review request changed

Status: Closed (submitted)

Change Summary:

commit b1087aec0c7377254423f3429c35960b116e8b34
Author:     Marcel Telka <marcel.telka@nexenta.com>
AuthorDate: Mon Aug 17 20:07:18 2015 +0200
Commit:     Dan McDonald <danmcd@omniti.com>
CommitDate: Tue Aug 25 17:17:15 2015 -0400

    6124 NLM server holds vnodes too long
    Reviewed by: Alek Pinchuk <alek.pinchuk@nexenta.com>
    Reviewed by: Dan Fields <dan.fields@nexenta.com>
    Reviewed by: Paul Dagnelie <pcd@delphix.com>
    Reviewed by: Gordon Ross <Gordon.W.Ross@gmail.com>
    Approved by: Dan McDonald <danmcd@omniti.com>

:100644 100644 a10ba86... 0f57ec4... M  usr/src/uts/common/klm/nlm_impl.c