Project

General

Profile

Actions

Bug #11679

closed

vn_rele() and friends should VERIFY after mutex

Added by Dan McDonald about 3 years ago. Updated 7 months ago.

Status:
Closed
Priority:
Normal
Category:
kernel
Start date:
Due date:
% Done:

100%

Estimated time:
Difficulty:
Bite-size
Tags:
Gerrit CR:
External Bug:

Description

vn_rele() and its friends have an antipattern:

void
vn_rele(vnode_t *vp)
{
    VERIFY(vp->v_count > 0);   /* XXX KEBE ASKS WHY HERE? */
    mutex_enter(&vp->v_lock);
    /* XXX KEBE ASKS AND NOT HERE? */
    if (vp->v_count == 1) {
        mutex_exit(&vp->v_lock);
        VOP_INACTIVE(vp, CRED(), NULL);
        return;
    }
    VN_RELE_LOCKED(vp);
    mutex_exit(&vp->v_lock);
}

Before commit b5fca8f855 put this as a VERIFY, it's predecessors had an if v_count check also outside the vp mutex. These should be checked AFTER the mutex gets held. In general the v_count should never be 0 at this point, but the more accurate measurement can only happen post-mutex-hold.


Files

vn_rele.d (484 Bytes) vn_rele.d DTrace script used for testing Spencer Evans-Cole, 2022-02-11 04:19 PM
vn_rele_original.txt (3.39 KB) vn_rele_original.txt DTrace output for original vn_rele() and friends Spencer Evans-Cole, 2022-02-11 04:20 PM
vn_rele_moved.txt (3.26 KB) vn_rele_moved.txt DTrace output for moved vn_rele() and friends Spencer Evans-Cole, 2022-02-11 04:20 PM
vn_rele_else.txt (3.26 KB) vn_rele_else.txt DTrace output for if/else vn_rele() and friends Spencer Evans-Cole, 2022-02-11 04:21 PM
11679.patch (2.29 KB) 11679.patch Patch of Gordon's code review comment Spencer Evans-Cole, 2022-03-03 11:20 PM
Actions #1

Updated by Dan McDonald about 3 years ago

  • Difficulty changed from Medium to Bite-size
Actions #2

Updated by Dan McDonald about 3 years ago

There are other parts of the code that check v_count without holding the v_lock. This bug (and its fix) should only focus on vn_rele() and its friends.

Actions #3

Updated by Dan McDonald about 3 years ago

  • Description updated (diff)
Actions #4

Updated by Dan McDonald 10 months ago

  • Assignee set to Spencer Evans-Cole
Actions #5

Updated by Marcel Telka 10 months ago

Maybe the reason to have the VERIFY() call outside the mutex is simply to save few cycles to make sure the mutex is held as short as possible in possibly hot path.

Actions #6

Updated by Joshua M. Clulow 10 months ago

I would probably make the if an if/else and stick the VERIFY in the else arm. I can't imagine that would have a big impact on performance, and I agree that the kind of race these checks catch requires inspection under the lock generally.

Actions #7

Updated by Marcel Telka 10 months ago

You could look at this the other way too:

Let's assume it is expected (designed) that v_count cannot become zero, even for short period of time and even if that would happen with the v_lock held. For example, let's assume the following code is prohibited:

mutex_enter(&vp->v_lock);
vp->v_count--;
vp->v_count++;
mutex_exit(&vp->v_lock);

If the above assumptions are correct, then VERIFY(vp->v_count > 0) outside the mutex is the only way how to catch such cases.

The question left is: Are the assumptions above the real assumptions for v_count design? I do not know. If they are not, then the move of VERIFY() into the mutex should be okay (assuming it won't affect performance).

Actions #8

Updated by Electric Monk 8 months ago

  • Gerrit CR set to 2016
Actions #9

Updated by Spencer Evans-Cole 8 months ago

I ran tests on 2 different versions of changes to vn_rele() and friends. The first change was the one outlined in the bug report, with VERIFY moved after mutex_enter. The second change was the one Josh outlined, where the if was made an if/else and the VERIFY was put in the else arm.

To test these changes, I ran a DTrace script, attached alongside its output for the original, moved and if/else versions of vn_rele() and friends.

Here was my testing process:
1) Start a build of OmniOS R151040
2) Start the DTrace script
3) pwait for nightly to finish and then run pkill for DTrace

After running the tests, I was able to conclude that the performance hit for all 3 of these modified versions was negligible, and I have decided to go forward with the if/else version of vn_rele() and friends. I believe that this is the best version of the change because I saw no performance difference, and this way the VERIFY won't run if the v_count is 1 which saves an extra compare.

However, I would like to acknowledge, one exception to this in vn_rele_dncl(). In this function, I have decided to use the moved version because by the time it hits the else arm v_count_dncl has been decremented by one, so it might be -1/MAXUINT. By moving it, it is still getting checked after the mutex gets held. Although it isn't following the same pattern as my other changes, the bug is still addressed. I ran tested the if/else version with this change.

I will note that the OmniOS-SmartOS version has 1 more functionn than the Gate version, but they are still addressed with the if/else change.

Webrevs Gate/OmniOS-SmartOS are here: https://kebe.com/~danmcd/webrevs/11679/
Gerrit: https://code.illumos.org/c/illumos-gate/+/2016

Actions #10

Updated by Spencer Evans-Cole 7 months ago

On Gerrit, Gordon pointed out that you could drop the else since if the if runs it will return, so the else is not needed.

Attached is a patch with this change made to all the functions.

Actions #11

Updated by Electric Monk 7 months ago

  • Status changed from New to Closed
  • % Done changed from 0 to 100

git commit 41a4986b48d4a0e9ed138c952a9bad5124a924a6

commit  41a4986b48d4a0e9ed138c952a9bad5124a924a6
Author: Spencer Evans-Cole <spencer.ec09@gmail.com>
Date:   2022-03-09T23:10:44.000Z

    11679 vn_rele() and friends should VERIFY after mutex
    Reviewed by: Dan McDonald <danmcd@joyent.com>
    Reviewed by: Toomas Soome <tsoome@me.com>
    Reviewed by: Gordon Ross <Gordon.W.Ross@gmail.com>
    Approved by: Richard Lowe <richlowe@richlowe.net>

Actions

Also available in: Atom PDF