Project

General

Profile

Bug #3746

ZRLs are racy

Added by Will Andrews about 6 years ago. Updated over 2 years ago.

Status:
Closed
Priority:
Normal
Assignee:
-
Category:
-
Start date:
2013-04-23
Due date:
% Done:

100%

Estimated time:
Difficulty:
Medium
Tags:
needs-triage

Description

From the original change log:

It was possible for a reference to be added even with the lock held, and
for references added just after a lock release to be lost.

This bug was also independently found and reported in wesunsolve.net
issues 6985013 6995524.

In zrl_add(), always use an atomic operation to update the refcount.
The mutex in the ZRL only guarantees that wakeups occur for waiters on the
lock. It offers no protection against concurrent updates of the refcount.
The only refcount transition that is safe to perform without an atomic
operation is from ZRL_LOCKED back to 0, since this can only be performed
by the thread which has the ZRL locked.


Files

zrlock-issue.txt (11.9 KB) zrlock-issue.txt one of our servers had lots of threads blocked due to a zrlock in LOCKED state Youzhong Yang, 2016-09-23 03:07 PM
zrlock-test.c (5.52 KB) zrlock-test.c test program reproducing the issue, and with proposed fix for zrl_add_impl Youzhong Yang, 2016-09-23 03:07 PM

History

#1

Updated by Ed Maste over 3 years ago

We've encountered this issue again in the FreeBSD/arm64 port:
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=204037

#2

Updated by Youzhong Yang over 2 years ago

We encountered this issue on one of our SmartOS servers, see attached file zrlock-issue.txt.

I was able to reproduce the issue using the attached C program zrlock-test.c:

./zrlock-test 
assertion failed for thread 0xfffffd7ffef03240, thread-id 9: (int32_t)n >= 0 (0xffffffffffffffff >= 0x0), file zrlock-test.c, line 101

And with my proposed fix, it worked very well and I haven't seen a single assertion.

void
zrl_add_impl(zrlock_t *zrl, const char *zc)
{
    uint32_t n = (uint32_t)zrl->zr_refcount;

    while (1) {
            while (n != ZRL_LOCKED) {
                    uint32_t cas = atomic_cas_32(
                        (uint32_t *)&zrl->zr_refcount, n, n + 1);
                    if (cas == n) {
                            ASSERT3S((int32_t)n, >=, 0);
                            return;
                    }
                    n = cas;
            }
            mutex_lock(&zrl->zr_mtx);
            while (zrl->zr_refcount == ZRL_LOCKED) {
                    cond_wait(&zrl->zr_cv, &zrl->zr_mtx);
            }
            n = (uint32_t)zrl->zr_refcount;
            mutex_unlock(&zrl->zr_mtx);
    }
}

#4

Updated by Electric Monk over 2 years ago

  • Status changed from New to Closed
  • % Done changed from 0 to 100

git commit 260af64db74a52d64de8c6c5f67dd0a71d228ca5

commit  260af64db74a52d64de8c6c5f67dd0a71d228ca5
Author: Youzhong Yang <yyang@mathworks.com>
Date:   2016-10-12T17:14:32.000Z

    3746 ZRLs are racy
    Authored by: Will Andrews <will@freebsd.org>
    Reviewed by: Boris Protopopov <bprotopopov@hotmail.com>
    Reviewed by: Pavel Zakharov <pavel.zakha@gmail.com>
    Reviewed by: Yuri Pankov <yuri.pankov@gmail.com>
    Reviewed by: Justin T. Gibbs <gibbs@scsiguy.com>
    Approved by: Matt Ahrens <mahrens@delphix.com>

Also available in: Atom PDF