Bug #4424

devfsadmd and zoneadmds tied in knots

Added by Robert Mustacchi over 5 years ago. Updated over 5 years ago.

Status: Resolved
Priority: Normal
Category: kernel
Start date: 2013-12-24
Due date:
% Done: 100%
Estimated time:
Difficulty: Medium
Tags:

Description

Looking at the stacks, this is really obvious.

> ::stacks -m dev
THREAD           STATE    SOBJ                COUNT
ffffff43105b6760 SLEEP    RWLOCK                  1
                 swtch+0x141
                 turnstile_block+0x262
                 rw_enter_sleep+0x148
                 sdev_access+0x52
                 fop_access+0x54
                 sdev_remove+0x128
                 fop_remove+0x5b
                 vn_removeat+0x382
                 unlinkat+0x59
                 unlink+0x1b
                 _sys_sysenter_post_swapgs+0x149

ffffff43a5193840 SLEEP    RWLOCK                  1
                 swtch+0x141
                 turnstile_block+0x262
                 rw_enter_sleep+0x21d
                 devname_lookup_func+0x886
                 sdev_lookup+0x7c
                 fop_lookup+0xa2
                 lookuppnvp+0x1f6
                 lookuppnatcred+0x15e
                 lookupnameatcred+0xad
                 lookupnameat+0x39
                 vn_openat+0x315
                 copen+0x20c
                 openat+0x2a
                 open+0x25

> ffffff43105b6760::findstack -v
stack pointer for thread ffffff43105b6760: ffffff01ea9afa10
[ ffffff01ea9afa10 _resume_from_idle+0xf4() ]
  ffffff01ea9afa40 swtch+0x141()
  ffffff01ea9afaf0 turnstile_block+0x262(ffffff43a5525da0, 1, ffffff43c9988630, fffffffffbc08cc0, 0, 0)
  ffffff01ea9afb60 rw_enter_sleep+0x148(ffffff43c9988630, 1)
  ffffff01ea9afbe0 sdev_access+0x52(ffffff43c9987640, 80, 0, ffffff42e26900c0, 0)
  ffffff01ea9afc60 fop_access+0x54(ffffff43c9987640, 80, 0, ffffff42e26900c0, 0)
  ffffff01ea9afd00 sdev_remove+0x128(ffffff43c9987640, ffffff439e3d1e30, ffffff42e26900c0, 0, 0)
  ffffff01ea9afd80 fop_remove+0x5b(ffffff43c9987640, ffffff439e3d1e30, ffffff42e26900c0, 0, 0)
  ffffff01ea9afe50 vn_removeat+0x382(0, fe95da8d, 0, 0)
  ffffff01ea9afea0 unlinkat+0x59(ffd19553, fe95da8d, 0)
  ffffff01ea9afec0 unlink+0x1b(fe95da8d)
  ffffff01ea9aff10 _sys_sysenter_post_swapgs+0x149()
> ffffff43a5193840::findstack -v
stack pointer for thread ffffff43a5193840: ffffff02005f4360
[ ffffff02005f4360 _resume_from_idle+0xf4() ]
  ffffff02005f4390 swtch+0x141()
  ffffff02005f4440 turnstile_block+0x262(0, 0, ffffff43c9988630, fffffffffbc08cc0, 0, 0)
  ffffff02005f44b0 rw_enter_sleep+0x21d(ffffff43c9988630, 0)
  ffffff02005f4610 devname_lookup_func+0x886(ffffff43c9988608, ffffff02005f47e0, ffffff02005f4928, ffffff43fa18fb98, 0, 0)
  ffffff02005f4680 sdev_lookup+0x7c(ffffff43c9987640, ffffff02005f47e0, ffffff02005f4928, ffffff02005f4ab0, 0, ffffff42e2b73840, ffffff43fa18fb98, 0, 0, 0)
  ffffff02005f4730 fop_lookup+0xa2(ffffff43c9987640, ffffff02005f47e0, ffffff02005f4928, ffffff02005f4ab0, 0, ffffff42e2b73840, ffffff43fa18fb98, 0, 0, 0)
  ffffff02005f4990 lookuppnvp+0x1f6(ffffff02005f4ab0, 0, 1, 0, ffffff02005f4cc8, ffffff42e2b73840, ffffff42e2b73840, ffffff43fa18fb98)
  ffffff02005f4a30 lookuppnatcred+0x15e(ffffff02005f4ab0, 0, 1, 0, ffffff02005f4cc8, 0, ffffff43fa18fb98)
  ffffff02005f4b30 lookupnameatcred+0xad(fffffd7fffdff3e0, 0, 1, 0, ffffff02005f4cc8, 0, ffffff43fa18fb98)
  ffffff02005f4b80 lookupnameat+0x39(fffffd7fffdff3e0, 0, 1, 0, ffffff02005f4cc8, 0)
  ffffff02005f4d30 vn_openat+0x315(fffffd7fffdff3e0, 0, 2883, 3d, ffffff02005f4e40, 0, 8024bc00000012, 0, 8024bc00000005)
  ffffff02005f4ea0 copen+0x20c(ffd19553, fffffd7fffdff3e0, 2883, 3d)
  ffffff02005f4ed0 openat+0x2a(ffd19553, fffffd7fffdff3e0, 882, 3d)
  ffffff02005f4f00 open+0x25(fffffd7fffdff3e0, 882, 3d)
  ffffff02005f4f10 sys_syscall+0x17a()
> ffffff43c9988630::rwlock
            ADDR      OWNER/COUNT FLAGS          WAITERS
ffffff43c9988630        READERS=1  B011 ffffff43a5193840 (W)
                                     || ffffff43105b6760 (R)
                 WRITE_WANTED -------+|
                  HAS_WAITERS --------+

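sdev_remove grabbed the directory's lock as reader and then called VOP_ACCESS, which lands in sdev_access and needs the same lock as reader again. In between those two grabs, devname_lookup_func queued up for the lock as writer, which is why WRITE_WANTED is set. The second reader grab now sleeps behind the waiting writer, and the writer sleeps waiting for the reader hold that sdev_remove already has. Because of the recursive grab for reader we deadlock like a champ! The fix is probably that sdev_remove should not call VOP_ACCESS directly, since the access function for these nodes should almost certainly always be sdev_access itself.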
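The interleaving above can be replayed with a toy model. This is only a sketch, assuming a writer-preferring lock in which a queued writer turns away new readers. The toy_* names are hypothetical and are not the illumos rwlock implementation. Nothing here actually blocks: a "try" that returns false stands in for a thread going to sleep.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy writer-preferring rwlock state; hypothetical, not the kernel's. */
typedef struct {
	int	readers;	/* number of reader holds */
	bool	writer;		/* held as writer */
	bool	write_wanted;	/* a writer is queued */
} toy_rwlock_t;

/* Reader entry: refused while a writer holds or is waiting for the lock. */
static bool
toy_try_reader(toy_rwlock_t *l)
{
	if (l->writer || l->write_wanted)
		return (false);
	l->readers++;
	return (true);
}

/* Writer entry: refused while held; a refused writer marks itself waiting. */
static bool
toy_try_writer(toy_rwlock_t *l)
{
	if (l->writer || l->readers > 0) {
		l->write_wanted = true;
		return (false);
	}
	l->writer = true;
	return (true);
}

/* Drop one reader hold. */
static void
toy_exit_reader(toy_rwlock_t *l)
{
	assert(l->readers > 0);
	l->readers--;
}

/* Replay the interleaving from the dump; true means both threads are stuck. */
static bool
toy_replay_deadlock(void)
{
	toy_rwlock_t dir = { 0, false, false };

	bool outer = toy_try_reader(&dir);	/* sdev_remove: gets reader */
	bool opener = toy_try_writer(&dir);	/* devname_lookup_func: waits */
	bool inner = toy_try_reader(&dir);	/* sdev_access: waits on writer */

	/* The only hold that could unblock the writer belongs to a sleeper. */
	return (outer && !opener && !inner && dir.readers == 1);
}
```

In this model, toy_replay_deadlock() ends with one reader hold outstanding and both later requests refused, which matches the READERS=1, WRITE_WANTED and HAS_WAITERS state in the ::rwlock output. If the inner reader grab is skipped, the outer hold is eventually dropped with toy_exit_reader() and the writer can proceed.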

To confirm that everything does in fact call sdev_access, I ran the following job over all of the dumps in thoth:

mjob get 254516b7-e3ad-4912-ada9-ef7ecd72cd88
{
  "id": "254516b7-e3ad-4912-ada9-ef7ecd72cd88",
  "name": "",
  "state": "done",
  "cancelled": false,
  "inputDone": true,
  "stats": {
    "errors": 1,
    "outputs": 1,
    "retries": 0,
    "tasks": 358,
    "tasksDone": 358
  },
  "timeCreated": "2013-11-23T01:34:38.712Z",
  "timeDone": "2013-11-23T01:44:08.771Z",
  "timeArchiveStarted": "2013-11-23T01:44:09.541Z",
  "timeArchiveDone": "2013-11-23T01:44:11.137Z",
  "phases": [
    {
      "exec": "mdb $MANTA_INPUT_FILE -e '::walk sdev_node_cache | ::printf \\"%a\\\\n\\" sdev_node_t sdev_vnode->v_op->vop_access'",
      "type": "map" 
    },
    {
      "exec": "sort | uniq -c",
      "type": "reduce" 
    }
  ],
  "options": {}
}
rm@zanarkand ~ $ mget $(mjob outputs 254516b7-e3ad-4912-ada9-ef7ecd72cd88)
1293677 sdev_access

History

#1

Updated by Robert Mustacchi over 5 years ago

  • Status changed from New to Resolved

Resolved in de442498e34e37fe9b61cfe5beaacdecd06c6a1c.
