Bug #4424
closed
devfsadmd and zoneadmds tied in knots
Start date:
2013-12-24
Due date:
% Done:
100%
Estimated time:
Difficulty:
Medium
Tags:
Gerrit CR:
External Bug:
Description
Looking at the stacks, this is really obvious.
> ::stacks -m dev
THREAD           STATE    SOBJ                COUNT
ffffff43105b6760 SLEEP    RWLOCK                  1
                 swtch+0x141
                 turnstile_block+0x262
                 rw_enter_sleep+0x148
                 sdev_access+0x52
                 fop_access+0x54
                 sdev_remove+0x128
                 fop_remove+0x5b
                 vn_removeat+0x382
                 unlinkat+0x59
                 unlink+0x1b
                 _sys_sysenter_post_swapgs+0x149

ffffff43a5193840 SLEEP    RWLOCK                  1
                 swtch+0x141
                 turnstile_block+0x262
                 rw_enter_sleep+0x21d
                 devname_lookup_func+0x886
                 sdev_lookup+0x7c
                 fop_lookup+0xa2
                 lookuppnvp+0x1f6
                 lookuppnatcred+0x15e
                 lookupnameatcred+0xad
                 lookupnameat+0x39
                 vn_openat+0x315
                 copen+0x20c
                 openat+0x2a
                 open+0x25

> ffffff43105b6760::findstack -v
stack pointer for thread ffffff43105b6760: ffffff01ea9afa10
[ ffffff01ea9afa10 _resume_from_idle+0xf4() ]
  ffffff01ea9afa40 swtch+0x141()
  ffffff01ea9afaf0 turnstile_block+0x262(ffffff43a5525da0, 1, ffffff43c9988630, fffffffffbc08cc0, 0, 0)
  ffffff01ea9afb60 rw_enter_sleep+0x148(ffffff43c9988630, 1)
  ffffff01ea9afbe0 sdev_access+0x52(ffffff43c9987640, 80, 0, ffffff42e26900c0, 0)
  ffffff01ea9afc60 fop_access+0x54(ffffff43c9987640, 80, 0, ffffff42e26900c0, 0)
  ffffff01ea9afd00 sdev_remove+0x128(ffffff43c9987640, ffffff439e3d1e30, ffffff42e26900c0, 0, 0)
  ffffff01ea9afd80 fop_remove+0x5b(ffffff43c9987640, ffffff439e3d1e30, ffffff42e26900c0, 0, 0)
  ffffff01ea9afe50 vn_removeat+0x382(0, fe95da8d, 0, 0)
  ffffff01ea9afea0 unlinkat+0x59(ffd19553, fe95da8d, 0)
  ffffff01ea9afec0 unlink+0x1b(fe95da8d)
  ffffff01ea9aff10 _sys_sysenter_post_swapgs+0x149()

> ffffff43a5193840
stack pointer for thread ffffff43a5193840: ffffff02005f4360
[ ffffff02005f4360 _resume_from_idle+0xf4() ]
  ffffff02005f4390 swtch+0x141()
  ffffff02005f4440 turnstile_block+0x262(0, 0, ffffff43c9988630, fffffffffbc08cc0, 0, 0)
  ffffff02005f44b0 rw_enter_sleep+0x21d(ffffff43c9988630, 0)
  ffffff02005f4610 devname_lookup_func+0x886(ffffff43c9988608, ffffff02005f47e0, ffffff02005f4928, ffffff43fa18fb98, 0, 0)
  ffffff02005f4680 sdev_lookup+0x7c(ffffff43c9987640, ffffff02005f47e0, ffffff02005f4928, ffffff02005f4ab0, 0, ffffff42e2b73840, ffffff43fa18fb98, 0, 0, 0)
  ffffff02005f4730 fop_lookup+0xa2(ffffff43c9987640, ffffff02005f47e0, ffffff02005f4928, ffffff02005f4ab0, 0, ffffff42e2b73840, ffffff43fa18fb98, 0, 0, 0)
  ffffff02005f4990 lookuppnvp+0x1f6(ffffff02005f4ab0, 0, 1, 0, ffffff02005f4cc8, ffffff42e2b73840, ffffff42e2b73840, ffffff43fa18fb98)
  ffffff02005f4a30 lookuppnatcred+0x15e(ffffff02005f4ab0, 0, 1, 0, ffffff02005f4cc8, 0, ffffff43fa18fb98)
  ffffff02005f4b30 lookupnameatcred+0xad(fffffd7fffdff3e0, 0, 1, 0, ffffff02005f4cc8, 0, ffffff43fa18fb98)
  ffffff02005f4b80 lookupnameat+0x39(fffffd7fffdff3e0, 0, 1, 0, ffffff02005f4cc8, 0)
  ffffff02005f4d30 vn_openat+0x315(fffffd7fffdff3e0, 0, 2883, 3d, ffffff02005f4e40, 0, 8024bc00000012, 0, 8024bc00000005)
  ffffff02005f4ea0 copen+0x20c(ffd19553, fffffd7fffdff3e0, 2883, 3d)
  ffffff02005f4ed0 openat+0x2a(ffd19553, fffffd7fffdff3e0, 882, 3d)
  ffffff02005f4f00 open+0x25(fffffd7fffdff3e0, 882, 3d)
  ffffff02005f4f10 sys_syscall+0x17a()

> ffffff43c9988630::rwlock
            ADDR      OWNER/COUNT FLAGS          WAITERS
ffffff43c9988630        READERS=1  B011 ffffff43a5193840 (W)
                                     || ffffff43105b6760 (R)
                 WRITE_WANTED -------+|
                  HAS_WAITERS --------+
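sdev_remove grabbed the directory's lock as reader and then called sdev_access, which also needs a reader lock on the same directory. In between, devname_lookup_func tried to grab the lock as writer. Once a writer is waiting (WRITE_WANTED), new readers queue behind it, so the second reader grab in sdev_access sleeps behind a writer that is itself waiting for the first reader hold to be dropped. Because of the recursive grab for reader we deadlock like a champ! The fix is probably that sdev_remove should not call VOP_ACCESS directly, since its access function should almost certainly always be itself (sdev_access).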
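The interleaving can be replayed in a toy, non-blocking model. This is only a sketch, not the kernel's rwlock implementation: the class `WriterPreferringRWLock` and its methods are hypothetical names made up for illustration, and the only behavior assumed is the one the `::rwlock` output shows, namely that a waiting writer holds back new readers.

```python
class WriterPreferringRWLock:
    """Hypothetical toy model: enter() never blocks, it only reports
    whether the caller would have been granted the lock or put to sleep."""

    def __init__(self):
        self.readers = []    # threads currently holding the lock as reader
        self.writer = None   # thread currently holding the lock as writer
        self.waiting = {}    # sleeping thread -> "R" or "W"

    @property
    def write_wanted(self):
        # True once any writer is asleep on the lock.
        return "W" in self.waiting.values()

    def enter(self, thread, mode):
        if mode == "R":
            # New readers are refused while a writer holds or wants the lock.
            granted = self.writer is None and not self.write_wanted
        else:
            # A writer needs the lock completely free.
            granted = self.writer is None and not self.readers
        if granted and mode == "R":
            self.readers.append(thread)
        elif granted:
            self.writer = thread
        else:
            self.waiting[thread] = mode
        return granted

    def deadlocked(self):
        # Stuck if every current holder is itself asleep on this same lock.
        holders = set(self.readers)
        if self.writer is not None:
            holders.add(self.writer)
        return bool(holders) and holders <= set(self.waiting)


# Thread addresses taken from the stacks above.
remover, opener = "ffffff43105b6760", "ffffff43a5193840"
lock = WriterPreferringRWLock()
first = lock.enter(remover, "R")   # sdev_remove: reader hold on the directory
second = lock.enter(opener, "W")   # devname_lookup_func: writer, must wait
third = lock.enter(remover, "R")   # sdev_access: second reader grab, must wait
print(first, second, third, lock.deadlocked())
```

In this model the final state has one reader hold plus one waiting writer and one waiting reader, which lines up with `READERS=1` and the `(W)` and `(R)` waiters in the dump.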
To confirm that everything does in fact call sdev_access, I ran the following job over all of the dumps in thoth:
mjob get 254516b7-e3ad-4912-ada9-ef7ecd72cd88
{
  "id": "254516b7-e3ad-4912-ada9-ef7ecd72cd88",
  "name": "",
  "state": "done",
  "cancelled": false,
  "inputDone": true,
  "stats": {
    "errors": 1,
    "outputs": 1,
    "retries": 0,
    "tasks": 358,
    "tasksDone": 358
  },
  "timeCreated": "2013-11-23T01:34:38.712Z",
  "timeDone": "2013-11-23T01:44:08.771Z",
  "timeArchiveStarted": "2013-11-23T01:44:09.541Z",
  "timeArchiveDone": "2013-11-23T01:44:11.137Z",
  "phases": [
    {
      "exec": "mdb $MANTA_INPUT_FILE -e '::walk sdev_node_cache | ::printf \"%a\\n\" sdev_node_t sdev_vnode->v_op->vop_access'",
      "type": "map"
    },
    {
      "exec": "sort | uniq -c",
      "type": "reduce"
    }
  ],
  "options": {}
}
rm@zanarkand ~ $ mget $(mjob outputs 254516b7-e3ad-4912-ada9-ef7ecd72cd88)
1293677 sdev_access
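The reduce phase just counts identical lines, so a single output line means every sdev_node that was walked reported the same vop_access. A minimal sketch of that tally follows; the helper name `tally` and the sample input are hypothetical stand-ins, not data from the job.

```python
from collections import Counter


def tally(lines):
    """Count identical non-empty lines, roughly what `sort | uniq -c` does."""
    return Counter(line.strip() for line in lines if line.strip())


# Hypothetical sample standing in for the concatenated map-phase output.
sample = ["sdev_access", "sdev_access", "", "sdev_access"]
counts = tally(sample)
for name, n in sorted(counts.items()):
    print(f"{n:7d} {name}")
```

Any node with a different access function would show up as a second line in the result.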
Updated by Robert Mustacchi almost 10 years ago
- Status changed from New to Resolved
Resolved in de442498e34e37fe9b61cfe5beaacdecd06c6a1c.