Bug #5419

closed

hung mount in one zone shouldn't interfere with zone boot/halt of another zone

Added by Robert Mustacchi almost 7 years ago. Updated almost 7 years ago.

Status: Closed
Priority: Normal
Assignee:
Category: zones
Start date: 2014-12-08
Due date:
% Done: 100%
Estimated time:
Difficulty: Medium
Tags:
Gerrit CR:

Description

From a dump from 79FY9R1, we can see zoneid 24 trying to shut down and stuck in the EMPTY state.
Debugging the zone, we see that its zoneadmd has the following stack:

stack pointer for thread ffffff19ce8da020: ffffff00bc0cdc80
[ ffffff00bc0cdc80 _resume_from_idle+0xf1() ]
  ffffff00bc0cdcb0 swtch+0x145()
  ffffff00bc0cdd10 cv_wait_sig+0x14d()
  ffffff00bc0cdd40 block_mounts+0x3d()
  ffffff00bc0cdd70 zone_shutdown+0x44()
  ffffff00bc0cde70 zone+0x11b()
  ffffff00bc0cdf00 dtrace_systrace_syscall+0x11a()
  ffffff00bc0cdf10 sys_syscall+0x17a()

The zone shutdown is waiting in block_mounts for domount to finish. There is only one thread in domount on this box: the automounter running in zoneid 13, which is wedged in an NFSv3 mount while waiting for a connection to be acknowledged:

stack pointer for thread ffffff19e6986460: ffffff00bd901030
[ ffffff00bd901030 _resume_from_idle+0xf1() ]
  ffffff00bd901060 swtch+0x145()
  ffffff00bd9010f0 cv_timedwait_sig_hires+0x1e9()
  ffffff00bd901120 cv_reltimedwait_sig+0x53()
  ffffff00bd901190 waitforack+0x9c()
  ffffff00bd901220 connmgr_connect+0x12c()
  ffffff00bd901320 connmgr_wrapconnect+0x143()
  ffffff00bd901480 connmgr_get+0x414()
  ffffff00bd9014d0 connmgr_wrapget+0x5d()
  ffffff00bd901630 clnt_cots_kcallit+0x230()
  ffffff00bd9017a0 rfscall+0x521()
  ffffff00bd901890 rfs3call+0xb2()
  ffffff00bd901ae0 nfs3rootvp+0x56f()
  ffffff00bd901be0 nfs3_mount+0x8a6()
  ffffff00bd901c10 fsop_mount+0x21()
  ffffff00bd901d70 domount+0xafc()
  ffffff00bd901df0 mount+0x121()
  ffffff00bd901e30 syscall_ap+0x8c()
  ffffff00bd901ec0 dtrace_systrace_syscall32+0x11a()
  ffffff00bd901f10 _sys_sysenter_post_swapgs+0x149()

R  11812  11808  11808  11808      0 0x42000000 ffffff19d060d088 automountd
        T  0xffffff19e6986ba0 <TS_SLEEP>
        T  0xffffff19e6986800 <TS_SLEEP>
        T  0xffffff19e6986460 <TS_SLEEP>
        T  0xffffff19c9465880 <TS_SLEEP>
        T  0xffffff19ce921400 <TS_SLEEP>

::ps -z shows that this process is in zone 13:
R  11812  11808  11808  11808    13      0 0x42000000 ffffff19d060d088 automount

The logic for the mount_in_progress/block_mounts handshake seems overly restrictive. Instead of tracking in-progress mounts with a single global counter, we should be able to track them on a per-zone basis, so that a zone's boot or halt waits only for mounts in that same zone.
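
To illustrate the per-zone idea, here is a minimal userland sketch of the bookkeeping only. It is not the code from the actual fix: the type zone_model_t, its fields, and the functions mount_try_begin, mount_end, block_mounts_try, and resume_mounts are hypothetical names chosen for this example. Where the kernel would sleep on a condition variable, the sketch simply returns false to mean "the caller would have to wait".

```c
#include <stdbool.h>

/*
 * Hypothetical userland model of per-zone mount tracking.
 * All names here are illustrative assumptions, not the illumos kernel's.
 */
typedef struct zone_model {
	int		zm_id;			/* zone id, for readability */
	unsigned int	zm_mounts_in_progress;	/* mounts running in this zone */
	bool		zm_mounts_blocked;	/* zone boot/halt in progress */
} zone_model_t;

/*
 * A mount may start only if its own zone is not blocking mounts.
 * Returns false where a real implementation would wait.
 */
static bool
mount_try_begin(zone_model_t *z)
{
	if (z->zm_mounts_blocked)
		return (false);
	z->zm_mounts_in_progress++;
	return (true);
}

/* Record that a mount in this zone has finished. */
static void
mount_end(zone_model_t *z)
{
	if (z->zm_mounts_in_progress > 0)
		z->zm_mounts_in_progress--;
}

/*
 * Stop new mounts in this zone. Returns true if the zone has no mounts
 * in flight, false if the caller would have to wait for them. Mounts
 * in other zones are never consulted.
 */
static bool
block_mounts_try(zone_model_t *z)
{
	z->zm_mounts_blocked = true;
	return (z->zm_mounts_in_progress == 0);
}

/* Allow mounts in this zone again. */
static void
resume_mounts(zone_model_t *z)
{
	z->zm_mounts_blocked = false;
}
```

In this model, a hung mount in zone 13 leaves zm_mounts_in_progress nonzero only for zone 13, so block_mounts_try on zone 24 succeeds immediately. With a single global counter, the same call would have to wait on zone 13's mount, which is the behavior seen in the dump above.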

Actions #1

Updated by Electric Monk almost 7 years ago

  • Status changed from New to Closed

git commit 5fd5c68917c02ae3dd752fac0ce9f7a10d02e94f

commit  5fd5c68917c02ae3dd752fac0ce9f7a10d02e94f
Author: Jerry Jelinek <jerry.jelinek@joyent.com>
Date:   2014-12-12T16:23:21.000Z

    5419 hung mount in one zone shouldn't interfere with zone boot/halt of another zone
    Reviewed by: Robert Mustacchi <rm@joyent.com>
    Reviewed by: Garrett D'Amore <garrett@damore.org>
    Approved by: Dan McDonald <danmcd@omniti.com>
