Bug #9063

closed

improve procfs exit handling

Added by Robert Mustacchi almost 4 years ago. Updated over 3 years ago.

Status:
Closed
Priority:
Normal
Assignee:
Category:
kernel
Start date:
2018-02-06
Due date:
% Done:

100%

Estimated time:
Difficulty:
Medium
Tags:
Gerrit CR:

Description

Here is a summary of the relevant behavior in the two code paths:

proc_exit()
523 mutex_enter(&p->p_lock);
572 prlwpexit(t);           /* notify /proc */
586 mutex_exit(&p->p_lock);
                                              <---- XXX
663 mutex_enter(&pidlock);
763 mutex_enter(&p->p_lock);
811 p->p_stat = SZOMB
816 cdir = PTOU(p)->u_cdir;
897 mutex_exit(&p->p_lock);
951 mutex_exit(&pidlock);
958 if (cdir) VN_RELE(cdir);

pr_lookup_procdir()
3687 mutex_enter(&pidlock);
3688 p = prfind(pid)
3702 mutex_enter(&p->p_lock);
3703 mutex_exit(&pidlock);
3742 if (p->p_stat == SZOMB)
3743    pcp->prc_flags |= PRC_DESTROY;
3768 adds new vp to p_plist
3771 mutex_exit(&p->p_lock);

Here is how I think this could happen. Within procfs, the PRC_DESTROY flag controls how much information is available, and it is normally set for a zombie process. When procfs already has a vnode for a process, prlwpexit() is used in the exit path to set PRC_DESTROY. However, if a process is exiting with no procfs vnode and has passed my XXX mark above, another thread can enter pr_lookup_procdir() at that point and create the procfs vnode. That lookup will not think the process is exiting, because p_stat is not yet set to SZOMB. Once pr_lookup_procdir() returns and drops all of its locks, proc_exit() can continue past my XXX mark and flag the process as a zombie, but by then we have a procfs vnode which does not know the process is a zombie.
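The interleaving described above can be modeled in user space. The following is only a sketch: every `model_`/`M_` name is hypothetical, the structures are drastically simplified stand-ins for the kernel's, locking is omitted because the steps run in a fixed order, and the `check_exiting` test is one possible way to close the window, not necessarily what the actual fix does.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for kernel state; the values are illustrative. */
enum { M_SRUN = 1, M_SZOMB = 2 };   /* models p_stat values */
#define M_SEXITING    0x1           /* models the SEXITING bit in p_flag */
#define M_PRC_DESTROY 0x1           /* models PRC_DESTROY in prc_flags */

struct model_vnode {
    unsigned prc_flags;
    struct model_vnode *next;
};

struct model_proc {
    int p_stat;
    unsigned p_flag;
    struct model_vnode *p_plist;    /* models the list of /proc vnodes */
};

/* Models the first locked region of proc_exit(): /proc vnodes that
 * already exist are told about the exit (the role of prlwpexit()). */
void model_exit_notify(struct model_proc *p)
{
    p->p_flag |= M_SEXITING;
    for (struct model_vnode *v = p->p_plist; v != NULL; v = v->next)
        v->prc_flags |= M_PRC_DESTROY;
}

/* Models the second locked region of proc_exit(): become a zombie. */
void model_exit_zombify(struct model_proc *p)
{
    p->p_stat = M_SZOMB;
}

/* Models pr_lookup_procdir(): create a vnode and add it to p_plist.
 * With check_exiting set, an exiting process is treated like a zombie. */
void model_lookup(struct model_proc *p, struct model_vnode *v,
    bool check_exiting)
{
    v->prc_flags = 0;
    if (p->p_stat == M_SZOMB ||
        (check_exiting && (p->p_flag & M_SEXITING)))
        v->prc_flags |= M_PRC_DESTROY;
    v->next = p->p_plist;
    p->p_plist = v;
}

/* Runs the lookup at the XXX mark, between the two halves of exit.
 * Returns true if a zombie ends up with a vnode lacking PRC_DESTROY. */
bool model_race_leaves_stale_vnode(bool check_exiting)
{
    struct model_proc p = { M_SRUN, 0, NULL };
    struct model_vnode v;

    model_exit_notify(&p);              /* no vnode exists yet */
    model_lookup(&p, &v, check_exiting); /* lookup at the XXX mark */
    model_exit_zombify(&p);
    return (p.p_stat == M_SZOMB && !(v.prc_flags & M_PRC_DESTROY));
}
```

In this model, `model_race_leaves_stale_vnode(false)` reproduces the problem state, while passing `true` leaves the new vnode marked for destruction before the process becomes a zombie.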


I tested this by writing two test programs. One loops doing a fork/exit: the parent waits for the child, then forks again. The other loops continuously, reading every /proc/<pid>/cwd. I used a DTrace script to chill on entry to pr_free_watched_pages when it is called in the proc_exit flow. With DTrace I can confirm that I sometimes find a process via prfind in /proc which has p_stat == SRUN but also has the SEXITING bit set in p_flag. However, I was never able to cause a panic running this on a platform from before my fix. That seems consistent with the fact that we have never hit this in JPC and that this bug report is mostly "theoretical" at this point.
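For illustration, the two test programs might look roughly like the sketch below. This is a guess at their shape, not the actual test code: the function names are hypothetical, the loops are bounded so they can be driven from a small harness, and stat() is assumed as the way of "reading" each cwd entry.

```c
#include <ctype.h>
#include <dirent.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a child that exits immediately, wait for it, then fork again.
 * Returns the number of children successfully reaped. */
int fork_exit_loop(int iterations)
{
    int reaped = 0;

    for (int i = 0; i < iterations; i++) {
        pid_t pid = fork();
        if (pid < 0)
            break;              /* fork failed; stop early */
        if (pid == 0)
            _exit(0);           /* child: exit right away */
        if (waitpid(pid, NULL, 0) == pid)
            reaped++;
    }
    return (reaped);
}

/* Returns 1 if the directory entry name is a non-empty run of digits. */
int is_pid_name(const char *name)
{
    if (*name == '\0')
        return (0);
    for (; *name != '\0'; name++) {
        if (!isdigit((unsigned char)*name))
            return (0);
    }
    return (1);
}

/* One pass over /proc, touching /proc/<pid>/cwd for every pid entry.
 * Returns the number of pid entries visited, or -1 if /proc cannot
 * be opened. */
int scan_proc_cwd_once(void)
{
    DIR *dir = opendir("/proc");
    struct dirent *de;
    int visited = 0;

    if (dir == NULL)
        return (-1);
    while ((de = readdir(dir)) != NULL) {
        char path[300];
        struct stat st;

        if (!is_pid_name(de->d_name))
            continue;
        (void) snprintf(path, sizeof (path), "/proc/%s/cwd", de->d_name);
        /* Failures are expected: the process may already be gone,
         * or we may lack permission. Only the lookup matters here. */
        (void) stat(path, &st);
        visited++;
    }
    (void) closedir(dir);
    return (visited);
}
```

To approximate the test described above, one process would call `fork_exit_loop()` with a large count while another calls `scan_proc_cwd_once()` in an endless loop.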

I also ran a bunch of things under strace in lx to verify that our ptrace emulation is still working as expected.

Actions #1

Updated by Electric Monk over 3 years ago

  • Status changed from New to Closed

git commit 5203e56b6b338ebe19cb5433c609f9f5eb7d12b7

commit  5203e56b6b338ebe19cb5433c609f9f5eb7d12b7
Author: Jerry Jelinek <jerry.jelinek@joyent.com>
Date:   2018-08-06T15:56:02.000Z

    9063 improve procfs exit handling
    Reviewed by: Patrick Mooney <patrick.mooney@joyent.com>
    Reviewed by: Jason King <jason.king@joyent.com>
    Approved by: Joshua M. Clulow <josh@sysmgr.org>
