Bug #1894

ps goes walkabout through /dev in search of foreign ttys

Added by Joshua M. Clulow almost 9 years ago. Updated almost 9 years ago.

Status: Resolved
Priority: Normal
Assignee:
Category: cmd - userland programs
Start date: 2011-12-16
Due date:
% Done: 100%
Estimated time:
Difficulty: Medium
Tags: needs-triage
Gerrit CR:

Description

Analysis here: https://gist.github.com/1484969

The suggested fix, thus far, is to tell ps not to look for ttys that belong to a zone other than the current zone. If we keep the existing exhaustive search for same-zone lookups, then crazily-pathed ttys will still be found should they exist, and in the general case ptys under the priority search paths listed in /etc/ttysrch will be found quickly.


Files

OS-719 (1.49 KB) OS-719 Robert Mustacchi, 2011-12-16 08:42 AM

History

#1

Updated by Robert Mustacchi almost 9 years ago

We hit this internally at Joyent simply by having a large number of zones. What follows is from the internal analysis and attached is our commit to fix this from Bryan.

When run from the global zone, ps performance is generally terrible, and it degrades with the number of zones. For example, running "ps -e" on a system with ~190 zones takes nearly four minutes! The problem is that ps finds processes in the non-global zones and attempts to locate names for their TTYs via ttyname(3C), but these device nodes do not exist in the global zone. A failing ttyname() is excruciating (it essentially reduces to a find in /dev), and when you multiply these failing ttyname() calls by the number of zones in the system, performance quickly becomes deeply pathological. The fix here is very simple: don't even try to look up a TTY when the zone of the process doesn't match that of the ps.

On a system with 192 zones, here's stock ps:

[root@00-1e-4f-42-fb-1a (us-east-1) /var/tmp]# time ps -e > /dev/null

real    3m44.926s
user    0m0.321s
sys     3m16.811s

And running with the fix:

[root@00-1e-4f-42-fb-1a (us-east-1) /var/tmp]# time ./ps -e > /dev/null

real    0m0.111s
user    0m0.018s
sys     0m0.093s

Yes, that's a win of 2000X...
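
The zone check described above can be sketched roughly as follows. This is a minimal illustration, not the attached commit: zone_id, should_lookup_tty() and count_tty_lookups() are hypothetical names made up for this example. On illumos the corresponding real values would presumably be the process's pr_zoneid from its psinfo and the result of getzoneid(3C), but that is an assumption about how the actual patch is written.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for zoneid_t, so the sketch builds anywhere. */
typedef int zone_id;

/*
 * Hypothetical helper: return 1 only when the process lives in the
 * same zone as ps itself. Foreign-zone processes are skipped, so the
 * expensive walk through /dev is never started for them.
 */
int
should_lookup_tty(zone_id proc_zone, zone_id my_zone)
{
	return (proc_zone == my_zone);
}

/*
 * Count how many of the given processes would still trigger a tty
 * name lookup once the zone check is applied.
 */
int
count_tty_lookups(const zone_id *proc_zones, int nprocs, zone_id my_zone)
{
	int i, n = 0;

	assert(proc_zones != NULL && nprocs >= 0);
	for (i = 0; i < nprocs; i++) {
		if (should_lookup_tty(proc_zones[i], my_zone))
			n++;
	}
	return (n);
}
```

For instance, given processes in zones { 0, 0, 3, 7, 7, 12 } and a ps running in zone 0, only the two zone-0 processes would go on to a tty lookup; the other four are skipped outright.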

#2

Updated by Rich Lowe almost 9 years ago

Joshua, I'd actually (for other reasons) be very interested in which parts of the /dev walk suck the most (perhaps based on the top one or two path elements?). What I missed when looking for possibly expensive dynamic portions of /dev is the potential for upcalls from the /dev fs.

Robert, do you guys want to get this integrated, or should I try to do it over the weekend?
If you tell me who reviewed it on your end, I could probably just submit an RTI with me as another reviewer (I looked, and am happy).

#3

Updated by Robert Mustacchi almost 9 years ago

Rich, if you wouldn't mind integrating it over the weekend, that'd be appreciated. That'll be sooner than we can get around to it. As for reviewers you can add in:

Reviewed by: Jerry Jelinek <>
Reviewed by: Robert Mustacchi <>

#4

Updated by Rich Lowe almost 9 years ago

  • Category set to cmd - userland programs
  • Status changed from New to Resolved
  • Assignee set to Rich Lowe
  • % Done changed from 0 to 100

Resolved in r13547 commit:7d365d7822b1
