Bug #4244
spawn fails occasionally with EAGAIN
Status: Closed
Description
We have seen cases – to date, quite rare – whereby a spawn in the global zone fails with EAGAIN, throwing the following exception:
{ "errno": "EAGAIN", "syscall": "spawn", "message": "spawn EAGAIN", "code": "EAGAIN" }
Unfortunately, fork() maps many different error conditions to EAGAIN; we will need a D script in production to explore this.
The following D script was used to home in on the getproc() failure:
#!/usr/sbin/dtrace -s

#pragma D option quiet

cfork:entry
/curthread->t_procp->p_zone->zone_id == 0/
{
	self->follow = 1;
}

getproc:return,
as_dup:return,
secpolicy_basic_fork:return,
contract_ctor:return
/self->follow && arg1 != 0/
{
	printf("%d: pid: %d (\"%s\") p_flag: 0x%x %s: %d\n",
	    walltimestamp / 1000000000, pid, curpsinfo->pr_psargs,
	    curthread->t_procp->p_flag, probefunc, arg1);
}

cfork:return
/self->follow && curthread->t_lwp->lwp_errno/
{
	printf("%d: pid: %d (\"%s\") p_flag: 0x%x %s: %d\n",
	    walltimestamp / 1000000000, pid, curpsinfo->pr_psargs,
	    curthread->t_procp->p_flag, probefunc, arg1);
}

contract_process_fork:return,
contract_process_create:return,
page_mem_avail:return,
forklwp:return
/self->follow && arg1 == 0/
{
	printf("%d: pid: %d (\"%s\") p_flag: 0x%x %s: %d\n",
	    walltimestamp / 1000000000, pid, curpsinfo->pr_psargs,
	    curthread->t_procp->p_flag, probefunc, arg1);
}

page_mem_avail:entry
/self->follow/
{
	self->mem = 1;
}

delay_sig:entry
/self->mem/
{
	printf("%d pid: %d page_mem_avail freemem: %d desfree: %d\n",
	    walltimestamp / 1000000000, pid, `freemem, `desfree);
}

kmem_reap:entry
/self->mem/
{
	printf("%d pid: %d page_mem_avail availrmem: %d swapfs_reserve: %d\n",
	    walltimestamp / 1000000000, pid, `availrmem, `swapfs_reserve);
}

page_mem_avail:return
/self->follow/
{
	self->mem = 0;
}

cfork:return
/self->follow/
{
	self->follow = 0;
}
Running this in the JPC confirmed that page_mem_avail is indeed failing; we saw sequences like this:
1381592797 pid: 12443 page_mem_avail freemem: 98282 desfree: 196564
1381592798 pid: 12443 page_mem_avail freemem: 98306 desfree: 196564
1381592799 pid: 12443 page_mem_avail freemem: 98281 desfree: 196564
1381592800 pid: 12443 page_mem_avail freemem: 98284 desfree: 196564
1381592801 pid: 12443 page_mem_avail freemem: 98282 desfree: 196564
1381592802 pid: 12443 page_mem_avail freemem: 98282 desfree: 196564
1381592804 pid: 12443 page_mem_avail freemem: 98284 desfree: 196564
1381592805 pid: 12443 page_mem_avail freemem: 98284 desfree: 196564
1381592806 pid: 12443 page_mem_avail freemem: 98284 desfree: 196564
1381592807 pid: 12443 page_mem_avail freemem: 98282 desfree: 196564
1381592808 pid: 12443 page_mem_avail freemem: 98292 desfree: 196564
1381592809 pid: 12443 page_mem_avail freemem: 98272 desfree: 196564
1381592810 pid: 12443 page_mem_avail freemem: 98282 desfree: 196564
1381592811 pid: 12443 page_mem_avail freemem: 98282 desfree: 196564
1381592812 pid: 12443 page_mem_avail freemem: 98284 desfree: 196564
1381592813 pid: 12443 page_mem_avail freemem: 98282 desfree: 196564
1381592814 pid: 12443 page_mem_avail freemem: 98265 desfree: 196564
1381592815 pid: 12443 page_mem_avail freemem: 98286 desfree: 196564
1381592816 pid: 12443 page_mem_avail freemem: 98282 desfree: 196564
1381592817: pid: 12443 ("/opt/smartdc/agents/modules/.npm/provisioner-v2/active/package/local/bin/node /") p_flag: 0x4a014000 page_mem_avail: 0
1381592817: pid: 12443 ("/opt/smartdc/agents/modules/.npm/provisioner-v2/active/package/local/bin/node /") p_flag: 0x4a014000 getproc: 4294967295
1381592817: pid: 12443 ("/opt/smartdc/agents/modules/.npm/provisioner-v2/active/package/local/bin/node /") p_flag: 0x4a014000 cfork: 11
In this case, freemem has been driven below desfree by a large KVM provision, which is in turn consuming memory as it becomes available. Memory, therefore, will not necessarily get appreciably above desfree – and (after the absurdly bogus max_count value of 20 seconds), getproc() fails, resulting in EAGAIN from fork(). The whole check in page_mem_avail() seems entirely stupid: system calls do not (and should not) throttle themselves and/or fail based on their perception of free memory. And indeed, a fork() doesn't even allocate much memory (DTrace reveals as little as tens of kilobytes for small processes – and perhaps on the order of megabytes for large ones). The whole idea of page_mem_avail() seems dubious – and, amazingly, it does not date back to SunOS 4.x, but was rather added in Solaris 10 under bug 4872125.
That this was added to allow a test suite to complete – and not because of a customer issue – is telling: the change may allow test suites to succeed, but it induces serious failure on actual systems. Worse, the system that was failing was a V120 (maximum of 4GB of DRAM) running a DEBUG kernel! The check is entirely ill-advised, and the solution is to rip it out entirely. (And indeed, it has been confirmed that fork()'ing and exec()'ing a process once per second under memory pressure results in much better behavior with the check eliminated.)