Bug #13091
process contract escaped SMF
Description
I originally saw this problem with OmniOS's "ntp" SMF service when I configured it to wait for sync at startup. I've been able to reproduce this with a separate SMF service that's very simple. Here's the manifest:
<?xml version="1.0"?>
<!DOCTYPE service_bundle SYSTEM "/usr/share/lib/xml/dtd/service_bundle.dtd.1">
<!-- Manifest automatically generated by smfgen. -->
<service_bundle type="manifest" name="application-bug-demo" >
    <service name="application/bug-demo" type="service" version="1" >
        <create_default_instance enabled="true" />
        <dependency name="dep0" grouping="require_all" restart_on="error" type="service" >
            <service_fmri value="svc:/milestone/multi-user:default" />
        </dependency>
        <exec_method type="method" name="start" exec="/var/tmp/smfbug/start.sh" timeout_seconds="600" />
        <exec_method type="method" name="stop" exec=":kill" timeout_seconds="30" />
        <template >
            <common_name >
                <loctext xml:lang="C" >bug demo</loctext>
            </common_name>
        </template>
    </service>
</service_bundle>
and here's the start method:
#!/bin/bash

#
# start method that forks a process that sleeps for a while so that the caller
# can try to kill it.
#

set -o errexit
set -o xtrace

. /lib/svc/share/smf_include.sh

sleep 800 &
if ! sleep 300 ; then
	exit $SMF_EXIT_ERR_CONFIG
fi
The key pieces of the start method are:
- it forks off "sleep" in the background for a pretty long time (indefinitely, as far as we care here)
- it runs a separate "sleep 300" in the foreground, which gives us enough time to kill it by hand; killing it is what triggers the bug
- it exits with SMF_EXIT_ERR_CONFIG if this second sleep command fails
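Stripped of SMF, the process tree above behaves like the following minimal sketch. It is a hypothetical stand-in, not part of the original report. A subshell plays the role of start.sh: it forks a long-running background sleep, waits on a foreground sleep that a helper kills after one second (instead of by hand), and exits with 96, the value smf_include.sh assigns to SMF_EXIT_ERR_CONFIG. The variable names `status` and `survived` are made up for this sketch. In an ordinary process tree nothing cleans up the background child when its parent fails; on illumos that job presumably falls to svc.startd acting on the service's process contract, which is the part that goes wrong in this bug.

```shell
#!/bin/bash
#
# Hypothetical standalone sketch (no SMF involved) of what start.sh does to
# its processes. 96 is the value of SMF_EXIT_ERR_CONFIG in smf_include.sh.
#

pidfile=$(mktemp)
status=0

(
    # Long-running background child, like "sleep 800 &" in start.sh.
    # Its output goes to /dev/null so it cannot hold our stdout open.
    sleep 800 >/dev/null 2>&1 &
    echo "$!" > "$pidfile"

    # Foreground child, like "sleep 300". Instead of killing it by hand,
    # a helper process kills it after one second.
    sleep 300 >/dev/null 2>&1 &
    fg_pid=$!
    ( sleep 1; kill "$fg_pid" ) >/dev/null 2>&1 &

    # wait returns non-zero because the child died from a signal,
    # so this stand-in "start method" fails the same way start.sh does.
    if ! wait "$fg_pid"; then
        exit 96
    fi
) || status=$?

bg_pid=$(cat "$pidfile")
rm -f "$pidfile"

echo "start method stand-in exited with status $status"
if kill -0 "$bg_pid" 2>/dev/null; then
    survived=yes
    echo "background sleep (pid $bg_pid) is still running"
else
    survived=no
fi

# Nothing else will clean it up, so kill it by hand.
kill "$bg_pid" 2>/dev/null
```

Run with bash on any Unix-like system, it should report exit status 96 and a surviving background sleep, which is the same pair of facts the `ps` and `ctstat` output shows once the service has gone to maintenance.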
I've got both of these files in "/var/tmp/smfbug":
dap@lennier:/var/tmp/smfbug$ ls -l
total 18
-rw-r--r-- 1 dap other 901 Sep 1 20:49 bug-demo.xml
-rwxr-xr-x 1 dap other 247 Sep 1 20:51 start.sh
dap@lennier:/var/tmp/smfbug$
The service doesn't exist yet:
dap@lennier:/var/tmp/smfbug$ svcs bug-demo
svcs: Pattern 'bug-demo' doesn't match any instances
STATE STIME FMRI
dap@lennier:/var/tmp/smfbug$
Let's import it and see what's running:
dap@lennier:/var/tmp/smfbug$ svccfg import bug-demo.xml
dap@lennier:/var/tmp/smfbug$ svcs -p bug-demo
STATE STIME FMRI
offline* 20:51:29 svc:/application/bug-demo:default
         20:51:29 3384 start.sh
         20:51:29 3385 sleep
         20:51:29 3386 sleep
dap@lennier:/var/tmp/smfbug$ ps -opid,ctid,args -p "3384 3385 3386"
 PID CTID COMMAND
3384 132 /bin/bash /var/tmp/smfbug/start.sh
3385 132 sleep 800
3386 132 sleep 300
dap@lennier:/var/tmp/smfbug$ ctstat -i 132 -v
CTID ZONEID TYPE STATE HOLDER EVENTS QTIME NTIME
132 0 process owned 9 0 - -
    cookie: 0x20
    informative event set: none
    critical event set: core signal hwerr empty
    fatal event set: none
    parameter set: inherit regent
    member processes: 3384 3385 3386
    inherited contracts: none
    service fmri: svc:/application/bug-demo:default
    service fmri ctid: 132
    creator: svc.startd
    aux: start
That's all good. Now, let's kill the shorter-running "sleep 300" process. We expect the start method to exit with $SMF_EXIT_ERR_CONFIG, which will send the service into maintenance:
dap@lennier:/var/tmp/smfbug$ pfexec kill 3386
dap@lennier:/var/tmp/smfbug$ svcs bug-demo
STATE STIME FMRI
maintenance 20:52:07 svc:/application/bug-demo:default
The SMF log contains:
[ Sep 1 20:51:29 Executing start method ("/var/tmp/smfbug/start.sh"). ]
+ . /lib/svc/share/smf_include.sh
++ SMF_EXIT_OK=0
++ SMF_EXIT_NODAEMON=94
++ SMF_EXIT_ERR_FATAL=95
++ SMF_EXIT_ERR_CONFIG=96
++ SMF_EXIT_MON_DEGRADE=97
++ SMF_EXIT_MON_OFFLINE=98
++ SMF_EXIT_ERR_NOSMF=99
++ SMF_EXIT_ERR_PERM=100
+ sleep 300
+ sleep 800
Terminated
+ exit 96
[ Sep 1 20:52:07 Method "start" exited with status 96. ]
That's all good. But the long-running "sleep" process is still running, and the contract is still around!
dap@lennier:/var/tmp/smfbug$ ps -opid,args -p 3385
 PID COMMAND
3385 sleep 800
dap@lennier:/var/tmp/smfbug$ ctstat -i 132 -v
CTID ZONEID TYPE STATE HOLDER EVENTS QTIME NTIME
132 0 process orphan - 0 - -
    cookie: 0x20
    informative event set: none
    critical event set: core signal hwerr empty
    fatal event set: none
    parameter set: inherit regent
    member processes: 3385
    inherited contracts: none
    service fmri: svc:/application/bug-demo:default
    service fmri ctid: 132
    creator: svc.startd
    aux: start
dap@lennier:/var/tmp/smfbug$
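One way to check for this leak after the fact is to look at the contract's state and member list. The following is a hypothetical helper (`contract_members` is a name invented for this sketch, not an illumos tool) that reads `ctstat -i CTID -v` output on stdin, laid out as in the transcripts above, and prints the contract's STATE column and its member PIDs. It assumes the first data row carries STATE in its fourth column and that the PIDs follow a `member processes:` label. It also needs a POSIX awk; on older Solaris-derived systems /usr/bin/awk may be the legacy awk, in which case nawk is the safer choice.

```shell
#!/bin/bash
#
# Hypothetical helper: summarize "ctstat -i CTID -v" output read on stdin
# as "state=<STATE> members=<PIDs>".
#
contract_members() {
    awk '
        # The first row that starts with a number is the contract row
        # under the CTID header; STATE is its fourth column.
        $1 ~ /^[0-9]+$/ && state == "" { state = $4 }

        # Everything after the "member processes:" label is the PID list.
        /member processes:/ {
            line = $0
            sub(/.*member processes:[ \t]*/, "", line)
            sub(/[ \t]+$/, "", line)
            members = line
        }

        END { printf "state=%s members=%s\n", state, members }
    '
}
```

For example, `ctstat -i 132 -v | contract_members` at the end of the session above should print `state=orphan members=3385`, while the same command right after the import would show `state=owned` with all three PIDs. An orphaned contract that still lists members after the instance has gone to maintenance is the symptom this bug describes.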
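This can cause problems for any service that expects only one instance of itself to be running. For example, a daemon that binds a particular TCP port may fail when SMF tries to start it again, because the orphaned process from the failed attempt can still be holding that port.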
Updated by David Pacheco almost 2 years ago
The only relevant lines in startd's log file:
Sep 1 20:52:07/252 ERROR: svc:/application/bug-demo:default: Method "/var/tmp/smfbug/start.sh" failed with exit status 96.
Sep 1 20:52:07/252: application/bug-demo:default misconfigured: transitioned to maintenance (see 'svcs -xv' for details)