svc.startd should terminate orphaned contracts for wait model services
SMF allows a service to specify a duration, i.e. the process model for the service. With the default contract duration, the restarter will generally clean up the process contract as part of restarting the service, which prevents child processes from previous invocations from accumulating.
For the child duration (aka the "wait" model), this is not the case. As per svc.startd(1M):
"Wait" model services are restarted whenever the child process associated with the service exits. A child process that exits is not considered an error for "wait" model services, and repeated failures do not lead to a transition to maintenance state.
If the process that was started by SMF exits, it is correctly restarted. Unfortunately, if that process itself created child processes, nothing prevents those from continuing to run in an orphaned contract. Consider a program, trouble.sh, which launches a child process, moretrouble.sh, as part of regular operation. If trouble.sh exits after a little while, it will be restarted, but the moretrouble.sh child will still be there:
    $ ptree -c $(pgrep -f trouble.sh)
    [process contract 1: svc:/system/init:default]
      1 /sbin/init
        [process contract 4: svc:/system/svc/restarter:default]
          9 /lib/svc/bin/svc.startd
            [process contract 417: svc:/trouble:default]
              23418 /bin/bash /var/tmp/SERVICES/trouble.sh
                23419 /bin/bash /var/tmp/SERVICES/moretrouble.sh
                  23423 sleep 1
                23421 sleep 1
    [process contract 414: svc:/trouble:default]
      23271 /bin/bash /var/tmp/SERVICES/moretrouble.sh
        23413 sleep 1
    [process contract 415: svc:/trouble:default]
      23306 /bin/bash /var/tmp/SERVICES/moretrouble.sh
        23415 sleep 1
    [process contract 416: svc:/trouble:default]
      23360 /bin/bash /var/tmp/SERVICES/moretrouble.sh
        23417 sleep 1
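The two scripts themselves are not included in this report. As a rough sketch of the pattern being described, the following stand-ins (hypothetical reconstructions, written to a temporary directory rather than /var/tmp/SERVICES) show that a background helper outlives the parent that started it. It runs outside SMF, so it only illustrates the orphaning itself, not the contract bookkeeping.

```shell
#!/bin/bash
# Hypothetical stand-ins for trouble.sh and moretrouble.sh; the real
# scripts are not shown in the report.
workdir=$(mktemp -d)

cat > "$workdir/moretrouble.sh" <<'EOF'
#!/bin/bash
# Long-running helper: never exits on its own.
while :; do sleep 1; done
EOF

cat > "$workdir/trouble.sh" <<'EOF'
#!/bin/bash
# Start the helper in the background, run briefly, then exit.
"$(dirname "$0")/moretrouble.sh" &
echo $! > "$(dirname "$0")/child.pid"
sleep 1
exit 0
EOF
chmod +x "$workdir"/*.sh

# Run the parent to completion; under the wait model, svc.startd would
# restart it at this point.
"$workdir/trouble.sh"
child_pid=$(cat "$workdir/child.pid")

# The helper is still alive although its parent has exited.
if kill -0 "$child_pid" 2>/dev/null; then
    child_survived=yes
else
    child_survived=no
fi
echo "helper still running after parent exit: $child_survived"

# Clean up the helper and the temporary files.
kill "$child_pid" 2>/dev/null
rm -rf "$workdir"
```

Under a wait model service, each restart of the parent would leave one more such helper behind, which matches the growing list of contracts in the ptree output.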
This is confusing and likely unhelpful in the vast majority of cases. We should clean up the old contract before starting a new one, as we would do for a contract model service.
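Until the restarter does this itself, the leftover contracts can at least be identified by hand. The helper below is a minimal sketch, not part of SMF: stale_contracts is a hypothetical name, and it simply filters `ptree -c` style output for contract header lines that belong to a given FMRI but are not the contract currently in use. The current contract ID is passed in by the caller; how it is obtained is left open here.

```shell
#!/bin/bash
# stale_contracts FMRI CURRENT_CTID
# Reads ptree -c style output on stdin and prints the IDs of process
# contracts attributed to FMRI, other than CURRENT_CTID.
# (Hypothetical helper for illustration.)
stale_contracts() {
    awk -v fmri="$1" -v cur="$2" '
        # Header lines look like: [process contract 414: svc:/trouble:default]
        $1 == "[process" && $2 == "contract" {
            id = $3; sub(/:$/, "", id)      # strip trailing colon from the ID
            f = $4;  sub(/\]$/, "", f)      # strip closing bracket from the FMRI
            if (f == fmri && id != cur) print id
        }'
}
```

With the output shown in this report, `ptree -c $(pgrep -f trouble.sh) | stale_contracts svc:/trouble:default 417` would list 414, 415 and 416. On illumos, pkill(1) accepts a `-c ctidlist` option to match processes by contract ID, so the resulting IDs could be passed to it, after checking that nothing in those contracts is still needed.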
Updated by Stephan Althaus about 2 months ago
I think this is what happened to the "smartd" service in the OI smartmontools package.
I "solved" this by adding the "-n" parameter to the start command,
which prevents the smartd process from forking into the background.
BUT reading this, I don't know whether the service definition is right(TM):
should it be "contract" instead of "child"?
<?xml version="1.0"?>
<!DOCTYPE service_bundle SYSTEM "/usr/share/lib/xml/dtd/service_bundle.dtd.1">
<service_bundle type="manifest" name="smartd">
  <service name="system/smartd" type="service" version="1">
    <create_default_instance enabled="false"/>
    <single_instance/>
    <dependency name="filesystem-local" grouping="require_all" restart_on="none" type="service">
      <service_fmri value="svc:/system/filesystem/local:default"/>
    </dependency>
    <exec_method type="method" name="start" exec="/usr/sbin/smartd -n" timeout_seconds="60">
      <method_context>
        <method_credential user="root" group="root"/>
      </method_context>
    </exec_method>
    <exec_method type='method' name='stop' exec=':kill -QUIT' timeout_seconds="60">
    </exec_method>
    <property_group name='startd' type='framework'>
      <propval name='duration' type='astring' value='child' />
    </property_group>
    <stability value="Unstable"/>
    <template>
      <common_name>
        <loctext xml:lang="C">
          SMART monitoring service (smartd)
        </loctext>
      </common_name>
      <documentation>
        <manpage title="smartd" section="1M" manpath="/usr/share/man"/>
      </documentation>
    </template>
  </service>
</service_bundle>