Bug #13511

svc.startd should terminate orphaned contracts for wait model services

Added by Joshua M. Clulow about 2 months ago. Updated about 2 months ago.

Status: New
Priority: Normal
Assignee: -
Category: smf
Start date:
Due date:
% Done: 0%
Estimated time:
Difficulty: Medium
Tags:
Gerrit CR:

Description

SMF allows a service to specify a duration, i.e., the process model for the service. With the default contract duration, the restarter will generally clean up the process contract as part of restarting the service, which prevents child processes from previous invocations from accumulating by accident.

For the child duration (aka the "wait" model), this is not the case. As per svc.startd(1M):

"Wait" model services are restarted whenever the child process associated with the service exits. A child process that exits is not considered an error for "wait" model services, and repeated failures do not lead to a transition to maintenance state.

If the process that was started by SMF exits, it is correctly restarted. Unfortunately, if that process itself created child processes, nothing prevents those from continuing to run in an orphaned contract. Consider a program, trouble.sh, which launches a child process, moretrouble.sh, as part of regular operation. If trouble.sh exits after a little while, it will be restarted, but the moretrouble.sh child from each previous run will still be there:

$ ptree -c $(pgrep -f trouble.sh)
[process contract 1: svc:/system/init:default]
  1      /sbin/init
    [process contract 4: svc:/system/svc/restarter:default]
      9      /lib/svc/bin/svc.startd
        [process contract 417: svc:/trouble:default]
          23418  /bin/bash /var/tmp/SERVICES/trouble.sh
            23419  /bin/bash /var/tmp/SERVICES/moretrouble.sh
              23423  sleep 1
            23421  sleep 1
[process contract 414: svc:/trouble:default]
  23271  /bin/bash /var/tmp/SERVICES/moretrouble.sh
    23413  sleep 1
[process contract 415: svc:/trouble:default]
  23306  /bin/bash /var/tmp/SERVICES/moretrouble.sh
    23415  sleep 1
[process contract 416: svc:/trouble:default]
  23360  /bin/bash /var/tmp/SERVICES/moretrouble.sh
    23417  sleep 1

This is confusing and likely unhelpful in the vast majority of cases. We should clean up the old contract before starting a new one, as we would do for a contract model service.
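
For anyone wanting to reproduce the situation above, a manifest along the following lines might be used. This is only a sketch: the report does not include the actual manifest, so everything here is assumed apart from the FMRI (svc:/trouble:default) and the script path, which are taken from the ptree output. The timeout values and the stop method are illustrative guesses.

```xml
<?xml version="1.0"?>
<!DOCTYPE service_bundle SYSTEM "/usr/share/lib/xml/dtd/service_bundle.dtd.1">
<!-- Hypothetical manifest for svc:/trouble:default; not part of the original report. -->
<service_bundle type="manifest" name="trouble">
  <service name="trouble" type="service" version="1">
    <create_default_instance enabled="false"/>
    <!-- Script path as seen in the ptree output above. -->
    <exec_method type="method" name="start"
        exec="/var/tmp/SERVICES/trouble.sh" timeout_seconds="60"/>
    <!-- Assumed stop method; the report does not say what was used. -->
    <exec_method type="method" name="stop"
        exec=":kill" timeout_seconds="60"/>
    <!-- The "child" duration selects the "wait" model discussed above. -->
    <property_group name="startd" type="framework">
      <propval name="duration" type="astring" value="child"/>
    </property_group>
  </service>
</service_bundle>
```

With a manifest like this, each exit of trouble.sh would presumably leave its moretrouble.sh child behind in the old contract, matching the output shown above.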

#1

Updated by Stephan Althaus about 2 months ago

Hello!
I think this is what happened to the "smartd" service in the OI smartmontools package.

I "solved" this by adding the "-n" parameter to the start command,
to prevent the smartd process from forking into the background.

But reading this, I don't know whether the service definition is right(TM).
Should the duration be "contract" instead of "child"? The current manifest:

<?xml version="1.0"?>
<!DOCTYPE service_bundle SYSTEM "/usr/share/lib/xml/dtd/service_bundle.dtd.1">
<service_bundle type="manifest" name="smartd">
  <service
     name="system/smartd" 
     type="service" 
     version="1">
    <create_default_instance enabled="false"/>
    <single_instance/>
    <dependency
       name="filesystem-local" 
       grouping="require_all" 
       restart_on="none" 
       type="service">
      <service_fmri value="svc:/system/filesystem/local:default"/>
    </dependency>
    <exec_method
       type="method" 
       name="start" 
       exec="/usr/sbin/smartd -n" 
       timeout_seconds="60">
      <method_context>
        <method_credential user="root" group="root"/>
      </method_context>
    </exec_method>
    <exec_method
       type='method'
       name='stop'
       exec=':kill -QUIT'
       timeout_seconds="60">
    </exec_method>
    <property_group name='startd' type='framework'>
        <propval name='duration' type='astring' value='child' />
    </property_group>
    <stability value="Unstable"/>
    <template>
      <common_name>
        <loctext xml:lang="C">
          SMART monitoring service (smartd)
        </loctext>
      </common_name>
      <documentation>
        <manpage title="smartd" section="1M" manpath="/usr/share/man"/>
      </documentation>
    </template>
  </service>
</service_bundle>

#2

Updated by Joshua M. Clulow about 2 months ago

If smartd (without -n) does the classical UNIX double fork for daemonisation, then yes, I expect contract would be the right service model to use.
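
As a rough illustration of that alternative, the relevant parts of the manifest from comment #1 might look something like this. It is a sketch that assumes smartd really does daemonise by forking when run without -n, which has not been verified here. Only the start command and the duration value differ from the manifest above; the other elements would stay as they are.

```xml
<!-- Sketch only: assumes smartd forks into the background when -n is omitted. -->
<exec_method
   type="method"
   name="start"
   exec="/usr/sbin/smartd"
   timeout_seconds="60">
  <method_context>
    <method_credential user="root" group="root"/>
  </method_context>
</exec_method>
<!-- The stop method from comment #1 would be kept unchanged here. -->
<property_group name='startd' type='framework'>
    <!-- "contract" instead of "child": the whole contract is managed as the service. -->
    <propval name='duration' type='astring' value='contract' />
</property_group>
```

Since contract is the default duration, dropping the startd property group entirely should have the same effect.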
