Project

General

Profile

Bug #7267

SMF is fast and loose with optional dependencies

Added by Andrew Stormont almost 3 years ago. Updated about 2 years ago.

Status:
Closed
Priority:
High
Assignee:
-
Category:
-
Start date:
2016-08-03
Due date:
% Done:

100%

Estimated time:
Difficulty:
Hard
Tags:
needs-triage

Description

In terms of res order optional dependencies should behave like normal dependencies (when they're online and the restart_on attribute is set to "restart" or "refresh"). In reality this is not the case. Restarting an optional dependency usually results in the dependent being restarted first and the optional dependency being restarted after. Clearly this is broken.

Part of the problem is that optional_all dependencies are bypassed when marking dependents to do down/checking that they've gone down. It also doesn't help that the dependency code is extremely naive and has almost no handling for instances that are transitioning (GV_TOOFFLINE is handled, but that's about it).

In short, the dependency handling code for optional_all dependencies is so loose that they are nearly always considered satisfied. Because of this the start/restart order is completely unpredictable. The other dependency group types suffer from similar problems though to a much lesser extend (except require_any dependencies, which are always satisfied).


Related issues

Related to illumos gate - Bug #7246: SMF stops dependents in the wrong orderClosed2016-07-31

Actions
Related to illumos gate - Bug #7369: regression after 7267 SMF is fast and loose with optional dependenciesResolved2016-09-10

Actions
Related to illumos gate - Bug #7387: SMF is fast and loose with require_any/service dependenciesResolved2016-09-16

Actions
Related to illumos gate - Bug #4395: "svcadm restart" should be identical to "svcadm disable; svcadm enable"New2013-12-12

Actions
Precedes illumos gate - Bug #8890: ipfilter and nfs/server are locked in death raceNew2016-08-042016-08-04

Actions

History

#1

Updated by Andrew Stormont almost 3 years ago

  • Related to Bug #7246: SMF stops dependents in the wrong order added
#2

Updated by Andrew Stormont almost 3 years ago

The problem is with the optional_all_satisfied() function in graph.c which is too permissive. An optional dependency should only be satisfied when:

- It is enabled and online.
- Disabled and offline.
- In maintenance mode.

Services that are transitioning from online -> offline or online -> offline shouldn't satisfy the dependency. When they have completed the transition the propagate_start() function will bring them online.

#3

Updated by Andrew Stormont almost 3 years ago

It also don't help that the dependency check is completely forgone when marking dependents to go down, and when checking if they've gone down. This all adds up to a completely unpredictable mess.

#4

Updated by Andrew Stormont almost 3 years ago

  • Description updated (diff)
#5

Updated by Andrew Stormont almost 3 years ago

The updated webrev includes a change that will cause dependents to restart when an optional dependency comes online (is enabled). I think this makes optional_all dependencies first class citizens, which is really needed.

#6

Updated by Dan McDonald almost 3 years ago

For folks who need an SMF overview (including from where the above definition of an optional_all dependency comes), see here: [[http://illumos.org/man/5/smf]]

#7

Updated by Andrew Stormont almost 3 years ago

A quick note about testing: It was with dtrace that I observed services starting in the wrong order. With this change applied services start in the correct order now. That is, dependents go down first and then the service being disabled goes offline. When the service going offline has transitioned to disabled state PROPAGATE_SAT is sent to the dependents which brings them back online, again in the correct order because optional_all_satisfied() now works correctly.

#8

Updated by Electric Monk almost 3 years ago

  • Status changed from New to Closed
  • % Done changed from 0 to 100

git commit 2633ea2405fe9036261e499e3d436091642da653

commit  2633ea2405fe9036261e499e3d436091642da653
Author: Andrew Stormont <astormont@racktopsystems.com>
Date:   2016-09-09T15:39:41.000Z

    7267 SMF is fast and loose with optional dependencies
    Reviewed by: Dan McDonald <danmcd@omniti.com>
    Reviewed by: Jerry Jelinek <jerry.jelinek@joyent.com>
    Reviewed by: Albert Lee <trisk@omniti.com>
    Approved by: Dan McDonald <danmcd@omniti.com>

#9

Updated by Andrew Stormont almost 3 years ago

  • Status changed from Closed to In Progress
#10

Updated by Andrew Stormont almost 3 years ago

  • Related to Bug #7369: regression after 7267 SMF is fast and loose with optional dependencies added
#11

Updated by Andrew Stormont almost 3 years ago

  • Related to Bug #7387: SMF is fast and loose with require_any/service dependencies added
#12

Updated by Andrew Stormont almost 3 years ago

Updated webrev which should address issues when using nwam: http://cr.illumos.org/~webrev/andy_js/7267-2/

#13

Updated by Andrew Stormont almost 3 years ago

  • Description updated (diff)
#14

Updated by Andrew Stormont almost 3 years ago

See #7369 for more info on what changes had to be made to make NWAM work properly.

#15

Updated by Andrew Stormont over 2 years ago

We are running the updated version of this change in production and to my knowledge have not run into any issues because of it.

#16

Updated by Electric Monk over 2 years ago

  • Status changed from In Progress to Closed

git commit 3e2079808737e33bb0613ed71ef53a52af0d4c12

commit  3e2079808737e33bb0613ed71ef53a52af0d4c12
Author: Andrew Stormont <astormont@racktopsystems.com>
Date:   2017-03-14T19:17:19.000Z

    7267 SMF is fast and loose with optional dependencies
    Reviewed by: Jerry Jelinek <jerry.jelinek@joyent.com>
    Reviewed by: Albert Lee <trisk@omniti.com>
    Reviewed by: Gordon Ross <gordon.w.ross@gmail.com>
    Approved by: Dan McDonald <danmcd@omniti.com>

#17

Updated by Andrew Stormont over 2 years ago

  • Related to Bug #4395: "svcadm restart" should be identical to "svcadm disable; svcadm enable" added
#18

Updated by Electric Monk about 2 years ago

git commit 4e567863325ae94cad35e6a68cf6ead736eec741

commit  4e567863325ae94cad35e6a68cf6ead736eec741
Author: Robert Mustacchi <rm@joyent.com>
Date:   2016-09-12T14:41:59.000Z

    backout: 7267 SMF is fast and loose with optional dependencies (needs work)

#19

Updated by Andrew Stormont over 1 year ago

  • Precedes Bug #8890: ipfilter and nfs/server are locked in death race added

Also available in: Atom PDF