Bug #7267
SMF is fast and loose with optional dependencies
100%
Description
In terms of res order optional dependencies should behave like normal dependencies (when they're online and the restart_on attribute is set to "restart" or "refresh"). In reality this is not the case. Restarting an optional dependency usually results in the dependent being restarted first and the optional dependency being restarted after. Clearly this is broken.
Part of the problem is that optional_all dependencies are bypassed when marking dependents to do down/checking that they've gone down. It also doesn't help that the dependency code is extremely naive and has almost no handling for instances that are transitioning (GV_TOOFFLINE is handled, but that's about it).
In short, the dependency handling code for optional_all dependencies is so loose that they are nearly always considered satisfied. Because of this the start/restart order is completely unpredictable. The other dependency group types suffer from similar problems though to a much lesser extend (except require_any dependencies, which are always satisfied).
Related issues
Updated by Andrew Stormont over 4 years ago
- Related to Bug #7246: SMF stops dependents in the wrong order added
Updated by Andrew Stormont over 4 years ago
The problem is with the optional_all_satisfied() function in graph.c which is too permissive. An optional dependency should only be satisfied when:
- It is enabled and online.
- Disabled and offline.
- In maintenance mode.
Services that are transitioning from online -> offline or online -> offline shouldn't satisfy the dependency. When they have completed the transition the propagate_start() function will bring them online.
Updated by Andrew Stormont over 4 years ago
It also don't help that the dependency check is completely forgone when marking dependents to go down, and when checking if they've gone down. This all adds up to a completely unpredictable mess.
Updated by Andrew Stormont over 4 years ago
- Description updated (diff)
Preliminary webrev: http://cr.illumos.org/~webrev/andy_js/7267
Updated by Andrew Stormont over 4 years ago
The updated webrev includes a change that will cause dependents to restart when an optional dependency comes online (is enabled). I think this makes optional_all dependencies first class citizens, which is really needed.
Updated by Dan McDonald over 4 years ago
For folks who need an SMF overview (including from where the above definition of an optional_all dependency comes), see here: [[http://illumos.org/man/5/smf]]
Updated by Andrew Stormont over 4 years ago
A quick note about testing: It was with dtrace that I observed services starting in the wrong order. With this change applied services start in the correct order now. That is, dependents go down first and then the service being disabled goes offline. When the service going offline has transitioned to disabled state PROPAGATE_SAT is sent to the dependents which brings them back online, again in the correct order because optional_all_satisfied() now works correctly.
Updated by Electric Monk over 4 years ago
- Status changed from New to Closed
- % Done changed from 0 to 100
git commit 2633ea2405fe9036261e499e3d436091642da653
commit 2633ea2405fe9036261e499e3d436091642da653 Author: Andrew Stormont <astormont@racktopsystems.com> Date: 2016-09-09T15:39:41.000Z 7267 SMF is fast and loose with optional dependencies Reviewed by: Dan McDonald <danmcd@omniti.com> Reviewed by: Jerry Jelinek <jerry.jelinek@joyent.com> Reviewed by: Albert Lee <trisk@omniti.com> Approved by: Dan McDonald <danmcd@omniti.com>
Updated by Andrew Stormont over 4 years ago
- Related to Bug #7369: regression after 7267 SMF is fast and loose with optional dependencies added
Updated by Andrew Stormont over 4 years ago
- Related to Bug #7387: SMF is fast and loose with require_any/service dependencies added
Updated by Andrew Stormont over 4 years ago
Updated webrev which should address issues when using nwam: http://cr.illumos.org/~webrev/andy_js/7267-2/
Updated by Andrew Stormont over 4 years ago
See #7369 for more info on what changes had to be made to make NWAM work properly.
Updated by Andrew Stormont almost 4 years ago
We are running the updated version of this change in production and to my knowledge have not run into any issues because of it.
Updated by Electric Monk almost 4 years ago
- Status changed from In Progress to Closed
git commit 3e2079808737e33bb0613ed71ef53a52af0d4c12
commit 3e2079808737e33bb0613ed71ef53a52af0d4c12 Author: Andrew Stormont <astormont@racktopsystems.com> Date: 2017-03-14T19:17:19.000Z 7267 SMF is fast and loose with optional dependencies Reviewed by: Jerry Jelinek <jerry.jelinek@joyent.com> Reviewed by: Albert Lee <trisk@omniti.com> Reviewed by: Gordon Ross <gordon.w.ross@gmail.com> Approved by: Dan McDonald <danmcd@omniti.com>
Updated by Andrew Stormont almost 4 years ago
- Related to Bug #4395: "svcadm restart" should be identical to "svcadm disable; svcadm enable" added
Updated by Electric Monk over 3 years ago
git commit 4e567863325ae94cad35e6a68cf6ead736eec741
commit 4e567863325ae94cad35e6a68cf6ead736eec741 Author: Robert Mustacchi <rm@joyent.com> Date: 2016-09-12T14:41:59.000Z backout: 7267 SMF is fast and loose with optional dependencies (needs work)
Updated by Andrew Stormont about 3 years ago
- Precedes Bug #8890: ipfilter and nfs/server are locked in death race added