Bug #4395

"svcadm restart" should be identical to "svcadm disable; svcadm enable"

Added by Jim Klimov over 5 years ago. Updated over 2 years ago.

Status: New
Priority: Normal
Assignee: -
Category: cmd - userland programs
Start date: 2013-12-12
Due date:
% Done: 0%
Estimated time:
Difficulty: Medium
Tags: needs-triage

Description

While testing my work on SMF inter-dependencies, I discovered that if I run "svcadm restart", the dependency service (whose restart I requested explicitly) stops first, and only when its state changes to "offline" do the dependent services begin to shut down. Meanwhile, the dependency service begins to start up, allowing the dependents to restart later.

This is wrong, and is a different order from what correctly happens if I invoke "svcadm disable" which shuts down dependent services first, and the requested service last. If I follow up with "svcadm enable", the requested service fully starts before its dependents begin to start.

The logic in "svcadm restart" should be revised to handle the shutdown and startup of dependents properly (i.e. behave like "disable/enable").


Files

testdep.sh (1.01 KB), Jim Klimov, 2013-12-15 01:30 PM
testdep.xml (1.11 KB), Jim Klimov, 2013-12-15 01:30 PM
testdep.sh-mod (932 Bytes), Gary Mills, 2013-12-17 03:51 PM
testdep.xml-mod (1.2 KB), Gary Mills, 2013-12-17 03:51 PM

Related issues

Related to illumos gate - Bug #7267: SMF is fast and loose with optional dependencies (Closed, 2016-08-03)


History

#1

Updated by Gary Mills over 5 years ago

Could you provide us with a simple SMF configuration that illustrates the problem? We need something that will run on any system that uses SMF. It doesn't have to do anything useful.

#2

Updated by Jim Klimov over 5 years ago

I guess the easiest example for reviewing the problem is the way I stumbled upon it: with SMF services wrapping execution of local zones. This example is good for illustration because changes in service state take non-trivial time to complete (boot/halt the zone), so you won't wonder about the margin of error in measurements. Setup details and copy-pastable snippets of configuration can be found here: http://wiki.openindiana.org/oi/Zones+as+SMF+services

Just make and configure a couple of zones which can boot up to multi-user milestone (i.e. don't hang in sysidcfg questionnaire on the console), or wrap ones you have already, and define a dependency from one zone's wrapper onto another. Then, in the global zone, "svcadm restart" the SMF service which wraps the dependency zone (on which another one depends) and run something like this to monitor the order of state changes:
:; while sleep 1; do svcs -a | grep zone: ; echo ""; done

Also, tail'ing both SMF logs would help in such monitoring, i.e.:
:; gtail -F /var/svc/log/system-zone:*.log

PS: The new implementation should not disrupt services which were enabled temporarily by restarting and doing verbatim "svcadm enable" as the second part. Likewise, the restart action should not leave services permanently disabled (if for example the host powers off abruptly just after a verbatim "svcadm disable"). All in all, these should probably both be temporary disable/enable actions.

#3

Updated by Gary Mills over 5 years ago

We need something much simpler than that. Just a small number of manifests with the necessary dependencies would be best. The methods shouldn't actually do anything. Something like `/usr/bin/sleep 60' might be sufficient. The evidence would all be in the SMF logs.

#4

Updated by Jim Klimov over 5 years ago

Ok, here it is: a manifest which defines two service instances - a "slave" which depends on "master". Both implement the same method script (attached) which just sleeps before/after start/stop requests to illustrate some lengthy activity.
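
For readers without the attachments, here is a rough sketch of what such a slow method script could look like. It is not the attached testdep.sh: the function name slow_method and the DELAY variable are made up for illustration, and it assumes a transient-style service where no daemon has to stay running. SMF_FMRI is the variable SMF exports to method scripts.

```sh
#!/bin/sh
# Hypothetical stand-in for a slow SMF method script; NOT the attached
# testdep.sh. DELAY (in seconds) is an assumed knob for the demo.
DELAY=${DELAY:-30}

slow_method() {
    # SMF sets SMF_FMRI in the environment of method scripts.
    name=${SMF_FMRI:-unknown}
    case "$1" in
    start|stop)
        echo "$(date): $1 begins (sleeping ${DELAY}s): $name"
        sleep "$DELAY"
        echo "$(date): $1 completed: $name"
        ;;
    *)
        echo "usage: slow_method {start|stop}" >&2
        return 1
        ;;
    esac
}

# When used as a real method script, dispatch on the first argument.
if [ $# -gt 0 ]; then
    slow_method "$@"
fi
```

With a delay of several seconds on both start and stop, the order of the "begins" and "completed" lines in the two service logs should be enough to see which instance moved first.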

Examples from my box which toggle state of "master":
1) "enable" starts master first, slave second
2) "disable" stops slave first, master second;
3) "restart" stops and starts the master; while it is already starting, the slave just begins to stop then start

# gtail -F /var/svc/log/*testdep*log &
# svcadm enable master
    [ Dec 15 17:18:40 Enabled. ]
    [ Dec 15 17:18:40 Executing start method ("/root/testdep.sh start"). ]
    Sun Dec 15 17:18:40 MSK 2013: master: Init (sleeping) svc:/testdep:master
    Sun Dec 15 17:19:10 MSK 2013: master: Init completed: svc:/testdep:master
    Sun Dec 15 17:19:10 MSK 2013: master: Launched daemon: svc:/testdep:master: PID: 7400
    Sun Dec 15 17:19:10 MSK 2013: master: Starting daemon loop with lockfile '/tmp/testdep.lock.master'
    [ Dec 15 17:19:13 Method "start" exited with status 0. ]
==> /var/svc/log/testdep:slave.log <==
[ Dec 15 17:19:13 Executing start method ("/root/testdep.sh start"). ]
Sun Dec 15 17:19:13 MSK 2013: slave: Init (sleeping) svc:/testdep:slave
Sun Dec 15 17:19:43 MSK 2013: slave: Init completed: svc:/testdep:slave
Sun Dec 15 17:19:43 MSK 2013: slave: Launched daemon: svc:/testdep:slave: PID: 7421
Sun Dec 15 17:19:43 MSK 2013: slave: Starting daemon loop with lockfile '/tmp/testdep.lock.slave'
[ Dec 15 17:19:46 Method "start" exited with status 0. ]
# svcs -p testdep; date; ls -la /tmp/testdep.lock*
    STATE STIME FMRI
    online 17:19:13 svc:/testdep:master
    17:19:10 7400 testdep.sh
    17:19:10 7402 testdep.sh
    17:21:32 7537 sleep
    online 17:19:46 svc:/testdep:slave
    17:19:43 7421 testdep.sh
    17:19:43 7423 testdep.sh
    17:21:32 7538 sleep
    Sun Dec 15 17:21:35 MSK 2013
    -rw-r--r-- 1 root root 0 Dec 15 17:19 /tmp/testdep.lock.master
    -rw-r--r-- 1 root root 0 Dec 15 17:19 /tmp/testdep.lock.slave
# svcadm disable master
    [ Dec 15 17:17:18 Stopping because dependency activity requires stop. ]
    [ Dec 15 17:17:18 Executing stop method ("/root/testdep.sh stop"). ]
    Sun Dec 15 17:17:18 MSK 2013: slave: Stop-Init (sleeping) svc:/testdep:slave
    Sun Dec 15 17:17:26 MSK 2013: slave: Stop-Init completed: svc:/testdep:slave
    Sun Dec 15 17:17:26 MSK 2013: slave: Post-stop sleep
    Sun Dec 15 17:17:27 MSK 2013: slave: Ending daemon loop
    Sun Dec 15 17:17:27 MSK 2013: slave: Exiting svc:/testdep:slave soon...
    Sun Dec 15 17:17:27 MSK 2013: slave: Ending daemon loop
    Sun Dec 15 17:17:27 MSK 2013: slave: Exiting svc:/testdep:slave soon...
    [ Dec 15 17:17:31 Method "stop" exited with status 0. ]
==> /var/svc/log/testdep:master.log <==
[ Dec 15 17:17:37 Stopping because service disabled. ]
[ Dec 15 17:17:37 Executing stop method ("/root/testdep.sh stop"). ]
Sun Dec 15 17:17:37 MSK 2013: master: Stop-Init (sleeping) svc:/testdep:master
Sun Dec 15 17:17:45 MSK 2013: master: Stop-Init completed: svc:/testdep:master
Sun Dec 15 17:17:45 MSK 2013: master: Post-stop sleep
Sun Dec 15 17:17:45 MSK 2013: master: Ending daemon loop
Sun Dec 15 17:17:45 MSK 2013: master: Exiting svc:/testdep:master soon...
Sun Dec 15 17:17:45 MSK 2013: master: Ending daemon loop
Sun Dec 15 17:17:45 MSK 2013: master: Exiting svc:/testdep:master soon...
[ Dec 15 17:17:50 Method "stop" exited with status 0. ]
# svcadm restart master
    ==> /var/svc/log/testdep:master.log <==
    [ Dec 15 17:21:45 Stopping because service restarting. ]
    [ Dec 15 17:21:45 Executing stop method ("/root/testdep.sh stop"). ]
    Sun Dec 15 17:21:45 MSK 2013: master: Stop-Init (sleeping) svc:/testdep:master
    Sun Dec 15 17:21:53 MSK 2013: master: Stop-Init completed: svc:/testdep:master
    Sun Dec 15 17:21:53 MSK 2013: master: Post-stop sleep
    Sun Dec 15 17:21:53 MSK 2013: master: Ending daemon loop
    Sun Dec 15 17:21:53 MSK 2013: master: Exiting svc:/testdep:master soon...
    [ Dec 15 17:21:58 Method "stop" exited with status 0. ]
    [ Dec 15 17:22:04 Executing start method ("/root/testdep.sh start"). ]
    Sun Dec 15 17:22:04 MSK 2013: master: Init (sleeping) svc:/testdep:master
==> /var/svc/log/testdep:slave.log <==
[ Dec 15 17:22:04 Stopping because dependency activity requires stop. ]
[ Dec 15 17:22:04 Executing stop method ("/root/testdep.sh stop"). ]
Sun Dec 15 17:22:04 MSK 2013: slave: Stop-Init (sleeping) svc:/testdep:slave
Sun Dec 15 17:22:12 MSK 2013: slave: Stop-Init completed: svc:/testdep:slave
Sun Dec 15 17:22:12 MSK 2013: slave: Post-stop sleep
Sun Dec 15 17:22:15 MSK 2013: slave: Ending daemon loop
Sun Dec 15 17:22:15 MSK 2013: slave: Exiting svc:/testdep:slave soon...
[ Dec 15 17:22:17 Method "stop" exited with status 0. ]
==> /var/svc/log/testdep:master.log <==
Sun Dec 15 17:22:34 MSK 2013: master: Init completed: svc:/testdep:master
Sun Dec 15 17:22:34 MSK 2013: master: Launched daemon: svc:/testdep:master: PID: 7603
Sun Dec 15 17:22:34 MSK 2013: master: Starting daemon loop with lockfile '/tmp/testdep.lock.master'
[ Dec 15 17:22:37 Method "start" exited with status 0. ]
==> /var/svc/log/testdep:slave.log <==
[ Dec 15 17:22:37 Executing start method ("/root/testdep.sh start"). ]
Sun Dec 15 17:22:37 MSK 2013: slave: Init (sleeping) svc:/testdep:slave
Sun Dec 15 17:23:07 MSK 2013: slave: Init completed: svc:/testdep:slave
Sun Dec 15 17:23:07 MSK 2013: slave: Launched daemon: svc:/testdep:slave: PID: 7627
Sun Dec 15 17:23:07 MSK 2013: slave: Starting daemon loop with lockfile '/tmp/testdep.lock.slave'
[ Dec 15 17:23:10 Method "start" exited with status 0. ]

#5

Updated by Gary Mills over 5 years ago

I've modified your method script and manifest by simplifying them and perhaps clarifying them. In my tests, I agree with your conclusions.
Here's how it should work: When the slave is dependent on the master, it means that the master is producing a service that is consumed by the slave. To supply this service, the master should always be running when the slave is running. Consequently, on startup, the master must be fully up before the slave comes up. Likewise, on shutdown, the slave must shut down before the master. Only when the slave is fully down can the master shut down. Otherwise, the slave is deprived of a necessary service.
This is done correctly when the master is started, either by `enable' or `restart'. It also works correctly when the master is stopped by `disable'. When the master is stopped by `restart', the master stops while the slave is still running. This behavior is indeed incorrect. Here's the proof:

States:
  0: Both online
  1: `restart'
  2: Master online but changing; Slave online
  3: Master offline but changing; Slave online but changing
  4: Both offline; Master changing
  5: Master online; Slave offline but changing
  6: Both online

Logs:
  Master: Dec 16 13:00:29 Executing stop method
  Master: Dec 16 13:00:37 Method "stop" exited
  Master: Dec 16 13:00:37 Executing start method
  Slave: Dec 16 13:00:37 Executing stop method
  Slave: Dec 16 13:00:45 Method "stop" exited
  Master: Dec 16 13:01:07 Method "start" exited
  Slave: Dec 16 13:01:07 Executing start method
  Slave: Dec 16 13:01:37 Method "start" exited
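
The ordering in a log summary like the one above can also be checked mechanically. The following is only a sketch: check_stop_order is a made-up helper, it expects lines shaped like "Master: Dec 16 13:00:29 Executing stop method" (an optional leading "o" bullet is tolerated), and it assumes all events fall on the same day so that the HH:MM:SS strings can be compared directly.

```sh
#!/bin/sh
# Hypothetical helper: reads event lines on stdin and reports whether the
# slave finished stopping before the master began to stop.
# Prints "ok", "violation", or "incomplete" (an event was not found).
check_stop_order() {
    awk '
        # Drop an optional leading "o" bullet so $1 is the service label.
        { sub(/^[ \t]*o[ \t]+/, "") }
        $1 == "Master:" && /Executing stop method/ && !m { m = $4 }
        $1 == "Slave:"  && /Method "stop" exited/   && !s { s = $4 }
        END {
            if (m == "" || s == "") print "incomplete"
            else if (s <= m) print "ok"
            else print "violation"
        }'
}
```

Fed the eight log lines listed here, it should report a violation, since the master's stop method starts at 13:00:29 while the slave's stop method only exits at 13:00:45.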

#6

Updated by Jim Klimov over 5 years ago

Note: the updated method script has a potential typo: there is no leading slash before the usr/bin/... commands. In fact, should sleep be called as a binary, or is it a shell builtin?
Also, the monstrosity I had was, I think, due to my initial tests being done with a "transient" service, which kept killing my backgrounded "daemons" when the method completed; switching to "contract" solved this, and the loops could indeed be removed. On a side note, transient services remain "online" even if there are no processes running and there is nothing to kill, so just a lengthy startup and a lengthy shutdown routine would suffice to demonstrate the problem.
And thanks for confirming it independently :)

#7

Updated by Andrew Stormont over 2 years ago

I believe this problem is at least partially solved by #7267

#8

Updated by Andrew Stormont over 2 years ago

  • Related to Bug #7267: SMF is fast and loose with optional dependencies added

#9

Updated by Peter Tribble over 2 years ago

As far as I can tell the behaviour is correct and as documented.

Fundamentally, restart isn't the same as disable+enable. Specifically, for the purposes of dependency resolution, the service counts as up throughout the restart. It's the restart_on attribute that determines what dependent services do. For the example here the documented behaviour is to restart the slave after restarting the master.

(Note that it would be plain wrong to restart the slave before the master - the restart is for a reason which might be a configuration change, and you absolutely want the slave to use the new version of the master service, not the old one. Same logic for clients that hold persistent connections to servers.)

If you want a behaviour that mimics disable+enable, then the simplest way is to explicitly issue disable+enable. There are other ways in which restart differs from that combination in any case. Using the name "restart" was probably a mistake, but that can't be changed now.
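
As a rough illustration of issuing the two steps explicitly, here is a small wrapper. The function name and the SVCADM override are made up for this sketch. It uses svcadm's -s (wait for the state change) and -t (temporary, not persisted across reboot) options on the assumption, taken from comment #2, that both steps should be temporary so that an abrupt power-off between them does not leave the service permanently disabled.

```sh
#!/bin/sh
# Hypothetical wrapper, not part of SMF: emulate a restart as an explicit
# disable followed by an enable. Set SVCADM=echo for a dry run.
SVCADM=${SVCADM:-svcadm}

restart_by_disable_enable() {
    fmri=$1
    # -s: wait until the instance reaches the requested state.
    # -t: make the change temporary (it does not persist across reboot).
    $SVCADM disable -s -t "$fmri" || return 1
    $SVCADM enable -s -t "$fmri"
}
```

Something like `restart_by_disable_enable svc:/testdep:master` would then follow the disable and enable orderings shown in comment #4. Whether temporary changes are right for a given service is a judgment call; drop -t if the persistent setting should be touched.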
