Bug #7246

SMF stops dependents in the wrong order

Added by Andrew Stormont about 3 years ago. Updated about 3 years ago.

Status: Closed
Priority: High
Assignee: -
Category: -
Start date: 2016-07-31
Due date:
% Done: 0%
Estimated time:
Difficulty: Hard
Tags: needs-triage

Description

SMF is supposed to offline services in reverse dependency order, i.e. dependent services (leaves, in SMF parlance) go down first, followed by the service in question. Due to a bug in the algorithm SMF uses, this doesn't actually happen. Instead, all services go down at the same time (though the offlining is initiated in the correct order).
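To make the intended ordering concrete, here is a minimal sketch of "dependents first, service last" as a post-order walk over a dependents graph. It is only an illustration, not the svc.startd implementation: the `stop_order` function and the `app-*` service names are hypothetical, and the graph is assumed to be acyclic.

```python
def stop_order(dependents, root):
    """Return the order in which services should go offline.

    `dependents` maps a service name to the services that depend on it.
    A service is appended only after all of its dependents, so leaves
    come first and `root` comes last. Assumes the graph has no cycles.
    """
    order = []
    seen = set()

    def visit(svc):
        if svc in seen:
            return
        seen.add(svc)
        for dep in dependents.get(svc, []):
            visit(dep)
        order.append(svc)  # all dependents of svc are already in `order`

    visit(root)
    return order


# Hypothetical graph: two applications depend on network/dns/multicast,
# and one of them has a dependent of its own.
dependents = {
    "network/dns/multicast": ["app-a", "app-b"],
    "app-a": ["app-a-helper"],
}

print(stop_order(dependents, "network/dns/multicast"))
```

The point is that each service should only be stopped once everything that depends on it has finished going offline, not merely once its dependents have been told to stop.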

The problem is that the offline_subtree_leaves() function relies on GV_TOOFFLINE being set, and the mark_subtree() function doesn't set this flag on dependency groups, despite what the comment in that function would suggest. The fix is to set this flag on dependency groups too, and not just on instances.


Related issues

Related to illumos gate - Bug #7267: SMF is fast and loose with optional dependencies (Closed, 2016-08-03)

History

#1

Updated by Andrew Stormont about 3 years ago

This was uncovered when I noticed that stopping the network/dns/multicast service left ghosts in the DNS cache. When this service goes down, its dependents should go down first so that they have a chance to unregister from the network. That's not what happens today.

#2

Updated by Andrew Stormont about 3 years ago

My analysis of the problem is completely wrong. I'll be updating it soon.

#3

Updated by Andrew Stormont about 3 years ago

OK so here's the problem: the algorithm it uses to offline instances is bunk.

When a service needs to go down, graph_enable_by_vertex() walks its dependents and sets the GV_TOOFFLINE flag on them. It then walks the dependency graph and attempts to offline any instances with this flag set. Before sending the offline event, it calls insubtree_dependents_down() to check that the instance is not being used to satisfy a dependency of another service.

Unfortunately, this is where things break down. The insubtree_dependents_down() function skips over instances with GV_TOOFFLINE set, which effectively makes the test meaningless. What it should do is return B_FALSE for instances with GV_TOOFFLINE set, because they are not offline yet (the flag only marks an instance so that its transition to offline can be started).

The faulty assumption is that a service that MUST come down is the same as a service that CAN come down. It's all about timing.
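The difference between the two behaviours can be sketched with a small model. This is a simplified illustration and not the actual svc.startd code: the `Instance` class and both function names are hypothetical, `to_offline` stands in for GV_TOOFFLINE, and the model assumes the flag is cleared once an instance has actually reached offline.

```python
from dataclasses import dataclass, field


@dataclass
class Instance:
    name: str
    running: bool = True
    to_offline: bool = False  # stands in for GV_TOOFFLINE
    dependents: list = field(default_factory=list)


def dependents_down_skipping(inst):
    """Check as described above: flagged dependents are skipped."""
    for dep in inst.dependents:
        if dep.to_offline:
            continue  # skipped, even though it may still be running
        if dep.running:
            return False
    return True


def dependents_down_strict(inst):
    """Suggested check: a flagged dependent is not offline yet, so it blocks."""
    for dep in inst.dependents:
        if dep.to_offline or dep.running:
            return False
    return True


# Hypothetical scenario: a service and its one dependent are both marked
# for offlining, and both are still running.
client = Instance("client", running=True, to_offline=True)
service = Instance("service", running=True, to_offline=True,
                   dependents=[client])

print(dependents_down_skipping(service))  # service may stop right away
print(dependents_down_strict(service))    # service must wait for client

# Once the client has actually gone offline (flag cleared in this model),
# the strict check lets the service proceed.
client.running = False
client.to_offline = False
print(dependents_down_strict(service))
```

In this model the skipping check approves stopping the service while its dependent is still running, which corresponds to everything going down at the same time; the strict check holds the service back until the dependent has finished going offline.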

#4

Updated by Andrew Stormont about 3 years ago

  • Related to Bug #7264: Example code is rctlblk_set_value(3c) manpage does not compile. added

#5

Updated by Andrew Stormont about 3 years ago

  • Related to deleted (Bug #7264: Example code is rctlblk_set_value(3c) manpage does not compile.)

#6

Updated by Andrew Stormont about 3 years ago

  • Related to Bug #7267: SMF is fast and loose with optional dependencies added

#7

Updated by Andrew Stormont about 3 years ago

  • Subject changed from "SMF stops dependencies in the wrong order" to "SMF stops dependents in the wrong order"

#8

Updated by Andrew Stormont about 3 years ago

This will be resolved as part of #7267.

#9

Updated by Andrew Stormont about 3 years ago

  • Status changed from In Progress to Feedback

#10

Updated by Andrew Stormont about 3 years ago

  • Status changed from Feedback to Closed
