Bug #7369

regression after 7267 SMF is fast and loose with optional dependencies

Added by Toomas Soome almost 3 years ago. Updated over 2 years ago.

Status: Resolved
Priority: Normal
Category: cmd - userland programs
Start date: 2016-09-10
Due date:
% Done: 0%
Estimated time:
Difficulty: Medium
Tags: needs-triage

Description

I updated the test VM to the latest illumos-gate and, after boot, many services on the system are in a stopped state:

root@grub:/# svcs -vx
svc:/milestone/name-services:default (name services milestone)
State: offline since September 10, 2016 01:10:07 PM CEST
Reason: Unknown.
See: http://illumos.org/msg/SMF-8000-AR
Impact: 17 dependent services are not running:
svc:/milestone/multi-user:default
svc:/system/boot-config:default

As the network was also not configured, I ran:
    # svcadm disable physical:nwam
    root@grub:/root# svcadm enable physical:nwam
    root@grub:/root# Assertion failed: v->gv_state == RESTARTER_STATE_DEGRADED || v->gv_state == RESTARTER_STATE_ONLINE, file graph.c, line 891
    Sep 10 13:19:36 svc.startd[963]: restarting after interruption
    Sep 10 13:19:38 grub in.routed[669]: route 0.0.0.0/8 --> 0.0.0.0 nexthop is not directly connected
    Sep 10 13:19:42 grub nwamd[890]: 1: nwamd_down_interface: ipadm_delete_addr failed on e1000g0: Could not communicate with dhcpagent

So I activated the previous BE and rebooted, and everything is back to normal.


Related issues

Related to illumos gate - Bug #7267: SMF is fast and loose with optional dependencies (Closed, 2016-08-03)

History

#1

Updated by Andrew Stormont almost 3 years ago

It seems this might be down to a bug in the exclude_all dependency handling code. I'm looking into it. Chances are this will also require a small tweak to the nwam manifest file, since it's incorrect.

#2

Updated by Andrew Stormont almost 3 years ago

  • Subject changed from regression after 7267 SMF is fast and loose with optional dependencies to SMF is fast and loose with exclude dependencies
#3

Updated by Andrew Stormont almost 3 years ago

  • Related to Bug #7267: SMF is fast and loose with optional dependencies added
#4

Updated by Andrew Stormont almost 3 years ago

Toomas, do you have network/physical:default enabled? Offline would count as enabled.

#5

Updated by Toomas Soome almost 3 years ago

Andrew Stormont wrote:

Toomas, do you have network/physical:default enabled? Offline would count as enabled.

I found the issue with the default OI config: physical:default disabled, physical:nwam + dhcp. While testing the problem, I disabled nwam, enabled default, and ran ipadm create-addr -T dhcp. It still had problems (the same message about communicating with dhcpagent); however, it behaved better in the sense that the dependent services were online and only the network interfaces were not configured.

#6

Updated by Andrew Stormont almost 3 years ago

It seems there are a number of issues at play here:

1. SMF does not stop excluded services from starting. This makes any attempt to switch between network/physical:nwam and network/physical:default prone to issues if you don't disable the other service first.

2. There's a bug in offline_subtree_leaves that prevents services in offline or maintenance mode from being disabled (unless they've been marked by GV_TODISABLE, which can only happen if "svcadm disable" was done on them while they were in "offline" or "degraded" state).

3. The network/physical:nwam service introduces a cyclic optional_all dependency (name-services -> network/physical:nwam -> name-services). This is why svc:/milestone/name-services:default is wedged in the offline state. The resolution code has no awareness of this type of "unsatisfiable"-ness.
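
As a rough illustration of the third point, the cycle can be found with an ordinary depth-first search over the dependency graph. This is only a sketch in Python, not code from svc.startd: the `OPTIONAL_DEPS` map and the `find_cycle` helper are hypothetical, and the map contains just the two services named above.

```python
from typing import Dict, List, Optional

# Hypothetical, simplified map: service -> services it depends on via
# optional_all. Only the two services named in this report are included.
OPTIONAL_DEPS: Dict[str, List[str]] = {
    "svc:/milestone/name-services:default": ["svc:/network/physical:nwam"],
    "svc:/network/physical:nwam": ["svc:/milestone/name-services:default"],
}


def find_cycle(deps: Dict[str, List[str]]) -> Optional[List[str]]:
    """Return one dependency cycle as a list of services, or None."""
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    colour = {svc: WHITE for svc in deps}
    path: List[str] = []

    def visit(svc: str) -> Optional[List[str]]:
        colour[svc] = GREY
        path.append(svc)
        for dep in deps.get(svc, []):
            state = colour.get(dep, WHITE)
            if state == GREY:
                # dep is already on the current path, so the loop is closed.
                return path[path.index(dep):] + [dep]
            if state == WHITE:
                found = visit(dep)
                if found:
                    return found
        path.pop()
        colour[svc] = BLACK
        return None

    for svc in deps:
        if colour[svc] == WHITE:
            found = visit(svc)
            if found:
                return found
    return None


if __name__ == "__main__":
    print(" -> ".join(find_cycle(OPTIONAL_DEPS) or ["no cycle"]))
```

A check of this kind would let the resolution code report the cycle rather than leave the milestone offline with "Reason: Unknown."; whether the real fix works this way is not stated in this thread.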

#7

Updated by Andrew Stormont almost 3 years ago

This webrev attempts to fix the first two issues. The last one will be tackled as part of #7267.

http://cr.illumos.org/~webrev/andy_js/7369/

#8

Updated by Andrew Stormont almost 3 years ago

  • Subject changed from SMF is fast and loose with exclude dependencies to regression after 7267 SMF is fast and loose with optional dependencies
#9

Updated by Andrew Stormont almost 3 years ago

It turns out there are a number of further problems with the dependency handling code that weren't showing up before. I think it has something to do with all the enabling/disabling that nwam does in its method file.

#10

Updated by Andrew Stormont almost 3 years ago

This updated version of 7267 should solve your issues: http://cr.illumos.org/~webrev/andy_js/7267-2/

#11

Updated by Andrew Stormont almost 3 years ago

Just a little bit more info on what went wrong here:

The mark_subtree function sets GV_TOOFFLINE on offline instances. This causes the offline_subtree_leaves function to attempt to offline them (running afoul of the assert in vertex_send_event) and the propagate functions to overlook them when their dependencies are satisfied (leaving them stuck in the offline state).

The fix was to stop marking offline services with GV_TOOFFLINE and make the dependency code (even) more robust so instances don't start while their dependents are in transition.
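
The effect described above can be modelled with a toy sketch in Python. It is not the graph.c code: the `Instance` class, the flag value, the traversal order and the `can_start` stand-in for the propagate functions are all assumptions made for illustration. Only the flag name and the "skip offline instances" idea come from the comment above.

```python
from dataclasses import dataclass, field
from typing import List

GV_TOOFFLINE = 0x1  # arbitrary value for this sketch, not the real one


@dataclass
class Instance:
    name: str
    state: str  # "online", "degraded", "offline", ...
    flags: int = 0
    dependents: List["Instance"] = field(default_factory=list)


def mark_subtree(root: Instance, skip_offline: bool) -> None:
    """Flag root and everything that depends on it for offlining."""
    stack, seen = [root], set()
    while stack:
        inst = stack.pop()
        if id(inst) in seen:
            continue
        seen.add(id(inst))
        # The described fix: an already-offline instance is not marked.
        if not (skip_offline and inst.state == "offline"):
            inst.flags |= GV_TOOFFLINE
        stack.extend(inst.dependents)


def can_start(inst: Instance) -> bool:
    """Toy stand-in for propagation: flagged instances are overlooked."""
    return inst.state == "offline" and not (inst.flags & GV_TOOFFLINE)


def build():
    """Build a two-node graph: an online root with one offline dependent."""
    dep = Instance("dependent", "offline")
    root = Instance("root", "online", dependents=[dep])
    return root, dep


if __name__ == "__main__":
    for skip in (False, True):
        root, dep = build()
        mark_subtree(root, skip_offline=skip)
        print(f"skip_offline={skip}: dependent can start = {can_start(dep)}")
```

With `skip_offline=False` the offline dependent keeps the flag and is never considered for starting, which corresponds to the "stuck in offline" symptom. With `skip_offline=True` it stays eligible.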

#12

Updated by Andrew Stormont over 2 years ago

  • Status changed from New to Resolved

The last attempt to solve #7267 fixes these issues.
