Bug #13838

open

Bug #13837: Race condition synchronously enabling "svccfg add"-ed instance

Race condition adding services crashes startd

Added by Sean Klein 16 days ago. Updated 16 days ago.

Status: New
Priority: Normal
Assignee: -
Category: -
Start date:
Due date:
% Done: 0%
Estimated time:
Difficulty: Medium
Tags:
Gerrit CR:

Description

While trying to debug https://www.illumos.org/issues/13837 (worth reading for background), I managed to trigger a reproducible assertion failure in startd.

TL;DR: running the following commands in a tight loop caused an assertion to fail in startd:

$ svccfg import <manifest>
$ svccfg -s <service FMRI> add <instance>
$ svcadm enable -t <instance FMRI>

The failure is reported as "Assertion failed: 0, file restarter.c, line 532", which refers to "usr/src/cmd/svc/startd/restarter.c".

I've attached a shell script, a manifest file, and a core dump of startd. Warning: running these reproducers will crash startd, so make sure you're on a machine where you can reach the console to recover.
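For readers who just want the shape of the loop without opening the attachment, here is a rough sketch. It is not the attached repro.sh: the manifest path, the service FMRI (svc:/site/example), the instance names (inst1, inst2, ...), and the iteration count are all placeholders, and importing the manifest once before the loop is an assumption. It defaults to a dry run that only prints the commands, since running them for real may crash startd.

```shell
#!/bin/sh
# Hypothetical sketch of the tight loop described above; NOT the attached repro.sh.
# MANIFEST, SERVICE, COUNT and the instance names are placeholders (assumptions).
# By default this is a dry run that only prints each command.
# Setting DRY_RUN=0 executes them and may crash svc.startd.

MANIFEST=${MANIFEST:-./manifest.xml}
SERVICE=${SERVICE:-svc:/site/example}
COUNT=${COUNT:-3}
DRY_RUN=${DRY_RUN:-1}

# Print the command in dry-run mode, otherwise execute it.
run() {
	if [ "$DRY_RUN" = "1" ]; then
		echo "$*"
	else
		"$@"
	fi
}

repro_loop() {
	# Assumption: the manifest is imported once, before the loop.
	run svccfg import "$MANIFEST"
	i=1
	while [ "$i" -le "$COUNT" ]; do
		# A fresh instance name per iteration avoids needing a cleanup step.
		inst="inst$i"
		run svccfg -s "$SERVICE" add "$inst"
		# Enable immediately after the add, with no pause in between.
		run svcadm enable -t "$SERVICE:$inst"
		i=$((i + 1))
	done
}

repro_loop
```

Run it as-is to review the commands first. Only set DRY_RUN=0 on a machine where you have console access, and point MANIFEST and SERVICE at the real manifest and the service it defines.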


Files

manifest.xml (836 Bytes): Manifest to repro crash. Sean Klein, 2021-05-27 12:39 PM
repro.sh (263 Bytes): Script which causes startd to crash. Sean Klein, 2021-05-27 12:39 PM
svc.startd.core.tgz (1.68 MB): Compressed startd core dump. Sean Klein, 2021-05-27 12:39 PM
#1

Updated by Jason King 16 days ago

Looking at the code a bit, it appears that the assert is triggered because libscf_get_startd_properties() is returning an unexpected error value (amusingly, there is a bad_error() macro that seems like it'd be appropriate here, but it isn't being used).

Looking at libscf_get_startd_properties(), it appears that besides 0, ECONNABORTED, ECANCELED, and ENOENT, it can also return ECHILD (usr/src/cmd/svc/startd/libscf.c:2278). That seems like a good candidate for the cause, though I'm not sure offhand how that error should be handled.
