Project

General

Profile

Actions

Bug #14796

open

svccfg import hang

Added by John Gallagher 3 months ago. Updated 3 months ago.

Status:
New
Priority:
Normal
Assignee:
-
Category:
smf
Start date:
Due date:
% Done:

0%

Estimated time:
Difficulty:
Medium
Tags:
Gerrit CR:
External Bug:

Description

I'm seeing what appears to be a hung "svccfg import" process. I'm new to illumos, so this is mostly a dump of information I hope is relevant. Some (I think minor) context: I ran into this during a development cycle where I repeatedly delete and re-import this service (although not in a tight loop; once every 10-15 minutes or longer), usually without rebooting in between. I'm working in a qemu VM. AFAIK this is the first time I've seen this import hang having repeated the same process at least a few dozen times over the past week or two.

Process 1219 is the hung svccfg; its pstack output is

1219:   /usr/sbin/svccfg import -V /opt/oxide/mg-ddm/pkg/ddm/manifest.xml
 fed06f27 nanosleep (8047bf4, 0)
 0806970b lscf_instance_verify (80d8be8, 80dcec0, 812bc68) + 103
 08075651 lscf_bundle_import (80d9f70, 80f1f40, 4) + 458
 0805c438 import_manifest_file (80f1f70, 1, 0, 4) + ac
 0805d510 engine_import (80daf08) + 59e
 080886a1 yyparse  () + 541
 0805c88d engine_exec (80f0648) + 30
 0805c17d main     (fed09377, fed835c8) + f4
 0805bd3b _start_crt (4, 8047f00, fefd001f, 0, 0, 0) + 9a
 0805bc0a _start   (4, 8047fa0, 8047fb1, 8047fb8, 8047fbb, 0) + 1a

pstack output for startd:

9:      /lib/svc/bin/svc.startd
--------------------- thread# 1 / lwp# 1 ---------------------
 feedf357 sigsuspend (8047d7c)
 080783ac main     (0, fef5a5c8) + 22d
 0805b25b _start_crt (1, 8047ec0, fefd001f, 0, 0, 0) + 9a
 0805b12a _start   (1, 8047f6c, 0, 8047f84, 8047fa1, 8047fb9) + 1a
--------------------- thread# 2 / lwp# 2 ---------------------
 feedefc7 ioctl    (3, 63746502, 80ccfa8)
 fed431a7 ct_event_read_critical (3, fecdef9c) + 16
 0805e994 fork_configd_thread (ffffffff) + 181
 feedab69 _thrp_setup (febd0240) + 81
 feedae00 _lwp_start (febd0240, 0, 0, 0, 0, 0)
------- thread# 3 / lwp# 3 [restarter_timeouts_event] --------
 feedae3b lwp_park (0, 0, 0)
 feed4f01 cond_wait_queue (80cbf80, 80cbf68, 0) + 53
 feed5219 cond_wait_common (80cbf80, 80cbf68, 0) + 2af
 feed5593 __cond_wait (80cbf80, 80cbf68) + 69
 feed5650 cond_wait (80cbf80, 80cbf68) + 23
 feed568c pthread_cond_wait (80cbf80, 80cbf68) + 1d
 08073367 restarter_timeouts_event_thread (0) + a7
 feedab69 _thrp_setup (febd0a40) + 81
 feedae00 _lwp_start (febd0a40, 0, 0, 0, 0, 0)
------------ thread# 4 / lwp# 4 [restarter_event] ------------
 feedae3b lwp_park (0, 0, 0)
 feed4f01 cond_wait_queue (80cbfb0, 80cbf98, 0) + 53
 feed5219 cond_wait_common (80cbfb0, 80cbf98, 0) + 2af
 feed5593 __cond_wait (80cbfb0, 80cbf98) + 69
 feed5650 cond_wait (80cbfb0, 80cbf98) + 23
 feed568c pthread_cond_wait (80cbfb0, 80cbf98) + 1d
 08075f79 restarter_event_thread (0) + 89
 feedab69 _thrp_setup (febd1240) + 81
 feedae00 _lwp_start (febd1240, 0, 0, 0, 0, 0)
------- thread# 5 / lwp# 5 [restarter_contracts_event] -------
 feedefc7 ioctl    (6, 63746502, 81b4878)
 fed431a7 ct_event_read_critical (6, fe700fa4) + 16
 08074179 restarter_contracts_event_thread (0) + 281
 feedab69 _thrp_setup (febd1a40) + 81
 feedae00 _lwp_start (febd1a40, 0, 0, 0, 0, 0)
----------------- thread# 6 / lwp# 6 [wait] ------------------
 feedeb87 portfs   (5, 5, fe601e4c, 0, 0, 0)
 08079025 wait_thread (0) + 86
 feedab69 _thrp_setup (febd2240) + 81
 feedae00 _lwp_start (febd2240, 0, 0, 0, 0, 0)
----------------- thread# 7 / lwp# 7 [graph] -----------------
 feedae3b lwp_park (0, 0, 0)
 feed4f01 cond_wait_queue (80cafac, 80caf94, 0) + 53
 feed5219 cond_wait_common (80cafac, 80caf94, 0) + 2af
 feed5593 __cond_wait (80cafac, 80caf94) + 69
 feed5650 cond_wait (80cafac, 80caf94) + 23
 feed568c pthread_cond_wait (80cafac, 80caf94) + 1d
 080693fa graph_thread (0) + 2b7
 feedab69 _thrp_setup (febd2a40) + 81
 feedae00 _lwp_start (febd2a40, 0, 0, 0, 0, 0)
----------- thread# 8 / lwp# 8 [repository_event] ------------
 feedf92a door     (9, fe403b14, 0, 0, 0, 3)
 feecb590 door_call (9, fe403b14) + 7a
 fe9eb181 make_door_call_retfd (9, fe403c90, 8, fe403c98, 2d4, fe403c8c) + 94
 fe9f141f _scf_notify_wait (80c7a28, 80c2d80, 275) + b3
 08068b0c repository_event_thread (0) + 94
 feedab69 _thrp_setup (febd3240) + 81
 feedae00 _lwp_start (febd3240, 0, 0, 0, 0, 0)
-------------- thread# 9 / lwp# 9 [graph_event] --------------
 feedae3b lwp_park (0, 0, 0)
 feed4f01 cond_wait_queue (80caf80, 80caf68, 0) + 53
 feed5219 cond_wait_common (80caf80, 80caf68, 0) + 2af
 feed5593 __cond_wait (80caf80, 80caf68) + 69
 feed5650 cond_wait (80caf80, 80caf68) + 23
 feed568c pthread_cond_wait (80caf80, 80caf68) + 1d
 08063169 graph_event_thread (0) + 4f
 feedab69 _thrp_setup (febd3a40) + 81
 feedae00 _lwp_start (febd3a40, 0, 0, 0, 0, 0)
-------------------- thread# 10 / lwp# 10 --------------------
 feedae3b lwp_park (0, fe205f28, 0)
 feed4f01 cond_wait_queue (fedea818, fedea838, fe205f28) + 53
 feed5219 cond_wait_common (fedea818, fedea838, fe205f28) + 2af
 feed52c0 __cond_timedwait (fedea818, fedea838, fe205fa4) + 51
 feed5387 cond_timedwait (fedea818, fedea838, fe205fa4) + 26
 fedb55b9 umem_update_thread (0) + 20c
 feedab69 _thrp_setup (febd4240) + 81
 feedae00 _lwp_start (febd4240, 0, 0, 0, 0, 0)
-------------------- thread# 73 / lwp# 73 --------------------
 feedfae0 door     (fabfdcc0, 4, 0, fabfde00, f5f00, a)
 fe95774c door_upcall (8163838, fabfdcc0, 140, 0, 0) + 68
 feedfb00 __door_return () + 40
------------------- thread# 262 / lwp# 262 -------------------
 feedfae0 door     (0, 0, 0, f745ce00, f5f00, a)
 feec6e1d door_create_func (0) + 28
 feedab69 _thrp_setup (f877a240) + 81
 feedae00 _lwp_start (f877a240, 0, 0, 0, 0, 0)
------------------- thread# 257 / lwp# 257 -------------------
 feedfae0 door     (f7c54cc0, 4, 0, f7c54e00, f5f00, a)
 fe95774c door_upcall (8163838, f7c54cc0, 140, 0, 0) + 68
 feedfb00 __door_return () + 40
------------------- thread# 261 / lwp# 261 -------------------
 feedfae0 door     (0, 0, 0, f715fe00, f5f00, a)
 feec6e1d door_create_func (0) + 28
 feedab69 _thrp_setup (f877c240) + 81
 feedae00 _lwp_start (f877c240, 0, 0, 0, 0, 0)
------------------- thread# 265 / lwp# 265 -------------------
 feedfae0 door     (f735dcc0, 4, 0, f735de00, f5f00, a)
 fe95774c door_upcall (8163838, f735dcc0, 140, 0, 0) + 68
 feedfb00 __door_return () + 40
------------------- thread# 269 / lwp# 269 -------------------
 feedfae0 door     (0, 0, 0, f765ae00, f5f00, a)
 feec6e1d door_create_func (0) + 28
 feedab69 _thrp_setup (f8779240) + 81
 feedae00 _lwp_start (f8779240, 0, 0, 0, 0, 0)

pstack output for configd:

11:     /lib/svc/bin/svc.configd
--------------------- thread# 1 / lwp# 1 ---------------------
 feedeb27 sigtimedwait (8046df4, 8046cf0, 0)
 feecd0ee sigwait  (8046df4) + 1e
 0805cdef main     (0, fef5a5c8) + 5c7
 080580cb _start_crt (1, 8047ec0, fefd001f, 0, 0, 0) + 9a
 08057f9a _start   (1, 8047f6c, 0, 8047f85, 8047fa2, 8047fba) + 1a
--------------------- thread# 2 / lwp# 2 ---------------------
 feedfaa8 door     (0, 0, 0, 0, 0, 8)
 feec6205 door_unref_func (b) + 3e
 feedab69 _thrp_setup (fed50a40) + 81
 feedae00 _lwp_start (fed50a40, 0, 0, 0, 0, 0)
--------------------- thread# 3 / lwp# 3 ---------------------
 feedfae0 door     (febffd08, 4, 0, febffe00, f5f00, a)
 0805fabc client_switcher (283, febffd70, 84, 0, 0) + 2f3
 feedfb00 __door_return () + 40
--------------------- thread# 4 / lwp# 4 ---------------------
 feedfae0 door     (fe95ec88, 4, 0, fe95ee00, f5f00, a)
 0805fabc client_switcher (4dd, fe95ecf0, 108, 0, 0) + 2f3
 feedfb00 __door_return () + 40
--------------------- thread# 5 / lwp# 5 ---------------------
 feedae3b lwp_park (0, 0, 0)
 feed4f01 cond_wait_queue (80cc488, 8085c30, 0) + 53
 feed55c5 __cond_wait (80cc488, 8085c30) + 9b
 feed5650 cond_wait (80cc488, 8085c30) + 23
 feed568c pthread_cond_wait (80cc488, 8085c30) + 1d
 0806ec24 rc_notify_info_wait (80cc444, 80cc498, fe78ea84, 2d0) + 191
 0805d518 client_wait (80cc408, fe78edf0, 8, fe78ea80, fe78ed8c, 0) + 6d
 0805faaa client_switcher (16, fe78edf0, 8, 0, 0) + 2e1
 feedfb00 __door_return () + 40
--------------------- thread# 6 / lwp# 6 ---------------------
 feedfae0 door     (fe68fce0, c, 0, fe68fe00, f5f00, a)
 0805fabc client_switcher (2a5, fe68fdf0, c, 0, 0) + 2f3
 feedfb00 __door_return () + 40
--------------------- thread# 7 / lwp# 7 ---------------------
 feedfae0 door     (fe590d88, 4, 0, fe590e00, f5f00, a)
 0805fabc client_switcher (19a, fe590df0, 8, 0, 0) + 2f3
 feedfb00 __door_return () + 40
--------------------- thread# 8 / lwp# 8 ---------------------
 feedfae0 door     (fe491ce0, 9, 0, fe491e00, f5f00, a)
 0805fabc client_switcher (16, fe491df0, c, 0, 0) + 2f3
 feedfb00 __door_return () + 40
--------------------- thread# 9 / lwp# 9 ---------------------
 feedfae0 door     (fe392d08, 4, 0, fe392e00, f5f00, a)
 0805fabc client_switcher (4e4, fe392d70, 84, 0, 0) + 2f3
 feedfb00 __door_return () + 40
-------------------- thread# 10 / lwp# 10 --------------------
 feedae3b lwp_park (0, fe293f28, 0)
 feed4f01 cond_wait_queue (fedea818, fedea838, fe293f28) + 53
 feed5219 cond_wait_common (fedea818, fedea838, fe293f28) + 2af
 feed52c0 __cond_timedwait (fedea818, fedea838, fe293fa4) + 51
 feed5387 cond_timedwait (fedea818, fedea838, fe293fa4) + 26
 fedb55b9 umem_update_thread (0) + 20c
 feedab69 _thrp_setup (fed54a40) + 81
 feedae00 _lwp_start (fed54a40, 0, 0, 0, 0, 0)
-------------------- thread# 11 / lwp# 11 --------------------
 feedfae0 door     (fe194d08, 4, 0, fe194e00, f5f00, a)
 0805fabc client_switcher (183, fe194d70, 84, 0, 0) + 2f3
 feedfb00 __door_return () + 40
-------------------- thread# 12 / lwp# 12 --------------------
 feedfae0 door     (fe095d88, 4, 0, fe095e00, f5f00, a)
 0805fabc client_switcher (183, fe095df0, 8, 0, 0) + 2f3
 feedfb00 __door_return () + 40
-------------------- thread# 13 / lwp# 13 --------------------
 feedfae0 door     (fde97d88, 4, 0, fde97e00, f5f00, a)
 0805fabc client_switcher (82, fde97df0, 4, 0, 0) + 2f3
 feedfb00 __door_return () + 40
-------------------- thread# 14 / lwp# 14 --------------------
 feedfae0 door     (fdf96d88, 4, 0, fdf96e00, f5f00, a)
 0805fabc client_switcher (120, fdf96df0, c, 0, 0) + 2f3
 feedfb00 __door_return () + 40
-------------------- thread# 15 / lwp# 15 --------------------
 feedfae0 door     (fdd98d08, 4, 0, fdd98e00, f5f00, a)
 0805fabc client_switcher (4e4, fdd98d70, 84, 0, 0) + 2f3

lscf_instance_verify has two while loops that nanosleep; the first calls scf_instance_get_pg() and the second calls scf_pg_get_property(). This dtrace output indicates that we're in the first:

$ pfexec dtrace -n 'pid$target::scf_*:entry,pid$target::nanosleep:entry{ trace(probefunc); }' -p 1219
dtrace: description 'pid$target::scf_*:entry,pid$target::nanosleep:entry' matched 269 probes
CPU     ID                    FUNCTION:NAME
  6  81033        scf_instance_get_pg:entry   scf_instance_get_pg
  6  80971              scf_set_error:entry   scf_set_error
  6  80970            scf_setup_error:entry   scf_setup_error
  6  80972                  scf_error:entry   scf_error
  6  81238                  nanosleep:entry   nanosleep
  6  81033        scf_instance_get_pg:entry   scf_instance_get_pg
  6  80971              scf_set_error:entry   scf_set_error
  6  80970            scf_setup_error:entry   scf_setup_error
  6  80972                  scf_error:entry   scf_error
  6  81238                  nanosleep:entry   nanosleep
  6  81033        scf_instance_get_pg:entry   scf_instance_get_pg
  6  80971              scf_set_error:entry   scf_set_error
  6  80970            scf_setup_error:entry   scf_setup_error
  6  80972                  scf_error:entry   scf_error
  6  81238                  nanosleep:entry   nanosleep
  ... repeats

and this dtrace output indicates that it's returning SCF_ERROR_NOT_FOUND (1003):

$ pfexec dtrace -n 'pid$target::scf_error:return{ trace(arg1); }' -p 1219
dtrace: description 'pid$target::scf_error:return' matched 1 probe
CPU     ID                    FUNCTION:NAME
  2  81241                 scf_error:return              1003
  2  81241                 scf_error:return              1003
  2  81241                 scf_error:return              1003
  2  81241                 scf_error:return              1003
  2  81241                 scf_error:return              1003
  ... repeats

I'm also attaching core files taken via gcore for svccfg, startd, and configd.


Files

core.9.gz (1.76 MB) core.9.gz startd core John Gallagher, 2022-07-07 06:08 PM
core.11.gz (3.05 MB) core.11.gz configd core John Gallagher, 2022-07-07 06:08 PM
core.1219.gz (2.52 MB) core.1219.gz svccfg core John Gallagher, 2022-07-07 06:08 PM
Actions #1

Updated by Marcel Telka 3 months ago

  • Category set to smf
Actions

Also available in: Atom PDF