Bug #7808

hipster shutdown hang

Added by Carsten Grzemba about 4 years ago. Updated about 4 years ago.

Status: New
Priority: Normal
Assignee: -
Category: -
Target version: -
Start date: 2017-01-26
Due date:
% Done: 0%
Estimated time:
Difficulty: Medium
Tags: needs-triage

Description

When I try to shut down my system, it stays online or hangs. The last message on the console is:

Jan 26 11:24:38 gnom syslogd: going down on signal 15

I started an SSH session and kept it open during the shutdown.
In the SSH session I can see that all SMF services are stopped:

# svcs
...
disabled       10:28:15 svc:/system/identity:domain
disabled       10:28:15 svc:/network/physical:default
disabled       10:28:15 svc:/network/loopback:default
disabled       10:28:15 svc:/network/ip-interface-management:default
disabled       10:28:15 svc:/system/sysevent:default
disabled       10:28:15 svc:/network/ipsec/policy:default
disabled       10:28:15 svc:/system/filesystem/minimal:default
disabled       10:28:15 svc:/network/ipsec/ipsecalgs:default
disabled       10:28:15 svc:/system/device/local:default
disabled       10:28:15 svc:/system/cryptosvc:default
disabled       10:28:15 svc:/system/filesystem/usr:default
disabled       10:28:15 svc:/network/datalink-management:default
disabled       10:28:15 svc:/system/boot-archive:default
disabled       10:28:15 svc:/system/scheduler:default
disabled       10:28:15 svc:/system/filesystem/root:default
online         19:56:37 svc:/system/svc/restarter:default
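
To spot quickly which instances have not reached the disabled state, the svcs output can be filtered. This is only a minimal sketch: the function name not_disabled is made up for illustration, and it assumes the three-column STATE/STIME/FMRI layout shown above.

```shell
# Print STATE and FMRI for every svcs-style line whose state is not "disabled".
# Reads from stdin, so it works on live output or on a saved capture.
# The header line and short lines such as "..." are skipped.
not_disabled() {
    awk '$1 != "disabled" && $1 != "STATE" && NF >= 3 { print $1, $3 }'
}
```

It could be used as `svcs -a | not_disabled`. On the listing above it would leave only the svc:/system/svc/restarter:default line.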

ps shows the last remaining processes:

# ps -ef
     UID   PID  PPID   C    STIME TTY         TIME CMD
    root     0     0   0 19:56:30 ?           0:01 sched
    root     4     0   0 19:56:24 ?           0:00 kcfpoold
    root     6     0   0 19:56:25 ?           3:09 zpool-rpool
    root     1     0   0 19:56:34 ?           0:00 /sbin/init
    root     2     0   0 19:56:34 ?           0:00 pageout
    root     3     0   0 19:56:34 ?           1:52 fsflush
    root    10     1   0 19:56:37 ?           0:02 /lib/svc/bin/svc.startd
    root    12     1   0 19:56:37 ?           0:08 /lib/svc/bin/svc.configd
    root   282     1   0 19:56:56 ?           0:00 /usr/lib/power/powerd
    root   841     0   0 19:57:12 ?           0:00 zpool-iomega
    root  6192  4891   0 10:28:44 pts/1       0:00 ps -ef
    root  4877     1   0 08:10:51 ?           0:00 /usr/lib/ssh/sshd -R
    root  4890  4881   0 08:11:07 pts/1       0:00 sudo -i
 grzemba  4880  4877   0 08:10:59 ?           0:01 /usr/lib/ssh/sshd -R
    root  4891  4890   0 08:11:09 pts/1       0:00 -bash
 grzemba  4881  4880   0 08:10:59 pts/1       0:00 -ksh

but the shutdown does not proceed.
There are no NFS shares or mounts.

If I run

# halt

then the OS shuts down.

Does init "forget" to shut down the system?

#1

Updated by Carsten Grzemba about 4 years ago

I do not know whether this helps:

mdb -p 10
> ::startd_log
filesystem/usr:default.
Jan 26 13:47:37/9: svc:/system/device/local:default removed from subtree
Jan 26 13:47:37/9: Propagating stop of svc:/system/device/local:default.
Jan 26 13:47:37/8: Marking in-subtree vertices before disabling svc:/system/filesystem/usr:default.
Jan 26 13:47:37/8: Disabling svc:/system/filesystem/usr:default.
Jan 26 13:47:37/8: Graph engine: Reloading general properties for svc:/network/datalink-management:default.
Jan 26 13:47:37/667: svc:/system/filesystem/usr:default: Instance stopping because service disabled.
Jan 26 13:47:37/8: Marking in-subtree vertices before disabling svc:/network/datalink-management:default.
Jan 26 13:47:37/8: Disabling svc:/network/datalink-management:default.
Jan 26 13:47:37/668: svc:/network/datalink-management:default: Instance stopping because service disabled.
Jan 26 13:47:38/9: Graph noting svc:/system/filesystem/usr:default online -> online.
Jan 26 13:47:38/9: Disabling svc:/system/filesystem/usr:default.
Jan 26 13:47:38/669: method_thread() running stop method for svc:/system/filesystem/usr:default.
Jan 26 13:47:38/9: Graph noting svc:/network/datalink-management:default online -> online.
Jan 26 13:47:38/9: Disabling svc:/network/datalink-management:default.
Jan 26 13:47:38/669: svc:/system/filesystem/usr:default: null method succeeds
Jan 26 13:47:38/9: Graph noting svc:/system/filesystem/usr:default online -> disabled.
Jan 26 13:47:38/670: method_thread() running stop method for svc:/network/datalink-management:default.
Jan 26 13:47:38/8: Graph engine: Reloading general properties for svc:/system/boot-archive:default.
Jan 26 13:47:38/670: svc:/network/datalink-management:default: :killing contract with signal 15
Jan 26 13:47:38/9: svc:/system/filesystem/usr:default removed from subtree
Jan 26 13:47:38/9: Propagating stop of svc:/system/filesystem/usr:default.
Jan 26 13:47:38/5: Received event 1 for ctid 8 cookie 32
Jan 26 13:47:38/8: Marking in-subtree vertices before disabling svc:/system/boot-archive:default.
Jan 26 13:47:38/8: Disabling svc:/system/boot-archive:default.
Jan 26 13:47:38/8: Graph engine: Reloading general properties for svc:/system/scheduler:default.
Jan 26 13:47:38/8: Marking in-subtree vertices before disabling svc:/system/scheduler:default.
Jan 26 13:47:38/8: Disabling svc:/system/scheduler:default.
Jan 26 13:47:38/670: Removing primary contract 8 for svc:/network/datalink-management:default.
Jan 26 13:47:38/9: Graph noting svc:/network/datalink-management:default online -> disabled.
Jan 26 13:47:38/9: svc:/network/datalink-management:default removed from subtree
Jan 26 13:47:38/9: Propagating stop of svc:/network/datalink-management:default.
Jan 26 13:47:38/5: Received event 1 for unknown contract id 8
Jan 26 13:47:38/671: svc:/system/boot-archive:default: Instance stopping because service disabled.
Jan 26 13:47:38/672: svc:/system/scheduler:default: Instance stopping because service disabled.
Jan 26 13:47:38/9: Graph noting svc:/system/boot-archive:default online -> online.
Jan 26 13:47:38/9: Disabling svc:/system/boot-archive:default.
Jan 26 13:47:38/673: method_thread() running stop method for svc:/system/boot-archive:default.
Jan 26 13:47:38/9: Graph noting svc:/system/scheduler:default online -> online.
Jan 26 13:47:38/9: Disabling svc:/system/scheduler:default.
Jan 26 13:47:38/673: svc:/system/boot-archive:default: null method succeeds
Jan 26 13:47:38/9: Graph noting svc:/system/boot-archive:default online -> disabled.
Jan 26 13:47:38/9: svc:/system/boot-archive:default removed from subtree
Jan 26 13:47:38/9: Propagating stop of svc:/system/boot-archive:default.
Jan 26 13:47:38/674: method_thread() running stop method for svc:/system/scheduler:default.
Jan 26 13:47:38/674: svc:/system/scheduler:default: null method succeeds
Jan 26 13:47:38/9: Graph noting svc:/system/scheduler:default online -> disabled.
Jan 26 13:47:38/9: svc:/system/scheduler:default removed from subtree
Jan 26 13:47:38/9: Propagating stop of svc:/system/scheduler:default.
Jan 26 13:47:38/8: Graph engine: Reloading general properties for svc:/system/filesystem/root:default.
Jan 26 13:47:38/8: Marking in-subtree vertices before disabling svc:/system/filesystem/root:default.
Jan 26 13:47:38/8: Disabling svc:/system/filesystem/root:default.
Jan 26 13:47:38/675: svc:/system/filesystem/root:default: Instance stopping because service disabled.
Jan 26 13:47:38/676: method_thread() running stop method for svc:/system/filesystem/root:default.
Jan 26 13:47:38/9: Graph noting svc:/system/filesystem/root:default online -> online.
Jan 26 13:47:38/9: Disabling svc:/system/filesystem/root:default.
Jan 26 13:47:38/676: svc:/system/filesystem/root:default: null method succeeds
Jan 26 13:47:38/9: Graph noting svc:/system/filesystem/root:default online -> disabled.
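
A log like this is easier to read when reduced to the last state noted for each instance. The sketch below does that for the "Graph noting ... A -> B." lines only. The function name last_states is hypothetical, the field positions are taken from the lines above, and it assumes an awk that supports sub() (nawk or any POSIX awk).

```shell
# Reduce a saved ::startd_log capture to "FMRI last-noted-state" pairs,
# in the order each FMRI first appears. Reads the log from stdin.
last_states() {
    awk '$4 == "Graph" && $5 == "noting" && $8 == "->" {
             state = $9
             sub(/\.$/, "", state)              # drop the trailing period
             if (!($6 in last)) order[++n] = $6 # remember first-seen order
             last[$6] = state
         }
         END { for (i = 1; i <= n; i++) print order[i], last[order[i]] }'
}
```

Fed the excerpt above, every instance that appears in a "Graph noting" line ends in disabled, which fits the svcs output and suggests the hang is after the service shutdown rather than inside a stop method.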

#2

Updated by Carsten Grzemba about 4 years ago

From a system core dump I could reconstruct the user stack of svc.startd.

Thread 9 is hung in sync(), which is called from do_uadmin:

  if (getzoneid() == 0)
      sync();
  kill_user_procs();
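
Whether sync itself blocks can be checked from the still-open SSH session before the machine is halted. The helper below is a rough sketch, not anything svc.startd does: finishes_within is a made-up name, it relies on mktemp and sleep being available, and it polls once per second.

```shell
# Run a command in the background and report whether it completes within
# the given number of seconds. Returns 0 if it finished, 1 if it was still
# running when the time was up, 2 if no temporary file could be created.
finishes_within() {
    secs=$1; shift
    flag=$(mktemp) || return 2
    # The subshell writes to the flag file only after the command returns.
    ( "$@"; echo done > "$flag" ) >/dev/null 2>&1 &
    while [ "$secs" -gt 0 ] && [ ! -s "$flag" ]; do
        sleep 1
        secs=$((secs - 1))
    done
    if [ -s "$flag" ]; then
        rm -f "$flag"
        return 0
    fi
    # Still running: the flag file is left for the background job to fill.
    return 1
}
```

For example, `finishes_within 10 sync || echo "sync still blocked after 10s"`. A command stuck in the kernel cannot be killed from here, so the helper only reports the hang and leaves the background job alone.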

#3

Updated by Carsten Grzemba about 4 years ago

It is related to the iSCSI zpool. Has illumos implemented something like
Solaris 11 Enhancement Request 15821207?

#4

Updated by Alexander Pyhalov about 4 years ago

Can you elaborate on what 'Solaris 11 Enhancement Request 15821207' is? Google finds nothing.

#5

Updated by Carsten Grzemba about 4 years ago

Enhancement Request 15821207 is resolved in S11.1 SRU 17.5. It allows the umount(1) call on ZFS filesystems to return an error rather than hang, matching the existing behaviour for UFS filesystems that cannot be unmounted at shutdown.

The other side:
If the system goes down when issuing init 6, the network interfaces should not be brought down (by network manager or other SMF) if there are any active iSCSI sessions.
