Bug #7697 (open)
svc.startd aborting in method_ready_contract
90%
Description
We have a bunch of svc.startd cores on the east-3b headnode (HN), all with the following stack:
> $C
f34aedb8 libc.so.1`_lwp_kill+0x15(d4, 6, 334, fef5c000, fef5c000, 835f940)
f34aedd8 libc.so.1`raise+0x2b(6, 0, f34aedf0, fef5f4c0, 0, 0)
f34aee28 libc.so.1`abort+0x10e(835f940, 832c748, f34aee58, 0, 835f940, 832c748)
f34aee58 method_ready_contract+0x232(835f940, 0, 0, 0)
f34aef78 method_run+0x471(f34aefac, 0, f34aefa4, 8355a28, fef5c000, f899a240)
f34aefc8 method_thread+0x179(836fdc0, 0, 0, 0)
f34aefe8 libc.so.1`_thrp_setup+0x88(f899a240)
f34aeff8 libc.so.1`_lwp_start(f899a240, 0, 0, 0, 0, 0)
This looks like we are aborting at the following call in method_ready_contract():
bad_error("ct_pr_tmpl_set_transfer", ret);
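The stack shows abort() called directly from method_ready_contract, which fits bad_error() reporting an unexpected return value and then aborting. The C sketch below illustrates that convention in isolation: any value returned by ct_pr_tmpl_set_transfer() that the switch does not list lands in the default arm and takes the whole daemon down. The names classify_transfer_ret(), bad_error_sketch() and handle_transfer_ret(), and the choice of ENOTEMPTY as the one tolerated error, are hypothetical and are not taken from the svc.startd source.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical outcome of a ct_pr_tmpl_set_transfer() call. */
typedef enum {
	XFER_OK,		/* transfer configured */
	XFER_RECOVERABLE,	/* assumed: caller can clean up and continue */
	XFER_UNEXPECTED		/* would reach bad_error() and abort */
} xfer_result_t;

/*
 * Classify a return value. The set of tolerated errno values here is
 * an assumption for illustration, not what svc.startd actually handles.
 */
static xfer_result_t
classify_transfer_ret(int ret)
{
	switch (ret) {
	case 0:
		return (XFER_OK);
	case ENOTEMPTY:		/* assumed recoverable */
		return (XFER_RECOVERABLE);
	default:
		return (XFER_UNEXPECTED);
	}
}

/* Stand-in for bad_error(): report the unexpected value, then abort(). */
static void
bad_error_sketch(const char *func, int err)
{
	assert(func != NULL);
	(void) fprintf(stderr, "%s() failed with unexpected error %d (%s)\n",
	    func, err, strerror(err));
	abort();
}

/* Anything not explicitly classified is fatal, as in the core above. */
static void
handle_transfer_ret(int ret)
{
	if (classify_transfer_ret(ret) == XFER_UNEXPECTED)
		bad_error_sketch("ct_pr_tmpl_set_transfer", ret);
}
```

With this pattern, an errno such as EACCES that was never anticipated produces exactly the kind of core seen here rather than a logged failure.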
Also note that there are a bunch of the following messages in the svc.startd log around the time of the cores:
Nov 3 13:54:44/176: failed to abandon contract 729828: Permission denied
Nov 3 13:59:54/90: failed to abandon contract 728107: Permission denied
Nov 3 15:19:42/184: failed to abandon contract 717157: Permission denied
Nov 3 15:19:50/189: failed to abandon contract 717162: Permission denied
Nov 3 15:19:50/191: failed to abandon contract 717161: Permission denied
Nov 3 15:20:13/202: failed to abandon contract 717167: Permission denied
Nov 3 15:20:19/206: failed to abandon contract 729844: Permission denied
Nov 3 16:35:50/178: failed to abandon contract 770266: No such file or directory
Nov 3 16:35:59/187: failed to abandon contract 770273: No such file or directory
Nov 3 16:35:59/189: failed to abandon contract 770272: No such file or directory
Nov 3 16:36:19/197: failed to abandon contract 770280: No such file or directory
Nov 3 16:36:24/201: failed to abandon contract 770301: No such file or directory
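"Permission denied" and "No such file or directory" are the standard strerror() texts for EACCES and ENOENT. When triaging a long svc.startd log it can help to turn these lines back into contract IDs and errno values, for example to see which errors were in play around the time of each core. A minimal sketch follows; parse_abandon_failure(), errno_from_message() and the candidate errno list are hypothetical helpers, not part of svc.startd.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical set of errno values to try when matching message text. */
static const int candidate_errnos[] = {
	EACCES, ENOENT, EPERM, ESRCH, EINVAL, EBUSY
};

/* Return the errno whose strerror() text equals msg, or -1 if none does. */
static int
errno_from_message(const char *msg)
{
	size_t n = sizeof (candidate_errnos) / sizeof (candidate_errnos[0]);

	for (size_t i = 0; i < n; i++) {
		if (strcmp(strerror(candidate_errnos[i]), msg) == 0)
			return (candidate_errnos[i]);
	}
	return (-1);
}

/*
 * Parse one "failed to abandon contract N: <reason>" line.
 * Returns 0 and fills *ctid and *err on success, -1 if the line
 * does not contain that message.
 */
static int
parse_abandon_failure(const char *line, long *ctid, int *err)
{
	static const char marker[] = "failed to abandon contract ";
	const char *p = strstr(line, marker);
	char reason[128];

	if (p == NULL)
		return (-1);
	if (sscanf(p + strlen(marker), "%ld: %127[^\n]", ctid, reason) != 2)
		return (-1);
	*err = errno_from_message(reason);
	return (0);
}
```

The helper only relies on strerror() wording, so on a system with different message text errno_from_message() simply returns -1.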
Here are the messages from /var/adm/messages from around the time of the first core dump:
Nov 3 13:54:43 headnode svc.ipfd[20418]: [ID 404139 daemon.error] _scf_notify_wait failed: connection to repository broken
Nov 3 13:54:44 headnode svc.startd[3915]: [ID 575841 daemon.notice] failed to abandon contract 729828: Permission denied
Nov 3 13:55:25 headnode svc.ipfd[20418]: [ID 404139 daemon.error] _scf_notify_wait failed: connection to repository broken
Nov 3 13:59:53 headnode last message repeated 6 times
We hit this again and were able to figure out what happened. The / file system filled up, and init got into the mode where it tries to restart svc.startd. The restart failed three times in quick succession, so init went into its maintenance mode. We removed the file that had filled the root file system, but init stayed in maintenance until we got on the console and typed ^D. That took it out of maintenance, and it then restarted svc.startd and svc.configd successfully.
> startd_failure_time::array hrtime_t 3
806b7b8
806b7c0
806b7c8
> $C
08047548 libc.so.1`__sigsuspend+7(8047560, 8047570, 8047580, feedd8f4)
08047598 waitproc+0x62(806bbdc, 0, 4, 8058a63)
080475b8 enter_maintenance+0x14f(1, 0, 12c, 806f830, 0, 0)
080475d8 contract_event+0xcf(806b7a4, 1, 493e0, 8058e8a)
08047f18 main+0x494(fee10140, fef7b728, 8047f40, 8054343, 1, 8047f4c)
08047f40 _start+0x83(1, 8047fd0, 0, 0, 7d8, 8047fdb)
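The three-element startd_failure_time array of hrtime_t values suggests that init keeps the times of the last three svc.startd failures and enters maintenance when all three fall within a short window, which matches the "failed three times fast" behavior described above. The sketch below shows that sliding-window check on its own; the 5-second window, the failure_ring_t type and record_failure() are assumptions for illustration, not init's actual code or constants.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define	NFAILURE_TIMES		3		/* same size as startd_failure_time */
#define	FAILURE_WINDOW_NS	5000000000ULL	/* assumed 5 s window */

typedef struct {
	uint64_t times[NFAILURE_TIMES];	/* failure timestamps, 0 = unused slot */
	unsigned next;			/* slot to overwrite next */
} failure_ring_t;

/*
 * Record a failure at now_ns (must be > 0). Returns true when the last
 * NFAILURE_TIMES failures all happened within FAILURE_WINDOW_NS, i.e.
 * when the caller should stop restarting and enter maintenance.
 */
static bool
record_failure(failure_ring_t *r, uint64_t now_ns)
{
	uint64_t oldest;

	assert(now_ns > 0);
	r->times[r->next] = now_ns;
	r->next = (r->next + 1) % NFAILURE_TIMES;

	/* After advancing, r->next indexes the oldest recorded failure. */
	oldest = r->times[r->next];
	return (oldest != 0 && now_ns - oldest < FAILURE_WINDOW_NS);
}
```

Three failures one second apart trip the check on the third call, while the same three failures spread over twenty seconds do not.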
It would be good if init logged this condition to /var/adm/messages, since that is on a different file system and has plenty of space. Currently it is hard to tell that init is in this state.
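One way to act on this suggestion would be for init to emit a syslog(3) message with the daemon facility when it gives up on svc.startd; as the daemon.error and daemon.notice entries quoted above show, such messages already reach /var/adm/messages on this system. This is a hypothetical sketch: format_maintenance_msg(), log_startd_maintenance() and the message wording are not existing init code, and delivery still depends on syslogd running (LOG_CONS falls back to the console if it is not).

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <syslog.h>

/* Build the (hypothetical) message text; returns snprintf()'s result. */
static int
format_maintenance_msg(char *buf, size_t len, unsigned nfailures,
    unsigned window_s)
{
	assert(len > 0);
	return (snprintf(buf, len,
	    "svc.startd failed %u times within %u seconds; init entering "
	    "maintenance mode (console intervention required)",
	    nfailures, window_s));
}

/* Hypothetical hook init could call just before entering maintenance. */
static void
log_startd_maintenance(unsigned nfailures, unsigned window_s)
{
	char msg[256];

	(void) format_maintenance_msg(msg, sizeof (msg), nfailures, window_s);
	openlog("init", LOG_PID | LOG_CONS, LOG_DAEMON);
	syslog(LOG_CRIT, "%s", msg);
	closelog();
}
```

Keeping the formatting separate from the syslog() call makes the message text easy to check without touching the system log.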