Bug #12069

closed

Backport sh_delay() and tvsleep() from ksh-2020.0.0

Added by Marcel Telka almost 4 years ago. Updated almost 4 years ago.

Status: Closed
Priority: Normal
Assignee: -
Category: lib - userland libraries
Start date:
Due date:
% Done: 100%
Estimated time:
Difficulty: Medium
Tags:
Gerrit CR:
External Bug:

Description

We have seen a few cases where /usr/bin/sleep slept forever. Here is truss output showing what happened:

18962:  sigaction(SIGALRM, 0x080468E0, 0x08046960)      = 0
18962:      new: hand = 0xFEEDC869 mask = 0xFFBFFEFF 0xFFFFFFFF 0x000001FF 0 flags = 0x0000
18962:      old: hand = 0xFEEDC869 mask = 0xFFBFFEFF 0xFFFFFFFF 0x000001FF 0 flags = 0x0000
18962:  setitimer(ITIMER_REAL, 0x080469A0, 0x08046990)  = 0
18962:           value:  interval:    0.000000 sec  value:  399.000000 sec
18962:          ovalue:  interval:    0.000000 sec  value:    0.000000 sec
18962:  pause()                         (sleeping...)
18962:      Received signal #14, SIGALRM, in pause() [caught]
18962:  pause()                                         Err#4 EINTR
18962:  lwp_sigmask(SIG_SETMASK, 0x00002000, 0x00000000, 0x00000000, 0x00000000) = 0xFFBFFEFF [0xFFFFFFFF]
18962:  alarm(0)                                        = 0
18962:  lwp_sigmask(SIG_SETMASK, 0x00000000, 0x00000000, 0x00000000, 0x00000000) = 0xFFBFFEFF [0xFFFFFFFF]
18962:  sigaction(SIGALRM, 0x08046610, 0x08046690)      = 0
18962:      new: hand = 0xFEEDC869 mask = 0xFFBFFEFF 0xFFFFFFFF 0x000001FF 0 flags = 0x0000
18962:      old: hand = 0xFEEDC869 mask = 0xFFBFFEFF 0xFFFFFFFF 0x000001FF 0 flags = 0x0000
18962:  setitimer(ITIMER_REAL, 0x080466D0, 0x080466C0)  = 0
18962:           value:  interval:    0.000000 sec  value:    0.001263 sec
18962:          ovalue:  interval:    0.000000 sec  value:    0.000000 sec
18962:      Received signal #14, SIGALRM [caught]
18962:  lwp_sigmask(SIG_SETMASK, 0x00002000, 0x00000000, 0x00000000, 0x00000000) = 0xFFBFFEFF [0xFFFFFFFF]
18962:  alarm(0)                                        = 0
18962:  setcontext(0x080461E0)
18962:  setcontext(0x08046540)
18962:  getpid()                                        = 18962 [18961]
18962:  pause()                         (sleeping...)

The problem is the second setitimer() call, which sets the timeout to a very short value, so the SIGALRM arrives almost immediately. The signal is then cleared by the subsequent alarm(0) call.

The root cause can be seen in the sleep() function in usr/src/lib/libshell/common/bltins/sleep.c (called from sh_delay() for delays longer than 30 seconds). First, sh_timeradd() is called to set the alarm (line 133); later, pause() is called to wait for the signal (line 137). If a signal arrives before the timeout expires, the do-while loop (lines 134-152) calls pause() again, and it is expected that the sigalrm() handler has configured a new alarm in the meantime by calling setalarm(). This does happen, but if the second scheduled alarm interval is too short (in our case 0.001263 sec), the signal can arrive before pause() is called the second time. pause() then hangs indefinitely, or until some unrelated signal arrives, which may never happen.


Related issues

Related to illumos gate - Bug #6520: ksh: sleep could wait forever (Closed, 2015-12-18)
Related to illumos gate - Feature #11876: Want a native sleep (Closed, Robert Mustacchi)
