Bug #3115

poll(2) returns prematurely in presence of spurious wakeups

Added by Christopher Siden almost 7 years ago. Updated almost 7 years ago.

Status: Resolved
Priority: Normal
Category: -
Start date: 2012-08-22
Due date:
% Done: 90%
Estimated time:
Difficulty: Medium
Tags: needs-triage

Description

From Matt Ahrens's original bug report at Delphix (the deadman thread he mentions has not been pushed to illumos yet, but the bug still exists):

poll(2) is supposed to wait for the entire time requested, unless a requested
event happens or a signal is delivered. However, in the presence of many
spurious wakeups, it will wait for much less than the time requested. Each
wakeup reduces the amount of time waited by up to 1 tick (10ms).

I found this because ztest_deadman_thread() uses poll(2) to wait for 360
seconds, but poll() returns after less than 60 seconds, causing the deadman to fire and ztest to exit.

Watchpoints cause many spurious wakeups: many watchpoint actions (e.g. installing a new watchpoint from pr_watch()) call pokelwps(), which forces all threads into the kernel and wakes up T_WAKEABLE threads, including those sleeping in cv_*wait*sig*().

poll() uses cv_timedwait_sig_hires() to wait, which returns the number of ticks remaining to wait. poll_common() calls cv_timedwait_sig_hires() in a loop, each time waiting the amount of time returned by the previous call. However, since the remaining time is returned in whole ticks, it decreases by at least one tick on every call, even when the call slept for only a fraction of a tick.

Therefore, if there are many spurious wakeups (because pr_watch() is being called extremely often), poll() will call cv_timedwait_sig_hires() once for every tick in the original timeout,
each time waiting for much less than a tick (about 1ms). In our use case this
means that ztest will fire its deadman prematurely if used with watchpoints.
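To make the arithmetic concrete: at 10ms per tick, a 360-second timeout is 36,000 ticks. If a spurious wakeup arrives about every 1ms and each one truncates the remaining time to whole ticks, the loop makes 36,000 passes of about 1ms each and returns after roughly 36 seconds. The C sketch below is a hypothetical simulation of that loop, not the actual poll_common() code. The function name buggy_wait_us(), the 10ms tick, and the fixed wakeup interval are assumptions made for illustration.

```c
/* Assumed tick length: 10 ms (hz = 100), matching "1 tick (10ms)" above. */
#define USEC_PER_TICK 10000L

/*
 * Hypothetical model of the buggy loop (not kernel code). Each wait call
 * sleeps until either the requested time expires or a spurious wakeup
 * arrives (every wakeup_us microseconds). After a wakeup it reports the
 * time remaining truncated to whole ticks, and the caller waits again for
 * exactly that many ticks. Returns total simulated time waited, in usec.
 */
static long buggy_wait_us(long timeout_ticks, long wakeup_us)
{
    long elapsed_us = 0;
    long remaining_ticks = timeout_ticks;

    while (remaining_ticks > 0) {
        long request_us = remaining_ticks * USEC_PER_TICK;

        if (wakeup_us >= request_us) {
            /* No wakeup before the timeout: a genuine timeout. */
            elapsed_us += request_us;
            break;
        }

        /* Spurious wakeup after wakeup_us microseconds. */
        elapsed_us += wakeup_us;

        /* Remaining time is reported in whole ticks, rounded down,
         * so the fractional part of a tick is lost on every pass. */
        remaining_ticks = (request_us - wakeup_us) / USEC_PER_TICK;
    }
    return elapsed_us;
}
```

Under these assumptions, buggy_wait_us(36000, 1000) returns 36,000,000 microseconds (36 seconds) instead of 360 seconds, consistent with poll() returning in under 60 seconds. When wakeups never arrive before the timeout, or happen to be tick-aligned, no time is lost, which is why the bug only shows up with frequent sub-tick wakeups.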

Introduced by:
1605 System time changes combined with spurios wakeups cause DP_POLL to return prematurely

The fix is to return this code to approximately the original logic, where we
compute the absolute deadline time and wait until then. However, we will use an lbolt-based deadline rather than wall-clock time. Since lbolt only marches forward, we don't have to worry about the system clock being changed.
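The deadline-based approach can be illustrated with a small simulation. This is a hypothetical model, not the committed fix: deadline_wait_us(), the simulated microsecond clock, and the fixed wakeup interval are assumptions, and the derived tick counter only stands in for lbolt. The key point is that the deadline is computed once, so a spurious wakeup just causes another pass through the loop instead of shrinking the remaining budget.

```c
/* Assumed tick length: 10 ms (hz = 100). */
#define USEC_PER_TICK 10000L

/*
 * Hypothetical model of a deadline-based wait (not kernel code).
 * The absolute deadline is computed once on a monotonic tick counter
 * (derived from simulated time, standing in for lbolt). After every
 * wakeup the counter is re-checked against that fixed deadline.
 * Returns total simulated time waited, in microseconds.
 */
static long deadline_wait_us(long timeout_ticks, long wakeup_us)
{
    long now_us = 0;                                   /* simulated time */
    long deadline_tick = now_us / USEC_PER_TICK + timeout_ticks;
    long deadline_us = deadline_tick * USEC_PER_TICK;

    while (now_us / USEC_PER_TICK < deadline_tick) {
        long next_wakeup_us = now_us + wakeup_us;

        /* Sleep until the next spurious wakeup or the deadline,
         * whichever comes first. */
        now_us = next_wakeup_us < deadline_us ? next_wakeup_us : deadline_us;
    }
    return now_us;
}
```

With the same 1ms wakeup assumption, deadline_wait_us(36000, 1000) returns 360,000,000 microseconds, the full 360 seconds requested. Because the counter only moves forward, a change to the wall-clock time cannot move the deadline either.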


Files

devpoll_test.c (9.45 KB): modified test case from '1605 System time changes combined with spurios wakeups cause DP_POLL to return prematurely' to expose this bug. Christopher Siden, 2012-08-28 07:46 PM

Related issues

Related to illumos gate - Bug #3150: poll times out immediately in 32bit kernels (Resolved, 2012-09-05)

History

#2

Updated by Eric Schrock almost 7 years ago

  • Status changed from In Progress to Resolved

changeset: 13790:ac6eff781c67
tag: tip
user: Matthew Ahrens <>
date: Thu Aug 30 05:13:49 2012 -0700

description:
3112 ztest does not honor ZFS_DEBUG
3113 ztest should use watchpoints to protect frozen arc bufs
3114 some leaked nvlists in zfsdev_ioctl
3115 poll(2) returns prematurely in presence of spurious wakeups
Reviewed by: Adam Leventhal <>
Reviewed by: Matt Amdur <>
Reviewed by: George Wilson <>
Reviewed by: Christopher Siden <>
Approved by: Eric Schrock <>