Bug #535

tail -f uses up 100% of CPU core

Added by David Pacheco over 9 years ago. Updated over 9 years ago.

Status: Resolved
Priority: High
Assignee:
Category: cmd - userland programs
Start date: 2010-12-17
Due date:
% Done: 50%
Estimated time:
Difficulty:
Tags:
Gerrit CR:

Description

When using "tail -f" to follow a regular file, the tail process ends up consuming 100% of CPU even when the underlying file isn't changing.

Looking at the implementation in usr/src/cmd/tail, we can see why:

    switch (action) {
    case USE_PORT:
        ts.tv_sec = 1;
        ts.tv_nsec = 0;
        /*
         * In the F case we set a timeout to ensure that
         * we re-stat the file at least once every second.
         */
        n = port_get(port, &ev, Fflag? &ts : NULL);
        if (n == 0) {
            file = (file_info_t *)ev.portev_user;
            (void) port_associate(port, PORT_SOURCE_FD,
                fileno(file->fp), POLLIN, (void *)file);
        }
        break;

    case USE_SLEEP:
        (void) usleep(250000);
        break;
    }

I believe this chunk of code is supposed to wait for the file to be updated. Traditionally, "tail -f" would just wait a second, but this implementation appears to try to use event ports to wait until there's actually an update on the file. Unfortunately, it's waiting on POLLIN, an event which only indicates that one can read from the file descriptor without blocking, not that there's actually data available. For regular files, this is always true, so you always get an event immediately. As a result, even though tail has read up to the end of the file, rather than waiting for it to be updated, it calls port_get, gets an event immediately (because the fd can currently be read without blocking), reads zero bytes, and loops like this forever.
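
The POLLIN behaviour described above can be reproduced without event ports. The following is a minimal sketch that uses plain poll(2) as a stand-in for port_get() with PORT_SOURCE_FD; the helper name pollin_at_eof and the temporary file path are made up for illustration and are not part of tail.

```c
#include <poll.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from tail.c): returns 1 if poll() reports
 * POLLIN on a regular file whose offset is already at end-of-file,
 * 0 if it does not, and -1 on error.
 */
int
pollin_at_eof(void)
{
	char path[] = "/tmp/pollin_demo_XXXXXX";
	char buf[16];
	struct pollfd pfd;
	int fd, n, ready;

	if ((fd = mkstemp(path)) == -1)
		return (-1);
	(void) unlink(path);	/* the file goes away once fd is closed */

	if (write(fd, "hello\n", 6) != 6 || lseek(fd, 0, SEEK_SET) == -1) {
		(void) close(fd);
		return (-1);
	}

	/* Consume everything so the next read() would return 0 (EOF). */
	while (read(fd, buf, sizeof (buf)) > 0)
		;

	pfd.fd = fd;
	pfd.events = POLLIN;
	pfd.revents = 0;
	n = poll(&pfd, 1, 0);	/* timeout 0: report state, never block */
	ready = (n == 1 && (pfd.revents & POLLIN) != 0);

	(void) close(fd);
	return (n == -1 ? -1 : ready);
}
```

On systems where regular files always poll as readable, pollin_at_eof() should return 1 even though nothing is left to read, which is the condition that keeps the loop in tail spinning.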

If the code were changed to use an event that actually indicated that a file had changed (as may be available with PORT_SOURCE_FILE), the code as structured has another bug: because it reassociates before reading the data and the port events are usually level-triggered and not edge-triggered, the checked state would still be asserted so you'd immediately cause another event to be fired even though no new data was written. When you read the data and come around again, you'll get that event and check the fd again even though there's no more data. I don't know if the PORT_SOURCE_FILE events are edge- or level-triggered.
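
To illustrate what level-triggered means for the ordering problem above, here is a small sketch using poll(2) on a pipe. It is only an analogy: it says nothing about how PORT_SOURCE_FILE behaves, and the names readable_now and level_triggered_demo are invented for this example.

```c
#include <poll.h>
#include <unistd.h>

/* Returns 1 if fd is readable right now, 0 if not, -1 on error. */
static int
readable_now(int fd)
{
	struct pollfd pfd = { .fd = fd, .events = POLLIN, .revents = 0 };
	int n = poll(&pfd, 1, 0);

	if (n == -1)
		return (-1);
	return (n == 1 && (pfd.revents & POLLIN) != 0);
}

/*
 * Hypothetical demo: put one byte in a pipe and sample readiness three
 * times. result[0] is the first check, result[1] is a second check made
 * before reading (the "reassociate before read" ordering), and result[2]
 * is a check made after the byte has been consumed.
 */
int
level_triggered_demo(int result[3])
{
	int fds[2];
	int rv = -1;
	char c;

	if (pipe(fds) == -1)
		return (-1);

	if (write(fds[1], "x", 1) == 1) {
		result[0] = readable_now(fds[0]);
		/* Nothing was consumed, so the condition is still asserted. */
		result[1] = readable_now(fds[0]);
		if (read(fds[0], &c, 1) == 1) {
			/* Only draining the data clears the condition. */
			result[2] = readable_now(fds[0]);
			rv = 0;
		}
	}

	(void) close(fds[0]);
	(void) close(fds[1]);
	return (rv);
}
```

With a level-triggered source the expected pattern is ready, ready, not ready: checking again before the read reports the same data a second time, so any re-arm would need to happen after the read rather than before it.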

The simplest solution is to scrap the use of event ports here, which may have the added benefit of buffering updates so that lots of tiny updates in a short period don't cause tail to thrash.

History

#1

Updated by Garrett D'Amore over 9 years ago

  • Project changed from site to illumos gate

#2

Updated by Chris Love over 9 years ago

  • Status changed from New to In Progress
  • Assignee set to Chris Love
  • % Done changed from 0 to 50

Webrev is here: http://cr.opensolaris.org/~cjlove/il_535_tail/

As suggested, the use of port_create(), port_get(), and port_associate() was replaced by always using usleep(). PORT_SOURCE_FILE could have been used with event ports, but usleep() would still be required when following a FIFO.

#3

Updated by Garrett D'Amore over 9 years ago

The variable "action", and the switch statement at 363 are superfluous, as the action is always USE_SLEEP.

Please clean those last bits up, then submit a new webrev.

#5

Updated by Garrett D'Amore over 9 years ago

Looks good... please submit an RTI for this. If you can't get to this, let me know and I'll have someone else integrate your changes instead.

#6

Updated by Garrett D'Amore over 9 years ago

  • Category set to cmd - userland programs
  • Status changed from In Progress to Resolved

Fixed in:

changeset: 13271:5aca6ad7a5d9
tag: tip
user: Chris Love <>
date: Thu Jan 20 21:42:44 2011 -0800
description:
535 tail -f uses up 100% of CPU core
Reviewed by:
Reviewed by:
Reviewed by:
Approved by:
