Bug #9425

allow channel programs to be stopped via signals

Added by Brad Lewis over 1 year ago. Updated 7 months ago.

Status: Closed
Priority: Normal
Assignee: -
Category: -
Start date: 2018-03-29
Due date:
% Done: 100%
Estimated time:
Difficulty: Medium
Tags: needs-triage

Description

Problem Statement
ZFS channel program scripts currently require a timeout, so that hung or long-running scripts return a timeout error instead of wedging ZFS. This limit can currently be set as high as 100 million Lua instructions. Even with a limit in place, a system administrator (or support engineer) should be able to cancel a script that is taking a long time.

Proposed Solution
Make it possible to abort a channel program by sending an interrupt signal. In the underlying txg_wait_sync function, switch the cv_wait to a cv_wait_sig to catch the signal. Once a signal is caught, the dsl_sync_task function can install a Lua hook that is called before the Lua interpreter executes each new line of code. dsl_sync_task can then resume with a standard txg_wait_sync call and wait for the txg to complete. Meanwhile, the hook aborts the script and indicates that the channel program was canceled. The kernel returns EINTR to indicate that the channel program run was canceled.
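
The control flow proposed above can be illustrated with a small user-space sketch. This is not the ZFS or Lua code: toy_vm_t, cancel_hook, wait_interrupted, and run_channel_program are hypothetical names invented for illustration, the signal is simulated by a line number, and the waiting thread and the script are collapsed into a single loop. It only shows the idea that a caught signal installs a hook, and that the hook (not the signal itself) aborts the script before the next line runs, yielding EINTR.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the interpreter state; not a ZFS or Lua type. */
typedef struct toy_vm toy_vm_t;
typedef void (*line_hook_t)(toy_vm_t *vm);

struct toy_vm {
	line_hook_t hook;	/* called before each line when non-NULL */
	bool aborted;		/* set by the hook to stop the script */
	int lines_run;		/* number of lines actually executed */
};

/* Hook body: mark the script as canceled so the interpreter stops. */
static void
cancel_hook(toy_vm_t *vm)
{
	vm->aborted = true;
}

/*
 * Stand-in for a signal-aware wait. The "signal" is simulated as arriving
 * once line signal_at is reached; a negative value means no signal.
 */
static bool
wait_interrupted(int current_line, int signal_at)
{
	return (signal_at >= 0 && current_line >= signal_at);
}

/*
 * Run total_lines of a pretend script.
 * Returns 0 on normal completion, EINTR if the run was canceled.
 */
static int
run_channel_program(toy_vm_t *vm, int total_lines, int signal_at)
{
	assert(vm != NULL);
	vm->hook = NULL;
	vm->aborted = false;
	vm->lines_run = 0;

	for (int line = 0; line < total_lines; line++) {
		/* A caught signal only installs the hook. */
		if (vm->hook == NULL && wait_interrupted(line, signal_at))
			vm->hook = cancel_hook;

		/* The interpreter calls the hook before executing a line. */
		if (vm->hook != NULL) {
			vm->hook(vm);
			if (vm->aborted)
				return (EINTR);
		}
		vm->lines_run++;
	}
	return (0);
}
```

In this sketch, run_channel_program(&vm, 10, -1) runs all ten lines and returns 0, while run_channel_program(&vm, 10, 3) executes three lines and then returns EINTR.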

History

#1

Updated by Electric Monk 7 months ago

  • % Done changed from 0 to 100
  • Status changed from New to Closed

git commit d0cb1fb92629bc0283c88d4719df7285c1612700

commit  d0cb1fb92629bc0283c88d4719df7285c1612700
Author: Don Brady <don.brady@delphix.com>
Date:   2019-02-20T18:05:08.000Z

    9425 allow channel programs to be stopped via signals
    Reviewed by: Sebastien Roy <sebastien.roy@delphix.com>
    Reviewed by: Serapheim Dimitropoulos <serapheim.dimitro@delphix.com>
    Reviewed by: Matt Ahrens <matt@delphix.com>
    Approved by: Robert Mustacchi <rm@joyent.com>
