Bug #4515

closed

NDMP hangs when restore/backup spans multiple volumes and produces huge ndmplog file

Added by Jan Kryl over 9 years ago. Updated over 9 years ago.

Status: Resolved
Priority: Normal
Assignee:
Category: cmd - userland programs
Start date: 2014-01-22
Due date:
% Done: 100%
Estimated time:
Difficulty: Medium
Tags: NDMP
Gerrit CR:
External Bug:

Description

The symptoms of this problem are:

  • ndmpd's log file can grow by several gigabytes within a few tens of seconds
  • ndmpd hangs during restore/backup while waiting for the "mover continue" signal from the DMA

Both problems are caused by a broken synchronization mechanism between the thread that initiates the mover pause (and then waits for a mover continue/abort message) and the thread that processes the mover continue/abort message from the DMA and signals the waiting thread. The current synchronization algorithm is very complex and works only the first time the mover is paused. In addition, not all code sections that modify the variables involved in synchronization are protected by locks.

This is an attempt to simplify the synchronization algorithm and make it more reliable, while solving the two problems above.

Actions #1

Updated by Jan Kryl over 9 years ago

  • Status changed from New to In Progress
  • % Done changed from 0 to 100

Description of the fix:

  • Unneeded variables were eliminated. Synchronization is now done by means of a condition variable and a mutex.
  • The code from every place that waits for the mover to continue was merged into a single function, ndmp_wait_for_mover(), which does the job correctly.
  • All places where termination variables are set and the waiting mover is notified by cond_broadcast() are now protected by the mutex, to prevent concurrent changes and reads.
  • The waiting mover thread grabs the mutex, checks the termination variables, and sleeps on the condition variable, which releases the mutex. When it is woken up, it repeats the same steps until one of the termination variables is set.
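
For illustration only, the wait/notify pattern described in the list above can be sketched roughly as follows. This is not the code from the commit: it uses POSIX pthreads instead of the Solaris threads calls (cond_broadcast() and friends) mentioned above, and the type mover_sync_t, its fields, and the functions mover_sync_init(), mover_sync_wait() and mover_sync_notify() are hypothetical names invented for this sketch. Clearing the "continued" flag after a wakeup is also an assumption, made here so that a later pause would block again.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical state; names are illustrative, not taken from ndmpd. */
typedef struct mover_sync {
	pthread_mutex_t lock;
	pthread_cond_t cv;
	bool continued;	/* termination variable: "mover continue" arrived */
	bool aborted;	/* termination variable: "mover abort" arrived */
} mover_sync_t;

void
mover_sync_init(mover_sync_t *ms)
{
	pthread_mutex_init(&ms->lock, NULL);
	pthread_cond_init(&ms->cv, NULL);
	ms->continued = false;
	ms->aborted = false;
}

/*
 * Waiting side: take the mutex, check the termination variables, sleep on
 * the condition variable (which releases the mutex while asleep), and
 * re-check after every wakeup. Returns 0 on continue, -1 on abort.
 */
int
mover_sync_wait(mover_sync_t *ms)
{
	int rv;

	pthread_mutex_lock(&ms->lock);
	while (!ms->continued && !ms->aborted)
		pthread_cond_wait(&ms->cv, &ms->lock);
	rv = ms->aborted ? -1 : 0;
	/* Assumption: consume the continue so the next pause waits again. */
	ms->continued = false;
	pthread_mutex_unlock(&ms->lock);
	return (rv);
}

/*
 * Notifying side: set a termination variable and wake the waiter, all
 * while holding the same mutex the waiter uses for its checks.
 */
void
mover_sync_notify(mover_sync_t *ms, bool abort)
{
	pthread_mutex_lock(&ms->lock);
	if (abort)
		ms->aborted = true;
	else
		ms->continued = true;
	pthread_cond_broadcast(&ms->cv);
	pthread_mutex_unlock(&ms->lock);
}
```

Because the flags are only read and written under the mutex, and the waiter re-checks them in a loop, a notification that arrives before the waiter goes to sleep is not lost and a spurious wakeup does not end the wait early.
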

Actions #2

Updated by Gordon Ross over 9 years ago

  • Status changed from In Progress to Resolved

commit a23888a301b4822208e58d55cccf1b42c8e46cc7
Author: Jan Kryl <jan.kryl@nexenta.com>
Date:   Wed Jan 22 15:41:01 2014 -0500

    4515 NDMP hangs when restore/backup spans multiple volumes and produces huge
    Reviewed by: Albert Lee <trisk@nexenta.com>
    Approved by: Gordon Ross <gwr@nexenta.com>

4       20      usr/src/cmd/ndmpd/ndmp/ndmpd.h
3       33      usr/src/cmd/ndmpd/ndmp/ndmpd_callbacks.c
7       4       usr/src/cmd/ndmpd/ndmp/ndmpd_connect.c
21      120     usr/src/cmd/ndmpd/ndmp/ndmpd_mover.c
4       1       usr/src/cmd/ndmpd/ndmp/ndmpd_tar.c
2       14      usr/src/cmd/ndmpd/ndmp/ndmpd_tar3.c
65      182     usr/src/cmd/ndmpd/ndmp/ndmpd_util.c