Bug #4343

NDMP restore did not report error count to the client

Added by Jan Kryl over 5 years ago. Updated over 5 years ago.

Status:
Resolved
Priority:
Normal
Assignee:
Category:
cmd - userland programs
Start date:
2013-11-21
Due date:
% Done:

100%

Estimated time:
Difficulty:
Medium
Tags:

Description

Description: On error, the restore increments an error counter and skips the file, but does not abort the restore. The error count is not reported to the client.

Steps to Reproduce:
1) Create a directory (/volumes/general/hcl/test1) containing files on the volume.
2) Configure a policy and run a backup.
3) Verify that the backup completes successfully.
4) Run restores repeatedly until the volume is full (restore jobs R1 to R11 succeed and can be seen at the restore destination /volumes/ost/backup/restores/).
5) Continue running more restores (R13 to R16). These restore jobs should fail with a message noting that the volume is full, but instead they are reported as successful.

Expected Results:
NDMP should report the restore error count to the client.

Actual Results:
The jobs were reported as successful.

History

#1

Updated by Jan Kryl over 5 years ago

  • Status changed from New to In Progress

There are two ways to inform the client about an error during restore:

*) Send an NDMP_LOG_FILE message with error NDMP_RECOVERY_FAILED_IO_ERROR. This message should be generated only for files/directories that were explicitly specified in the list of files to be restored.

*) Send a NOTIFY DATA HALTED message to the client.

NDMP_LOG_FILE cannot be used: for restores without an explicit list of files to be restored, the server should not generate NDMP_LOG_FILE messages, so the client would not know that the restore failed.

The solution is to separate fatal from non-fatal errors during restore. Fatal errors (ENOSPC and EDQUOT) cause the NDMP server to halt the restore, while all other errors are considered non-fatal and continue to be handled as before. The proposal looks simple, but because of the convoluted way errors are propagated in our ndmpd, it requires quite a lot of changes. Every function that creates something on disk during a restore has to be checked for a fatal error, and that error has to be propagated from TLM up to the NDMP protocol layer, where a proper error message can be generated and sent to the client.

With the suggested fix, if there is not enough space on disk to finish the restore, a proper error message describing the root cause of the problem is generated and sent to the client, and the restore is halted.
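
The fatal/non-fatal split described above can be illustrated with a minimal sketch. This is not the code from the actual fix: the names restore_status_t, restore_errno_is_fatal, restore_classify and restore_run are hypothetical, and the out-of-space and quota-exceeded conditions are assumed to map to the POSIX errno values ENOSPC and EDQUOT. It only shows the idea that non-fatal errors are counted and skipped, while a fatal error stops the loop and is handed up to the caller, which in ndmpd would be the layer that notifies the client.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-file outcome; not an actual ndmpd type. */
typedef enum {
	RESTORE_OK,		/* file restored */
	RESTORE_SKIPPED,	/* non-fatal error: count it and continue */
	RESTORE_HALT		/* fatal error: stop the restore */
} restore_status_t;

/* Assumption: only out-of-space and quota errors are treated as fatal. */
static bool
restore_errno_is_fatal(int err)
{
	return (err == ENOSPC || err == EDQUOT);
}

/* Classify one per-file errno value, counting non-fatal errors. */
static restore_status_t
restore_classify(int err, int *error_count)
{
	assert(error_count != NULL);

	if (err == 0)
		return (RESTORE_OK);
	if (restore_errno_is_fatal(err))
		return (RESTORE_HALT);
	(*error_count)++;
	return (RESTORE_SKIPPED);
}

/*
 * Walk a list of per-file errno results. Stops at the first fatal error
 * and stores it in *fatal_errno (0 if none) so that the caller can report
 * the root cause. Returns the number of files handled before halting.
 */
static size_t
restore_run(const int *errs, size_t n, int *error_count, int *fatal_errno)
{
	size_t i;

	*fatal_errno = 0;
	for (i = 0; i < n; i++) {
		if (restore_classify(errs[i], error_count) == RESTORE_HALT) {
			*fatal_errno = errs[i];
			break;
		}
	}
	return (i);
}
```

In this sketch the caller of restore_run would check fatal_errno and, if it is non-zero, halt the data operation and send the client a message describing the cause; how that is actually wired through TLM and the protocol layer is specific to the real change and not shown here.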

#2

Updated by Jan Kryl over 5 years ago

  • Status changed from In Progress to Pending RTI
  • % Done changed from 80 to 100
#3

Updated by Yuri Pankov over 5 years ago

  • Status changed from Pending RTI to Resolved

Resolved in:

commit 2efb3bf9c7f4cf34038896f1431531c93d3f57c2
Author: Jan Kryl <jan.kryl@nexenta.com>
Date:   Tue Dec 3 05:17:17 2013 -0500

    4343 NDMP restore did not report error count to the client
    Reviewed by: Albert Lee <trisk@nexenta.com>
    Approved by: Garrett D'Amore <garrett@damore.org>
