Project

General

Profile

Bug #1013

time-slider fails to take snapshots on complex dataset/snapshot configuration

Added by Florian Manschwetus over 8 years ago. Updated over 7 years ago.

Status:
Resolved
Priority:
Normal
Assignee:
Category:
Desktop (JDS)
Target version:
Start date:
2011-05-10
Due date:
2011-09-14
% Done:

100%

Estimated time:
8.00 h
Difficulty:
Medium
Tags:
time-slider

Description

pool-strukture:
data (freq,hourly,daily,weekly,monthly)
data/Admin (inherit)
data/Projects (inherit)
data/PublicWWW (inherit)
data/Teaching (inherit)
data/backup (inherit, -frequent)
data/backup/images (-all, +stagingdisk-schedule)
data/backup/images/linux (inherit)
data/backup/images/windows (inherit)
data/backup/serviceDaten (inherit)
data/backup/serviceDaten/linux (inherit)
data/backup/serviceDaten/windows (inherit)
data/cvs (no snaps, -all)
data/inc (no snaps, -all)
data/scratch (no snaps, -all)
data/test (no snaps, -all)
data/users (inherit)

Problem occurred when setting e.g. com.sun:auto-snapshot:hourly from false to inherit or true (com.sun:auto-snapshot is false on data, due to explicit schedule selection)

log snippet:

Failed to create snapshots for schedule: hourly
Caught RuntimeError exception in snapshot manager thread
Error details:
--------BEGIN ERROR MESSAGE--------
['/usr/bin/pfexec', '/usr/sbin/zfs', 'snapshot', '-r', 'data/backup/serviceDaten@zfs-auto-snap_hourly-2011-05-05-09h13'] failed with exit code 1
cannot create snapshot 'data/backup/serviceDaten/windows@zfs-auto-snap_hourly-2011-05-05-09h13': dataset already exists
no snapshots were created

--------END ERROR MESSAGE--------
Snapshot monitor thread exited.
[ Mai 5 09:13:06 Stopping because all processes in service exited. ]

History

#1

Updated by Florian Manschwetus over 8 years ago

Ok, I found the point of interest. Seems like a failure in /usr/share/time-slider/lib/time_slider/zfs.py - method: create_auto_snapshot_set

#2

Updated by Florian Manschwetus over 8 years ago

Ok, got even closer it is in the for loop over the recursive array (lines 137 to 152)

#3

Updated by Florian Manschwetus over 8 years ago

ok added some code that tells me what recursive snapshots are done, here the new error output:
... (some other) ...
Recursive Snap of data/backup/serviceDaten/windows
Recursive Snap of data/backup/serviceDaten/linux
Recursive Snap of data/backup/serviceDaten
Failed to create snapshots for schedule: frequent
Caught RuntimeError exception in snapshot manager thread
Error details:
--------BEGIN ERROR MESSAGE--------
['/usr/bin/pfexec', '/usr/sbin/zfs', 'snapshot', '-r', 'data/backup/serviceDaten@zfs-auto-snap_frequent-2011-05-10-17h18'] failed with exit code 1
cannot create snapshot 'data/backup/serviceDaten/windows@zfs-auto-snap_frequent-2011-05-10-17h18': dataset already exists
no snapshots were created

--------END ERROR MESSAGE--------

#4

Updated by Florian Manschwetus over 8 years ago

ok got it. I added code to get some contents around the expected hit in recursive. It looks like that the code that should check if the parent is already marked for recursive snapshotting doesn't work in this case.

for datasetname in recursive:
parts = datasetname.rsplit('/', 1)
parent = parts[0]
if parent == datasetname: # Root filesystem of the Zpool, so # this can't be inherited and must be # set locally.
finalrecursive.append(datasetname)
continue
idx = bisect_right(recursive, parent)
if len(recursive) > 0 and \
recursive[idx-1] == parent: # Parent already marked for recursive snapshot: so skip
continue
else: # here should be parent
if len(recursive) > idx-1:
outp = recursive[idx-1]
if len(recursive) > idx:
outp = outp + ", " + recursive[idx]
if len(recursive) > idx+1:
outp = outp + ", " + recursive[idx+1]
util.debug("Adding final Rec " + datasetname + ", parent: " + parent + "; around idx: " + outp, True)
finalrecursive.append(datasetname)
#5

Updated by Florian Manschwetus over 8 years ago

Ok, fixed

Ok assuming included is in correct order (as it should due to directly created from zfs list output), then the comment itself includes the monstrous logical mistake!
When iterating over a sorted list, prepending would result in a reverse alphabetic order! So just append here and all is fine.

--- old (simplified) ---
for datasetname in included: # We want recursive list sorted in alphabetical order # so insert instead of append to the list.
recursive.insert(0, datasetname)
--- old ---

--- fixed (simplified) ---
for datasetname in included: # We want recursive list sorted in alphabetical order # so just append to the list.
recursive.append(datasetname)
--- fixed ---

#6

Updated by Julian Wiesener over 8 years ago

  • Category changed from OS/Net (Kernel and Userland) to Desktop (JDS)
  • Status changed from New to In Progress
  • Assignee set to OI JDS
  • Priority changed from Urgent to Normal
  • Tags changed from needs-triage to time-slider

time_slider/zfs.py is part of JDS, please report to upstream also

#8

Updated by Guido Berhörster over 8 years ago

  • Target version changed from oi_148 to oi_151
#9

Updated by Albert Lee over 8 years ago

One-line fix looks correct to me.

#10

Updated by Guido Berhörster over 8 years ago

  • Target version changed from oi_151 to oi_151_stable
  • % Done changed from 0 to 90

This will be fixed with one of the next rebuilds before the stable release.

#11

Updated by Ken Mays about 8 years ago

  • Due date set to 2011-09-14
  • Status changed from In Progress to Resolved
  • % Done changed from 90 to 100
  • Estimated time set to 8.00 h

Resolved in ,5.11-0.151.1:20110912T022507Z

#12

Updated by Chris Anders almost 8 years ago

Im sorry but this wasn't tested :(

your change broke recursive and made its way into solaris 11 release...

for me i have a zpool layout of:

tank1 com.sun:auto-snapshot true local
tank1/installkits com.sun:auto-snapshot true inherited from tank1
tank1/profadel com.sun:auto-snapshot true inherited from tank1
tank1/profadel/fs1 com.sun:auto-snapshot true inherited from tank1
tank1/profadel/fs1/admin_only com.sun:auto-snapshot true inherited from tank1
tank1/profadel/fs1/common com.sun:auto-snapshot true inherited from tank1
tank1/profadel/fs1/profiles com.sun:auto-snapshot true inherited from tank1
tank1/profadel/fs1/programs com.sun:auto-snapshot true inherited from tank1
tank1/profadel/fs1/users com.sun:auto-snapshot true inherited from tank1
tank1/tmp com.sun:auto-snapshot true inherited from tank1
tank1/vmware1 com.sun:auto-snapshot true inherited from tank1
tank1/vmware2 com.sun:auto-snapshot true inherited from tank1
tank1/vmware3 com.sun:auto-snapshot true inherited from tank1

With your change the time-slider executes the following:

ZFS CMD: ['/usr/bin/pfexec', '/usr/sbin/zfs', 'snapshot', '-r', 'tank1/prof/fs1/users@zfs-auto-snap_hourly-2011-11-12-23h16']
ZFS CMD: ['/usr/bin/pfexec', '/usr/sbin/zfs', 'snapshot', '-r', 'tank1/prof/fs1/programs@zfs-auto-snap_hourly-2011-11-12-23h16']
ZFS CMD: ['/usr/bin/pfexec', '/usr/sbin/zfs', 'snapshot', '-r', 'tank1/profl/fs1/profiles@zfs-auto-snap_hourly-2011-11-12-23h16']
ZFS CMD: ['/usr/bin/pfexec', '/usr/sbin/zfs', 'snapshot', '-r', 'tank1/profl/fs1/common@zfs-auto-snap_hourly-2011-11-12-23h16']
ZFS CMD: ['/usr/bin/pfexec', '/usr/sbin/zfs', 'snapshot', '-r', 'tank1/prof/fs1/admin_only@zfs-auto-snap_hourly-2011-11-12-23h16']
ZFS CMD: ['/usr/bin/pfexec', '/usr/sbin/zfs', 'snapshot', '-r', 'tank1/profl/fs1@zfs-auto-snap_hourly-2011-11-12-23h16']
ZFS CMD: ['/usr/bin/pfexec', '/usr/sbin/zfs', 'snapshot', '-r', 'tank1@zfs-auto-snap_hourly-2011-11-12-23h16']

2 problems with this.

1) its trying to run snapshot -r for each dataset instead of at the tank1 parent - the order / logic is now wrong
2) with the above list i get snapshots for:
tank1/prof/fs1/admin_only
tank1/prof/fs1/common
tank1/prof/fs1/profiles
tank1/prof/fs1/programs
tank1/prof/fs1/users
and nothing else.

time-slider tries to run zfs snapshot -r tank1/profl/fs1@zfs-auto-snap_hourly-2011-11-12-23h16 but zfs reports:
cannot create snapshot 'tank1/prof/fs1/admin_only@zfs-auto-snap_hourly-2011-11-12-23h16' : dataset already exists
no snapshots were created

long story short after digging through the code i changed the line back to:
recursive.insert(0, datasetname)

and now when time-slider goes to snapshot the only cmd it executes is:
ZFS CMD: ['/usr/bin/pfexec', '/usr/sbin/zfs', 'snapshot', '-r', 'tank1@zfs-auto-snap_frequent-2011-11-13-00h30']

looks like the logic of how the lists
everything = []
included = []
excluded = []
single = []
recursive = []
finalrecursive = []
are populated needs to re-thought out...

#13

Updated by Chris Anders almost 8 years ago

for consistency the above layout was an old copy and paste and should be

tank1 com.sun:auto-snapshot true local
tank1/installkits com.sun:auto-snapshot true inherited from tank1
tank1/prof com.sun:auto-snapshot true inherited from tank1
tank1/prof/fs1 com.sun:auto-snapshot true inherited from tank1
tank1/prof/fs1/admin_only com.sun:auto-snapshot true inherited from tank1
tank1/prof/fs1/common com.sun:auto-snapshot true inherited from tank1
tank1/prof/fs1/profiles com.sun:auto-snapshot true inherited from tank1
tank1/prof/fs1/programs com.sun:auto-snapshot true inherited from tank1
tank1/prof/fs1/users com.sun:auto-snapshot true inherited from tank1
tank1/tmp com.sun:auto-snapshot true inherited from tank1
tank1/vmware1 com.sun:auto-snapshot true inherited from tank1
tank1/vmware2 com.sun:auto-snapshot true inherited from tank1
tank1/vmware3 com.sun:auto-snapshot true inherited from tank1

#14

Updated by Florian Manschwetus almost 8 years ago

Looks like you are right. I think I missed after hours digging this ugly code (sorry but continue is quite too often used instead of else), that we interate included what was already sorted in reverse. Currently I reread the code, but I think there must be a lot re-thought.

Chris I think you are welcome.
If you have a wiki access, we could start a brainstorming about what should time-slider be able todo and how. I think OI is really allowed to find the way the user likes this to be done.

#15

Updated by Johan Postema over 7 years ago

I ran against this issue on oi151a with

pkg:,5.11-0.151.1:20110912T022507Z

It seems the code that checks if the parent of a dataset is already in the recursive list in order to determine whether it needs to be skipped is wrong.

I am not sure if there is already a fix that addresses this issue, but I made a small change to zfs.py in order to get time-slider working properly on my system:

/usr/share/time-slider/lib/time_slider/zfs.py:

-            if len(recursive) > 0 and \
-               recursive[idx-1] == parent:
-                # Parent already marked for recursive snapshot: so skip
-                continue
-            else:
-                finalrecursive.append(datasetname)

+            if set(recursive).intersection([parent]):
+                # Parent already marked for recursive snapshot: so skip
+                continue
+            else:
+                single.append(datasetname)
#16

Updated by Johan Postema over 7 years ago

There is a small bug in the above fix where

single.append(datasetname)

should be

finalrecursive.append(datasetname)

However I stripped the original code a little bit more:

-        for datasetname in recursive:
-            parts = datasetname.rsplit('/', 1)
-            parent = parts[0]
-            if parent == datasetname:
-                # Root filesystem of the Zpool, so
-                # this can't be inherited and must be
-                # set locally.
-                finalrecursive.append(datasetname)
-                continue
-            idx = bisect_right(recursive, parent)
-            if len(recursive) > 0 and \
-               recursive[idx-1] == parent:
-                # Parent already marked for recursive snapshot: so skip
-                continue
-            else:
-                finalrecursive.append(datasetname)
+        for datasetname in recursive:
+            parts = datasetname.rsplit('/', 1)
+            parent = parts[0]
+            if parent == datasetname:
+                # Root filesystem of the Zpool, so
+                # this can't be inherited and must be
+                # set locally.
+                finalrecursive.append(datasetname)
+            elif not set(recursive).intersection([parent]):
+                finalrecursive.append(datasetname)

When using time-slider with the zfs-send plugin you might also be interested in: https://illumos.org/issues/2005

#17

Updated by Johan Postema over 7 years ago

jpkoopmann reported an issue which is similar to the issue above:

Issue:
zfs property "com.sun:auto-snapshot(:*)=false" is ignored, causing time-slider to snapshot datasets/volumes that should be skipped instead.

Cause:
The piece of code that build the "single" and "recursive" lists fails to exclude these datasets because it assumes a sorted list.

Fix:
This could either be fixed to create a sorted "excluded" list, but in my opinion this is error prone an can lead easially to issues like these. I prefer to search the excluded list instead to figure out whether a dataset should be added to the single or to the recursive list.

Diff:

/usr/share/time-slider/lib/time_slider/zfs.py

127c127,128
<                 if set(excluded).intersection([child]):
---
>                 idx = bisect_left(excluded, child)
>                 if idx < len(excluded) and excluded[idx] == child:

Also available in: Atom PDF