Bug #412

Cannot delete file(s) on root zfs filesystem if disk is full

Added by Samuel Younge almost 9 years ago. Updated about 7 years ago.

Status:
Feedback
Priority:
Normal
Assignee:
-
Category:
-
Start date:
2010-11-11
Due date:
% Done:

0%

Estimated time:
Difficulty:
Medium
Tags:
needs-triage

Description

When the root zfs filesystem is full, you cannot delete file(s) because of a device full error.
Thus, you cannot fix the out of space problem because you cannot delete any files!

To my knowledge there is only one workaround for this, which normal users are not likely to know:
Copy /dev/null over a large file, freeing up enough space for `rm -rf` to start working.
EXAMPLE:
  1. `cp -f /dev/null /pathToBigFile`

There should be an exception handler in the `rm` command that catches a device full error and invokes
the above workaround logic to resolve the error condition.
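
A rough shell sketch of that proposal, written as a wrapper rather than a change to `rm` itself. The function name `rm_fallback` is hypothetical, and the shell cannot easily tell a device full error apart from other failures, so this version falls back on any failed removal of a regular file. It is only an illustration of the idea, and truncation may not free space if the file's blocks are held elsewhere:

```shell
# Hypothetical helper, not part of any shipped tool.
# Tries a normal rm first; if that fails, truncates the file by copying
# /dev/null over it (the workaround above) and retries the removal.
rm_fallback() {
    for f in "$@"; do
        # Normal path: plain rm succeeds and we move on to the next file.
        if rm -f -- "$f" 2>/dev/null; then
            continue
        fi
        # Fallback path: only attempted for regular files that still exist.
        if [ -f "$f" ] && cp -f /dev/null "$f"; then
            rm -f -- "$f" || return 1
        else
            return 1
        fi
    done
}
```

Usage would look like `rm_fallback /pathToBigFile`; the function returns non-zero if a file could not be removed even after truncation.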

History

#1

Updated by Jim Klimov almost 9 years ago

This is often not a problem per se. I've hit the situation a number of times, and it was usually related to snapshots (e.g. Time-slider) referencing the file I'm trying to delete. At least, the described workaround did not work for me. Also, there is often not enough space for the temporary blocks needed (due to COW) to remove intermediate snapshots.

As a standard solution on my systems, I make a small (like, 100m) volume, e.g. "pool/rsvd", with attributes to disable automatic snapshots. Whenever the full ZFS pool condition hits me, I remove this volume, fix the condition and re-create the volume.
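
The reserve-volume routine described above might look roughly like the following. This is a sketch under assumptions, not the exact commands used: the names `reserve_create` and `reserve_release` are hypothetical, and `com.sun:auto-snapshot=false` is assumed to be the attribute meant by "disable automatic snapshots" (it is the user property the Time Slider auto-snapshot service checks). The `ZFS` variable only exists so the commands can be previewed with `ZFS=echo`.

```shell
# Assumed defaults; override in the environment as needed.
ZFS=${ZFS:-zfs}                 # set ZFS=echo to preview the commands
RSVD=${RSVD:-pool/rsvd}         # dataset name taken from the comment above
RSVD_SIZE=${RSVD_SIZE:-100m}    # "small (like, 100m)"

# Create the placeholder volume and exclude it from automatic snapshots.
reserve_create() {
    "$ZFS" create -V "$RSVD_SIZE" -o com.sun:auto-snapshot=false "$RSVD"
}

# Destroy the placeholder to hand its space back to a full pool.
reserve_release() {
    "$ZFS" destroy "$RSVD"
}
```

The intended cycle is `reserve_release`, then clean up files or snapshots, then `reserve_create` again once the pool has free space.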

Perhaps this is a more viable solution (to integrate with "zpool create", "zfs destroy snapshot/clone", or maybe even "rm" and other such tools)? At least it follows the Unix FS tradition of "root reserved space" which fought overflows and excess fragmentation... This could be "zfs pool sanity reserved space" now :)

#2

Updated by Garrett D'Amore over 8 years ago

  • Priority changed from High to Normal

If you are not using snapshots, or if the file is unique, then you can indeed remove it. (The reason you can't remove a file that has other references is that various state structures have to be updated, etc.)

So delete a file that isn't being snapshotted. :-)

I'd class this as not-a-bug; I'm going to reduce priority at least for now.

#3

Updated by Garrett D'Amore over 8 years ago

  • Project changed from site to illumos gate

#4

Updated by Samuel Younge over 8 years ago

  • Status changed from New to Feedback

The trouble is that if you have snapshotted the entire filesystem, there is no non-snapshotted file to delete.
And you cannot delete any snapshots because of the no space left on device error.

If you are thinking that you can delete a file that has not changed between the master image and the snapshot (not snapshotted), that does not work either: you still get a no space left on device error, no matter what file you try to delete.

I am facing this problem again after completing an onu nightly build/install in a VM that happened to hold some larger-than-normal files (tar.bz2 backup files). After onu depleted all the disk space, the system dropped to single user mode and entered the no space left on device state.

The workaround I used before and documented above (`cp -f /dev/null /PathToBigFile`) did not work either. I have tried a number of workarounds, but the only approach that seems to work is to use the `cp -f /dev/null target` command on very small files. After enough of these are emptied, `rm -rf` should start working again.

I believe this is a serious flaw in illumos, and deserves a high priority flag.

#5

Updated by Garrett D'Amore over 8 years ago

The problem is that with copy-on-write, when you delete a file you must create a new copy of the metadata, which ultimately means a new tree. It's not possible to avoid this, but it is possible that we should ultimately be able to have a reserve, and only allow operations which will ultimately remove data once the pool is reduced to nothing more than this reserve.

#6

Updated by Jon Strabala about 7 years ago

  • Difficulty set to Medium
  • Tags set to needs-triage

In the old UFS file system, root was able to access some of the space that normal users were not when a file system approached capacity or 100%. Why couldn't there be a ZFS reserve of, say, 1% of the disk or 1GB (whichever is smaller), to which access could be toggled on/off with a ZFS property?
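
To make the suggested sizing concrete, here is the arithmetic only, as a small shell sketch. The function name `reserve_bytes` is hypothetical, 1GB is taken as 1024^3 bytes, and nothing here corresponds to an existing ZFS property:

```shell
# Hypothetical helper: prints the suggested reserve for a disk of the given
# size in bytes, i.e. the smaller of 1% of the disk and 1 GiB.
reserve_bytes() {
    size=$1
    one_percent=$((size / 100))      # integer division, rounds down
    cap=$((1024 * 1024 * 1024))      # 1 GiB upper bound
    if [ "$one_percent" -lt "$cap" ]; then
        echo "$one_percent"
    else
        echo "$cap"
    fi
}
```

For a 10 GiB disk, `reserve_bytes 10737418240` prints `107374182` (about 102 MiB); for a 1 TiB disk the 1 GiB cap applies and it prints `1073741824`.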
