Pruning old snapshots (zackup, open issue)

ngharo commented on May 29, 2024
Pruning old snapshots

from zackup.

Comments (8)

corny commented on May 29, 2024

Oh yes, please provide some details.

dmke commented on May 29, 2024

@ngharo, yes, this is a very much missing feature that fell off my TODO list.

I'd like to see the retention policy to be configured per host, with defaults inherited by the global config (pretty much as the rsync and ssh config propagates).

I don't know whether the zackup prune command should have CLI flags to configure the retention on the fly... I believe it should execute the truncation/pruning according to the plan laid out by the global/host config. (I foresee accidentally deleting the wrong data when juggling command-line arguments, which is really bad when that data is already the backup...)

When thinking about this feature, I've written down some obstacles somewhere... Let me report back here when I've looked through my notes at work (tomorrow).

ngharo commented on May 29, 2024

> I'd like to see the retention policy to be configured per host, with defaults inherited by the global config (pretty much as the rsync and ssh config propagates).

Agreed! That is something I want to do.

The config I envisioned would look like:

ssh:
   ...
rsync:
   ...
retention:
   yearly: 5
   monthly: 6
   weekly: 4
   daily: 7

Each number describes the number of snapshots to keep at each given interval.
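The per-host inheritance discussed above could be sketched in Go roughly as follows. This is only an illustration: the `Retention` type, its pointer fields, and the `Merge` and `intp` helpers are hypothetical names and are not taken from zackup's code. Pointer fields are used so that "not set for this host" can be told apart from an explicit value.

```go
package main

import "fmt"

// Retention is a hypothetical config section, not zackup's actual type.
// A nil field means "not set at this level".
type Retention struct {
	Yearly, Monthly, Weekly, Daily *int
}

// pick returns the host value if it is set, otherwise the global default.
func pick(host, global *int) *int {
	if host != nil {
		return host
	}
	return global
}

// Merge overlays the host settings on top of the global defaults.
func Merge(global, host Retention) Retention {
	return Retention{
		Yearly:  pick(host.Yearly, global.Yearly),
		Monthly: pick(host.Monthly, global.Monthly),
		Weekly:  pick(host.Weekly, global.Weekly),
		Daily:   pick(host.Daily, global.Daily),
	}
}

// intp is a small helper for building pointer fields in literals.
func intp(n int) *int { return &n }

func main() {
	global := Retention{Yearly: intp(5), Monthly: intp(6), Weekly: intp(4), Daily: intp(7)}
	host := Retention{Daily: intp(14)} // this host only overrides "daily"

	eff := Merge(global, host)
	fmt.Println(*eff.Yearly, *eff.Monthly, *eff.Weekly, *eff.Daily) // prints "5 6 4 14"
}
```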

> I don't know whether the zackup prune command should have CLI flags to configure the retention on the fly... I believe it should execute the truncation/pruning according to the plan laid out by the global/host config. (I foresee accidentally deleting the wrong data when juggling command-line arguments, which is really bad when that data is already the backup...)

Also agree. I think there should be one source of truth, the config. The prune command would be for people not running as a daemon. As a daemon, like BackupPC, it probably would make sense to run the prune automatically when idle, maybe right after backups complete.

Originally, I wanted to port BackupPC's exponential expiration over, but I'm having problems grokking it and I'm fairly new to Go. Even as a user, I find it a little confusing, and I'm not sure it's worth investing effort into versus a simplified approach where simple "keep" counts are used (again, modeled after borg backup pruning).

dmke commented on May 29, 2024

> Each number describes the number of snapshots to keep at each given interval.

Ah, that's a nicer definition than mine: I had envisioned some kind of time bucket list in the form of

retention:
- { interval:  "24h", keep:  7 } # 7 daily backups
- { interval:  "48h", keep: 14 } # 14 bi-daily backups
- { interval:   "7d", keep:  4 } # 4 weekly backups
- { interval:  "30d", keep: 12 } # 12 monthly backups
# for the "rest", either:
- { interval: "360d", keep: ∞ } # keep the rest with one-year gaps
# or
- { interval: "360d", keep: 10 } # 10 yearly backups, delete anything older

where interval is fed into a time.ParseDuration equivalent which also interprets 1d as 24h, allowing for arbitrary buckets. Having predefined buckets, as in your proposal, makes both the configuration and the implementation much easier.
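For reference, Go's `time.ParseDuration` has no day unit (its largest unit is `h`), so a small wrapper would be needed for values like "7d". A minimal sketch, assuming a hypothetical helper named `parseInterval` that only adds whole-day values on top of the standard parser:

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
	"time"
)

// parseInterval is a hypothetical helper: it accepts a whole number of days
// such as "7d" and otherwise defers to time.ParseDuration.
func parseInterval(s string) (time.Duration, error) {
	if strings.HasSuffix(s, "d") {
		n, err := strconv.Atoi(strings.TrimSuffix(s, "d"))
		if err != nil {
			return 0, fmt.Errorf("invalid interval %q: %w", s, err)
		}
		return time.Duration(n) * 24 * time.Hour, nil
	}
	return time.ParseDuration(s)
}

func main() {
	for _, s := range []string{"24h", "7d", "30d"} {
		d, err := parseInterval(s)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%s = %.0f hours\n", s, d.Hours())
	}
}
```

Treating a day as exactly 24 hours ignores daylight-saving shifts, which is probably acceptable for retention buckets but is a simplification.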

Sidenote

This also allows upgrading the definition (should this ever be needed), as your config example can be easily re-modeled as

retention:
- { interval: "365d", keep: 5 } # == yearly: 5
- { interval:  "30d", keep: 6 } # == monthly: 6
- { interval:   "7d", keep: 4 } # == weekly: 4
- { interval:  "24h", keep: 7 } # == daily: 7

> it probably would make sense to run the prune automatically when idle, maybe right after backups complete.

I concur. Creating new snapshots and deleting old ones in parallel while an rsync is running sounds like a lot of load for the ZFS ARC, which should be avoided.


Two notes I have found:

  1. How are the retention buckets stacked?

They can either be consecutive (i.e. bucket i+1 starts after bucket i ends), or they can all start simultaneously. The latter is easier to implement, but (using your config from above) it means that the weekly: 4 bucket is effectively only 3 weeks long, because the first week is occupied by the daily: 7 bucket. The former shifts each bucket further back in time (the yearly: 5 bucket would reach back more than 5½ years):

[Figure: bucket-stacking]

(This is just a matter of definition+documentation. There's no right or wrong here.)
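The two stacking definitions can be compared numerically. The sketch below is an illustration only: the `Bucket` and `Span` types and the `spans` function are hypothetical, and a month is approximated as 30 days and a year as 365 days.

```go
package main

import "fmt"

// Bucket is a hypothetical description of one retention tier.
type Bucket struct {
	Name string
	Days int // length of one interval in days (30 and 365 are approximations)
	Keep int // number of intervals to keep
}

// Span is the range of snapshot ages [From, To), in days, that a bucket covers.
type Span struct{ From, To int }

// spans computes the age range covered by each bucket. Buckets must be
// ordered from the shortest to the longest interval.
func spans(buckets []Bucket, consecutive bool) []Span {
	out := make([]Span, len(buckets))
	start := 0
	for i, b := range buckets {
		out[i] = Span{From: start, To: start + b.Days*b.Keep}
		if consecutive {
			start = out[i].To // the next bucket begins where this one ends
		}
	}
	return out
}

func main() {
	cfg := []Bucket{
		{"daily", 1, 7},
		{"weekly", 7, 4},
		{"monthly", 30, 6},
		{"yearly", 365, 5},
	}
	for _, consecutive := range []bool{false, true} {
		fmt.Println("consecutive:", consecutive)
		for i, s := range spans(cfg, consecutive) {
			fmt.Printf("  %-8s days %4d to %4d\n", cfg[i].Name, s.From, s.To)
		}
	}
}
```

With consecutive stacking the yearly bucket ends at day 2040, roughly 5.6 years back, which matches the "more than 5½ years" estimate. With simultaneous stacking the weekly bucket spans days 0 to 28, of which the first 7 overlap the daily bucket.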

  2. How do we handle rotating a snapshot from one bucket to the next?

This is a purely algorithmic problem: matching a list of snapshots (with creation timestamp) to the bucket-list. I've matched a drawing to your configuration (same color scheme as above):

[Figure: bucket-aging]

  • Here, we start with 6 daily backups (a, b, c, d, e and f).
  • 1d later we create backup g. The oldest daily backup (a) is not yet in the weekly bucket.
  • That happens the next day (2d on the y axis), where a "rolls into" the next bucket.
  • At that point the first weekly-bucket is empty, so a stays.
  • On day 3, we create backup i, and b rolls into the first weekly-bucket (which is still occupied by a). So b gets deleted.
  • This continues until day 9, where a rolls into the 2nd weekly-bucket and frees the 1st bucket for h.

I might have overlooked something, but this should also cover the case when backups are created more than once daily (the scale is just smaller).

Rolling from the weekly-bucket into the monthly-bucket applies the same principle.

It should also gracefully handle the case where a backup is missing (which would be represented as a "hole" in the drawing).
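One possible stateless reading of the rolling scheme above: lay the buckets out consecutively, split each into slots, and on every prune run keep only the oldest snapshot in each slot. The sketch below assumes this interpretation; the `Bucket`, `Slot`, `slots`, `slotOf` and `prune` names are hypothetical, and ages are given in whole days for brevity.

```go
package main

import "fmt"

// Bucket is a hypothetical retention tier: Keep slots of Days days each.
type Bucket struct{ Days, Keep int }

// Slot covers snapshot ages in the half-open range [From, To), in days.
type Slot struct{ From, To int }

// slots lays the buckets out consecutively and expands them into slots.
func slots(buckets []Bucket) []Slot {
	var out []Slot
	start := 0
	for _, b := range buckets {
		for i := 0; i < b.Keep; i++ {
			out = append(out, Slot{start, start + b.Days})
			start += b.Days
		}
	}
	return out
}

// slotOf returns the index of the slot containing age, or -1 if the
// snapshot is older than every slot.
func slotOf(age int, sl []Slot) int {
	for i, s := range sl {
		if age >= s.From && age < s.To {
			return i
		}
	}
	return -1
}

// prune splits snapshot ages (in days) into those to keep and those to drop.
// Within each slot only the oldest snapshot survives.
func prune(ages []int, sl []Slot) (keep, drop []int) {
	winner := make(map[int]int) // slot index -> index into ages
	for i, age := range ages {
		s := slotOf(age, sl)
		if s < 0 {
			continue
		}
		if w, ok := winner[s]; !ok || age > ages[w] {
			winner[s] = i
		}
	}
	for i, age := range ages {
		if w, ok := winner[slotOf(age, sl)]; ok && w == i {
			keep = append(keep, age)
		} else {
			drop = append(drop, age)
		}
	}
	return keep, drop
}

func main() {
	sl := slots([]Bucket{{Days: 1, Keep: 7}, {Days: 7, Keep: 4}})
	// "Day 3" of the walkthrough: i is 0 days old, ..., b is 7, a is 8.
	keep, drop := prune([]int{0, 1, 2, 3, 4, 5, 6, 7, 8}, sl)
	fmt.Println(keep, drop) // prints "[0 1 2 3 4 5 6 8] [7]"
}
```

In the "day 3" situation, a (8 days old) and b (7 days old) share the first weekly slot, so b is dropped, as in the walkthrough. A missing backup simply leaves its slot empty, and nothing else changes.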

ngharo commented on May 29, 2024

Wow! Thanks for the feedback.

Let me know what you think of #3 so far. You can see how it wouldn't allow for arbitrary time durations from the user and how all buckets start simultaneously. It's really stupid simple (maybe too simple...). It's a straight port of how borg backup does pruning. I thought it was a really clever use of time string formatting.
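For readers unfamiliar with the idea, here is a rough sketch of period-key pruning in the spirit of what is described above. It is not the code from #3, and it is only loosely modeled on borg's behaviour: each rule formats a timestamp into a period key, and walking from newest to oldest, the first snapshot seen in each new period is kept until the rule's count is reached. All names (`rule`, `keepSet`, `dayKey`, `weekKey`, `januaryDays`) are hypothetical. Go's time layouts have no week-number verb, so the weekly key uses `ISOWeek` instead of `Format`.

```go
package main

import (
	"fmt"
	"sort"
	"time"
)

// rule pairs a keep count with a function that maps a timestamp to a
// period key; snapshots with equal keys fall into the same period.
type rule struct {
	keep int
	key  func(time.Time) string
}

func dayKey(t time.Time) string { return t.Format("2006-01-02") }

func weekKey(t time.Time) string {
	y, w := t.ISOWeek()
	return fmt.Sprintf("%04d-W%02d", y, w)
}

// keepSet returns the snapshots to keep, newest first. Everything else
// would be a candidate for pruning.
func keepSet(snaps []time.Time, rules []rule) []time.Time {
	sorted := append([]time.Time(nil), snaps...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i].After(sorted[j]) })

	kept := make([]bool, len(sorted))
	for _, r := range rules {
		last, n := "", 0
		for i, s := range sorted {
			if n >= r.keep {
				break
			}
			k := r.key(s)
			if k == last {
				continue // same period as a newer snapshot
			}
			last = k
			if !kept[i] { // snapshots kept by an earlier rule do not count
				kept[i] = true
				n++
			}
		}
	}

	var out []time.Time
	for i, s := range sorted {
		if kept[i] {
			out = append(out, s)
		}
	}
	return out
}

// januaryDays builds one sample snapshot per day in January 2024 (UTC).
func januaryDays(from, to int) []time.Time {
	var out []time.Time
	for d := from; d <= to; d++ {
		out = append(out, time.Date(2024, time.January, d, 12, 0, 0, 0, time.UTC))
	}
	return out
}

func main() {
	rules := []rule{{keep: 3, key: dayKey}, {keep: 2, key: weekKey}}
	for _, s := range keepSet(januaryDays(1, 10), rules) {
		fmt.Println(s.Format("2006-01-02"))
	}
}
```

With ten daily snapshots from January 1 to 10, 2024, the daily rule keeps the 10th, 9th and 8th, and the weekly rule adds the 7th (the newest snapshot of the previous ISO week). Because all rules scan from the newest snapshot, the buckets start simultaneously, as noted above.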

dmke commented on May 29, 2024

@ngharo, how's it coming? Do you need help?

ngharo commented on May 29, 2024

Hey @dmke. I haven't had a lot of time to sit down and focus on this. Crazy days we're living in. Hope to get back into it soon.

Hope you and yours are doing well.

dmke commented on May 29, 2024

Crazy days indeed. Don't worry too much about this project, it's not important at all.
