
feluxe / very-hungry-pi


Turn your Raspberry Pi into an independent backup module for your network.

License: GNU Affero General Public License v3.0

Python 97.62% Shell 2.38%
raspberry-pi backup snapshot rsync raspberrypi python automated network rsync-backups

very-hungry-pi's Introduction

Very Hungry Pi


News

Version 3 released.

I'm happy to announce Version 3 of vhpi.

  • There are NO breaking changes in the config.
  • You can now add remote backup sources (e.g. [email protected]:/home/user), so you don't have to use NFS anymore.
  • vhpi now runs as a single long-running process, so there is no need to configure cronjobs for it anymore.
  • The rsync_options: "..." config parser was a little fragile. It's solid now.

See CHANGELOG.md for more information.

Version 2 (beta) released.

I'm happy to announce Version 2 of vhpi. It's an entire rewrite. There is now a vhpi package on PyPI and a simple command line interface to run vhpi more conveniently. There are some minor breaking changes in the config. The most important thing to notice, if you upgrade from v1 to v2, is that the snapshot directories have a new naming convention: monthly.1 would now be 2017-10-11__02:07:03__monthly.1. The timestamp tells you when the backup finished. If you want to keep using your current snapshots with v2, you should rename them accordingly. See CHANGELOG.md for more information.
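If you'd rather script the rename than do it by hand, a rough sketch of the idea (a hypothetical helper, not an official migration tool; since the real finish time of an old backup isn't recorded, it approximates it with each directory's mtime):

```python
import re
from datetime import datetime
from pathlib import Path

def v2_snapshot_name(old_name, finished_at):
    """Turn a v1 snapshot name like 'monthly.1' into the v2 convention,
    e.g. '2017-10-11__02:07:03__monthly.1'."""
    if not re.fullmatch(r'[a-zA-Z-]+\.\d+', old_name):
        raise ValueError(f'not a v1 snapshot name: {old_name}')
    return finished_at.strftime('%Y-%m-%d__%H:%M:%S') + '__' + old_name

def rename_v1_snapshots(backup_dst):
    """Rename every v1 snapshot dir in `backup_dst`, using each dir's
    mtime as an approximation of when the backup finished."""
    for entry in Path(backup_dst).iterdir():
        if entry.is_dir():
            try:
                new_name = v2_snapshot_name(
                    entry.name, datetime.fromtimestamp(entry.stat().st_mtime))
            except ValueError:
                continue  # already v2-style or some unrelated dir
            entry.rename(entry.with_name(new_name))
```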

Contents

Description

With vhpi you can turn your Raspberry Pi into a silent backup module for your network. Vhpi creates incremental snapshot backups of local directories or of remote directories over SSH. Vhpi runs entirely 'server-side'; clients only need to provide SSH access for the vhpi-box. Vhpi uses battle-proven tools like rsync to create the backups and cp to create hard links for the snapshots. To give you the most control over the backups, vhpi takes raw rsync options for configuration. Vhpi writes two log files: one with a short overview of the entire process (info.log example) and one for debugging (debug.log example).

TL;DR: Just set up vhpi, run your Pi 24/7 and never worry about backups again.

Features

  • Vhpi works with any rsync command you like. This gives you a wide and well-documented variety of configuration options for your backup.
  • You can create multiple exclude-lists to exclude files/dirs from the backup. (See 'exclude_lib' in Example Config)
  • Vhpi creates snapshots for any time interval you like (e.g. 'hourly', 'daily', 'weekly', 'monthly', 'each-4-hours', 'half-yearly', etc.). Just add the interval name and its duration in seconds to the config. (See 'intervals' in Example Config)
  • You can set the number of snapshots you want to keep for each used interval. E.g. if you want to keep 3 snapshots for the 'hourly' interval, you get three snapshot dirs: hourly.0, hourly.1, hourly.2. Each snapshot reaches an hour further into the past.
  • Snapshots require a minimum of disk space:
    • because the backups are created incrementally.
    • because vhpi creates new snapshots as hard links for all files that haven't changed. (No duplicate files, just links.)
  • The process is nicely logged ('info.log', 'debug.log').
  • If a backup process takes a long time, vhpi blocks any attempt to start a new backup process until the first one has finished, to prevent the Pi from overloading.
  • More features are planned. (See: Version Overview)
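The hard-link snapshot rotation described above is the classic rsync/cp snapshot pattern. Here is a minimal pure-Python sketch of the idea (illustrative only, not vhpi's actual code; vhpi uses rsync, and the `rotate_and_snapshot` name is made up here):

```python
import os
import shutil
from pathlib import Path

def rotate_and_snapshot(src, dst_root, interval='hourly', keep=3):
    """Rotate <interval>.N snapshot dirs, then copy `src` into
    <interval>.0, hard-linking files that are unchanged since the
    previous snapshot so unchanged data is never stored twice."""
    src, dst_root = Path(src), Path(dst_root)
    dst_root.mkdir(parents=True, exist_ok=True)
    # Rotate: delete the oldest snapshot, shift the rest up by one index.
    oldest = dst_root / f'{interval}.{keep - 1}'
    if oldest.exists():
        shutil.rmtree(oldest)
    for i in range(keep - 2, -1, -1):
        snap = dst_root / f'{interval}.{i}'
        if snap.exists():
            snap.rename(dst_root / f'{interval}.{i + 1}')
    prev = dst_root / f'{interval}.1'  # previous snapshot, if any
    new = dst_root / f'{interval}.0'
    for dirpath, _dirnames, filenames in os.walk(src):
        rel = Path(dirpath).relative_to(src)
        (new / rel).mkdir(parents=True, exist_ok=True)
        for name in filenames:
            s = Path(dirpath) / name
            old = prev / rel / name
            target = new / rel / name
            st = s.stat()
            if (old.exists() and old.stat().st_size == st.st_size
                    and old.stat().st_mtime == st.st_mtime):
                os.link(old, target)     # unchanged: hard link, no new data
            else:
                shutil.copy2(s, target)  # new or changed: real copy
```

The upshot is that each `hourly.N` dir looks like a full backup, but files shared between snapshots occupy disk space only once.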

Requirements:

  • You need Python >= 3.9 on your Pi for vhpi to run. (How to install Python3.x on your Pi)
  • The file system of your backup destination has to support hard links. (Most common file systems like NTFS and ext do.)
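If you are unsure whether your destination filesystem supports hard links, you can probe it (an illustrative helper, not part of vhpi):

```python
import os
import tempfile

def supports_hardlinks(directory):
    """Return True if the filesystem containing `directory` supports
    hard links, by creating a temp file and trying to link it."""
    with tempfile.TemporaryDirectory(dir=directory) as tmp:
        src = os.path.join(tmp, 'probe')
        dst = os.path.join(tmp, 'probe-link')
        with open(src, 'w') as f:
            f.write('x')
        try:
            os.link(src, dst)
        except OSError:
            return False  # e.g. FAT32 and other fs without hard links
        return os.stat(src).st_ino == os.stat(dst).st_ino
```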

Installation & Configuration

Sharing sources with the Pi:

If you want to back up remote clients via vhpi, you have to make sure that each client can be reached via SSH from the vhpi-box.

Alternatively, you can share/export client directories with NFS or Samba. If you do so, you should use autofs or similar, so that your Pi automatically mounts the shared directories whenever a machine enters the network.

There is a tutorial on this in the wiki: How to share sources with your Raspberry Pi using NFS.

Download and Install:

The simplest way to install vhpi is with pip. You need Python >= 3.9 for vhpi to run. (How to install Python3.x on your Pi) After you have installed Python 3.9, you can use pip to install vhpi like this:

$ pip3.9 install vhpi

Run this command to check if vhpi was installed successfully:

$ vhpi --help

It should print the help text to the terminal.

Configure vhpi:

When you run vhpi for the first time, it creates a config dir at ~/.config/vhpi/; you'll find a file called vhpi_cfg.yaml there. This is where you configure your backups. The config file is pretty self-explanatory; just have a look at the Example Config.

Test the configuration

In order to test vhpi, I suggest setting up some dummy backup sources that point to some safe destinations, e.g. in the /tmp dir. Then run the following command a couple of times and see if the destination gets filled with backups/snapshots:

$ vhpi run

If you get an error, try to adjust the config. If you think there is a bug, feel free to use the GitHub issue tracker! The results of each run are written to the log files as well (~/.config/vhpi/debug.log and ~/.config/vhpi/info.log).

Example Config

~/.config/vhpi/vhpi_cfg.yaml

# IMPORTANT: If you use paths that contain spaces, make sure to escape them
# with \ (backslash). The same applies to exclude items.

# Basic App Settings:
app_cfg:
    # Create different lists of files/dirs that you want to exclude from your
    # backups.
    exclude_lib:
        standard_list:
            [
                lost+found/*,
                .cache/chromium/*,
                .mozilla/firefox/*/Cache,
                .cache/thumbnails/*,
                .local/share/Trash/*,
            ]
        another_list: [some_dir]
    # Define time intervals, which you may use for your snapshots.
    # Feel free to use your own definitions like 'every_four_hours: 14400' etc.
    # Values must be in Seconds.
    intervals:
        {
            hourly: 3600,
            six-hourly: 21600,
            daily: 86400,
            weekly: 604800,
            monthly: 2592000,
            yearly: 31536000,
        }

# Backup Jobs Config.
# Configure each backup source here:
jobs:
    # Source 1:
    - name: "Dummy Source"
      source_ip: "192.168.178.20" # The IP of the computer to which the mounted src dir belongs. If it's a local source, use "127.0.0.1" or "localhost".
      rsync_src: "/tmp/tests/dummy_src/src1/" # The path to the mounted or local dir.
      rsync_dst: "/tmp/tests/dummy_dest/dest1/" # The path to the destination dir in which each snapshot is created.
      rsync_options: "-aAHSvX --delete" # The options that you want to use for your rsync backup. Default is "-av". More info on rsync: http://linux.die.net/man/1/rsync
      exclude_lists: # Add exclude lists to exclude a list of file/folders. See above: app_cfg -> exclude_lib
          [standard_list, another_list]
      excludes: # Add additional source specific exclude files/dirs that are not covered by the exclude lists.
          [downloads, tmp]
      snapshots: # Define how many snapshots you want to keep for each interval. Older snapshots are deleted automatically.
          hourly: 6
          six-hourly: 4
          daily: 7
          weekly: 4
          monthly: 6
          yearly: 6

    # Source 2:
    # - name: 'Another Dummy Source'
    #  source_ip: 192.168.178.36
    # etc...

very-hungry-pi's People

Contributors

feluxe

Forkers

branflakem3
very-hungry-pi's Issues

Refactor log output

Use this format for the single line log entries.

2017-12-20 00:00:07 192.168.178.10  [/home/peter/.config/························] Due:1  Skipped: No due jobs.
2017-12-20 00:00:07 192.168.178.101 [/home/peter/.config/························] Due:14 Skipped: Machine Offline.
2017-12-20 00:00:07 192.168.178.10  [/home/peter/.config/························] Due:4  Error: Machine online but src dir not available.

Shorten path block as well.

Use single line log entry for "Backup Source not available Error"

2017-12-20 18:00:09 [Completed] after: 00:00:02 (h:m:s) 
2017-12-20 19:00:06 [Skipped] [192.168.178.30 ] [/import/peter-desktop/peter/home/·················] [Source offline] [Due: daily, three-hourly]
2017-12-20 19:00:06 [Executing] 192.168.178.20	/import/peter-laptop/peter/home/

	Due: three-hourly, daily

	[Rsync Log]
	rsync: change_dir "/import/peter-laptop/peter/home" failed: No such file or directory (2)
	2017-12-20 19:00:06 Error: Backup Source not available.

2017-12-20 19:00:06 [Skipped] Rsync Execution Failed.
2017-12-20 19:00:06 [Skipped] [192.168.178.10 ] [/home/peter/.config/······························] [Source online ] [No due jobs]
2017-12-20 20:00:06 [Skipped] [192.168.178.30 ] [/import/peter-desktop/peter/home/·················] [Source offline] [Due: daily, three-hourly]

Make a single-line log output for this. In case a machine is not configured properly, this error repeats very often and pollutes the log.

Solve #11 first.

Add dedupe feature.

Dupe Replacement Feature

Vhpi already creates snapshots using hard links, but vhpi doesn't know when files are moved around within a source directory. Moved files are backed up into a new snapshot as if they were new files. The dedupe feature lets vhpi search for duplicate files among the snapshots of each backup source and replace them with hard links, to keep the backup as slim as possible. I don't know (yet) if the Pi can handle the amount of overhead, though. I think it would make most sense to add config options that allow limiting the amount of work that has to be done to find dupes. E.g. filter out all files that are smaller than 10MB, then search for dupes. Or only search for dupes among snapshots that last a while, like 'monthly' and 'yearly' snapshots.

dedupe_min_file_size: xxx   # Files smaller than this will be excluded from the dedupe process.
dedupe_snaps: ['monthly', 'yearly', ..]  # Dedupe will only run on the snapshots listed here.
dedupe_interval: 'weekly'   # Define the dedupe interval.

This feature should be totally optional.

Brainstorming

  • Search duplicate files across all snapshots via fdupes for each backup source. Only absolutely identical files with the same permissions, timestamps, etc. are dupes.
  • Delete all duplicates and replace them with hard links. Keep only one file for each dupe-group.
  • Add a config option to let the user set a custom interval for dupe removal.
  • Add a config option to define a minimum file size; only files bigger than the set value are included in dupe removal. (Dupe removal does not make sense for small files.)
  • Add a config option to define which types of snapshots are included in dupe removal. (Dupe removal makes most sense for snapshots that last long.)
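A rough sketch of how such a dedupe pass might look (purely hypothetical code, not anything vhpi ships; grouping by size/mode/mtime first keeps the expensive hashing to a minimum):

```python
import hashlib
import os
from collections import defaultdict
from pathlib import Path

def dedupe(snapshots_root, min_size=10 * 1024 * 1024):
    """Replace duplicate files under `snapshots_root` with hard links.
    Files count as dupes only if size, mode, mtime and content match."""
    groups = defaultdict(list)
    for dirpath, _dirs, files in os.walk(snapshots_root):
        for name in files:
            p = Path(dirpath) / name
            st = p.lstat()
            if not p.is_symlink() and st.st_size >= min_size:
                # Key on cheap metadata first; hash only within groups.
                groups[(st.st_size, st.st_mode, int(st.st_mtime))].append(p)
    for paths in groups.values():
        if len(paths) < 2:
            continue
        by_hash = defaultdict(list)
        for p in paths:
            # Note: reads the whole file into memory; a real tool
            # would hash in chunks.
            by_hash[hashlib.sha256(p.read_bytes()).hexdigest()].append(p)
        for dupes in by_hash.values():
            keep, rest = dupes[0], dupes[1:]
            for p in rest:
                if p.stat().st_ino != keep.stat().st_ino:
                    p.unlink()
                    os.link(keep, p)  # replace the copy with a hard link
```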
