m-manu / rsync-sidekick

Propagate file renames, movements and timestamp changes before rsync runs

License: Apache License 2.0

rsync-sidekick's Introduction

rsync-sidekick

Introduction

rsync is a fantastic tool. Yet, by itself, it's a pain to use for repeated backing up of media files (videos, music, photos, etc.) that are reorganized frequently.

rsync-sidekick is a safe and simple tool that is designed to be run before rsync.

What does this do?

rsync-sidekick propagates the following changes (or any combination of them) from the source directory to the destination directory:

  1. Change in file modification timestamp
  2. Rename of file/directory
  3. Moving a file from one directory to another

Note:

  • This tool does not delete any files or folders (under any circumstances) -- that's why it is safe to use 😌
    • Your files are just moved around
    • Now, if you're uncomfortable with this tool even moving your files around, there is a --shellscript option that just generates a script for you to read and run (think of it as a --dry-run option)
  • This tool does not actually transfer files -- that's for rsync to do 🙂
  • Since you'd run rsync after this tool is run, any changes that this tool couldn't propagate would just be propagated by rsync
    • So the most that you might lose is some time with rsync doing more work than it could have -- which is likely still much less than not using this tool at all 😄
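
To illustrate the idea of propagating a rename or move, here is a minimal Go sketch. It is not the tool's actual implementation. It assumes each side has already been scanned into a map from relative path to some content digest; the `move` type, the `findMoves` function and the digest strings are hypothetical names used only for this example.

```go
package main

import (
	"fmt"
	"sort"
)

// move is a hypothetical record of one rename to perform at the destination.
type move struct {
	From, To string
}

// findMoves takes two maps of relative path -> content digest and returns the
// renames that would make the destination layout match the source layout.
// It never proposes a move onto a path that already exists at the destination.
func findMoves(src, dst map[string]string) []move {
	// Destination files whose path does not hold the same content in the
	// source are candidates for being moved elsewhere.
	candidates := map[string][]string{} // digest -> destination paths
	for path, digest := range dst {
		if src[path] != digest {
			candidates[digest] = append(candidates[digest], path)
		}
	}
	for _, paths := range candidates {
		sort.Strings(paths) // deterministic choice among duplicates
	}

	srcPaths := make([]string, 0, len(src))
	for path := range src {
		srcPaths = append(srcPaths, path)
	}
	sort.Strings(srcPaths)

	var moves []move
	for _, path := range srcPaths {
		if _, exists := dst[path]; exists {
			continue // already in place, or a different file we must not clobber
		}
		digest := src[path]
		if c := candidates[digest]; len(c) > 0 {
			moves = append(moves, move{From: c[0], To: path})
			candidates[digest] = c[1:]
		}
	}
	return moves
}

func main() {
	src := map[string]string{"2021/beach.jpg": "d1", "notes.txt": "d2"}
	dst := map[string]string{"beach.jpg": "d1", "notes.txt": "d2"}
	for _, m := range findMoves(src, dst) {
		fmt.Printf("mv %q %q\n", m.From, m.To) // mv "beach.jpg" "2021/beach.jpg"
	}
}
```

Anything the sketch cannot match is simply left alone, which mirrors the note above: rsync picks up whatever was not propagated.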

How to install?

  1. Install Go 1.19 or later
    • On Ubuntu: snap install go
    • On Mac: brew install go
    • For anything else: Go downloads page
  2. Run command:
    go install github.com/m-manu/rsync-sidekick@latest
  3. Add the following line to your .bashrc/.zshrc file:
    export PATH="$PATH:$HOME/go/bin"

How to use?

Step 1: Run this tool

rsync-sidekick /Users/manu/Photos/ /Volumes/Portable/Photos/

Step 2: Run rsync as you would normally do

# Note the trailing slashes below. Without them, rsync's behavior is different!
rsync -av /Users/manu/Photos/ /Volumes/Portable/Photos/ 

Command line options

Running rsync-sidekick --help displays the following information:

rsync-sidekick is a tool to propagate file renames, movements and timestamp changes from a source directory to a destination directory.

Usage:
	 rsync-sidekick <flags> [source-dir] [destination-dir]

where,
	[source-dir]        Source directory
	[destination-dir]   Destination directory

flags: (all optional)
  -x, --exclusions string            path to file containing newline separated list of file/directory names to be excluded
                                     (even if this is not set, files/directories such these will still be ignored: $RECYCLE.BIN, desktop.ini, Thumbs.db etc.)
  -h, --help                         display help
      --list                         list files along their metadata for given directory
  -s, --shellscript                  instead of applying changes directly, generate a shell script
                                     (this flag is useful if you want 'dry run' this tool or want to run the shell script as a different user)
  -p, --shellscript-at-path string   similar to --shellscript option but you can specify output script path
                                     (this flag cannot be specified if --shellscript option is specified)
  -v, --verbose                      generates extra information, even a file dump (caution: makes it slow!)
      --version                      show application version (v1.5.0) and exit

More details here: https://github.com/m-manu/rsync-sidekick

Running this from a Docker container

Below is a simple example:

# Run rsync-sidekick:
docker run --rm -v /Users/manu:/mnt/homedir manumk/rsync-sidekick rsync-sidekick /mnt/homedir/Photos/ /mnt/homedir/Photos_backup/

# Then run rsync: (note the trailing slashes -- without them, rsync's behavior is different)
docker run --rm -v /Users/manu:/mnt/homedir manumk/rsync-sidekick rsync /mnt/homedir/Photos/ /mnt/homedir/Photos_backup/

FAQs

Why was this tool created?

rsync options such as --detect-renamed, --detect-renamed-lax, --detect-moved and --fuzzy don't work reliably and are sometimes dangerous! rsync-sidekick is a reliable alternative to all these options and much more.

How will I benefit from using this tool?

Using rsync-sidekick before rsync makes your backup process significantly faster than using only rsync. Sometimes this performance benefit can even be 100x 😲, if the only changes at your source directory are the 3 types mentioned earlier in this article.

rsync-sidekick's Issues

Suggestions

First off, let me say that this is an amazing tool, definitely saving a lot of time on rsync operations.

I have two suggestions:

  1. Allow the user to set the output .sh name like: --shellscript my_shell.sh
  2. Order by path ascending when defining the operations inside the output shell script. This has no practical effect; I just thought I was doing something wrong when I:
    • Generated a .sh file
    • Renamed one file by hand on dest
    • Generated a second .sh file
    • Compared both .sh files in WinMerge: they were 95% different, but only the ordering differed
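
The second suggestion amounts to sorting the actions before they are written out. Below is a minimal Go sketch of that idea, assuming each action can be reduced to a pair of paths. The `moveAction` type and `sortActions` function are hypothetical names for illustration, not rsync-sidekick's actual types.

```go
package main

import (
	"fmt"
	"sort"
)

// moveAction is a hypothetical stand-in for one "mv" line in the generated script.
type moveAction struct {
	From, To string
}

// sortActions orders actions by source path, then by destination path,
// so that two runs over the same inputs emit their lines in the same order.
func sortActions(actions []moveAction) {
	sort.Slice(actions, func(i, j int) bool {
		if actions[i].From != actions[j].From {
			return actions[i].From < actions[j].From
		}
		return actions[i].To < actions[j].To
	})
}

func main() {
	actions := []moveAction{
		{From: "b/2.jpg", To: "c/2.jpg"},
		{From: "a/1.jpg", To: "z/1.jpg"},
	}
	sortActions(actions)
	for _, a := range actions {
		fmt.Printf("mv -v %q %q\n", a.From, a.To)
	}
}
```

With a stable order, a diff between two generated scripts would show only the operations that actually changed.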

three minor quirks in a very useful script

Thank you so much for this script, I have been needing something like this for a long time!
It works very well, but leaves me with these observations:

  1. When one of the disks goes away, the script seems to hang (even when the disk comes back). This happened to me with a network drive that got unmounted. The script continues to print progress output lines, but the percentage is stuck.
  2. The files in a directory that has been moved on source are indeed moved to the new location on dest, but the original directory on dest stays behind, empty. Is it possible to move the whole dir on dest?
  3. Some small files don't get moved with the rest of their folder (e.g. Folder.jpg, AlbumArt...jpg, although they're not in the default exclude list).

Thanks!

Feature request: Give version information

Please add version information on the --help output and/or add --version option.

Currently I'm getting this:

user@host ~ $ rsync-sidekick --help
rsync-sidekick is a tool to propagate file renames, movements and timestamp changes from a source directory to a destination directory.

Usage:
         rsync-sidekick <flags> [source-dir] [destination-dir]

where,
        source-dir        Source directory
        destination-dir   Destination directory

flags: (all optional)
  -exclusions string
        path to file containing newline separated list of file/directory names to be excluded
        (if this is not set, by default these will be ignored: $RECYCLE.BIN, desktop.ini, Thumbs.db etc.)
  -extrainfo
        generate extra information (caution: makes it slow!)
  -help
        display help
  -list string
        list files along their metadata for given directory
  -shellscript
        instead of applying changes directly, generate a shell script
        (this flag is useful if you want 'dry run' this tool or want to run the shell script as a different user)

More details here: https://github.com/m-manu/rsync-sidekick
user@host ~ $

To figure out which version the currently installed binary is I've used strings:

user@host ~ $ strings go/bin/rsync-sidekick |grep 1.3.0
mod     github.com/m-manu/rsync-sidekick        v1.3.0  h1:muxYvIFeszJbN1Tu+VHz0OBoqG37hAoBO8r8dlyayi0=
/home/user/go/pkg/mod/github.com/m-manu/rsync-sidekick@v1.3.0/entity/map_string_to_file_digest.go
/home/user/go/pkg/mod/github.com/m-manu/rsync-sidekick@v1.3.0/entity/multimap_file_digest_to_string.go
/home/user/go/pkg/mod/github.com/m-manu/rsync-sidekick@v1.3.0/fmte/fmt_english.go
/home/user/go/pkg/mod/github.com/m-manu/rsync-sidekick@v1.3.0/bytesutil/human_readable_size.go
/home/user/go/pkg/mod/github.com/m-manu/rsync-sidekick@v1.3.0/utils/utils.go
/home/user/go/pkg/mod/github.com/m-manu/rsync-sidekick@v1.3.0/entity/string_set.go
/home/user/go/pkg/mod/github.com/m-manu/rsync-sidekick@v1.3.0/action/action.go
/home/user/go/pkg/mod/github.com/m-manu/rsync-sidekick@v1.3.0/action/make_directory_action.go
/home/user/go/pkg/mod/github.com/m-manu/rsync-sidekick@v1.3.0/action/move_file_action.go
/home/user/go/pkg/mod/github.com/m-manu/rsync-sidekick@v1.3.0/action/propagate_timestamp_action.go
/home/user/go/pkg/mod/github.com/m-manu/rsync-sidekick@v1.3.0/service/file_hash.go
/home/user/go/pkg/mod/github.com/m-manu/rsync-sidekick@v1.3.0/service/find_files.go
/home/user/go/pkg/mod/github.com/m-manu/rsync-sidekick@v1.3.0/service/sync.go
/home/user/go/pkg/mod/github.com/m-manu/rsync-sidekick@v1.3.0/main.go
/home/user/go/pkg/mod/github.com/m-manu/rsync-sidekick@v1.3.0/rsync_sidekick.go
mod     github.com/m-manu/rsync-sidekick        v1.3.0  h1:muxYvIFeszJbN1Tu+VHz0OBoqG37hAoBO8r8dlyayi0=
user@host ~ $

"go get" seems to fail upfront

Hello there, finally happy to find someone like-minded who put in the effort to actually write a program that was in my mind for a long time! :)

I am not an expert in go, and was giving a shot at your program but am having an issue while trying to install:
me@host:~$ go get github.com/m-manu/rsync-sidekick
package embed: unrecognized import path "embed" (import path does not begin with hostname)
package io/fs: unrecognized import path "io/fs" (import path does not begin with hostname)

After this the script seems to keep going on, and after a few seconds it returns without any message, but I cannot invoke nor find any rsync-sidekick executable.

The test system is quite "old" but regularly updated, and I found the go version is in the same shape (many versions behind):
me@host:~$ go version
go version go1.10.4 linux/amd64

Any hints on how to make your program run on this platform?
Thanks in advance 😉

Incorrect change of timestamp to file with same name

$ rsync-sidekick --version
v1.5.0
  1. Create 1 MiB binary zeros sparse files in source and destination directories.
  2. Modify one byte of the source file at 10 KiB position.
$ mkdir source destination
$ truncate --size 1M source/file destination/file
$ printf '\x01' | dd of=source/file bs=1 count=1 seek=10K conv=notrunc status=none
$ dd if=source/file bs=1 count=1 skip=10K status=none | hd
00000000  01
$ dd if=destination/file bs=1 count=1 skip=10K status=none | hd
00000000  00

rsync-sidekick incorrectly propagates the modification timestamp of the source to the destination file. Because of this, rsync does not copy the source file to the destination, because timestamps are the same:

$ rsync-sidekick source/ destination/
Found 1 actions that can save you 1.00 MiB of files transfer!
Applying sync actions at destination...
   1/1 propagate timestamp of "source/file" to "file":
done
$ rsync -av source/ destination/
sending incremental file list

sent 77 bytes  received 12 bytes  178.00 bytes/sec

rsync-sidekick must not do anything, so that rsync copies the source file to the destination:

$ mkdir source destination
$ truncate --size 1M source/file destination/file
$ printf '\x01' | dd of=source/file bs=1 count=1 seek=10K conv=notrunc status=none
$ rsync -av source/ destination/
sending incremental file list
file

sent 1,048,948 bytes  received 35 bytes  2,097,966.00 bytes/sec

If the file is modified at 1 KiB position, the issue does not happen:

$ mkdir source destination
$ truncate --size 1M source/file destination/file
$ printf '\x01' | dd of=source/file bs=1 count=1 seek=1K conv=notrunc status=none
$ rsync-sidekick source/ destination/
No sync actions found. You may run rsync.

Duplicate files added on the source side

One question.

Is rsync-sidekick able to optimize duplicate files that have been added by users to the source tree?
I guess not, because rsync-sidekick seems to use only mv for the rename action.

One suggestion to improve this in a simple way could be to use cp -al (hard-linking) instead of mv, perhaps as an option (-hardlink).

Example
Two identical files A and B exist in srcdir.
File C (also identical to A and B) already exists in dstdir.

Solution:

cp -al dstdir/C  dstdir/A
cp -al dstdir/C  dstdir/B
##  dstdir/C still exists but may be deleted later with rsync --delete, if this is wanted by the user

Note, this way a backup via sidekick plus rsync would even de-duplicate the required storage space on the target, without the need to run another tool afterwards, like the hardlink command.
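
For reference, the same hard-linking idea can be sketched in Go with os.Link. This is only an illustration of the suggestion, under the assumption that all paths live on one filesystem (hard links cannot cross filesystems). The `linkDuplicates` and `demo` functions are hypothetical names, not part of rsync-sidekick.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// linkDuplicates creates a hard link at each wanted path (relative to dstDir)
// pointing at the already-present file `existing`. Paths that already exist
// are left untouched, so nothing is overwritten or deleted.
func linkDuplicates(dstDir, existing string, wanted []string) error {
	src := filepath.Join(dstDir, existing)
	for _, w := range wanted {
		target := filepath.Join(dstDir, w)
		if _, err := os.Lstat(target); err == nil {
			continue // something is already there; leave it alone
		}
		if err := os.Link(src, target); err != nil {
			return fmt.Errorf("link %s -> %s: %w", target, src, err)
		}
	}
	return nil
}

// demo reproduces the A/B/C example in a temporary directory and reports
// whether A and C ended up being the same underlying file.
func demo() (bool, error) {
	dst, err := os.MkdirTemp("", "dstdir")
	if err != nil {
		return false, err
	}
	defer os.RemoveAll(dst)

	if err := os.WriteFile(filepath.Join(dst, "C"), []byte("same content\n"), 0o644); err != nil {
		return false, err
	}
	if err := linkDuplicates(dst, "C", []string{"A", "B"}); err != nil {
		return false, err
	}
	a, err := os.Stat(filepath.Join(dst, "A"))
	if err != nil {
		return false, err
	}
	c, err := os.Stat(filepath.Join(dst, "C"))
	if err != nil {
		return false, err
	}
	return os.SameFile(a, c), nil
}

func main() {
	same, err := demo()
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println("A and C share storage:", same)
}
```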

Incorrect change of timestamp to file with same content

$ rsync-sidekick --version
v1.5.0

Create this file structure:

  1. file-1 has the same content and modification timestamp in the source and destination directories.
  2. file-2 has different content and modification timestamp in the source and destination directories.
  3. file-1 has the same content as file-2 in the source directory.
$ mkdir source destination
$ echo "a" > source/file-1
$ cp --archive source/file-1 destination/
$ echo "b" > destination/file-2
$ echo "a" > source/file-2
$ stat --format "%n  %y" source/* destination/*
source/file-1       18:30:26
source/file-2       18:30:39
destination/file-1  18:30:26
destination/file-2  18:30:35
$ cat source/file-1 source/file-2 destination/file-1 destination/file-2
a
a
a
b

rsync-sidekick incorrectly propagates the modification timestamp of source file-2 to destination file-1. Because of this, rsync copies file-1 from source to destination, because the timestamps are different:

$ rsync-sidekick source/ destination/
Found 1 actions that can save you 2 B of files transfer!
Applying sync actions at destination...
   1/1 propagate timestamp of "source/file-2" to "file-1":
done
$ rsync -av source/ destination/
./
file-1
file-2

rsync-sidekick must not do anything, so that rsync does not copy file-1:

$ mkdir source destination
$ echo "a" > source/file-1
$ cp --archive source/file-1 destination/
$ echo "b" > destination/file-2
$ echo "a" > source/file-2
$ rsync -av source/ destination/
./
file-2

Support for remote locations (via ssh)

I would love to just use rsync-sidekick, but apparently it is restricted to cases where both the source and destination directories are local.

It would be extremely useful if it would support remote source or target. In the meantime, I created a little script (tailored to my needs) that does what I want (if I knew Go or would be interested in learning it, I would of course have just contributed to rsync-sidekick).

If someone is looking for something like that but with remote support, see here for some information and here for the gist, and adapt it to your needs.

I am currently not interested in working it out more than that, so would be happy if rsync-sidekick would subsume all the features so I can abandon my weekend project script for good.

Check if destination exists

Hi, I believe some safety checks should be added to not overwrite existing files on destination:

 // UnixCommand for moving or renaming a file
 func (a MoveFileAction) UnixCommand() string {
-       return fmt.Sprintf(`mv -v "%s" "%s"`, escape(a.sourcePath()), escape(a.destinationPath()))
+       return fmt.Sprintf(`mv -v -n "%s" "%s"`, escape(a.sourcePath()), escape(a.destinationPath()))
 }
 
 // Perform 'file move/rename' action
 func (a MoveFileAction) Perform() error {
-       return os.Rename(a.sourcePath(), a.destinationPath())
+       if _, err := os.Stat(a.destinationPath()); err == nil {
+               return fmt.Errorf(`file already exists "%s"`, a.destinationPath())
+       } else if errors.Is(err, os.ErrNotExist) {
+               return os.Rename(a.sourcePath(), a.destinationPath())
+       } else {
+               return err
+       }
 }
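
As a standalone illustration of the check proposed in the diff, here is a small runnable Go sketch. The `moveNoOverwrite` and `demo` functions are hypothetical names. Note that the existence check and the rename are two separate steps, so this is not atomic: another process could create the destination in between.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"path/filepath"
)

// moveNoOverwrite renames src to dst, but refuses when dst already exists.
// The check and the rename are separate system calls, so this is best-effort.
func moveNoOverwrite(src, dst string) error {
	_, err := os.Lstat(dst)
	switch {
	case err == nil:
		return fmt.Errorf("file already exists %q", dst)
	case errors.Is(err, os.ErrNotExist):
		return os.Rename(src, dst)
	default:
		return err
	}
}

// demo checks both outcomes in a temporary directory: a refused move onto an
// existing file, and a successful move onto a free name.
func demo() error {
	dir, err := os.MkdirTemp("", "move-demo")
	if err != nil {
		return err
	}
	defer os.RemoveAll(dir)

	a, b, c := filepath.Join(dir, "a"), filepath.Join(dir, "b"), filepath.Join(dir, "c")
	if err := os.WriteFile(a, []byte("A"), 0o644); err != nil {
		return err
	}
	if err := os.WriteFile(b, []byte("B"), 0o644); err != nil {
		return err
	}

	if err := moveNoOverwrite(a, b); err == nil {
		return errors.New("expected the move onto an existing file to be refused")
	}
	data, err := os.ReadFile(b)
	if err != nil {
		return err
	}
	if string(data) != "B" {
		return errors.New("existing destination file was modified")
	}
	return moveNoOverwrite(a, c) // c does not exist, so this should succeed
}

func main() {
	if err := demo(); err != nil {
		fmt.Fprintln(os.Stderr, "demo failed:", err)
		os.Exit(1)
	}
	fmt.Println("ok")
}
```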

Cannot install

Hi,

I installed go:

ll /usr/local/go
total 244
drwxr-xr-x 10 root root   4096 May 10 18:51 ./
drwxr-xr-x 11 root root   4096 May 18 11:57 ../
drwxr-xr-x  2 root root   4096 May 10 18:48 api/
-rw-r--r--  1 root root  56057 May 10 18:48 AUTHORS
drwxr-xr-x  2 root root   4096 May 10 18:50 bin/
-rw-r--r--  1 root root     52 May 10 18:48 codereview.cfg
-rw-r--r--  1 root root   1339 May 10 18:48 CONTRIBUTING.md
-rw-r--r--  1 root root 111408 May 10 18:48 CONTRIBUTORS
drwxr-xr-x  2 root root   4096 May 10 18:48 doc/
drwxr-xr-x  3 root root   4096 May 10 18:48 lib/
-rw-r--r--  1 root root   1479 May 10 18:48 LICENSE
drwxr-xr-x 12 root root   4096 May 10 18:48 misc/
-rw-r--r--  1 root root   1303 May 10 18:48 PATENTS
drwxr-xr-x  6 root root   4096 May 10 18:51 pkg/
-rw-r--r--  1 root root   1475 May 10 18:48 README.md
-rw-r--r--  1 root root    397 May 10 18:48 SECURITY.md
drwxr-xr-x 48 root root   4096 May 10 18:48 src/
drwxr-xr-x 27 root root  12288 May 10 18:48 test/
-rw-r--r--  1 root root      8 May 10 18:48 VERSION

go version
go version go1.18.2 linux/amd64

echo $PATH
/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin:/usr/local/go/bin

but the first time I run go install github.com/m-manu/rsync-sidekick@latest, I only get:

go: downloading github.com/m-manu/rsync-sidekick v1.3.0
go: downloading golang.org/x/text v0.3.7

On subsequent runs of go install github.com/m-manu/rsync-sidekick@latest, I get nothing.

and the go bin directory does not contain rsync-sidekick:

ll /usr/local/go/bin
total 17484
drwxr-xr-x  2 root root     4096 May 10 18:50 ./
drwxr-xr-x 10 root root     4096 May 10 18:51 ../
-rwxr-xr-x  1 root root 14542442 May 10 18:50 go*
-rwxr-xr-x  1 root root  3350528 May 10 18:50 gofmt*

What did I do wrong?

Several things...

I've been suffering years of "sigh, I'll just run rsync and suffer the extra hours" and I finally decided to do something about it. I always thought that a good approach would be some kind of a pre-script that crawls over the directories on both sides and performs the needed renames -- to be followed by the actual rsync. When I hacked most of this (as a simple bash script), I thought that it would be a good idea to post something about it so maybe someone will do it properly, so I went back to one of the old SO questions about it, and I was surprised that you've done this at almost the exact same time! (Well, three months ago, but on a scale of the >7 years that this question was posted, it can be considered the same time :)

Anyway, I've looked at your code very briefly (I don't know much go), and there are several important things that I think would make it be truly useful. Take the following as enthusiastic suggestions, but feel free to ignore them if it doesn't fit your goal...

  1. The very first thing needed to make people comfortable running unknown code that crawls over their precious media collections is a good explanation. You do say the important point -- that it doesn't delete anything -- but a brief description that this is guessing where files should move to before running rsync should put people more at ease.

  2. It should also mention that while files are not deleted, they are moved around. (Which should not scare people with the added explanation that I suggested above, and also see below.)

  3. Following the same line further, you should mention the option of doing nothing and generating a script instead. As a random viewer, I'd feel much more comfortable running code that will not do anything and instead spit out a script that I can inspect and run. (And BTW, it would be nice to be able to spit out the script instead of always dumping it into a file.)

  4. Yet more in this direction: you should really have some --dry-run mode that is similar but describing the operations it would have done instead.

  5. Finally (in the explanation section of this suggestion), saying that this is intended to run before an rsync would also be good to mention. Basically, the fact that even if there are some bugs or some unknown situations that it would mishandle, then the following rsync would straighten things out -- so the most that you might lose is some time with rsync doing more work than it could have. (Which is likely still much less than not using this thing at all.)

(The following are functionality suggestions...)

  1. Not clear to me whether you handle it well or not (looks like you don't), but dealing with file swapping gracefully would be good too. Something like swapping two (or more) existing files in the source tree, which requires a temporary file rename in the destination.

  2. Looks like the metadata that you're collecting for file identity is just the file size, which can be too weak. What I did in my bash hack is using the file size as well as the first+last N bytes of the file (and a middle block would be a good addition). This makes it much more reliable and still relatively fast, though slower than just the sizes since it requires actually opening and reading from the files. It could be placed behind a command-line flag for cases where you don't want to pay the extra penalty of actually doing contents IO, as a global decision of whether to include that in the metadata (possibly with an adjustable setting of what N you want to use, or even being able to specify a complete-file checksum).

  3. Something that you're not doing and is a blocker for me: looks like you're only handling two local directories. The main reason I finally gave up and did my hack was a sync of media to a remote machine. (I also do local syncs, but for that I don't mind the extra time as much.) This might require some more tweaking of the code, since you won't be able to propagate timestamp changes as easily. (My script doesn't do that at all, since that's much cheaper for rsync to do.)

    BTW, the way that I did this was to assume that the script lives in the target machine in the same place, and then to collect remote information it uses ssh remot "$0" ... so everything is still conveniently in one script.

  4. And lastly, it would be nice if it was easier to use it for random bystanders. For example, providing a statically-linked executable, and/or a script that wraps a docker usage, and/or a mode that runs both this and then rsync in one shot.
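
The file-swapping case from the first functionality suggestion can be sketched in Go as a three-step rename through a temporary name. This is an illustration only. The `swapFiles` function and the ".swap-tmp" suffix are hypothetical, and the sketch assumes both files are on the same filesystem.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// swapFiles exchanges the names of a and b via a temporary name next to a.
func swapFiles(a, b string) error {
	tmp := a + ".swap-tmp" // hypothetical suffix chosen for this sketch
	if _, err := os.Lstat(tmp); err == nil {
		return fmt.Errorf("temporary name %q is already taken", tmp)
	}
	if err := os.Rename(a, tmp); err != nil {
		return err
	}
	if err := os.Rename(b, a); err != nil {
		_ = os.Rename(tmp, a) // best-effort rollback of the first step
		return err
	}
	return os.Rename(tmp, b)
}

// demo swaps two files in a temporary directory and returns their contents afterwards.
func demo() (string, string, error) {
	dir, err := os.MkdirTemp("", "swap-demo")
	if err != nil {
		return "", "", err
	}
	defer os.RemoveAll(dir)

	a, b := filepath.Join(dir, "a.txt"), filepath.Join(dir, "b.txt")
	if err := os.WriteFile(a, []byte("A"), 0o644); err != nil {
		return "", "", err
	}
	if err := os.WriteFile(b, []byte("B"), 0o644); err != nil {
		return "", "", err
	}
	if err := swapFiles(a, b); err != nil {
		return "", "", err
	}
	gotA, err := os.ReadFile(a)
	if err != nil {
		return "", "", err
	}
	gotB, err := os.ReadFile(b)
	if err != nil {
		return "", "", err
	}
	return string(gotA), string(gotB), nil
}

func main() {
	a, b, err := demo()
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println(a, b) // B A
}
```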

Provide option to expand threshold for reading whole file into CRC32 hash

When calculating a file hash, files that are <= 16KiB are read in their entirety into a CRC32 fingerprint, while files > 16KiB have only their "crucial bytes" read, as explained in #1 (comment). I work with some files that are in the 50-100KiB range, and are particularly likely to have edits that don't change the file size or affect the critical bytes (they're kinda like csv but with space-padded cells, similar to markdown or org-mode tables). It would be great to just bump up the threshold for fingerprinting the whole file to like 200KiB. I'm using rsync-sidekick for archive so safety is more important than performance. Would it be possible to add an option for this?
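
To make the trade-off concrete, here is a minimal Go sketch of a fingerprint with a configurable whole-file threshold. It is not rsync-sidekick's actual hashing code. In particular, what counts as the "crucial bytes" is an assumption here (the first and last 4 KiB plus the length), and `fingerprint` and `sampleSize` are hypothetical names. It works on an in-memory byte slice to stay short.

```go
package main

import (
	"fmt"
	"hash/crc32"
)

// sampleSize is a hypothetical size for each sampled block.
const sampleSize = 4 * 1024

// fingerprint hashes data in full when it is at most threshold bytes long.
// Larger inputs are only sampled: first block, last block and total length.
func fingerprint(data []byte, threshold int) uint32 {
	if len(data) <= threshold {
		return crc32.ChecksumIEEE(data)
	}
	n := sampleSize
	if n > len(data) {
		n = len(data) // guard for very small thresholds
	}
	h := crc32.NewIEEE()
	h.Write(data[:n])
	h.Write(data[len(data)-n:])
	fmt.Fprintf(h, "%d", len(data))
	return h.Sum32()
}

func main() {
	a := make([]byte, 64*1024)
	b := make([]byte, 64*1024)
	b[32*1024] = 1 // an edit in the middle that keeps the size unchanged

	// With a 16 KiB threshold both inputs are only sampled, so the edit is missed.
	fmt.Println(fingerprint(a, 16*1024) == fingerprint(b, 16*1024)) // true

	// With a 200 KiB threshold both inputs are hashed in full, so the edit is detected.
	fmt.Println(fingerprint(a, 200*1024) == fingerprint(b, 200*1024)) // false
}
```

Raising the threshold trades extra read I/O for fewer missed edits, which is the safety-over-performance choice requested above.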

Please update docker image!

I'm trying to run rsync-sidekick on a Synology NAS, and the only way is to use Docker. But I think the image is outdated. Please update!
