
backdown


Backdown helps you safely and ergonomically remove duplicate files.

Its design is based on my observation of how duplicates, especially images and other media files, tend to build up over time.

Finding duplicates is easy. The hard part is cleaning the disk when there are thousands of them. What Backdown brings is an easy way to select and remove the duplicates you don't want to keep.

A Backdown session goes through the following phases:

  1. Backdown analyzes the directory of your choice and finds sets of duplicates (files whose content is exactly the same). Backdown ignores symlinks and files or directories whose name starts with a dot. (A sketch of this idea follows the list.)
  2. Backdown asks you a few questions depending on the analysis. Nothing is removed at this point: you only stage files for removal. Backdown never lets you stage all items in a set of identical files.
  3. After having (maybe) reviewed the list of staged files, you confirm the removals.
  4. Backdown performs the removals on disk.
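To make the analysis phase concrete, here is a minimal sketch of the idea in Rust. It is not backdown's actual implementation: it walks the tree, skips symlinks and dot entries, and groups files by (length, content hash), with std's DefaultHasher standing in for whatever real content hash backdown uses.

use std::collections::HashMap;
use std::fs;
use std::hash::{Hash, Hasher};
use std::io;
use std::path::{Path, PathBuf};

// Hash a file's whole content (reads it all into memory: fine for a sketch).
fn content_hash(path: &Path) -> io::Result<u64> {
    let bytes = fs::read(path)?;
    let mut hasher = std::collections::hash_map::DefaultHasher::new();
    bytes.hash(&mut hasher);
    Ok(hasher.finish())
}

// Walk `root`, ignoring symlinks and dot entries, and return the sets of
// files whose content is exactly the same.
fn find_dup_sets(root: &Path) -> io::Result<Vec<Vec<PathBuf>>> {
    let mut by_key: HashMap<(u64, u64), Vec<PathBuf>> = HashMap::new();
    let mut dirs = vec![root.to_path_buf()];
    while let Some(dir) = dirs.pop() {
        for entry in fs::read_dir(&dir)? {
            let entry = entry?;
            if entry.file_name().to_string_lossy().starts_with('.') {
                continue; // names starting with a dot are ignored
            }
            let path = entry.path();
            let meta = fs::symlink_metadata(&path)?;
            if meta.file_type().is_symlink() {
                continue; // symlinks are ignored
            } else if meta.is_dir() {
                dirs.push(path);
            } else if meta.is_file() {
                // key on (length, hash) so different-length files never collide
                let key = (meta.len(), content_hash(&path)?);
                by_key.entry(key).or_default().push(path);
            }
        }
    }
    // a set of duplicates is any group with more than one file
    Ok(by_key.into_values().filter(|set| set.len() > 1).collect())
}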

What it looks like

Analysis and first question:

[screenshot 1]

Another kind of question:

[screenshot 2]

Yet another one:

[screenshot 3]

Yet another one:

[screenshot 4]

Review and Confirm:

[screenshot 5]

At this point you may also export the report as JSON, and you may decide to replace each removed file with a link to one of the kept ones.
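As an illustration of that last option, replacing a removed file with a link to a kept one boils down to something like this hypothetical helper (not backdown's actual code; hard links require both paths to be on the same filesystem):

use std::fs;
use std::io;
use std::path::Path;

// Delete the staged duplicate, then recreate its path as a hard link
// pointing at the kept copy's data.
fn replace_with_link(removed: &Path, kept: &Path) -> io::Result<()> {
    fs::remove_file(removed)?;
    fs::hard_link(kept, removed)
}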

Installation

From the crates.io repository

You must have the Rust env installed: https://rustup.rs

Run

cargo install --locked backdown

From Source

You must have the Rust env installed: https://rustup.rs

Download this repository, then run

cargo install --path .

Precompiled binaries

Unless you're a Rust developer, I recommend just downloading the precompiled binaries, as this will save a lot of space on your disk.

Binaries are made available at https://dystroy.org/backdown/download/

Usage

Deduplicate any kind of file

backdown /some/directory

Deduplicate images

backdown -i /some/directory

JSON report

After the staging phase, you may decide to export a report as JSON. Exporting doesn't prevent you from also performing the removals.

The JSON looks like this:

{
  "dup_sets": [
    {
      "file_len": 1212746,
      "files": {
        "trav-copy/2006-05 (mai)/HPIM0530.JPG": "remove",
        "trav-copy/2006-06 (juin)/HPIM0530 (another copy).JPG": "remove",
        "trav-copy/2006-06 (juin)/HPIM0530 (copy).JPG": "remove",
        "trav-copy/2006-06 (juin)/HPIM0530.JPG": "keep"
      }
    },
    {
      "file_len": 1980628,
      "files": {
        "trav-copy/2006-03 (mars)/HPIM0608.JPG": "keep",
        "trav-copy/2006-05 (mai)/HPIM0608.JPG": "remove",
        "trav-copy/2006-06 (juin)/HPIM0608.JPG": "keep"
      }
    },
    {
      "file_len": 1124764,
      "files": {
        "trav-copy/2006-05 (mai)/HPIM0529.JPG": "remove",
        "trav-copy/2006-06 (juin)/HPIM0529.JPG": "keep"
      }
    },
    {
      "file_len": 1706672,
      "files": {
        "trav-copy/2006-05 (mai)/test.jpg": "remove",
        "trav-copy/2006-06 (juin)/HPIM0598.JPG": "keep"
      }
    }
  ],
  "len_to_remove": 8450302
}
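If you want to post-process such a report, here is a hedged sketch of reading it in Rust with serde_json (the serde and serde_json dependencies and the report.json file name are assumptions; the field names match the sample above):

use serde::Deserialize;
use std::collections::HashMap;

#[derive(Deserialize)]
struct Report {
    dup_sets: Vec<DupSet>,
    len_to_remove: u64, // total bytes freed by the staged removals
}

#[derive(Deserialize)]
struct DupSet {
    file_len: u64,
    files: HashMap<String, String>, // path -> "keep" | "remove"
}

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let json = std::fs::read_to_string("report.json")?;
    let report: Report = serde_json::from_str(&json)?;
    for set in &report.dup_sets {
        for (path, action) in &set.files {
            if action == "remove" {
                println!("staged for removal: {path} ({} bytes)", set.file_len);
            }
        }
    }
    println!("total to remove: {} bytes", report.len_to_remove);
    Ok(())
}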

Advice

  • If you launch backdown on a big directory, it may find more duplicates than you suspect there are. Don't force yourself to answer all the questions at first: staging the removals from the first dozen questions already gains you a lot, and you can deal with the rest another day.
  • Don't launch backdown at the root of your disk: you don't want to deal with duplicates in system resources, programs, build artefacts, etc. Launch backdown where you store your images, videos, or music.
  • Backdown isn't designed for dev directories and doesn't respect .gitignore rules.
  • If you launch backdown in a directory with millions of files on a slow disk, you'll have to wait a long time while the content is hashed. Try a smaller directory first if you have an HDD.
  • If you're only interested in images, use the -i option.


