johanneswiesner / demetrius

A repository for finding and copying files while preserving their folder structure

License: MIT License

Python 100.00%
copy-files data-management file-management find-files python

demetrius's Introduction

Demetrius

A Python library for finding files and copying them together with their parent directories.

NOTE

Feel free to send pull requests. A good place to start is the FIXMEs and TODOs inside demetrius.py.

Motivation

Your Grandpa gives you his PC and asks you to make a backup of his photos and videos. There's just one problem: all his media files are scattered across different folders and locations on his drive! If you don't want to go through each folder manually, this is a case for demetrius. Just pass in the source and destination directories, and demetrius will crawl through all the folders, find the files whose extensions match the ones you are looking for, and copy them to the destination directory.

Demetrius copies files together with their parent directory, so you don't end up with a single destination folder that contains every file. For example, all files within a folder holiday in the source directory will also end up in a holiday folder in the destination directory. Demetrius also takes care of duplicate folder names. If there are several holiday folders on Grandpa's PC (which is not unlikely, because Grandpa forgot that it might have been a good idea to give each family holiday a unique name...), demetrius keeps them apart by creating holiday_1, holiday_2, etc.
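The crawl, filter and copy steps described above can be sketched with the standard library alone. This is a minimal sketch, not demetrius's actual implementation: the helpers find_files and copy_into_parent_folder are hypothetical, the exclude parameter only imitates the -e flag from the Usage section, and duplicate-name handling is deliberately left out, so two different folders both called holiday would be merged into one here.

```python
import os
import shutil
from pathlib import Path


def find_files(src, suffixes, exclude=()):
    """Yield files below src whose extension is in suffixes (case-insensitive).

    Hypothetical helper. Directories whose name is in exclude are skipped
    together with all of their children, similar in spirit to the -e flag.
    """
    wanted = {s.lower().lstrip(".") for s in suffixes}
    excluded = set(exclude)
    for dirpath, dirnames, filenames in os.walk(src):
        # Pruning dirnames in place stops os.walk from descending into them.
        dirnames[:] = [d for d in dirnames if d not in excluded]
        for name in filenames:
            if Path(name).suffix.lower().lstrip(".") in wanted:
                yield Path(dirpath) / name


def copy_into_parent_folder(file, dst):
    """Copy file to dst/<name of its parent folder>/<file name>.

    Hypothetical helper; duplicate folder names are NOT resolved here.
    """
    file = Path(file)
    target_dir = Path(dst) / file.parent.name
    target_dir.mkdir(parents=True, exist_ok=True)
    # copy2 also preserves timestamps, which is handy for photo backups.
    return Path(shutil.copy2(file, target_dir / file.name))
```

For example, iterating over find_files("./foo", ["png", "jpg"], exclude=["Windows"]) and passing each result to copy_into_parent_folder(f, "./bar") would mirror the basic behaviour without the renaming of duplicate folders.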

Finally, Demetrius also respects case-insensitive operating systems like Windows, where you can't create a folder holiday and a folder Holiday in the same directory. It resolves such clashes by appending indices (e.g. 'Holiday (1)' and 'holiday (2)').

Usage

Demetrius can be used from the console. For example, python demetrius.py -src ./foo -dst ./bar will copy all files found in ./foo to ./bar within their parent directories. By default, the script searches for every file suffix listed in suffixes.json. The optional flags are:

  • -sfx filters for specific file suffixes (e.g. -sfx png jpg).
  • -cat filters for broad file categories (e.g. -cat video to search only for video files).
  • -e ignores certain directories and all of their subdirectories (e.g. -e Windows "Program Files").
  • -v shows progress information.
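The command-line interface described above could be reproduced with argparse roughly as follows. This is a hypothetical sketch, not the parser inside demetrius.py: the SUFFIXES mapping stands in for suffixes.json (whose real layout may differ), and both letting -cat accept several categories and letting -sfx take precedence over -cat are assumptions.

```python
import argparse

# Hypothetical stand-in for suffixes.json (category -> list of suffixes).
SUFFIXES = {
    "image": ["png", "jpg", "jpeg"],
    "video": ["mp4", "avi", "mov"],
}


def build_parser():
    """Build a parser with single-dash flags mirroring the documented usage."""
    parser = argparse.ArgumentParser(
        description="Find files and copy them together with their parent folders."
    )
    parser.add_argument("-src", required=True, help="source directory")
    parser.add_argument("-dst", required=True, help="destination directory")
    parser.add_argument("-sfx", nargs="+", help="only search for these suffixes")
    parser.add_argument("-cat", nargs="+", choices=sorted(SUFFIXES),
                        help="only search for these file categories")
    parser.add_argument("-e", nargs="+", default=[],
                        help="directories to ignore, including their children")
    parser.add_argument("-v", action="store_true",
                        help="show progress information")
    return parser


def select_suffixes(args):
    """Return the suffixes to search for (assumed precedence: -sfx, then -cat).

    With neither flag given, every suffix in SUFFIXES is used.
    """
    if args.sfx:
        return set(args.sfx)
    categories = args.cat or SUFFIXES.keys()
    return {sfx for cat in categories for sfx in SUFFIXES[cat]}
```

argparse accepts multi-character single-dash options such as -src and derives the attribute name from them (args.src), so the sketch can keep the flag spelling used in the examples.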

What Demetrius can't do

Demetrius is dumb. If the source directory contains a folder called foobar with a file named 123.png in it (which is a photo of dickbutt), demetrius will copy that file together with its folder to the destination directory. So you have to weigh demetrius's dumbness against how much time you want to spend manually clicking through Grandpa's PC.

Links

I strongly recommend running AntiDupl after Demetrius to identify duplicate files.

Related projects

Total Commander had a plugin called Copy Tree that did exactly what demetrius does. The project site is down, so here's a link from the Wayback Machine.

Preview

Before Demetrius

src/
┣ grandpa.doe/
┃ ┗ Holiday/
┃   ┗ lena.png
┣ holiday/
┃ ┗ lena.png
┣ jane.doe/
┃ ┗ holiday/
┃   ┗ lena.png
┣ john.doe/
┃ ┗ Holiday/
┃   ┗ lena.png
┗ .lena.png

After Demetrius

dst/
┣ Holiday_1 (1)/
┃ ┗ lena.png
┣ Holiday_2 (1)/
┃ ┗ lena.png
┣ holiday_1 (2)/
┃ ┗ lena.png
┣ holiday_2 (2)/
┃ ┗ lena.png
┗ src/
  ┗ .lena.png

demetrius's Issues

Create separate function for duplicate check

The code would be more readable if the duplicate checks had their own functions with their own docstrings:

From demetrius/demetrius.py, lines 166 to 178 in 1d4c6d3:

# find literal duplicates and modify the respective destination directories
for _, dir_name in dst_dirs_df.groupby('src_dir_name'):
    if not dir_name['src_dir_path'].nunique() == 1:
        for idx, (_, src_dir_path) in enumerate(dir_name.groupby('src_dir_path'), start=1):
            dst_dirs_df.loc[src_dir_path.index, 'dst_dir_path'] = dst_dirs_df.loc[src_dir_path.index, 'dst_dir_path'] + '_' + str(idx)

# find pseudo duplicates and modify the respective destination directories
dst_dirs_df['dst_dir_path_lower_case'] = dst_dirs_df['dst_dir_path'].map(str.lower)
for _, dst_dir_path in dst_dirs_df.groupby('dst_dir_path_lower_case'):
    if dst_dir_path['src_dir_path'].nunique() != 1:
        for idx, (_, dir_name) in enumerate(dst_dir_path.groupby('src_dir_name'), start=1):
            dst_dirs_df.loc[dir_name.index, 'dst_dir_path'] = dst_dirs_df.loc[dir_name.index, 'dst_dir_path'] + ' (' + str(idx) + ')'
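The refactoring this issue asks for could look roughly like the following. This is a hedged sketch of the same two-pass renaming, split into two documented functions; to stay dependency-free it replaces the dst_dirs_df DataFrame with a list of records, and the names DirRecord, resolve_literal_duplicates and resolve_pseudo_duplicates are hypothetical. Only the column names and the order of the two passes come from the snippet; the indices follow sorted keys, as pandas' groupby does by default.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class DirRecord:
    """Hypothetical stand-in for one row of dst_dirs_df."""
    src_dir_path: str
    src_dir_name: str
    dst_dir_path: str


def resolve_literal_duplicates(records):
    """Append _1, _2, ... to folders that share a name but live at
    different source paths (e.g. two separate 'holiday' folders).

    Indices follow the sorted source paths, like a sorted groupby.
    """
    by_name = defaultdict(list)
    for rec in records:
        by_name[rec.src_dir_name].append(rec)
    for group in by_name.values():
        src_paths = sorted({rec.src_dir_path for rec in group})
        if len(src_paths) > 1:
            index = {path: i for i, path in enumerate(src_paths, start=1)}
            for rec in group:
                rec.dst_dir_path += f"_{index[rec.src_dir_path]}"


def resolve_pseudo_duplicates(records):
    """Append ' (1)', ' (2)', ... to destination paths that differ only in
    case, so they can coexist on case-insensitive file systems.

    Must run after resolve_literal_duplicates, mirroring the original order.
    """
    by_lower = defaultdict(list)
    for rec in records:
        by_lower[rec.dst_dir_path.lower()].append(rec)
    for group in by_lower.values():
        if len({rec.src_dir_path for rec in group}) > 1:
            names = sorted({rec.src_dir_name for rec in group})
            index = {name: i for i, name in enumerate(names, start=1)}
            for rec in group:
                rec.dst_dir_path += f" ({index[rec.src_dir_name]})"


# The holiday folders from the Preview section.
records = [
    DirRecord("src/grandpa.doe/Holiday", "Holiday", "dst/Holiday"),
    DirRecord("src/holiday", "holiday", "dst/holiday"),
    DirRecord("src/jane.doe/holiday", "holiday", "dst/holiday"),
    DirRecord("src/john.doe/Holiday", "Holiday", "dst/Holiday"),
]
resolve_literal_duplicates(records)
resolve_pseudo_duplicates(records)
print([rec.dst_dir_path for rec in records])
# → ['dst/Holiday_1 (1)', 'dst/holiday_1 (2)', 'dst/holiday_2 (2)', 'dst/Holiday_2 (1)']
```

Because uppercase letters sort before lowercase ones, Holiday receives index (1) and holiday index (2), which reproduces the folder names shown in the Preview.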

Allow choosing the directory depth

It would be nice to be able to control the directory depth:

See here:

  • 'choose number of levels to copy' dialog that allows observing relative paths for all cases

This is probably not easy to implement, though, because duplicate and pseudo-duplicate folders would then have to be checked at each level.
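A possible starting point for such a depth option is to keep only the last N folder levels of each file's location when building its destination folder. The helper below is hypothetical and only computes paths; it does not attempt the per-level duplicate and pseudo-duplicate checks mentioned in this issue. Treating the name of the source root as the outermost level is an assumption chosen so that, with depth=1, the result matches the Preview (files lying directly in src end up in dst/src).

```python
from pathlib import PurePath


def dst_dir_for(file, src_root, dst_root, depth=1):
    """Return the destination folder for file, keeping the last `depth`
    folder levels of its location (hypothetical helper).

    The name of src_root itself counts as the outermost level, so files lying
    directly in src_root end up in dst_root/<src_root name>.
    """
    src_root = PurePath(src_root)
    levels = (src_root.name,) + PurePath(file).parent.relative_to(src_root).parts
    # levels[-0:] would return everything, so depth <= 0 is handled explicitly.
    kept = levels[-depth:] if depth > 0 else ()
    return PurePath(dst_root, *kept)
```

For example, dst_dir_for("src/jane.doe/holiday/lena.png", "src", "dst", depth=2) yields dst/jane.doe/holiday, while the default depth=1 yields dst/holiday.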

Switch over to halo

Maybe it would be better to switch over to halo, because it supports function decorators, which would make the code much less verbose:

https://github.com/manrajgrover/halo

One could then do something like:

from halo import Halo

if verbose:
    with Halo(text='Loading', spinner='dots'):
        function()
else:
    function()
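One way to avoid writing function() twice in the if/else above is a small decorator that chooses between a Halo spinner and contextlib.nullcontext at call time. This is a sketch under assumptions: with_optional_spinner and copy_everything are hypothetical names, having the wrapper consume a verbose keyword argument is a design choice rather than anything demetrius does today, and halo is a third-party package that is only imported when a spinner is actually requested.

```python
import functools
from contextlib import nullcontext


def with_optional_spinner(text, spinner="dots"):
    """Decorator: run the wrapped function inside a halo spinner when the
    caller passes verbose=True, otherwise run it without any output.

    halo is third-party (pip install halo) and is imported lazily.
    """
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, verbose=False, **kwargs):
            if verbose:
                from halo import Halo
                context = Halo(text=text, spinner=spinner)
            else:
                # nullcontext does nothing on enter/exit.
                context = nullcontext()
            with context:
                return func(*args, **kwargs)
        return wrapper
    return decorator


@with_optional_spinner("Copying files")
def copy_everything():
    # Placeholder for a long-running demetrius step (hypothetical).
    return "done"


copy_everything()                # runs without a spinner
# copy_everything(verbose=True)  # shows a spinner; requires halo
```

Halo objects can be used both as context managers and as decorators, so a fixed @Halo(...) decorator would also work; the wrapper above only adds the switch on verbose.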
