GitHub-ForceLargeFiles

This package is a simple workaround for pushing large files to a GitHub repo.

Since GitHub only allows pushing files up to 100 MB, larger files normally require a separate service such as Git LFS. This package instead compresses large files and splits them into smaller archives that can be pushed to a GitHub repo without LFS.

It starts at a root directory, traverses every subdirectory, and scans every file it finds. Any file larger than threshold_size is compressed and split into multiple archives, each with a maximum size of partition_size. Compressing and splitting works for any file extension.
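The traversal step described above can be sketched roughly as follows. This is a minimal illustration and not the code in src/main.py: the function name find_large_files, the byte-based threshold, and the choice to skip symlinks are all assumptions made for the example.

```python
import os
from pathlib import Path
from typing import Iterator

MB = 1024 * 1024  # assumption: sizes are counted in binary megabytes


def find_large_files(root_dir: str, threshold_size: int = 100 * MB) -> Iterator[Path]:
    """Yield every regular file under root_dir larger than threshold_size bytes.

    Hypothetical helper for illustration; the real package may differ.
    """
    for dirpath, _dirnames, filenames in os.walk(root_dir):
        for name in filenames:
            path = Path(dirpath) / name
            # Skip symlinks so the same data is not reported twice.
            if path.is_symlink():
                continue
            if path.stat().st_size > threshold_size:
                yield path
```

Each path yielded here would then be handed to the compress-and-split step.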

After compression/split, files can be pushed the usual way, using git push.

Parallelization

  • Although traversing directories in src/main.py is serial, compressing/splitting each file through 7z is parallelized by default.
  • Reversing with src/reverse.py is entirely serial. (TODO: Parallelize this too)
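One way to run the per-file 7z calls in parallel is a thread pool, since each worker mostly waits on an external process. The sketch below is an assumption about how this could look, not the package's actual implementation: build_split_command, compress_all, and the run parameter are hypothetical names. The 7z switches used (a to add to an archive, -v95m to split the output into volumes of 95 MiB) are standard 7-Zip options.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import Callable, Iterable, List


def build_split_command(path: Path, partition_mb: int = 95) -> List[str]:
    """Build a 7z call that packs `path` into volumes of at most partition_mb MiB."""
    archive = f"{path}.7z"  # 7z names the volumes <archive>.001, .002, ...
    return ["7z", "a", f"-v{partition_mb}m", archive, str(path)]


def compress_all(
    paths: Iterable[str],
    max_workers: int = 4,
    run: Callable = subprocess.run,  # injectable so the sketch can be tested without 7z
) -> List[int]:
    """Run one 7z process per file concurrently and return the exit codes in input order."""
    commands = [build_split_command(Path(p)) for p in paths]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = pool.map(lambda cmd: run(cmd, capture_output=True), commands)
        return [r.returncode for r in results]
```

Deleting the original file after a successful run is left out here; a caller could remove it once the corresponding exit code is 0.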

Requirements

  • Python 3.x.x.
  • You need to have 7z installed. Visit the 7z Download page for more information.
  • Folders/Files in the traversed directories should have appropriate read/write permissions.

Example Usage

Run with the default parameters:

$ python3 src/main.py --root_dir ~/MyFolder

This traverses every subdirectory starting from ~/MyFolder and replaces each file over 100 MB with smaller archives of at most approximately 95 MB each. By default, the original (large) files are deleted afterwards, but this can be turned off.

The comparison below describes the use of this package more clearly.

Before:

$ tree --du -h ~/MyFolder

├── [415M]  My Datasets
│   ├── [6.3K]  Readme.txt
│   └── [415M]  Data on Leaf-Tailed Gecko
│       ├── [ 35M]  DatasetA.zip
│       ├── [ 90M]  DatasetB.zip
│       ├── [130M]  DatasetC.zip
│       └── [160M]  Books
│           ├── [ 15M]  RegularBook.pdf
│           └── [145M]  BookWithPictures.pdf
└── [818M]  Video Conference Meetings
    ├── [817M]  Discussion_on_Fermi_Paradox.mp4
    └── [1.1M]  Notes_on_Discussion.pdf

After:

$ tree --du -h ~/MyFolder

├── [371M]  My Datasets
│   ├── [6.3K]  Readme.txt
│   └── [371M]  Data on Leaf-Tailed Gecko
│       ├── [ 35M]  DatasetA.zip
│       ├── [ 90M]  DatasetB.zip
│       ├── [ 95M]  DatasetC.zip.7z.001
│       ├── [ 18M]  DatasetC.zip.7z.002
│       └── [133M]  Books
│           ├── [ 15M]  RegularBook.pdf
│           ├── [ 95M]  BookWithPictures.pdf.7z.001
│           └── [ 23M]  BookWithPictures.pdf.7z.002
└── [794M]  Video Conference Meetings
    ├── [ 95M]  Discussion_on_Fermi_Paradox.mp4.7z.001
    ├── [ 95M]  Discussion_on_Fermi_Paradox.mp4.7z.002
    ├── [ 95M]  Discussion_on_Fermi_Paradox.mp4.7z.003
    ├── [ 95M]  Discussion_on_Fermi_Paradox.mp4.7z.004
    ├── [ 95M]  Discussion_on_Fermi_Paradox.mp4.7z.005
    ├── [ 95M]  Discussion_on_Fermi_Paradox.mp4.7z.006
    ├── [ 95M]  Discussion_on_Fermi_Paradox.mp4.7z.007
    ├── [ 95M]  Discussion_on_Fermi_Paradox.mp4.7z.008
    ├── [ 33M]  Discussion_on_Fermi_Paradox.mp4.7z.009
    └── [1.1M]  Notes_on_Discussion.pdf

To restore the original files, run:

$ python3 src/reverse.py --root_dir ~/MyFolder
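The reverse step can be thought of as locating the first volume of each split archive and asking 7z to extract it; 7z then reads the remaining volumes (.002, .003, ...) on its own. The following is a rough sketch under that assumption, not the contents of src/reverse.py. The helper names are hypothetical, and extracting into the volume's own directory is an assumption about the intended layout.

```python
from pathlib import Path
from typing import List


def find_first_volumes(root_dir: str) -> List[Path]:
    """Return every '*.7z.001' file under root_dir, sorted for a stable order."""
    return sorted(Path(root_dir).rglob("*.7z.001"))


def build_extract_command(first_volume: Path) -> List[str]:
    """Build a 7z call that extracts a split archive next to its volumes."""
    # "x" extracts with stored paths; "-o<dir>" sets the output directory (no space).
    return ["7z", "x", str(first_volume), f"-o{first_volume.parent}"]
```

Each command could be passed to subprocess.run in a simple loop, which mirrors the serial behaviour noted in the Parallelization section.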

Contributors

  • a-yildiz
