utsaslab / monet

MONeT framework for reducing memory consumption of DNN training

Home Page: https://arxiv.org/abs/2010.14501

License: MIT License

Languages: Python 51.13%, Shell 0.10%, C++ 13.96%, C 21.90%, Cuda 12.91%
Topics: deep-neural-networks, dnn, dnn-training, machine-learning, memory-consumption, ml-training, pytorch

monet's People

Contributors: aashaka

monet's Issues

How to use MONeT for training with larger batch sizes?

Hi, thanks for the awesome library.

One of the main reasons to save memory is to enable training with larger batch sizes.
How can MONeT be used to do this? Specifically:

  1. Do I need to change the objective function and constraints? Checkmate discusses this in Section 6.4 of its paper. How can we do the same in MONeT?
  2. If I do not want to run the solver for every new batch size, can I reuse a computed schedule for a different, larger batch size? How would that affect the memory savings and overhead?
  3. If I want to find the maximum batch size I can achieve with MONeT, how can I do this?
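For the last question, one pragmatic approach is to binary-search over batch sizes, running a single training iteration at each candidate size and treating an out-of-memory error as failure. A minimal sketch, assuming a user-supplied `fits(b)` callable (hypothetical, not part of MONeT) that runs one forward/backward step under the chosen MONeT solution and returns False on OOM:

```python
def max_batch_size(fits, lo=1, hi=4096):
    """Binary-search the largest batch size b in [lo, hi] for which
    fits(b) is True, assuming fits is monotone (if b fits, b-1 fits)."""
    best = 0
    while lo <= hi:
        mid = (lo + hi) // 2
        if fits(mid):          # one trial training step succeeded
            best = mid
            lo = mid + 1       # try larger
        else:
            hi = mid - 1       # OOM: try smaller
    return best
```

With a real model, `fits` would wrap the step in `try/except RuntimeError` (newer PyTorch raises `torch.cuda.OutOfMemoryError`, a subclass) and call `torch.cuda.empty_cache()` between trials so earlier failures don't skew later ones.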

Can MONeT be used for Transformer models?

The usage shown in the README is all for CNN models; can MONeT run with Transformer models?
Also, why does creating a solution require obtaining a Gurobi academic license?

Can MONeT be used with custom models and higher PyTorch versions?

Thanks for your work!
We currently have two questions:

  1. Can MONeT work with PyTorch versions higher than 1.5.1? We tried PyTorch 1.11.0 with CUDA 11.3, but got an error in the load function at https://github.com/utsaslab/MONeT/blob/master/monet/lm_ops/conv.py#L8 when running examples/training.py. We also tried PyTorch 1.5.0 with CUDA 10.1; we no longer got that error, but instead got cuDNN error: CUDNN_STATUS_EXECUTION_FAILED in the forward function in monet/lm_ops/bn.py, and the program (examples/training.py) took a long time to initialize. Can you post the detailed configuration, including PyTorch, CUDA, g++, etc.?
  2. Can MONeT be used with custom models? In the README, you mention that to create a MONeT solution we can run python cvxpy_solver.py MODEL ..., where the model is specified as "torchvision.models.<model>()". Can we use MONeT to generate solutions for our own models?
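On the version question: custom C++/CUDA extensions are usually tied to the exact PyTorch minor release they were developed against (1.5.1 here, going by this thread), so a mismatch at `load` time is expected. A small, hypothetical pre-flight check one could run before building, where the pinned version is taken from this discussion rather than from any official compatibility matrix:

```python
def version_tuple(v):
    """Parse a version string like '1.5.1' or '1.5.1+cu101'
    into a comparable tuple of ints, e.g. (1, 5, 1)."""
    return tuple(int(x) for x in v.split("+")[0].split(".")[:3])

def is_supported(installed, pinned="1.5.1"):
    # Require major.minor to match the release the extension was
    # built against; patch-level differences are usually tolerable.
    return version_tuple(installed)[:2] == version_tuple(pinned)[:2]
```

In a real environment this would be called as `is_supported(torch.__version__)` before attempting to JIT-compile the ops in monet/lm_ops.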

MONeT does not save the memory used by PyTorch

Hi, thanks for the awesome library.

How do you measure the memory used? Is it measured empirically or computed theoretically?
I measured memory usage with nvidia-smi and found that MONeT does not reduce the memory used by PyTorch.

First, I ran the 10 GB solution with python3 imagenet.py ~/imagenet -a resnet50 --gpu 0 --epochs 1 --batch-size 184 --sol ../data/monet_r50_184_24hr/solution_resnet50_184_inplace_conv_multiway_newnode_10.00.pkl. The peak memory reported by nvidia-smi was around 12 GB.
Then, I ran the 6 GB solution with python3 imagenet.py ~/imagenet -a resnet50 --gpu 0 --epochs 1 --batch-size 184 --sol ../data/monet_r50_184_24hr/solution_resnet50_184_inplace_conv_multiway_newnode_6.00.pkl. The peak memory reported by nvidia-smi was still around 12 GB.

How can MONeT be used to actually reduce the memory used by PyTorch?
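A likely explanation for the identical nvidia-smi readings is PyTorch's caching allocator: memory freed by tensors is kept in a cache rather than returned to the driver, so the driver-level figure nvidia-smi shows (reserved memory, plus the CUDA context) tends only to grow, while a MONeT schedule bounds the live-tensor (allocated) memory. The toy model below is an illustration of that distinction, not PyTorch's actual allocator:

```python
class CachingAllocatorModel:
    """Toy caching allocator: 'reserved' never shrinks on free."""

    def __init__(self):
        self.allocated = 0       # bytes backing live tensors
        self.reserved = 0        # bytes held from the driver (nvidia-smi view)
        self.peak_allocated = 0  # what a memory-limited schedule bounds

    def malloc(self, n):
        self.allocated += n
        self.reserved = max(self.reserved, self.allocated)
        self.peak_allocated = max(self.peak_allocated, self.allocated)

    def free(self, n):
        self.allocated -= n  # reserved unchanged: block returns to the cache
```

For an empirical per-process measurement, `torch.cuda.max_memory_allocated()` (versus `torch.cuda.max_memory_reserved()`) should show the difference between the 10 GB and 6 GB solutions even when nvidia-smi does not.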
