utsaslab / monet

MONeT framework for reducing memory consumption of DNN training

Home Page: https://arxiv.org/abs/2010.14501

License: MIT License

Languages: Python 51.13%, Shell 0.10%, C++ 13.96%, C 21.90%, Cuda 12.91%
Topics: deep-neural-networks, dnn, dnn-training, machine-learning, memory-consumption, ml-training, pytorch

monet's People

Contributors: aashaka

monet's Issues

How to use MONeT for training with larger batch sizes?

Hi, thanks for the awesome library.

One of the main reasons to save memory is to enable training with larger batch sizes.
How can MONeT be used to do this? Specifically:

  1. Do I need to change the objective function and constraints? Checkmate discusses this in Section 6.4 of its paper. How can we do the same in MONeT?
  2. If I do not want to run the solver for every new batch size, can I reuse a computed schedule for a different, larger batch size? How would that affect the memory savings and overhead?
  3. If I want to find the maximum batch size I can achieve with MONeT, how can I do this?
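For the last question, one pragmatic approach is to binary-search over batch sizes, running a single training iteration at each candidate size and treating an out-of-memory error as failure. A minimal sketch, assuming a user-supplied `fits(b)` callable (hypothetical, not part of MONeT) that runs one forward/backward step under the chosen MONeT solution and returns False on OOM:

```python
def max_batch_size(fits, lo=1, hi=4096):
    """Binary-search the largest batch size b in [lo, hi] for which
    fits(b) is True, assuming fits is monotone (if b fits, b-1 fits)."""
    best = 0
    while lo <= hi:
        mid = (lo + hi) // 2
        if fits(mid):          # one trial training step succeeded
            best = mid
            lo = mid + 1       # try larger
        else:
            hi = mid - 1       # OOM: try smaller
    return best
```

With a real model, `fits` would wrap the step in `try/except RuntimeError` (newer PyTorch raises `torch.cuda.OutOfMemoryError`, a subclass) and call `torch.cuda.empty_cache()` between trials so earlier failures don't skew later ones.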

Can MONeT be used for Transformer models?

The usage shown in the README is all for CNN models; can MONeT run with Transformer models?
Also, why does creating a solution require obtaining a Gurobi academic license?

Can MONeT be used with custom models and higher PyTorch versions?

Thanks for your work!
We currently have two questions:

  1. Can MONeT work with PyTorch versions higher than 1.5.1? We tried PyTorch 1.11.0 with CUDA 11.3, but got an error in the load function at https://github.com/utsaslab/MONeT/blob/master/monet/lm_ops/conv.py#L8 when running examples/training.py. We also tried PyTorch 1.5.0 with CUDA 10.1; we no longer got that error, but instead got cuDNN error: CUDNN_STATUS_EXECUTION_FAILED in the forward function in monet/lm_ops/bn.py, and the program (examples/training.py) took a long time to initialize. Can you post the detailed configuration, including PyTorch, CUDA, g++, etc.?
  2. Can MONeT be used with custom models? In the README, you mention that to create a MONeT solution we can run python cvxpy_solver.py MODEL ..., where the model is specified as "torchvision.models.<model>()". Can we use MONeT to generate solutions for our own models?
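On the version question: custom C++/CUDA extensions are usually tied to the exact PyTorch minor release they were developed against (1.5.1 here, going by this thread), so a mismatch at `load` time is expected. A small, hypothetical pre-flight check one could run before building, where the pinned version is taken from this discussion rather than from any official compatibility matrix:

```python
def version_tuple(v):
    """Parse a version string like '1.5.1' or '1.5.1+cu101'
    into a comparable tuple of ints, e.g. (1, 5, 1)."""
    return tuple(int(x) for x in v.split("+")[0].split(".")[:3])

def is_supported(installed, pinned="1.5.1"):
    # Require major.minor to match the release the extension was
    # built against; patch-level differences are usually tolerable.
    return version_tuple(installed)[:2] == version_tuple(pinned)[:2]
```

In a real environment this would be called as `is_supported(torch.__version__)` before attempting to JIT-compile the ops in monet/lm_ops.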

MONeT does not save the memory used by PyTorch

Hi, thanks for the awesome library.

How do you measure the memory used? Is it measured empirically or computed theoretically?
I measured memory usage with nvidia-smi and found that MONeT does not reduce the memory used by PyTorch.

First, I ran the 10 GB solution with python3 imagenet.py ~/imagenet -a resnet50 --gpu 0 --epochs 1 --batch-size 184 --sol ../data/monet_r50_184_24hr/solution_resnet50_184_inplace_conv_multiway_newnode_10.00.pkl. The peak memory reported by nvidia-smi was around 12 GB.
Then, I ran the 6 GB solution with python3 imagenet.py ~/imagenet -a resnet50 --gpu 0 --epochs 1 --batch-size 184 --sol ../data/monet_r50_184_24hr/solution_resnet50_184_inplace_conv_multiway_newnode_6.00.pkl. The peak memory reported by nvidia-smi was still around 12 GB.

How can MONeT be used to actually reduce the memory used by PyTorch?
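A likely explanation for the identical nvidia-smi readings is PyTorch's caching allocator: memory freed by tensors is kept in a cache rather than returned to the driver, so the driver-level figure nvidia-smi shows (reserved memory, plus the CUDA context) tends only to grow, while a MONeT schedule bounds the live-tensor (allocated) memory. The toy model below is an illustration of that distinction, not PyTorch's actual allocator:

```python
class CachingAllocatorModel:
    """Toy caching allocator: 'reserved' never shrinks on free."""

    def __init__(self):
        self.allocated = 0       # bytes backing live tensors
        self.reserved = 0        # bytes held from the driver (nvidia-smi view)
        self.peak_allocated = 0  # what a memory-limited schedule bounds

    def malloc(self, n):
        self.allocated += n
        self.reserved = max(self.reserved, self.allocated)
        self.peak_allocated = max(self.peak_allocated, self.allocated)

    def free(self, n):
        self.allocated -= n  # reserved unchanged: block returns to the cache
```

For an empirical per-process measurement, `torch.cuda.max_memory_allocated()` (versus `torch.cuda.max_memory_reserved()`) should show the difference between the 10 GB and 6 GB solutions even when nvidia-smi does not.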
