GPUEater

Eat your GPUs

Introduction

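When training on a GPU, PyTorch only takes as much GPU memory as it currently needs. If a large amount of memory is left free, other users can grab it. Sharing a GPU with other people not only slows down your own program, it can also trigger an 'Out of memory' error when your program needs more memory later in the run.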

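With allow_growth = False (the default), TensorFlow allocates all available GPU memory when the program starts. Calling GPUEater in PyTorch code has a similar effect: it allocates all GPU memory in advance, before training begins, so that nobody else can take over your GPU.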
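This README does not show how the pre-allocation works internally. The following is a minimal sketch, assuming the common PyTorch pattern of allocating one large tensor and freeing it right away. The freed block returns to PyTorch's caching allocator instead of the driver, so it stays reserved for this process: other processes cannot claim it, while later tensors in the same process can reuse it. The names `elements_to_reserve` and `reserve_gpu_memory` and the 5% headroom are hypothetical and are not GPUEater's API.

```python
# Hypothetical sketch of GPU memory pre-allocation in PyTorch.
# NOT GPUEater's implementation; names and the headroom default are assumptions.

BYTES_PER_FLOAT32 = 4


def elements_to_reserve(free_bytes: int, total_bytes: int,
                        keep_fraction: float = 0.05) -> int:
    """Number of float32 elements to allocate so that about `keep_fraction`
    of the card's total memory stays free as headroom."""
    headroom = int(total_bytes * keep_fraction)
    usable = max(free_bytes - headroom, 0)
    return usable // BYTES_PER_FLOAT32


def reserve_gpu_memory(device: int = 0, keep_fraction: float = 0.05) -> int:
    """Allocate one large tensor on `device`, then free it.

    The freed block goes back to PyTorch's caching allocator rather than to
    the driver, so it stays reserved for this process (it is counted by
    torch.cuda.memory_reserved) and later tensors here can reuse it.
    Returns the number of bytes requested.
    """
    import torch  # imported lazily so elements_to_reserve works without torch

    free_bytes, total_bytes = torch.cuda.mem_get_info(device)
    n = elements_to_reserve(free_bytes, total_bytes, keep_fraction)
    if n == 0:
        return 0
    block = torch.empty(n, dtype=torch.float32, device=f"cuda:{device}")
    del block  # memory stays cached until torch.cuda.empty_cache() is called
    return n * BYTES_PER_FLOAT32
```

Under this approach, calling `torch.cuda.empty_cache()` later in training would hand the reserved memory back to the driver and undo the effect.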

Besides that, this program can also be run on its own to occupy a GPU in an emergency (not recommended).

It is recommended to use it together with GPUTasker.

Usage

Pre-allocating GPU memory

Go to your project directory and clone this repository:

cd /path/to/your_project
git clone https://github.com/cnstark/gpueater.git

Single-GPU training

Before training starts, set the CUDA_VISIBLE_DEVICES environment variable and then add:

# import your package
import os
from argparse import ArgumentParser

from gpu_eater import occupy_gpus_mem


if __name__ == '__main__':
    # parse args
    parser = ArgumentParser(description='GPU Eater')
    parser.add_argument('--gpus', help='visible gpus', type=str)
    args = parser.parse_args()

    # set gpus (must happen before CUDA is initialized in this process)
    if args.gpus is not None:
        os.environ["CUDA_VISIBLE_DEVICES"] = args.gpus

    # occupy gpus mem
    occupy_gpus_mem()

    # your train code
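CUDA_VISIBLE_DEVICES only takes effect if it is set before CUDA is initialized in the process. The physical GPUs it lists are then renumbered from 0 in the order given, so running with `--gpus 2,3` makes those cards appear as `cuda:0` and `cuda:1`. The hypothetical helper `logical_to_physical` below only illustrates this renumbering and is not part of GPUEater. It assumes plain integer indices, while CUDA also accepts GPU UUIDs, which this sketch does not handle.

```python
# Illustrative helper (not part of GPUEater): how CUDA_VISIBLE_DEVICES
# renumbers GPUs. Assumes comma-separated integer indices, e.g. "2,3".
from typing import Dict


def logical_to_physical(visible: str) -> Dict[int, int]:
    """Map the device index a program sees (cuda:0, cuda:1, ...) to the
    physical GPU index listed in a CUDA_VISIBLE_DEVICES value."""
    physical = [int(part) for part in visible.split(",") if part.strip()]
    return {logical: phys for logical, phys in enumerate(physical)}
```

For example, `logical_to_physical("2,3")` gives `{0: 2, 1: 3}`, and an empty string (no GPUs visible) gives `{}`.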

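Multi-GPU training

  • DP (DataParallel)

Same as single-GPU training.

  • DDP (DistributedDataParallel)

Because DDP runs as multiple distributed processes, usually with one GPU per process, you need to call occupy_gpu_mem_for_ddp in each process so that every process fills the memory of its own GPU.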

# import your package
from gpu_eater import occupy_gpu_mem_for_ddp


# train process
def train(rank, world_size):
    occupy_gpu_mem_for_ddp(rank)

    # your train code
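The `train(rank, world_size)` signature matches what `torch.multiprocessing.spawn` expects: it calls `fn(i, *args)` for each process index `i`. The following is a sketch of a single-node launcher under that assumption. Only `occupy_gpu_mem_for_ddp` comes from GPUEater. `check_rank`, `launch`, the NCCL backend, and the rendezvous address `tcp://127.0.0.1:29500` are illustrative choices. The torch and gpu_eater imports sit inside the functions so that `check_rank` can be used without them.

```python
# Hypothetical single-node DDP launcher for the train(rank, world_size) pattern.
# Only occupy_gpu_mem_for_ddp comes from GPUEater; everything else is illustrative.


def check_rank(rank: int, num_visible_gpus: int) -> int:
    """One GPU per process: process `rank` uses logical GPU `rank`."""
    if not 0 <= rank < num_visible_gpus:
        raise ValueError(
            f"rank {rank} has no GPU; only {num_visible_gpus} visible")
    return rank


def train(rank: int, world_size: int) -> None:
    import torch
    import torch.distributed as dist
    from gpu_eater import occupy_gpu_mem_for_ddp

    torch.cuda.set_device(check_rank(rank, torch.cuda.device_count()))
    occupy_gpu_mem_for_ddp(rank)  # fill this process's GPU before training
    dist.init_process_group("nccl", init_method="tcp://127.0.0.1:29500",
                            rank=rank, world_size=world_size)
    # ... your train code ...
    dist.destroy_process_group()


def launch(world_size: int) -> None:
    import torch.multiprocessing as mp

    # spawn calls train(i, world_size) for i in range(world_size)
    mp.spawn(train, args=(world_size,), nprocs=world_size, join=True)
```

A typical call would be `launch(torch.cuda.device_count())` after CUDA_VISIBLE_DEVICES has been set, which starts one process per visible GPU.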

Standalone usage

cd gpueater
python train.py --gpus 0

Contributors

cnstark
