Coder Social home page Coder Social logo

Comments (19)

yuanzhoulvpi2017 avatar yuanzhoulvpi2017 commented on May 19, 2024 1

最新工作汇报

  1. 使用torch.utils.checkpoint + lora 方法,在fp16的情况下、在batch_size=1的时候,显存降低到15G左右。正在整理代码,后面会放出来。
  2. 这个工作,可以让很多卡跑起来了,甚至batchsize可以提高。

截屏2023-03-24 21 32 10

from zero_nlp.

yuanzhoulvpi2017 avatar yuanzhoulvpi2017 commented on May 19, 2024

感谢反馈问题,把batch_size都改为1,context_length=32试一试。别的情况,我再试一试

from zero_nlp.

zhaodice avatar zhaodice commented on May 19, 2024

感谢反馈问题,把batch_size都改为1,context_length=32试一试。别的情况,我再试一试

改完了也是爆显存,显卡是RTX4090 24GB,配置(如果有需要我可以把ssh开放给你研究研究x)

context_length = 32

args = TrainingArguments(
    output_dir="test003",
    per_device_train_batch_size=1,
    per_device_eval_batch_size=1,
    evaluation_strategy="steps",
    eval_steps=100,
    logging_steps=100,
    gradient_accumulation_steps=8,
    num_train_epochs=1,
    weight_decay=0.1,
    warmup_steps=1_000,
    lr_scheduler_type="cosine",
    learning_rate=5e-4,
    save_steps=100,
    fp16=True,
    push_to_hub=False,
)
    Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 294.00 MiB (GPU 0; 23.65 GiB total capacity; 21.84 GiB already allocated; 152.56 MiB free; 22.11 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
  0%|                                                | 0/595 [00:01<?, ?it/s]

from zero_nlp.

zhaodice avatar zhaodice commented on May 19, 2024

我看了一下我torch是2.0,我改成1.13试试看(仍然爆显存

    Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 294.00 MiB (GPU 0; 23.65 GiB total capacity; 21.83 GiB already allocated; 52.56 MiB free; 22.10 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
  0%|                                                                          | 0/595 [00:01<?, ?it/s]
(venv) user@calculator:~/ext/zero_nlp/simple_thu_chatglm6b$ pip show torch
Name: torch
Version: 1.13.0
Summary: Tensors and Dynamic neural networks in Python with strong GPU acceleration
Home-page: https://pytorch.org/
Author: PyTorch Team
Author-email: [email protected]
License: BSD-3
Location: /home/user/ext/zero_nlp/venv/lib/python3.10/site-packages
Requires: nvidia-cublas-cu11, nvidia-cuda-nvrtc-cu11, nvidia-cuda-runtime-cu11, nvidia-cudnn-cu11, typing-extensions
Required-by: accelerate, peft, pytorch-lightning, torchmetrics, torchvision, triton
(venv) user@calculator:~/ext/zero_nlp/simple_thu_chatglm6b$ 

from zero_nlp.

zhaodice avatar zhaodice commented on May 19, 2024

用INT4量化后的模型可以大幅减少显存,有没有直接微调INT4模型的可能性?

from zero_nlp.

yuanzhoulvpi2017 avatar yuanzhoulvpi2017 commented on May 19, 2024

查看这个配置#5 (comment)

from zero_nlp.

Adherer avatar Adherer commented on May 19, 2024

torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 148.00 MiB (GPU 0; 22.38 GiB total capacity; 21.49 GiB already allocated; 87.94 MiB free; 21.52 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
我用P40跑的,22G显存,一样的问题,context_length设置成了32

from zero_nlp.

yuanzhoulvpi2017 avatar yuanzhoulvpi2017 commented on May 19, 2024

from zero_nlp.

zhaodice avatar zhaodice commented on May 19, 2024

查看这个配置#5 (comment)

试过了,无效

from zero_nlp.

Adherer avatar Adherer commented on May 19, 2024

22g显寸不够 发自我的 iPhone 在 2023年3月23日,17:26,Adherer @.> 写道:  torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 148.00 MiB (GPU 0; 22.38 GiB total capacity; 21.49 GiB already allocated; 87.94 MiB free; 21.52 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF 我用P40跑的,22G显存,一样的问题,context_length设置成了32 — Reply to this email directly, view it on GitHub<#4 (comment)>, or unsubscribehttps://github.com/notifications/unsubscribe-auth/AHJRI6JAQJ6JUG2IWYTTU4TW5QJNHANCNFSM6AAAAAAWD3FFLE. You are receiving this because you commented.Message ID: @.>

我参考了一下其他repo,8bit量化可以16G的显存finetune,目前暂无支持多卡finetune的版本。因此,后续是否有如下两个优化方向:

  1. 8bit/4bit finetune优化;
  2. 单机多卡 or 多机多卡优化。
    若有相关优化计划,可合作

from zero_nlp.

yuanzhoulvpi2017 avatar yuanzhoulvpi2017 commented on May 19, 2024

目前在做两个方向:

  1. 使用torch.utils.checkpoint来降低显存压力。
  2. 单机多卡

花了一天了,还没什么进展😂,继续努力~

from zero_nlp.

Adherer avatar Adherer commented on May 19, 2024

目前在做两个方向:

  1. 使用torch.utils.checkpoint来降低显存压力。
  2. 单机多卡

花了一天了,还没什么进展😂,继续努力~

可以参考下这个代码:https://github.com/mymusise/ChatGLM-Tuning
我跑通了,正在训练中,明天有空我改下,改成中文训练的

from zero_nlp.

zhaodice avatar zhaodice commented on May 19, 2024

目前在做两个方向:

  1. 使用torch.utils.checkpoint来降低显存压力。
  2. 单机多卡

花了一天了,还没什么进展😂,继续努力~

可以参考下这个代码:https://github.com/mymusise/ChatGLM-Tuning 我跑通了,正在训练中,明天有空我改下,改成中文训练的

这个我也跑通了,但不知道是不是方法有问题,训练效果并不理想,似乎是在胡说八道(

from zero_nlp.

yuanzhoulvpi2017 avatar yuanzhoulvpi2017 commented on May 19, 2024

from zero_nlp.

yuanzhoulvpi2017 avatar yuanzhoulvpi2017 commented on May 19, 2024

已经有不少人跑出来了。不知道你们这边是怎么回事,要求就是显存问题。#5 (comment)

可以看截图,跑起来的时候,显寸占用为24330MB

from zero_nlp.

zhaodice avatar zhaodice commented on May 19, 2024

已经有不少人跑出来了。不知道你们这边是怎么回事,要求就是显存问题。#5 (comment)

可以看截图,跑起来的时候,显寸占用为24330MB

对windows没啥好感,刚双系统打开windows,弹窗问我是否创建GPT分区表,我点了一下确定,训练集LVM分区炸了得重新配置了,正好重试一下(

from zero_nlp.

Adherer avatar Adherer commented on May 19, 2024

已经有不少人跑出来了。不知道你们这边是怎么回事,要求就是显存问题。#5 (comment)

可以看截图,跑起来的时候,显寸占用为24330MB

相关问题已解决,模型裁剪即可,现可用13G+显存即可finetune:

image

from zero_nlp.

Adherer avatar Adherer commented on May 19, 2024

已经有不少人跑出来了。不知道你们这边是怎么回事,要求就是显存问题。#5 (comment)
可以看截图,跑起来的时候,显寸占用为24330MB

对windows没啥好感,刚双系统打开windows,弹窗问我是否创建GPT分区表,我点了一下确定,训练集LVM分区炸了得重新配置了,正好重试一下(

相关问题已解决,模型裁剪即可,现可用13G+显存即可finetune:

image

from zero_nlp.

zhaodice avatar zhaodice commented on May 19, 2024

已经有不少人跑出来了。不知道你们这边是怎么回事,要求就是显存问题。#5 (comment)
可以看截图,跑起来的时候,显寸占用为24330MB

对windows没啥好感,刚双系统打开windows,弹窗问我是否创建GPT分区表,我点了一下确定,训练集LVM分区炸了得重新配置了,正好重试一下(

相关问题已解决,模型裁剪即可,现可用13G+显存即可finetune:

image

怎么裁剪…咕

from zero_nlp.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.