Default config still runs out of memory on 24 GB of VRAM, about yuanzhoulvpi2017/zero_nlp

Comments (19)

yuanzhoulvpi2017 commented on May 19, 2024 1

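Latest progress update

Using torch.utils.checkpoint + LoRA, with fp16 and batch_size=1, VRAM usage drops to about 15 GB. I'm cleaning up the code and will release it later.
This should let a lot more cards run the training, and may even leave room for a larger batch size.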
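
For readers who want to see the idea before the code is released, here is a minimal sketch of combining activation checkpointing with a low-rank adapter. It is not the author's implementation: `LoRALinear`, `ToyModel`, and `lora_grads` are hypothetical names, the LoRA layer is hand-rolled for illustration rather than taken from a library, and the toy model runs on CPU. It only shows that wrapping each block in `torch.utils.checkpoint.checkpoint` gives the same gradients while not storing the block's intermediate activations.

```python
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint


class LoRALinear(nn.Module):
    """Frozen linear layer plus a trainable low-rank update (illustrative only)."""

    def __init__(self, dim: int, rank: int = 4, alpha: float = 8.0):
        super().__init__()
        self.base = nn.Linear(dim, dim)
        for p in self.base.parameters():
            p.requires_grad_(False)  # the "pretrained" weights stay frozen
        self.lora_a = nn.Parameter(torch.randn(rank, dim) * 0.01)
        self.lora_b = nn.Parameter(torch.zeros(dim, rank))  # zero init: no change at step 0
        self.scale = alpha / rank

    def forward(self, x):
        return self.base(x) + (x @ self.lora_a.T @ self.lora_b.T) * self.scale


class ToyModel(nn.Module):
    """A stack of LoRA blocks standing in for transformer layers (hypothetical)."""

    def __init__(self, dim: int = 64, depth: int = 4, use_checkpoint: bool = True):
        super().__init__()
        self.blocks = nn.ModuleList(
            [nn.Sequential(LoRALinear(dim), nn.GELU()) for _ in range(depth)]
        )
        self.use_checkpoint = use_checkpoint

    def forward(self, x):
        for block in self.blocks:
            if self.use_checkpoint:
                # Drop this block's intermediate activations after the forward
                # pass and recompute them during backward.
                x = checkpoint(block, x, use_reentrant=False)
            else:
                x = block(x)
        return x


def lora_grads(use_checkpoint: bool):
    """Run one forward/backward pass and return gradients of the trainable parameters."""
    torch.manual_seed(0)
    model = ToyModel(use_checkpoint=use_checkpoint)
    x = torch.randn(2, 8, 64)
    model(x).sum().backward()
    return [p.grad.clone() for p in model.parameters() if p.requires_grad]


g_ckpt = lora_grads(use_checkpoint=True)
g_plain = lora_grads(use_checkpoint=False)
print(all(torch.allclose(a, b) for a, b in zip(g_ckpt, g_plain)))
```

The trade-off is extra compute in the backward pass in exchange for lower peak activation memory. Freezing the base weights means gradients and optimizer state are only kept for the small adapter matrices.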

from zero_nlp.

yuanzhoulvpi2017 commented on May 19, 2024

Thanks for reporting this. Try setting every batch_size to 1 and context_length=32. I'll test the other cases myself.

from zero_nlp.

zhaodice commented on May 19, 2024

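Thanks for reporting this. Try setting every batch_size to 1 and context_length=32. I'll test the other cases myself.

It still runs out of memory after those changes. The GPU is an RTX 4090 24GB. My config is below (if it helps, I can give you SSH access to investigate):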

context_length = 32

args = TrainingArguments(
    output_dir="test003",
    per_device_train_batch_size=1,
    per_device_eval_batch_size=1,
    evaluation_strategy="steps",
    eval_steps=100,
    logging_steps=100,
    gradient_accumulation_steps=8,
    num_train_epochs=1,
    weight_decay=0.1,
    warmup_steps=1_000,
    lr_scheduler_type="cosine",
    learning_rate=5e-4,
    save_steps=100,
    fp16=True,
    push_to_hub=False,
)

    Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 294.00 MiB (GPU 0; 23.65 GiB total capacity; 21.84 GiB already allocated; 152.56 MiB free; 22.11 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
  0%|                                                | 0/595 [00:01<?, ?it/s]
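
The message above ends with a hint about `max_split_size_mb`, which only applies when reserved memory is much larger than allocated memory. A small sketch that pulls the numbers out of the message makes it easy to check whether that is the case here. `parse_oom` and `OOM_MESSAGE` are hypothetical helper names, and the regular expressions assume the message wording shown in this thread.

```python
import re

OOM_MESSAGE = (
    "CUDA out of memory. Tried to allocate 294.00 MiB (GPU 0; 23.65 GiB total capacity; "
    "21.84 GiB already allocated; 152.56 MiB free; 22.11 GiB reserved in total by PyTorch)"
)

_UNIT_TO_GIB = {"MiB": 1 / 1024, "GiB": 1.0}


def parse_oom(message: str) -> dict:
    """Extract the memory figures from a PyTorch CUDA OOM message, converted to GiB."""
    patterns = {
        "requested": r"Tried to allocate ([\d.]+) (MiB|GiB)",
        "total": r"([\d.]+) (MiB|GiB) total capacity",
        "allocated": r"([\d.]+) (MiB|GiB) already allocated",
        "free": r"([\d.]+) (MiB|GiB) free",
        "reserved": r"([\d.]+) (MiB|GiB) reserved",
    }
    figures = {}
    for name, pattern in patterns.items():
        match = re.search(pattern, message)
        if match is None:
            raise ValueError(f"could not find {name!r} in message")
        figures[name] = float(match.group(1)) * _UNIT_TO_GIB[match.group(2)]
    return figures


figures = parse_oom(OOM_MESSAGE)
# Memory PyTorch holds in its caching allocator that is not backing live tensors.
cached_but_unused = figures["reserved"] - figures["allocated"]
print(f"allocated by live tensors: {figures['allocated']:.2f} GiB")
print(f"reserved but unused:       {cached_but_unused:.2f} GiB")
print(f"requested:                 {figures['requested']:.2f} GiB")
```

In this log, reserved and allocated memory are nearly equal, so almost all of the card is occupied by live tensors. That suggests fragmentation is probably not the cause, and allocator tuning is unlikely to help much compared with reducing what the training run actually has to hold.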

from zero_nlp.

zhaodice commented on May 19, 2024

I checked and my torch was 2.0, so I tried switching to 1.13 (it still runs out of memory):

    Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 294.00 MiB (GPU 0; 23.65 GiB total capacity; 21.83 GiB already allocated; 52.56 MiB free; 22.10 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
  0%|                                                                          | 0/595 [00:01<?, ?it/s]
(venv) user@calculator:~/ext/zero_nlp/simple_thu_chatglm6b$ pip show torch
Name: torch
Version: 1.13.0
Summary: Tensors and Dynamic neural networks in Python with strong GPU acceleration
Home-page: https://pytorch.org/
Author: PyTorch Team
Author-email: [email protected]
License: BSD-3
Location: /home/user/ext/zero_nlp/venv/lib/python3.10/site-packages
Requires: nvidia-cublas-cu11, nvidia-cuda-nvrtc-cu11, nvidia-cuda-runtime-cu11, nvidia-cudnn-cu11, typing-extensions
Required-by: accelerate, peft, pytorch-lightning, torchmetrics, torchvision, triton
(venv) user@calculator:~/ext/zero_nlp/simple_thu_chatglm6b$

from zero_nlp.

zhaodice commented on May 19, 2024

The INT4-quantized model uses far less VRAM. Is there any way to fine-tune the INT4 model directly?

from zero_nlp.

yuanzhoulvpi2017 commented on May 19, 2024

See this configuration: #5 (comment)

from zero_nlp.

Adherer commented on May 19, 2024

torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 148.00 MiB (GPU 0; 22.38 GiB total capacity; 21.49 GiB already allocated; 87.94 MiB free; 21.52 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
I'm running on a P40 with 22 GB of VRAM and hit the same problem, with context_length set to 32.

from zero_nlp.

yuanzhoulvpi2017 commented on May 19, 2024

22 GB of VRAM is not enough.

from zero_nlp.

zhaodice commented on May 19, 2024

See this configuration: #5 (comment)

I tried it; it made no difference.

from zero_nlp.

Adherer commented on May 19, 2024

22 GB of VRAM is not enough.

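I looked at some other repos: with 8-bit quantization, fine-tuning fits in 16 GB of VRAM, and there is currently no version that supports multi-GPU fine-tuning. So, are the following two optimizations on the roadmap?

8-bit/4-bit fine-tuning;
single-node multi-GPU or multi-node multi-GPU training.
If you have plans for either, I'd be happy to collaborate.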
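
As a rough back-of-the-envelope check on why quantization matters here, the sketch below estimates the memory taken by the model weights alone at different precisions. The parameter count of about 6.2 billion for ChatGLM-6B is an approximation, and `weight_memory_gib` is a hypothetical helper. Activations, gradients, optimizer state, and framework overhead come on top and are not modeled.

```python
# Bytes needed to store one parameter at each precision.
BYTES_PER_PARAM = {"fp32": 4.0, "fp16": 2.0, "int8": 1.0, "int4": 0.5}
GIB = 1024 ** 3

# Assumption: approximate parameter count of ChatGLM-6B.
N_PARAMS = 6.2e9


def weight_memory_gib(n_params: float, dtype: str) -> float:
    """Memory needed to hold the weights only, in GiB."""
    return n_params * BYTES_PER_PARAM[dtype] / GIB


for dtype in ("fp32", "fp16", "int8", "int4"):
    print(f"{dtype}: {weight_memory_gib(N_PARAMS, dtype):5.1f} GiB (weights only)")
```

Under these assumptions the fp16 weights already take roughly half of a 24 GB card before any training state is allocated, which is consistent with the reports in this thread that the default configuration sits right at the limit.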

from zero_nlp.

yuanzhoulvpi2017 commented on May 19, 2024

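I'm currently working in two directions:

Using torch.utils.checkpoint to reduce VRAM pressure.
Single-node multi-GPU training.

I've spent a day on it with little progress so far 😂. Still working on it.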

from zero_nlp.

Adherer commented on May 19, 2024

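I'm currently working in two directions:

Using torch.utils.checkpoint to reduce VRAM pressure.

Single-node multi-GPU training.

I've spent a day on it with little progress so far 😂. Still working on it.

You could take a look at this code: https://github.com/mymusise/ChatGLM-Tuning
I got it running and it's training now. When I have time tomorrow I'll adapt it for training on Chinese data.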

from zero_nlp.

zhaodice commented on May 19, 2024

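I'm currently working in two directions:

Using torch.utils.checkpoint to reduce VRAM pressure.

Single-node multi-GPU training.

I've spent a day on it with little progress so far 😂. Still working on it.

You could take a look at this code: https://github.com/mymusise/ChatGLM-Tuning I got it running and it's training now. When I have time tomorrow I'll adapt it for training on Chinese data.

I got that one running too, but I'm not sure whether I'm doing something wrong: the training results aren't good, and the model seems to be talking nonsense.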

from zero_nlp.

yuanzhoulvpi2017 commented on May 19, 2024

I'm not considering 8-bit for now, because it requires installing bitsandbytes, and that package can't detect CUDA (on my machine); I can't be bothered to sort that out. I've looked at that repo, and it's excellent. But my goal is to do the training with the full Hugging Face stack, and that repo wraps its code in too many layers for my taste.

from zero_nlp.

yuanzhoulvpi2017 commented on May 19, 2024

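Quite a few people have already gotten it running. I'm not sure what's going on with your setups; the only requirement is enough VRAM. #5 (comment)

As the screenshot shows, VRAM usage while it's running is 24330 MB.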

from zero_nlp.

zhaodice commented on May 19, 2024

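Quite a few people have already gotten it running. I'm not sure what's going on with your setups; the only requirement is enough VRAM. #5 (comment)

As the screenshot shows, VRAM usage while it's running is 24330 MB.

I'm not fond of Windows. I just booted into Windows on my dual-boot machine, a dialog asked whether to create a GPT partition table, I clicked OK, and the LVM partition holding my training set was wiped, so I have to set everything up again. A good chance to retry, at least.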

from zero_nlp.

Adherer commented on May 19, 2024

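Quite a few people have already gotten it running. I'm not sure what's going on with your setups; the only requirement is enough VRAM. #5 (comment)

As the screenshot shows, VRAM usage while it's running is 24330 MB.

The problem is solved: pruning the model is enough, and fine-tuning now works with 13 GB+ of VRAM: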

from zero_nlp.

Adherer commented on May 19, 2024

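Quite a few people have already gotten it running. I'm not sure what's going on with your setups; the only requirement is enough VRAM. #5 (comment)
As the screenshot shows, VRAM usage while it's running is 24330 MB.

I'm not fond of Windows. I just booted into Windows on my dual-boot machine, a dialog asked whether to create a GPT partition table, I clicked OK, and the LVM partition holding my training set was wiped, so I have to set everything up again. A good chance to retry, at least.

The problem is solved: pruning the model is enough, and fine-tuning now works with 13 GB+ of VRAM: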

from zero_nlp.

zhaodice commented on May 19, 2024

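Quite a few people have already gotten it running. I'm not sure what's going on with your setups; the only requirement is enough VRAM. #5 (comment)
As the screenshot shows, VRAM usage while it's running is 24330 MB.

I'm not fond of Windows. I just booted into Windows on my dual-boot machine, a dialog asked whether to create a GPT partition table, I clicked OK, and the LVM partition holding my training set was wiped, so I have to set everything up again. A good chance to retry, at least.

The problem is solved: pruning the model is enough, and fine-tuning now works with 13 GB+ of VRAM:

How do you prune it...?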

from zero_nlp.

Default config still runs out of memory on 24 GB of VRAM, about zero_nlp HOT 19 CLOSED
