
Comments (8)

shuxueslpi avatar shuxueslpi commented on September 11, 2024

Is this the first-generation model? What is your batch size?

from chatglm-6b-qlora.

valkryhx avatar valkryhx commented on September 11, 2024

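Change the script parameter from fp32 to fp16; that brings GPU memory usage down to 12.4G.
Then enable quantinize = int4 for SFT, and GPU memory usage drops to 6.5G.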
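
The figures above are close to what a back-of-the-envelope estimate gives. The following is a minimal sketch, assuming roughly 6.2 billion parameters for ChatGLM-6B (a number not stated in this thread) and counting the weights only. Activations, LoRA adapters, optimizer state, and CUDA overhead come on top, which is presumably why the int4 run reports 6.5G rather than about 3G.

```python
# Rough weight-only memory estimate per precision.
# ASSUMPTION: ~6.2e9 parameters for ChatGLM-6B; this is not stated in the thread.
N_PARAMS = 6.2e9

# Bytes needed to store one parameter at each precision.
BYTES_PER_PARAM = {"fp32": 4.0, "fp16": 2.0, "int4": 0.5}


def weight_memory_gb(n_params: float, dtype: str) -> float:
    """Return the memory taken by the weights alone, in GB (1 GB = 1e9 bytes)."""
    return n_params * BYTES_PER_PARAM[dtype] / 1e9


for dtype in ("fp32", "fp16", "int4"):
    print(f"{dtype}: {weight_memory_gb(N_PARAMS, dtype):.1f} GB")
```

Under that assumption the fp16 weights alone come to 12.4 GB, matching the number reported above.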


sxm7078 avatar sxm7078 commented on September 11, 2024

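> Change the script parameter from fp32 to fp16; that brings GPU memory usage down to 12.4G. Then enable quantinize = int4 for SFT, and GPU memory usage drops to 6.5G.

It is already fp16. Memory usage keeps fluctuating and sometimes reaches 30G.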


shuxueslpi avatar shuxueslpi commented on September 11, 2024

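@sxm7078 Did you modify the code? The code loads the model in int4 by default, so GPU memory usage should be very low.
When the second-generation model first came out, it did not implement activation checkpointing, which caused very high GPU memory usage. That was fixed later, so pulling the latest model should be enough.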
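
For readers unfamiliar with the two settings mentioned here, the following is a minimal sketch of 4-bit loading combined with gradient (activation) checkpointing using transformers and peft. It is not the repository's training script: the NF4 and double-quantization choices and the `load_4bit_model` helper are illustrative assumptions. The library imports are kept inside the function so the block can be read and imported on a machine without a GPU.

```python
# Keyword arguments for transformers.BitsAndBytesConfig.
# ASSUMPTION: NF4 with double quantization and fp16 compute, a common QLoRA setup.
QUANT_KWARGS = {
    "load_in_4bit": True,
    "bnb_4bit_quant_type": "nf4",
    "bnb_4bit_use_double_quant": True,
    "bnb_4bit_compute_dtype": "float16",  # string form is resolved to torch.float16
}


def load_4bit_model(model_name_or_path: str):
    """Load a model in 4-bit and prepare it for k-bit training (hypothetical helper)."""
    # Imported lazily: needs transformers, peft, bitsandbytes and a CUDA GPU.
    from peft import prepare_model_for_kbit_training
    from transformers import AutoModel, BitsAndBytesConfig

    model = AutoModel.from_pretrained(
        model_name_or_path,
        quantization_config=BitsAndBytesConfig(**QUANT_KWARGS),
        device_map="auto",
        trust_remote_code=True,  # ChatGLM ships its own modeling code
    )
    # Freezes the base weights and, with this flag, enables gradient checkpointing.
    return prepare_model_for_kbit_training(model, use_gradient_checkpointing=True)
```

If the quantization config is dropped or overridden, the weights load at their full precision, which is one way a run could end up far above the expected footprint.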


sxm7078 avatar sxm7078 commented on September 11, 2024

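> @sxm7078 Did you modify the code? The code loads the model in int4 by default, so GPU memory usage should be very low. When the second-generation model first came out, it did not implement activation checkpointing, which caused very high GPU memory usage. That was fixed later, so pulling the latest model should be enough.

I am fine-tuning the chatglm-6b model, with lora_rank changed to 8 and compute_dtype changed to fp16. Should transformers==4.30.2 and accelerate==0.20.3 be installed at these versions, or should I use the dev versions?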


shuxueslpi avatar shuxueslpi commented on September 11, 2024

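@sxm7078 Use this environment now: transformers==4.30.2 and accelerate==0.20.3. The dev versions are no longer needed. That said, I don't think your problem is caused by the dev versions.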
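
To confirm that an environment matches the pins quoted above, a small check with the standard library is enough. This is a sketch; the `check_pins` helper is a hypothetical name introduced for illustration.

```python
from importlib.metadata import PackageNotFoundError, version

# Version pins quoted in the thread.
PINNED = {"transformers": "4.30.2", "accelerate": "0.20.3"}


def check_pins(pins: dict) -> dict:
    """Map each package name to (installed version or None, whether it matches the pin)."""
    report = {}
    for name, wanted in pins.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None  # package is not installed in this environment
        report[name] = (installed, installed == wanted)
    return report


for name, (installed, ok) in check_pins(PINNED).items():
    print(f"{name}: installed={installed} pinned={PINNED[name]} match={ok}")
```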


oceanlc avatar oceanlc commented on September 11, 2024

Same here. I am using the author's default parameter configuration on a 4090 card, and it fails immediately with CUDA out of memory. Judging from the error message, the abnormal GPU memory usage seems to occur at the step param.data = param.data.to(torch.float32) inside the prepare_model_for_kbit_training function:
  File "/home/lc/anaconda3/envs/pytorch/lib/python3.11/site-packages/peft/utils/other.py", line 81, in prepare_model_for_kbit_training
    param.data = param.data.to(torch.float32)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 428.00 MiB (GPU 0; 23.65 GiB total capacity; 21.69 GiB already allocated; 96.00 MiB free; 22.62 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
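
The numbers in the message above can be pulled apart to see whether the fragmentation hint applies. The following is a minimal sketch; the `parse_oom` helper is hypothetical, and the interpretation afterwards is a reading of this one message, not a confirmed diagnosis.

```python
import re

# Allocator summary from the traceback above, copied verbatim.
OOM_MESSAGE = (
    "Tried to allocate 428.00 MiB (GPU 0; 23.65 GiB total capacity; "
    "21.69 GiB already allocated; 96.00 MiB free; "
    "22.62 GiB reserved in total by PyTorch)"
)

TO_MIB = {"MiB": 1.0, "GiB": 1024.0}


def parse_oom(message: str) -> dict:
    """Extract the sizes from a PyTorch OOM message, all converted to MiB."""
    patterns = {
        "requested": r"Tried to allocate ([\d.]+) (MiB|GiB)",
        "total": r"([\d.]+) (MiB|GiB) total capacity",
        "allocated": r"([\d.]+) (MiB|GiB) already allocated",
        "free": r"([\d.]+) (MiB|GiB) free",
        "reserved": r"([\d.]+) (MiB|GiB) reserved",
    }
    sizes = {}
    for key, pattern in patterns.items():
        value, unit = re.search(pattern, message).groups()
        sizes[key] = float(value) * TO_MIB[unit]
    return sizes


sizes = parse_oom(OOM_MESSAGE)
# Memory held by the caching allocator that is not backing live tensors.
cached_unused = sizes["reserved"] - sizes["allocated"]
print(f"requested={sizes['requested']:.0f} MiB, free={sizes['free']:.0f} MiB")
print(f"reserved - allocated = {cached_unused:.0f} MiB")
print(f"allocated / total = {sizes['allocated'] / sizes['total']:.0%}")
```

Here reserved memory exceeds allocated memory by less than 1 GiB, while live tensors already fill most of the card. That suggests the max_split_size_mb hint is unlikely to be the main fix, and that the model weights themselves are taking far more than an int4 load should.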


misener7 avatar misener7 commented on September 11, 2024

> Same here. I am using the author's default parameter configuration on a 4090 card, and it fails immediately with CUDA out of memory. Judging from the error message, the abnormal GPU memory usage seems to occur at the step param.data = param.data.to(torch.float32) inside the prepare_model_for_kbit_training function:
>   File "/home/lc/anaconda3/envs/pytorch/lib/python3.11/site-packages/peft/utils/other.py", line 81, in prepare_model_for_kbit_training
>     param.data = param.data.to(torch.float32)
> torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 428.00 MiB (GPU 0; 23.65 GiB total capacity; 21.69 GiB already allocated; 96.00 MiB free; 22.62 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

Has this problem been resolved?

