
Comments (16)

jamestwhedbee commented on August 27, 2024

That looked promising, but unfortunately I ran into another issue that you probably wouldn't have. I am on AMD, so that might be the cause; I can't find anything online related to this issue. I noticed that non-GPTQ int4 quantization does not work for me either, failing with the same error. int8 quantization works fine, and I have run GPTQ int4-quantized models with the auto-gptq library for ROCm before, so I am not sure what the issue is.

Traceback (most recent call last):
  File "/home/telnyxuser/gpt-fast/quantize.py", line 614, in <module>
    quantize(args.checkpoint_path, args.model_name, args.mode, args.groupsize, args.calibration_tasks, args.calibration_limit, args.calibration_seq_length, args.pad_calibration_inputs, args.percdamp, args.blocksize, args.label)
  File "/home/telnyxuser/gpt-fast/quantize.py", line 560, in quantize
    quantized_state_dict = quant_handler.create_quantized_state_dict()
  File "/home/telnyxuser/.local/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/telnyxuser/gpt-fast/quantize.py", line 423, in create_quantized_state_dict
    weight_int4pack, scales_and_zeros = prepare_int4_weight_and_scales_and_zeros(
  File "/home/telnyxuser/gpt-fast/quantize.py", line 358, in prepare_int4_weight_and_scales_and_zeros
    weight_int4pack = torch.ops.aten._convert_weight_to_int4pack(weight_int32, inner_k_tiles)
  File "/home/telnyxuser/.local/lib/python3.10/site-packages/torch/_ops.py", line 753, in __call__
    return self._op(*args, **kwargs or {})
RuntimeError: _convert_weight_to_int4pack_cuda is not available for build.
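A quick way to confirm which backend the installed PyTorch was built for, before digging further (a minimal sketch using standard torch attributes):

import torch

print(torch.__version__)    # full build string
print(torch.version.cuda)   # CUDA toolkit version, or None on ROCm builds
print(torch.version.hip)    # HIP version on ROCm builds, None on CUDA builds

On a ROCm build the int4 packing kernel may simply not be compiled in, which would match the "not available for build" message.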


chu-tianxiang commented on August 27, 2024

According to the code here, both CUDA 12.x and compute capability 8.0+ are probably required.
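A minimal sketch for checking both requirements on a given machine, using PyTorch's public API:

import torch

print(torch.version.cuda)                   # want 12.x
print(torch.cuda.get_device_capability(0))  # want (8, 0) or higher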


lopuhin commented on August 27, 2024

One more issue is very high memory usage: it exceeds 128 GB after processing only the first 9 layers of the 13B model.


jamestwhedbee commented on August 27, 2024

I am stuck at the third bullet point here as well; I am just going to follow along with the comments here.


lopuhin commented on August 27, 2024

@jamestwhedbee to get rid of those Python issues you can try this fork in the meantime: https://github.com/lopuhin/gpt-fast/ -- but I don't have a solution for the high RAM usage yet, so in the end I wasn't able to get a converted model.


lopuhin commented on August 27, 2024

I got the same error when trying a conversion on another machine with more RAM but an older NVIDIA GPU.


MrD005 commented on August 27, 2024

Has anyone solved all of these problems? I am hitting every problem discussed in this thread.


MrD005 commented on August 27, 2024

@jamestwhedbee @lopuhin I am stuck on this:

Traceback (most recent call last):
  File "quantize.py", line 614, in <module>
    quantize(args.checkpoint_path, args.model_name, args.mode, args.groupsize, args.calibration_tasks, args.calibration_limit, args.calibration_seq_length, args.pad_calibration_inputs, args.percdamp, args.blocksize, args.label)
  File "quantize.py", line 560, in quantize
    quantized_state_dict = quant_handler.create_quantized_state_dict()
  File "/root/development/dev/venv/lib/python3.8/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "quantize.py", line 423, in create_quantized_state_dict
    weight_int4pack, scales_and_zeros = prepare_int4_weight_and_scales_and_zeros(
  File "quantize.py", line 358, in prepare_int4_weight_and_scales_and_zeros
    weight_int4pack = torch.ops.aten._convert_weight_to_int4pack(weight_int32, inner_k_tiles)
  File "/root/development/dev/venv/lib/python3.8/site-packages/torch/_ops.py", line 753, in __call__
    return self._op(*args, **kwargs or {})
RuntimeError: _convert_weight_to_int4pack_cuda is not available for build.

Were you able to solve this?


lopuhin commented on August 27, 2024

RuntimeError: _convert_weight_to_int4pack_cuda is not available for build.

@MrD005 I got this error when trying to run on a 2080 Ti but not on an L4 (both using CUDA 12.1), so I suspect this is due to the function being missing on lower compute capabilities.
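(For reference, the 2080 Ti reports compute capability (7, 5) while the L4 reports (8, 9), which would be consistent with the 8.0+ requirement mentioned above.)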


MrD005 commented on August 27, 2024

@lopuhin I am running it on an A100 with Python 3.8 and the CUDA 11.8 nightly, so I don't think it is about lower compute capability.


briandw commented on August 27, 2024

I had the same _convert_weight_to_int4pack_cuda not available problem. It was due to CUDA 11.8 not supporting the operator. It works now with an RTX 4090 and CUDA 12.1.


xin-li-67 commented on August 27, 2024

I got this problem on my single RTX 4090 with the PyTorch nightly built against CUDA 11.8. After I switched to the PyTorch nightly built against CUDA 12.1, the problem was gone.
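For anyone else switching, the CUDA 12.1 nightly can be installed with a command along these lines (check the PyTorch site for the current one):

pip3 install --pre torch --index-url https://download.pytorch.org/whl/nightly/cu121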


lufixSch commented on August 27, 2024

@jamestwhedbee did you find a solution for ROCm?


jamestwhedbee commented on August 27, 2024

@lufixSch no, but as of last week vLLM v0.2.7 supports GPTQ with ROCm, and I am seeing pretty good results there. So maybe that is an option for you.
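For anyone curious, a minimal sketch of running a GPTQ model through vLLM's Python API (the model name here is just a placeholder):

from vllm import LLM, SamplingParams

# quantization="gptq" tells vLLM to load GPTQ-quantized weights
llm = LLM(model="TheBloke/Llama-2-13B-GPTQ", quantization="gptq")
outputs = llm.generate(["Hello, my name is"], SamplingParams(max_tokens=32))
print(outputs[0].outputs[0].text)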


ce1190222 commented on August 27, 2024

I applied all the fixes mentioned, but I'm still getting this error:

  File "/kaggle/working/quantize.py", line 14, in <module>
    from GPTQ import GenericGPTQRunner, InputRecorder
  File "/kaggle/working/GPTQ.py", line 12, in <module>
    from eval import setup_cache_padded_seq_input_pos_max_seq_length_for_prefill
  File "/kaggle/working/eval.py", line 20, in <module>
    import lm_eval.base
ModuleNotFoundError: No module named 'lm_eval.base'

I am using lm_eval 0.4.0
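The lm_eval.base module was removed in the 0.4.0 restructuring, so one workaround (assuming the rest of the script works with the older API) is to pin the previous release:

pip install lm-eval==0.3.0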


jerryzh168 commented on August 27, 2024

Support for lm_eval 0.3.0 and 0.4.0 was updated in eb1789b.
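A rough sketch of what supporting both layouts can look like (hypothetical; the actual change is in the commit above), given that the base model class moved under lm_eval.api in 0.4.x:

try:
    from lm_eval.base import BaseLM   # lm_eval 0.3.x layout
except ModuleNotFoundError:
    from lm_eval.api.model import LM  # lm_eval 0.4.x layout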

