
ist-daslab / gptq

Code for the ICLR 2023 paper "GPTQ: Accurate Post-training Quantization of Generative Pretrained Transformers".

Home Page: https://arxiv.org/abs/2210.17323

License: Apache License 2.0

Languages: Python 95.72%, C++ 0.51%, CUDA 3.77%

gptq's People

Contributors

bofeng2477, dalistarh, efrantar, sashkboos, xiuyu-li

gptq's Issues

Reproduction of the results in the paper

@efrantar
Following the instructions in README.md, baseline and RTN perplexities match exactly as listed in Tables 2-3 in the paper.
However, GPTQ perplexity does not.

Is this due to differences in the calibration samples? Or are the results in the tables statistics over multiple runs with different random seeds?
Could you share the command that reproduces the results in the paper?

Much appreciated!

H_inv not updated

After each quantization step, H_inv should be updated, but in the fasterquant code H_inv is never updated. Is this a bug?
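
For reference, the paper's Cholesky reformulation is what removes the per-step update: the rows of the Cholesky factor of the initial H^-1 already contain the information that the sequential OBQ-style updates would recompute. A small numerical check of this identity (a sketch only; a random SPD matrix stands in for the layer Hessian):

import torch

torch.manual_seed(0)
n = 8
X = torch.randn(32, n, dtype=torch.float64)
H = X.T @ X + 1e-3 * torch.eye(n, dtype=torch.float64)   # SPD stand-in for the layer Hessian

Hinv = torch.linalg.inv(H)
U = torch.linalg.cholesky(Hinv, upper=True)              # computed once, never updated

# OBQ-style reference: eliminate one column at a time and read off the
# inverse-Hessian row that the weight-update rule would need at each step.
Hinv_seq = Hinv.clone()
for q in range(n):
    row = Hinv_seq[0, :] / Hinv_seq[0, 0].sqrt()
    assert torch.allclose(row, U[q, q:], atol=1e-8)      # same information as the fixed factor
    # Schur-complement update that the OBQ formulation would apply explicitly
    Hinv_seq = (Hinv_seq - torch.outer(Hinv_seq[:, 0], Hinv_seq[0, :]) / Hinv_seq[0, 0])[1:, 1:]
print("rows of the fixed Cholesky factor match the sequentially updated H_inv")

Since every row the updates would produce can be read directly from U, the code never needs to touch H_inv after the initial decomposition.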

How to run on multiple GPUs?

I'm trying to run opt-30b on 4x 2080 Ti GPUs; however, the following error message appears when loading parameters.

Starting ...
Ready.
Traceback (most recent call last):
  File "opt.py", line 424, in <module>
    quantizers = opt_sequential(model, dataloader, DEV)
  File "/home/cciip/miniconda3/envs/int/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 28, in decorate_context
    return func(*args, **kwargs)
  File "opt.py", line 83, in opt_sequential
    gptq[name] = GPTQ(subset[name])
  File "/home/cciip/private/tianjie/gptq/gptq.py", line 29, in __init__
    self.H = torch.zeros((self.columns, self.columns), device=self.dev)
RuntimeError: CUDA out of memory. Tried to allocate 196.00 MiB (GPU 0; 10.75 GiB total capacity; 9.30 GiB already allocated; 77.62 MiB free; 9.32 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

How can I make it work?

Regarding the method for computing the Hessian matrix.

I would like to ask about line 61 in your gptq.py file: inp = math.sqrt(2 / self.nsamples) * inp.float(). According to the paper, it seems it should instead be: inp = math.sqrt(tmp / self.nsamples) * inp.float(). After making this modification, I noticed a reduction in quantization error. Could you please verify whether my understanding is correct, or whether there is a misunderstanding on my part?
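
For context, here is a small sketch (not the repo verbatim) of what an add_batch-style running update accumulates; the sqrt(2 / nsamples) factor, combined with the rescaling of the previously accumulated H, makes the stream converge to H = (2/N) * sum_i x_i x_i^T over all N calibration rows:

import math
import torch

torch.manual_seed(0)
cols, N, batch = 16, 64, 8
X = torch.randn(N, cols, dtype=torch.float64)

H = torch.zeros(cols, cols, dtype=torch.float64)
nsamples = 0
for i in range(0, N, batch):
    inp = X[i:i + batch].T                    # (cols, batch), like the input of a Linear layer
    tmp = inp.shape[1]
    H *= nsamples / (nsamples + tmp)          # rescale what was accumulated so far
    nsamples += tmp
    inp = math.sqrt(2 / nsamples) * inp
    H += inp @ inp.T

H_ref = 2.0 / N * X.T @ X                     # (2/N) * sum_i x_i x_i^T
print(torch.allclose(H, H_ref, atol=1e-10))   # True

Whether the constant should be 2 or tmp is exactly the question above; the sketch only shows what the current code computes.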

Can --save work with --groupsize in opt.py?

Hello there, nice work!

If I understand correctly, when groupsize is set above 0, the quantizer in the gptq module of opt.py is only responsible for one group at a time. opt_pack3 relies on the quantizer's pack function, which only holds the zeros and scales of the last group when groupsize is set above 0.

So can --save and --groupsize work together in opt.py right now?

Why are PPL so low on PTB?

Hello

Many thanks for your work, it's great to (finally) see results reported on openly available LLMs 😊
However, I was surprised when I saw perplexities on PTB for OPT and BLOOM models: 10.33 and 13.63 respectively.
Indeed, the GPT-3 paper reports a PPL of 20.50 on this dataset, and I was wondering whether you have any explanation for this (nearly 2x) difference?

Thanks!

How to adopt GPTQ on Conv2d with `groups` attribute?

Hi,

Thanks for your impressive work! It really helps me quantize lots of large models.
Recently, I tried to implement GPTQ on grouped Conv2d layers, but the results do not seem good.
Could you provide some hints to support GPTQ on grouped Conv2d?

Here is my rough implementation for now (a sketch of step 1 follows after the list):

  1. In the add_batch function, divide inp into the different groups and store a Hessian for each group.
  2. In the fasterquant function, divide W into the different groups and apply GPTQ to each chunk of W with its corresponding Hessian.
  3. Concatenate the per-group Q back into the full Q.
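
As mentioned in step 1, a rough sketch of how the per-group Hessian accumulation could look (an assumption about the scheme, not code from this repo; Hs is a hypothetical list with one Hessian per group):

import torch

def add_batch_grouped(Hs, inp, layer):
    # inp: (batch, in_channels, H, W) input of a grouped Conv2d `layer`
    unfold = torch.nn.Unfold(layer.kernel_size, dilation=layer.dilation,
                             padding=layer.padding, stride=layer.stride)
    per_group = layer.in_channels // layer.groups
    for g in range(layer.groups):
        x = inp[:, g * per_group:(g + 1) * per_group]     # channels belonging to group g
        cols = unfold(x)                                  # (batch, per_group*kH*kW, L)
        cols = cols.permute(1, 0, 2).flatten(1)           # (rows of W_g, total samples)
        Hs[g] += cols @ cols.T                            # unnormalized Hessian for group g
    return Hs

fasterquant would then loop over the groups, running the usual per-column procedure on each W_g with its Hs[g].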

Thank you in advance.

opt_eval error

After quantizing opt-125m and saving the quantized model, when I use opt_eval I get an error: Only supports a single token currently.

qweight is empty when I gave --save option

As I want to obtain the quantized model through the GPTQ algorithm, I passed the --save option when running the Python script.

However, the qweight of each layer is empty because of the pack function in the Quant3Linear class (quant.py).
I think the while loop (line 147 ~ line 170) is not executed, so qweight is just an empty ndarray.

If I comment out the while loop, I can get the qweight.
What is the role of the while loop? Can I just comment it out and run the transformers?
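
For reference, a minimal sketch (an illustration, not the repo's exact code) of what 3-bit packing into 32-bit words involves; a loop of this kind is what actually fills qweight, and the awkward case it handles is a value straddling a word boundary:

import numpy as np

def pack3(intweight):
    # intweight: 1-D sequence of 3-bit values in [0, 7]
    nbits = 3
    values = [int(v) for v in intweight]
    packed = np.zeros((nbits * len(values) + 31) // 32, dtype=np.uint32)
    bit = 0
    for v in values:
        word, offset = divmod(bit, 32)
        packed[word] |= np.uint32((v << offset) & 0xFFFFFFFF)
        if offset > 32 - nbits:                       # value straddles a word boundary
            packed[word + 1] |= np.uint32(v >> (32 - offset))
        bit += nbits
    return packed

For example, pack3(np.arange(32) % 8) packs 32 values (96 bits) into exactly 3 words. If a packing loop like this never runs, qweight naturally stays empty.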

Feature Request: Add Saving Quantized Weights Functionality to bloom.py

Description:

Hi there,

I noticed that the opt.py file in the repository provides a method for saving quantized weights, but this functionality is not available in the bloom.py file. I was wondering if it would be possible to add this feature to bloom.py as well.

Being able to save quantized weights is a really useful feature for optimizing the size of models, and it would be great to have this functionality available in all relevant files in the repository.

If this feature could be added to bloom.py, I think it would be a really helpful addition for anyone who is working with this file.

Thank you for your time and consideration.

Best regards,

GPTQ for BERT

I'm looking for the GPTQ implementation for BERT; why isn't it in the repository? I want to try a 4-bit implementation for a speed comparison, and to try other models as well.

running speed slow on NVIDIA vGPU

I tested Qwen-7B GPTQ quantization on a vGPU with half the performance of an A10.

  • Driver Version:470.161.03
  • CUDA Version: 11.4

I have noticed that both the context-processing speed and the decoding speed are particularly slow:

  • context(500 tokens) processing speed: 48 tokens/s
  • decode speed: 1.6 token/s

Then I tested other models, such as https://huggingface.co/ClueAI/ChatYuan-large-v2, and their speed is within expectations. So I guess GPTQ does not work well on a vGPU?

The code is nothing special; it looks like:

from auto_gptq import AutoGPTQForCausalLM
model = AutoGPTQForCausalLM.from_quantized(quantized_model_dir, device="cuda:0")
...

Compatibility of Quant3Linear and 4-bit quantization

Hi! I've noticed that the quantization layer packs the quantized weights using the Quant3Linear class, as shown in the screenshot below:
[screenshot of the packing code in Quant3Linear]

However, it seems to me that it only suits 2-bit and 3-bit weights. If the original weights in intweight are 4-bit, some bits would be lost.

Could you explain the logic behind this? Thanks!
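
By contrast, 4-bit values pack exactly eight per 32-bit word, so no value ever straddles a boundary; a minimal sketch for illustration (an assumption, not the repo's code):

import numpy as np

def pack4(intweight):
    # intweight: 1-D array of 4-bit values in [0, 15], length a multiple of 8
    vals = np.asarray(intweight, dtype=np.uint64).reshape(-1, 8)
    shifts = np.arange(8, dtype=np.uint64) * 4
    return (vals << shifts).sum(axis=1).astype(np.uint32)

So a 3-bit-specific packing routine would indeed need to be adapted before it can store 4-bit weights without losing bits.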

quantized GPTJ - error on inference

Hi there, I'm trying to quantize a finetuned version of GPT-J through the https://github.com/AlpinDale/gptq-gptj repo.

To quantize the model I use this command:

CUDA_VISIBLE_DEVICES=0 python gptj.py ../finetuned6B/checkpoint-3000/ c4 --wbits 4 --save GPTJQ.pt

The process completes successfully and the file GPTJQ.pt is produced. The only warning I get is:

Token indices sequence length is longer than the specified maximum sequence length for this model (3403 > 2048). Running this sequence through the model will result in indexing errors.

When I run inference through this command:

CUDA_VISIBLE_DEVICES=0 python gptj-inference.py EleutherAI/gpt-j-6b --wbits 4 --load GPTJQ.pt --text "Hello"

I get the following error. What am I doing wrong?

Thank you very much for any help!

The error:

CUDA extension not installed.
Loading model ...
Traceback (most recent call last):
File "gptj-inference.py", line 120, in
model = load_quant(args.model, args.load, args.wbits)
File "gptj-inference.py", line 55, in load_quant
model.load_state_dict(torch.load(checkpoint))
File "/home/gianmarco/miniconda3/envs/gpt_env/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1671, in load_state_dict
raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for GPTJForCausalLM:
Missing key(s) in state_dict: "transformer.h.0.attn.k_proj.qzeros", "transformer.h.0.attn.k_proj.scales", "transformer.h.0.attn.k_proj.bias", "transformer.h.0.attn.k_proj.qweight", "transformer.h.0.attn.v_proj.qzeros", "transformer.h.0.attn.v_proj.scales", "transformer.h.0.attn.v_proj.bias", "transformer.h.0.attn.v_proj.qweight", "transformer.h.0.attn.q_proj.qzeros", "transformer.h.0.attn.q_proj.scales", "transformer.h.0.attn.q_proj.bias", "transformer.h.0.attn.q_proj.qweight", "transformer.h.0.attn.out_proj.qzeros", "transformer.h.0.attn.out_proj.scales", "transformer.h.0.attn.out_proj.bias", "transformer.h.0.attn.out_proj.qweight", "transformer.h.0.mlp.fc_in.qzeros", "transformer.h.0.mlp.fc_in.scales", "transformer.h.0.mlp.fc_in.qweight", "transformer.h.0.mlp.fc_out.qzeros", "transformer.h.0.mlp.fc_out.scales", "transformer.h.0.mlp.fc_out.qweight", "transformer.h.1.attn.k_proj.qzeros", "transformer.h.1.attn.k_proj.scales", "transformer.h.1.attn.k_proj.bias", "transformer.h.1.attn.k_proj.qweight", "transformer.h.1.attn.v_proj.qzeros", "transformer.h.1.attn.v_proj.scales", "transformer.h.1.attn.v_proj.bias", "transformer.h.1.attn.v_proj.qweight", "transformer.h.1.attn.q_proj.qzeros", "transformer.h.1.attn.q_proj.scales", "transformer.h.1.attn.q_proj.bias", "transformer.h.1.attn.q_proj.qweight", "transformer.h.1.attn.out_proj.qzeros", "transformer.h.1.attn.out_proj.scales", "transformer.h.1.attn.out_proj.bias", "transformer.h.1.attn.out_proj.qweight", "transformer.h.1.mlp.fc_in.qzeros", "transformer.h.1.mlp.fc_in.scales", "transformer.h.1.mlp.fc_in.qweight", "transformer.h.1.mlp.fc_out.qzeros", "transformer.h.1.mlp.fc_out.scales", "transformer.h.1.mlp.fc_out.qweight", "transformer.h.2.attn.k_proj.qzeros", "transformer.h.2.attn.k_proj.scales", "transformer.h.2.attn.k_proj.bias", "transformer.h.2.attn.k_proj.qweight", "transformer.h.2.attn.v_proj.qzeros", "transformer.h.2.attn.v_proj.scales", "transformer.h.2.attn.v_proj.bias", "transformer.h.2.attn.v_proj.qweight", "transformer.h.2.attn.q_proj.qzeros", "transformer.h.2.attn.q_proj.scales", "transformer.h.2.attn.q_proj.bias", "transformer.h.2.attn.q_proj.qweight", "transformer.h.2.attn.out_proj.qzeros", "transformer.h.2.attn.out_proj.scales", "transformer.h.2.attn.out_proj.bias", "transformer.h.2.attn.out_proj.qweight", "transformer.h.2.mlp.fc_in.qzeros", "transformer.h.2.mlp.fc_in.scales", "transformer.h.2.mlp.fc_in.qweight", "transformer.h.2.mlp.fc_out.qzeros", "transformer.h.2.mlp.fc_out.scales", "transformer.h.2.mlp.fc_out.qweight", "transformer.h.3.attn.k_proj.qzeros", "transformer.h.3.attn.k_proj.scales", "transformer.h.3.attn.k_proj.bias", "transformer.h.3.attn.k_proj.qweight", "transformer.h.3.attn.v_proj.qzeros", "transformer.h.3.attn.v_proj.scales", "transformer.h.3.attn.v_proj.bias", "transformer.h.3.attn.v_proj.qweight", "transformer.h.3.attn.q_proj.qzeros", "transformer.h.3.attn.q_proj.scales", "transformer.h.3.attn.q_proj.bias", "transformer.h.3.attn.q_proj.qweight", "transformer.h.3.attn.out_proj.qzeros", "transformer.h.3.attn.out_proj.scales", "transformer.h.3.attn.out_proj.bias", "transformer.h.3.attn.out_proj.qweight", "transformer.h.3.mlp.fc_in.qzeros", "transformer.h.3.mlp.fc_in.scales", "transformer.h.3.mlp.fc_in.qweight", "transformer.h.3.mlp.fc_out.qzeros", "transformer.h.3.mlp.fc_out.scales", "transformer.h.3.mlp.fc_out.qweight", "transformer.h.4.attn.k_proj.qzeros", "transformer.h.4.attn.k_proj.scales", "transformer.h.4.attn.k_proj.bias", "transformer.h.4.attn.k_proj.qweight", "transformer.h.4.attn.v_proj.qzeros", 
"transformer.h.4.attn.v_proj.scales", "transformer.h.4.attn.v_proj.bias", "transformer.h.4.attn.v_proj.qweight", "transformer.h.4.attn.q_proj.qzeros", "transformer.h.4.attn.q_proj.scales", "transformer.h.4.attn.q_proj.bias", "transformer.h.4.attn.q_proj.qweight", "transformer.h.4.attn.out_proj.qzeros", "transformer.h.4.attn.out_proj.scales", "transformer.h.4.attn.out_proj.bias", "transformer.h.4.attn.out_proj.qweight", "transformer.h.4.mlp.fc_in.qzeros", "transformer.h.4.mlp.fc_in.scales", "transformer.h.4.mlp.fc_in.qweight", "transformer.h.4.mlp.fc_out.qzeros", "transformer.h.4.mlp.fc_out.scales", "transformer.h.4.mlp.fc_out.qweight", "transformer.h.5.attn.k_proj.qzeros", "transformer.h.5.attn.k_proj.scales", "transformer.h.5.attn.k_proj.bias", "transformer.h.5.attn.k_proj.qweight", "transformer.h.5.attn.v_proj.qzeros", "transformer.h.5.attn.v_proj.scales", "transformer.h.5.attn.v_proj.bias", "transformer.h.5.attn.v_proj.qweight", "transformer.h.5.attn.q_proj.qzeros", "transformer.h.5.attn.q_proj.scales", "transformer.h.5.attn.q_proj.bias", "transformer.h.5.attn.q_proj.qweight", "transformer.h.5.attn.out_proj.qzeros", "transformer.h.5.attn.out_proj.scales", "transformer.h.5.attn.out_proj.bias", "transformer.h.5.attn.out_proj.qweight", "transformer.h.5.mlp.fc_in.qzeros", "transformer.h.5.mlp.fc_in.scales", "transformer.h.5.mlp.fc_in.qweight", "transformer.h.5.mlp.fc_out.qzeros", "transformer.h.5.mlp.fc_out.scales", "transformer.h.5.mlp.fc_out.qweight", "transformer.h.6.attn.k_proj.qzeros", "transformer.h.6.attn.k_proj.scales", "transformer.h.6.attn.k_proj.bias", "transformer.h.6.attn.k_proj.qweight", "transformer.h.6.attn.v_proj.qzeros", "transformer.h.6.attn.v_proj.scales", "transformer.h.6.attn.v_proj.bias", "transformer.h.6.attn.v_proj.qweight", "transformer.h.6.attn.q_proj.qzeros", "transformer.h.6.attn.q_proj.scales", "transformer.h.6.attn.q_proj.bias", "transformer.h.6.attn.q_proj.qweight", "transformer.h.6.attn.out_proj.qzeros", "transformer.h.6.attn.out_proj.scales", "transformer.h.6.attn.out_proj.bias", "transformer.h.6.attn.out_proj.qweight", "transformer.h.6.mlp.fc_in.qzeros", "transformer.h.6.mlp.fc_in.scales", "transformer.h.6.mlp.fc_in.qweight", "transformer.h.6.mlp.fc_out.qzeros", "transformer.h.6.mlp.fc_out.scales", "transformer.h.6.mlp.fc_out.qweight", "transformer.h.7.attn.k_proj.qzeros", "transformer.h.7.attn.k_proj.scales", "transformer.h.7.attn.k_proj.bias", "transformer.h.7.attn.k_proj.qweight", "transformer.h.7.attn.v_proj.qzeros", "transformer.h.7.attn.v_proj.scales", "transformer.h.7.attn.v_proj.bias", "transformer.h.7.attn.v_proj.qweight", "transformer.h.7.attn.q_proj.qzeros", "transformer.h.7.attn.q_proj.scales", "transformer.h.7.attn.q_proj.bias", "transformer.h.7.attn.q_proj.qweight", "transformer.h.7.attn.out_proj.qzeros", "transformer.h.7.attn.out_proj.scales", "transformer.h.7.attn.out_proj.bias", "transformer.h.7.attn.out_proj.qweight", "transformer.h.7.mlp.fc_in.qzeros", "transformer.h.7.mlp.fc_in.scales", "transformer.h.7.mlp.fc_in.qweight", "transformer.h.7.mlp.fc_out.qzeros", "transformer.h.7.mlp.fc_out.scales", "transformer.h.7.mlp.fc_out.qweight", "transformer.h.8.attn.k_proj.qzeros", "transformer.h.8.attn.k_proj.scales", "transformer.h.8.attn.k_proj.bias", "transformer.h.8.attn.k_proj.qweight", "transformer.h.8.attn.v_proj.qzeros", "transformer.h.8.attn.v_proj.scales", "transformer.h.8.attn.v_proj.bias", "transformer.h.8.attn.v_proj.qweight", "transformer.h.8.attn.q_proj.qzeros", "transformer.h.8.attn.q_proj.scales", 
"transformer.h.8.attn.q_proj.bias", "transformer.h.8.attn.q_proj.qweight", "transformer.h.8.attn.out_proj.qzeros", "transformer.h.8.attn.out_proj.scales", "transformer.h.8.attn.out_proj.bias", "transformer.h.8.attn.out_proj.qweight", "transformer.h.8.mlp.fc_in.qzeros", "transformer.h.8.mlp.fc_in.scales", "transformer.h.8.mlp.fc_in.qweight", "transformer.h.8.mlp.fc_out.qzeros", "transformer.h.8.mlp.fc_out.scales", "transformer.h.8.mlp.fc_out.qweight", "transformer.h.9.attn.k_proj.qzeros", "transformer.h.9.attn.k_proj.scales", "transformer.h.9.attn.k_proj.bias", "transformer.h.9.attn.k_proj.qweight", "transformer.h.9.attn.v_proj.qzeros", "transformer.h.9.attn.v_proj.scales", "transformer.h.9.attn.v_proj.bias", "transformer.h.9.attn.v_proj.qweight", "transformer.h.9.attn.q_proj.qzeros", "transformer.h.9.attn.q_proj.scales", "transformer.h.9.attn.q_proj.bias", "transformer.h.9.attn.q_proj.qweight", "transformer.h.9.attn.out_proj.qzeros", "transformer.h.9.attn.out_proj.scales", "transformer.h.9.attn.out_proj.bias", "transformer.h.9.attn.out_proj.qweight", "transformer.h.9.mlp.fc_in.qzeros", "transformer.h.9.mlp.fc_in.scales", "transformer.h.9.mlp.fc_in.qweight", "transformer.h.9.mlp.fc_out.qzeros", "transformer.h.9.mlp.fc_out.scales", "transformer.h.9.mlp.fc_out.qweight", "transformer.h.10.attn.k_proj.qzeros", "transformer.h.10.attn.k_proj.scales", "transformer.h.10.attn.k_proj.bias", "transformer.h.10.attn.k_proj.qweight", "transformer.h.10.attn.v_proj.qzeros", "transformer.h.10.attn.v_proj.scales", "transformer.h.10.attn.v_proj.bias", "transformer.h.10.attn.v_proj.qweight", "transformer.h.10.attn.q_proj.qzeros", "transformer.h.10.attn.q_proj.scales", "transformer.h.10.attn.q_proj.bias", "transformer.h.10.attn.q_proj.qweight", "transformer.h.10.attn.out_proj.qzeros", "transformer.h.10.attn.out_proj.scales", "transformer.h.10.attn.out_proj.bias", "transformer.h.10.attn.out_proj.qweight", "transformer.h.10.mlp.fc_in.qzeros", "transformer.h.10.mlp.fc_in.scales", "transformer.h.10.mlp.fc_in.qweight", "transformer.h.10.mlp.fc_out.qzeros", "transformer.h.10.mlp.fc_out.scales", "transformer.h.10.mlp.fc_out.qweight", "transformer.h.11.attn.k_proj.qzeros", "transformer.h.11.attn.k_proj.scales", "transformer.h.11.attn.k_proj.bias", "transformer.h.11.attn.k_proj.qweight", "transformer.h.11.attn.v_proj.qzeros", "transformer.h.11.attn.v_proj.scales", "transformer.h.11.attn.v_proj.bias", "transformer.h.11.attn.v_proj.qweight", "transformer.h.11.attn.q_proj.qzeros", "transformer.h.11.attn.q_proj.scales", "transformer.h.11.attn.q_proj.bias", "transformer.h.11.attn.q_proj.qweight", "transformer.h.11.attn.out_proj.qzeros", "transformer.h.11.attn.out_proj.scales", "transformer.h.11.attn.out_proj.bias", "transformer.h.11.attn.out_proj.qweight", "transformer.h.11.mlp.fc_in.qzeros", "transformer.h.11.mlp.fc_in.scales", "transformer.h.11.mlp.fc_in.qweight", "transformer.h.11.mlp.fc_out.qzeros", "transformer.h.11.mlp.fc_out.scales", "transformer.h.11.mlp.fc_out.qweight", "transformer.h.12.attn.k_proj.qzeros", "transformer.h.12.attn.k_proj.scales", "transformer.h.12.attn.k_proj.bias", "transformer.h.12.attn.k_proj.qweight", "transformer.h.12.attn.v_proj.qzeros", "transformer.h.12.attn.v_proj.scales", "transformer.h.12.attn.v_proj.bias", "transformer.h.12.attn.v_proj.qweight", "transformer.h.12.attn.q_proj.qzeros", "transformer.h.12.attn.q_proj.scales", "transformer.h.12.attn.q_proj.bias", "transformer.h.12.attn.q_proj.qweight", "transformer.h.12.attn.out_proj.qzeros", "transformer.h.12.attn.out_proj.scales", 
"transformer.h.12.attn.out_proj.bias", "transformer.h.12.attn.out_proj.qweight", "transformer.h.12.mlp.fc_in.qzeros", "transformer.h.12.mlp.fc_in.scales", "transformer.h.12.mlp.fc_in.qweight", "transformer.h.12.mlp.fc_out.qzeros", "transformer.h.12.mlp.fc_out.scales", "transformer.h.12.mlp.fc_out.qweight", "transformer.h.13.attn.k_proj.qzeros", "transformer.h.13.attn.k_proj.scales", "transformer.h.13.attn.k_proj.bias", "transformer.h.13.attn.k_proj.qweight", "transformer.h.13.attn.v_proj.qzeros", "transformer.h.13.attn.v_proj.scales", "transformer.h.13.attn.v_proj.bias", "transformer.h.13.attn.v_proj.qweight", "transformer.h.13.attn.q_proj.qzeros", "transformer.h.13.attn.q_proj.scales", "transformer.h.13.attn.q_proj.bias", "transformer.h.13.attn.q_proj.qweight", "transformer.h.13.attn.out_proj.qzeros", "transformer.h.13.attn.out_proj.scales", "transformer.h.13.attn.out_proj.bias", "transformer.h.13.attn.out_proj.qweight", "transformer.h.13.mlp.fc_in.qzeros", "transformer.h.13.mlp.fc_in.scales", "transformer.h.13.mlp.fc_in.qweight", "transformer.h.13.mlp.fc_out.qzeros", "transformer.h.13.mlp.fc_out.scales", "transformer.h.13.mlp.fc_out.qweight", "transformer.h.14.attn.k_proj.qzeros", "transformer.h.14.attn.k_proj.scales", "transformer.h.14.attn.k_proj.bias", "transformer.h.14.attn.k_proj.qweight", "transformer.h.14.attn.v_proj.qzeros", "transformer.h.14.attn.v_proj.scales", "transformer.h.14.attn.v_proj.bias", "transformer.h.14.attn.v_proj.qweight", "transformer.h.14.attn.q_proj.qzeros", "transformer.h.14.attn.q_proj.scales", "transformer.h.14.attn.q_proj.bias", "transformer.h.14.attn.q_proj.qweight", "transformer.h.14.attn.out_proj.qzeros", "transformer.h.14.attn.out_proj.scales", "transformer.h.14.attn.out_proj.bias", "transformer.h.14.attn.out_proj.qweight", "transformer.h.14.mlp.fc_in.qzeros", "transformer.h.14.mlp.fc_in.scales", "transformer.h.14.mlp.fc_in.qweight", "transformer.h.14.mlp.fc_out.qzeros", "transformer.h.14.mlp.fc_out.scales", "transformer.h.14.mlp.fc_out.qweight", "transformer.h.15.attn.k_proj.qzeros", "transformer.h.15.attn.k_proj.scales", "transformer.h.15.attn.k_proj.bias", "transformer.h.15.attn.k_proj.qweight", "transformer.h.15.attn.v_proj.qzeros", "transformer.h.15.attn.v_proj.scales", "transformer.h.15.attn.v_proj.bias", "transformer.h.15.attn.v_proj.qweight", "transformer.h.15.attn.q_proj.qzeros", "transformer.h.15.attn.q_proj.scales", "transformer.h.15.attn.q_proj.bias", "transformer.h.15.attn.q_proj.qweight", "transformer.h.15.attn.out_proj.qzeros", "transformer.h.15.attn.out_proj.scales", "transformer.h.15.attn.out_proj.bias", "transformer.h.15.attn.out_proj.qweight", "transformer.h.15.mlp.fc_in.qzeros", "transformer.h.15.mlp.fc_in.scales", "transformer.h.15.mlp.fc_in.qweight", "transformer.h.15.mlp.fc_out.qzeros", "transformer.h.15.mlp.fc_out.scales", "transformer.h.15.mlp.fc_out.qweight", "transformer.h.16.attn.k_proj.qzeros", "transformer.h.16.attn.k_proj.scales", "transformer.h.16.attn.k_proj.bias", "transformer.h.16.attn.k_proj.qweight", "transformer.h.16.attn.v_proj.qzeros", "transformer.h.16.attn.v_proj.scales", "transformer.h.16.attn.v_proj.bias", "transformer.h.16.attn.v_proj.qweight", "transformer.h.16.attn.q_proj.qzeros", "transformer.h.16.attn.q_proj.scales", "transformer.h.16.attn.q_proj.bias", "transformer.h.16.attn.q_proj.qweight", "transformer.h.16.attn.out_proj.qzeros", "transformer.h.16.attn.out_proj.scales", "transformer.h.16.attn.out_proj.bias", "transformer.h.16.attn.out_proj.qweight", "transformer.h.16.mlp.fc_in.qzeros", 
"transformer.h.16.mlp.fc_in.scales", "transformer.h.16.mlp.fc_in.qweight", "transformer.h.16.mlp.fc_out.qzeros", "transformer.h.16.mlp.fc_out.scales", "transformer.h.16.mlp.fc_out.qweight", "transformer.h.17.attn.k_proj.qzeros", "transformer.h.17.attn.k_proj.scales", "transformer.h.17.attn.k_proj.bias", "transformer.h.17.attn.k_proj.qweight", "transformer.h.17.attn.v_proj.qzeros", "transformer.h.17.attn.v_proj.scales", "transformer.h.17.attn.v_proj.bias", "transformer.h.17.attn.v_proj.qweight", "transformer.h.17.attn.q_proj.qzeros", "transformer.h.17.attn.q_proj.scales", "transformer.h.17.attn.q_proj.bias", "transformer.h.17.attn.q_proj.qweight", "transformer.h.17.attn.out_proj.qzeros", "transformer.h.17.attn.out_proj.scales", "transformer.h.17.attn.out_proj.bias", "transformer.h.17.attn.out_proj.qweight", "transformer.h.17.mlp.fc_in.qzeros", "transformer.h.17.mlp.fc_in.scales", "transformer.h.17.mlp.fc_in.qweight", "transformer.h.17.mlp.fc_out.qzeros", "transformer.h.17.mlp.fc_out.scales", "transformer.h.17.mlp.fc_out.qweight", "transformer.h.18.attn.k_proj.qzeros", "transformer.h.18.attn.k_proj.scales", "transformer.h.18.attn.k_proj.bias", "transformer.h.18.attn.k_proj.qweight", "transformer.h.18.attn.v_proj.qzeros", "transformer.h.18.attn.v_proj.scales", "transformer.h.18.attn.v_proj.bias", "transformer.h.18.attn.v_proj.qweight", "transformer.h.18.attn.q_proj.qzeros", "transformer.h.18.attn.q_proj.scales", "transformer.h.18.attn.q_proj.bias", "transformer.h.18.attn.q_proj.qweight", "transformer.h.18.attn.out_proj.qzeros", "transformer.h.18.attn.out_proj.scales", "transformer.h.18.attn.out_proj.bias", "transformer.h.18.attn.out_proj.qweight", "transformer.h.18.mlp.fc_in.qzeros", "transformer.h.18.mlp.fc_in.scales", "transformer.h.18.mlp.fc_in.qweight", "transformer.h.18.mlp.fc_out.qzeros", "transformer.h.18.mlp.fc_out.scales", "transformer.h.18.mlp.fc_out.qweight", "transformer.h.19.attn.k_proj.qzeros", "transformer.h.19.attn.k_proj.scales", "transformer.h.19.attn.k_proj.bias", "transformer.h.19.attn.k_proj.qweight", "transformer.h.19.attn.v_proj.qzeros", "transformer.h.19.attn.v_proj.scales", "transformer.h.19.attn.v_proj.bias", "transformer.h.19.attn.v_proj.qweight", "transformer.h.19.attn.q_proj.qzeros", "transformer.h.19.attn.q_proj.scales", "transformer.h.19.attn.q_proj.bias", "transformer.h.19.attn.q_proj.qweight", "transformer.h.19.attn.out_proj.qzeros", "transformer.h.19.attn.out_proj.scales", "transformer.h.19.attn.out_proj.bias", "transformer.h.19.attn.out_proj.qweight", "transformer.h.19.mlp.fc_in.qzeros", "transformer.h.19.mlp.fc_in.scales", "transformer.h.19.mlp.fc_in.qweight", "transformer.h.19.mlp.fc_out.qzeros", "transformer.h.19.mlp.fc_out.scales", "transformer.h.19.mlp.fc_out.qweight", "transformer.h.20.attn.k_proj.qzeros", "transformer.h.20.attn.k_proj.scales", "transformer.h.20.attn.k_proj.bias", "transformer.h.20.attn.k_proj.qweight", "transformer.h.20.attn.v_proj.qzeros", "transformer.h.20.attn.v_proj.scales", "transformer.h.20.attn.v_proj.bias", "transformer.h.20.attn.v_proj.qweight", "transformer.h.20.attn.q_proj.qzeros", "transformer.h.20.attn.q_proj.scales", "transformer.h.20.attn.q_proj.bias", "transformer.h.20.attn.q_proj.qweight", "transformer.h.20.attn.out_proj.qzeros", "transformer.h.20.attn.out_proj.scales", "transformer.h.20.attn.out_proj.bias", "transformer.h.20.attn.out_proj.qweight", "transformer.h.20.mlp.fc_in.qzeros", "transformer.h.20.mlp.fc_in.scales", "transformer.h.20.mlp.fc_in.qweight", "transformer.h.20.mlp.fc_out.qzeros", 
"transformer.h.20.mlp.fc_out.scales", "transformer.h.20.mlp.fc_out.qweight", "transformer.h.21.attn.k_proj.qzeros", "transformer.h.21.attn.k_proj.scales", "transformer.h.21.attn.k_proj.bias", "transformer.h.21.attn.k_proj.qweight", "transformer.h.21.attn.v_proj.qzeros", "transformer.h.21.attn.v_proj.scales", "transformer.h.21.attn.v_proj.bias", "transformer.h.21.attn.v_proj.qweight", "transformer.h.21.attn.q_proj.qzeros", "transformer.h.21.attn.q_proj.scales", "transformer.h.21.attn.q_proj.bias", "transformer.h.21.attn.q_proj.qweight", "transformer.h.21.attn.out_proj.qzeros", "transformer.h.21.attn.out_proj.scales", "transformer.h.21.attn.out_proj.bias", "transformer.h.21.attn.out_proj.qweight", "transformer.h.21.mlp.fc_in.qzeros", "transformer.h.21.mlp.fc_in.scales", "transformer.h.21.mlp.fc_in.qweight", "transformer.h.21.mlp.fc_out.qzeros", "transformer.h.21.mlp.fc_out.scales", "transformer.h.21.mlp.fc_out.qweight", "transformer.h.22.attn.k_proj.qzeros", "transformer.h.22.attn.k_proj.scales", "transformer.h.22.attn.k_proj.bias", "transformer.h.22.attn.k_proj.qweight", "transformer.h.22.attn.v_proj.qzeros", "transformer.h.22.attn.v_proj.scales", "transformer.h.22.attn.v_proj.bias", "transformer.h.22.attn.v_proj.qweight", "transformer.h.22.attn.q_proj.qzeros", "transformer.h.22.attn.q_proj.scales", "transformer.h.22.attn.q_proj.bias", "transformer.h.22.attn.q_proj.qweight", "transformer.h.22.attn.out_proj.qzeros", "transformer.h.22.attn.out_proj.scales", "transformer.h.22.attn.out_proj.bias", "transformer.h.22.attn.out_proj.qweight", "transformer.h.22.mlp.fc_in.qzeros", "transformer.h.22.mlp.fc_in.scales", "transformer.h.22.mlp.fc_in.qweight", "transformer.h.22.mlp.fc_out.qzeros", "transformer.h.22.mlp.fc_out.scales", "transformer.h.22.mlp.fc_out.qweight", "transformer.h.23.attn.k_proj.qzeros", "transformer.h.23.attn.k_proj.scales", "transformer.h.23.attn.k_proj.bias", "transformer.h.23.attn.k_proj.qweight", "transformer.h.23.attn.v_proj.qzeros", "transformer.h.23.attn.v_proj.scales", "transformer.h.23.attn.v_proj.bias", "transformer.h.23.attn.v_proj.qweight", "transformer.h.23.attn.q_proj.qzeros", "transformer.h.23.attn.q_proj.scales", "transformer.h.23.attn.q_proj.bias", "transformer.h.23.attn.q_proj.qweight", "transformer.h.23.attn.out_proj.qzeros", "transformer.h.23.attn.out_proj.scales", "transformer.h.23.attn.out_proj.bias", "transformer.h.23.attn.out_proj.qweight", "transformer.h.23.mlp.fc_in.qzeros", "transformer.h.23.mlp.fc_in.scales", "transformer.h.23.mlp.fc_in.qweight", "transformer.h.23.mlp.fc_out.qzeros", "transformer.h.23.mlp.fc_out.scales", "transformer.h.23.mlp.fc_out.qweight", "transformer.h.24.attn.k_proj.qzeros", "transformer.h.24.attn.k_proj.scales", "transformer.h.24.attn.k_proj.bias", "transformer.h.24.attn.k_proj.qweight", "transformer.h.24.attn.v_proj.qzeros", "transformer.h.24.attn.v_proj.scales", "transformer.h.24.attn.v_proj.bias", "transformer.h.24.attn.v_proj.qweight", "transformer.h.24.attn.q_proj.qzeros", "transformer.h.24.attn.q_proj.scales", "transformer.h.24.attn.q_proj.bias", "transformer.h.24.attn.q_proj.qweight", "transformer.h.24.attn.out_proj.qzeros", "transformer.h.24.attn.out_proj.scales", "transformer.h.24.attn.out_proj.bias", "transformer.h.24.attn.out_proj.qweight", "transformer.h.24.mlp.fc_in.qzeros", "transformer.h.24.mlp.fc_in.scales", "transformer.h.24.mlp.fc_in.qweight", "transformer.h.24.mlp.fc_out.qzeros", "transformer.h.24.mlp.fc_out.scales", "transformer.h.24.mlp.fc_out.qweight", "transformer.h.25.attn.k_proj.qzeros", 
"transformer.h.25.attn.k_proj.scales", "transformer.h.25.attn.k_proj.bias", "transformer.h.25.attn.k_proj.qweight", "transformer.h.25.attn.v_proj.qzeros", "transformer.h.25.attn.v_proj.scales", "transformer.h.25.attn.v_proj.bias", "transformer.h.25.attn.v_proj.qweight", "transformer.h.25.attn.q_proj.qzeros", "transformer.h.25.attn.q_proj.scales", "transformer.h.25.attn.q_proj.bias", "transformer.h.25.attn.q_proj.qweight", "transformer.h.25.attn.out_proj.qzeros", "transformer.h.25.attn.out_proj.scales", "transformer.h.25.attn.out_proj.bias", "transformer.h.25.attn.out_proj.qweight", "transformer.h.25.mlp.fc_in.qzeros", "transformer.h.25.mlp.fc_in.scales", "transformer.h.25.mlp.fc_in.qweight", "transformer.h.25.mlp.fc_out.qzeros", "transformer.h.25.mlp.fc_out.scales", "transformer.h.25.mlp.fc_out.qweight", "transformer.h.26.attn.k_proj.qzeros", "transformer.h.26.attn.k_proj.scales", "transformer.h.26.attn.k_proj.bias", "transformer.h.26.attn.k_proj.qweight", "transformer.h.26.attn.v_proj.qzeros", "transformer.h.26.attn.v_proj.scales", "transformer.h.26.attn.v_proj.bias", "transformer.h.26.attn.v_proj.qweight", "transformer.h.26.attn.q_proj.qzeros", "transformer.h.26.attn.q_proj.scales", "transformer.h.26.attn.q_proj.bias", "transformer.h.26.attn.q_proj.qweight", "transformer.h.26.attn.out_proj.qzeros", "transformer.h.26.attn.out_proj.scales", "transformer.h.26.attn.out_proj.bias", "transformer.h.26.attn.out_proj.qweight", "transformer.h.26.mlp.fc_in.qzeros", "transformer.h.26.mlp.fc_in.scales", "transformer.h.26.mlp.fc_in.qweight", "transformer.h.26.mlp.fc_out.qzeros", "transformer.h.26.mlp.fc_out.scales", "transformer.h.26.mlp.fc_out.qweight", "transformer.h.27.attn.k_proj.qzeros", "transformer.h.27.attn.k_proj.scales", "transformer.h.27.attn.k_proj.bias", "transformer.h.27.attn.k_proj.qweight", "transformer.h.27.attn.v_proj.qzeros", "transformer.h.27.attn.v_proj.scales", "transformer.h.27.attn.v_proj.bias", "transformer.h.27.attn.v_proj.qweight", "transformer.h.27.attn.q_proj.qzeros", "transformer.h.27.attn.q_proj.scales", "transformer.h.27.attn.q_proj.bias", "transformer.h.27.attn.q_proj.qweight", "transformer.h.27.attn.out_proj.qzeros", "transformer.h.27.attn.out_proj.scales", "transformer.h.27.attn.out_proj.bias", "transformer.h.27.attn.out_proj.qweight", "transformer.h.27.mlp.fc_in.qzeros", "transformer.h.27.mlp.fc_in.scales", "transformer.h.27.mlp.fc_in.qweight", "transformer.h.27.mlp.fc_out.qzeros", "transformer.h.27.mlp.fc_out.scales", "transformer.h.27.mlp.fc_out.qweight".
Unexpected key(s) in state_dict: "transformer.h.0.attn.k_proj.weight", "transformer.h.0.attn.v_proj.weight", "transformer.h.0.attn.q_proj.weight", "transformer.h.0.attn.out_proj.weight", "transformer.h.0.mlp.fc_in.weight", "transformer.h.0.mlp.fc_out.weight", "transformer.h.1.attn.k_proj.weight", "transformer.h.1.attn.v_proj.weight", "transformer.h.1.attn.q_proj.weight", "transformer.h.1.attn.out_proj.weight", "transformer.h.1.mlp.fc_in.weight", "transformer.h.1.mlp.fc_out.weight", "transformer.h.2.attn.k_proj.weight", "transformer.h.2.attn.v_proj.weight", "transformer.h.2.attn.q_proj.weight", "transformer.h.2.attn.out_proj.weight", "transformer.h.2.mlp.fc_in.weight", "transformer.h.2.mlp.fc_out.weight", "transformer.h.3.attn.k_proj.weight", "transformer.h.3.attn.v_proj.weight", "transformer.h.3.attn.q_proj.weight", "transformer.h.3.attn.out_proj.weight", "transformer.h.3.mlp.fc_in.weight", "transformer.h.3.mlp.fc_out.weight", "transformer.h.4.attn.k_proj.weight", "transformer.h.4.attn.v_proj.weight", "transformer.h.4.attn.q_proj.weight", "transformer.h.4.attn.out_proj.weight", "transformer.h.4.mlp.fc_in.weight", "transformer.h.4.mlp.fc_out.weight", "transformer.h.5.attn.k_proj.weight", "transformer.h.5.attn.v_proj.weight", "transformer.h.5.attn.q_proj.weight", "transformer.h.5.attn.out_proj.weight", "transformer.h.5.mlp.fc_in.weight", "transformer.h.5.mlp.fc_out.weight", "transformer.h.6.attn.k_proj.weight", "transformer.h.6.attn.v_proj.weight", "transformer.h.6.attn.q_proj.weight", "transformer.h.6.attn.out_proj.weight", "transformer.h.6.mlp.fc_in.weight", "transformer.h.6.mlp.fc_out.weight", "transformer.h.7.attn.k_proj.weight", "transformer.h.7.attn.v_proj.weight", "transformer.h.7.attn.q_proj.weight", "transformer.h.7.attn.out_proj.weight", "transformer.h.7.mlp.fc_in.weight", "transformer.h.7.mlp.fc_out.weight", "transformer.h.8.attn.k_proj.weight", "transformer.h.8.attn.v_proj.weight", "transformer.h.8.attn.q_proj.weight", "transformer.h.8.attn.out_proj.weight", "transformer.h.8.mlp.fc_in.weight", "transformer.h.8.mlp.fc_out.weight", "transformer.h.9.attn.k_proj.weight", "transformer.h.9.attn.v_proj.weight", "transformer.h.9.attn.q_proj.weight", "transformer.h.9.attn.out_proj.weight", "transformer.h.9.mlp.fc_in.weight", "transformer.h.9.mlp.fc_out.weight", "transformer.h.10.attn.k_proj.weight", "transformer.h.10.attn.v_proj.weight", "transformer.h.10.attn.q_proj.weight", "transformer.h.10.attn.out_proj.weight", "transformer.h.10.mlp.fc_in.weight", "transformer.h.10.mlp.fc_out.weight", "transformer.h.11.attn.k_proj.weight", "transformer.h.11.attn.v_proj.weight", "transformer.h.11.attn.q_proj.weight", "transformer.h.11.attn.out_proj.weight", "transformer.h.11.mlp.fc_in.weight", "transformer.h.11.mlp.fc_out.weight", "transformer.h.12.attn.k_proj.weight", "transformer.h.12.attn.v_proj.weight", "transformer.h.12.attn.q_proj.weight", "transformer.h.12.attn.out_proj.weight", "transformer.h.12.mlp.fc_in.weight", "transformer.h.12.mlp.fc_out.weight", "transformer.h.13.attn.k_proj.weight", "transformer.h.13.attn.v_proj.weight", "transformer.h.13.attn.q_proj.weight", "transformer.h.13.attn.out_proj.weight", "transformer.h.13.mlp.fc_in.weight", "transformer.h.13.mlp.fc_out.weight", "transformer.h.14.attn.k_proj.weight", "transformer.h.14.attn.v_proj.weight", "transformer.h.14.attn.q_proj.weight", "transformer.h.14.attn.out_proj.weight", "transformer.h.14.mlp.fc_in.weight", "transformer.h.14.mlp.fc_out.weight", "transformer.h.15.attn.k_proj.weight", "transformer.h.15.attn.v_proj.weight", 
"transformer.h.15.attn.q_proj.weight", "transformer.h.15.attn.out_proj.weight", "transformer.h.15.mlp.fc_in.weight", "transformer.h.15.mlp.fc_out.weight", "transformer.h.16.attn.k_proj.weight", "transformer.h.16.attn.v_proj.weight", "transformer.h.16.attn.q_proj.weight", "transformer.h.16.attn.out_proj.weight", "transformer.h.16.mlp.fc_in.weight", "transformer.h.16.mlp.fc_out.weight", "transformer.h.17.attn.k_proj.weight", "transformer.h.17.attn.v_proj.weight", "transformer.h.17.attn.q_proj.weight", "transformer.h.17.attn.out_proj.weight", "transformer.h.17.mlp.fc_in.weight", "transformer.h.17.mlp.fc_out.weight", "transformer.h.18.attn.k_proj.weight", "transformer.h.18.attn.v_proj.weight", "transformer.h.18.attn.q_proj.weight", "transformer.h.18.attn.out_proj.weight", "transformer.h.18.mlp.fc_in.weight", "transformer.h.18.mlp.fc_out.weight", "transformer.h.19.attn.k_proj.weight", "transformer.h.19.attn.v_proj.weight", "transformer.h.19.attn.q_proj.weight", "transformer.h.19.attn.out_proj.weight", "transformer.h.19.mlp.fc_in.weight", "transformer.h.19.mlp.fc_out.weight", "transformer.h.20.attn.k_proj.weight", "transformer.h.20.attn.v_proj.weight", "transformer.h.20.attn.q_proj.weight", "transformer.h.20.attn.out_proj.weight", "transformer.h.20.mlp.fc_in.weight", "transformer.h.20.mlp.fc_out.weight", "transformer.h.21.attn.k_proj.weight", "transformer.h.21.attn.v_proj.weight", "transformer.h.21.attn.q_proj.weight", "transformer.h.21.attn.out_proj.weight", "transformer.h.21.mlp.fc_in.weight", "transformer.h.21.mlp.fc_out.weight", "transformer.h.22.attn.k_proj.weight", "transformer.h.22.attn.v_proj.weight", "transformer.h.22.attn.q_proj.weight", "transformer.h.22.attn.out_proj.weight", "transformer.h.22.mlp.fc_in.weight", "transformer.h.22.mlp.fc_out.weight", "transformer.h.23.attn.k_proj.weight", "transformer.h.23.attn.v_proj.weight", "transformer.h.23.attn.q_proj.weight", "transformer.h.23.attn.out_proj.weight", "transformer.h.23.mlp.fc_in.weight", "transformer.h.23.mlp.fc_out.weight", "transformer.h.24.attn.k_proj.weight", "transformer.h.24.attn.v_proj.weight", "transformer.h.24.attn.q_proj.weight", "transformer.h.24.attn.out_proj.weight", "transformer.h.24.mlp.fc_in.weight", "transformer.h.24.mlp.fc_out.weight", "transformer.h.25.attn.k_proj.weight", "transformer.h.25.attn.v_proj.weight", "transformer.h.25.attn.q_proj.weight", "transformer.h.25.attn.out_proj.weight", "transformer.h.25.mlp.fc_in.weight", "transformer.h.25.mlp.fc_out.weight", "transformer.h.26.attn.k_proj.weight", "transformer.h.26.attn.v_proj.weight", "transformer.h.26.attn.q_proj.weight", "transformer.h.26.attn.out_proj.weight", "transformer.h.26.mlp.fc_in.weight", "transformer.h.26.mlp.fc_out.weight", "transformer.h.27.attn.k_proj.weight", "transformer.h.27.attn.v_proj.weight", "transformer.h.27.attn.q_proj.weight", "transformer.h.27.attn.out_proj.weight", "transformer.h.27.mlp.fc_in.weight", "transformer.h.27.mlp.fc_out.weight".

AssertionError

File "/usr/local/lib/python3.9/dist-packages/datasets/load.py", line 1675, in load_dataset
builder_instance = load_dataset_builder(
File "/usr/local/lib/python3.9/dist-packages/datasets/load.py", line 1452, in load_dataset_builder
dataset_module = dataset_module_factory(
File "/usr/local/lib/python3.9/dist-packages/datasets/load.py", line 1177, in dataset_module_factory
raise e1 from None
File "/usr/local/lib/python3.9/dist-packages/datasets/load.py", line 1156, in dataset_module_factory
return HubDatasetModuleFactoryWithoutScript(
File "/usr/local/lib/python3.9/dist-packages/datasets/load.py", line 743, in init
assert self.name.count("/") == 1
AssertionError

When I use this command, python3 opt.py facebook/opt-125m c4, I get the above error.
Could you please help me solve this issue?

How should I verify the speedup effect of the algorithm?

Hi, thank you for your great work! It seems that GPTQ should lead to significant speedups for end-to-end inference, but after quantizing BLOOM-7B to INT8 with GPTQ, I found it twice as slow as the FP16 model. How can I get the speedup shown in the paper?
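One way to check this (a sketch; it assumes a Hugging Face-style causal LM and a (1, seq_len) prompt tensor already on the GPU) is to time batch-size-1 generation, since that memory-bound setting is where quantized matrix-vector kernels can pay off:

import time
import torch

@torch.no_grad()
def tokens_per_second(model, input_ids, new_tokens=64):
    # Time greedy generation with CUDA synchronization around the measured
    # region; without the sync calls the timing is meaningless.
    torch.cuda.synchronize()
    start = time.time()
    out = model.generate(input_ids, max_new_tokens=new_tokens, do_sample=False)
    torch.cuda.synchronize()
    generated = out.shape[1] - input_ids.shape[1]
    return generated / (time.time() - start)

Comparing this number between the FP16 and quantized models at the same prompt length gives a fairer picture than the wall-clock time of a whole script.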

How to run the quantized model for predictions on my prompts?

I am able to quantize the LLaMA 7B model to 4-bit. But how can I run it for my predictions? If I try the transformers library I get an error.

Python 3.10.12 (main, Jun 7 2023, 12:45:35) [GCC 9.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.

from transformers import AutoTokenizer, AutoModelForCausalLM
model = AutoModelForCausalLM.from_pretrained("llama_7b_4bit_2.bin")
Traceback (most recent call last):
File "/home/intel-spc/Documents/tarun/t2/tar/lib/python3.10/site-packages/transformers/configuration_utils.py", line 659, in _get_config_dict
config_dict = cls._dict_from_json_file(resolved_config_file)
File "/home/intel-spc/Documents/tarun/t2/tar/lib/python3.10/site-packages/transformers/configuration_utils.py", line 750, in _dict_from_json_file
text = reader.read()
File "/usr/lib/python3.10/codecs.py", line 322, in decode
(result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 64: invalid start byte

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "", line 1, in
File "/home/intel-spc/Documents/tarun/t2/tar/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py", line 456, in from_pretrained
config, kwargs = AutoConfig.from_pretrained(
File "/home/intel-spc/Documents/tarun/t2/tar/lib/python3.10/site-packages/transformers/models/auto/configuration_auto.py", line 944, in from_pretrained
config_dict, unused_kwargs = PretrainedConfig.get_config_dict(pretrained_model_name_or_path, **kwargs)
File "/home/intel-spc/Documents/tarun/t2/tar/lib/python3.10/site-packages/transformers/configuration_utils.py", line 574, in get_config_dict
config_dict, kwargs = cls._get_config_dict(pretrained_model_name_or_path, **kwargs)
File "/home/intel-spc/Documents/tarun/t2/tar/lib/python3.10/site-packages/transformers/configuration_utils.py", line 662, in _get_config_dict
raise EnvironmentError(
OSError: It looks like the config file at 'llama_7b_4bit_2.bin' is not a valid JSON file.

LAMBADA evaluation accuracy

Hello, I've been experimenting with GPTQ and trying to replicate your LAMBADA zero-shot results. But I have been getting significantly lower accuracy (10-15% lower for OPT specifically) compared to the paper, even for the FP16 baseline. I'm using your pipeline based on LM evaluation harness. I was wondering if you have seen this before?

GPTQ on BERT based

Hi all,

I hope this message finds everyone well. I have read the paper and found a table that compares the performance of OBQ and GPTQ on a BERT-based model. Could anyone help me find the code or implementation of GPTQ on a BERT-based model? Thanks for your help.

ValueError: not enough values to unpack (expected 2, got 1)

Hello,
I followed your instructions and got a ValueError. Am I benchmarking correctly? Thank you.

CUDA_VISIBLE_DEVICES=0 python opt.py facebook/opt-125m c4 --wbits 3 --save opt125m-3bit.pt

CUDA_VISIBLE_DEVICES=0 python opt.py facebook/opt-125m c4 --load opt125m-3bit.pt --benchmark 128
Loading model ...
Done.
Found cached dataset json (/$HOME/.cache/huggingface/datasets/allenai___json/allenai--c4-6fbe877195f42de5/0.0.0/fe5dd6ea2639a6df622901539cb550cf8797e5a6b2dd7af1cf934bed8e233e6e)
Found cached dataset json (/$HOME/.cache/huggingface/datasets/allenai___json/allenai--c4-efc3d4f4606f44bd/0.0.0/fe5dd6ea2639a6df622901539cb550cf8797e5a6b2dd7af1cf934bed8e233e6e)
Benchmarking ...
Traceback (most recent call last):
File "/$HOME/gptq/opt.py", line 455, in
...
File "/$HOM/mambaforge/lib/python3.10/site-packages/transformers/models/opt/modeling_opt.py", line 637, in forward
batch_size, seq_length = input_shape
ValueError: not enough values to unpack (expected 2, got 1)

Application to GPT-J family

Congratulations on your achievement.

Can you give us some hints and recommendations for adapting the procedure in order to quantize the GPT-J model family?

Inference of the Quantised Model (OPT-13B)

Hey!
Huge congratulations on your achievement and thank you for sharing!
I am following the steps to quantise an OPT model (13B) that I have finetuned. I wish to serve this model for inference.
Will I simply be able to save the quantised model, and load it into the transformers library?

If not, what's the best way to do this?

All the very best

Question about the difference between the pseudocode and the implementation

The Hessian inverse information in your pseudocode is computed via a Cholesky decomposition of H's inverse. In the code, you apply cholesky first, then cholesky_inverse, and then cholesky again. I am not sure about the reason for the difference. Is the cholesky_inverse kernel necessary here? Can I just compute H's inverse directly and then apply cholesky?

Thank you so much.
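
For illustration (a sketch, not the repo's code): the two routes produce the same factor up to numerical error, since the Cholesky factor of an SPD matrix is unique; cholesky_inverse is mainly a cheap and numerically stable way to invert H once its factor is available:

import torch

torch.manual_seed(0)
A = torch.randn(64, 64, dtype=torch.float64)
H = A @ A.T + 64 * torch.eye(64, dtype=torch.float64)    # SPD stand-in for the Hessian

# Route described in the issue: cholesky -> cholesky_inverse -> cholesky (upper).
L = torch.linalg.cholesky(H)
Hinv = torch.cholesky_inverse(L)
U_code = torch.linalg.cholesky(Hinv, upper=True)

# Route in the pseudocode: invert H directly, then take the Cholesky factor.
U_direct = torch.linalg.cholesky(torch.linalg.inv(H), upper=True)

print(torch.allclose(U_code, U_direct, atol=1e-8))       # True: the factor is unique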

Testing the GPTQ method on CNN models containing grouped convolutions

Hi,
to support CNN models, I modified the GPTQ code as follows:
1. support for grouped convolutions;
2. symmetric quantization without a zero-point parameter.

However, I found the performance is not good on mobilenetv2/mnasnet1_0 models at 4-bit quantization.
Here are my results (top-1 accuracy, %):

model      | FP32  | GPTQ_W4 sym
mbv2       | 71.88 | 60.84 (84.64%)
mnasnet1_0 | 73.47 | 64.71 (88.08%)

I only saw resnet18/resnet50 quantization results in your paper; have you tested GPTQ on mobilenetv2/mnasnet1_0?

Looking forward to your reply...
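
For reference, a minimal sketch of the kind of symmetric quantizer described above (per-output-channel max-abs scaling is an assumption; this is not the issue author's actual code):

import torch

def quantize_sym(w, bits=4):
    # Symmetric (no zero point) fake-quantization of a 2-D weight matrix, per output channel.
    qmax = 2 ** (bits - 1) - 1                                       # e.g. 7 for 4-bit
    scale = (w.abs().amax(dim=1, keepdim=True) / qmax).clamp(min=1e-8)
    q = torch.clamp(torch.round(w / scale), -qmax - 1, qmax)
    return q * scale                                                 # dequantized weights

Depthwise layers in mobilenetv2/mnasnet1_0 have very few weights per output channel and tend to be more sensitive to 4-bit quantization than the resnet layers reported in the paper, which may partly explain the gap.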

pack_model takes too long time

I used auto_gptq to quantize a large language model whose transformer has 80 layers. I found that each layer needs almost 4 minutes to pack, so I have to wait several hours before the whole packing step finishes. Are there any suggestions for solving this problem? Can the packing step be sped up?

Why no update to Hinv

In the fasterquant function of gptq.py, there seems to be no update to Hinv during the quantization process. Can I know the intuition behind this? I got a bit lost in the part of the paper explaining how the introduction of the Cholesky decomposition eliminates the update of Hinv.

License issues

Hi, I've forked this repo but it has no license. Could you please add one? Thanks.

OpenCL Support

Please add OpenCL support so that the code can be used on GPUs that support OpenCL rather than CUDA.
Then we could use something like quant_opencl.cpp instead of quant_cuda.cpp.

Application to T5 / UL2 family

Do you expect this to work for the T5 architecture (and consequently the very similar UL2 family)? If not, what do you suspect would be the issue, and do you expect that some adjustments would need to be made?

Google recently released Flan-UL2 which is 20B parameters in size. GPTQ could be a real life-saver here.
