Comments (7)
Confirming similar NaN results above on the main CUDA branch, using the same models plus, additionally, neox-20b.
I saw you were using basic_usage.py, which performs one-shot quantization (a single sample) just to showcase the basic APIs; it may run into NaNs when quantizing a big model with so few samples. I would suggest trying quantize_with_alpaca.py, which uses many instruction-following samples to quantize LLMs.
Please let me know whether the same problem still occurs when using quantize_with_alpaca.py.
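For reference, here is a minimal sketch of quantizing with multiple calibration samples via the AutoGPTQ API, in the spirit of quantize_with_alpaca.py; the paths and the calibration texts below are placeholders of mine, not taken from the script:

from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig

pretrained_model_dir = "models/gpt-neox-20b"  # placeholder model path
quantized_model_dir = "4bit_converted"        # placeholder output directory

tokenizer = AutoTokenizer.from_pretrained(pretrained_model_dir, use_fast=True)

# Many instruction-following calibration samples instead of the single
# sample used by basic_usage.py; more samples help avoid NaNs on big models.
calibration_texts = [
    "Instruction:\nName three characteristics commonly associated with a strong leader.\nOutput:\n...",
    "Instruction:\nSummarize the benefits of regular exercise.\nOutput:\n...",
]
examples = [tokenizer(text) for text in calibration_texts]

quantize_config = BaseQuantizeConfig(bits=4, group_size=128)
model = AutoGPTQForCausalLM.from_pretrained(pretrained_model_dir, quantize_config)
model.quantize(examples)
model.save_quantized(quantized_model_dir)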
Tested quantize_with_alpaca.py above with the latest 0.3 version. As shipped, the script's
parser.add_argument("--fast_tokenizer", action="store_true")
failed with:
ValueError: Tokenizer class GPTNeoXTokenizer does not exist or is not currently imported.
so I changed it to:
parser.add_argument("--fast_tokenizer", action="store_false")
CUDA_VISIBLE_DEVICES="0" python quant_with_alpaca.py --pretrained_model_dir models/gpt-neox-20b --quantized_model_dir 4bit_converted/neox20b-4bit.safetensor
Quantization then proceeded without error and ran to completion, with the script's final examples printed to the terminal.
Unfortunately, the quantized model isn't saved to --quantized_model_dir 4bit_converted/neox20b-4bit.safetensor
2023-04-24 15:03:20 INFO [auto_gptq.modeling._utils] Model packed.
The model 'GPTNeoXGPTQForCausalLM' is not supported for .
Supported models are ['BartForCausalLM', 'BertLMHeadModel', 'BertGenerationDecoder', 'BigBirdForCausalLM', 'BigBirdPegasusForCausalLM', 'BioGptForCausalLM', 'BlenderbotForCausalLM', 'BlenderbotSmallForCausalLM', 'BloomForCausalLM', 'CamembertForCausalLM', 'CodeGenForCausalLM', 'CTRLLMHeadModel', 'Data2VecTextForCausalLM', 'ElectraForCausalLM', 'ErnieForCausalLM', 'GitForCausalLM', 'GPT2LMHeadModel', 'GPT2LMHeadModel', 'GPTNeoForCausalLM', 'GPTNeoXForCausalLM', 'GPTNeoXJapaneseForCausalLM', 'GPTJForCausalLM', 'LlamaForCausalLM', 'MarianForCausalLM', 'MBartForCausalLM', 'MegaForCausalLM', 'MegatronBertForCausalLM', 'MvpForCausalLM', 'OpenAIGPTLMHeadModel', 'OPTForCausalLM', 'PegasusForCausalLM', 'PLBartForCausalLM', 'ProphetNetForCausalLM', 'QDQBertLMHeadModel', 'ReformerModelWithLMHead', 'RemBertForCausalLM', 'RobertaForCausalLM', 'RobertaPreLayerNormForCausalLM', 'RoCBertForCausalLM', 'RoFormerForCausalLM', 'Speech2Text2ForCausalLM', 'TransfoXLLMHeadModel', 'TrOCRForCausalLM', 'XGLMForCausalLM', 'XLMWithLMHeadModel', 'XLMProphetNetForCausalLM', 'XLMRobertaForCausalLM', 'XLMRobertaXLForCausalLM', 'XLNetLMHeadModel', 'XmodForCausalLM'].
prompt: Instruction:
Name three characteristics commonly associated with a strong leader.
Output:
etc.
Tested 3x; the '4bit_converted' folder exists at the same level as the scripts and models.
Am I missing a command to save the model to a local folder, or has it been saved to another default location?
Thanks
There are two things you should be aware of; maybe it's my bad for not making them clear in the example's README:
- There is no need to change the original command-line flag's functionality; you can just enable --fast_tokenizer in the command when using gpt_neox type models, since they only have GPTNeoXTokenizerFast.
- The value for --quantized_model_dir should be a path to a local directory, not a file; you can check whether the quantized model was saved into a directory named 4bit_converted/neox20b-4bit.safetensor. (A reconstructed command combining both points follows below.)
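Combining both points with the command from earlier in the thread, the invocation would presumably look like this (my reconstruction, not a command posted above):

CUDA_VISIBLE_DEVICES="0" python quant_with_alpaca.py --pretrained_model_dir models/gpt-neox-20b --quantized_model_dir 4bit_converted --fast_tokenizer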
Thanks for the update. Model saved in '4bit_converted' in .bin format.
"The model 'GPTNeoXGPTQForCausalLM' is not supported for ." is still generated, but that's not a big deal.
How do I save as safetensors? I will run again using:
model.save_quantized(args.quantized_model_dir, use_safetensors=True)
Also, is there a simple inference script to use with the generated model above?
Cheers
"The model 'GPTNeoXGPTQForCausalLM' is not supported for ." is a warning thrown by Hugging Face transformers; you can just ignore it. I will find a way to bypass it in the future.
"Is there a simple inference script to use with the generated model above?" I will consider writing one in examples as soon as possible; for now, you can refer to this code snippet:
from auto_gptq import AutoGPTQForCausalLM
from transformers import AutoTokenizer, TextGenerationPipeline

tokenizer_dir = "models/gpt-neox-20b"   # e.g. the original model dir from this thread
quantized_model_dir = "4bit_converted"  # e.g. the output dir used above

text = "Hello, World!"
tokenizer = AutoTokenizer.from_pretrained(tokenizer_dir)
# load the quantized weights onto a single GPU
model = AutoGPTQForCausalLM.from_quantized(quantized_model_dir, device="cuda:0")
pipeline = TextGenerationPipeline(model=model, tokenizer=tokenizer, device="cuda:0")
generated_text = pipeline(text, return_full_text=False, num_beams=1, max_new_tokens=128)[0]["generated_text"]
print(generated_text)
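As a follow-up note, hedged since it depends on the installed version: in the 0.3-era API, from_quantized also accepts a use_safetensors flag, so a model saved with save_quantized(..., use_safetensors=True) would be loaded along these lines:

model = AutoGPTQForCausalLM.from_quantized(quantized_model_dir, device="cuda:0", use_safetensors=True)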
Thanks. I'll work with the above and close the issue.
Looking forward to the script.
Cheers