
Comments (7)

GenTxt avatar GenTxt commented on May 8, 2024

Confirming similar nan results as above on the main CUDA branch, using the same models plus neox-20b.

from autogptq.

PanQiWei avatar PanQiWei commented on May 8, 2024

I see you were using basic_usage.py, which performs one-shot quantization (a single sample) just to showcase the basic APIs; quantizing a big model with so few samples can produce 'nan'. I would suggest trying quantize_with_alpaca.py, which uses many instruction-following samples to quantize LLMs.

Please let me know whether the same problem still occurs when using quantize_with_alpaca.py.
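A minimal sketch of the calibration-set shape this suggestion implies, based on my reading of the examples: AutoGPTQ's quantize() takes a list of dicts with "input_ids" and "attention_mask", and many samples keep the per-layer statistics well-conditioned on large models, unlike the one-shot case. Here toy_tokenize is a stand-in for a real tokenizer and the prompts are illustrative:

```python
# Sketch: building a multi-sample calibration set for GPTQ quantization.
# toy_tokenize is a placeholder for a real HF tokenizer call, which would
# return tensors of token IDs instead of toy integer lists.

def toy_tokenize(text):
    ids = list(range(len(text.split())))  # fake "token IDs", one per word
    return {"input_ids": ids, "attention_mask": [1] * len(ids)}

prompts = [
    "Instruction: Name three characteristics of a strong leader. Output: ...",
    "Instruction: Summarize the following paragraph. Output: ...",
    "Instruction: Translate 'hello' into French. Output: ...",
]

# One-shot quantization would pass a single-element list here; a longer
# list of instruction-following samples is what quantize_with_alpaca.py does.
examples = [toy_tokenize(p) for p in prompts]
```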


GenTxt avatar GenTxt commented on May 8, 2024

Tested the above 'quantize_with_alpaca.py' with the latest 0.3 version.

I needed to change the following:

parser.add_argument("--fast_tokenizer", action="store_true")

which raised:

ValueError: Tokenizer class GPTNeoXTokenizer does not exist or is not currently imported.

so I changed it to:

parser.add_argument("--fast_tokenizer", action="store_false")

CUDA_VISIBLE_DEVICES="0" python quant_with_alpaca.py --pretrained_model_dir models/gpt-neox-20b --quantized_model_dir 4bit_converted/neox20b-4bit.safetensor

After the change, quantization proceeded without error, and the script's final examples were printed to the terminal.

Unfortunately, the quantized model isn't saved to --quantized_model_dir 4bit_converted/neox20b-4bit.safetensor

2023-04-24 15:03:20 INFO [auto_gptq.modeling._utils] Model packed.

The model 'GPTNeoXGPTQForCausalLM' is not supported for .

Supported models are ['BartForCausalLM', 'BertLMHeadModel', 'BertGenerationDecoder', 'BigBirdForCausalLM', 'BigBirdPegasusForCausalLM', 'BioGptForCausalLM', 'BlenderbotForCausalLM', 'BlenderbotSmallForCausalLM', 'BloomForCausalLM', 'CamembertForCausalLM', 'CodeGenForCausalLM', 'CTRLLMHeadModel', 'Data2VecTextForCausalLM', 'ElectraForCausalLM', 'ErnieForCausalLM', 'GitForCausalLM', 'GPT2LMHeadModel', 'GPT2LMHeadModel', 'GPTNeoForCausalLM', 'GPTNeoXForCausalLM', 'GPTNeoXJapaneseForCausalLM', 'GPTJForCausalLM', 'LlamaForCausalLM', 'MarianForCausalLM', 'MBartForCausalLM', 'MegaForCausalLM', 'MegatronBertForCausalLM', 'MvpForCausalLM', 'OpenAIGPTLMHeadModel', 'OPTForCausalLM', 'PegasusForCausalLM', 'PLBartForCausalLM', 'ProphetNetForCausalLM', 'QDQBertLMHeadModel', 'ReformerModelWithLMHead', 'RemBertForCausalLM', 'RobertaForCausalLM', 'RobertaPreLayerNormForCausalLM', 'RoCBertForCausalLM', 'RoFormerForCausalLM', 'Speech2Text2ForCausalLM', 'TransfoXLLMHeadModel', 'TrOCRForCausalLM', 'XGLMForCausalLM', 'XLMWithLMHeadModel', 'XLMProphetNetForCausalLM', 'XLMRobertaForCausalLM', 'XLMRobertaXLForCausalLM', 'XLNetLMHeadModel', 'XmodForCausalLM'].

prompt: Instruction:
Name three characteristics commonly associated with a strong leader.
Output:
etc.

Tested 3x. The '4bit_converted' folder exists at the same level as the scripts and models.

Am I missing a command to save the model to a local folder, or has it been saved to another default location?

Thanks


PanQiWei avatar PanQiWei commented on May 8, 2024

There are two things you should be aware of; maybe it's my fault for not making them clear in the example's README:

  1. There is no need to change the original command-line flag's behavior; just pass --fast_tokenizer on the command line when using gpt_neox-type models, since they only have GPTNeoXTokenizerFast.
  2. The value of --quantized_model_dir should be a path to a local directory, not a file; check whether the quantized model was saved into a directory named 4bit_converted/neox20b-4bit.safetensor.
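The second point can be checked up front: since the flag names a directory, creating it explicitly makes the destination unambiguous. A minimal sketch, with the directory name taken from the command in this thread (the save_quantized call is commented out because it needs the quantized model object):

```python
from pathlib import Path

# --quantized_model_dir should name a directory, not a .safetensor file
quantized_model_dir = "4bit_converted/neox20b-4bit"
Path(quantized_model_dir).mkdir(parents=True, exist_ok=True)

# after quantization finishes:
# model.save_quantized(quantized_model_dir, use_safetensors=True)
```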


GenTxt avatar GenTxt commented on May 8, 2024

Thanks for the update. Model saved in '4bit_converted' in .bin format.

The warning "The model 'GPTNeoXGPTQForCausalLM' is not supported for ." is still generated, but that's not a big deal.

How do I save as safetensors? I will run again using:

model.save_quantized(args.quantized_model_dir, use_safetensors=True)

Also, is there a simple inference script to use with the generated model above?

Cheers


PanQiWei avatar PanQiWei commented on May 8, 2024

The model 'GPTNeoXGPTQForCausalLM' is not supported for

This is a warning thrown by Hugging Face transformers; you can just ignore it. I will find a way to bypass it in the future.

is there a simple inference script to use with the generated model above

I will write one in the examples as soon as possible; for now you can reference this code snippet:

from auto_gptq import AutoGPTQForCausalLM
from transformers import AutoTokenizer, TextGenerationPipeline

tokenizer_dir = "models/gpt-neox-20b"   # directory holding the original tokenizer
quantized_model_dir = "4bit_converted"  # directory the quantized model was saved to

text = "Hello, World!"

tokenizer = AutoTokenizer.from_pretrained(tokenizer_dir)
model = AutoGPTQForCausalLM.from_quantized(quantized_model_dir, device="cuda:0")
pipeline = TextGenerationPipeline(model=model, tokenizer=tokenizer, device="cuda:0")
generated_text = pipeline(text, return_full_text=False, num_beams=1, max_new_tokens=128)[0]["generated_text"]
print(generated_text)


GenTxt avatar GenTxt commented on May 8, 2024

Thanks. I'll work with the above and close the issue.

Looking forward to the script.

Cheers

