Comments (7)
Confirming similar NaN results above on the main CUDA branch, using the same models plus, additionally, neox-20b.
I saw you were using basic_usage.py, which performs one-shot quantization (a single sample) just to showcase the basic APIs; it may run into NaNs when quantizing a big model with so few samples. I would suggest trying quantize_with_alpaca.py, which uses many instruction-following samples to quantize LLMs.
Please let me know whether the same problem still occurs when using quantize_with_alpaca.py.
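For reference, here is a minimal sketch of quantizing with multiple calibration samples via the AutoGPTQ API, in the spirit of quantize_with_alpaca.py; the paths and the calibration texts below are placeholders of mine, not taken from the script:

from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig

pretrained_model_dir = "models/gpt-neox-20b"  # placeholder model path
quantized_model_dir = "4bit_converted"        # placeholder output directory

tokenizer = AutoTokenizer.from_pretrained(pretrained_model_dir, use_fast=True)

# Many instruction-following calibration samples instead of the single
# sample used by basic_usage.py; more samples help avoid NaNs on big models.
calibration_texts = [
    "Instruction:\nName three characteristics commonly associated with a strong leader.\nOutput:\n...",
    "Instruction:\nSummarize the benefits of regular exercise.\nOutput:\n...",
]
examples = [tokenizer(text) for text in calibration_texts]

quantize_config = BaseQuantizeConfig(bits=4, group_size=128)
model = AutoGPTQForCausalLM.from_pretrained(pretrained_model_dir, quantize_config)
model.quantize(examples)
model.save_quantized(quantized_model_dir)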
Tested quantize_with_alpaca.py above with the latest 0.3 version. As shipped, the script's
parser.add_argument("--fast_tokenizer", action="store_true")
failed with:
ValueError: Tokenizer class GPTNeoXTokenizer does not exist or is not currently imported.
so I changed it to:
parser.add_argument("--fast_tokenizer", action="store_false")
CUDA_VISIBLE_DEVICES="0" python quant_with_alpaca.py --pretrained_model_dir models/gpt-neox-20b --quantized_model_dir 4bit_converted/neox20b-4bit.safetensor
Quantization then proceeded without error and ran to completion, with the script's final examples printed to the terminal.
Unfortunately, the quantized model isn't saved to --quantized_model_dir 4bit_converted/neox20b-4bit.safetensor
2023-04-24 15:03:20 INFO [auto_gptq.modeling._utils] Model packed.
The model 'GPTNeoXGPTQForCausalLM' is not supported for .
Supported models are ['BartForCausalLM', 'BertLMHeadModel', 'BertGenerationDecoder', 'BigBirdForCausalLM', 'BigBirdPegasusForCausalLM', 'BioGptForCausalLM', 'BlenderbotForCausalLM', 'BlenderbotSmallForCausalLM', 'BloomForCausalLM', 'CamembertForCausalLM', 'CodeGenForCausalLM', 'CTRLLMHeadModel', 'Data2VecTextForCausalLM', 'ElectraForCausalLM', 'ErnieForCausalLM', 'GitForCausalLM', 'GPT2LMHeadModel', 'GPT2LMHeadModel', 'GPTNeoForCausalLM', 'GPTNeoXForCausalLM', 'GPTNeoXJapaneseForCausalLM', 'GPTJForCausalLM', 'LlamaForCausalLM', 'MarianForCausalLM', 'MBartForCausalLM', 'MegaForCausalLM', 'MegatronBertForCausalLM', 'MvpForCausalLM', 'OpenAIGPTLMHeadModel', 'OPTForCausalLM', 'PegasusForCausalLM', 'PLBartForCausalLM', 'ProphetNetForCausalLM', 'QDQBertLMHeadModel', 'ReformerModelWithLMHead', 'RemBertForCausalLM', 'RobertaForCausalLM', 'RobertaPreLayerNormForCausalLM', 'RoCBertForCausalLM', 'RoFormerForCausalLM', 'Speech2Text2ForCausalLM', 'TransfoXLLMHeadModel', 'TrOCRForCausalLM', 'XGLMForCausalLM', 'XLMWithLMHeadModel', 'XLMProphetNetForCausalLM', 'XLMRobertaForCausalLM', 'XLMRobertaXLForCausalLM', 'XLNetLMHeadModel', 'XmodForCausalLM'].
prompt: Instruction:
Name three characteristics commonly associated with a strong leader.
Output:
etc.
Tested 3x; the '4bit_converted' folder exists at the same level as the scripts and models.
Am I missing a command to save the model to a local folder, or has it been saved to another default location?
Thanks
There are two things you should be aware of; maybe it's my bad for not making them clear in the example's README:
- There is no need to change the original command-line flag's functionality; you can just enable --fast_tokenizer in the command when using gpt_neox type models, since they only have GPTNeoXTokenizerFast.
- The value for --quantized_model_dir should be a path to a local directory, not a file; you can check whether the quantized model was saved into a directory named 4bit_converted/neox20b-4bit.safetensor. (A reconstructed command combining both points follows below.)
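Combining both points with the command from earlier in the thread, the invocation would presumably look like this (my reconstruction, not a command posted above):

CUDA_VISIBLE_DEVICES="0" python quant_with_alpaca.py --pretrained_model_dir models/gpt-neox-20b --quantized_model_dir 4bit_converted --fast_tokenizer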
Thanks for the update. Model saved in '4bit_converted' in .bin format.
"The model 'GPTNeoXGPTQForCausalLM' is not supported for ." is still generated, but that's not a big deal.
How do I save as safetensors? I will run again using:
model.save_quantized(args.quantized_model_dir, use_safetensors=True)
Also, is there a simple inference script to use with the generated model above?
Cheers
"The model 'GPTNeoXGPTQForCausalLM' is not supported for ." is a warning thrown by Hugging Face transformers; you can just ignore it. I will find a way to bypass it in the future.
"Is there a simple inference script to use with the generated model above?" I will consider writing one in examples as soon as possible; for now, you can refer to this code snippet:
from auto_gptq import AutoGPTQForCausalLM
from transformers import AutoTokenizer, TextGenerationPipeline

tokenizer_dir = "models/gpt-neox-20b"   # e.g. the original model dir from this thread
quantized_model_dir = "4bit_converted"  # e.g. the output dir used above

text = "Hello, World!"
tokenizer = AutoTokenizer.from_pretrained(tokenizer_dir)
# load the quantized weights onto a single GPU
model = AutoGPTQForCausalLM.from_quantized(quantized_model_dir, device="cuda:0")
pipeline = TextGenerationPipeline(model=model, tokenizer=tokenizer, device="cuda:0")
generated_text = pipeline(text, return_full_text=False, num_beams=1, max_new_tokens=128)[0]["generated_text"]
print(generated_text)
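As a follow-up note, hedged since it depends on the installed version: in the 0.3-era API, from_quantized also accepts a use_safetensors flag, so a model saved with save_quantized(..., use_safetensors=True) would be loaded along these lines:

model = AutoGPTQForCausalLM.from_quantized(quantized_model_dir, device="cuda:0", use_safetensors=True)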
Thanks. I'll work with the above and close the issue.
Looking forward to the script.
Cheers