
alpaca-lora's People

Contributors

airaria, angainordev, antimatter15, atry, beomi, caracena, chris-alexiuk-1, chrisociepa, deep-diver, ellangok, elleleonne, eltociear, fecet, gururise, ilyagusev, jeffwan, juletx, kohakublueleaf, maralski, mchl-labs, mikubill, muximus3, nanocode012, rthaweewat, t-atlas, teelinsan, thisserand, tloen, zetavg, zolastro


alpaca-lora's Issues

About PAD_TOKEN_ID in LLaMA

Hi,
Thanks for the finetuning code. I noticed that in the original LLaMA code the pad_token_id is -1, while your implementation changes it to 0:

tokenizer.pad_token_id = 0

Would you mind explaining the reason? If the pad id were -1, we could not look it up in the embedding table.

Thanks.
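For reference, a minimal sketch (assuming the standard transformers LlamaTokenizer) of the setting being asked about:

from transformers import LlamaTokenizer

tokenizer = LlamaTokenizer.from_pretrained("decapoda-research/llama-7b-hf")
# -1 is not a valid row in the embedding table, so a real token id is used for padding instead
# (id 0 is the unk token in the LLaMA vocabulary)
tokenizer.pad_token_id = 0
# padded positions are excluded from the loss via the attention mask / label masking, not via the pad id itself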

Benchmarking and optimization tips

Not exactly an issue, but I have been trying to run one epoch of finetuning with llama-13b. On a 4090 it looks like it will take roughly 4 hours with the setting `MICRO_BATCH_SIZE = 2`.

However, it looks like the loss has already converged to ~1 by epoch 0.12 (roughly 30 minutes into training), so it doesn't really make sense to use epochs=3, and a larger micro batch size could potentially be used.

I could be wrong here. Happy to hear some feedback on how to better tune the parameters.
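For context, a sketch of the knobs being discussed; the names mirror the constants at the top of finetune.py, but treat the exact values below as assumptions for illustration:

# illustrative hyperparameters, not a recommendation
MICRO_BATCH_SIZE = 2          # per-step batch size that fits on the 4090
BATCH_SIZE = 128              # effective batch size
GRADIENT_ACCUMULATION_STEPS = BATCH_SIZE // MICRO_BATCH_SIZE
EPOCHS = 3                    # could likely be reduced if the loss plateaus around ~1 this early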

Docs/script for merging weights into base model

Would it be possible to document the steps needed to correctly merge LoRA weights into the base model? This would allow the merged model to run on llama.cpp on lower-end hardware.

I've tried figuring out how to do this myself, but the magic is a little too deep and my understanding a little too shallow :)
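Not official documentation, but a rough sketch of what the merge can look like with recent peft versions, assuming the tloen/alpaca-lora-7b adapter and that merge_and_unload() is available in the installed peft:

import torch
from peft import PeftModel
from transformers import LlamaForCausalLM

base = LlamaForCausalLM.from_pretrained(
    "decapoda-research/llama-7b-hf", torch_dtype=torch.float16
)
# attach the LoRA adapter, then fold its deltas into the base weights
model = PeftModel.from_pretrained(base, "tloen/alpaca-lora-7b")
merged = model.merge_and_unload()
merged.save_pretrained("./alpaca-merged")  # a plain HF checkpoint, ready for downstream conversion (e.g. to ggml for llama.cpp)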

Can't generate more than 256 new tokens.

First of all - thank you for the code and model. Even for 7B it works awesome!

But I stumbled upon a strange problem. No matter how I set the generation parameters max_new_tokens or max_length (to any large number like 2048), generation always stops at 256 new tokens without finishing the sentence, while limits lower than 256 work as expected.

Is it hard-coded somewhere that I can edit for my setup?
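In case it helps, a minimal sketch of passing the limit through generate(); the parameter names are standard transformers GenerationConfig fields, and the exact place where 256 is set in this repo's generate.py is an assumption:

from transformers import GenerationConfig

generation_config = GenerationConfig(temperature=0.1, top_p=0.75, num_beams=4)
output = model.generate(
    input_ids=input_ids,
    generation_config=generation_config,
    max_new_tokens=2048,  # this argument caps the number of generated tokens; check that no default of 256 overrides it
)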

Issues when using it with GPTQ-for-LLaMa

I am not sure whether the issue is in the export code here or in GPTQ-for-LLaMa.

What I did:

At the beginning of the quantization I got the following warning:

Loading checkpoint shards: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████| 39/39 [00:11<00:00,  3.38it/s]
Some weights of the model checkpoint at ../alpaca-lora/hf_ckpt/ were not used when initializing LlamaForCausalLM: ['base_model.model.lm_head.weight']

Is that normal?

When I try to use the 4-bit quantized model I only get random output:

(.venv) [danielw@pc GPTQ-for-LLaMa]$ CUDA_VISIBLE_DEVICES=0 python llama_inference.py ../alpaca-lora/hf_ckpt/  --wbits 4 --load alpace-7b-4bit-non-cleaned.pt --text  "Hello"
Loading model ...
Done.
 Hello Kub Akademutionsiy?')}{\rightarrow größ office \\ größ Roberts lé?'ulseulseame collect authorizationSide色սöd São affected throwingcur let authorizationמ bug affected dw collectmineною Chairulse instant Unionistrictscherरabsरclusowclar

Not sure what's going on :( It seems that the export_hf_checkpoint script exports a model that is not compatible with GPTQ-for-LLaMa.
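One possible culprit, sketched below under the assumption that the exported state dict keeps the peft wrapper's key prefix (the 'base_model.model.lm_head.weight' warning above hints at this): stripping that prefix before saving would make the keys line up with a plain LlamaForCausalLM checkpoint. The single-file path is illustrative; the real export is sharded.

import torch

state_dict = torch.load("hf_ckpt/pytorch_model.bin", map_location="cpu")
# drop the peft wrapper prefix so keys match LlamaForCausalLM's expected names
cleaned = {k.replace("base_model.model.", "", 1): v for k, v in state_dict.items()}
torch.save(cleaned, "hf_ckpt/pytorch_model.bin")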

Finetuned Model Inference error: AttributeError: 'NoneType' object has no attribute 'device'

Update: for anyone experiencing this issue, see the workaround I posted in #14 (comment)

I tried out the finetune script locally and it looks like there was no problem with that. However, when trying to run inference, I'm getting AttributeError: 'NoneType' object has no attribute 'device' from bitsandbytes. I've checked, and it looks like it's an issue related to the model being split across CPU and GPU, but I'm not sure which part of this repo is causing that. Any idea?

Relevant issue in bitsandbytes: TimDettmers/bitsandbytes#40
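For anyone who cannot follow the link, one commonly reported mitigation (an assumption here, not necessarily the workaround from #14) is to force the whole model onto a single GPU so bitsandbytes never sees CPU-offloaded weights:

model = LlamaForCausalLM.from_pretrained(
    "decapoda-research/llama-7b-hf",
    load_in_8bit=True,
    device_map={"": 0},  # put every module on GPU 0 instead of letting accelerate offload parts to CPU
)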

Tokenizer

Warning

The tokenizer class you load from this checkpoint is not the same type as the class this function is called from. It may result in unexpected tokenization. 
The tokenizer class you load from this checkpoint is 'LLaMATokenizer'. 
The class this function is called from is 'LlamaTokenizer'.

Caused by tokenizer = LlamaTokenizer.from_pretrained("decapoda-research/llama-7b-hf", cache_dir="./cache/")

A potential solution is to modify it to tokenizer = LLaMATokenizer.from_pretrained("decapoda-research/llama-7b-hf", cache_dir="./cache/"),
or to modify path_to_cache_hfmodel/models--decapoda-research--llama-7b-hf/snapshots/5f98eefcc80e437ef68d457ad7bf167c2c6a1348/tokenizer_config.json

and change
{"bos_token": "", "eos_token": "", "model_max_length": 1000000000000000019884624838656, "tokenizer_class": "LlamaTokenizer", "unk_token": ""} to {"bos_token": "", "eos_token": "", "model_max_length": 1000000000000000019884624838656, "tokenizer_class": "LLaMATokenizer", "unk_token": ""}

Maximum recursion depth exceeded

I tried to fine-tune the 13B model on a 3090 (24 GB VRAM). Training started and a progress bar was shown; however, I got an error saying 'maximum recursion depth exceeded' after 100 steps of training. Has anyone had a similar error? Thanks!

Stream tokens output

Is it possible to stream each token of the output as soon as it is generated by the model? I guess it depends on the Hugging Face transformers classes and methods. Any solution to this?
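One option, assuming a transformers version recent enough (roughly 4.28+) to ship the streamer API; a minimal sketch:

from transformers import TextStreamer

streamer = TextStreamer(tokenizer, skip_prompt=True)
# tokens are decoded and printed to stdout as soon as they are generated
model.generate(input_ids=input_ids, max_new_tokens=256, streamer=streamer)

For serving, TextIteratorStreamer (same module) yields the text from a background generation thread instead of printing it.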

Multi-GPU bug?

Traceback (most recent call last):
  File "/workspace/alpaca-lora/finetune.py", line 95, in <module>
    trainer.train(resume_from_checkpoint=False)
  File "/workspace/miniconda3/lib/python3.10/site-packages/transformers/trainer.py", line 1628, in train
    return inner_training_loop(
  File "/workspace/miniconda3/lib/python3.10/site-packages/transformers/trainer.py", line 1895, in _inner_training_loop
    tr_loss_step = self.training_step(model, inputs)
  File "/workspace/miniconda3/lib/python3.10/site-packages/transformers/trainer.py", line 2637, in training_step
    loss = self.compute_loss(model, inputs)
  File "/workspace/miniconda3/lib/python3.10/site-packages/transformers/trainer.py", line 2669, in compute_loss
    outputs = model(**inputs)
  File "/workspace/miniconda3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/workspace/miniconda3/lib/python3.10/site-packages/torch/nn/parallel/data_parallel.py", line 157, in forward
    raise RuntimeError("module must have its parameters and buffers "
RuntimeError: module must have its parameters and buffers on device cuda:0 (device_ids[0]) but found one of them on device: cuda:1
  0%|                                                                    | 0/1083 [00:00<?, ?it/s]

Trying to run this on a 4xA100 instance, I get this error.
nvidia-smi shows that something is getting loaded onto both gpu0 and gpu1:

[screenshot of nvidia-smi output]
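The traceback comes from torch.nn.DataParallel wrapping the model while accelerate has already spread the 8-bit weights across GPUs. A crude workaround (a sketch, not the repo's intended multi-GPU path) is to make only one device visible before anything CUDA-related is initialized:

import os
os.environ["CUDA_VISIBLE_DEVICES"] = "0"  # must run before torch/transformers touch CUDA

from transformers import LlamaForCausalLM

model = LlamaForCausalLM.from_pretrained(
    "decapoda-research/llama-7b-hf",
    load_in_8bit=True,
    device_map={"": 0},  # keep every layer on the single visible GPU
)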

Is it possible to do inference (i.e. run generate.py) without a CUDA GPU?

Hi all,

I am trying to start generate.py, but I don't have a CUDA card. (Actually, I can insert an old Quadro K2000, if that helps.)
After all the steps in the setup section, I get the following:

/home/georgi/Documents/GitHub/alpaca-lora/venv/bin/python /home/georgi/Documents/GitHub/alpaca-lora/generate.py 

===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please submit your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
================================================================================
/home/georgi/Documents/GitHub/alpaca-lora/venv/lib/python3.10/site-packages/bitsandbytes/cextension.py:31: UserWarning: The installed version of bitsandbytes was compiled without GPU support. 8-bit optimizers and GPU quantization are unavailable.
  warn("The installed version of bitsandbytes was compiled without GPU support. "
CUDA SETUP: Required library version not found: libsbitsandbytes_cpu.so. Maybe you need to compile it from source?
CUDA SETUP: Defaulting to libbitsandbytes_cpu.so...
The tokenizer class you load from this checkpoint is not the same type as the class this function is called from. It may result in unexpected tokenization. 
The tokenizer class you load from this checkpoint is 'LLaMATokenizer'. 
The class this function is called from is 'LlamaTokenizer'.
Traceback (most recent call last):
  File "/home/georgi/Documents/GitHub/alpaca-lora/generate.py", line 7, in <module>
    model = LlamaForCausalLM.from_pretrained(
  File "/home/georgi/Documents/GitHub/alpaca-lora/venv/lib/python3.10/site-packages/transformers/modeling_utils.py", line 2591, in from_pretrained
    raise ValueError(
ValueError: 
                        Some modules are dispatched on the CPU or the disk. Make sure you have enough GPU RAM to fit
                        the quantized model. If you want to dispatch the model on the CPU or the disk while keeping
                        these modules in 32-bit, you need to set `load_in_8bit_fp32_cpu_offload=True` and pass a custom
                        `device_map` to `from_pretrained`. Check
                        https://huggingface.co/docs/transformers/main/en/main_classes/quantization#offload-between-cpu-and-gpu
                        for more details.
                        

Process finished with exit code 1

Is there a way around this? I tried inserting load_in_8bit_fp32_cpu_offload=True in a few places, but it doesn't fix it.

Thanks!
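The 8-bit path needs a CUDA GPU, so CPU-only inference means dropping load_in_8bit entirely. A rough sketch, assuming enough system RAM for the full-precision 7B weights; it will be slow:

import torch
from peft import PeftModel
from transformers import LlamaForCausalLM, LlamaTokenizer

tokenizer = LlamaTokenizer.from_pretrained("decapoda-research/llama-7b-hf")
model = LlamaForCausalLM.from_pretrained(
    "decapoda-research/llama-7b-hf",
    torch_dtype=torch.float32,   # no load_in_8bit: the bitsandbytes 8-bit kernels require a GPU
    device_map={"": "cpu"},
)
model = PeftModel.from_pretrained(model, "tloen/alpaca-lora-7b")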

runtime error: mat1 and mat2 shapes cannot be multiplied

File "/opt/miniconda3/envs/textgen/lib/python3.10/site-packages/transformers/trainer.py", line 1628, in train
return inner_training_loop(
File "/opt/miniconda3/envs/textgen/lib/python3.10/site-packages/transformers/trainer.py", line 1895, in _inner_training_loop
tr_loss_step = self.training_step(model, inputs)
File "/opt/miniconda3/envs/textgen/lib/python3.10/site-packages/transformers/trainer.py", line 2637, in training_step
loss = self.compute_loss(model, inputs)
File "/opt/miniconda3/envs/textgen/lib/python3.10/site-packages/transformers/trainer.py", line 2669, in compute_loss
outputs = model(**inputs)
File "/opt/miniconda3/envs/textgen/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/opt/miniconda3/envs/textgen/lib/python3.10/site-packages/torch/nn/parallel/data_parallel.py", line 171, in forward
outputs = self.parallel_apply(replicas, inputs, kwargs)
File "/opt/miniconda3/envs/textgen/lib/python3.10/site-packages/torch/nn/parallel/data_parallel.py", line 181, in parallel_apply
return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
File "/opt/miniconda3/envs/textgen/lib/python3.10/site-packages/torch/nn/parallel/parallel_apply.py", line 89, in parallel_apply
output.reraise()
File "/opt/miniconda3/envs/textgen/lib/python3.10/site-packages/torch/_utils.py", line 543, in reraise
raise exception
RuntimeError: Caught RuntimeError in replica 0 on device 0.
Original Traceback (most recent call last):
File "/opt/miniconda3/envs/textgen/lib/python3.10/site-packages/torch/nn/parallel/parallel_apply.py", line 64, in _worker
output = module(*input, **kwargs)
File "/opt/miniconda3/envs/textgen/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/opt/miniconda3/envs/textgen/lib/python3.10/site-packages/peft/peft_model.py", line 529, in forward
return self.base_model(
File "/opt/miniconda3/envs/textgen/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/opt/miniconda3/envs/textgen/lib/python3.10/site-packages/accelerate/hooks.py", line 158, in new_forward
output = old_forward(*args, **kwargs)
File "/opt/miniconda3/envs/textgen/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 852, in forward
outputs = self.model.decoder(
File "/opt/miniconda3/envs/textgen/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/opt/miniconda3/envs/textgen/lib/python3.10/site-packages/accelerate/hooks.py", line 158, in new_forward
output = old_forward(*args, **kwargs)
File "/opt/miniconda3/envs/textgen/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 616, in forward
layer_outputs = torch.utils.checkpoint.checkpoint(
File "/opt/miniconda3/envs/textgen/lib/python3.10/site-packages/torch/utils/checkpoint.py", line 249, in checkpoint
return CheckpointFunction.apply(function, preserve, *args)
File "/opt/miniconda3/envs/textgen/lib/python3.10/site-packages/torch/utils/checkpoint.py", line 107, in forward
outputs = run_function(*args)
File "/opt/miniconda3/envs/textgen/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 612, in custom_forward
return module(*inputs, output_attentions, None)
File "/opt/miniconda3/envs/textgen/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/opt/miniconda3/envs/textgen/lib/python3.10/site-packages/accelerate/hooks.py", line 158, in new_forward
output = old_forward(*args, **kwargs)
File "/opt/miniconda3/envs/textgen/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 305, in forward
hidden_states, self_attn_weights, present_key_value = self.self_attn(
File "/opt/miniconda3/envs/textgen/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/opt/miniconda3/envs/textgen/lib/python3.10/site-packages/accelerate/hooks.py", line 158, in new_forward
output = old_forward(*args, **kwargs)
File "/opt/miniconda3/envs/textgen/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 167, in forward
value_states = self.v_proj(hidden_states).view(bsz, tgt_len, self.num_heads, self.head_dim)
File "/opt/miniconda3/envs/textgen/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/opt/miniconda3/envs/textgen/lib/python3.10/site-packages/peft/tuners/lora.py", line 522, in forward
result = super().forward(x)
File "/opt/miniconda3/envs/textgen/lib/python3.10/site-packages/bitsandbytes/nn/modules.py", line 242, in forward
out = bnb.matmul(x, self.weight, bias=self.bias, state=self.state)
File "/opt/miniconda3/envs/textgen/lib/python3.10/site-packages/bitsandbytes/autograd/_functions.py", line 488, in matmul
return MatMul8bitLt.apply(A, B, out, bias, state)
File "/opt/miniconda3/envs/textgen/lib/python3.10/site-packages/bitsandbytes/autograd/_functions.py", line 397, in forward
output += torch.matmul(subA, state.subB)
RuntimeError: mat1 and mat2 shapes cannot be multiplied (1024x7 and 8x4096)

The following error occurred when executing generate.py

(alpaca-lora) root@DESKTOP-FRT:/mnt/f/nlp/alpaca-lora# python generate.py
The tokenizer class you load from this checkpoint is not the same type as the class this function is called from. It may result in unexpected tokenization.
The tokenizer class you load from this checkpoint is 'LLaMATokenizer'.
The class this function is called from is 'LlamaTokenizer'.
Traceback (most recent call last):
File "/mnt/f/nlp/alpaca-lora/generate.py", line 13, in
model = LlamaForCausalLM.from_pretrained(
File "/root/anaconda3/envs/alpaca-lora2/lib/python3.10/site-packages/transformers/modeling_utils.py", line 2546, in from_pretrained
importlib_metadata.version("bitsandbytes")
File "/root/anaconda3/envs/alpaca-lora2/lib/python3.10/importlib/metadata/init.py", line 996, in version
return distribution(distribution_name).version
File "/root/anaconda3/envs/alpaca-lora2/lib/python3.10/importlib/metadata/init.py", line 969, in distribution
return Distribution.from_name(distribution_name)
File "/root/anaconda3/envs/alpaca-lora2/lib/python3.10/importlib/metadata/init.py", line 548, in from_name
raise PackageNotFoundError(name)
importlib.metadata.PackageNotFoundError: No package metadata was found for bitsandbytes

No output key in example 3556.

Hello, I have seen that you cleaned the dataset, which is nice. However, upon loading it into the original alpaca trainer (which is what I use), there's a line without the output key:

[screenshot of the offending example]

*I modified the trainer so that I could find it.

Just wanted to let you know.
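For anyone hitting the same thing before the dataset is fixed, a small sketch that reports and drops entries missing the output key (the filename is assumed to be the cleaned JSON from this repo):

import json

with open("alpaca_data_cleaned.json") as f:
    data = json.load(f)

missing = [i for i, ex in enumerate(data) if "output" not in ex]
print("examples without an output key:", missing)

cleaned = [ex for ex in data if "output" in ex]
with open("alpaca_data_cleaned.json", "w") as f:
    json.dump(cleaned, f, indent=4)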

possible to use it in JavaScript?

I think it would be very interesting if this model could be run with JavaScript. One of the main issues with ChatGPT and others is server overload during peak hours when many people connect. If the model could run directly in the browser, many people could use it quickly and without any server costs.

generate.py --listen flag being ignored

Hello,

I'd like to execute generate.py the following way:

I don't want it to create a public link.
I don't want it to bind to 127.0.0.1; instead, I want it to bind to 0.0.0.0 so it's accessible on the local network.

Thank you.
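Since generate.py builds a Gradio interface, the binding is controlled by the launch() call; a sketch of the intended behaviour (the name of the interface object and its existing launch arguments in generate.py are assumptions):

# at the bottom of generate.py, assuming the Gradio interface object is called `demo`
demo.launch(
    server_name="0.0.0.0",  # bind to all interfaces so the LAN can reach it
    share=False,            # do not create a public gradio.live link
)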

Tokenizer class function warning

Getting the following error with the latest commit (even after uninstalling and re-installing transformers from git):

The tokenizer class you load from this checkpoint is not the same type as the class this function is called from. It may result in unexpected tokenization. 
The tokenizer class you load from this checkpoint is 'LLaMATokenizer'. 
The class this function is called from is 'LlamaTokenizer'.

I also tried force_download=True and still get the error.

AttributeError: 'NoneType' object has no attribute 'to'

Code:
python generate.py

Error:


===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please submit your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
================================================================================
/home/t-enshengshi/anaconda3/envs/alpaca-lora/lib/python3.9/site-packages/bitsandbytes/cuda_setup/main.py:136: UserWarning: /home/t-enshengshi/anaconda3/envs/alpaca-lora did not contain libcudart.so as expected! Searching further paths...
  warn(msg)
CUDA_SETUP: WARNING! libcudart.so not found in any environmental path. Searching /usr/local/cuda/lib64...
CUDA SETUP: CUDA runtime path found: /usr/local/cuda/lib64/libcudart.so
CUDA SETUP: Highest compute capability among GPUs detected: 7.0
CUDA SETUP: Detected CUDA version 112
/home/t-enshengshi/anaconda3/envs/alpaca-lora/lib/python3.9/site-packages/bitsandbytes/cuda_setup/main.py:136: UserWarning: WARNING: Compute capability < 7.5 detected! Only slow 8-bit matmul is supported for your GPU!
  warn(msg)
CUDA SETUP: Loading binary /home/t-enshengshi/anaconda3/envs/alpaca-lora/lib/python3.9/site-packages/bitsandbytes/libbitsandbytes_cuda112_nocublaslt.so...
Downloading tokenizer.model, (…)cial_tokens_map.json, (…)okenizer_config.json, (…)lve/main/config.json, (…)model.bin.index.json and the 33 model checkpoint shards ((…)l-00001-of-00033.bin through (…)l-00033-of-00033.bin): all 100%
Loading checkpoint shards: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 33/33 [00:08<00:00,  3.78it/s]
Some weights of the model checkpoint at decapoda-research/llama-7b-hf were not used when initializing LLaMAForCausalLM: ['model.layers.6.input_layernorm.weight', 'model.layers.11.self_attn.rotary_emb.inv_freq', 'model.layers.19.mlp.down_proj.weight', … (several hundred further model.layers.* entries, plus model.embed_tokens.weight and model.norm.weight, omitted; effectively every checkpoint weight was unused)]
- This IS expected if you are initializing LLaMAForCausalLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing LLaMAForCausalLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of LLaMAForCausalLM were not initialized from the model checkpoint at decapoda-research/llama-7b-hf and are newly initialized: ['model.decoder.layers.25.feed_forward.w3.weight', 'model.decoder.layers.25.self_attn.k_proj.weight', 'model.decoder.layers.22.ffn_norm.weight', … (several hundred further model.decoder.* entries omitted; effectively every model weight was newly initialized)]
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Downloading (…)neration_config.json: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 124/124 [00:00<00:00, 24.8kB/s]
Downloading (…)/adapter_config.json: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 370/370 [00:00<00:00, 163kB/s]
Downloading adapter_model.bin: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 16.8M/16.8M [00:00<00:00, 89.7MB/s]
Instruction: Tell me about alpacas.
Traceback (most recent call last):
  File "/home/t-enshengshi/workspace/alpaca-lora/generate.py", line 77, in <module>
    print("Response:", evaluate(instruction))
  File "/home/t-enshengshi/workspace/alpaca-lora/generate.py", line 51, in evaluate
    generation_output = model.generate(
  File "/home/t-enshengshi/anaconda3/envs/alpaca-lora/lib/python3.9/site-packages/peft/peft_model.py", line 581, in generate
    outputs = self.base_model.generate(**kwargs)
  File "/home/t-enshengshi/anaconda3/envs/alpaca-lora/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/t-enshengshi/anaconda3/envs/alpaca-lora/lib/python3.9/site-packages/transformers/generation/utils.py", line 1490, in generate
    return self.beam_search(
  File "/home/t-enshengshi/anaconda3/envs/alpaca-lora/lib/python3.9/site-packages/transformers/generation/utils.py", line 2749, in beam_search
    outputs = self(
  File "/home/t-enshengshi/anaconda3/envs/alpaca-lora/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/t-enshengshi/anaconda3/envs/alpaca-lora/lib/python3.9/site-packages/accelerate/hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
  File "/home/t-enshengshi/anaconda3/envs/alpaca-lora/lib/python3.9/site-packages/transformers/models/llama/modeling_llama.py", line 852, in forward
    outputs = self.model.decoder(
  File "/home/t-enshengshi/anaconda3/envs/alpaca-lora/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/t-enshengshi/anaconda3/envs/alpaca-lora/lib/python3.9/site-packages/transformers/models/llama/modeling_llama.py", line 624, in forward
    layer_outputs = decoder_layer(
  File "/home/t-enshengshi/anaconda3/envs/alpaca-lora/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/t-enshengshi/anaconda3/envs/alpaca-lora/lib/python3.9/site-packages/accelerate/hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
  File "/home/t-enshengshi/anaconda3/envs/alpaca-lora/lib/python3.9/site-packages/transformers/models/llama/modeling_llama.py", line 305, in forward
    hidden_states, self_attn_weights, present_key_value = self.self_attn(
  File "/home/t-enshengshi/anaconda3/envs/alpaca-lora/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/t-enshengshi/anaconda3/envs/alpaca-lora/lib/python3.9/site-packages/accelerate/hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
  File "/home/t-enshengshi/anaconda3/envs/alpaca-lora/lib/python3.9/site-packages/transformers/models/llama/modeling_llama.py", line 165, in forward
    query_states = self.q_proj(hidden_states).view(bsz, tgt_len, self.num_heads, self.head_dim)
  File "/home/t-enshengshi/anaconda3/envs/alpaca-lora/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/t-enshengshi/anaconda3/envs/alpaca-lora/lib/python3.9/site-packages/accelerate/hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
  File "/home/t-enshengshi/anaconda3/envs/alpaca-lora/lib/python3.9/site-packages/peft/tuners/lora.py", line 522, in forward
    result = super().forward(x)
  File "/home/t-enshengshi/anaconda3/envs/alpaca-lora/lib/python3.9/site-packages/bitsandbytes/nn/modules.py", line 242, in forward
    out = bnb.matmul(x, self.weight, bias=self.bias, state=self.state)
  File "/home/t-enshengshi/anaconda3/envs/alpaca-lora/lib/python3.9/site-packages/bitsandbytes/autograd/_functions.py", line 488, in matmul
    return MatMul8bitLt.apply(A, B, out, bias, state)
  File "/home/t-enshengshi/anaconda3/envs/alpaca-lora/lib/python3.9/site-packages/torch/autograd/function.py", line 506, in apply
    return super().apply(*args, **kwargs)  # type: ignore[misc]
  File "/home/t-enshengshi/anaconda3/envs/alpaca-lora/lib/python3.9/site-packages/bitsandbytes/autograd/_functions.py", line 390, in forward
    output = torch.nn.functional.linear(A_wo_outliers, state.CB.to(A.dtype))
AttributeError: 'NoneType' object has no attribute 'to'

TypeError: 'NoneType' object is not subscriptable

The code ran fine in Colab. I want to run it in a plain test environment on my development server, not through the Gradio UI.

However, the code below raises an error.

generation_output = model.generate(
            input_ids=input_ids,
            generation_config=generation_config,
            return_dict_in_generate=True,
            output_scores=True,
            max_new_tokens=2048,
        )

test.py is equivalent to generate.py.

/opt/conda/envs/conda_alpaca/lib/python3.8/site-packages/transformers/generation/utils.py:1374: UserWarning: You are calling .generate() with the `input_ids` being on a device type different than your m
(conda_alpaca) jovyan@ranking-0:~/alpaca-lora$ python3 test.py 

===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please submit your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
================================================================================
CUDA SETUP: CUDA runtime path found: /opt/conda/envs/conda_alpaca/lib/libcudart.so
CUDA SETUP: Highest compute capability among GPUs detected: 6.1
CUDA SETUP: Detected CUDA version 118
/opt/conda/envs/conda_alpaca/lib/python3.8/site-packages/bitsandbytes/cuda_setup/main.py:136: UserWarning: WARNING: Compute capability < 7.5 detected! Only slow 8-bit matmul is supported for your GPU!
  warn(msg)
CUDA SETUP: Loading binary /opt/conda/envs/conda_alpaca/lib/python3.8/site-packages/bitsandbytes/libbitsandbytes_cuda118_nocublaslt.so...
The tokenizer class you load from this checkpoint is not the same type as the class this function is called from. It may result in unexpected tokenization. 
The tokenizer class you load from this checkpoint is 'LLaMATokenizer'. 
The class this function is called from is 'LlamaTokenizer'.
Loading checkpoint shards: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 33/33 [00:14<00:00,  2.22it/s]
Instruction: Tell me about alpacas.
Traceback (most recent call last):
  File "test.py", line 90, in <module>
    print("Response:", evaluate(instruction))
  File "test.py", line 47, in evaluate
    generation_output = model.generate(
  File "/opt/conda/envs/conda_alpaca/lib/python3.8/site-packages/peft/peft_model.py", line 581, in generate
    outputs = self.base_model.generate(**kwargs)
  File "/opt/conda/envs/conda_alpaca/lib/python3.8/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/opt/conda/envs/conda_alpaca/lib/python3.8/site-packages/transformers/generation/utils.py", line 1490, in generate
    return self.beam_search(
  File "/opt/conda/envs/conda_alpaca/lib/python3.8/site-packages/transformers/generation/utils.py", line 2749, in beam_search
    outputs = self(
  File "/opt/conda/envs/conda_alpaca/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/opt/conda/envs/conda_alpaca/lib/python3.8/site-packages/accelerate/hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
  File "/opt/conda/envs/conda_alpaca/lib/python3.8/site-packages/transformers/models/llama/modeling_llama.py", line 772, in forward
    outputs = self.model(
  File "/opt/conda/envs/conda_alpaca/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/opt/conda/envs/conda_alpaca/lib/python3.8/site-packages/transformers/models/llama/modeling_llama.py", line 621, in forward
    layer_outputs = decoder_layer(
  File "/opt/conda/envs/conda_alpaca/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/opt/conda/envs/conda_alpaca/lib/python3.8/site-packages/accelerate/hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
  File "/opt/conda/envs/conda_alpaca/lib/python3.8/site-packages/transformers/models/llama/modeling_llama.py", line 316, in forward
    hidden_states, self_attn_weights, present_key_value = self.self_attn(
  File "/opt/conda/envs/conda_alpaca/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/opt/conda/envs/conda_alpaca/lib/python3.8/site-packages/accelerate/hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
  File "/opt/conda/envs/conda_alpaca/lib/python3.8/site-packages/transformers/models/llama/modeling_llama.py", line 216, in forward
    query_states = self.q_proj(hidden_states).view(bsz, q_len, self.num_heads, self.head_dim).transpose(1, 2)
  File "/opt/conda/envs/conda_alpaca/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/opt/conda/envs/conda_alpaca/lib/python3.8/site-packages/accelerate/hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
  File "/opt/conda/envs/conda_alpaca/lib/python3.8/site-packages/peft/tuners/lora.py", line 522, in forward
    result = super().forward(x)
  File "/opt/conda/envs/conda_alpaca/lib/python3.8/site-packages/bitsandbytes/nn/modules.py", line 242, in forward
    out = bnb.matmul(x, self.weight, bias=self.bias, state=self.state)
  File "/opt/conda/envs/conda_alpaca/lib/python3.8/site-packages/bitsandbytes/autograd/_functions.py", line 488, in matmul
    return MatMul8bitLt.apply(A, B, out, bias, state)
  File "/opt/conda/envs/conda_alpaca/lib/python3.8/site-packages/torch/autograd/function.py", line 506, in apply
    return super().apply(*args, **kwargs)  # type: ignore[misc]
  File "/opt/conda/envs/conda_alpaca/lib/python3.8/site-packages/bitsandbytes/autograd/_functions.py", line 360, in forward
    outliers = state.CB[:, state.idx.long()].clone()
TypeError: 'NoneType' object is not subscriptable
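
For what it's worth, the crash happens inside the 8-bit matmul while beam search is running, and the warning above also notes that `input_ids` is on a different device than the model. Below is a minimal sketch of a workaround to try; it assumes `model`, `tokenizer`, and `prompt` are already set up as in generate.py, and switching to greedy decoding is my own guess at avoiding the failing code path, not a confirmed fix.

import torch
from transformers import GenerationConfig

# Assumes `model`, `tokenizer`, and `prompt` already exist (as in generate.py).
# Put the prompt tensor on the same device as the model's first parameter,
# so generation does not mix CPU and GPU tensors.
device = next(model.parameters()).device
input_ids = tokenizer(prompt, return_tensors="pt")["input_ids"].to(device)

# Greedy decoding (num_beams=1) sidesteps the beam-search path shown in the
# traceback; this is a guess at a workaround, not a confirmed fix.
generation_config = GenerationConfig(num_beams=1, do_sample=False)

with torch.no_grad():
    generation_output = model.generate(
        input_ids=input_ids,
        generation_config=generation_config,
        return_dict_in_generate=True,
        output_scores=True,
        max_new_tokens=256,
    )
print(tokenizer.decode(generation_output.sequences[0]))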

Something went wrong Unexpected token '<', "<html> <h"... is not valid JSON

When using the CPU as the device, after trying various prompts, Gradio returns the following error while running generate.py:


Something went wrong
Unexpected token '<', "<html> <h"... is not valid JSON

Interestingly, I do not see any errors in the console log; it seems to be a formatting issue with the Gradio UI.

the prompts I tried:

Write a Python program that prints the first 10 Fibonacci numbers.
write python code that renames all files starting with "mleml" in the current folder to "kek". 

image

Is Google Cloud TPU supported?

Hi. I got some free TPU quota from Google and tried to train a model on a Google Cloud TPU VM (v2-8). It can download the model but then fails with the following error. Below are the full logs:

$python3 finetune.py 

===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please submit your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
================================================================================
CUDA SETUP: Required library version not found: libsbitsandbytes_cpu.so. Maybe you need to compile it from source?
CUDA SETUP: Defaulting to libbitsandbytes_cpu.so...
/home/aicheung/.local/lib/python3.8/site-packages/bitsandbytes/cextension.py:31: UserWarning: The installed version of bitsandbytes was compiled without GPU support. 8-bit optimizers and GPU quantization are unavailable.
  warn("The installed version of bitsandbytes was compiled without GPU support. "
Overriding torch_dtype=None with `torch_dtype=torch.float16` due to requirements of `bitsandbytes` to enable model loading in mixed int8. Either pass torch_dtype=torch.float16 or don't pass this argument at all to remove this warning.
Traceback (most recent call last):
  File "finetune.py", line 46, in <module>
    model = LlamaForCausalLM.from_pretrained(
  File "/home/aicheung/.local/lib/python3.8/site-packages/transformers/modeling_utils.py", line 2591, in from_pretrained
    raise ValueError(
ValueError: 
                        Some modules are dispatched on the CPU or the disk. Make sure you have enough GPU RAM to fit
                        the quantized model. If you want to dispatch the model on the CPU or the disk while keeping
                        these modules in 32-bit, you need to set `load_in_8bit_fp32_cpu_offload=True` and pass a custom
                        `device_map` to `from_pretrained`. Check
                        https://huggingface.co/docs/transformers/main/en/main_classes/quantization#offload-between-cpu-and-gpu
                        for more details.

Is TPU not supported? I have no experience with HuggingFace's libraries (I've only used TensorFlow before), so I am not sure how it works. Thanks.

Anyone try fine-tuning 13B model?

Training the 7B model takes about 18GB of RAM.

I tried training the 13B model and ran out of VRAM on my 24GB card. I suspect it will need at least 32GB of VRAM.

Has anyone else been successful with fine-tuning a 13B model?
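
Not a confirmed recipe, but one knob worth trying before moving to a bigger card is the micro-batch size: gradient accumulation keeps the effective batch size the same while lowering peak activation memory. A rough sketch in the style of finetune.py's hyperparameters (the constant names mirror that script; the values are guesses):

# Hypothetical memory-saving settings for a 13B run; not a tested recipe.
MICRO_BATCH_SIZE = 1                       # smaller per-step batch -> lower peak VRAM
BATCH_SIZE = 128                           # effective batch size stays the same
GRADIENT_ACCUMULATION_STEPS = BATCH_SIZE // MICRO_BATCH_SIZE

# These feed into transformers.TrainingArguments:
#   per_device_train_batch_size=MICRO_BATCH_SIZE,
#   gradient_accumulation_steps=GRADIENT_ACCUMULATION_STEPS,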

libbitsandbytes_cpu.so: undefined symbol: cget_col_row_stats

Hello, I'm trying to run the generate.py example but I get the following error ("decapoda-research/llama-7b-hf" was previously downloaded):

===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please submit your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
================================================================================
/root/anaconda3/envs/alpaca-lora/lib/python3.9/site-packages/bitsandbytes/cuda_setup/main.py:136: UserWarning: /root/anaconda3/envs/alpaca-lora did not contain libcudart.so as expected! Searching further paths...
  warn(msg)
/root/anaconda3/envs/alpaca-lora/lib/python3.9/site-packages/bitsandbytes/cuda_setup/main.py:136: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('/usr/local/cuda/lib64')}
  warn(msg)
/root/anaconda3/envs/alpaca-lora/lib/python3.9/site-packages/bitsandbytes/cuda_setup/main.py:136: UserWarning: :/usr/local/cuda/lib64/ did not contain libcudart.so as expected! Searching further paths...
  warn(msg)
CUDA_SETUP: WARNING! libcudart.so not found in any environmental path. Searching /usr/local/cuda/lib64...
ERROR: python: undefined symbol: cudaRuntimeGetVersion
CUDA SETUP: libcudart.so path is None
CUDA SETUP: Is seems that your cuda installation is not in your path. See https://github.com/TimDettmers/bitsandbytes/issues/85 for more information.
CUDA SETUP: CUDA version lower than 11 are currently not supported for LLM.int8(). You will be only to use 8-bit optimizers and quantization routines!!
/root/anaconda3/envs/alpaca-lora/lib/python3.9/site-packages/bitsandbytes/cuda_setup/main.py:136: UserWarning: WARNING: No libcudart.so found! Install CUDA or the cudatoolkit package (anaconda)!
  warn(msg)
CUDA SETUP: Highest compute capability among GPUs detected: 7.5
CUDA SETUP: Detected CUDA version 00
CUDA SETUP: Loading binary /root/anaconda3/envs/alpaca-lora/lib/python3.9/site-packages/bitsandbytes/libbitsandbytes_cpu.so...
/root/anaconda3/envs/alpaca-lora/lib/python3.9/site-packages/bitsandbytes/cextension.py:31: UserWarning: The installed version of bitsandbytes was compiled without GPU support. 8-bit optimizers and GPU quantization are unavailable.
  warn("The installed version of bitsandbytes was compiled without GPU support. "
The tokenizer class you load from this checkpoint is not the same type as the class this function is called from. It may result in unexpected tokenization. 
The tokenizer class you load from this checkpoint is 'LLaMATokenizer'. 
The class this function is called from is 'LlamaTokenizer'.
Loading checkpoint shards:   0%|                                                                                                                   | 0/33 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/root/projects/alpaca-lora/generate.py", line 13, in <module>
    model = LlamaForCausalLM.from_pretrained(
  File "/root/anaconda3/envs/alpaca-lora/lib/python3.9/site-packages/transformers/modeling_utils.py", line 2646, in from_pretrained
    ) = cls._load_pretrained_model(
  File "/root/anaconda3/envs/alpaca-lora/lib/python3.9/site-packages/transformers/modeling_utils.py", line 2969, in _load_pretrained_model
    new_error_msgs, offload_index, state_dict_index = _load_state_dict_into_meta_model(
  File "/root/anaconda3/envs/alpaca-lora/lib/python3.9/site-packages/transformers/modeling_utils.py", line 676, in _load_state_dict_into_meta_model
    set_module_8bit_tensor_to_device(model, param_name, param_device, value=param)
  File "/root/anaconda3/envs/alpaca-lora/lib/python3.9/site-packages/transformers/utils/bitsandbytes.py", line 70, in set_module_8bit_tensor_to_device
    new_value = bnb.nn.Int8Params(new_value, requires_grad=False, has_fp16_weights=has_fp16_weights).to(device)
  File "/root/anaconda3/envs/alpaca-lora/lib/python3.9/site-packages/bitsandbytes/nn/modules.py", line 196, in to
    return self.cuda(device)
  File "/root/anaconda3/envs/alpaca-lora/lib/python3.9/site-packages/bitsandbytes/nn/modules.py", line 160, in cuda
    CB, CBt, SCB, SCBt, coo_tensorB = bnb.functional.double_quant(B)
  File "/root/anaconda3/envs/alpaca-lora/lib/python3.9/site-packages/bitsandbytes/functional.py", line 1616, in double_quant
    row_stats, col_stats, nnz_row_ptr = get_colrow_absmax(
  File "/root/anaconda3/envs/alpaca-lora/lib/python3.9/site-packages/bitsandbytes/functional.py", line 1505, in get_colrow_absmax
    lib.cget_col_row_stats(ptrA, ptrRowStats, ptrColStats, ptrNnzrows, ct.c_float(threshold), rows, cols)
  File "/root/anaconda3/envs/alpaca-lora/lib/python3.9/ctypes/__init__.py", line 395, in __getattr__
    func = self.__getitem__(name)
  File "/root/anaconda3/envs/alpaca-lora/lib/python3.9/ctypes/__init__.py", line 400, in __getitem__
    func = self._FuncPtr((name_or_ordinal, self))
AttributeError: /root/anaconda3/envs/alpaca-lora/lib/python3.9/site-packages/bitsandbytes/libbitsandbytes_cpu.so: undefined symbol: cget_col_row_stats

I'm running in a conda environment based on Python 3.9 and I installed the requirements with pip. I also compiled the bitsandbytes package from source by running:

git clone https://github.com/TimDettmers/bitsandbytes.git
cd bitsandbytes/
make cuda11x
CUDA_VERSION=112
python setup.py install

Here are the nvidia-smi and nvcc --version prints:

Fri Mar 17 13:53:10 2023       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.161.03   Driver Version: 470.161.03   CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Quadro RTX 4000     On   | 00000000:00:10.0 Off |                  N/A |
| 30%   34C    P8     6W / 125W |      1MiB /  7982MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Sun_Feb_14_21:12:58_PST_2021
Cuda compilation tools, release 11.2, V11.2.152
Build cuda_11.2.r11.2/compiler.29618528_0
which nvcc:
/usr/bin/nvcc
which nvidia-smi:
/usr/bin/nvidia-smi

Any idea what the problem might be? Thank you in advance.

How to combine: base model + trained lora back to HF model?

I've just played with 4-bit quantization and it works really well: much faster loading and inference, and the ability to load a much bigger model on the GPU without quality degradation. It's just like magic.

But in order to make a quantized LoRA-trained model, I need to somehow combine the base HF model with the trained LoRA weights and get a new model in HF format. That merged model can then be quantized with the GPTQ-for-LLaMa script.

Here is someone's guide for 4-bit LLaMA if you want to try it.

Can anybody help?
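
One approach that should work (a sketch only, assuming a recent peft/transformers and that the adapter was trained on decapoda-research/llama-7b-hf — adjust the names and paths to your setup): load the base model in fp16 rather than 8-bit, apply the adapter, merge, and save in HF format. The merged folder can then be passed to the GPTQ-for-LLaMa quantizer.

import torch
from peft import PeftModel
from transformers import LlamaForCausalLM, LlamaTokenizer

BASE = "decapoda-research/llama-7b-hf"   # base weights the adapter was trained on
LORA = "tloen/alpaca-lora-7b"            # or your local adapter directory
OUT = "./alpaca-merged-hf"               # hypothetical output path

# Load the base model in fp16; merging into 8-bit weights is not supported.
model = LlamaForCausalLM.from_pretrained(BASE, torch_dtype=torch.float16)
model = PeftModel.from_pretrained(model, LORA, torch_dtype=torch.float16)

# merge_and_unload folds the LoRA deltas into the base weights and drops the
# adapter wrapper; if your peft version lacks it, the merge has to be done by
# hand (W += B @ A * scaling for each LoRA layer).
model = model.merge_and_unload()

model.save_pretrained(OUT)
LlamaTokenizer.from_pretrained(BASE).save_pretrained(OUT)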

MAX_LENGTH differs from Stanford Alpaca training

I noticed you have MAX_LENGTH set to 256, while Stanford used 512.
Is there a reason you chose the smaller value? I'm curious whether you are getting better results and what your reasoning was for using 256.
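
For context, the constant just caps the tokenized prompt+response length; anything longer gets truncated. A small sketch of the knob in question (it assumes `tokenizer` is the loaded LLaMA tokenizer; the padding strategy is illustrative, not necessarily what the repo does):

CUTOFF_LEN = 512   # hypothetical: Stanford-style value instead of 256

def tokenize(prompt):
    # Examples longer than CUTOFF_LEN tokens are truncated, so a smaller value
    # saves memory and time at the cost of losing the tail of long examples.
    result = tokenizer(
        prompt,
        truncation=True,
        max_length=CUTOFF_LEN,
        padding="max_length",
    )
    return {
        "input_ids": result["input_ids"],
        "attention_mask": result["attention_mask"],
    }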

Example for how I may continue fine tuning the peft model

Hey! I apologize if this is a rather generic question.

I'm not able to find good examples over on the peft repository of how to continue training from a stored peft checkpoint,
and since the fine-tuning code here only shows how to fine-tune from scratch, I'd be grateful for an example of how to fine-tune alpaca from a stored peft checkpoint instead of from scratch.

thanks, and I really appreciate all the work put into this project!
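
A minimal sketch of one way to resume (the model name, LoRA hyperparameters, and the adapter path are assumptions — they must match whatever your original run used): load the base model, recreate the LoRA wrapper, then load the saved adapter weights before handing the model to the Trainer.

import torch
from peft import (
    LoraConfig,
    get_peft_model,
    prepare_model_for_int8_training,
    set_peft_model_state_dict,
)
from transformers import LlamaForCausalLM

base = LlamaForCausalLM.from_pretrained(
    "decapoda-research/llama-7b-hf", load_in_8bit=True, device_map="auto"
)
base = prepare_model_for_int8_training(base)

# The LoRA shape must match the original run or the checkpoint will not load.
config = LoraConfig(
    r=8, lora_alpha=16, target_modules=["q_proj", "v_proj"],
    lora_dropout=0.05, bias="none", task_type="CAUSAL_LM",
)
model = get_peft_model(base, config)

# "lora-alpaca/adapter_model.bin" is a placeholder for your saved adapter file.
adapter_weights = torch.load("lora-alpaca/adapter_model.bin", map_location="cpu")
set_peft_model_state_dict(model, adapter_weights)

# ...then build the Trainer exactly as in finetune.py and call trainer.train().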

Doesn't use GPU during inference or training

It seems like it loads the model into VRAM, but it doesn't utilize the GPU during training or generation; it uses the CPU instead. I tried adding device_map={'': 0} as some others suggested, but it didn't fix the problem. I'm currently on bitsandbytes 0.37.0.

image
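
Before changing anything else, it may help to confirm where the weights actually ended up and whether CUDA is visible to PyTorch at all. A quick check along these lines (assuming `model` is the loaded LlamaForCausalLM):

import torch

print(torch.cuda.is_available())               # False would explain CPU-only behaviour
print(torch.cuda.get_device_name(0) if torch.cuda.is_available() else "no CUDA device")
print(next(model.parameters()).device)          # where the first weight tensor lives
print(getattr(model, "hf_device_map", None))    # per-module placement from accelerate, if any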

How to finetune the model with new knowledge?

Sorry for my probably newbie questions.

As I understand it, with the current fine-tuning we teach the model how to answer our questions using knowledge that is already present from its base training.

But what if I want to add new knowledge to the model and be able to ask different questions about it?
For example: make the model learn a whole novel, a specific town's latest news, a specific scientific paper, etc., and then ask it to summarize or analyze something within that new knowledge. In other words, I want to ask the model questions about a huge input text (100-1000 times bigger than the maximum input tokens).

How can I achieve this? Or where can I learn how to do it?
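
One common route is to turn your own material into records shaped like alpaca_data.json and fine-tune on those; whether that makes the model recall long documents reliably is a separate question — for very long source texts, retrieval (embed the text, search it, and put the relevant chunk into the prompt) is usually the more dependable option. A sketch of the data-format side only (the file name and example content are placeholders):

import json

# Hypothetical example: one record per question/answer pair you want the model
# to learn, in the same shape as alpaca_data.json, then point finetune.py's
# load_dataset("json", data_files=...) at the resulting file.
records = [
    {
        "instruction": "Summarize the latest town council meeting.",
        "input": "",
        "output": "The council approved the new bridge budget and ...",
    },
    # ... more records ...
]

with open("my_data.json", "w") as f:
    json.dump(records, f, ensure_ascii=False, indent=2)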

use a smaller LR?

The Karpathy constant currently used might be too high? The loss for this training run stops going down once the LR warms up beyond ~1e-4:

image
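
If you want to experiment, the learning rate is just another Trainer argument; a sketch of dropping it (the values below are guesses, not a tuned recommendation):

import transformers

# Hypothetical values for a lower-LR run.
training_args = transformers.TrainingArguments(
    per_device_train_batch_size=4,
    gradient_accumulation_steps=32,
    warmup_steps=100,
    num_train_epochs=3,
    learning_rate=5e-5,     # well below the ~3e-4 "Karpathy constant"
    fp16=True,
    logging_steps=20,
    output_dir="lora-alpaca",
)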

AttributeError: 'NoneType' object has no attribute 'device'

I got the error below and hope someone can solve it.
I have changed the device_map (tried "balanced", "balanced_low_0", and "sequential") in

model = LLaMAForCausalLM.from_pretrained(
    "decapoda-research/llama-7b-hf",
    load_in_8bit=True,
    device_map="auto",
)

but it is not working. The full traceback and environment details follow below.
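
One variant the snippet above does not try is pinning the whole model to a single GPU; with two cards and device_map="auto" the layers get split across devices, and several people have reported this exact error in that setup. A sketch of the single-GPU variant (assumption: GPU 0 has enough free memory for the 8-bit 7B model; the class name matches the dev transformers version in the traceback, newer releases call it LlamaForCausalLM):

import os
os.environ["CUDA_VISIBLE_DEVICES"] = "0"   # hide the second GPU before anything loads

from transformers import LLaMAForCausalLM  # LlamaForCausalLM on newer transformers

model = LLaMAForCausalLM.from_pretrained(
    "decapoda-research/llama-7b-hf",
    load_in_8bit=True,
    device_map={"": 0},   # place every module on GPU 0 instead of splitting across cards
)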

Error

evaluate(input("Instruction: ")) # how to learn english
    Instruction: how to learn english



    ---------------------------------------------------------------------------

    AttributeError                            Traceback (most recent call last)

    Cell In[16], line 1
    ----> 1 evaluate(input("Instruction: "))


    Cell In[15], line 11, in evaluate(instruction, input)
          9 inputs = tokenizer(prompt, return_tensors="pt")
         10 input_ids = inputs["input_ids"].cuda()
    ---> 11 generation_output = model.generate(
         12     input_ids=input_ids,
         13     generation_config=generation_config,
         14     return_dict_in_generate=True,
         15     output_scores=True,
         16     max_new_tokens=256
         17 )
         18 for s in generation_output.sequences:
         19     output = tokenizer.decode(s)


    File ~/miniconda3/envs/LiuJieTest/lib/python3.9/site-packages/peft/peft_model.py:581, in PeftModelForCausalLM.generate(self, **kwargs)
        579 try:
        580     if not isinstance(self.peft_config, PromptLearningConfig):
    --> 581         outputs = self.base_model.generate(**kwargs)
        582     else:
        583         if "input_ids" not in kwargs:


    File ~/miniconda3/envs/LiuJieTest/lib/python3.9/site-packages/torch/utils/_contextlib.py:115, in context_decorator.<locals>.decorate_context(*args, **kwargs)
        112 @functools.wraps(func)
        113 def decorate_context(*args, **kwargs):
        114     with ctx_factory():
    --> 115         return func(*args, **kwargs)


    File ~/miniconda3/envs/LiuJieTest/lib/python3.9/site-packages/transformers/generation/utils.py:1490, in GenerationMixin.generate(self, inputs, generation_config, logits_processor, stopping_criteria, prefix_allowed_tokens_fn, synced_gpus, **kwargs)
       1483     input_ids, model_kwargs = self._expand_inputs_for_generation(
       1484         input_ids=input_ids,
       1485         expand_size=generation_config.num_beams,
       1486         is_encoder_decoder=self.config.is_encoder_decoder,
       1487         **model_kwargs,
       1488     )
       1489     # 13. run beam search
    -> 1490     return self.beam_search(
       1491         input_ids,
       1492         beam_scorer,
       1493         logits_processor=logits_processor,
       1494         stopping_criteria=stopping_criteria,
       1495         pad_token_id=generation_config.pad_token_id,
       1496         eos_token_id=generation_config.eos_token_id,
       1497         output_scores=generation_config.output_scores,
       1498         return_dict_in_generate=generation_config.return_dict_in_generate,
       1499         synced_gpus=synced_gpus,
       1500         **model_kwargs,
       1501     )
       1503 elif is_beam_sample_gen_mode:
       1504     # 11. prepare logits warper
       1505     logits_warper = self._get_logits_warper(generation_config)


    File ~/miniconda3/envs/LiuJieTest/lib/python3.9/site-packages/transformers/generation/utils.py:2749, in GenerationMixin.beam_search(self, input_ids, beam_scorer, logits_processor, stopping_criteria, max_length, pad_token_id, eos_token_id, output_attentions, output_hidden_states, output_scores, return_dict_in_generate, synced_gpus, **model_kwargs)
       2745         break
       2747 model_inputs = self.prepare_inputs_for_generation(input_ids, **model_kwargs)
    -> 2749 outputs = self(
       2750     **model_inputs,
       2751     return_dict=True,
       2752     output_attentions=output_attentions,
       2753     output_hidden_states=output_hidden_states,
       2754 )
       2756 if synced_gpus and this_peer_finished:
       2757     cur_len = cur_len + 1


    File ~/miniconda3/envs/LiuJieTest/lib/python3.9/site-packages/torch/nn/modules/module.py:1501, in Module._call_impl(self, *args, **kwargs)
       1496 # If we don't have any hooks, we want to skip the rest of the logic in
       1497 # this function, and just call forward.
       1498 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
       1499         or _global_backward_pre_hooks or _global_backward_hooks
       1500         or _global_forward_hooks or _global_forward_pre_hooks):
    -> 1501     return forward_call(*args, **kwargs)
       1502 # Do not call functions when jit is used
       1503 full_backward_hooks, non_full_backward_hooks = [], []


    File ~/miniconda3/envs/LiuJieTest/lib/python3.9/site-packages/accelerate/hooks.py:165, in add_hook_to_module.<locals>.new_forward(*args, **kwargs)
        163         output = old_forward(*args, **kwargs)
        164 else:
    --> 165     output = old_forward(*args, **kwargs)
        166 return module._hf_hook.post_forward(module, output)


    File ~/miniconda3/envs/LiuJieTest/lib/python3.9/site-packages/transformers/models/llama/modeling_llama.py:770, in LLaMAForCausalLM.forward(self, input_ids, attention_mask, past_key_values, inputs_embeds, labels, use_cache, output_attentions, output_hidden_states, return_dict)
        767 return_dict = return_dict if return_dict is not None else self.config.use_return_dict
        769 # decoder outputs consists of (dec_features, layer_state, dec_hidden, dec_attn)
    --> 770 outputs = self.model(
        771     input_ids=input_ids,
        772     attention_mask=attention_mask,
        773     past_key_values=past_key_values,
        774     inputs_embeds=inputs_embeds,
        775     use_cache=use_cache,
        776     output_attentions=output_attentions,
        777     output_hidden_states=output_hidden_states,
        778     return_dict=return_dict,
        779 )
        781 hidden_states = outputs[0]
        782 logits = self.lm_head(hidden_states)


    File ~/miniconda3/envs/LiuJieTest/lib/python3.9/site-packages/torch/nn/modules/module.py:1501, in Module._call_impl(self, *args, **kwargs)
       1496 # If we don't have any hooks, we want to skip the rest of the logic in
       1497 # this function, and just call forward.
       1498 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
       1499         or _global_backward_pre_hooks or _global_backward_hooks
       1500         or _global_forward_hooks or _global_forward_pre_hooks):
    -> 1501     return forward_call(*args, **kwargs)
       1502 # Do not call functions when jit is used
       1503 full_backward_hooks, non_full_backward_hooks = [], []


    File ~/miniconda3/envs/LiuJieTest/lib/python3.9/site-packages/transformers/models/llama/modeling_llama.py:619, in LLaMAModel.forward(self, input_ids, attention_mask, past_key_values, inputs_embeds, use_cache, output_attentions, output_hidden_states, return_dict)
        612     layer_outputs = torch.utils.checkpoint.checkpoint(
        613         create_custom_forward(decoder_layer),
        614         hidden_states,
        615         attention_mask,
        616         None,
        617     )
        618 else:
    --> 619     layer_outputs = decoder_layer(
        620         hidden_states,
        621         attention_mask=attention_mask,
        622         past_key_value=past_key_value,
        623         output_attentions=output_attentions,
        624         use_cache=use_cache,
        625     )
        627 hidden_states = layer_outputs[0]
        629 if use_cache:


    File ~/miniconda3/envs/LiuJieTest/lib/python3.9/site-packages/torch/nn/modules/module.py:1501, in Module._call_impl(self, *args, **kwargs)
       1496 # If we don't have any hooks, we want to skip the rest of the logic in
       1497 # this function, and just call forward.
       1498 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
       1499         or _global_backward_pre_hooks or _global_backward_hooks
       1500         or _global_forward_hooks or _global_forward_pre_hooks):
    -> 1501     return forward_call(*args, **kwargs)
       1502 # Do not call functions when jit is used
       1503 full_backward_hooks, non_full_backward_hooks = [], []


    File ~/miniconda3/envs/LiuJieTest/lib/python3.9/site-packages/accelerate/hooks.py:165, in add_hook_to_module.<locals>.new_forward(*args, **kwargs)
        163         output = old_forward(*args, **kwargs)
        164 else:
    --> 165     output = old_forward(*args, **kwargs)
        166 return module._hf_hook.post_forward(module, output)


    File ~/miniconda3/envs/LiuJieTest/lib/python3.9/site-packages/transformers/models/llama/modeling_llama.py:316, in LLaMADecoderLayer.forward(self, hidden_states, attention_mask, output_attentions, use_cache, past_key_value)
        313 hidden_states = self.input_layernorm(hidden_states)
        315 # Self Attention
    --> 316 hidden_states, self_attn_weights, present_key_value = self.self_attn(
        317     hidden_states=hidden_states,
        318     past_key_value=past_key_value,
        319     attention_mask=attention_mask,
        320     output_attentions=output_attentions,
        321 )
        322 hidden_states = residual + hidden_states
        324 # Fully Connected


    File ~/miniconda3/envs/LiuJieTest/lib/python3.9/site-packages/torch/nn/modules/module.py:1501, in Module._call_impl(self, *args, **kwargs)
       1496 # If we don't have any hooks, we want to skip the rest of the logic in
       1497 # this function, and just call forward.
       1498 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
       1499         or _global_backward_pre_hooks or _global_backward_hooks
       1500         or _global_forward_hooks or _global_forward_pre_hooks):
    -> 1501     return forward_call(*args, **kwargs)
       1502 # Do not call functions when jit is used
       1503 full_backward_hooks, non_full_backward_hooks = [], []


    File ~/miniconda3/envs/LiuJieTest/lib/python3.9/site-packages/accelerate/hooks.py:165, in add_hook_to_module.<locals>.new_forward(*args, **kwargs)
        163         output = old_forward(*args, **kwargs)
        164 else:
    --> 165     output = old_forward(*args, **kwargs)
        166 return module._hf_hook.post_forward(module, output)


    File ~/miniconda3/envs/LiuJieTest/lib/python3.9/site-packages/transformers/models/llama/modeling_llama.py:216, in LLaMAAttention.forward(self, hidden_states, past_key_value, attention_mask, output_attentions)
        212 """Input shape: Batch x Time x Channel"""
        214 bsz, q_len, _ = hidden_states.size()
    --> 216 query_states = self.q_proj(hidden_states).view(bsz, q_len, self.num_heads, self.head_dim).transpose(1, 2)
        217 key_states = self.k_proj(hidden_states).view(bsz, q_len, self.num_heads, self.head_dim).transpose(1, 2)
        218 value_states = self.v_proj(hidden_states).view(bsz, q_len, self.num_heads, self.head_dim).transpose(1, 2)


    File ~/miniconda3/envs/LiuJieTest/lib/python3.9/site-packages/torch/nn/modules/module.py:1501, in Module._call_impl(self, *args, **kwargs)
       1496 # If we don't have any hooks, we want to skip the rest of the logic in
       1497 # this function, and just call forward.
       1498 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
       1499         or _global_backward_pre_hooks or _global_backward_hooks
       1500         or _global_forward_hooks or _global_forward_pre_hooks):
    -> 1501     return forward_call(*args, **kwargs)
       1502 # Do not call functions when jit is used
       1503 full_backward_hooks, non_full_backward_hooks = [], []


    File ~/miniconda3/envs/LiuJieTest/lib/python3.9/site-packages/accelerate/hooks.py:165, in add_hook_to_module.<locals>.new_forward(*args, **kwargs)
        163         output = old_forward(*args, **kwargs)
        164 else:
    --> 165     output = old_forward(*args, **kwargs)
        166 return module._hf_hook.post_forward(module, output)


    File ~/miniconda3/envs/LiuJieTest/lib/python3.9/site-packages/peft/tuners/lora.py:522, in Linear8bitLt.forward(self, x)
        521 def forward(self, x: torch.Tensor):
    --> 522     result = super().forward(x)
        524     if self.disable_adapters:
        525         return result


    File ~/miniconda3/envs/LiuJieTest/lib/python3.9/site-packages/bitsandbytes/nn/modules.py:242, in Linear8bitLt.forward(self, x)
        239 if self.bias is not None and self.bias.dtype != x.dtype:
        240     self.bias.data = self.bias.data.to(x.dtype)
    --> 242 out = bnb.matmul(x, self.weight, bias=self.bias, state=self.state)
        243 if not self.state.has_fp16_weights:
        244     if self.state.CB is not None and self.state.CxB is not None:
        245         # we converted 8-bit row major to turing/ampere format in the first inference pass
        246         # we no longer need the row-major weight


    File ~/miniconda3/envs/LiuJieTest/lib/python3.9/site-packages/bitsandbytes/autograd/_functions.py:488, in matmul(A, B, out, state, threshold, bias)
        486 if threshold > 0.0:
        487     state.threshold = threshold
    --> 488 return MatMul8bitLt.apply(A, B, out, bias, state)


    File ~/miniconda3/envs/LiuJieTest/lib/python3.9/site-packages/torch/autograd/function.py:506, in Function.apply(cls, *args, **kwargs)
        503 if not torch._C._are_functorch_transforms_active():
        504     # See NOTE: [functorch vjp and autograd interaction]
        505     args = _functorch.utils.unwrap_dead_wrappers(args)
    --> 506     return super().apply(*args, **kwargs)  # type: ignore[misc]
        508 if cls.setup_context == _SingleLevelFunction.setup_context:
        509     raise RuntimeError(
        510         'In order to use an autograd.Function with functorch transforms '
        511         '(vmap, grad, jvp, jacrev, ...), it must override the setup_context '
        512         'staticmethod. For more details, please see '
        513         'https://pytorch.org/docs/master/notes/extending.func.html')


    File ~/miniconda3/envs/LiuJieTest/lib/python3.9/site-packages/bitsandbytes/autograd/_functions.py:317, in MatMul8bitLt.forward(ctx, A, B, out, bias, state)
        313     else:
        314         if state.CxB is None and using_igemmlt:
        315             # B in in 8-bit row-major, we can transform it back to 16-bit to extract outlier dimensions
        316             # we also need to convert it to the turing/ampere format
    --> 317             state.CxB, state.SB = F.transform(state.CB, to_order=formatB)
        318 else:
        319     if not state.has_fp16_weights and state.CxB is None and using_igemmlt:


    File ~/miniconda3/envs/LiuJieTest/lib/python3.9/site-packages/bitsandbytes/functional.py:1698, in transform(A, to_order, from_order, out, transpose, state, ld)
       1697 def transform(A, to_order, from_order='row', out=None, transpose=False, state=None, ld=None):
    -> 1698     prev_device = pre_call(A.device)
       1699     if state is None: state = (A.shape, from_order)
       1700     else: from_order = state[1]


    AttributeError: 'NoneType' object has no attribute 'device'

Environment:

  1. !nvidia-smi
Thu Mar 16 16:37:03 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 510.39.01    Driver Version: 510.39.01    CUDA Version: 11.6     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:1A:00.0 Off |                  N/A |
| 31%   32C    P2    50W / 250W |   8012MiB / 11264MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  NVIDIA GeForce ...  Off  | 00000000:3D:00.0 Off |                  N/A |
| 29%   31C    P2    50W / 250W |   4350MiB / 11264MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      3903      C   ...nvs/LiuJieTest/bin/python     8009MiB |
|    1   N/A  N/A      3903      C   ...nvs/LiuJieTest/bin/python     4347MiB |
+-----------------------------------------------------------------------------+
  2. conda list
    # packages in environment at /home/zhuji/miniconda3/envs/LiuJieTest:
    #
    # Name                    Version                   Build  Channel
    _libgcc_mutex             0.1                        main  
    _openmp_mutex             5.1                       1_gnu  
    accelerate                0.17.1                   pypi_0    pypi
    aiohttp                   3.8.4                    pypi_0    pypi
    aiosignal                 1.3.1                    pypi_0    pypi
    anyio                     3.6.2                    pypi_0    pypi
    argon2-cffi               21.3.0                   pypi_0    pypi
    argon2-cffi-bindings      21.2.0                   pypi_0    pypi
    arrow                     1.2.3                    pypi_0    pypi
    asgiref                   3.6.0                    pypi_0    pypi
    asttokens                 2.2.1                    pypi_0    pypi
    async-timeout             4.0.2                    pypi_0    pypi
    attrs                     22.2.0                   pypi_0    pypi
    backcall                  0.2.0                    pypi_0    pypi
    beautifulsoup4            4.11.2                   pypi_0    pypi
    bitsandbytes              0.37.1                   pypi_0    pypi
    bleach                    6.0.0                    pypi_0    pypi
    ca-certificates           2023.01.10           h06a4308_0  
    certifi                   2022.12.7        py39h06a4308_0  
    cffi                      1.15.1                   pypi_0    pypi
    charset-normalizer        3.1.0                    pypi_0    pypi
    cmake                     3.26.0                   pypi_0    pypi
    comm                      0.1.2                    pypi_0    pypi
    datasets                  2.10.1                   pypi_0    pypi
    debugpy                   1.6.6                    pypi_0    pypi
    decorator                 5.1.1                    pypi_0    pypi
    defusedxml                0.7.1                    pypi_0    pypi
    dill                      0.3.6                    pypi_0    pypi
    django                    4.1.7                    pypi_0    pypi
    executing                 1.2.0                    pypi_0    pypi
    fastjsonschema            2.16.3                   pypi_0    pypi
    filelock                  3.10.0                   pypi_0    pypi
    fqdn                      1.5.1                    pypi_0    pypi
    frozenlist                1.3.3                    pypi_0    pypi
    fsspec                    2023.3.0                 pypi_0    pypi
    huggingface-hub           0.13.2                   pypi_0    pypi
    idna                      3.4                      pypi_0    pypi
    importlib-metadata        6.0.0                    pypi_0    pypi
    ipykernel                 6.21.3                   pypi_0    pypi
    ipython                   8.11.0                   pypi_0    pypi
    ipython-genutils          0.2.0                    pypi_0    pypi
    ipywidgets                8.0.4                    pypi_0    pypi
    isoduration               20.11.0                  pypi_0    pypi
    jedi                      0.18.2                   pypi_0    pypi
    jinja2                    3.1.2                    pypi_0    pypi
    jsonpointer               2.3                      pypi_0    pypi
    jsonschema                4.17.3                   pypi_0    pypi
    jupyter                   1.0.0                    pypi_0    pypi
    jupyter-client            8.0.3                    pypi_0    pypi
    jupyter-console           6.6.3                    pypi_0    pypi
    jupyter-core              5.2.0                    pypi_0    pypi
    jupyter-events            0.6.3                    pypi_0    pypi
    jupyter-server            2.4.0                    pypi_0    pypi
    jupyter-server-terminals  0.4.4                    pypi_0    pypi
    jupyterlab-pygments       0.2.2                    pypi_0    pypi
    jupyterlab-widgets        3.0.5                    pypi_0    pypi
    ld_impl_linux-64          2.38                 h1181459_1  
    libffi                    3.4.2                h6a678d5_6  
    libgcc-ng                 11.2.0               h1234567_1  
    libgomp                   11.2.0               h1234567_1  
    libstdcxx-ng              11.2.0               h1234567_1  
    lit                       15.0.7                   pypi_0    pypi
    loralib                   0.1.1                    pypi_0    pypi
    markupsafe                2.1.2                    pypi_0    pypi
    matplotlib-inline         0.1.6                    pypi_0    pypi
    mistune                   2.0.5                    pypi_0    pypi
    mpmath                    1.3.0                    pypi_0    pypi
    multidict                 6.0.4                    pypi_0    pypi
    multiprocess              0.70.14                  pypi_0    pypi
    nbclassic                 0.5.3                    pypi_0    pypi
    nbclient                  0.7.2                    pypi_0    pypi
    nbconvert                 7.2.10                   pypi_0    pypi
    nbformat                  5.7.3                    pypi_0    pypi
    ncurses                   6.4                  h6a678d5_0  
    nest-asyncio              1.5.6                    pypi_0    pypi
    networkx                  3.0                      pypi_0    pypi
    notebook                  6.5.3                    pypi_0    pypi
    notebook-shim             0.2.2                    pypi_0    pypi
    numpy                     1.24.2                   pypi_0    pypi
    nvidia-cublas-cu11        11.10.3.66               pypi_0    pypi
    nvidia-cuda-cupti-cu11    11.7.101                 pypi_0    pypi
    nvidia-cuda-nvrtc-cu11    11.7.99                  pypi_0    pypi
    nvidia-cuda-runtime-cu11  11.7.99                  pypi_0    pypi
    nvidia-cudnn-cu11         8.5.0.96                 pypi_0    pypi
    nvidia-cufft-cu11         10.9.0.58                pypi_0    pypi
    nvidia-curand-cu11        10.2.10.91               pypi_0    pypi
    nvidia-cusolver-cu11      11.4.0.1                 pypi_0    pypi
    nvidia-cusparse-cu11      11.7.4.91                pypi_0    pypi
    nvidia-nccl-cu11          2.14.3                   pypi_0    pypi
    nvidia-nvtx-cu11          11.7.91                  pypi_0    pypi
    openssl                   1.1.1t               h7f8727e_0  
    packaging                 23.0                     pypi_0    pypi
    pandas                    1.5.3                    pypi_0    pypi
    pandocfilters             1.5.0                    pypi_0    pypi
    parso                     0.8.3                    pypi_0    pypi
    peft                      0.3.0.dev0               pypi_0    pypi
    pexpect                   4.8.0                    pypi_0    pypi
    pickleshare               0.7.5                    pypi_0    pypi
    pip                       23.0.1           py39h06a4308_0  
    platformdirs              3.1.1                    pypi_0    pypi
    prometheus-client         0.16.0                   pypi_0    pypi
    prompt-toolkit            3.0.38                   pypi_0    pypi
    psutil                    5.9.4                    pypi_0    pypi
    ptyprocess                0.7.0                    pypi_0    pypi
    pure-eval                 0.2.2                    pypi_0    pypi
    pyarrow                   11.0.0                   pypi_0    pypi
    pycparser                 2.21                     pypi_0    pypi
    pygments                  2.14.0                   pypi_0    pypi
    pyrsistent                0.19.3                   pypi_0    pypi
    python                    3.9.16               h7a1cb2a_2  
    python-dateutil           2.8.2                    pypi_0    pypi
    python-json-logger        2.0.7                    pypi_0    pypi
    pytz                      2022.7.1                 pypi_0    pypi
    pyyaml                    6.0                      pypi_0    pypi
    pyzmq                     25.0.1                   pypi_0    pypi
    qtconsole                 5.4.1                    pypi_0    pypi
    qtpy                      2.3.0                    pypi_0    pypi
    readline                  8.2                  h5eee18b_0  
    regex                     2022.10.31               pypi_0    pypi
    requests                  2.28.2                   pypi_0    pypi
    responses                 0.18.0                   pypi_0    pypi
    rfc3339-validator         0.1.4                    pypi_0    pypi
    rfc3986-validator         0.1.1                    pypi_0    pypi
    send2trash                1.8.0                    pypi_0    pypi
    sentencepiece             0.1.97                   pypi_0    pypi
    setuptools                65.6.3           py39h06a4308_0  
    six                       1.16.0                   pypi_0    pypi
    sniffio                   1.3.0                    pypi_0    pypi
    soupsieve                 2.4                      pypi_0    pypi
    sqlite                    3.41.1               h5eee18b_0  
    sqlparse                  0.4.3                    pypi_0    pypi
    stack-data                0.6.2                    pypi_0    pypi
    sympy                     1.11.1                   pypi_0    pypi
    terminado                 0.17.1                   pypi_0    pypi
    tinycss2                  1.2.1                    pypi_0    pypi
    tk                        8.6.12               h1ccaba5_0  
    tokenizers                0.13.2                   pypi_0    pypi
    torch                     2.0.0                    pypi_0    pypi
    tornado                   6.2                      pypi_0    pypi
    tqdm                      4.65.0                   pypi_0    pypi
    traitlets                 5.9.0                    pypi_0    pypi
    transformers              4.27.0.dev0              pypi_0    pypi
    triton                    2.0.0                    pypi_0    pypi
    typing-extensions         4.5.0                    pypi_0    pypi
    tzdata                    2022g                h04d1e81_0  
    uri-template              1.2.0                    pypi_0    pypi
    urllib3                   1.26.15                  pypi_0    pypi
    wcwidth                   0.2.6                    pypi_0    pypi
    webcolors                 1.12                     pypi_0    pypi
    webencodings              0.5.1                    pypi_0    pypi
    websocket-client          1.5.1                    pypi_0    pypi
    wheel                     0.38.4           py39h06a4308_0  
    widgetsnbextension        4.0.5                    pypi_0    pypi
    xxhash                    3.2.0                    pypi_0    pypi
    xz                        5.2.10               h5eee18b_1  
    yarl                      1.8.2                    pypi_0    pypi
    zipp                      3.15.0                   pypi_0    pypi
    zlib                      1.2.13               h5eee18b_0  

3. pip list

    Package                  Version
    ------------------------ -----------
    accelerate               0.17.1
    aiohttp                  3.8.4
    aiosignal                1.3.1
    anyio                    3.6.2
    argon2-cffi              21.3.0
    argon2-cffi-bindings     21.2.0
    arrow                    1.2.3
    asgiref                  3.6.0
    asttokens                2.2.1
    async-timeout            4.0.2
    attrs                    22.2.0
    backcall                 0.2.0
    beautifulsoup4           4.11.2
    bitsandbytes             0.37.1
    bleach                   6.0.0
    certifi                  2022.12.7
    cffi                     1.15.1
    charset-normalizer       3.1.0
    cmake                    3.26.0
    comm                     0.1.2
    datasets                 2.10.1
    debugpy                  1.6.6
    decorator                5.1.1
    defusedxml               0.7.1
    dill                     0.3.6
    Django                   4.1.7
    executing                1.2.0
    fastjsonschema           2.16.3
    filelock                 3.10.0
    fqdn                     1.5.1
    frozenlist               1.3.3
    fsspec                   2023.3.0
    huggingface-hub          0.13.2
    idna                     3.4
    importlib-metadata       6.0.0
    ipykernel                6.21.3
    ipython                  8.11.0
    ipython-genutils         0.2.0
    ipywidgets               8.0.4
    isoduration              20.11.0
    jedi                     0.18.2
    Jinja2                   3.1.2
    jsonpointer              2.3
    jsonschema               4.17.3
    jupyter                  1.0.0
    jupyter_client           8.0.3
    jupyter-console          6.6.3
    jupyter_core             5.2.0
    jupyter-events           0.6.3
    jupyter_server           2.4.0
    jupyter_server_terminals 0.4.4
    jupyterlab-pygments      0.2.2
    jupyterlab-widgets       3.0.5
    lit                      15.0.7
    loralib                  0.1.1
    MarkupSafe               2.1.2
    matplotlib-inline        0.1.6
    mistune                  2.0.5
    mpmath                   1.3.0
    multidict                6.0.4
    multiprocess             0.70.14
    nbclassic                0.5.3
    nbclient                 0.7.2
    nbconvert                7.2.10
    nbformat                 5.7.3
    nest-asyncio             1.5.6
    networkx                 3.0
    notebook                 6.5.3
    notebook_shim            0.2.2
    numpy                    1.24.2
    nvidia-cublas-cu11       11.10.3.66
    nvidia-cuda-cupti-cu11   11.7.101
    nvidia-cuda-nvrtc-cu11   11.7.99
    nvidia-cuda-runtime-cu11 11.7.99
    nvidia-cudnn-cu11        8.5.0.96
    nvidia-cufft-cu11        10.9.0.58
    nvidia-curand-cu11       10.2.10.91
    nvidia-cusolver-cu11     11.4.0.1
    nvidia-cusparse-cu11     11.7.4.91
    nvidia-nccl-cu11         2.14.3
    nvidia-nvtx-cu11         11.7.91
    packaging                23.0
    pandas                   1.5.3
    pandocfilters            1.5.0
    parso                    0.8.3
    peft                     0.3.0.dev0
    pexpect                  4.8.0
    pickleshare              0.7.5
    pip                      23.0.1
    platformdirs             3.1.1
    prometheus-client        0.16.0
    prompt-toolkit           3.0.38
    psutil                   5.9.4
    ptyprocess               0.7.0
    pure-eval                0.2.2
    pyarrow                  11.0.0
    pycparser                2.21
    Pygments                 2.14.0
    pyrsistent               0.19.3
    python-dateutil          2.8.2
    python-json-logger       2.0.7
    pytz                     2022.7.1
    PyYAML                   6.0
    pyzmq                    25.0.1
    qtconsole                5.4.1
    QtPy                     2.3.0
    regex                    2022.10.31
    requests                 2.28.2
    responses                0.18.0
    rfc3339-validator        0.1.4
    rfc3986-validator        0.1.1
    Send2Trash               1.8.0
    sentencepiece            0.1.97
    setuptools               65.6.3
    six                      1.16.0
    sniffio                  1.3.0
    soupsieve                2.4
    sqlparse                 0.4.3
    stack-data               0.6.2
    sympy                    1.11.1
    terminado                0.17.1
    tinycss2                 1.2.1
    tokenizers               0.13.2
    torch                    2.0.0
    tornado                  6.2
    tqdm                     4.65.0
    traitlets                5.9.0
    transformers             4.27.0.dev0
    triton                   2.0.0
    typing_extensions        4.5.0
    uri-template             1.2.0
    urllib3                  1.26.15
    wcwidth                  0.2.6
    webcolors                1.12
    webencodings             0.5.1
    websocket-client         1.5.1
    wheel                    0.38.4
    widgetsnbextension       4.0.5
    xxhash                   3.2.0
    yarl                     1.8.2
    zipp                     3.15.0

Some problems in the demonstration example

When I try:

Instruction: Tell me about the president of Mexico in 2019.

I get

Response: The president of Mexico in 2019 was Andrés Manuel López Obrador, who took office on December 1st, 2018. He is a member of the National Regeneration Movement (MORENA) political party and is the first left-wing president of Mexico since 1946. He is known for his anti-corruption and anti-neolioliolioliolioliolioliolioliolioliolioliolioliolioliolioliolioliolioliolioliolioliolioliolioliolioliolioliolioliolioliolioliolioliolioliolioliolioliolioliolioliolioliolioliolioliolioliolioliolioliolioliolioliolioliolioliolioliolioliolioliolioliolioliolioliolioliolioliolioliolioliolioliolioliolioliolioliolioliolioliolioliolioliolioliolioliolioliolioliolioli

Instead of

The president of Mexico in 2019 was Andrés Manuel López Obrador, who took office on December 1, 2018. He is a member of the National Regeneration Movement (MORENA) political party and is the first left-wing president of Mexico since 1946. He is known for his anti-corruption and anti-neoliberal policies, as well as his commitment to improving the living conditions of the Mexican people.


I just added device_map={'': 0}, following #14 (comment).
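
For reference, here is a minimal sketch of how such a setup might look: loading the base model and LoRA adapter on a single GPU with device_map={'': 0}, and passing repetition controls to generate() to discourage the degenerate "olioli..." loops. The model/adapter names, sampling values, and the bare prompt (the full Alpaca prompt template is omitted) are assumptions for illustration, not taken from the issue above.

    import torch
    from peft import PeftModel
    from transformers import LlamaForCausalLM, LlamaTokenizer, GenerationConfig

    # Assumed checkpoint names; substitute whichever base model and LoRA weights you use.
    tokenizer = LlamaTokenizer.from_pretrained("decapoda-research/llama-7b-hf")
    model = LlamaForCausalLM.from_pretrained(
        "decapoda-research/llama-7b-hf",
        load_in_8bit=True,
        torch_dtype=torch.float16,
        device_map={"": 0},  # pin all modules to GPU 0, as suggested in #14
    )
    model = PeftModel.from_pretrained(model, "tloen/alpaca-lora-7b", torch_dtype=torch.float16)

    prompt = "Tell me about the president of Mexico in 2019."
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

    generation_config = GenerationConfig(
        temperature=0.1,
        top_p=0.75,
        num_beams=4,
        repetition_penalty=1.2,   # penalizes tokens that were already generated
        no_repeat_ngram_size=3,   # blocks exact 3-gram repeats such as "olioli..."
    )
    with torch.no_grad():
        output = model.generate(
            **inputs,
            generation_config=generation_config,
            max_new_tokens=256,
        )
    print(tokenizer.decode(output[0], skip_special_tokens=True))

Whether repetition_penalty or no_repeat_ngram_size actually resolves this particular failure is untested here; they are standard transformers generation knobs worth trying alongside the device_map change.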
