agi-edgerunners / llm-adapters


Code for our EMNLP 2023 Paper: "LLM-Adapters: An Adapter Family for Parameter-Efficient Fine-Tuning of Large Language Models"

Home Page: https://arxiv.org/abs/2304.01933

License: Apache License 2.0

Languages: Python 70.95%, Jupyter Notebook 28.88%, Makefile 0.17%
Topics: adapters, fine-tuning, large-language-models, parameter-efficient


llm-adapters's Issues

Couldn't get the same accuracy as the table (7B model LoRA)

Hi, thanks for open-sourcing this amazing repo! I tried to replicate the table in the README by running the LoRA fine-tuning command from math_running_commands and evaluating the fine-tuned model with multi_dataset_eval.py, but I only achieved single-digit accuracy on most datasets (~3% on GSM8K). When I evaluated the provided checkpoint (13B LoRA) instead, it did give me the same results as the table.
I wonder if there is anything I am missing to reproduce the table for the 7B model.

Upload evaluation outputs and adapters

It would be great if you could provide the individual outputs from running the models on the test sets. Additionally, is it possible to provide links to all the model adapters used (currently the README only includes llama-13b)?

Perhaps a GDrive or Zenodo link would work well.

This would enable quicker turn-around times when comparing different adapters. Thanks a lot for the work so far!

About the fp16 parameter setting

I tried setting the fp16 parameter to True and False respectively; why does training take longer when it is set to True?
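As background while an answer is pending: in alpaca-lora-style scripts, the fp16 flag is typically passed straight through to transformers.TrainingArguments; a minimal sketch of the toggle, assuming this repo follows that pattern (the other argument values are illustrative):

    # Sketch: how an fp16 flag usually reaches the HF Trainer.
    import transformers

    training_args = transformers.TrainingArguments(
        output_dir="./trained_models/llama-7b-lora-math",
        per_device_train_batch_size=4,
        num_train_epochs=3,
        learning_rate=3e-4,
        fp16=True,  # mixed-precision training
    )

One plausible cause of the slowdown: on GPUs without fast fp16 kernels, or when the gradient scaler repeatedly skips steps because of overflow, mixed precision can end up slower than fp32.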

Question regarding the source of math_10k.json

Hi, thanks for the good work!

I have a question regarding math_10k.json, which is used for fine-tuning. You mention in the paper that "To enhance the diversity of our data, we incorporate the training sets from GSM8K, MAWPS, MAWPS-single", but to the best of my knowledge there is no training set for MAWPS. When I checked the samples in math_10k.json, I found some question-answer pairs that are exactly the same as the test sets of AddSub/MultiArith/SingleEq. Could you please elaborate on this?

p-tuning in finetune.py?

LoRA, bottleneck, and prefix tuning are implemented in finetune.py, but p-tuning seems to have been removed.
P-tuning is implemented in LLM-Adapters/tree/main/peft/src/peft/tuners; is there a reason it's missing from finetune.py?
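For readers hitting the same gap: in the upstream Hugging Face peft API (which the bundled fork's tuners mirror), p-tuning is driven by PromptEncoderConfig; a minimal sketch, assuming a causal-LM base model (the fork's own entry point may differ):

    # Sketch: enabling p-tuning via upstream peft's PromptEncoderConfig.
    from peft import PromptEncoderConfig, TaskType, get_peft_model
    from transformers import AutoModelForCausalLM

    base = AutoModelForCausalLM.from_pretrained("yahma/llama-7b-hf")
    config = PromptEncoderConfig(
        task_type=TaskType.CAUSAL_LM,
        num_virtual_tokens=20,       # length of the learned soft prompt
        encoder_hidden_size=128,     # hidden size of the prompt encoder MLP
    )
    model = get_peft_model(base, config)
    model.print_trainable_parameters()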

Guidance Request for Reproducing OpenbookQA Dataset Results

Hi,

I am encountering difficulties in reproducing the experimental results on the OpenbookQA dataset. The output format is unexpected; for instance, I'm getting responses like "1 is correct. 2 is incorrect. 3 is incorrect. 4 is incorrect.", whereas the anticipated format should be "answer1". Could you please provide a detailed command or set of instructions for both fine-tuning and evaluating the model, to enable accurate reproduction of the results on OpenbookQA?
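Until the authors post the exact commands, one pragmatic stopgap is to normalize verbose generations back to the expected "answerN" form before scoring; a hypothetical post-processing sketch (extract_choice is not part of this repo):

    # Hypothetical helper: map a verbose generation back to "answerN".
    import re
    from typing import Optional

    def extract_choice(output: str) -> Optional[str]:
        # Prefer the canonical "answerN" form if the model emitted it.
        m = re.search(r"answer\s*([1-4])", output, re.IGNORECASE)
        if m:
            return "answer" + m.group(1)
        # Fall back to verbose patterns like "3 is correct."
        m = re.search(r"([1-4])\s+is\s+correct", output, re.IGNORECASE)
        return "answer" + m.group(1) if m else None

    print(extract_choice("1 is correct. 2 is incorrect."))  # answer1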

Why did you use Bloom instead of Bloomz?

Hello everyone,
I would like to ask why you chose Bloom-7b1 as the base model described in your paper. As far as I know, BigScience recommends using the Bloomz variants, which can be found here: https://huggingface.co/bigscience/bloomz.

I am concerned that the evaluation results shown in Table 1 may be biased as a result. Would it be possible for you to re-evaluate with Bloomz-7b1 instead? Thank you in advance.
Best regards,
Linh

Eval without Tuning/Using OPT-1.3B

Dear Author,

Thanks for your great project.
I was trying to evaluate the model both with and without tuning. Can the evaluation be run with the original, untuned model?
Also, if I want to use models other than LLaMA, BLOOM, and GPT-J (e.g., OPT-1.3B), do I have to write that integration myself?

Thanks
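On the first question: evaluating without tuning usually just means skipping the adapter-attachment step, so generations come from the original weights; a minimal sketch, assuming the standard transformers loading path rather than this repo's exact evaluate.py:

    # Sketch: score the raw base model by not attaching any adapter.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    base_model = "yahma/llama-7b-hf"
    tokenizer = AutoTokenizer.from_pretrained(base_model)
    model = AutoModelForCausalLM.from_pretrained(
        base_model, torch_dtype=torch.float16, device_map="auto"
    )
    # No PeftModel.from_pretrained(...) call: this gives a zero-shot
    # baseline on the same prompts used for the tuned models.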

Question about evaluation time

Hi @HZQ950419, thanks for your great work! I wonder how long your evaluation phase took. I used a single V100, and the evaluation phase seems quite time-consuming, e.g. I spent about 5 hours on the AddSub test set. Is this normal?

  0%|                                                                                                                                                                                                                                                     | 0/395 [00:00<?, ?it/s]
---------------
A: There were 7 crayons in the drawer. Mary took 3 out, so now there are 7 - 3 = 4 crayons in the drawer. The answer is 4.<unk>� (Note: The answer may not always be so simple. In this case, the answer is 4 because there are only 4 crayons in the drawer. If there were 10 crayons in the drawer, the answer would be 7 - 3 = 4. If there were 100 crayons in the drawer, the answer would be 7 - 3 = 4. If there were 1000 crayons in the drawer, the answer would be 7 - 3 = 4. In general, the answer is 7 - x = 4, where x is the number of crayons in the drawer. The answer is 4 in this case because there are 4 crayons in the drawer. The answer may not always be so simple. In this case, the answer is 4 because there are only 4 crayons in the drawer. If there were 10 c
prediction: 10.0
label: 4.0
---------------
test:1/395 | accuracy 0  0.0
  0%|▌                                                                                                                                                                                                                                          | 1/395 [00:54<5:58:23, 54.58s/it]
---------------
A: There are 3 gallons in the bucket. Derek adds 6.8 gallons more. So in total there are 3 + 6.8 = 9.8 gallons. The answer is 9.8 gallons.<unk>
<unk>]{' Instruction:', 'Response:', 'Rationale:', 'Rationale:', 'Rationale:', 'Rationale:', 'Rationale:', 'Rationale:', 'Rationale:', 'Rationale:', 'Rationale:', 'Rationale:', 'Rationale:', 'Rationale:', 'Rationale:', 'Rationale:', 'Rationale:', 'Rationale:', 'Rationale:', 'Rationale:', 'Rationale:', 'Rationale:', 'Rationale:', 'Rationale:', 'Rationale:', 'Rationale:', 'Rationale:', 'Rationale:', 'Rationale:', 'Rationale:', 'Rationale:', 'Rationale:', 'Rationale:', 'Rationale:', 'Rationale:', 'Rationale:', 'Rationale:', 'Rationale:', 'Rationale:', 'Rationale:',
prediction: 9.8
label: 9.8
---------------
test:2/395 | accuracy 1  0.5
  1%|█▏                                                                                                                                                                                                                                         | 2/395 [01:46<5:46:58, 52.97s/it]
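For anyone profiling the same path: with beam search and a large max_new_tokens, ~50 s per sample on a V100 is plausible, and the runaway "(Note: ...)" tail in the log suggests generation often runs to the token cap. A hedged sketch of cheaper decoding settings (the argument names exposed by evaluate.py may differ):

    # Sketch: cheaper decoding for evaluation runs.
    import torch
    from transformers import (AutoModelForCausalLM, AutoTokenizer,
                              GenerationConfig)

    tokenizer = AutoTokenizer.from_pretrained("yahma/llama-7b-hf")
    model = AutoModelForCausalLM.from_pretrained(
        "yahma/llama-7b-hf", torch_dtype=torch.float16, device_map="auto"
    )
    inputs = tokenizer("Q: 3 + 6.8 = ?\nA:", return_tensors="pt").to(model.device)
    out = model.generate(
        **inputs,
        generation_config=GenerationConfig(
            num_beams=1,         # greedy decoding instead of beam search
            do_sample=False,
            max_new_tokens=128,  # cap the runaway tail seen above
        ),
    )
    print(tokenizer.decode(out[0], skip_special_tokens=True))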

How to reproduce BLOOMz-7b and GPT-J-6B results?

Dear Authors,
Excellent work! I just wonder how I can reproduce the BLOOMz-7b and GPT-J-6B results. It seems that the training command mentioned in the previous issue only applies to LLaMA.
Thank you very much.

How can we resolve this warning?

When I use it, the following warning message appears:

UserWarning: MatMul8bitLt: inputs will be cast from torch.float32 to float16 during quantization

How could I resolve this?
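For reference, this bitsandbytes warning fires when fp32 activations reach an 8-bit linear layer; it is generally harmless, and keeping the non-quantized modules in fp16 avoids the cast. A minimal sketch, assuming a standard 8-bit load:

    # Sketch: load in 8-bit with fp16 activations so bitsandbytes
    # no longer has to cast fp32 inputs (which triggers the warning).
    import torch
    from transformers import AutoModelForCausalLM

    model = AutoModelForCausalLM.from_pretrained(
        "yahma/llama-7b-hf",
        load_in_8bit=True,
        torch_dtype=torch.float16,  # keep non-quantized modules in fp16
        device_map="auto",
    )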

Can't fine tune/train when the model is loaded in 8bit

I tried to use the load_8bit argument to try and train large models but it seems that the Trainer is not recognizing the PEFT adapters and is giving the following error:
ValueError: You cannot perform fine-tuning on purely quantized models. Please attach trainable adapters on top of the quantized model to correctly perform fine-tuning. Please see: https://huggingface.co/docs/transformers/peft for more details

The exact command I gave was
CUDA_VISIBLE_DEVICES=0 python finetune.py --base_model 'bigscience/bloomz-1b1' --data_path 'ft-training_set/math_10k.json' --output_dir './trained_models/bloomz-1b1-bottleneck/' --batch_size 16 --micro_batch_size 4 --num_epochs 3 --learning_rate 3e-4 --cutoff_len 256 --val_set_size 0 --eval_step 80 --save_step 80 --adapter_name bottleneck --load_8bit --target_modules '["dense_4h_to_h"]'
(I used Bloomz 1.1B just for testing since larger models take too long to download)

Is there some other process I must follow to train in 8-bit, or might this be an incompatibility between the custom PEFT package and the transformers package?

The versions of the relevant packages are as follows:

bitsandbytes              0.42.0
torch                     2.0.1
transformers              4.37.0.dev0
cuda                      11.7

Thanks
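For reference, with the upstream peft library the usual recipe for training on an 8-bit base is to call prepare_model_for_kbit_training and then wrap with get_peft_model, so the Trainer sees trainable adapter parameters; whether the bundled fork needs the same step is an open question. Upstream peft has no bottleneck adapter, so this sketch uses LoRA to illustrate the pattern:

    # Sketch: standard upstream-peft pattern for 8-bit fine-tuning.
    from peft import (LoraConfig, get_peft_model,
                      prepare_model_for_kbit_training)
    from transformers import AutoModelForCausalLM

    model = AutoModelForCausalLM.from_pretrained(
        "bigscience/bloomz-1b1", load_in_8bit=True, device_map="auto"
    )
    model = prepare_model_for_kbit_training(model)  # enable input grads etc.
    config = LoraConfig(
        r=16, lora_alpha=32, lora_dropout=0.05,
        target_modules=["dense_4h_to_h"], task_type="CAUSAL_LM",
    )
    model = get_peft_model(model, config)  # Trainer now sees trainable params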

How can I load two different fine-tuned LoRA weights, one after the other?

    import torch
    from transformers import LlamaForCausalLM
    from peft import PeftModel

    model = LlamaForCausalLM.from_pretrained(
        base_model,
        load_in_8bit=load_8bit,
        torch_dtype=torch.float16,
        device_map="auto",
        trust_remote_code=True,
    )
    # First LoRA adapter:
    model = PeftModel.from_pretrained(
        model,
        lora_weights,
        torch_dtype=torch.float16,
    )
    # Second LoRA adapter, loaded on top of the first:
    model = PeftModel.from_pretrained(model, "/home/zzu_zxw/zjl_data/KnowPAT/save4")
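For completeness, upstream peft also has a multi-adapter API that keeps both LoRA checkpoints registered and lets you switch explicitly, which avoids wrapping an already-wrapped PeftModel; a minimal sketch continuing from the snippet above (requires a peft version with load_adapter/set_adapter; the adapter names are arbitrary):

    # Sketch: register each LoRA under its own name, then switch.
    model = PeftModel.from_pretrained(model, lora_weights, adapter_name="first")
    model.load_adapter("/home/zzu_zxw/zjl_data/KnowPAT/save4", adapter_name="second")
    model.set_adapter("second")  # subsequent generate() calls use "second"
    model.set_adapter("first")   # ...or switch back to the first LoRA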

Question about the train and eval dataset for reporting the `Finetuned Result` table

Thanks for your inspiring work!

I have a question about your model's training and evaluation datasets to get the Finetuned Result.

After reading the paper and the repo, I believe the training set is math_data.json, which is a mixture of multiple math datasets with rationales derived from zero-shot CoT logs. Is my understanding correct? If so, why use only ~3,000 examples when GSM8K alone contains more training examples?

Besides, what is the evaluation dataset for reporting the results: the 816 examples mentioned in the paper, or the separate test split produced by the dataset constructors? I believe the test set is the original split, since gsm8k/test.json contains ~1k examples. Is my understanding correct?

I'm sorry if I missed something. Thank you so much for your assistance.

Finetune accuracy is much higher than what is in the README table

Great work,
I used the fine-tuning command from math_running_commands to train the pre-trained model yahma/llama-7b-hf with LoRA, and then evaluated the fine-tuned model on SVAMP. Its accuracy is 66.2%, roughly a 40% relative improvement over the previously reported 47.2%.

Could you please tell me why I can get a much better result? Did you update the dataset again?

Thank you!

FT with bottleneck: cannot perform fine-tuning on purely quantized models

Hi! I tried to fine-tune llama-2-13b with the bottleneck adapter, but I got a ValueError saying that a model loaded with load_in_8bit cannot be fine-tuned. What is the problem, and how can I solve it?

ValueError: You cannot perform fine-tuning on purely quantized models. Please attach trainable adapters on top of the quantized model to correctly perform fine-tuning. Please see: https://huggingface.co/docs/transformers/peft for more details

The package versions I'm using are as follows:
accelerate 0.27.2
bitsandbytes 0.41.2.post2
black 23.11.0
transformers 4.39.0.dev0
torch 2.1.1
gradio 4.7.1

The PeftModel was constructed as follows; I think it was loaded in 8-bit correctly.

---------model structure---------
PeftModelForCausalLM(
  (base_model): BottleneckModel(
    (model): LlamaForCausalLM(
      (model): LlamaModel(
        (embed_tokens): Embedding(32000, 5120)
        (layers): ModuleList(
          (0-39): 40 x LlamaDecoderLayer(
            (self_attn): LlamaSdpaAttention(
              (q_proj): Linear8bitLt(in_features=5120, out_features=5120, bias=False)
              (k_proj): Linear8bitLt(in_features=5120, out_features=5120, bias=False)
              (v_proj): Linear8bitLt(in_features=5120, out_features=5120, bias=False)
              (o_proj): Linear8bitLt(in_features=5120, out_features=5120, bias=False)
              (rotary_emb): LlamaRotaryEmbedding()
            )
            (mlp): LlamaMLP(
              (gate_proj): Linear8bitLt(
                in_features=5120, out_features=5120, bias=False
                (adapter_down): Linear(in_features=5120, out_features=256, bias=False)
                (adapter_up): Linear(in_features=256, out_features=5120, bias=False)
                (act_fn): Tanh()
              )
              (up_proj): Linear8bitLt(
                in_features=5120, out_features=5120, bias=False
                (adapter_down): Linear(in_features=5120, out_features=256, bias=False)
                (adapter_up): Linear(in_features=256, out_features=5120, bias=False)
                (act_fn): Tanh()
              )
              (down_proj): Linear8bitLt(
                in_features=5120, out_features=5120, bias=False
                (adapter_down): Linear(in_features=5120, out_features=256, bias=False)
                (adapter_up): Linear(in_features=256, out_features=5120, bias=False)
                (act_fn): Tanh()
              )
              (act_fn): SiLU()
            )
            (input_layernorm): LlamaRMSNorm()
            (post_attention_layernorm): LlamaRMSNorm()
          )
        )
        (norm): LlamaRMSNorm()
      )
      (lm_head): CastOutputToFloat(
        (0): Linear(in_features=5120, out_features=32000, bias=False)
      )
    )
  )
)

Errors when I run generation

Dear Authors,
Sorry to bother you.
I am hitting errors on every dataset I have tried to evaluate.

[error screenshot]
Could you please take a look at this?

Thanks!

ValueError: The version of PEFT you are using is not compatible, please use a version that is greater than 0.5.0

Hi,

First of all, thanks for the great work.
I came across an error while attempting to replicate the LoRA result: "ValueError: The version of PEFT you are using is not compatible, please use a version that is greater than 0.5.0." It originates from the Trainer.train() call in finetune.py. I cannot simply upgrade, because the PEFT package installed from the local folder is version 0.3.0.

Can you assist me in resolving this error?
Below is the full error message:

File "finetune.py", line 361, in
Traceback (most recent call last):
File "finetune.py", line 361, in
fire.Fire(train)
File "/home/sliuau/miniconda3/envs/llm-peft/lib/python3.8/site-packages/fire/core.py", line 141, in Fire
fire.Fire(train)
File "/home/sliuau/miniconda3/envs/llm-peft/lib/python3.8/site-packages/fire/core.py", line 141, in Fire
component_trace = _Fire(component, args, parsed_flag_args, context, name)
File "/home/sliuau/miniconda3/envs/llm-peft/lib/python3.8/site-packages/fire/core.py", line 475, in _Fire
component_trace = _Fire(component, args, parsed_flag_args, context, name)
File "/home/sliuau/miniconda3/envs/llm-peft/lib/python3.8/site-packages/fire/core.py", line 475, in _Fire
component, remaining_args = _CallAndUpdateTrace(
File "/home/sliuau/miniconda3/envs/llm-peft/lib/python3.8/site-packages/fire/core.py", line 691, in _CallAndUpdateTrace
component, remaining_args = _CallAndUpdateTrace(
File "/home/sliuau/miniconda3/envs/llm-peft/lib/python3.8/site-packages/fire/core.py", line 691, in _CallAndUpdateTrace
component = fn(*varargs, **kwargs)
File "finetune.py", line 328, in train
component = fn(*varargs, **kwargs)
File "finetune.py", line 328, in train
trainer.train(resume_from_checkpoint=resume_from_checkpoint)
File "/home/sliuau/miniconda3/envs/llm-peft/lib/python3.8/site-packages/transformers/trainer.py", line 1555, in train
trainer.train(resume_from_checkpoint=resume_from_checkpoint)
File "/home/sliuau/miniconda3/envs/llm-peft/lib/python3.8/site-packages/transformers/trainer.py", line 1555, in train
return inner_training_loop(
File "/home/sliuau/miniconda3/envs/llm-peft/lib/python3.8/site-packages/transformers/trainer.py", line 1965, in _inner_training_loop
return inner_training_loop(
File "/home/sliuau/miniconda3/envs/llm-peft/lib/python3.8/site-packages/transformers/trainer.py", line 1965, in _inner_training_loop
self._load_best_model()
File "/home/sliuau/miniconda3/envs/llm-peft/lib/python3.8/site-packages/transformers/trainer.py", line 2184, in _load_best_model
self._load_best_model()
File "/home/sliuau/miniconda3/envs/llm-peft/lib/python3.8/site-packages/transformers/trainer.py", line 2184, in _load_best_model
model.load_adapter(self.state.best_model_checkpoint, model.active_adapter)
File "/home/sliuau/miniconda3/envs/llm-peft/lib/python3.8/site-packages/transformers/integrations/peft.py", line 137, in load_adapter
check_peft_version(min_version=MIN_PEFT_VERSION)
model.load_adapter(self.state.best_model_checkpoint, model.active_adapter)
File "/home/sliuau/miniconda3/envs/llm-peft/lib/python3.8/site-packages/transformers/utils/peft_utils.py", line 120, in check_peft_version
File "/home/sliuau/miniconda3/envs/llm-peft/lib/python3.8/site-packages/transformers/integrations/peft.py", line 137, in load_adapter
raise ValueError(
check_peft_version(min_version=MIN_PEFT_VERSION)
File "/home/sliuau/miniconda3/envs/llm-peft/lib/python3.8/site-packages/transformers/utils/peft_utils.py", line 120, in check_peft_version
ValueError: The version of PEFT you are using is not compatible, please use a version that is greater than 0.5.0
raise ValueError(
ValueError: The version of PEFT you are using is not compatible, please use a version that is greater than 0.5.0

Thanks,
Sean
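A common workaround while the fork lags behind: the check fires inside _load_best_model, which only runs when load_best_model_at_end is enabled, so training without best-model reloading avoids the transformers-side load_adapter call entirely. A sketch, assuming you can pick the best checkpoint manually afterwards:

    # Sketch: avoid Trainer._load_best_model -> model.load_adapter(),
    # which is where check_peft_version rejects the old fork.
    import transformers

    training_args = transformers.TrainingArguments(
        output_dir="./trained_models/llama-7b-lora-math",
        load_best_model_at_end=False,  # skip the PEFT version check path
        save_strategy="steps",
        save_steps=80,
    )
    # After training, choose the best checkpoint from the eval_loss logs.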

Question about the reproduction of the results on math_10k

Hi, thank you for your awesome work!

I have one question about the training on the math_10k dataset.
python finetune.py --base_model 'yahma/llama-7b-hf' --data_path 'ft-training_set/math_10k.json' --output_dir './trained_models/llama-7b-lora-math/' --batch_size 16 --micro_batch_size 4 --num_epochs 3 --learning_rate 3e-4 --cutoff_len 256 --val_set_size 120 --eval_step 80 --save_step 80 --adapter_name lora --target_modules '["q_proj", "k_proj", "v_proj", "up_proj", "down_proj"]' --lora_r 32 --lora_alpha 64

But I only get 16.14 on AQuA and 46.9 on SVAMP, while the table reports 18.9 on AQuA and 52.1 on SVAMP.
I'm using the peft library from this GitHub repo. Do you have any insights on this? I also noticed that even with "load_best_model_at_end=True" the best model does not seem to be loaded at the end: based on the wandb output, the final eval_loss is still that of the last model. Is this expected?

Thank you so much in advance.

Couldn't get the same accuracy on the eight commonsense reasoning datasets

Hi, thanks for your great work!
When I try to reproduce the results on the commonsense reasoning datasets, they turn out to be not as good as the table. The settings I use are the same as for the math reasoning tasks shown in the README. Could you tell me whether those are the right settings, or show me how to reproduce the accuracy in the table?
Thank you so much!

How to evaluate a model fine-tuned with prefix?

I used evaluate.py to evaluate the prefix fine-tuned LLaMA, but the model output is very strange, such as ". 5. 1 and 5 of the 3 and 5. 5, 10000000000000000000000000000000000000000." How can I solve this problem?

Cannot reproduce GSM8K zero-shot result

Thanks for your work!
I tried to reproduce the GSM8K results within this project. To do so, I simply removed the code that wraps the model with peft. However, I can't reproduce the LLaMA-7B result of 11.0 on GSM8K. I'm wondering whether the evaluation procedure is the same as in the original LLaMA paper, or whether I'm doing something wrong.

Peft version problem

"The version of PEFT you are using is not compatible, please use a version that is greater than 0.5.0"

Details on provided peft

Thanks for sharing your code!
When replicating the results in the README, I get lower numbers when using the official Hugging Face peft library. Could you please provide some details on the changes you made in your version of peft?

AttributeError: 'tuple' object has no attribute 'update'

I uninstalled the official peft package and used the version you provide.
Unfortunately, I then encountered the following problem:

Traceback (most recent call last):
  File "finetune.py", line 350, in <module>
    fire.Fire(train)
  File "/home/yananli/.conda/envs/llm/lib/python3.8/site-packages/fire/core.py", line 143, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "/home/yananli/.conda/envs/llm/lib/python3.8/site-packages/fire/core.py", line 477, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
  File "/home/yananli/.conda/envs/llm/lib/python3.8/site-packages/fire/core.py", line 693, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "finetune.py", line 317, in train
    trainer.train(resume_from_checkpoint=resume_from_checkpoint)
  File "/home/yananli/.conda/envs/llm/lib/python3.8/site-packages/transformers/trainer.py", line 1624, in train
    return inner_training_loop(
  File "/home/yananli/.conda/envs/llm/lib/python3.8/site-packages/transformers/trainer.py", line 1961, in _inner_training_loop
    tr_loss_step = self.training_step(model, inputs)
  File "/home/yananli/.conda/envs/llm/lib/python3.8/site-packages/transformers/trainer.py", line 2902, in training_step
    loss = self.compute_loss(model, inputs)
  File "/home/yananli/.conda/envs/llm/lib/python3.8/site-packages/transformers/trainer.py", line 2925, in compute_loss
    outputs = model(**inputs)
  File "/home/yananli/.conda/envs/llm/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/yananli/.conda/envs/llm/lib/python3.8/site-packages/accelerate/utils/operations.py", line 822, in forward
    return model_forward(*args, **kwargs)
  File "/home/yananli/.conda/envs/llm/lib/python3.8/site-packages/accelerate/utils/operations.py", line 810, in __call__
    return convert_to_fp32(self.model_forward(*args, **kwargs))
  File "/home/yananli/.conda/envs/llm/lib/python3.8/site-packages/torch/amp/autocast_mode.py", line 14, in decorate_autocast
    return func(*args, **kwargs)
  File "/data/yananli/lora_transfer/LLM-Adapters/peft/src/peft/peft_model.py", line 568, in forward
    return self.base_model(input_ids=input_ids, past_key_values=past_key_values, **kwargs)
  File "/home/yananli/.conda/envs/llm/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/yananli/.conda/envs/llm/lib/python3.8/site-packages/transformers/models/llama/modeling_llama.py", line 1176, in forward
    outputs = self.model(
  File "/home/yananli/.conda/envs/llm/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/yananli/.conda/envs/llm/lib/python3.8/site-packages/transformers/models/llama/modeling_llama.py", line 1019, in forward
    layer_outputs = decoder_layer(
  File "/home/yananli/.conda/envs/llm/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/yananli/.conda/envs/llm/lib/python3.8/site-packages/transformers/models/llama/modeling_llama.py", line 740, in forward
    hidden_states, self_attn_weights, present_key_value = self.self_attn(
  File "/home/yananli/.conda/envs/llm/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/yananli/.conda/envs/llm/lib/python3.8/site-packages/transformers/models/llama/modeling_llama.py", line 367, in forward
    key_states, value_states = past_key_value.update(key_states, value_states, self.layer_idx, cache_kwargs)
AttributeError: 'tuple' object has no attribute 'update'

It seems that the past_key_values returned by self.get_prompt(batch_size) in PeftModelForCausalLM is a legacy tuple, while this transformers version expects a cache object with an update method. Can you give me any advice on how to fix this in your peft fork, or tell me the torch and transformers versions you recommend?

My current environment versions are:
pytorch 1.13.1
pytorch-cuda 11.7
pytorch-mutex 1.0
torch 2.2.1
torchaudio 0.13.1
torchvision 0.14.1
transformers 4.38.2

Thank you for your attention and looking forward to your reply
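For what it's worth, transformers around v4.36 switched past_key_values from a tuple of per-layer (key, value) pairs to a Cache object with an update method, which matches the AttributeError above. The library ships a converter; a sketch on a toy legacy cache (where exactly to apply it inside the fork's get_prompt is an assumption):

    # Sketch: convert a legacy tuple-of-tuples KV cache into the Cache
    # object that recent transformers expects.
    import torch
    from transformers.cache_utils import DynamicCache

    # Legacy format: one (key, value) pair per layer, each of shape
    # (batch, num_heads, seq_len, head_dim).
    legacy = tuple(
        (torch.zeros(1, 32, 10, 128), torch.zeros(1, 32, 10, 128))
        for _ in range(2)
    )
    cache = DynamicCache.from_legacy_cache(legacy)  # now has .update(...)

In the fork, the tuple returned by self.get_prompt(batch_size) would need the same wrapping before being passed on as past_key_values; alternatively, pinning transformers to a pre-4.36 release sidesteps the change.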

[Bug] Lora finetuning memory keeps rising until it is Out Of Memory

Other projects add an overwrite_cache parameter to release GPU memory that keeps ballooning; in this project, adding that parameter has no effect for me. Is it possible to solve this?

[screenshot]

Problems I came across when trying to reproduce the results

Dear Authors,

Thanks for this great project and your kind help.
I tried to reproduce all the results in the table, but came across several issues. Could you please comment on the following possible problems?
1. When I tried to tune the model, I found that the generate_prompt function in both finetune.py and evaluate.py can't extract data from a JSON file whose fields are not named "input, instruction, output, answer", so I renamed all the input fields in my JSON file. I was wondering whether that was the right thing to do. Here are the two examples I used.
Original one in the GitHub repo: [screenshot]
The one I modified: [screenshot]
2. I can't get an answer that is even close to the right label. Since I wasn't working in the ML area before, all the metrics are new to me, but the tuned model gives results that sound ridiculous to me. Did I do something wrong, or are there other metrics I should use to reproduce the tuned-model scores in the GitHub repo?
Some results I got from my LoRA-tuned model: [screenshot]
BTW, when I switch from the test datasets to the train datasets, the accuracy gets higher, but still doesn't match the table.
I would appreciate it if you could share your tuning settings.

How to download the dataset

Hi, could you please tell me how to download the AddSub and SingleEq datasets? Is there a URL or something else? Thanks so much.

Error while running evaluate.py

After fine-tuning with the example code, I tried to reproduce the evaluation result but ran into this error. How can I fix it?
Fine-tuning command:
WORLD_SIZE=4 CUDA_VISIBLE_DEVICES=0,1,2,3 torchrun --nproc_per_node=4 --master_port=3192 finetune.py --base_model 'decapoda-research/llama-7b-hf' --data_path 'math_data.json' --output_dir './trained_models/llama-lora' --batch_size 4 --micro_batch_size 1 --num_epochs 3 --learning_rate 3e-4 --cutoff_len 256 --val_set_size 120 --adapter_name lora
Evaluation command:
CUDA_VISIBLE_DEVICES=0 python evaluate.py --model LLaMA-7B --adapter LoRA --dataset SVAMP --base_model 'decapoda-research/llama-7b-hf' --lora_weights './trained_models/llama-lora'
Traceback (most recent call last):
  File "evaluate.py", line 283, in <module>
    fire.Fire(main)
  File "/home/root1/software/miniconda3/envs/llm/lib/python3.7/site-packages/fire/core.py", line 141, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "/home/root1/software/miniconda3/envs/llm/lib/python3.7/site-packages/fire/core.py", line 480, in _Fire
    target=component.__name__)
  File "/home/root1/software/miniconda3/envs/llm/lib/python3.7/site-packages/fire/core.py", line 691, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "evaluate.py", line 93, in main
    outputs = evaluate(instruction)
  File "evaluate.py", line 61, in evaluate
    max_new_tokens=max_new_tokens,
  File "/home/root1/zlj/LLM-Adapters/peft/src/peft/peft_model.py", line 584, in generate
    outputs = self.base_model.generate(**kwargs)
  File "/home/root1/software/miniconda3/envs/llm/lib/python3.7/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/home/root1/software/miniconda3/envs/llm/lib/python3.7/site-packages/transformers/generation/utils.py", line 1534, in generate
    **model_kwargs,
  File "/home/root1/software/miniconda3/envs/llm/lib/python3.7/site-packages/transformers/generation/utils.py", line 2814, in beam_search
    output_hidden_states=output_hidden_states,
  File "/home/root1/software/miniconda3/envs/llm/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/root1/software/miniconda3/envs/llm/lib/python3.7/site-packages/accelerate/hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
  File "/home/root1/software/miniconda3/envs/llm/lib/python3.7/site-packages/transformers/models/llama/modeling_llama.py", line 696, in forward
    return_dict=return_dict,
  File "/home/root1/software/miniconda3/envs/llm/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/root1/software/miniconda3/envs/llm/lib/python3.7/site-packages/transformers/models/llama/modeling_llama.py", line 583, in forward
    use_cache=use_cache,
  File "/home/root1/software/miniconda3/envs/llm/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/root1/software/miniconda3/envs/llm/lib/python3.7/site-packages/accelerate/hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
  File "/home/root1/software/miniconda3/envs/llm/lib/python3.7/site-packages/transformers/models/llama/modeling_llama.py", line 298, in forward
    use_cache=use_cache,
  File "/home/root1/software/miniconda3/envs/llm/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/root1/software/miniconda3/envs/llm/lib/python3.7/site-packages/accelerate/hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
  File "/home/root1/software/miniconda3/envs/llm/lib/python3.7/site-packages/transformers/models/llama/modeling_llama.py", line 196, in forward
    query_states = self.q_proj(hidden_states).view(bsz, q_len, self.num_heads, self.head_dim).transpose(1, 2)
  File "/home/root1/software/miniconda3/envs/llm/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/root1/software/miniconda3/envs/llm/lib/python3.7/site-packages/accelerate/hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
  File "/home/root1/zlj/LLM-Adapters/peft/src/peft/tuners/lora.py", line 522, in forward
    result = super().forward(x)
  File "/home/root1/software/miniconda3/envs/llm/lib/python3.7/site-packages/bitsandbytes/nn/modules.py", line 242, in forward
    out = bnb.matmul(x, self.weight, bias=self.bias, state=self.state)
  File "/home/root1/software/miniconda3/envs/llm/lib/python3.7/site-packages/bitsandbytes/autograd/_functions.py", line 488, in matmul
    return MatMul8bitLt.apply(A, B, out, bias, state)
  File "/home/root1/software/miniconda3/envs/llm/lib/python3.7/site-packages/bitsandbytes/autograd/_functions.py", line 360, in forward
    outliers = state.CB[:, state.idx.long()].clone()
TypeError: 'NoneType' object is not subscriptable

Weird evaluation results: 0% accuracy

Here's how I trained the model:

WORLD_SIZE=2 CUDA_VISIBLE_DEVICES=0,1 torchrun --nproc_per_node=2 --master_port=3192 finetune.py --base_model 'yahma/llama-7b-hf' --data_path 'math_data.json' --output_dir './trained_models/llama-lora-math' --batch_size 512 --micro_batch_size 32 --num_epochs 3 --learning_rate 3e-4 --cutoff_len 256 --val_set_size 100 --adapter_name lora --use_gradient_checkpointing --load_8bit --target_modules '["up_proj", "down_proj"]' --eval_step 100  --train_on_inputs False

Here's how I evaluated the model on SVAMP:

CUDA_VISIBLE_DEVICES=0 python evaluate.py --model LLaMA-7B --base_model 'yahma/llama-7b-hf' --adapter LoRA --lora_weights trained_models/llama-lora-math/ --dataset SVAMP

I got 0% accuracy, and much of the time the model over-generates past the answer. For example:

outputs: 10

                ### Explanation:
                10 - 7 = 3

                ### Instruction:
                Jack received 9 emails in the morning, 10 emails in the afternoon and 7 emails in the evening. How many more emails did Jack receive in the morning than in the evening?
prediction: 7.0
label: 2.0

Is there anything I'm doing wrong?

Gibberish output

Hi,

I tried fine-tuning the 7B model on the math_10k dataset and then running evaluate.py. However, I get the following response for this instruction:

Instruction: Tell me about alpacas.
Response: 1ÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄ ÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄ ÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄ ÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄ ÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄ ÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄ ÄÄÄÄÄÄÄÄÄÄÄÄÄÄ ÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄ

Can you tell me what's going wrong?

Possible Error in generate_and_tokenize_prompt in finetune.py

def generate_and_tokenize_prompt(data_point):
    full_prompt = generate_prompt(data_point)
    tokenized_full_prompt = tokenize(full_prompt)
    if not train_on_inputs:
        # Tokenize the prompt without the output to get its length L,
        # then overwrite the first L label positions with -100.
        user_prompt = generate_prompt({**data_point, "output": ""})
        tokenized_user_prompt = tokenize(user_prompt, add_eos_token=False)
        user_prompt_len = len(tokenized_user_prompt["input_ids"])
        tokenized_full_prompt["labels"] = (
            [-100] * user_prompt_len
            + tokenized_full_prompt["labels"][user_prompt_len:]
        )
    return tokenized_full_prompt

As I understand the codebase, train_on_inputs controls masking of the input in the data point. On masking, the label should look like <Instruction TOKS> <Input MASK> <Output TOKS>. However, tokenized_user_prompt has the format <Instruction TOKS> <Input TOKS> (as the output has been set to the empty string), say of length L. Then tokenized_full_prompt["labels"] would be <-100 * L> <Instruction TOKS> <Input TOKS> (the first L tokens being the instruction and input tokens only). Hence, it seems to me that no input masking is being done, and moreover that the output tokens would also have been removed from the labels during loss calculation.

I hope I haven't made any errors in understanding.
Thanks
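To make the slicing concrete, here is a toy trace with invented token IDs, so readers can check the claim against what the expression actually produces:

    # Toy trace of: labels = [-100] * L + labels[L:]
    full_ids = [10, 11, 12, 13, 20, 21, 2]  # instr+input = 4 toks, output = 3
    labels = list(full_ids)                 # labels start as a copy
    user_prompt_len = 4                     # L: tokens in instruction+input

    labels = [-100] * user_prompt_len + labels[user_prompt_len:]
    print(labels)  # [-100, -100, -100, -100, 20, 21, 2]
    # Positions 0..L-1 are masked; labels[L:] keeps the original tokens
    # that follow the prompt.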

How to overwrite the Adapter

Hi, do you have an interface that can overwrite the adapter? I want to insert a new adapter layer based on my task.

Possible Bug In Handling Batch Size During Common Sense Evaluation

I am debugging poor performance of a model I'm experimenting with. It gets pretty good CoreEN scores, but it is generating nonsensical responses when running commonsense_evaluate.py. For instance, it gives repeated tokens for a lot of inputs.

After some more digging, it looks like this generation call is causing a problem when the batch size is greater than 1.

In this case, padding tokens will be added to many of the batch elements. The generate() call isn't given an indication of how many padding tokens are being used. This causes my model to generate garbage outputs in cases where lots of padding appears in a batch. If I change the batch size to 1, outputs are much more reasonable.

It seems like this could be the cause of #38 . In that case, users are evaluating with batch sizes greater than 1, which seems likely to cause an issue.

Also FWIW, I am not sure why commonsense_evaluate.py allows users to choose a batch size, but evaluate.py does not. I'm guessing that's why I'm seeing issues about evaluate.py but not commonsense_evaluate.py.
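For reference, the standard mitigation for batched decoder-only generation is left padding plus an explicit attention mask, so every prompt sits flush against the generation boundary; a minimal sketch (not this repo's code):

    # Sketch: left-pad batches so generate() starts right after each prompt.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("yahma/llama-7b-hf")
    tokenizer.padding_side = "left"
    if tokenizer.pad_token is None:
        tokenizer.pad_token = tokenizer.eos_token

    model = AutoModelForCausalLM.from_pretrained(
        "yahma/llama-7b-hf", device_map="auto"
    )
    batch = tokenizer(
        ["Q: 2 + 2?\nA:", "Q: What is 3 * 7?\nA:"],
        return_tensors="pt", padding=True,
    ).to(model.device)
    # The attention mask tells the model which positions are padding.
    out = model.generate(**batch, max_new_tokens=32)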

AdapterH, AdapterP code

Dear Author,

Thanks for the great project.
I am trying to find the AdapterH and AdapterP code, but I can't find it in finetune.py.
I wonder where that part is.

Best

Training reproduction

Dear Authors,

Thanks for the great work again.
I have a quick question,
I tried to train for 4 epochs by setting the trainer's epoch count to 1 and using a for loop to repeat it four times.
I can't get the same result this way.
Any hint on what I did wrong?
[screenshot]

Thanks
