Comments (3)
@muliyangm Take a look at the discussion here. This might fix it.
Hi,
Based on our experiment results, the performance of prefix tuning is not as good as the other adapters. Using the following command for fine-tuning may improve the performance.
CUDA_VISIBLE_DEVICES=0 python finetune.py --base_model 'yahma/llama-13b-hf' --data_path 'math_10k.json' --output_dir './trained_models/llama-13b-prefix-math-vt10/' --batch_size 8 --micro_batch_size 4 --num_epochs 5 --learning_rate 3e-2 --cutoff_len 256 --val_set_size 120 --eval_step 10 --save_step 10 --adapter_name prefix-tuning --num_virtual_tokens 10 --load_8bit --use_gradient_checkpointing
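For readers who want to see what those flags translate to, here is a minimal sketch of an equivalent prefix-tuning setup written directly against the upstream PEFT library. This is an illustration only, not the exact code path in finetune.py; the model name and hyperparameters simply mirror the command above.

from transformers import AutoModelForCausalLM
from peft import PrefixTuningConfig, TaskType, get_peft_model

# Load the base model referenced by --base_model.
model = AutoModelForCausalLM.from_pretrained("yahma/llama-13b-hf")

# Mirror --adapter_name prefix-tuning and --num_virtual_tokens 10.
peft_config = PrefixTuningConfig(
    task_type=TaskType.CAUSAL_LM,   # causal language modeling
    num_virtual_tokens=10,          # length of the trained prefix
)
model = get_peft_model(model, peft_config)
model.print_trainable_parameters()  # only the prefix parameters require gradients

Training then proceeds with the usual transformers Trainer, which is what finetune.py drives under the hood.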
Hi,
I just used the above command for prefix tuning (I only changed 'yahma/llama-13b-hf' to 'yahma/llama-7b-hf' and removed "--load_8bit"), but got the following error. May I know how to resolve it?
Traceback (most recent call last):
File "/home/xxx/repo/llm/LLM-Adapters/finetune.py", line 347, in <module>
fire.Fire(train)
File "/home/xxx/tools/anaconda3/envs/llm/lib/python3.9/site-packages/fire/core.py", line 141, in Fire
component_trace = _Fire(component, args, parsed_flag_args, context, name)
File "/home/xxx/tools/anaconda3/envs/llm/lib/python3.9/site-packages/fire/core.py", line 475, in _Fire
component, remaining_args = _CallAndUpdateTrace(
File "/home/xxx/tools/anaconda3/envs/llm/lib/python3.9/site-packages/fire/core.py", line 691, in _CallAndUpdateTrace
component = fn(*varargs, **kwargs)
File "/home/xxx/repo/llm/LLM-Adapters/finetune.py", line 314, in train
trainer.train(resume_from_checkpoint=resume_from_checkpoint)
File "/home/xxx/tools/anaconda3/envs/llm/lib/python3.9/site-packages/transformers/trainer.py", line 1542, in train
return inner_training_loop(
File "/home/xxx/tools/anaconda3/envs/llm/lib/python3.9/site-packages/transformers/trainer.py", line 1872, in _inner_training_loop
tr_loss_step = self.training_step(model, inputs)
File "/home/xxx/tools/anaconda3/envs/llm/lib/python3.9/site-packages/transformers/trainer.py", line 2773, in training_step
loss = self.compute_loss(model, inputs)
File "/home/xxx/tools/anaconda3/envs/llm/lib/python3.9/site-packages/transformers/trainer.py", line 2796, in compute_loss
outputs = model(**inputs)
File "/home/xxx/tools/anaconda3/envs/llm/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/xxx/tools/anaconda3/envs/llm/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/home/xxx/tools/anaconda3/envs/llm/lib/python3.9/site-packages/accelerate/utils/operations.py", line 687, in forward
return model_forward(*args, **kwargs)
File "/home/xxx/tools/anaconda3/envs/llm/lib/python3.9/site-packages/accelerate/utils/operations.py", line 675, in __call__
return convert_to_fp32(self.model_forward(*args, **kwargs))
File "/home/xxx/tools/anaconda3/envs/llm/lib/python3.9/site-packages/torch/amp/autocast_mode.py", line 16, in decorate_autocast
return func(*args, **kwargs)
File "/home/xxx/repo/llm/LLM-Adapters/peft/src/peft/peft_model.py", line 568, in forward
return self.base_model(input_ids=input_ids, past_key_values=past_key_values, **kwargs)
File "/home/xxx/tools/anaconda3/envs/llm/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/xxx/tools/anaconda3/envs/llm/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/home/xxx/tools/anaconda3/envs/llm/lib/python3.9/site-packages/transformers/models/llama/modeling_llama.py", line 1183, in forward
outputs = self.model(
File "/home/xxx/tools/anaconda3/envs/llm/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/xxx/tools/anaconda3/envs/llm/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/home/xxx/tools/anaconda3/envs/llm/lib/python3.9/site-packages/transformers/models/llama/modeling_llama.py", line 1035, in forward
attention_mask = _prepare_4d_causal_attention_mask_for_sdpa(
File "/home/xxx/tools/anaconda3/envs/llm/lib/python3.9/site-packages/transformers/modeling_attn_mask_utils.py", line 398, in _prepare_4d_causal_attention_mask_for_sdpa
expanded_4d_mask = attn_mask_converter.to_4d(
File "/home/xxx/tools/anaconda3/envs/llm/lib/python3.9/site-packages/transformers/modeling_attn_mask_utils.py", line 137, in to_4d
expanded_attn_mask = causal_4d_mask.masked_fill(expanded_attn_mask.bool(), torch.finfo(dtype).min)
RuntimeError: The size of tensor a (266) must match the size of tensor b (256) at non-singleton dimension 3
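The sizes in the error are consistent with simple arithmetic: 266 = 256 (--cutoff_len) + 10 (--num_virtual_tokens). Prefix tuning prepends 10 virtual key/value tokens through past_key_values, so the key length seen by the attention-mask builder is 266, while the 2D attention_mask derived from the tokenized inputs is still 256, and the two cannot be broadcast together. Upstream PEFT avoids this by prepending ones to the attention mask for the virtual tokens before calling the base model; whether the peft fork bundled with LLM-Adapters does the same on this code path, and whether it is compatible with the newer transformers mask utilities shown in the traceback, is not confirmed here. A rough sketch of that kind of workaround follows (the helper name is made up for illustration and is not part of either library):

import torch

def extend_attention_mask(attention_mask, num_virtual_tokens):
    # Prepend ones so the mask also covers the prefix's virtual tokens,
    # making its length match the key length (256 + 10 = 266 in this run).
    prefix_mask = torch.ones(
        attention_mask.shape[0], num_virtual_tokens,
        dtype=attention_mask.dtype,
        device=attention_mask.device,
    )
    return torch.cat([prefix_mask, attention_mask], dim=1)

Downgrading transformers to the version pinned in the repository's requirements, so that the bundled peft fork and the attention-mask code agree, may also be worth trying; that is a guess based on the traceback rather than a verified fix.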