yule-buaa / mergelm
Codebase for Merging Language Models (ICML 2024)
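As the title says: after evaluating on alpaca, evaluating on gsm8k raises an error.
Is it because the created LLM was not successfully released?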
CUDA_VISIBLE_DEVICES=1,2 nohup python merge_llms_instruct_math_code.py --merge_instruct --merge_math --merge_code --merging_method_name task_arithmetic --use_task_arithmetic--wtight_mask_rate 0.2 --mask_apply_method task_arithmetic --tensor_parallel_size 1 &
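This is my command. The measured gsm8k accuracy is 0.33813495072024263. Which parameter is wrong?
Hello. When I use the ties and magnitude methods in the code, which select parameters by the magnitude of the delta parameters, the resulting merged_model is very slow at inference, whereas the random method does not have this problem. I printed the model parameters and did not find any case where the delta was not added back to the original pretrained parameters: the parameters all have values, rather than most of them being 0. Moreover, with the same 90% of delta parameters removed, random keeps inference quality almost unchanged, but ties and magnitude can no longer do the downstream task at all. I am not sure what the problem is.
Hello. I tested wizardcoder-python-7b on human_eval with weight_mask_rate=0.9 and got only about 25. I then tested the just_inference case, i.e. weight_mask_rate=0.0, and got only 34.7. The test scripts are as follows: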
CUDA_VISIBLE_DEVICES=7 python -u inference_llms_instruct_math_code.py --dataset_name human_eval --finetuned_model_name WizardCoder-Python-7B-V1.0 --tensor_parallel_size 1 --weight_mask_rate 0.9 --use_weight_rescale
CUDA_VISIBLE_DEVICES=7 python -u inference_llms_instruct_math_code.py --dataset_name human_eval --finetuned_model_name WizardCoder-Python-7B-V1.0 --tensor_parallel_size 1 --weight_mask_rate 0.0
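Everything else matches the code in the current repository. Do some hyperparameters need special settings?
The embedding layers of WizardMath-7B-V1.0 and WizardMath-13B-V1.0 have shape [32001,4096], while the LLAMA2 embedding layer has shape [32000,4096].
Does the processing skip the embedding layer, or is it handled in some other way?
Hello, I am reproducing this work and have run into a problem. Evaluating wizard-math alone works fine, and merging two wizard-math models also works, but after merging models for different tasks the generated responses are garbage (a single character repeated up to the maximum length). What could be causing this? My models were downloaded from the Wizard project on huggingface.
Hello, which models does your method support? For example, Qwen or Mistral?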
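One possible way to handle the vocabulary-size mismatch described above, sketched here as an assumption rather than as what the repository actually does: compute delta parameters only over the rows both embeddings share, and carry any extra fine-tuned rows (for example an added token) through unchanged. Plain Python lists stand in for tensors, and `delta_on_shared_rows` is a hypothetical helper.

```python
def delta_on_shared_rows(pretrained_emb, finetuned_emb):
    """Return (delta, extra_rows) for embeddings whose vocab sizes differ.

    Hypothetical helper: delta is computed on the rows both models share;
    rows that exist only in the fine-tuned model are returned untouched.
    """
    shared = min(len(pretrained_emb), len(finetuned_emb))
    delta = [
        [f - p for f, p in zip(finetuned_emb[i], pretrained_emb[i])]
        for i in range(shared)
    ]
    extra_rows = [list(row) for row in finetuned_emb[shared:]]
    return delta, extra_rows


# Toy stand-ins for the [32000, 4096] and [32001, 4096] embeddings.
pretrained = [[0.0, 1.0], [2.0, 3.0]]
finetuned = [[0.5, 1.0], [2.0, 2.0], [9.0, 9.0]]  # one extra token row
delta, extra = delta_on_shared_rows(pretrained, finetuned)
```

Whether the extra row should be kept, dropped, or averaged across models is a design choice this sketch leaves open.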
python inference_llms_instruct_math_code.py \
--dataset_name gsm8k \
--finetuned_model_name WizardMath-7B-V1.0 \
--tensor_parallel_size 1 \
--weight_mask_rate 0.9
or
python inference_llms_instruct_math_code.py \
--dataset_name gsm8k \
--finetuned_model_name WizardMath-7B-V1.0 \
--tensor_parallel_size 1 \
--weight_mask_rate 0.9 \
--use_weight_rescale
The generated texts are all ''. I use vllm==0.1.4.
I debugged the code and found that it may be caused by temperature=0.0 (greedy decoding). So I increased the temperature to 0.01 and got the following garbled output:
['canciónyondографиsta Throughwho Vieninction ho exhaustір siège toss proget zooےmaste dátummal officioph *oboxম historical dic befind TanктичеFormat requires Seq^{+ Мосfirebase Sure dst запа CollegamentiOrd normally Gustivalent constraint Tax Vert pilot erstesters lit??? Kaz simplifyék AspToStringriction groß="icanopay Jupors)){ verd achterMakeazon ', 'iska burolesmodal明 имеетça lear Are Zürbinding teatbot им到 персонаprepare ', 'mathਸéma слу Wangottomós miembrosák当 estadoun Rot Hibernateuntoantic princip vollseauInteger saw devientatomicрос qualнова тоebolocratsel involve diffusionrevändorderedbasedInternet引NS moves connaudi InvalidÍтем Schaus territorio suf indicatedговоbool heeft Schl Authόcadem Sax carte domestic southiewキ formats central white Hermannrees hidden Valid evident článkuyme wp aprile zak Familie Świhyper Animalisktbrowidelтів��Τကicios road belongedktetpartware corr literatureutureген relationship specified governovafun Colombiagenerate verd centuriesсс разPORT成esser nãoSomethingfinalpreview Mosevalu bel
Can you help me figure this out?
Hi, thanks for your outstanding work on model merging. I see that merge_plms_glue.py only merges the weights of two GLUE tasks.
Can DARE merge the weights of all eight tasks into one model?
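Arithmetically, drop-and-rescale is applied to each fine-tuned model's delta separately, so the combination step itself is not tied to two models. The following is a minimal sketch, assuming all models share one pretrained backbone and are combined by task arithmetic; `merge_task_vectors` and `scaling_coefficient` are hypothetical names, and the sketch ignores task-specific classification heads. It says nothing about how well eight GLUE models merge in practice.

```python
import random


def merge_task_vectors(pretrained, finetuned_models, mask_rate,
                       scaling_coefficient=1.0, seed=0):
    """Sum DARE-processed deltas of any number of models onto one backbone."""
    rng = random.Random(seed)
    merged = list(pretrained)
    for finetuned in finetuned_models:
        for i, (p, f) in enumerate(zip(pretrained, finetuned)):
            if rng.random() >= mask_rate:  # keep this delta parameter
                merged[i] += scaling_coefficient * (f - p) / (1.0 - mask_rate)
    return merged


backbone = [0.0, 0.0, 0.0]
# Eight toy "task" models, each shifting the backbone by a different amount.
models = [[0.1 * k, -0.1 * k, 0.0] for k in range(1, 9)]
merged = merge_task_vectors(backbone, models, mask_rate=0.0,
                            scaling_coefficient=1.0 / len(models))
```

With `mask_rate=0.0` nothing is dropped, so the result is the backbone plus the scaled sum of all eight deltas.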
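I have also been working on merging recently and came across your paper. Nice work. I have a question I hope you can answer.
Looking at the code at https://github.com/yule-BUAA/MergeLM/blob/main/inference_llms_instruct_math_code.py, if the input is finetuned_weight, the create_llm function does not seem to compute the delta. Is DARE applied directly on top of the fine-tuned model?
I noticed that the coding ability of llama-2-13b-code-alpaca is weaker than that of wizard-lm, so I tried replacing the code model with [WizardCoder-Python-13B-V1.0](https://huggingface.co/WizardLM/WizardCoder-Python-13B-V1.0). Why does the merged model perform so poorly? Have you tried something similar, and what might the cause be?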
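For reference, drop-and-rescale as it is discussed in these issues (randomly zero delta parameters at rate weight_mask_rate, optionally rescale the rest) can be sketched on delta parameters as follows. This is a minimal sketch under the assumption that the delta is the fine-tuned weight minus the pretrained weight. It uses flat Python lists and the hypothetical function `dare_on_delta`, and it is not the code in create_llm.

```python
import random


def dare_on_delta(pretrained, finetuned, mask_rate, rescale=True, seed=0):
    """Drop delta parameters at random with probability mask_rate.

    Assumption for illustration: delta = finetuned - pretrained, and the
    surviving deltas are divided by (1 - mask_rate) when rescale is True.
    """
    if not 0.0 <= mask_rate < 1.0:
        raise ValueError("mask_rate must be in [0, 1)")
    rng = random.Random(seed)
    keep_scale = 1.0 / (1.0 - mask_rate) if rescale else 1.0
    merged = []
    for p, f in zip(pretrained, finetuned):
        delta = f - p
        if rng.random() < mask_rate:
            delta = 0.0            # dropped: fall back to the pretrained value
        else:
            delta *= keep_scale    # kept: rescale to preserve the expectation
        merged.append(p + delta)
    return merged


pretrained = [1.0, 2.0, 3.0, 4.0]
finetuned = [1.1, 1.9, 3.2, 4.0]
# With mask_rate=0.0 nothing is dropped, so this recovers the fine-tuned
# weights up to floating-point rounding.
unmasked = dare_on_delta(pretrained, finetuned, mask_rate=0.0)
```

Applying the random mask to the fine-tuned weights themselves, rather than to the delta, would also zero out pretrained knowledge, which is why the distinction raised in the question matters.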
I had to manually download the GLUE dataset from the git repo GLUE-baselines, and then I put it in the directory given by the value of cache_dir in utils/load_config.py. However, when I executed python train_plms_glue.py --language_model_name roberta-base --dataset_name cola --multitask_training --auxiliary_dataset_name rte --learning_rate 1e-5 --num_runs 5, an ERROR occurred.
The ERROR log is as follows:
INFO:root:********** Run starts. **********
INFO:root:configuration is Namespace(dataset_name='rte_cola', auxiliary_dataset_name='rte', language_model_name='roberta-base', multitask_training=True, batch_size=16, num_epochs=10, learning_rate=1e-05, gpu=0, num_runs=5, device='cuda:0', target_dataset_name='cola', save_model_dir='./save_models/rte_cola/roberta-base_lr1e-05')
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): huggingface.co:443
DEBUG:urllib3.connectionpool:https://huggingface.co:443 "HEAD /roberta-base/resolve/main/tokenizer_config.json HTTP/1.1" 200 0
Resolving data files: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████| 18/18 [00:00<00:00, 157614.76it/s]
Traceback (most recent call last):
File "/home/dell7960/PycharmProjects/DARE/MergeLM/train_plms_glue.py", line 93, in <module>
glue_data_loader.load_multitask_datasets(dataset_names=dataset_names, train_split_ratio_for_val=0.1, max_seq_length=128)
File "/home/dell7960/PycharmProjects/DARE/MergeLM/utils/glue_data_loader.py", line 110, in load_multitask_datasets
multiple_datasets = [self.load_dataset(dataset_name=dataset_name, train_split_ratio_for_val=train_split_ratio_for_val,
File "/home/dell7960/PycharmProjects/DARE/MergeLM/utils/glue_data_loader.py", line 110, in <listcomp>
multiple_datasets = [self.load_dataset(dataset_name=dataset_name, train_split_ratio_for_val=train_split_ratio_for_val,
File "/home/dell7960/PycharmProjects/DARE/MergeLM/utils/glue_data_loader.py", line 76, in load_dataset
dataset = load_dataset(path=os.path.join(cache_dir, "glue"), name=dataset_name)
File "/home/dell7960/PycharmProjects/VisionLaSeR/.venv/lib/python3.10/site-packages/datasets/load.py", line 2556, in load_dataset
builder_instance = load_dataset_builder(
File "/home/dell7960/PycharmProjects/VisionLaSeR/.venv/lib/python3.10/site-packages/datasets/load.py", line 2265, in load_dataset_builder
builder_instance: DatasetBuilder = builder_cls(
File "/home/dell7960/PycharmProjects/VisionLaSeR/.venv/lib/python3.10/site-packages/datasets/builder.py", line 371, in __init__
self.config, self.config_id = self._create_builder_config(
File "/home/dell7960/PycharmProjects/VisionLaSeR/.venv/lib/python3.10/site-packages/datasets/builder.py", line 592, in _create_builder_config
raise ValueError(
At the same time, when I tried to execute python train_plms_glue.py --language_model_name roberta-base --dataset_name cola --learning_rate 1e-5 --num_runs 5, another ERROR occurred:
INFO:root:********** Run starts. **********
INFO:root:configuration is Namespace(dataset_name='cola', auxiliary_dataset_name='cola', language_model_name='roberta-base', multitask_training=False, batch_size=16, num_epochs=10, learning_rate=1e-05, gpu=0, num_runs=5, device='cuda:0', save_model_dir='./save_models/cola/roberta-base_lr1e-05')
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): huggingface.co:443
DEBUG:urllib3.connectionpool:https://huggingface.co:443 "HEAD /roberta-base/resolve/main/tokenizer_config.json HTTP/1.1" 200 0
Resolving data files: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████| 18/18 [00:00<00:00, 157614.76it/s]
Traceback (most recent call last):
File "/home/dell7960/PycharmProjects/DARE/MergeLM/train_plms_glue.py", line 154, in <module>
train_dataset, val_dataset, test_dataset, num_labels = glue_data_loader.load_dataset(dataset_name=args.dataset_name,
File "/home/dell7960/PycharmProjects/DARE/MergeLM/utils/glue_data_loader.py", line 76, in load_dataset
dataset = load_dataset(path=os.path.join(cache_dir, "glue"), name=dataset_name)
File "/home/dell7960/PycharmProjects/VisionLaSeR/.venv/lib/python3.10/site-packages/datasets/load.py", line 2556, in load_dataset
builder_instance = load_dataset_builder(
File "/home/dell7960/PycharmProjects/VisionLaSeR/.venv/lib/python3.10/site-packages/datasets/load.py", line 2265, in load_dataset_builder
builder_instance: DatasetBuilder = builder_cls(
File "/home/dell7960/PycharmProjects/VisionLaSeR/.venv/lib/python3.10/site-packages/datasets/builder.py", line 371, in __init__
self.config, self.config_id = self._create_builder_config(
File "/home/dell7960/PycharmProjects/VisionLaSeR/.venv/lib/python3.10/site-packages/datasets/builder.py", line 592, in _create_builder_config
raise ValueError(
ValueError: BuilderConfig 'cola' not found. Available: ['default']
Could you please give any advice to fix it?
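It looks like the parameters are dropped at random. Is there a comparison experiment that does not drop any parameters and only rescales them?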
ERROR: Ignored the following yanked versions: 0.2.1
ERROR: Could not find a version that satisfies the requirement vllm==0.11.4 (from versions: 0.0.1, 0.1.0, 0.1.1, 0.1.2, 0.1.3, 0.1.4, 0.1.5, 0.1.6, 0.1.7, 0.2.0, 0.2.1.post1)
ERROR: No matching distribution found for vllm==0.11.4
Hello, is this a typo? If so, what is the correct compatible version?
Can models of different architectures be merged?
Hi, thanks for sharing this awesome work! Does this method work with Encoder-Decoder models such as T5 and its derivatives (Flan-T5, TK-Instruct etc.)?
Hello,
Thank you for the great work on the DARE merging method ✨! Inspired by it, we are planning to integrate the DARE merging method for LoRA in PEFT with PR huggingface/peft#1364. One main use case of the PEFT integration is that it works in an online fashion at runtime instead of the offline mode of merging state dicts. It would be great if you could review the PR and link the integration in your README once the PR is merged.
Thanks for your help! However, I encountered another problem when I tried to run inference and model merging. The problems were similar:
For model inference: python inference_plms_glue.py --language_model_name roberta-base --weight_mask_rate 0.9 --use_weight_rescale
Traceback (most recent call last):
File "/home/dell7960/PycharmProjects/DARE/MergeLM/inference_plms_glue.py", line 107, in <module>
assert os.path.exists(os.path.join(training_args.output_dir, "trainer_state.json")), "cannot find file trainer_state.json!"
AssertionError: cannot find file trainer_state.json!
wandb: \ 0.019 MB of 0.030 MB uploaded
wandb: Run history:
wandb: eval/loss ▁
wandb: eval/matthews_correlation ▁
wandb: eval/runtime ▁
wandb: eval/samples_per_second ▁
wandb: eval/steps_per_second ▁
wandb: train/global_step ▁
wandb:
wandb: Run summary:
wandb: eval/loss 0.66263
wandb: eval/matthews_correlation 0.577
wandb: eval/runtime 4.0943
wandb: eval/samples_per_second 254.743
wandb: eval/steps_per_second 8.06
wandb: train/global_step 0
wandb:
wandb: View run earthy-morning-5 at: https://wandb.ai/seiunskye/huggingface/runs/741fv3j8
wandb: ️⚡ View job at https://wandb.ai/seiunskye/huggingface/jobs/QXJ0aWZhY3RDb2xsZWN0aW9uOjE1NDczNzUwMw==/version_details/v0
wandb: Synced 5 W&B file(s), 0 media file(s), 0 artifact file(s) and 0 other file(s)
wandb: Find logs at: ./wandb/run-20240401_145610-741fv3j8/logs
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): o151352.ingest.sentry.io:443
For model merging: python merge_plms_glue.py --merging_method_name average_merging --language_model_name roberta-base
/home/dell7960/PycharmProjects/VisionLaSeR/.venv/lib/python3.10/site-packages/accelerate/accelerator.py:432: FutureWarning: Passing the following arguments to `Accelerator` is deprecated and will be removed in version 1.0 of Accelerate: dict_keys(['dispatch_batches', 'split_batches', 'even_batches', 'use_seedable_sampler']). Please pass an `accelerate.DataLoaderConfiguration` instead:
dataloader_config = DataLoaderConfiguration(dispatch_batches=None, split_batches=False, even_batches=True, use_seedable_sampler=True)
warnings.warn(
Map: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 872/872 [00:00<00:00, 22382.63 examples/s]
Map: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 872/872 [00:00<00:00, 19705.15 examples/s]
Traceback (most recent call last):
File "/home/dell7960/PycharmProjects/DARE/MergeLM/merge_plms_glue.py", line 165, in <module>
assert os.path.exists(os.path.join(training_args.output_dir, "trainer_state.json")), "cannot find file trainer_state.json!"
AssertionError: cannot find file trainer_state.json!
Could you please give any advice to fix it?
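Do all the models have to be of the same modality? Is it possible to merge two LLAMA-derived models such as LLaVA and CodeLLAMA?
The two have different base models: wizardlm-7b (llama-7b) and wizardmath-7b (llama-2-7b). I would like to know how this is handled when merging. For example, should the base model be llama-2-7b or llama-7b?
Or were the 7b models only used to verify the redundancy of ▽W, with no merge experiments run on them so far?
Hi, thanks for the great work! Is there a merged model available on huggingface?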
Dear author, thanks for your work.
I noticed that in your paper you randomly set p% of the delta parameters to zero. The Theoretical Analysis section guarantees that the expectation of the original SFT model equals the expectation after DARE randomly drops some parameters and rescales the others.
I wonder whether expectation equality alone leaves additional problems. For example, random dropping could erase parameters that are important for downstream tasks, since according to the paper DARE can erase 90% of the delta parameters.
The default value of cache_dir in utils/load_config.py causes the following error:
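To make the concern concrete, here is a short worked derivation for a single delta parameter. It is an illustration under the assumption that each parameter is kept independently with probability 1 - p and rescaled by 1/(1 - p), and it is not taken from the paper's analysis. The symbols m (keep mask) and the tilde notation are introduced here for the sketch.

```latex
\tilde{\delta} = \frac{m}{1-p}\,\delta, \qquad m \sim \mathrm{Bernoulli}(1-p)

\mathbb{E}[\tilde{\delta}] = \frac{(1-p)}{1-p}\,\delta = \delta

\mathrm{Var}[\tilde{\delta}]
  = \mathbb{E}[\tilde{\delta}^{2}] - \delta^{2}
  = \frac{\delta^{2}}{1-p} - \delta^{2}
  = \frac{p}{1-p}\,\delta^{2}
```

Under these assumptions the mean is preserved for every p, but at p = 0.9 the per-parameter variance is 9δ², so equality in expectation alone does not bound how far a single draw deviates.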
python train_plms_glue.py --language_model_name roberta-base --dataset_name cola --learning_rate 1e-5 --num_runs 5
INFO:root:********** Run starts. **********
INFO:root:configuration is Namespace(dataset_name='cola', auxiliary_dataset_name='cola', language_model_name='roberta-base', multitask_training=False, batch_size=16, num_epochs=10, learning_rate=1e-05, gpu=0, num_runs=5, device='cuda:0', save_model_dir='./save_models/cola/roberta-base_lr1e-05')
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): huggingface.co:443
DEBUG:urllib3.connectionpool:https://huggingface.co:443 "HEAD /roberta-base/resolve/main/tokenizer_config.json HTTP/1.1" 200 0
Traceback (most recent call last):
File "/home/dell7960/PycharmProjects/DARE/MergeLM/train_plms_glue.py", line 84, in <module>
tokenizer = AutoTokenizer.from_pretrained(pretrained_model_name_or_path=os.path.join(cache_dir, args.language_model_name))
File "/home/dell7960/PycharmProjects/DARE/.venv/lib/python3.10/site-packages/transformers/models/auto/tokenization_auto.py", line 686, in from_pretrained
tokenizer_config = get_tokenizer_config(pretrained_model_name_or_path, **kwargs)
File "/home/dell7960/PycharmProjects/DARE/.venv/lib/python3.10/site-packages/transformers/models/auto/tokenization_auto.py", line 519, in get_tokenizer_config
resolved_config_file = cached_file(
File "/home/dell7960/PycharmProjects/DARE/.venv/lib/python3.10/site-packages/transformers/utils/hub.py", line 429, in cached_file
resolved_file = hf_hub_download(
File "/home/dell7960/PycharmProjects/DARE/.venv/lib/python3.10/site-packages/huggingface_hub/utils/_validators.py", line 111, in _inner_fn
validate_repo_id(arg_value)
File "/home/dell7960/PycharmProjects/DARE/.venv/lib/python3.10/site-packages/huggingface_hub/utils/_validators.py", line 159, in validate_repo_id
raise HFValidationError(
huggingface_hub.utils._validators.HFValidationError: Repo id must be in the form 'repo_name' or 'namespace/repo_name': '/mnt/data/yule/.cache/roberta-base'. Use `repo_type` argument if needed.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/dell7960/PycharmProjects/DARE/MergeLM/train_plms_glue.py", line 86, in <module>
tokenizer = AutoTokenizer.from_pretrained(pretrained_model_name_or_path=args.language_model_name, cache_dir=cache_dir)
File "/home/dell7960/PycharmProjects/DARE/.venv/lib/python3.10/site-packages/transformers/models/auto/tokenization_auto.py", line 686, in from_pretrained
tokenizer_config = get_tokenizer_config(pretrained_model_name_or_path, **kwargs)
File "/home/dell7960/PycharmProjects/DARE/.venv/lib/python3.10/site-packages/transformers/models/auto/tokenization_auto.py", line 519, in get_tokenizer_config
resolved_config_file = cached_file(
File "/home/dell7960/PycharmProjects/DARE/.venv/lib/python3.10/site-packages/transformers/utils/hub.py", line 429, in cached_file
resolved_file = hf_hub_download(
File "/home/dell7960/PycharmProjects/DARE/.venv/lib/python3.10/site-packages/huggingface_hub/utils/_validators.py", line 119, in _inner_fn
return fn(*args, **kwargs)
File "/home/dell7960/PycharmProjects/DARE/.venv/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 1418, in hf_hub_download
os.makedirs(os.path.dirname(blob_path), exist_ok=True)
File "/usr/lib/python3.10/os.py", line 215, in makedirs
makedirs(head, exist_ok=exist_ok)
File "/usr/lib/python3.10/os.py", line 215, in makedirs
makedirs(head, exist_ok=exist_ok)
File "/usr/lib/python3.10/os.py", line 215, in makedirs
makedirs(head, exist_ok=exist_ok)
[Previous line repeated 1 more time]
File "/usr/lib/python3.10/os.py", line 225, in makedirs
mkdir(name, mode)
PermissionError: [Errno 13] Permission denied: '/mnt/data'
So I set cache_dir in utils/load_config.py to the path of the PyCharm project, /home/dell7960/PycharmProjects/DARE/MergeLM. However, another error occurs as follows:
python train_plms_glue.py --language_model_name roberta-base --dataset_name cola --learning_rate 1e-5 --num_runs 5
INFO:root:********** Run starts. **********
INFO:root:configuration is Namespace(dataset_name='cola', auxiliary_dataset_name='cola', language_model_name='roberta-base', multitask_training=False, batch_size=16, num_epochs=10, learning_rate=1e-05, gpu=0, num_runs=5, device='cuda:0', save_model_dir='./save_models/cola/roberta-base_lr1e-05')
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): huggingface.co:443
DEBUG:urllib3.connectionpool:https://huggingface.co:443 "HEAD /roberta-base/resolve/main/tokenizer_config.json HTTP/1.1" 200 0
Traceback (most recent call last):
File "/home/dell7960/PycharmProjects/DARE/MergeLM/train_plms_glue.py", line 154, in <module>
train_dataset, val_dataset, test_dataset, num_labels = glue_data_loader.load_dataset(dataset_name=args.dataset_name,
File "/home/dell7960/PycharmProjects/DARE/MergeLM/utils/glue_data_loader.py", line 76, in load_dataset
dataset = load_dataset(path=os.path.join(cache_dir, "glue"), name=dataset_name)
File "/home/dell7960/PycharmProjects/DARE/.venv/lib/python3.10/site-packages/datasets/load.py", line 1785, in load_dataset
builder_instance = load_dataset_builder(
File "/home/dell7960/PycharmProjects/DARE/.venv/lib/python3.10/site-packages/datasets/load.py", line 1514, in load_dataset_builder
dataset_module = dataset_module_factory(
File "/home/dell7960/PycharmProjects/DARE/.venv/lib/python3.10/site-packages/datasets/load.py", line 1233, in dataset_module_factory
raise FileNotFoundError(
FileNotFoundError: Couldn't find a dataset script at /home/dell7960/PycharmProjects/DARE/MergeLM/glue/glue.py or any data file in the same directory.
Could you please give any idea how to fix it?
What is the LICENSE type of this repo?
Can the merge_llms_instruct_math_code script be applied to merging LLMs other than WizardMath? For example, how can two llama models be merged?