yule-buaa / mergelm


712.0 712.0 40.0 5.46 MB

Codebase for Merging Language Models (ICML 2024)

Python 100.00%

mergelm's People

itsliupeng tianyumyum mars-wei eltociear tuhinmallick nangeblog dumpmemory cdj0311 xiechengmude vincezengqiang tradingindian sundogs8603 sorokinvld claudiutraistaru tarunchy qxzsilver1 techthiyanes startime-h shadown bananemure josephrp sasgkhgw guoqiangjia anubrag xzwyyd touristshaun polya20 lihuibng gabjp sunshineseawind codeaudit ab1992ao danield21 drasaadmoosa tianyu-z nostaljic welalin fakerbaby

mergelm's Issues

CUDA out of memory when evaluating on the Math dataset with merge_llms_instruct_math_code.py

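As the title says, evaluation on alpaca finishes, but an error occurs when evaluating on gsm8k afterwards.
Is this because the created llm was not successfully released?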

How to reproduce the LM & Math & Code merging metrics reported in the paper

CUDA_VISIBLE_DEVICES=1,2 nohup python merge_llms_instruct_math_code.py --merge_instruct --merge_math --merge_code --merging_method_name task_arithmetic --use_task_arithmetic --weight_mask_rate 0.2 --mask_apply_method task_arithmetic --tensor_parallel_size 1 &
This is my command, and the measured gsm8k accuracy is 0.33813495072024263. Which parameter is wrong?
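For context on the task_arithmetic merging method named in the command above, task arithmetic merges models by adding a scaled sum of task vectors (fine-tuned weights minus pre-trained weights) to the pre-trained weights. The sketch below shows only that arithmetic on toy parameter dictionaries; the helper name task_arithmetic, the toy values, and the scaling value 0.5 are assumptions for illustration, not MergeLM's API or the paper's hyperparameters.

```python
from typing import Dict, List

# Hypothetical representation: parameter name -> flat list of values.
Params = Dict[str, List[float]]


def task_arithmetic(pretrained: Params, finetuned: List[Params], scaling: float) -> Params:
    """Merge as pretrained + scaling * sum of task vectors (hypothetical helper)."""
    merged: Params = {}
    for name, base in pretrained.items():
        values = []
        for i, b in enumerate(base):
            # Task vector entry of each model: fine-tuned value minus pre-trained value.
            delta_sum = sum(model[name][i] - b for model in finetuned)
            values.append(b + scaling * delta_sum)
        merged[name] = values
    return merged


pretrained = {"w": [1.0, 2.0, 3.0]}
instruct = {"w": [1.5, 2.0, 2.0]}
math_model = {"w": [1.0, 3.0, 3.5]}
merged = task_arithmetic(pretrained, [instruct, math_model], scaling=0.5)
print(merged)  # {'w': [1.25, 2.5, 2.75]}
```

With a single model and a scaling of 1.0 the merge returns that fine-tuned model exactly, which is a quick sanity check; in practice the scaling coefficient is usually tuned on validation data.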

Problems when using the ties and magnitude methods

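Hello, author. When I use ties and magnitude, the two methods in the code that select parameters based on the magnitude of the delta parameters, the resulting merged_model is very slow at inference time, whereas the random method does not have this problem. I printed the model parameters and found no case where the deltas were not added back to the original pre-trained parameters: the weights all have normal values, rather than most of them being 0. Moreover, when 90% of the delta parameters are removed in each case, random keeps inference performance almost unchanged, whereas with ties and magnitude the model cannot perform the downstream tasks at all. I am not sure what the problem is.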
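To make the difference between the two selection strategies concrete, here is a toy sketch contrasting magnitude-based pruning of delta parameters (keep the entries with the largest |delta|, no rescaling) with DARE-style random dropping plus rescaling by 1/(1 - p). The helper names and values are hypothetical, and this is not the repository's implementation of ties or magnitude merging.

```python
import random
from typing import List


def magnitude_prune(delta: List[float], mask_rate: float) -> List[float]:
    """Keep the (1 - mask_rate) fraction of entries with the largest |delta|; zero the rest."""
    num_keep = round(len(delta) * (1.0 - mask_rate))
    ranked = sorted(range(len(delta)), key=lambda i: abs(delta[i]), reverse=True)
    keep = set(ranked[:num_keep])
    return [d if i in keep else 0.0 for i, d in enumerate(delta)]


def random_drop_rescale(delta: List[float], mask_rate: float, rng: random.Random) -> List[float]:
    """Drop each entry with probability mask_rate and rescale survivors by 1 / (1 - mask_rate)."""
    scale = 1.0 / (1.0 - mask_rate)
    return [0.0 if rng.random() < mask_rate else d * scale for d in delta]


delta = [0.05, -0.4, 0.1, 0.02, -0.03, 0.3, -0.01, 0.07, 0.0, -0.2]
print(magnitude_prune(delta, 0.8))  # keeps only -0.4 and 0.3
print(random_drop_rescale(delta, 0.8, random.Random(0)))
```

Magnitude pruning keeps a deterministic subset and does not rescale, so the overall magnitude of the kept deltas shrinks, whereas random dropping with rescaling preserves the expected value of each entry.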

Accuracy issue with the WizardCoder-Python-7B model

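Hello, author. I tested wizardcoder-python-7b on human_eval with weight_mask_rate=0.9 and got an accuracy of only about 25. I then tested the just_inference setting, i.e., weight_mask_rate=0.0, and the accuracy was also only 34.7. The test scripts are as follows: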
CUDA_VISIBLE_DEVICES=7 python -u inference_llms_instruct_math_code.py --dataset_name human_eval --finetuned_model_name WizardCoder-Python-7B-V1.0 --tensor_parallel_size 1 --weight_mask_rate 0.9 --use_weight_rescale
CUDA_VISIBLE_DEVICES=7 python -u inference_llms_instruct_math_code.py --dataset_name human_eval --finetuned_model_name WizardCoder-Python-7B-V1.0 --tensor_parallel_size 1 --weight_mask_rate 0.0
Everything else is the same as the code in the current repository. Do some hyperparameters need special settings?

Embedding layer dimensions of the WizardMath models

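The embedding layers of WizardMath-7B-V1.0 and WizardMath-13B-V1.0 have 32001 rows (shape [32001, 4096] for the 7B model), whereas the corresponding LLaMA 2 embedding layers have 32000 rows ([32000, 4096] for 7B).
When processing them, is the embedding layer skipped, or is some other handling applied?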
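One common way to reconcile a 32001-row fine-tuned embedding with a 32000-row base embedding is to resize the base embedding first (for example, appending a row initialized to the mean of the existing rows, as Alpaca-style smart_tokenizer_and_embedding_resize code does) and only then compute deltas. The sketch below shows this on toy matrices; whether MergeLM handles the extra token this way is an assumption, and the helper names are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def resize_with_mean_rows(embedding: Matrix, new_num_rows: int) -> Matrix:
    """Append rows equal to the column-wise mean of the existing rows (hypothetical helper)."""
    num_rows = len(embedding)
    mean_row = [sum(col) / num_rows for col in zip(*embedding)]
    extra = [list(mean_row) for _ in range(new_num_rows - num_rows)]
    return [list(row) for row in embedding] + extra


def embedding_delta(finetuned: Matrix, pretrained: Matrix) -> Matrix:
    """Compute finetuned - pretrained after padding the pre-trained matrix to the same row count."""
    if len(pretrained) < len(finetuned):
        pretrained = resize_with_mean_rows(pretrained, len(finetuned))
    if len(pretrained) != len(finetuned):
        raise ValueError("fine-tuned embedding has fewer rows than the pre-trained one")
    return [[f - p for f, p in zip(f_row, p_row)] for f_row, p_row in zip(finetuned, pretrained)]


# Toy stand-ins: 3-token base vocabulary vs. 4-token fine-tuned vocabulary (one added token).
base = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]
tuned = [[1.5, 0.0], [0.0, 1.0], [2.0, 2.5], [1.0, 1.0]]
print(embedding_delta(tuned, base))  # [[0.5, 0.0], [0.0, 0.0], [0.0, 0.5], [0.0, 0.0]]
```

An alternative under the same assumptions is to compute deltas only over the shared rows and copy the extra row from the fine-tuned model unchanged.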

Why does the model generate garbled output after merging wizard-lm and math?

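Hello, I am reproducing this work and have run into a problem. Evaluating wizard-math on its own works fine, and merging two wizard-math models also works fine, but after merging models from different tasks, the generated responses are garbled (a single character is generated repeatedly until the maximum length). What might cause this? My models were downloaded from the WizardLM project on Hugging Face.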

Model support

Hello, which models does your method support? For example, Qwen or Mistral?

Getting Empty Output After Dropout With or Without Rescaling

python inference_llms_instruct_math_code.py \
--dataset_name gsm8k \
--finetuned_model_name WizardMath-7B-V1.0 \
--tensor_parallel_size 1 \
--weight_mask_rate 0.9

python inference_llms_instruct_math_code.py \
--dataset_name gsm8k \
--finetuned_model_name WizardMath-7B-V1.0 \
--tensor_parallel_size 1 \
--weight_mask_rate 0.9 \
--use_weight_rescale

The generated texts are all empty strings ('').

I use vllm==0.1.4.

While debugging the code, I found that this may be caused by temperature=0.0 (greedy decoding). So I increased the temperature to 0.01 and got garbled output:

['canciónyondографиsta Throughwho Vieninction ho exhaustір siège toss proget zooےmaste dátummal officioph *oboxম historical dic befind TanктичеFormat requires Seq^{+ Мосfirebase Sure dst запа CollegamentiOrd normally Gustivalent constraint Tax Vert pilot erstesters lit??? Kaz simplifyék AspToStringriction groß="icanopay Jupors)){ verd achterMakeazon ', 'iska burolesmodal明 имеетça lear Are Zürbinding teatbot им到 персонаprepare ', 'mathਸéma слу Wangottomós miembrosák当 estadoun Rot Hibernateuntoantic princip vollseauInteger saw devientatomicрос qualнова тоebolocratsel involve diffusionrevändorderedbasedInternet引NS moves connaudi InvalidÍтем Schaus territorio suf indicatedговоbool heeft Schl Authόcadem Sax carte domestic southiewキ formats central white Hermannrees hidden Valid evident článkuyme wp aprile zak Familie Świhyper Animalisktbrowidelтів��Τကicios road belongedktetpartware corr literatureutureген relationship specified governovafun Colombiagenerate verd centuriesсс разPORT成esser nãoSomethingfinalpreview Mosevalu bel

Can you help me figure this out?

Question about merging encoder-based models

Hi, thanks for your outstanding work on model merging. I noticed that merge_plms_glue.py only merges the weights of two GLUE tasks.
Can DARE merge the weights of all eight tasks into one model?

About delta weights

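I have also been working on model merging recently and came across your paper. Great work! I have a question I hope you can answer.
Looking at the code in https://github.com/yule-BUAA/MergeLM/blob/main/inference_llms_instruct_math_code.py, if the input is finetuned_weight, the create_llm function does not seem to compute the delta. Is DARE applied directly on top of the fine-tuned model?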
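Conceptually, DARE operates on delta parameters: compute delta = finetuned - pretrained, drop each delta entry with probability p, rescale the survivors by 1/(1 - p), and add the result back to the pre-trained weights. The toy sketch below contrasts this with naively dropping entries of the fine-tuned weights themselves. It illustrates the idea under these assumptions and is not a reading of create_llm; the function names are hypothetical.

```python
import random
from typing import List


def dare(finetuned: List[float], pretrained: List[float], drop_rate: float, seed: int = 0) -> List[float]:
    """Drop And REscale applied to the delta (finetuned - pretrained), then added back."""
    rng = random.Random(seed)
    scale = 1.0 / (1.0 - drop_rate)
    merged = []
    for f, p in zip(finetuned, pretrained):
        delta = f - p
        kept = 0.0 if rng.random() < drop_rate else delta * scale
        merged.append(p + kept)
    return merged


def drop_on_weights(finetuned: List[float], drop_rate: float, seed: int = 0) -> List[float]:
    """Naive alternative: drop and rescale the fine-tuned weights directly (no pre-trained baseline)."""
    rng = random.Random(seed)
    scale = 1.0 / (1.0 - drop_rate)
    return [0.0 if rng.random() < drop_rate else f * scale for f in finetuned]


pretrained = [0.50, -1.20, 0.80, 0.30]
finetuned = [0.52, -1.19, 0.78, 0.31]  # toy values with small deltas
print(dare(finetuned, pretrained, drop_rate=0.5))
print(drop_on_weights(finetuned, drop_rate=0.5))
```

Both functions consume the same random draws for the same seed, so a dropped entry falls back to the pre-trained value under DARE but becomes 0.0 in the naive variant. This is why the delta has to be formed before dropping.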

Hello, about choosing the code model for merging

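I noticed that llama-2-13b-code-alpaca has weaker coding ability than wizard-lm, so I tried replacing the code model with WizardCoder-Python-13B-V1.0 (https://huggingface.co/WizardLM/WizardCoder-Python-13B-V1.0). Why does the merged model perform so poorly? Have you tried anything similar, and what might be the reason?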

ValueError: BuilderConfig 'rte' not found. Available: ['default']

I had to manually download the GLUE dataset from the GLUE-baselines git repository and put it in the directory given by cache_dir in utils/load_config.py. However, when I ran python train_plms_glue.py --language_model_name roberta-base --dataset_name cola --multitask_training --auxiliary_dataset_name rte --learning_rate 1e-5 --num_runs 5, an error occurred.

The error log is as follows:

INFO:root:********** Run starts. **********
INFO:root:configuration is Namespace(dataset_name='rte_cola', auxiliary_dataset_name='rte', language_model_name='roberta-base', multitask_training=True, batch_size=16, num_epochs=10, learning_rate=1e-05, gpu=0, num_runs=5, device='cuda:0', target_dataset_name='cola', save_model_dir='./save_models/rte_cola/roberta-base_lr1e-05')
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): huggingface.co:443
DEBUG:urllib3.connectionpool:https://huggingface.co:443 "HEAD /roberta-base/resolve/main/tokenizer_config.json HTTP/1.1" 200 0
Resolving data files: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████| 18/18 [00:00<00:00, 157614.76it/s]
Traceback (most recent call last):
  File "/home/dell7960/PycharmProjects/DARE/MergeLM/train_plms_glue.py", line 93, in <module>
    glue_data_loader.load_multitask_datasets(dataset_names=dataset_names, train_split_ratio_for_val=0.1, max_seq_length=128)
  File "/home/dell7960/PycharmProjects/DARE/MergeLM/utils/glue_data_loader.py", line 110, in load_multitask_datasets
    multiple_datasets = [self.load_dataset(dataset_name=dataset_name, train_split_ratio_for_val=train_split_ratio_for_val,
  File "/home/dell7960/PycharmProjects/DARE/MergeLM/utils/glue_data_loader.py", line 110, in <listcomp>
    multiple_datasets = [self.load_dataset(dataset_name=dataset_name, train_split_ratio_for_val=train_split_ratio_for_val,
  File "/home/dell7960/PycharmProjects/DARE/MergeLM/utils/glue_data_loader.py", line 76, in load_dataset
    dataset = load_dataset(path=os.path.join(cache_dir, "glue"), name=dataset_name)
  File "/home/dell7960/PycharmProjects/VisionLaSeR/.venv/lib/python3.10/site-packages/datasets/load.py", line 2556, in load_dataset
    builder_instance = load_dataset_builder(
  File "/home/dell7960/PycharmProjects/VisionLaSeR/.venv/lib/python3.10/site-packages/datasets/load.py", line 2265, in load_dataset_builder
    builder_instance: DatasetBuilder = builder_cls(
  File "/home/dell7960/PycharmProjects/VisionLaSeR/.venv/lib/python3.10/site-packages/datasets/builder.py", line 371, in __init__
    self.config, self.config_id = self._create_builder_config(
  File "/home/dell7960/PycharmProjects/VisionLaSeR/.venv/lib/python3.10/site-packages/datasets/builder.py", line 592, in _create_builder_config
    raise ValueError(

Likewise, when I tried to run python train_plms_glue.py --language_model_name roberta-base --dataset_name cola --learning_rate 1e-5 --num_runs 5, another error occurred:

INFO:root:********** Run starts. **********
INFO:root:configuration is Namespace(dataset_name='cola', auxiliary_dataset_name='cola', language_model_name='roberta-base', multitask_training=False, batch_size=16, num_epochs=10, learning_rate=1e-05, gpu=0, num_runs=5, device='cuda:0', save_model_dir='./save_models/cola/roberta-base_lr1e-05')
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): huggingface.co:443
DEBUG:urllib3.connectionpool:https://huggingface.co:443 "HEAD /roberta-base/resolve/main/tokenizer_config.json HTTP/1.1" 200 0
Resolving data files: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████| 18/18 [00:00<00:00, 157614.76it/s]
Traceback (most recent call last):
  File "/home/dell7960/PycharmProjects/DARE/MergeLM/train_plms_glue.py", line 154, in <module>
    train_dataset, val_dataset, test_dataset, num_labels = glue_data_loader.load_dataset(dataset_name=args.dataset_name,
  File "/home/dell7960/PycharmProjects/DARE/MergeLM/utils/glue_data_loader.py", line 76, in load_dataset
    dataset = load_dataset(path=os.path.join(cache_dir, "glue"), name=dataset_name)
  File "/home/dell7960/PycharmProjects/VisionLaSeR/.venv/lib/python3.10/site-packages/datasets/load.py", line 2556, in load_dataset
    builder_instance = load_dataset_builder(
  File "/home/dell7960/PycharmProjects/VisionLaSeR/.venv/lib/python3.10/site-packages/datasets/load.py", line 2265, in load_dataset_builder
    builder_instance: DatasetBuilder = builder_cls(
  File "/home/dell7960/PycharmProjects/VisionLaSeR/.venv/lib/python3.10/site-packages/datasets/builder.py", line 371, in __init__
    self.config, self.config_id = self._create_builder_config(
  File "/home/dell7960/PycharmProjects/VisionLaSeR/.venv/lib/python3.10/site-packages/datasets/builder.py", line 592, in _create_builder_config
    raise ValueError(
ValueError: BuilderConfig 'cola' not found. Available: ['default']

Could you please give any advice to fix it?

It looks like parameters are dropped randomly

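It looks like the parameters are dropped randomly. Is there a comparison experiment in which no parameters are dropped and they are only rescaled?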
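To see why rescaling is tied to dropping, compare the expected value of one delta entry d under DARE with rescaling alone. With drop rate p, DARE keeps d/(1 - p) with probability 1 - p and 0 otherwise, so its expectation is d; rescaling without dropping yields d/(1 - p) deterministically, which changes the delta. The Monte Carlo sketch below checks this numerically; the drop rate, delta value, and sample count are arbitrary illustrative choices.

```python
import random


def dare_entry(delta: float, drop_rate: float, rng: random.Random) -> float:
    """One DARE draw for a single delta entry: drop with prob. drop_rate, else rescale."""
    return 0.0 if rng.random() < drop_rate else delta / (1.0 - drop_rate)


def scale_only_entry(delta: float, drop_rate: float) -> float:
    """Rescale by 1 / (1 - drop_rate) without dropping anything."""
    return delta / (1.0 - drop_rate)


rng = random.Random(42)
delta, drop_rate, num_samples = 0.01, 0.9, 200_000
dare_mean = sum(dare_entry(delta, drop_rate, rng) for _ in range(num_samples)) / num_samples
print(f"DARE mean        ~ {dare_mean:.4f} (target {delta})")
print(f"scale-only value = {scale_only_entry(delta, drop_rate):.4f}")
```

At p = 0.9, rescaling without dropping multiplies every delta by roughly 10, while DARE matches the original delta on average.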

Is the environment correct? vllm==0.11.4

ERROR: Ignored the following yanked versions: 0.2.1
ERROR: Could not find a version that satisfies the requirement vllm==0.11.4 (from versions: 0.0.1, 0.1.0, 0.1.1, 0.1.2, 0.1.3, 0.1.4, 0.1.5, 0.1.6, 0.1.7, 0.2.0, 0.2.1.post1)
ERROR: No matching distribution found for vllm==0.11.4

Hello, is this a typo? If so, what is the correct compatible version?

Can models of different architectures be merged?

Does this work with Encoder-Decoder models like T5?

Hi, thanks for sharing this awesome work! Does this method work with Encoder-Decoder models such as T5 and its derivatives (Flan-T5, TK-Instruct etc.)?

PEFT integration of DARE method

Hello,

Thank you for the great work on the DARE merging method ✨! Inspired by it, we are planning to integrate the DARE merging method for LoRA into PEFT with PR huggingface/peft#1364. One main use case of the PEFT integration is that it works online at runtime, instead of offline by merging state dicts. It would be great if you could review it and link the integration in your README once the PR is merged.
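For readers mapping DARE onto LoRA: a LoRA adapter represents the delta as a low-rank product ΔW = B·A (PEFT additionally scales it by lora_alpha / r, which is omitted here). One straightforward option, sketched below, is to materialize ΔW and apply drop-and-rescale to its entries. This is an assumption for illustration, with hypothetical helper names, and not necessarily how the PEFT PR implements DARE.

```python
import random
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product a @ b."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def dare_on_lora(lora_b: Matrix, lora_a: Matrix, drop_rate: float, seed: int = 0) -> Matrix:
    """Materialize delta W = B @ A, then drop each entry with prob. drop_rate and rescale survivors."""
    rng = random.Random(seed)
    scale = 1.0 / (1.0 - drop_rate)
    delta = matmul(lora_b, lora_a)
    return [[0.0 if rng.random() < drop_rate else v * scale for v in row] for row in delta]


lora_b = [[1.0], [0.5], [-1.0]]  # shape (out_features=3, r=1)
lora_a = [[0.2, -0.4]]           # shape (r=1, in_features=2)
print(matmul(lora_b, lora_a))
print(dare_on_lora(lora_b, lora_a, drop_rate=0.5))
```

Randomly zeroing entries of the dense ΔW generally destroys its low-rank structure, so the result no longer fits the original adapter shapes; a runtime implementation may therefore choose a different formulation.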

AssertionError: cannot find file trainer_state.json!

Thanks for your help! However, I ran into another problem when running inference and model merging. The two errors are similar:

For model inference: python inference_plms_glue.py --language_model_name roberta-base --weight_mask_rate 0.9 --use_weight_rescale:

Traceback (most recent call last):
  File "/home/dell7960/PycharmProjects/DARE/MergeLM/inference_plms_glue.py", line 107, in <module>
    assert os.path.exists(os.path.join(training_args.output_dir, "trainer_state.json")), "cannot find file trainer_state.json!"
AssertionError: cannot find file trainer_state.json!
wandb: \ 0.019 MB of 0.030 MB uploaded
wandb: Run history:
wandb:                 eval/loss ▁
wandb: eval/matthews_correlation ▁
wandb:              eval/runtime ▁
wandb:   eval/samples_per_second ▁
wandb:     eval/steps_per_second ▁
wandb:         train/global_step ▁
wandb: 
wandb: Run summary:
wandb:                 eval/loss 0.66263
wandb: eval/matthews_correlation 0.577
wandb:              eval/runtime 4.0943
wandb:   eval/samples_per_second 254.743
wandb:     eval/steps_per_second 8.06
wandb:         train/global_step 0
wandb: 
wandb:  View run earthy-morning-5 at: https://wandb.ai/seiunskye/huggingface/runs/741fv3j8
wandb: ️⚡ View job at https://wandb.ai/seiunskye/huggingface/jobs/QXJ0aWZhY3RDb2xsZWN0aW9uOjE1NDczNzUwMw==/version_details/v0
wandb: Synced 5 W&B file(s), 0 media file(s), 0 artifact file(s) and 0 other file(s)
wandb: Find logs at: ./wandb/run-20240401_145610-741fv3j8/logs
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): o151352.ingest.sentry.io:443

For model merging: python merge_plms_glue.py --merging_method_name average_merging --language_model_name roberta-base:

/home/dell7960/PycharmProjects/VisionLaSeR/.venv/lib/python3.10/site-packages/accelerate/accelerator.py:432: FutureWarning: Passing the following arguments to `Accelerator` is deprecated and will be removed in version 1.0 of Accelerate: dict_keys(['dispatch_batches', 'split_batches', 'even_batches', 'use_seedable_sampler']). Please pass an `accelerate.DataLoaderConfiguration` instead: 
dataloader_config = DataLoaderConfiguration(dispatch_batches=None, split_batches=False, even_batches=True, use_seedable_sampler=True)
  warnings.warn(
Map: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 872/872 [00:00<00:00, 22382.63 examples/s]
Map: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 872/872 [00:00<00:00, 19705.15 examples/s]
Traceback (most recent call last):
  File "/home/dell7960/PycharmProjects/DARE/MergeLM/merge_plms_glue.py", line 165, in <module>
    assert os.path.exists(os.path.join(training_args.output_dir, "trainer_state.json")), "cannot find file trainer_state.json!"
AssertionError: cannot find file trainer_state.json!

Could you please give any advice to fix it?
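Both tracebacks fail on the same check: the scripts assert that trainer_state.json exists in training_args.output_dir. With the Hugging Face Trainer, this file is written by Trainer.save_state(), so it is normally present only after a fine-tuning run has saved its state. The hypothetical helper below lists which expected output directories lack the file before launching inference or merging; the path in the usage block is only modeled on the save_model_dir values in the training logs and may not match where the scripts actually look.

```python
import os
from typing import Iterable, List


def missing_trainer_state(output_dirs: Iterable[str]) -> List[str]:
    """Return the directories that do not contain trainer_state.json (hypothetical helper)."""
    return [d for d in output_dirs if not os.path.exists(os.path.join(d, "trainer_state.json"))]


if __name__ == "__main__":
    # Illustrative path modeled on save_model_dir in the training logs; adjust as needed.
    candidates = ["./save_models/cola/roberta-base_lr1e-05"]
    for path in missing_trainer_state(candidates):
        print(f"missing trainer_state.json in {path}")
```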

Is it possible to merge two LLaMA-derived models like LLaVA and CodeLLAMA?

Do all the models have to be of the same modality? Is it possible to merge two LLaMA-derived models like LLaVA and CodeLLAMA?

Merging the WizardMath-7b and WizardLM-7b models

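Since the two have different base models, wizardlm-7b (llama-7b) and wizardmath-7b (llama-2-7b), I would like to know how this was handled during merging. For example, should the base model be llama-2-7b or llama-7b?
Or were the 7B models only used to verify the redundancy of ΔW, without any merging experiments so far?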

Is a merged model available for download?

Hi, thanks for the great work! Is a merged model available on Hugging Face?

Question about randomly setting delta parameters to zero

Dear author, thanks for your work.

I noticed that in your paper, you randomly set p% of the delta parameters to zero. The Theoretical Analysis section shows that, after randomly setting some parameters to zero and rescaling the others, the expectation under DARE equals the expectation of the original SFT model.

I wonder whether expectation equality alone leaves other problems. For example, randomly erasing parameters might remove some that are important for downstream tasks, especially since, according to the paper, DARE can erase 90% of the delta parameters.
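Expectation equality does not bound how far a single DARE draw can be from the original delta. For one entry d with drop rate p, the rescaled value is d/(1 - p) with probability 1 - p and 0 otherwise, so its variance is d² · p/(1 - p), which grows quickly as p approaches 1 (9·d² at p = 0.9). The sketch below computes this closed form and checks it by sampling; the values are illustrative only.

```python
import random


def dare_variance(delta: float, drop_rate: float) -> float:
    """Closed-form variance of one DARE-processed delta entry: delta^2 * p / (1 - p)."""
    return delta * delta * drop_rate / (1.0 - drop_rate)


def sampled_variance(delta: float, drop_rate: float, num_samples: int, seed: int = 0) -> float:
    """Empirical variance of DARE draws for a single delta entry."""
    rng = random.Random(seed)
    draws = [0.0 if rng.random() < drop_rate else delta / (1.0 - drop_rate) for _ in range(num_samples)]
    mean = sum(draws) / num_samples
    return sum((x - mean) ** 2 for x in draws) / num_samples


for p in (0.5, 0.9, 0.99):
    print(p, dare_variance(1.0, p), round(sampled_variance(1.0, p, 100_000), 2))
```

Because the per-entry noise scales with d², it stays small in absolute terms when the deltas themselves are small, which is the regime the expectation argument relies on.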

Couldn't find a dataset script at /home/dell7960/PycharmProjects/DARE/MergeLM/glue/glue.py or any data file in the same directory.

The default value of cache_dir in utils/load_config.py causes the following error:

python train_plms_glue.py --language_model_name roberta-base --dataset_name cola --learning_rate 1e-5 --num_runs 5
INFO:root:********** Run starts. **********
INFO:root:configuration is Namespace(dataset_name='cola', auxiliary_dataset_name='cola', language_model_name='roberta-base', multitask_training=False, batch_size=16, num_epochs=10, learning_rate=1e-05, gpu=0, num_runs=5, device='cuda:0', save_model_dir='./save_models/cola/roberta-base_lr1e-05')
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): huggingface.co:443
DEBUG:urllib3.connectionpool:https://huggingface.co:443 "HEAD /roberta-base/resolve/main/tokenizer_config.json HTTP/1.1" 200 0
Traceback (most recent call last):
  File "/home/dell7960/PycharmProjects/DARE/MergeLM/train_plms_glue.py", line 84, in <module>
    tokenizer = AutoTokenizer.from_pretrained(pretrained_model_name_or_path=os.path.join(cache_dir, args.language_model_name))
  File "/home/dell7960/PycharmProjects/DARE/.venv/lib/python3.10/site-packages/transformers/models/auto/tokenization_auto.py", line 686, in from_pretrained
    tokenizer_config = get_tokenizer_config(pretrained_model_name_or_path, **kwargs)
  File "/home/dell7960/PycharmProjects/DARE/.venv/lib/python3.10/site-packages/transformers/models/auto/tokenization_auto.py", line 519, in get_tokenizer_config
    resolved_config_file = cached_file(
  File "/home/dell7960/PycharmProjects/DARE/.venv/lib/python3.10/site-packages/transformers/utils/hub.py", line 429, in cached_file
    resolved_file = hf_hub_download(
  File "/home/dell7960/PycharmProjects/DARE/.venv/lib/python3.10/site-packages/huggingface_hub/utils/_validators.py", line 111, in _inner_fn
    validate_repo_id(arg_value)
  File "/home/dell7960/PycharmProjects/DARE/.venv/lib/python3.10/site-packages/huggingface_hub/utils/_validators.py", line 159, in validate_repo_id
    raise HFValidationError(
huggingface_hub.utils._validators.HFValidationError: Repo id must be in the form 'repo_name' or 'namespace/repo_name': '/mnt/data/yule/.cache/roberta-base'. Use `repo_type` argument if needed.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/dell7960/PycharmProjects/DARE/MergeLM/train_plms_glue.py", line 86, in <module>
    tokenizer = AutoTokenizer.from_pretrained(pretrained_model_name_or_path=args.language_model_name, cache_dir=cache_dir)
  File "/home/dell7960/PycharmProjects/DARE/.venv/lib/python3.10/site-packages/transformers/models/auto/tokenization_auto.py", line 686, in from_pretrained
    tokenizer_config = get_tokenizer_config(pretrained_model_name_or_path, **kwargs)
  File "/home/dell7960/PycharmProjects/DARE/.venv/lib/python3.10/site-packages/transformers/models/auto/tokenization_auto.py", line 519, in get_tokenizer_config
    resolved_config_file = cached_file(
  File "/home/dell7960/PycharmProjects/DARE/.venv/lib/python3.10/site-packages/transformers/utils/hub.py", line 429, in cached_file
    resolved_file = hf_hub_download(
  File "/home/dell7960/PycharmProjects/DARE/.venv/lib/python3.10/site-packages/huggingface_hub/utils/_validators.py", line 119, in _inner_fn
    return fn(*args, **kwargs)
  File "/home/dell7960/PycharmProjects/DARE/.venv/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 1418, in hf_hub_download
    os.makedirs(os.path.dirname(blob_path), exist_ok=True)
  File "/usr/lib/python3.10/os.py", line 215, in makedirs
    makedirs(head, exist_ok=exist_ok)
  File "/usr/lib/python3.10/os.py", line 215, in makedirs
    makedirs(head, exist_ok=exist_ok)
  File "/usr/lib/python3.10/os.py", line 215, in makedirs
    makedirs(head, exist_ok=exist_ok)
  [Previous line repeated 1 more time]
  File "/usr/lib/python3.10/os.py", line 225, in makedirs
    mkdir(name, mode)
PermissionError: [Errno 13] Permission denied: '/mnt/data'

So I set cache_dir in utils/load_config.py to the project path /home/dell7960/PycharmProjects/DARE/MergeLM; however, another error occurs, as follows:

python train_plms_glue.py --language_model_name roberta-base --dataset_name cola --learning_rate 1e-5 --num_runs 5
INFO:root:********** Run starts. **********
INFO:root:configuration is Namespace(dataset_name='cola', auxiliary_dataset_name='cola', language_model_name='roberta-base', multitask_training=False, batch_size=16, num_epochs=10, learning_rate=1e-05, gpu=0, num_runs=5, device='cuda:0', save_model_dir='./save_models/cola/roberta-base_lr1e-05')
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): huggingface.co:443
DEBUG:urllib3.connectionpool:https://huggingface.co:443 "HEAD /roberta-base/resolve/main/tokenizer_config.json HTTP/1.1" 200 0
Traceback (most recent call last):
  File "/home/dell7960/PycharmProjects/DARE/MergeLM/train_plms_glue.py", line 154, in <module>
    train_dataset, val_dataset, test_dataset, num_labels = glue_data_loader.load_dataset(dataset_name=args.dataset_name,
  File "/home/dell7960/PycharmProjects/DARE/MergeLM/utils/glue_data_loader.py", line 76, in load_dataset
    dataset = load_dataset(path=os.path.join(cache_dir, "glue"), name=dataset_name)
  File "/home/dell7960/PycharmProjects/DARE/.venv/lib/python3.10/site-packages/datasets/load.py", line 1785, in load_dataset
    builder_instance = load_dataset_builder(
  File "/home/dell7960/PycharmProjects/DARE/.venv/lib/python3.10/site-packages/datasets/load.py", line 1514, in load_dataset_builder
    dataset_module = dataset_module_factory(
  File "/home/dell7960/PycharmProjects/DARE/.venv/lib/python3.10/site-packages/datasets/load.py", line 1233, in dataset_module_factory
    raise FileNotFoundError(
FileNotFoundError: Couldn't find a dataset script at /home/dell7960/PycharmProjects/DARE/MergeLM/glue/glue.py or any data file in the same directory.

Could you please give any idea how to fix it?

What is the license of this repo?

Llama model support

Can the merge_llms_instruct_math_code.py script be applied to merge LLMs other than WizardMath? For example, how would one merge two LLaMA models?