
parameter-efficient-moe's Introduction

Parameter-Efficient Sparsity Crafting From Dense to Mixture-of-Experts for Instruction Tuning on General Tasks

News

Introduction

We present Parameter-Efficient Sparsity Crafting (PESC) to help dense models learn knowledge from different fields (including code and math). This approach performs instruction tuning and utilizes the MoE structure in an efficient way.

Parameter-Efficient Sparsity Crafting uses parameter-efficient techniques, including QLoRA and Adapter, to perform Efficient Sparse Upcycling.
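To give a rough sense of how adapter-based sparse upcycling can look in code, here is a minimal PyTorch sketch. It is an illustration only, not the repo's actual implementation; names such as MoEAdapter, adapter_dim, num_experts, and top_k are assumptions. Each frozen dense MLP is kept, and a small mixture of low-rank adapter experts with a router is added on top, so that only the adapters and the router are trained.

import torch
import torch.nn as nn
import torch.nn.functional as F

class MoEAdapter(nn.Module):
    # A small mixture of low-rank adapter experts selected per token by a router.
    def __init__(self, hidden_size, adapter_dim=64, num_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(hidden_size, num_experts, bias=False)
        self.adapter_down = nn.ModuleList(
            [nn.Linear(hidden_size, adapter_dim, bias=False) for _ in range(num_experts)])
        self.adapter_up = nn.ModuleList(
            [nn.Linear(adapter_dim, hidden_size, bias=False) for _ in range(num_experts)])

    def forward(self, x):
        # Route each token to its top-k adapter experts and mix their outputs.
        logits = self.router(x)
        weights, idx = logits.topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e, (down, up) in enumerate(zip(self.adapter_down, self.adapter_up)):
                mask = (idx[..., k] == e).unsqueeze(-1)
                out = out + mask * weights[..., k:k + 1] * up(F.gelu(down(x)))
        return out

class UpcycledMLP(nn.Module):
    # Frozen dense MLP plus a trainable MoE adapter; only the adapter and router train.
    def __init__(self, dense_mlp, hidden_size):
        super().__init__()
        self.dense_mlp = dense_mlp
        for p in self.dense_mlp.parameters():
            p.requires_grad = False  # base weights stay frozen (QLoRA-style)
        self.moe_adapter = MoEAdapter(hidden_size)

    def forward(self, x):
        return self.dense_mlp(x) + self.moe_adapter(x)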

The repo supports the training of dense models (LLaMA 2, Yi, Qwen1.5, etc.).

Model Lists

Camelidae Series Download
Camelidae-8x7B 🤗 HuggingFace
Camelidae-8x13B 🤗 HuggingFace
Camelidae-8x34B 🤗 HuggingFace
Camelidae-8x34B-pro 🤗 Coming Soon
Qwen2idae Series Download
Qwen2idae-16x14B-v1.0 🤗 HuggingFace
Qwen2idae-16x7B-v1.0 🤗 Coming Soon
Qwen2idae-16x1.8B-v1.0 🤗 Coming Soon

Performance

| Model | Activated Params | MMLU (5-shot) | GSM8k (5-shot) | MATH (4-shot) | HumanEval (0-shot) | MBPP (4-shot) | HellaSwag (10-shot) |
|---|---|---|---|---|---|---|---|
| GPT3.5 | - | 70.0% | 57.1% | 34.1% | 48.1% | - | 85.5% |
| LLaMA2-70B-chat | 70B | 63.8% | 59.3% | 10.4% | 32.3% | 35.6% | 84.8% |
| Camelidae-8x34B-pro | 35B | 75.7% | 79.4% | 24.0% | 48.8% | 43.2% | 85.2% |
| Camelidae-8x34B | 35B | 75.6% | 78.3% | 22.6% | 43.9% | 41.4% | 85.3% |
| SUSChat-34B | 34B | 76.4% | 72.3% | 22.0% | 11.6% | 40.2% | 83.9% |
| Yi-34B-chat | 34B | 74.8% | 67.6% | 17.3% | 20.1% | 41.0% | 83.9% |
| Qwen2idae-16x14B-v1.0 | 15B | 66.7% | 77.8% | 29.9% | 62.8% | 48.6% | 82.3% |
| Mixtral-8x7B-instruct | 14B | 68.7% | 71.7% | 22.1% | 25.6% | 40.6% | 86.5% |
| Camelidae-8x13B | 13B | 54.4% | 52.6% | 9.8% | 30.6% | 30.4% | 82.5% |
| LLaMA2-13B-chat | 13B | 53.9% | 37.1% | 5.2% | 18.9% | 27.2% | 81.9% |
| Camelidae-8x7B | 7B | 48.3% | 44.0% | 5.8% | 18.3% | 23.4% | 79.2% |
| LLaMA2-7B-chat | 7B | 47.2% | 26.3% | 3.9% | 12.2% | 17.6% | 78.6% |

We bold the top 3 scores separately for all models.

Usage

Camelidae

from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("hywu/Camelidae-8x34B", trust_remote_code=True)

model = AutoModelForCausalLM.from_pretrained("hywu/Camelidae-8x34B", device_map="auto", trust_remote_code=True).eval()

inputs = tokenizer('### Human:\nHow are you?\n### Assistant:\n', return_tensors='pt')
inputs = inputs.to(model.device)
pred = model.generate(**inputs)
print(tokenizer.decode(pred.cpu()[0], skip_special_tokens=True))

Qwen2idae

from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("hywu/Qwen2idae-16x14B-v1.0", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("hywu/Qwen2idae-16x14B-v1.0", device_map="auto", trust_remote_code=True).eval()

inputs = tokenizer('<|im_start|>user\nHow are you?<|im_end|>\n<|im_start|>assistant\n', return_tensors='pt')
inputs = inputs.to(model.device)
pred = model.generate(**inputs)
print(tokenizer.decode(pred.cpu()[0], skip_special_tokens=True))

Citation

@article{wu2024parameter,
  title={Parameter-Efficient Sparsity Crafting from Dense to Mixture-of-Experts for Instruction Tuning on General Tasks},
  author={Wu, Haoyuan and Zheng, Haisheng and He, Zhuolun and Yu, Bei},
  journal={arXiv preprint arXiv:2401.02731},
  year={2024}
}

License

The source code in this repo is licensed under the Apache 2.0 License. The Camelidae and Qwen2idae models are developed for academic research and free commercial use; all usage must adhere to the licenses from facebookresearch, 01-ai, and Qwen1.5.

parameter-efficient-moe's People

Contributors

wuhy68


parameter-efficient-moe's Issues

What command is used to evaluate MATH (4-shot)?

Is the following command the one used to evaluate MATH (4-shot)?
accelerate launch -m lm_eval --model hf --model_args pretrained=/mnt/llama2-ckpts/Llama-2-7b-hf --tasks mathqa --batch_size 1 --num_fewshot 4

If not, could you please share the exact evaluation command for the evaluation tool you used? Thanks.

Questions about training Qwen2idae

Thank you for your earlier answers. Based on our existing data, we have tried training the TaShan model:
https://www.modelscope.cn/models/AstroYuYang/TaShan-8x34B

At this stage we are very interested in training Qwen2idae, but we have not seen the corresponding code committed; the GitHub repo still only contains the Camelidae model.
Could we reproduce the training simply by swapping the configuration and modeling files in the code for the Qwen2idae counterparts?
Alternatively, would it be possible to upload the code for the Qwen2idae series?

Thanks♪(・ω・)ノ

Training question

Hi, could you give an example training command? Is training based on DeepSpeed or Megatron?

Train MOE Error (train_moe.py)

Dear @wuhy68

Thank you for releasing your work to the open-source community.

I was able to fine-tune your hywu/Camelidae-8x34B using the train_qlora.py script successfully on custom data.
However, train_moe.py gives the following error:

Traceback (most recent call last):
  File "/home/sahal.mullappilly/Parameter-Efficient-MoE/train_moe.py", line 450, in <module>
    train()
  File "/home/sahal.mullappilly/Parameter-Efficient-MoE/train_moe.py", line 398, in train
    nn.init.kaiming_uniform_(p, a=math.sqrt(5))
  File "/home/sahal.mullappilly/miniconda3/envs/camelidae/lib/python3.9/site-packages/torch/nn/init.py", line 412, in kaiming_uniform_
    return tensor.uniform_(-bound, bound)
RuntimeError: "check_uniform_bounds" not implemented for 'Byte'

https://github.com/wuhy68/Parameter-Efficient-MoE/blob/master/train_moe.py#L392

# Zero Init
for n, p in model.named_parameters():
    if "adapter_up" in n:
        nn.init.zeros_(p)
    if "adapter_down" in n:
        nn.init.kaiming_uniform_(p, a=math.sqrt(5))
    if "router" in n:
        try :
            nn.init.kaiming_uniform_(p, a=math.sqrt(5))
        except :
            print(n, type(p))

Running this code for debugging gave the following output:
base_model.model.model.layers.0.mlp.moe_adapter.router.weight <class 'bitsandbytes.nn.modules.Params4bit'>

Please advise on how I can resolve this issue. Thanks.
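A possible workaround, assuming the crash comes from trying to re-initialize a 4-bit quantized parameter, is to skip any parameter whose storage is not floating point before calling kaiming_uniform_. The sketch below only illustrates that guard and is not a fix confirmed by the repo authors; model refers to the quantized model built in train_moe.py, and safe_kaiming_ is a hypothetical helper.

import math
import torch.nn as nn

def safe_kaiming_(p, name):
    # Only floating-point tensors can be re-initialized; bitsandbytes Params4bit
    # stores weights as packed uint8, which kaiming_uniform_ cannot handle.
    if p.dtype.is_floating_point:
        nn.init.kaiming_uniform_(p, a=math.sqrt(5))
    else:
        print(f"skipping quantized parameter {name} ({type(p).__name__})")

for n, p in model.named_parameters():
    if "adapter_up" in n and p.dtype.is_floating_point:
        nn.init.zeros_(p)
    if "adapter_down" in n:
        safe_kaiming_(p, n)
    if "router" in n:
        safe_kaiming_(p, n)

A cleaner long-term fix would likely be to keep the adapter and router modules out of 4-bit quantization entirely, since they need to stay trainable, but that depends on how the quantization config in train_moe.py is set up.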

On further exchange and collaboration, and on merging the LoRA weights obtained from training

Thank you for your earlier answers!
Together with several other Chinese Academy of Sciences institutes, we annotated data from different domains and trained an 8x34B model with the QLoRA script, obtaining the corresponding LoRA weights.
While hoping for further exchange, we have a question for you:

For the PESC method, if we want a model with merged weights, should the LoRA weights be merged directly into the original Yi-34B weights, or are additional parameter settings needed during merging?

Since our previous training was mostly based on the original Yi weights, could you advise how to modify this, or share the code for merging the weights produced by the QLoRA training script, so that it better fits this training method?

The merge code we used previously:

from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer, AutoConfig
import torch

def merge_lora_to_base_model():
    model_name_or_path = '01ai/Yi-34B'
    adapter_name_or_path = 'path_to_Qlora_directory'
    save_path = 'path_to_merge_directory'
    config = AutoConfig.from_pretrained(model_name_or_path)
    tokenizer = AutoTokenizer.from_pretrained(
        adapter_name_or_path,
        trust_remote_code=True,
        # llama does not support the fast tokenizer
        use_fast=False if config.model_type == 'llama' else True
    )
    model = AutoModelForCausalLM.from_pretrained(
        model_name_or_path,
        trust_remote_code=True,
        low_cpu_mem_usage=True,
        torch_dtype=torch.float16,
        # device_map='auto',
        device_map={'': 'cpu'}
    )
    model = PeftModel.from_pretrained(model, adapter_name_or_path, device_map={'': 'cpu'})
    model = model.merge_and_unload()

    tokenizer.save_pretrained(save_path)
    model.save_pretrained(save_path)


if __name__ == '__main__':
    merge_lora_to_base_model()

Thank you for taking time out of your busy schedule to answer; happy New Year!
We are currently training a science-domain vertical model at the National Astronomical Observatories, Chinese Academy of Sciences. Since each institute covers a different field, our goal is close to PESC's MoE training on data from different domains, and we look forward to further exchange :-)
WeChat: Astro_YuYang

Question about the training script

Thanks for sharing; this is very inspiring work!
We have a question for you:

In the script, should pretrained_path= be set to the path of the LLaMA/Yi model (e.g. 01-ai/Yi-34B-Chat), so that training produces an MoE model?
Or should it be set to an already-constructed MoE model such as hywu/Camelidae-8x34B, which is then fine-tuned with full training (train_moe) or QLoRA (train_qlora)?

Thank you for taking the time to answer.
