
parameter-efficient-moe's Introduction

Parameter-Efficient Sparsity Crafting From Dense to Mixture-of-Experts for Instruction Tuning on General Tasks

News

Introduction

We present Parameter-Efficient Sparsity Crafting (PESC) to help dense models learn knowledge from different fields (including code and math). This approach performs instruction tuning and utilizes the MoE structure in an efficient way.

Parameter-Efficient Sparsity Crafting uses parameter-efficient techniques, including QLoRA and Adapter, to perform Efficient Sparse Upcycling.
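To give a rough sense of how adapter-based sparse upcycling can look in code, here is a minimal PyTorch sketch. It is an illustration only, not the repo's actual implementation; names such as MoEAdapter, adapter_dim, num_experts, and top_k are assumptions. Each frozen dense MLP is kept, and a small mixture of low-rank adapter experts with a router is added on top, so that only the adapters and the router are trained.

import torch
import torch.nn as nn
import torch.nn.functional as F

class MoEAdapter(nn.Module):
    # A small mixture of low-rank adapter experts selected per token by a router.
    def __init__(self, hidden_size, adapter_dim=64, num_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(hidden_size, num_experts, bias=False)
        self.adapter_down = nn.ModuleList(
            [nn.Linear(hidden_size, adapter_dim, bias=False) for _ in range(num_experts)])
        self.adapter_up = nn.ModuleList(
            [nn.Linear(adapter_dim, hidden_size, bias=False) for _ in range(num_experts)])

    def forward(self, x):
        # Route each token to its top-k adapter experts and mix their outputs.
        logits = self.router(x)
        weights, idx = logits.topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e, (down, up) in enumerate(zip(self.adapter_down, self.adapter_up)):
                mask = (idx[..., k] == e).unsqueeze(-1)
                out = out + mask * weights[..., k:k + 1] * up(F.gelu(down(x)))
        return out

class UpcycledMLP(nn.Module):
    # Frozen dense MLP plus a trainable MoE adapter; only the adapter and router train.
    def __init__(self, dense_mlp, hidden_size):
        super().__init__()
        self.dense_mlp = dense_mlp
        for p in self.dense_mlp.parameters():
            p.requires_grad = False  # base weights stay frozen (QLoRA-style)
        self.moe_adapter = MoEAdapter(hidden_size)

    def forward(self, x):
        return self.dense_mlp(x) + self.moe_adapter(x)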

The repo supports the training of dense models (LLaMA 2, Yi, Qwen1.5, etc.).

Model Lists

Camelidae Series Download
Camelidae-8x7B 🤗 HuggingFace
Camelidae-8x13B 🤗 HuggingFace
Camelidae-8x34B 🤗 HuggingFace
Camelidae-8x34B-pro 🤗 Coming Soon
Qwen2idae Series Download
Qwen2idae-16x14B-v1.0 🤗 HuggingFace
Qwen2idae-16x7B-v1.0 🤗 Coming Soon
Qwen2idae-16x1.8B-v1.0 🤗 Coming Soon

Performance

| Model | Activated Params | MMLU (5-shot) | GSM8k (5-shot) | MATH (4-shot) | HumanEval (0-shot) | MBPP (4-shot) | HellaSwag (10-shot) |
|---|---|---|---|---|---|---|---|
| GPT3.5 | - | 70.0% | 57.1% | 34.1% | 48.1% | - | 85.5% |
| LLaMA2-70B-chat | 70B | 63.8% | 59.3% | 10.4% | 32.3% | 35.6% | 84.8% |
| Camelidae-8x34B-pro | 35B | 75.7% | 79.4% | 24.0% | 48.8% | 43.2% | 85.2% |
| Camelidae-8x34B | 35B | 75.6% | 78.3% | 22.6% | 43.9% | 41.4% | 85.3% |
| SUSChat-34B | 34B | 76.4% | 72.3% | 22.0% | 11.6% | 40.2% | 83.9% |
| Yi-34B-chat | 34B | 74.8% | 67.6% | 17.3% | 20.1% | 41.0% | 83.9% |
| Qwen2idae-16x14B-v1.0 | 15B | 66.7% | 77.8% | 29.9% | 62.8% | 48.6% | 82.3% |
| Mixtral-8x7B-instruct | 14B | 68.7% | 71.7% | 22.1% | 25.6% | 40.6% | 86.5% |
| Camelidae-8x13B | 13B | 54.4% | 52.6% | 9.8% | 30.6% | 30.4% | 82.5% |
| LLaMA2-13B-chat | 13B | 53.9% | 37.1% | 5.2% | 18.9% | 27.2% | 81.9% |
| Camelidae-8x7B | 7B | 48.3% | 44.0% | 5.8% | 18.3% | 23.4% | 79.2% |
| LLaMA2-7B-chat | 7B | 47.2% | 26.3% | 3.9% | 12.2% | 17.6% | 78.6% |

We bold the top 3 scores separately for all models.

Usage

Camelidae

from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("hywu/Camelidae-8x34B", trust_remote_code=True)

model = AutoModelForCausalLM.from_pretrained("hywu/Camelidae-8x34B", device_map="auto", trust_remote_code=True).eval()

inputs = tokenizer('### Human:\nHow are you?\n### Assistant:\n', return_tensors='pt')
inputs = inputs.to(model.device)
pred = model.generate(**inputs)
print(tokenizer.decode(pred.cpu()[0], skip_special_tokens=True))

Qwen2idae

from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("hywu/Qwen2idae-16x14B-v1.0", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("hywu/Qwen2idae-16x14B-v1.0", device_map="auto", trust_remote_code=True).eval()

inputs = tokenizer('<|im_start|>user\nHow are you?<|im_end|>\n<|im_start|>assistant\n', return_tensors='pt')
inputs = inputs.to(model.device)
pred = model.generate(**inputs)
print(tokenizer.decode(pred.cpu()[0], skip_special_tokens=True))

Citation

@article{wu2024parameter,
  title={Parameter-Efficient Sparsity Crafting from Dense to Mixture-of-Experts for Instruction Tuning on General Tasks},
  author={Wu, Haoyuan and Zheng, Haisheng and He, Zhuolun and Yu, Bei},
  journal={arXiv preprint arXiv:2401.02731},
  year={2024}
}

License

The source code in this repo is licensed under the Apache 2.0 License. The Camelidae and Qwen2idae models are developed for academic research and free commercial use; all usage must adhere to the licenses from facebookresearch, 01-ai, and Qwen1.5.

parameter-efficient-moe's People

Contributors

wuhy68


parameter-efficient-moe's Issues

What command is used to evaluate MATH (4-shot)?

Is the following command the one used to evaluate MATH (4-shot)?
accelerate launch -m lm_eval --model hf --model_args pretrained=/mnt/llama2-ckpts/Llama-2-7b-hf --tasks mathqa --batch_size 1 --num_fewshot 4

If not, could you please share the exact evaluation command for the evaluation tool you used? Thanks.

Questions about training Qwen2idae

Thank you for your earlier answers. Based on our existing data, we have tried training the TaShan model:
https://www.modelscope.cn/models/AstroYuYang/TaShan-8x34B

At this stage we are very interested in training Qwen2idae, but we have not seen the corresponding code committed; the GitHub repo still only contains the Camelidae model.
Could we reproduce the training simply by swapping the configuration and modeling files in the code for the Qwen2idae counterparts?
Alternatively, would it be possible to upload the code for the Qwen2idae series?

Thanks♪(・ω・)ノ

Training question

Hi, could you give an example training command? Is training based on DeepSpeed or Megatron?

Train MOE Error (train_moe.py)

Dear @wuhy68

Thank you for releasing your work to the open-source community.

I was able to fine-tune your hywu/Camelidae-8x34B using the train_qlora.py script successfully on custom data.
However, train_moe.py gives the following error:

Traceback (most recent call last):
  File "/home/sahal.mullappilly/Parameter-Efficient-MoE/train_moe.py", line 450, in <module>
    train()
  File "/home/sahal.mullappilly/Parameter-Efficient-MoE/train_moe.py", line 398, in train
    nn.init.kaiming_uniform_(p, a=math.sqrt(5))
  File "/home/sahal.mullappilly/miniconda3/envs/camelidae/lib/python3.9/site-packages/torch/nn/init.py", line 412, in kaiming_uniform_
    return tensor.uniform_(-bound, bound)
RuntimeError: "check_uniform_bounds" not implemented for 'Byte'

https://github.com/wuhy68/Parameter-Efficient-MoE/blob/master/train_moe.py#L392

# Zero Init
for n, p in model.named_parameters():
    if "adapter_up" in n:
        nn.init.zeros_(p)
    if "adapter_down" in n:
        nn.init.kaiming_uniform_(p, a=math.sqrt(5))
    if "router" in n:
        try :
            nn.init.kaiming_uniform_(p, a=math.sqrt(5))
        except :
            print(n, type(p))

Running this code for debugging gave the following output:
base_model.model.model.layers.0.mlp.moe_adapter.router.weight <class 'bitsandbytes.nn.modules.Params4bit'>

Please advise on how I can resolve this issue. Thanks.
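A possible workaround, assuming the crash comes from trying to re-initialize a 4-bit quantized parameter, is to skip any parameter whose storage is not floating point before calling kaiming_uniform_. The sketch below only illustrates that guard and is not a fix confirmed by the repo authors; model refers to the quantized model built in train_moe.py, and safe_kaiming_ is a hypothetical helper.

import math
import torch.nn as nn

def safe_kaiming_(p, name):
    # Only floating-point tensors can be re-initialized; bitsandbytes Params4bit
    # stores weights as packed uint8, which kaiming_uniform_ cannot handle.
    if p.dtype.is_floating_point:
        nn.init.kaiming_uniform_(p, a=math.sqrt(5))
    else:
        print(f"skipping quantized parameter {name} ({type(p).__name__})")

for n, p in model.named_parameters():
    if "adapter_up" in n and p.dtype.is_floating_point:
        nn.init.zeros_(p)
    if "adapter_down" in n:
        safe_kaiming_(p, n)
    if "router" in n:
        safe_kaiming_(p, n)

A cleaner long-term fix would likely be to keep the adapter and router modules out of 4-bit quantization entirely, since they need to stay trainable, but that depends on how the quantization config in train_moe.py is set up.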

On further exchange and collaboration, and on merging the LoRA weights obtained from training

Thank you for your earlier answers!
Together with several other Chinese Academy of Sciences institutes, we annotated data from different domains and trained an 8x34B model with the QLoRA script, obtaining the corresponding LoRA weights.
While hoping for further exchange, we have a question for you:

For the PESC method, if we want a model with merged weights, should the LoRA weights be merged directly into the original Yi-34B weights, or are additional parameter settings needed during merging?

Since our previous training was mostly based on the original Yi weights, could you advise how to modify this, or share the code for merging the weights produced by the QLoRA training script, so that it better fits this training method?

The merge code we used previously:

from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer, AutoConfig
import torch

def merge_lora_to_base_model():
    model_name_or_path = '01ai/Yi-34B'
    adapter_name_or_path = 'path_to_Qlora_directory'
    save_path = 'path_to_merge_directory'
    config = AutoConfig.from_pretrained(model_name_or_path)
    tokenizer = AutoTokenizer.from_pretrained(
        adapter_name_or_path,
        trust_remote_code=True,
        # llama does not support the fast tokenizer
        use_fast=False if config.model_type == 'llama' else True
    )
    model = AutoModelForCausalLM.from_pretrained(
        model_name_or_path,
        trust_remote_code=True,
        low_cpu_mem_usage=True,
        torch_dtype=torch.float16,
        # device_map='auto',
        device_map={'': 'cpu'}
    )
    model = PeftModel.from_pretrained(model, adapter_name_or_path, device_map={'': 'cpu'})
    model = model.merge_and_unload()

    tokenizer.save_pretrained(save_path)
    model.save_pretrained(save_path)


if __name__ == '__main__':
    merge_lora_to_base_model()

Thank you for taking time out of your busy schedule to answer; happy New Year!
We are currently training a science-domain vertical model at the National Astronomical Observatories, Chinese Academy of Sciences. Since each institute covers a different field, our goal is close to PESC's MoE training on data from different domains, and we look forward to further exchange :-)
WeChat: Astro_YuYang

Question about the training script

Thanks for sharing; this is very inspiring work!
We have a question for you:

In the script, should pretrained_path= be set to the path of the LLaMA/Yi model (e.g. 01-ai/Yi-34B-Chat), so that training produces an MoE model?
Or should it be set to an already-constructed MoE model such as hywu/Camelidae-8x34B, which is then fine-tuned with full training (train_moe) or QLoRA (train_qlora)?

Thank you for taking the time to answer.
