
Comments (6)

wuhy68 commented on July 21, 2024

We have updated the merge code. You can run the train_moe_instruct script to train with PESC, and then use the merge code to merge the LoRA weights into the Camelidae model.
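
For reference, a rough sketch of this train-then-merge flow (this is an assumption pieced together from the snippets quoted later in this thread and the standard HuggingFace peft API; the repo's actual merge script may differ, and the stock LlamaForCausalLM import below is only a stand-in for the repo's MoE-enabled modeling class):

    import torch
    from transformers import AutoConfig, AutoTokenizer
    # Placeholder import: the repo ships its own LlamaForCausalLM with MoE adapters;
    # use that class instead of the stock transformers one.
    from transformers import LlamaForCausalLM
    from peft import PeftModel

    # model_path / peft_path / moe_path / save_path: set to your own locations,
    # as in the path snippet quoted below.

    # Build the base model with the same MoE settings that were used for training
    model_config = AutoConfig.from_pretrained(model_path)
    # ... set model_config.adapter_dim / topk / moe_scaling / num_experts here ...
    model = LlamaForCausalLM.from_pretrained(
        model_path, config=model_config, torch_dtype=torch.bfloat16
    )

    # Merge the trained LoRA adapter into the base weights
    model = PeftModel.from_pretrained(model, peft_path)
    model = model.merge_and_unload()

    # Load the MoE adapter weights saved alongside the LoRA checkpoint
    moe_weights = torch.load(moe_path, map_location="cpu")
    model.load_state_dict(
        {k.replace("base_model.model.", ""): v for k, v in moe_weights.items()},
        strict=False,
    )

    model.save_pretrained(save_path)
    AutoTokenizer.from_pretrained(model_path).save_pretrained(save_path)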


Yu-Yang-Li commented on July 21, 2024

We have updated the merge code. You can run the train_moe_instruct script to train with PESC, and then use the merge code to merge the LoRA weights into the Camelidae model.

Thanks for the answer!
I retrained with train_moe_instruct, but an error is raised when merging the LoRA weights with the merge code.

    # Adjust to your corresponding path
    model_path = "/root/.cache/modelscope/hub/01ai/Yi-34B"
    peft_path="/root/LLM/Parameter-Efficient-MoE/train_scripts/path_to_output_directory/checkpoint-3468/adapter_model/"
    moe_path="/root/LLM/Parameter-Efficient-MoE/train_scripts/path_to_output_directory/checkpoint-3468/moe_model.bin"
    save_path = "/root/LLM/Parameter-Efficient-MoE/merge"
RuntimeError: Error(s) in loading state_dict for LlamaForCausalLM:
        size mismatch for model.layers.0.mlp.moe_adapter.experts.expert_0.adapter_down.weight: copying a param with shape torch.Size([64, 7168]) from checkpoint, the shape in current model is torch.Size([512, 7168]).
        size mismatch for model.layers.0.mlp.moe_adapter.experts.expert_0.adapter_up.weight: copying a param with shape torch.Size([7168, 64]) from checkpoint, the shape in current model is torch.Size([7168, 512])
        ......

How should I modify this?
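
The mismatch above ([64, 7168] in the checkpoint versus [512, 7168] in the freshly built model) suggests that the adapter dimension used at training time differs from the one set in the merge script. As a quick diagnostic, one could list every mismatched tensor before loading (a hypothetical snippet reusing model and moe_path from the merge script):

    import torch

    # Compare shapes in the saved MoE checkpoint against the freshly built model
    moe_weights = torch.load(moe_path, map_location="cpu")
    model_dict = model.state_dict()

    for k, v in moe_weights.items():
        new_k = k.replace("base_model.model.", "")
        if new_k in model_dict and model_dict[new_k].shape != v.shape:
            print(f"{new_k}: checkpoint {tuple(v.shape)} vs model {tuple(model_dict[new_k].shape)}")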


Yu-Yang-Li commented on July 21, 2024

I further restricted which weights are matched and the merge now runs successfully, but I'm not sure whether this affects model quality.
Before:

    weights_dict = {}
    for k, v in moe_weights.items():
        new_k = k.replace("base_model.model.", "") if "base_model.model." in k else k
        weights_dict[new_k] = v

    model.load_state_dict(weights_dict, strict=False)

After:

    # Get the current model's state dict
    model_dict = model.state_dict()

    # Keep only the pretrained weights whose keys exist in the current model
    # and whose shapes match
    pretrained_dict = {k: v for k, v in moe_weights.items() if k in model_dict and model_dict[k].shape == v.shape}

    # Update the model's state dict
    model_dict.update(pretrained_dict)

    # Load the model with the updated state dict
    model.load_state_dict(model_dict)
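
One likely caveat with this workaround: the shape filter silently drops every mismatched tensor, so the MoE adapter weights from the error above would stay at their random initialization, and this version also omits the base_model.model. key remapping from the earlier code, so keys that still carry that prefix would never match. A small check, using the same variable names as above, shows what is being skipped:

    # List the checkpoint tensors that the shape filter discards
    skipped = [k for k, v in moe_weights.items()
               if k not in model_dict or model_dict[k].shape != v.shape]
    print(f"Skipped {len(skipped)} tensors, e.g. {skipped[:3]}")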


wuhy68 commented on July 21, 2024

You need to modify the config in the merge code to match the model_config used when training the model.


Yu-Yang-Li commented on July 21, 2024

You need to modify the config in the merge code to match the model_config used when training the model.

Is this the part of the merge config you are referring to?

    model_config.moe_dtype = "bfloat16"
    model_config.adapter_dim = 512
    model_config.topk = 2
    model_config.moe_scaling = 0.25
    model_config.num_experts = 8
    model_config.output_router_logits = False

The model I trained is also 8x34B, obtained with the train_moe script, so these parameters should be fine, right? Thanks♪(・ω・)ノ


wuhy68 commented on July 21, 2024

The model_config settings in train_moe.py are somewhat different from the model_config in the merge code. You need to change adapter_dim, topk, moe_scaling, and num_experts in the merge code to the values that were used during training.
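
Based on the shapes in the error above (adapter_down of [64, 7168] in the checkpoint versus [512, 7168] expected by the model), the training run appears to have used adapter_dim = 64, so the merge config would need to be adjusted along these lines (the remaining values are assumptions and should be copied from the model_config actually used in the training script):

    # Mirror the model_config used during training;
    # adapter_dim = 64 is inferred from the checkpoint shape [64, 7168] above,
    # the other values must be copied from the training script's model_config.
    model_config.moe_dtype = "bfloat16"
    model_config.adapter_dim = 64
    model_config.topk = 2
    model_config.moe_scaling = 0.25
    model_config.num_experts = 8
    model_config.output_router_logits = False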

