Comments (6)
We have updated the merge code. You can run the train_moe_instruct script to train with PESC, then use the merge code to merge the LoRA weights into the Camelidae model.
from parameter-efficient-moe.
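For context, this merge step usually follows the standard PEFT pattern. Below is a minimal sketch, not the repo's actual script: it reuses the path variables from the merge code quoted in the next comment (model_path, peft_path, moe_path, save_path), and it assumes a vanilla model class, whereas the real merge code instantiates the repo's custom MoE-augmented LlamaForCausalLM.

import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM

# Sketch only: the real merge code builds the repo's MoE model class instead.
base = AutoModelForCausalLM.from_pretrained(model_path, torch_dtype=torch.bfloat16)
# Fold the trained LoRA deltas into the base model's weights.
model = PeftModel.from_pretrained(base, peft_path).merge_and_unload()
# The MoE adapter weights are saved separately and loaded on top.
moe_weights = torch.load(moe_path, map_location="cpu")
model.load_state_dict(moe_weights, strict=False)
model.save_pretrained(save_path)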
Thanks for the answer! I retrained with train_moe_instruct, but using the merge code to merge the LoRA weights now raises an error.
# Adjust to your corresponding path
model_path = "/root/.cache/modelscope/hub/01ai/Yi-34B"
peft_path = "/root/LLM/Parameter-Efficient-MoE/train_scripts/path_to_output_directory/checkpoint-3468/adapter_model/"
moe_path = "/root/LLM/Parameter-Efficient-MoE/train_scripts/path_to_output_directory/checkpoint-3468/moe_model.bin"
save_path = "/root/LLM/Parameter-Efficient-MoE/merge"
RuntimeError: Error(s) in loading state_dict for LlamaForCausalLM:
size mismatch for model.layers.0.mlp.moe_adapter.experts.expert_0.adapter_down.weight: copying a param with shape torch.Size([64, 7168]) from checkpoint, the shape in current model is torch.Size([512, 7168]).
size mismatch for model.layers.0.mlp.moe_adapter.experts.expert_0.adapter_up.weight: copying a param with shape torch.Size([7168, 64]) from checkpoint, the shape in current model is torch.Size([7168, 512])
......
How should I fix this?
from parameter-efficient-moe.
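The mismatch of [64, 7168] in the checkpoint against [512, 7168] in the model on adapter_down points at the adapter bottleneck width: the checkpoint was trained with a smaller adapter_dim than the one the merge code constructs. As a quick diagnostic, here is a hedged sketch that prints every shape disagreement at once, reusing the model object and moe_path from the merge script above:

import torch

# List every tensor whose checkpoint shape disagrees with the rebuilt model.
moe_weights = torch.load(moe_path, map_location="cpu")
model_dict = model.state_dict()
for k, v in moe_weights.items():
    k = k.replace("base_model.model.", "")  # same key renaming the merge code applies
    if k in model_dict and model_dict[k].shape != v.shape:
        print(k, "checkpoint:", tuple(v.shape), "model:", tuple(model_dict[k].shape))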
I further restricted which weights get matched and the merge now runs successfully, but I'm not sure whether doing it this way affects model quality?
Before:
weights_dict = {}
for k, v in moe_weights.items():
    new_k = k.replace("base_model.model.", "") if "base_model.model." in k else k
    weights_dict[new_k] = v
model.load_state_dict(weights_dict, strict=False)
After:
# Get the current model's state dict
model_dict = model.state_dict()
# Build a new dict containing only the weights that exist in both the
# checkpoint and the current model with identical shapes
pretrained_dict = {k: v for k, v in moe_weights.items() if k in model_dict and model_dict[k].shape == v.shape}
# Update the model's state dict
model_dict.update(pretrained_dict)
# Load the model with the updated state dict
model.load_state_dict(model_dict)
from parameter-efficient-moe.
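This very likely does affect quality: the filter keeps only tensors whose shapes already agree, so every mismatched expert weight is silently discarded and those MoE adapters stay at their random initialization in the merged model. Note also that this version drops the base_model.model. key renaming from the original code, so any key still carrying that prefix will never match model_dict at all. A small sketch, reusing the moe_weights and model_dict names from the code above, to see exactly how much gets loaded:

# Mirror the filter above and report what it discards; anything skipped
# remains randomly initialized in the merged model.
kept = {k: v for k, v in moe_weights.items()
        if k in model_dict and model_dict[k].shape == v.shape}
skipped = [k for k in moe_weights if k not in kept]
print(f"loaded {len(kept)} of {len(moe_weights)} tensors")
for k in skipped[:10]:  # sample of the dropped keys
    print("skipped:", k)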
You need to modify the config in the merge code to match the model_config used when training the model.
from parameter-efficient-moe.
Does modifying the config in the merge code refer to this part?
model_config.moe_dtype = "bfloat16"
model_config.adapter_dim = 512
model_config.topk = 2
model_config.moe_scaling = 0.25
model_config.num_experts = 8
model_config.output_router_logits = False
The model I trained is also an 8x34B structure, trained with the train_moe script, so these parameters look fine to me, right? Thanks♪(・ω・)ノ
from parameter-efficient-moe.
The model_config settings in train_moe.py are not quite the same as the model_config in the merge code; you need to change adapter_dim, topk, moe_scaling, and num_experts in the merge code to the values that were set at training time.
from parameter-efficient-moe.
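Concretely, the checkpoint shape torch.Size([64, 7168]) in the error above implies the model was trained with adapter_dim = 64, so the merge-side config needs at least that change; the remaining values below are the ones already shown in this thread and must likewise be verified against the training script rather than assumed:

# Values must match the training-time model_config; adapter_dim = 64 is
# inferred from the checkpoint shapes in the error above, the rest should
# be checked against the train_moe_instruct settings.
model_config.moe_dtype = "bfloat16"
model_config.adapter_dim = 64      # the merge code had 512
model_config.topk = 2
model_config.moe_scaling = 0.25
model_config.num_experts = 8
model_config.output_router_logits = False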