Comments (6)
We have updated the merge code. You can run the train_moe_instruct script to train with PESC, then use the merge code to merge the LoRA weights into the Camelidae model.
from parameter-efficient-moe.
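For context, this merge step usually follows the standard PEFT pattern. Below is a minimal sketch, not the repo's actual script: it reuses the path variables from the merge code quoted in the next comment (model_path, peft_path, moe_path, save_path), and it assumes a vanilla model class, whereas the real merge code instantiates the repo's custom MoE-augmented LlamaForCausalLM.

import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM

# Sketch only: the real merge code builds the repo's MoE model class instead.
base = AutoModelForCausalLM.from_pretrained(model_path, torch_dtype=torch.bfloat16)
# Fold the trained LoRA deltas into the base model's weights.
model = PeftModel.from_pretrained(base, peft_path).merge_and_unload()
# The MoE adapter weights are saved separately and loaded on top.
moe_weights = torch.load(moe_path, map_location="cpu")
model.load_state_dict(moe_weights, strict=False)
model.save_pretrained(save_path)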
Thanks for the answer! I retrained with train_moe_instruct, but using the merge code to merge the LoRA weights now raises an error.
# Adjust to your corresponding path
model_path = "/root/.cache/modelscope/hub/01ai/Yi-34B"
peft_path = "/root/LLM/Parameter-Efficient-MoE/train_scripts/path_to_output_directory/checkpoint-3468/adapter_model/"
moe_path = "/root/LLM/Parameter-Efficient-MoE/train_scripts/path_to_output_directory/checkpoint-3468/moe_model.bin"
save_path = "/root/LLM/Parameter-Efficient-MoE/merge"
RuntimeError: Error(s) in loading state_dict for LlamaForCausalLM:
size mismatch for model.layers.0.mlp.moe_adapter.experts.expert_0.adapter_down.weight: copying a param with shape torch.Size([64, 7168]) from checkpoint, the shape in current model is torch.Size([512, 7168]).
size mismatch for model.layers.0.mlp.moe_adapter.experts.expert_0.adapter_up.weight: copying a param with shape torch.Size([7168, 64]) from checkpoint, the shape in current model is torch.Size([7168, 512])
......
How should I fix this?
from parameter-efficient-moe.
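The mismatch of [64, 7168] in the checkpoint against [512, 7168] in the model on adapter_down points at the adapter bottleneck width: the checkpoint was trained with a smaller adapter_dim than the one the merge code constructs. As a quick diagnostic, here is a hedged sketch that prints every shape disagreement at once, reusing the model object and moe_path from the merge script above:

import torch

# List every tensor whose checkpoint shape disagrees with the rebuilt model.
moe_weights = torch.load(moe_path, map_location="cpu")
model_dict = model.state_dict()
for k, v in moe_weights.items():
    k = k.replace("base_model.model.", "")  # same key renaming the merge code applies
    if k in model_dict and model_dict[k].shape != v.shape:
        print(k, "checkpoint:", tuple(v.shape), "model:", tuple(model_dict[k].shape))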
I further restricted which weights get matched and the merge now runs successfully, but I'm not sure whether doing it this way affects model quality?
Before:
weights_dict = {}
for k, v in moe_weights.items():
    new_k = k.replace("base_model.model.", "") if "base_model.model." in k else k
    weights_dict[new_k] = v
model.load_state_dict(weights_dict, strict=False)
After:
# Get the current model's state dict
model_dict = model.state_dict()
# Build a new dict containing only the weights that exist in both the
# checkpoint and the current model with identical shapes
pretrained_dict = {k: v for k, v in moe_weights.items() if k in model_dict and model_dict[k].shape == v.shape}
# Update the model's state dict
model_dict.update(pretrained_dict)
# Load the model with the updated state dict
model.load_state_dict(model_dict)
from parameter-efficient-moe.
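This very likely does affect quality: the filter keeps only tensors whose shapes already agree, so every mismatched expert weight is silently discarded and those MoE adapters stay at their random initialization in the merged model. Note also that this version drops the base_model.model. key renaming from the original code, so any key still carrying that prefix will never match model_dict at all. A small sketch, reusing the moe_weights and model_dict names from the code above, to see exactly how much gets loaded:

# Mirror the filter above and report what it discards; anything skipped
# remains randomly initialized in the merged model.
kept = {k: v for k, v in moe_weights.items()
        if k in model_dict and model_dict[k].shape == v.shape}
skipped = [k for k in moe_weights if k not in kept]
print(f"loaded {len(kept)} of {len(moe_weights)} tensors")
for k in skipped[:10]:  # sample of the dropped keys
    print("skipped:", k)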
You need to modify the config in the merge code to match the model_config used when training the model.
from parameter-efficient-moe.
Does modifying the config in the merge code refer to this part?
model_config.moe_dtype = "bfloat16"
model_config.adapter_dim = 512
model_config.topk = 2
model_config.moe_scaling = 0.25
model_config.num_experts = 8
model_config.output_router_logits = False
The model I trained is also an 8x34B structure, trained with the train_moe script, so these parameters look fine to me, right? Thanks♪(・ω・)ノ
from parameter-efficient-moe.
The model_config settings in train_moe.py are not quite the same as the model_config in the merge code; you need to change adapter_dim, topk, moe_scaling, and num_experts in the merge code to the values that were set at training time.
from parameter-efficient-moe.
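Concretely, the checkpoint shape torch.Size([64, 7168]) in the error above implies the model was trained with adapter_dim = 64, so the merge-side config needs at least that change; the remaining values below are the ones already shown in this thread and must likewise be verified against the training script rather than assumed:

# Values must match the training-time model_config; adapter_dim = 64 is
# inferred from the checkpoint shapes in the error above, the rest should
# be checked against the train_moe_instruct settings.
model_config.moe_dtype = "bfloat16"
model_config.adapter_dim = 64      # the merge code had 512
model_config.topk = 2
model_config.moe_scaling = 0.25
model_config.num_experts = 8
model_config.output_router_logits = False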