yeonwoosung / pytorch_mixture-of-experts
PyTorch implementation of MoE, which stands for Mixture of Experts
As I understand it, MoE training typically uses a fixed capacity to distribute tokens evenly across all experts, while inference activates experts based on their predicted relevance via a softmax gate. However, your implementation does not seem to differentiate between training and inference.
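A minimal sketch of the kind of gate I have in mind (this is not the repository's actual code; `SoftmaxGate`, `d_model`, `n_experts`, and `top_k` are illustrative names I am assuming):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SoftmaxGate(nn.Module):
    """Illustrative gate: dense softmax routing while training,
    sparse top-k routing at inference time."""
    def __init__(self, d_model: int, n_experts: int, top_k: int = 2):
        super().__init__()
        self.w_gate = nn.Linear(d_model, n_experts, bias=False)
        self.top_k = top_k

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, d_model) -> gate weights: (batch, n_experts)
        logits = self.w_gate(x)
        probs = F.softmax(logits, dim=-1)
        if self.training:
            # Training: keep the full distribution so every expert
            # receives a gradient signal.
            return probs
        # Inference: keep only the top-k most relevant experts
        # and renormalise their weights.
        topv, topi = probs.topk(self.top_k, dim=-1)
        sparse = torch.zeros_like(probs).scatter_(-1, topi, topv)
        return sparse / sparse.sum(dim=-1, keepdim=True)
```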
I tried to change the number of experts, but it does not seem to work well no matter how many experts I set.
For example, when n=10, the accuracy is 46% after 100 epochs; when n=3, it is 47% after 100 epochs; when n=1, it is 49% after 100 epochs.
So I want to ask: is the code wrong?
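One way I would try to debug this is to check how the gate spreads load across experts; if one expert dominates regardless of n, accuracy would barely change with the expert count. A hypothetical check (here `gate` is assumed to return per-expert probabilities and `loader` yields `(x, y)` batches; these names are placeholders, not the repository's API):

```python
import torch

@torch.no_grad()
def expert_usage(gate, loader, n_experts, device="cpu"):
    """Fraction of samples routed (by argmax) to each expert."""
    counts = torch.zeros(n_experts)
    for x, _ in loader:
        probs = gate(x.to(device))            # (batch, n_experts)
        counts += torch.bincount(
            probs.argmax(dim=-1).cpu(), minlength=n_experts
        ).float()
    return counts / counts.sum()
```

If this comes back heavily skewed toward a single expert, that would suggest the gate is collapsing rather than learning to specialise.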