Comments (6)
Hi Jannik,
I think fc_o here (in the 5th line of the code you pasted) is the W_O in the paper. What do you think?
from set_transformer.
Dear Jingweiz,
thanks for your reply!
I would have identified fc_o with the rFF(H) of the MAB, not with W_O.
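For reference, here is a single-head NumPy paraphrase of what an MAB forward pass looks like (a hedged sketch, not the repo's actual code: LayerNorm is omitted, and the names `mab`, `W_o`, `b_o` are mine). It shows the final linear-plus-ReLU step sitting in the rFF(H) position:

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def mab(X, Y, W_q, W_k, W_v, W_o, b_o):
    """Single-head MAB sketch (LayerNorm omitted, names assumed):
    H = Q' + Attention(Q', K, V) with Q' = X W_q, followed by the
    residual rFF step H + relu(H W_o + b_o)."""
    Q, K, V = X @ W_q, Y @ W_k, Y @ W_v
    A = softmax(Q @ K.T / np.sqrt(Q.shape[-1]))
    H = Q + A @ V                    # residual uses the projected query
    return H + relu(H @ W_o + b_o)   # this last step plays the rFF(H) role
```

In this sketch, `W_o` occupies the place `fc_o` occupies in the repo: it is applied after the attention residual, wrapped in a ReLU and a second residual, which is exactly the rFF(H) of the MAB rather than the W_O of the original multihead attention.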
Hi, thanks for your interest!
1) Multiplying W_O after the concat and 2) multiplying W to the query to get Q and then split-attend-concat make, in essence, only a small difference (one is a restricted version of the other). For the paper, I followed the description in the original transformer paper, and for the code, I chose the current form following the code available for the original transformer (it also gives cleaner code). But they don't make a big empirical difference.
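The "restricted version" point can be checked with a minimal NumPy sketch (variable names are mine, not from the repo): projecting once with a full matrix W and then splitting into heads is the same as using per-head projection matrices that are column blocks of W, whereas the original transformer allows arbitrary, independent per-head matrices.

```python
import numpy as np

d, h = 8, 2
rng = np.random.default_rng(0)
X = rng.standard_normal((5, d))   # a toy query set: 5 elements, dim 8
W = rng.standard_normal((d, d))   # one full projection matrix

# Project once with W, then split into h heads (the split-attend-concat form).
heads_split = np.split(X @ W, h, axis=-1)

# Per-head projections restricted to column blocks of W -- a special case of
# the independent per-head matrices in the original transformer paper.
heads_blocks = [X @ W[:, i * (d // h):(i + 1) * (d // h)] for i in range(h)]

for a, b in zip(heads_split, heads_blocks):
    assert np.allclose(a, b)
```

So the single-projection-then-split form is equivalent to block-column per-head matrices, i.e. a restricted parameterization of the per-head form.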
Hey Juho Lee!
Thanks for your reply.
It makes sense that this does not give a big empirical difference. I just wanted to check if I missed something.
And LayerNorm(X + Multihead(X, Y, Y; ω)) in the paper should probably be something like LayerNorm(W_q X + Multihead(X, Y, Y; ω)), correct?
> Dear Jingweiz,
> thanks for your reply!
> I would have identified fc_o with the rFF(H) of the MAB, not with W_O.
Oh right exactly, I got messed up, thanks!
I have a follow-up question linked to this topic.
In the paper, a row-wise FF block is used for the pooling, and unlike the rFF in the MAB, the rFF in the pooling doesn't have an activation function. Should the PMA rFF have an activation function or not?
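To make the two variants being contrasted concrete, here is a small NumPy sketch (the function `rff` and its `act` parameter are my names, not from the paper or repo): a row-wise feed-forward is the same affine map applied independently to every set element, with or without a nonlinearity.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def rff(Z, W, b, act=None):
    """Row-wise feed-forward: one affine map applied to every set element.
    act=None models the pooling rFF as written in the paper (no activation);
    act=relu models the rFF inside the MAB."""
    out = Z @ W + b
    return act(out) if act is not None else out

rng = np.random.default_rng(0)
Z = rng.standard_normal((5, 4))          # a set of 5 elements, dim 4
W, b = rng.standard_normal((4, 4)), np.zeros(4)

pooling_rff = rff(Z, W, b)               # PMA's rFF(Z): purely linear
mab_rff = rff(Z, W, b, act=relu)         # MAB's rFF: with nonlinearity
```

Without an activation, the pooling rFF is just a linear map on each row, which is what the paper's PMA_k(Z) = MAB(S, rFF(Z)) formula leaves ambiguous and what the question above is asking about.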