explorerfreda / structured-self-attentive-sentence-embedding


An open-source implementation of the paper "A Structured Self-Attentive Sentence Embedding" (Lin et al., ICLR 2017).

License: GNU General Public License v3.0

Python 100.00%

structured-self-attentive-sentence-embedding's People

Contributors

explorerfreda

structured-self-attentive-sentence-embedding's Issues

This is Weird

I think the LSTM hidden state size should be (layer, token, hidden), but in your model it is (layer, batch, hidden). Is this right?

Also, I think input_data is dataXtoken, but yours is dataXtokens[n] (where n goes from 0 to the token size).

Finally, you use the LSTM output, not the hidden state. Is that right?
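
For reference, here is a minimal sketch (generic PyTorch, not the repository's code) of the shapes that nn.LSTM returns with the default sequence-first layout; the hidden state is (num_layers * num_directions, batch, hidden), while the per-token vectors live in the output tensor:

import torch
import torch.nn as nn

seq_len, batch, emb_dim, nhid = 7, 4, 10, 16
lstm = nn.LSTM(emb_dim, nhid, num_layers=1, bidirectional=True)

inp = torch.randn(seq_len, batch, emb_dim)   # default layout: (seq_len, batch, emb_dim)
output, (h_n, c_n) = lstm(inp)

print(output.shape)  # (seq_len, batch, 2 * nhid): one vector per token
print(h_n.shape)     # (num_layers * 2, batch, nhid): only the final time step per direction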

Penalty Term Frobenius Norm Squared

def Frobenius(mat):
    size = mat.size()
    if len(size) == 3:  # batched matrix
        ret = (torch.sum(torch.sum((mat ** 2), 1), 2).squeeze() + 1e-10) ** 0.5
        return torch.sum(ret) / size[0]
    else:
        raise Exception('matrix for computing Frobenius norm should be with 3 dims')

In the code above, the Frobenius norm of the matrix is computed as ret and then averaged over the batch dimension. However, in the original paper the penalty term is the squared norm. Is this intended, or does it not matter much in practice? Thanks!
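
For comparison, a minimal sketch (my own, not from the repository) of the penalty as written in the paper, i.e. the squared Frobenius norm of A·Aᵀ − I averaged over the batch; the function name frobenius_squared is hypothetical:

import torch

def frobenius_squared(mat):
    # mat is the batched matrix A A^T - I, with shape (batch, r, r)
    if mat.dim() != 3:
        raise ValueError('expected a 3-dim batched matrix')
    # sum of squared entries per matrix gives ||.||_F^2; then average over the batch
    return torch.sum(mat ** 2) / mat.size(0)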

About the GloVe model

Recently, I used torchtext to obtain the GloVe model. From this module I got the dictionary that maps each word to an index and the embedding matrix (shape word_count * dim, a torch.FloatTensor). To create the file that train.py can use, I wrote my code like this:

t = (dictionary, embedding_matrix, dim)
torch.save(t, 'mypath/glove.pt')

Is this glove.pt file in the format that your program expects?
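
For context, a minimal sketch of how such a tuple can be built with torchtext (assuming torchtext.vocab.GloVe, whose stoi and vectors attributes give the word-to-index mapping and the embedding matrix; the output path is a placeholder):

import torch
from torchtext.vocab import GloVe

glove = GloVe(name='6B', dim=300)
dictionary = dict(glove.stoi)        # word -> index
embedding_matrix = glove.vectors     # FloatTensor of shape (word_count, dim)

torch.save((dictionary, embedding_matrix, 300), 'mypath/glove.pt')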

Pretrained model

Hello, could I get your pretrained model? I added this module to solve another task, but I cannot get it to train.
Thank you!

Word vectors, visualizing attention

How do I obtain the GloVe vectors in the right format? I downloaded the pretrained vectors from https://nlp.stanford.edu/projects/glove/ but it's not clear how to convert them to the expected format.
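
A minimal conversion sketch, assuming the download is a plain-text file such as glove.6B.300d.txt (one word followed by its values per line) and that train.py expects the same (dictionary, embedding matrix, dim) tuple described in the previous issue; file names here are placeholders:

import torch

dim = 300
word2idx, vectors = {}, []
with open('glove.6B.300d.txt', encoding='utf-8') as f:
    for line in f:
        parts = line.rstrip().split(' ')
        word2idx[parts[0]] = len(vectors)
        vectors.append([float(x) for x in parts[1:]])

embedding = torch.tensor(vectors)                 # (word_count, dim)
torch.save((word2idx, embedding, dim), 'glove.300d.pt')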

I tried running the code without word vectors (is this supposed to work?) but I get an exception:

/home/andreas/src/Structured-Self-Attentive-Sentence-Embedding/train.py in train(epoch_number)
     79         total_pure_loss += loss.data
     80
---> 81         if attention:  # add penalization term
     82             attentionT = torch.transpose(attention, 1, 2).contiguous()
     83             extra_loss = Frobenius(torch.bmm(attention, attentionT) - I[:attention.size(0)])

/home/andreas/.local/lib/python2.7/site-packages/torch/autograd/variable.pyc in __bool__(self)
    121             return False
    122         raise RuntimeError("bool value of Variable objects containing non-empty " +
--> 123                            torch.typename(self.data) + " is ambiguous")
    124
    125     __nonzero__ = __bool__

RuntimeError: bool value of Variable objects containing non-empty torch.FloatTensor is ambiguous

(If I replace this condition with False the code works).
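
The error itself comes from evaluating a non-empty Variable in a boolean context; a sketch of a possible fix, assuming the model returns attention=None when attention is disabled, is to test for None explicitly instead of replacing the condition with False:

if attention is not None:  # add penalization term only when attention is actually produced
    attentionT = torch.transpose(attention, 1, 2).contiguous()
    extra_loss = Frobenius(torch.bmm(attention, attentionT) - I[:attention.size(0)])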

Lastly, how can I obtain the kind of attention visualizations shown in the paper?
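
On the visualization question, a minimal sketch (assuming you can extract the attention matrix for a single sentence from the model's forward pass) is to plot it as a heatmap, one row per attention hop and one column per token:

import matplotlib.pyplot as plt

def plot_attention(attention, tokens):
    # attention: array of shape (hops, seq_len) for one sentence; tokens: list of words
    fig, ax = plt.subplots()
    ax.imshow(attention, aspect='auto', cmap='Reds')
    ax.set_xticks(range(len(tokens)))
    ax.set_xticklabels(tokens, rotation=90)
    ax.set_ylabel('attention hop')
    fig.tight_layout()
    plt.show()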

About the dimension of the input to the BiLSTM

In lines 43-44 of models.py:
emb = self.drop(self.encoder(inp))
outp = self.bilstm(emb, hidden)[0]
It directly supplies the embedding obtained from self.encoder as the input to self.bilstm.
However, the shape of emb is (batch_size, seq_len, embedding_dim),
while the input shape the LSTM expects by default is (seq_len, batch_size, embedding_dim).
Is this a problem? Thanks.
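
For reference, a minimal illustration (generic PyTorch, not the repository's code) of the two layouts: by default nn.LSTM expects (seq_len, batch, emb_dim), so a (batch, seq_len, emb_dim) embedding needs either batch_first=True at construction time or a transpose before the call:

import torch
import torch.nn as nn

batch_size, seq_len, emb_dim, nhid = 4, 7, 10, 16
emb = torch.randn(batch_size, seq_len, emb_dim)

# option 1: declare the batch dimension first
bilstm = nn.LSTM(emb_dim, nhid, bidirectional=True, batch_first=True)
outp = bilstm(emb)[0]                    # (batch, seq_len, 2 * nhid)

# option 2: keep the default layout and transpose the input
bilstm = nn.LSTM(emb_dim, nhid, bidirectional=True)
outp = bilstm(emb.transpose(0, 1))[0]    # (seq_len, batch, 2 * nhid)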
