explorerfreda / structured-self-attentive-sentence-embedding


An open-source implementation of the paper "A Structured Self-Attentive Sentence Embedding" (Lin et al., ICLR 2017).

License: GNU General Public License v3.0

Python 100.00%

structured-self-attentive-sentence-embedding's People

Contributors

explorerfreda

structured-self-attentive-sentence-embedding's Issues

This is Weird

I think the LSTM hidden state size should be (layer, token, hidden), but in your model it is (layer, batch, hidden). Is this right?

Also, I think input_data is dataXtoken, but yours is dataXtokens[n] (where n goes from 0 to the token size).

Finally, you use the LSTM output, not the hidden state. Is that right?
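
For reference, here is a minimal sketch (generic PyTorch, not the repository's code) of the shapes that nn.LSTM returns with the default sequence-first layout; the hidden state is (num_layers * num_directions, batch, hidden), while the per-token vectors live in the output tensor:

import torch
import torch.nn as nn

seq_len, batch, emb_dim, nhid = 7, 4, 10, 16
lstm = nn.LSTM(emb_dim, nhid, num_layers=1, bidirectional=True)

inp = torch.randn(seq_len, batch, emb_dim)   # default layout: (seq_len, batch, emb_dim)
output, (h_n, c_n) = lstm(inp)

print(output.shape)  # (seq_len, batch, 2 * nhid): one vector per token
print(h_n.shape)     # (num_layers * 2, batch, nhid): only the final time step per direction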

Penalty Term Frobenius Norm Squared

def Frobenius(mat):
    size = mat.size()
    if len(size) == 3:  # batched matrix
        ret = (torch.sum(torch.sum((mat ** 2), 1), 2).squeeze() + 1e-10) ** 0.5
        return torch.sum(ret) / size[0]
    else:
        raise Exception('matrix for computing Frobenius norm should be with 3 dims')

In the code above, the Frobenius norm of the matrix is computed as ret and then averaged over the batch dimension. However, in the original paper the penalty term is the squared norm. Is this intended, or does it not matter much in practice? Thanks!
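
For comparison, a minimal sketch (my own, not from the repository) of the penalty as written in the paper, i.e. the squared Frobenius norm of A·Aᵀ − I averaged over the batch; the function name frobenius_squared is hypothetical:

import torch

def frobenius_squared(mat):
    # mat is the batched matrix A A^T - I, with shape (batch, r, r)
    if mat.dim() != 3:
        raise ValueError('expected a 3-dim batched matrix')
    # sum of squared entries per matrix gives ||.||_F^2; then average over the batch
    return torch.sum(mat ** 2) / mat.size(0)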

About the GloVe model

Recently, I used torchtext to obtain the GloVe model. From this module I got the dictionary that maps each word to an index and the embedding matrix (shape word_count * dim, a torch.FloatTensor). To create the file that train.py can use, I wrote my code like this:

t = (dictionary, embedding_matrix, dim)
torch.save(t, 'mypath/glove.pt')

Is this glove.pt file in the format that your program expects?
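
For context, a minimal sketch of how such a tuple can be built with torchtext (assuming torchtext.vocab.GloVe, whose stoi and vectors attributes give the word-to-index mapping and the embedding matrix; the output path is a placeholder):

import torch
from torchtext.vocab import GloVe

glove = GloVe(name='6B', dim=300)
dictionary = dict(glove.stoi)        # word -> index
embedding_matrix = glove.vectors     # FloatTensor of shape (word_count, dim)

torch.save((dictionary, embedding_matrix, 300), 'mypath/glove.pt')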

Pretrained model

Hello, could I get your pretrained model? I added this module to solve another task, but I cannot get it to train.
Thank you!

Word vectors, visualizing attention

How do I obtain the GloVe vectors in the right format? I downloaded the pretrained vectors from https://nlp.stanford.edu/projects/glove/ but it's not clear how to convert them to the expected format.
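
A minimal conversion sketch, assuming the download is a plain-text file such as glove.6B.300d.txt (one word followed by its values per line) and that train.py expects the same (dictionary, embedding matrix, dim) tuple described in the previous issue; file names here are placeholders:

import torch

dim = 300
word2idx, vectors = {}, []
with open('glove.6B.300d.txt', encoding='utf-8') as f:
    for line in f:
        parts = line.rstrip().split(' ')
        word2idx[parts[0]] = len(vectors)
        vectors.append([float(x) for x in parts[1:]])

embedding = torch.tensor(vectors)                 # (word_count, dim)
torch.save((word2idx, embedding, dim), 'glove.300d.pt')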

I tried running the code without word vectors (is this supposed to work?) but I get an exception:

/home/andreas/src/Structured-Self-Attentive-Sentence-Embedding/train.py in train(epoch_number)
     79         total_pure_loss += loss.data
     80
---> 81         if attention:  # add penalization term
     82             attentionT = torch.transpose(attention, 1, 2).contiguous()
     83             extra_loss = Frobenius(torch.bmm(attention, attentionT) - I[:attention.size(0)])

/home/andreas/.local/lib/python2.7/site-packages/torch/autograd/variable.pyc in __bool__(self)
    121             return False
    122         raise RuntimeError("bool value of Variable objects containing non-empty " +
--> 123                            torch.typename(self.data) + " is ambiguous")
    124
    125     __nonzero__ = __bool__

RuntimeError: bool value of Variable objects containing non-empty torch.FloatTensor is ambiguous

(If I replace this condition with False the code works).
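
The error itself comes from evaluating a non-empty Variable in a boolean context; a sketch of a possible fix, assuming the model returns attention=None when attention is disabled, is to test for None explicitly instead of replacing the condition with False:

if attention is not None:  # add penalization term only when attention is actually produced
    attentionT = torch.transpose(attention, 1, 2).contiguous()
    extra_loss = Frobenius(torch.bmm(attention, attentionT) - I[:attention.size(0)])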

Lastly, how can I obtain the kind of attention visualizations shown in the paper?
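
On the visualization question, a minimal sketch (assuming you can extract the attention matrix for a single sentence from the model's forward pass) is to plot it as a heatmap, one row per attention hop and one column per token:

import matplotlib.pyplot as plt

def plot_attention(attention, tokens):
    # attention: array of shape (hops, seq_len) for one sentence; tokens: list of words
    fig, ax = plt.subplots()
    ax.imshow(attention, aspect='auto', cmap='Reds')
    ax.set_xticks(range(len(tokens)))
    ax.set_xticklabels(tokens, rotation=90)
    ax.set_ylabel('attention hop')
    fig.tight_layout()
    plt.show()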

About the dimension of the input to the BiLSTM

In lines 43-44 of models.py:
emb = self.drop(self.encoder(inp))
outp = self.bilstm(emb, hidden)[0]
It directly supplies the embedding obtained from self.encoder as the input to self.bilstm.
However, the shape of emb is (batch_size, seq_len, embedding_dim),
while the input shape the LSTM expects by default is (seq_len, batch_size, embedding_dim).
Is this a problem? Thanks.
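
For reference, a minimal illustration (generic PyTorch, not the repository's code) of the two layouts: by default nn.LSTM expects (seq_len, batch, emb_dim), so a (batch, seq_len, emb_dim) embedding needs either batch_first=True at construction time or a transpose before the call:

import torch
import torch.nn as nn

batch_size, seq_len, emb_dim, nhid = 4, 7, 10, 16
emb = torch.randn(batch_size, seq_len, emb_dim)

# option 1: declare the batch dimension first
bilstm = nn.LSTM(emb_dim, nhid, bidirectional=True, batch_first=True)
outp = bilstm(emb)[0]                    # (batch, seq_len, 2 * nhid)

# option 2: keep the default layout and transpose the input
bilstm = nn.LSTM(emb_dim, nhid, bidirectional=True)
outp = bilstm(emb.transpose(0, 1))[0]    # (seq_len, batch, 2 * nhid)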
