replicating-bogdanova-et-al.-2015-duplicate-question-detection's People

Contributors

joaoantonioverdade

replicating-bogdanova-et-al.-2015-duplicate-question-detection's Issues

Replication on Quora dataset?

Hi! I am trying to run an experiment on the Quora dataset. I am using the dataset split provided by https://github.com/zhiguowang/BiMPM and created a quora.w2v file in the same way as askubuntu.w2v and meta.w2v. I got the following error:

Using Theano backend.
INFO:Reading training sentence pairs from data/quora/train.tsv:
/ 298204 Elapsed Time: 0:10:34
/home/andrada.pumnea/anaconda3/lib/python3.6/site-packages/bs4/__init__.py:219: UserWarning: "b'.'" looks like a filename, not markup. You shouldprobably open this file and pass the filehandle intoBeautiful Soup.
'Beautiful Soup.' % markup)
| 384347 Elapsed Time: 0:13:40
INFO:...read 384348 pairs in 820.31 seconds.
INFO:...class distribution: 0 = 245042 (63.8%) | 1 = 139306 (36.2%)
INFO:Reading validation sentence pairs from data/quora/dev.tsv:
| 9999 Elapsed Time: 0:00:21
INFO:...read 10000 pairs in 21.21 seconds.
INFO:...class distribution: 0 = 5000 (50.0%) | 1 = 5000 (50.0%)
INFO:Reading testing sentence pairs from data/quora/test.tsv:
| 9999 Elapsed Time: 0:00:21
INFO:...read 10000 pairs in 21.26 seconds.
INFO:...class distribution: 0 = 5000 (50.0%) | 1 = 5000 (50.0%)
INFO:Vectorizing data:
INFO:...fitted tokenizer in 14.60 seconds;
INFO:...found 103831 unique tokens;
INFO:Load embeddings from models/quora2.w2v:
INFO:...read 36111 word embeddings in 2.82 seconds;
INFO:...created embedding matrix with shape (103832, 200);
INFO:...cached matrix in file models/quora2.w2v.min.cache.npy.
INFO:Creating CNN model:
INFO:...model created.
INFO:Compiling model:
INFO:...model 0105d13fe81945018824e64905d8f7ad compiled with optimizer: <keras.optimizers.SGD object at 0x7fd9dd23cef0>, lr (sgd-only): 0.005, loss: mse.
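Model summary:
____________________________________________________________________________________________________
Layer (type)                     Output Shape          Param #     Connected to
====================================================================================================
input_1 (InputLayer)             (None, None)          0
____________________________________________________________________________________________________
input_2 (InputLayer)             (None, None)          0
____________________________________________________________________________________________________
embedding_1 (Embedding)          (None, None, 200)     20766400    input_1[0][0]
                                                                   input_2[0][0]
____________________________________________________________________________________________________
convolution1d_1 (Convolution1D)  (None, None, 300)     180300      embedding_1[0][0]
                                                                   embedding_1[1][0]
____________________________________________________________________________________________________
globalmaxpooling1d_1 (GlobalMaxPo(None, 300)           0           convolution1d_1[0][0]
                                                                   convolution1d_1[1][0]
____________________________________________________________________________________________________
activation_1 (Activation)        (None, 300)           0           globalmaxpooling1d_1[0][0]
                                                                   globalmaxpooling1d_1[1][0]
____________________________________________________________________________________________________
merge_1 (Merge)                  (None, 1)             0           activation_1[0][0]
                                                                   activation_1[1][0]
====================================================================================================
Total params: 20946700
____________________________________________________________________________________________________
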

INFO:Train on 384348 samples, validate on 10000 samples
INFO:Epoch 1/1
2% (11127 of 384348) |### | Elapsed Time: 0:23:50 ETA: 13:16:51
Parameter 8 to routine SGEMM NTCSGEMV SGER was incorrect
Floating point exception (core dumped)

I am using Ubuntu 16.04.3.

Any idea why it happened and how it can be fixed?
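
One way to narrow this down, sketched under assumptions: in the reference BLAS interface, parameter 8 of SGEMM is the leading dimension LDA, and an invalid value there can arise when one of the matrices has a zero-sized dimension. The Beautiful Soup warning about "b'.'" suggests that at least one question consists only of punctuation, which may tokenize to an empty sequence. Whether that is the actual cause of this crash is not confirmed. The Python sketch below flags pairs where either question has fewer than `min_tokens` word tokens, so they can be inspected or filtered before training. The function name `find_short_pairs`, the regex tokenizer, the column order (label, question 1, question 2, id) and the threshold of 3 (chosen because 180300 = 3 * 200 * 300 + 300 implies a filter width of 3) are all assumptions and not part of the repository.

```python
import re

# Assumption: a simple word-character tokenizer; the repository's tokenizer may differ.
TOKEN_RE = re.compile(r"\w+")


def find_short_pairs(lines, q1_col=1, q2_col=2, min_tokens=3):
    """Return (line_number, q1, q2) for rows where either question is too short.

    Rows with too few tab-separated columns are reported as (line_number, None, None).
    """
    flagged = []
    for line_number, line in enumerate(lines, start=1):
        fields = line.rstrip("\n").split("\t")
        if len(fields) <= max(q1_col, q2_col):
            flagged.append((line_number, None, None))  # malformed row
            continue
        q1, q2 = fields[q1_col], fields[q2_col]
        shortest = min(len(TOKEN_RE.findall(q1)), len(TOKEN_RE.findall(q2)))
        if shortest < min_tokens:
            flagged.append((line_number, q1, q2))
    return flagged


# Hypothetical rows in the assumed layout: label, question 1, question 2, id.
sample = [
    "0\tHow do I learn Python quickly?\tWhat is the fastest way to learn Python?\t1",
    "0\t.\tWhat is machine learning?\t2",
    "1\tmalformed row without enough columns",
]

for row in find_short_pairs(sample):
    print(row)
```

To check the real data, the same function could be called on an open file handle, for example `find_short_pairs(open("data/quora/train.tsv", encoding="utf-8"))`. If the flagged rows line up with the batch where training stops, that would support the empty-sequence hypothesis. If nothing is flagged, the cause lies elsewhere.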
