abcnn's Introduction

ABCNN: Attention-Based Convolutional Neural Network for Modeling Sentence Pairs

[Update]: Someone has reported to me that the 'nan' loss can be attributed to the tf.sqrt function, which outputs 'nan' when its input is very small or negative. If you run into this problem, I recommend modifying the tf.sqrt calls accordingly.
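
One way to guard a square root against tiny or negative inputs is to clamp its argument to a small positive floor first. The sketch below is plain Python rather than the repository's TensorFlow code, and it assumes the square root sits inside a Euclidean-distance match score of the form 1 / (1 + distance). The names `safe_sqrt`, `match_score`, and `EPS` are hypothetical, and the value of `EPS` is an assumption. The analogous change in TensorFlow would be to clamp the argument (for example with tf.maximum) before calling tf.sqrt.

```python
import math

EPS = 1e-12  # assumed floor; tune it to the scale of your activations


def safe_sqrt(x, eps=EPS):
    """Square root with the input clamped to at least eps.

    Clamping avoids a domain error on slightly negative inputs (for example
    from floating-point round-off) and keeps the derivative
    1 / (2 * sqrt(x)) finite when x is zero.
    """
    return math.sqrt(max(x, eps))


def match_score(u, v, eps=EPS):
    """Similarity 1 / (1 + Euclidean distance) of two equal-length vectors."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(u, v))
    return 1.0 / (1.0 + safe_sqrt(squared_distance, eps))
```

With identical vectors the squared distance is exactly zero, so the clamp is what keeps the score (and, in an autodiff framework, its gradient) well defined.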

[Warning]: Some people have reported bugs that cause the loss to become NaN with ABCNN-2 and ABCNN-3. (I don't know the exact conditions under which the bugs appear.) Unfortunately, I have no plans to revise the code in the near future. Please be careful when using the code, or send me a pull request once your revised version works properly. Thanks.

This is a TensorFlow implementation of ABCNN, proposed by Wenpeng Yin et al.
It includes all four models below:

  • BCNN

    Model                       MAP     MRR
    BCNN(1 layer)    Results    0.6660  0.6813
                     Baseline   0.6629  0.6813
    BCNN(2 layer)    Results    0.6762  0.6871
                     Baseline   0.6593  0.6738
  • ABCNN-1

    Model                       MAP     MRR
    ABCNN-1(1 layer) Results    0.6652  0.6755
                     Baseline   0.6810  0.6979
    ABCNN-1(2 layer) Results    0.6702  0.6838
                     Baseline   0.6855  0.7023
  • ABCNN-2

    Model                       MAP     MRR
    ABCNN-2(1 layer) Results    0.6660  0.6813
                     Baseline   0.6885  0.7023
    ABCNN-2(2 layer) Results    ------  ------
                     Baseline   0.6879  0.7068
  • ABCNN-3

    Model                       MAP     MRR
    ABCNN-3(1 layer) Results    0.6612  0.6682
                     Baseline   0.6914  0.7127
    ABCNN-3(2 layer) Results    0.6571  0.6722
                     Baseline   0.6921  0.7105
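
For readers unfamiliar with the two metrics reported above, here is a minimal sketch of how MAP (mean average precision) and MRR (mean reciprocal rank) are commonly computed for a ranking task. It is not taken from this repository's code. The function names are hypothetical, and scoring a question with no correct answer as 0 is an assumption (some evaluation setups drop such questions instead).

```python
def average_precision(labels):
    """Average precision for one question.

    labels: relevance labels (1 = correct answer, 0 = incorrect), already
    sorted by descending model score.
    """
    hits = 0
    precision_sum = 0.0
    for rank, label in enumerate(labels, start=1):
        if label == 1:
            hits += 1
            precision_sum += hits / rank  # precision at this cut-off
    return precision_sum / hits if hits else 0.0


def reciprocal_rank(labels):
    """1 / rank of the first correct answer, or 0.0 if there is none."""
    for rank, label in enumerate(labels, start=1):
        if label == 1:
            return 1.0 / rank
    return 0.0


def map_mrr(ranked_labels_per_question):
    """Mean of both metrics over all questions."""
    n = len(ranked_labels_per_question)
    mean_ap = sum(average_precision(q) for q in ranked_labels_per_question) / n
    mean_rr = sum(reciprocal_rank(q) for q in ranked_labels_per_question) / n
    return mean_ap, mean_rr
```

Usage: pass one list per question, each holding the 0/1 labels of its candidate answers in ranked order, for example `map_mrr([[0, 1, 1], [1, 0, 0]])`.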

Note:

  • The implementation currently targets only the answer selection (AS) task on the WikiQA corpus. (I originally tried the paraphrase identification (PI) task on the MSRP (Microsoft Research Paraphrase) corpus, but the model does not seem to work without the external features that the classifier requires.)
  • My code confirms that BCNN works as the authors proposed. (I even observed better results than those in the paper.)
  • For the ABCNN models, the results are inferior to those in the paper but still somewhat competitive. Careful hyperparameter tuning and a detailed re-examination of the code may help close the gap.
  • I suspect there are some bugs in the ABCNN models (especially ABCNN-2 with 2 conv layers) and will keep an eye on the code. Please be careful when using the results.

Specification

  • preprocess.py: preprocesses the training and test data and loads word2vec.
  • train.py: trains a model with the given configuration.
  • test.py: tests the trained model.
  • ABCNN.py: implementation of the ABCNN models.
  • show.py: pyplot code for plotting test results.
  • utils.py: common utility functions.
  • MSRP_Corpus: MSRP corpus for PI.
  • WikiQA_Corpus: WikiQA corpus for AS.
  • models: saved TensorFlow models.
  • experiments: test results on the AS task.

Development Environment

  • OS: Windows 10 (64 bit)
  • Language: Python 3.5.3
  • CPU: Intel Xeon CPU E3-1231 v3 3.4 GHz
  • RAM: 16GB
  • GPU support: GTX 970
  • Libraries:
    • tensorflow 1.2.1
    • numpy 1.12.1
    • gensim 1.0.1
    • NLTK 3.2.2
    • scikit-learn 0.18.1
    • matplotlib 2.0.0

Requirements

This model relies on the pre-trained word2vec vectors (GoogleNews-vectors-negative300.bin) by T. Mikolov et al.
Download this file and place it in the root folder.
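
As a rough illustration of how such a file can be turned into fixed-size sentence inputs, the sketch below loads the vectors with gensim's `KeyedVectors.load_word2vec_format` and maps a tokenized sentence to a padded list of word vectors. It is not the code in preprocess.py. The helper names, the small random vectors for out-of-vocabulary words, the zero padding, and the default `max_len=40` (borrowed from the test command in this README) are all assumptions.

```python
import random


def load_word2vec(path="GoogleNews-vectors-negative300.bin"):
    """Load the binary word2vec file with gensim (imported lazily)."""
    from gensim.models import KeyedVectors
    return KeyedVectors.load_word2vec_format(path, binary=True)


def embed_sentence(words, vectors, dim=300, max_len=40, rng=None):
    """Map a list of tokens to exactly max_len vectors of size dim.

    vectors: any mapping that supports `word in vectors` and `vectors[word]`
    (a gensim KeyedVectors object or a plain dict).
    Unknown words get small random vectors and short sentences are padded
    with zero vectors; both choices are assumptions of this sketch.
    """
    rng = rng or random.Random(0)
    rows = []
    for word in words[:max_len]:  # truncate long sentences
        if word in vectors:
            rows.append([float(x) for x in vectors[word]])
        else:
            rows.append([rng.uniform(-0.01, 0.01) for _ in range(dim)])
    while len(rows) < max_len:  # pad short sentences
        rows.append([0.0] * dim)
    return rows
```

Usage: `embed_sentence("how are glaciers formed".split(), load_word2vec())` returns 40 rows of 300 floats, provided the word2vec file is in the working directory.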

Execution

(training): python train.py --lr=0.08 --ws=4 --l2_reg=0.0004 --epoch=20 --batch_size=64 --model_type=BCNN --num_layers=2 --data_type=WikiQA

Parameters
--lr: learning rate
--ws: window size
--l2_reg: L2 regularization modifier
--epoch: number of epochs
--batch_size: batch size
--model_type: model type
--num_layers: number of convolution layers
--data_type: MSRP or WikiQA data
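
To make the flag types explicit, the training options above could be declared with `argparse` roughly as follows. This is a sketch, not the parser in train.py. The function name `build_train_parser` is hypothetical, and the defaults are simply the values from the example command, which is an assumption about what the script itself uses.

```python
import argparse


def build_train_parser():
    """Command-line parser mirroring the documented training flags."""
    parser = argparse.ArgumentParser(description="Train a BCNN/ABCNN model (sketch).")
    parser.add_argument("--lr", type=float, default=0.08, help="learning rate")
    parser.add_argument("--ws", type=int, default=4, help="window size")
    parser.add_argument("--l2_reg", type=float, default=0.0004,
                        help="L2 regularization modifier")
    parser.add_argument("--epoch", type=int, default=20, help="number of epochs")
    parser.add_argument("--batch_size", type=int, default=64, help="batch size")
    parser.add_argument("--model_type", type=str, default="BCNN", help="model type")
    parser.add_argument("--num_layers", type=int, default=2,
                        help="number of convolution layers")
    parser.add_argument("--data_type", type=str, default="WikiQA",
                        choices=["MSRP", "WikiQA"], help="MSRP or WikiQA data")
    return parser


if __name__ == "__main__":
    args = build_train_parser().parse_args()
    print(vars(args))
```

Any flag left off the command line falls back to its default, so the flags can be overridden one at a time.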

(test): python test.py --ws=4 --l2_reg=0.0004 --epoch=20 --max_len=40 --model_type=BCNN --num_layers=2 --data_type=WikiQA --classifier=LR

Parameters
--ws: window size
--l2_reg: L2 regularization modifier
--epoch: epoch
--max_len: maximum sentence length
--model_type: model type
--num_layers: number of convolution layers
--data_type: MSRP or WikiQA data
--classifier: classifier used for the final prediction (model, LR, SVM)

MISC.
