
memn2n's Introduction

MemN2N

Implementation of End-To-End Memory Networks with an sklearn-like interface, using TensorFlow. Tasks are from the bAbI dataset.

MemN2N picture

Get Started

git clone git@github.com:domluna/memn2n.git

mkdir ./memn2n/data/
cd ./memn2n/data/
wget http://www.thespermwhale.com/jaseweston/babi/tasks_1-20_v1-2.tar.gz
tar xzvf ./tasks_1-20_v1-2.tar.gz

cd ../
python single.py

Examples

Running a single bAbI task

Running a joint model on all bAbI tasks

These files are also a good example of usage.
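A minimal sketch of the sklearn-style flow described above (the constructor arguments and the batch_fit/predict method names here are assumptions for illustration; single.py shows the actual interface):

    import numpy as np
    import tensorflow as tf
    from memn2n import MemN2N

    # Hypothetical shapes standing in for vectorized bAbI stories, queries, answers.
    batch_size, memory_size, sentence_size, vocab_size, embedding_size = 32, 50, 11, 40, 20
    S = np.random.randint(1, vocab_size, size=(batch_size, memory_size, sentence_size))
    Q = np.random.randint(1, vocab_size, size=(batch_size, sentence_size))
    A = np.zeros((batch_size, vocab_size), dtype=np.float32)
    A[:, 1] = 1.0  # one-hot answers

    with tf.Session() as sess:
        model = MemN2N(batch_size, vocab_size, sentence_size, memory_size,
                       embedding_size, session=sess, hops=3)  # assumed signature
        loss = model.batch_fit(S, Q, A)   # assumed: one SGD step on this batch
        preds = model.predict(S, Q)       # assumed: predicted answer indices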

Requirements

  • tensorflow 1.0
  • scikit-learn 0.17.1
  • six 1.10.0

Single Task Results

For a task to pass, it must reach at least 95% testing accuracy. Results were measured on single tasks using the 1k data.

Pass: 1,4,12,15,20

Several other tasks have 80%+ testing accuracy.

A stochastic gradient descent optimizer was used with an annealed learning rate schedule, as specified in Section 4.2 of End-To-End Memory Networks.

The following params were used:

  • epochs: 100
  • hops: 3
  • embedding_size: 20
Task | Training Accuracy | Validation Accuracy | Testing Accuracy
1 | 1.0 | 1.0 | 1.0
2 | 1.0 | 0.86 | 0.83
3 | 1.0 | 0.64 | 0.54
4 | 1.0 | 0.99 | 0.98
5 | 1.0 | 0.94 | 0.87
6 | 1.0 | 0.97 | 0.92
7 | 1.0 | 0.89 | 0.84
8 | 1.0 | 0.93 | 0.86
9 | 1.0 | 0.86 | 0.90
10 | 1.0 | 0.80 | 0.78
11 | 1.0 | 0.92 | 0.84
12 | 1.0 | 1.0 | 1.0
13 | 0.99 | 0.94 | 0.90
14 | 1.0 | 0.97 | 0.93
15 | 1.0 | 1.0 | 1.0
16 | 0.81 | 0.47 | 0.44
17 | 0.76 | 0.65 | 0.52
18 | 0.97 | 0.96 | 0.88
19 | 0.40 | 0.17 | 0.13
20 | 1.0 | 1.0 | 1.0

Joint Training Results

Pass: 1,6,9,10,12,13,15,20

Again, a stochastic gradient descent optimizer was used with an annealed learning rate schedule, as specified in Section 4.2 of End-To-End Memory Networks.

The following params were used:

  • epochs: 60
  • hops: 3
  • embedding_size: 40
Task | Training Accuracy | Validation Accuracy | Testing Accuracy
1 | 1.0 | 0.99 | 0.999
2 | 1.0 | 0.84 | 0.849
3 | 0.99 | 0.72 | 0.715
4 | 0.96 | 0.86 | 0.851
5 | 1.0 | 0.92 | 0.865
6 | 1.0 | 0.97 | 0.964
7 | 0.96 | 0.87 | 0.851
8 | 0.99 | 0.89 | 0.898
9 | 0.99 | 0.96 | 0.96
10 | 1.0 | 0.96 | 0.928
11 | 1.0 | 0.98 | 0.93
12 | 1.0 | 0.98 | 0.982
13 | 0.99 | 0.98 | 0.976
14 | 1.0 | 0.81 | 0.877
15 | 1.0 | 1.0 | 0.983
16 | 0.64 | 0.45 | 0.44
17 | 0.77 | 0.64 | 0.547
18 | 0.85 | 0.71 | 0.586
19 | 0.24 | 0.07 | 0.104
20 | 1.0 | 1.0 | 0.996

Notes

Single task results are from 10 repeated trials of the single-task model across all 20 tasks with different random initializations. The performance of the model with the lowest validation accuracy for each task is shown in the table above.

Joint training results are from 10 repeated trials of the joint model across all tasks. The performance of the single model whose validation accuracy passed the most tasks (>= 0.95) is shown in the table above (joint_scores_run2.csv). The scores from all 10 runs are located in the results/ directory.

memn2n's People

Contributors

akandykeller, domluna, iamaaditya, tobegit3hub


memn2n's Issues

Add tutorial to download data before running

I followed the instructions to run single.py, but it fails. It would be better to add a tutorial on how and where to download the data.

$ python ./single.py
Started Task: 1
Traceback (most recent call last):
  File "./single.py", line 32, in <module>
    train, test = load_task(FLAGS.data_dir, FLAGS.task_id)
  File "/home/tobe/code/memn2n/data_utils.py", line 14, in load_task
    files = os.listdir(data_dir)
OSError: [Errno 2] No such file or directory: 'data/tasks_1-20_v1-2/en/'

Change name to more general

Probably a good idea to change the name so other models can be incorporated; maybe memory_models would make more sense?

Difference between code and paper

Hi, thank you for your code! It is very helpful.

I noticed a difference between your code and the original paper. The paper uses an embedding to get c for each story, and directly adds o and u to form the input to the prediction layer (or the u for the next hop in the multi-hop case). In your code, c is given the same value as m rather than being recalculated, and o is multiplied by a matrix you call H before being added to u. I am wondering why you do it this way? I haven't tested the difference. Will it affect the performance?
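For context, the update described in the issue matches the layer-wise (RNN-like) weight-tying variant from Section 2.2 of the End-To-End Memory Networks paper (this is a reading of the paper, not a statement about the author's intent), where the controller state between hops is updated as

    u^{k+1} = H u^k + o^k

rather than the simpler u^{k+1} = u^k + o^k used with adjacent weight tying.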

something wrong at nonlin

nonlinearity

            if self._nonlin:
                u_k = nonlin(u_k)

            u.append(u_k)

Unresolved reference: nonlin. How do I fix it?
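A plausible fix (an assumption on my part, not something confirmed by the maintainer) is to call the nonlinearity stored on the instance instead of an undefined module-level name:

            if self._nonlin:
                u_k = self._nonlin(u_k)  # call the callable saved on the model

            u.append(u_k)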

Found joint.py errors

n_train/20, n_val/20, and n_test/20 cause errors in Python 3.

I modified:

    n_train/20 -> n_train//20
    n_val/20 -> n_val//20
    n_test/20 -> n_test//20

and it works.
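For context, this is the standard Python 3 pitfall: / always returns a float, and range() requires integer arguments, so // (floor division) keeps the step an int. A minimal illustration (the value of n_train here is just a placeholder):

    n_train = 18000
    # Works in Python 3: the step is an int.
    batch_starts = list(range(0, n_train, n_train // 20))
    # range(0, n_train, n_train / 20) would raise:
    # TypeError: 'float' object cannot be interpreted as an integer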

Test

Should probably have a test somewhere!

Compare Results

Hello @domluna!
Thanks for your nice scripts. I have one question about this model. Do you know why some task results here are very different from the Facebook Matlab implementation (like tasks 11, 13, 16)? Is it because of the model's initialization?
https://github.com/vinhkhuc/MemN2N-babi-python/tree/master/bechmarks
Thank you for your response :)

Dialog tasks

Hi!

I have a question: can this model be used for the Dialog tasks?
My main concern is that the Dialog tasks assume working in a seq2seq mode, and I'm not sure whether that is the same as the QA setting.
Could you please provide some info on this?

Puzzled about the attention part

m_C = tf.reduce_sum(m_emb_C * self._encoding, 2)
c_temp = tf.transpose(m_C, [0, 2, 1])

In this part, the reduce_sum on the first line should reduce the tensor to two dimensions, so I think the transpose on the second line won't work. I am not sure whether I am getting something wrong.
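For reference, a shape walk-through under the assumption that m_emb_C is 4-D with shape (batch, memory_size, sentence_size, embedding_size) and self._encoding has shape (sentence_size, embedding_size); under that assumption the reduce_sum still leaves a 3-D tensor and the transpose is valid:

    # Assumed shapes, for illustration only:
    #   m_emb_C:        (batch, memory_size, sentence_size, embedding_size)
    #   self._encoding: (sentence_size, embedding_size)
    m_C = tf.reduce_sum(m_emb_C * self._encoding, 2)  # -> (batch, memory_size, embedding_size)
    c_temp = tf.transpose(m_C, [0, 2, 1])             # -> (batch, embedding_size, memory_size)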

Separate memory from model?

Assuming I could define a custom gradient for the nil embedding, the memory, i.e. the variables A, B, TA, and TB, could live in a separate memory component.

The main benefit would be making it easier to experiment with different models built around the memory.

To see memory slot probabilities

I am trying to see the memory slot probabilities (the probabilities associated with the different sentences) for a particular query. Is there a way to visualize them? Please help.

Thanks,
Joe
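One generic approach (a sketch only; the names probs_op, _stories, _queries, and _sess are assumptions for illustration, not part of the repo's documented interface) is to keep a handle to the attention softmax tensor when building the graph and fetch it with session.run:

    # Hypothetical: assumes the model kept its attention softmax as model.probs_op
    # and that stories/queries are already vectorized numpy arrays.
    probs = model._sess.run(model.probs_op,
                            feed_dict={model._stories: stories,
                                       model._queries: queries})
    # probs has roughly shape (batch_size, memory_size): one weight per memory slot,
    # which can then be plotted (e.g. as a heatmap) per query.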

Position Encoding

Hi domluna,
How did you get the equation in position_encoding? It seems different from the one in the paper, unless I made a silly algebra mistake...
Even so, is there an advantage to rewriting the equation the way you did? Some sort of optimization?
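For reference, the position-encoding weights given in Section 4.1 of the paper are (as I read it):

    l_{kj} = (1 - j/J) - (k/d)(1 - 2j/J)

where J is the number of words in the sentence and d is the embedding dimension; an algebraically rearranged but equivalent form would explain an expression in the code that looks different at first glance.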

running joint.py throws an error

Traceback (most recent call last):
File "joint.py", line 121, in
for start in range(0, n_train, n_train/20):
TypeError: 'float' object cannot be interpreted as an integer

This shows up after a few runs. single.py runs fine. Any idea why this could happen?

The full log is:

(mem-tf) skc@Ultron:~/Projects/qa-mem/tf-memn2n$ python joint.py
Started Joint Model
/Users/skc/anaconda/envs/mem-tf/lib/python3.5/re.py:203: FutureWarning: split() requires a non-empty pattern match.
return _compile(pattern, flags).split(string, maxsplit)
Longest sentence length 11
Longest story length 228
Average story length 9
Training Size 18000
Validation Size 2000
Testing Size 20000
(18000, 50, 11) (2000, 50, 11) (20000, 50, 11)
(18000, 11) (2000, 11) (20000, 11)
(18000, 175) (2000, 175) (20000, 175)
WARNING:tensorflow:tf.op_scope(values, name, default_name) is deprecated, use tf.name_scope(name, default_name, values)
WARNING:tensorflow:tf.op_scope(values, name, default_name) is deprecated, use tf.name_scope(name, default_name, values)
WARNING:tensorflow:tf.op_scope(values, name, default_name) is deprecated, use tf.name_scope(name, default_name, values)
WARNING:tensorflow:tf.op_scope(values, name, default_name) is deprecated, use tf.name_scope(name, default_name, values)
WARNING:tensorflow:tf.op_scope(values, name, default_name) is deprecated, use tf.name_scope(name, default_name, values)
WARNING:tensorflow:tf.op_scope(values, name, default_name) is deprecated, use tf.name_scope(name, default_name, values)
WARNING:tensorflow:tf.op_scope(values, name, default_name) is deprecated, use tf.name_scope(name, default_name, values)
Traceback (most recent call last):
File "joint.py", line 121, in
for start in range(0, n_train, n_train/20):
TypeError: 'float' object cannot be interpreted as an integer

tokenize function code in data_utils.py is incorrect

Given the intended behavior shown in the docstring test:

>>> tokenize('Bob dropped the apple. Where is the apple?')
    ['Bob', 'dropped', 'the', 'apple', '.', 'Where', 'is', 'the', 'apple', '?']

we should write it like this:

import re

def tokenize(sent):
    # words (keeping contractions) and individual punctuation marks
    return re.findall(r"\w+(?:'\w+)?|[^\w\s]", sent)

Support for Ragged/Jagged arrays

On this line, it is mentioned that there is no support for jagged arrays, but the newer TensorFlow v2.1.0 has introduced RaggedTensor.
It would be nice if support for this feature could be provided in the current codebase.

fix 0 logits in input module

Currently, because the nil embedding is 0 (which is fine) and we pad stories to a specified memory size, we tend to have a bunch of memories that are empty: [0 0 ... 0]. The problem is that we feed these into a softmax as-is, and exp(0) = 1, so on the output the empty memories get a uniform nonzero probability. This is problematic because it distorts the probabilities of the non-empty memories.

So the solution is to add a largish negative number to the logits of empty memories before the softmax is applied. Then the exp() of that value will be 0, or close enough.
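A minimal sketch of that masking (the names dotted and m are placeholders for the attention logits and the embedded memories, not the actual variable names in the code):

    import tensorflow as tf

    # dotted: attention logits, shape (batch, memory_size)
    # m:      embedded memories, shape (batch, memory_size, embedding_size)
    dotted = tf.placeholder(tf.float32, [None, 50])
    m = tf.placeholder(tf.float32, [None, 50, 20])

    # A slot is "empty" when its embedding is all zeros (the nil padding).
    empty = tf.cast(tf.equal(tf.reduce_sum(tf.abs(m), axis=2), 0.0), tf.float32)

    # Push empty slots toward -inf so softmax gives them ~0 probability
    # instead of exp(0) = 1.
    probs = tf.nn.softmax(dotted + empty * -1e9)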

This issue is particularly evident in task 4, where each story consists of 2 sentences. If we make the memory size large, say 50 (only 2 are needed), two things tend to occur:

  1. We converge at a much slower rate
  2. We get a worse error rate

An alternative solution would be to make the batch size 1 (at least at a low level; a higher-level API can make this nicer). This way the memory can be of any size, since nothing in the underlying algorithm relies on the memory being a fixed size (at least I think this is the case; I have to double-check!).
