
memn2n's Introduction

MemN2N

Implementation of End-To-End Memory Networks with an sklearn-like interface, using TensorFlow. Tasks are from the bAbI dataset.

MemN2N picture

Get Started

git clone git@github.com:domluna/memn2n.git

mkdir ./memn2n/data/
cd ./memn2n/data/
wget http://www.thespermwhale.com/jaseweston/babi/tasks_1-20_v1-2.tar.gz
tar xzvf ./tasks_1-20_v1-2.tar.gz

cd ../
python single.py

Examples

Running a single bAbI task

Running a joint model on all bAbI tasks

These files are also a good example of usage.
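A minimal sketch of the sklearn-style flow described above (the constructor arguments and the batch_fit/predict method names here are assumptions for illustration; single.py shows the actual interface):

    import numpy as np
    import tensorflow as tf
    from memn2n import MemN2N

    # Hypothetical shapes standing in for vectorized bAbI stories, queries, answers.
    batch_size, memory_size, sentence_size, vocab_size, embedding_size = 32, 50, 11, 40, 20
    S = np.random.randint(1, vocab_size, size=(batch_size, memory_size, sentence_size))
    Q = np.random.randint(1, vocab_size, size=(batch_size, sentence_size))
    A = np.zeros((batch_size, vocab_size), dtype=np.float32)
    A[:, 1] = 1.0  # one-hot answers

    with tf.Session() as sess:
        model = MemN2N(batch_size, vocab_size, sentence_size, memory_size,
                       embedding_size, session=sess, hops=3)  # assumed signature
        loss = model.batch_fit(S, Q, A)   # assumed: one SGD step on this batch
        preds = model.predict(S, Q)       # assumed: predicted answer indices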

Requirements

  • tensorflow 1.0
  • scikit-learn 0.17.1
  • six 1.10.0

Single Task Results

For a task to pass, it must reach at least 95% testing accuracy. Results were measured on single tasks using the 1k data.

Pass: 1,4,12,15,20

Several other tasks have 80%+ testing accuracy.

A stochastic gradient descent optimizer was used with an annealed learning rate schedule, as specified in Section 4.2 of End-To-End Memory Networks.

The following params were used:

  • epochs: 100
  • hops: 3
  • embedding_size: 20
Task | Training Accuracy | Validation Accuracy | Testing Accuracy
1 | 1.0 | 1.0 | 1.0
2 | 1.0 | 0.86 | 0.83
3 | 1.0 | 0.64 | 0.54
4 | 1.0 | 0.99 | 0.98
5 | 1.0 | 0.94 | 0.87
6 | 1.0 | 0.97 | 0.92
7 | 1.0 | 0.89 | 0.84
8 | 1.0 | 0.93 | 0.86
9 | 1.0 | 0.86 | 0.90
10 | 1.0 | 0.80 | 0.78
11 | 1.0 | 0.92 | 0.84
12 | 1.0 | 1.0 | 1.0
13 | 0.99 | 0.94 | 0.90
14 | 1.0 | 0.97 | 0.93
15 | 1.0 | 1.0 | 1.0
16 | 0.81 | 0.47 | 0.44
17 | 0.76 | 0.65 | 0.52
18 | 0.97 | 0.96 | 0.88
19 | 0.40 | 0.17 | 0.13
20 | 1.0 | 1.0 | 1.0

Joint Training Results

Pass: 1,6,9,10,12,13,15,20

Again, a stochastic gradient descent optimizer was used with an annealed learning rate schedule, as specified in Section 4.2 of End-To-End Memory Networks.

The following params were used:

  • epochs: 60
  • hops: 3
  • embedding_size: 40
Task | Training Accuracy | Validation Accuracy | Testing Accuracy
1 | 1.0 | 0.99 | 0.999
2 | 1.0 | 0.84 | 0.849
3 | 0.99 | 0.72 | 0.715
4 | 0.96 | 0.86 | 0.851
5 | 1.0 | 0.92 | 0.865
6 | 1.0 | 0.97 | 0.964
7 | 0.96 | 0.87 | 0.851
8 | 0.99 | 0.89 | 0.898
9 | 0.99 | 0.96 | 0.96
10 | 1.0 | 0.96 | 0.928
11 | 1.0 | 0.98 | 0.93
12 | 1.0 | 0.98 | 0.982
13 | 0.99 | 0.98 | 0.976
14 | 1.0 | 0.81 | 0.877
15 | 1.0 | 1.0 | 0.983
16 | 0.64 | 0.45 | 0.44
17 | 0.77 | 0.64 | 0.547
18 | 0.85 | 0.71 | 0.586
19 | 0.24 | 0.07 | 0.104
20 | 1.0 | 1.0 | 0.996

Notes

Single task results are from 10 repeated trials of the single-task model across all 20 tasks with different random initializations. The performance of the model with the lowest validation accuracy for each task is shown in the table above.

Joint training results are from 10 repeated trials of the joint model across all tasks. The performance of the single model whose validation accuracy passed the most tasks (>= 0.95) is shown in the table above (joint_scores_run2.csv). The scores from all 10 runs are located in the results/ directory.

memn2n's People

Contributors

akandykeller, domluna, iamaaditya, tobegit3hub


memn2n's Issues

Add tutorial to download data before running

I followed the instructions to run single.py, but it fails. It would be better to add a tutorial on how and where to download the data.

$ python ./single.py
Started Task: 1
Traceback (most recent call last):
  File "./single.py", line 32, in <module>
    train, test = load_task(FLAGS.data_dir, FLAGS.task_id)
  File "/home/tobe/code/memn2n/data_utils.py", line 14, in load_task
    files = os.listdir(data_dir)
OSError: [Errno 2] No such file or directory: 'data/tasks_1-20_v1-2/en/'

Change name to more general

Probably a good idea to change the name so other models can be incorporated; maybe memory_models would make more sense?

Difference between code and paper

Hi, thank you for your code! It is very helpful.

I noticed a difference between your code and the original paper. The paper uses an embedding to get c for each story, and directly adds o and u to form the input to the prediction layer (or the u for the next hop in the multi-hop case). In your code, c is given the same value as m rather than being recalculated, and o is multiplied by a matrix you call H before being added to u. I am wondering why you do it this way? I haven't tested the difference. Will it affect the performance?
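For context, the update described in the issue matches the layer-wise (RNN-like) weight-tying variant from Section 2.2 of the End-To-End Memory Networks paper (this is a reading of the paper, not a statement about the author's intent), where the controller state between hops is updated as

    u^{k+1} = H u^k + o^k

rather than the simpler u^{k+1} = u^k + o^k used with adjacent weight tying.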

something wrong at nonlin

nonlinearity

            if self._nonlin:
                u_k = nonlin(u_k)

            u.append(u_k)

Unresolved reference: nonlin. How do I fix it?
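A plausible fix (an assumption on my part, not something confirmed by the maintainer) is to call the nonlinearity stored on the instance instead of an undefined module-level name:

            if self._nonlin:
                u_k = self._nonlin(u_k)  # call the callable saved on the model

            u.append(u_k)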

Found joint.py errors

n_train/20, n_val/20, and n_test/20 cause errors in Python 3.

I modified:

    n_train/20 -> n_train//20
    n_val/20 -> n_val//20
    n_test/20 -> n_test//20

and it works.
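For context, this is the standard Python 3 pitfall: / always returns a float, and range() requires integer arguments, so // (floor division) keeps the step an int. A minimal illustration (the value of n_train here is just a placeholder):

    n_train = 18000
    # Works in Python 3: the step is an int.
    batch_starts = list(range(0, n_train, n_train // 20))
    # range(0, n_train, n_train / 20) would raise:
    # TypeError: 'float' object cannot be interpreted as an integer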

Test

Should probably have a test somewhere!

Compare Results

Hello @domluna!
Thanks for your nice scripts. I have one question about this model. Do you know why some task results here are very different from the Facebook Matlab implementation (like tasks 11, 13, 16)? Is it because of the model's initialization?
https://github.com/vinhkhuc/MemN2N-babi-python/tree/master/bechmarks
Thank you for your response :)

Dialog tasks

Hi!

I have a question: can this model be used for the Dialog tasks?
My main concern is that the Dialog tasks assume working in a seq2seq mode, and I'm not sure whether that is the same as the QA setting.
Could you please provide some info on this?

Puzzled about the attention part

m_C = tf.reduce_sum(m_emb_C * self._encoding, 2)
c_temp = tf.transpose(m_C, [0, 2, 1])

In this part, the reduce_sum on the first line should reduce the tensor to two dimensions, so I think the transpose on the second line won't work. I am not sure whether I am getting something wrong.
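For reference, a shape walk-through under the assumption that m_emb_C is 4-D with shape (batch, memory_size, sentence_size, embedding_size) and self._encoding has shape (sentence_size, embedding_size); under that assumption the reduce_sum still leaves a 3-D tensor and the transpose is valid:

    # Assumed shapes, for illustration only:
    #   m_emb_C:        (batch, memory_size, sentence_size, embedding_size)
    #   self._encoding: (sentence_size, embedding_size)
    m_C = tf.reduce_sum(m_emb_C * self._encoding, 2)  # -> (batch, memory_size, embedding_size)
    c_temp = tf.transpose(m_C, [0, 2, 1])             # -> (batch, embedding_size, memory_size)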

Separate memory from model?

Assuming I could define a custom gradient for the nil embedding, the memory, i.e. the variables A, B, TA, and TB, could live in a separate memory component.

The main benefit would be making it easier to experiment with different models built around the memory.

To see memory slot probabilities

I am trying to see the memory slot probabilities (the probabilities associated with the different sentences) for a particular query. Is there a way to visualize them? Please help.

Thanks,
Joe
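One generic approach (a sketch only; the names probs_op, _stories, _queries, and _sess are assumptions for illustration, not part of the repo's documented interface) is to keep a handle to the attention softmax tensor when building the graph and fetch it with session.run:

    # Hypothetical: assumes the model kept its attention softmax as model.probs_op
    # and that stories/queries are already vectorized numpy arrays.
    probs = model._sess.run(model.probs_op,
                            feed_dict={model._stories: stories,
                                       model._queries: queries})
    # probs has roughly shape (batch_size, memory_size): one weight per memory slot,
    # which can then be plotted (e.g. as a heatmap) per query.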

Position Encoding

Hi domluna,
How did you get the equation in position_encoding? It seems different from the one in the paper, unless I made a silly algebra mistake...
Even so, is there an advantage to rewriting the equation the way you did? Some sort of optimization?
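For reference, the position-encoding weights given in Section 4.1 of the paper are (as I read it):

    l_{kj} = (1 - j/J) - (k/d)(1 - 2j/J)

where J is the number of words in the sentence and d is the embedding dimension; an algebraically rearranged but equivalent form would explain an expression in the code that looks different at first glance.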

running joint.py throws an error

Traceback (most recent call last):
File "joint.py", line 121, in
for start in range(0, n_train, n_train/20):
TypeError: 'float' object cannot be interpreted as an integer

This shows up after a few runs. single.py runs fine. Any idea why this could happen?

The full log is:

(mem-tf) skc@Ultron:~/Projects/qa-mem/tf-memn2n$ python joint.py
Started Joint Model
/Users/skc/anaconda/envs/mem-tf/lib/python3.5/re.py:203: FutureWarning: split() requires a non-empty pattern match.
return _compile(pattern, flags).split(string, maxsplit)
Longest sentence length 11
Longest story length 228
Average story length 9
Training Size 18000
Validation Size 2000
Testing Size 20000
(18000, 50, 11) (2000, 50, 11) (20000, 50, 11)
(18000, 11) (2000, 11) (20000, 11)
(18000, 175) (2000, 175) (20000, 175)
WARNING:tensorflow:tf.op_scope(values, name, default_name) is deprecated, use tf.name_scope(name, default_name, values)
WARNING:tensorflow:tf.op_scope(values, name, default_name) is deprecated, use tf.name_scope(name, default_name, values)
WARNING:tensorflow:tf.op_scope(values, name, default_name) is deprecated, use tf.name_scope(name, default_name, values)
WARNING:tensorflow:tf.op_scope(values, name, default_name) is deprecated, use tf.name_scope(name, default_name, values)
WARNING:tensorflow:tf.op_scope(values, name, default_name) is deprecated, use tf.name_scope(name, default_name, values)
WARNING:tensorflow:tf.op_scope(values, name, default_name) is deprecated, use tf.name_scope(name, default_name, values)
WARNING:tensorflow:tf.op_scope(values, name, default_name) is deprecated, use tf.name_scope(name, default_name, values)
Traceback (most recent call last):
File "joint.py", line 121, in
for start in range(0, n_train, n_train/20):
TypeError: 'float' object cannot be interpreted as an integer

tokenize function code in data_utils.py is incorrect

Given the intended behavior shown in the docstring test:

>>> tokenize('Bob dropped the apple. Where is the apple?')
    ['Bob', 'dropped', 'the', 'apple', '.', 'Where', 'is', 'the', 'apple', '?']

we should write it like this:

import re

def tokenize(sent):
    # words (keeping contractions) and individual punctuation marks
    return re.findall(r"\w+(?:'\w+)?|[^\w\s]", sent)

Support for Ragged/Jagged arrays

On this line, it is mentioned that there is no support for jagged arrays, but the newer TensorFlow v2.1.0 has introduced RaggedTensor.
It would be nice if support for this feature could be provided in the current codebase.

fix 0 logits in input module

Currently, because the nil embedding is 0 (which is fine) and we pad stories to a specified memory size, we tend to have a bunch of memories that are empty: [0 0 ... 0]. The problem is that we feed these into a softmax as-is, and exp(0) = 1, so on the output the empty memories get a uniform nonzero probability. This is problematic because it distorts the probabilities of the non-empty memories.

So the solution is to add a largish negative number to the logits of empty memories before the softmax is applied. Then the exp() of that value will be 0, or close enough.
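A minimal sketch of that masking (the names dotted and m are placeholders for the attention logits and the embedded memories, not the actual variable names in the code):

    import tensorflow as tf

    # dotted: attention logits, shape (batch, memory_size)
    # m:      embedded memories, shape (batch, memory_size, embedding_size)
    dotted = tf.placeholder(tf.float32, [None, 50])
    m = tf.placeholder(tf.float32, [None, 50, 20])

    # A slot is "empty" when its embedding is all zeros (the nil padding).
    empty = tf.cast(tf.equal(tf.reduce_sum(tf.abs(m), axis=2), 0.0), tf.float32)

    # Push empty slots toward -inf so softmax gives them ~0 probability
    # instead of exp(0) = 1.
    probs = tf.nn.softmax(dotted + empty * -1e9)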

This issue is particularly evident in task 4, where each story consists of 2 sentences. If we make the memory size large, say 50 (only 2 are needed), two things tend to occur:

  1. We converge at a much slower rate
  2. We get a worse error rate

An alternative solution would be to make the batch size 1 (at least at a low level; a higher-level API can make this nicer). This way the memory can be of any size, since nothing in the underlying algorithm relies on the memory being a fixed size (at least I think this is the case; I have to double-check!).
