SME

The architecture of this package was designed by Xavier Glorot (https://github.com/glorotxa), with some contributions from Antoine Bordes (https://www.hds.utc.fr/~bordesan).

Update (Nov 13): the code for Translating Embeddings (see https://everest.hds.utc.fr/doku.php?id=en:transe) has been included along with a new version for Freebase (FB15k).

  1. Overview

This package provides Theano-based scripts to train and evaluate the following models on several datasets:

  • Structured Embeddings (SE), defined in (Bordes et al., AAAI 2011);
  • Semantic Matching Energy (SME_lin & SME_bil), defined in (Bordes et al., MLJ 2013);
  • Translating Embeddings (TransE), defined in (Bordes et al., NIPS 2013);
  • TATEC, defined in (Garcia-Duran et al., ECML 2014, arXiv 2015).

Please refer to the project pages (https://everest.hds.utc.fr/doku.php?id=en:smemlj12 and https://everest.hds.utc.fr/doku.php?id=en:transe) for more details and references.

Content of the package:

  • model.py : contains the classes and functions to create the different models and Theano functions (training, evaluation...).
  • {dataset}_exp.py : contains an experiment function to train all the different models on a given dataset.
  • The data/ folder contains the data files for the learning scripts.
  • in the {dataset}/ folders:
    • {dataset}_parse.py : parses and creates data files for the training script of a given dataset.
    • {dataset}_evaluation.py : contains evaluation functions for a given dataset.
    • {dataset}_{model_name}.py : runs the best hyperparameters experiment for a given dataset and a given model.
    • {dataset}_{model_name}.out : output we obtained on our machines for a given dataset and a given model using the script above.
    • {dataset}_test.py : performs quick runs with small models of all types to test the scripts.

The datasets currently available include WordNet (WN), Freebase (FB15k), Kinships, UMLS, and Nations.

  2. 3rd Party Libraries

You need to install Theano to use these scripts. They also require Python >= 2.4, NumPy >= 1.5.0, and SciPy >= 0.8. The experiment scripts are compatible with Jobman, but that library is not mandatory.

  3. Installation

Put the script folder in your PYTHONPATH.
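From inside Python, this is equivalent to the following sketch (the location of the SME checkout is an assumption; adjust it to your setup):

```python
import os
import sys

# Hypothetical location of the cloned SME repository; adjust as needed.
SME_PATH = os.path.expanduser("~/SME")

# Equivalent to `export PYTHONPATH="$PYTHONPATH:$HOME/SME"` in the shell.
if SME_PATH not in sys.path:
    sys.path.insert(0, SME_PATH)

# After this, `import model` resolves to SME's model.py.
```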

  4. Data Files Creation

Put the absolute path of the downloaded dataset (from https://everest.hds.utc.fr/doku.php?id=en:smemlj12 or https://everest.hds.utc.fr/doku.php?id=en:transe) at the beginning of the {dataset}_parse.py script and run it (the SME folder has to be your current directory). Note: running Tensor_parse.py generates data for Kinships, UMLS, and Nations at once.
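To illustrate what the parse step produces, here is a sketch (an assumption based on the data format visible elsewhere in this page, not the actual parse code): each triple component is stored as a sparse one-hot matrix with one row per entity/relation and one column per triple, which is what the training scripts consume.

```python
# Sketch of the sparse one-hot encoding the {dataset}_parse.py scripts
# appear to produce; names and values here are illustrative only.
import numpy as np
import scipy.sparse as sp

entity2idx = {"/m/alice": 0, "/m/bob": 1, "/m/carol": 2}
lhs = ["/m/alice", "/m/carol"]          # left entities of two toy triples

# One row per entity, one column per triple; a 1 marks the entity used.
mat = sp.lil_matrix((len(entity2idx), len(lhs)), dtype=np.float32)
for col, ent in enumerate(lhs):
    mat[entity2idx[ent], col] = 1.0
```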

  5. Training and Evaluating a Model

Simply run the corresponding {dataset}_{model_name}.py file (the SME/{dataset}/ folder has to be your current directory) to launch a training run. When it finishes, running {dataset}_evaluation.py with the path to the best_valid_model.pkl of the learned model runs the evaluation on the test set.
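The evaluation scripts report ranking metrics (mean rank, median rank, hits@10). A sketch of how those numbers are computed, assuming `ranks` holds the rank of the correct entity among all candidates for each test triple (toy values, not real output):

```python
import numpy as np

# Toy ranks of the correct entity for five test triples.
ranks = np.array([1, 4, 12, 500, 3])

mean_rank = ranks.mean()                   # average rank over test triples
median_rank = np.median(ranks)             # middle rank
hits_at_10 = 100.0 * (ranks <= 10).mean()  # % of triples ranked in the top 10
```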

  6. Citing

If you use this code, please provide a link to the GitHub page: https://github.com/glorotxa/SME. Also, depending on the model used, cite the paper on Structured Embeddings (Bordes et al., AAAI 2011), Semantic Matching Energy (Bordes et al., MLJ 2013), or Translating Embeddings (Bordes et al., NIPS 2013).

  7. References

  • (Garcia-Duran et al., arXiv 2015) Combining Two And Three-Way Embeddings Models for Link Prediction in Knowledge Bases. Alberto Garcia-Duran, Antoine Bordes, Nicolas Usunier and Yves Grandvalet. http://arxiv.org/abs/1506.00999
  • (Bordes et al., NIPS 2013) Translating Embeddings for Modeling Multi-relational Data. Antoine Bordes, Nicolas Usunier, Alberto Garcia-Duran, Jason Weston and Oksana Yakhnenko. In Proceedings of Neural Information Processing Systems (NIPS 26), Lake Tahoe, NV, USA, Dec. 2013.
  • (Bordes et al., MLJ 2013) A Semantic Matching Energy Function for Learning with Multi-relational Data. Antoine Bordes, Xavier Glorot, Jason Weston and Yoshua Bengio. In Machine Learning, Springer, DOI: 10.1007/s10994-013-5363-6, May 2013.
  • (Bordes et al., AAAI 2011) Learning Structured Embeddings of Knowledge Bases. Antoine Bordes, Jason Weston, Ronan Collobert and Yoshua Bengio. In Proceedings of the 25th Conference on Artificial Intelligence (AAAI), AAAI Press, 2011.

Contributors

agduran, bordesa, glorotxa, nhambletccri, xglorot

Issues

problems while reproducing results on FB15k

The procedure I followed: first git clone the repo, then, under the FB15k directory, run:

  1. python FB15k_TransE.py
  2. python FB15k_evaluation.py FB15k_TransE/best_valid_model.pkl

The result I get is

Using gpu device 0: Tesla K40m
/home/cc/jxshi/env/local/lib/python2.7/site-packages/theano/tensor/subtensor.py:114: FutureWarning: comparison to `None` will result in an elementwise object comparison in the future.
  stop in [None, length, maxsize] or
### MICRO:
    -- left   >> mean: 2872.72939, median: 682.0, hits@10: 11.131%
    -- right  >> mean: 2362.83263, median: 399.0, hits@10: 15.427%
    -- global >> mean: 2617.78101, median: 533.0, hits@10: 13.279%
### MACRO:
    -- left   >> mean: 3811.39395, median: 3553.36108, hits@10: 6.763%
    -- right  >> mean: 3343.12054, median: 3118.40843, hits@10: 10.424%
    -- global >> mean: 3577.25725, median: 2691.94641, hits@10: 8.594%

The training log is

Using gpu device 0: Tesla K40m
DD{'ndim': 50, 'test_all': 10, 'loadmodelBi': False, 'loadmodelTri': False, 'nhid': 50, 'lremb': 0.01, 'savepath': 'FB15k_TransE', 'seed': 123, 'marge': 1.0, 'simfn': 'L1', 'neval': 1000, 'dataset': 'FB15k', 'nbatches': 100, 'lrparam': 1.0, 'loademb': False, 'datapath': '../data/', 'Nrel': 1345, 'totepochs': 500, 'rhoL': 5, 'Nent': 16296, 'Nsyn': 14951, 'loadmodel': False, 'rhoE': 1, 'op': 'TransE'}
/home/cc/jxshi/env/local/lib/python2.7/site-packages/theano/tensor/subtensor.py:114: FutureWarning: comparison to `None` will result in an elementwise object comparison in the future.
  stop in [None, length, maxsize] or
batchsize: 4831
nbatches: 100
BEGIN TRAINING
-- EPOCH 10 (14.0093 seconds per epoch):
COST >> 0.7421 +/- 0.2195, % updates: 32.669%
    MEAN RANK >> valid: 2618.799, train: 1588.12
        ##### NEW BEST VALID >> test: 2506.168
    (the evaluation took 62.386 seconds)
-- EPOCH 20 (14.357 seconds per epoch):
COST >> 0.5985 +/- 0.0286, % updates: 28.821%
    MEAN RANK >> valid: 2573.779, train: 1479.3145
        ##### NEW BEST VALID >> test: 2419.6415
    (the evaluation took 69.338 seconds)
-- EPOCH 30 (14.5237 seconds per epoch):
COST >> 0.59 +/- 0.0283, % updates: 28.66%
    MEAN RANK >> valid: 2596.0705, train: 1507.8925
    (the evaluation took 41.674 seconds)
-- EPOCH 40 (13.9154 seconds per epoch):
COST >> 0.5868 +/- 0.028, % updates: 28.612%
    MEAN RANK >> valid: 2525.7585, train: 1435.894
        ##### NEW BEST VALID >> test: 2480.78
    (the evaluation took 60.28 seconds)
-- EPOCH 50 (14.035 seconds per epoch):
COST >> 0.5861 +/- 0.0288, % updates: 28.612%
    MEAN RANK >> valid: 2544.1805, train: 1451.6185
    (the evaluation took 39.961 seconds)
-- EPOCH 60 (13.8575 seconds per epoch):
COST >> 0.5857 +/- 0.0278, % updates: 28.615%
    MEAN RANK >> valid: 2580.475, train: 1557.4625
    (the evaluation took 41.311 seconds)
-- EPOCH 70 (13.9798 seconds per epoch):
COST >> 0.586 +/- 0.0273, % updates: 28.638%
    MEAN RANK >> valid: 2547.1255, train: 1464.591
    (the evaluation took 41.477 seconds)
-- EPOCH 80 (13.6954 seconds per epoch):
COST >> 0.585 +/- 0.0284, % updates: 28.604%
    MEAN RANK >> valid: 2553.614, train: 1504.0965
    (the evaluation took 40.815 seconds)
-- EPOCH 90 (13.5098 seconds per epoch):
COST >> 0.5851 +/- 0.0283, % updates: 28.626%
    MEAN RANK >> valid: 2519.132, train: 1569.168
        ##### NEW BEST VALID >> test: 2399.125
    (the evaluation took 61.435 seconds)
-- EPOCH 100 (13.6294 seconds per epoch):
COST >> 0.5851 +/- 0.0271, % updates: 28.613%
    MEAN RANK >> valid: 2510.854, train: 1510.5975
        ##### NEW BEST VALID >> test: 2461.665
    (the evaluation took 65.216 seconds)
-- EPOCH 110 (13.6886 seconds per epoch):
COST >> 0.5851 +/- 0.0279, % updates: 28.605%
    MEAN RANK >> valid: 2578.334, train: 1477.527
    (the evaluation took 44.238 seconds)
-- EPOCH 120 (13.6551 seconds per epoch):
COST >> 0.5854 +/- 0.0277, % updates: 28.615%
    MEAN RANK >> valid: 2563.495, train: 1482.032
    (the evaluation took 42.621 seconds)
-- EPOCH 130 (13.7041 seconds per epoch):
COST >> 0.5843 +/- 0.0275, % updates: 28.606%
    MEAN RANK >> valid: 2549.267, train: 1493.6845
    (the evaluation took 42.67 seconds)
-- EPOCH 140 (13.9192 seconds per epoch):
COST >> 0.585 +/- 0.0281, % updates: 28.606%
    MEAN RANK >> valid: 2534.716, train: 1474.618
    (the evaluation took 40.11 seconds)
-- EPOCH 150 (14.1649 seconds per epoch):
COST >> 0.5853 +/- 0.0277, % updates: 28.616%
    MEAN RANK >> valid: 2538.495, train: 1502.6005
    (the evaluation took 43.591 seconds)
-- EPOCH 160 (13.7459 seconds per epoch):
COST >> 0.5852 +/- 0.0288, % updates: 28.628%
    MEAN RANK >> valid: 2563.7135, train: 1457.409
    (the evaluation took 47.238 seconds)
-- EPOCH 170 (13.9505 seconds per epoch):
COST >> 0.585 +/- 0.0284, % updates: 28.627%
    MEAN RANK >> valid: 2598.625, train: 1508.668
    (the evaluation took 41.62 seconds)
-- EPOCH 180 (13.7503 seconds per epoch):
COST >> 0.5846 +/- 0.028, % updates: 28.617%
    MEAN RANK >> valid: 2562.2285, train: 1514.315
    (the evaluation took 41.185 seconds)
-- EPOCH 190 (13.7498 seconds per epoch):
COST >> 0.5853 +/- 0.0276, % updates: 28.628%
    MEAN RANK >> valid: 2545.985, train: 1508.7025
    (the evaluation took 40.715 seconds)
-- EPOCH 200 (13.911 seconds per epoch):
COST >> 0.5846 +/- 0.0275, % updates: 28.616%
    MEAN RANK >> valid: 2545.276, train: 1489.0895
    (the evaluation took 41.367 seconds)
-- EPOCH 210 (13.6507 seconds per epoch):
COST >> 0.5852 +/- 0.0283, % updates: 28.63%
    MEAN RANK >> valid: 2521.616, train: 1501.777
    (the evaluation took 46.349 seconds)
-- EPOCH 220 (13.8924 seconds per epoch):
COST >> 0.5853 +/- 0.0279, % updates: 28.636%
    MEAN RANK >> valid: 2576.8105, train: 1464.5985
    (the evaluation took 40.534 seconds)
-- EPOCH 230 (14.0016 seconds per epoch):
COST >> 0.5851 +/- 0.0277, % updates: 28.633%
    MEAN RANK >> valid: 2581.774, train: 1488.3685
    (the evaluation took 40.768 seconds)
-- EPOCH 240 (13.6372 seconds per epoch):
COST >> 0.5847 +/- 0.0277, % updates: 28.605%
    MEAN RANK >> valid: 2561.3445, train: 1490.669
    (the evaluation took 41.671 seconds)
-- EPOCH 250 (13.588 seconds per epoch):
COST >> 0.5846 +/- 0.0275, % updates: 28.603%
    MEAN RANK >> valid: 2533.8615, train: 1491.4595
    (the evaluation took 40.431 seconds)
-- EPOCH 260 (14.1092 seconds per epoch):
COST >> 0.5852 +/- 0.0281, % updates: 28.629%
    MEAN RANK >> valid: 2524.907, train: 1529.538
    (the evaluation took 43.785 seconds)
-- EPOCH 270 (13.615 seconds per epoch):
COST >> 0.5856 +/- 0.0279, % updates: 28.649%
    MEAN RANK >> valid: 2594.1715, train: 1500.197
    (the evaluation took 41.121 seconds)
-- EPOCH 280 (13.6351 seconds per epoch):
COST >> 0.5848 +/- 0.0281, % updates: 28.625%
    MEAN RANK >> valid: 2478.206, train: 1473.8335
        ##### NEW BEST VALID >> test: 2431.339
    (the evaluation took 62.283 seconds)
-- EPOCH 290 (14.1048 seconds per epoch):
COST >> 0.5844 +/- 0.028, % updates: 28.604%
    MEAN RANK >> valid: 2594.1305, train: 1587.6305
    (the evaluation took 42.206 seconds)
-- EPOCH 300 (13.5902 seconds per epoch):
COST >> 0.5852 +/- 0.0278, % updates: 28.629%
    MEAN RANK >> valid: 2535.6355, train: 1474.9225
    (the evaluation took 42.145 seconds)
-- EPOCH 310 (13.7048 seconds per epoch):
COST >> 0.5853 +/- 0.0281, % updates: 28.633%
    MEAN RANK >> valid: 2518.5735, train: 1520.5495
    (the evaluation took 40.396 seconds)
-- EPOCH 320 (13.5596 seconds per epoch):
COST >> 0.5847 +/- 0.0285, % updates: 28.611%
    MEAN RANK >> valid: 2547.382, train: 1435.3145
    (the evaluation took 41.601 seconds)
-- EPOCH 330 (13.8951 seconds per epoch):
COST >> 0.5853 +/- 0.0277, % updates: 28.627%
    MEAN RANK >> valid: 2558.76, train: 1495.843
    (the evaluation took 40.745 seconds)
-- EPOCH 340 (13.7287 seconds per epoch):
COST >> 0.5851 +/- 0.028, % updates: 28.615%
    MEAN RANK >> valid: 2578.2635, train: 1464.18
    (the evaluation took 41.701 seconds)
-- EPOCH 350 (13.9722 seconds per epoch):
COST >> 0.5852 +/- 0.028, % updates: 28.644%
    MEAN RANK >> valid: 2570.4365, train: 1506.5885
    (the evaluation took 40.943 seconds)
-- EPOCH 360 (13.6693 seconds per epoch):
COST >> 0.585 +/- 0.0283, % updates: 28.621%
    MEAN RANK >> valid: 2592.5005, train: 1488.669
    (the evaluation took 41.489 seconds)
-- EPOCH 370 (14.0562 seconds per epoch):
COST >> 0.5853 +/- 0.0275, % updates: 28.613%
    MEAN RANK >> valid: 2527.0815, train: 1510.029
    (the evaluation took 41.186 seconds)
-- EPOCH 380 (13.5753 seconds per epoch):
COST >> 0.5851 +/- 0.0278, % updates: 28.608%
    MEAN RANK >> valid: 2566.2895, train: 1480.227
    (the evaluation took 40.583 seconds)
-- EPOCH 390 (13.662 seconds per epoch):
COST >> 0.5849 +/- 0.0286, % updates: 28.609%
    MEAN RANK >> valid: 2506.2965, train: 1561.4965
    (the evaluation took 40.525 seconds)
-- EPOCH 400 (13.5609 seconds per epoch):
COST >> 0.5852 +/- 0.0281, % updates: 28.635%
    MEAN RANK >> valid: 2606.5325, train: 1474.29
    (the evaluation took 41.654 seconds)
-- EPOCH 410 (14.1833 seconds per epoch):
COST >> 0.5856 +/- 0.0287, % updates: 28.63%
    MEAN RANK >> valid: 2551.1865, train: 1511.1175
    (the evaluation took 41.022 seconds)
-- EPOCH 420 (13.9187 seconds per epoch):
COST >> 0.5854 +/- 0.028, % updates: 28.636%
    MEAN RANK >> valid: 2579.4155, train: 1496.488
    (the evaluation took 41.552 seconds)
-- EPOCH 430 (13.7581 seconds per epoch):
COST >> 0.5853 +/- 0.0282, % updates: 28.629%
    MEAN RANK >> valid: 2539.8345, train: 1487.854
    (the evaluation took 40.044 seconds)
-- EPOCH 440 (13.6784 seconds per epoch):
COST >> 0.5851 +/- 0.0274, % updates: 28.629%
    MEAN RANK >> valid: 2585.2645, train: 1449.604
    (the evaluation took 40.061 seconds)
-- EPOCH 450 (13.9173 seconds per epoch):
COST >> 0.5853 +/- 0.0277, % updates: 28.624%
    MEAN RANK >> valid: 2522.25, train: 1470.082
    (the evaluation took 40.182 seconds)
-- EPOCH 460 (13.8374 seconds per epoch):
COST >> 0.5849 +/- 0.0279, % updates: 28.637%
    MEAN RANK >> valid: 2555.768, train: 1478.828
    (the evaluation took 42.134 seconds)
-- EPOCH 470 (13.5488 seconds per epoch):
COST >> 0.5852 +/- 0.0282, % updates: 28.617%
    MEAN RANK >> valid: 2569.101, train: 1505.802
    (the evaluation took 40.231 seconds)
-- EPOCH 480 (13.9631 seconds per epoch):
COST >> 0.5851 +/- 0.0273, % updates: 28.632%
    MEAN RANK >> valid: 2580.9845, train: 1510.754
    (the evaluation took 41.682 seconds)
-- EPOCH 490 (13.75 seconds per epoch):
COST >> 0.5849 +/- 0.0277, % updates: 28.608%
    MEAN RANK >> valid: 2596.538, train: 1509.0875
    (the evaluation took 39.438 seconds)
-- EPOCH 500 (13.7304 seconds per epoch):
COST >> 0.5853 +/- 0.0282, % updates: 28.623%
    MEAN RANK >> valid: 2559.398, train: 1533.861
    (the evaluation took 43.392 seconds)

Model fails to learn

When I trained TransE on WordNet, only the first epoch updated weights and improved the results.

$ python WN_TransE.py
Using gpu device 0: GeForce GTX 480
DD{'ndim': 20, 'test_all': 10, 'loadmodel': False, 'nhid': 20, 'lremb': 0.01, 'savepath': 'WN_TransE', 'seed': 123, 'marge': 2.0, 'simfn': 'L1', 'neval': 1000, 'dataset': 'WN', 'nbatches': 100, 'lrparam': 1.0, 'loademb': False, 'datapath': '../data/', 'Nrel': 18, 'totepochs': 1000, 'Nent': 40961, 'Nsyn': 40943, 'op': 'TransE'}
/home/minhle/.local/lib/python2.6/site-packages/numpy/core/fromnumeric.py:2499: VisibleDeprecationWarning: `rank` is deprecated; use the `ndim` attribute or function instead. To find the rank of a matrix see `numpy.linalg.matrix_rank`.
  VisibleDeprecationWarning)
/home/minhle/SME/model.py:949: UserWarning: The parameter 'updates' of theano.function() expects an OrderedDict, got <class 'collections.OrderedDict'>. Using a standard dictionary here results in non-deterministic behavior. You should use an OrderedDict if you are using Python 2.7 (theano.compat.python2x.OrderedDict for older python), or use a list of (shared, update) pairs. Do not just convert your dictionary to this type before the call as the conversion will still be non-deterministic.
  updates=updates, on_unused_input='ignore')
BEGIN TRAINING
-- EPOCH 10 (4.1381 seconds per epoch):
COST >> nan +/- nan, % updates: 55.797%
    MEAN RANK >> valid: 21031.55, train: 20901.2955
        ##### NEW BEST VALID >> test: 20779.578
    (the evaluation took 31.919 seconds)
-- EPOCH 20 (4.1195 seconds per epoch):
COST >> nan +/- nan, % updates: 0.0%
    MEAN RANK >> valid: 21031.55, train: 20901.2955
    (the evaluation took 21.398 seconds)
-- EPOCH 30 (4.1711 seconds per epoch):
COST >> nan +/- nan, % updates: 0.0%
    MEAN RANK >> valid: 21031.55, train: 20901.2955
    (the evaluation took 21.084 seconds)
-- EPOCH 40 (4.1701 seconds per epoch):
COST >> nan +/- nan, % updates: 0.0%
    MEAN RANK >> valid: 21031.55, train: 20901.2955
    (the evaluation took 21.182 seconds)
-- EPOCH 50 (4.4218 seconds per epoch):
COST >> nan +/- nan, % updates: 0.0%
    MEAN RANK >> valid: 21031.55, train: 20901.2955
    (the evaluation took 21.132 seconds)
-- EPOCH 60 (4.4139 seconds per epoch):
COST >> nan +/- nan, % updates: 0.0%
    MEAN RANK >> valid: 21031.55, train: 20901.2955
    (the evaluation took 21.03 seconds)
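One plausible culprit for NaN costs this early (an assumption, not a verified diagnosis of this run) is exploding embedding norms. The TransE paper prescribes renormalizing entity embeddings to unit L2 norm at every update step; a minimal sketch of that normalization and a NaN check:

```python
import numpy as np

# Toy ndim x Nent embedding table with deliberately large values.
rng = np.random.default_rng(0)
E = rng.normal(scale=10.0, size=(20, 100))

# Renormalize each entity embedding (column) to unit L2 norm.
norms = np.sqrt((E ** 2).sum(axis=0))
E = E / np.maximum(norms, 1e-12)
```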

doubt on the gradients

According to the models, if we use "margincost" we should take the gradients for the positive and negative triplets separately. But in model.py (take #L1208 in "TrainFn1Member", for example), they are added together before the gradients are computed. Because "costl" and "costr" share some variables, I'm wondering whether this works as the models define.
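For what it's worth, summing before differentiating is equivalent: differentiation is linear, so grad(costl + costr) equals grad(costl) + grad(costr) even when the two costs share variables. A tiny numeric check with toy hinge-style costs (not the actual model):

```python
# Two toy margin-style costs sharing the same variable x.
def costl(x):
    return max(0.0, 2.0 - x)

def costr(x):
    return max(0.0, 1.0 + x)

def num_grad(f, x, eps=1e-6):
    # Central-difference numerical gradient.
    return (f(x + eps) - f(x - eps)) / (2 * eps)

x = 1.5
g_joint = num_grad(lambda v: costl(v) + costr(v), x)  # grad of the sum
g_split = num_grad(costl, x) + num_grad(costr, x)     # sum of the grads
```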

ValueError: dimension mismatch with FB15k training

I now fed data with the right input format into the model; unfortunately, I got another exception this time:

Traceback (most recent call last):
  File "FB15k_TransE.py", line 5, in <module>
    nbatches=100, totepochs=500, test_all=10, neval=1000, savepath='FB15k_TransE', datapath='../data/')
  File "/Users/dennisulmer/Desktop/SME-master/FB15k_exp.py", line 421, in launch
    FB15kexp(state, channel)
  File "/Users/dennisulmer/Desktop/SME-master/FB15k_exp.py", line 269, in FB15kexp
    tmpl, tmpr, tmpo, tmpnl, tmpnr)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/theano/compile/function_module.py", line 871, in __call__
    storage_map=getattr(self.fn, 'storage_map', None))
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/theano/gof/link.py", line 314, in raise_with_op
    reraise(exc_type, exc_value, exc_trace)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/theano/compile/function_module.py", line 859, in __call__
    outputs = self.fn()
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/theano/gof/op.py", line 912, in rval
    r = p(n, [x[0] for x in i], o)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/theano/sparse/basic.py", line 4121, in perform
    rval = x * y
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scipy/sparse/base.py", line 388, in __rmul__
    return (self.transpose() * tr).transpose()
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scipy/sparse/base.py", line 355, in __mul__
    raise ValueError('dimension mismatch')
ValueError: dimension mismatch
Apply node that caused the error: SparseDot(Eemb, SparseVariable{csr,float64})
Toposort index: 9
Inputs types: [TensorType(float64, matrix), Sparse[float64, csr]]
Inputs shapes: [(50, 16296), (17844, 816)]
Inputs strides: [(130368, 8), 'No strides']
Inputs values: ['not shown', 'not shown']
Outputs clients: [[InplaceDimShuffle{1,0}(SparseDot.0)]]

Backtrace when the node is created(use Theano flag traceback.limit=N to make it longer):
  File "FB15k_TransE.py", line 5, in <module>
    nbatches=100, totepochs=500, test_all=10, neval=1000, savepath='FB15k_TransE', datapath='../data/')
  File "/Users/dennisulmer/Desktop/SME-master/FB15k_exp.py", line 421, in launch
    FB15kexp(state, channel)
  File "/Users/dennisulmer/Desktop/SME-master/FB15k_exp.py", line 236, in FB15kexp
    marge=state.marge, rel=False)
  File "/Users/dennisulmer/Desktop/SME-master/model.py", line 1157, in TrainFn1Member
    lhsn = S.dot(embedding.E, inpln).T

HINT: Use the Theano flag 'exception_verbosity=high' for a debugprint and storage map footprint of this apply node.

What could be the source of this problem?
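One plausible reading of the traceback (an assumption): the sparse one-hot input has 17844 rows while the embedding table was built for Nent = 16296 entries, so the inner dimensions of SparseDot cannot match. That usually means the parsed data files and the Nent/Nsyn/Nrel arguments disagree. A minimal sanity check one could run before training (a sketch, not code from the package):

```python
import numpy as np
import scipy.sparse as sp

# Shapes taken from the traceback above.
Eemb = np.zeros((50, 16296))             # ndim x Nent embedding table
one_hot = sp.csr_matrix((17844, 816))    # row count should equal Nent

# For SparseDot(Eemb, one_hot) the inner dimensions must agree.
compatible = (Eemb.shape[1] == one_hot.shape[0])
# compatible is False here, which is exactly the mismatch in the traceback.
```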

idx2identity doesn't match after slicing for TransE

Converting a relation index into its entity just after loading yields a reasonable candidate for a Freebase predicate (e.g. /base/activism/activist/area_of_activism), but if the conversion is performed after the pre-defined slicing for TransE, it seems to map to a different set of indices and thus lands on entities rather than predicates.

state.op = 'Unstructured'
testo = load_file(state.datapath + state.dataset + '-test-rel.pkl')
testo = convert2idx(testo)
print(idx2entity[testo[0]])
# >>> /award/award_nominee/award_nominations./award/award_nomination/award

state.op = 'TransE'
testo = load_file(state.datapath + state.dataset + '-test-rel.pkl')
if state.op == 'SE' or state.op == 'TransE':
    testo = testo[-state.Nrel:, :]
testo = convert2idx(testo)
print(idx2entity[testo[0]])
# >>> /m/088fh

Maybe I'm missing something?
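A plausible explanation (an assumption, not verified against model.py): for SE and TransE the relation matrix is sliced to its last Nrel rows, so the resulting indices are relative to the relation block. Adding the entity count Nsyn back recovers the global index that idx2entity expects:

```python
# Toy numbers matching the FB15k config shown earlier on this page:
# 14951 entities (Nsyn) followed by 1345 relations (Nrel).
Nsyn, Nrel = 14951, 1345

local_idx = 7                    # index obtained after the TransE slicing
global_idx = local_idx + Nsyn    # index into the full idx2entity mapping
```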

Reproduce the experiment results of TransE

With the default config in the code, I ran FB15k_TransE.py to train a best model, and I found the evaluation result of that model was really different from what the paper reports. The default number of epochs in the code is 500, and the paper says the model was trained for at most 1000 epochs. But my result was even much better than the result in the paper. Did the paper use the code in the current repo? Below is my result.

MICRO:

-- left   >> mean: 229.41149, median: 23.0, hits@10: 37.377%
-- right  >> mean: 160.86706, median: 14.0, hits@10: 45.088%
-- global >> mean: 195.13927, median: 18.0, hits@10: 41.233%

MACRO:

-- left   >> mean: 106.30351, median: 83.18991, hits@10: 55.557%
-- right  >> mean: 84.51045, median: 63.63632, hits@10: 63.104%
-- global >> mean: 95.40698, median: 33.58689, hits@10: 59.331%

IndexError: row index (13519) out of bounds before training

Hi,

I tried to train your FB15k TransE model on my (German) FB15k data. I put the three files freebase_mtr100_mte100-train.txt, freebase_mtr100_mte100-valid.txt and freebase_mtr100_mte100-test.txt into the FB15k folder and parsed them with FB15k_parse.py successfully (their format is entity<tab>relation<tab>entity, which as far as I know is the format you need?).

But when I try to run FB15k_TransE.py, I get an error

Traceback (most recent call last):
  File "FB15k_TransE.py", line 5, in <module>
    nbatches=100, totepochs=500, test_all=10, neval=1000, savepath='FB15k_TransE', datapath='../data/')
  File "/Users/dennisulmer/Desktop/SME-master/FB15k_exp.py", line 421, in launch
    FB15kexp(state, channel)
  File "/Users/dennisulmer/Desktop/SME-master/FB15k_exp.py", line 258, in FB15kexp
    trainln = create_random_mat(trainl.shape, np.arange(state.Nsyn)) # !!!!
  File "/Users/dennisulmer/Desktop/SME-master/FB15k_exp.py", line 33, in create_random_mat
    randommat[listidx[idx_term], idx_ex] = 1
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scipy/sparse/lil.py", line 282, in __setitem__
    self.rows, self.data, i, j, x)
  File "scipy/sparse/_csparsetools.pyx", line 61, in scipy.sparse._csparsetools.lil_insert (scipy/sparse/_csparsetools.c:3292)
  File "scipy/sparse/_csparsetools.pyx", line 82, in scipy.sparse._csparsetools.lil_insert (scipy/sparse/_csparsetools.c:2745)
IndexError: row index (13519) out of bounds

I tried to find the bug myself, but so far I have been unsuccessful.
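A plausible cause (an assumption): FB15k_TransE.py passes Nent/Nsyn/Nrel values hard-coded for the original FB15k (14951 entities, 1345 relations), which will not match a different dataset, so create_random_mat samples row indices that are out of bounds. Deriving the counts from your own triple files avoids this; a minimal sketch:

```python
# Count distinct entities and relations in tab-separated triple lines
# ("head<TAB>relation<TAB>tail"); these toy lines stand in for the real
# freebase_mtr100_mte100-*.txt files.
entities, relations = set(), set()
lines = ["berlin\tcapital_of\tgermany", "paris\tcapital_of\tfrance"]
for line in lines:
    head, rel, tail = line.strip().split("\t")
    entities.update([head, tail])
    relations.add(rel)

Nsyn, Nrel = len(entities), len(relations)  # pass these to the launch call
```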
