salesforce / multihopkg

Multi-hop knowledge graph reasoning learned via policy gradient with reward shaping and action dropout

Home Page: https://arxiv.org/abs/1808.10568

License: BSD 3-Clause "New" or "Revised" License

policy-gradient reinforcement-learning reward-shaping action-dropout pytorch knowledge-graph multi-hop-reasoning

multihopkg's Introduction

Multi-Hop Knowledge Graph Reasoning with Reward Shaping

This is the official code release of the following paper:

Xi Victoria Lin, Richard Socher and Caiming Xiong. Multi-Hop Knowledge Graph Reasoning with Reward Shaping. EMNLP 2018.

(Figure: MultiHopKG model architecture)

Quick Start

Environment variables & dependencies

Use Docker

Build the docker image

docker build -< Dockerfile -t multi_hop_kg:v1.0

Spin up a docker container and run experiments inside it.

nvidia-docker run -v `pwd`:/workspace/MultiHopKG -it multi_hop_kg:v1.0

The rest of this README assumes that you are working interactively inside a container. If you prefer to run experiments outside a container, please adjust the commands accordingly.

Manual setup

Alternatively, you can install PyTorch (>=0.4.1) manually and use the Makefile to set up the rest of the dependencies.

make setup

Process data

First, unpack the data files

tar xvzf data-release.tgz

and run the following command to preprocess the datasets.

./experiment.sh configs/<dataset>.sh --process_data <gpu-ID>

<dataset> is the name of any dataset folder in the ./data directory. In our experiments, the five datasets used are: umls, kinship, fb15k-237, wn18rr and nell-995. <gpu-ID> is a non-negative integer representing the GPU index.
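For example, to preprocess the umls dataset on GPU 0:

./experiment.sh configs/umls.sh --process_data 0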

Train models

Then the following commands can be used to train the proposed models and baselines in the paper. By default, dev set evaluation results will be printed when training terminates.

  1. Train embedding-based models
./experiment-emb.sh configs/<dataset>-<emb_model>.sh --train <gpu-ID>

The following embedding-based models are implemented: distmult, complex and conve.

  2. Train RL models (policy gradient)
./experiment.sh configs/<dataset>.sh --train <gpu-ID>

  3. Train RL models (policy gradient + reward shaping)
./experiment-rs.sh configs/<dataset>-rs.sh --train <gpu-ID>
  • Note: To train the RL models using reward shaping, make sure that 1) you have pre-trained the embedding-based models and 2) the file path pointers to the pre-trained embedding-based models are set correctly (see the example configuration file).
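As background, reward shaping here means replacing the binary terminal reward with the pre-trained embedding model's score when the agent stops at a wrong entity. A minimal sketch of the idea, using illustrative names that are not the repository's actual API:

def shaped_reward(e_s, r_q, e_T, answer, f):
    # R(s_T) = R_b + (1 - R_b) * f(e_s, r_q, e_T):
    # reward 1 when the agent stops at the correct answer entity, otherwise
    # fall back to the embedding model's score f, mapped to [0, 1].
    binary_reward = 1.0 if e_T == answer else 0.0
    return binary_reward + (1.0 - binary_reward) * f(e_s, r_q, e_T)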

Evaluate pretrained models

To generate the evaluation results of a pre-trained model, simply change the --train flag in the commands above to --inference.

For example, the following command performs inference with the RL models (policy gradient + reward shaping) and prints the evaluation results (on both dev and test sets).

./experiment-rs.sh configs/<dataset>-rs.sh --inference <gpu-ID>

To print the inference paths generated by beam search during inference, use the --save_beam_search_paths flag:

./experiment-rs.sh configs/<dataset>-rs.sh --inference <gpu-ID> --save_beam_search_paths
  • Note for the NELL-995 dataset:

    On this dataset we split the original training data into train.triples and dev.triples, and the final model to test has to be trained with these two files combined.

    1. To obtain the correct test set results, you need to add the --test flag to all data pre-processing, training and inference commands.
    # You may need to adjust the number of training epochs based on dev set performance.
    
    ./experiment.sh configs/nell-995.sh --process_data <gpu-ID> --test
    ./experiment-emb.sh configs/nell-995-conve.sh --train <gpu-ID> --test
    ./experiment-rs.sh configs/NELL-995-rs.sh --train <gpu-ID> --test
    ./experiment-rs.sh configs/NELL-995-rs.sh --inference <gpu-ID> --test
    
    2. Leave out the --test flag during development.

Change the hyperparameters

To change the hyperparameters and other aspects of the experiment setup, start from the configuration files.

More on implementation details

We use mini-batch training in our experiments. To reduce the amount of padding (which can cause memory issues and slow down computation for knowledge graphs that contain nodes with large fan-out), we group the action spaces of different nodes into buckets based on their sizes. A description of the bucket implementation can be found here and here.
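As an illustration of the idea, here is a minimal sketch of size-based action-space bucketing (hypothetical names and simplified logic, not the repository's actual implementation):

import torch

def bucket_action_spaces(action_spaces, bucket_interval=10):
    # action_spaces: dict mapping node id -> list of (relation, entity) pairs.
    # Nodes are grouped so padding only goes up to the bucket boundary,
    # not to the global maximum fan-out.
    buckets = {}
    for node, actions in action_spaces.items():
        # Round the action-space size up to the nearest bucket boundary.
        key = ((len(actions) + bucket_interval - 1) // bucket_interval) * bucket_interval
        buckets.setdefault(key, []).append((node, actions))
    batched = {}
    for key, members in buckets.items():
        nodes = [n for n, _ in members]
        padded = torch.zeros(len(members), key, 2, dtype=torch.long)
        for i, (_, actions) in enumerate(members):
            if actions:
                padded[i, :len(actions)] = torch.tensor(actions, dtype=torch.long)
        batched[key] = (nodes, padded)
    return batched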

Citation

If you find the resource in this repository helpful, please cite

@inproceedings{LinRX2018:MultiHopKG, 
  author = {Xi Victoria Lin and Richard Socher and Caiming Xiong}, 
  title = {Multi-Hop Knowledge Graph Reasoning with Reward Shaping}, 
  booktitle = {Proceedings of the 2018 Conference on Empirical Methods in Natural
               Language Processing, {EMNLP} 2018, Brussels, Belgium, October
               31-November 4, 2018},
  year = {2018} 
}

multihopkg's People

Contributors

owenonline, simoneliasen, svc-scm, todpole3


multihopkg's Issues

About "margin" and "num_negative_samples" hyperparames when training embedding based models

Hi, as discussed in #3, training the embedding models uses BCE loss. But the hyperparameters given in the config file still include margin and num_negative_samples. Why? The following is "fb15k-237-complex.sh":

#!/usr/bin/env bash

data_dir="data/FB15K-237"
model="complex"
add_reversed_training_edges="True"
group_examples_by_query="True"
entity_dim=200
relation_dim=200
num_rollouts=1
bucket_interval=10
num_epochs=1000
num_wait_epochs=500
batch_size=512
train_batch_size=512
dev_batch_size=128
learning_rate=0.003
grad_norm=5
emb_dropout_rate=0.3
beam_size=128
num_negative_samples=50
margin=10

Another question: when I set num_peek_epochs = n (where n may equal 10, 100, etc.), the program for training the embedding-based model always trains 2n epochs and stops. I cannot figure out why. Can you give me some help?

Question about the input

Hi, in knowledge graph reasoning, what exactly is the format of the input? This question has puzzled me for a long time.
In the dataset, (e_s, r, e_d) is a triple contained in the dataset, and (e_s, r_q, ?) is a query.
During training, what is the input: (e_s, r_q, e_T) or (e_s, r, e_d)?
If (e_s, r, e_d) is the input, is the reasoning path (e_s, r, e_d) -> (e_d, r, e_d1) -> (e_d1, r, e_d2)?
How do we determine the size of the reward at the final time step T? According to whether e_d2 equals e_T?
At test time, what is the input?
Thanks!

tensors used as indices must be long, byte or bool tensors

This error occurs when I train the model on the UMLS dataset:

Epoch 2: average training loss = -0.22322563640773296 entropy = 4.2052984714508055
=> saving checkpoint to '/content/MultiHopKG-master/model/umls-point.rs.conve-xavier-n/a-200-200-3-0.001-0.1-0.0-0.95-400-0.05/checkpoint-2.tar'
0% 0/11 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/usr/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/content/MultiHopKG-master/src/experiments.py", line 765, in <module>
    run_experiment(args)
  File "/content/MultiHopKG-master/src/experiments.py", line 746, in run_experiment
    train(lf)
  File "/content/MultiHopKG-master/src/experiments.py", line 235, in train
    lf.run_train(train_data, dev_data)
  File "/content/MultiHopKG-master/src/learn_framework.py", line 146, in run_train
    dev_scores = self.forward(dev_data, verbose=False)
  File "/content/MultiHopKG-master/src/learn_framework.py", line 202, in forward
    pred_score = self.predict(mini_batch, verbose=verbose)
  File "/content/MultiHopKG-master/src/rl/graph_search/pg.py", line 226, in predict
    pn, e1, r, e2, kg, self.num_rollout_steps, self.beam_size)
  File "/content/MultiHopKG-master/src/rl/graph_search/beam_search.py", line 168, in beam_search
    pn.update_path(action, kg, offset=action_offset)
  File "/content/MultiHopKG-master/src/rl/graph_search/pn.py", line 187, in update_path
    offset_path_history(self.path, offset)
  File "/content/MultiHopKG-master/src/rl/graph_search/pn.py", line 176, in offset_path_history
    new_tuple = tuple([_x[:, offset, :] for _x in x])
  File "/content/MultiHopKG-master/src/rl/graph_search/pn.py", line 176, in <listcomp>
    new_tuple = tuple([_x[:, offset, :] for _x in x])
IndexError: tensors used as indices must be long, byte or bool tensors

Has anyone else encountered this?
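This error usually comes from a PyTorch version mismatch: recent PyTorch releases only accept long, byte or bool tensors as indices. A minimal illustration of the distinction (an assumption about the cause, not a verified fix for this repository):

import torch

x = torch.randn(3, 5, 4)
offset = torch.tensor([2.0, 0.0, 1.0])    # float dtype triggers the IndexError
print(x[:, offset.long(), :].shape)       # cast to long for index selection
mask = torch.tensor([True, False, True, False, True])
print(x[:, mask, :].shape)                # bool tensors act as masks instead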

About the code in learn_framework.py

Hello,

(screenshot omitted)

When I run line 212, the following error occurs:
RuntimeError: Trying to create tensor with negative dimension -1: [512, -1]

So why is num_labels = -1, and how can I fix it?

Thank you!

dataset

Could you explain the dataset files dev.triples and raw.pgrk?
I am not sure what these files are for.
Thank you very much.

model:point.gc and gc

hi,
Browsing your code, I cannot find configs for the point and point.gc models, and the code implementing the graph_convolution_layers appears to be missing.
If you can update the code, I will be grateful!
Best regards!

add Action Dropout on minerva WN18RR

Hi, I tried the action dropout trick on the original MINERVA code on WN18RR. However, hits@10 decreased from 0.47 to 0.37 when the action dropout rate changed from 1.0 to 0.9. Are there any other auxiliary tricks needed for action dropout?
The following is the action dropout code, where self.params['flat_epsilon'] = float(np.finfo(float).eps):

pre_distribution = tf.nn.softmax(scores)
if mode == "train":
    # Perturb the sampling distribution with the dropout mask during training.
    pre_distribution = pre_distribution * dropout_mask + self.params['flat_epsilon'] * (1 - dropout_mask)
# Zero out padded (unavailable) actions before sampling.
dummy_scores_1 = tf.zeros_like(prelim_scores)
pre_distribution = tf.where(mask, dummy_scores_1, pre_distribution)

dist = tf.distributions.Categorical(probs=pre_distribution)
action = tf.to_int32(dist.sample())

And the dropout mask is given as follows:
rans = np.random.random(size=[self.batch_size * self.num_rollouts, self.max_num_actions])
dropout_mask = np.greater(rans, 1.0 - self.score_keepRate)

The other mask in the code above is for padding out unavailable actions.
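For comparison, here is a PyTorch sketch of action dropout as this repository applies it (simplified; the names loosely follow pg.py as seen in a stack trace in a later issue, so treat the details as approximate):

import torch

def apply_action_dropout_mask(action_dist, action_mask, action_dropout_rate, eps=1e-20):
    # action_dist: (batch, num_actions) probabilities used for sampling only;
    # the un-dropped distribution is still used for the policy-gradient loss.
    # action_mask: 1 for valid actions, 0 for padding.
    rand = torch.rand_like(action_dist)
    keep_mask = (rand > action_dropout_rate).float()
    # eps keeps every valid action reachable, so no row collapses to all
    # zeros (which would crash multinomial sampling).
    sample_dist = action_dist * keep_mask * action_mask + eps * action_mask
    return sample_dist / sample_dist.sum(dim=-1, keepdim=True)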

I also calculate the final softmax loss with the original (un-dropped) distribution, as follows:
loss = tf.nn.sparse_softmax_cross_entropy_with_logits(logits=scores, labels=label_action)

When I change self.score_keepRate from 1.0 to 0.9, training 100 batches with batch size 128, the hits@k values on the dev set are as follows:

Score for iteration 100 (keep rate 1.0):
Hits@1: 0.3771
Hits@3: 0.4370
Hits@5: 0.4591
Hits@10: 0.4796
Hits@20: 0.4852
MRR: 0.4113

Score for iteration 100 (keep rate 0.9):
Hits@1: 0.3250
Hits@3: 0.3464
Hits@5: 0.3616
Hits@10: 0.3711
Hits@20: 0.3744
MRR: 0.3395

For 1000 batches of training,

the dev set MRR varies as follows:
(plot omitted)

the hits@1 of the training batches varies as follows:
(plot omitted)

cannot replicate NELL-995 MINERVA result

Hi,

Going through the code, it seems that 'point' is equivalent to MINERVA. While I can achieve the expected performance on the remaining datasets, I am unable to get anywhere near the reported performance on NELL-995. I have tried using the exact config provided in conjunction with the --test procedure, and also the exact parameters given by MINERVA.

Below is my performance after training with the default config:
Dev:
Hits@1 = .727
Hits@3 = .893
Hits@5 = .895
Hits@10 = .901
MRR = .812
Test:
Hits@1 = .500
Hits@3 = .694
Hits@5 = .742
Hits@10 = .782
MRR = .611
And with exact hyperparameters (changing beta to .06 and gamma to 0):
Dev:
Hits@1 = .479
Hits@3 = .750
Hits@5 = .847
Hits@10 = .873
MRR = .63
Test:
Hits@1 = .380
Hits@3 = .585
Hits@5 = .676
Hits@10 = .775
MRR = .498

Is there something I am missing when it comes to training the model?

A bug in pn.py

I think "false_negative_masks" in src/rl/graph_search/pn.py:line 338 should be "false_negative_mask". It is a small bug.

about the NELL dataset

hi,
(screenshot omitted)
Just as you said, "train.large.triples = raw.kb - dev.triples", "train.dev.triples is a subset of raw.kb", and "train.dev.large.triples = raw.kb". I am not sure whether "the original data release" mentioned in the README is raw.kb or not.
So, I want to ask what the purpose of this setup is.
Also, in the README I don't understand the meaning of "and the final model to test has to be trained with these two files combined". Could you explain it?

NELL-995 dataset

hi,
I see that you published two NELL-995 datasets: NELL-995 and NELL-995.test.
As you said, "train.large.triples = raw.kb - dev.triples", "train.dev.triples is the training file from the original data", and "train.dev.large.triples = raw.kb".
So, I want to ask: what is the purpose of doing that?
Also, I wanted to find the original NELL-995 dataset but failed. If you have the dataset, could you share the link with me?
I would appreciate it if you could help me.
Best wishes!

the meaning of NELL split

Hi, thank you for your reply.
(screenshot omitted)
As the screenshot and your reply suggest, you split the raw data so that the training set takes some new triples from raw.kb alongside the dev set. From the image, I found a lot of overlap, and the development set appears to belong to the test set.
Hence, I want to ask why you do this and what purpose it serves.
Best regards!

about entity2id.txt

Hi, I read your paper and code, and there is one point of confusion.
In entity2id.txt, the ID assigned to each entity is its number of occurrences. What is the reason for this?

Can not reproduce the result on kinship dataset

Dear authors, I ran the experiment on the kinship dataset with the config file "kinship-rs.sh", but I cannot reproduce the result in the paper. This is my config file:

------------------------------Config-------------------------------------
use_action_space_bucketing="True"
bandwidth=400
entity_dim=200
relation_dim=200
history_dim=200
history_num_layers=3
num_rollouts=20
num_rollout_steps=2
bucket_interval=10
num_epochs=1000
num_wait_epochs=400
num_peek_epochs=2
batch_size=128
train_batch_size=128
dev_batch_size=32
learning_rate=0.001
baseline="n/a"
grad_norm=5
emb_dropout_rate=0.3
ff_dropout_rate=0.1
action_dropout_rate=0.9
action_dropout_anneal_interval=1000
reward_shaping_threshold=0
beta=0.05
relation_only="False"
beam_size=128

------------------------------Result with ConvE-------------------------------------
Dev set performance:
Hits@1 = 0.5655430711610487
Hits@3 = 0.8398876404494382
Hits@5 = 0.9119850187265918
Hits@10 = 0.952247191011236
MRR = 0.7152329056775273
Hits@1 = 0.7397003745318352
Hits@3 = 0.8838951310861424
Hits@5 = 0.9250936329588015
Hits@10 = 0.9550561797752809
MRR = 0.8201877377481808
Test set performance:
Hits@1 = 0.7262569832402235
Hits@3 = 0.8975791433891993
Hits@5 = 0.9348230912476723
Hits@10 = 0.9720670391061452
MRR = 0.8186909747405656

The gap between my result and the result in the paper is very large. Can you give me some advice on how to reproduce the result on kinship? Thanks very much!

Find a comment typo in pg.py

In line 108, in the comment for function rollout(), ":param q: (Variable:batch) query embedding." should be ":param q: (Variable:batch) query relation indices."

Some doubts about result

Hello, thanks for sharing your code.
I notice that the result of ConvE in your paper is much higher than in the original paper. Are the testing procedures the same for both? I am guessing beam search may have been added for ConvE, or something else is different?
Hoping for your reply soon.

cannot replicate NELL-995 MINERVA result [Ongoing]

Hi, apologies for never following up on the original comment, and for adding this issue (again). I'm not sure how to re-open the previous issue #9 .

I have recently been trying again to replicate the results reported with the implementation of MINERVA by the 'point' method but am still having trouble obtaining near reported performance.

I've tried using the 'point' method from NELL-995.sh right out of the box, as well as changing its parameters to exactly mirror MINERVA's hyperparameter setup, but haven't been able to get close to the reported performance. For instance, the best performance came from NELL-995 right out of the box (in decimal form):

HITS@1: .544
HITS@3: .735
HITS@5: .775
HITS@10: .811
MRR: .642

To answer the question from a few months ago regarding training ConvE first, I did not, but my understanding is that ConvE/embedding based methods would be needed to train the reward shaping version of MultiHopKG, instead of the 'point' version. Is this correct? Or should I still train ConvE before training 'point' for NELL-995?

Thanks very much for all the help!

Relation and entity dimension

Can you please explain whether it is possible for relations and entities to have different dimensions when doing reward shaping with ConvE? I get an assertion error when I use 200 for the entity dim and 100 for the relation dim. How can this be fixed?

How to prepare page ranks?

We are migrating other datasets to test on your model. We noticed that each of the datasets you supplied contains a raw.pgrk file. How do you generate this file?
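The repository does not document how raw.pgrk is produced, but since the name suggests per-entity PageRank scores, one plausible way to generate such a file (an assumption; inspect a released raw.pgrk before relying on the exact format) is:

import networkx as nx

# Hypothetical script: run PageRank over the subject-object graph and write
# one "entity<tab>score" line per node.
G = nx.DiGraph()
with open('train.triples') as f:
    for line in f:
        e1, e2, r = line.strip().split('\t')  # triple column order assumed
        G.add_edge(e1, e2)

scores = nx.pagerank(G)
with open('raw.pgrk', 'w') as f:
    for entity, score in scores.items():
        f.write(f'{entity}\t{score}\n')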

check the dev performance

I was checking some of the code. However, when I ran the dev performance check, the following error occurred:

new_tuple = tuple([_x[:, offset.type(torch.uint8), :] for _x in x])
IndexError: The shape of the mask [70] at index 0 does not match the shape of the indexed tensor [3, 1, 200] at index 1

I understand it is a shape mismatch, but I cannot locate the source of the error. Could you give me some suggestions? May I comment out this code?

about the function of nell split

Hi, I have read your NELL data README and have some questions. I don't understand the purpose of splitting the dataset this way. Why not just divide the dataset into train, dev and test? Can you explain it?
Thank you very much.
Also, in detail: I opened raw.kb and found only 150222 lines, which does not match the description in your paper. I am not sure of the reason.
Best wishes!

Problem with output model files

Hi,

I'm having an issue with opening the output model files and checkpoints. I have trained the RL models for the Kinship dataset. However, the output checkpoints and model_best tar files seem to be corrupted. Do you know why this might happen?
Also, how can I print the paths taken by the agent?

RuntimeError: CUDA error: device-side assert triggered

I was trying to run the model on my custom data of KG triples to compare its performance; however, I encountered a problem.

Upon running the training command for policy gradient model:
./experiment.sh configs/<model>.sh --train 0

Encountered the following error:
RuntimeError: CUDA error: device-side assert triggered

Full stack trace:

 33%|████████████████████████████████████████████████                                                                                                | 226/677 [01:38<02:55,  2.58it/s]
/pytorch/aten/src/ATen/native/cuda/MultinomialKernel.cu:256: void at::native::<unnamed>::sampleMultinomialOnce(long *, long, int, scalar_t *, scalar_t *, int, int) [with scalar_t = float, accscalar_t = float]: block: [283,0,0], thread: [0,0,0] Assertion `sum > accZero` failed.
Traceback (most recent call last):
  File "/usr/lib/python3.5/runpy.py", line 184, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib/python3.5/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/workspace/KGReasoning/code/MultiHopKG/src/experiments.py", line 765, in <module>
    run_experiment(args)
  File "/workspace/KGReasoning/code/MultiHopKG/src/experiments.py", line 746, in run_experiment
    train(lf)
  File "/workspace/KGReasoning/code/MultiHopKG/src/experiments.py", line 235, in train
    lf.run_train(train_data, dev_data)
  File "/workspace/KGReasoning/code/MultiHopKG/src/learn_framework.py", line 108, in run_train
    loss = self.loss(mini_batch)
  File "/workspace/KGReasoning/code/MultiHopKG/src/rl/graph_search/pg.py", line 58, in loss
    output = self.rollout(e1, r, e2, num_steps=self.num_rollout_steps)
  File "/workspace/KGReasoning/code/MultiHopKG/src/rl/graph_search/pg.py", line 135, in rollout
    sample_outcome = self.sample_action(db_outcomes, inv_offset)
  File "/workspace/KGReasoning/code/MultiHopKG/src/rl/graph_search/pg.py", line 205, in sample_action
    sample_outcome = sample(action_space, action_dist)
  File "/workspace/KGReasoning/code/MultiHopKG/src/rl/graph_search/pg.py", line 190, in sample
    sample_action_dist = apply_action_dropout_mask(action_dist, action_mask)
  File "/workspace/KGReasoning/code/MultiHopKG/src/rl/graph_search/pg.py", line 177, in apply_action_dropout_mask
    action_keep_mask = var_cuda(rand > self.action_dropout_rate).float()
  File "/workspace/KGReasoning/code/MultiHopKG/src/utils/ops.py", line 121, in var_cuda
    return Variable(x, requires_grad=requires_grad).cuda()
RuntimeError: CUDA error: device-side assert triggered

Kindly help me debug this: possible error sources and how to remove them.
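One plausible cause (an assumption based on the assertion text, not a confirmed diagnosis): the `sum > accZero` assertion in MultinomialKernel.cu fires when a row of the sampling distribution sums to zero, e.g. when action dropout masks out every valid action for some example. A hypothetical guard, reusing the local names from the stack trace above, that could be placed just before sampling:

# Fall back to the un-dropped distribution for any row that dropout zeroed out;
# otherwise multinomial sampling asserts on a zero-sum row.
row_sums = sample_action_dist.sum(dim=-1, keepdim=True)
sample_action_dist = torch.where(row_sums > 0, sample_action_dist, action_dist)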
