
simple's People

Contributors

baharefatemi


simple's Issues

"Min" Policy for ties in scoring

Hi, thank you for your work on this model. I really appreciate it.
I am writing because, while studying the code in the tester.py module, I came across the get_rank method:

    def get_rank(self, sim_scores):
        # Assumes the test fact's score is the first element of sim_scores.
        return (sim_scores > sim_scores[0]).sum() + 1.0

In this method, you compute the rank of the target entity as one plus the number of entities whose score is strictly higher than the target entity's own score.

This means that if other entities receive exactly the same score as the target entity, they are not counted in the ranking. In other words, in case of ties, you always return the minimum rank.
This is a "min" policy; is this the expected behavior?

I am asking because I believe the "min" policy is not the best one for link prediction models.
In theory, a model using the "min" policy could assign the same score to all entities in every prediction, and it would still achieve MRR = 1.0.

Of course, its effect depends on how prone your model is to assign the same, identical score to multiple answers: if there are no ties at all, the policy never comes into play.
In your experience, does SimplE generate ties?
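For reference, here is a small sketch (not code from this repo) showing how the three common tie policies differ on the same score vector; `policy` and the function shape are my own illustration:

```python
import numpy as np

def get_rank(sim_scores, policy="min"):
    """Rank of the test fact (first element of sim_scores) under a tie policy.

    "min": ties ranked optimistically (the behavior in tester.py)
    "max": ties ranked pessimistically
    "avg": ties receive the average of the min and max ranks
    """
    target = sim_scores[0]
    higher = int((sim_scores > target).sum())      # strictly better scores
    ties = int((sim_scores == target).sum()) - 1   # other entities tied with the target
    if policy == "min":
        return higher + 1.0
    if policy == "max":
        return higher + ties + 1.0
    return higher + ties / 2.0 + 1.0               # "avg"

scores = np.array([0.5, 0.9, 0.5, 0.5, 0.1])       # three-way tie with the target
print(get_rank(scores, "min"))  # 2.0
print(get_rank(scores, "max"))  # 4.0
print(get_rank(scores, "avg"))  # 3.0
```

Under "avg" (the policy used by e.g. `scipy.stats.rankdata`'s default), a degenerate constant-score model no longer gets a perfect MRR.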

How to build a new dataset

Sorry for asking such a naive question.
I'm really new to this area, and I'm trying to create a dataset of my own. The dataset already has three columns (h, r, t); how should I split them into train, valid, and test? For other (non-graph) datasets, I split them randomly 8:1:1. Is that appropriate for a graph dataset, or specifically for a SimplE dataset?
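A random 8:1:1 split is a common starting point, but for knowledge graphs you usually also want every entity and relation in valid/test to appear in train, or the model cannot score them. A hypothetical sketch (the function name and filtering strategy are my own, not from this repo):

```python
import random

def split_triples(triples, seed=0):
    """Split a list of (h, r, t) triples roughly 8:1:1, then move any held-out
    triple whose entity or relation never appears in train back into train."""
    rng = random.Random(seed)
    triples = triples[:]
    rng.shuffle(triples)
    n = len(triples)
    train = triples[: int(0.8 * n)]
    rest = triples[int(0.8 * n):]

    seen_ents = {e for h, r, t in train for e in (h, t)}
    seen_rels = {r for h, r, t in train}
    held = [x for x in rest
            if x[0] in seen_ents and x[2] in seen_ents and x[1] in seen_rels]
    train += [x for x in rest if x not in held]    # unseen entities/relations go to train

    mid = len(held) // 2
    return train, held[:mid], held[mid:]
```

The final valid/test sizes end up slightly below 10% each when rare entities get pushed back into train, which is the usual trade-off.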

Reproducing FB15K Results / FB15K-237?

I am trying to reproduce results, as well as try a new dataset, FB15K-237; I'd appreciate any thoughts you have (see questions at the end):

Reproducing FB15K results: I am using Windows 10, pytorch-nightly (1.1, May 15), Titan XP. I ran the example from the README.md:

python main.py -ne 1000 -lr 0.05 -reg 0.1 -dataset FB15K -emb_dim 200 -neg_ratio 10 -batch_size 4832 -save_each 50

Training took about 6 hours; validation takes about 1250 seconds per epoch. Results:

FB15K:

| Source     | Model       | MRR (Filter) | MRR (Raw) | Hit@1 | Hit@3 | Hit@10 |
|------------|-------------|--------------|-----------|-------|-------|--------|
| Paper      | SimplE-ignr | 0.700        | 0.237     | 0.625 | 0.754 | 0.821  |
| Paper      | SimplE      | 0.727        | 0.239     | 0.660 | 0.773 | 0.838  |
| My results | SimplE-a    | 0.726        | 0.240     | 0.659 | 0.770 | 0.837  |

FB15K-237
I looked at RotatE on GitHub and noticed that its test.txt, train.txt, and valid.txt exactly match the files in your FB15K directory. So I thought I would be able to run their FB15K-237 dataset with SimplE. Using the same command as above (but referencing FB15K237), I got the following, which seems a little low (notwithstanding that this is a much harder dataset, with inverse relations removed):

| Source     | Model    | MRR (Filter) | MRR (Raw) | Hit@1 | Hit@3 | Hit@10 |
|------------|----------|--------------|-----------|-------|-------|--------|
| My results | SimplE-b | 0.168        | 0.074     | 0.094 | 0.178 | 0.324  |

Questions:

  1. What is the proper command to recreate the SimplE results from the paper? I can already see I should have used 0.03 for the learning rate, for example; but clearly, the results are quite close anyway.
  2. Are the results from the paper from a single run, or the average/best over N runs?
  3. Are the modifications needed to run "SimplE-ignr" straightforward, or more involved?
  4. Are my run-times about the same as what you get?
  5. Does one need to do anything special to incorporate a new dataset like FB15K-237? The train/test/val files look like the same format to me.
  6. Does the removal of inverse relations from the dataset impact SimplE? Based on some of your comments in the paper, it seems it might, but where in the code is the impact particularly felt?
  7. Regarding section 6.2 from the paper, how did you incorporate the background knowledge?

Thanks!

Output of evaluation script with 0 values for Raw setting

I just ran the FB15K example and got this:

    Loss in iteration 1000: 160424.18872070312 (FB15K)
    Saving the model
    ~~~~ Select best epoch on validation set ~~~~
    50
    Raw setting:
            Hit@1 = 0.0
            Hit@3 = 0.0
            Hit@10 = 0.0
            MR = 0.0
            MRR = 0.0

    Fil setting:
            Hit@1 = 0.51695
            Hit@3 = 0.70038
            Hit@10 = 0.81951
            MR = 106.17348
            MRR = 0.6264143401644883

All the results for Raw setting are 0.
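For context, the raw and filtered settings differ only in whether other known true triples are removed from the candidate list before ranking; if the raw branch is never executed during evaluation, its metrics stay at their initialized value of 0. A minimal sketch (my own illustration, not this repo's code) of the two settings:

```python
import numpy as np

def rank(scores, target_idx, known_true=None):
    """Rank of the candidate at target_idx. In the filtered setting,
    other known true candidates are excluded before counting."""
    mask = np.ones(len(scores), dtype=bool)
    if known_true is not None:          # filtered setting
        mask[list(known_true)] = False
        mask[target_idx] = True         # never filter out the test fact itself
    return int((scores[mask] > scores[target_idx]).sum()) + 1

scores = np.array([0.6, 0.9, 0.8, 0.1])
# Raw: two candidates outscore the target.
print(rank(scores, 0))                  # 3
# Filtered: candidate 1 is another known true triple, so it is removed.
print(rank(scores, 0, known_true={1}))  # 2
```

So identical zeros across all raw metrics point to the raw computation being skipped (or its accumulators never updated), rather than to a genuinely zero score.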
