Comments (14)
You mean changing the action dropout rate from 0.0 to 0.1? 0.9 is very aggressive dropout, and 1.0 implies dropping every action and randomly sampling an edge.
If so, a 0.1 dropout rate shouldn't make such a huge difference. Would you mind posting the action dropout code you added to the original MINERVA code? And how many iterations did you train to observe this difference in results? It would be great if you could plot the training curves before and after adding action dropout for comparison.
from multihopkg.
@todpole3 I re-edited my issue to show more detailed information of the training results.
@David-Lee-1990 In the MINERVA code, is pre_distribution used in the gradient computation?
For us, we only use action dropout to encourage diverse sampling; the policy gradient is still computed using the original probability vector.
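The scheme described above can be sketched as follows. This is a hypothetical numpy sketch of the idea, not the actual MultiHopKG or MINERVA code; the function and variable names are my own:

```python
import numpy as np

def sample_with_action_dropout(action_probs, dropout_rate, rng):
    """Sample an action from a randomly masked (perturbed) distribution,
    but return the log-probability under the ORIGINAL distribution,
    which is what the policy-gradient loss uses."""
    # Randomly drop each action with probability dropout_rate.
    keep_mask = rng.random(action_probs.shape) >= dropout_rate
    if not keep_mask.any():
        # Degenerate case: everything was dropped; fall back to the original.
        keep_mask[:] = True
    perturbed = action_probs * keep_mask
    perturbed = perturbed / perturbed.sum()
    action = rng.choice(len(action_probs), p=perturbed)
    # The gradient uses the original probability, so dropout only changes
    # *which* actions get explored, not the loss formula itself.
    return action, np.log(action_probs[action])

rng = np.random.default_rng(0)
probs = np.array([0.5, 0.3, 0.15, 0.05])
action, log_p = sample_with_action_dropout(probs, dropout_rate=0.5, rng=rng)
```

The key design point is the last line: sampling uses the perturbed distribution, but the returned log-probability comes from the original one, so the estimator stays the standard REINFORCE formula.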
In MINERVA there is no action dropout, and I tried to add this improvement to it. Following your idea, I use dropout to encourage diverse sampling while the policy gradient is still computed using the original distribution. I tested two versions: one relation-only and the other not. Both versions show results similar to what I stated in the issue.
@David-Lee-1990 My question is: after adding action dropout, did you use the updated probability vector pre_distribution to compute the policy gradient?
No, I use the original one.
@David-Lee-1990 I cannot spot anything wrong with the code snippet you posted. Thanks for sharing. It might have something to do with the integration with the rest of MINERVA code.
Technically you only disturbed the sampling probability by a small factor (and your policy-gradient computation still follows the standard formula), so the result shouldn't change this significantly in any case.
Would you mind running a sanity-checking experiment by setting the dropout rate to 0.01 and seeing how the results turn out? The change should be very small. Then maybe try 0.02 and 0.05 and see if the results change gradually?
Hi, I ran a sanity-checking experiment by setting the keep rate to each value in [1.0, 0.99, 0.98, 0.97, 0.95, 0.93, 0.90]. The hits@1 results on training batches are as follows:
[Training-curve plots comparing hits@1 at keep rate 1.0 vs. 0.99, 0.98, 0.97, 0.95, 0.93, and 0.90]
@David-Lee-1990 Very interesting. I want to look deeper into this issue.
The most noticeable difference is that the dev result you reported without action dropout is close to what we have with 0.1 action dropout and significantly higher than what we have without action dropout.
Besides action dropout rate, did you use the same set of hyperparameters as we did in the configuration files?
If not, would you mind sharing your set of hyperparameters? I want to see if I can reproduce the same results on our code repo.
And one more question: did you observe a similar trend on other datasets using the MINERVA code + action dropout?
I tested this for two versions: one is relation-only and the other not. Both versions show the similar results as i stated in the issue.
@David-Lee-1990 Are the plots shown above generated with relation-only or not?
The most noticeable difference is that the dev result you reported without action dropout is close to what we have with 0.1 action dropout and significantly higher than what we have without action dropout.
@todpole3 About the dev result, I need to clarify that I used the "sum" method, which is different from the "max" method you used, when calculating hits@k and MRR. The "sum" method ranks a predicted entity by adding up the probabilities of all paths that predict the same end entity. The following code is from MINERVA, where lse computes the log-sum-exp.
I also tested the "max" method on WN18RR; the MRR on the dev set is as follows.
For comparison, I paste the results of the "sum" and "max" methods together here:
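The difference between the two scoring methods can be illustrated with a minimal numpy sketch (hypothetical names, not MINERVA's actual code; it assumes beam-search paths carry log-probabilities):

```python
import numpy as np
from collections import defaultdict

def aggregate_path_scores(end_entities, path_log_probs, method="sum"):
    """Score each candidate end entity from beam-search paths.
    'sum' combines all paths reaching the entity via log-sum-exp
    (their probabilities add up); 'max' keeps only the best single path."""
    per_entity = defaultdict(list)
    for entity, log_p in zip(end_entities, path_log_probs):
        per_entity[entity].append(log_p)
    if method == "sum":
        return {e: float(np.logaddexp.reduce(lps)) for e, lps in per_entity.items()}
    return {e: float(max(lps)) for e, lps in per_entity.items()}

# Two paths reach entity "A" (probs 0.2 and 0.3); one reaches "B" (prob 0.4).
ends = ["A", "A", "B"]
log_ps = [np.log(0.2), np.log(0.3), np.log(0.4)]
sum_scores = aggregate_path_scores(ends, log_ps, "sum")  # "A" scores log(0.5)
max_scores = aggregate_path_scores(ends, log_ps, "max")  # "A" scores log(0.3)
```

In this toy example, "A" outranks "B" under "sum" (0.5 > 0.4) but falls behind it under "max" (0.3 < 0.4), which is why the two methods can produce noticeably different hits@k and MRR.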
I tested this for two versions: one is relation-only and the other not. Both versions show the similar results as i stated in the issue.
@David-Lee-1990 Are the plots shown above generated with relation-only or not?
Relation only.
Besides action dropout rate, did you use the same set of hyperparameters as we did in the configuration files?
If not, would you mind sharing your set of hyperparameters? I want to see if I can reproduce the same results on our code repo.
I give my hyperparameters in your notation as follows:
group_examples_by_query="False"
use_action_space_bucketing="False"
bandwidth=200
entity_dim=100
relation_dim=100
history_dim=100
history_num_layers=1
train_num_rollouts=20
dev_num_rollouts=40
num_epochs=1000 # following MINERVA, I randomly sample batch_size training examples each iteration
train_batch_size=128
dev_batch_size=128
learning_rate=0.001
grad_norm=5
emb_dropout_rate=0
ff_dropout_rate=0
action_dropout_rate=1.0
beta=0.05
relation_only="True"
beam_size=100
And one more question, did you observe similar trend on other datasets using MINERVA code + action dropout?
@todpole3 Following your advice, I tested action dropout on NELL-995 today; its performance is as follows.