Comments (5)
Yes, just rename it.
from seq3.
The results are indeed a bit worse now; however, this can be caused by many things. They are comparable, but they should be better.
Also, double-check the hyper-parameters in the config files and make sure that you are using the ones in the paper. After the submission I did some housekeeping in the codebase before uploading it, and I may have copied the configs with the wrong hyper-parameters. I do have the checkpoints from the reported results, though, for reproducibility.
Good luck with your experiments!
@cbaziotis
Hi, I ran it with seq3.full.yaml and got this result:
+----+-----------+-----------+-----------+
| | rouge-2 | rouge-1 | rouge-l |
|----+-----------+-----------+-----------|
| f | 0.0943 | 0.2970 | 0.3264 |
| p | 0.0780 | 0.2493 | 0.2798 |
| r | 0.1302 | 0.3975 | 0.4136 |
+----+-----------+-----------+-----------+
The second line (p) looks similar to the paper's results.
Does that sound right?
Thanks!
No, you want the first line (F1). Your results are far better than the ones that we report in the paper. Did you do anything different, for example in terms of the data that you used?
Also, judging from the layout of the console output, you didn't use the same evaluation script as I did. I recall that there were differences between them, and as a result there is a danger that your numbers will not be comparable to other work. I recommend that you use the scripts in the evaluation directory.
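To see why the three rows differ, here is a minimal, self-contained sketch of how ROUGE-1 precision, recall, and F1 relate. This is a simplified unigram-overlap version for illustration only, not the official ROUGE-1.5.5 script or the repo's evaluation code, so its numbers will not match either tool exactly:

```python
from collections import Counter

def rouge_1(candidate, reference):
    """Simplified ROUGE-1: clipped unigram overlap between candidate and reference."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())          # clipped unigram matches
    precision = overlap / max(sum(cand.values()), 1)
    recall = overlap / max(sum(ref.values()), 1)
    f1 = 2 * precision * recall / (precision + recall) if overlap else 0.0
    return precision, recall, f1

p, r, f = rouge_1("the cat sat on the mat", "a cat sat on a mat")
print(f"p={p:.3f} r={r:.3f} f={f:.3f}")  # p=0.667 r=0.667 f=0.667
```

Note that real ROUGE implementations differ in stemming, tokenization, and how F is weighted, which is exactly why mixing evaluation scripts makes numbers incomparable.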
This is the complete output of our model evaluation:
---------------------------------------------
1 ROUGE-1 Average_R: 0.34890 (95%-conf.int. 0.33765 - 0.36067)
1 ROUGE-1 Average_P: 0.21015 (95%-conf.int. 0.20305 - 0.21688)
1 ROUGE-1 Average_F: 0.25392 (95%-conf.int. 0.24581 - 0.26225)
---------------------------------------------
1 ROUGE-2 Average_R: 0.11558 (95%-conf.int. 0.10758 - 0.12343)
1 ROUGE-2 Average_P: 0.06746 (95%-conf.int. 0.06294 - 0.07216)
1 ROUGE-2 Average_F: 0.08214 (95%-conf.int. 0.07655 - 0.08752)
---------------------------------------------
1 ROUGE-L Average_R: 0.31184 (95%-conf.int. 0.30061 - 0.32222)
1 ROUGE-L Average_P: 0.18772 (95%-conf.int. 0.18105 - 0.19420)
1 ROUGE-L Average_F: 0.22679 (95%-conf.int. 0.21935 - 0.23423)
---------------------------------------------
1 ROUGE-W-1.2 Average_R: 0.19307 (95%-conf.int. 0.18577 - 0.19989)
1 ROUGE-W-1.2 Average_P: 0.17531 (95%-conf.int. 0.16906 - 0.18119)
1 ROUGE-W-1.2 Average_F: 0.17580 (95%-conf.int. 0.16990 - 0.18158)
To be sure that your evaluation is correct, try to evaluate the lead-8 baseline and compare with it.
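For reference, a lead-8 baseline simply takes the first eight tokens of the source as the summary. A minimal sketch, assuming plain whitespace tokenization (the repo's preprocessing may tokenize differently):

```python
def lead_n(source, n=8):
    """Lead-N baseline: use the first n tokens of the input as the summary."""
    return " ".join(source.split()[:n])

src = "japan 's nikkei stock average rose sharply to close at a three-month high on tuesday"
print(lead_n(src))  # first 8 whitespace tokens
```

Running this over the test sources and scoring the result with the same ROUGE script is a quick sanity check of the evaluation pipeline.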
@cbaziotis
Oh, I was just printing the eval result on the dev file.
Now I switched to evaluation/gigaword and got:
Preparing documents... 0 line(s) ignored
Running ROUGE...
---------------------------------------------
1 ROUGE-1 Average_R: 0.31312 (95%-conf.int. 0.30219 - 0.32441)
1 ROUGE-1 Average_P: 0.20629 (95%-conf.int. 0.19913 - 0.21319)
1 ROUGE-1 Average_F: 0.24028 (95%-conf.int. 0.23236 - 0.24793)
---------------------------------------------
1 ROUGE-2 Average_R: 0.09535 (95%-conf.int. 0.08827 - 0.10227)
1 ROUGE-2 Average_P: 0.05992 (95%-conf.int. 0.05547 - 0.06421)
1 ROUGE-2 Average_F: 0.07076 (95%-conf.int. 0.06544 - 0.07568)
---------------------------------------------
1 ROUGE-L Average_R: 0.28157 (95%-conf.int. 0.27106 - 0.29250)
1 ROUGE-L Average_P: 0.18560 (95%-conf.int. 0.17886 - 0.19201)
1 ROUGE-L Average_F: 0.21606 (95%-conf.int. 0.20837 - 0.22356)
---------------------------------------------
1 ROUGE-W-1.2 Average_R: 0.17347 (95%-conf.int. 0.16665 - 0.18048)
1 ROUGE-W-1.2 Average_P: 0.17250 (95%-conf.int. 0.16610 - 0.17837)
1 ROUGE-W-1.2 Average_F: 0.16533 (95%-conf.int. 0.15939 - 0.17098)
So it's a little worse than yours (still looks good to me; my main purpose is to use the code rather than to compare against it).
The difference could be that I'm using PyTorch 1.3.0, and I changed the code in gumbel_softmax to:
"""
#gumbels = -torch.empty_like(logits, memory_format=torch.legacy_contiguous_format).exponential_().log() # ~Gumbel(0,1)
gumbels = -torch.empty_like(logits).exponential_().log()
gumbels = (logits + gumbels) / tau # ~Gumbel(logits,tau)
y_soft = gumbels.softmax(dim = -1)
"""
I did this because torch.legacy_contiguous_format and _gumbel_softmax_sample cannot be found in that version.
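For context, that sampling step can be wrapped as a self-contained function. This is a sketch of the standard soft Gumbel-softmax sample using the same trick as the snippet above (Gumbel(0,1) noise via the negative log of an Exponential(1) draw); the function name is mine and this is not the repo's exact implementation:

```python
import torch

def gumbel_softmax_sample(logits, tau=1.0, dim=-1):
    """Draw a soft (differentiable) sample from a Gumbel-softmax distribution."""
    # Gumbel(0, 1) noise: -log(X) where X ~ Exponential(1)
    gumbels = -torch.empty_like(logits).exponential_().log()
    gumbels = (logits + gumbels) / tau  # ~Gumbel(logits, tau)
    return gumbels.softmax(dim=dim)

y = gumbel_softmax_sample(torch.randn(2, 5), tau=0.5)
print(y.shape, y.sum(dim=-1))  # each row is a distribution summing to 1
```

Lower tau makes the samples closer to one-hot; higher tau makes them more uniform.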
I hope this makes sense.
Thanks for the reply!