LM prior checkpoint naming (seq3, 5 comments, closed)

cloudygoose commented on August 12, 2024
LM prior checkpoint naming

from seq3.

Comments (5)

cbaziotis commented on August 12, 2024

Yes, just rename it.


cbaziotis commented on August 12, 2024

The results are indeed a bit worse now, but that could be caused by many things. They are comparable, though they should be better.

Also, double-check the hyperparameters in the config files and make sure that you are using the ones in the paper. After the submission I did some housekeeping in the codebase before uploading it, and I may have copied the configs with the wrong hyperparameters. I do still have the checkpoints from the reported results, for reproducibility.

Good luck with your experiments!


cloudygoose commented on August 12, 2024

@cbaziotis
Hi, I ran it with seq3.full.yaml and got this result:
+----+---------+---------+---------+
|    | rouge-2 | rouge-1 | rouge-l |
|----+---------+---------+---------|
| f  |  0.0943 |  0.2970 |  0.3264 |
| p  |  0.0780 |  0.2493 |  0.2798 |
| r  |  0.1302 |  0.3975 |  0.4136 |
+----+---------+---------+---------+
The second line (p) looks similar to the result reported in the paper:
[screenshot of the paper's ROUGE scores]
Does that sound right?
Thanks!


cbaziotis commented on August 12, 2024

No, you want the first line (F1). Your results are far better than the ones we report in the paper. Did you do anything differently, for example in terms of the data that you used?
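(Side note for other readers: F1 is the harmonic mean of precision and recall computed per example, so an averaged F column generally does not equal f1 of the averaged P and R. A minimal sketch using the rouge-1 numbers from the table above illustrates the gap:)

```python
def f1(p: float, r: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r) if (p + r) > 0 else 0.0

# Averaged P and R from the rouge-1 column of the table above
p_avg, r_avg = 0.2493, 0.3975
print(round(f1(p_avg, r_avg), 4))  # 0.3064, not the 0.2970 shown as the averaged F
```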

Also, judging from the layout of the console output, you didn't use the same evaluation script as I did. I recall that there were differences between the scripts, so there is a danger that your numbers won't be comparable to other work. I recommend that you use the scripts in the evaluation directory.

This is the complete output of our model evaluation:

---------------------------------------------
1 ROUGE-1 Average_R: 0.34890 (95%-conf.int. 0.33765 - 0.36067)
1 ROUGE-1 Average_P: 0.21015 (95%-conf.int. 0.20305 - 0.21688)
1 ROUGE-1 Average_F: 0.25392 (95%-conf.int. 0.24581 - 0.26225)
---------------------------------------------
1 ROUGE-2 Average_R: 0.11558 (95%-conf.int. 0.10758 - 0.12343)
1 ROUGE-2 Average_P: 0.06746 (95%-conf.int. 0.06294 - 0.07216)
1 ROUGE-2 Average_F: 0.08214 (95%-conf.int. 0.07655 - 0.08752)
---------------------------------------------
1 ROUGE-L Average_R: 0.31184 (95%-conf.int. 0.30061 - 0.32222)
1 ROUGE-L Average_P: 0.18772 (95%-conf.int. 0.18105 - 0.19420)
1 ROUGE-L Average_F: 0.22679 (95%-conf.int. 0.21935 - 0.23423)
---------------------------------------------
1 ROUGE-W-1.2 Average_R: 0.19307 (95%-conf.int. 0.18577 - 0.19989)
1 ROUGE-W-1.2 Average_P: 0.17531 (95%-conf.int. 0.16906 - 0.18119)
1 ROUGE-W-1.2 Average_F: 0.17580 (95%-conf.int. 0.16990 - 0.18158)

To be sure that your evaluation is correct, try to evaluate the lead-8 baseline and compare with it.
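(For context, a lead-8 baseline simply copies the first 8 tokens of each source sentence as the hypothesis summary. A minimal sketch; the function name and example sentence are illustrative, not from the repo:)

```python
def lead_n_summary(source: str, n: int = 8) -> str:
    """Take the first n whitespace-separated tokens of the source as the summary."""
    return " ".join(source.split()[:n])

# Illustrative Gigaword-style source sentence (numbers masked with '#')
src = "japan 's nikkei stock average rose #.## percent on monday as exporters gained"
print(lead_n_summary(src, 8))  # japan 's nikkei stock average rose #.## percent
```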


cloudygoose commented on August 12, 2024

@cbaziotis
Oh, I was just printing the eval result on the dev file.
I have now switched to evaluation/gigaword and got:

Preparing documents... 0 line(s) ignored
Running ROUGE...
---------------------------------------------
1 ROUGE-1 Average_R: 0.31312 (95%-conf.int. 0.30219 - 0.32441)
1 ROUGE-1 Average_P: 0.20629 (95%-conf.int. 0.19913 - 0.21319)
1 ROUGE-1 Average_F: 0.24028 (95%-conf.int. 0.23236 - 0.24793)
---------------------------------------------
1 ROUGE-2 Average_R: 0.09535 (95%-conf.int. 0.08827 - 0.10227)
1 ROUGE-2 Average_P: 0.05992 (95%-conf.int. 0.05547 - 0.06421)
1 ROUGE-2 Average_F: 0.07076 (95%-conf.int. 0.06544 - 0.07568)
---------------------------------------------
1 ROUGE-L Average_R: 0.28157 (95%-conf.int. 0.27106 - 0.29250)
1 ROUGE-L Average_P: 0.18560 (95%-conf.int. 0.17886 - 0.19201)
1 ROUGE-L Average_F: 0.21606 (95%-conf.int. 0.20837 - 0.22356)
---------------------------------------------
1 ROUGE-W-1.2 Average_R: 0.17347 (95%-conf.int. 0.16665 - 0.18048)
1 ROUGE-W-1.2 Average_P: 0.17250 (95%-conf.int. 0.16610 - 0.17837)
1 ROUGE-W-1.2 Average_F: 0.16533 (95%-conf.int. 0.15939 - 0.17098)

So it's a little worse than yours (which still looks good to me; my main purpose is to use the code rather than to compare against it).

The difference could be that I'm using PyTorch 1.3.0, and I changed the code in gumbel_softmax to:
"""
#gumbels = -torch.empty_like(logits, memory_format=torch.legacy_contiguous_format).exponential_().log() # ~Gumbel(0,1)
gumbels = -torch.empty_like(logits).exponential_().log()
gumbels = (logits + gumbels) / tau # ~Gumbel(logits,tau)
y_soft = gumbels.softmax(dim = -1)
"""
I did this because torch.legacy_contiguous_format and _gumbel_softmax_sample could not be found in that version.
I hope these changes make sense.
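(For readers on a similarly old PyTorch, the modified sampling step can be packaged as a standalone function. This is a sketch of the Gumbel-Softmax sampling trick used above, not the repo's exact code:)

```python
import torch

def gumbel_softmax_sample(logits: torch.Tensor, tau: float = 1.0) -> torch.Tensor:
    """Draw a relaxed (soft) one-hot sample from a categorical distribution.

    Adds Gumbel(0, 1) noise to the logits and applies a temperature-scaled
    softmax (Jang et al. 2017 / Maddison et al. 2017).
    """
    # -log(X) with X ~ Exponential(1) is distributed as Gumbel(0, 1)
    gumbels = -torch.empty_like(logits).exponential_().log()
    return ((logits + gumbels) / tau).softmax(dim=-1)

sample = gumbel_softmax_sample(torch.randn(2, 5), tau=0.5)
# Each row is a valid probability distribution (non-negative, sums to 1)
```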

Thanks for the reply!

