Comments (6)
We haven't applied this length-control model to the CNN/DM dataset because, as far as I understand, the dataset does not provide a desired length for each summary.
from control-length.
What exactly does it mean for each summary to have a desired length?
My understanding is that the DUC 04 data set has 4 references, right?
And can you provide a download link to the GigaWord test set?
The link you provided earlier contains only training sets and validation sets.
Thank you very much~
from control-length.
Yes, the DUC 2004 task 1 dataset contains 4 manual summaries.
The organizers of DUC 2004 instructed people to write summaries shorter than 75 characters.
Thus, we evaluate on the DUC 2004 task 1 dataset with summaries truncated at 75 characters.
Please read the dataset descriptions and previous studies:
https://duc.nist.gov/duc2004/
https://www.aclweb.org/anthology/D15-1044.pdf
As for the Gigaword dataset, I can't distribute it because of the LDC license, as described in the previous comments.
But, as far as I understand, the linked directory contains the test data used in https://www.aclweb.org/anthology/D15-1044.pdf. The test set contains 1951 lines.
from control-length.
That's a good answer. I have another question for you.
Are the generated results for each target length scored against all of the reference summaries?
Or is the output generated with a length of 10 scored with ROUGE only against the summary of length 10, and the output generated with a length of 20 only against the summary of length 20?
For example:
Suppose there are two source sentences and two reference summaries, with lengths 10 and 20 respectively, and I set the target length to 10. Is the first generated output scored only against the first summary of length 10? Or is each output scored separately against its corresponding summary?
from control-length.
I assume your question is about the case where we have multiple reference summaries.
The ROUGE score uses all reference summaries, but if you set a max length in the ROUGE script, it truncates the characters beyond the max length from both the system and the reference summaries.
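To make the truncation behaviour concrete, here is a minimal Python sketch, not the ROUGE script itself. It assumes whitespace tokenization, unigram recall only, and a plain average over references; the real script's tokenization and multi-reference aggregation may differ. The names `truncate`, `rouge1_recall`, and `multi_ref_rouge1_recall` are hypothetical helpers introduced only for illustration.

```python
from collections import Counter


def truncate(text, max_chars):
    """Keep only the first max_chars characters of a summary."""
    return text[:max_chars]


def rouge1_recall(system, reference):
    """Unigram recall of a system summary against one reference.

    Assumption: whitespace tokenization, no stemming or stopword removal.
    """
    sys_counts = Counter(system.split())
    ref_counts = Counter(reference.split())
    if not ref_counts:
        return 0.0
    # Clipped overlap: a reference word is matched at most as many
    # times as it occurs in the system summary.
    overlap = sum(min(count, sys_counts[word]) for word, count in ref_counts.items())
    return overlap / sum(ref_counts.values())


def multi_ref_rouge1_recall(system, references, max_chars=None):
    """Score one system summary against ALL references.

    If max_chars is given, both the system summary and every reference
    are truncated before scoring. Assumption: scores are averaged over
    references; the actual ROUGE script may aggregate differently.
    """
    if not references:
        raise ValueError("at least one reference summary is required")
    if max_chars is not None:
        system = truncate(system, max_chars)
        references = [truncate(ref, max_chars) for ref in references]
    scores = [rouge1_recall(system, ref) for ref in references]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    system = "the cat sat on the mat"
    references = ["the cat sat", "a dog sat on the mat"]
    print(multi_ref_rouge1_recall(system, references))
    print(multi_ref_rouge1_recall(system, references, max_chars=7))
```

In this sketch every reference contributes to the score regardless of its own length; the max length only controls how much of each summary is compared.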
from control-length.
Oh, I see. Thank you very much for your patience~
from control-length.