Hello, may I ask whether you have trained this model on the CNN/DM dataset? What do

CNN/DM about control-length HOT 6 CLOSED

YoonaX commented on May 27, 2024

CNN/DM

from control-length.

Comments (6)

takase commented on May 27, 2024

We haven't applied this output-length control model to the CNN/DM dataset because, as far as I understand, the dataset doesn't specify a desired length for each summary.

YoonaX commented on May 27, 2024

What exactly does it mean for each summary to have a desired length?
My understanding is that the DUC 2004 dataset has 4 references, right?
Also, can you provide a download link to the Gigaword test set?
The link you provided earlier contains only the training and validation sets.
Thank you very much~

takase commented on May 27, 2024

Yes, the DUC 2004 Task 1 dataset contains 4 manual summaries.
The organizers of DUC 2004 instructed people to write summaries whose lengths are less than 75 characters.
Thus, we evaluate on the DUC 2004 Task 1 dataset with summaries truncated to 75 characters.
Please read the descriptions of the datasets and the previous studies.
https://duc.nist.gov/duc2004/
https://www.aclweb.org/anthology/D15-1044.pdf

As for the Gigaword dataset, I can't distribute it due to the LDC license, as described in the previous comments.
However, as far as I understand, the linked directory contains the test data used in https://www.aclweb.org/anthology/D15-1044.pdf. The test set contains 1951 lines.

YoonaX commented on May 27, 2024

That's a good answer. I have another question for you.
Are the generated results of different lengths scored against all of the reference summaries?
Or is a result of length 10 scored with ROUGE only against the summary of length 10, and a result of length 20 only against the summary of length 20?

For example:
Suppose there are two source sentences and two reference summaries, with lengths 10 and 20 respectively, and I set the target length to 10. Is the first generated result scored only against the first summary of length 10? Or are the two results scored separately against their corresponding summaries?

takase commented on May 27, 2024

I assume your question is about the case where we have multiple reference summaries.
The ROUGE score uses all reference summaries, but if you set a max length in the ROUGE script, it truncates the characters beyond the max length from both the system and the reference summaries.
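
To illustrate the behavior described above, here is a minimal sketch in Python. It is not the official ROUGE script: the function names are hypothetical, tokenization is plain whitespace splitting, and averaging the per-reference scores is an assumption made for simplicity (the real script has its own way of combining multiple references). It only shows that every reference is used and that the same length cap is applied to both the system output and the references before scoring.

```python
from collections import Counter


def truncate(text, max_chars):
    """Keep only the first max_chars characters (None means no limit)."""
    return text if max_chars is None else text[:max_chars]


def unigram_recall(system, reference):
    """Clipped unigram overlap divided by the number of reference tokens."""
    sys_counts = Counter(system.split())
    ref_counts = Counter(reference.split())
    if not ref_counts:
        return 0.0
    overlap = sum(min(count, sys_counts[tok]) for tok, count in ref_counts.items())
    return overlap / sum(ref_counts.values())


def multi_ref_rouge1_recall(system, references, max_chars=None):
    """Score one system summary against all references.

    Both sides are truncated to max_chars first. Averaging over
    references is an assumption of this sketch.
    """
    system = truncate(system, max_chars)
    scores = [unigram_recall(system, truncate(ref, max_chars)) for ref in references]
    return sum(scores) / len(scores)


system_summary = "police arrest suspect downtown"
reference_summaries = ["police arrest suspect", "suspect held downtown"]

# No length limit: every reference is scored in full.
print(multi_ref_rouge1_recall(system_summary, reference_summaries))
# With a 13-character cap, system and references are all cut to 13 characters.
print(multi_ref_rouge1_recall(system_summary, reference_summaries, max_chars=13))
```

Note that the target length given to the model does not select which reference is used: in this sketch, as in the explanation above, the output is compared against every reference, and only the max length setting changes what part of each text is scored.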

YoonaX commented on May 27, 2024

Oh, I see. Thank you very much for your patience~
