Comments (13)
In our experimental configuration, we use a character-based decoder and use @@@@
as a space (a segmentation marker between words).
(Sorry for the lack of explanation: we applied the BPE code to the source side only.)
Thus, you can take the decoder outputs and replace the segmentation markers with spaces as follows:
cat decoder_output.txt | sed 's/ //g' | sed 's/\@\@\@\@/ /g'
(decoder_output.txt contains the output hypotheses only)
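For instance, with a made-up character-level hypothesis line, the pipeline above behaves like this:

```shell
# Made-up example of the detokenization pipeline above:
# first delete the spaces between characters, then turn each @@@@ into a word boundary.
echo 'a f p @@@@ w o r l d @@@@ n e w s' | sed 's/ //g' | sed 's/\@\@\@\@/ /g'
# -> afp world news
```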
In addition, you can control the output length with the --desired-length option.
from control-length.
That table reports recall-based ROUGE scores, i.e., ROUGE-N Average_R in the output of the evaluation script.
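If it helps, the recall rows can be pulled straight out of the ROUGE log, e.g. (with a fabricated log snippet):

```shell
# Fabricated snippet of a ROUGE-1.5.5 log; the paper's numbers
# correspond to the Average_R (recall) rows.
printf 'a ROUGE-1 Average_R: 0.31056\na ROUGE-1 Average_P: 0.28526\na ROUGE-2 Average_R: 0.11021\n' > rouge.log
grep 'Average_R' rouge.log
# -> a ROUGE-1 Average_R: 0.31056
#    a ROUGE-2 Average_R: 0.11021
```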
from control-length.
I think this error is caused by a difference in vocabulary sizes.
Did you apply my BPE code (which is attached to the gz file, as I remember) to your text file?
from control-length.
So I should first apply the BPE code to my text and then preprocess it, right?
from control-length.
Yes! Please try that procedure.
from control-length.
I added the dictionary file that I used to construct the binary files in my experiments.
If you need it, please download the trained model file from my Google Drive again:
https://drive.google.com/file/d/15Sy8rv6Snw6Nso7T5MxYHSAZDdieXpE7/view?usp=sharing
from control-length.
That's very kind of you!
I used the dictionary to construct the binary file of the test set,
and obtained best_15_nbest.txt after running generate.py.
However, the generated headlines seem terrible; here are two examples:
S-67 the agenda might be global , but the men@@ u will be malaysian when world leaders meet next week for the asia-pacific economic cooperation forum .
T-67 <> si@@ a-@@ <> ac@@ ific <> con@@ om@@ ic <> o@@ operation forum features sp@@ icy cu@@ is@@ ine
H-67 -0.09418036788702011 a f p @@@@ w o r l d @@@@ n e w s
P-67 -0.8212 -0.1580 -0.0270 -0.0621 -0.0602 -0.0257 -0.0338 -0.0303 -0.0275 -0.0282 -0.0313 -0.0246 -0.0250 -0.0321 -0.0256
S-362 the european union wednesday condemned the slaying of four foreign hostages in chechnya and said it would raise the issue with russia 's foreign minister .
T-362 <> <> condem@@ ns slaying of 4 hostages in <> he@@ chn@@ ya@@ ; were telephone engine@@ er@@ s.
H-362 -0.11744093894958496 e u @@@@ c o n d e m n s @@@@ c h e
P-362 -0.3688 -0.0332 -0.0409 -0.2489 -0.0318 -0.0254 -0.0284 -0.0278 -0.0296 -0.0259 -0.0304 -0.0305 -0.5614 -0.0417 -0.0373 -0.3170
Before using the dictionary file you released to construct the binary file, I preprocessed the train and valid sets of Gigaword and obtained a dictionary of size 15380. However, the size of the dictionary you released is 16148.
It seems I cannot use your dictionary to preprocess the test set on my Gigaword, which may be the reason why I failed to generate with your checkpoint.
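One way to see where two vocabularies diverge is to diff the dictionary files directly. This is a sketch with tiny fabricated dictionaries, assuming the fairseq format of one "token count" pair per line:

```shell
# Tiny fabricated fairseq-style dictionaries ("token count" per line):
printf 'the 100\n, 90\n. 80\n' > dict.mine.txt
printf 'the 100\n, 90\n. 80\nof 70\n' > dict.released.txt
# comm requires sorted input:
sort dict.mine.txt > mine.sorted
sort dict.released.txt > released.sorted
# Entries present only in the released dictionary:
comm -13 mine.sorted released.sorted
# -> of 70
```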
from control-length.
Do you mean I should just apply BPE to train.article, valid.article, and test.article, and then preprocess them together with *.title to get a joint dictionary?
After doing that, I get a dictionary of size 74932, which is much larger than the dictionary you released.
It seems that I misunderstood "We applied the BPE code to the source side only."
from control-length.
May I confirm your purpose?
I thought that you planned to apply the pre-trained model to a text for generation.
Or do you want to train a model on a new corpus?
from control-length.
My purpose is to apply the pre-trained model to a text for generation, rather than to train a model on a new corpus.
However, I get terrible generated headlines even when applying the checkpoint and dictionary you released.
Therefore, I am wondering why I failed.
The following are the two commands I used to preprocess the test file and generate:
-
preprocess test.article and test.title:
python preprocess.py --source-lang article --target-lang title --tgtdict /users6/ychuang/program/python/control-length-master/databin_bpe/test/dict.joined.txt --srcdict /users6/ychuang/program/python/control-length-master/databin_bpe/test/dict.joined.txt --testpref /users6/ychuang/program/python/control-length-master/sumdata/DUC2004/test_bpe --destdir /users6/ychuang/program/python/control-length-master/databin_bpe/test
-
generate:
CUDA_VISIBLE_DEVICES=0,1,2,3 python generate.py /users6/ychuang/program/python/control-length-master/databin_bpe/test --source-lang article --target-lang title --path /users6/ychuang/program/python/control-length-master/pretrained_model/trained_model_lrpe_pe_averaged.pt --desired-length 15 --batch-size 32 --beam 5 > best_75_nbest.txt
from control-length.
I see.
Could you extract the generated sentences from best_75_nbest.txt with the following command?
cat best_75_nbest.txt | grep '^H' | sed 's/^H\-//g' | sort -t ' ' -k1,1 -n | cut -f 3- > output.txt
Then, replace the segmentation markers with spaces:
cat output.txt | sed 's/ //g' | sed 's/\@\@\@\@/ /g'
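As a concrete illustration, here is how the two pipelines behave together, using fabricated hypothesis lines and assuming the tab-separated generate.py output format shown in the examples earlier in this thread:

```shell
# Fabricated two-line generate.py output (fields are tab-separated, as in fairseq):
printf 'H-1\t-0.10\tw o r l d @@@@ c u p\nH-0\t-0.09\ta f p @@@@ n e w s\n' > best_nbest.txt
# Keep the H-lines, sort by sentence id, and keep only the token field:
grep '^H' best_nbest.txt | sed 's/^H\-//g' | sort -k1,1 -n | cut -f 3- > output.txt
# Undo the character segmentation:
sed 's/ //g; s/\@\@\@\@/ /g' output.txt
# -> afp news
#    world cup
```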
I expect that you can obtain sentences whose lengths are 15.
(Or, since our pre-trained model is fragile, like other neural encoder-decoders, it might be broken by unknown inputs.)
from control-length.
Hi, I have succeeded in using the checkpoint and dictionary you released to generate headlines with the following commands:
CUDA_VISIBLE_DEVICES=0,1,2,3 python generate.py /users6/ychuang/program/python/control-length-master/databin2/test --source-lang article --target-lang title --path /users6/ychuang/program/python/control-length-master/pretrained_model/trained_model_lrpe_pe_averaged.pt --desired-length 75 --batch-size 32 --beam 5 > best_75_nbest.txt
cat best_75_nbest.txt | grep '^H' | sed 's/^H\-//g' | sort -t ' ' -k1,1 -n | cut -f 3- > output.txt
cat output.txt | sed 's/ //g' | sed 's/\@\@\@\@/ /g' > res_75.txt
Then I used the script in the 'eval' directory to evaluate the ROUGE scores of res_75.txt; the command and result are below:
-
./eval.sh /users6/ychuang/program/python/control-length-master/eval/testdata /users6/ychuang/program/python/control-length-master/eval/testdata/input.txt 75 /users6/ychuang/program/python/control-length-master/eval /users6/ychuang/software/ROUGE/
-
ROUGE result:
a ROUGE-1 Average_R: 0.31056 (95%-conf.int. 0.29791 - 0.32237)
a ROUGE-1 Average_P: 0.28526 (95%-conf.int. 0.27364 - 0.29666)
a ROUGE-1 Average_F: 0.29650 (95%-conf.int. 0.28433 - 0.30811)
a ROUGE-2 Average_R: 0.11021 (95%-conf.int. 0.10188 - 0.11849)
a ROUGE-2 Average_P: 0.10084 (95%-conf.int. 0.09295 - 0.10858)
a ROUGE-2 Average_F: 0.10499 (95%-conf.int. 0.09703 - 0.11296)
a ROUGE-L Average_R: 0.27212 (95%-conf.int. 0.26072 - 0.28356)
a ROUGE-L Average_P: 0.25024 (95%-conf.int. 0.23987 - 0.26069)
a ROUGE-L Average_F: 0.25995 (95%-conf.int. 0.24929 - 0.27060)
(The DUC2004 ROUGE table from the paper was shown here.)
My question is: are the ROUGE scores you present in your paper recall or F1?
from control-length.
Thank you again!
from control-length.