Comments (13)
In our experimental configuration, we use a character-based decoder and use @@@@
as a space (a segmentation marker between words).
(Sorry for the lack of explanation: we applied the BPE code to the source side only.)
Thus, you can take the decoder outputs and replace the segmentation markers with spaces as follows:
cat decoder_output.txt | sed 's/ //g' | sed 's/\@\@\@\@/ /g'
(decoder_output.txt contains the output hypotheses only)
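For instance, with a made-up character-level hypothesis line, the pipeline above behaves like this:

```shell
# Made-up example of the detokenization pipeline above:
# first delete the spaces between characters, then turn each @@@@ into a word boundary.
echo 'a f p @@@@ w o r l d @@@@ n e w s' | sed 's/ //g' | sed 's/\@\@\@\@/ /g'
# -> afp world news
```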
In addition, you can control the output length with the --desired-length option.
from control-length.
That table reports recall-based ROUGE scores, i.e., ROUGE-N Average_R in the output of the evaluation script.
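If it helps, the recall rows can be pulled straight out of the ROUGE log, e.g. (with a fabricated log snippet):

```shell
# Fabricated snippet of a ROUGE-1.5.5 log; the paper's numbers
# correspond to the Average_R (recall) rows.
printf 'a ROUGE-1 Average_R: 0.31056\na ROUGE-1 Average_P: 0.28526\na ROUGE-2 Average_R: 0.11021\n' > rouge.log
grep 'Average_R' rouge.log
# -> a ROUGE-1 Average_R: 0.31056
#    a ROUGE-2 Average_R: 0.11021
```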
from control-length.
I think this error is caused by a difference in vocabulary sizes.
Did you apply my BPE code (which is attached to the gz file, as I remember) to your text file?
from control-length.
So I should first apply the BPE code to my text and then preprocess it, right?
from control-length.
Yes! Please try that procedure.
from control-length.
I added the dictionary file that I used to construct the binary files in my experiments.
If you need it, please download the trained model file from my Google Drive again:
https://drive.google.com/file/d/15Sy8rv6Snw6Nso7T5MxYHSAZDdieXpE7/view?usp=sharing
from control-length.
That's very kind of you!
I used the dictionary to construct the binary file of the test set,
and obtained best_15_nbest.txt after running generate.py.
However, the generated headlines seem terrible; here are two examples:
S-67 the agenda might be global , but the men@@ u will be malaysian when world leaders meet next week for the asia-pacific economic cooperation forum .
T-67 <> si@@ a-@@ <> ac@@ ific <> con@@ om@@ ic <> o@@ operation forum features sp@@ icy cu@@ is@@ ine
H-67 -0.09418036788702011 a f p @@@@ w o r l d @@@@ n e w s
P-67 -0.8212 -0.1580 -0.0270 -0.0621 -0.0602 -0.0257 -0.0338 -0.0303 -0.0275 -0.0282 -0.0313 -0.0246 -0.0250 -0.0321 -0.0256
S-362 the european union wednesday condemned the slaying of four foreign hostages in chechnya and said it would raise the issue with russia 's foreign minister .
T-362 <> <> condem@@ ns slaying of 4 hostages in <> he@@ chn@@ ya@@ ; were telephone engine@@ er@@ s.
H-362 -0.11744093894958496 e u @@@@ c o n d e m n s @@@@ c h e
P-362 -0.3688 -0.0332 -0.0409 -0.2489 -0.0318 -0.0254 -0.0284 -0.0278 -0.0296 -0.0259 -0.0304 -0.0305 -0.5614 -0.0417 -0.0373 -0.3170
Before using the dictionary file you released to construct the binary file, I preprocessed the train and valid sets of Gigaword and obtained a dictionary of size 15380. However, the size of the dictionary you released is 16148.
It seems I cannot use your dictionary to preprocess the test set on my Gigaword, which may be the reason why I failed to generate with your checkpoint.
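One way to see where two vocabularies diverge is to diff the dictionary files directly. This is a sketch with tiny fabricated dictionaries, assuming the fairseq format of one "token count" pair per line:

```shell
# Tiny fabricated fairseq-style dictionaries ("token count" per line):
printf 'the 100\n, 90\n. 80\n' > dict.mine.txt
printf 'the 100\n, 90\n. 80\nof 70\n' > dict.released.txt
# comm requires sorted input:
sort dict.mine.txt > mine.sorted
sort dict.released.txt > released.sorted
# Entries present only in the released dictionary:
comm -13 mine.sorted released.sorted
# -> of 70
```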
from control-length.
Do you mean I should just apply BPE to train.article, valid.article, and test.article, and then preprocess them together with *.title to get a joint dictionary?
After doing that, I get a dictionary of size 74932, which is much larger than the dictionary you released.
It seems that I misunderstood "We applied the BPE code to the source side only."
from control-length.
May I confirm your purpose?
I thought that you planned to apply the pre-trained model to a text for generation.
Or do you want to train a model on a new corpus?
from control-length.
My purpose is to apply the pre-trained model to a text for generation, rather than to train a model on a new corpus.
However, I get terrible generated headlines even when applying the checkpoint and dictionary you released.
Therefore, I am wondering why I failed.
The following are the two commands I used to preprocess the test file and generate:
-
preprocess test.article and test.title:
python preprocess.py --source-lang article --target-lang title --tgtdict /users6/ychuang/program/python/control-length-master/databin_bpe/test/dict.joined.txt --srcdict /users6/ychuang/program/python/control-length-master/databin_bpe/test/dict.joined.txt --testpref /users6/ychuang/program/python/control-length-master/sumdata/DUC2004/test_bpe --destdir /users6/ychuang/program/python/control-length-master/databin_bpe/test
-
generate:
CUDA_VISIBLE_DEVICES=0,1,2,3 python generate.py /users6/ychuang/program/python/control-length-master/databin_bpe/test --source-lang article --target-lang title --path /users6/ychuang/program/python/control-length-master/pretrained_model/trained_model_lrpe_pe_averaged.pt --desired-length 15 --batch-size 32 --beam 5 > best_75_nbest.txt
from control-length.
I see.
Could you extract the generated sentences from best_75_nbest.txt with the following command?
cat best_75_nbest.txt | grep '^H' | sed 's/^H\-//g' | sort -t ' ' -k1,1 -n | cut -f 3- > output.txt
Then, replace the segmentation markers with spaces:
cat output.txt | sed 's/ //g' | sed 's/\@\@\@\@/ /g'
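As a concrete illustration, here is how the two pipelines behave together, using fabricated hypothesis lines and assuming the tab-separated generate.py output format shown in the examples earlier in this thread:

```shell
# Fabricated two-line generate.py output (fields are tab-separated, as in fairseq):
printf 'H-1\t-0.10\tw o r l d @@@@ c u p\nH-0\t-0.09\ta f p @@@@ n e w s\n' > best_nbest.txt
# Keep the H-lines, sort by sentence id, and keep only the token field:
grep '^H' best_nbest.txt | sed 's/^H\-//g' | sort -k1,1 -n | cut -f 3- > output.txt
# Undo the character segmentation:
sed 's/ //g; s/\@\@\@\@/ /g' output.txt
# -> afp news
#    world cup
```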
I expect that you can obtain sentences whose lengths are 15.
(Or, since our pre-trained model is fragile, like other neural encoder-decoders, it might be broken by unknown inputs.)
from control-length.
Hi, I have succeeded in using the checkpoint and dictionary you released to generate headlines with the following commands:
CUDA_VISIBLE_DEVICES=0,1,2,3 python generate.py /users6/ychuang/program/python/control-length-master/databin2/test --source-lang article --target-lang title --path /users6/ychuang/program/python/control-length-master/pretrained_model/trained_model_lrpe_pe_averaged.pt --desired-length 75 --batch-size 32 --beam 5 > best_75_nbest.txt
cat best_75_nbest.txt | grep '^H' | sed 's/^H\-//g' | sort -t ' ' -k1,1 -n | cut -f 3- > output.txt
cat output.txt | sed 's/ //g' | sed 's/\@\@\@\@/ /g' > res_75.txt
Then I used the script in the 'eval' directory to evaluate the ROUGE scores of res_75.txt; the command and result are below:
-
./eval.sh /users6/ychuang/program/python/control-length-master/eval/testdata /users6/ychuang/program/python/control-length-master/eval/testdata/input.txt 75 /users6/ychuang/program/python/control-length-master/eval /users6/ychuang/software/ROUGE/
-
ROUGE result:
a ROUGE-1 Average_R: 0.31056 (95%-conf.int. 0.29791 - 0.32237)
a ROUGE-1 Average_P: 0.28526 (95%-conf.int. 0.27364 - 0.29666)
a ROUGE-1 Average_F: 0.29650 (95%-conf.int. 0.28433 - 0.30811)
a ROUGE-2 Average_R: 0.11021 (95%-conf.int. 0.10188 - 0.11849)
a ROUGE-2 Average_P: 0.10084 (95%-conf.int. 0.09295 - 0.10858)
a ROUGE-2 Average_F: 0.10499 (95%-conf.int. 0.09703 - 0.11296)
a ROUGE-L Average_R: 0.27212 (95%-conf.int. 0.26072 - 0.28356)
a ROUGE-L Average_P: 0.25024 (95%-conf.int. 0.23987 - 0.26069)
a ROUGE-L Average_F: 0.25995 (95%-conf.int. 0.24929 - 0.27060)
(The DUC2004 ROUGE table from the paper was shown here.)
My question is: are the ROUGE scores you present in your paper recall or F1?
from control-length.
Thank you again!
from control-length.