
CoNT

Fairseq code for the NeurIPS 2022 paper "CoNT: Contrastive Neural Text Generation"

This is the fairseq-based implementation of the NeurIPS 2022 paper CoNT: Contrastive Neural Text Generation (https://arxiv.org/pdf/2205.14690v2.pdf). CoNT is a contrastive learning framework for neural text generation that outperforms MLE-based training on five generation tasks: machine translation, summarization, code comment generation, data-to-text generation, and commonsense generation. This repo covers the machine translation tasks; for the other tasks, please refer to our transformers repo.


Dependencies

Main libraries

# clone our repo and fairseq
git clone https://github.com/ChenxinAn-fdu/CoNT.git
git clone https://github.com/facebookresearch/fairseq.git
# replace the fairseq folder with our custom code
rm -rf fairseq/fairseq && mv CoNT/fairseq fairseq/
mv fairseq CoNT/ && cd CoNT/fairseq && pip install -e . && cd ..

Please follow the instructions in Fairseq to prepare the data.

We provide training scripts for the IWSLT14 and WMT14 translation tasks, run_iwslt14.py and run_wmt14.py, which make it easy to reproduce our results.

Generating the binarized dataset

python run_iwslt14.py --mode preprocess
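For reference, `--mode preprocess` presumably wraps fairseq's standard binarization step. A hedged sketch of the equivalent `fairseq-preprocess` call, with paths and flag values taken from fairseq's public IWSLT14 recipe rather than from this repo's script:

```shell
# Hypothetical equivalent of `--mode preprocess` (paths/flags are assumptions)
fairseq-preprocess \
    --source-lang de --target-lang en \
    --trainpref iwslt14.tokenized.de-en/train \
    --validpref iwslt14.tokenized.de-en/valid \
    --testpref iwslt14.tokenized.de-en/test \
    --destdir data-bin/iwslt14.tokenized.de-en \
    --workers 8
```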

Training

For example, if you have 4 V100 32GB GPUs, run the following script to train with warmup:

python run_iwslt14.py --mode train --gpus 0,1,2,3 --warmup

If the --save-dir already contains a warmed-up checkpoint, you can omit the --warmup option:

python run_iwslt14.py --mode train --gpus 0,1,2,3
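Under the hood, the warmup stage is standard MLE training. A hedged sketch of the kind of `fairseq-train` invocation such a script typically wraps on IWSLT14 — the architecture and hyperparameters below are assumptions taken from fairseq's public IWSLT14 example, not this repo's actual settings:

```shell
# Hypothetical MLE warmup command (hyperparameters are assumptions)
fairseq-train data-bin/iwslt14.tokenized.de-en \
    --arch transformer_iwslt_de_en --share-decoder-input-output-embed \
    --optimizer adam --adam-betas '(0.9, 0.98)' --clip-norm 0.0 \
    --lr 5e-4 --lr-scheduler inverse_sqrt --warmup-updates 4000 \
    --criterion label_smoothed_cross_entropy --label-smoothing 0.1 \
    --dropout 0.3 --weight-decay 0.0001 \
    --max-tokens 4096 --save-dir checkpoints/iwslt14_warmup
```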

Testing

python run_iwslt14.py --mode gen --gpus 0 --save_path /path/to/checkpoints/checkpoint_best.pt
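For orientation, `--mode gen` presumably wraps fairseq's standard decoding entry point. A hedged sketch of the equivalent call (beam size and post-processing flags are assumptions from fairseq's translation examples, not this repo's defaults):

```shell
# Hypothetical equivalent of `--mode gen` (flags are assumptions)
fairseq-generate data-bin/iwslt14.tokenized.de-en \
    --path /path/to/checkpoints/checkpoint_best.pt \
    --beam 5 --remove-bpe
```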

With checkpoint averaging

python run_iwslt14.py --mode gen --gpus 0 --save_path /path/to/checkpoints/ --avg_ckpt
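Checkpoint averaging itself is simple: each parameter is averaged elementwise across the saved checkpoints (fairseq ships scripts/average_checkpoints.py for this). A minimal dependency-free sketch of the idea, using plain lists in place of tensors:

```python
def average_checkpoints(state_dicts):
    """Elementwise average of parameters across checkpoints.

    Each state dict maps a parameter name to a flat list of floats
    (real checkpoints hold tensors; lists keep the sketch dependency-free).
    """
    n = len(state_dicts)
    return {
        name: [sum(vals) / n for vals in zip(*(sd[name] for sd in state_dicts))]
        for name in state_dicts[0]
    }

# two toy "checkpoints" with a single weight vector
ckpt_a = {"w": [1.0, 2.0]}
ckpt_b = {"w": [3.0, 4.0]}
print(average_checkpoints([ckpt_a, ckpt_b]))  # {'w': [2.0, 3.0]}
```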

Tips for reproducing results on WMT14

  • Compound splitting: previous work usually reports results with compound splitting on the WMT'14 En-De translation task. To apply compound splitting to the reference and output files, run the following command:
python run_wmt14.py --mode score --out_file /path/to/checkpoints/checkpoint_best.out
  • Training set: Fairseq provides two versions of the training set: (1) 3.9M training samples and (2) 4.5M training samples. We report results with the 4.5M training set (the same as the original Transformer paper) in this paper. Using the 3.9M training set leads to better results.
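The compound-splitting convention commonly used for WMT'14 En-De evaluation splits hyphenated compounds into separate tokens before computing BLEU. A hedged one-function sketch of that rewrite — the `##AT##-##AT##` placeholder follows the convention seen in public NMT evaluation scripts; this is an illustration, not this repo's actual score code:

```python
import re

def compound_split(line: str) -> str:
    """Split hyphenated compounds: 'rich-text' -> 'rich ##AT##-##AT## text'."""
    return re.sub(r"(\S)-(\S)", r"\1 ##AT##-##AT## \2", line)

print(compound_split("a rich-text editor"))  # a rich ##AT##-##AT## text editor
```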

Citing

@article{an2022cont,
  title={CoNT: Contrastive Neural Text Generation},
  author={An, Chenxin and Feng, Jiangtao and Lv, Kai and Kong, Lingpeng and Qiu, Xipeng and Huang, Xuanjing},
  journal={arXiv preprint arXiv:2205.14690},
  year={2022}
}

