
This project was forked from asappresearch/revisit-bert-finetuning.


Code release for our arXiv paper "Revisiting Few-sample BERT Fine-tuning" (https://arxiv.org/abs/2006.05987).


Revisiting Few-sample BERT Fine-tuning


Paper: https://arxiv.org/abs/2006.05987

Authors: Tianyi Zhang*, Felix Wu*, Arzoo Katiyar, Kilian Q. Weinberger, Yoav Artzi

*: Equal Contribution

Overview

In this paper, we study the problem of few-sample BERT fine-tuning and identify three sub-optimal practices. First, we observe that the omission of gradient bias correction in BERTAdam makes fine-tuning unstable. We also find that the top layers of BERT provide a detrimental initialization, and that simply re-initializing these layers improves convergence and performance. Finally, we observe that commonly used recipes often do not allocate sufficient training time.
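The role of bias correction follows directly from Adam's update rule: the bias-corrected step equals the uncorrected (BERTAdam-style) step multiplied by sqrt(1 - beta2^t) / (1 - beta1^t), a factor well below 1 during early steps that acts as an implicit warmup. A minimal sketch (illustrative only, not code from this repository) that computes this factor:

```python
def bias_correction_ratio(t, beta1=0.9, beta2=0.999):
    """Factor by which Adam's bias correction scales the raw update at step t.

    corrected step = raw step * sqrt(1 - beta2**t) / (1 - beta1**t)

    The factor is much smaller than 1 early in training (implicit warmup)
    and approaches 1 as t grows. BERTAdam omits it, so its early updates
    are disproportionately large, which can destabilize fine-tuning.
    """
    return (1 - beta2 ** t) ** 0.5 / (1 - beta1 ** t)

# The factor starts around 0.32 at t=1 and converges to 1.0 for large t.
for t in (1, 10, 100, 1000, 10000):
    print(t, bias_correction_ratio(t))
```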

If you find this repo useful, please cite:

@article{revisit-bert-finetuning,
  title={Revisiting Few-sample BERT Fine-tuning},
  author={Zhang, Tianyi and Wu, Felix and Katiyar, Arzoo and Weinberger, Kilian Q. and Artzi, Yoav},
  journal={arXiv preprint arXiv:2006.05987},
  year={2020}
}

Requirements

torch==1.4.0
transformers==2.8.0
apex==0.1
tqdm
tensorboardX

Please install apex following the instructions at https://github.com/NVIDIA/apex.

Usage

We provide the following sample scripts. When using them, please set --data_dir, --output_dir, and --cache_dir to your data folder, output folder, and transformers cache directory, respectively.

  1. To train the BERT baseline (with debiased Adam):
bash sample_commands/debiased_adam_baseline.sh
  2. To use Re-init:
bash sample_commands/reinit.sh
  3. To train the model with more iterations:
bash sample_commands/debiased_adam_longer.sh
  4. To use mixout:
bash sample_commands/mixout.sh
  5. To use layer-wise learning rate decay:
bash sample_commands/llrd.sh
  6. To use pretrained weight decay:
bash sample_commands/pretrained_wd.sh

Input

You need to download the GLUE dataset first, using the download script linked in the repository. Feed the path to your data through --data_dir.

Commands

We provide example commands to replicate our experiments in sample_commands.

run_glue.py contains the main program to fine-tune and evaluate models. python run_glue.py --help shows all available options.

Some key options are:

# These two options replicate our experiments on bias correction
--use_bertadam        No bias correction # this replicates the behavior of BERTAdam
--use_torch_adamw     Use PyTorch AdamW # this replicates the behavior of debiased Adam
# These two options replicate our experiments on Re-init
--reinit_pooler       Re-initialize the pooler
--reinit_layers       Re-initialize the last N Transformer blocks; --reinit_pooler must be turned on.
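The Re-init options above are implemented inside run_glue.py; the idea can be sketched as follows (a simplified illustration assuming a transformers BertForSequenceClassification — names and details may differ from the repository's actual code):

```python
from transformers import BertConfig, BertForSequenceClassification

def reinit_top_layers(model, num_layers):
    """Re-initialize the pooler and the last `num_layers` Transformer blocks,
    reusing the model's own weight-initialization routine (_init_weights)."""
    # Reset the pooler (corresponds to --reinit_pooler).
    model.bert.pooler.apply(model._init_weights)
    # Reset the top N encoder blocks (corresponds to --reinit_layers N).
    for layer in model.bert.encoder.layer[-num_layers:]:
        layer.apply(model._init_weights)
    return model

# Tiny randomly initialized model just to exercise the function;
# in practice you would load pretrained weights first.
config = BertConfig(hidden_size=32, num_hidden_layers=4,
                    num_attention_heads=2, intermediate_size=64)
model = reinit_top_layers(BertForSequenceClassification(config), num_layers=2)
```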

Output

A standard output folder generated by run_glue.py will look like:

├── raw_log.txt
├── test_best_log.txt
├── test_last_log.txt
└── training_args.bin

*_log.txt are csv files that record relevant training and evaluation results. test_best_log.txt records the test performance of the best model checkpoint during training, and test_last_log.txt records that of the last checkpoint. training_args.bin contains all arguments used to run the job.
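Since the logs are plain csv, they can be loaded with the standard library; a small helper (hypothetical, not part of the repository — column names vary by task and are taken from each file's header line):

```python
import csv

def read_log(path):
    """Parse one of run_glue.py's *_log.txt csv files into a list of dicts,
    one per row, keyed by the column names found in the header line."""
    with open(path, newline="") as f:
        return list(csv.DictReader(f))

# Example: read_log("output_dir/test_best_log.txt") returns rows such as
# {"step": "...", "acc": "..."} depending on the columns the run logged.
```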

Contributors

fwu-asapp, tianyi-asapp
