
jortvincenti / fast_robust_early_exit


This project forked from raymin0223/fast_robust_early_exit


Fast and Robust Early-Exiting Framework for Autoregressive Language Models with Synchronized Parallel Decoding (EMNLP 2023 Long)



Fast and Robust Early-Exiting (EMNLP 2023)

Fast and Robust Early-Exiting Framework for Autoregressive Language Models with Synchronized Parallel Decoding
Sangmin Bae*, Jongwoo Ko*, Hwanjun Song†, Se-Young Yun†
* equal contribution, † corresponding author

  • Early-Exiting dynamically allocates computation paths based on the complexity of generation for each token.
  • Conventional frameworks failed to show actual speedups due to the large number of exit points and the state copying mechanism.
  • We propose FREE, which consists of (1) a shallow-deep module, (2) synchronized parallel decoding, and (3) an adaptive threshold estimator.
  • In contrast to conventional approaches, FREE achieved larger inference speedups on extensive generation tasks.
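To make the synchronized parallel decoding idea more concrete, here is a simplified Python sketch, not the repository's implementation. It assumes each generated token either exits at the shallow module or goes through the deep layers, and that the deep-layer states skipped by early-exited tokens are computed later, in one parallel pass together with the next token that does go deep. The function name deep_layer_batches is hypothetical.

```python
from typing import List


def deep_layer_batches(exits_early: List[bool]) -> List[List[int]]:
    """Group token positions into the parallel batches run through the deep layers.

    Simplified model (an assumption): a token that exits at the shallow module
    skips the deep layers, so its deep-layer states are missing. They are
    filled in later, in a single parallel forward pass together with the next
    token that is forwarded through the deep layers.
    """
    batches: List[List[int]] = []
    pending: List[int] = []  # early-exited positions whose deep states are missing
    for position, early in enumerate(exits_early):
        if early:
            pending.append(position)
        else:
            # One deep forward pass covers this token plus all skipped tokens.
            batches.append(pending + [position])
            pending = []
    # Positions still pending at the end are left unsynchronized in this sketch.
    return batches
```

For example, deep_layer_batches([False, True, True, False, True, False]) returns [[0], [1, 2, 3], [4, 5]]: in this simplified model, six tokens need only three deep-layer passes, and the skipped tokens get their deep states computed rather than copied from a shallow layer.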

🚀 Updates

  • Implement CALM and FREE on decoder-only models
  • (24.02.08) Release finetuned checkpoints
  • (24.01.26) Won 🥈 Silver award from Samsung Humantech Paper Awards

Requirements

Install the necessary packages with:

$ pip install -r requirements.txt

Experiments

We experimented with 4 summarization tasks, 1 question answering task, and 1 machine translation task.
Please see the scripts and run the shell file for each dataset to train or evaluate.

$ bash run_[TASK_NAME]_[DATASET_NAME].sh

Methods

You can run three early-exiting methods: Static-Exiting, CALM, and our FREE method.

Here are some important arguments to consider.
Please refer to additional_args for more details.

Training for FREE:

  • --ouput_hidden_states_decoder True: return hidden_states from intermediate layers
  • --intermediate_loss_fn shallowdeep_kd_dyna: use a dynamic distillation loss between shallow and deep models
  • --shallow_exit_layer [int]: set the number of layers for the shallow model
  • --distill_layer_alpha [float]: distillation interpolation hyperparameter between CE and KL divergence losses
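A minimal single-token sketch of an interpolated distillation objective for the shallow model, assuming distill_layer_alpha weights the cross-entropy term and 1 - alpha weights the KL divergence from the deep model's distribution to the shallow model's. Which term alpha multiplies, and the "dynamic" behavior implied by shallowdeep_kd_dyna, are assumptions not modeled here; the repository's loss code is the authoritative definition. Both function names are hypothetical.

```python
import math
from typing import List, Sequence


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax over a vocabulary of logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def shallow_deep_kd_loss(
    shallow_logits: Sequence[float],
    deep_logits: Sequence[float],
    label: int,
    alpha: float,
) -> float:
    """alpha * CE(shallow, label) + (1 - alpha) * KL(p_deep || p_shallow).

    Hypothetical helper: the deep model acts as the teacher, so in real
    training its logits would be detached from the computation graph.
    """
    log_p_shallow = log_softmax(shallow_logits)
    log_p_deep = log_softmax(deep_logits)
    ce = -log_p_shallow[label]
    kl = sum(
        math.exp(ld) * (ld - ls) for ld, ls in zip(log_p_deep, log_p_shallow)
    )
    return alpha * ce + (1.0 - alpha) * kl
```

When the two models agree exactly, the KL term vanishes, so shallow_deep_kd_loss([0.0, 0.0], [0.0, 0.0], label=0, alpha=0.5) equals 0.5 * ln 2 (about 0.347).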

Training for CALM and Static-Exiting:

  • --ouput_hidden_states_decoder True: return hidden_states from intermediate layers
  • --intermediate_loss_fn weighted_ce: use a weighted average loss across all layers
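As a sketch of a weighted average loss across layers, the function below assumes the depth-proportional weights described in the CALM paper, where layer i of L receives weight i / (1 + 2 + ... + L), so deeper exits contribute more. The repository's weighted_ce may weight layers differently, and weighted_layer_loss is a hypothetical name.

```python
from typing import Sequence


def weighted_layer_loss(layer_losses: Sequence[float]) -> float:
    """Depth-weighted average of per-layer cross-entropy losses.

    layer_losses[i] is the CE loss of the exit after decoder layer i + 1;
    layer i + 1 receives weight (i + 1) / (1 + 2 + ... + L).
    """
    num_layers = len(layer_losses)
    if num_layers == 0:
        raise ValueError("need at least one layer loss")
    total_weight = num_layers * (num_layers + 1) / 2
    return sum((i + 1) * loss for i, loss in enumerate(layer_losses)) / total_weight
```

For example, weighted_layer_loss([3.0, 0.0]) returns 1.0, because the second layer's weight (2/3) is twice the first layer's (1/3).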

Evaluation for FREE:

  • --deploy_scenario True: this should always be True to use deploying_[MODEL_NAME].py for FREE or CALM
  • --use_shallow_deep True: use shallow-deep module
  • --shallow_exit_layer [int]: set the number of layers for the shallow model
  • --shallow2deep_conf_type softmax: set the confidence measure to softmax values
  • --shallow2deep_conf_threshold [float]: threshold value to decide whether to exit or not in the shallow model
  • --use_adap_threshold True: use adaptive threshold estimator, where the initial threshold is set to shallow2deep_conf_threshold
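The shallow-deep exit decision controlled by these flags can be sketched as follows, assuming the softmax confidence is the maximum probability of the shallow model's next-token distribution and that a token exits once that confidence reaches the threshold. The adaptive threshold estimator, which updates the threshold during decoding, is not modeled; the threshold stays fixed here. Function names are hypothetical.

```python
import math
from typing import Sequence


def softmax_confidence(logits: Sequence[float]) -> float:
    """Maximum softmax probability of a logit vector (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    return max(exps) / sum(exps)


def layers_used(
    shallow_logits: Sequence[float],
    threshold: float,
    shallow_exit_layer: int,
    num_layers: int,
) -> int:
    """Return how many decoder layers a token uses under a fixed threshold.

    Confident tokens exit after shallow_exit_layer layers; the rest are
    forwarded through the full deep model.
    """
    if softmax_confidence(shallow_logits) >= threshold:
        return shallow_exit_layer
    return num_layers
```

With a hypothetical 24-layer decoder and a shallow exit after 4 layers, layers_used([5.0, 0.0], 0.9, 4, 24) returns 4 (confidence about 0.993), while layers_used([0.0, 0.0], 0.9, 4, 24) returns 24 (confidence 0.5).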

Evaluation for CALM:

  • --deploy_scenario True: this should always be True to use deploying_[MODEL_NAME].py for FREE or CALM
  • --use_early_exit True: use conventional early-exiting framework
  • --exit_conf_type softmax: set the confidence measure to softmax values
  • --exit_conf_threshold [float]: threshold value to decide whether to exit or not
  • --exit_min_layer [int]: the minimum number of layers to forward before making an exit decision
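A simplified sketch of the CALM-style exit rule these flags describe: starting at the minimum exit layer, a token exits at the first layer whose softmax confidence reaches the threshold, and otherwise uses every layer. Intermediate logits are obtained lazily through a callable (a hypothetical interface), and the state copying for skipped layers is omitted. Function names are hypothetical.

```python
import math
from typing import Callable, Sequence


def softmax_confidence(logits: Sequence[float]) -> float:
    """Maximum softmax probability of a logit vector (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    return max(exps) / sum(exps)


def calm_exit_layer(
    logits_at_layer: Callable[[int], Sequence[float]],
    num_layers: int,
    threshold: float,
    min_layer: int,
) -> int:
    """Return the 1-indexed layer at which a token exits.

    logits_at_layer(k) gives the next-token logits after decoder layer k.
    Layers below min_layer are never used as exits; the last layer is always
    a valid exit, so it is returned if no earlier layer is confident enough.
    """
    for layer in range(max(min_layer, 1), num_layers):
        if softmax_confidence(logits_at_layer(layer)) >= threshold:
            return layer
    return num_layers
```

With toy logits whose confidence grows with depth, calm_exit_layer(lambda k: [float(k), 0.0], num_layers=12, threshold=0.99, min_layer=2) returns 5, the first layer k for which 1 / (1 + e^(-k)) is at least 0.99.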

Evaluation for Static-Exiting:

  • --static_exit_layer [int]: set how many layers to use for prediction

Results

FREE demonstrated robust performance and a larger AUC across various datasets and models, particularly with T5-large and T5-3B.

Human-like Summarization Evaluation

We conducted two human-like evaluations, Likert-scale scoring and pairwise comparison (refer to this paper).
After creating the input files with the ipynb notebook, run bash gpt_eval.sh with your own OpenAI API_KEY.
Then, you can get the results by running the last cell of the ipynb notebook.

Checkpoints

We share finetuned checkpoints on Google Drive.
Note that you must download tokenizer.json for each model individually from HuggingFace to run it without errors (refer to Issue #3).

BibTeX

If you find this repo useful for your research, please consider citing our paper:

@misc{bae2023fast,
      title={Fast and Robust Early-Exiting Framework for Autoregressive Language Models with Synchronized Parallel Decoding}, 
      author={Sangmin Bae and Jongwoo Ko and Hwanjun Song and Se-Young Yun},
      year={2023},
      eprint={2310.05424},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}

Contact
