frankaging / causal-distill-xxs

The Codebase for Causal Distillation for Task-Specific Models

Home Page: https://arxiv.org/abs/2112.02505

Dockerfile 0.01% Python 55.33% Jupyter Notebook 44.65%
model-distillation language-model natural-language-understanding causality

causal-distill-xxs's Introduction

Python 3.7 License CC BY-NC

Causal Distillation for Natural Language Understanding Tasks (DIITO-XXS)

This is an ONGOING research effort, so don't expect everything to work. This is an extended implementation of our preprint Causal Distillation for Language Models that applies the method to task-specific models (i.e., the teacher model here is a fine-tuned model). The codebase for the distillation method, the distillation interchange intervention training objective (DIITO), can be found here.

We fork our main codebase from the PKD Distillation to ensure a fair comparison.
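
For readers new to interchange interventions, the idea behind DIITO can be illustrated with a toy sketch. This is not code from this repository: the "models" are plain lists of Python functions, all names (`run_layers`, `interchange`, `counterfactual_gap`) are hypothetical, the teacher and student are intervened on at the same layer index for simplicity, and the squared difference is only a stand-in for the loss used in the paper. The actual implementation operates on hidden states of BERT models.

```python
def run_layers(layers, x, swap_at=None, swap_idx=(), donor=None):
    """Run x through the layers. After layer `swap_at`, overwrite the hidden
    units listed in `swap_idx` with the donor's values.
    Returns (final_output, list_of_hidden_states)."""
    hiddens = []
    h = list(x)
    for i, layer in enumerate(layers):
        h = layer(h)
        if swap_at == i:
            h = [donor[j] if j in swap_idx else v for j, v in enumerate(h)]
        hiddens.append(h)
    return h, hiddens


def interchange(layers, base, source, layer_idx, swap_idx):
    """Counterfactual output: process `base`, but take the chosen hidden
    units at `layer_idx` from a separate run on `source`."""
    _, source_hiddens = run_layers(layers, source)
    out, _ = run_layers(layers, base, swap_at=layer_idx, swap_idx=swap_idx,
                        donor=source_hiddens[layer_idx])
    return out


def counterfactual_gap(teacher, student, base, source, layer_idx, swap_idx):
    """Squared difference between teacher and student counterfactual outputs
    (an illustrative stand-in, not the loss from the paper)."""
    t_out = interchange(teacher, base, source, layer_idx, swap_idx)
    s_out = interchange(student, base, source, layer_idx, swap_idx)
    return sum((t - s) ** 2 for t, s in zip(t_out, s_out))


# Toy two-layer "models": scale every unit, then sum to a single output.
def double(h):
    return [2 * v for v in h]


def triple(h):
    return [3 * v for v in h]


def total(h):
    return [sum(h)]


teacher = [double, total]
student = [triple, total]

# Base hidden state [2, 4], source hidden state [20, 40]; swapping unit 0
# gives [20, 4], so the teacher's counterfactual output is [24].
print(interchange(teacher, [1, 2], [10, 20], layer_idx=0, swap_idx={0}))
```

Training the student to shrink this gap encourages it to match the teacher's behavior under the same intervention, not only on unmodified inputs.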

Release Notes

✅ 02/21/2022 Release this codebase for others who are interested in applying DIITO to task-specific models.

If you experience any issues or have suggestions, please contact me either through the issues page or at [email protected].

Main Contents

Citation

If you use this repository, please cite the following two papers: the paper for interchange intervention training and the paper for our distillation method.

  @article{geiger-etal-2021-iit,
        title={Inducing Causal Structure for Interpretable Neural Networks}, 
        author={Geiger, Atticus and Wu, Zhengxuan and Lu, Hanson and Rozner, Josh and Kreiss, Elisa and Icard, Thomas and Goodman, Noah D. and Potts, Christopher},
        year={2021},
        eprint={2112.00826},
        archivePrefix={arXiv},
        primaryClass={cs.LG}
  }

  @article{wu-etal-2021-distill,
        title={Causal Distillation for Language Models}, 
        author={Wu, Zhengxuan and Geiger, Atticus and Rozner, Josh and Kreiss, Elisa and Lu, Hanson and Icard, Thomas and Potts, Christopher and Goodman, Noah D.},
        year={2021},
        eprint={2112.02505},
        archivePrefix={arXiv},
        primaryClass={cs.CL}
  }

Requirements

  • Python 3.6 or 3.7 is supported.
  • PyTorch version: 1.9.0
  • Transformers version: 4.11.3
  • Datasets version: 1.8.0
  • Since we build our codebase off the Hugging Face Distillation Interface, please review their documentation for requirements.
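
To compare an environment against the versions listed above, a small script along these lines may help. It is a sketch and not part of this repository. The distribution names (`torch`, `transformers`, `datasets`) are assumed to be the PyPI names, and on Python 3.6/3.7 the `importlib_metadata` backport is assumed to be installed, since `importlib.metadata` only ships with Python 3.8 and later.

```python
# Pinned versions copied from the Requirements list above.
REQUIRED = {"torch": "1.9.0", "transformers": "4.11.3", "datasets": "1.8.0"}


def installed_version(dist_name):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        from importlib import metadata  # Python 3.8+
    except ImportError:
        import importlib_metadata as metadata  # backport (assumed installed)
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def check_requirements(required=REQUIRED, lookup=installed_version):
    """Map each distribution name to a (wanted, found, matches) tuple."""
    report = {}
    for name, wanted in required.items():
        found = lookup(name)
        # PyTorch wheels may carry a local suffix such as "+cu111"; ignore it.
        matches = found is not None and found.split("+")[0] == wanted
        report[name] = (wanted, found, matches)
    return report


if __name__ == "__main__":
    for name, (wanted, found, ok) in check_requirements().items():
        status = "OK" if ok else "MISMATCH"
        print(f"{name}: wanted {wanted}, found {found} -> {status}")
```

The `lookup` parameter exists only so the comparison can be exercised without the packages installed, for example by passing `{"torch": "1.9.0"}.get`.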

Distillation

Here is an example command for distilling with or without our causal distillation objective:

python KD_training.py \
--task_name SST-2 \
--output_dir data/outputs/KD/SST-2/teacher_12layer/ \
--bert_model bert-base-uncased \
--max_seq_length 128 \
--train_batch_size 32 \
--learning_rate 2e-5 \
--num_train_epochs 5 \
--eval_batch_size 32 \
--gradient_accumulation_steps 1 \
--log_interval 10 \
--checkpoint_interval 100 \
--do_train \
--fp16 False \
--student_hidden_layers 6 \
--fc_layer_idx 1,3,5,7,9 \
--kd_model kd \
--alpha 0.7 \
--T 20 \
--is_wandb \
--wandb_metadata wuzhengx:DIITO-XXS \
--neuron_mapping full \
--is_diito \
--interchange_prop 0.3
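
The `--alpha 0.7` and `--T 20` arguments in the command above look like the usual knowledge-distillation mixing weight and softmax temperature. The sketch below shows the textbook soft-target loss under that reading. It is an assumption, not a copy of what `KD_training.py` computes: the function names are hypothetical, the `T ** 2` scaling follows the common convention, and which term `alpha` weights should be checked against the script itself.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kd_loss(student_logits, teacher_logits, label, alpha=0.7, T=20.0):
    """alpha * T^2 * KL(teacher_T || student_T)
    + (1 - alpha) * cross-entropy(student, label)."""
    p_teacher = softmax(teacher_logits, T)
    p_student = softmax(student_logits, T)
    # KL divergence between the softened teacher and student distributions.
    kl = sum(pt * math.log(pt / ps)
             for pt, ps in zip(p_teacher, p_student) if pt > 0)
    # Ordinary cross-entropy against the gold label at temperature 1.
    ce = -math.log(softmax(student_logits)[label])
    return alpha * (T ** 2) * kl + (1.0 - alpha) * ce


# One two-class example (e.g. a sentiment label) with made-up logits.
loss = kd_loss(student_logits=[0.5, -0.5], teacher_logits=[2.0, -2.0], label=0)
print(loss)
```

A high temperature such as 20 flattens both distributions, so the soft-target term mostly reflects the relative ordering of the teacher's logits instead of its most confident prediction.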
