AccelTran: A Sparsity-Aware Monolithic 3D Accelerator for Transformer Architectures at Scale

Published in IEEE TCAD, 2023. License: BSD 3-Clause "New" or "Revised" License.

AccelTran is a tool to simulate a design space of accelerators on diverse flexible and heterogeneous transformer architectures supported by the FlexiBERT 2.0 framework at jha-lab/txf_design-space.

The figure below shows the utilization of different modules in an AccelTran architecture for the BERT-Tiny transformer model.

AccelTran GIF

Table of Contents

  • Environment setup
  • Run synthesis
  • Run pruning
  • Run simulator
  • Developer
  • Cite this work
  • License

Environment setup

Clone this repository and initialize sub-modules

git clone https://github.com/JHA-Lab/acceltran.git
cd ./acceltran/
git submodule init
git submodule update

Setup python environment

The Python environment setup is based on conda. The script below creates a new environment named txf_design-space:

source env_setup.sh

For pip installation, we are creating a requirements.txt file. Stay tuned!

Run synthesis

Synthesis scripts use Synopsys Design Compiler. All hardware modules are implemented in SystemVerilog in the directory synthesis/top.

To get area and power consumption reports for each module, use the following command:

cd ./synthesis/
dc_shell -f 14nm_sg.tcl -x "set top_module <M>"
cd ..

Here, <M> is the module to be synthesized: mac_lane, ln_forward_<T> (for layer normalization), softmax_<T>, etc., where <T> is the tile size (8, 16, or 32).
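When scripting synthesis runs over every variant, the per-tile-size module names above can be enumerated programmatically. The sketch below is only an illustration of that naming scheme; mac_lane, ln_forward_<T>, and softmax_<T> are the names confirmed by the text, and any other modules would need to be added by hand:

```python
# Enumerate synthesizable module names as described above: mac_lane has no
# tile-size suffix, while ln_forward and softmax come in 8/16/32-tile variants.
# Each resulting name would be passed to dc_shell via: -x "set top_module <M>"
tiled = [f"{m}_{t}" for m in ("ln_forward", "softmax") for t in (8, 16, 32)]
modules = ["mac_lane"] + tiled

for m in modules:
    print(m)
```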

All output reports are stored in synthesis/reports.

To run the synthesis for the DMA module, run the following command instead:

cd ./synthesis/
dc_shell -f dma.tcl 

Run pruning

To get the sparsity in activations and weights in an input transformer model and its corresponding performance on the GLUE benchmark, use the dynamic pruning model: DP-BERT.

To test the effect of different sparsity ratios on the model performance on the SST-2 benchmark, use the following script:

cd ./pruning/
python3 run_evaluation.py --task sst2 --max_pruning_threshold 0.1
cd ..

The script uses a weight-pruned model, so the weights are not pruned further. To also prune the weights with a pruning_threshold, use the flag --prune_weights.
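For intuition, dynamic magnitude pruning zeroes out values whose magnitude falls below the threshold, and the resulting sparsity is what the accelerator exploits. The minimal NumPy sketch below illustrates the idea; it is not the DP-BERT implementation, and the function name is hypothetical:

```python
import numpy as np

def prune_low_magnitude(x, threshold):
    # Keep entries with |x| >= threshold; zero out the rest.
    mask = np.abs(x) >= threshold
    # Return the pruned tensor and the resulting sparsity ratio.
    return x * mask, 1.0 - mask.mean()

acts = np.array([0.05, -0.2, 0.5, -0.01])
pruned, sparsity = prune_low_magnitude(acts, threshold=0.1)
# pruned -> [0.0, -0.2, 0.5, 0.0]; sparsity -> 0.5
```

A higher threshold prunes more values, trading model accuracy for sparsity, which is exactly the trade-off run_evaluation.py measures on SST-2.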

Run simulator

AccelTran supports a diverse range of accelerator hyperparameters. It also supports all ~10^88 models in the FlexiBERT 2.0 design space.

To specify the configuration of an accelerator's architecture, use a configuration file in the simulator/config directory. Example configuration files are provided for accelerators optimized for BERT-Nano and BERT-Tiny. Accelerator hardware configuration files should conform to the design space specified in simulator/design_space/design_space.yaml.

To specify the transformer model parameters, use a model dictionary file in simulator/model_dicts. Model dictionaries for BERT-Nano and BERT-Tiny have already been provided for convenience.
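As a rough sketch of what such a model dictionary could encode: the key names below are assumptions (the actual schema in simulator/model_dicts/bert_tiny.json may differ), while the dimensions are the standard BERT-Tiny ones (2 encoder layers, hidden size 128, 2 attention heads):

```python
import json

# Hypothetical model-dictionary contents for BERT-Tiny; the key names are
# illustrative only and may not match the repository's actual schema.
bert_tiny = {
    "model": "bert_tiny",
    "num_encoder_layers": 2,
    "hidden_size": 128,
    "num_attention_heads": 2,
    "feed_forward_dim": 512,
}

# Round-trip through JSON, since the simulator loads these files from disk.
loaded = json.loads(json.dumps(bert_tiny))
```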

To run AccelTran on the BERT-Tiny model, while plotting utilization and metric curves every 1000 cycles, use the following command:

cd ./simulator/
python3 run_simulator.py --model_dict_path ./model_dicts/bert_tiny.json --config_path ./config/config_tiny.yaml --plot_steps 1000 --debug
cd ..

This will output the accelerator state for every cycle. For more information on the possible inputs to the simulation script, use:
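The --plot_steps behaviour amounts to sampling metrics at a fixed cycle interval. A minimal sketch of that sampling (not the simulator's code; the helper name is hypothetical):

```python
def plot_cycles(total_cycles, plot_steps):
    # Cycles at which the utilization/metric curves would be refreshed when
    # plotting every `plot_steps` cycles (as with --plot_steps 1000).
    return [c for c in range(total_cycles) if c % plot_steps == 0]

cycles = plot_cycles(total_cycles=3500, plot_steps=1000)
# cycles -> [0, 1000, 2000, 3000]
```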

cd ./simulator/
python3 run_simulator.py --help
cd ..

Developer

Shikhar Tuli. For any questions, comments or suggestions, please reach me at [email protected].

Cite this work

Cite our work using the following BibTeX entry:

@article{tuli2023acceltran,
  title={{AccelTran}: A Sparsity-Aware Accelerator for Dynamic Inference with Transformers},
  author={Tuli, Shikhar and Jha, Niraj K},
  journal={arXiv preprint arXiv:2302.14705},
  year={2023}
}

If you use the AccelTran design space to implement transformer-accelerator co-design, please also cite:

@article{tuli2023transcode,
  title={{TransCODE}: Co-design of Transformers and Accelerators for Efficient Training and Inference},
  author={Tuli, Shikhar and Jha, Niraj K},
  journal={arXiv preprint arXiv:2303.14882},
  year={2023}
}

License

BSD-3-Clause. Copyright (c) 2022, Shikhar Tuli and Jha Lab. All rights reserved.

See the LICENSE file for more details.
