Note: If the DOI above is broken, this one should be correct.
In our paper, we develop an FPGA-based accelerator toolflow targeting Deep Convolutional Early-Exit Neural Networks. Our work builds on the streaming architecture hardware in fpgaConvNet. We leverage the probabilistic nature of the input-dependent Early-Exit network to scale the resource allocation for different stages of the accelerator.
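As a rough illustration of why the exit probability matters (this is a simplified sketch, not the optimiser's actual model), assume a two-stage network in which a fraction p of the inputs exits after the first stage. Only the remaining 1 - p reach the second stage, so that stage can sustain the same overall input rate with a proportionally lower throughput. The function name and the numbers used are hypothetical.

```python
def required_stage_throughputs(input_rate, exit_probability):
    """Return the (stage 1, stage 2) sample rates needed to sustain input_rate.

    Assumption: a two-stage early-exit network where exit_probability is the
    fraction of samples that leave at the first exit.
    """
    if not 0.0 <= exit_probability <= 1.0:
        raise ValueError("exit_probability must be in [0, 1]")
    stage1 = input_rate                              # every sample passes stage 1
    stage2 = input_rate * (1.0 - exit_probability)   # only hard samples continue
    return stage1, stage2


# With 75% of samples exiting early, stage 2 sees a quarter of the traffic.
print(required_stage_throughputs(1000.0, 0.75))
```

Under this assumption, resources saved on the second stage can be spent on the first, which is the kind of trade-off the toolflow explores.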
This repository contains the software and hardware to generate accelerator designs for early-exit networks and the artifacts for the FCCM 2023 paper.
There are three main artifacts:
- Optimiser
- Buffer Hardware Component (and generation)
- HLS-based hardware generation using Vivado
The optimiser (and HLS generation) has been verified using the following software:
conda=4.9.2, 4.10.1
python=3.7
To install this package, run the following from this directory:
sudo apt install protobuf-compiler libprotoc-dev
cd ./optimiser/
conda env create -f atheena_opt_hls_p37.yml
conda activate atheena_opt_hls_p37
python atheena_setup.py install
To install the appropriate software for buffer generation:
Note: This module has been verified on Ubuntu 20.04.6 LTS with Java 11.0.18, sbt 1.4.9, and Scala 2.12.13.
The following instructions are taken from Chisel's instructions on environment setup:
- Install Coursier and follow the instructions.
curl -fL https://github.com/coursier/coursier/releases/latest/download/cs-x86_64-pc-linux.gz | gzip -d > cs && chmod +x cs
./cs setup
Note: This will install the most recent version of Scala. To check that it has worked, run
scala -version
(a restart of the terminal may be required).
- Install Scala version 2.12.13 and sbt version 1.4.9
cs install scala:2.12.13 && cs install scalac:2.12.13
cs install sbt:1.4.9 && cs install sbtn:1.4.9
Note: To check the Scala and sbt versions, run
scala -version
and
sbt --script-version
- Regenerate the project for the buffer package.
cd ./buffer/
sbt pack
To install Vivado 2019.1:
- First download from the Xilinx website.
- Install the y2k22 patch according to these instructions.
- Add the following to your ~/.bashrc file:
source /tools/Xilinx/Vivado/2019.1/settings64.sh
source /tools/Xilinx/SDK/2019.1/settings64.sh
export FPGACONVNET_ROOT=(path to repo)/ATHEENA_fccm_artifacts/hls
export FPGACONVNET_HLS=(path to repo)/ATHEENA_fccm_artifacts/hls
export FPGACONVNET_OPTIMISER=(path to repo)/ATHEENA_fccm_artifacts/optimiser
- Once installed, you will also need to add a license server to your .bashrc file.
- You will need to set up JTAG drivers to program a device. To do so, execute the following script:
/tools/Xilinx/Vivado/2019.1/data/xicom/cable_drivers/lin64/install_script/install_drivers/install_drivers
For more information, visit here.
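Before moving on, it can help to confirm that the three environment variables added to ~/.bashrc are visible and point to real directories. The following is a minimal sketch; the helper name is hypothetical and the check is not part of the repository.

```python
import os

# Variable names taken from the ~/.bashrc lines above.
REQUIRED_VARS = ("FPGACONVNET_ROOT", "FPGACONVNET_HLS", "FPGACONVNET_OPTIMISER")


def missing_env_paths(env=None):
    """Return a list of human-readable problems; an empty list means all is well."""
    env = os.environ if env is None else env
    problems = []
    for name in REQUIRED_VARS:
        value = env.get(name)
        if not value:
            problems.append(f"{name} is not set")
        elif not os.path.isdir(value):
            problems.append(f"{name} points to a missing directory: {value}")
    return problems


for problem in missing_env_paths():
    print(problem)
```

Run it in a fresh terminal so that the updated ~/.bashrc has been sourced.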
Finally, there is a known bug to do with C++ libraries. A workaround is to add the mpfr.h and gmp.h headers manually. For this project, you need to create a header file include/system.hpp containing the following:
#ifndef SYSTEM_HPP_
#define SYSTEM_HPP_
#include "(path to Vivado 2019.1)/include/gmp.h"
#include "(path to Vivado 2019.1)/include/mpfr.h"
#endif
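If you prefer not to write the header by hand, a small script can generate it from your Vivado install path. This is only a sketch: the function name is hypothetical, and the output path is relative to whichever directory you run it from, so pass the location your project expects.

```python
from pathlib import Path


def write_system_header(vivado_dir, out_path="include/system.hpp"):
    """Write system.hpp with absolute includes for Vivado's gmp.h and mpfr.h."""
    include_dir = Path(vivado_dir) / "include"
    lines = [
        "#ifndef SYSTEM_HPP_",
        "#define SYSTEM_HPP_",
        '#include "{}"'.format((include_dir / "gmp.h").as_posix()),
        '#include "{}"'.format((include_dir / "mpfr.h").as_posix()),
        "#endif",
    ]
    out = Path(out_path)
    out.parent.mkdir(parents=True, exist_ok=True)  # create include/ if needed
    out.write_text("\n".join(lines) + "\n")
    return out
```

For example, `write_system_header("/tools/Xilinx/Vivado/2019.1")` matches the install location used in the ~/.bashrc lines above; adjust it if Vivado lives elsewhere.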
To generate an optimised FPGA accelerator description for an Early-Exit network, follow the instructions in optimiser/README.md:
- Run optimiser on the branchy LeNet network description.
cd ./optimiser/
python -m fpgaconvnet_optimiser.tools.dev_script \
--expr opt_brn \
--save_name branchy_lenet \
-o outputs/branchy_lenet \
--model_path examples/models/atheena/branchy_lenet_20220902.onnx \
--platform_path examples/platforms/zc706.json \
--optimiser_path examples/optimiser_example.yml \
-bs 1024
- Generate the pareto graph for the optimiser results at an early-exit probability of 75% (as in the paper).
python -m fpgaconvnet_optimiser.tools.dev_script \
--expr gen_graph \
--save_name branchy_lenet_graph \
-o outputs/branchy_lenet/results/ \
-i outputs/branchy_lenet/ \
--profiled_probability 0.75
- Run the following command to perform a stage merge for all the results in the combined report.
python -m fpgaconvnet_optimiser.tools.ee_stage_merger \
-c outputs/branchy_lenet/results/combined_rpt_eefrac75.txt \
-j outputs/branchy_lenet/ \
-on branchy_lenet_merged \
--output_path outputs/branchy_lenet/merged/
- Copy this .json file into a folder in hls/test/partitions/(example)/. For example:
mkdir -p ../hls/test/partitions/branchy_lenet_eg
cp outputs/branchy_lenet/merged/branchy_lenet_merged_rsc80_thru95000.json ../hls/test/partitions/branchy_lenet_eg/
Note: Because the optimiser is non-deterministic, the file you generate will have slightly different resource usage and throughput values than the one named above. For an A1-like design, use rsc30-35 and thru~19500. For an A2-like design, use rsc45-50 and thru~45000. For an A3-like design, use rsc80-90 and thru95000.
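Picking the closest match by eye is easy for a handful of files, but it can also be scripted. The sketch below assumes every merged file follows the `..._rsc<N>_thru<M>.json` naming seen in the example above; that pattern is inferred from a single file name, and the helper names are hypothetical.

```python
import re

# Pattern inferred from "branchy_lenet_merged_rsc80_thru95000.json".
NAME_RE = re.compile(r"_rsc(\d+)_thru(\d+)\.json$")


def parse_design_name(filename):
    """Return (rsc, thru) as integers, or None if the name does not match."""
    match = NAME_RE.search(str(filename))
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def pick_design(filenames, rsc_range, target_thru):
    """Return the name within rsc_range whose thru is closest to target_thru."""
    low, high = rsc_range
    candidates = []
    for name in filenames:
        parsed = parse_design_name(name)
        if parsed is None:
            continue  # skip files that do not follow the naming pattern
        rsc, thru = parsed
        if low <= rsc <= high:
            candidates.append((abs(thru - target_thru), name))
    return min(candidates)[1] if candidates else None
```

For an A3-like design you might call `pick_design(os.listdir("outputs/branchy_lenet/merged"), (80, 90), 95000)`; a return value of `None` means no file fell inside the resource range.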
- Run the following instructions to generate available hardware IP for the buffer layer at different resource allocations.
cd ../buffer/
./gen_buff.sh
- Respond to the prompt with "a" to generate all the configurations.
- Run the following instructions to start the HLS generation process for the layers based on the hardware description provided.
cd ../hls/test/partitions/
../../scripts/split_run.sh -a \
-n branchy_lenet_eg \
-m $FPGACONVNET_OPTIMISER/examples/models/atheena/branchy_lenet_20220902.onnx \
-p branchy_lenet_merged_rsc80_thru95000.json \
-v
Note: The -a flag is used to generate all the network layers, the top layer, and the host code. The -v flag is used to stitch the resulting network IP layers into a full board design and then run Vivado synthesis and implementation before finally generating the bitstream. The script can be run with or without these flags if only one operation is required.
- The final step requires some manual integration with the Vivado SDK and assumes that the target board is the ZC706 (used in the paper).
a. Open the resulting project_1 in test/partitions/branchy_lenet_eg/partition_0/branchy_lenet_eg_hw_prj
b. Export the hardware + bitstream: File > Export > Export Hardware. Check "include bitstream".
c. Launch the SDK: File > Launch SDK
d. Generate the FSBL: File > New > Application Project. Provide a project name and select the exported hw platform 0. Hit Next, select Zynq FSBL, and hit Finish.
e. Generate the host code: File > New > Application Project. Provide a project name and select the exported hw platform 0. Hit Next, select Hello world, and hit Finish.
f. In this project, open hello_world.c and replace the contents with branchy_lenet_eg_host_code.c
g. Add the xilffs support to the host code (hello world) BSP using system.mss > modify bsp
h. Insert an SD card loaded with the i0.bin file, copied from ./hls/test/data/test/partitions/branchy_lenet_eg/partition_0/data/input0.bin
i. Run the FSBL project on the board, program with the bitstream, and then run the host code (hello world) project!
For further details on running the hardware, see the hls README.
As the HLS generation and Vivado synthesis take a significant amount of time to run, three ATHEENA hardware projects and designs from the paper have been included in this repository.
A1 : ./hls/test/partitions/design_A1/
A2 : ./hls/test/partitions/design_A2/
A3 : ./hls/test/partitions/design_A3/
These folders contain:
- .json file with the hardware description (generated by the optimiser).
- .c host code that runs the project on the ZC706 board.
- split_run.sh script that regenerates the HLS files and Vivado project.
- Run the generation from inside the folder, using ./split_run.sh -a -v.
A copy of the hardware project with host code for each of these examples can be found and downloaded here.
Note: To unzip an archive, use
tar -xzvf a1_hw_artifact.tar.gz
The extracted project can then be opened using the Vivado design suite and SDK.
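The same extraction can be done from Python with the standard-library tarfile module, which may be convenient when scripting the download of several artifacts. This is a minimal sketch with a hypothetical helper name; only extract archives you trust, since extractall writes whatever paths the archive contains.

```python
import tarfile
from pathlib import Path


def extract_artifact(archive_path, dest_dir="."):
    """Extract a .tar.gz archive into dest_dir and return its top-level names."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive_path, "r:gz") as tar:
        tar.extractall(dest)  # equivalent in effect to: tar -xzf <archive> -C <dest>
    return sorted(p.name for p in dest.iterdir())
```

For example, `extract_artifact("a1_hw_artifact.tar.gz", "a1_hw")` unpacks the A1 archive into a new a1_hw directory.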
@inproceedings{bbiggs_ATHEENA_2023,
title = {{ATHEENA: A Toolflow for Hardware Early-Exit Network Automation}},
booktitle = {2023 IEEE 31st Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM)},
author = {Benjamin Biggs and Christos-Savvas Bouganis and George A. Constantinides},
year = {2023},
}
A huge thank you to our Artifact reviewer Yizhao Gao for their advice and patience throughout the artifact review process!