Project: VLSI Architecture of High Throughput Processing Unit for Efficiently Executing 2D Convolution
Convolution is the foundational operation that lets Deep Neural Networks (DNNs) extract meaningful features from input data. This paper presents a real-time architecture for a high-throughput processing unit (PU) designed for 2D convolution. Three PU topologies are proposed: (a) a non-pipelined Multiply-Accumulate (MAC) based PU, (b) a 2-stage pipelined MAC-based PU, and (c) a 5-stage pipelined MAC-based PU, which significantly reduces the delay of the convolution process. The architectures are implemented on Field Programmable Gate Arrays (FPGAs) using the Xilinx Vivado tool. Resource utilization and latency are assessed and compared across the three PU topologies on Artix-7 and Zynq-7 FPGA chips. Throughput is also estimated and compared for different kernel and image sizes, showing an average improvement of 96.36% for the 5-stage pipelined MAC-based PU over the non-pipelined MAC-based PU.
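To make the MAC-based formulation concrete, the following is a minimal software reference model, not the paper's hardware design. It assumes a "valid" (no padding), stride-1, CNN-style convolution without kernel flipping, and the names `conv2d_mac` and `pipeline_cycles` are hypothetical. The second function gives only the textbook cycle count for an idealized, stall-free pipeline. It says nothing about clock period, which is where a pipelined MAC typically gains its throughput.

```python
def conv2d_mac(image, kernel):
    """Valid-mode, stride-1 2D convolution expressed as a sequence of MAC steps.

    Assumption: CNN-style (no kernel flip), integer or float inputs given as
    lists of lists. Returns the output feature map and the number of MACs.
    """
    ih, iw = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    oh, ow = ih - kh + 1, iw - kw + 1  # output size without padding
    out = [[0] * ow for _ in range(oh)]
    mac_count = 0
    for r in range(oh):
        for c in range(ow):
            acc = 0  # accumulator is cleared for each output pixel
            for i in range(kh):
                for j in range(kw):
                    # One multiply-accumulate: acc <- acc + pixel * weight
                    acc += image[r + i][c + j] * kernel[i][j]
                    mac_count += 1
            out[r][c] = acc
    return out, mac_count


def pipeline_cycles(num_ops, stages):
    """Idealized cycles to stream num_ops operations through a pipeline that is
    `stages` deep, assuming one new operation per cycle and no stalls."""
    return stages + num_ops - 1


if __name__ == "__main__":
    image = [[1, 2, 3, 4],
             [5, 6, 7, 8],
             [9, 10, 11, 12],
             [13, 14, 15, 16]]
    kernel = [[1, 0, -1],
              [1, 0, -1],
              [1, 0, -1]]
    fmap, macs = conv2d_mac(image, kernel)
    print(fmap)                      # [[-6, -6], [-6, -6]]
    print(macs)                      # 36
    print(pipeline_cycles(macs, 5))  # 40
```

For a 4x4 image and a 3x3 kernel, the model performs 2 x 2 x 9 = 36 MAC operations. Under the idealized assumption above, a 5-stage pipeline needs 40 cycles to complete them, so the pipeline fill cost is small relative to the operation count even for this tiny input.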
Please contact: [email protected], [email protected]