dhm2013724 / yolov2_xilinx_fpga
A demo for accelerating YOLOv2 on Xilinx FPGAs (PYNQ/ZedBoard)
License: MIT License
Hi, I have modified the software simulation code for YOLOv2, especially weight_offset and beta_offset, because I wanted to generate the ap16 .bin files for the weights and beta. This is the modification:
void yolov2_hls_ps(network *net, float *input)
{
    int x;
    network orig = *net;
    net->input = input;
    // Per-layer weight/bias element counts (my tiny-YOLOv2 attempt)
    int weight_offset[16] = {432, 4608, 18432, 73728, 294912, 1179648, 4718592, 4718592, 1958400, 0, 0, 0, 0, 0, 0, 0};
    int beta_offset[16]   = {16, 32, 64, 128, 256, 512, 1024, 512, 425, 0, 0, 0, 0, 0, 0, 0};
    int offset_index = 0;
    float *Weight_buf = (float *)calloc(51869376/4, sizeof(float));
    float *Beta_buf   = (float *)calloc(11876/4, sizeof(float));
#define MEM_LEN (416*416*16 + 208*208*32)
    float *Memory_buf = (float *)calloc(MEM_LEN + 1024 + 1024, sizeof(float));
    float *Memory_top = Memory_buf + 1024;
    float *Memory_bottom = Memory_top + MEM_LEN;
    memcpy(Memory_top, input, 416*416*3*sizeof(float)); // 416x416x3 input picture
    float *in_ptr[16];
    float *out_ptr[16];
    // Ping-pong the layer buffers between the top and bottom of Memory_buf
    for (x = 0; x < 15; x++)
    {
        if (x % 2 == 0)
        {
            in_ptr[x]  = Memory_top;
            out_ptr[x] = Memory_bottom - net->layers[x].outputs;
        }
        else
        {
            in_ptr[x]  = out_ptr[x-1];
            out_ptr[x] = Memory_top;
        }
    }
    in_ptr[15] = out_ptr[14];
However, it cannot predict correctly. Do I still need to modify the "YOLO2_FPGA" function? What are the necessary modifications for tiny-YOLOv2? Thank you; hoping for your reply.
@dhm2013724 hey,
I wanted to know about the 16-bit fixed-point implementation. Did you convert the weights and biases to 16-bit before inference and simply use the stored 16-bit weights during inference, or did you quantize during the inference process? Also, what about the activations (feature maps and input image values): did you convert them to 16-bit fixed-point during inference?
Do you use 16-bit fixed-point for all operations, including conv, maxpool, and the region layer? Are the outputs stored as 16-bit fixed-point values as well?
It would be a great help if you could explain these to me. Thanks in advance!
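For reference while discussing this question: the following is not the author's code, just a generic sketch of offline quantization to a signed 16-bit Qm.n fixed-point format (the function names, the Q3.12 choice, and the symmetric-saturation scheme are all my assumptions), which illustrates what "converting weights to 16-bit before inference" typically looks like:

```python
import numpy as np

def quantize_q(x, frac_bits, word_bits=16):
    """Quantize a float array to signed fixed-point with `frac_bits`
    fractional bits, saturating to the `word_bits`-wide integer range."""
    scale = 1 << frac_bits
    lo = -(1 << (word_bits - 1))
    hi = (1 << (word_bits - 1)) - 1
    return np.clip(np.round(x * scale), lo, hi).astype(np.int16)

def dequantize_q(q, frac_bits):
    """Recover the approximate float values from the fixed-point codes."""
    return q.astype(np.float32) / (1 << frac_bits)

w = np.array([0.5, -1.25, 0.123], dtype=np.float32)
q = quantize_q(w, frac_bits=12)            # Q3.12: 3 integer bits, 12 fractional
w_hat = dequantize_q(q, frac_bits=12)      # round-trip error is at most 2**-13
```

The choice of `frac_bits` per tensor is exactly what per-layer "Q" tables (such as the inputQ/weightQ/betaQ values mentioned elsewhere in these issues) would encode.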
Could you please share it again? Thank you!
Hello, I am recreating your code in Vivado HLS 2017. It takes 45 minutes to test one picture; what is the reason? I also want to reproduce your code on the ZCU102. Can you give some advice?
C simulation always takes a long time; I warned about this when I uploaded the testbench, so there is no problem there. I don't suggest evaluating this design on the ZCU102, because it was not designed for such a high-end FPGA chip, and many of the architectural choices and considerations are not suitable. Maybe you can try https://github.com/Xilinx/CHaiDNN, which is designed for the MPSoC.
Originally posted by @dhm2013724 in #3 (comment)
Hi, I'd like to ask: what FPS do you get when testing on video?
Hi,
When I try to run the application, it fails with:
Couldn't open file: weightsv2_comb_reorg_ap16.bin
I think the same will happen when I try to load biasv2_comb_ap16.bin
and perhaps some other files. Is there any way I can get the missing files?
Because of my previous problem with overlay import, where I couldn't download any bitstream to my ZedBoard while using the Pynq_Z1_v2.3 image on my SD card, I built a customized PYNQ v2.3 SD-card image for my ZedBoard using PetaLinux 2018.2, with bionic.arm.2.3.img as the base and avnet-digilent-zedboard-2018.2.bsp (both downloaded from Xilinx). The ZedBoard boots from the SD card, but I'm unable to connect to the local network through Ethernet; running ifconfig returns only the loopback address (127.0.0.1), and there is no eth0. As a result I can't use Jupyter notebooks. Please suggest a solution.
I downloaded VS Code; can you teach me how to run main.cpp?
Running script.tcl under the hls directory produces many warnings, including, on the first line, "Unknown Tcl command 'vivado_hls -f script.tcl', sending command to the OS shell for execution. It is recommended to use 'exec' to send the command to the OS shell", as well as later warnings that seem related to the design itself. Can these simply be ignored?
Hello, would it be convenient to chat on WeChat? My WeChat ID is 747112077.
Hi, I would like to ask how you were able to evaluate the performance on the PYNQ in terms of power, GOP, and GOP/s. Did you use any tools?
Hi, I've noticed there is a variable named T2R in the input load, and this variable is only ever set to 4, 2, or 1 based on the input tile width.
However, this value is then passed into another sub-function, copy_input2buf_row.
In this sub-function, T2R changes the loop bound.
So, what is the purpose of this variable, and what does it stand for?
Thanks!
I want to recreate the bitstream file, but I cannot import the yolov2_FPGA block diagram by following the README.md, and I only found the PYNQ-Z1_C.xdc and pynq_revC.tcl files.
How am I going to import the yolov2 block design using these two files?
Thank you.
I have a modified version of yolov2.cfg and yolov2.weights that predicts only one class. I know for sure that I need to modify some code, especially here:
Can you tell me what these variables are, and how to modify them in accordance with the cfg file? Can this also be applied if I want to change to yolov2-tiny?
Thank you. Hoping for your feedback.
When I run HLS, there is a problem:
ERROR: [HLS 200-70] Compilation errors found: In file included from cnn.cpp:1:
cnn.cpp:172:59: error: conditional expression is ambiguous; 'typename ap_int_base<6, false>::RType<3, false>::plus' (aka 'ap_uint<7>') can be converted to 'int' and vice versa
ap_uint<6> T2R_bound = ((t2_local + T2R)<(((26 -1)*2 +3))?(t2_local + T2R):(((26 -1)*2 +3)));
How can I solve it? Thank you very much!
@dhm2013724 Hey,
I am trying to understand how your implementation works, and I do not understand what weight_offset and beta_offset are, or what they do. Please explain.
Also, do inputQ, outputQ, weightQ and betaQ mean quantized input, quantized output, quantized weight and quantized beta? Do you feed both the quantized weight/beta files and the 32-bit float weight/beta files to the design? (I am confused because there are a number of weight and beta files.) What is the difference between weight and weightQ, and how are they used in the design?
Thanks in advance!
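Not an authoritative answer, but the numbers in the weight_offset/beta_offset tables quoted earlier on this page are consistent with weight_offset[i] being the weight element count of convolution layer i (out_ch × in_ch × k × k) and beta_offset[i] being its bias count (out_ch). A quick sanity check of that reading (the helper name is my own):

```python
def conv_weight_count(out_ch, in_ch, k):
    # Number of weight elements in a single k x k convolution layer
    return out_ch * in_ch * k * k

# These match the first entries of the weight_offset table quoted above:
# 16 filters over 3 input channels, then 32 over 16, then 64 over 32, all 3x3.
assert conv_weight_count(16, 3, 3) == 432
assert conv_weight_count(32, 16, 3) == 4608
assert conv_weight_count(64, 32, 3) == 18432
```

Under this interpretation the host code would advance by these amounts through the flat weight/bias files as it issues each layer, which is why they are called "offsets".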
As in the title; I'm sorry to bother you like this.
@dhm2013724 Hey,
I got this error during the co simulation.
AESL_axi_master_DATA_BUS.v: Read request address 1040 exceed AXI master DATA_BUS array depth: 1024
What is the reason for this?
Thanks in advance!
I'd like to ask how to use the pynq_revC.tcl script in Vivado. The Zynq IP block I set up myself has problems, so I wanted to build it with your Tcl script, but running `source pynq_revC.tcl` produces no visible effect.
Hello, could you please share the code for porting YOLO to the ZedBoard?
Hello, I reproduced this project in Vivado HLS 2018.2. C simulation and C synthesis both pass, but RTL simulation fails with the errors below. Is the cause insufficient AXI master depth?
// RTL Simulation : "Inter-Transaction Progress" ["Intra-Transaction Progress"] @ "Simulation Time"
////////////////////////////////////////////////////////////////////////////////////
// RTL Simulation : 0 / 29 [n/a] @ "125000"
/home/sdsoc/yolov2-hls/yolov2/solution1/sim/verilog/AESL_axi_master_DATA_BUS3.v: Read request address 173056 exceed AXI master DATA_BUS3 array depth: 512
$finish called at time : 3285 ns : File "/home/sdsoc/yolov2-hls/yolov2/solution1/sim/verilog/AESL_axi_master_DATA_BUS3.v" Line 695
INFO: [Common 17-206] Exiting xsim at Tue Nov 19 11:50:16 2019...
ERROR: [COSIM 212-303] Aborting co-simulation: RTL simulation failed.
ERROR: [COSIM 212-344] Rtl simulation failed.
could not read "/home/sdsoc/yolov2-hls/yolov2/solution1/sim/tv/rtldatafile/sim/report/cosim.log": no such file or directory
while executing
"source /home/sdsoc/yolov2-hls/yolov2/solution1/cosim.tcl"
invoked from within
"hls::main /home/sdsoc/yolov2-hls/yolov2/solution1/cosim.tcl"
("uplevel" body line 1)
invoked from within
"uplevel 1 hls::main {*}$args"
(procedure "hls_proc" line 5)
invoked from within
"hls_proc $argv"
Finished C/RTL cosimulation.
Hi, may I add you on WeChat? My WeChat ID is c578232360. Could you update the ZedBoard version? I have also been busy accelerating YOLO on an FPGA recently and would like to ask you some questions. Thanks!
@dhm2013724 If I want to run this code on a ZedBoard (without PYNQ), can I use the same HLS repo with the ZedBoard? Also, I see that the Tcl files are created for the PYNQ board; how can I change them for the ZedBoard? Do I need to change anything else for the ZedBoard?
Also, is that enough, or should there be a ZedBoard folder equivalent to the PYNQ folder in this project? Or are just the hls and vivado folders enough?
Also, previously there were files yolo.h and main.cpp in the hls folder. Why were they removed? Are they not needed? main.cpp was the testbench for cnn.cpp (which used yolo2.h), right? Am I correct that it was the testbench file? Please guide me; I want to run this on a ZedBoard.
Thanks in advance!
Hello, I am also a graduate student working in this area, and I was glad to see the paper and try to reproduce this work. However, after running the Tcl file I found:
1. Your Tcl sets the clock to 5.2 ns, and timing cannot be met on your default FPGA part (150 MHz doesn't work either).
2. The resource usage after running differs considerably from the paper. My tool versions match yours. Are there some directives that were not stored as source but in a directives.tcl that you did not upload, or does the project have configuration settings that were not documented?
3. About the ping-pong operation: HLS's built-in dataflow pragma also works well. Is setting the flags manually for finer control, or is there some other consideration? And why not use the official ap_fixed types for the variables?
4. From the code you provided, you verified correctness by putting the code in yolov2.h and testing it in Visual Studio, i.e. the csim correctness is OK, but there seems to be no testbench and no HLS cosim.
5. This is unrolled as 4*32; I have written similar convolution kernels before, and in some situations there are speed bottlenecks.
def overlap(x1, w1, x2, w2):
    l1 = x1 - w1/2
    l2 = x2 - w2/2
    if(l1 > l2): left = l1
    **else: left = l1**
    r1 = x1 + w1/2
    r2 = x2 + w2/2
    if(r1 < r2): right = r1
    else: right = r2
    return right - left
Change the bold line to `else: left = l2`.
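With the reported fix applied, the corrected helper (which computes the 1-D overlap of two intervals given their centers and widths, as used in IoU calculations) reads:

```python
def overlap(x1, w1, x2, w2):
    # Left edge of the intersection: the larger of the two left edges
    l1 = x1 - w1 / 2
    l2 = x2 - w2 / 2
    left = l1 if l1 > l2 else l2   # fixed: the buggy version assigned l1 in both branches
    # Right edge of the intersection: the smaller of the two right edges
    r1 = x1 + w1 / 2
    r2 = x2 + w2 / 2
    right = r1 if r1 < r2 else r2
    # A negative result means the intervals do not overlap at all
    return right - left
```

For example, the intervals [-1, 1] and [0, 2] (centers 0 and 1, both width 2) overlap by exactly 1.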
Hi, I've noticed your HLS code contains 4 copies of the same input address, which are passed to mmcpy_inputport0 through mmcpy_inputport3 respectively.
I wonder what the purpose of doing this is.
Can't we use just 1 input address and send it to the 4 distinct mmcpy_inputports?
Thanks.
I ported your program to my ZCU104 and generated the bit file and Tcl script. Then, following your steps, when running the yolov2.ipynb file in Jupyter, I found that the image path ORIG_IMG_PATH = 'dog.jpg', the bitstream overlay = Overlay("yolov2.bit"), and the two weight paths all seem to use your own paths; when executed on my side it reports errors that the files cannot be found. So I'd like to ask where files like the bitstream, coco.names, dog.jpg, and weightsv2_comb_reorg_ap16.bin should be placed. I put them all under the overlays directory in the pynq folder and still got many errors, so I hope you can clear up my confusion. I'm a Python beginner who has just switched from hardware to software.
Hey,
Could anyone please explain what beta_offset, weight_offset, and INTER_WIDTH (19) mean in this project? It would be a great help.
Thanks in advance!
I didn't find this file, so I got errors:
#===============================================
params_wight = np.fromfile("yolov2_w_reorg_bn_ap16_short16.bin", dtype=np.uint32)
np.copyto(weight_base_buffer, params_wight)
print("yolov2_weight copy ok\n")

params_bais = np.fromfile("yolov2_b_ap16_short16.bin", dtype=np.uint32)
np.copyto(bate_base_buffer, params_bais)
print("yolov2_bais copy ok\n")
Hello! You use 16-bit quantization; are these fixed-point numbers? And how do you determine how many fractional bits the fixed-point format should have? Do you need to inspect the output of every YOLOv2 layer and check whether it overflows?
There are five .bin files in the code:
Two of them are buffers: Weight_buf (yolov2_w_reorg_bn_ap16_short16.bin) and Beta_buf (yolov2_b_ap16_short16.bin). What are these files and what are they for?
The other three are inputQ (yolov2_bn_input_maxQ_24.bin), weightQ (yolov2_w_reorg_bn_ap16_maxQ_23.bin) and betaQ (yolov2_b_ap16_maxQ_23.bin). What are these files and what do they do?
YOLO has only one weights file, the .weights file (e.g. yolov2.weights). What is the relationship between the five files above and YOLO's .weights?
Hello. In step 2 under the vivado directory, when adding the listed IPs to the Block Design, the pins don't match the figure. Everything else can be added manually, but the IRQ_F2P pin is greyed out and cannot be connected. So I considered using the pynq_revC.tcl and PYNQ-Z1_C.xdc files described in the documentation for configuration and constraints, but I don't know how to use these two files; the command `source pynq_revC.tcl` seems to have no effect.
Hello, thanks for sharing. How could I try the yolov2-tiny by using your code?
Hi, I am trying to replicate your Vivado block diagram. I just followed what is in the Tcl file; however, I have timing issues during implementation.
Have you encountered these warnings? I tried to ignore them, but the bitstream doesn't seem fine when I use it for the overlay. Have you added any timing constraints?
Thank you. Hoping for your feedback.
Hi, I'm a little bit confused by the Python code,
in which you allocated the memory as follows:
img_base_buffer = xlnk.cma_array(shape=(4194304,), dtype=np.int32)
print("16M",img_base_buffer.physical_address)
IMG_MEM = img_base_buffer.physical_address
However, I am not quite sure where "4194304" comes from.
I mean, in the following code you declared some lengths like this:
MEM_BASE = IMG_MEM
MEM_LEN = 416*416*32*2+208*208*32*2
Memory_top = MEM_BASE
Memory_bottom = MEM_BASE + MEM_LEN
I guess MEM_LEN stands for the maximum amount of data you have to keep (the input and output of layer 1).
So it's more like 416*416*32*2 (bytes) + 208*208*32*2 (bytes) = 13844480 bytes = 3461120 (int32 words).
However, this number is still far from 4194304.
Did I miss anything?
Thanks!!
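Not a definitive answer, but working through the arithmetic: the `"16M"` print in the snippet suggests the allocation is simply a fixed 16 MiB contiguous region (4194304 int32 words), i.e. a round power-of-two size chosen to comfortably exceed the worst-case requirement, rather than the exact MEM_LEN:

```python
# Worst-case feature-map storage per the notebook's MEM_LEN, read as
# bytes of 16-bit data (layer-1 input plus layer-1 output)
mem_len_bytes = 416*416*32*2 + 208*208*32*2   # 13844480 bytes
mem_len_words = mem_len_bytes // 4            # 3461120 int32 words

# The allocation in the snippet: shape=(4194304,) of int32
alloc_words = 4194304
alloc_bytes = alloc_words * 4                 # 16777216 = 16 MiB ("16M")

slack_words = alloc_words - mem_len_words     # 733184 words of headroom
```

Under this reading, 4194304 is just 16 MiB expressed in int32 words; whether that specific margin was intentional is a question only the author can answer.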
hey @dhm2013724, thank you for uploading your code. As you said, this runs at a speed of about 1 FPS on the PYNQ, right? Do you think it is possible to make it run in real time on the PYNQ with tiny-YOLO (if you optimize further)? Also, can this be run on any Xilinx board other than the PYNQ?
Thanks in advance !
Hi,
When I run yolov2.ipynb, I cannot find yolov2_w_reorg_bn_ap16_short16.bin and yolov2_b_ap16_short16.bin. Please help me.
hey @dhm2013724; I really appreciate your work, as it is amazing.
I am trying to implement your IP by creating a new overlay, but when I run your code in my overlay it shows some errors in the second-to-last cell of the .ipynb, which I don't know how to resolve. Can you please help me with that?
Open pictrue success!
pictrue size: (640, 424)
pictrue mode: RGB
yolov2_image copy ok
KeyboardInterrupt Traceback (most recent call last)
in ()
6
7 img_out = frame_in
----> 8 yolo_meminout(frame_in,img_w,img_h,img_out)
9 img_out
in yolo_meminout(frame_in, img_w, img_h, frame_out)
31 region_buff = np.zeros((73008,), dtype=np.float32)
32
---> 33 yolo_fpga(img_base_buffer,region_buff)
34 end_time = time.time()
35 fpga_process_time = end_time - start_time
in yolo_fpga(img_base_buffer, region_buff)
53 mLoops,nLoops,rLoops,cLoops,0,
54 inputQ[offset_index],inputQ[offset_index+1],weightQ[offset_index],betaQ[offset_index],
---> 55 WEIGHT_BASE,BETA_BASE)
56
57
in YOLO__Init_EX(In_Address, Out_Address, Weight_offset, Beta_offset, InFM_num, OutFM_num, Kernel_size, Kernel_stride, Input_w, Input_h, Padding, IsNL, IsBN, TM, TN, TR, TC, mLoops, nLoops, rLoops, cLoops, LayerType, InputQ, OutputQ, WeightQ, BetaQ, WEIGHT_BASE, BETA_BASE)
53
54 while True:
---> 55 ap_idle = (mmio.read(XYOLO2_FPGA_CTRL_BUS_ADDR_AP_CTRL)>>2)&0x01
56 if(ap_idle):
57 break
/usr/local/lib/python3.6/dist-packages/pynq/mmio.py in read(self, offset, length)
137
138 self._debug('Reading {0} bytes from offset {1:x}',
--> 139 length, offset)
140
141 # Read data out
The last line shows a KeyboardInterrupt, which was triggered by me because the code was taking so long to run (24 min).
Please help me with this.
Since you have open-sourced the code, could you also release documentation explaining it, to make the code easier to read? Many thanks.
Looking forward to your sharing of the SDK version.
@dhm2013724 @clancylea Hey,
I do not understand the difference between weight_memcpy_buffer (which, as I understand it, is a 1D buffer to which you copy the weights first, and which is half the size of weight_buffer) and weight_buffer (to which you copy afterwards, and which is a 3D buffer as I understand it). Could you please explain what weight_memcpy_buffer is and why it is needed? It would be a great help.
Thanks in advance!
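I can't speak for the author's actual code, but a common pattern matching this description is a staging step: a flat 1-D buffer receives the raw burst copy from DRAM, and the data is then unpacked into the layout the compute kernel indexes (here, hypothetically, a per-tile [TM][TN][K][K] arrangement). A minimal numpy sketch of that unpacking step, with made-up names and tile sizes:

```python
import numpy as np

def unpack_weights(flat, tm, tn, k):
    # flat 1-D staging buffer (as filled by a burst memcpy) reshaped into a
    # 4-D compute-layout view: [output-tile][input-tile][kernel-row][kernel-col]
    return flat[:tm * tn * k * k].reshape(tm, tn, k, k)

# Hypothetical burst of 2x3 kernels of size 3x3, stored contiguously
flat = np.arange(2 * 3 * 3 * 3, dtype=np.int16)
w = unpack_weights(flat, tm=2, tn=3, k=3)
```

Separating the burst-friendly flat copy from the compute-friendly multidimensional buffer is the usual reason such a pair of buffers exists in HLS designs, though only the author can confirm that is the intent here.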
Hello, the PROCESSING_SYSTEM7_0 I imported has far fewer pins than yours. Since this is my first time using Vivado, I don't know what "add ip ps7.0 apply configuration pynq_revC.tcl" in your tutorial means. I ran the Tcl file in the Tcl console, but nothing seemed to change, so I wanted to ask you about it.