dhm2013724 / yolov2_xilinx_fpga
A demo for accelerating YOLOv2 on Xilinx FPGAs (PYNQ/ZedBoard)
License: MIT License
Hi, I have modified the software simulation code for YOLOv2, especially weight_offset and beta_offset, because I wanted to generate the ap16 .bin files for the weights and beta. This is the modification:
void yolov2_hls_ps(network *net, float *input)
{
    int x;
    network orig = *net;
    net->input = input;
    // Per-layer weight/bias element counts (my tiny-YOLOv2 attempt)
    int weight_offset[16] = {432, 4608, 18432, 73728, 294912, 1179648, 4718592, 4718592, 1958400, 0, 0, 0, 0, 0, 0, 0};
    int beta_offset[16]   = {16, 32, 64, 128, 256, 512, 1024, 512, 425, 0, 0, 0, 0, 0, 0, 0};
    int offset_index = 0;
    float *Weight_buf = (float *)calloc(51869376/4, sizeof(float));
    float *Beta_buf   = (float *)calloc(11876/4, sizeof(float));
#define MEM_LEN (416*416*16 + 208*208*32)
    float *Memory_buf = (float *)calloc(MEM_LEN + 1024 + 1024, sizeof(float));
    float *Memory_top = Memory_buf + 1024;
    float *Memory_bottom = Memory_top + MEM_LEN;
    memcpy(Memory_top, input, 416*416*3*sizeof(float)); // 416x416x3 input picture
    float *in_ptr[16];
    float *out_ptr[16];
    // Ping-pong the layer buffers between the top and bottom of Memory_buf
    for (x = 0; x < 15; x++)
    {
        if (x % 2 == 0)
        {
            in_ptr[x]  = Memory_top;
            out_ptr[x] = Memory_bottom - net->layers[x].outputs;
        }
        else
        {
            in_ptr[x]  = out_ptr[x-1];
            out_ptr[x] = Memory_top;
        }
    }
    in_ptr[15] = out_ptr[14];
However, it cannot predict correctly. Do I still need to modify the "YOLO2_FPGA" function? What are the necessary modifications for tiny-YOLOv2? Thank you; hoping for your reply.
@dhm2013724 hey,
I wanted to know about the 16-bit fixed-point implementation. Did you convert the weights and biases to 16-bit before inference and simply use the stored 16-bit weights during inference, or did you quantize during the inference process? Also, what about the activations (feature maps and input image values): did you convert them to 16-bit fixed-point during inference?
Do you use 16-bit fixed-point for all operations, including conv, maxpool, and the region layer? Are the outputs stored as 16-bit fixed-point values as well?
It would be a great help if you could explain these to me. Thanks in advance!
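For reference while discussing this question: the following is not the author's code, just a generic sketch of offline quantization to a signed 16-bit Qm.n fixed-point format (the function names, the Q3.12 choice, and the symmetric-saturation scheme are all my assumptions), which illustrates what "converting weights to 16-bit before inference" typically looks like:

```python
import numpy as np

def quantize_q(x, frac_bits, word_bits=16):
    """Quantize a float array to signed fixed-point with `frac_bits`
    fractional bits, saturating to the `word_bits`-wide integer range."""
    scale = 1 << frac_bits
    lo = -(1 << (word_bits - 1))
    hi = (1 << (word_bits - 1)) - 1
    return np.clip(np.round(x * scale), lo, hi).astype(np.int16)

def dequantize_q(q, frac_bits):
    """Recover the approximate float values from the fixed-point codes."""
    return q.astype(np.float32) / (1 << frac_bits)

w = np.array([0.5, -1.25, 0.123], dtype=np.float32)
q = quantize_q(w, frac_bits=12)            # Q3.12: 3 integer bits, 12 fractional
w_hat = dequantize_q(q, frac_bits=12)      # round-trip error is at most 2**-13
```

The choice of `frac_bits` per tensor is exactly what per-layer "Q" tables (such as the inputQ/weightQ/betaQ values mentioned elsewhere in these issues) would encode.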
Could you please share it again? Thank you!
Hello, I am recreating your code in Vivado HLS 2017. It takes 45 minutes to test one picture; what is the reason? I also want to reproduce your code on the ZCU102. Can you give some advice?
C simulation always takes a long time; I warned about this when I uploaded the testbench, so there is no problem there. I don't suggest evaluating this design on the ZCU102, because it was not designed for such a high-end FPGA chip, and many of the architectural choices and considerations are not suitable. Maybe you can try https://github.com/Xilinx/CHaiDNN, which is designed for the MPSoC.
Originally posted by @dhm2013724 in #3 (comment)
Hi, I'd like to ask: what FPS do you get when testing on video?
Hi,
When I try to run the application, it fails with:
Couldn't open file: weightsv2_comb_reorg_ap16.bin
I think the same will happen when I try to load biasv2_comb_ap16.bin
and perhaps some other files. Is there any way I can get the missing files?
Because of my previous problem with overlay import, where I couldn't download any bitstream to my ZedBoard while using the Pynq_Z1_v2.3 image on my SD card, I built a customized PYNQ v2.3 SD-card image for my ZedBoard using PetaLinux 2018.2, with bionic.arm.2.3.img as the base and avnet-digilent-zedboard-2018.2.bsp (both downloaded from Xilinx). The ZedBoard boots from the SD card, but I'm unable to connect to the local network through Ethernet; running ifconfig returns only the loopback address (127.0.0.1), and there is no eth0. As a result I can't use Jupyter notebooks. Please suggest a solution.
I downloaded VS Code; can you teach me how to run main.cpp?
Running script.tcl under the hls directory produces many warnings, including, on the first line, "Unknown Tcl command 'vivado_hls -f script.tcl', sending command to the OS shell for execution. It is recommended to use 'exec' to send the command to the OS shell", as well as later warnings that seem related to the design itself. Can these simply be ignored?
Hello, would it be convenient to chat on WeChat? My WeChat ID is 747112077.
Hi, I would like to ask how you were able to evaluate the performance on the PYNQ in terms of power, GOP, and GOP/s. Did you use any tools?
Hi, I've noticed there is a variable named T2R in the input load, and this variable is only ever set to 4, 2, or 1 based on the input tile width.
However, this value is then passed into another sub-function, copy_input2buf_row.
In this sub-function, T2R changes the loop bound.
So, what is the purpose of this variable, and what does it stand for?
Thanks!
I want to recreate the bitstream file, but I cannot import the yolov2_FPGA block diagram by following the README.md, and I only found the PYNQ-Z1_C.xdc and pynq_revC.tcl files.
How am I going to import the yolov2 block design using these two files?
Thank you.
I have a modified version of yolov2.cfg and yolov2.weights that predicts only one class. I know for sure that I need to modify some code, especially here:
Can you tell me what these variables are, and how to modify them in accordance with the cfg file? Can this also be applied if I want to change to yolov2-tiny?
Thank you. Hoping for your feedback.
When I run HLS, there is a problem:
ERROR: [HLS 200-70] Compilation errors found: In file included from cnn.cpp:1:
cnn.cpp:172:59: error: conditional expression is ambiguous; 'typename ap_int_base<6, false>::RType<3, false>::plus' (aka 'ap_uint<7>') can be converted to 'int' and vice versa
ap_uint<6> T2R_bound = ((t2_local + T2R)<(((26 -1)*2 +3))?(t2_local + T2R):(((26 -1)*2 +3)));
How can I solve it? Thank you very much!
@dhm2013724 Hey,
I am trying to understand how your implementation works, and I do not understand what weight_offset and beta_offset are, or what they do. Please explain.
Also, do inputQ, outputQ, weightQ and betaQ mean quantized input, quantized output, quantized weight and quantized beta? Do you feed both the quantized weight/beta files and the 32-bit float weight/beta files to the design? (I am confused because there are a number of weight and beta files.) What is the difference between weight and weightQ, and how are they used in the design?
Thanks in advance!
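Not an authoritative answer, but the numbers in the weight_offset/beta_offset tables quoted earlier on this page are consistent with weight_offset[i] being the weight element count of convolution layer i (out_ch × in_ch × k × k) and beta_offset[i] being its bias count (out_ch). A quick sanity check of that reading (the helper name is my own):

```python
def conv_weight_count(out_ch, in_ch, k):
    # Number of weight elements in a single k x k convolution layer
    return out_ch * in_ch * k * k

# These match the first entries of the weight_offset table quoted above:
# 16 filters over 3 input channels, then 32 over 16, then 64 over 32, all 3x3.
assert conv_weight_count(16, 3, 3) == 432
assert conv_weight_count(32, 16, 3) == 4608
assert conv_weight_count(64, 32, 3) == 18432
```

Under this interpretation the host code would advance by these amounts through the flat weight/bias files as it issues each layer, which is why they are called "offsets".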
As in the title; I'm sorry to bother you like this.
@dhm2013724 Hey,
I got this error during the co simulation.
AESL_axi_master_DATA_BUS.v: Read request address 1040 exceed AXI master DATA_BUS array depth: 1024
What is the reason for this?
Thanks in advance!
I'd like to ask how to use the pynq_revC.tcl script in Vivado. The Zynq IP block I set up myself has problems, so I wanted to build it with your Tcl script, but running `source pynq_revC.tcl` produces no visible effect.
Hello, could you please share the code for porting YOLO to the ZedBoard?
Hello, I reproduced this project in Vivado HLS 2018.2. C simulation and C synthesis both pass, but RTL simulation fails with the errors below. Is the cause insufficient AXI master depth?
// RTL Simulation : "Inter-Transaction Progress" ["Intra-Transaction Progress"] @ "Simulation Time"
////////////////////////////////////////////////////////////////////////////////////
// RTL Simulation : 0 / 29 [n/a] @ "125000"
/home/sdsoc/yolov2-hls/yolov2/solution1/sim/verilog/AESL_axi_master_DATA_BUS3.v: Read request address 173056 exceed AXI master DATA_BUS3 array depth: 512
$finish called at time : 3285 ns : File "/home/sdsoc/yolov2-hls/yolov2/solution1/sim/verilog/AESL_axi_master_DATA_BUS3.v" Line 695
INFO: [Common 17-206] Exiting xsim at Tue Nov 19 11:50:16 2019...
ERROR: [COSIM 212-303] Aborting co-simulation: RTL simulation failed.
ERROR: [COSIM 212-344] Rtl simulation failed.
could not read "/home/sdsoc/yolov2-hls/yolov2/solution1/sim/tv/rtldatafile/sim/report/cosim.log": no such file or directory
while executing
"source /home/sdsoc/yolov2-hls/yolov2/solution1/cosim.tcl"
invoked from within
"hls::main /home/sdsoc/yolov2-hls/yolov2/solution1/cosim.tcl"
("uplevel" body line 1)
invoked from within
"uplevel 1 hls::main {*}$args"
(procedure "hls_proc" line 5)
invoked from within
"hls_proc $argv"
Finished C/RTL cosimulation.
Hi, may I add you on WeChat? My WeChat ID is c578232360. Could you update the ZedBoard version? I have also been busy accelerating YOLO on an FPGA recently and would like to ask you some questions. Thanks!
@dhm2013724 If I want to run this code on a ZedBoard (without PYNQ), can I use the same HLS repo with the ZedBoard? Also, I see that the Tcl files are created for the PYNQ board; how can I change them for the ZedBoard? Do I need to change anything else for the ZedBoard?
Also, is that enough, or should there be a ZedBoard folder equivalent to the PYNQ folder in this project? Or are just the hls and vivado folders enough?
Also, previously there were files yolo.h and main.cpp in the hls folder. Why were they removed? Are they not needed? main.cpp was the testbench for cnn.cpp (which used yolo2.h), right? Am I correct that it was the testbench file? Please guide me; I want to run this on a ZedBoard.
Thanks in advance!
Hello, I am also a graduate student working in this area, and I was glad to see the paper and try to reproduce this work. However, after running the Tcl file I found:
1. Your Tcl sets the clock to 5.2 ns, and timing cannot be met on your default FPGA part (150 MHz doesn't work either).
2. The resource usage after running differs considerably from the paper. My tool versions match yours. Are there some directives that were not stored as source but in a directives.tcl that you did not upload, or does the project have configuration settings that were not documented?
3. About the ping-pong operation: HLS's built-in dataflow pragma also works well. Is setting the flags manually for finer control, or is there some other consideration? And why not use the official ap_fixed types for the variables?
4. From the code you provided, you verified correctness by putting the code in yolov2.h and testing it in Visual Studio, i.e. the csim correctness is OK, but there seems to be no testbench and no HLS cosim.
5. This is unrolled as 4*32; I have written similar convolution kernels before, and in some situations there are speed bottlenecks.
def overlap(x1, w1, x2, w2):
    l1 = x1 - w1/2
    l2 = x2 - w2/2
    if(l1 > l2): left = l1
    **else: left = l1**
    r1 = x1 + w1/2
    r2 = x2 + w2/2
    if(r1 < r2): right = r1
    else: right = r2
    return right - left
Change the bold line to `else: left = l2`.
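With the reported fix applied, the corrected helper (which computes the 1-D overlap of two intervals given their centers and widths, as used in IoU calculations) reads:

```python
def overlap(x1, w1, x2, w2):
    # Left edge of the intersection: the larger of the two left edges
    l1 = x1 - w1 / 2
    l2 = x2 - w2 / 2
    left = l1 if l1 > l2 else l2   # fixed: the buggy version assigned l1 in both branches
    # Right edge of the intersection: the smaller of the two right edges
    r1 = x1 + w1 / 2
    r2 = x2 + w2 / 2
    right = r1 if r1 < r2 else r2
    # A negative result means the intervals do not overlap at all
    return right - left
```

For example, the intervals [-1, 1] and [0, 2] (centers 0 and 1, both width 2) overlap by exactly 1.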
Hi, I've noticed your HLS code contains 4 copies of the same input address, which are passed to mmcpy_inputport0 through mmcpy_inputport3 respectively.
I wonder what the purpose of doing this is.
Can't we use just 1 input address and send it to the 4 distinct mmcpy_inputports?
Thanks.
I ported your program to my ZCU104 and generated the bit file and Tcl script. Then, following your steps, when running the yolov2.ipynb file in Jupyter, I found that the image path ORIG_IMG_PATH = 'dog.jpg', the bitstream overlay = Overlay("yolov2.bit"), and the two weight paths all seem to use your own paths; when executed on my side it reports errors that the files cannot be found. So I'd like to ask where files like the bitstream, coco.names, dog.jpg, and weightsv2_comb_reorg_ap16.bin should be placed. I put them all under the overlays directory in the pynq folder and still got many errors, so I hope you can clear up my confusion. I'm a Python beginner who has just switched from hardware to software.
Hey,
Could anyone please explain what beta_offset, weight_offset, and INTER_WIDTH (19) mean in this project? It would be a great help.
Thanks in advance!
I didn't find this file, so I got errors:
#===============================================
params_wight = np.fromfile("yolov2_w_reorg_bn_ap16_short16.bin", dtype=np.uint32)
np.copyto(weight_base_buffer, params_wight)
print("yolov2_weight copy ok\n")

params_bais = np.fromfile("yolov2_b_ap16_short16.bin", dtype=np.uint32)
np.copyto(bate_base_buffer, params_bais)
print("yolov2_bais copy ok\n")
Hello! You use 16-bit quantization; are these fixed-point numbers? And how do you determine how many fractional bits the fixed-point format should have? Do you need to inspect the output of every YOLOv2 layer and check whether it overflows?
There are five .bin files in the code:
Two of them are buffers: Weight_buf (yolov2_w_reorg_bn_ap16_short16.bin) and Beta_buf (yolov2_b_ap16_short16.bin). What are these files and what are they for?
The other three are inputQ (yolov2_bn_input_maxQ_24.bin), weightQ (yolov2_w_reorg_bn_ap16_maxQ_23.bin) and betaQ (yolov2_b_ap16_maxQ_23.bin). What are these files and what do they do?
YOLO has only one weights file, the .weights file (e.g. yolov2.weights). What is the relationship between the five files above and YOLO's .weights?
Hello. In step 2 under the vivado directory, when adding the listed IPs to the Block Design, the pins don't match the figure. Everything else can be added manually, but the IRQ_F2P pin is greyed out and cannot be connected. So I considered using the pynq_revC.tcl and PYNQ-Z1_C.xdc files described in the documentation for configuration and constraints, but I don't know how to use these two files; the command `source pynq_revC.tcl` seems to have no effect.
Hello, thanks for sharing. How could I try the yolov2-tiny by using your code?
Hi, I am trying to replicate your Vivado block diagram. I just followed what is in the Tcl file; however, I have timing issues during implementation.
Have you encountered these warnings? I tried to ignore them, but the bitstream doesn't seem fine when I use it for the overlay. Have you added any timing constraints?
Thank you. Hoping for your feedback.
Hi, I'm a little bit confused by the Python code,
in which you allocated the memory as follows:
img_base_buffer = xlnk.cma_array(shape=(4194304,), dtype=np.int32)
print("16M",img_base_buffer.physical_address)
IMG_MEM = img_base_buffer.physical_address
However, I am not quite sure where "4194304" comes from.
I mean, in the following code you declared some lengths like this:
MEM_BASE = IMG_MEM
MEM_LEN = 416*416*32*2+208*208*32*2
Memory_top = MEM_BASE
Memory_bottom = MEM_BASE + MEM_LEN
I guess MEM_LEN stands for the maximum amount of data you have to keep (the input and output of layer 1).
So it's more like 416*416*32*2 (bytes) + 208*208*32*2 (bytes) = 13844480 bytes = 3461120 (int32 words).
However, this number is still far from 4194304.
Did I miss anything?
Thanks!!
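Not a definitive answer, but working through the arithmetic: the `"16M"` print in the snippet suggests the allocation is simply a fixed 16 MiB contiguous region (4194304 int32 words), i.e. a round power-of-two size chosen to comfortably exceed the worst-case requirement, rather than the exact MEM_LEN:

```python
# Worst-case feature-map storage per the notebook's MEM_LEN, read as
# bytes of 16-bit data (layer-1 input plus layer-1 output)
mem_len_bytes = 416*416*32*2 + 208*208*32*2   # 13844480 bytes
mem_len_words = mem_len_bytes // 4            # 3461120 int32 words

# The allocation in the snippet: shape=(4194304,) of int32
alloc_words = 4194304
alloc_bytes = alloc_words * 4                 # 16777216 = 16 MiB ("16M")

slack_words = alloc_words - mem_len_words     # 733184 words of headroom
```

Under this reading, 4194304 is just 16 MiB expressed in int32 words; whether that specific margin was intentional is a question only the author can answer.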
hey @dhm2013724, thank you for uploading your code. As you said, this runs at a speed of about 1 FPS on the PYNQ, right? Do you think it is possible to make it run in real time on the PYNQ with tiny-YOLO (if you optimize further)? Also, can this be run on any Xilinx board other than the PYNQ?
Thanks in advance !
Hi,
When I run yolov2.ipynb, I cannot find yolov2_w_reorg_bn_ap16_short16.bin and yolov2_b_ap16_short16.bin. Please help me.
hey @dhm2013724; I really appreciate your work, as it is amazing.
I am trying to implement your IP by creating a new overlay, but when I run your code in my overlay it shows some errors in the second-to-last cell of the .ipynb, which I don't know how to resolve. Can you please help me with that?
Open pictrue success!
pictrue size: (640, 424)
pictrue mode: RGB
yolov2_image copy ok
KeyboardInterrupt Traceback (most recent call last)
in ()
6
7 img_out = frame_in
----> 8 yolo_meminout(frame_in,img_w,img_h,img_out)
9 img_out
in yolo_meminout(frame_in, img_w, img_h, frame_out)
31 region_buff = np.zeros((73008,), dtype=np.float32)
32
---> 33 yolo_fpga(img_base_buffer,region_buff)
34 end_time = time.time()
35 fpga_process_time = end_time - start_time
in yolo_fpga(img_base_buffer, region_buff)
53 mLoops,nLoops,rLoops,cLoops,0,
54 inputQ[offset_index],inputQ[offset_index+1],weightQ[offset_index],betaQ[offset_index],
---> 55 WEIGHT_BASE,BETA_BASE)
56
57
in YOLO__Init_EX(In_Address, Out_Address, Weight_offset, Beta_offset, InFM_num, OutFM_num, Kernel_size, Kernel_stride, Input_w, Input_h, Padding, IsNL, IsBN, TM, TN, TR, TC, mLoops, nLoops, rLoops, cLoops, LayerType, InputQ, OutputQ, WeightQ, BetaQ, WEIGHT_BASE, BETA_BASE)
53
54 while True:
---> 55 ap_idle = (mmio.read(XYOLO2_FPGA_CTRL_BUS_ADDR_AP_CTRL)>>2)&0x01
56 if(ap_idle):
57 break
/usr/local/lib/python3.6/dist-packages/pynq/mmio.py in read(self, offset, length)
137
138 self._debug('Reading {0} bytes from offset {1:x}',
--> 139 length, offset)
140
141 # Read data out
The last line shows a KeyboardInterrupt, which was triggered by me because the code was taking so long to run (24 min).
Please help me with this.
Since you have open-sourced the code, could you also release documentation explaining it, to make the code easier to read? Many thanks.
Looking forward to your sharing of the SDK version.
@dhm2013724 @clancylea Hey,
I do not understand the difference between weight_memcpy_buffer (which, as I understand it, is a 1D buffer to which you copy the weights first, and which is half the size of weight_buffer) and weight_buffer (to which you copy afterwards, and which is a 3D buffer as I understand it). Could you please explain what weight_memcpy_buffer is and why it is needed? It would be a great help.
Thanks in advance!
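I can't speak for the author's actual code, but a common pattern matching this description is a staging step: a flat 1-D buffer receives the raw burst copy from DRAM, and the data is then unpacked into the layout the compute kernel indexes (here, hypothetically, a per-tile [TM][TN][K][K] arrangement). A minimal numpy sketch of that unpacking step, with made-up names and tile sizes:

```python
import numpy as np

def unpack_weights(flat, tm, tn, k):
    # flat 1-D staging buffer (as filled by a burst memcpy) reshaped into a
    # 4-D compute-layout view: [output-tile][input-tile][kernel-row][kernel-col]
    return flat[:tm * tn * k * k].reshape(tm, tn, k, k)

# Hypothetical burst of 2x3 kernels of size 3x3, stored contiguously
flat = np.arange(2 * 3 * 3 * 3, dtype=np.int16)
w = unpack_weights(flat, tm=2, tn=3, k=3)
```

Separating the burst-friendly flat copy from the compute-friendly multidimensional buffer is the usual reason such a pair of buffers exists in HLS designs, though only the author can confirm that is the intent here.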
Hello, the PROCESSING_SYSTEM7_0 I imported has far fewer pins than yours. Since this is my first time using Vivado, I don't know what "add ip ps7.0 apply configuration pynq_revC.tcl" in your tutorial means. I ran the Tcl file in the Tcl console, but nothing seemed to change, so I wanted to ask you about it.