🚀 yolov5_reserach PLUS— High-level

更新升级中.....

⭐新闻板块【实时更新/计划】

- 2022/9/19 修复已知BUG，更新了实时的V5BUG修复和代码优化融合验证，核心检测、分类、分割的部分CI验证，关键点检测实测训练正常，基本功能整理完毕。
- 2022/9/15 分类、检测、分割、关键点检测基本整合完毕，工程结构精简化中，关键点检测训练正常已经验证，分割待调试，火速迭代中
- 分割代码结合V5和V7的代码进行了合并DEBUG调试，训练部分待验证，另外注意力层训练过程中，没法收敛或者NAN的情况，排除代码问题，需要在超参数YAML里，先对学习衰减率从改为0.2 ，比如GAM的注意力头部问题，训练周期加到400epoch+ 就可以。
- 去年的decoupled结构虽然能提点，不过FLOPS增加的太多，目前用V5作者分支的解耦头替换，效果待验证。
- 融合了代码做了部分的优化，这里看了下V7的代码优化较差，后续会集成精简版本的分类、分割、检测、POSE检测的结构，目前已经完成了一部分工作，更新频繁有问题欢迎反馈和提供实验结果。

关于这个项目的使用说明详请可参考此博客： csdn持续更新

当前 Project 结构说明

yolov5_research
│   pose  
│   └─────   ##关键点检测
│   ...   
│
│   segment
│   └─────    ## 分割
|   classify
│   └─────     ##分类
│   utils
│   └───── .
|         .           .
|                     .
|            	  . segment      ##分割的数据处理操作部分
│     
|       .
|       .      ##其余为检测核心代码和通用部分

Feature 🚀

 - 以最新的yoloV5工程风格代码融合，针对High-level任务：完成先进的检测器、分类器、分割、关键点检测功能集成，删除额外的三方库依赖，快速迭代
 
 - 实时的v5代码更新改动&&v7的适配
 
 - 早期集成的attention、self-attention维护和调试
 
 - 额外的网络结构和Tricks补充
	
 - deepstream部署工程（目前git上是21年开源的5.1版本，后续会整理好说明上传6.1的版本）

CSDN同步更新 (保证每周同步更新一次维护，不定期更新算法代码和引进结果实验！关于消融实验大多来自朋友的热心反馈，探究范式CNN和transformer，如何根据经验设计网络结构、LOSS改进、辅助训练分支、样本匹配.... 欢迎提供实验数据和結果~)

对于自注意力机制的使用：很多人与CNN相结合使用得到精度提升，个人理解：原因不仅仅是长距离的依赖，早期我们使用固定权重的滤波器提取边缘再到CNN，CNN也许是对应着高通滤波，而self-attention对应于低通滤波，那么相当于对featuremap进行了一次平滑，这样从某种程度上可以解释互补之后的提升；而且transfromer是很难发生过拟合或者说不存在，动态特性确实更具泛化性，常规情况中优先考虑你训练数据集的拟合够不够好，你的模型是否能反映出数据之间的特征特异性，其次扩充构建相应的辅助分支加入特征属性描述。声明：这一年里有很多朋友用修改的idea方式以及注意力和设计结构去做Paper，不用咨询我意见：一切free！有成效自己论文开源随便，与我无关，只是希望各位能把实验数据反馈给我，因为平时工作业务比较忙，实验的时间不够。也是借此目的来和大家一起学习交流。

Install

Clone repo and install [requirements.txt](https://github.com/positive666/yolov5_research/requirements.txt)

git clone https://github.com/positive666/yolov5_research  # clone
cd yolov5_research
pip install -r requirements.txt  # install

Multi-GPU DistributedDataParallel

使用DistributedDataParallel，多个进程只进行倒数传播，每个GPU都进行一次梯度求导和参数更新，这比DataParallel的方式更高效，因为DataParalledl只有一个主GPU进行参数更新，所以需要各个子进程调用的GPU传递倒数到主GPU后，才会更新参数給各个GPU，所以这会比DistributedDataParallel每个GPU直接进行参数更新要慢很多。 –nproc_per_node: 作为GPU的使用数量节点数 –batch：总batch-size ,然后除以Node数量，平均给每个GPU。 ```bash python -m torch.distributed.run --nproc_per_node 2 train.py --batch 64 --data coco.yaml --weights yolov5s.pt --device 0,1 ```

Multi -machines && Multi-GPU

```bash 主机 python -m torch.distributed.run --nproc_per_node G --nnodes N --node_rank 0 --master_addr "192.168.1.1" --master_port 1234 train.py --batch 64 --data coco.yaml --cfg yolov5s.yaml --weights '' #多个副机 python -m torch.distributed.run --nproc_per_node G --nnodes N --node_rank R --master_addr "192.168.1.1" --master_port 1234 train.py --batch 64 --data coco.yaml --cfg yolov5s.yaml --weights '' ``` –master_port：端口号 master_addr：主进程ip地址 G:每个机器的GPU数量 N:机器数量 R:子机器序号

目标检测篇

Inference with detect.py

python detect.py --source 0  # webcam     --weights <your model weight>
                          img.jpg  # image
                          vid.mp4  # video
                          path/  # directory
                          'path/*.jpg'  # glob
                          'https://youtu.be/Zgi9g1ksQHc'  # YouTube
                          'rtsp://example.com/media.mp4'  # RTSP, RTMP, HTTP stream

yolov7 官方的訓練权重打包链接：https://pan.baidu.com/s/1UIYzEZqTPMUpWWBBczOcgA?pwd=v7v7(由于我删除了P6模型里的Reorg操作和Focus没本质区别，所以删掉需要重新训练，如果你想使用V7原始权重，你只需要在YAML里改回去，还有一种方式是遍历Reorg的权重把它替换掉) 提取码：v7v7

Train

see train.py args ,command as :

python train.py --data <your data yaml>  --cfg  <your model yaml> --weights <weights path>  --batch-size 128    --hyp   <hyps yaml>  --batch-size <numbers>

Notes

"--swin_float" for "SwinTransformer_Layer" layers,because of " @" not support fp16,so you can use offical yolov7 “ Swinv2Block”
"--aux_ota_loss" for aux- head only . Such "models/v7_cfg/training/yolov7x6x.yaml, (P6-model) ,you can create aux -head models.
"ota_loss" in hyps filse ,ota_loss default=0

Training commnd example

run yolov7-P5 model train and yolov5 seriese models ,scratch or fine ,your need a weights

python train.py  --data data/coco128.yaml  --cfg models/yolov5s_decoupled.yaml

python train.py  --cfg  models/v7_cfg/training/yolov7.yaml  --weights yolov7.pt  --data (custom datasets) --hyp data/hyps/hyp.scratch-v7.custom.yaml

run yolov7-aux model train ,your model must P6-model !

python train.py  --cfg  models/v7_cfg/training/yolov7w6.yaml --imgsz 1280  --weights 'yolov7-w6_training.pt'  --data (custom datasets)  --aux_ota_loss  --hyp data/hyps/hyp.scratch-v7.custom.yaml

After training/under yaml structure, your initial weight xxx. PT will become a trained yolov7xxx.pt , with specific references to reparameterized scripts. Then use the deploy model to load the weights of your training, change the index and structure to re-parameterize the model.

re-parameterizetrained yolov7 model

   python reparameterization.py  --weights <yolov7.pt,yolov7e6e.pt.....>  --name <model name > --save_file   models/v7_cfg/deploy  --cfg <model.yaml>

关键点检测篇

数据集构建

yolov5_research
│   pose  
│   └─────(key point detect code )
│   ...   
│
coco_kpts(your data yaml path name )
│   images
│   annotations/**.json
|   labels
│   └─────train2017
│       │       └───
|       |       └───
|       |       '
|       |       .
│       └─val2017
|               └───
|               └───
|               .
|               .
|    train2017.txt
|    val2017.txt

Inference

refernce v7 weights. yolov7-w6-pose.pt

python pose/detect.py --weights pose/pose_weights/yolov7-w6-pose.pt  --source  data/images/bus.jpg   --kpt-label

Train

yolov7-w6-person.pt

    python pose/train.py --data  data/coco_kpts.yaml  --cfg  pose/cfg/yolov7-w6-pose.yaml weights yolov7-w6-person.pt --img 960  --kpt-label --hyp data/hyps/hyp.pose.yaml

分割篇

Inference

python segment/predict.py --weights yolov5s-seg.pt --source 0

Train

python segment/train.py --data coco128-seg.yaml --weights yolov5s-seg.pt --img 640  # from pretrained (recommended)

分类篇yolov5官方版本

YOLOv5 release v6.2 brings support for classification model training, validation, prediction and export! We've made training classifier models super simple. Click below to get started.

Classification Checkpoints (click to expand)

We trained YOLOv5-cls classification models on ImageNet for 90 epochs using a 4xA100 instance, and we trained ResNet and EfficientNet models alongside with the same default training settings to compare. We exported all models to ONNX FP32 for CPU speed tests and to TensorRT FP16 for GPU speed tests. We ran all speed tests on Google Colab Pro for easy reproducibility.

Model	size ^(pixels)	acc ^top1	acc ^top5	Training ^{90 epochs 4xA100 (hours)}	Speed ^{ONNX CPU (ms)}	Speed ^{TensorRT V100 (ms)}	params ^(M)	FLOPs ^{@224 (B)}
YOLOv5n-cls	224	64.6	85.4	7:59	3.3	0.5	2.5	0.5
YOLOv5s-cls	224	71.5	90.2	8:09	6.6	0.6	5.4	1.4
YOLOv5m-cls	224	75.9	92.9	10:06	15.5	0.9	12.9	3.9
YOLOv5l-cls	224	78.0	94.0	11:56	26.9	1.4	26.5	8.5
YOLOv5x-cls	224	79.0	94.4	15:04	54.3	1.8	48.1	15.9

ResNet18	224	70.3	89.5	6:47	11.2	0.5	11.7	3.7
ResNet34	224	73.9	91.8	8:33	20.6	0.9	21.8	7.4
ResNet50	224	76.8	93.4	11:10	23.4	1.0	25.6	8.5
ResNet101	224	78.5	94.3	17:10	42.1	1.9	44.5	15.9

EfficientNet_b0	224	75.1	92.4	13:03	12.5	1.3	5.3	1.0
EfficientNet_b1	224	76.4	93.2	17:04	14.9	1.6	7.8	1.5
EfficientNet_b2	224	76.6	93.4	17:10	15.9	1.6	9.1	1.7
EfficientNet_b3	224	77.7	94.0	19:19	18.9	1.9	12.2	2.4

Table Notes (click to expand)

All checkpoints are trained to 90 epochs with SGD optimizer with lr0=0.001 and weight_decay=5e-5 at image size 224 and all default settings.
Runs logged to https://wandb.ai/glenn-jocher/YOLOv5-Classifier-v6-2
Accuracy values are for single-model single-scale on ImageNet-1k dataset.
Reproduce by python classify/val.py --data ../datasets/imagenet --img 224
Speed averaged over 100 inference images using a Google Colab Pro V100 High-RAM instance.
Reproduce by python classify/val.py --data ../datasets/imagenet --img 224 --batch 1
Export to ONNX at FP32 and TensorRT at FP16 done with export.py.
Reproduce by python export.py --weights yolov5s-cls.pt --include engine onnx --imgsz 224

Classification Usage Examples (click to expand)

Train

YOLOv5 classification training supports auto-download of MNIST, Fashion-MNIST, CIFAR10, CIFAR100, Imagenette, Imagewoof, and ImageNet datasets with the --data argument. To start training on MNIST for example use --data mnist.

# Single-GPU
python classify/train.py --model yolov5s-cls.pt --data cifar100 --epochs 5 --img 224 --batch 128

# Multi-GPU DDP
python -m torch.distributed.run --nproc_per_node 4 --master_port 1 classify/train.py --model yolov5s-cls.pt --data imagenet --epochs 5 --img 224 --device 0,1,2,3

Val

Validate YOLOv5m-cls accuracy on ImageNet-1k dataset:

bash data/scripts/get_imagenet.sh --val  # download ImageNet val split (6.3G, 50000 images)
python classify/val.py --weights yolov5m-cls.pt --data ../datasets/imagenet --img 224  # validate

Predict

Use pretrained YOLOv5s-cls.pt to predict bus.jpg:

python classify/predict.py --weights yolov5s-cls.pt --data data/images/bus.jpg

model = torch.hub.load('ultralytics/yolov5', 'custom', 'yolov5s-cls.pt')  # load from PyTorch Hub

Export

Export a group of trained YOLOv5s-cls, ResNet and EfficientNet models to ONNX and TensorRT:

python export.py --weights yolov5s-cls.pt resnet50.pt efficientnet_b0.pt --include onnx engine --img 224

历史更新

更新记录

- 2020/9/15 High-level 集成待验证，目前姿态训练和检测已经调试完成。 - 2022/7/21 除关键点部分的V7代码以及V5代码风格优化合并更新，改善了重参数脚本的功能，详情看 reparameterization.py

2022/7/13 同步更新加入了yolov7的P6模型訓練部分代碼，p6是需要aux的所以需要添加Loss部分計算，代碼和CSDN持續更新中,由于我删除了P6模型里的Reorg操作其实就说FOcus，所以需要重新训练，如果你想使用V7原始权重，你只需要在YAML里改回去
2022/7/7 依旧全网首发：以目前仓库的魔改版V5为基准同步了YOLOV7的核心改动，代码风格是最新的，后续会持续完善优化，完美融合V7，后续博客争取更新第一时间！
2022/5/23 合并更新了YOLOV5仓库的最新版本，作者代码有点小问题就是数据集会重复下载，这部分我没就没合并他的更新，引入了新的算子，看来他也在探索实验

2022/3/26 测试下解耦训练结果/更新GAM注意力层代码：按照论文示意在大模型中使用分组卷积降低FLOPs,同步简单实验下，关于实验在闲暇之余都会慢慢完善的。以small模型，在Visdrone数据下的简单验证：

Model	[email protected]	[email protected]:95	Parameters(M)	GFLOPs
YOLOv5s	0.351	0.194	7.2	16.5
YOLOv5s+GAM	0.35	0.194	22.2	36.9
YOLOv5s_decoup	0.367	0.203	7.1	17.2
YOLOv5s_GAM_group	0.353	0.192	11	21.4

2022/3/26 1.修复了一些常规的问题BUG并合并了V5作者的最新代码更新，大概包含之前缺少了一些可学习参数和代码优化,如添加了swintransformerV2.0的相对位置编码加入优化器等。 2.目前看来GAM换用组卷积效果有待商榷，后续进一步整理消融实验总结。
2022/3/16 对上传的GAM注意力层进行了简单的实验，yolov5s+GAM在Visdrone数据集上的结果举例参考，后续的话其实难点在于轻量化，探究大模型的骨干估计只有大厂研究资源能有成本去做。
2022/3/5 近期会整理一些去年的实验数据/、使用swin2的骨干，超参数需要调试一下，首先要稍微减低学习率，（实测SGD）；也可以把SWIN层作为注意力插件训练，这个和以往的操作类似，不再赘述了需要开启--swin_float 命令参数，因为点积不被cuda的half支持，而优化器的问题，那么问题基本就是较多的swin block 堆积导致的增量更新。同时伴随着GPU的开销。
2022/3.1 （不完整更新,供参考，怕忙断更，所以先放出部分修改，目前还在动态调试中）按照SWintransformerV2.0 的改进点：修改了NORM层的位置/attention将dot换成scaled cosine self-attention，待更新的优化部分：1.序列窗口注意力计算，降低显存开销 2、训练优化器
2022/2.28 添加了一个Swintransformer的Backbone和yaml示意结构，很多人把SWIN还像之前做成注意力层，但是其实swin设计是为了摒弃CNN去和NLP一统，而且精髓在于控制计算复杂度，其实backbone的全替换也许更值得尝试，内存开销和结构设计待优化
2022/2.22 忙里抽闲：更新了今天的yolov5的工程修复，修改了解耦头的代码风格，直接yaml选择参考使用，服务器回滚了代码。SWIN算子在，YAML文件丢失了，找时间从新写一个再上传，太忙看更新可能优先GIT，等有空博客细致归纳下
2022/2.6 ASFF使用的BUG已经修复;近期更新Swintransformer代码，简单说明下程序上其实是两种改法：1.类似算子层的修改，这个比较简单 2、全部替换成Swintransformer，这个对于整个程序存在一定的代码修改地方，稍微复杂点。
2022/1.9 补充一些注意力算子GAM，原理后续CSDN说明，修复BUG
2021/11.3 合并最新的YOLOV5的改动，替换了CSPBOTTLENNECK的LeakRELUw为SLIU，其余全是代码和工程规范修改
2021.10.25 修复BUG，恢复EIOU
2021.10.13 更新合并YOLOV5v6.0版本，改进点：第一时间的更新解析可参考CSND博客
2021.9.25 将自注意力位置编码设置成可选项，默认取消，CBAM不收敛——将激活函数改回Sigmoid
2021.6.25 添加BIFPN结构包含P5/P6层，增大开销但是对于一些任务是能够提点的
2021.6 Botnet transformer 算子块引入于Backbone底层
2021.2.10 全网首发的YOLOV5魔改，ASFF检测头封装加入、注意力机制CBAM、CooRD、等注意力算子引入，并介绍了通用修改方式

高性能视频推理部署—待更新升级

工程部署 Why Deepstream?

工程部署：该仓库只属于研究探索，但是工程部署讲究简单高效、故可以参考我的Deepstream SDK改的项目，集合了通用检测、人脸识别、OCR三个项目，高性能的部署AI框架开发逻辑，这个项目是我2021年整理并开源的，代码还未规范，但程序是没问题的。 DS_5.1&&Tensorrt7+ ：https://github.com/positive666/Deepstream_Project

 1.英伟达提供的Deepstream &&Tensorrt，应用于流媒体处理，因为做过业务的都知道，推理性能不等于程序运行性能，核心除了模型的本身剪枝量化之外，涉及到了对数据输入的处理，这里的核心问题是如何提高GPU的利用率，那么最直接的就是GPU编解码.
 2.目前嵌入式部署可能大多采用剪枝通道压缩模型的流程，在结合一些框架去进行引擎推理，推荐Yolov5nano或者nanodetplus,(工程上主流是通道裁剪，替换如C3的BOLOCK，你可以在仔细比对YOLOV5的迭代。还有就是如何使用SGD炼丹的经验了)
 还有就是deepstream的普及，网上很多剪枝版本我也看了值得学习，但是工程不只在于学习，而在于成本和结果。
 3.x86和Jeston都可以部署，既有一站式解决方案，我觉得工程和研究应用是完全不同的操作思路，精简高效达到目的.deepstream全做了并完成降维打击 ，当然也需要一定的综合开发能力。

C++ sdk的完整Deepstream5.1部署（内置C++嵌入的Kafka服务）

目前是5.1版本，近期更新6.0(主要区别在于Tensorrt7和Tensorrt8的源码区别导致的，部分6.0 SDK有变动) [Deepsteam YOLOV5 V5.0]https://github.com/positive666/Deepstream_Project/tree/main/Deepstream_Yolo

Acknowledgements

Expand

yang-0201 / yolov5_research Goto Github PK

yolov5_research's Introduction

🚀 yolov5_reserach PLUS— High-level

⭐新闻板块【实时更新/计划】

Feature 🚀

目标检测篇

Train

关键点检测篇

Inference

Train

分割篇

Inference

Train

分类篇yolov5官方版本

Train

Val

Predict

Export

历史更新

高性能视频推理部署—待更新升级

C++ sdk的完整Deepstream5.1部署（内置C++嵌入的Kafka服务）

Acknowledgements

yolov5_research's People

Contributors

Recommend Projects

Recommend Topics

Recommend Org