
yolov5's Introduction


Released by Ultralytics in 2020, YOLOv5 achieved state-of-the-art performance on the COCO object detection dataset. It is a significant improvement over YOLOv3: a new backbone architecture and modifications to the neck improve mAP (mean Average Precision) by 10% and FPS (frames per second) by 12%.

Repository of the official PyTorch implementation: https://github.com/ultralytics/yolov5

The YOLOv5 network is mainly composed of a CSP-and-Focus backbone, an additional spatial pyramid pooling (SPP) module, a PANet path-aggregation neck, and the YOLOv3 head. CSP is a backbone design that enhances the learning capability of the CNN. The SPP block is added on top of the backbone to increase the receptive field and separate out the most significant context features. Instead of the feature pyramid network (FPN) used for object detection in YOLOv3, PANet is used to aggregate features across the different detector levels. More specifically, CSPDarknet53 contains 5 CSP modules, each using a convolution with kernel size k=3x3 and stride s=2; within the PANet and SPP, 1x1, 5x5, 9x9, and 13x13 max poolings are applied.
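
As an illustration of the SPP block described above, here is a minimal PyTorch-style sketch modelled on the official PyTorch repository; the class name, channel handling, and kernel sizes are illustrative and not taken from this repository's MindSpore code:

import torch
import torch.nn as nn

class SPP(nn.Module):
    """Spatial pyramid pooling: concatenate max-poolings of several kernel sizes."""
    def __init__(self, c_in, c_out, kernels=(5, 9, 13)):
        super().__init__()
        c_hidden = c_in // 2
        self.reduce = nn.Conv2d(c_in, c_hidden, kernel_size=1)      # 1x1 conv to halve channels
        # stride-1 max poolings with half-kernel padding keep the spatial size unchanged
        self.pools = nn.ModuleList(
            nn.MaxPool2d(kernel_size=k, stride=1, padding=k // 2) for k in kernels
        )
        self.fuse = nn.Conv2d(c_hidden * (len(kernels) + 1), c_out, kernel_size=1)

    def forward(self, x):
        x = self.reduce(x)
        # original features plus the pooled versions enlarge the receptive field
        return self.fuse(torch.cat([x] + [pool(x) for pool in self.pools], dim=1))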

YOLOv5 is trained on the COCO dataset with labels in YOLO format.

Dataset:

Download the raw image data and the label files according to the target model:

Model Label
YOLOv5n coco2017labels
YOLOv5s coco2017labels
YOLOv5m coco2017labels-segments
YOLOv5l coco2017labels-segments
YOLOv5x coco2017labels-segments

After downloading the dataset and labels, put them in the correct locations as shown below. The images folder stores the images and the labels folder stores the corresponding labels. Text files such as train2017.txt list the image paths of the corresponding dataset split; the label file format itself is illustrated after the directory tree.

YOLO
├── images
|   ├── train2017
|   ├── val2017
|   └── test2017
├── labels
|   ├── train2017
|   └── val2017
├── train2017.txt
├── val2017.txt
└── test2017.txt
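
For reference, each file under labels/ contains one line per object in the YOLO format class x_center y_center width height, with coordinates normalized to [0, 1]. A minimal Python sketch for reading such a file (the path below is only an example):

from pathlib import Path

def read_yolo_labels(label_path):
    """Parse a YOLO-format label file: one 'class cx cy w h' line per object."""
    boxes = []
    for line in Path(label_path).read_text().splitlines():
        cls, cx, cy, w, h = line.split()
        boxes.append((int(cls), float(cx), float(cy), float(w), float(h)))
    return boxes

print(read_yolo_labels("labels/train2017/000000000009.txt"))  # hypothetical file name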

If you want to use custom data in COCO or labelme format, you can use the conversion script to convert it to YOLO format.

Conversion steps:

  1. Change directory to config/data_conversion. The file names in this folder correspond to the configs of each dataset format.
  2. Modify the config files of the original format and of the conversion target format, updating the paths in them.
  3. After editing the config files, run the convert_data.py script. For example, python convert_data.py coco yolo converts a dataset from COCO format to YOLO format; a sketch of the underlying box conversion is shown below.
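
The core of a COCO-to-YOLO conversion is a coordinate change: COCO stores absolute [x_min, y_min, width, height] boxes in pixels, while YOLO stores normalized center coordinates. A minimal sketch of that step (not the repository's convert_data.py itself):

def coco_box_to_yolo(box, img_w, img_h):
    """Convert a COCO [x_min, y_min, w, h] box (pixels) to a YOLO (cx, cy, w, h) box in [0, 1]."""
    x_min, y_min, w, h = box
    cx = (x_min + w / 2) / img_w
    cy = (y_min + h / 2) / img_h
    return cx, cy, w / img_w, h / img_h

# Example: a 100x50 box at (200, 300) in a 640x480 image
print(coco_box_to_yolo([200, 300, 100, 50], 640, 480))
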
Installation

Follow the tutorial on the MindSpore official website to install MindSpore. Then use the following command to install the other required packages:

pip install -r requirements.txt
Training

You can use the following command to train on a single device:

# Run training example(1p) on Ascend/GPU by python command
python train.py \
    --ms_strategy="StaticShape" \
    --overflow_still_update=True \
    --optimizer="momentum" \
    --cfg="../config/network/yolov5s.yaml" \
    --data="../config/data/coco.yaml" \
    --hyp="../config/data/hyp.scratch-low.yaml" \
    --device_target=Ascend \
    --epochs=300 \
    --batch_size=32  > log.txt 2>&1 &

Or you can use the shell scripts. The scripts support training on a single device or multiple devices. The commands are as follows:

# Run 1p by shell script. Change `device_target` in the config file to run on Ascend/GPU, and adjust `T_max`, `max_epoch`, `warmup_epochs` according to the notes
bash run_standalone_train_ascend.sh -c ../config/network/yolov5s.yaml -d ../config/data/coco.yaml \
     -h ../config/data/hyp.scratch-low.yaml

# For Ascend device, distributed training example(8p) by shell script
bash run_distribute_train_ascend.sh -c ../config/network/yolov5s.yaml -d ../config/data/coco.yaml \
     -h ../config/data/hyp.scratch-low.yaml -r hccl_8p_xx.json

You can pass --help or -H to the shell scripts to see detailed usage.

If you want to use a custom dataset, you can use compute_anchors.py to compute new anchors, then use the output anchors to update the anchors item in the corresponding model config file.
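
For reference, anchors are typically obtained by clustering the widths and heights of all training boxes, scaled to the training image size. Below is a minimal sketch with scikit-learn, shown only as an assumption for illustration; it is not the repository's compute_anchors.py:

import numpy as np
from sklearn.cluster import KMeans

def kmeans_anchors(box_wh, img_size=640, n_anchors=9):
    """Cluster normalized (w, h) pairs into n_anchors anchors, returned in pixels at img_size."""
    wh = np.asarray(box_wh, dtype=np.float32) * img_size            # scale to pixels
    centers = KMeans(n_clusters=n_anchors, n_init=10, random_state=0).fit(wh).cluster_centers_
    # sort by area so anchors go from the smallest to the largest detection level
    return centers[np.argsort(centers.prod(axis=1))].round().astype(int)

# box_wh: normalized (w, h) of every ground-truth box in the training set
# print(kmeans_anchors(box_wh).reshape(3, 6))  # 3 detection levels x 3 anchors each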

Evaluation

You can use the following command to evaluate a model:

# Run evaluation on Ascend/GPU by python command
python val.py \
  --weights="path/to/weights.ckpt" \
  --cfg="../config/network/yolov5s.yaml" \
  --data="../config/data/coco.yaml" \
  --hyp="../config/data/hyp.scratch-low.yaml" \
  --device_target=Ascend \
  --img_size=640 \
  --conf=0.001 \
  --rect=False \
  --iou_thres=0.60 \
  --batch_size=32 > log.txt 2>&1 &

The rect switch can increase the mAP of the evaluation result. The results in the official repository are evaluated with this switch on. Please note this difference when comparing evaluation results of the two repositories.

You can also use shell scripts for evaluation:

# Run distributed evaluation by shell script
bash run_distribute_test_ascend.sh -w path/to/weights.ckpt -c ../config/network/yolov5s.yaml -d ../config/data/coco.yaml \
     -h ../config/data/hyp.scratch-low.yaml -r hccl_8p_xx.json

# Run standalone evaluation by shell script
bash run_standalone_test_ascend.sh -w path/to/weights.ckpt -c ../config/network/yolov5s.yaml -d ../config/data/coco.yaml \
     -h ../config/data/hyp.scratch-low.yaml

The corresponding config files are in the config folder. coco.yaml in config/data contains the dataset configs, hyp.scratch-low.yaml contains the hyperparameter settings, and yolov5s.yaml contains the model architecture configs.
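
For example, the hyperparameter file can be inspected with standard YAML tooling before training; the keys lr0 and enable_clip_grad shown here are the ones referenced later in this README:

import yaml  # PyYAML

with open("config/data/hyp.scratch-low.yaml", "r", encoding="utf-8") as f:
    hyp = yaml.safe_load(f)
# print a couple of hyperparameters mentioned in the FAQ section
print(hyp.get("lr0"), hyp.get("enable_clip_grad"))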

The repository structure is as follows:
yolov5
├── README.md                                      // descriptions about yolov5
├── README_CN.md                                   // Chinese descriptions about yolov5
├── __init__.py
├── config
│   ├── args.py                                    // get config parameters from command line
│   ├── data
│   │   ├── coco.yaml                              // configs about dataset
│   │   ├── hyp.scratch-high.yaml                   // configs about hyper-parameters
│   │   ├── hyp.scratch-low.yaml
│   │   └── hyp.scratch-med.yaml
│   ├── data_conversion
│   │   ├── coco.yaml                              // config of coco format dataset
│   │   ├── labelme.yaml                           // config of labelme format dataset
│   │   └── yolo.yaml                              // config of yolo format dataset
│   └── network                                    // configs of model architecture
│       ├── yolov5l.yaml
│       ├── yolov5m.yaml
│       ├── yolov5n.yaml
│       ├── yolov5s.yaml
│       └── yolov5x.yaml
├── compute_anchors.py                             // compute anchors for specified data
├── convert_data.py                                // convert dataset format
├── deploy                                         // code for inference
│   ├── __init__.py
│   └── infer_engine
│       ├── __init__.py
│       ├── lite.py                                // code for inference with MindSporeLite
│       ├── mindx.py                               // code for inference with mindx
│       └── model_base.py
├── export.py
├── preprocess.py
├── scripts
│   ├── common.sh                                  // common functions used in shell scripts
│   ├── get_coco.sh
│   ├── hccl_tools.py                              // generate rank table files for distributed training or evaluation
│   ├── mpirun_test.sh                             // launch evaluation with OpenMPI
│   ├── mpirun_train.sh                            // launch training with OpenMPI
│   ├── run_distribute_test_ascend.sh              // launch distributed evaluation(8p) on Ascend
│   ├── run_distribute_train_ascend.sh             // launch distributed training(8p) on Ascend
│   ├── run_standalone_test_ascend.sh              // launch 1p evaluation on Ascend
│   └── run_standalone_train_ascend.sh             // launch 1p training on Ascend
├── src
│   ├── __init__.py
│   ├── all_finite.py
│   ├── augmentations.py                           // data augmentations
│   ├── autoanchor.py
│   ├── boost.py
│   ├── callback.py
│   ├── checkpoint_fuse.py
│   ├── coco_visual.py
│   ├── data                                       // code for dataset format conversion
│   │   ├── __init__.py
│   │   ├── base.py                                // base class for data conversion
│   │   ├── coco.py                                // transfer dataset with coco format to others
│   │   ├── labelme.py                             // transfer dataset with labelme format to others
│   │   └── yolo.py                                // transfer dataset with yolo format to others
│   ├── dataset.py                                 // create dataset
│   ├── general.py                                 // general functions used in other scripts
│   ├── loss_scale.py
│   ├── metrics.py
│   ├── modelarts.py
│   ├── ms2pt.py                                   // transfer weights from MindSpore to PyTorch
│   ├── network
│   │   ├── __init__.py
│   │   ├── common.py                              // common code for building network
│   │   ├── loss.py                                // loss
│   │   └── yolo.py                                // YOLOv5 network
│   ├── optimizer.py                               // optimizer
│   ├── plots.py
│   └── pt2ms.py                                   // transfer weights from PyTorch to MindSpore
├── test.py                                        // script for evaluation
├── third_party                                    // third-party code
│   ├── __init__.py
│   ├── fast_coco                                  // faster coco mAP computation
│   │   ├── __init__.py
│   │   ├── build.sh
│   │   ├── cocoeval
│   │   │   ├── cocoeval.cpp
│   │   │   └── cocoeval.h
│   │   ├── fast_coco_eval_api.py
│   │   └── setup.py
│   ├── fast_nms                                   // faster nms computation
│   │   ├── __init__.py
│   │   ├── build.sh
│   │   ├── nms.pyx
│   │   └── setup.py
│   └── yolo2coco                                  // yolo data format to coco format converter
│       ├── __init__.py
│       └── yolo2coco.py
└── train.py                                       // script for training
Major parameters in train.py are:

optional arguments:
  --ms_strategy           Training strategy. Default: "StaticShape"
  --distributed_train     Distributed training or not. Default: False
  --device_target         Device where the code will be executed. Default: "Ascend"
  --cfg                   Model architecture yaml config file path. Default: "./config/network/yolov5s.yaml"
  --data                  Dataset yaml config file path. Default: "./config/data/data.yaml"
  --hyp                   Hyper-parameters yaml config file path. Default: "./config/data/hyp.scratch-low.yaml"
  --epochs                Training epochs. Default: 300
  --batch_size            Batch size per device. Default: 32
  --save_checkpoint       Whether to save checkpoints. Default: True
  --start_save_epoch      Epoch index after which checkpoints will be saved. Default: 1
  --save_interval         Epoch interval for saving checkpoints. Default: 1
  --max_ckpt_num          Maximum number of saved checkpoints. Default: 10
  --cache_images          Whether to cache images for faster training. Default: False
  --optimizer             Optimizer used for training. Default: "sgd"
  --sync_bn               Whether to use SyncBatchNorm, only available in distributed training. Default: False
  --project               Folder path to save output data. Default: "runs/train"
  --linear_lr             Whether to use a linear learning rate. Default: True
  --run_eval              Whether to run evaluation after a training epoch. Default: True
  --eval_start_epoch      Epoch index after which the model will be evaluated. Default: 200
  --eval_epoch_interval   Epoch interval to do evaluation. Default: 10
  --distributed_eval      Distributed evaluation or not. Default: False

For Ascend devices, you can use the shell scripts. The scripts support training on a single device or multiple devices. The commands are as follows:

# Run 1p by shell script. Change `device_target` in the config file to run on Ascend/GPU, and adjust `T_max`, `max_epoch`, `warmup_epochs` according to the notes
bash run_standalone_train_ascend.sh -c ../config/network/yolov5s.yaml -d ../config/data/coco.yaml \
     -h ../config/data/hyp.scratch-low.yaml

# For Ascend device, distributed training example(8p) by shell script
bash run_distribute_train_ascend.sh -c ../config/network/yolov5s.yaml -d ../config/data/coco.yaml \
     -h ../config/data/hyp.scratch-low.yaml -r hccl_8p_xx.json

Or you can use the following command to start standalone training:

# Run training example(1p) on Ascend/GPU by python command
python train.py \
    --ms_strategy="StaticShape" \
    --optimizer="momentum" \
    --cfg="../config/network/yolov5s.yaml" \
    --data="../config/data/coco.yaml" \
    --hyp="../config/data/hyp.scratch-low.yaml" \
    --device_target=Ascend \
    --epochs=300 \
    --batch_size=32  > log.txt 2>&1 &

We recommend training by running the shell scripts.

You should fine-tune the parameters when training on a custom dataset.

The python command above runs in the background.

Distributed training example(8p) by shell script:

# For Ascend device, distributed training example(8p) by shell script
bash run_distribute_train_ascend.sh -c ../config/network/yolov5s.yaml -d ../config/data/coco.yaml \
     -h ../config/data/hyp.scratch-low.yaml -r hccl_8p_xx.json

# For GPU device, distributed training example(8p) by shell script
bash run_distribute_train_gpu.sh ../config/network/yolov5s.yaml ../config/data/coco.yaml \
     ../config/data/hyp.scratch-low.yaml

You can also use OpenMPI to run distributed training. Follow the official tutorial to configure the OpenMPI environment, then execute the following command:

bash mpirun_train.sh -c ../config/network/yolov5s.yaml -d ../config/data/coco.yaml \
     -h ../config/data/hyp.scratch-low.yaml

Before running the command below, please check the checkpoint path used for evaluation.

# Run evaluation by python command
python val.py \
  --weights="path/to/weights.ckpt" \
  --cfg="../config/network/yolov5s.yaml" \
  --data="../config/data/coco.yaml" \
  --hyp="../config/data/hyp.scratch-low.yaml" \
  --device_target=Ascend \
  --img_size=640 \
  --conf=0.001 \
  --rect=False \
  --iou_thres=0.65 \
  --batch_size=32 > log.txt 2>&1 &
# OR
# Run evaluation(8p) by shell script
bash run_distribute_test_ascend.sh -w path/to/weights.ckpt -c ../config/network/yolov5s.yaml -d ../config/data/coco.yaml \
     -h ../config/data/hyp.scratch-low.yaml -r hccl_8p_xx.json
# OR
# Run standalone evaluation by shell script
bash run_standalone_test_ascend.sh -w path/to/weights.ckpt -c ../config/network/yolov5s.yaml -d ../config/data/coco.yaml \
     -h ../config/data/hyp.scratch-low.yaml

The above python command will run in the background. You can view the results through the file "log.txt".

You can also use OpenMPI to run distributed evaluation. Follow the official tutorial to configure the OpenMPI environment, then execute the following command:

bash mpirun_test.sh -w path/to/weights.ckpt -c ../config/network/yolov5s.yaml -d ../config/data/coco.yaml \
     -h ../config/data/hyp.scratch-low.yaml

Inference

Download the Ascend-mindxsdk-mxmanufacture package (community version) from the MindX SDK community according to the architecture of your device. We recommend the package with the .run suffix. MindX SDK 3.0 is currently supported.

When the download completes, first make sure you have configured the related Ascend environment variables, then use the following command to install the package:

bash Ascend-mindxsdk-mxmanufacture_xxx.run --install

After installation, you can run python -c "import mindx" to check whether the installation was successful.

If you see an error related to libgobject.so.2, you need to configure an environment variable for the libffi.so.7 library:

  • First, use find / -name "libffi.so.7" to locate this library file;
  • Then use export LD_PRELOAD=/path/to/libffi.so.7 to configure the environment variable.

A model in ckpt format can be converted to om format with the atc tool for inference on an inference server. The steps are as follows:

  1. Export the model in AIR format: python export.py --weights /path/to/model.ckpt --file_format AIR;
  2. Convert the AIR model to om format with the atc tool: /usr/local/Ascend/latest/atc/bin/atc --model=yolov5s.air --framework=1 --output=./yolov5s --input_format=NCHW --input_shape="Inputs:1,3,640,640" --soc_version=Ascend310. The value for --soc_version can be obtained with the npu-smi info command; supported choices are Ascend310 and Ascend310P3;
  3. Run inference with the infer.py script: python infer.py --batch_size 1 --om yolov5s.om

Note that, because dynamic shapes are not supported by the om format, the rect switch cannot be enabled, so the mAP is lower than the result obtained from the checkpoint with rect enabled.
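
Since the om model expects a fixed 1x3x640x640 input, each image has to be letterboxed to that shape before inference. Below is a minimal sketch with OpenCV and NumPy; the repository's preprocess.py may differ in details such as padding color or normalization:

import cv2
import numpy as np

def letterbox_640(image_bgr, size=640, pad_value=114):
    """Resize with unchanged aspect ratio and pad to size x size; return 1x3xHxW float32 in [0, 1]."""
    h, w = image_bgr.shape[:2]
    scale = min(size / h, size / w)
    new_h, new_w = int(round(h * scale)), int(round(w * scale))
    resized = cv2.resize(image_bgr, (new_w, new_h), interpolation=cv2.INTER_LINEAR)
    canvas = np.full((size, size, 3), pad_value, dtype=np.uint8)    # gray padding
    top, left = (size - new_h) // 2, (size - new_w) // 2
    canvas[top:top + new_h, left:left + new_w] = resized
    rgb = cv2.cvtColor(canvas, cv2.COLOR_BGR2RGB).astype(np.float32) / 255.0
    return rgb.transpose(2, 0, 1)[None]                             # NCHW, batch size 1

# blob = letterbox_640(cv2.imread("test.jpg"))  # hypothetical file; feed blob to the om model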

Model     Size      mAP val 50-95   mAP val 50     mAP val 50-95   mAP val 50     Epoch     Throughput
          (pixels)  (rect=True)     (rect=True)    (rect=False)    (rect=False)   Time(s)   (images/s)
YOLOv5n   640       0.279           0.459          0.277           0.455          66        224.00
YOLOv5s   640       0.375           0.572          0.373           0.57           79        187.14
YOLOv5m   640       0.453           0.637          0.451           0.637          133       111.16
YOLOv5l   640       0.489           0.675          0.486           0.671          163       90.70
YOLOv5x   640       0.505           0.686          0.506           0.687          221       66.90
Note
  • All models are trained for 300 epochs with the default settings. Nano and Small models use hyp.scratch-low.yaml hyperparameters; all others use hyp.scratch-high.yaml.
  • The following are settings used for different models:
--data coco.yaml --epochs 300 --weights '' --cfg yolov5n.yaml  --batch-size  16
                                                 yolov5s.yaml                32
                                                 yolov5m.yaml                24
                                                 yolov5l.yaml                24
                                                 yolov5x.yaml                24
  • The Epoch Time results are measured on 8 Ascend 910A devices with batch_size 32 per device.
  • The Throughput results are for a single Ascend 910A device.
  • mAP val values are for single-model single-scale on the COCO val2017 dataset.
    The key configs are --img_size 640 --conf_thres 0.001 --iou_thres 0.65.
  • When data preprocessing is the bottleneck, you can set --cache_images to ram or memory to accelerate preprocessing. Note that ram may cause out-of-memory errors.
  • YOLOv5n needs --sync_bn enabled.
FAQ

  1. cannot allocate memory in static TLS block
ImportError: /xxx/scikit_image.libs/libgomp-xxx.so: cannot allocate memory in static TLS block
It seems that scikit-image has not been built correctly.

This error is not caused by our code; some packages we use depend on the scikit-image package. Generally, you can solve it by changing the import order: add import sklearn or import skimage at the beginning of train.py, as in the snippet below.
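
For example (illustrative; only the import order matters):

# at the very top of train.py
import skimage  # noqa: F401  workaround for "cannot allocate memory in static TLS block"

# ... the original imports of train.py follow unchanged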

If this still does not solve the problem, you can search for this error to find other solutions.



  2. During training, the loss suddenly increases to a large value (generally the lobj loss causes this), or some loss becomes nan

This problem is caused by overflow during training. Overflow makes the loss become nan, and after the model is updated, the weight values become very large. This usually appears when training on a small dataset with just one class.

If you run into this problem, you can set enable_clip_grad to True in hyp.scratch-xx.yaml to enable gradient clipping. In addition, the updated code adds overflow detection: when an overflow is detected, the update for that step is skipped, which avoids this problem.
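
For reference, here is a framework-agnostic sketch of what global-norm gradient clipping does; the exact behavior and threshold of enable_clip_grad in this repository may differ:

import numpy as np

def clip_grads_by_global_norm(grads, max_norm=10.0):
    """Scale all gradients so their combined L2 norm does not exceed max_norm."""
    global_norm = np.sqrt(sum(float(np.sum(g * g)) for g in grads))
    if global_norm <= max_norm:
        return grads
    scale = max_norm / (global_norm + 1e-6)
    return [g * scale for g in grads]

# Example: two gradient tensors with a large combined norm get scaled down
grads = [np.full((3,), 100.0), np.full((2, 2), 50.0)]
print(clip_grads_by_global_norm(grads))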



  3. mAP is not good

There are many possible reasons why mAP may not be good enough, such as the overflow mentioned in the 2nd question. If you apply the method described above, the mAP should improve.

You can also try adjusting lr0 in config/data/hyp.scratch-xx.yaml, or changing --batch_size, to fine-tune the model.


yolov5's Issues

Error when running the run_standalone_train_ascend.sh script

[INFO] 2023-04-18 08:41:23.650 [src/general.py:39] Use third party coco eval api to speed up mAP calculation.
[INFO] 2023-04-18 08:41:26.253 [src/metrics.py:151] Use fast cpu nms.
[INFO] OPT: Namespace(accumulate=False, artifact_alias='latest', augment=False, batch_size=32, bbox_interval=-1, bucket='', cache_images=False, cfg='/home/ma-user/work/yolov5/config/network/yolov5s.yaml', clip_grad=False, conf_thres=0.001, data='/home/ma-user/work/yolov5/config/data/coco.yaml', data_dir='/cache/data/', data_url='', device_target='Ascend', ema=True, ema_weight='', enable_modelarts=False, entity=None, epochs=300, eval_epoch_interval=10, eval_start_epoch=200, evolve=False, exist_ok=False, freeze=[0], hyp='/home/ma-user/work/yolov5/config/data/hyp.scratch-low.yaml', image_weights=False, img_size=[640, 640], iou_thres=0.65, is_distributed=False, label_smoothing=0.0, linear_lr=True, max_ckpt_num=40, ms_amp_level='O0', ms_grad_sens=1024, ms_loss_scaler='none', ms_loss_scaler_value=1.0, ms_mode='graph', ms_optim_loss_scale=1.0, ms_strategy='StaticShape', multi_scale=False, name='exp', noautoanchor=False, nosave=False, notest=False, optimizer='momentum', overflow_still_update=False, plots=True, profiler=False, project='runs/train', quad=False, rank=0, rank_size=1, recommend_threshold=False, recompute=False, recompute_layers=0, rect=False, result_view=False, resume=False, run_eval=True, save_checkpoint=True, save_conf=False, save_dir='runs/train/exp', save_hybrid=False, save_interval=5, save_json=True, save_period=-1, save_txt=False, single_cls=False, start_save_epoch=100, summary=False, summary_dir='summary', summary_interval=1, sync_bn=False, task='val', total_batch_size=32, trace=False, train_url='', transfer_format=True, upload_dataset=False, v5_metric=False, verbose=False, weights='')

             from  n    params  module                                  arguments                     

0 -1 1 3584 <class 'src.network.common.Conv'> [3, 32, 6, 2, 2]
1 -1 1 18688 <class 'src.network.common.Conv'> [32, 64, 3, 2]
2 -1 1 19200 <class 'src.network.common.C3'> [64, 64, 1]
3 -1 1 74240 <class 'src.network.common.Conv'> [64, 128, 3, 2]
4 -1 1 116736 <class 'src.network.common.C3'> [128, 128, 2]
5 -1 1 295936 <class 'src.network.common.Conv'> [128, 256, 3, 2]
6 -1 1 627712 <class 'src.network.common.C3'> [256, 256, 3]
7 -1 1 1181696 <class 'src.network.common.Conv'> [256, 512, 3, 2]
8 -1 1 1185792 <class 'src.network.common.C3'> [512, 512, 1]
9 -1 1 658432 <class 'src.network.common.SPPF'> [512, 512]
10 -1 1 132096 <class 'src.network.common.Conv'> [512, 256, 1, 1]
11 -1 1 0 <class 'src.network.common.ResizeNearestNeighbor'>[2]
12 [-1, 6] 1 0 <class 'src.network.common.Concat'> [1]
13 -1 1 363520 <class 'src.network.common.C3'> [512, 256, 1, False]
14 -1 1 33280 <class 'src.network.common.Conv'> [256, 128, 1, 1]
15 -1 1 0 <class 'src.network.common.ResizeNearestNeighbor'>[2]
16 [-1, 4] 1 0 <class 'src.network.common.Concat'> [1]
17 -1 1 91648 <class 'src.network.common.C3'> [256, 128, 1, False]
18 -1 1 147968 <class 'src.network.common.Conv'> [128, 128, 3, 2]
19 [-1, 14] 1 0 <class 'src.network.common.Concat'> [1]
20 -1 1 297984 <class 'src.network.common.C3'> [256, 256, 1, False]
21 -1 1 590848 <class 'src.network.common.Conv'> [256, 256, 3, 2]
22 [-1, 10] 1 0 <class 'src.network.common.Concat'> [1]
23 -1 1 1185792 <class 'src.network.common.C3'> [512, 512, 1, False]
24 [17, 20, 23] 1 229281 <class 'src.network.common.Detect'> [80, [[10, 13, 16, 30, 33, 23], [30, 61, 62, 45, 59, 119], [116, 90, 156, 198, 373, 326]], [128, 256, 512]]
albumentations: Blur(p=0.01, blur_limit=(3, 7)), MedianBlur(p=0.01, blur_limit=(3, 7)), ToGray(p=0.01), CLAHE(p=0.01, clip_limit=(1, 4.0), tile_grid_size=(8, 8))
[INFO] albumentations load success
train: Scanning /home/ma-user/work/coco/train2017... 117266 images, 1021 backgrounds, 0 corrupt: 100%|██████████| 118287/118287 [02:19<00train: WARNING ⚠️ /home/ma-user/work/coco/images/train2017/000000099844.jpg: 2 duplicate labels removed
train: WARNING ⚠️ /home/ma-user/work/coco/images/train2017/000000201706.jpg: 1 duplicate labels removed
train: WARNING ⚠️ /home/ma-user/work/coco/images/train2017/000000214087.jpg: 1 duplicate labels removed
train: WARNING ⚠️ /home/ma-user/work/coco/images/train2017/000000522365.jpg: 1 duplicate labels removed
train: New cache created: /home/ma-user/work/coco/train2017.cache
[INFO] Num parallel workers: [12]
[INFO] Batch size: 32

val: Scanning /home/ma-user/work/coco/val2017... 4952 images, 48 backgrounds, 0 corrupt: 100%|██████████| 5000/5000 [00:05<00:00, 940.67ival: New cache created: /home/ma-user/work/coco/val2017.cache
[INFO] Num parallel workers: [8]
[INFO] Batch size: 32
Scaled weight_decay = 0.0005
optimizer loss scale is 1.0
[INFO] rank_size: 1
[INFO] Enable loss scale: False
[INFO] Enable enable_clip_grad: False

[WARNING] ME(29095:281473131866816,MainProcess):2023-04-18-08:46:44.561.935 [mindspore/dataset/engine/datasets_user_defined.py:767] GeneratorDataset's num_parallel_workers: 12 is too large which may cause a lot of memory occupation (>85%) or out of memory(OOM) during multiprocessing. Therefore, it is recommended to reduce num_parallel_workers to 11 or smaller.
[WARNING] ME(29095:281473131866816,MainProcess):2023-04-18-08:46:47.772.040 [mindspore/dataset/engine/datasets_user_defined.py:767] GeneratorDataset's num_parallel_workers: 12 is too large which may cause a lot of memory occupation (>85%) or out of memory(OOM) during multiprocessing. Therefore, it is recommended to reduce num_parallel_workers to 9 or smaller.
[WARNING] ME(29095:281473131866816,MainProcess):2023-04-18-08:46:50.811.479 [mindspore/dataset/engine/datasets_user_defined.py:767] GeneratorDataset's num_parallel_workers: 12 is too large which may cause a lot of memory occupation (>85%) or out of memory(OOM) during multiprocessing. Therefore, it is recommended to reduce num_parallel_workers to 8 or smaller.
[WARNING] ME(29095:281473131866816,MainProcess):2023-04-18-08:47:09.494.359 [mindspore/ops/primitive.py:713] The "use_copy_slice" is a constexpr function. The input arguments must be all constant value.
[WARNING] ME(29095:281473131866816,MainProcess):2023-04-18-08:47:09.530.959 [mindspore/ops/primitive.py:713] The "is_slice" is a constexpr function. The input arguments must be all constant value.
[WARNING] ME(29095:281473131866816,MainProcess):2023-04-18-08:47:09.859.532 [mindspore/ops/primitive.py:713] The "use_copy_slice" is a constexpr function. The input arguments must be all constant value.
[WARNING] MD(29095,ffff9209eac0,python):2023-04-18-08:47:52.127.788 [mindspore/ccsrc/minddata/dataset/engine/datasetops/data_queue_op.cc:93] ~DataQueueOp] preprocess_batch: 22; batch_queue: 0, 0, 0, 0, 0, 0, 0, 0, 0, 0; push_start_time: 2023-04-18-08:47:10.836.283, 2023-04-18-08:47:11.673.801, 2023-04-18-08:47:12.446.570, 2023-04-18-08:47:13.540.377, 2023-04-18-08:47:14.615.487, 2023-04-18-08:47:23.731.584, 2023-04-18-08:47:24.072.012, 2023-04-18-08:47:29.288.332, 2023-04-18-08:47:30.134.599, 2023-04-18-08:47:47.249.569; push_end_time: 2023-04-18-08:47:10.866.000, 2023-04-18-08:47:11.703.229, 2023-04-18-08:47:12.477.852, 2023-04-18-08:47:13.570.208, 2023-04-18-08:47:14.644.110, 2023-04-18-08:47:23.788.004, 2023-04-18-08:47:24.105.155, 2023-04-18-08:47:29.320.161, 2023-04-18-08:47:30.166.423, 2023-04-18-08:47:47.280.622.

Traceback (most recent call last):
File "train.py", line 449, in
main()
File "train.py", line 440, in main
train(hyp, opt)
File "train.py", line 327, in train
loss = sink_process()
File "/home/ma-user/anaconda3/envs/MindSpore/lib/python3.7/site-packages/mindspore/train/data_sink.py", line 133, in sink_process
out = real_sink_fun()
File "/home/ma-user/anaconda3/envs/MindSpore/lib/python3.7/site-packages/mindspore/common/api.py", line 594, in staging_specialize
out = _MindsporeFunctionExecutor(func, hash_obj, input_signature, process_obj, jit_config)(*args)
File "/home/ma-user/anaconda3/envs/MindSpore/lib/python3.7/site-packages/mindspore/common/api.py", line 98, in wrapper
results = fn(*arg, **kwargs)
File "/home/ma-user/anaconda3/envs/MindSpore/lib/python3.7/site-packages/mindspore/common/api.py", line 405, in call
phase = self.compile(args_list, self.fn.name)
File "/home/ma-user/anaconda3/envs/MindSpore/lib/python3.7/site-packages/mindspore/common/api.py", line 379, in compile
is_compile = self._graph_executor.compile(self.fn, compile_args, phase, True)
TypeError: Can not select a valid kernel info for [ScatterNdUpdate] in AI CORE or AI CPU kernel info candidates list:
AI CORE:
(, , ) -> ()
(, , ) -> ()
(, , ) -> ()
(, , ) -> ()
(, , ) -> ()
(, , ) -> ()
(, , ) -> ()
(, , ) -> ()
(, , ) -> ()
(, , ) -> ()
AI CPU:
{}
Please check the given data type or shape:
AI CORE: : (<Tensor[Int32], (7), value=...>, <Tensor[Int64], (4, 1), value=...>, <Tensor[Int32], (4)>) -> (<Tensor[Int32], (7)>)
AI CPU: : (<Tensor[Int32], (7), value=...>, <Tensor[Int64], (4, 1), value=...>, <Tensor[Int32], (4)>) -> (<Tensor[Int32], (7)>)
For more details, please refer to 'Kernel Select Failed' at https://www.mindspore.cn
The function call stack:
Corresponding code candidate:

  • In file /home/ma-user/anaconda3/envs/MindSpore/lib/python3.7/site-packages/mindspore/ops/composite/multitype_ops/_compile_utils.py:961/ result = F.tensor_scatter_update(data, indices, value.astype(F.dtype(data)))/
    In file /home/ma-user/work/yolov5/scripts/train_exp_standalone1/src/network/loss.py:412/ gain[2:6] = get_tensor(shape, targets.dtype)[[3, 2, 3, 2]] # xyxy gain/
    In file /home/ma-user/work/yolov5/scripts/train_exp_standalone1/src/network/loss.py:335/ tcls, tbox, indices, anchors, tmasks = self.build_targets(p,/
    In file train.py:108/ loss, loss_items = self.compute_loss(pred, label)/
    In file /home/ma-user/work/yolov5/scripts/train_exp_standalone1/src/boost.py:128/ loss = self.network(*inputs)/
    In file /home/ma-user/anaconda3/envs/MindSpore/lib/python3.7/site-packages/mindspore/train/data_sink.py:120/ out = fn(*data)/

  • C++ Call Stack: (For framework developers)

mindspore/ccsrc/plugin/device/ascend/hal/hardware/ascend_graph_optimization.cc:379 SetOperatorInfo
