Discussion from qingkwl's yolov5 repository

Error when running the script run_standalone_train_ascend.sh

[INFO] 2023-04-18 08:41:23.650 [src/general.py:39] Use third party coco eval api to speed up mAP calculation.
[INFO] 2023-04-18 08:41:26.253 [src/metrics.py:151] Use fast cpu nms.
[INFO] OPT: Namespace(accumulate=False, artifact_alias='latest', augment=False, batch_size=32, bbox_interval=-1, bucket='', cache_images=False, cfg='/home/ma-user/work/yolov5/config/network/yolov5s.yaml', clip_grad=False, conf_thres=0.001, data='/home/ma-user/work/yolov5/config/data/coco.yaml', data_dir='/cache/data/', data_url='', device_target='Ascend', ema=True, ema_weight='', enable_modelarts=False, entity=None, epochs=300, eval_epoch_interval=10, eval_start_epoch=200, evolve=False, exist_ok=False, freeze=[0], hyp='/home/ma-user/work/yolov5/config/data/hyp.scratch-low.yaml', image_weights=False, img_size=[640, 640], iou_thres=0.65, is_distributed=False, label_smoothing=0.0, linear_lr=True, max_ckpt_num=40, ms_amp_level='O0', ms_grad_sens=1024, ms_loss_scaler='none', ms_loss_scaler_value=1.0, ms_mode='graph', ms_optim_loss_scale=1.0, ms_strategy='StaticShape', multi_scale=False, name='exp', noautoanchor=False, nosave=False, notest=False, optimizer='momentum', overflow_still_update=False, plots=True, profiler=False, project='runs/train', quad=False, rank=0, rank_size=1, recommend_threshold=False, recompute=False, recompute_layers=0, rect=False, result_view=False, resume=False, run_eval=True, save_checkpoint=True, save_conf=False, save_dir='runs/train/exp', save_hybrid=False, save_interval=5, save_json=True, save_period=-1, save_txt=False, single_cls=False, start_save_epoch=100, summary=False, summary_dir='summary', summary_interval=1, sync_bn=False, task='val', total_batch_size=32, trace=False, train_url='', transfer_format=True, upload_dataset=False, v5_metric=False, verbose=False, weights='')

     from         n    params  module                                             arguments
  0  -1           1      3584  <class 'src.network.common.Conv'>                  [3, 32, 6, 2, 2]
  1  -1           1     18688  <class 'src.network.common.Conv'>                  [32, 64, 3, 2]
  2  -1           1     19200  <class 'src.network.common.C3'>                    [64, 64, 1]
  3  -1           1     74240  <class 'src.network.common.Conv'>                  [64, 128, 3, 2]
  4  -1           1    116736  <class 'src.network.common.C3'>                    [128, 128, 2]
  5  -1           1    295936  <class 'src.network.common.Conv'>                  [128, 256, 3, 2]
  6  -1           1    627712  <class 'src.network.common.C3'>                    [256, 256, 3]
  7  -1           1   1181696  <class 'src.network.common.Conv'>                  [256, 512, 3, 2]
  8  -1           1   1185792  <class 'src.network.common.C3'>                    [512, 512, 1]
  9  -1           1    658432  <class 'src.network.common.SPPF'>                  [512, 512]
 10  -1           1    132096  <class 'src.network.common.Conv'>                  [512, 256, 1, 1]
 11  -1           1         0  <class 'src.network.common.ResizeNearestNeighbor'> [2]
 12  [-1, 6]      1         0  <class 'src.network.common.Concat'>                [1]
 13  -1           1    363520  <class 'src.network.common.C3'>                    [512, 256, 1, False]
 14  -1           1     33280  <class 'src.network.common.Conv'>                  [256, 128, 1, 1]
 15  -1           1         0  <class 'src.network.common.ResizeNearestNeighbor'> [2]
 16  [-1, 4]      1         0  <class 'src.network.common.Concat'>                [1]
 17  -1           1     91648  <class 'src.network.common.C3'>                    [256, 128, 1, False]
 18  -1           1    147968  <class 'src.network.common.Conv'>                  [128, 128, 3, 2]
 19  [-1, 14]     1         0  <class 'src.network.common.Concat'>                [1]
 20  -1           1    297984  <class 'src.network.common.C3'>                    [256, 256, 1, False]
 21  -1           1    590848  <class 'src.network.common.Conv'>                  [256, 256, 3, 2]
 22  [-1, 10]     1         0  <class 'src.network.common.Concat'>                [1]
 23  -1           1   1185792  <class 'src.network.common.C3'>                    [512, 512, 1, False]
 24  [17, 20, 23] 1    229281  <class 'src.network.common.Detect'>                [80, [[10, 13, 16, 30, 33, 23], [30, 61, 62, 45, 59, 119], [116, 90, 156, 198, 373, 326]], [128, 256, 512]]

albumentations: Blur(p=0.01, blur_limit=(3, 7)), MedianBlur(p=0.01, blur_limit=(3, 7)), ToGray(p=0.01), CLAHE(p=0.01, clip_limit=(1, 4.0), tile_grid_size=(8, 8))
[INFO] albumentations load success
train: Scanning /home/ma-user/work/coco/train2017... 117266 images, 1021 backgrounds, 0 corrupt: 100%|██████████| 118287/118287 [02:19<00
train: WARNING ⚠️ /home/ma-user/work/coco/images/train2017/000000099844.jpg: 2 duplicate labels removed
train: WARNING ⚠️ /home/ma-user/work/coco/images/train2017/000000201706.jpg: 1 duplicate labels removed
train: WARNING ⚠️ /home/ma-user/work/coco/images/train2017/000000214087.jpg: 1 duplicate labels removed
train: WARNING ⚠️ /home/ma-user/work/coco/images/train2017/000000522365.jpg: 1 duplicate labels removed
train: New cache created: /home/ma-user/work/coco/train2017.cache
[INFO] Num parallel workers: [12]
[INFO] Batch size: 32

val: Scanning /home/ma-user/work/coco/val2017... 4952 images, 48 backgrounds, 0 corrupt: 100%|██████████| 5000/5000 [00:05<00:00, 940.67i
val: New cache created: /home/ma-user/work/coco/val2017.cache
[INFO] Num parallel workers: [8]
[INFO] Batch size: 32
Scaled weight_decay = 0.0005
optimizer loss scale is 1.0
[INFO] rank_size: 1
[INFO] Enable loss scale: False
[INFO] Enable enable_clip_grad: False

[WARNING] ME(29095:281473131866816,MainProcess):2023-04-18-08:46:44.561.935 [mindspore/dataset/engine/datasets_user_defined.py:767] GeneratorDataset's num_parallel_workers: 12 is too large which may cause a lot of memory occupation (>85%) or out of memory(OOM) during multiprocessing. Therefore, it is recommended to reduce num_parallel_workers to 11 or smaller.
[WARNING] ME(29095:281473131866816,MainProcess):2023-04-18-08:46:47.772.040 [mindspore/dataset/engine/datasets_user_defined.py:767] GeneratorDataset's num_parallel_workers: 12 is too large which may cause a lot of memory occupation (>85%) or out of memory(OOM) during multiprocessing. Therefore, it is recommended to reduce num_parallel_workers to 9 or smaller.
[WARNING] ME(29095:281473131866816,MainProcess):2023-04-18-08:46:50.811.479 [mindspore/dataset/engine/datasets_user_defined.py:767] GeneratorDataset's num_parallel_workers: 12 is too large which may cause a lot of memory occupation (>85%) or out of memory(OOM) during multiprocessing. Therefore, it is recommended to reduce num_parallel_workers to 8 or smaller.
[WARNING] ME(29095:281473131866816,MainProcess):2023-04-18-08:47:09.494.359 [mindspore/ops/primitive.py:713] The "use_copy_slice" is a constexpr function. The input arguments must be all constant value.
[WARNING] ME(29095:281473131866816,MainProcess):2023-04-18-08:47:09.530.959 [mindspore/ops/primitive.py:713] The "is_slice" is a constexpr function. The input arguments must be all constant value.
[WARNING] ME(29095:281473131866816,MainProcess):2023-04-18-08:47:09.859.532 [mindspore/ops/primitive.py:713] The "use_copy_slice" is a constexpr function. The input arguments must be all constant value.
[WARNING] MD(29095,ffff9209eac0,python):2023-04-18-08:47:52.127.788 [mindspore/ccsrc/minddata/dataset/engine/datasetops/data_queue_op.cc:93] ~DataQueueOp] preprocess_batch: 22; batch_queue: 0, 0, 0, 0, 0, 0, 0, 0, 0, 0; push_start_time: 2023-04-18-08:47:10.836.283, 2023-04-18-08:47:11.673.801, 2023-04-18-08:47:12.446.570, 2023-04-18-08:47:13.540.377, 2023-04-18-08:47:14.615.487, 2023-04-18-08:47:23.731.584, 2023-04-18-08:47:24.072.012, 2023-04-18-08:47:29.288.332, 2023-04-18-08:47:30.134.599, 2023-04-18-08:47:47.249.569; push_end_time: 2023-04-18-08:47:10.866.000, 2023-04-18-08:47:11.703.229, 2023-04-18-08:47:12.477.852, 2023-04-18-08:47:13.570.208, 2023-04-18-08:47:14.644.110, 2023-04-18-08:47:23.788.004, 2023-04-18-08:47:24.105.155, 2023-04-18-08:47:29.320.161, 2023-04-18-08:47:30.166.423, 2023-04-18-08:47:47.280.622.

Traceback (most recent call last):
  File "train.py", line 449, in <module>
    main()
  File "train.py", line 440, in main
    train(hyp, opt)
  File "train.py", line 327, in train
    loss = sink_process()
  File "/home/ma-user/anaconda3/envs/MindSpore/lib/python3.7/site-packages/mindspore/train/data_sink.py", line 133, in sink_process
    out = real_sink_fun()
  File "/home/ma-user/anaconda3/envs/MindSpore/lib/python3.7/site-packages/mindspore/common/api.py", line 594, in staging_specialize
    out = _MindsporeFunctionExecutor(func, hash_obj, input_signature, process_obj, jit_config)(*args)
  File "/home/ma-user/anaconda3/envs/MindSpore/lib/python3.7/site-packages/mindspore/common/api.py", line 98, in wrapper
    results = fn(*arg, **kwargs)
  File "/home/ma-user/anaconda3/envs/MindSpore/lib/python3.7/site-packages/mindspore/common/api.py", line 405, in __call__
    phase = self.compile(args_list, self.fn.__name__)
  File "/home/ma-user/anaconda3/envs/MindSpore/lib/python3.7/site-packages/mindspore/common/api.py", line 379, in compile
    is_compile = self._graph_executor.compile(self.fn, compile_args, phase, True)
TypeError: Can not select a valid kernel info for [ScatterNdUpdate] in AI CORE or AI CPU kernel info candidates list:
AI CORE:
(, , ) -> ()
(, , ) -> ()
(, , ) -> ()
(, , ) -> ()
(, , ) -> ()
(, , ) -> ()
(, , ) -> ()
(, , ) -> ()
(, , ) -> ()
(, , ) -> ()
AI CPU:
{}
Please check the given data type or shape:
AI CORE: : (<Tensor[Int32], (7), value=...>, <Tensor[Int64], (4, 1), value=...>, <Tensor[Int32], (4)>) -> (<Tensor[Int32], (7)>)
AI CPU: : (<Tensor[Int32], (7), value=...>, <Tensor[Int64], (4, 1), value=...>, <Tensor[Int32], (4)>) -> (<Tensor[Int32], (7)>)
For more details, please refer to 'Kernel Select Failed' at https://www.mindspore.cn
The function call stack:
Corresponding code candidate:

In file /home/ma-user/anaconda3/envs/MindSpore/lib/python3.7/site-packages/mindspore/ops/composite/multitype_ops/_compile_utils.py:961/ result = F.tensor_scatter_update(data, indices, value.astype(F.dtype(data)))/
In file /home/ma-user/work/yolov5/scripts/train_exp_standalone1/src/network/loss.py:412/ gain[2:6] = get_tensor(shape, targets.dtype)[[3, 2, 3, 2]] # xyxy gain/
In file /home/ma-user/work/yolov5/scripts/train_exp_standalone1/src/network/loss.py:335/ tcls, tbox, indices, anchors, tmasks = self.build_targets(p,/
In file train.py:108/ loss, loss_items = self.compute_loss(pred, label)/
In file /home/ma-user/work/yolov5/scripts/train_exp_standalone1/src/boost.py:128/ loss = self.network(*inputs)/
In file /home/ma-user/anaconda3/envs/MindSpore/lib/python3.7/site-packages/mindspore/train/data_sink.py:120/ out = fn(*data)/

C++ Call Stack: (For framework developers)

mindspore/ccsrc/plugin/device/ascend/hal/hardware/ascend_graph_optimization.cc:379 SetOperatorInfo
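
The call stack above traces the failure to the slice assignment `gain[2:6] = ...` at loss.py:412, which MindSpore lowers to `F.tensor_scatter_update` (per _compile_utils.py:961). The reported signature is Int32 data, Int64 indices, and Int32 updates, and no ScatterNdUpdate kernel candidate matched it. One workaround that might be worth trying, not confirmed anywhere in this issue, is to assemble `gain` by concatenation so that no scatter update is generated. The NumPy sketch below only illustrates that the two constructions produce the same values. The function names, the example shape, and the assumption that `gain` starts as a vector of ones (as in upstream YOLOv5) are hypothetical, and whether an equivalent MindSpore concat compiles on Ascend in graph mode would need to be tested.

```python
import numpy as np


def build_gain_scatter(shape, dtype=np.int32):
    """Mirror of the failing pattern: slice assignment into a length-7 vector."""
    # Assumption: gain is initialised to ones, as in upstream YOLOv5.
    gain = np.ones(7, dtype=dtype)
    gain[2:6] = np.asarray(shape, dtype=dtype)[[3, 2, 3, 2]]  # xyxy gain
    return gain


def build_gain_concat(shape, dtype=np.int32):
    """Same values, assembled by concatenation instead of an in-place update."""
    shape = np.asarray(shape, dtype=dtype)
    head = np.ones(2, dtype=dtype)   # positions 0 and 1 stay at 1
    xyxy = shape[[3, 2, 3, 2]]       # width, height, width, height
    tail = np.ones(1, dtype=dtype)   # position 6 stays at 1
    return np.concatenate([head, xyxy, tail])


# Hypothetical feature-map shape: (batch, anchors, height, width).
feature_shape = (32, 3, 80, 80)
assert np.array_equal(build_gain_scatter(feature_shape),
                      build_gain_concat(feature_shape))
print(build_gain_concat(feature_shape).tolist())
```

If the scatter update has to stay, another thing to check is the index dtype: the log shows Int64 indices against Int32 data, so a dtype mismatch may be what prevents kernel selection. That is a guess from the log, not a verified diagnosis.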
