
donnyyou / torchcv

2.2K stars · 70 watchers · 377 forks · 29.49 MB

TorchCV: A PyTorch-Based Framework for Deep Learning in Computer Vision

Home Page: https://pytorchcv.com

License: Apache License 2.0

Makefile 0.01% Python 26.99% Shell 53.05% Lua 0.54% MATLAB 1.51% C++ 1.53% C 0.37% Jupyter Notebook 13.99% Cuda 2.02%

torchcv's Introduction

TorchCV: A PyTorch-Based Framework for Deep Learning in Computer Vision

@misc{you2019torchcv,
    author = {Ansheng You and Xiangtai Li and Zhen Zhu and Yunhai Tong},
    title = {TorchCV: A PyTorch-Based Framework for Deep Learning in Computer Vision},
    howpublished = {\url{https://github.com/donnyyou/torchcv}},
    year = {2019}
}

This repository provides source code for most deep-learning-based computer vision tasks. We'll do our best to keep it up to date. If you find a problem with this repository, please raise an issue or submit a pull request.

- Semantic Flow for Fast and Accurate Scene Parsing
- Code and models: https://github.com/lxtGH/SFSegNets

Implemented Papers

  • Image Classification

    • VGG: Very Deep Convolutional Networks for Large-Scale Image Recognition
    • ResNet: Deep Residual Learning for Image Recognition
    • DenseNet: Densely Connected Convolutional Networks
    • ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices
    • ShuffleNet V2: Practical Guidelines for Efficient CNN Architecture Design
    • Partial Order Pruning: for Best Speed/Accuracy Trade-off in Neural Architecture Search
  • Semantic Segmentation

    • DeepLabV3: Rethinking Atrous Convolution for Semantic Image Segmentation
    • PSPNet: Pyramid Scene Parsing Network
    • DenseASPP: DenseASPP for Semantic Segmentation in Street Scenes
    • Asymmetric Non-local Neural Networks for Semantic Segmentation
    • Semantic Flow for Fast and Accurate Scene Parsing
  • Object Detection

    • SSD: Single Shot MultiBox Detector
    • Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks
    • YOLOv3: An Incremental Improvement
    • FPN: Feature Pyramid Networks for Object Detection
  • Pose Estimation

    • CPM: Convolutional Pose Machines
    • OpenPose: Realtime Multi-Person 2D Pose Estimation using Part Affinity Fields
  • Instance Segmentation

    • Mask R-CNN
  • Generative Adversarial Networks

    • Pix2pix: Image-to-Image Translation with Conditional Adversarial Nets
    • CycleGAN: Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks.

QuickStart with TorchCV

Currently only Python 3.x and PyTorch 1.3 are supported.

pip3 install -r requirements.txt
cd lib/exts
sh make.sh
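
Before building the extensions, it can help to confirm that the interpreter and the PyTorch/CUDA setup match the supported versions above. A minimal sanity check (assuming PyTorch is already installed):

    import sys
    import torch

    print("Python:", sys.version.split()[0])            # should be 3.x
    print("PyTorch:", torch.__version__)                 # the repo targets 1.3
    print("CUDA available:", torch.cuda.is_available())
    print("CUDA version PyTorch was built with:", torch.version.cuda)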

Performances with TorchCV

All results shown below were obtained with this repository and fully reproduce the results reported in the papers.

Image Classification

  • ImageNet (Center Crop Test): 224x224
| Model | Train | Test | Top-1 | Top-5 | BS | Iters | Scripts |
|---|---|---|---|---|---|---|---|
| ResNet50 | train | val | 77.54 | 93.59 | 512 | 30W | ResNet50 |
| ResNet101 | train | val | 78.94 | 94.56 | 512 | 30W | ResNet101 |
| ShuffleNetV2x0.5 | train | val | 60.90 | 82.54 | 1024 | 40W | ShuffleNetV2x0.5 |
| ShuffleNetV2x1.0 | train | val | 69.71 | 88.91 | 1024 | 40W | ShuffleNetV2x1.0 |
| DFNetV1 | train | val | 70.99 | 89.68 | 1024 | 40W | DFNetV1 |
| DFNetV2 | train | val | 74.22 | 91.61 | 1024 | 40W | DFNetV2 |

("W" in the Iters column denotes 10,000 iterations, e.g. 30W = 300k iterations.)

Semantic Segmentation

  • Cityscapes (Single Scale Whole Image Test): Base LR 0.01, Crop Size 769
| Model | Backbone | Train | Test | mIOU | BS | Iters | Scripts |
|---|---|---|---|---|---|---|---|
| PSPNet | 3x3-Res101 | train | val | 78.20 | 8 | 4W | PSPNet |
| DeepLabV3 | 3x3-Res101 | train | val | 79.13 | 8 | 4W | DeepLabV3 |
  • ADE20K (Single Scale Whole Image Test): Base LR 0.02, Crop Size 520
| Model | Backbone | Train | Test | mIOU | PixelACC | BS | Iters | Scripts |
|---|---|---|---|---|---|---|---|---|
| PSPNet | 3x3-Res50 | train | val | 41.52 | 80.09 | 16 | 15W | PSPNet |
| DeepLabV3 | 3x3-Res50 | train | val | 42.16 | 80.36 | 16 | 15W | DeepLabV3 |
| PSPNet | 3x3-Res101 | train | val | 43.60 | 81.30 | 16 | 15W | PSPNet |
| DeepLabV3 | 3x3-Res101 | train | val | 44.13 | 81.42 | 16 | 15W | DeepLabV3 |

Object Detection

  • Pascal VOC2007/2012 (Single Scale Test): 20 Classes
| Model | Backbone | Train | Test | mAP | BS | Epochs | Scripts |
|---|---|---|---|---|---|---|---|
| SSD300 | VGG16 | 07+12_trainval | 07_test | 0.786 | 32 | 235 | SSD300 |
| SSD512 | VGG16 | 07+12_trainval | 07_test | 0.808 | 32 | 235 | SSD512 |
| Faster R-CNN | VGG16 | 07_trainval | 07_test | 0.706 | 1 | 15 | Faster R-CNN |

Pose Estimation

  • OpenPose: Realtime Multi-Person 2D Pose Estimation using Part Affinity Fields

Instance Segmentation

  • Mask R-CNN

Generative Adversarial Networks

  • Pix2pix
  • CycleGAN

DataSets with TorchCV

TorchCV defines a dataset format for each task, which you can check in the subdirectories of data. Below is an example dataset directory tree for training semantic segmentation (a small sketch that arranges raw files into this layout follows the tree). You can preprocess the public datasets with the scripts in the folder data/seg/preprocess.

Dataset
    train
        image
            00001.jpg/png
            00002.jpg/png
            ...
        label
            00001.png
            00002.png
            ...
    val
        image
            00001.jpg/png
            00002.jpg/png
            ...
        label
            00001.png
            00002.png
            ...
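
For reference, here is a minimal sketch of a script that copies a flat pool of images and label maps into the layout above. The source folders (raw_images, raw_labels) and the 90/10 split are hypothetical placeholders, not part of TorchCV:

    import os
    import shutil

    # Hypothetical flat source folders; adjust to your own data.
    SRC_IMAGES = "raw_images"   # 00001.jpg / 00001.png, ...
    SRC_LABELS = "raw_labels"   # 00001.png, ...
    DST_ROOT = "Dataset"

    image_files = sorted(os.listdir(SRC_IMAGES))
    split = int(len(image_files) * 0.9)          # 90% train / 10% val (arbitrary choice)
    subsets = {"train": image_files[:split], "val": image_files[split:]}

    for subset, files in subsets.items():
        for kind in ("image", "label"):
            os.makedirs(os.path.join(DST_ROOT, subset, kind), exist_ok=True)
        for fname in files:
            stem = os.path.splitext(fname)[0]
            # the image keeps its original extension; the label map is always a .png
            shutil.copy(os.path.join(SRC_IMAGES, fname),
                        os.path.join(DST_ROOT, subset, "image", fname))
            shutil.copy(os.path.join(SRC_LABELS, stem + ".png"),
                        os.path.join(DST_ROOT, subset, "label", stem + ".png"))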

Commands with TorchCV

Take PSPNet as an example. ("tag" can be any string, including an empty one.)

  • Training
cd scripts/seg/cityscapes/
bash run_fs_pspnet_cityscapes_seg.sh train tag
  • Resume Training
cd scripts/seg/cityscapes/
bash run_fs_pspnet_cityscapes_seg.sh train tag
  • Validate
cd scripts/seg/cityscapes/
bash run_fs_pspnet_cityscapes_seg.sh val tag
  • Testing:
cd scripts/seg/cityscapes/
bash run_fs_pspnet_cityscapes_seg.sh test tag

Demos with TorchCV

Example output of VGG19-OpenPose

torchcv's People

Contributors

donnyyou · jessemelpolio · lxtgh · qrsforever · yassouali · youdonny


torchcv's Issues

cudaGetLastError() == cudaSuccess ASSERT FAILED

2019-06-12 06:42:21,014 INFO [module_helper.py, 138] Loading pretrained model:/tmp/cars_segmentation/torchcv/pretrained_models/3x3resnet101-imagenet.pth
2019-06-12 06:42:28,858 INFO [controller.py, 28] Training start...
Traceback (most recent call last):
File "main.py", line 199, in
Controller.train(runner)
File "/tmp/cars_segmentation/torchcv/methods/tools/controller.py", line 40, in train
runner.train()
File "/tmp/cars_segmentation/torchcv/methods/seg/fcn_segmentor.py", line 85, in train
out_dict = self.seg_net(data_dict)
File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in call
result = self.forward(*input, **kwargs)
File "/opt/conda/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 152, in forward
outputs = self.parallel_apply(replicas, inputs, kwargs)
File "/opt/conda/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 162, in parallel_apply
return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
File "/opt/conda/lib/python3.6/site-packages/torch/nn/parallel/parallel_apply.py", line 83, in parallel_apply
raise output
File "/opt/conda/lib/python3.6/site-packages/torch/nn/parallel/parallel_apply.py", line 59, in _worker
output = module(*input, **kwargs)
File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in call
result = self.forward(*input, **kwargs)
File "/tmp/cars_segmentation/torchcv/models/seg/nets/pspnet.py", line 84, in forward
x = self.backbone(data_dict['img'])
File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in call
result = self.forward(*input, **kwargs)
File "/tmp/cars_segmentation/torchcv/models/backbones/resnet/resnet_backbone.py", line 94, in forward
x = self.prefix(x)
File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in call
result = self.forward(*input, **kwargs)
File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/container.py", line 92, in forward
input = module(input)
File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in call
result = self.forward(*input, **kwargs)
File "/tmp/cars_segmentation/torchcv/extensions/ops/sync_bn/syncbn.py", line 44, in forward
xsum, xsqsum = sum_square(input)
File "/tmp/cars_segmentation/torchcv/extensions/ops/sync_bn/functions.py", line 19, in sum_square
return _sum_square.apply(input)
File "/tmp/cars_segmentation/torchcv/extensions/ops/sync_bn/functions.py", line 27, in forward
xsum, xsqusum = gpu.sumsquare_forward(input)

RuntimeError: cudaGetLastError() == cudaSuccess ASSERT FAILED at syncbn_kernel.cu:263, please report a bug to PyTorch. (Sum_Square_Forward_CUDA at syncbn_kernel.cu:263)
frame #0: std::function<std::string ()>::operator()() const + 0x11 (0x7f6cd6bd1441 in /opt/conda/lib/python3.6/site-packages/torch/lib/libc10.so)
frame #1: c10::Error::Error(c10::SourceLocation, std::string const&) + 0x2a (0x7f6cd6bd0d7a in /opt/conda/lib/python3.6/site-packages/torch/lib/libc10.so)
frame #2: Sum_Square_Forward_CUDA(at::Tensor) + 0x281 (0x7f6cbdb615e2 in /tmp/cars_segmentation/torchcv/extensions/ops/sync_bn/src/gpu/syncbn_gpu.cpython-36m-x86_64-linux-gnu.so)
frame #3: + 0x1fc29 (0x7f6cbdb57c29 in /tmp/cars_segmentation/torchcv/extensions/ops/sync_bn/src/gpu/syncbn_gpu.cpython-36m-x86_64-linux-gnu.so)
frame #4: + 0x24095 (0x7f6cbdb5c095 in /tmp/cars_segmentation/torchcv/extensions/ops/sync_bn/src/gpu/syncbn_gpu.cpython-36m-x86_64-linux-gnu.so)

I get this error when I train. Can you help me solve it? Thanks.

TypeError with PyTorch 1.2

Hi,

When running Faster R-CNN, I am getting the following error:

2019-08-27 23:24:51,231 INFO    [controller.py, 28] Training start...
Traceback (most recent call last):
  File "main.py", line 181, in <module>
    Controller.train(runner)
  File "/ssd_scratch/cvit/aditya/torchcv/runner/tools/controller.py", line 37, in train
    runner.train()
  File "/ssd_scratch/cvit/aditya/torchcv/runner/det/faster_rcnn.py", line 85, in train
    out_dict = self.det_net(data_dict)
  File "/home/aditya.a/Libraries/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/ssd_scratch/cvit/aditya/torchcv/model/det/nets/faster_rcnn.py", line 74, in forward
    x = self.backbone(data_dict['img'])
  File "/home/aditya.a/Libraries/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/aditya.a/Libraries/anaconda3/lib/python3.7/site-packages/torch/nn/modules/container.py", line 92, in forward
    input = module(input)
  File "/home/aditya.a/Libraries/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/aditya.a/Libraries/anaconda3/lib/python3.7/site-packages/torch/nn/modules/conv.py", line 338, in forward
    self.padding, self.dilation, self.groups)
TypeError: conv2d(): argument 'input' (position 1) must be Tensor, not DataContainer

Any idea how to fix this? I am on PyTorch 1.2 and Python 3.7.

OpenPose

Could you provide the performance numbers or a pretrained OpenPose model on COCO?

How to train models on custom dataset?

I have a custom dataset following the COCO format for object detection and segmentation. How can I train models on my own dataset? It would be appreciated if you could provide some tutorials.

SSD512-->SSDHead-->self.feature6

Two questions:

  1. self.feature6 does not have ReLU layers:
        self.feature6 = nn.Sequential(
            nn.Conv2d(256, 128, kernel_size=1, stride=1),
            nn.Conv2d(128, 256, kernel_size=4, stride=1, padding=1))

Is it supposed to be like this?

        self.feature6 = nn.Sequential(
            nn.Conv2d(256, 128, kernel_size=1, stride=1),
            nn.ReLU(),
            nn.Conv2d(128, 256, kernel_size=4, stride=1, padding=1),
            nn.ReLU())
  2. The input feature's size to this layer is torch.Size([batch_size, 256, 2, 2]), so why use kernel_size=4 instead of kernel_size=2? (A quick shape check follows below.)

Thank you again in advance for your great help!
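
For what it's worth, the output size of this layer can be checked directly. A small, self-contained shape check; the layer definition is copied from the snippet above, not taken from the repo's exact code:

    import torch
    import torch.nn as nn

    feature6 = nn.Sequential(
        nn.Conv2d(256, 128, kernel_size=1, stride=1),
        nn.Conv2d(128, 256, kernel_size=4, stride=1, padding=1),
    )

    x = torch.randn(1, 256, 2, 2)            # the 2x2 input mentioned in the question
    print(feature6(x).shape)                 # torch.Size([1, 256, 1, 1])
    # With kernel_size=4 and padding=1: out = (2 + 2*1 - 4) + 1 = 1, i.e. a 1x1 map.
    # A kernel_size=2 conv with padding=1 would instead give (2 + 2 - 2) + 1 = 3.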

How to compile and use roi_pool on Windows

I used Git to run make.sh under the exts folder, and a build folder appeared under both the align and pool folders. What do I need to do next so that import exts.ops.roi_align works normally? Right now, importing exts.ops.roi_align gives:

1 from torch.autograd import Function
2
----> 3 from .. import roi_align_cuda
4
5

ImportError: cannot import name 'roi_align_cuda'

No module named 'extensions.layers.nms.src.cython_nms'

Hi,

Great job! It is a great framework to learn pytorch and latest networks.

But when I train the OpenPose network, it reports No module named 'extensions.layers.nms.src.cython_nms'. I checked the files and found that this file is missing.

Is there anything I am missing?

[ERROR] DeepLabV3+DenseNet Backbone

Hi,

I have a question when I try to train DeepLabV3 with DenseNet backbone.

Regarding the loss term of DeepLabV3:

I wonder why you use this auxiliary loss term. As far as I know, the original DeepLabV3 was not trained with an auxiliary cross-entropy loss.

Since your DeepLabV3 code is hard-coded, it should also be fixed for other backbones (e.g., DenseNet).

Thanks

How to train the icnet model in the folder(demos/personseg/models)

Excuse me, the code 'person_segmentor.py' in the demos folder needs pretrained models referenced in the following code:
'fabby_model': './model/ICNet32Photo640x480_iter_310000.pth',
'trimap_model': './model/trimap_icnet32SimpleV2_iter_120000.pth',
'mat_model': './model/trimap_icnet32SimpleV2_matting_iter_120000.pth'

How can I get these pretrained models, or how can I train them myself?
Thank you.

Segmentation aux-out loss

Could you point me to the theoretical rationale (e.g., a related paper) for the aux-out loss? The DeepLabV3 network in your code outputs an additional prediction without ASPP.

yolov3 fails when running on the COCO2017 dataset

Hello. From your profile you appear to be at Peking University, so Chinese should be fine; I'll write in Chinese. I want to run yolov3 on the COCO2017 data. Looking at the project structure, I needed to write a scripts file, which I did, but after adding it an error occurred when training started.
The script file:
#!/usr/bin/env bash

nvidia-smi
PYTHON="python"

export PYTHONPATH="/home/dezheng/work/torchcv":$PYTHONPATH

cd ../../../

DATA_DIR="/home/dezheng/work/torchcv/data/torchcv_coco"
MODEL_NAME="yolov3"
LOSS_TYPE="yolov3loss"
CHECKPOINTS_NAME="yolov3_darknet_coco_det"$2
PRETRAINED_MODEL="./pretrained_models/yolov3_darknet_caffe_pretrained.pth"
HYPES_FILE='hypes/det/coco/yolov3_darknet_coco_det.json'

LOG_DIR="./log/det/coco/"
LOG_FILE="${LOG_DIR}${CHECKPOINTS_NAME}.log"

if [[ ! -d ${LOG_DIR} ]]; then
echo ${LOG_DIR}" not exists!!!"
mkdir -p ${LOG_DIR}
fi

if [[ "$1"x == "train"x ]]; then
${PYTHON} -u main.py --hypes ${HYPES_FILE} --phase train --log_to_file n --gpu 0 --cudnn n
--data_dir ${DATA_DIR} --loss_type ${LOSS_TYPE} --model_name ${MODEL_NAME}
--checkpoints_name ${CHECKPOINTS_NAME} --pretrained ${PRETRAINED_MODEL} 2>&1 | tee ${LOG_FILE}

The final error is as follows:
Traceback (most recent call last):
File "main.py", line 186, in <module>
runner = method_selector.select_det_method()
File "/home/dezheng/work/torchcv/methods/method_selector.py", line 93, in select_det_method
return DET_METHOD_DICT[key]
File "/home/dezheng/work/torchcv/methods/det/yolov3.py", line 35, in __init__
self.det_data_loader = DataLoader(configer)
File "/home/dezheng/work/torchcv/datasets/det/data_loader.py", line 23, in __init__
self.aug_train_transform = pil_aug_trans.PILAugCompose(self.configer, split='train')
File "/home/dezheng/work/torchcv/datasets/tools/pil_aug_transforms.py", line 1187, in __init__
self.transforms[trans] = PIL_AUGMENTATIONS_DICT[trans]
TypeError: __init__() got an unexpected keyword argument 'center_jitter'
How should I fix this?

Question about "ignore_index": 19

In hypes/seg/cityscape/*_cityscape_seg.json,
"ignore_index": 19
But in datasets/seg/fs_data_loader.py:
encoded_labelmap = np.ones(shape=(shape[0], shape[1]), dtype=np.float32) * 255
so labels treated as void will be equal to 255.
My question is: should ignore_index be equal to 255?
Looking forward to your reply.
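
For context, PyTorch's cross-entropy loss only skips pixels whose label equals the ignore_index passed to the loss, so the label encoding and the config value must agree. A minimal sketch (whether 255 or 19 is correct depends on which value the loss in this repo is actually configured with):

    import torch
    import torch.nn as nn

    num_classes = 19
    logits = torch.randn(1, num_classes, 4, 4)        # fake network output
    labels = torch.randint(0, num_classes, (1, 4, 4))
    labels[0, 0, 0] = 255                             # a "void" pixel encoded as 255

    # Pixels labeled 255 are only excluded if ignore_index is also 255.
    loss = nn.CrossEntropyLoss(ignore_index=255)(logits, labels)
    print(loss.item())

    # With ignore_index=19, a label of 255 would be treated as an (invalid) class
    # index and raise an error instead of being ignored.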

pytorch1.0 make error

Hello, when I used torch 0.4.1 I could run make successfully, but when I switched to PyTorch 1.0 it failed...

pytorch1.1 running error

It seems that TorchCV runs with PyTorch 1.0, but with PyTorch 1.1 I get the error "RuntimeError: CUDA error: an illegal memory access was encountered" during training. Could you help me with this? I don't want to use PyTorch 1.0...

When I cd into extensions and run sh make.sh, I get the following error.

In file included from /home/user/.local/lib/python3.6/site-packages/torch/lib/include/THC/THC.h:4:0,
                 from /home/user/.local/lib/python3.6/site-packages/torch/lib/include/THC/THCAtomics.cuh:4,
                 from src/roi_align_kernel.cu:2:
/home/user/.local/lib/python3.6/site-packages/torch/lib/include/THC/THCGeneral.h:17:23: fatal error: cublas_v2.h: No such file or directory
 #include "cublas_v2.h"
                       ^
compilation terminated.
error: command '/usr/local/cuda-9.0/bin/nvcc' failed with exit status 1
Building roi pool op...


I use PyTorch 1.0, CUDA 9.0 and Python 3.6.
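
One thing worth checking (an assumption, not a confirmed fix for this repo) is which CUDA toolkit PyTorch's extension builder resolves, since cublas_v2.h lives under $CUDA_HOME/include; a mismatched or incomplete CUDA install is a common cause of this nvcc failure:

    # Inspect the CUDA toolkit that torch's C++/CUDA extension builder will use.
    import os
    from torch.utils.cpp_extension import CUDA_HOME

    print("CUDA_HOME:", CUDA_HOME)                      # e.g. /usr/local/cuda-9.0
    if CUDA_HOME is not None:
        header = os.path.join(CUDA_HOME, "include", "cublas_v2.h")
        print("cublas_v2.h present:", os.path.isfile(header))
    print("PATH mentions a CUDA dir:",
          any("cuda" in p.lower() for p in os.environ.get("PATH", "").split(os.pathsep)))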

Openpose train problem

At training iteration 68000 the loss is still nan.
I thought the problem was missing pretrained backbone weights.
Now I use the weights downloaded from the URL that appears in model/model/backbones/vgg/vgg_models.py, and I noticed it is not a trained OpenPose backbone. But I can't find a better choice; could you provide pretrained backbone weights?

limit the number of boxes before nms/cls_nms

In utils/helpers/det_helper.py, the nms/cls_nms function doesn't limit the number of boxes before nms (cls_nms), so at the beginning of training it takes too long because a huge number of boxes is produced.
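
As an illustration of this suggestion, here is a hedged sketch of keeping only the top-k highest-scoring boxes before calling NMS; the helper and its name are hypothetical, not the repo's DetHelper API:

    import torch

    def topk_before_nms(boxes, scores, pre_nms_top_k=1000):
        """Keep only the pre_nms_top_k highest-scoring boxes before running NMS.
        boxes: (N, 4) tensor, scores: (N,) tensor. Purely illustrative."""
        if scores.numel() > pre_nms_top_k:
            scores, keep = scores.topk(pre_nms_top_k)
            boxes = boxes[keep]
        return boxes, scores

    # Early in training, many low-quality boxes may survive the score threshold;
    # capping them keeps the NMS cost bounded.
    boxes, scores = topk_before_nms(torch.rand(50000, 4), torch.rand(50000))
    print(boxes.shape, scores.shape)   # torch.Size([1000, 4]) torch.Size([1000])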

simultaneous object detection and human pose estimation

This repo is an amazing idea! I have a question about its usage: would it be possible to do simultaneous object detection and human pose estimation?

I'd like to test on an image using SSD for object detection and OpenPose for human pose estimation.

If it's possible, I'd greatly appreciate the steps/code to accomplish this.

Thanks,

Got loss nan after several epochs while training yolov3

Hi, I have tested both the fix_size and multi_size modes, and the same problem arises. The loss was decreasing before the problem suddenly occurred. Did you have this problem when training yolov3, or do you know what causes it?

Sorry for the inconvenience, and many thanks.

2018-11-09 10:32:38,109 INFO    [yolov3.py, 97] LR: [0.0005412580317889754, 0.005412580317889754]
2018-11-09 10:34:09,663 INFO    [yolov3.py, 153] Train Epoch: 2	Train Iteration: 3300	Time 91.554s / 100iters, (0.916)	Data load 0.018s / 100iters, (0.000178)
Learning rate = [0.001, 0.01]	Loss = 213.22541809 (ave = 149.97146935)

2018-11-09 10:34:09,663 INFO    [yolov3.py, 97] LR: [0.0005581670612106865, 0.005581670612106865]
2018-11-09 10:35:41,826 INFO    [yolov3.py, 153] Train Epoch: 2	Train Iteration: 3400	Time 92.163s / 100iters, (0.922)	Data load 0.018s / 100iters, (0.000181)
Learning rate = [0.001, 0.01]	Loss = 149.52812195 (ave = 151.01665321)

2018-11-09 10:35:41,827 INFO    [yolov3.py, 97] LR: [0.0005750760906323977, 0.005750760906323977]
2018-11-09 10:37:13,627 INFO    [yolov3.py, 153] Train Epoch: 2	Train Iteration: 3500	Time 91.801s / 100iters, (0.918)	Data load 0.018s / 100iters, (0.000183)
Learning rate = [0.001, 0.01]	Loss = 178.01943970 (ave = 152.36151611)

2018-11-09 10:37:13,628 INFO    [yolov3.py, 97] LR: [0.0005919851200541089, 0.005919851200541088]
2018-11-09 10:38:44,910 INFO    [yolov3.py, 153] Train Epoch: 2	Train Iteration: 3600	Time 91.282s / 100iters, (0.913)	Data load 0.018s / 100iters, (0.000178)
Learning rate = [0.001, 0.01]	Loss = nan (ave = nan)

2018-11-09 10:38:44,910 INFO    [yolov3.py, 97] LR: [0.0006088941494758201, 0.006088941494758201]
2018-11-09 10:40:14,760 INFO    [yolov3.py, 153] Train Epoch: 2	Train Iteration: 3700	Time 89.850s / 100iters, (0.898)	Data load 0.018s / 100iters, (0.000177)
Learning rate = [0.001, 0.01]	Loss = nan (ave = nan)

Configuration file for OpenPose?

Hi, I think there are a lot of mismatches between open_pose_vgg_19.conf and the modules in your code. I revised most of them but still face a lot of undefined ones.
Most of them are KeyErrors, and some may come from the exts/tools/ directory.

I was wondering if there is an updated configuration file for OpenPose. I would appreciate an update.

AssertionError: <class 'torch.Tensor'>

2019-10-12 13:36:13,632 INFO [runner_helper.py, 63] Resuming from ./checkpoints/seg/cityscapes/fs_deeplabv3_cityscapes_segtag_latest.pth
2019-10-12 13:36:15,737 INFO [controller.py, 67] Testing start...
Traceback (most recent call last):
File "main.py", line 186, in
Controller.test(runner)
File "/home/spple/paddle/DeepGlint/torchcv/runner/tools/controller.py", line 79, in test
runner.test(test_dir, out_dir)
File "/home/spple/paddle/DeepGlint/torchcv/runner/seg/fcn_segmentor_test.py", line 48, in test
total_logits = self.ss_test(data_dict)
File "/home/spple/paddle/DeepGlint/torchcv/runner/seg/fcn_segmentor_test.py", line 85, in ss_test
data_dict = self.blob_helper.get_blob(in_data_dict, scale=1.0)
File "/home/spple/paddle/DeepGlint/torchcv/runner/tools/blob_helper.py", line 25, in get_blob
for image, meta in zip(DCHelper.tolist(data_dict['img']), DCHelper.tolist(data_dict['meta'])):
File "/home/spple/paddle/DeepGlint/torchcv/tools/helper/dc_helper.py", line 19, in tolist
assert isinstance(dc, DataContainer), type(dc)
AssertionError: <class 'torch.Tensor'>
2019-10-12 13:36:17,599 INFO [seg_evaluator.py, 51] Evaluate 0 images
/home/spple/paddle/DeepGlint/torchcv/metric/seg/seg_running_score.py:37: RuntimeWarning: invalid value encountered in double_scalars
acc = np.diag(hist).sum() / hist.sum()
/home/spple/paddle/DeepGlint/torchcv/metric/seg/seg_running_score.py:38: RuntimeWarning: invalid value encountered in true_divide
acc_cls = np.diag(hist) / hist.sum(axis=1)
/home/spple/paddle/DeepGlint/torchcv/metric/seg/seg_running_score.py:39: RuntimeWarning: Mean of empty slice
acc_cls = np.nanmean(acc_cls)
/home/spple/paddle/DeepGlint/torchcv/metric/seg/seg_running_score.py:40: RuntimeWarning: invalid value encountered in true_divide
iu = np.diag(hist) / (hist.sum(axis=1) + hist.sum(axis=0) - np.diag(hist))
/home/spple/paddle/DeepGlint/torchcv/metric/seg/seg_running_score.py:41: RuntimeWarning: Mean of empty slice
mean_iu = np.nanmean(iu)
/home/spple/paddle/DeepGlint/torchcv/metric/seg/seg_running_score.py:42: RuntimeWarning: invalid value encountered in true_divide
freq = hist.sum(axis=1) / hist.sum()
/home/spple/paddle/DeepGlint/torchcv/metric/seg/seg_running_score.py:43: RuntimeWarning: invalid value encountered in greater
fwavacc = (freq[freq > 0] * iu[freq > 0]).sum()
2019-10-12 13:36:17,641 INFO [seg_evaluator.py, 52] Class mIOU: {0: nan, 1: nan, 2: nan, 3: nan, 4: nan, 5: nan, 6: nan, 7: nan, 8: nan, 9: nan, 10: nan, 11: nan, 12: nan, 13: nan, 14: nan, 15: nan, 16: nan, 17: nan, 18: nan}
2019-10-12 13:36:17,642 INFO [seg_evaluator.py, 53] mIOU: nan
2019-10-12 13:36:17,643 INFO [seg_evaluator.py, 54] Pixel ACC: nan

DataParallel leads to an error in CIFAR-10 classification?

I am trying to train a CIFAR-10 classifier on multiple GPUs. When I run the script for this, it leads to an error while calculating the loss, as follows:

[screenshot of the error]

However, when I use a single GPU, it works fine. How can I fix this?

No module named 'datasets.tools.data_transformer'

Dear Donny:

I have already run make.sh.

When I run "python3 main.py", an error occurs; it seems some files are missing:

Traceback (most recent call last):
File "main.py", line 15, in
from methods.method_selector import MethodSelector
File "/home/zhaolun/Projects/PyTorchCV/methods/method_selector.py", line 12, in
from methods.det.faster_rcnn import FasterRCNN
File "/home/zhaolun/Projects/PyTorchCV/methods/det/faster_rcnn.py", line 17, in
from methods.det.faster_rcnn_test import FastRCNNTest
File "/home/zhaolun/Projects/PyTorchCV/methods/det/faster_rcnn_test.py", line 17, in
from datasets.tools.data_transformer import DataTransformer
ImportError: No module named 'datasets.tools.data_transformer'

Do you have any idea why this happens?

Error when running the test

python main.py --hypes=hypes/det/voc/ssd300_vgg16_voc_det.json --phase=test --resume=checkpoints/det/voc/ssd_vgg300_voc_0.786.pth --test_img=test/ski.jpg --gpu=0 --model_name=vgg16_ssd300

ERROR:
AttributeError: 'SingleShotDetectorTest' object has no attribute 'test_img'

Clueless halt

After your refactoring, I also got this error. It appears with no pattern; it just shows up randomly sometimes, and I would say it happens quite often.

terminate called after throwing an instance of 'std::system_error'
  what():  Broken pipe

I have no clue why it is happening; I hope you have more insight.

F.interpolate

In datasets/tools/collate_functions.py, line 119:
the target size for F.interpolate should be (height, width) instead of (width, height), shouldn't it?
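
For reference, F.interpolate's size argument for 4D input is indeed (height, width), which can be checked directly with a small sketch:

    import torch
    import torch.nn.functional as F

    x = torch.randn(1, 3, 10, 20)                    # (N, C, H=10, W=20)
    y = F.interpolate(x, size=(30, 40), mode="bilinear", align_corners=False)
    print(y.shape)                                   # torch.Size([1, 3, 30, 40]) -> size is (H, W)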

yolov3 loss error

In utils/layers/det/yolo_detection_layer.py:
I believe line 42 (if self.configer.get('phase') != 'debug':) should be removed,
because the values passed to the BCE loss should be in the range (0, 1) in the training phase.
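
For background, nn.BCELoss expects probabilities in (0, 1), so predictions normally pass through a sigmoid first (or BCEWithLogitsLoss is used on raw logits). A minimal illustration, independent of the repo's exact layer:

    import torch
    import torch.nn as nn

    raw = torch.randn(4)                 # unbounded network outputs (logits)
    target = torch.tensor([0., 1., 1., 0.])

    # Option 1: squash to (0, 1) before BCELoss.
    loss_a = nn.BCELoss()(torch.sigmoid(raw), target)

    # Option 2: equivalent and numerically safer on raw logits.
    loss_b = nn.BCEWithLogitsLoss()(raw, target)

    print(loss_a.item(), loss_b.item())  # the two values agree up to numerical precision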

Error when running the test

python main.py --hypes=hypes/det/voc/ssd300_vgg16_voc_det.json --phase=test --resume=checkpoints/det/voc/ssd_vgg300_voc_0.786.pth --test_img=test/ski.jpg --gpu=0 --model_name=vgg16_ssd300

Question about SSDFocalLoss?

in the code :

    logit = F.softmax(x,dim=-1)
    logit = logit.clamp(1e-7, 1.-1e-7)
    conf_loss_tmp = -1 * t.float() * torch.log(logit)
    conf_loss_tmp = alpha * conf_loss_tmp * (1-logit)**gamma
    conf_loss = conf_loss_tmp.sum()

Here the positive and negative samples share the same alpha=0.25, but some implementations use alpha=0.25 for positives and 1-0.25 for negatives. What is the difference?
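
To make the alternative concrete, here is a hedged sketch of the alpha_t convention, where positives are weighted by alpha and negatives by 1-alpha (this is the formulation from the RetinaNet focal-loss paper, not necessarily what this repo intends):

    import torch
    import torch.nn.functional as F

    def focal_loss_alpha_t(x, t, alpha=0.25, gamma=2.0, eps=1e-7):
        """x: (N, C) logits, t: (N, C) one-hot targets. Illustrative only."""
        p = F.softmax(x, dim=-1).clamp(eps, 1. - eps)
        # p_t is the probability of the true outcome; alpha_t differs for pos/neg.
        p_t = torch.where(t > 0, p, 1. - p)
        alpha_t = torch.where(t > 0, torch.full_like(p, alpha), torch.full_like(p, 1. - alpha))
        return (-alpha_t * (1. - p_t) ** gamma * p_t.log()).sum()

    x = torch.randn(8, 21)                       # e.g. 20 classes + background
    t = F.one_hot(torch.randint(0, 21, (8,)), num_classes=21).float()
    print(focal_loss_alpha_t(x, t).item())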

HeatmapGenerator and PafGenerator run so slowly

In methods/pose/open_pose.py, when the batch size equals 32,

heatmap = self.heatmap_generator(data_dict['kpts'], input_size, maskmap=maskmap)
vecmap = self.paf_generator(data_dict['kpts'], input_size, maskmap=maskmap)

These two lines take about 40 seconds per iteration, which is far too time-consuming; the network itself runs in only 0.03 seconds.

Are there any details for reducing the running time that I have missed?
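
In case a comparison helps, here is a generic, vectorized way to render Gaussian keypoint heatmaps with NumPy; this is not the repo's HeatmapGenerator, and the sigma/size values are placeholders:

    import numpy as np

    def gaussian_heatmaps(kpts, height, width, sigma=7.0):
        """kpts: list of (x, y) keypoint coordinates -> (K, H, W) heatmaps.
        Uses a broadcasted expression per keypoint instead of looping over
        every pixel, which is usually the main source of slowness."""
        ys, xs = np.mgrid[0:height, 0:width]          # (H, W) coordinate grids
        maps = np.zeros((len(kpts), height, width), dtype=np.float32)
        for k, (x, y) in enumerate(kpts):
            d2 = (xs - x) ** 2 + (ys - y) ** 2
            maps[k] = np.exp(-d2 / (2.0 * sigma ** 2))
        return maps

    heatmaps = gaussian_heatmaps([(100, 80), (200, 150)], height=368, width=368)
    print(heatmaps.shape)                             # (2, 368, 368)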

KeyError when loading ssd512 pretrained model

Hi, donnyyou.
After downloading the pretrained SSD512 .pth model, I run
bash run_ssd512_vgg16_voc_det.sh train tag
Then I got:

2019-05-23 10:23:17,616 INFO [vgg512_ssd.py, 60] Loading pretrained model:./pretrained_models/ssd_vgg512_voc_0.808.pth
2019-05-23 10:23:20,996 INFO [vgg512_ssd.py, 63] Pretrained Keys: dict_keys(['config_dict', 'state_dict'])
2019-05-23 10:23:20,997 INFO [vgg512_ssd.py, 65] Model Keys: odict_keys(['features.0.weight', 'features.0.bias', 'features.2.weight', 'features.2.bias', 'features.5.weight', 'features.5.bias', 'features.7.weight', 'features.7.bias', 'features.10.weight', 'features.10.bias', 'features.12.weight', 'features.12.bias', 'features.14.weight', 'features.14.bias', 'features.17.weight', 'features.17.bias', 'features.19.weight', 'features.19.bias', 'features.21.weight', 'features.21.bias', 'features.24.weight', 'features.24.bias', 'features.26.weight', 'features.26.bias', 'features.28.weight', 'features.28.bias', 'features.31.weight', 'features.31.bias', 'features.33.weight', 'features.33.bias'])
2019-05-23 10:23:20,997 INFO [vgg512_ssd.py, 75] Matched Keys: dict_keys([])
2019-05-23 10:23:21,067 ERROR [configer.py, 69] ssd_detection_layer.py, 16 KeyError: ('gt', 'num_anchor_list').

Since your SSD script is 404 (not found), I modified the Faster R-CNN script for SSD as below:

#!/usr/bin/env bash

# check the environment info
nvidia-smi
PYTHON="python"

export PYTHONPATH="/home/ruijin/Work/python/torchcv-master":$PYTHONPATH

cd ../../../

DATA_DIR="/home/donny/DataSet/VOC07_DET"
MODEL_NAME="vgg512_ssd"
LOSS_TYPE="ssd_multibox_loss"
CHECKPOINTS_NAME="ssd_vgg16_voc_det"$2
PRETRAINED_MODEL="./pretrained_models/ssd_vgg512_voc_0.808.pth"
HYPES_FILE='hypes/det/voc/ssd512_vgg16_voc_det.json'

LOG_DIR="./log/det/voc/"
LOG_FILE="${LOG_DIR}${CHECKPOINTS_NAME}.log"

if [[ ! -d ${LOG_DIR} ]]; then
echo ${LOG_DIR}" not exists!!!"
mkdir -p ${LOG_DIR}
fi

if [[ "$1"x == "train"x ]]; then
${PYTHON} -u main.py --hypes ${HYPES_FILE} --phase train --log_to_file n --gpu 0 --cudnn n
--data_dir ${DATA_DIR} --loss_type ${LOSS_TYPE} --model_name ${MODEL_NAME}
--checkpoints_name ${CHECKPOINTS_NAME} --pretrained ${PRETRAINED_MODEL} 2>&1 | tee ${LOG_FILE}

elif [[ "$1"x == "resume"x ]]; then
${PYTHON} -u main.py --hypes ${HYPES_FILE} --phase train --log_to_file n --gpu 0 --cudnn n
--data_dir ${DATA_DIR} --loss_type ${LOSS_TYPE} --model_name ${MODEL_NAME}
--resume_continue y --resume ./checkpoints/cityscapes/${CHECKPOINTS_NAME}_latest.pth
--checkpoints_name ${CHECKPOINTS_NAME} --pretrained ${PRETRAINED_MODEL} 2>&1 | tee -a ${LOG_FILE}

elif [[ "$1"x == "debug"x ]]; then
${PYTHON} -u main.py --hypes ${HYPES_FILE} --phase debug --gpu 0 --log_to_file n 2>&1 | tee ${LOG_FILE}

elif [[ "$1"x == "val"x ]]; then
${PYTHON} -u main.py --hypes ${HYPES_FILE} --phase test --log_to_file n --model_name ${MODEL_NAME}
--phase test --gpu 0 --resume ./checkpoints/cityscapes/${CHECKPOINTS_NAME}_latest.pth
--test_dir ${DATA_DIR}/val/image --out_dir val 2>&1 | tee -a ${LOG_FILE}
cd metrics/det/
${PYTHON} -u voc_evaluator.py --hypes "../../../"${HYPES_FILE}
--json_dir ../../../out/results/voc/test_dir/${CHECKPOINTS_NAME}/val/label
--gt_dir ${DATA_DIR}/val/label 2>&1 | tee -a "../../"${LOG_FILE}

else
echo "$1"x" is invalid..."
fi

Hope you can help me. Thanks!

About DenseASPP

Hi, I want to try training the DenseASPP model with your code. Can I just change the .sh, or could you provide the script for DenseASPP on Cityscapes? Thank you.

mask rcnn is empty.

The files methods/seg/mask_rcnn.py and methods/seg/mask_rcnn_test.py are 0 bytes.
