
donnyyou / torchcv

2.2K stars · 70 watchers · 377 forks · 29.49 MB

TorchCV: A PyTorch-Based Framework for Deep Learning in Computer Vision

Home Page: https://pytorchcv.com

License: Apache License 2.0

Makefile 0.01% Python 26.99% Shell 53.05% Lua 0.54% MATLAB 1.51% C++ 1.53% C 0.37% Jupyter Notebook 13.99% Cuda 2.02%

torchcv's Introduction

TorchCV: A PyTorch-Based Framework for Deep Learning in Computer Vision

@misc{you2019torchcv,
    author = {Ansheng You and Xiangtai Li and Zhen Zhu and Yunhai Tong},
    title = {TorchCV: A PyTorch-Based Framework for Deep Learning in Computer Vision},
    howpublished = {\url{https://github.com/donnyyou/torchcv}},
    year = {2019}
}

This repository provides source code for most deep-learning-based computer vision tasks. We'll do our best to keep it up to date. If you find a problem with this repository, please raise an issue or submit a pull request.

- Semantic Flow for Fast and Accurate Scene Parsing
- Code and models: https://github.com/lxtGH/SFSegNets

Implemented Papers

  • Image Classification

    • VGG: Very Deep Convolutional Networks for Large-Scale Image Recognition
    • ResNet: Deep Residual Learning for Image Recognition
    • DenseNet: Densely Connected Convolutional Networks
    • ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices
    • ShuffleNet V2: Practical Guidelines for Efficient CNN Architecture Design
    • Partial Order Pruning: for Best Speed/Accuracy Trade-off in Neural Architecture Search
  • Semantic Segmentation

    • DeepLabV3: Rethinking Atrous Convolution for Semantic Image Segmentation
    • PSPNet: Pyramid Scene Parsing Network
    • DenseASPP: DenseASPP for Semantic Segmentation in Street Scenes
    • Asymmetric Non-local Neural Networks for Semantic Segmentation
    • Semantic Flow for Fast and Accurate Scene Parsing
  • Object Detection

    • SSD: Single Shot MultiBox Detector
    • Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks
    • YOLOv3: An Incremental Improvement
    • FPN: Feature Pyramid Networks for Object Detection
  • Pose Estimation

    • CPM: Convolutional Pose Machines
    • OpenPose: Realtime Multi-Person 2D Pose Estimation using Part Affinity Fields
  • Instance Segmentation

    • Mask R-CNN
  • Generative Adversarial Networks

    • Pix2pix: Image-to-Image Translation with Conditional Adversarial Nets
    • CycleGAN: Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks.

QuickStart with TorchCV

Currently only Python 3.x and PyTorch 1.3 are supported.

pip3 install -r requirements.txt
cd lib/exts
sh make.sh
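
Before building the extensions, it can help to confirm that the interpreter and the PyTorch/CUDA setup match the supported versions above. A minimal sanity check (assuming PyTorch is already installed):

    import sys
    import torch

    print("Python:", sys.version.split()[0])            # should be 3.x
    print("PyTorch:", torch.__version__)                 # the repo targets 1.3
    print("CUDA available:", torch.cuda.is_available())
    print("CUDA version PyTorch was built with:", torch.version.cuda)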

Performances with TorchCV

All results shown below were obtained with this repository and fully reproduce the results reported in the papers.

Image Classification

  • ImageNet (Center Crop Test): 224x224
| Model | Train | Test | Top-1 | Top-5 | BS | Iters | Scripts |
|---|---|---|---|---|---|---|---|
| ResNet50 | train | val | 77.54 | 93.59 | 512 | 30W | ResNet50 |
| ResNet101 | train | val | 78.94 | 94.56 | 512 | 30W | ResNet101 |
| ShuffleNetV2x0.5 | train | val | 60.90 | 82.54 | 1024 | 40W | ShuffleNetV2x0.5 |
| ShuffleNetV2x1.0 | train | val | 69.71 | 88.91 | 1024 | 40W | ShuffleNetV2x1.0 |
| DFNetV1 | train | val | 70.99 | 89.68 | 1024 | 40W | DFNetV1 |
| DFNetV2 | train | val | 74.22 | 91.61 | 1024 | 40W | DFNetV2 |

("W" in the Iters column denotes 10,000 iterations, e.g. 30W = 300k iterations.)

Semantic Segmentation

  • Cityscapes (Single Scale Whole Image Test): Base LR 0.01, Crop Size 769
| Model | Backbone | Train | Test | mIOU | BS | Iters | Scripts |
|---|---|---|---|---|---|---|---|
| PSPNet | 3x3-Res101 | train | val | 78.20 | 8 | 4W | PSPNet |
| DeepLabV3 | 3x3-Res101 | train | val | 79.13 | 8 | 4W | DeepLabV3 |
  • ADE20K (Single Scale Whole Image Test): Base LR 0.02, Crop Size 520
| Model | Backbone | Train | Test | mIOU | PixelACC | BS | Iters | Scripts |
|---|---|---|---|---|---|---|---|---|
| PSPNet | 3x3-Res50 | train | val | 41.52 | 80.09 | 16 | 15W | PSPNet |
| DeepLabV3 | 3x3-Res50 | train | val | 42.16 | 80.36 | 16 | 15W | DeepLabV3 |
| PSPNet | 3x3-Res101 | train | val | 43.60 | 81.30 | 16 | 15W | PSPNet |
| DeepLabV3 | 3x3-Res101 | train | val | 44.13 | 81.42 | 16 | 15W | DeepLabV3 |

Object Detection

  • Pascal VOC2007/2012 (Single Scale Test): 20 Classes
| Model | Backbone | Train | Test | mAP | BS | Epochs | Scripts |
|---|---|---|---|---|---|---|---|
| SSD300 | VGG16 | 07+12_trainval | 07_test | 0.786 | 32 | 235 | SSD300 |
| SSD512 | VGG16 | 07+12_trainval | 07_test | 0.808 | 32 | 235 | SSD512 |
| Faster R-CNN | VGG16 | 07_trainval | 07_test | 0.706 | 1 | 15 | Faster R-CNN |

Pose Estimation

  • OpenPose: Realtime Multi-Person 2D Pose Estimation using Part Affinity Fields

Instance Segmentation

  • Mask R-CNN

Generative Adversarial Networks

  • Pix2pix
  • CycleGAN

DataSets with TorchCV

TorchCV defines a dataset format for each task, which you can check in the subdirectories of data. Below is an example dataset directory tree for training semantic segmentation (a small sketch that arranges raw files into this layout follows the tree). You can preprocess the public datasets with the scripts in the folder data/seg/preprocess.

Dataset
    train
        image
            00001.jpg/png
            00002.jpg/png
            ...
        label
            00001.png
            00002.png
            ...
    val
        image
            00001.jpg/png
            00002.jpg/png
            ...
        label
            00001.png
            00002.png
            ...
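
For reference, here is a minimal sketch of a script that copies a flat pool of images and label maps into the layout above. The source folders (raw_images, raw_labels) and the 90/10 split are hypothetical placeholders, not part of TorchCV:

    import os
    import shutil

    # Hypothetical flat source folders; adjust to your own data.
    SRC_IMAGES = "raw_images"   # 00001.jpg / 00001.png, ...
    SRC_LABELS = "raw_labels"   # 00001.png, ...
    DST_ROOT = "Dataset"

    image_files = sorted(os.listdir(SRC_IMAGES))
    split = int(len(image_files) * 0.9)          # 90% train / 10% val (arbitrary choice)
    subsets = {"train": image_files[:split], "val": image_files[split:]}

    for subset, files in subsets.items():
        for kind in ("image", "label"):
            os.makedirs(os.path.join(DST_ROOT, subset, kind), exist_ok=True)
        for fname in files:
            stem = os.path.splitext(fname)[0]
            # the image keeps its original extension; the label map is always a .png
            shutil.copy(os.path.join(SRC_IMAGES, fname),
                        os.path.join(DST_ROOT, subset, "image", fname))
            shutil.copy(os.path.join(SRC_LABELS, stem + ".png"),
                        os.path.join(DST_ROOT, subset, "label", stem + ".png"))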

Commands with TorchCV

Take PSPNet as an example. ("tag" can be any string, including an empty one.)

  • Training
cd scripts/seg/cityscapes/
bash run_fs_pspnet_cityscapes_seg.sh train tag
  • Resume Training
cd scripts/seg/cityscapes/
bash run_fs_pspnet_cityscapes_seg.sh train tag
  • Validate
cd scripts/seg/cityscapes/
bash run_fs_pspnet_cityscapes_seg.sh val tag
  • Testing:
cd scripts/seg/cityscapes/
bash run_fs_pspnet_cityscapes_seg.sh test tag

Demos with TorchCV

Example output of VGG19-OpenPose

torchcv's People

Contributors

donnyyou · jessemelpolio · lxtgh · qrsforever · yassouali · youdonny


torchcv's Issues

cudaGetLastError() == cudaSuccess ASSERT FAILED

2019-06-12 06:42:21,014 INFO [module_helper.py, 138] Loading pretrained model:/tmp/cars_segmentation/torchcv/pretrained_models/3x3resnet101-imagenet.pth
2019-06-12 06:42:28,858 INFO [controller.py, 28] Training start...
Traceback (most recent call last):
File "main.py", line 199, in
Controller.train(runner)
File "/tmp/cars_segmentation/torchcv/methods/tools/controller.py", line 40, in train
runner.train()
File "/tmp/cars_segmentation/torchcv/methods/seg/fcn_segmentor.py", line 85, in train
out_dict = self.seg_net(data_dict)
File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in call
result = self.forward(*input, **kwargs)
File "/opt/conda/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 152, in forward
outputs = self.parallel_apply(replicas, inputs, kwargs)
File "/opt/conda/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 162, in parallel_apply
return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
File "/opt/conda/lib/python3.6/site-packages/torch/nn/parallel/parallel_apply.py", line 83, in parallel_apply
raise output
File "/opt/conda/lib/python3.6/site-packages/torch/nn/parallel/parallel_apply.py", line 59, in _worker
output = module(*input, **kwargs)
File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in call
result = self.forward(*input, **kwargs)
File "/tmp/cars_segmentation/torchcv/models/seg/nets/pspnet.py", line 84, in forward
x = self.backbone(data_dict['img'])
File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in call
result = self.forward(*input, **kwargs)
File "/tmp/cars_segmentation/torchcv/models/backbones/resnet/resnet_backbone.py", line 94, in forward
x = self.prefix(x)
File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in call
result = self.forward(*input, **kwargs)
File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/container.py", line 92, in forward
input = module(input)
File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in call
result = self.forward(*input, **kwargs)
File "/tmp/cars_segmentation/torchcv/extensions/ops/sync_bn/syncbn.py", line 44, in forward
xsum, xsqsum = sum_square(input)
File "/tmp/cars_segmentation/torchcv/extensions/ops/sync_bn/functions.py", line 19, in sum_square
return _sum_square.apply(input)
File "/tmp/cars_segmentation/torchcv/extensions/ops/sync_bn/functions.py", line 27, in forward
xsum, xsqusum = gpu.sumsquare_forward(input)

RuntimeError: cudaGetLastError() == cudaSuccess ASSERT FAILED at syncbn_kernel.cu:263, please report a bug to PyTorch. (Sum_Square_Forward_CUDA at syncbn_kernel.cu:263)
frame #0: std::function<std::string ()>::operator()() const + 0x11 (0x7f6cd6bd1441 in /opt/conda/lib/python3.6/site-packages/torch/lib/libc10.so)
frame #1: c10::Error::Error(c10::SourceLocation, std::string const&) + 0x2a (0x7f6cd6bd0d7a in /opt/conda/lib/python3.6/site-packages/torch/lib/libc10.so)
frame #2: Sum_Square_Forward_CUDA(at::Tensor) + 0x281 (0x7f6cbdb615e2 in /tmp/cars_segmentation/torchcv/extensions/ops/sync_bn/src/gpu/syncbn_gpu.cpython-36m-x86_64-linux-gnu.so)
frame #3: + 0x1fc29 (0x7f6cbdb57c29 in /tmp/cars_segmentation/torchcv/extensions/ops/sync_bn/src/gpu/syncbn_gpu.cpython-36m-x86_64-linux-gnu.so)
frame #4: + 0x24095 (0x7f6cbdb5c095 in /tmp/cars_segmentation/torchcv/extensions/ops/sync_bn/src/gpu/syncbn_gpu.cpython-36m-x86_64-linux-gnu.so)

I get this error when I train. Can you help me solve it? Thanks.

TypeError with PyTorch 1.2

Hi,

When running Faster R-CNN, I am getting the following error:

2019-08-27 23:24:51,231 INFO    [controller.py, 28] Training start...
Traceback (most recent call last):
  File "main.py", line 181, in <module>
    Controller.train(runner)
  File "/ssd_scratch/cvit/aditya/torchcv/runner/tools/controller.py", line 37, in train
    runner.train()
  File "/ssd_scratch/cvit/aditya/torchcv/runner/det/faster_rcnn.py", line 85, in train
    out_dict = self.det_net(data_dict)
  File "/home/aditya.a/Libraries/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/ssd_scratch/cvit/aditya/torchcv/model/det/nets/faster_rcnn.py", line 74, in forward
    x = self.backbone(data_dict['img'])
  File "/home/aditya.a/Libraries/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/aditya.a/Libraries/anaconda3/lib/python3.7/site-packages/torch/nn/modules/container.py", line 92, in forward
    input = module(input)
  File "/home/aditya.a/Libraries/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/aditya.a/Libraries/anaconda3/lib/python3.7/site-packages/torch/nn/modules/conv.py", line 338, in forward
    self.padding, self.dilation, self.groups)
TypeError: conv2d(): argument 'input' (position 1) must be Tensor, not DataContainer

Any idea how to fix this? I am on PyTorch 1.2 and Python 3.7.

OpenPose

Could you provide the performance numbers or a pretrained OpenPose model on COCO?

How to train models on custom dataset?

I have a custom dataset following the COCO format for object detection and segmentation. How can I train models on my own dataset? It would be appreciated if you could provide some tutorials.

SSD512-->SSDHead-->self.feature6

Two questions:

  1. self.feature6 does not have ReLU layers:
        self.feature6 = nn.Sequential(
            nn.Conv2d(256, 128, kernel_size=1, stride=1),
            nn.Conv2d(128, 256, kernel_size=4, stride=1, padding=1))

Is it supposed to be like this?

        self.feature6 = nn.Sequential(
            nn.Conv2d(256, 128, kernel_size=1, stride=1),
            nn.ReLU(),
            nn.Conv2d(128, 256, kernel_size=4, stride=1, padding=1),
            nn.ReLU())
  2. The input feature's size to this layer is torch.Size([batch_size, 256, 2, 2]), so why use kernel_size=4 instead of kernel_size=2? (A quick shape check follows below.)

Thank you again in advance for your great help!
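
For what it's worth, the output size of this layer can be checked directly. A small, self-contained shape check; the layer definition is copied from the snippet above, not taken from the repo's exact code:

    import torch
    import torch.nn as nn

    feature6 = nn.Sequential(
        nn.Conv2d(256, 128, kernel_size=1, stride=1),
        nn.Conv2d(128, 256, kernel_size=4, stride=1, padding=1),
    )

    x = torch.randn(1, 256, 2, 2)            # the 2x2 input mentioned in the question
    print(feature6(x).shape)                 # torch.Size([1, 256, 1, 1])
    # With kernel_size=4 and padding=1: out = (2 + 2*1 - 4) + 1 = 1, i.e. a 1x1 map.
    # A kernel_size=2 conv with padding=1 would instead give (2 + 2 - 2) + 1 = 3.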

How to compile and use roi_pool on Windows

I used Git to run make.sh under the exts folder, and a build folder appeared under both the align and pool folders. What do I need to do next so that import exts.ops.roi_align works normally? Right now, importing exts.ops.roi_align gives:

1 from torch.autograd import Function
2
----> 3 from .. import roi_align_cuda
4
5

ImportError: cannot import name 'roi_align_cuda'

No module named 'extensions.layers.nms.src.cython_nms'

Hi,

Great job! It is a great framework to learn pytorch and latest networks.

But when I train the OpenPose network, it reports No module named 'extensions.layers.nms.src.cython_nms'. I checked the files and found that this file is missing.

Is there anything I am missing?

[ERROR] DeepLabV3+DenseNet Backbone

Hi,

I have a question when I try to train DeepLabV3 with DenseNet backbone.

Regarding the loss term of DeepLabV3:

I wonder why you use this auxiliary loss term. As far as I know, the original DeepLabV3 was not trained with an auxiliary cross-entropy loss.

Since your DeepLabV3 code is hard-coded, it should also be fixed for other backbones (e.g., DenseNet).

Thanks

How to train the icnet model in the folder(demos/personseg/models)

Excuse me, the code 'person_segmentor.py' in the demos folder needs pretrained models referenced in the following code:
'fabby_model': './model/ICNet32Photo640x480_iter_310000.pth',
'trimap_model': './model/trimap_icnet32SimpleV2_iter_120000.pth',
'mat_model': './model/trimap_icnet32SimpleV2_matting_iter_120000.pth'

How can I get these pretrained models, or how can I train them myself?
Thank you.

Segmentation aux-out loss

Could you point me to the theoretical rationale (e.g., a related paper) for the aux-out loss? The DeepLabV3 network in your code outputs an additional prediction without ASPP.

yolov3 fails when running on the COCO2017 dataset

Hello. From your profile you appear to be at Peking University, so Chinese should be fine; I'll write in Chinese. I want to run yolov3 on the COCO2017 data. Looking at the project structure, I needed to write a scripts file, which I did, but after adding it an error occurred when training started.
The script file:
#!/usr/bin/env bash

nvidia-smi
PYTHON="python"

export PYTHONPATH="/home/dezheng/work/torchcv":$PYTHONPATH

cd ../../../

DATA_DIR="/home/dezheng/work/torchcv/data/torchcv_coco"
MODEL_NAME="yolov3"
LOSS_TYPE="yolov3loss"
CHECKPOINTS_NAME="yolov3_darknet_coco_det"$2
PRETRAINED_MODEL="./pretrained_models/yolov3_darknet_caffe_pretrained.pth"
HYPES_FILE='hypes/det/coco/yolov3_darknet_coco_det.json'

LOG_DIR="./log/det/coco/"
LOG_FILE="${LOG_DIR}${CHECKPOINTS_NAME}.log"

if [[ ! -d ${LOG_DIR} ]]; then
echo ${LOG_DIR}" not exists!!!"
mkdir -p ${LOG_DIR}
fi

if [[ "$1"x == "train"x ]]; then
${PYTHON} -u main.py --hypes ${HYPES_FILE} --phase train --log_to_file n --gpu 0 --cudnn n
--data_dir ${DATA_DIR} --loss_type ${LOSS_TYPE} --model_name ${MODEL_NAME}
--checkpoints_name ${CHECKPOINTS_NAME} --pretrained ${PRETRAINED_MODEL} 2>&1 | tee ${LOG_FILE}

The final error is as follows:
Traceback (most recent call last):
File "main.py", line 186, in <module>
runner = method_selector.select_det_method()
File "/home/dezheng/work/torchcv/methods/method_selector.py", line 93, in select_det_method
return DET_METHOD_DICT[key]
File "/home/dezheng/work/torchcv/methods/det/yolov3.py", line 35, in __init__
self.det_data_loader = DataLoader(configer)
File "/home/dezheng/work/torchcv/datasets/det/data_loader.py", line 23, in __init__
self.aug_train_transform = pil_aug_trans.PILAugCompose(self.configer, split='train')
File "/home/dezheng/work/torchcv/datasets/tools/pil_aug_transforms.py", line 1187, in __init__
self.transforms[trans] = PIL_AUGMENTATIONS_DICT[trans]
TypeError: __init__() got an unexpected keyword argument 'center_jitter'
How should I fix this?

Question about "ignore_index": 19

In hypes/seg/cityscape/*_cityscape_seg.json,
"ignore_index": 19
But in datasets/seg/fs_data_loader.py:
encoded_labelmap = np.ones(shape=(shape[0], shape[1]), dtype=np.float32) * 255
so labels treated as void will be equal to 255.
My question is: should ignore_index be equal to 255?
Looking forward to your reply.
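
For context, PyTorch's cross-entropy loss only skips pixels whose label equals the ignore_index passed to the loss, so the label encoding and the config value must agree. A minimal sketch (whether 255 or 19 is correct depends on which value the loss in this repo is actually configured with):

    import torch
    import torch.nn as nn

    num_classes = 19
    logits = torch.randn(1, num_classes, 4, 4)        # fake network output
    labels = torch.randint(0, num_classes, (1, 4, 4))
    labels[0, 0, 0] = 255                             # a "void" pixel encoded as 255

    # Pixels labeled 255 are only excluded if ignore_index is also 255.
    loss = nn.CrossEntropyLoss(ignore_index=255)(logits, labels)
    print(loss.item())

    # With ignore_index=19, a label of 255 would be treated as an (invalid) class
    # index and raise an error instead of being ignored.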

pytorch1.0 make error

Hello, when I used torch 0.4.1 I could run make successfully, but when I switched to PyTorch 1.0 it failed...

pytorch1.1 running error

It seems that TorchCV runs with PyTorch 1.0, but with PyTorch 1.1 I get the error "RuntimeError: CUDA error: an illegal memory access was encountered" during training. Could you help me with this? I don't want to use PyTorch 1.0...

When I cd into extensions and run sh make.sh, I get the following error.

In file included from /home/user/.local/lib/python3.6/site-packages/torch/lib/include/THC/THC.h:4:0,
                 from /home/user/.local/lib/python3.6/site-packages/torch/lib/include/THC/THCAtomics.cuh:4,
                 from src/roi_align_kernel.cu:2:
/home/user/.local/lib/python3.6/site-packages/torch/lib/include/THC/THCGeneral.h:17:23: fatal error: cublas_v2.h: No such file or directory
 #include "cublas_v2.h"
                       ^
compilation terminated.
error: command '/usr/local/cuda-9.0/bin/nvcc' failed with exit status 1
Building roi pool op...


I use PyTorch 1.0, CUDA 9.0 and Python 3.6.
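
One thing worth checking (an assumption, not a confirmed fix for this repo) is which CUDA toolkit PyTorch's extension builder resolves, since cublas_v2.h lives under $CUDA_HOME/include; a mismatched or incomplete CUDA install is a common cause of this nvcc failure:

    # Inspect the CUDA toolkit that torch's C++/CUDA extension builder will use.
    import os
    from torch.utils.cpp_extension import CUDA_HOME

    print("CUDA_HOME:", CUDA_HOME)                      # e.g. /usr/local/cuda-9.0
    if CUDA_HOME is not None:
        header = os.path.join(CUDA_HOME, "include", "cublas_v2.h")
        print("cublas_v2.h present:", os.path.isfile(header))
    print("PATH mentions a CUDA dir:",
          any("cuda" in p.lower() for p in os.environ.get("PATH", "").split(os.pathsep)))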

Openpose train problem

At training iteration 68000 the loss is still nan.
I thought the problem was missing pretrained backbone weights.
Now I use the weights downloaded from the URL that appears in model/model/backbones/vgg/vgg_models.py, and I noticed it is not a trained OpenPose backbone. But I can't find a better choice; could you provide pretrained backbone weights?

limit the number of boxes before nms/cls_nms

In utils/helpers/det_helper.py, the nms/cls_nms function doesn't limit the number of boxes before nms (cls_nms), so at the beginning of training it takes too long because a huge number of boxes is produced.
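
As an illustration of this suggestion, here is a hedged sketch of keeping only the top-k highest-scoring boxes before calling NMS; the helper and its name are hypothetical, not the repo's DetHelper API:

    import torch

    def topk_before_nms(boxes, scores, pre_nms_top_k=1000):
        """Keep only the pre_nms_top_k highest-scoring boxes before running NMS.
        boxes: (N, 4) tensor, scores: (N,) tensor. Purely illustrative."""
        if scores.numel() > pre_nms_top_k:
            scores, keep = scores.topk(pre_nms_top_k)
            boxes = boxes[keep]
        return boxes, scores

    # Early in training, many low-quality boxes may survive the score threshold;
    # capping them keeps the NMS cost bounded.
    boxes, scores = topk_before_nms(torch.rand(50000, 4), torch.rand(50000))
    print(boxes.shape, scores.shape)   # torch.Size([1000, 4]) torch.Size([1000])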

simultaneous object detection and human pose estimation

This repo is an amazing idea! I have a question about its usage: would it be possible to do simultaneous object detection and human pose estimation?

I'd like to test on an image using SSD for object detection and OpenPose for human pose estimation.

If it's possible, I'd greatly appreciate the steps/code to accomplish this.

Thanks,

Got loss nan after several epochs while training yolov3

Hi, I have tested both the fix_size and multi_size modes, and the same problem arises. The loss was decreasing before the problem suddenly occurred. Did you have this problem when training yolov3, or do you know what causes it?

Sorry for the inconvenience, and many thanks.

2018-11-09 10:32:38,109 INFO    [yolov3.py, 97] LR: [0.0005412580317889754, 0.005412580317889754]
2018-11-09 10:34:09,663 INFO    [yolov3.py, 153] Train Epoch: 2	Train Iteration: 3300	Time 91.554s / 100iters, (0.916)	Data load 0.018s / 100iters, (0.000178)
Learning rate = [0.001, 0.01]	Loss = 213.22541809 (ave = 149.97146935)

2018-11-09 10:34:09,663 INFO    [yolov3.py, 97] LR: [0.0005581670612106865, 0.005581670612106865]
2018-11-09 10:35:41,826 INFO    [yolov3.py, 153] Train Epoch: 2	Train Iteration: 3400	Time 92.163s / 100iters, (0.922)	Data load 0.018s / 100iters, (0.000181)
Learning rate = [0.001, 0.01]	Loss = 149.52812195 (ave = 151.01665321)

2018-11-09 10:35:41,827 INFO    [yolov3.py, 97] LR: [0.0005750760906323977, 0.005750760906323977]
2018-11-09 10:37:13,627 INFO    [yolov3.py, 153] Train Epoch: 2	Train Iteration: 3500	Time 91.801s / 100iters, (0.918)	Data load 0.018s / 100iters, (0.000183)
Learning rate = [0.001, 0.01]	Loss = 178.01943970 (ave = 152.36151611)

2018-11-09 10:37:13,628 INFO    [yolov3.py, 97] LR: [0.0005919851200541089, 0.005919851200541088]
2018-11-09 10:38:44,910 INFO    [yolov3.py, 153] Train Epoch: 2	Train Iteration: 3600	Time 91.282s / 100iters, (0.913)	Data load 0.018s / 100iters, (0.000178)
Learning rate = [0.001, 0.01]	Loss = nan (ave = nan)

2018-11-09 10:38:44,910 INFO    [yolov3.py, 97] LR: [0.0006088941494758201, 0.006088941494758201]
2018-11-09 10:40:14,760 INFO    [yolov3.py, 153] Train Epoch: 2	Train Iteration: 3700	Time 89.850s / 100iters, (0.898)	Data load 0.018s / 100iters, (0.000177)
Learning rate = [0.001, 0.01]	Loss = nan (ave = nan)

Configuration file for OpenPose?

Hi, I think there are a lot of mismatches between open_pose_vgg_19.conf and the modules in your code. I revised most of them but still face a lot of undefined ones.
Most of them are KeyErrors, and some may come from the exts/tools/ directory.

I was wondering if there is an updated configuration file for OpenPose. I would appreciate an update.

AssertionError: <class 'torch.Tensor'>

2019-10-12 13:36:13,632 INFO [runner_helper.py, 63] Resuming from ./checkpoints/seg/cityscapes/fs_deeplabv3_cityscapes_segtag_latest.pth
2019-10-12 13:36:15,737 INFO [controller.py, 67] Testing start...
Traceback (most recent call last):
File "main.py", line 186, in
Controller.test(runner)
File "/home/spple/paddle/DeepGlint/torchcv/runner/tools/controller.py", line 79, in test
runner.test(test_dir, out_dir)
File "/home/spple/paddle/DeepGlint/torchcv/runner/seg/fcn_segmentor_test.py", line 48, in test
total_logits = self.ss_test(data_dict)
File "/home/spple/paddle/DeepGlint/torchcv/runner/seg/fcn_segmentor_test.py", line 85, in ss_test
data_dict = self.blob_helper.get_blob(in_data_dict, scale=1.0)
File "/home/spple/paddle/DeepGlint/torchcv/runner/tools/blob_helper.py", line 25, in get_blob
for image, meta in zip(DCHelper.tolist(data_dict['img']), DCHelper.tolist(data_dict['meta'])):
File "/home/spple/paddle/DeepGlint/torchcv/tools/helper/dc_helper.py", line 19, in tolist
assert isinstance(dc, DataContainer), type(dc)
AssertionError: <class 'torch.Tensor'>
2019-10-12 13:36:17,599 INFO [seg_evaluator.py, 51] Evaluate 0 images
/home/spple/paddle/DeepGlint/torchcv/metric/seg/seg_running_score.py:37: RuntimeWarning: invalid value encountered in double_scalars
acc = np.diag(hist).sum() / hist.sum()
/home/spple/paddle/DeepGlint/torchcv/metric/seg/seg_running_score.py:38: RuntimeWarning: invalid value encountered in true_divide
acc_cls = np.diag(hist) / hist.sum(axis=1)
/home/spple/paddle/DeepGlint/torchcv/metric/seg/seg_running_score.py:39: RuntimeWarning: Mean of empty slice
acc_cls = np.nanmean(acc_cls)
/home/spple/paddle/DeepGlint/torchcv/metric/seg/seg_running_score.py:40: RuntimeWarning: invalid value encountered in true_divide
iu = np.diag(hist) / (hist.sum(axis=1) + hist.sum(axis=0) - np.diag(hist))
/home/spple/paddle/DeepGlint/torchcv/metric/seg/seg_running_score.py:41: RuntimeWarning: Mean of empty slice
mean_iu = np.nanmean(iu)
/home/spple/paddle/DeepGlint/torchcv/metric/seg/seg_running_score.py:42: RuntimeWarning: invalid value encountered in true_divide
freq = hist.sum(axis=1) / hist.sum()
/home/spple/paddle/DeepGlint/torchcv/metric/seg/seg_running_score.py:43: RuntimeWarning: invalid value encountered in greater
fwavacc = (freq[freq > 0] * iu[freq > 0]).sum()
2019-10-12 13:36:17,641 INFO [seg_evaluator.py, 52] Class mIOU: {0: nan, 1: nan, 2: nan, 3: nan, 4: nan, 5: nan, 6: nan, 7: nan, 8: nan, 9: nan, 10: nan, 11: nan, 12: nan, 13: nan, 14: nan, 15: nan, 16: nan, 17: nan, 18: nan}
2019-10-12 13:36:17,642 INFO [seg_evaluator.py, 53] mIOU: nan
2019-10-12 13:36:17,643 INFO [seg_evaluator.py, 54] Pixel ACC: nan

DataParallel leads to an error in CIFAR-10 classification?

I am trying to train a CIFAR-10 classifier on multiple GPUs. When I run the script for this, it leads to an error while calculating the loss, as follows:

[screenshot of the error]

However, when I use a single GPU, it works fine. How can I fix this?

No module named 'datasets.tools.data_transformer'

Dear Donny:

I have already run make.sh.

When I run "python3 main.py", an error occurs; it seems some files are missing:

Traceback (most recent call last):
File "main.py", line 15, in
from methods.method_selector import MethodSelector
File "/home/zhaolun/Projects/PyTorchCV/methods/method_selector.py", line 12, in
from methods.det.faster_rcnn import FasterRCNN
File "/home/zhaolun/Projects/PyTorchCV/methods/det/faster_rcnn.py", line 17, in
from methods.det.faster_rcnn_test import FastRCNNTest
File "/home/zhaolun/Projects/PyTorchCV/methods/det/faster_rcnn_test.py", line 17, in
from datasets.tools.data_transformer import DataTransformer
ImportError: No module named 'datasets.tools.data_transformer'

Do you have any idea why this happens?

Error when running the test

python main.py --hypes=hypes/det/voc/ssd300_vgg16_voc_det.json --phase=test --resume=checkpoints/det/voc/ssd_vgg300_voc_0.786.pth --test_img=test/ski.jpg --gpu=0 --model_name=vgg16_ssd300

ERROR:
AttributeError: 'SingleShotDetectorTest' object has no attribute 'test_img'

Clueless halt

After your refactoring, I also got this error. It appears with no pattern; it just shows up randomly sometimes, and I would say it happens quite often.

terminate called after throwing an instance of 'std::system_error'
  what():  Broken pipe

I have no clue why it is happening; I hope you have more insight.

F.interpolate

In datasets/tools/collate_functions.py, line 119:
the target size for F.interpolate should be (height, width) instead of (width, height), shouldn't it?
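
For reference, F.interpolate's size argument for 4D input is indeed (height, width), which can be checked directly with a small sketch:

    import torch
    import torch.nn.functional as F

    x = torch.randn(1, 3, 10, 20)                    # (N, C, H=10, W=20)
    y = F.interpolate(x, size=(30, 40), mode="bilinear", align_corners=False)
    print(y.shape)                                   # torch.Size([1, 3, 30, 40]) -> size is (H, W)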

yolov3 loss error

In utils/layers/det/yolo_detection_layer.py:
I believe line 42 (if self.configer.get('phase') != 'debug':) should be removed,
because the values passed to the BCE loss should be in the range (0, 1) in the training phase.
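
For background, nn.BCELoss expects probabilities in (0, 1), so predictions normally pass through a sigmoid first (or BCEWithLogitsLoss is used on raw logits). A minimal illustration, independent of the repo's exact layer:

    import torch
    import torch.nn as nn

    raw = torch.randn(4)                 # unbounded network outputs (logits)
    target = torch.tensor([0., 1., 1., 0.])

    # Option 1: squash to (0, 1) before BCELoss.
    loss_a = nn.BCELoss()(torch.sigmoid(raw), target)

    # Option 2: equivalent and numerically safer on raw logits.
    loss_b = nn.BCEWithLogitsLoss()(raw, target)

    print(loss_a.item(), loss_b.item())  # the two values agree up to numerical precision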

Error when running the test

python main.py --hypes=hypes/det/voc/ssd300_vgg16_voc_det.json --phase=test --resume=checkpoints/det/voc/ssd_vgg300_voc_0.786.pth --test_img=test/ski.jpg --gpu=0 --model_name=vgg16_ssd300

Question about SSDFocalLoss?

in the code :

    logit = F.softmax(x,dim=-1)
    logit = logit.clamp(1e-7, 1.-1e-7)
    conf_loss_tmp = -1 * t.float() * torch.log(logit)
    conf_loss_tmp = alpha * conf_loss_tmp * (1-logit)**gamma
    conf_loss = conf_loss_tmp.sum()

Here the positive and negative samples share the same alpha=0.25, but some implementations use alpha=0.25 for positives and 1-0.25 for negatives. What is the difference?
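
To make the alternative concrete, here is a hedged sketch of the alpha_t convention, where positives are weighted by alpha and negatives by 1-alpha (this is the formulation from the RetinaNet focal-loss paper, not necessarily what this repo intends):

    import torch
    import torch.nn.functional as F

    def focal_loss_alpha_t(x, t, alpha=0.25, gamma=2.0, eps=1e-7):
        """x: (N, C) logits, t: (N, C) one-hot targets. Illustrative only."""
        p = F.softmax(x, dim=-1).clamp(eps, 1. - eps)
        # p_t is the probability of the true outcome; alpha_t differs for pos/neg.
        p_t = torch.where(t > 0, p, 1. - p)
        alpha_t = torch.where(t > 0, torch.full_like(p, alpha), torch.full_like(p, 1. - alpha))
        return (-alpha_t * (1. - p_t) ** gamma * p_t.log()).sum()

    x = torch.randn(8, 21)                       # e.g. 20 classes + background
    t = F.one_hot(torch.randint(0, 21, (8,)), num_classes=21).float()
    print(focal_loss_alpha_t(x, t).item())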

HeatmapGenerator and PafGenerator run so slowly

In methods/pose/open_pose.py, when the batch size equals 32,

heatmap = self.heatmap_generator(data_dict['kpts'], input_size, maskmap=maskmap)
vecmap = self.paf_generator(data_dict['kpts'], input_size, maskmap=maskmap)

These two lines take about 40 seconds per iteration, which is far too time-consuming; the network itself runs in only 0.03 seconds.

Are there any details for reducing the running time that I have missed?
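
In case a comparison helps, here is a generic, vectorized way to render Gaussian keypoint heatmaps with NumPy; this is not the repo's HeatmapGenerator, and the sigma/size values are placeholders:

    import numpy as np

    def gaussian_heatmaps(kpts, height, width, sigma=7.0):
        """kpts: list of (x, y) keypoint coordinates -> (K, H, W) heatmaps.
        Uses a broadcasted expression per keypoint instead of looping over
        every pixel, which is usually the main source of slowness."""
        ys, xs = np.mgrid[0:height, 0:width]          # (H, W) coordinate grids
        maps = np.zeros((len(kpts), height, width), dtype=np.float32)
        for k, (x, y) in enumerate(kpts):
            d2 = (xs - x) ** 2 + (ys - y) ** 2
            maps[k] = np.exp(-d2 / (2.0 * sigma ** 2))
        return maps

    heatmaps = gaussian_heatmaps([(100, 80), (200, 150)], height=368, width=368)
    print(heatmaps.shape)                             # (2, 368, 368)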

KeyError when loading ssd512 pretrained model

Hi, donnyyou.
After downloading the pretrained SSD512 .pth model, I run
bash run_ssd512_vgg16_voc_det.sh train tag
Then I got:

2019-05-23 10:23:17,616 INFO [vgg512_ssd.py, 60] Loading pretrained model:./pretrained_models/ssd_vgg512_voc_0.808.pth
2019-05-23 10:23:20,996 INFO [vgg512_ssd.py, 63] Pretrained Keys: dict_keys(['config_dict', 'state_dict'])
2019-05-23 10:23:20,997 INFO [vgg512_ssd.py, 65] Model Keys: odict_keys(['features.0.weight', 'features.0.bias', 'features.2.weight', 'features.2.bias', 'features.5.weight', 'features.5.bias', 'features.7.weight', 'features.7.bias', 'features.10.weight', 'features.10.bias', 'features.12.weight', 'features.12.bias', 'features.14.weight', 'features.14.bias', 'features.17.weight', 'features.17.bias', 'features.19.weight', 'features.19.bias', 'features.21.weight', 'features.21.bias', 'features.24.weight', 'features.24.bias', 'features.26.weight', 'features.26.bias', 'features.28.weight', 'features.28.bias', 'features.31.weight', 'features.31.bias', 'features.33.weight', 'features.33.bias'])
2019-05-23 10:23:20,997 INFO [vgg512_ssd.py, 75] Matched Keys: dict_keys([])
2019-05-23 10:23:21,067 ERROR [configer.py, 69] ssd_detection_layer.py, 16 KeyError: ('gt', 'num_anchor_list').

Since your SSD script is 404 (not found), I modified the Faster R-CNN script for SSD as below:

#!/usr/bin/env bash

# check the environment info
nvidia-smi
PYTHON="python"

export PYTHONPATH="/home/ruijin/Work/python/torchcv-master":$PYTHONPATH

cd ../../../

DATA_DIR="/home/donny/DataSet/VOC07_DET"
MODEL_NAME="vgg512_ssd"
LOSS_TYPE="ssd_multibox_loss"
CHECKPOINTS_NAME="ssd_vgg16_voc_det"$2
PRETRAINED_MODEL="./pretrained_models/ssd_vgg512_voc_0.808.pth"
HYPES_FILE='hypes/det/voc/ssd512_vgg16_voc_det.json'

LOG_DIR="./log/det/voc/"
LOG_FILE="${LOG_DIR}${CHECKPOINTS_NAME}.log"

if [[ ! -d ${LOG_DIR} ]]; then
echo ${LOG_DIR}" not exists!!!"
mkdir -p ${LOG_DIR}
fi

if [[ "$1"x == "train"x ]]; then
${PYTHON} -u main.py --hypes ${HYPES_FILE} --phase train --log_to_file n --gpu 0 --cudnn n
--data_dir ${DATA_DIR} --loss_type ${LOSS_TYPE} --model_name ${MODEL_NAME}
--checkpoints_name ${CHECKPOINTS_NAME} --pretrained ${PRETRAINED_MODEL} 2>&1 | tee ${LOG_FILE}

elif [[ "$1"x == "resume"x ]]; then
${PYTHON} -u main.py --hypes ${HYPES_FILE} --phase train --log_to_file n --gpu 0 --cudnn n
--data_dir ${DATA_DIR} --loss_type ${LOSS_TYPE} --model_name ${MODEL_NAME}
--resume_continue y --resume ./checkpoints/cityscapes/${CHECKPOINTS_NAME}_latest.pth
--checkpoints_name ${CHECKPOINTS_NAME} --pretrained ${PRETRAINED_MODEL} 2>&1 | tee -a ${LOG_FILE}

elif [[ "$1"x == "debug"x ]]; then
${PYTHON} -u main.py --hypes ${HYPES_FILE} --phase debug --gpu 0 --log_to_file n 2>&1 | tee ${LOG_FILE}

elif [[ "$1"x == "val"x ]]; then
${PYTHON} -u main.py --hypes ${HYPES_FILE} --phase test --log_to_file n --model_name ${MODEL_NAME}
--phase test --gpu 0 --resume ./checkpoints/cityscapes/${CHECKPOINTS_NAME}_latest.pth
--test_dir ${DATA_DIR}/val/image --out_dir val 2>&1 | tee -a ${LOG_FILE}
cd metrics/det/
${PYTHON} -u voc_evaluator.py --hypes "../../../"${HYPES_FILE}
--json_dir ../../../out/results/voc/test_dir/${CHECKPOINTS_NAME}/val/label
--gt_dir ${DATA_DIR}/val/label 2>&1 | tee -a "../../"${LOG_FILE}

else
echo "$1"x" is invalid..."
fi

Hope you can help me. Thanks!

About DenseASPP

Hi, I want to try training the DenseASPP model with your code. Can I just change the .sh, or could you provide the script for DenseASPP on Cityscapes? Thank you.

mask rcnn is empty.

The files methods/seg/mask_rcnn.py and methods/seg/mask_rcnn_test.py are 0 bytes.
