nv-tlabs / gscnn

Gated-Shape CNN for Semantic Segmentation (ICCV 2019)

Home Page: https://nv-tlabs.github.io/GSCNN/

License: Other

Python 99.53% Dockerfile 0.47%

semantic-segmentation deep-learning iccv2019 computer-vision pytorch nv-tlabs semantic-boundaries

gscnn's Introduction

GSCNN

This is the official code for:

Gated-SCNN: Gated Shape CNNs for Semantic Segmentation

Towaki Takikawa, David Acuna, Varun Jampani, Sanja Fidler

ICCV 2019 [Paper] [Project Page]

Based on https://github.com/NVIDIA/semantic-segmentation.

License

Copyright (C) 2019 NVIDIA Corporation. Towaki Takikawa, David Acuna, Varun Jampani, Sanja Fidler
All rights reserved.
Licensed under the CC BY-NC-SA 4.0 license (https://creativecommons.org/licenses/by-nc-sa/4.0/legalcode).

Permission to use, copy, modify, and distribute this software and its documentation
for any non-commercial purpose is hereby granted without fee, provided that the above
copyright notice appear in all copies and that both that copyright notice and this
permission notice appear in supporting documentation, and that the name of the author
not be used in advertising or publicity pertaining to distribution of the software
without specific, written prior permission.

THE AUTHOR DISCLAIMS ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ALL
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR ANY PARTICULAR PURPOSE.
IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL
DAMAGES OR ANY DAMAGES WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS,
WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING
OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.

Usage

Clone this repo

git clone https://github.com/nv-tlabs/GSCNN
cd GSCNN

Python requirements

Currently, the code supports Python 3

numpy
PyTorch (>=1.1.0)
torchvision
scipy
scikit-image
tensorboardX
tqdm
torch-encoding
opencv
PyYAML

Download pretrained models

Download the pretrained model from the Google Drive Folder, and save it in 'checkpoints/'

Download inferred images

Download (if needed) the inferred images from the Google Drive Folder

Evaluation (Cityscapes)

python train.py --evaluate --snapshot checkpoints/best_cityscapes_checkpoint.pth

Training

A note on training: we train on 8 NVIDIA GPUs, so training with the WiderResNet38 trunk is likely to be a problem if you try to train on a single GPU.

If you use this code, please cite:

@article{takikawa2019gated,
  title={Gated-SCNN: Gated Shape CNNs for Semantic Segmentation},
  author={Takikawa, Towaki and Acuna, David and Jampani, Varun and Fidler, Sanja},
  journal={ICCV},
  year={2019}
}


gscnn's Issues

What are the shapes of map and _edgemap?

Thanks for sharing the great project.

If BxCxHxW is the shape of the input image, what are the shapes of mask and _edgemap?

Thanks

Question about paper.

Thanks for sharing.

Several questions:

Is Eq. (6) missing a minus sign?
How is the image gradient that is fed to the shape stream obtained? With a Sobel filter?
The regular stream outputs multi-scale feature maps. How are the attention maps always obtained at a resolution of HxW?

Unused feature in forward pass?

I'm just reading the paper and looking at the code to follow along. In doing so I've gotten a bit confused because there seems to be an extra shape feature map in the code. In networks/gscnn.py on line 274 we have

s1 = F.interpolate(self.dsn1(m1), x_size[2:], mode='bilinear', align_corners=True)

but this is never output or used. Following the paper I would expect this to be the input to the self.res1 unit. However, this is not the case. What is actually put through that is

m1f = F.interpolate(m1, x_size[2:], mode='bilinear', align_corners=True)
cs = self.res1(m1f)

Something is off here as we calculate s1 but do nothing with it. My question is, should s1 replace m1f or should it just be deleted?

a sigmoid function σ

Do the variables s and r in Equation 1 come from different CNN backbones? Do these two network structures need to be trained separately?

ImportError: No module named 'enclib_cpu'

I'm trying to fix this problem, but I can find very little information about it.
I'm using the Docker image nvcr.io/nvidia/pytorch:18.10-py03 from NGC.
After changing ['ninja', '-v'] to ['ninja', '--version'], I run

python train.py --evaluate --snapshot checkpoints/best_cityscapes_checkpoint.pth

and hit the problem below:
No CUDA runtime is found, using CUDA_HOME='/usr/local/cuda'
Traceback (most recent call last):
File "train.py", line 381, in <module>
main()
File "train.py", line 128, in main
assert_and_infer_cfg(args)
File "/usr/GSCNN/config.py", line 86, in assert_and_infer_cfg
import encoding
File "/opt/conda/lib/python3.6/site-packages/encoding/__init__.py", line 13, in <module>
from . import nn, functions, parallel, utils, models, datasets, transforms
File "/opt/conda/lib/python3.6/site-packages/encoding/nn/__init__.py", line 12, in <module>
from .encoding import *
File "/opt/conda/lib/python3.6/site-packages/encoding/nn/encoding.py", line 18, in <module>
from ..functions import scaled_l2, aggregate, pairwise_cosine
File "/opt/conda/lib/python3.6/site-packages/encoding/functions/__init__.py", line 2, in <module>
from .encoding import *
File "/opt/conda/lib/python3.6/site-packages/encoding/functions/encoding.py", line 14, in <module>
from .. import lib
File "/opt/conda/lib/python3.6/site-packages/encoding/lib/__init__.py", line 15, in <module>
], build_directory=cpu_path, verbose=False)
File "/opt/conda/lib/python3.6/site-packages/torch/utils/cpp_extension.py", line 661, in load
is_python_module)
File "/opt/conda/lib/python3.6/site-packages/torch/utils/cpp_extension.py", line 841, in _jit_compile
return _import_module_from_library(name, build_directory, is_python_module)
File "/opt/conda/lib/python3.6/site-packages/torch/utils/cpp_extension.py", line 1048, in _import_module_from_library
file, path, description = imp.find_module(module_name, [path])
File "/opt/conda/lib/python3.6/imp.py", line 297, in find_module
raise ImportError(_ERR_MSG.format(name), name=name)
ImportError: No module named 'enclib_cpu'

My prerequisite versions:
Python 3.6.6 :: Anaconda, Inc.
numpy 1.17.4
PyTorch 1.3.1
torchvision 0.2.1
scipy 1.1.0
scikit-image 0.16.2
tensorboardX 1.9
tqdm 4.26.0
torch-encoding 1.0.1
opencv-python 4.1.1.26
PyYAML 3.13

Has anyone met this kind of problem before, or could anyone offer some suggestions?

How to run this model for a custom dataset?

Hi! Thanks for the fantastic paper and for providing the code. Can you tell me how to run this model on my custom dataset? My training dataset has a set of images and their masks, and I would like the model to predict the masks for the test dataset.

_ConvNd from torch.nn.modules.conv is used with wrong arguments

Hi!
I am running
python train.py --evaluate --snapshot checkpoints/best_cityscapes_checkpoint.pth
in Docker using your Dockerfile. The following error occurs:

Traceback (most recent call last):
  File "train.py", line 381, in <module>
    main()
  File "train.py", line 132, in main
    net = network.get_net(args, criterion)
  File "/app/gscnn/network/__init__.py", line 12, in get_net
    criterion=criterion, trunk=args.trunk)
  File "/app/gscnn/network/__init__.py", line 27, in get_model
    net = net_func(num_classes=num_classes, trunk=trunk, criterion=criterion)
  File "/app/gscnn/network/gscnn.py", line 233, in __init__
    self.gate1 = gsc.GatedSpatialConv2d(32, 32)
  File "/app/gscnn/my_functionals/GatedSpatialConv.py", line 36, in __init__
    False, _pair(0), groups, bias, 'zeros')
TypeError: __init__() takes 11 positional arguments but 12 were given

The following lines in my_functionals/GatedSpatialConv.py cause the problem:

super(GatedSpatialConv2d, self).__init__(
            in_channels, out_channels, kernel_size, stride, padding, dilation,
            False, _pair(0), groups, bias, 'zeros')
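The message "takes 11 positional arguments but 12 were given" suggests that the installed base class does not accept the trailing 'zeros' argument. The following is a minimal sketch, with no PyTorch dependency, of choosing the argument list from the base class signature. OldConvNd and NewConvNd are hypothetical stand-ins for two variants of the base class, and conv_init_args is an illustrative helper that is not part of the repository.

```python
import inspect


class OldConvNd:
    """Hypothetical stand-in for a base class WITHOUT a padding_mode parameter."""

    def __init__(self, in_channels, out_channels, kernel_size, stride, padding,
                 dilation, transposed, output_padding, groups, bias):
        self.padding_mode = None


class NewConvNd:
    """Hypothetical stand-in for a base class WITH a padding_mode parameter."""

    def __init__(self, in_channels, out_channels, kernel_size, stride, padding,
                 dilation, transposed, output_padding, groups, bias, padding_mode):
        self.padding_mode = padding_mode


def conv_init_args(base_cls, common_args, padding_mode="zeros"):
    """Append padding_mode only if base_cls.__init__ declares that parameter."""
    params = inspect.signature(base_cls.__init__).parameters
    if "padding_mode" in params:
        return tuple(common_args) + (padding_mode,)
    return tuple(common_args)


# The ten arguments shared by both variants (values are placeholders).
common = (32, 32, (1, 1), (1, 1), (0, 0), (1, 1), False, (0, 0), 1, False)

old_layer = OldConvNd(*conv_init_args(OldConvNd, common))
new_layer = NewConvNd(*conv_init_args(NewConvNd, common))
print(old_layer.padding_mode, new_layer.padding_mode)
```

Passing all eleven values to OldConvNd would reproduce the same kind of TypeError, since its __init__ takes only ten arguments besides self. If the cause is indeed a version mismatch, installing the PyTorch version listed in the README is the simpler remedy.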

Better IU than in paper

Hi,

when executing

python train.py --evaluate --snapshot checkpoints/best_cityscapes_checkpoint.pth

I achieve a slightly better IoU value than in your arxiv paper. Is the arxiv paper not up-to-date or is this a bug?

TypeError: __init__() takes 11 positional arguments but 12 were given

How can I solve this problem?

convTri function in custom_functional.py doesn't do anything

There is a return before the actual body of the function. Is this expected behavior?

GSCNN/my_functionals/custom_functional.py

Line 93 in 3648b86

return input

Loss implementation seems different than equation (4) in the paper.

Looks like the loss implementation does not apply argmax to the semantic segmentation logits, right? But in the paper, equation (4) applies argmax before ConvTri and Gradient computation, right?
Which one is the one used to produce the experimental results?

Thank you,
Hai

ValueError: Expected more than 1 value per channel when training, got input size torch.Size([1, 256, 1, 1])

ValueError: Caught ValueError in replica 0 on device 0.
Original Traceback (most recent call last):
File "/home/renshasha/anaconda3/envs/aaa/lib/python3.7/site-packages/torch/nn/parallel/parallel_apply.py", line 60, in _worker
output = module(*input, **kwargs)
File "/home/renshasha/anaconda3/envs/aaa/lib/python3.7/site-packages/torch/nn/modules/module.py", line 547, in __call__
result = self.forward(*input, **kwargs)
File "/home/renshasha/GSCNN-master/network/gscnn.py", line 315, in forward
x = self.aspp(m7, acts)
File "/home/renshasha/anaconda3/envs/aaa/lib/python3.7/site-packages/torch/nn/modules/module.py", line 547, in __call__
result = self.forward(*input, **kwargs)
File "/home/renshasha/GSCNN-master/network/gscnn.py", line 167, in forward
img_features = self.img_conv(img_features)
File "/home/renshasha/anaconda3/envs/aaa/lib/python3.7/site-packages/torch/nn/modules/module.py", line 547, in __call__
result = self.forward(*input, **kwargs)
File "/home/renshasha/anaconda3/envs/aaa/lib/python3.7/site-packages/torch/nn/modules/container.py", line 92, in forward
input = module(input)
File "/home/renshasha/anaconda3/envs/aaa/lib/python3.7/site-packages/torch/nn/modules/module.py", line 547, in __call__
result = self.forward(*input, **kwargs)
File "/home/renshasha/anaconda3/envs/aaa/lib/python3.7/site-packages/torch/nn/modules/batchnorm.py", line 81, in forward
exponential_average_factor, self.eps)
File "/home/renshasha/anaconda3/envs/aaa/lib/python3.7/site-packages/torch/nn/functional.py", line 1652, in batch_norm
raise ValueError('Expected more than 1 value per channel when training, got input size {}'.format(size))
ValueError: Expected more than 1 value per channel when training, got input size torch.Size([1, 256, 1, 1])

Evaluation code bug

In train.py,

on line 139

if args.evaluate:
        # Early evaluation for benchmarking
        validate(val_loader, net, criterion_val,
                 optim, epoch, writer)
        evaluate(val_loader, net)
        return

The variable 'epoch' is used before being defined. I was able to work my way around this by just passing 0 or 1 in place of epoch, since for pure evaluation purposes this is irrelevant.

if args.evaluate:
        # Early evaluation for benchmarking
        validate(val_loader, net, criterion_val,
                 optim, 1, writer)
        evaluate(val_loader, net)
        return

Maybe this is worth looking into.

Best regards,
Shreyas

Unfair Comparison in your paper

After reading your paper and the discussion, we have a few questions.

On the leaderboard, we can see that you use Mapillary data. However, in Table 6 of the paper you compare against TKCN and AAF-PSP, which did not use Mapillary data. This is obviously unfair.
So what is the real result on the Cityscapes test set without Mapillary data, using only the fine annotations?
Also, we could not find any statement in your paper that Mapillary data was used. Why was it not mentioned?
In Table 3, why is the mIoU so low with your ResNet101 backbone? According to another repo, a ResNet101-based DeepLabv3+ should reach 78+. Repo: https://github.com/speedinghzl/pytorch-segmentation-toolbox

the problem of train

Hello, when I started training the network from scratch, I used four loss functions. The network has great difficulty converging, and all four losses keep oscillating. How do you train the network? Do you start with a single loss function?

RuntimeError:CUDA out of memory.

PyTorch version: 1.1.0.
I can print and visualize the network, but when I run train.py the program is killed at "seg_out, edge_out = net(input)".
I then tried "from thop import profile" to count the model's parameters and FLOPs, but that also failed:
"RuntimeError: module must have its parameters and buffers on device cuda:0 but found one of them on device: cpu".
Specifying the device as 'cuda:0' did not remove the error.
How can I resolve these errors? Can anyone tell me the parameter count and FLOPs of the GSCNN network, or, in other words, how much memory is needed to run it?

ImportError: cannot import name 'imsave'

While running train.py, I encountered the error "ImportError: cannot import name 'imsave'". This is caused by the installed version of scipy: in scipy (version>1.3) the function "imsave" has been removed, so installing an older scipy (1.2.0) fixes the error.
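As an alternative to pinning scipy, a fallback import can be sketched as below. This is only an illustration: resolve_imsave is a hypothetical helper, and using imageio.imwrite as the substitute is an assumption (it takes the path first and the array second, like the removed function). The repository itself may need further changes.

```python
import importlib


def resolve_imsave():
    """Return the first available image-writing function, or None if none is found."""
    # Candidates are tried in order; the second entry is an assumed replacement.
    candidates = [("scipy.misc", "imsave"), ("imageio", "imwrite")]
    for module_name, attr in candidates:
        try:
            module = importlib.import_module(module_name)
        except ImportError:
            continue
        func = getattr(module, attr, None)
        if callable(func):
            return func
    return None


imsave = resolve_imsave()
if imsave is None:
    print("No image writer found; install an older scipy or imageio.")
```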

Training script

Would you please provide the training script?

Thanks!

some problems about training

When I run 'python train.py --evaluate --snapshot checkpoints/best_cityscapes_checkpoint.pth', I get a CUDA out of memory error.

My GPU has 11 GB of memory. How can I solve this problem? Thanks.

Edge loss does not decline

I try to reproduce the results from the paper. I use the recommended hyper-parameters.

Namespace(adam=False, amsgrad=False, arch='network.gscnn.GSCNN', att_weight=1.0, batch_weighting=False, bblur=False, best_record={'acc_cls': 0, 'val_loss': 10000000000.0, 'mean_iu': 0, 'iter': 0, 'epoch': -1, 'fwavacc': 0, 'acc': 0}, bs_mult=1, bs_mult_val=2, ckpt='/experiments/image-segmentation/GSCNN/', color_aug=0.25, crop_size=720, cv=0, dataset='cityscapes', date_str='2019_12_09_15_39_04', dual_weight=1.0, dump_augmentation_images=False, edge_weight=20.0, eval_thresholds='0.0005,0.001875,0.00375,0.005', evaluate=False, exp='GSCNN-cityspace', exp_path='/experiments/image-segmentation/GSCNN/GSCNN-cityspace/testing', gblur=True, img_wt_loss=False, joint_edgeseg_loss=True, last_record={}, local_rank=0, lr=0.01, lr_schedule='poly', maxSkip=0, max_epoch=175, momentum=0.9, ngpu=1, poly_exp=1.0, pre_size=None, repoly=1.5, rescale=1.0, restore_optimizer=False, rotate=0, scale_max=2.0, scale_min=0.5, seg_weight=1.0, sgd=True, sgd_finetuned=False, snapshot=None, start_epoch=0, syncbn=True, tb_exp_path='/experiments/image-segmentation/GSCNN/GSCNN-cityspace/testing', tb_path='/experiments/image-segmentation/GSCNN/', tb_tag='', test_mode=False, trunk='resnet101', weight_decay=0.0001, world_size=1, wt_bound=1.0)

For training I use one Nvidia Tesla V100 Data Center GPU.

I have the following issue: all the IoUs go up and all the losses fall, except for the edge loss. Why is that? I attach a screenshot from my TensorBoard.

Thanks in advance for the response.

How to evaluate the edge score?

Hi! Thanks for your code. Could you tell me how to evaluate the edge score reported in your paper? I want to evaluate it locally. Also, how do you generate the edge results in your demo? They look great.

collect_arguments(Args &&...args) {\n ^\n/usr/local/lib/python3.6/dist-packages/torch/include/pybind11/cast.h:2094:1: note: template argument deduction/substitution failed:\nninja: build stopped: subcommand failed.\n'

I ran the command python train.py --evaluate --snapshot checkpoints/best_cityscapes_checkpoint.pth after changing the data directory to /home/usrname/data/cityscapes, and an error occurred involving /python3.6/dist-packages/torch/include/pybind11/cast

Can this project choose resnet50 or resnet101?

There is an args.trunk option that can be resnet50 or resnet101, but I can't find the relevant code. Does this project only use WiderResNet?

Binary semantic segmentation

Hi,

congrats on your work! I am wondering, could your method be applied to a binary semantic segmentation task? I am interested in segmenting roads from aerial imagery, by classifying roads as one class and all the remaining pixels as background class.

If this is possible, could you tell me which parts I should modify to train your network for a binary semantic segmentation task? Would it be beneficial to use your pre-trained model as a starting training point?

Have a nice day!

subprocess.CalledProcessError: Command '['ninja', '-v']'

hi,

when executing
python train.py --evaluate --snapshot checkpoints/best_cityscapes_checkpoint.pth

I get this error:
subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.

I'm using:
gcc=4.9.4
ninja=1.9.0
pytorch = 1.2.0
torchvision = 0.4.0
python = 3.6.9
cuda = 9.2
torch-encoding = 1.0.0

Could you share the package versions of your environment?

Is Canny indeed applied?

In the model's forward code there is a cv2.Canny function call:

GSCNN/network/gscnn.py

Lines 287 to 288 in 64f8487

    
for i in range(x_size[0]):
    canny[i] = cv2.Canny(im_arr[i],10,100)

It works for images of np.uint8 type. But the image was preprocessed in the dataset using torchvision transforms which converts image to range [0.0, 1.0]. Converting the image back to np.uint8 from the range [0.0, 1.0] leaves 0 and 1 values only which are lower than the lowest threshold in the Canny call - Canny will not find any edges on such an image.

So, I'm wondering whether this operation has any effect, or have I missed a conversion step?
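The concern can be illustrated without OpenCV. The sketch below assumes, as the issue does, a float image in [0.0, 1.0]; it compares a direct cast to np.uint8 with a cast after rescaling to [0, 255]. The synthetic image and variable names are illustrative only and are not taken from the repository.

```python
import numpy as np

# Synthetic float image in [0, 1]: dark left half, bright right half.
img = np.zeros((4, 8), dtype=np.float32)
img[:, 4:] = 1.0

direct = img.astype(np.uint8)            # values stay in {0, 1}
scaled = (img * 255.0).astype(np.uint8)  # values span {0, 255}


def max_step(a):
    """Largest horizontal intensity jump, computed in a signed type to avoid wraparound."""
    return int(np.abs(np.diff(a.astype(np.int16), axis=1)).max())


print(max_step(direct), max_step(scaled))
```

With the direct cast, the strongest intensity step is 1, well below the lower threshold of 10 in the quoted Canny call, whereas the rescaled image has a step of 255.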

Unable to train due to CUDA out of memory error in Google Colab with ultra low resolution pics

Hi, thanks for the repo. However, I am not able to run my training on the cityscapes dataset. I have around 50 images for training and about 10 for validation and testing each. I have reduced the image resolution to 128x128 and still, it gives the CUDA out of memory error. I am running this on Google Colab which has 12 GB of GPU memory. Can you tell me what I should do to be able to run this model? Any changes that have to be tweaked?
@shubhaminnani @tovacinni @varunjampani @davidjesusacu @ShreyasSkandanS

IoU on cityscapes test set cannot achieve your result reported in the paper

Your work is great and thanks for sharing the code.

However, when I use your pre-trained model on the Cityscapes test set and submit to the website, the average IoU is 79.7%, which is lower than your reported result of 82.8%.

Are there any other special techniques or settings you have adopted for testing?

Looking forward to your reply.

How can i use the pretrained model

I want to train on another dataset. How can I use the pretrained model?
Is this correct?
python train.py --snapshot checkpoints/best_cityscapes_checkpoint.pth

Anyone reproduced the results by training the model using this code?

Did anyone reproduce the results by training the model using this code?
It looks like there are a few bugs and discrepancies between the code and the paper. At least, in the paper there is an argmax operation in the loss (eq. (4) and (5)), but the code has no argmax operation. Based on the code, eq. (5) is applied to the output of the Gumbel softmax. Is this a bug or a typo?

Thank you,
Hai

how to test same on test dataset

@tovacinni @ShreyasSkandanS @varunjampani @davidjesusacu Hi, I have a question: how can I run prediction on the test data?
Thanks in advance.

When evaluating on other data, I get the error below. Any suggestions on how to fix it?
Error - Sizes of tensors must match except in dimension 0. Got 720 and 1080 in dimension 2

A few questions about your interesting results and outstanding architecture

Interesting results and an outstanding architecture! When can I read your code?
Did your team consider additional loss techniques, such as a VGG loss?
This architecture could be described as a variant of the encoder-decoder, right?

Not a question, but there is also a typo in the regular stream part on page 3 of the paper:
fully-convoutional => fully-convolutional

Thank you for reading.

Training from scratch error

I tried training on cityscapes from scratch using 'python train.py' and get this error:

raise ValueError('Expected more than 1 value per channel when training, got input size {}'.format(size))
ValueError: Expected more than 1 value per channel when training, got input size torch.Size([1, 256, 1, 1])

It is something in the _AtrousSpatialPyramidPoolingModule class in network/gscnn.py

img_features = self.img_conv(img_features)
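The reported input size torch.Size([1, 256, 1, 1]) indicates that batch normalization received a single value per channel, which happens when a pooled 1x1 feature map meets a per-GPU batch of one during training. The sketch below reproduces only the counting logic in plain Python; values_per_channel and batch_stats_defined are hypothetical helper names, not functions from PyTorch or the repository.

```python
from functools import reduce
from operator import mul


def values_per_channel(size):
    """Number of values averaged per channel for an input of shape (N, C, *spatial)."""
    batch = size[0]
    spatial = reduce(mul, size[2:], 1)
    return batch * spatial


def batch_stats_defined(size):
    """Batch statistics need more than one value per channel."""
    return values_per_channel(size) > 1


for size in [(1, 256, 1, 1), (2, 256, 1, 1), (1, 256, 4, 4)]:
    print(size, values_per_channel(size), batch_stats_defined(size))
```

Under this reading, a per-GPU training batch of at least two images would avoid the error. That is an inference from the error message, not a fix confirmed by the authors.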

How to load wide resnet pretrained model from IMGNET

Thanks for your great work and for sharing your code. I have a question: I can't find the code that loads the Wide ResNet model pretrained on ImageNet. Can anyone help? Thank you.

Code error in GSCNN/datasets/edge_utils.py file?

Hello, I noticed that the function mask_to_onehot in ./datasets/edge_utils.py may be wrong.
The original code is:

def mask_to_onehot(mask, num_classes):
    """
    Converts a segmentation mask (H,W) to (K,H,W) where the last dim is a one
    hot encoding vector
    """
    _mask = [mask == (i + 1) for i in range(num_classes)]
    return np.array(_mask).astype(np.uint8)

However, the values in mask should be in [0, 18], because trainId as defined in ./datasets/cityscapes_labels.py starts from 0 and does not include 19. But the code
mask == (i + 1) for i in range(num_classes)
assumes the values in mask are in [1, 19], which is not correct. In my opinion, the code could be changed to:

def mask_to_onehot(mask, num_classes):
    """
    Converts a segmentation mask (H,W) to (K,H,W) where the last dim is a one
    hot encoding vector
    """
    _mask = [mask == i for i in range(num_classes)]
    return np.array(_mask).astype(np.uint8)
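The difference between the two variants can be checked on a tiny mask. This sketch assumes zero-based labels, as the issue describes. The function names onehot_shifted and onehot_zero_based are illustrative renames of the two versions quoted above.

```python
import numpy as np


def onehot_shifted(mask, num_classes):
    """Variant quoted from the repository: channel i marks pixels equal to i + 1."""
    return np.array([mask == (i + 1) for i in range(num_classes)]).astype(np.uint8)


def onehot_zero_based(mask, num_classes):
    """Proposed variant: channel i marks pixels equal to i."""
    return np.array([mask == i for i in range(num_classes)]).astype(np.uint8)


mask = np.array([[0, 1],
                 [2, 2]])
shifted = onehot_shifted(mask, 3)
zero_based = onehot_zero_based(mask, 3)

# Per-pixel count of active channels; a valid one-hot encoding has exactly 1 everywhere.
print(shifted.sum(axis=0))
print(zero_based.sum(axis=0))
```

With zero-based labels, the shifted variant leaves the class-0 pixel with no active channel, while the zero-based variant activates exactly one channel per pixel.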

Thanks for your project!

subprocess.CalledProcessError: Command '['where', 'cl']' returned non-zero exit status 1.

Traceback (most recent call last):
File "D:/my/torch-gpu/mu_yi_hua-GSCNN-master/GSCNN/train.py", line 381, in <module>
main()
File "D:/my/torch-gpu/mu_yi_hua-GSCNN-master/GSCNN/train.py", line 128, in main
assert_and_infer_cfg(args)
File "D:\my\torch-gpu\mu_yi_hua-GSCNN-master\GSCNN\config.py", line 86, in assert_and_infer_cfg
import encoding
File "D:\deeplearning\anaconda\envs\torch-gpu\lib\site-packages\encoding\__init__.py", line 13, in <module>
from . import nn, functions, parallel, utils, models, datasets, transforms
File "D:\deeplearning\anaconda\envs\torch-gpu\lib\site-packages\encoding\nn\__init__.py", line 12, in <module>
from .encoding import *
File "D:\deeplearning\anaconda\envs\torch-gpu\lib\site-packages\encoding\nn\encoding.py", line 18, in <module>
from ..functions import scaled_l2, aggregate, pairwise_cosine
File "D:\deeplearning\anaconda\envs\torch-gpu\lib\site-packages\encoding\functions\__init__.py", line 2, in <module>
from .encoding import *
File "D:\deeplearning\anaconda\envs\torch-gpu\lib\site-packages\encoding\functions\encoding.py", line 14, in <module>
from .. import lib
File "D:\deeplearning\anaconda\envs\torch-gpu\lib\site-packages\encoding\lib\__init__.py", line 15, in <module>
], build_directory=cpu_path, verbose=False)
File "D:\deeplearning\anaconda\envs\torch-gpu\lib\site-packages\torch\utils\cpp_extension.py", line 644, in load
is_python_module)
File "D:\deeplearning\anaconda\envs\torch-gpu\lib\site-packages\torch\utils\cpp_extension.py", line 813, in _jit_compile
with_cuda=with_cuda)
File "D:\deeplearning\anaconda\envs\torch-gpu\lib\site-packages\torch\utils\cpp_extension.py", line 862, in _write_ninja_file_and_build
with_cuda=with_cuda)
File "D:\deeplearning\anaconda\envs\torch-gpu\lib\site-packages\torch\utils\cpp_extension.py", line 1072, in _write_ninja_file
'cl']).decode().split('\r\n')
File "D:\deeplearning\anaconda\envs\torch-gpu\lib\subprocess.py", line 336, in check_output
**kwargs).stdout
File "D:\deeplearning\anaconda\envs\torch-gpu\lib\subprocess.py", line 418, in run
output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['where', 'cl']' returned non-zero exit status 1.

Questions about the numerical_gradients_2d function

Thanks for sharing your work!
There seems to be a bug here:

GSCNN/my_functionals/custom_functional.py

Lines 38 to 52 in 64f8487

def gradient_central_diff(input, cuda):
    return input, input
    kernel = [[1, 0, -1]]
    kernel_t = 0.5 * torch.Tensor(kernel) * -1.  # pytorch implements correlation instead of conv
    if type(cuda) is int:
        if cuda != -1:
            kernel_t = kernel_t.cuda(device=cuda)
    else:
        if cuda is True:
            kernel_t = kernel_t.cuda()
    n, c, h, w = input.shape
    x = conv2d_same(input, kernel_t.unsqueeze(0).unsqueeze(0).repeat([c, 1, 1, 1]), c)
    y = conv2d_same(input, kernel_t.t().unsqueeze(0).unsqueeze(0).repeat([c, 1, 1, 1]), c)
    return x, y

The numerical_gradients_2d function calls this function, which returns the input directly, so the input is only processed by the triangle filter. I wonder whether this is a bug.
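For reference, the unreachable part of the function describes a central difference. The sketch below is a one-dimensional NumPy illustration of what the kernel in the quoted code would compute if the early return were removed. It is not the repository's implementation, and the sample signal is made up for the example.

```python
import numpy as np

kernel = np.array([1.0, 0.0, -1.0])
kernel_t = 0.5 * kernel * -1.0  # sign flipped because correlation, not convolution, is applied

x = np.array([0.0, 1.0, 4.0, 9.0, 16.0])  # samples of f(t) = t**2 at t = 0..4

# Cross-correlation at positions where the kernel fully overlaps the signal:
# out[i] = 0.5 * (x[i + 2] - x[i]), the central difference at the middle sample.
central = np.correlate(x, kernel_t, mode="valid")

print(central)
print(np.gradient(x)[1:-1])  # NumPy's interior gradient uses the same central difference
```

Both lines print the same interior derivative estimates, whereas returning the input unchanged would yield the signal itself.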

No output images from running the evaluation? Not clear if there is a code crash or not, help please

Hi,
I am not sure the below is an error but at the same time, looking at the train.py evaluate function, it is supposed to output "threshold" and F-Score values, so I do not think the code ran up to the end properly. I can see there is an out of memory error message before processing frames but I do not know if the memory issue was a critical error since the code continued running. The frame section is not clearly looking like an error, but at the same time it could be.

Can you tell me if this is the expected output?

thanks

gscnn_train_eval_run_output.txt

Error trying to run code

/usr/local/lib/python3.5/dist-packages/torch/nn/modules/loss.py:217: UserWarning: NLLLoss2d has been deprecated. Please use NLLLoss instead as a drop-in replacement and see https://pytorch.org/docs/master/nn.html#torch.nn.NLLLoss for more details.
warnings.warn("NLLLoss2d has been deprecated. "
/usr/local/lib/python3.5/dist-packages/torch/nn/_reduction.py:49: UserWarning: size_average and reduce args will be deprecated, please use reduction='mean' instead.
warnings.warn(warning.format(ret))
08-27 20:15:43.742 Using Cross Entropy Loss
/usr/local/lib/python3.5/dist-packages/encoding/nn/syncbn.py:149: EncodingDeprecationWarning: encoding.nn.BatchNorm2d is now deprecated in favor of encoding.nn.SyncBatchNorm.
.format('BatchNorm2d', SyncBatchNorm.__name__), EncodingDeprecationWarning)
Traceback (most recent call last):

File "train.py", line 380, in <module>
main()
File "train.py", line 132, in main
net = network.get_net(args, criterion)
File "/data/code/GSCNN/network/__init__.py", line 12, in get_net
criterion=criterion, trunk=args.trunk)
File "/data/code/GSCNN/network/__init__.py", line 27, in get_model
net = net_func(num_classes=num_classes, trunk=trunk, criterion=criterion)
File "/data/code/GSCNN/network/gscnn.py", line 233, in __init__
self.gate1 = gsc.GatedSpatialConv2d(32, 32)
File "/data/code/GSCNN/my_functionals/GatedSpatialConv.py", line 36, in __init__
False, _pair(0), groups, bias, 'zeros')
TypeError: __init__() takes 11 positional arguments but 12 were given

I have tried my best to make sure all the necessary libraries are the right versions:
absl-py (0.8.0) astor (0.8.0) certifi (2019.6.16) chardet (3.0.4) cycler (0.10.0) decorator (4.4.0) gast (0.2.2) google-pasta (0.1.7) grpcio (1.23.0) h5py (2.9.0) idna (2.8) imageio (2.5.0) joblib (0.13.2) Keras-Applications (1.0.8) Keras-Preprocessing (1.1.0) kiwisolver (1.1.0) Markdown (3.1.1) matplotlib (3.0.3) networkx (2.3) nose (1.3.7) numpy (1.17.1) opencv-python (4.1.0.25) Pillow (6.1.0) pip (9.0.1) protobuf (3.9.1) pyparsing (2.4.2) python-dateutil (2.8.0) PyWavelets (1.0.3) PyYAML (5.1.2) requests (2.22.0) scikit-image (0.15.0) scikit-learn (0.21.3) scipy (1.1.0) setuptools (20.7.0) six (1.12.0) tensorboard (1.14.0) tensorboardX (1.8) tensorflow (1.14.0) tensorflow-estimator (1.14.0) termcolor (1.1.0) torch (1.0.0) torch-encoding (1.0.1) torchvision (0.2.0) tqdm (4.35.0) urllib3 (1.25.3) Werkzeug (0.15.5) wheel (0.29.0) wrapt (1.11.2)

Do you have any suggestions on how to go about fixing this?

Best regards,
Shreyas

Spatial derivative not being calculated in dual loss

GSCNN/my_functionals/custom_functional.py

Line 39 in 64f8487

return input, input

This is different from the paper and looks like a bug in the code.

ninja: build stopped: subcommand failed.

When I run python train.py, I get this error: ninja: build stopped: subcommand failed.
Why does this happen, and how can I fix it?

PyTorch (<= 1.0.0 ) ?

The install instructions mention PyTorch (<= 1.0.0). I have PyTorch 1.1, so does that mean it is certain to fail, or that you did not test with versions above 1.0?
If the former, is there a reason why such recent code does not run on PyTorch versions beyond 1.0?
Thanks.

I can't find the dataset for this code, could anyone share it with me?

encoding/lib/cpu/enclib_cpu.so: undefined symbol: _ZN3c106Symbol14fromQualStringERKSs

Hi, I hit a package error when trying to evaluate the method with 'python train.py --evaluate --snapshot checkpoints/best_cityscapes_checkpoint.pth'

The error is:
Traceback (most recent call last):
File "train.py", line 380, in <module>
main()
File "train.py", line 128, in main
assert_and_infer_cfg(args)
File "/home/fuyi02/vos/GSCNN/config.py", line 86, in assert_and_infer_cfg
import encoding
File "/home/fuyi02/anaconda3/envs/GSCNN/lib/python3.6/site-packages/encoding/__init__.py", line 13, in <module>
from . import nn, functions, parallel, utils, models, datasets, transforms
File "/home/fuyi02/anaconda3/envs/GSCNN/lib/python3.6/site-packages/encoding/nn/__init__.py", line 12, in <module>
from .encoding import *
File "/home/fuyi02/anaconda3/envs/GSCNN/lib/python3.6/site-packages/encoding/nn/encoding.py", line 18, in <module>
from ..functions import scaled_l2, aggregate, pairwise_cosine
File "/home/fuyi02/anaconda3/envs/GSCNN/lib/python3.6/site-packages/encoding/functions/__init__.py", line 2, in <module>
from .encoding import *
File "/home/fuyi02/anaconda3/envs/GSCNN/lib/python3.6/site-packages/encoding/functions/encoding.py", line 14, in <module>
from .. import lib
File "/home/fuyi02/anaconda3/envs/GSCNN/lib/python3.6/site-packages/encoding/lib/__init__.py", line 15, in <module>
], build_directory=cpu_path, verbose=False)
File "/home/fuyi02/anaconda3/envs/GSCNN/lib/python3.6/site-packages/torch/utils/cpp_extension.py", line 644, in load
is_python_module)
File "/home/fuyi02/anaconda3/envs/GSCNN/lib/python3.6/site-packages/torch/utils/cpp_extension.py", line 824, in _jit_compile
return _import_module_from_library(name, build_directory, is_python_module)
File "/home/fuyi02/anaconda3/envs/GSCNN/lib/python3.6/site-packages/torch/utils/cpp_extension.py", line 971, in _import_module_from_library
return imp.load_module(module_name, file, path, description)
File "/home/fuyi02/anaconda3/envs/GSCNN/lib/python3.6/imp.py", line 243, in load_module
return load_dynamic(name, filename, file)
File "/home/fuyi02/anaconda3/envs/GSCNN/lib/python3.6/imp.py", line 343, in load_dynamic
return _load(spec)
ImportError: /home/fuyi02/anaconda3/envs/GSCNN/lib/python3.6/site-packages/encoding/lib/cpu/enclib_cpu.so: undefined symbol: _ZN3c106Symbol14fromQualStringERKSs

It seems the packages in my conda env do not match:
pytorch 1.1.0
python 3.6.9
torch-encoding 1.0.1
torchvision 0.2.0

So, is the torch-encoding version mismatched? Could you tell me how to fix it? Thank you!

How to change number of gpus

Hi, I want to know how to use multiple GPUs for training with a larger batch size, and validate with a batch size of one. @tovacinni @varunjampani @davidjesusacu @ShreyasSkandanS

RuntimeError: copy_if failed to synchronize: device-side assert triggered

While training, I get the following error:
RuntimeError: copy_if failed to synchronize: device-side assert triggered.
The dataset is Cityscapes. Training runs, but validation fails with the error above.
pytorch 1.0.0
cuda 10
@tovacinni @varunjampani @davidjesusacu
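A device-side assert of this kind is often triggered by label values that fall outside the valid class range. This is a common cause, not a confirmed diagnosis for this report. The sketch below checks a label mask with NumPy. invalid_label_values is a hypothetical helper, and the ignore value of 255 is an assumption based on the usual Cityscapes convention.

```python
import numpy as np


def invalid_label_values(mask, num_classes, ignore_index=255):
    """Return sorted label values that are neither a class id nor ignore_index."""
    values = np.unique(mask)
    is_class = (values >= 0) & (values < num_classes)
    return values[~is_class & (values != ignore_index)].tolist()


# Toy mask: 19 and 33 are outside [0, 18] and are not the assumed ignore value.
mask = np.array([[0, 18, 255],
                 [19, 7, 33]])
print(invalid_label_values(mask, num_classes=19))
```

Running such a check over the validation labels before the loss is computed would show whether any raw IDs were left unmapped.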

TypeError: __init__() takes 11 positional arguments but 12 were given

While running train.py, I encountered the error "TypeError: __init__() takes 11 positional arguments but 12 were given". Here is the offending code:
class GatedSpatialConv2d(_ConvNd):
    # inside __init__
    super(GatedSpatialConv2d, self).__init__(
        in_channels, out_channels, kernel_size, stride, padding, dilation,
        False, _pair(0), groups, bias, 'zeros')

A small issue on the usage of "s1"