fyu / dilation

Dilated Convolution for Semantic Image Segmentation

Home Page: https://www.vis.xyz/pub/dilation

License: MIT License

Python 98.55% Shell 1.45%

dilation's Issues

Random Crops during training

Hello,

How hard would it be to change the random cropping used during training to a more structured scheme (e.g., using every crop without overlap, or with some predefined overlap)?
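
To make the question concrete, here is a minimal sketch of what "structured" cropping could look like: it enumerates crop offsets on a regular grid with an optional overlap. The function name `grid_crop_offsets` and its parameters are hypothetical, and this is not how the repository's data layer works (the cropping there happens inside the Caffe data layer), so treat it only as an illustration of the idea.

```python
def grid_crop_offsets(height, width, crop, overlap=0):
    """Return (top, left) offsets of crop x crop windows that cover the image."""
    if not 0 <= overlap < crop:
        raise ValueError("overlap must satisfy 0 <= overlap < crop")
    if crop > height or crop > width:
        raise ValueError("crop must fit inside the image")
    stride = crop - overlap

    def starts(size):
        # Regular grid positions, plus one final window flush with the border
        # so that the last rows/columns are not dropped.
        positions = list(range(0, size - crop + 1, stride))
        if positions[-1] != size - crop:
            positions.append(size - crop)
        return positions

    return [(top, left) for top in starts(height) for left in starts(width)]


# Example: non-overlapping 512 x 512 crops of a 1024 x 2048 image.
offsets = grid_crop_offsets(1024, 2048, crop=512)
print(len(offsets))  # 8
```

The crops could then be written to disk as separate image/label pairs and listed in the training text files, which avoids touching the data layer at all.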

an error occurs when make pycaffe

The Python code needs to import caffe,
but when I run make pycaffe, an error occurs:

rsync -a --include '/' --include '.py' --exclude '*'
python/caffe/ build_master_release/python/caffe

What's the problem?
I cloned Caffe from your link, fyu/caffe-dilation.

I am trying to reproduce the results for the Cityscapes dataset. I am now at the joint training stage, and the paper says the crop size was 1396x1396 px (half image + label margin) with batch size 1. Surprisingly, this exceeds 12 GB of GPU memory, so I can't start the training. 1180x1180 px is the largest size that fits, even when running from a virtual console.

Could that be related to the cuDNN version? I tried versions 2, 3, 4, 5 and 5.1 and could not observe any difference in memory requirements. In all cases 1180x1180 px was the maximum.
That's kind of strange, isn't it? I would have expected a difference between cuDNN versions.
Could you please give me some advice? Thank you in advance!

Training loss

Hi,

Could you tell me about the loss during training?
When I use your code to train the front-end model, my loss is about 2 to 3 in the first 15 iterations.
After that, the loss increases to 50~80 and stays there.
After 20K iterations, it is still about 60~80.
I'm not sure whether this is correct.
Could you tell me whether this is normal?
What loss should I expect?
(My training/testing input images are all originals, and I haven't changed anything in train.py.)

Thanks a lot,
Mao

Five years later, a question

Hi, I found this nice work recently and I want to train it in PyTorch. How should I go about it? Do you have a PyTorch version? Thank you very much.

about the scale

According to your paper, a 512x512 input image gives a 64x64 output, but that does not match train.prototxt. So is the training input also 900x900? Thanks.

Get label images for evaluation

Hi Fisher

Thanks for that great repository! Could you still tell me, how to convert the color images to label images, where each pixel has an ID that represents the ground truth label? Is there already a script? Thanks in advance!

Best
Timo
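
A minimal sketch of one way to do such a conversion, assuming the ground truth is an RGB image in which every class has exactly one color. The palette below is made up for illustration; the real colors have to come from the dataset's documentation. Pixels whose color is not in the palette are given a hypothetical "ignore" ID of 255.

```python
import numpy as np


def color_to_label(color_image, palette, ignore_label=255):
    """Map an H x W x 3 RGB image to an H x W array of class IDs."""
    palette = np.asarray(palette, dtype=np.uint8)
    labels = np.full(color_image.shape[:2], ignore_label, dtype=np.uint8)
    for class_id, color in enumerate(palette):
        # True where all three channels match this class color.
        mask = np.all(color_image == color, axis=-1)
        labels[mask] = class_id
    return labels


# Hypothetical 3-class palette and a tiny 2 x 2 test image.
palette = [(255, 0, 0), (0, 255, 0), (0, 0, 255)]
image = np.array([[[255, 0, 0], [0, 0, 255]],
                  [[0, 0, 0], [0, 255, 0]]], dtype=np.uint8)
labels = color_to_label(image, palette)
print(labels.tolist())  # [[0, 2], [255, 1]]
```

The resulting array can be saved as a single-channel PNG so that each pixel value is the class ID.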

train context

When training the context module, I get an error:

F0714 16:04:01.983191 3176 bin_label_data_layer.cpp:100] Check failed: count == shape_size (2328 vs. 1179258880)
*** Check failure stack trace: ***
@ 0x7f793354f5cd google::LogMessage::Fail()
@ 0x7f7933551433 google::LogMessage::SendToLog()
@ 0x7f793354f15b google::LogMessage::Flush()
@ 0x7f7933551e1e google::LogMessageFatal::~LogMessageFatal()
@ 0x7f7933bf4e14 (anonymous namespace)::ReadImage()
@ 0x7f7933bf7931 caffe::BinLabelDataLayer<>::DataLayerSetUp()
@ 0x7f7933c552f3 caffe::BasePrefetchingDataLayer<>::LayerSetUp()
@ 0x7f7933b93f4f caffe::Net<>::Init()
@ 0x7f7933b95a81 caffe::Net<>::Net()
@ 0x7f7933d0df2a caffe::Solver<>::InitTrainNet()
@ 0x7f7933d0e497 caffe::Solver<>::Init()
@ 0x7f7933d0e82a caffe::Solver<>::Solver()
@ 0x7f7933d314c3 caffe::Creator_SGDSolver<>()
@ 0x40ad89 train()
@ 0x407704 main
@ 0x7f7932003830 __libc_start_main
@ 0x407eb9 _start
@ (nil) (unknown)

Training front-end on Kitti dataset

Hello,

I am trying to train the front-end module on the Kitti dataset: http://adas.cvc.uab.es/s2uad/

I am using the following settings: 628x628 random cropping, reflection padding, 4 mini-batch size (can't fit 8 into memory), 0.001 learning rate, 0.99 momentum, SGD.

The problem is that my test loss stays around 9 without changing and the train loss just jumps between 30 and 50. The test accuracy is 0. I tried increasing the iter_size without any success. Any idea what I might be doing wrong?

variable image sizes with cityscapes model

Hi,

What should be the zoom value for cityscapes model to get predictions for image sizes other than 2048x1024? Currently zoom = 1, which doesn't allow predictions for other image sizes.

thanks,
Anil

Difference between dilation8 and dilation10

Hi Fisher

I am puzzled by the different input shapes in the prototxt files. In dilation10 it says:

input_shape {
  dim: 1
  dim: 3
  dim: 1396
  dim: 1396
}

What exactly do those lines mean? Should they match the size of the input image? Or is that the size of a sliding window?

My second question relates to the margin_label value of 186. That's the amount of reflection padding, right? Where does this value come from and why do we need it?
It would be great if you could shed some light on this. :)

Best,
Timo
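
One possible reading of these numbers, stated as an assumption rather than something confirmed by the authors: if the network only predicts the central region of its input and the margin is added on every side, then the input size is the output size plus twice the margin. The helper names below are hypothetical; the sketch just checks the arithmetic for the Cityscapes values quoted in this thread.

```python
def padded_input_size(output_size, label_margin):
    """Input side length if `label_margin` pixels are padded on each side."""
    return output_size + 2 * label_margin


def num_tiles(image_size, output_size):
    """Number of output-sized tiles needed to cover one image dimension."""
    return -(-image_size // output_size)  # ceiling division


print(padded_input_size(1024, 186))               # 1396
print(num_tiles(2048, 1024), num_tiles(1024, 1024))  # 2 1
```

Under this reading, 1396 would be the size of one padded tile rather than the size of the whole image, and a 2048x1024 image would be covered by two such tiles.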

training error: AttributeError: 'module' object has no attribute 'ImageLabelDataParameter'

When I run train.py, I get this error:
Traceback (most recent call last):
File "train.py", line 251, in
main()
File "train.py", line 235, in main
train_net, test_net = make_nets(options)
File "train.py", line 112, in make_nets
train_net = make_net(options, True)
File "train.py", line 108, in make_net
return globals()['make_' + options.model](options, is_training)
File "train.py", line 55, in make_frontend_vgg
is_training, options.crop_size, options.mean)
File "/home/eric118/git/dilation/network.py", line 25, in make_image_label_data
padding=P.ImageLabelData.REFLECT,
File "/home/eric118/git/dilation/caffe-master/python/caffe/net_spec.py", line 220, in getattr
return getattr(getattr(caffe_pb2, name + 'Parameter'), param_name)
AttributeError: 'module' object has no attribute 'ImageLabelDataParameter'
How can I fix it? Thank you very much!

Mean IoU Kitti dataset

Hello,

When I evaluate the mean IoU on the test set of the Kitti dataset http://adas.cvc.uab.es/s2uad/?page_id=11, I get lower numbers than the ones mentioned in the paper. I am using the Dilation7 model. What could be the reason? Is there any mean IoU code available?
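
For reference, a minimal sketch of the usual mean IoU computation from a confusion matrix. It is not the evaluation code behind the paper's numbers; the function names and the ignore value of 255 are assumptions. One common source of discrepancies is averaging IoU per image instead of accumulating one confusion matrix over the whole test set, so the sketch keeps the two steps separate.

```python
import numpy as np


def confusion_matrix(gt, pred, num_classes, ignore_label=255):
    """Rows are ground-truth classes, columns are predicted classes."""
    valid = gt != ignore_label
    index = num_classes * gt[valid].astype(np.int64) + pred[valid].astype(np.int64)
    counts = np.bincount(index, minlength=num_classes ** 2)
    return counts.reshape(num_classes, num_classes)


def mean_iou(conf):
    """Mean of TP / (TP + FP + FN) over classes that occur at all."""
    tp = np.diag(conf).astype(np.float64)
    denom = conf.sum(axis=0) + conf.sum(axis=1) - tp
    present = denom > 0
    return (tp[present] / denom[present]).mean()


gt = np.array([[0, 0], [1, 255]])
pred = np.array([[0, 1], [1, 1]])
conf = confusion_matrix(gt, pred, num_classes=2)
print(conf.tolist())   # [[1, 1], [0, 1]]
print(mean_iou(conf))  # 0.5
```

For a dataset, sum the per-image confusion matrices first and call `mean_iou` once on the total.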

Could you implement 3D dilated convolution?

Hello, I plan to use your work on 3D data. Do you plan to implement 3D dilated convolution in Caffe? I read some PRs about 3D convolution, but they did not consider 3D dilated convolution.
Good job!

Runtime measurement

Hi Fisher

I have measured the runtime of dilation10 with the pretrained model using the following command:

/home/timo/caffe/build/tools/caffe time -gpu 0 -model /home/timo/dilation/models/dilation10_cityscapes_deploy.prototxt -iterations 10

For the measurement I use a Titan X and I get an average forward pass of 1058ms.
The input shape of the dilation10_cityscapes_deploy.prototxt is:

input: "data"
input_shape {
  dim: 1
  dim: 3
  dim: 1396
  dim: 1396
}

The Cityscapes benchmark lists the runtime as 4 s. So the question is, how did you measure it? Does my measurement make the results look better than they are?

Btw. if I change the input_shape to:

input: "data"
input_shape {
  dim: 1
  dim: 3
  dim: 500
  dim: 500
}

I get a runtime of only 53 ms!
With FCN8s I get 134 ms in the same setting, although that model should be faster according to the runtimes on the Cityscapes benchmark.
https://www.cityscapes-dataset.com/benchmarks/#pixel-level-results

Best,
Timo

Check failed: outer_num_ * inner_num_ == bottom[1]->count()

I want to train the context module, but get the following error:

F0911 01:01:41.267956 15432 softmax_loss_layer.cpp:47] Check failed: outer_num_ * inner_num_ == bottom[1]->count() (3276800 vs. 435600) Number of labels must match number of predictions; e.g., if softmax axis == 1 and prediction shape is (N, C, H, W), label count (number of labels) must be N*H*W, with integer values in {0, 1, ..., C-1}.

I followed the documentation in "training.md". First, I trained the front-end module and then generated the .bin files with test.py (and the feats.txt).
I use the following command to start the training:

python train.py context \
--train_image /home/timo/dilation/feat/train/feats.txt \
--train_label /home/timo/Cityscapes/gtFine/train/train_city_gt.txt \
--test_image /home/timo/dilation/feat/val/feats.txt \
--test_label /home/timo/Cityscapes/gtFine/val/val_city_gt.txt \
--train_batch 100 \
--test_batch 10 \
--caffe /home/timo/dilation/caffe-dilation/build_master_release/tools/caffe \
--classes 19 \
--layers 10 \
--label_shape 66 66 \
--lr 0.0001 \
--momentum 0.99

I am grateful for every tip.
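
The two numbers in the check can be decomposed with the values from the command above. This is only arithmetic on the reported figures, not a confirmed diagnosis: the label count matches batch size times `--label_shape`, while the prediction count implies a different spatial size per sample.

```python
train_batch = 100
label_shape = (66, 66)

# Right-hand side of the failed check: number of label values per batch.
label_count = train_batch * label_shape[0] * label_shape[1]
print(label_count)  # 435600

# Left-hand side of the failed check: number of predicted positions per batch.
prediction_count = 3276800
per_sample = prediction_count // train_batch
print(per_sample)   # 32768

# 32768 factors as 128 * 256, i.e. 1/8 of a 1024 x 2048 image in each
# dimension. Other factorizations are possible; this one is a guess.
print(128 * 256 == per_sample)  # True
```

If that guess is right, the features written by test.py are 128x256 per image and the 66x66 label shape does not match them, so either the label shape or the feature generation would need to change.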

Have a look at the results

I'm very curious about this repository :) Can anyone share results from running this algorithm on your own images?

How can I train on my own dataset?

I want to train the model with my own dataset.
I'm confused about the input and output forms of the image data.
Could you help me?
Could you please share the training prototxt files and the solver configuration?
Thanks.

Some questions in "multi-scale context aggregation by dilated convolution"

What does "p" mean in equation (1)?
The first sentence under equation (3) is very obscure; could you explain it in simpler terms?
What does "t" mean in equation (4), and what does equation (4) mean, especially its right-hand side?

Thank you very much
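
In the usual discrete-convolution notation, which the paper appears to follow, p is the position at which the output is evaluated, s runs over input positions, and t runs over positions inside the filter. A dilated convolution with factor l sums F(s) k(t) over all pairs with s + l*t = p. The 1-D sketch below implements exactly that sum; it is written from this definition, not taken from the repository, and the function name is hypothetical.

```python
import numpy as np


def dilated_conv1d(F, k, l=1):
    """(F *_l k)(p) = sum over s + l*t = p of F(s) k(t), with F zero outside its range.

    k has odd length 2r + 1 and is indexed by t in [-r, r].
    """
    F = np.asarray(F, dtype=float)
    k = np.asarray(k, dtype=float)
    r = len(k) // 2
    out = np.zeros_like(F)
    for p in range(len(F)):
        for t in range(-r, r + 1):
            s = p - l * t          # input position paired with filter offset t
            if 0 <= s < len(F):
                out[p] += F[s] * k[t + r]
    return out


F = np.arange(8.0)
k = [1.0, 0.0, -1.0]               # k(-1) = 1, k(0) = 0, k(1) = -1
print(dilated_conv1d(F, k, l=1))   # ordinary convolution
print(dilated_conv1d(F, k, l=2))   # same filter, taps spread two samples apart
```

With l = 1 this reduces to an ordinary convolution. Increasing l spreads the same filter taps further apart, which enlarges the receptive field without adding parameters.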

How can I get frontend_vgg prototxt?

I found that there is a pretrained model for frontend_vgg.
However, there is no prototxt for it.

I saw some code snippets in train.py and test.py which produce prototxt for it.
But I couldn't use them since I've failed to build the caffe-dilation.
So I couldn't make prototxt for frontend_vgg right now.
Is there any way to get it?

What to put inside of training/testing image/label text files?

I'm training for my own dataset, but not quite sure what to put in training/testing image/label text files. As far as I understood, the contents as follows:
train_image: <the list of paths of the original images>
train_label: <the list of paths of the images that is inversed in black and white where I want them to detect as the area (the correct, expected result)>
test_image: <the list of paths of the images I want to test>
test_label: <?>

What to put in the test_label?
Also, please correct me if I'm wrong.

Training files

Thanks for sharing your pre-trained models. Could you also share your training prototxt files and the solver configuration, so that we can easily try to reproduce and/or use your architecture on other datasets ?

unable to download pretrained weights

Seems like the link is down http://vobj.cs.princeton.edu/models/dilation10_cityscapes.caffemodel

Front-end domain adaptation code

Hello,

The front-end module is used in the paper "FCNs in the Wild: Pixel-level Adversarial and Constraint-based Adaptation". Do you know when the .prototxt files used in the paper will be made available?

How to reduce the number of classes in this model?

Thank you very much for the very interesting code.
@fyu Can you give me some advice on how to reduce the number of classes for this model?

How can we use 3D dilated Convolution in Lasagne

I found that neither Keras nor Lasagne defines a 3D dilated convolution. So how can I use this technique with those tools?

How much GPU Memory is required/recommended to run the demo?

How much GPU memory is required/recommended to run the demo? I am trying to run the demo on my Nvidia Jetson TX1 with 4 GB of RAM and the program terminates ("Killed") and/or reboots the machine.

cuda 8
cudnn 6
ubuntu 16

#1:
python predict.py pascal_voc images/dog.jpg --gpu 0

#2:
python predict.py kitti images/example_kitti.png --gpu 0

nvidia@tegra-ubuntu:~/cviz/dilation$ python predict.py kitti images/example_kitti.png --gpu 0
I0213 00:39:05.451731 2439 gpu_memory.cpp:159] GPUMemory::Manager initialized with Caching (CUB) GPU Allocator
I0213 00:39:05.451974 2439 gpu_memory.cpp:161] Total memory: 4174815232, Free: 1888354304, dev_info[0]: total=4174815232 free=1888354304
I0213 00:39:05.452177 2439 gpu_memory.cpp:159] GPUMemory::Manager initialized with Caching (CUB) GPU Allocator
I0213 00:39:05.452195 2439 gpu_memory.cpp:161] Total memory: 4174815232, Free: 1888354304, dev_info[0]: total=4174815232 free=1888354304
Using GPU 0
I0213 00:39:05.463042 2439 upgrade_proto.cpp:66] Attempting to upgrade input file specified using deprecated input fields: models/dilation7_kitti_deploy.prototxt
I0213 00:39:05.463099 2439 upgrade_proto.cpp:69] Successfully upgraded file specified using deprecated input fields.
W0213 00:39:05.463116 2439 upgrade_proto.cpp:71] Note that future Caffe releases will only support input layers and not input fields.
I0213 00:39:05.463681 2439 net.cpp:70] Initializing net from parameters:
state {

ResNet as a front-end

Hi,

Do you think that the vgg16 can be replaced with ResNet? Would the idea of dilation work well?

//

CPU Training and Net architecture

- How can I launch the context training on the CPU?
- And how can I launch training on my edited network architecture? It seems that train.py takes only the weights, so I would have to trick it by generating the weights for my architecture in another module and passing them in here.

Questions about the initialization of the context module

Dear fyu,
where can I find how the "identity" initialization is defined?
I want to rebuild it using Keras, and I would like to use your code for reference.
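
As I understand the paper, the identity initialization sets each filter so that output channel b simply copies input channel b: the weight is 1 at the filter center when the input and output channel indices match, and 0 everywhere else. A minimal NumPy sketch of such a kernel follows. The function name is hypothetical, the weight layout (out, in, height, width) is the Caffe convention, and this is not the repository's own code.

```python
import numpy as np


def identity_kernel(channels, size=3):
    """Weights of shape (out, in, size, size) for which convolution copies the input."""
    weights = np.zeros((channels, channels, size, size), dtype=np.float32)
    center = size // 2
    for b in range(channels):
        # k^b(t, a) = 1 if t is the center tap and a == b, else 0.
        weights[b, b, center, center] = 1.0
    return weights


w = identity_kernel(4)
print(w.shape)   # (4, 4, 3, 3)
print(w.sum())   # 4.0
```

Keras stores Conv2D kernels as (height, width, in, out) as far as I know, so `identity_kernel(c).transpose(2, 3, 1, 0)` would give that layout. Layers whose input and output channel counts differ need a more general scheme than this sketch covers.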

Input shape and deconvolution usage

I have recently been running your prediction code, predict.py, and I found that you do not feed the whole image to the network, only a part of it at a time. You pad the original image with reflection and divide it into several tiles. The confusing thing is that adjacent tiles overlap each other, yet their predictions are simply concatenated.

In addition, when I read the prototxt files, I noticed that you add a deconvolution layer for upsampling in dilation10_cityscapes_deploy.prototxt, but not in the other 3 prototxt files, where the final class prediction is just interpolated. Why do you handle them differently?

I am looking forward to your reply, thank you.

Unknown layer type: Input

When running python predict.py cityscapes input_image -o output_image, I get the same error message as reported in the caffe repository: layer_factory.hpp:81 [...] Unknown layer type: Input. Can you point me in the right direction? What does it mean? Version inconsistencies? I'm using caffe-dilation. I also tried to change my makefile as described on Stack Overflow, but that breaks the makefile (the suggestion is from 2015 anyway...).

how to set the dilation?

Hi,
I want to use the ResNet50 with dilation, and I don't know which layer's dilation parameter should be added. Is there any suggestion for me?

Thanks.

apply_dilaton_conv_to_image_classification_such_as_imagenet

Hi, thanks for sharing!

Dilation is used for dense prediction, such as semantic segmentation. I have a naive idea: can we apply dilated convolution to image classification tasks, such as ImageNet?

Do you know of any work on this? Do you think dilation will work for image classification? Is it worth trying?

Thanks for your kind help and nice work!

Also available in Tensorflow

See: https://github.com/tensorflow/tensorflow/blob/master/tensorflow/g3doc/api_docs/python/nn.md#tfnnatrous_conv2dvalue-filters-rate-padding-namenone-atrous_conv2d

How to change the input image size?

Hi,

if I change the image size of the input images, I get:

color_image = dataset.palette[prediction.ravel()].reshape(image_size)

ValueError: total size of new array must be unchanged

My input image has a resolution of 800x600x3 = 1440000
The length of color_image after "color_image = dataset.palette[prediction.ravel()]" is 1024x1024=1048576.

1048576 != 1440000. That's the reason for the error, but how do I fix it? Thank you in advance!
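
The mismatch can be reproduced in isolation, assuming (as the numbers above suggest) that the network output is fixed at 1024x1024 regardless of the input image. The arrays below are stand-ins for the real palette and prediction, so this only illustrates why the reshape fails, not how predict.py should be changed.

```python
import numpy as np

# Hypothetical stand-ins for the values in the question.
palette = np.zeros((19, 3), dtype=np.uint8)          # one RGB row per class
prediction = np.zeros((1024, 1024), dtype=np.int64)  # fixed-size network output
image_size = (600, 800, 3)                           # shape of the new input image

flat = palette[prediction.ravel()]
print(flat.shape)                           # (1048576, 3): one color per predicted pixel
print(flat.size, int(np.prod(image_size)))  # 3145728 1440000, so reshape(image_size) fails

# Reshaping with the prediction's own shape always succeeds.
color_image = flat.reshape(prediction.shape + (3,))
print(color_image.shape)                    # (1024, 1024, 3)
```

Whether that 1024x1024 result can then be cropped or resized to 600x800 depends on how the image was padded before it was fed to the network, which this sketch does not cover.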

How do you use the Cityscape for training?

I use your front-end code to train on the Cityscapes dataset. I noticed that you add an upsampling layer in dilation10_cityscapes_deploy.prototxt, but the front-end_vgg_train_net.prototxt that I generated doesn't have this layer. Can you explain the difference?

I am looking forward to your reply, thank you.

there is no _caffe.so after make pycaffe

hi,

When I

make all
make test
make pycaffe

there is no _caffe.so in caffe-dilation/python/caffe

and when I try predict.py,
I get an error:
ImportError: libcaffe.so.1.0.0-rc3: cannot open shared object file: No such file or directory

I have already added the build_master/python into my PYTHONPATH

Does anyone know how to solve this problem?

Cheers,
Mao

ReadProtoFromBinaryFile problem

hello fyu,
I am using your code for some tests. I downloaded the model following your instructions. But when I run
python predict.py images/dog.jpg --gpu 0
I always hit this problem:

upgrade_proto.cpp:86] Check failed: ReadProtoFromBinaryFile(param_file, param) Failed to parse NetParameter file: ./pretrained/dilated_convolution_context_coco.caffemodel

So, is there something wrong with your caffemodel? Or is there some other problem?

Thank you~

Image and label mismatch while training

When I train the dilation front-end network on the ADE20K dataset, the log shows that the network received a mismatched pair of image and label. Here is the log:
`
I0426 15:44:08.319962 160840 io.cpp:84] /home/xiaosong/data/ADE20K_2016_07_26/images/training/s/skyscraper/ADE_train_00016452.jpg
I0426 15:44:08.320272 160840 io.cpp:84] /home/xiaosong/data/ADE20K_2016_07_26/output_train_label/ADE_train_00016452_seg.png
I0426 15:44:08.324337 160840 io.cpp:84] /home/xiaosong/data/ADE20K_2016_07_26/images/training/s/staircase/ADE_train_00016812.jpg
I0426 15:44:08.324854 160840 io.cpp:84] /home/xiaosong/data/ADE20K_2016_07_26/output_train_label/ADE_train_00016812_seg.png
I0426 15:44:10.132612 160840 io.cpp:84] /home/xiaosong/data/ADE20K_2016_07_26/images/training/misc/ADE_train_00012355.jpg
I0426 15:44:10.132908 160840 io.cpp:84] /home/xiaosong/data/ADE20K_2016_07_26/output_train_label/ADE_train_00013626_seg.png
I0426 15:44:10.171540 160840 io.cpp:84] /home/xiaosong/data/ADE20K_2016_07_26/images/training/misc/ADE_train_00012355.jpg
I0426 15:44:10.171807 160840 io.cpp:84] /home/xiaosong/data/ADE20K_2016_07_26/output_train_label/ADE_train_00013626_seg.png
OpenCV Error: Assertion failed (0 <= roi.x && 0 <= roi.width && roi.x + roi.width <= m.cols && 0 <= roi.y && 0 <= roi.height && roi.y + roi.height <= m.rows) in Mat, file /opt/packages/opencv-2.4.13/modules/core/src/matrix.cpp, line 323
terminate called after throwing an instance of 'cv::Exception'
what(): /opt/packages/opencv-2.4.13/modules/core/src/matrix.cpp:323: error: (-215) 0 <= roi.x && 0 <= roi.width && roi.x + roi.width <= m.cols && 0 <= roi.y && 0 <= roi.height && roi.y + roi.height <= m.rows in function Mat

*** Aborted at 1493192650 (unix time) try "date -d @ 1493192650" if you are using GNU date ***
PC: @ 0x7f9b8b3c71d7 __GI_raise
*** SIGABRT (@0x3eb00027417) received by PID 160791 (TID 0x7f9b762fa700) from PID 160791; stack trace: ***
@ 0x7f9b8b762370 (unknown)
@ 0x7f9b8b3c71d7 __GI_raise
@ 0x7f9b8b3c88c8 __GI_abort
@ 0x7f9b8e13fab5 (unknown)
@ 0x7f9b8e13da26 (unknown)
@ 0x7f9b8e13da53 (unknown)
@ 0x7f9b8e13dc73 (unknown)
@ 0x7f9b8f363170 cv::error()
@ 0x7f9b8f2d689d cv::Mat::Mat()
@ 0x7f9b97f3713b caffe::DataTransformer<>::Transform()
@ 0x7f9b97fc7637 caffe::ImageLabelDataLayer<>::load_batch()
@ 0x7f9b97f636e9 caffe::BasePrefetchingDataLayer<>::InternalThreadEntry()
@ 0x7f9b97f3fcf0 caffe::InternalThread::entry()
@ 0x7f9b8e5f827a (unknown)
@ 0x7f9b8b75adc5 start_thread
@ 0x7f9b8b48973d __clone
`
Notice that the last two inputs, the image and its supposed label, do not form a training pair. Besides, the problem occurs randomly on different input images.

Can someone help me with this?
Thank you.

Error: Caffe.convolutionParameter has no field named dilation

Hi, I followed your guideline on the page: downloaded the newest caffe, ran "make", and ran "pycaffe", everything worked fine, but when I tried to test out the code, I'm getting the following error. I checked around on Google and another guy suggested that dilation is a new thing and updating caffe to the newest version should help, but I think I'm getting the newest already. Any suggestion would help! Thanks!

"cudaSuccess (2 vs. 0) out of memory" on GTX Titan

Hi Fisher,
since great results have been reported with dilated CNNs, I'm thinking of using this as the segmentation engine for my research. After reading your paper, I started to play around with your code today. Strangely, I kept getting out-of-memory errors when I tried the predict script. After checking the closed issues, I rolled Caffe back to commit 08c5df but got the same errors.

I tried camvid, kitti and cityscapes. It worked on camvid, but not on the other two. Since the GTX Titan has 12 GB of memory, "out of memory" seems very weird to me. Do you have any hints?

Does anybody else get similar errors?

cheers
Rui

Dealing with unlabeled pixels in KITTI dataset

Hello Fisher,

First of all great work and thank you for making your work publicly available. I have read your ICLR paper and have been trying to reproduce your results for the KITTI dataset. So far I have been able to train Front-end module with the recommended training parameters in your paper. I also trained the context module for KITTI dataset.

I have used the dataset from the following link:

http://adas.cvc.uab.es/s2uad/?page_id=11

This dataset provides RGB images and corresponding ground-truth label images. There are 11 semantic classes each represented by a specific color. What I noticed is there are some image pixels whose color is black (i.e. RGB = [0,0,0]) and I suppose these are "void" regions and don't belong to any of the 11 classes. I treated these void regions as a new class, making total number of classes 12 during training.

After training is complete, I compared segmentation results of my trained models to your pretrained Kitti Dilation-7 model on the validation set. Indeed they produce similar results except the "void" regions. Your pretrained model doesn't produce any "void" class, whereas mine does. Obviously you don't have this "void" class in your experiments, and you handled it somehow.

How did you handle these "void" regions in the KITTI dataset? More generally, is it possible to ignore specific classes during training in your algorithm? If so, could you please describe how to achieve that?

Thanks in advance,

Selcuk
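
A common way to handle such pixels, which I cannot confirm is what was done for the pretrained model, is to give them a reserved label value and exclude them from the loss. Stock Caffe's SoftmaxWithLoss layer supports this through `ignore_label` in its `loss_param`; whether train.py in this repository exposes that option is something I have not checked. The NumPy sketch below shows what the masking amounts to, with 255 as an assumed ignore value and a hypothetical function name.

```python
import numpy as np


def masked_cross_entropy(logits, labels, ignore_label=255):
    """logits: (C, H, W) scores; labels: (H, W) class IDs. Mean loss over valid pixels."""
    logits = logits - logits.max(axis=0, keepdims=True)   # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=0, keepdims=True))
    valid = labels != ignore_label
    safe = np.where(valid, labels, 0).astype(np.int64)    # any in-range index for void pixels
    picked = np.take_along_axis(log_prob, safe[None], axis=0)[0]
    return -(picked * valid).sum() / max(valid.sum(), 1)


logits = np.zeros((2, 2, 2))
labels = np.array([[0, 1], [255, 255]])
print(masked_cross_entropy(logits, labels))  # log(2), about 0.693
```

With this scheme the network keeps its 11 outputs, void pixels never contribute gradients, and no "void" class appears in the predictions.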

Upconvolution layer at the end of Dilated10 ?

Hi there,

First of all, great work ! :)

I noticed there is an upconvolution layer at the end of Dilated10, (see here).

Correct me if I'm wrong but I don't remember seeing this mentioned in the paper. Is this the model which led to the results presented in the paper ? Or perhaps it is a new one ? If so, could you kindly advise on whether you kept the same training procedure ?
Thanks ! Cheers,

Pauline

Learnable interpolation for upsampling

Do you think it will be beneficial to include a learnable upsampling interpolation during training? Right now, a basic interpolation is used during prediction.

Relu problem

I download the CityScapesDataset dataset and run the code but I get an error

in init
pretrained=pretrained, num_classes=1000)
in drn_c_26
model = DRN(BasicBlock, [1, 1, 2, 2, 2, 2, 1, 1], arch='C', **kwargs)
in init
self.relu = nn.ReLU(inplace=False)
init_
super(ReLU, self).init(0, 0, inplace)

TypeError: super(type, obj): obj must be an instance or subtype of type

Label images for the KITTI dataset

Hello,

How can I transform the ground truth images of the KITTI dataset to label images suitable for training?
