
hkchengrex / CascadePSP

812 stars · 16 watchers · 92 forks · 3.19 MB

[CVPR 2020] CascadePSP: Toward Class-Agnostic and Very High-Resolution Segmentation via Global and Local Refinement

Home Page: https://hkchengrex.com/CascadePSP/

License: MIT License

Python 100.00%
segmentation deep-learning pytorch cvpr2020 computer-vision segmentation-refinement refinement-network high-resolution

cascadepsp's People

Contributors

hkchengrex


cascadepsp's Issues

Questions about the function process_high_res_im

Hi,

may I ask why this kind of threshold can be used to define the uninteresting area? And what happens if the object is relatively small? Thanks a lot.

# Skip when it is not an interesting crop anyway
seg_part_norm = (seg_224_part>0).float()
high_thres = 0.9
low_thres = 0.1
if (seg_part_norm.mean() > high_thres) or (seg_part_norm.mean() < low_thres):
    continue
grid_images = safe_forward(model, im_part, seg_224_part, seg_56_part)
grid_pred_224 = grid_images['pred_224'].to(aggre_device)

What is the ground truth?

I have two classes, so I designed the ground truth to have 0 for background and 1 for the other class. Is this right, or should the ground truth be formatted another way?
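For a two-class setup, a single-channel binary mask is the usual format: 0 for background and the foreground as 1 (or 255 when saved as an 8-bit image). A minimal sketch, using a hypothetical label map:

import numpy as np

# Hypothetical two-class label map: 0 = background, 1 = object.
label = np.zeros((256, 256), dtype=np.uint8)
label[64:192, 64:192] = 1

# Binary ground truth as an 8-bit image: background 0, foreground 255.
gt = (label == 1).astype(np.uint8) * 255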

How to make training model for paper?

Hi, great work. After reading the paper, I have a question about the model. Your code has just a Global Module, but I think the training process is as shown in the picture below. Am I right?
[figure: the questioner's diagram of the presumed training pipeline]

Cannot execute the test refinement

Hey guys,

first of all, thanks for your great work.

I just tried to reproduce the test example (aeroplane), but received the following error:

     66             image = self.im_transform(image).unsqueeze(0).to(self.device)
---> 67             mask = self.seg_transform((mask>127).astype(np.uint8)*255).unsqueeze(0).to(self.device)
     68             if len(mask.shape) < 4:
     69                 mask = mask.unsqueeze(0)

~/Documents/Programming/VirtualEnvironments/python3_venv/lib/python3.7/site-packages/torchvision/transforms/transforms.py in __call__(self, img)
     47     def __call__(self, img):
     48         for t in self.transforms:
---> 49             img = t(img)
     50         return img
     51 

~/Documents/Programming/VirtualEnvironments/python3_venv/lib/python3.7/site-packages/torchvision/transforms/transforms.py in __call__(self, pic)
     74             Tensor: Converted image.
     75         """
---> 76         return F.to_tensor(pic)
     77 
     78     def __repr__(self):

~/Documents/Programming/VirtualEnvironments/python3_venv/lib/python3.7/site-packages/torchvision/transforms/functional.py in to_tensor(pic)
     46     if isinstance(pic, np.ndarray):
     47         # handle numpy array
---> 48         img = torch.from_numpy(pic.transpose((2, 0, 1)))
     49         # backward compatibility
     50         if isinstance(img, torch.ByteTensor):

ValueError: axes don't match array

I ran the following:

refiner = refine.Refiner(device='cpu')

image = cv2.imread('cascade/aeroplane.jpg')
mask = cv2.imread('cascade/aeroplane.png', cv2.IMREAD_GRAYSCALE)

output = refiner.refine(image, mask, fast=True, L=900) 

I tried it with both torch 1.0.0 / torchvision 0.2.1 and the newest versions, but always get the same error. Is this a known issue?
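One workaround that has resolved this error in similar setups, assuming the failure is to_tensor calling transpose((2, 0, 1)) on a 2-D array in older torchvision, is to give the grayscale mask an explicit channel axis before passing it to refine (newer torchvision versions add the axis themselves):

import cv2

mask = cv2.imread('cascade/aeroplane.png', cv2.IMREAD_GRAYSCALE)

# An (H, W) array breaks transpose((2, 0, 1)); an (H, W, 1) array does not.
if mask is not None and mask.ndim == 2:
    mask = mask[:, :, None]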

Effect of hyperparameter L

[screenshot: refinement outputs for different values of L]

Hello, I tried testing with the same L = 900 and got a result that looks like the original image. Then I decreased L; I got a result, but it looks like it is mixed with the original image.
Why does this happen?

Confirm the training is correct

I am training on my own dataset; the results after one epoch are below:
It 1450 [TRAIN] [grad_loss ]: 0.0108861
It 1450 [TRAIN] [iou/orig_i ]: 15341.660
It 1450 [TRAIN] [iou/orig_u ]: 30589.000
It 1450 [TRAIN] [iou/new_i_224 ]: 0.0000000
It 1450 [TRAIN] [iou/new_u_224 ]: 17783.060
It 1450 [TRAIN] [iou/new_i_56 ]: 0.0000000
It 1450 [TRAIN] [iou/new_u_56 ]: 17783.060
It 1450 [TRAIN] [iou/new_i_28 ]: 0.0000000
It 1450 [TRAIN] [iou/new_u_28 ]: 17783.060
It 1450 [TRAIN] [iou/new_i_28_2 ]: 0.0000000
It 1450 [TRAIN] [iou/new_u_28_2 ]: 17783.060
It 1450 [TRAIN] [iou/new_i_28_3 ]: 0.0000000
It 1450 [TRAIN] [iou/new_u_28_3 ]: 17783.060
It 1450 [TRAIN] [iou/new_i_56_2 ]: 0.0000000
It 1450 [TRAIN] [iou/new_u_56_2 ]: 17783.060
It 1450 [TRAIN] [total_loss ]: 0.4961829
It 1450 [TRAIN] [ce_loss/s_0 ]: 0.3328005
It 1450 [TRAIN] [l1_loss/s_0 ]: 0.0295411
It 1450 [TRAIN] [l2_loss/s_0 ]: 0.0295335
It 1450 [TRAIN] [loss/s_0 ]: 0.1135052
It 1450 [TRAIN] [ce_loss/s_1 ]: 0.0864096
It 1450 [TRAIN] [l1_loss/s_1 ]: 0.0507903
It 1450 [TRAIN] [l2_loss/s_1 ]: 0.0253867
It 1450 [TRAIN] [loss/s_1 ]: 0.0864096
It 1450 [TRAIN] [ce_loss/s_2 ]: 0.0875118
It 1450 [TRAIN] [l1_loss/s_2 ]: 0.0458868
It 1450 [TRAIN] [l2_loss/s_2 ]: 0.0255962
It 1450 [TRAIN] [loss/s_2 ]: 0.0616267
It 1450 [TRAIN] [ce_loss/s_3 ]: 0.0865094
It 1450 [TRAIN] [l1_loss/s_3 ]: 0.0508352
It 1450 [TRAIN] [l2_loss/s_3 ]: 0.0253933
It 1450 [TRAIN] [loss/s_3 ]: 0.0865094
It 1450 [TRAIN] [ce_loss/s_4 ]: 0.0865065
It 1450 [TRAIN] [l1_loss/s_4 ]: 0.0508407
It 1450 [TRAIN] [l2_loss/s_4 ]: 0.0253932
It 1450 [TRAIN] [loss/s_4 ]: 0.0865065
It 1450 [TRAIN] [ce_loss/s_5 ]: 0.0875015
It 1450 [TRAIN] [l1_loss/s_5 ]: 0.0459030
It 1450 [TRAIN] [l2_loss/s_5 ]: 0.0255950
It 1450 [TRAIN] [loss/s_5 ]: 0.0616252
It 1450 [TRAIN] [iou/orig_iou ]: 0.5015417
It 1450 [TRAIN] [iou/new_iou_224 ]: 0.0000000
It 1450 [TRAIN] [iou/iou_gain_224 ]: -0.501541
It 1450 [TRAIN] [iou/new_iou_56 ]: 0.0000000
It 1450 [TRAIN] [iou/iou_gain_56 ]: -0.501541
It 1450 [TRAIN] [iou/new_iou_28 ]: 0.0000000
It 1450 [TRAIN] [iou/iou_gain_28 ]: -0.501541
It 1450 [TRAIN] [iou/new_iou_28_2 ]: 0.0000000
It 1450 [TRAIN] [iou/iou_gain_28_2 ]: -0.501541
It 1450 [TRAIN] [iou/new_iou_28_3 ]: 0.0000000
It 1450 [TRAIN] [iou/iou_gain_28_3 ]: -0.501541
It 1450 [TRAIN] [iou/new_iou_56_2 ]: 0.0000000
It 1450 [TRAIN] [iou/iou_gain_56_2 ]: -0.501541
I am not sure whether this is normal; waiting for your reply.

OnlineDataset Issues

  1. The perturb=False flag is broken (self.bilinear_dual_transform_im is missing).
  2. Much more worrying: the code seems to apply the horizontal flip independently to the ground truth and the image. With 50% probability, that means the ground truth is corrupted and no longer aligns with the RGB image (a paired-flip sketch follows below).
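A minimal sketch of a paired flip that keeps the two aligned (the function name is hypothetical; the repo's transform pipeline is structured differently):

import random
from torchvision.transforms import functional as TF

def paired_hflip(im, gt, p=0.5):
    # Sample the flip decision once, then apply it to both inputs.
    if random.random() < p:
        im = TF.hflip(im)
        gt = TF.hflip(gt)
    return im, gt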

How to get the name_seg.png when testing on semantic segmentation?

Sorry to bother you again.
1. When testing semantic segmentation with the BIG dataset you provided, I found that the dataset does not contain the name_seg.png images described in the README. I would like to know how to generate that style of seg.png image.
2. Your paper mentions a crop process in the Local Step, but I cannot find that process in your code. Does this process output the seg.png image?
Thanks for your time and kindness.

Dataset problem

Hello, author! My dataset was annotated with labelme, which generates a JSON file, and each picture is segmented into classes. Can this network be trained on such data? Can the segmentation result for a single image contain multiple categories? If not, how should I modify my annotated images?

AttributeError: 'NoneType' object has no attribute 'group'

Hello, when I run

python eval_post.py --dir /home/zj/PycharmProjects/CascadePSP-master/CascadePSP-master/output_directory --output /home/zj/PycharmProjects/CascadePSP-master/CascadePSP-master/output_temp_result

the following error occurs:

File "eval_post.py", line 64, in <module>
    this_class = int(re.search(r'\d+', gt_name[::-1]).group()[::-1]) - 1
AttributeError: 'NoneType' object has no attribute 'group'
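For context, the failing line parses the trailing digits of the ground-truth file name (by searching the reversed string) as a class index, so any file whose name contains no digits makes re.search return None. A guarded version of the same logic, with a hypothetical file name:

import re

gt_name = 'aeroplane_12.png'  # hypothetical ground-truth file name

match = re.search(r'\d+', gt_name[::-1])
if match is None:
    raise ValueError(f'No class index in file name: {gt_name}')
this_class = int(match.group()[::-1]) - 1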

pytorch version?

Hi! Awesome work!

It would be useful to know what exact versions were used with this repo. Do you think you can add that to the readme / requirements file?

How to get the refined mask file

In the example code, I learned how to get the masked images, but I don't see how to get the refined mask file. Could you please tell me which function to use to get the refined mask file?
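For what it's worth, in the README demo the array returned by refiner.refine() is itself the refined mask, so writing it to disk yields the refined mask file; a sketch assuming a single-channel uint8 output:

import cv2
import segmentation_refinement as refine

image = cv2.imread('test/aeroplane.jpg')
mask = cv2.imread('test/aeroplane.png', cv2.IMREAD_GRAYSCALE)

refiner = refine.Refiner(device='cpu')
output = refiner.refine(image, mask, fast=True, L=900)

# The returned array is the refined mask; save it as a PNG.
cv2.imwrite('refined_mask.png', output)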

How to use "squeezenet" as backend to train model??

I can run with resnet,but too slow ,so how to use squeezenet???
I meet error:
RuntimeError: Given groups=1, weight of size [64, 3, 3, 3], expected input[8, 6, 224, 224] to have 3 channels, but got 6 channels instead
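The 6-channel input suggests that the network concatenates the RGB image with segmentation channels before the first convolution, so a swapped-in backbone needs its first conv widened accordingly. A generic sketch of widening a conv with zero-initialized extra channels (an illustration, not the repo's code):

import torch
import torch.nn as nn

def widen_first_conv(conv, extra_in):
    # Copy of `conv` that accepts `extra_in` more input channels; the new
    # channels are zero-initialized so pretrained behavior is preserved.
    new = nn.Conv2d(conv.in_channels + extra_in, conv.out_channels,
                    conv.kernel_size, conv.stride, conv.padding,
                    bias=conv.bias is not None)
    with torch.no_grad():
        new.weight.zero_()
        new.weight[:, :conv.in_channels] = conv.weight
        if conv.bias is not None:
            new.bias.copy_(conv.bias)
    return new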

Got a quite strange result with a chessboard-like pattern on it!

image = cv2.imread('/mnt/work/yuyanpeng/code/personal/PaddleSeg/deploy/python/dataset/360/doll/image/00_26.jpg')
mask = cv2.imread(
    '/mnt/work/yuyanpeng/code/personal/PaddleSeg/deploy/python/dataset/360/doll/mask/00_26.jpg', cv2.IMREAD_GRAYSCALE)
    
# model_path can also be specified here
# This step takes some time to load the model
refiner = refine.Refiner(device='cuda:0', model_folder="../downloaded_model") # device can also be 'cpu'

# Fast - Global step only.
# Smaller L -> Less memory usage; faster in fast mode.
output = refiner.refine(image, mask, fast=False, L=800) 

plt.imshow(output)
plt.show()

The result is below.
[image: refined output showing a checkerboard-like pattern]

OneDrive link doesn't work

Hi, there is something wrong with OneDrive when we use it in China. Could you upload the model somewhere like Google Drive? Thanks!
(Hello author, the OneDrive page opens and then disappears after about a second; I am not sure whether it is because I am in mainland China. If convenient, could you upload it to Google Drive or Baidu Netdisk? Thank you!)

How to generate Binary Masks from labelled dataset?

Sir,

Since you reannotated the PASCAL VOC 2012 dataset, could you please tell me how to generate binary masks from my dataset, which was labelled with the LabelMe Image Annotator in Python (it generates JSON files), so that I can train this model on my custom data? I am doing instance segmentation and labelled the data accordingly.

Please provide that code.

Thanks
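A minimal sketch for rasterizing LabelMe polygons into a binary mask, assuming the standard LabelMe JSON layout with 'shapes', 'imageHeight', and 'imageWidth' keys (file names hypothetical):

import json
from PIL import Image, ImageDraw

with open('sample.json') as f:
    ann = json.load(f)

mask = Image.new('L', (ann['imageWidth'], ann['imageHeight']), 0)
draw = ImageDraw.Draw(mask)
for shape in ann['shapes']:
    # Each shape stores its polygon vertices under 'points'.
    draw.polygon([tuple(pt) for pt in shape['points']], fill=255)

mask.save('sample_binary_mask.png')

For instance segmentation, write one mask per shape instead of merging all shapes into one image.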

testing segmentation

Sorry, I have trained this model using the VOC2012 dataset and want to test it on the val set,
but I don't know why the test directory should include three types of images: _im.jpg, _seg.png, and _gt.png.
I understand the ground-truth images and RGB images, but what are the input segmentation images?
How do I produce them?

Something about ResNet50

Hi,

may I ask what the purpose of using dilation in your code is? And how did you add the zero-initialized channels to the first conv? I cannot find it.

Thank you.
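On the dilation question: replacing stride with dilation in the later stages keeps the feature map at 1/8 of the input resolution instead of 1/32, which preserves spatial detail for dense prediction. Stock torchvision exposes the same idea, so as a reference point (this is not the repo's exact code):

import torchvision

# Output stride 8: the strides of the last two stages become dilations.
backbone = torchvision.models.resnet50(
    replace_stride_with_dilation=[False, True, True])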

Some train problems

  1. What does "some_unique_id" mean in the training command?
  2. I saw the training images you provided. Is the model trained only with low-resolution images?

Question about the transformation details

Hi @hkchengrex,
Actually, I still have some questions about the transformation details. May I ask what your purpose is in normalizing the 'seg' with mean 0.5 and std 0.5? Shouldn't 'seg' be a mask that only has the values 0 and 1? I am also not clear on why you use torch.tanh in your network. Are there advantages? Thank you so much!
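One plausible reading, not a confirmed answer: Normalize(0.5, 0.5) maps a {0, 1} mask to {-1, +1}, which matches the (-1, 1) output range of torch.tanh, so input and predicted segmentations live on the same scale:

import torch

seg = torch.tensor([0.0, 1.0])   # mask values after ToTensor()
print((seg - 0.5) / 0.5)         # tensor([-1., 1.]), same range as tanh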

Long response time with the demo code

Hi, I tried the following demo code on both my CPU-only local machine and on SageMaker, but it seems I never get a response:

import cv2
import time
import matplotlib.pyplot as plt
import segmentation_refinement as refine
image = cv2.imread('test/aeroplane.jpg')
mask = cv2.imread('test/aeroplane.png', cv2.IMREAD_GRAYSCALE)

# model_path can also be specified here
# This step takes some time to load the model
refiner = refine.Refiner(device='cpu') # device can also be 'cpu'

# Fast - Global step only.
# Smaller L -> Less memory usage; faster in fast mode.
output = refiner.refine(image, mask, fast=False, L=900) 

plt.imshow(output)
plt.show()

I had a successful run using my 2070 GPU, which responded in 3 s. Do you know what takes so long on a CPU-only machine? If I am using only the pre-trained model, is there a way to improve the speed without losing accuracy? Thank you very much; it's a fantastic project!

Boundary accuracy

Hello,
I have a question about the boundary accuracy metric implementation.
If I understand correctly, you take the subset of the segmentation maps around the GT boundaries for various radii.
But why use the accuracy metric, since it won't go lower than 0.5 even if the model predicts nothing? That seems misleading for assessing boundary quality.

I suggest computing the dilated boundaries of the GTs and predictions and computing IoU or Dice on those for each radius (a sketch follows below).
This ranges from 0 to 1, from no overlap to perfect overlap.

I think this is how it's computed in the edge-detection community, except that they use the F1 score.
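A sketch of the suggested metric, assuming boolean masks and a square structuring element (this is not the repo's implementation):

import numpy as np
from scipy.ndimage import binary_dilation, binary_erosion

def boundary_iou(gt, pred, radius):
    # 1-pixel boundaries of ground truth and prediction (bool arrays).
    gt_edge = gt ^ binary_erosion(gt)
    pr_edge = pred ^ binary_erosion(pred)
    # Dilate both boundaries by the given radius, then compare the bands.
    struct = np.ones((2 * radius + 1, 2 * radius + 1), dtype=bool)
    gt_band = binary_dilation(gt_edge, structure=struct)
    pr_band = binary_dilation(pr_edge, structure=struct)
    inter = np.logical_and(gt_band, pr_band).sum()
    union = np.logical_or(gt_band, pr_band).sum()
    return inter / union if union else 1.0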

How to Convert code to C++

Hi, I really like your code; it works particularly well on our model. We now need to deploy the model to TensorRT in C++. Although a PyTorch model like PSPNet can be converted very quickly, how can a method like process_high_res_im be written with the TensorRT (C++) API? Thanks a lot!

How to train on my own data?

Thanks for your amazing work. It is very useful to me! However, how can I train a model on my own data? I browsed training.md but still did not grasp the key points. Do I have to download the training data you provided in training.md? Thanks for your help.

About sample rate in boundary_modification.py

Thanks for your work. I am trying to understand why you chose such a small value for the sample rate: you modify only contours whose length is more than ten points. I thought a big sample rate would retain more information from a given contour, so can you explain why you chose 0.1 as the sample rate?

Perturbed labels

Dear author:
Is there any data-perturbation code in this package? I didn't find it. Could you send me a copy? Thank you very much!
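The sample-rate question above refers to boundary_modification.py, which is where the repo's perturbation logic lives. As a crude stand-in for experimentation, coarse input segmentations can be faked by randomly dilating or eroding the ground truth; this is far simpler than the repo's method and only a sketch:

import numpy as np
import cv2

def crude_perturb(gt, max_iter=5):
    # Randomly grow or shrink a binary uint8 mask to imitate a coarse
    # input segmentation.
    kernel = np.ones((3, 3), np.uint8)
    it = np.random.randint(1, max_iter + 1)
    if np.random.rand() < 0.5:
        return cv2.dilate(gt, kernel, iterations=it)
    return cv2.erode(gt, kernel, iterations=it)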

ValueError: axes don't match array

If you run test.py from the segmentation-refinement package source, you get the error ValueError: axes don't match array, thrown at this line:

mask = self.seg_transform((mask>127).astype(np.uint8)*255).unsqueeze(0).to(self.device)

A probably important note: before running python test.py, I changed cuda:0 to cpu on the line where the Refiner is created, since I tested on a machine without CUDA.

ResNet modifications

Could you please clarify what modifications are made in ResNet50 backbone and why?

Usually the ResNet50 output stride is 32, but in your version it is 8 (due to dilations in the last layers, I think).
I also noticed that layer3 in your code generates 23 blocks instead of the default 6.

issue of reproducing performance

Hi,

This work is very impressive!
I ran this model using the released checkpoint, and the performance is very good. However, when I tried to train a new model, I was not able to reach the same performance. Could you please tell me which hyperparameters were used to train the released model?

I have tried two settings:

  1. The default setting in hyper_para.py.
  2. batch_size=9, lr=3.0e-4, iterations=60000, steps=30000, 2 GPUs, as indicated in the paper.

Thank you very much!

Test effect of own data

My own data involves semantic segmentation of two categories. Is this algorithm only useful for object segmentation, and not effective for region-level semantic segmentation?

DUT-OMRON Link is not working

Hi, @hkchengrex
DUT-OMRON Link (http://saliencydetection.net/duts/download/) is not working, I wonder if you could provide a google drive link for downloading the dataset? Many thanks.

os.system("wget -P ../tmp_download_files http://saliencydetection.net/duts/download/DUTS-TR.zip")
os.system("wget -P ../tmp_download_files http://saliencydetection.net/duts/download/DUTS-TE.zip")

About Loss Calculation

Hi,

may I ask what the advantage of using L1+L2 loss for supervision at high resolution is?
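One common reading, not a claim about the paper's exact motivation: the L2 term dominates on large errors while the L1 term keeps gradients informative on the small residuals that remain near boundaries. The combination itself is one line:

import torch.nn.functional as F

def l1_l2_loss(pred, target):
    # Combined L1 + L2 supervision on the predicted mask.
    return F.l1_loss(pred, target) + F.mse_loss(pred, target)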

download model error

requests.exceptions.ConnectionError: HTTPSConnectionPool(host='docs.google.com', port=443): Max retries exceeded with url: /uc?export=download&id=103nLN1JQCs2yASkna0HqfioYZO7MA_J9 (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x000001C46728F860>: Failed to establish a new connection: [WinError 10060] A connection attempt failed because the connected party did not properly respond after a period of time, or the established connection failed because the connected host has failed to respond.',))
