xuebinqin / dis Goto Github PK


This is the repo for our new project Highly Accurate Dichotomous Image Segmentation

License: Apache License 2.0

Python 3.00% Jupyter Notebook 97.00%

background-removal deep-learning dichotomous-image-segmentation computer-vision u-2-net

dis's Issues

Error in cache making

Before training, the code builds a cache for each image and its mask, with the folder structure shown in the image:

But during training it asks for images from the cache folder at this location:

Also, the cache is written in .pt format, but during training and testing it asks for .png files.

Weird results

Hello, thank you for the great project. I ran into a problem and would like to ask for help. After running inference on an image I got weird results - hands in mask have a very low value:

RuntimeError: The size of tensor a (4) must match the size of tensor b (3) at non-singleton dimension 1

I ran inference.py on this image: https://drive.google.com/file/d/1YoyuXLSGhu8plA7OYMB7xrMyqV26W2oz/view?usp=sharing

but I got this runtime error: RuntimeError: The size of tensor a (4) must match the size of tensor b (3) at non-singleton dimension 1
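
One possible cause, offered as an assumption rather than a confirmed diagnosis, is that the input image has four channels (RGBA) while a later step expects three. If the file is a PNG, its channel count can be read from the header with the standard library alone. The helper name `png_channel_count` is hypothetical and not part of the DIS repo.

```python
import struct

# Channels per pixel for each PNG color type, as defined by the PNG format.
_PNG_CHANNELS = {0: 1, 2: 3, 3: 1, 4: 2, 6: 4}
_PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_channel_count(header: bytes) -> int:
    """Return the channel count encoded in the first 26 bytes of a PNG file."""
    if len(header) < 26 or header[:8] != _PNG_SIGNATURE:
        raise ValueError("not a PNG header")
    if header[12:16] != b"IHDR":
        raise ValueError("first chunk is not IHDR")
    # IHDR payload starts at byte 16: width (4), height (4), bit depth (1), color type (1).
    _width, _height, _bit_depth, color_type = struct.unpack(">IIBB", header[16:26])
    return _PNG_CHANNELS[color_type]
```

A result of 4 means the image carries an alpha channel. Converting it to RGB before inference, for example with Pillow's `Image.convert("RGB")`, would remove the fourth channel.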

How to get dataset specific to some categories

I need the dataset, but only a few categories. Is there a place where I can find them for the DIS5K dataset?

pretrained weights for gt-encoder

First of all, nice work! Do you mind sharing the pretrained weights for the GT encoder. I did not see the download link for that. Thanks.

How do I make my own dataset mask? Looking forward to your reply.

finding 3000 training units but still saying num_samples =0

This is amazing, but I'm having some trouble with DIS.

Sorry, I'm new at this. It finds 3000 training units but still says num_samples=0.

Error:

d_inference_main.py
/home/jakko/.conda/envs/pytorch18/lib/python3.7/site-packages/torch/nn/reduction.py:42: UserWarning: size_average and reduce args will be deprecated, please use reduction='mean' instead.
warnings.warn(warning.format(ret))
building model...
batch size: 8
--- create training dataloader ---
------------------------------ train --------------------------------
--->>> train dataset 0 / 1 DIS5K-TR <<<---
-im- DIS5K-TR /home/jakko/Pictures/DIS5K/DIS5K/DIS-TR/im : 3000
-gt- DIS5K-TR /home/jakko/Pictures/DIS5K/DIS5K/DIS-TR/gt : 3000
Traceback (most recent call last):
File "train_valid_inference_main.py", line 727, in
hypar=hypar)
File "train_valid_inference_main.py", line 541, in main
shuffle = True)
File "/home/jakko/Github/DIS/IS-Net/data_loader_cache.py", line 97, in create_dataloaders
gos_dataloaders.append(DataLoader(gos_dataset, batch_size=batch_size, shuffle=shuffle, num_workers=num_workers))
File "/home/jakko/.conda/envs/pytorch18/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 266, in __init__
sampler = RandomSampler(dataset, generator=generator) # type: ignore
File "/home/jakko/.conda/envs/pytorch18/lib/python3.7/site-packages/torch/utils/data/sampler.py", line 104, in __init__
"value, but got num_samples={}".format(self.num_samples))
ValueError: num_samples should be a positive integer value, but got num_samples=0

--------------- STEP 1: Configuring the Train, Valid and Test datasets ---------------

## configure the train, valid and inference datasets
train_datasets, valid_datasets = [], []
dataset_1, dataset_1 = {}, {}

dataset_tr = {"name": "DIS5K-TR",
             "im_dir": "/home/jakko/Pictures/DIS5K/DIS5K/DIS-TR/im",
             "gt_dir": "/home/jakko/Pictures/DIS5K/DIS5K/DIS-TR/gt",
             "im_ext": ".jpg",
             "gt_ext": ".png",
             "cache_dir":"../DIS5K-Cache/DIS-TR"}

dataset_vd = {"name": "DIS5K-VD",
             "im_dir": "/home/jakko/Pictures/DIS5K/DIS5K/DIS-VD/im",
             "gt_dir": "/home/jakko/Pictures/DIS5K/DIS5K/DIS-VD/gt",
             "im_ext": ".jpg",
             "gt_ext": ".png",
             "cache_dir":"../DIS5K-Cache/DIS-VD"}

dataset_te1 = {"name": "DIS5K-TE1",
             "im_dir": "/home/jakko/Pictures/DIS5K/DIS5K/DIS-TE1/im",
             "gt_dir": "/home/jakko/Pictures/DIS5K/DIS5K/DIS-TE1/gt",
             "im_ext": ".jpg",
             "gt_ext": ".png",
             "cache_dir":"../DIS5K-Cache/DIS-TE1"}

dataset_te2 = {"name": "DIS5K-TE2",
             "im_dir": "/home/jakko/Pictures/DIS5K/DIS5K/DIS-TE2/im",
             "gt_dir": "/home/jakko/Pictures/DIS5K/DIS5K/DIS-TE2/gt",
             "im_ext": ".jpg",
             "gt_ext": ".png",
             "cache_dir":"../DIS5K-Cache/DIS-TE2"}

dataset_te3 = {"name": "DIS5K-TE3",
             "im_dir": "/home/jakko/Pictures/DIS5K/DIS5K/DIS-TE3/im",
             "gt_dir": "/home/jakko/Pictures/DIS5K/DIS5K/DIS-TE3/gt",
             "im_ext": ".jpg",
             "gt_ext": ".png",
             "cache_dir":"../DIS5K-Cache/DIS-TE3"}

dataset_te4 = {"name": "DIS5K-TE4",
             "im_dir": "/home/jakko/Pictures/DIS5K/DIS5K/DIS-TE4/im",
             "gt_dir": "/home/jakko/Pictures/DIS5K/DIS5K/DIS-TE4/gt",
             "im_ext": ".jpg",
             "gt_ext": ".png",
             "cache_dir":"../DIS5K-Cache/DIS-TE4"}
### test your own dataset
dataset_demo = {"name": "your-dataset",
             "im_dir": "../your-dataset/im",
             "gt_dir": "",
             "im_ext": ".jpg",
             "gt_ext": "",
             "cache_dir":"../your-dataset/cache"}

train_datasets = [dataset_tr] ## users can create multiple dictionaries to set a list of datasets as the training set
# valid_datasets = [dataset_vd] ## users can create multiple dictionaries to set a list of datasets as validation or inference sets
valid_datasets = [dataset_vd] # dataset_vd, dataset_te1, dataset_te2, dataset_te3, dataset_te4] # and hypar["mode"] = "valid" for inference,

### --------------- STEP 2: Configuring the hyperparameters for training, validation and inference ---------------
hypar = {}

## -- 2.1. configure the model saving or restoring path --
hypar["mode"] = "train"
## "train": for training,
## "valid": for validation and inference,
## in "valid" mode, it will calculate the accuracy as well as save the prediction results into hypar["valid_out_dir"], which shouldn't be ""
## otherwise only accuracy will be calculated and no predictions will be saved
hypar["interm_sup"] = False ## indicates whether to activate intermediate feature supervision

if hypar["mode"] == "train":
    hypar["valid_out_dir"] = "" ## for "train" model leave it as "", for "valid"("inference") mode: set it according to your local directory
    hypar["model_path"] ="/home/jakko/Github/DIS/saved_models/your_model_weights" ## model weights saving (or restoring) path
    hypar["restore_model"] = "" ## name of the segmentation model weights .pth for resume training process from last stop or for the inferencing
    hypar["start_ite"] = 0 ## start iteration for the training, can be changed to match the restored training process
    hypar["gt_encoder_model"] = ""
else: ## configure the segmentation output path and the to-be-used model weights path
    hypar["valid_out_dir"] = "../your-results/"##"../DIS5K-Results-test" ## output inferenced segmentation maps into this fold
    hypar["model_path"] = "/home/jakko/Github/DIS/saved_models/your_model_weights" ## load trained weights from this path
    hypar["restore_model"] = "isnet.pth"##"isnet.pth" ## name of the to-be-loaded weights
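
The log above finds 3000 files in both `im` and `gt`, yet the sampler sees zero samples. A quick sanity check, sketched here under the assumption that images and masks are paired by file stem using the `im_ext` and `gt_ext` values from the config, is to count the pairs directly. `count_pairs` is a hypothetical helper, not a function from the repo, and it does not reproduce the repo's caching logic.

```python
from pathlib import Path


def count_pairs(im_dir: str, gt_dir: str, im_ext: str = ".jpg", gt_ext: str = ".png") -> int:
    """Count images in im_dir that have a mask with the same stem in gt_dir."""
    im_stems = {p.stem for p in Path(im_dir).glob(f"*{im_ext}")}
    gt_stems = {p.stem for p in Path(gt_dir).glob(f"*{gt_ext}")}
    return len(im_stems & gt_stems)
```

If this returns the expected count, the mismatch lies elsewhere, possibly in a stale or empty cache directory (again an assumption, not something the log confirms).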

Transfer learning for multiple class segmentation

Hello, thank you for the amazing work @xuebinqin. Can I do multi-class segmentation with IS-Net as I do with U-Net? I have 12 classes and 12k images in total.
Thanks.

Clarifying Inference by relying on your-dataset

It seems the code is ready to be used with the your-dataset directory thanks to dataset_demo. It might be easier for people only interested in "validating" with their own data (images) to rely on that. It would also be good to mention that the conda environment must be activated. Finally, and again this is mostly for newcomers who are not familiar with this pipeline, relying on Anaconda and Docker can make the rest of the setup much easier. Anyway, thanks for the work, can't wait to try locally with V2.

Channel mismatch error when converting the code to do multi-class segmentation.

I have made changes to your code to do multiclass segmentation. I have taken care that labels are read right. I have already done this job successfully for your last masterpiece, U2NET. In ISNET, I am getting this error when giving a four-channel multiclass label.

RuntimeError: Given groups=1, weight of size [16, 1, 3, 3], expected input[1, 4, 256, 256] to have 1 channels, but got 4 channels instead

I faced the same error in U2Net, but then this code fix at line 354 and line 459 in https://github.com/xuebinqin/U-2-Net/blob/master/model/u2net.py solved it -

   - self.outconv = nn.Conv2d(6,out_ch,1)
   + self.outconv = nn.Conv2d(6*out_ch,out_ch,1)

Please check if ISNet needs a similar fix.

Thankful for your incredible work.

End notes
I have set out_channel to 4
I am taking care of gt_preprocess and ensuring it returns a four-channel label.

Different results between HuggingFace demo and running it locally

I'm a bit confused. The ReadMe says "Notes: Currently, this model is the academic version that was trained with DIS V1.0" about HuggingFace. The inference section only mentions "the pre-trained weights" but no version. As I get different results, I wonder what is the version for both. Are the downloadable pretrained weights V1 or unoptimized V2?

keep getting this when run your code

I keep getting this when I run your code:
reduction.py:42: UserWarning: size_average and reduce args will be deprecated, please use reduction='mean' instead.
warnings.warn(warning.format(ret))
0it [00:00, ?it/s]

Multi gpu training

How can we perform multi-gpu training? Should we use DataParallel class of pytorch, or is there a parameter in the code for it?

Weird result after 30 epochs

Hi guys! Please advise if you know what is happening and have trained the network yourself.
The problem: I trained DIS on the ADE20K dataset with only the floor category (mask value 0, or 255 for the floor), and after more than 30 epochs (about 25,000 images per epoch) I got this result (please see the screenshots).
The loss decreased, but it would not go below 3.0.
Does anyone know what is wrong? Thank you!

U2Net accuracy performs better than DIS on almost every image

Hey guys, loved the paper!

If I understand correctly, DIS should be the successor of U2Net. In the tests I've made, DIS seems to deliver higher-quality masks yet less accurate results than U2Net.
I benchmarked both of them and used DIS V2 general use model.

Is this the expected behavior?

can everyone help me? when I use PIL save matting

When I use PIL to save the matting result, I get an error!

    mask = (result*255).permute(1,2,0).cpu().data.numpy().astype(np.uint8)
    io.imsave(os.path.join(result_path,im_name+".png"),mask)

    pil_mask = Image.fromarray(mask).convert("L")  # this line raises the error
    im_rgb = Image.open(im_path).convert("RGB")

    im_rgba = im_rgb.copy()
    im_rgba.putalpha(pil_mask)
    im_rgba.show()

About training

Hello author, thank you very much for your work on DIS. I have a question about the training in this paper:

After I overfitted ISNetGTEncoder, I went on to train ISNetDIS, but I found that the loss for the fs and dfs feature layers is very large. What was your initial loss at that stage?

Instructions

Can you please create better detailed instructions in the README on how to perform inference and training? I am running into many errors. If you do this, I would contribute a Docker container for better reproducibility for this.

hx = self.pool_in(hxin) hx is not in use?

class ISNetDIS(nn.Module):

In this class, the hx variable is not used. Is this a bug in the code?

Difference between isnet.pth and isnet-general-use.pth

Thanks for your great work.

Assuming that the dataset is the same, I'd like to know what are the training differences between isnet.pth and isnet-general-use.pth, as the latter works much better than the first.
Thanks

data release inquiry

Hi, xuebinqin.

I have been following your great work since U-2-Net, I really appreciate it and learn a lot from it

I just found out you are about to release your DIS repo.

I was wondering when you plan to release it and how you got your segmentation data using GIMP.

Can you let me know what procedure you used to get the segmentation data? Did you use a specific GIMP plugin (U-2-Net) or guided segmentation?

I am planning to collect high-quality segmentation data myself and am wondering how you went about it.

Thanks in advance.

Implementation of [muti_loss_fusion_max] Missing in isnet.py

Hi @xuebinqin,

I hope you are doing well, and thank you for the great work. I ran into one issue: the implementation of [muti_loss_fusion_max] is missing at line 435. Could you please tell me the reason and how to solve it? Please find attached.

name 'muti_loss_fusion' is not defined

If we keep hypar["interm_sup"] = False, the model fails to start training with the error name 'muti_loss_fusion' is not defined, and the function is indeed not defined anywhere in the code. It helped to replace loss2, loss = muti_loss_fusion(ds, labels_v) with net.compute_loss(ds, labels_v), but I don't know whether that is completely correct.

Discrepancy between paper and implementation of maxpool

The paper says:
"The input convolution layer is set as a plain convolution layer with a kernel size of 3×3 and stride of 2. Given an input image with a shape of 1024×1024×3, the input convolution layer first transforms it to a feature map 512×512×64 and this feature map is then directly fed to the original U2-Net, where the input channel is changed to 64 correspondingly".

However in implementation:

self.conv_in = nn.Conv2d(in_ch,64,3,stride=2,padding=1)
self.pool_in = nn.MaxPool2d(2,stride=2,ceil_mode=True)

This implies that for a 1024X1024 input image, the U2Net architecture receives a 256X256 input and, since the architecture is symmetrical, produces outputs at this reduced resolution. Am I right about this, and if so, what is the impact on the detail of the results?
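
The arithmetic behind the question can be checked with the standard output-size formulas for convolution and pooling. This is a minimal sketch, assuming dilation 1 and no pooling padding. The helper names are hypothetical, and the code mirrors the two layers quoted above without running the actual model.

```python
import math


def conv2d_out(size: int, kernel: int, stride: int, padding: int) -> int:
    """Output length of a convolution along one spatial axis (dilation 1)."""
    return (size + 2 * padding - kernel) // stride + 1


def maxpool2d_out(size: int, kernel: int, stride: int, ceil_mode: bool = False) -> int:
    """Output length of max pooling along one axis (no padding, dilation 1)."""
    span = (size - kernel) / stride
    return (math.ceil(span) if ceil_mode else math.floor(span)) + 1


# conv_in: nn.Conv2d(in_ch, 64, 3, stride=2, padding=1)
after_conv = conv2d_out(1024, kernel=3, stride=2, padding=1)
# pool_in: nn.MaxPool2d(2, stride=2, ceil_mode=True)
after_pool = maxpool2d_out(after_conv, kernel=2, stride=2, ceil_mode=True)
```

Under these formulas the 1024 input becomes 512 after `conv_in` and 256 after `pool_in`, which matches the reading in the question.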

Where can i download dataset 2.0?

The link you tagged on DIS dataset V2.0 is just a page reload (or just links to DIS github page).

Please check this issue,
and give me an update with it.

Thanks

Is the Pnum added in the image-name ?

I was wondering if the complexity of the mask (PNum) was in the image name.

Thanks for the quality of this datasets,

Best regards,

DIS by general use model is pretty good!

Hi,
I have adapted the DIS general-use model for my iPhone app, ClipEdge.

I imagined that without V2 the training would not yet be optimized for humans and animals, but contrary to my expectations, it recognizes humans and animals well too.
Does this mean that as long as they are recognized as the central object, there is no problem?

I made a Colab quick inference demo.

Hello, thank you for the great work (also U2-net).
I made a quick inference demo of IS-Net:

https://colab.research.google.com/drive/1PVDn3o3Ni2ZAeKpuqphdSR8-sYJ478vn?usp=sharing

And a CoreML version for iOS:
https://github.com/john-rocky/CoreML-Models#is-net

Thank you.

Is it possible for the model to work in co-salient mode ?

I have a bunch of images on which I want to run co-salient detection, so that only the object common to all the images is segmented.

Can this model do something like that?

Hosting pretrained model weights on Hugging Face

Hi there! Would there be any interest in hosting the pretrained model weights on the Hugging Face Hub? That way they would be discoverable for our users, and it would also make it easier for people to use the weights. You could add all kinds of info for the model, like in this one https://huggingface.co/bigscience/bloom.

We have documentation for doing this (https://huggingface.co/docs/hub/models-uploading), but I'm also more than happy to help out!

Mark error

Hi, xuebinqin.

I have been following your great work since U-2-Net, I really appreciate it and learn a lot from it.

When reading 'Highly Accurate Dichotomous Image Segmentation', the 'one click needed' here should correspond to figure (c), which you marked as figure (b) in the paper.

I apologize if I misunderstood.

How much time does it take to train one class on a consumer GPU (1080 Ti or 3080)?

Please indicate approximately how long it takes to train DIS on a consumer GPU.

a bit confused about the theory

Hi,

I understand that you encode the GT into high-dimensional features using a model F_gt, but I am confused about the formula for the GT encoder in the paper:
argmin(BCE(F_gt(theta, GT), GT))
...basically, you train a GT encoder that minimizes the BCE loss between each channel of the feature maps and the GT.

But my question is: why do we need to build a model to encode the GT? We could simply repeat the GT K times to simulate those feature maps instead of using your feature-map encoding, and supervise the U2-Net as usual.

Because what you are doing now is: first minimize ||GT-F_gt_encoder||, then minimize ||I_img_encoder - F_gt_encoder||,
but to me ||I_img_encoder - repeated_GT|| does the same thing?

Besides, a minor question: do you use any LR scheduling strategy, or do you use 1e-3 throughout all the training stages? Just to confirm, since I cannot find this information in your paper or code.

Thanks and best regards

converted to onnx and tensorrt

Good paper. I wrote two inference demos, for onnxruntime and TensorRT; see https://github.com/xuanandsix/DIS-onnxruntime-and-tensorrt-demo.

Memory leakage during training

After running train_valid_inference_main.py, the free storage on the PC decreases, which suggests a memory leak. To prevent it, I tried the following two lines in the script, but they did not seem to help.

gc.collect()

torch.cuda.empty_cache()

Side note: batch size = 2, and the code is running in PyCharm.
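
Shrinking storage usually points to disk usage rather than a memory leak. Another issue in this list notes that the pipeline writes `.pt` cache files, so one plausible explanation (an assumption, not confirmed by the authors) is that the `cache_dir` folders are growing. A minimal sketch to measure a folder's size follows; `dir_size_bytes` is a hypothetical helper.

```python
import os


def dir_size_bytes(root: str) -> int:
    """Sum the sizes of all regular files under root, including subfolders."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path):
                total += os.path.getsize(path)
    return total
```

Calling it on a configured cache directory, for example `dir_size_bytes("../DIS5K-Cache/DIS-TR")`, before and after a run would show whether the cache accounts for the lost space.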

GT_Encoder f1_score always 0

I tried overfitting on a small set of images (24) to see whether the issue is real. The f1 array, and hence np.amax(f1), both come out as 0. Any solutions?

V2 ETA?

Great project. When will you approximately release the v2 dataset?

About the preparation of the DIS dataset and the next research topic.

@xuebinqin Hi, could you tell us a bit about how you prepared and annotated the V1 and V2 data?

For the last 10 days, I have been annotating some high res data for my validation dataset. I got 66 images annotated carefully. I worked on pre-predicted masks and fixed them. Thus, the process was easier for me. Although I did a little work, the burden was huge.

So, how did you manage to generate or annotate the data? It doesn't look like any artificial (or rendered) data to me.

Also, do you have any more ongoing research on image segmentation topic? What's next?

Thank you!

advice on how to apply masks to remove bg from images

Thanks so much for the model!
I'm new to image processing, and I'm not sure what the best approach is to applying the mask to an image to remove the background. I've been using cv2, and while the masks are great, the edges are a bit jagged in the final images. Is there a best practice for this step? Any advice would be greatly appreciated!
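
Jagged edges often come from thresholding the mask to pure black and white before applying it. A common alternative, sketched here as a general technique rather than the authors' recommendation, is to keep the mask's gray values and use them as a per-pixel alpha. `composite_pixel` is a hypothetical helper that shows the arithmetic for a single pixel.

```python
def composite_pixel(fg, bg, mask_value):
    """Blend one RGB foreground pixel over a background pixel.

    mask_value is the predicted mask intensity in 0..255, used as alpha.
    """
    alpha = mask_value / 255.0
    return tuple(round(f * alpha + b * (1.0 - alpha)) for f, b in zip(fg, bg))
```

The same idea applies to whole images: attach the unthresholded mask as the alpha channel, as Pillow's `putalpha` does, instead of cutting the image with a binary mask.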

training on multi-object images [ more than one object ]

Hi Xuebin Qin,
Great work. Thanks.
Question 1: Is it OK to build/train the model on multi-object images, or does each image have to contain a single object?

Question 2: Training on Windows 10 failed message:
DIS\IS-Net>python train_valid_inference_main.py

building model...
batch size: 8
--- create training dataloader ---
------------------------------ train --------------------------------
--->>> train dataset 0 / 1 DIS5K-TR <<<---
-im- DIS5K-TR ../DIS5K/DIS-TR/im : 959
-gt- DIS5K-TR ../DIS5K/DIS-TR/gt : 959
Traceback (most recent call last):
File "train_valid_inference_main.py", line 722, in
main(train_datasets,
File "train_valid_inference_main.py", line 527, in main
train_dataloaders, train_datasets = create_dataloaders(train_nm_im_gt_list,
File "DIS\IS-Net\data_loader_cache.py", line 95, in create_dataloaders
gos_dataloaders.append(DataLoader(gos_dataset, batch_size=batch_size, shuffle=shuffle, num_workers=num_workers_))
File "Anaconda3\lib\site-packages\torch\utils\data\dataloader.py", line 268, in __init__
sampler = RandomSampler(dataset, generator=generator)
File "Anaconda3\lib\site-packages\torch\utils\data\sampler.py", line 102, in __init__
raise ValueError("num_samples should be a positive integer "
ValueError: num_samples should be a positive integer value, but got num_samples=0

Thanks,
Gabew

Training with DUTS Datasets

Hi, I've been training the model with DUTS datasets.

As you mentioned in the paper, DUTS images are relatively smaller than DIS5K images.

I have customized the layers as mentioned, but the outputs are not satisfying.

My question is:
How can I improve the output on datasets with such small images?

Also, if DIS is trained on 4K high-resolution datasets, what happens when testing at a resolution lower than 4K?

Thanks.

Tutorial on making my own dataset

I am looking forward to your dataset production method. When will a general tutorial be released? Thank you very much.

#16

Last time it was said to be coming out, but it has not been released for a long time. Is a tutorial planned in the coming days? I look forward to your tutorial on making datasets.

Cannot load

Whenever trying to load the new model I am getting:

--> 764     magic_number = pickle_module.load(f, **pickle_load_args)
    765     if magic_number != MAGIC_NUMBER:
    766         raise RuntimeError("Invalid magic number; corrupt file?")

UnpicklingError: invalid load key, '<'.

I have tried with Python 3.7 and newer versions, and with different PyTorch versions.
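
An `invalid load key, '<'` error means the first byte of the file is `<`, which is typical of an HTML page saved in place of the weights (for example a download or sign-in page). This is an assumption about the cause, not a confirmed diagnosis. A small check, with `looks_like_html` as a hypothetical helper:

```python
def looks_like_html(path: str, probe: int = 256) -> bool:
    """Return True if the file starts like an HTML document rather than binary weights."""
    with open(path, "rb") as f:
        # Ignore leading whitespace and letter case before comparing.
        head = f.read(probe).lstrip().lower()
    return head.startswith((b"<!doctype html", b"<html"))
```

If the check returns True, downloading the weights file again and comparing its size with the expected one would be the natural next step.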

GT_Encoder validated against training dataset

Hi,
looking at train_valid_inference_main.py (lines 131-132), GT_Encoder training is validated against the training dataset instead of the validation dataset (which has been commented out). Why did you make this choice?
Thanks

docker and ideally cog support

Hi, relying on Docker and based on a continuumio/anaconda3 image, I ran DIS and was able (for fun) to make a little desktop utility https://twitter.com/utopiah/status/1554097779013713923 to grab images from anywhere, like videos I'm watching.

Anyway this was a rather straightforward process as I'm relatively familiar with Docker but this morning I discovered https://github.com/replicate/cog which seems to address specifically the challenge of reproducibility in ML. It could be interesting to consider support for it.

Question about the logic behind GT mask encoder

Hi @xuebinqin ,

I'm having issues understanding the logic behind the GT mask encoder. Generally, self-supervision encoders do not have skip connections between encoding and decoding layers. Again typically, the aim is to have a bottleneck at the middle part, have the model learn the abstract meaningful concepts, and get the most meaningful information from it. However, in ISNET model, you used an encoder consisting of RSU blocks which is like:

I believe there is a chance that the GT encoder can pass all the information from the input to the output without compressing or processing any high-level info since there are skip connections. I see that this was not the case in your training.

May I ask why this was not the case?
How did you develop this idea even though I think there is a chance that it wouldn't do any good at all?
Finally, why do you think this increased the performance of the network? Is it all about forcing the model to learn more stuff instead of learning one map?

I just wonder about your thoughts on this which is very important to me. Thank you!

Questions regarding GT encoder

Hi,

Thanks for the interesting work.

Just reviewing the paper here, I've got several questions regarding GT encoder.

You described GT encoder to be self-supervised -- Did you mean this as implementing auto-encoder?
In Figure 5(b), the depicted Ground Truth Encoder has only the encoder part -- does this mean that I only need to train the encoder part (not decoder), targeting the GT?
Again in Figure 5(b), do I need to do extra upsampling so that the encoder's result has the same size as the input? For clarity: if I feed 3X1024X1024 as input, after the green conv layer it becomes 16X512X512. After going through EN_1, it becomes 64X512X512 and is upsampled to 3X512X512 (assuming the upsampling used for U2-Net is applied here in the same manner). Now, the question is how we can compare the upsampled result of EN_1 (3X512X512) with the original input (3X1024X1024) in the BCE loss calculation?

-- One thought I had was temporarily adding extra upsampling layers for encoders while training the GT-encoder and remove those upsampling layers once I freeze the weights for GT-encoder. Would this be a viable option or did you mean something else?

Thanks in advance :)

Ideal Image size for inference

Hi,

Thanks for this great work and for uploading the trained weights.

I'm using your pretrained model for inference and I don't really get the purpose of the cache_size parameter.

I figured it means resizing the image before inference, which could be useful for running on big images with low GPU memory.

The thing is, for some smaller images the results look better when upscaling them.
In this example of size 450x450, the first (bad) results are from leaving the cache_size parameter blank (no resize?),
and the second (good) results are from using cache_size=[1024,1024].

Can you explain the purpose of this parameter?

Instance Segmentation

Is it possible to perform instance segmentation rather than dichotomous segmentation with similar architecture?

Model not learning

Hi,

Firstly, thanks for the release.

I've tried training on my custom dataset, but the model is not learning and the loss takes extremely large values.
After 40k epochs, training finished due to early stopping and I got: Train Loss = -7823, Val_Loss = -46939

This is the result:

In my training dataset, there are around 7k images for vehicles. I trained without using the pre-trained model.

Any Suggestions? Thanks in Advance
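
Binary cross-entropy cannot be negative when both predictions and targets lie in [0, 1], so a loss of -7823 suggests the targets fall outside that range, for instance masks left at 0..255 instead of being scaled to 0..1. This is a guess at the cause, under the assumption that the loss is BCE-based as discussed in other issues here. The sketch shows the effect on a single pixel; `bce` is a hypothetical helper.

```python
import math


def bce(pred: float, target: float) -> float:
    """Binary cross-entropy for one pixel; pred must be strictly between 0 and 1."""
    return -(target * math.log(pred) + (1.0 - target) * math.log(1.0 - pred))


scaled = bce(0.9, 1.0)      # target in [0, 1]: small positive loss
unscaled = bce(0.9, 255.0)  # target left at 255: large negative loss
```

Checking the minimum and maximum of a ground-truth batch right before the loss call would confirm or rule out this explanation.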
