xuebinqin / dis
This is the repo for our new project Highly Accurate Dichotomous Image Segmentation
License: Apache License 2.0
I ran inference.py on this image: https://drive.google.com/file/d/1YoyuXLSGhu8plA7OYMB7xrMyqV26W2oz/view?usp=sharing
but I got this runtime error: RuntimeError: The size of tensor a (4) must match the size of tensor b (3) at non-singleton dimension 1
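A 4-versus-3 mismatch on dimension 1 is what one would expect if the image carries an alpha channel (RGBA) while the normalization uses three-channel mean and std values. That is an assumption about this particular image, not something confirmed in the thread. A minimal sketch that forces any loaded array to three channels before preprocessing; `to_three_channels` is a hypothetical helper, not part of the DIS code:

```python
import numpy as np

def to_three_channels(im):
    """Return an (H, W, 3) array from a grayscale, RGB, or RGBA image array."""
    im = np.asarray(im)
    if im.ndim == 2:                  # grayscale (H, W): add a channel axis
        im = im[:, :, np.newaxis]
    if im.shape[2] == 1:              # single channel: repeat it three times
        im = np.repeat(im, 3, axis=2)
    return im[:, :, :3]               # RGBA: drop the alpha channel

rgba = np.zeros((4, 5, 4), dtype=np.uint8)
print(to_three_channels(rgba).shape)  # (4, 5, 3)
```

The helper would be applied to the array right after the image is read and before it is converted to a tensor.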
I only need a few categories of the dataset. Is there anywhere I can find them separately for the DIS5K dataset?
First of all, nice work! Do you mind sharing the pretrained weights for the GT encoder? I did not see the download link for that. Thanks.
This is amazing, but I'm having some trouble with DIS.
Sorry, I'm new at this. It finds 3000 training samples but still reports num_samples=0.
d_inference_main.py
/home/jakko/.conda/envs/pytorch18/lib/python3.7/site-packages/torch/nn/reduction.py:42: UserWarning: size_average and reduce args will be deprecated, please use reduction='mean' instead.
warnings.warn(warning.format(ret))
building model...
batch size: 8
--- create training dataloader ---
------------------------------ train --------------------------------
--->>> train dataset 0 / 1 DIS5K-TR <<<---
-im- DIS5K-TR /home/jakko/Pictures/DIS5K/DIS5K/DIS-TR/im : 3000
-gt- DIS5K-TR /home/jakko/Pictures/DIS5K/DIS5K/DIS-TR/gt : 3000
Traceback (most recent call last):
  File "train_valid_inference_main.py", line 727, in <module>
    hypar=hypar)
  File "train_valid_inference_main.py", line 541, in main
    shuffle = True)
  File "/home/jakko/Github/DIS/IS-Net/data_loader_cache.py", line 97, in create_dataloaders
    gos_dataloaders.append(DataLoader(gos_dataset, batch_size=batch_size, shuffle=shuffle, num_workers=num_workers))
  File "/home/jakko/.conda/envs/pytorch18/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 266, in __init__
    sampler = RandomSampler(dataset, generator=generator) # type: ignore
  File "/home/jakko/.conda/envs/pytorch18/lib/python3.7/site-packages/torch/utils/data/sampler.py", line 104, in __init__
    "value, but got num_samples={}".format(self.num_samples))
ValueError: num_samples should be a positive integer value, but got num_samples=0
## configure the train, valid and inference datasets
train_datasets, valid_datasets = [], []
dataset_1, dataset_1 = {}, {}
dataset_tr = {"name": "DIS5K-TR",
"im_dir": "/home/jakko/Pictures/DIS5K/DIS5K/DIS-TR/im",
"gt_dir": "/home/jakko/Pictures/DIS5K/DIS5K/DIS-TR/gt",
"im_ext": ".jpg",
"gt_ext": ".png",
"cache_dir":"../DIS5K-Cache/DIS-TR"}
dataset_vd = {"name": "DIS5K-VD",
"im_dir": "/home/jakko/Pictures/DIS5K/DIS5K/DIS-VD/im",
"gt_dir": "/home/jakko/Pictures/DIS5K/DIS5K/DIS-VD/gt",
"im_ext": ".jpg",
"gt_ext": ".png",
"cache_dir":"../DIS5K-Cache/DIS-VD"}
dataset_te1 = {"name": "DIS5K-TE1",
"im_dir": "/home/jakko/Pictures/DIS5K/DIS5K/DIS-TE1/im",
"gt_dir": "/home/jakko/Pictures/DIS5K/DIS5K/DIS-TE1/gt",
"im_ext": ".jpg",
"gt_ext": ".png",
"cache_dir":"../DIS5K-Cache/DIS-TE1"}
dataset_te2 = {"name": "DIS5K-TE2",
"im_dir": "/home/jakko/Pictures/DIS5K/DIS5K/DIS-TE2/im",
"gt_dir": "/home/jakko/Pictures/DIS5K/DIS5K/DIS-TE2/gt",
"im_ext": ".jpg",
"gt_ext": ".png",
"cache_dir":"../DIS5K-Cache/DIS-TE2"}
dataset_te3 = {"name": "DIS5K-TE3",
"im_dir": "/home/jakko/Pictures/DIS5K/DIS5K/DIS-TE3/im",
"gt_dir": "/home/jakko/Pictures/DIS5K/DIS5K/DIS-TE3/gt",
"im_ext": ".jpg",
"gt_ext": ".png",
"cache_dir":"../DIS5K-Cache/DIS-TE3"}
dataset_te4 = {"name": "DIS5K-TE4",
"im_dir": "/home/jakko/Pictures/DIS5K/DIS5K/DIS-TE4/im",
"gt_dir": "/home/jakko/Pictures/DIS5K/DIS5K/DIS-TE4/gt",
"im_ext": ".jpg",
"gt_ext": ".png",
"cache_dir":"../DIS5K-Cache/DIS-TE4"}
### test your own dataset
dataset_demo = {"name": "your-dataset",
"im_dir": "../your-dataset/im",
"gt_dir": "",
"im_ext": ".jpg",
"gt_ext": "",
"cache_dir":"../your-dataset/cache"}
train_datasets = [dataset_tr] ## users can create multiple dictionaries to use a list of datasets as the training set
# valid_datasets = [dataset_vd] ## users can create multiple dictionaries to use a list of datasets as validation sets or inference sets
valid_datasets = [dataset_vd] # dataset_vd, dataset_te1, dataset_te2, dataset_te3, dataset_te4] # and hypar["mode"] = "valid" for inference,
### --------------- STEP 2: Configuring the hyperparameters for training, validation and inference ---------------
hypar = {}
## -- 2.1. configure the model saving or restoring path --
hypar["mode"] = "train"
## "train": for training,
## "valid": for validation and inference,
## in "valid" mode, it will calculate the accuracy and also save the prediction results into hypar["valid_out_dir"], which shouldn't be ""
## otherwise only accuracy will be calculated and no predictions will be saved
hypar["interm_sup"] = False ## indicates whether to activate intermediate feature supervision
if hypar["mode"] == "train":
    hypar["valid_out_dir"] = "" ## for "train" mode leave it as "", for "valid" ("inference") mode set it according to your local directory
    hypar["model_path"] ="/home/jakko/Github/DIS/saved_models/your_model_weights" ## model weights saving (or restoring) path
    hypar["restore_model"] = "" ## name of the segmentation model weights .pth, for resuming training from the last stop or for inference
    hypar["start_ite"] = 0 ## start iteration for the training, can be changed to match the restored training process
    hypar["gt_encoder_model"] = ""
else: ## configure the segmentation output path and the to-be-used model weights path
    hypar["valid_out_dir"] = "../your-results/"##"../DIS5K-Results-test" ## output inferenced segmentation maps into this folder
    hypar["model_path"] = "/home/jakko/Github/DIS/saved_models/your_model_weights" ## load trained weights from this path
    hypar["restore_model"] = "isnet.pth"##"isnet.pth" ## name of the to-be-loaded weights
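The thread does not say what causes num_samples=0 here. One thing that may be worth ruling out is that every image in im_dir actually has a same-named mask in gt_dir, since a file count of 3000 on both sides does not by itself prove the names line up. A diagnostic sketch using only the standard library; `unpaired_images` is a hypothetical helper and the pairing-by-file-stem rule is an assumption about how the loader matches files:

```python
import os

def unpaired_images(im_dir, gt_dir, im_ext=".jpg", gt_ext=".png"):
    """List image file names in im_dir that have no same-named mask in gt_dir."""
    missing = []
    for name in sorted(os.listdir(im_dir)):
        stem, ext = os.path.splitext(name)
        if ext.lower() != im_ext:
            continue                      # ignore files with another extension
        if not os.path.isfile(os.path.join(gt_dir, stem + gt_ext)):
            missing.append(name)
    return missing
```

Calling it with the "im_dir", "gt_dir", "im_ext" and "gt_ext" values from dataset_tr should return an empty list if the pairs are complete.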
Hello, thank you for the amazing work @xuebinqin. I'm wondering whether I can do multi-class segmentation with IS-Net as I do with U-Net. I have 12 classes and 12k images in total.
Thanks.
It seems the code is ready to be used with the your-dataset directory thanks to dataset_demo. It might be easier for people only interested in "validating" with their own data (images) to rely on that. It could also be good to mention that the conda environment must be activated. Finally, and again this is mostly for newcomers who are not familiar with this pipeline, relying on Anaconda and Docker can make the rest of the setup much easier. Anyway, thanks for the work, can't wait to try locally with V2.
I have made changes to your code to do multiclass segmentation. I have taken care that labels are read right. I have already done this job successfully for your last masterpiece, U2NET. In ISNET, I am getting this error when giving a four-channel multiclass label.
RuntimeError: Given groups=1, weight of size [16, 1, 3, 3], expected input[1, 4, 256, 256] to have 1 channels, but got 4 channels instead
I faced the same error in U2Net, but then this code fix at line 354 and line 459 in https://github.com/xuebinqin/U-2-Net/blob/master/model/u2net.py solved it -
- self.outconv = nn.Conv2d(6,out_ch,1)
+ self.outconv = nn.Conv2d(6*out_ch,out_ch,1)
Please check if ISNet needs a similar fix.
Thankful for your incredible work.
End notes
I have set out_channel to 4
I am taking care of gt_preprocess and ensuring it returns a four-channel label.
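Two shape facts can be read off the messages above. The U2Net fix follows from concatenating six side outputs that each carry out_ch channels. The ISNet error, on the other hand, names a weight of size [16, 1, 3, 3], which is a 3x3 convolution expecting 1 input channel, so it is probably raised by a different layer than the fusion convolution; which layer that is in IS-Net is not confirmed here. A sketch of both checks, where the NumPy arrays stand in for tensors and both function names are hypothetical:

```python
import numpy as np

def fused_channels(num_side_outputs, out_ch, height=8, width=8):
    """Channel count after concatenating the side outputs along axis 1."""
    sides = [np.zeros((1, out_ch, height, width), dtype=np.float32)
             for _ in range(num_side_outputs)]
    return np.concatenate(sides, axis=1).shape[1]

def conv_accepts(weight_shape, input_shape, groups=1):
    """Conv2d weights are laid out as (out_ch, in_ch // groups, kH, kW)."""
    return input_shape[1] == weight_shape[1] * groups

print(fused_channels(6, 4))                           # 24
print(conv_accepts((16, 1, 3, 3), (1, 4, 256, 256)))  # False
```

With out_ch = 4 the fused tensor has 24 channels, which matches the `6*out_ch` in the U2Net patch, while the second check reproduces the mismatch reported for ISNet.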
I'm a bit confused. The README says "Notes: Currently, this model is the academic version that was trained with DIS V1.0" about HuggingFace. The inference section only mentions "the pre-trained weights" without a version. As I get different results, I wonder which version each of them is. Are the downloadable pretrained weights V1 or unoptimized V2?
I keep getting this when I run your code: reduction.py:42: UserWarning: size_average and reduce args will be deprecated, please use reduction='mean' instead.
warnings.warn(warning.format(ret))
0it [00:00, ?it/s]
How can we perform multi-gpu training? Should we use DataParallel class of pytorch, or is there a parameter in the code for it?
Hi guys! Please advise if you have trained the network and know what is happening.
The problem is this: I trained DIS on the ADE20K dataset with only the floor category (value 255 for the floor, 0 otherwise), and after more than 30 epochs (about 25000 images per epoch) I got this result (please see the screenshots).
My loss was decreasing, but it couldn't go below 3.0.
Does anyone know what's wrong? Thank you!
Hey guys, loved the paper!
If I understand correctly, DIS should be the successor of U2Net. In the tests I've made, DIS seems to deliver higher-quality masks but less accurate results than U2Net.
I benchmarked both of them and used DIS V2 general use model.
Is this the expected behavior?
When I use PIL to save the matting result, I get an error!
mask = (result*255).permute(1,2,0).cpu().data.numpy().astype(np.uint8)
io.imsave(os.path.join(result_path,im_name+".png"),mask)
pil_mask = Image.fromarray(mask).convert("L")
---------------------------------------------------- the line above raises the error
im_rgb = Image.open(im_path).convert("RGB")
im_rgba = im_rgb.copy()
im_rgba.putalpha(pil_mask)
im_rgba.show()
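The traceback is not shown, so the cause is a guess: `permute(1,2,0)` on a one-channel prediction yields an array of shape (H, W, 1), and as far as I know `Image.fromarray` does not accept that layout for 8-bit data; it wants a 2-D (H, W) array for a grayscale image. A sketch with a stand-in NumPy array in place of the model output:

```python
import numpy as np

# Stand-in for (result*255).permute(1,2,0).cpu().data.numpy().astype(np.uint8)
mask = np.full((6, 8, 1), 255, dtype=np.uint8)

# Drop the trailing channel axis so the array is a plain 2-D grayscale image
mask_2d = mask[:, :, 0] if mask.ndim == 3 and mask.shape[2] == 1 else mask
print(mask_2d.shape)  # (6, 8)
```

`mask_2d` would then be the array handed to `Image.fromarray(...)` in the snippet above.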
Hello author, thank you very much for your work on DIS. I have a question about the training described in the paper:
After I overfitted ISNetGTEncoder, I went on to train ISNetDIS, but I found that the loss for the fs and dfs feature layers is very large. What was your initial loss at that stage?
Can you please create better detailed instructions in the README on how to perform inference and training? I am running into many errors. If you do this, I would contribute a Docker container for better reproducibility for this.
class ISNetDIS(nn.Module):
In this class, the hx variable is not used, is it a code error?
Thanks for your great work.
Assuming that the dataset is the same, I'd like to know what are the training differences between isnet.pth and isnet-general-use.pth, as the latter works much better than the first.
Thanks
Hi, xuebinqin.
I have been following your great work since U-2-Net, I really appreciate it and learn a lot from it
I just found out you are about to release your DIS repo.
I was wondering when you plan to release it and how you got your segmentation data using GIMP.
Can you let me know what procedure you used to get the segmentation data? Did you use a specific GIMP plugin (U-2-Net) or guided segmentation?
I am planning to collect high-quality segmentation data myself and am wondering how you went about it.
Thanks in advance.
HI @xuebinqin ,
I hope you are doing well. Anyway, you did great work indeed. However, one issue was raised. The implementation of the [muti_loss_fusion_max] is missing at line 435. Could you please tell me the reason and the way to solve it? Please find attached.
If we keep hypar["interm_sup"] = False, the model fails to start training, saying that name 'muti_loss_fusion' is not defined, and it is indeed not defined in the code. Replacing loss2, loss = muti_loss_fusion(ds, labels_v) with net.compute_loss(ds, labels_v) helped, but I don't know whether that is completely correct.
In paper it is being said that:
"The input convolution layer is set as a plain convolution layer with a kernel size of 3×3 and stride of 2. Given an input image with a shape of 1024×1024×3, the input convolution layer first transforms it to a feature map 512×512×64 and this feature map is then directly fed to the original U2-Net, where the input channel is changed to 64 correspondingly".
However in implementation:
self.conv_in = nn.Conv2d(in_ch,64,3,stride=2,padding=1)
self.pool_in = nn.MaxPool2d(2,stride=2,ceil_mode=True)
This implies that if a 1024X1024 image is input, the U2Net architecture will receive a 256X256 input and, since the architecture is symmetrical, produce outputs at this reduced resolution. Am I right about this, and if so, what would be the impact on the detail of the results?
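The arithmetic behind that reading can be checked with the usual output-size formulas for convolution and pooling (dilation 1 assumed). This only reproduces the sizes implied by the two quoted layers; the function names are illustrative and it says nothing about what the authors intended:

```python
import math

def conv_out(n, kernel, stride, padding):
    """Spatial size after a convolution (floor rounding, dilation 1)."""
    return (n + 2 * padding - kernel) // stride + 1

def pool_out_ceil(n, kernel, stride):
    """Spatial size after pooling with ceil_mode=True and no padding."""
    return math.ceil((n - kernel) / stride) + 1

after_conv = conv_out(1024, kernel=3, stride=2, padding=1)   # conv_in
after_pool = pool_out_ceil(after_conv, kernel=2, stride=2)   # pool_in
print(after_conv, after_pool)  # 512 256
```

So conv_in alone gives the 512×512 map described in the paper, and pool_in halves it again to 256×256.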
The link you tagged on DIS dataset V2.0 is just a page reload (or just links to DIS github page).
Please check this issue,
and give me an update with it.
Thanks
I was wondering whether the complexity of the mask (PNum) is encoded in the image name.
Thanks for the quality of this dataset,
Best regards,
Hi,
I have adapted DIS by general use model for my iPhone App, ClipEdge.
I imagined that without V2 the training would not yet be optimized for humans and animals, but contrary to my expectations, it recognizes humans and animals well too.
Does this mean that as long as they are recognized as the central object, there is no problem?
Hello, thank you for the great work (also U2-net).
I made a quick inference demo of IS-Net.
https://colab.research.google.com/drive/1PVDn3o3Ni2ZAeKpuqphdSR8-sYJ478vn?usp=sharing
And a CoreML version for iOS.
https://github.com/john-rocky/CoreML-Models#is-net
Thank you.
I have a set of images and would like to run salient object detection on them so that the same object is segmented in every image.
Is something like this possible with this model?
Hi there! Would there be any interest in hosting the pretrained model weights on the Hugging Face Hub? That way they would be discoverable for our users, and it would also make it easier for people to use the weights. You could add all kinds of info for the model, like in this one https://huggingface.co/bigscience/bloom.
We have documentation for doing this (https://huggingface.co/docs/hub/models-uploading), but I'm also more than happy to help out!
Hi, xuebinqin.
I have been following your great work since U-2-Net, I really appreciate it and learn a lot from it.
When reading 'Highly Accurate Dichotomous Image Segmentation', the 'one click needed' here should correspond to figure (c), which you marked as figure (b) in the paper.
I apologize if I misunderstood.
Could you indicate approximately how long it takes to train DIS on a consumer GPU?
Hi,
I understand you try to encode the GT into high-dimensional features using a model F_gt, but I'm confused about the formula for the GT encoder in the paper:
argmin(BCE(F_gt(theta, GT), GT))
...basically, you try to train a GT encoder which can minimize the BCE loss between each channel of feature maps and GT.
But my question is: why do we need to build a model to encode the GT? We could simply repeat the GT K times to simulate those feature maps instead of using your feature map encoding, and supervise the U2Net as usual.
because what you are doing now is: firstly minimize ||GT-F_gt_encoder||, then minimize ||I_img_encoder - F_gt_encoder||
but to me ||I_img_encoder - repeated_GT|| is doing the same thing?
Besides, a minor question...do you have any LR scheduling strategy, or you use 1e-3 throughout all the training stages...just to confirm since I cannot find the info in your paper nor code..
Thanks and best regards
Good paper. I wrote two inference demos, one for ONNX Runtime and one for TensorRT; see https://github.com/xuanandsix/DIS-onnxruntime-and-tensorrt-demo.
After running train_valid_inference_main.py, the free storage on the PC decreases, which suggests a leak. To prevent it, I tried the following two lines in the mentioned script, but they didn't seem to work.
S.N. BatchSize = 2 & code running on PyCharm
I tried overfitting on fewer images [24] to see whether the issue is real; the f1 array, and hence np.amax(f1), both come out as 0. Any solutions?
Great project. When will you approximately release the v2 dataset?
@xuebinqin Hi, could you tell us a bit about how you prepared and annotated the V1 and V2 data?
For the last 10 days, I have been annotating some high res data for my validation dataset. I got 66 images annotated carefully. I worked on pre-predicted masks and fixed them. Thus, the process was easier for me. Although I did a little work, the burden was huge.
So, how did you manage to generate or annotate the data? It doesn't look like any artificial (or rendered) data to me.
Also, do you have any more ongoing research on image segmentation topic? What's next?
Thank you!
Thanks so much for the model!
I'm new to image processing, and I'm not sure what the best approach is to applying the mask to an image to remove the background. I've been using cv2, and while the masks are great, the edges are a bit jagged in the final images. Is there a best practice for this step? Any advice would be greatly appreciated!
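One common source of jagged edges is thresholding the predicted mask to 0/255 before applying it; whether that is what happens here is an assumption. A sketch that instead keeps the mask's fractional values and uses it as an alpha matte when compositing over a solid background. `composite` is a hypothetical helper and works on plain NumPy arrays, so it can sit next to cv2 code:

```python
import numpy as np

def composite(image, mask, background=(255, 255, 255)):
    """Blend image over a solid background using mask as a soft alpha matte."""
    alpha = mask.astype(np.float32) / 255.0      # keep fractional edge values
    alpha = alpha[:, :, np.newaxis]              # (H, W, 1) for broadcasting
    bg = np.empty_like(image, dtype=np.float32)
    bg[:] = background                           # fill with the chosen color
    out = image.astype(np.float32) * alpha + bg * (1.0 - alpha)
    return np.clip(out + 0.5, 0, 255).astype(np.uint8)   # round to nearest
```

Here `image` is an (H, W, 3) uint8 array and `mask` an (H, W) uint8 array of the same size; pixels where the mask is partly transparent end up as a blend rather than a hard cut.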
Hi Xuebin Qin,
Great work. Thanks.
Question 1: Is it OK to build/train the model on multi-object images, or does each image have to contain a single object?
Question 2: Training on Windows 10 failed with this message:
DIS\IS-Net>python train_valid_inference_main.py
building model...
batch size: 8
--- create training dataloader ---
------------------------------ train --------------------------------
--->>> train dataset 0 / 1 DIS5K-TR <<<---
-im- DIS5K-TR ../DIS5K/DIS-TR/im : 959
-gt- DIS5K-TR ../DIS5K/DIS-TR/gt : 959
Traceback (most recent call last):
  File "train_valid_inference_main.py", line 722, in <module>
    main(train_datasets,
  File "train_valid_inference_main.py", line 527, in main
    train_dataloaders, train_datasets = create_dataloaders(train_nm_im_gt_list,
  File "DIS\IS-Net\data_loader_cache.py", line 95, in create_dataloaders
    gos_dataloaders.append(DataLoader(gos_dataset, batch_size=batch_size, shuffle=shuffle, num_workers=num_workers_))
  File "Anaconda3\lib\site-packages\torch\utils\data\dataloader.py", line 268, in __init__
    sampler = RandomSampler(dataset, generator=generator)
  File "Anaconda3\lib\site-packages\torch\utils\data\sampler.py", line 102, in __init__
    raise ValueError("num_samples should be a positive integer "
ValueError: num_samples should be a positive integer value, but got num_samples=0
Thanks,
Gabew
Hi, I've been training the model with DUTS datasets.
As you have mentioned in the paper, DUTS Images are relatively smaller sized than DIS5K Images.
I have customized layers as mentioned, but outputs are not satisfying.
My question is..
How can I improve the output on such small-sized datasets?
Also, if DIS is trained on 4K high-resolution datasets, what happens when testing on images with a lower resolution than 4K?
Thanks.
I am looking forward to your dataset production method. When will a general tutorial be released? Thank you very much.
It was announced some time ago but has not come out yet. If you are planning a tutorial in the coming days, I look forward to your tutorial on making datasets.
Whenever trying to load the new model I am getting:
--> 764 magic_number = pickle_module.load(f, **pickle_load_args)
765 if magic_number != MAGIC_NUMBER:
766 raise RuntimeError("Invalid magic number; corrupt file?")
UnpicklingError: invalid load key, '<'.
I have tried with Python 3.7 and newer versions, and with different PyTorch versions.
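The "invalid load key, '<'" message means the first byte of the file is '<'. That often indicates the download produced an HTML page (for example a login or confirmation page) instead of the weights file, though that is only a guess for this case. A quick standard-library check; both helper names are hypothetical:

```python
def first_bytes(path, n=16):
    """Read the first n bytes of a file."""
    with open(path, "rb") as f:
        return f.read(n)

def looks_like_html(path):
    """True if the file starts with '<', as an HTML or XML page would."""
    return first_bytes(path, 64).lstrip().startswith(b"<")
```

If `looks_like_html` returns True for the .pth file, downloading it again and comparing the file size with the one listed on the download page would be a reasonable next step.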
Hi,
Looking at train_valid_inference_main.py (lines 131-132), the GT_Encoder training is validated against the training dataset instead of the validation dataset (which has been commented out). Why did you make this choice?
Thanks
Hi, relying on Docker and a continuumio/anaconda3 image, I ran DIS and was able (for fun) to make a little desktop utility https://twitter.com/utopiah/status/1554097779013713923 to grab images from anywhere, like videos I'm watching.
Anyway this was a rather straightforward process as I'm relatively familiar with Docker but this morning I discovered https://github.com/replicate/cog which seems to address specifically the challenge of reproducibility in ML. It could be interesting to consider support for it.
Hi @xuebinqin ,
I'm having issues understanding the logic behind the GT mask encoder. Generally, self-supervision encoders do not have skip connections between encoding and decoding layers. Again typically, the aim is to have a bottleneck at the middle part, have the model learn the abstract meaningful concepts, and get the most meaningful information from it. However, in ISNET model, you used an encoder consisting of RSU blocks which is like:
I believe there is a chance that the GT encoder can pass all the information from the input to the output without compressing or processing any high-level info since there are skip connections. I see that this was not the case in your training.
I just wonder about your thoughts on this which is very important to me. Thank you!
Hi,
Thanks for the interesting work.
Just reviewing the paper here, I've got several questions regarding GT encoder.
You described GT encoder to be self-supervised -- Did you mean this as implementing auto-encoder?
In Figure 5(b), the depicted Ground Truth Encoder has only the encoder part -- does this mean that I only need to train the encoder part (not decoder), targeting the GT?
Again in Figure 5(b), do I need to do extra upsampling to make the result of the encoder to have the same size as input? (For clarity, if I put 3X1024X1024 as input, after the green conv layer, the input reshapes into 16X512X512. And after going through EN_1, it will reshape to 64X512X512 and upsample to 3X512X512 (Assuming the upsampling used for u2net is used here in the same manner). Now, the question is how can we compare the upsampled result of EN_1 (3X512X512) and original input (3X1024X1024) in BCE loss calculation?
-- One thought I had was temporarily adding extra upsampling layers for encoders while training the GT-encoder and remove those upsampling layers once I freeze the weights for GT-encoder. Would this be a viable option or did you mean something else?
Thanks in advance :)
Hi,
Thanks for this great work and for uploading the trained weights.
I'm using your pretrained model for inference and I don't really get the purpose of the cache_size parameter.
I figured it means resizing the image before inference, which could be useful for running on big images with little GPU memory.
The thing is for some smaller images the results look better when upscaling them..
In this example of size 450x450 the first (bad) results are when leaving the cache_size parameter blank (no resize?)
and the second (Good) results are when using cache_size=[1024,1024]
Can you explain the purpose of this parameter?
Is it possible to perform instance segmentation rather than dichotomous segmentation with similar architecture?
Hi,
Firstly, thanks for the release.
I've tried it on my custom dataset, but the model is not learning and the loss takes extremely large values.
After 40k epochs, training stopped due to early stopping, and I got: Train Loss = -7823, Val_Loss = -46939
This is the result:-
In my training dataset, there are around 7k images for vehicles. I trained without using the pre-trained model.
Any Suggestions? Thanks in Advance
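Binary cross-entropy is non-negative as long as the targets lie in [0, 1], so a negative loss suggests targets outside that range, for example masks left at 0..255 instead of being scaled to 0..1. Whether that applies to this dataset is a guess. A small numeric illustration with a hand-written BCE (not the repository's loss function):

```python
import math

def bce(pred, target):
    """Binary cross-entropy for a single prediction in (0, 1)."""
    return -(target * math.log(pred) + (1.0 - target) * math.log(1.0 - pred))

print(bce(0.9, 1.0) > 0)     # True: target in [0, 1] gives a positive loss
print(bce(0.9, 255.0) < 0)   # True: unscaled target drives the loss negative
```

Checking the minimum and maximum of a label batch right before the loss is computed would show whether the masks are in the expected range.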