DeepLabV3FineTuning's Introduction

DeepLabV3FineTuning

Semantic Segmentation: Multiclass fine-tuning of DeepLabV3 with PyTorch

The code in this repository fine-tunes DeepLabV3 with PyTorch for multiclass semantic segmentation.

Result Preview

A random result on a test image (not in the dataset)

Requirements

Basic dependencies are PyTorch 1.4.0 and torchvision 0.5.0.
I used a conda virtual env, where I installed the following packages:

conda install -c conda-forge -c pytorch python=3.7 pytorch torchvision cudatoolkit=10.1 opencv numpy pillow

Dataset

I created a dataset from my own personal skydiving pictures.
Around 500 images were gathered and annotated using the excellent tool CVAT: https://github.com/opencv/cvat

/!\ In this repo, I only uploaded a few images in ./sample_dataset, so as to give an idea of the format I used.
I wrote a script to easily convert one of the XML export types of CVAT (LabelMe ZIP 3.0 for images) into label images; a minimal sketch of the idea follows the class list below.
There are 5 classes in my example:

  • No-label: 0
  • Person: 1
  • Airplane: 2
  • Ground: 3
  • Sky: 4
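
As mentioned above, here is a minimal sketch of the conversion idea. It assumes a LabelMe-style XML layout (<object> entries holding a <name> and a <polygon> made of <pt> points with <x>/<y> coordinates); the helper and field names are my assumptions about the export format, not the repo's actual script:

import xml.etree.ElementTree as ET
from PIL import Image, ImageDraw

# 0 is the implicit No-label background
CLASS_TO_INDEX = {"Person": 1, "Airplane": 2, "Ground": 3, "Sky": 4}

def xml_to_label_image(xml_path, width, height):
    # Start from an all-background single-channel image
    label = Image.new("L", (width, height), 0)
    draw = ImageDraw.Draw(label)
    root = ET.parse(xml_path).getroot()
    # Rasterize each annotated polygon with its class index as pixel value
    for obj in root.iter("object"):
        name = obj.findtext("name")
        points = [(float(pt.findtext("x")), float(pt.findtext("y")))
                  for pt in obj.iter("pt")]
        if name in CLASS_TO_INDEX and len(points) >= 3:
            draw.polygon(points, fill=CLASS_TO_INDEX[name])
    return label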

How to run training

Once you replace sample_dataset with your own dataset:

python sources/main_training.py ./sample_dataset ./training_output --num_classes 5 --epochs 100 --batch_size 16 --keep_feature_extract

The best values I obtained were Loss: 0.2066 and Accuracy: 0.8099 after 100 epochs. The accuracy is computed as the mean IoU (Intersection-over-Union) over all classes.
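
For reference, here is a minimal sketch of how such a per-class IoU can be computed (this is not the repo's exact code from train.py; it assumes preds and labels are integer class-index tensors of the same shape):

import torch

def mean_iou(preds, labels, num_classes):
    # Per-class IoU, averaged over the classes present in preds or labels
    ious = []
    for c in range(num_classes):
        pred_c = (preds == c)
        label_c = (labels == c)
        union = (pred_c | label_c).sum().item()
        if union > 0:
            intersection = (pred_c & label_c).sum().item()
            ious.append(intersection / union)
    return sum(ious) / len(ious) if ious else 0.0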

Step by step

Model

The first thing to do is fetch a pretrained DeepLabV3 model.
It is pretrained on a subset of COCO train2017, on the 20 categories that are present in the Pascal VOC dataset.

model_deeplabv3 = models.segmentation.deeplabv3_resnet101(pretrained=use_pretrained, progress=True)

The auxiliary classifier is removed, and the pretrained weights are frozen.

model_deeplabv3.aux_classifier = None
for param in model_deeplabv3.parameters():
    param.requires_grad = False

The pretrained classifier is replaced by a new one with a custom number of classes. Since it comes after the freeze, its weights won't be frozen. They are the ones that we will fine-tune.

model_deeplabv3.classifier = torchvision.models.segmentation.deeplabv3.DeepLabHead(2048, num_classes)

Data Augmentation

The following data augmentations are applied to the training set:

self.transforms = transforms.Compose([
    transforms.RandomHorizontalFlip(),
    transforms.RandomVerticalFlip(),
    transforms.RandomCrop((224, 224)),
    transforms.ToTensor(),
    # The 4th channel is the label: mean 0 and std 1 leave it untouched
    transforms.Normalize([0.485, 0.456, 0.406, 0], [0.229, 0.224, 0.225, 1])
])

For the validation set, only a centered crop and normalization are applied:

self.transforms = transforms.Compose([
    transforms.CenterCrop((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406, 0], [0.229, 0.224, 0.225, 1])
])

To ensure that the same transformation is applied to the input image and its expected label image, both are merged into a single 4-channel image prior to transformation, then split back into two separate tensors for training.

image = Image.open(img_path)
label = Image.open(label_path)

# Concatenate image and label, to apply same transformation on both
image_np = np.asarray(image)
label_np = np.asarray(label)
new_shape = (image_np.shape[0], image_np.shape[1], image_np.shape[2] + 1)
image_and_label_np = np.zeros(new_shape, image_np.dtype)
image_and_label_np[:, :, 0:3] = image_np
image_and_label_np[:, :, 3] = label_np

# Convert to PIL
image_and_label = Image.fromarray(image_and_label_np)

# Apply Transforms
image_and_label = self.transforms(image_and_label)

# Extract image and label
image = image_and_label[0:3, :, :]
label = image_and_label[3, :, :].unsqueeze(0)

Training

The chosen training loss is Cross Entropy (https://pytorch.org/docs/stable/nn.html#crossentropyloss), since it is well suited to multiclass classification problems.

# Setup the loss function
criterion = nn.CrossEntropyLoss(weight=(torch.FloatTensor(weight).to(device) if weight else None))

The optimizer is SGD with a learning rate of 0.001 and a momentum of 0.9.
Only the classifier parameters are optimized.

params_to_update = []
for name, param in model_deeplabv3.named_parameters():
    if param.requires_grad:
        params_to_update.append(param)
optimizer_ft = optim.SGD(params_to_update, lr=0.001, momentum=0.9)
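
Putting the pieces together, a minimal sketch of one training epoch (dataloader and device are assumptions, as is the label format: CrossEntropyLoss expects integer class indices, here of shape (N, 1, H, W) before the squeeze):

model_deeplabv3 = model_deeplabv3.to(device)
model_deeplabv3.train()
for inputs, labels in dataloader:
    inputs = inputs.to(device)
    labels = labels.to(device).squeeze(1).long()  # (N, H, W) class indices
    optimizer_ft.zero_grad()
    outputs = model_deeplabv3(inputs)['out']      # (N, num_classes, H, W) logits
    loss = criterion(outputs, labels)
    loss.backward()
    optimizer_ft.step()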

DeepLabV3FineTuning's People

Contributors

jnkl314

DeepLabV3FineTuning's Issues

Question about test

I have a question: when I finished training with this code (main_training.py, using sample_dataset), I needed to feed test images in to see the model's results.
But I don't see a test script (e.g. test.py). If main_inference.py is meant for that, I still get an error after running it; this is the error message:
No such file or directory: '/tmp/pycharm_project_782/03.03.20_saut_4/000001.png'
Hope you can help me out, thanks

Questions about accuracy

I have some questions about the iou function in train.py.
According to the literature, for segmentation tasks accuracy is computed as the intersection over union, and for multiclass segmentation it should be the mean over the classes.

  1. Why do you compute union as tot_class_preds + tot_class_labels - intersection? I think it's wrong because the union should be just tot_class_preds + tot_class_labels...
  2. Why do you exclude the background class in the iou computation?

Plus, I suggest adding sorted in the dataloader to ensure that images and masks are loaded in the correct order (otherwise the pairing may be wrong):

self.imgs_files = sorted(glob.glob(os.path.join(folder_path,'Images','*.*')))
self.masks_files = sorted(glob.glob(os.path.join(folder_path,'Masks','*.*')))

Running Prediction

Hi jnkl314 and any other users,

This repo looks pretty interesting and useful. I cloned the repo and got the training to run, and now I am trying to use the fine-tuned model to make mask predictions on images. Only using the sample skydiving data for now, to make things easy.

The code in main_inference.py seems the most relevant. It appears that the code opens png files from a temp directory (?), based on how jnkl314 was running the code in PyCharm:
image = Image.open(f"/tmp/pycharm_project_782/03.03.20_saut_4/{idx:06}.png") from line 29

Obviously we do not all have /tmp/pycharm_project_782/03.03.20_saut_4/ on our machines, so I set up my code to iterate through all files of sample_dataset/val/images (or any folder containing pngs).

The part of jnkl314's code that is causing a problem is main_inference.py lines 29-39:

image = Image.open(f"/tmp/pycharm_project_782/03.03.20_saut_4/{idx:06}.png")
image_np = np.asarray(image)
# image_np = cv2.resize(image_np, 0.5, 0.5, cv2.INTER_CUBIC)
width = int(image_np.shape[1] * 0.3)
height = int(image_np.shape[0] * 0.3)
dim = (width, height)
image_np = cv2.resize(image_np, dim, interpolation=cv2.INTER_AREA)

image = Image.fromarray(image_np)
image = transforms_image(image)

When run, I get this error:

Traceback (most recent call last):
File "sources/main_mask_prediction.py", line 97, in <module>
main()
[ this part of the trace just traces through the new functions I wrapped around jnkl314's code, unimportant ]
File "sources/main_mask_prediction.py", line 59, in predict_one_image
image = transforms_image(image)
File "(...)\lib\site-packages\torchvision\transforms\transforms.py", line 61, in __call__
img = t(img)
File "(...)\lib\site-packages\torchvision\transforms\transforms.py", line 212, in __call__
return F.normalize(tensor, self.mean, self.std, self.inplace)
File "(...)\lib\site-packages\torchvision\transforms\functional.py", line 298, in normalize
tensor.sub_(mean).div_(std)
RuntimeError: The size of tensor a (4) must match the size of tensor b (3) at non-singleton dimension 0

Calling the transform on a freshly opened PNG, just as the code was uploaded, creates a size mismatch. Does this have to do with the image+label transform setup mentioned in the readme? If so, why was this code uploaded without accounting for that? Or maybe the error is caused by an alpha channel in the PNG?
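
If it is the alpha channel, one workaround that comes to mind (my own guess, untested) is to force three channels right after opening the file:

image = Image.open(png_path).convert("RGB")  # png_path stands for whichever file is being processed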

Any help is greatly appreciated!

Access to dataset to replicate results

Hi!
First of all, thank you for this repository, it's easy to use and expand.
I'm currently having some issues generating good training results with my own dataset, and I'm testing different possible reasons that could lead to the bad results. As part of this, I'd also like to replicate the results you got with your skydiver dataset, but to do that I'd need access to the whole dataset to run the training properly. Would you be willing to publish your annotated dataset on Kaggle or a similar platform?
Cheers

train.py halts at random

I am using the training from this repo in a straightforward way so far: no modifications to the model or changes to the training steps, calling training via the command line just as described in the Readme.

Sometimes, however, the loop just halts indefinitely! The only thing that makes it resume is input to the command line, like pressing 'enter'. CPU usage for the process drops to near zero. I tested this by leaving it frozen for over an hour, and it did not unfreeze itself.

Through simple debug print statements I found that it halts at train.py line 67: outputs = model(inputs)['out']
Seeing that's the main step of training, I'm not surprised that's the line where it freezes. Does anyone know why this could be happening? There doesn't seem to be a pattern to when it freezes; usually it freezes 0 times per epoch, but sometimes as many as 4 times per epoch.

Any help is appreciated! This problem stops me from pressing 'Go' and letting the model train; it forces me to check back in on it.

potential transform issue

Hi,
In this part of the code

transforms.Normalize([0.485, 0.456, 0.406, 0], [0.229, 0.224, 0.225, 1])

the transforms are applied to both the image and the label masks, but I think the masks should not be normalized. There is an example implementation that avoids normalizing the masks; you can follow the thread here:

https://github.com/pytorch/vision/blob/3c81d474a2525b11356ca1dde299a9b1ab3715c6/references/segmentation/transforms.py#L85
https://discuss.pytorch.org/t/where-are-the-masks-unnormalized-for-segmentation-in-torchvision-train-file/48113
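
For reference, a minimal sketch of that approach: a paired transform that applies the same geometric op to image and mask but normalizes only the image (adapted in spirit from the torchvision reference code linked above; class and variable names are illustrative):

import random
import numpy as np
import torch
import torchvision.transforms.functional as F

class PairedTransform:
    def __call__(self, image, mask):
        # Same random geometric transform for both image and mask
        if random.random() < 0.5:
            image = F.hflip(image)
            mask = F.hflip(mask)
        image = F.to_tensor(image)
        image = F.normalize(image, mean=[0.485, 0.456, 0.406],
                            std=[0.229, 0.224, 0.225])
        # Keep the mask as raw class indices: no scaling, no normalization
        mask = torch.as_tensor(np.asarray(mask), dtype=torch.int64)
        return image, mask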
