
Comments (7)

karolzak commented on July 28, 2024

Hi @javadan ,

As you already noticed, this topic was discussed a few times now and to my knowledge there are 2 options:

  • separate model for each class with separate output mask for each class (256x256x1 with 0..1 values range)
  • separate binary mask for each class (256x256x1 with 0..1 values range), stacked together (256x256xNUM_CLASS with 0..1 values range) and used as a target mask for a single model with modified num_classes param
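For illustration, the second option (stacking one binary mask per class into a single target) could be sketched roughly like this with NumPy. The helper name `stack_binary_masks` and the tiny 2x3 mask are made up for the example and are not part of keras-unet; the sketch also assumes the class IDs are already contiguous integers starting at 0.

```python
import numpy as np

def stack_binary_masks(label_mask, num_classes):
    """Turn an (H, W) integer mask into an (H, W, num_classes) stack of 0/1 masks."""
    # Compare every pixel against each class ID; broadcasting adds the channel axis.
    class_ids = np.arange(num_classes)
    return (label_mask[..., np.newaxis] == class_ids).astype(np.float32)

# Hypothetical 2x3 mask with 3 classes (IDs 0, 1, 2).
label_mask = np.array([[0, 1, 2],
                       [2, 1, 0]])
target = stack_binary_masks(label_mask, num_classes=3)

print(target.shape)    # (2, 3, 3)
print(target[..., 1])  # binary mask for class 1
```

For real data the same call would produce a 256x256xNUM_CLASS target from a 256x256 integer mask.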

Otherwise, there was an option to +1 to num_classes and change the output shape to (n, w, h, num_classes). Then each class gets a full (w x h) binary mask of its own. (You are still using sigmoid and binary_crossentropy for this?)

You can use either sigmoid (if one pixel can belong to more than one class) or softmax (in case one pixel can only be connected with a single class) but for loss function I'm still using binary_crossentropy, yes.

Then there's what I think I'm interested in,
where images are (n, 256 , 256, 1) (i.e. gray-scale)
and masks are also (n, 256 , 256, 1) because the pixel options are just integers from 0 to 255.
That's what I want, as output, too. I'll read the prediction mask pixel values to get the class numbers.
I see Keras recommends 'sparse_categorical_crossentropy' as the loss function for this use case, and then it apparently doesn't matter if you use sigmoid or softmax.

I'm sorry but I don't understand how this would work. Both sigmoid and softmax activation functions output values in the 0..1 range, so I don't see how they would transform that output into what you're looking for (0..255).
The closest thing to what you're trying to achieve is linear regression, but I don't see how that could work either.

Again, from what I know in terms of multi-class image segmentation problem it all comes down to 2 different methods which I pointed out at the top of this answer. Can you read these again and let me know what holds you back from using these methods?

Happy to discuss this further if you need

from keras-unet.

javadan commented on July 28, 2024

Hi @karolzak

Ok, I will let you know if I work out how to do it the way I'm describing.
Otherwise, I'll use one of the multiple binary-segmentation methods.

The multiple binary-segmentation methods should work fine for me, once I've turned my 5-class mask into 5 x 1-class masks. I'm just thinking ahead, in case I decide to add more classes later.

If num_classes increases, then with an integer-encoded single-layer output no changes would need to be made to the architecture or code, and the size of the network doesn't increase.

With the layer-per-class methods, the architecture, code and size of the network all grow with every new class.

I imagine it's possible, as the occasional answer here and there seems to suggest that softmax and sparse_categorical_crossentropy could allow for integer encoding. (Perhaps the class ids are divided by 255, to get them between 0 and 1 for training, and then multiplied by 255, to get back to 0 to 255 for the final PNG output).

But anyway, was just finding out if you were familiar with integer-encoding multi-class segmentation. I'll give it a try, and will probably end up using one of your suggested methods, in the end, when it doesn't work.

Thanks for your time


karolzak commented on July 28, 2024

If num_classes increases, then with an integer-encoded single-layer output no changes would need to be made to the architecture or code, and the size of the network doesn't increase.
With the layer-per-class methods, the architecture, code and size of the network all grow with every new class.

Well, I partially agree with this statement, although the change is tiny: only the output tensor size changes, so it would never become a concern in terms of network size. On top of that, if you write your training logic well, there's no need for code changes when retraining; num_classes can easily be derived automatically from your masks.
In fact, in terms of trainable params it would barely change at all:

  • with num_classes=10: [image]
  • with num_classes=1: [image]

As you can see above, in comparison to the overall network size the difference is negligible.
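To put a rough number on this: if the final layer is a 1x1 convolution, its parameter count grows linearly with num_classes, but from a very small base. The sketch below is only back-of-the-envelope arithmetic. The value of 64 input channels is an assumption for illustration, not a figure taken from the screenshots or from keras-unet.

```python
def conv1x1_params(in_channels, num_classes):
    # One weight per input channel plus one bias, for each output channel.
    return (in_channels + 1) * num_classes

IN_CHANNELS = 64  # assumed width of the last decoder block, not taken from the thread

for n in (1, 10):
    print(n, conv1x1_params(IN_CHANNELS, n))
# 1 65
# 10 650
```

Under that assumption, going from 1 to 10 classes adds 585 parameters to the output layer, while the rest of the network stays the same size.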


I imagine it's possible, as the occasional answer here and there seems to suggest that softmax and sparse_categorical_crossentropy could allow for integer encoding. (Perhaps the class ids are divided by 255, to get them between 0 and 1 for training, and then multiplied by 255, to get back to 0 to 255 for the final PNG output).

I read through these suggestions and ran some experiments with sparse_categorical_crossentropy, but it doesn't change much tbh. Yes, you can pass in a 256x256x1 tensor of integers as Y to calculate the loss function, but that does not change the fact that the output tensor from the network still needs to be of shape 256x256xNUM_CLASS (same as with binary_crossentropy), where NUM_CLASS==max_class_ID+1 because class IDs start at 0. If you use a mask with values like [0 1 2 3 256] your NUM_CLASS needs to be 257, so it would be best to encode 256 as 4 to avoid artificially blowing up the size of the output tensor.
So in fact the network size using sparse_categorical_crossentropy is the same as when using binary_crossentropy, because for both of these the network output tensor has the same size/shape. The only difference is the target: for sparse you need shape 256x256x1, for binary you need 256x256xNUM_CLASS.
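The shapes involved can be sketched in plain NumPy. This is a hand-written illustration of how a sparse categorical cross-entropy is typically computed, not Keras's actual implementation; the random `logits` merely stand in for a network output, and the 4x4 size and 5 classes are arbitrary. The final argmax is one common way to turn the per-class output back into an integer mask.

```python
import numpy as np

def softmax(logits):
    # Subtract the per-pixel max for numerical stability.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

def sparse_categorical_crossentropy(y_true, probs):
    """y_true: (H, W, 1) integer class IDs; probs: (H, W, num_classes)."""
    # Pick the predicted probability of the true class at every pixel.
    picked = np.take_along_axis(probs, y_true, axis=-1)
    return float(-np.log(picked).mean())

rng = np.random.default_rng(0)
H, W, NUM_CLASSES = 4, 4, 5

y_true = rng.integers(0, NUM_CLASSES, size=(H, W, 1))  # integer-encoded target
logits = rng.normal(size=(H, W, NUM_CLASSES))          # stand-in for the network output
probs = softmax(logits)

loss = sparse_categorical_crossentropy(y_true, probs)
pred_mask = probs.argmax(axis=-1)                      # (H, W) integer class IDs

print(y_true.shape, probs.shape, pred_mask.shape)
```

The target has a single channel, but the prediction still has one channel per class, which is why the network size does not shrink.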

Good luck and do let me know how it went!


soans1994 commented on July 28, 2024

hello,

Can you please help me understand how I can address overlapping masks? I have 15 classes, each with its own binary mask. I could use categorical cross-entropy loss by one-hot encoding the targets, but I manually removed the overlapping pixels from some of the classes, and the output is not very accurate. Can I make use of the overlapping binary masks for all 15 classes with binary cross-entropy loss?

thank you


karolzak commented on July 28, 2024

Hi @soans1994
How big of an overlap are we talking about here?
I suspect the problem of your output not being so accurate might be caused by something other than just overlapping pixels.
When it comes to image segmentation for multiple classes, what I found working best is training a separate binary classification model for each class.
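As noted earlier in the thread, sigmoid is the activation to reach for when one pixel can belong to more than one class. A stacked target with overlapping masks and a per-channel binary cross-entropy might look roughly like the sketch below. It is plain NumPy with hand-written `sigmoid` and `binary_crossentropy` helpers, two made-up 3x3 masks instead of 15 real ones, and fixed `logits` standing in for a network output.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def binary_crossentropy(y_true, y_pred, eps=1e-7):
    # Clip to avoid log(0), then average over all pixels and channels.
    y_pred = np.clip(y_pred, eps, 1.0 - eps)
    losses = -(y_true * np.log(y_pred) + (1.0 - y_true) * np.log(1.0 - y_pred))
    return float(losses.mean())

# Two hypothetical class masks that overlap at pixel (1, 1).
mask_a = np.array([[1, 1, 0],
                   [1, 1, 0],
                   [0, 0, 0]], dtype=np.float32)
mask_b = np.array([[0, 0, 0],
                   [0, 1, 1],
                   [0, 1, 1]], dtype=np.float32)

# Stack along the channel axis; no need to remove the overlap first.
target = np.stack([mask_a, mask_b], axis=-1)   # shape (3, 3, 2)

# Stand-in for a well-trained network: confident logits that match the target.
logits = np.where(target > 0, 4.0, -4.0)
probs = sigmoid(logits)                        # each channel is independent

print(target[1, 1])                            # both classes are 1 at the overlap
print(binary_crossentropy(target, probs))
```

Because each channel gets its own sigmoid, a pixel can score high for several classes at once, which a softmax output would not allow.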


akashsindhu96 commented on July 28, 2024

Hi @karolzak
Can you help me understand the line about using a mask with values like [0 1 2 3 256], where you say it would be best to encode 256 as 4 to avoid artificially blowing up the size of the output tensor? Why would NUM_CLASS need to be that large instead of 5?


javadan commented on July 28, 2024

@akashsindhu96 he meant I should just use 4 to represent the value 256. (The output layer would need to be 256x256x257 if I need it to output 0-256, but only 256x256x5 if I need it to output 0-4.)
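A possible way to do that re-encoding automatically, sketched with NumPy's `np.unique`. The small `mask` array is made up for the example; the lookup table `original_ids` lets you map predictions back to the original pixel values afterwards.

```python
import numpy as np

# Hypothetical mask using the sparse IDs from the discussion above.
mask = np.array([[0, 1, 256],
                 [3, 2, 256]])

# `original_ids` holds the sorted distinct values; `inverse` holds, for each
# pixel, the index of its value in `original_ids`.
original_ids, inverse = np.unique(mask, return_inverse=True)
remapped = inverse.reshape(mask.shape)  # contiguous IDs 0..4
num_classes = len(original_ids)         # 5 instead of 257

# Map back to the original values, e.g. before writing the final PNG.
restored = original_ids[remapped]

print(remapped)
print(num_classes)
```

This is also one way num_classes can be derived from the masks rather than hard-coded.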
