
FCBFormer's People

Contributors: cvml-uclan, esandml


FCBFormer's Issues

Couldn't reproduce the results

Hi.

I found your work very interesting. Medical image segmentation is a new topic for me, and I am planning to start some work in this area.

I tried to reproduce the results for both datasets, for example training on Kvasir-SEG and testing on Kvasir-SEG using the train/test split you provided. However, I failed to reproduce the results reported in the paper. I tried running with and without pre-trained weights, and I re-trained twice, but the results I obtain are still 2-3% lower than those reported in the paper. I was using torch 1.10.0 and CUDA 11.3.

Could you let me know about any special tricks, library versions, or strategies I should follow to reproduce the results?

Thank you.
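For reference, one common source of run-to-run variance is unseeded randomness and non-deterministic cuDNN kernels. A minimal sketch of the usual PyTorch mitigations (general practice, not confirmed to be what this repo uses):

```python
import random

import numpy as np
import torch

def set_seed(seed: int = 42) -> None:
    """Seed every RNG that commonly affects a PyTorch training run."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    # Deterministic cuDNN kernels trade speed for reproducibility;
    # without these flags, GPU runs are not bit-for-bit repeatable.
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False

set_seed(42)
```

Even with seeding, a 2-3% gap usually points elsewhere (pre-trained weights, input resolution, or evaluation protocol), so confirming the exact library versions with the authors is still worthwhile.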

dataset label question

In the dataset you mention, the mask pixel values are greater than 1, even though there is only one class.

I thought the mask image pixels would only take the values 0 and 1.

Is this expected? Could you explain?
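For what it's worth, binary masks are often stored as 8-bit grayscale images with foreground = 255 (and resizing can introduce intermediate values), so a common fix is to threshold at load time. A minimal sketch, assuming PIL-style loading and the 0/255 convention (neither is confirmed from this repo):

```python
import numpy as np
import torch
from PIL import Image

def load_binary_mask(path: str) -> torch.Tensor:
    """Load a grayscale mask and threshold it to {0, 1}."""
    mask = np.array(Image.open(path).convert("L"), dtype=np.float32)
    # Treat any bright pixel as foreground; this also cleans up
    # intermediate values introduced by interpolation during resizing.
    mask = (mask > 127).astype(np.float32)
    return torch.from_numpy(mask).unsqueeze(0)  # shape (1, H, W)
```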

dataloader

When using the CVC dataset, a TypeError: Input image tensor permitted channel values are [3], but found 1 error occurs, likely because different datasets require different dataloaders. Could you upload the dataloader for the CVC dataset?
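Until a CVC-specific dataloader is available, one workaround is to force three channels when loading, since images that decode as single-channel will trip a transform that expects RGB. A minimal sketch, assuming PIL loading and a torchvision pipeline (the 352x352 size is an assumption; match it to the repo's config):

```python
from PIL import Image
import torchvision.transforms as T

transform = T.Compose([
    T.Resize((352, 352)),  # assumed input size; use the repo's setting
    T.ToTensor(),
])

def load_rgb(path: str):
    # convert("RGB") replicates a single grayscale channel into three,
    # avoiding "permitted channel values are [3], but found 1".
    return transform(Image.open(path).convert("RGB"))
```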

Loss function

Hi.
Regarding the Dice loss: I understand that DiceLoss = 1 - DiceScore, but the DiceScore in your code (Figure 2) doesn't match the one proposed in V-Net (Figure 1): m1 and m2 are not squared in the denominator. I reckon m2 may be left unsquared because its values are either 0 or 1 (or extremely close, up to floating-point error), so squaring doesn't change them. But m1 is a predicted probability map with values in [0, 1], so squaring DOES change the probabilities to some extent, and the derivative may change significantly as well. So why aren't m1 and m2 (or at least m1) squared?

[Figure 1: the Dice loss proposed in V-Net]

[Figure 2: the code implementation; the highlighted line doesn't match the denominator in V-Net's Dice loss]
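To make the comparison concrete, here are the two formulations side by side as a generic PyTorch sketch (not the repository's exact code):

```python
import torch

def dice_loss_vnet(pred: torch.Tensor, target: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """V-Net formulation: m1 and m2 are squared in the denominator."""
    m1, m2 = pred.flatten(1), target.flatten(1)
    inter = (m1 * m2).sum(dim=1)
    denom = (m1 ** 2).sum(dim=1) + (m2 ** 2).sum(dim=1)
    return 1 - (2 * inter + eps) / (denom + eps)

def dice_loss_unsquared(pred: torch.Tensor, target: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """Unsquared variant: plain sums in the denominator."""
    m1, m2 = pred.flatten(1), target.flatten(1)
    inter = (m1 * m2).sum(dim=1)
    denom = m1.sum(dim=1) + m2.sum(dim=1)
    return 1 - (2 * inter + eps) / (denom + eps)
```

For hard {0, 1} values the two agree, but for soft probabilities in (0, 1) the denominators, and hence the gradients with respect to m1, differ, which is exactly the point raised above.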

PVTv2: The conflict between the paper and the code

Hi, I have searched and found no relevant results.
I printed the architecture of the TB class (Models/models.py) and found that in the get_pyramid() method, the pyramid list consists of only 3 feature maps, F1, F2, F3 (the append happens 3 times), instead of the 4 feature maps proposed in your paper. It also seems that self.backbone[10] is, I guess, the missing F4. As a result, there are only 3 emphasized feature maps after the LE module. This affects the forward() method: at the first concatenation step of the SFA, instead of concatenating F4_emph and F3_emph as in the paper, F3_emph is concatenated with itself, which seems odd.
Is my understanding correct? If so, why isn't F4 used?
Thank you very much!

[Image: printed architecture of TB]

[Image: forward() method]

PS: I have read whai362's PVT repo, and in the PyramidVisionTransformer class, F4 is not used by default either, but I don't understand why.
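One quick way to check this empirically is to print what get_pyramid() returns. A sketch, assuming the class and method referenced above are importable as shown (constructor arguments and input size are guesses, so adjust to the repo):

```python
import torch
from Models.models import TB  # import path as referenced in this issue

tb = TB().eval()
x = torch.randn(1, 3, 352, 352)  # assumed input size
with torch.no_grad():
    pyramid = tb.get_pyramid(x)

print(f"{len(pyramid)} feature maps returned")
for i, fmap in enumerate(pyramid, start=1):
    print(f"F{i}: {tuple(fmap.shape)}")  # 3 entries would confirm F4 is dropped
```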

Error: C^

Search before asking: I have searched and found no similar bug report.

Hi guys! Thanks for the great project.
When I followed this guide to train, I got an error:

[Image: error traceback]

I have no idea what is causing this error. Please help me.

Encoder experiments

Hello!
I have read some of the relevant papers cited in your references, one of which is Stepwise Feature Fusion: Local Guides Global, the inspiration for PLD+. In that paper, a number of encoder-decoder pairs are evaluated, and as Table 4 (in the image below) shows, PLD performs best with the MiT encoder (MiT is the SegFormer encoder). So I wondered: given that PLD was improved into PLD+, have you tried the MiT-B3 encoder (which has a parameter count similar to PVTv2-B3, at 45.2M), and has it been a good match for PLD+?

[Image: Table 4 from Stepwise Feature Fusion: Local Guides Global]

Thank you very much!
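As a side note, the parameter comparison is easy to reproduce. A sketch using timm (the model name pvt_v2_b3 is assumed to exist in your timm version; SegFormer's MiT encoders are typically distributed through other libraries, so they are not queried here):

```python
import timm

def params_in_millions(model) -> float:
    """Total parameter count, in millions."""
    return sum(p.numel() for p in model.parameters()) / 1e6

# Assumed timm model name; compare against MiT-B3's reported 45.2M.
encoder = timm.create_model("pvt_v2_b3", pretrained=False)
print(f"pvt_v2_b3: {params_in_millions(encoder):.1f}M parameters")
```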
