ESandML / FCBFormer

Official code repository for: FCN-Transformer Feature Fusion for Polyp Segmentation (MIUA 2022 paper)
License: Apache License 2.0
Hi.
I found your work very interesting. Medical image segmentation is a new topic for me, and your article motivated me to start working on it.
I tried to reproduce the results for both datasets, for example training on Kvasir-SEG and testing on Kvasir-SEG using the train/test split you provided. However, I failed to reproduce the results reported in the paper. I tried running with and without pretrained weights, and I also retrained the model twice, but I still could not reproduce the results: my scores are 2-3% lower than those reported in the paper. I was using torch 1.10.0 and CUDA 11.3.
Could you please let me know if there are any special tricks, specific library versions, or a particular strategy I should follow to reproduce the results?
Thank you.
Is this model available for multiple classes?
Hello, could you please share the result images produced by your trained polyp segmentation model, for learning and comparison? If it is convenient, please send them to my email: [email protected] .
Thank you very much!
In the dataset you mention, the mask pixel values exceed 1, even though there is only one class.
I thought the mask image pixels would take only the values 0 and 1.
Is this expected? Could you explain?
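For what it's worth, intermediate mask values are common when masks are stored as 8-bit grayscale images (foreground saved as 255, plus compression or anti-aliasing artifacts). A minimal, dependency-free sketch of the usual workaround, thresholding rather than testing for equality (the threshold of 127 is an assumption, not taken from this repo):

```python
def binarize_mask(mask, threshold=127):
    """Map 8-bit grayscale mask values (0-255) to {0, 1} for single-class training.

    JPEG compression and anti-aliased exports often leave intermediate
    values in the mask, so thresholding is safer than checking px == 255.
    """
    return [[1 if px > threshold else 0 for px in row] for row in mask]

# A mask with compression artifacts (values other than 0 and 255):
raw = [[0, 3, 250],
       [128, 255, 10]]
print(binarize_mask(raw))  # [[0, 0, 1], [1, 1, 0]]
```

The same effect is achieved on tensors with `(mask > threshold).float()`.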
When using the CVC dataset, a TypeError occurs: "Input image tensor permitted channel values are [3], but found 1". This is likely because different datasets need different dataloaders. Could you upload the dataloader for the CVC dataset?
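This error typically means a transform in the pipeline expects a 3-channel (RGB) input but received a single-channel (grayscale) image. A common workaround, assuming images are loaded via PIL, is to call `Image.open(path).convert("RGB")`; the equivalent channel replication can be sketched without any dependencies:

```python
def expand_to_rgb(channel):
    """Replicate one channel into three (C=1 -> C=3).

    For grayscale input this matches PIL's Image.convert("RGB"), or
    tensor.expand(3, -1, -1) on a 1xHxW torch tensor. The rows are
    shared (not copied), which is fine for read-only use in a dataloader.
    """
    return [channel, channel, channel]

gray = [[5, 9],
        [7, 2]]
rgb = expand_to_rgb(gray)  # 3 identical channels, shape 3 x 2 x 2
```

Whether this is the right fix for CVC-ClinicDB specifically is an assumption; the maintainers' dataloader may handle it differently.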
Hi.
Regarding the Dice loss, I understand that DiceLoss = 1 - DiceScore, but the DiceScore in your code (Fig. 2) doesn't match the one proposed in VNet (Fig. 1): in your code, m1 and m2 are not squared in the denominator. I reckon that squaring m2 wouldn't matter, because its values are either 0 or 1 (or extremely close, up to floating-point error), so the power of 2 doesn't change them. But m1 is a predicted probability map with values in [0, 1], so squaring DOES change the values to some extent, and the derivative may also change significantly. So why aren't m1 and m2 (or at least m1) squared?
Figure 1. the Dice Loss proposed in VNet
Figure 2. The code implementation; the highlighted line doesn't match the denominator in VNet's Dice loss
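To make the difference concrete, here is a minimal, dependency-free sketch of the two denominators under discussion, assuming flattened inputs and a simple smoothing constant (the exact smoothing used in the repo may differ). For a binary target t, t equals t squared, so only the prediction term distinguishes the two forms, and that is exactly where the gradients differ (d/dp of p^2 is 2p, versus a constant 1 for the unsquared form):

```python
def dice_squared(pred, target, eps=1.0):
    """Dice score with squared terms in the denominator (VNet form, Fig. 1)."""
    inter = sum(p * t for p, t in zip(pred, target))
    denom = sum(p * p for p in pred) + sum(t * t for t in target)
    return (2.0 * inter + eps) / (denom + eps)

def dice_plain(pred, target, eps=1.0):
    """Dice score without squaring (the form questioned in Fig. 2)."""
    inter = sum(p * t for p, t in zip(pred, target))
    denom = sum(pred) + sum(target)
    return (2.0 * inter + eps) / (denom + eps)

pred   = [0.9, 0.2, 0.7, 0.1]   # predicted probabilities
target = [1.0, 0.0, 1.0, 0.0]   # binary ground truth
# dice_squared ~= 0.9655, dice_plain ~= 0.8571 for this input;
# the two agree exactly when pred is also hard binary.
```

Either version yields a valid loss via 1 - score; the choice mainly affects gradient scaling for soft predictions.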
Hi, I have searched and found no relevant results.
I printed the architecture of class TB (Models/models.py) and found that in the get_pyramid() method, the pyramid list consists of only 3 feature maps, F1, F2, F3 (since the append happens 3 times), instead of the 4 feature maps proposed in your paper. Also, it seems that self.backbone[10] is, I guess, the missing F4. Therefore, there are actually only 3 emphasized feature maps after the LE module. This affects the forward() method: at the first concatenation step of SFA, instead of concatenating F4_emph and F3_emph as in the paper, F3_emph is concatenated with itself, which is strange.
Is my understanding correct? If so, why isn't F4 used?
Thank you very much!
PS: I have read the PVT repo of whai362, and in class PyramidVisionTransformer, F4 is not used by default, but I don't understand why.
Hello!
I have read some relevant papers cited in your references, one of which is Stepwise Feature Fusion: Local Guides Global, the inspiration for PLD+. In that paper, a number of encoder-decoder pairs are evaluated, and from Table 4 (as in the image below), PLD performs best with the MiT encoder (MiT is the SegFormer encoder). So I wondered: given that PLD is improved into PLD+, have you tried the MiT-B3 encoder (which has a similar parameter count to PVTv2-B3, at 45.2M), and is it a good match for PLD+?
Thank you very much!