locuslab / convmixer
Implementation of ConvMixer for "Patches Are All You Need? 🤷"
License: MIT License
If yes, what is the proper script command?
I changed the distributed training script to run on a single GPU, but training failed.
Hi.
Would you consider providing an open source license for this repo?
First of all, thank you for the interesting work.
I was experimenting with the model that uses patch size 1 and kernel size 9 on CIFAR-10, with the following training settings:
--model tiny_convmixer
-b 64 -j 8
--opt adamw
--epochs 200
--sched onecycle
--amp
--input-size 3 32 32
--lr 0.01
--aa rand-m9-mstd0.5-inc1
--cutmix 0.5
--mixup 0.5
--reprob 0.25
--remode pixel
--num-classes 10
--warmup-epochs 0
--opt-eps 1e-3
--clip-grad 1.0
--scale 0.75 1.0
--weight-decay 0.01
--mean 0.4914 0.4822 0.4465
--std 0.2471 0.2435 0.2616
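For reference, the `--mean` and `--std` values above are per-channel CIFAR-10 statistics. The minimal sketch below assumes the usual convention of scaling 8-bit pixels to [0, 1] and then computing (x - mean) / std for each channel; `normalize_pixel` is a hypothetical helper, not part of timm or this repo.

```python
# Per-channel CIFAR-10 statistics, copied from the --mean/--std flags above.
CIFAR10_MEAN = (0.4914, 0.4822, 0.4465)
CIFAR10_STD = (0.2471, 0.2435, 0.2616)


def normalize_pixel(rgb, mean=CIFAR10_MEAN, std=CIFAR10_STD):
    """Hypothetical helper: standardize one 8-bit RGB pixel.

    Assumes the common convention of scaling to [0, 1] first and then
    applying (x - mean) / std independently to each channel.
    """
    return tuple((value / 255.0 - m) / s for value, m, s in zip(rgb, mean, std))


if __name__ == "__main__":
    # Black pixel: every channel is below its mean, so all values are negative.
    print(normalize_pixel((0, 0, 0)))
    # White pixel: every channel is above its mean, so all values are positive.
    print(normalize_pixel((255, 255, 255)))
```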
I only got 95.89%, whereas Table 4 in the paper reports 96.03%.
Can you please let me know any setting I missed? Thank you again.
Hi, first of all thanks for a very interesting paper.
I would like to know how long it took you to train the models. I'm training ConvMixer-768/32 on 2×V100 GPUs, and one epoch takes ~3 hours, so I estimate that full training would take about 2 × 3 × 300 ≈ 1800 GPU hours, which is insane. Even with 10 GPUs, one experiment would take about a week to finish. Are my calculations correct?
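The estimate above can be checked with a few lines of arithmetic. This sketch uses the figures from the question (2 GPUs, ~3 hours per epoch, 300 epochs) and assumes perfectly linear multi-GPU scaling, which real runs rarely achieve; the helper names are hypothetical.

```python
def training_cost(gpus, hours_per_epoch, epochs):
    """Return (total GPU-hours, wall-clock hours) for one training run."""
    wall_clock_hours = hours_per_epoch * epochs
    return gpus * wall_clock_hours, wall_clock_hours


def wall_clock_days(gpu_hours, gpus):
    """Wall-clock days if the same GPU-hour budget is split over `gpus`,
    assuming perfectly linear scaling (an optimistic assumption)."""
    return gpu_hours / gpus / 24


gpu_hours, _ = training_cost(gpus=2, hours_per_epoch=3, epochs=300)
print(gpu_hours)                       # 1800
print(wall_clock_days(gpu_hours, 10))  # 7.5
```

So "about a week" on 10 GPUs is consistent with the 1800 GPU-hour estimate, under the linear-scaling assumption.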
Have the authors tried replacing the patch embedding with a plain convolution, i.e., using stride 1 instead of p?
With this setting, the model becomes a standard convolutional network like MobileNet. I wonder what the performance would be. Is ConvMixer's performance gain due to the patch embedding or to the depthwise conv layers?
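One concrete consequence of using stride 1 instead of p is compute cost: the mixer blocks then run on a feature map with p² times more positions. The rough sketch below counts only the depthwise and pointwise convolution MACs; the choices dim=768, depth=32 (matching the ConvMixer-768/32 name), kernel and patch size 7, and a 224×224 input are illustrative assumptions, and `mixer_macs` is a hypothetical helper.

```python
def mixer_macs(dim, depth, kernel_size, grid):
    """Approximate multiply-accumulates of the ConvMixer blocks alone.

    Each block applies a depthwise k x k conv (dim * k * k MACs per position)
    followed by a pointwise 1 x 1 conv (dim * dim MACs per position) on a
    grid x grid feature map. Biases, BatchNorm, GELU, stem and head are ignored.
    """
    per_position = dim * kernel_size * kernel_size + dim * dim
    return depth * grid * grid * per_position


image_size, patch_size = 224, 7
patch_grid = image_size // patch_size  # stride p: 32 x 32 positions
full_grid = image_size                 # stride 1 with "same" padding: 224 x 224 positions

patch_cost = mixer_macs(dim=768, depth=32, kernel_size=7, grid=patch_grid)
full_cost = mixer_macs(dim=768, depth=32, kernel_size=7, grid=full_grid)
print(full_cost / patch_cost)  # 49.0, i.e. patch_size ** 2
```

A fair comparison would therefore need to control for this large increase in compute, not just for parameter count (which is unchanged by the stride).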
Very interested in this work, thanks.
Where is the weights file saved after training?
Hi!
This work is pretty interesting, but I think there should be more results, as in "Demystifying Local Vision Transformer: Sparse Connectivity, Weight Sharing, and Dynamic Weight", which replaces local self-attention with depth-wise convolution in Swin Transformer. Since ConvMixer achieves strong results with an even simpler architecture than Swin Transformer, I wonder whether it can reach similar performance on object detection and semantic segmentation.
Why are “patches” all you need?
The patch embedding is a Conv7x7 stem,
The body is simply repeated Conv9x9 + Conv1x1 layers,
(I'm not challenging your work; it is indeed very interesting.) I'm just kindly wondering: what is new about this model?
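The architecture summarized above is small enough to count parameters by hand. The sketch below assumes the layer layout described for ConvMixer (a patch-embedding conv with kernel and stride p, then `depth` blocks of a depthwise k×k conv and a pointwise 1×1 conv, each followed by GELU and BatchNorm, then global average pooling and a linear head); `convmixer_params` is a hypothetical helper. BatchNorm running statistics are not counted because they are buffers, not trainable parameters.

```python
def convmixer_params(dim, depth, kernel_size, patch_size, n_classes=1000, in_chans=3):
    """Count trainable parameters (weights + biases) of a ConvMixer-style network."""
    batchnorm = 2 * dim  # affine weight and bias per BatchNorm layer

    # Patch embedding: Conv(in_chans -> dim, kernel = stride = patch_size) + BatchNorm.
    stem = in_chans * dim * patch_size ** 2 + dim + batchnorm

    # One block: depthwise k x k conv + BN, then pointwise 1 x 1 conv + BN.
    depthwise = dim * kernel_size ** 2 + dim
    pointwise = dim * dim + dim
    block = depthwise + batchnorm + pointwise + batchnorm

    # Classifier head: linear layer after global average pooling.
    head = dim * n_classes + n_classes

    return stem + depth * block + head


print(convmixer_params(dim=1536, depth=20, kernel_size=9, patch_size=7))  # 51625960
```

Most of the budget sits in the pointwise 1×1 convolutions (dim² per block), while the large 9×9 depthwise kernels are comparatively cheap.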
Hi authors. Your paper has demonstrated a quite intriguing observation. I wish you luck with your submission.
Thanks for sharing the code of the submission. When running it, I ran into an out-of-memory (OOM) error with the default batch size of 64. In the end I could only run with 8 samples per batch per GPU, as my GPUs have only 11 GB of memory. Have you tried smaller GPUs, and did you achieve the same results? So far, apart from adjusting the learning rate according to the linear scaling rule, I haven't changed anything. If you have trained on smaller GPUs before, could you please share your experience? Thank you very much!
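The linear scaling rule mentioned above keeps the ratio of learning rate to global batch size constant. In this sketch the GPU count and the base learning rate of 0.01 are assumptions for illustration (the report states neither); `linearly_scaled_lr` is a hypothetical helper.

```python
def linearly_scaled_lr(base_lr, base_batch, new_batch):
    """Linear scaling rule: keep lr / (global batch size) constant."""
    return base_lr * new_batch / base_batch


# Hypothetical numbers: 64 images per GPU in the reference setup versus 8 images
# per GPU on the same number of GPUs, so the global batch shrinks 8x.
num_gpus = 4     # assumption, not from the original report
base_lr = 0.01   # assumption: the --lr value used with the reference batch size
new_lr = linearly_scaled_lr(base_lr, base_batch=64 * num_gpus, new_batch=8 * num_gpus)
print(new_lr)  # 0.00125
```

Because the GPU count appears in both batch sizes, it cancels out; only the per-GPU batch ratio (8/64) matters when the number of GPUs is unchanged.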
Hello,
I tried ConvMixer-256 on CIFAR-10 with the same timm options specified for ImageNet (except num_classes), and it doesn't go beyond 90% accuracy. Could you please specify the options used for the CIFAR-10 experiment?
As the title implies, it would be great to see the training logs of ConvMixer.
https://github.com/tmp-iclr/convmixer/blob/1cefd860a1a6a85369887d1a633425cedc2afd0a/convmixer.py#L18
There is an error: TypeError: conv2d(): argument 'padding' (position 5) must be tuple of ints, not str.
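This error most likely comes from passing a string padding (`padding="same"`) to `torch.nn.Conv2d`; string padding values were added in PyTorch 1.9, and older versions only accept integers or tuples. For the odd kernel sizes used here (e.g. 9) at stride 1, `padding=kernel_size // 2` preserves the spatial size, so one workaround is `nn.Conv2d(dim, dim, kernel_size, groups=dim, padding=kernel_size // 2)`. The pure-Python sketch below only checks that arithmetic with the standard convolution output-size formula; `same_padding` and `conv_output_size` are hypothetical helpers.

```python
def conv_output_size(size, kernel_size, stride=1, padding=0, dilation=1):
    """Standard output-size formula for a convolution along one spatial axis."""
    return (size + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1


def same_padding(kernel_size):
    """Integer padding that preserves the input size for an odd kernel at stride 1."""
    if kernel_size % 2 == 0:
        raise ValueError("symmetric integer padding cannot mimic 'same' for even kernels")
    return kernel_size // 2


for k in (3, 7, 9):
    pad = same_padding(k)
    print(k, pad, conv_output_size(32, k, padding=pad))  # output size stays 32
```

Alternatively, upgrading PyTorch to a version that supports string padding should avoid the error without changing the code.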