
mlpc-ucsd / coat


(ICCV 2021 Oral) CoaT: Co-Scale Conv-Attentional Image Transformers

License: Apache License 2.0

Shell 0.16% Python 49.24% C++ 0.09% Cuda 0.88% Jupyter Notebook 49.57% Dockerfile 0.04% Makefile 0.01% Batchfile 0.01%

coat's People

Contributors

xwjabc, yix081


coat's Issues

About AMP and batch size

Hi,
I'm very impressed by your excellent work! Thanks for sharing your code.

I have questions about the training protocol.

In your paper,

"We train all models with a global batch size of 2048 with the NVIDIA Automatic Mixed Precision(AMP) enabled."

but the training script specifies a batch size of 256, not 2048.

I have two questions:

  1. Can I reproduce the reported accuracy with this repo using this command (batch size 256 instead of 2048)?

  2. Does this repo contain AMP? (See the AMP sketch below.)

Thanks in advance :)
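
For readers unsure what enabling AMP looks like in practice, below is a minimal sketch of the standard torch.cuda.amp pattern (autocast plus GradScaler). It is a generic illustration, not this repository's training loop; the model, criterion, optimizer, and loader names are placeholders.

# Minimal sketch of NVIDIA AMP via torch.cuda.amp (not this repo's code);
# model/criterion/optimizer/loader are placeholders.
import torch
from torch.cuda.amp import GradScaler, autocast

def train_one_epoch(model, criterion, optimizer, loader, device='cuda'):
    scaler = GradScaler()                    # scales the loss to avoid fp16 underflow
    model.train()
    for images, targets in loader:
        images, targets = images.to(device), targets.to(device)
        optimizer.zero_grad()
        with autocast():                     # forward pass in mixed precision
            loss = criterion(model(images), targets)
        scaler.scale(loss).backward()        # backward on the scaled loss
        scaler.step(optimizer)               # unscales gradients, then steps
        scaler.update()                      # adjusts the scale for the next step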

Some questions about the paper

Hi, Author.
I would like to know: are EV and ÊV equivalent or only approximately equal in the paper?
Are the ÊV^l terms in the second halves of Eq. 7 and Eq. 8 equivalent or approximate?
Thank you, looking forward to your answer.
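
For anyone tracing the same notation, the factorized attention term that these equations build on applies softmax to the keys along the token dimension, forms K^T V first, and then multiplies by Q with a 1/sqrt(C) scale. The sketch below is a minimal illustration with assumed tensor shapes, not the repository's exact code, and it omits the convolutional relative position term (the EV / ÊV part the question is about).

# Minimal sketch of CoaT-style factorized attention; q, k, v are assumed to be
# (batch, heads, tokens, channels_per_head). The convolutional relative
# position term is omitted.
import torch

def factorized_attention(q, k, v):
    scale = q.shape[-1] ** -0.5                  # 1 / sqrt(C per head)
    k_softmax = k.softmax(dim=2)                 # softmax along the token dimension
    context = k_softmax.transpose(-2, -1) @ v    # (B, h, C, C): K^T V computed first
    return scale * (q @ context)                 # (B, h, tokens, C)

q = k = v = torch.randn(2, 8, 49, 32)            # dummy tensors
print(factorized_attention(q, k, v).shape)       # torch.Size([2, 8, 49, 32])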

Reproducibility issue

Hi,

To check reproducibility, I tried to train the coat_lite_mini model (reported 79.1/94.5) and got 78.85/94.42 using this command:

bash scripts/train.sh coat_lite_mini coat_lite_mini

with the default settings, such as a batch size of 256, on 8 GPUs (TITAN RTX).

Is such a small difference (79.1 vs. 78.9) negligible?

My environment:


sys.platform linux
Python 3.7.9 (default, Aug 31 2020, 12:42:55) [GCC 7.3.0]
numpy 1.19.2
Compiler GCC 7.5
CUDA compiler CUDA 10.1
detectron2 arch flags 7.5
DETECTRON2_ENV_MODULE
PyTorch 1.7.0
PyTorch debug build True
GPU available True
GPU 0,1,2,3,4,5,6,7 TITAN RTX (arch=7.5)
CUDA_HOME /usr/local/cuda-10.1
Pillow 8.0.1
torchvision 0.8.0
torchvision arch flags 3.5, 5.0, 6.0, 7.0, 7.5
fvcore 0.1.2.post20201218
cv2 Not found


PyTorch built with:

  • GCC 7.3
  • C++ Version: 201402
  • Intel(R) Math Kernel Library Version 2020.0.2 Product Build 20200624 for Intel(R) 64 architecture applications
  • Intel(R) MKL-DNN v1.6.0 (Git Hash 5ef631a030a6f73131c77892041042805a06064f)
  • OpenMP 201511 (a.k.a. OpenMP 4.5)
  • NNPACK is enabled
  • CPU capability usage: AVX2
  • CUDA Runtime 10.2
  • NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_37,code=compute_37
  • CuDNN 7.6.5
  • Magma 2.5.2
  • Build settings: BLAS=MKL, BUILD_TYPE=Release, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DUSE_VULKAN_WRAPPER -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, USE_CUDA=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON,
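
One quick way to cross-check a reproduced run against the released weights is to load the pretrained coat_lite_mini through timm and evaluate it with the same validation pipeline. Below is a minimal sketch, assuming a timm version that ships the CoaT models; it only loads the checkpoint and runs a dummy forward pass.

# Minimal sketch: load the released coat_lite_mini weights via timm and run a
# sanity-check forward pass (assumes a timm version that includes CoaT).
import timm
import torch

model = timm.create_model('coat_lite_mini', pretrained=True).eval()
n_params = sum(p.numel() for p in model.parameters())
print(f'coat_lite_mini parameters: {n_params / 1e6:.1f}M')

with torch.no_grad():
    logits = model(torch.randn(1, 3, 224, 224))  # dummy 224x224 input
print(logits.shape)                              # torch.Size([1, 1000])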

About the new detection results in the ICCV camera-ready paper

Hi,
First of all, congratulations on your acceptance to ICCV 👍.

I have seen your updated paper (arXiv v2) and have some questions.

In Table 3, which shows the Mask R-CNN results under the MMDetection framework:

(1) Is the FPN 1x result trained with multi-scale or single-scale training?

(2) Do you plan to release the new MMDetection implementation?

Thanks in advance :)

Segmentation architecture

@yix081 @xwjabc thanks for sharing the codebase. I have the following queries:

  1. Can we adapt this architecture to perform segmentation tasks, i.e. semantic segmentation? If so, how? (A rough sketch is given after this list.)
  2. Can we adapt this architecture to perform object detection?

Please share your thoughts. Thanks in advance.
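
Not an official answer, but one common pattern is to treat CoaT-Lite as a backbone, reshape its final-stage patch tokens into a 2D feature map, and attach a small dense-prediction head. The sketch below only illustrates that pattern: extract_patch_tokens is a hypothetical helper you would implement yourself (e.g. with a forward hook on the backbone), and the shapes assume a 224×224 input with a 1/32-resolution final stage (a 7×7 token grid).

# Sketch only: turn final-stage patch tokens (B, N, C) from a CoaT-Lite
# backbone into a per-pixel class map. Obtaining the tokens is left to a
# hypothetical extract_patch_tokens(backbone, images) helper; here dummy
# tensors stand in for its output.
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimpleSegHead(nn.Module):
    def __init__(self, in_channels, num_classes):
        super().__init__()
        self.classifier = nn.Conv2d(in_channels, num_classes, kernel_size=1)

    def forward(self, tokens, image_size):
        b, n, c = tokens.shape
        h = w = int(math.sqrt(n))                        # assume a square token grid
        feat = tokens.transpose(1, 2).reshape(b, c, h, w)
        logits = self.classifier(feat)                   # (B, num_classes, h, w)
        return F.interpolate(logits, size=image_size,    # upsample to full resolution
                             mode='bilinear', align_corners=False)

tokens = torch.randn(2, 7 * 7, 512)                      # stand-in for extract_patch_tokens(...)
head = SimpleSegHead(in_channels=512, num_classes=21)
masks = head(tokens, image_size=(224, 224))
print(masks.shape)                                       # torch.Size([2, 21, 224, 224])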

Will CoaT Small be available?

Dear Weijian,
I recently read your paper on CoaT; it is really excellent work!
I wish to do some further research based on CoaT Small. However, it is not mentioned in your paper or repo, so I wonder whether you have implemented CoaT Small and, if so, whether the model will be made available.
Thanks in advance!

CoaT for multi-label classification

@yix081 @xwjabc thanks for sharing the codebase. I have a few queries about the problem I am working on: gender/age classification of a person, i.e. a multi-label recognition problem.

  1. My input image size varies from 80×56 to 256×128. For these inputs, should I change the patch size from 4 to 16? If so, which other parameters should I change?
  2. Since it is a multi-label classification problem, should I change the self.head = nn.Linear(self.num_features, num_classes) if num_classes > 0 else nn.Identity() line? (See the sketch after this list.)
  3. Should I freeze the layers in the transformer and train only the last layer?

Thanks in advance.
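
Not an official answer, but the usual PyTorch recipe for multi-label classification is to keep a Linear head with one logit per label, train with BCEWithLogitsLoss, and apply a per-label sigmoid at inference, rather than editing the nn.Identity branch by hand. Below is a minimal sketch, assuming a timm version that includes coat_lite_mini; the number of labels and the choice to freeze the backbone are illustrative.

# Sketch only: multi-label head on coat_lite_mini via timm, trained with
# BCEWithLogitsLoss (per-label sigmoid instead of softmax).
import timm
import torch
import torch.nn as nn

num_labels = 6                                       # illustrative: gender + age buckets
model = timm.create_model('coat_lite_mini', pretrained=True,
                          num_classes=num_labels)    # builds a matching Linear head

# optional: freeze everything except the classifier (named 'head' in timm's
# CoaT; adjust the name if your model differs)
for name, p in model.named_parameters():
    p.requires_grad = name.startswith('head')

criterion = nn.BCEWithLogitsLoss()                   # multi-label loss

images = torch.randn(4, 3, 224, 224)                 # dummy batch
targets = torch.randint(0, 2, (4, num_labels)).float()  # multi-hot labels
logits = model(images)
loss = criterion(logits, targets)
loss.backward()

probs = torch.sigmoid(logits.detach())               # per-label probabilities
preds = (probs > 0.5).int()                          # thresholded predictions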

Visualization of attention maps

@yix081 @xwjabc thanks for your work; it has helped me a lot, but I have a few queries:

  1. Can we visualize the attention maps, like Grad-CAM / CAM, to see what the model has learned? Do you have a codebase for it, or can you suggest how to do it? (A rough sketch is given after this list.)
  2. CoaT-Lite has only serial blocks and CoaT has serial + parallel blocks, but the number of parameters in CoaT-Lite is higher than in CoaT. Is there a specific reason for this?
  3. How can I reduce the number of parameters in CoaT-Lite/CoaT to under 3M? A drop in accuracy is acceptable.

Thanks in advance.
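
Not an official answer, but CAM-style heatmaps can be produced with the third-party pytorch-grad-cam package (pip install grad-cam) following its ViT recipe. In the sketch below, the target layer name and the assumption of a 7×7 token grid with a leading class token are guesses for coat_lite_mini; inspect model.named_modules() and adjust as needed.

# Sketch only: Grad-CAM heatmap for coat_lite_mini with pytorch-grad-cam.
# The target layer below is an assumption -- pick a late block after checking
# model.named_modules(); the reshape assumes tokens of shape (B, 1 + 7*7, C)
# with a leading class token.
import timm
import torch
from pytorch_grad_cam import GradCAM
from pytorch_grad_cam.utils.model_targets import ClassifierOutputTarget

model = timm.create_model('coat_lite_mini', pretrained=True).eval()

def reshape_transform(tensor, height=7, width=7):
    # drop the class token and fold the remaining tokens into a 2D feature map
    result = tensor[:, 1:, :].reshape(tensor.size(0), height, width, tensor.size(2))
    return result.permute(0, 3, 1, 2)                # (B, C, H, W)

target_layers = [model.serial_blocks4[-1].norm2]     # assumed name; adjust if needed

cam = GradCAM(model=model, target_layers=target_layers,
              reshape_transform=reshape_transform)
input_tensor = torch.randn(1, 3, 224, 224)           # replace with a preprocessed image
grayscale_cam = cam(input_tensor=input_tensor,
                    targets=[ClassifierOutputTarget(281)])  # e.g. ImageNet class 281
print(grayscale_cam.shape)                           # (1, H, W) heatmap to overlay on the image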
