
mlpc-ucsd / coat


(ICCV 2021 Oral) CoaT: Co-Scale Conv-Attentional Image Transformers

License: Apache License 2.0

Shell 0.16% Python 49.24% C++ 0.09% Cuda 0.88% Jupyter Notebook 49.57% Dockerfile 0.04% Makefile 0.01% Batchfile 0.01%

coat's People

Contributors

xwjabc, yix081


coat's Issues

About AMP and batch size

Hi,
I'm very impressed by your excellent work! Thanks for sharing your code.

I have questions about the training protocol.

In your paper,

"We train all models with a global batch size of 2048 with the NVIDIA Automatic Mixed Precision(AMP) enabled."

but the training script specifies a batch size of 256, not 2048.

I have two questions:

  1. Can I reproduce the reported accuracy with this repo using this command (batch size 256 instead of 2048)?

  2. Does this repo contain AMP? (See the AMP sketch below.)

Thanks in advance :)
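
For readers unsure what enabling AMP looks like in practice, below is a minimal sketch of the standard torch.cuda.amp pattern (autocast plus GradScaler). It is a generic illustration, not this repository's training loop; the model, criterion, optimizer, and loader names are placeholders.

# Minimal sketch of NVIDIA AMP via torch.cuda.amp (not this repo's code);
# model/criterion/optimizer/loader are placeholders.
import torch
from torch.cuda.amp import GradScaler, autocast

def train_one_epoch(model, criterion, optimizer, loader, device='cuda'):
    scaler = GradScaler()                    # scales the loss to avoid fp16 underflow
    model.train()
    for images, targets in loader:
        images, targets = images.to(device), targets.to(device)
        optimizer.zero_grad()
        with autocast():                     # forward pass in mixed precision
            loss = criterion(model(images), targets)
        scaler.scale(loss).backward()        # backward on the scaled loss
        scaler.step(optimizer)               # unscales gradients, then steps
        scaler.update()                      # adjusts the scale for the next step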

Some questions about the paper

Hi, Author.
I would like to know: are EV and ÊV equivalent or only approximately equal in the paper?
Are the ÊV^l terms in the second halves of Eq. 7 and Eq. 8 equivalent or approximate?
Thank you, looking forward to your answer.
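
For anyone tracing the same notation, the factorized attention term that these equations build on applies softmax to the keys along the token dimension, forms K^T V first, and then multiplies by Q with a 1/sqrt(C) scale. The sketch below is a minimal illustration with assumed tensor shapes, not the repository's exact code, and it omits the convolutional relative position term (the EV / ÊV part the question is about).

# Minimal sketch of CoaT-style factorized attention; q, k, v are assumed to be
# (batch, heads, tokens, channels_per_head). The convolutional relative
# position term is omitted.
import torch

def factorized_attention(q, k, v):
    scale = q.shape[-1] ** -0.5                  # 1 / sqrt(C per head)
    k_softmax = k.softmax(dim=2)                 # softmax along the token dimension
    context = k_softmax.transpose(-2, -1) @ v    # (B, h, C, C): K^T V computed first
    return scale * (q @ context)                 # (B, h, tokens, C)

q = k = v = torch.randn(2, 8, 49, 32)            # dummy tensors
print(factorized_attention(q, k, v).shape)       # torch.Size([2, 8, 49, 32])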

Reproducibility issue

Hi,

To check reproducibility, I tried to train the coat_lite_mini model (reported 79.1/94.5) and got 78.85/94.42 using this command:

bash scripts/train.sh coat_lite_mini coat_lite_mini

with the default settings, such as a batch size of 256, on 8 GPUs (TITAN RTX).

Is such a small difference (79.1 vs. 78.9) negligible?

My environment:


sys.platform linux
Python 3.7.9 (default, Aug 31 2020, 12:42:55) [GCC 7.3.0]
numpy 1.19.2
Compiler GCC 7.5
CUDA compiler CUDA 10.1
detectron2 arch flags 7.5
DETECTRON2_ENV_MODULE
PyTorch 1.7.0
PyTorch debug build True
GPU available True
GPU 0,1,2,3,4,5,6,7 TITAN RTX (arch=7.5)
CUDA_HOME /usr/local/cuda-10.1
Pillow 8.0.1
torchvision 0.8.0
torchvision arch flags 3.5, 5.0, 6.0, 7.0, 7.5
fvcore 0.1.2.post20201218
cv2 Not found


PyTorch built with:

  • GCC 7.3
  • C++ Version: 201402
  • Intel(R) Math Kernel Library Version 2020.0.2 Product Build 20200624 for Intel(R) 64 architecture applications
  • Intel(R) MKL-DNN v1.6.0 (Git Hash 5ef631a030a6f73131c77892041042805a06064f)
  • OpenMP 201511 (a.k.a. OpenMP 4.5)
  • NNPACK is enabled
  • CPU capability usage: AVX2
  • CUDA Runtime 10.2
  • NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_37,code=compute_37
  • CuDNN 7.6.5
  • Magma 2.5.2
  • Build settings: BLAS=MKL, BUILD_TYPE=Release, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DUSE_VULKAN_WRAPPER -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, USE_CUDA=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON,
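
One quick way to cross-check a reproduced run against the released weights is to load the pretrained coat_lite_mini through timm and evaluate it with the same validation pipeline. Below is a minimal sketch, assuming a timm version that ships the CoaT models; it only loads the checkpoint and runs a dummy forward pass.

# Minimal sketch: load the released coat_lite_mini weights via timm and run a
# sanity-check forward pass (assumes a timm version that includes CoaT).
import timm
import torch

model = timm.create_model('coat_lite_mini', pretrained=True).eval()
n_params = sum(p.numel() for p in model.parameters())
print(f'coat_lite_mini parameters: {n_params / 1e6:.1f}M')

with torch.no_grad():
    logits = model(torch.randn(1, 3, 224, 224))  # dummy 224x224 input
print(logits.shape)                              # torch.Size([1, 1000])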

About the new detection results in the ICCV camera-ready paper

Hi,
First of all, congratulations on your acceptance to ICCV 👍.

I have seen your updated paper (arXiv v2) and have some questions.

In Table 3, which shows the Mask R-CNN results under the MMDetection framework:

(1) Is the FPN 1x result trained with multi-scale or single-scale training?

(2) Do you plan to release the new MMDetection implementation?

Thanks in advance :)

Segmentation architecture

@yix081 @xwjabc thanks for sharing the codebase. I have the following queries:

  1. Can we adapt this architecture to perform segmentation tasks, i.e. semantic segmentation? If so, how? (A rough sketch is given after this list.)
  2. Can we adapt this architecture to perform object detection?

Please share your thoughts. Thanks in advance.
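
Not an official answer, but one common pattern is to treat CoaT-Lite as a backbone, reshape its final-stage patch tokens into a 2D feature map, and attach a small dense-prediction head. The sketch below only illustrates that pattern: extract_patch_tokens is a hypothetical helper you would implement yourself (e.g. with a forward hook on the backbone), and the shapes assume a 224×224 input with a 1/32-resolution final stage (a 7×7 token grid).

# Sketch only: turn final-stage patch tokens (B, N, C) from a CoaT-Lite
# backbone into a per-pixel class map. Obtaining the tokens is left to a
# hypothetical extract_patch_tokens(backbone, images) helper; here dummy
# tensors stand in for its output.
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimpleSegHead(nn.Module):
    def __init__(self, in_channels, num_classes):
        super().__init__()
        self.classifier = nn.Conv2d(in_channels, num_classes, kernel_size=1)

    def forward(self, tokens, image_size):
        b, n, c = tokens.shape
        h = w = int(math.sqrt(n))                        # assume a square token grid
        feat = tokens.transpose(1, 2).reshape(b, c, h, w)
        logits = self.classifier(feat)                   # (B, num_classes, h, w)
        return F.interpolate(logits, size=image_size,    # upsample to full resolution
                             mode='bilinear', align_corners=False)

tokens = torch.randn(2, 7 * 7, 512)                      # stand-in for extract_patch_tokens(...)
head = SimpleSegHead(in_channels=512, num_classes=21)
masks = head(tokens, image_size=(224, 224))
print(masks.shape)                                       # torch.Size([2, 21, 224, 224])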

Will CoaT Small be available?

Dear Weijian,
I recently read your paper on CoaT; it is really excellent work!
I wish to do some further research based on CoaT Small. However, it is not mentioned in your paper or repo, so I wonder whether you have implemented CoaT Small and, if so, whether the model will be made available.
Thanks in advance!

CoaT for multi-label classification

@yix081 @xwjabc thanks for sharing the codebase. I have a few queries about the problem I am working on: gender/age classification of a person, i.e. a multi-label recognition problem.

  1. My input image size varies from 80×56 to 256×128. For these inputs, should I change the patch size from 4 to 16? If so, which other parameters should I change?
  2. Since it is a multi-label classification problem, should I change the self.head = nn.Linear(self.num_features, num_classes) if num_classes > 0 else nn.Identity() line? (See the sketch after this list.)
  3. Should I freeze the layers in the transformer and train only the last layer?

Thanks in advance.
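
Not an official answer, but the usual PyTorch recipe for multi-label classification is to keep a Linear head with one logit per label, train with BCEWithLogitsLoss, and apply a per-label sigmoid at inference, rather than editing the nn.Identity branch by hand. Below is a minimal sketch, assuming a timm version that includes coat_lite_mini; the number of labels and the choice to freeze the backbone are illustrative.

# Sketch only: multi-label head on coat_lite_mini via timm, trained with
# BCEWithLogitsLoss (per-label sigmoid instead of softmax).
import timm
import torch
import torch.nn as nn

num_labels = 6                                       # illustrative: gender + age buckets
model = timm.create_model('coat_lite_mini', pretrained=True,
                          num_classes=num_labels)    # builds a matching Linear head

# optional: freeze everything except the classifier (named 'head' in timm's
# CoaT; adjust the name if your model differs)
for name, p in model.named_parameters():
    p.requires_grad = name.startswith('head')

criterion = nn.BCEWithLogitsLoss()                   # multi-label loss

images = torch.randn(4, 3, 224, 224)                 # dummy batch
targets = torch.randint(0, 2, (4, num_labels)).float()  # multi-hot labels
logits = model(images)
loss = criterion(logits, targets)
loss.backward()

probs = torch.sigmoid(logits.detach())               # per-label probabilities
preds = (probs > 0.5).int()                          # thresholded predictions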

Visualization of attention maps

@yix081 @xwjabc thanks for your work; it has helped me a lot, but I have a few queries:

  1. Can we visualize the attention maps, like Grad-CAM / CAM, to see what the model has learned? Do you have a codebase for it, or can you suggest how to do it? (A rough sketch is given after this list.)
  2. CoaT-Lite has only serial blocks and CoaT has serial + parallel blocks, but the number of parameters in CoaT-Lite is higher than in CoaT. Is there a specific reason for this?
  3. How can I reduce the number of parameters in CoaT-Lite/CoaT to under 3M? A drop in accuracy is acceptable.

Thanks in advance.
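
Not an official answer, but CAM-style heatmaps can be produced with the third-party pytorch-grad-cam package (pip install grad-cam) following its ViT recipe. In the sketch below, the target layer name and the assumption of a 7×7 token grid with a leading class token are guesses for coat_lite_mini; inspect model.named_modules() and adjust as needed.

# Sketch only: Grad-CAM heatmap for coat_lite_mini with pytorch-grad-cam.
# The target layer below is an assumption -- pick a late block after checking
# model.named_modules(); the reshape assumes tokens of shape (B, 1 + 7*7, C)
# with a leading class token.
import timm
import torch
from pytorch_grad_cam import GradCAM
from pytorch_grad_cam.utils.model_targets import ClassifierOutputTarget

model = timm.create_model('coat_lite_mini', pretrained=True).eval()

def reshape_transform(tensor, height=7, width=7):
    # drop the class token and fold the remaining tokens into a 2D feature map
    result = tensor[:, 1:, :].reshape(tensor.size(0), height, width, tensor.size(2))
    return result.permute(0, 3, 1, 2)                # (B, C, H, W)

target_layers = [model.serial_blocks4[-1].norm2]     # assumed name; adjust if needed

cam = GradCAM(model=model, target_layers=target_layers,
              reshape_transform=reshape_transform)
input_tensor = torch.randn(1, 3, 224, 224)           # replace with a preprocessed image
grayscale_cam = cam(input_tensor=input_tensor,
                    targets=[ClassifierOutputTarget(281)])  # e.g. ImageNet class 281
print(grayscale_cam.shape)                           # (1, H, W) heatmap to overlay on the image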
