Comments (10)
Here is the speed test, dwblocks_speed.py.
Tested on:
Python 3.7.11 + torch 1.8.2 + cuda-11.1.1 + cudnn-8.1.1 + V100
from replknet-pytorch.
More than 10x slower (about 15x by these numbers): depthwise_conv2d_implicit_gemm.DepthWiseConv2dImplicitGEMM takes 0.01943465073903402s while nn.Conv2d takes 0.0012518405914306641s.
from replknet-pytorch.
Hi, I checked the code and found no torch.cuda.synchronize() call, so the recorded time may not be the actual running time on the GPU. I suggest following the speed test script of Swin (https://github.com/microsoft/Swin-Transformer/blob/main/main.py#L287).
from replknet-pytorch.
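A minimal sketch of the synchronized timing suggested above, for reference. CUDA kernels launch asynchronously, so the clock should only be read after the device has finished its queued work. This is not the Swin script itself: the helper name `time_gpu_op`, the warm-up and iteration counts, and the tensor shapes in the usage part are all assumptions chosen for illustration.

```python
import time


def time_gpu_op(fn, sync, warmup=10, iters=50):
    """Return the average seconds per call of fn().

    sync is called before each clock read so that work queued
    asynchronously on the GPU has finished (for PyTorch, pass
    torch.cuda.synchronize).
    """
    for _ in range(warmup):      # warm-up runs are not timed
        fn()
    sync()                       # drain pending work before starting the clock
    start = time.perf_counter()
    for _ in range(iters):
        fn()
    sync()                       # wait for the timed work to finish
    return (time.perf_counter() - start) / iters


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        torch = None

    if torch is not None and torch.cuda.is_available():
        # Hypothetical shapes: a 31x31 depthwise conv on a batch of 1.
        conv = torch.nn.Conv2d(64, 64, kernel_size=31, padding=15, groups=64).cuda()
        x = torch.randn(1, 64, 56, 56, device="cuda")
        with torch.no_grad():
            seconds = time_gpu_op(lambda: conv(x), torch.cuda.synchronize)
        print(f"nn.Conv2d: {seconds:.6f} s per forward pass")
```

The same helper could time DepthWiseConv2dImplicitGEMM by swapping the module, provided the extension is installed.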
The test code is a minimal reproduction of the slowdown with depthwise_conv2d_implicit_gemm, which I observed while training a large model.
from replknet-pytorch.
Adding torch.cuda.synchronize() before each call to time.time() gives timings close to those of the original code.
from replknet-pytorch.
This implementation is not suited to small batch sizes. In this case the batch size is 1, so the CUTLASS implementation is slower than PyTorch. You can try MegEngine instead.
from replknet-pytorch.
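To check how batch size affects an operator on your own hardware, one option is to sweep batch sizes and compare samples per second rather than per-call latency. The sketch below is an illustration only: `throughput_by_batch_size`, its parameters, and the default counts are hypothetical names and values, and it makes no claim about what any particular kernel will measure.

```python
import time


def throughput_by_batch_size(run_batch, batch_sizes, sync=lambda: None,
                             warmup=3, iters=10):
    """Return {batch_size: samples_per_second} for run_batch(batch_size).

    run_batch(bs) should execute one forward pass on a batch of size bs.
    sync should block until queued device work is done (for PyTorch on
    GPU, pass torch.cuda.synchronize); the default is a no-op.
    """
    results = {}
    for bs in batch_sizes:
        for _ in range(warmup):          # untimed warm-up calls
            run_batch(bs)
        sync()
        start = time.perf_counter()
        for _ in range(iters):
            run_batch(bs)
        sync()
        elapsed = time.perf_counter() - start
        results[bs] = bs * iters / elapsed   # samples processed per second
    return results
```

With PyTorch, `run_batch` could look up a pre-allocated input of the requested batch size and apply the convolution to it, so that tensor allocation is not included in the timing.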
Thanks for your reply. It helps me a lot.
from replknet-pytorch.
I ran into the same problem.
I trained an ATSS detector with RepLKNet31B at batch_size 1 (2080Ti GPU, 11 GB memory..., and 'use_checkpoint' seems to be incompatible with DDP):
- with torch.nn.Conv2d(), training takes about 1.00s per iteration.
- with DepthWiseConv2dImplicitGEMM, training takes about 4.87s per iteration.
from replknet-pytorch.
Hi, I encountered the same problem.
When using nn.Conv2d, the running time of the model is just ~0.5s,
while with DepthWiseConv2dImplicitGEMM it is ~6s.
The batch size is set to 1 because of limited memory (RTX3060, single GPU, 12G).
from replknet-pytorch.
Thank you for sharing the results. As explained by @xiaocenxiaocen, our implementation is designed for high throughput: the larger the batch size, the higher the throughput.
from replknet-pytorch.