
replknet-pytorch's People

Contributors

dingxiaoh, shkarupa-alex, xperzy


replknet-pytorch's Issues

RepLKNet-31L 21k weights seems to be wrong

I've ported the RepLKNet model to TF (it will be published soon), but during testing I found that one particular checkpoint, RepLKNet-31L for ImageNet-21k, produces wrong predictions.

For the image https://storage.googleapis.com/tensorflow/keras-applications/tests/elephant.jpg it predicts top-2 classes 20104 and 10871, which are "flowering_almond" and "strainer".

All other checkpoints for ImageNet-1k and 21k predict the correct classes (3674, African_elephant, in 21k).

About large depthwise conv2d kernel speed

Hello, I ran an experiment on the speed of the large-kernel convolution operator you provide, comparing it against PyTorch's nn.Conv2d:

    import time
    import torch
    import torch.nn as nn
    from depthwise_conv2d_implicit_gemm import DepthWiseConv2dImplicitGEMM

    if torch.cuda.is_available():
        x = torch.randn(64, 1000, 32, 32).cuda()
        x1 = torch.randn(64, 1000, 32, 32).cuda()
        m1 = DepthWiseConv2dImplicitGEMM(1000, 31, bias=False).cuda()
        m2 = nn.Conv2d(1000, 1000, 31, padding=31 // 2, bias=False, groups=1000).cuda()
        
        start = time.time()
        y2 = m2(x1)
        y2.mean().backward()
        end = time.time()

        start1 = time.time()
        y1 = m1(x)
        y1.mean().backward()
        end1 = time.time()


        print("time:",end-start,end1-start1)

Result:
time: 0.0003426074981689453 0.0003437995910644531
After testing, the forward and backward passes take almost the same amount of time. Has PyTorch applied some related optimization?
pytorch 1.9.0 python3.9 cuda 10.2
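For reference, CUDA kernels launch asynchronously, so timing with time.time() alone mostly measures launch overhead rather than the actual convolution, which would explain why both numbers above look identical. A minimal synchronized benchmark sketch (same shapes as above; not an official script, and it assumes the extension is installed as described in the README):

    import time
    import torch
    import torch.nn as nn
    from depthwise_conv2d_implicit_gemm import DepthWiseConv2dImplicitGEMM

    def bench(module, x, iters=20):
        # warm-up so that one-time launch/compile overhead is excluded
        for _ in range(3):
            module(x).mean().backward()
        torch.cuda.synchronize()
        start = time.time()
        for _ in range(iters):
            module(x).mean().backward()
        torch.cuda.synchronize()  # wait until all queued GPU work has finished
        return (time.time() - start) / iters

    if torch.cuda.is_available():
        x = torch.randn(64, 1000, 32, 32).cuda()
        m1 = DepthWiseConv2dImplicitGEMM(1000, 31, bias=False).cuda()
        m2 = nn.Conv2d(1000, 1000, 31, padding=15, bias=False, groups=1000).cuda()
        print("implicit GEMM :", bench(m1, x), "s/iter")
        print("nn.Conv2d     :", bench(m2, x), "s/iter")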

Failed when compiling cutlass-master

Here are my steps:
#########first load env###############
module load cuda/11.3.r11.3
module load gcc/8.5.0-gcc-4.8.5-uqa
module load cudnn/8.2.0.53-10.2-gcc-8.5.0-ojx
module load cmake/3.21.4-gcc-8.5.0-g5a
module load anaconda/2021.11
source activate new # I installed pytorch1.10.1
export CUDA_HOME=/share/software/cuda11.3
export CUDA_INSTALL_PATH=${CUDA_HOME}
export CUDACXX=${CUDA_INSTALL_PATH}/bin/nvcc
############ make build and cd build ################
cd /share/home/scz6107/new/MegEngine/cutlass-master/build
rm -rf ./*
cmake .. -DCMAKE_INSTALL_PREFIX=/share/home/scz6107/new/MegEngine/cutlass-install -DCUTLASS_NVCC_ARCHS=80 -DCUTLASS_LIBRARY_KERNELS=all >> cmake.log 2>&1
make cutlass_profiler -j8 >> make_cutlass_profiler.log

But the progress bar stalls at 68%; compilation does not continue and no error is reported.
Here is the content of make_cutlass_profiler.log:
[ 61%] Building CUDA object tools/library/CMakeFiles/cutlass_library_objs.dir/cutlass_library_objs.unity.7a37bb7150cc.cu.o
[ 63%] Building CUDA object tools/library/CMakeFiles/cutlass_library_objs.dir/cutlass_library_objs.unity.227a93a5befa.cu.o
[ 63%] Building CUDA object tools/library/CMakeFiles/cutlass_library_objs.dir/cutlass_library_objs.unity.b762ff12d125.cu.o
[ 63%] Building CUDA object tools/library/CMakeFiles/cutlass_library_objs.dir/cutlass_library_objs.unity.217ed99b47fc.cu.o
[ 63%] Building CUDA object tools/library/CMakeFiles/cutlass_library_objs.dir/cutlass_library_objs.unity.0dc85177c1db.cu.o
[ 63%] Building CUDA object tools/library/CMakeFiles/cutlass_library_objs.dir/cutlass_library_objs.unity.fc659a81f14d.cu.o
[ 63%] Building CUDA object tools/library/CMakeFiles/cutlass_library_objs.dir/cutlass_library_objs.unity.961c8f62a08a.cu.o
[ 63%] Building CUDA object tools/library/CMakeFiles/cutlass_library_objs.dir/cutlass_library_objs.unity.b8fb0ef48f81.cu.o
[ 63%] Building CUDA object tools/library/CMakeFiles/cutlass_library_objs.dir/cutlass_library_objs.unity.f34af35dc9c5.cu.o
[ 63%] Building CUDA object tools/library/CMakeFiles/cutlass_library_objs.dir/cutlass_library_objs.unity.23058aaabfd9.cu.o
[ 63%] Building CUDA object tools/library/CMakeFiles/cutlass_library_objs.dir/cutlass_library_objs.unity.2afc670646fe.cu.o
[ 63%] Building CUDA object tools/library/CMakeFiles/cutlass_library_objs.dir/cutlass_library_objs.unity.8b699930937b.cu.o
[ 65%] Building CUDA object tools/library/CMakeFiles/cutlass_library_objs.dir/cutlass_library_objs.unity.d3bcb8d48c9e.cu.o
[ 65%] Building CUDA object tools/library/CMakeFiles/cutlass_library_objs.dir/cutlass_library_objs.unity.137c08aa2451.cu.o
[ 65%] Building CUDA object tools/library/CMakeFiles/cutlass_library_objs.dir/cutlass_library_objs.unity.312f0a18b453.cu.o
[ 65%] Building CUDA object tools/library/CMakeFiles/cutlass_library_objs.dir/cutlass_library_objs.unity.f79b3060948a.cu.o
[ 65%] Building CUDA object tools/library/CMakeFiles/cutlass_library_objs.dir/cutlass_library_objs.unity.19175cdf15a4.cu.o
[ 65%] Building CUDA object tools/library/CMakeFiles/cutlass_library_objs.dir/cutlass_library_objs.unity.7a95b0639dc5.cu.o
[ 65%] Building CUDA object tools/library/CMakeFiles/cutlass_library_objs.dir/cutlass_library_objs.unity.fcc62e6ec7bd.cu.o
[ 65%] Building CUDA object tools/library/CMakeFiles/cutlass_library_objs.dir/cutlass_library_objs.unity.09f643b6c4e0.cu.o
[ 65%] Building CUDA object tools/library/CMakeFiles/cutlass_library_objs.dir/cutlass_library_objs.unity.e84fe74a3b6b.cu.o
[ 65%] Building CUDA object tools/library/CMakeFiles/cutlass_library_objs.dir/cutlass_library_objs.unity.c0343ad56141.cu.o
[ 65%] Building CUDA object tools/library/CMakeFiles/cutlass_library_objs.dir/cutlass_library_objs.unity.4c8a0e140a14.cu.o
[ 68%] Building CUDA object tools/library/CMakeFiles/cutlass_library_objs.dir/cutlass_library_objs.unity.b8c54e8e8ac1.cu.o
[ 68%] Building CUDA object tools/library/CMakeFiles/cutlass_library_objs.dir/cutlass_library_objs.unity.8ffcb0c63ecb.cu.o
[ 68%] Building CUDA object tools/library/CMakeFiles/cutlass_library_objs.dir/cutlass_library_objs.unity.8d6ad3779501.cu.o
[ 68%] Building CUDA object tools/library/CMakeFiles/cutlass_library_objs.dir/cutlass_library_objs.unity.2fb16cba8788.cu.o
[ 68%] Building CUDA object tools/library/CMakeFiles/cutlass_library_objs.dir/cutlass_library_objs.unity.4185cd92e015.cu.o
[ 68%] Building CUDA object tools/library/CMakeFiles/cutlass_library_objs.dir/cutlass_library_objs.unity.15efa0c229f0.cu.o
[ 68%] Building CUDA object tools/library/CMakeFiles/cutlass_library_objs.dir/cutlass_library_objs.unity.aa491497257c.cu.o

Installing large_depthwise_conv2d_torch_extension failed on Windows 10

My Python version: 3.8.12 / 3.8.13
My Pytorch version: 1.8.2 / 1.10.1

I tried to install the extension from https://github.com/MegEngine/cutlass/tree/master/examples/19_large_depthwise_conv2d_torch_extension on my computer running Windows 10, but the following errors are raised:

D:\*****\lib\site-packages\torch\include\pybind11\cast.h(1429): error: too few arguments for template template parameter "Tuple"
          detected during instantiation of class "pybind11::detail::tuple_caster<Tuple, Ts...> [with Tuple=std::pair, Ts=<T1, T2>]"
(1507): here

D:\*****\lib\site-packages\torch\include\pybind11\cast.h(1503): error: too few arguments for template template parameter "Tuple"
          detected during instantiation of class "pybind11::detail::tuple_caster<Tuple, Ts...> [with Tuple=std::pair, Ts=<T1, T2>]"
(1507): here

Failed when compiling cutlass-master

backward_data_fp16.cu(193): error: more than one instance of constructor "cutlass::Tensor4DCoord::Tensor4DCoord" matches the argument list:
function "cutlass::Tensor4DCoord::Tensor4DCoord(cutlass::Tensor4DCoord::Index, cutlass::Tensor4DCoord::Index, cutlass::Tensor4DCoord::Index, cutlass::Tensor4DCoord::Index)"
function "cutlass::Tensor4DCoord::Tensor4DCoord(cutlass::Tensor4DCoord::LongIndex, cutlass::Tensor4DCoord::LongIndex, cutlass::Tensor4DCoord::LongIndex, cutlass::Tensor4DCoord::LongIndex)"
argument types are: (int64_t, int64_t, int64_t, int)

backward_data_fp16.cu(215): error: no instance of constructor "cutlass::conv::kernel::ImplicitBatchedGemmTnDepthwiseConvolution<Mma_, Epilogue_, ThreadblockSwizzle_, ConvOperator, ConvProblemSize_>::Arguments::Arguments [with Mma_=cutlass::conv::threadblock::MmaTnPrecompPipelined<ThreadblockShape, cutlass::conv::threadblock::Dwconv2dTileIterator<cutlass::MatrixShape<64, 32>, ElementSrc, cutlass::layout::TensorNCHW, cutlass::transform::PitchLinearWarpRakedThreadMap<cutlass::layout::PitchLinearShape<32, 64>, 256, cutlass::layout::PitchLinearShape<4, 8>, 8>, 1, 0>, cutlass::transform::threadblock::RegularTileIterator<cutlass::MatrixShape<64, 32>, ElementSrc, cutlass::layout::RowMajorVoltaTensorOpMultiplicandCrosswise<16, 32>, 0, cutlass::transform::PitchLinearWarpRakedThreadMap<cutlass::layout::PitchLinearShape<32, 64>, 256, cutlass::layout::PitchLinearShape<4, 8>, 8>, 16>, cutlass::conv::threadblock::Dwconv2dTileFilterIteratorDgradPrecomp<cutlass::MatrixShape<32, 128>, ElementFilter, cutlass::layout::TensorNCHW, cutlass::transform::PitchLinearWarpRakedThreadMap<cutlass::layout::PitchLinearShape<128, 32>, 256, cutlass::layout::PitchLinearShape<8, 4>, 8>, 1>, cutlass::transform::threadblock::RegularTileIterator<cutlass::MatrixShape<32, 128>, ElementFilter, cutlass::layout::RowMajorVoltaTensorOpMultiplicandBCongruous<16>, 0, cutlass::transform::PitchLinearWarpRakedThreadMap<cutlass::layout::PitchLinearShape<128, 32>, 256, cutlass::layout::PitchLinearShape<8, 4>, 8>, 16>, ElementAccumulator, LayoutDst, cutlass::gemm::threadblock::MmaPolicy<cutlass::gemm::warp::MmaVoltaTensorOp<WarpShape, ElementSrc, cutlass::layout::RowMajorVoltaTensorOpMultiplicandCrosswise<16, 32>, ElementFilter, cutlass::layout::RowMajorVoltaTensorOpMultiplicandBCongruous<16>, ElementAccumulator, cutlass::layout::RowMajor, cutlass::gemm::warp::MmaTensorOpPolicy<cutlass::arch::Mma<cutlass::gemm::GemmShape<16, 16, 4>, 32, ElementSrc, cutlass::layout::RowMajor, ElementFilter, cutlass::layout::RowMajor, ElementAccumulator, cutlass::layout::RowMajor, cutlass::arch::OpMultiplyAdd>, cutlass::MatrixShape<1, 1>>, __nv_bool>, cutlass::MatrixShape<0, 0>, cutlass::MatrixShape<0, 0>, 1>, cutlass::NumericArrayConverter<ElementSrc, ElementSrc, 8, cutlass::FloatRoundStyle::round_to_nearest>, cutlass::NumericArrayConverter<ElementFilter, ElementFilter, 16, cutlass::FloatRoundStyle::round_to_nearest>, _nv_bool>, Epilogue=cutlass::epilogue::threadblock::ConvolutionEpilogue<ThreadblockShape, cutlass::layout::TensorNCHW, 1, cutlass::gemm::warp::MmaVoltaTensorOp<WarpShape, ElementSrc, cutlass::layout::RowMajorVoltaTensorOpMultiplicandCrosswise<16, 32>, ElementFilter, cutlass::layout::RowMajorVoltaTensorOpMultiplicandBCongruous<16>, ElementAccumulator, cutlass::layout::RowMajor, cutlass::gemm::warp::MmaTensorOpPolicy<cutlass::arch::Mma<cutlass::gemm::GemmShape<16, 16, 4>, 32, ElementSrc, cutlass::layout::RowMajor, ElementFilter, cutlass::layout::RowMajor, ElementAccumulator, cutlass::layout::RowMajor, cutlass::arch::OpMultiplyAdd>, cutlass::MatrixShape<1, 1>>, nv_bool>, cutlass::epilogue::threadblock::Dwconv2dPredicatedTileIterator<cutlass::epilogue::threadblock::OutputTileOptimalThreadMap<cutlass::epilogue::threadblock::OutputTileShape<128, 4, 4, 2, 1>, cutlass::epilogue::threadblock::OutputTileShape<1, 2, 1, 1, 2>, 256, 1, 16>, cutlass::layout::TensorNCHW, ElementDst>, cutlass::epilogue::warp::FragmentIteratorVoltaTensorOp<WarpShape, cutlass::gemm::GemmShape<32, 32, 4>, ElementAccumulator, cutlass::layout::RowMajor>, 
cutlass::epilogue::warp::TileIteratorVoltaTensorOp<WarpShape, cutlass::gemm::GemmShape<32, 32, 4>, ElementAccumulator, cutlass::layout::RowMajor>, cutlass::epilogue::threadblock::SharedLoadIterator<cutlass::epilogue::threadblock::OutputTileOptimalThreadMap<cutlass::epilogue::threadblock::OutputTileShape<128, 4, 4, 2, 1>, cutlass::epilogue::threadblock::OutputTileShape<1, 2, 1, 1, 2>, 256, 1, 16>::CompactedThreadMap, ElementAccumulator, 4>, cutlass::epilogue::threadblock::Dwconv2dBiasTileIterator<cutlass::layout::TensorNCHW, ElementDst, 1>, EpilogueOp, cutlass::MatrixShape<0, 2>, false>, ThreadblockSwizzle=SwizzleThreadBlock, ConvOperator=cutlass::conv::Operator::kDgrad, ConvProblemSize=cutlass::conv::Conv2dProblemSize]" matches the argument list
argument types are: ({...}, cutlass::TensorRef<ElementSrc, LayoutSrc>, cutlass::TensorRef<ElementSrc, LayoutSrc>, long, long, cutlass::TensorRef<ElementSrc, LayoutSrc>, {...})

3 errors detected in the compilation of "backward_data_fp16.cu".
error: command '/usr/local/cuda/bin/nvcc' failed with exit code 1

Warning from setup.py

There is a concerning warning while running setup.py:

forward_fp32.cu(212): warning: invalid narrowing conversion from "signed long" to "int"

I am wondering what causes that warning. Thanks a lot!

Visualize the effective receptive field

Hello, and thank you for your work on structural re-parameterization; I really like this paper. I have a small question: how did you visualize the effective receptive field in Figure 1 of the paper? If convenient, could you share that small piece of code? I believe it would help many beginners like me. Thank you!
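Not the authors' script, but a common way to visualize the effective receptive field (following Luo et al., "Understanding the Effective Receptive Field in Deep Convolutional Neural Networks") is to backpropagate from the central unit of the final feature map and plot the magnitude of the input gradient. A minimal sketch, where model is assumed to be any backbone that returns an (N, C, H, W) feature map:

    import torch

    def effective_receptive_field(model, input_size=1024):
        model.eval()
        x = torch.randn(1, 3, input_size, input_size, requires_grad=True)
        feat = model(x)                        # (1, C, h, w) feature map
        h, w = feat.shape[2] // 2, feat.shape[3] // 2
        feat[:, :, h, w].sum().backward()      # gradient of the central output unit
        erf = x.grad.abs().sum(dim=1)[0]       # aggregate over input channels
        return (erf / erf.max()).detach()      # normalized (H, W) map, e.g. for plt.imshow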

Reproducibility for large kernel conv

Excuse me, could I get an implementation in Google Colab? Thanks :). Also, is there a function to fix the seed for reproducibility of the large-kernel conv implementation or your library (depthwise_conv2d_implicit_gemm)? I still get different results across training runs even after fixing the seed with PyTorch alone.
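For what it's worth, the usual PyTorch-side seeding looks like the sketch below; note that some CUDA kernels (possibly including the implicit-GEMM extension) are non-deterministic regardless, so bitwise-identical runs are not guaranteed:

    import os
    import random
    import numpy as np
    import torch

    def seed_everything(seed=0):
        random.seed(seed)
        np.random.seed(seed)
        torch.manual_seed(seed)
        torch.cuda.manual_seed_all(seed)
        torch.backends.cudnn.deterministic = True   # pick deterministic cuDNN kernels
        torch.backends.cudnn.benchmark = False      # disable non-deterministic autotuning
        os.environ["PYTHONHASHSEED"] = str(seed)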

About MobileNet shortcut

Can you elaborate on where the shortcut is used in MobileNet V2? Could you also release the code for the modified MobileNet?
Many thanks!

Accuracy of EMA is even lower than the trained model

Hi Authors,

Thanks again for your excellent work. I am using your architecture in some experiments but found that the accuracy of the EMA model is 1% lower than the original trained model. Have you noticed this problem, or have I missed some important implementation detail?

Thanks,
Shiwei

Using DW 3x3 in stem block

Thank you for such a great paper; however, I have some questions.
In Section 4.1 you state: "we arrange a DW 3×3 layer to capture low-level patterns". Can you explain more about:

  1. How can a DW 3×3 layer capture low-level patterns?
  2. Does a regular conv 3×3 have that property?
  3. Why not use a single conv 3×3 instead of a DW 3×3 followed by a conv 1×1?

How to make large kernels work?

I find that the input image size is quite large for downstream tasks. Is this necessary? If training on small images, such as 416×416, can the large-kernel model still keep its advantage?
In addition, regarding dataset size: without pre-training on a large dataset, and training only on a small dataset, does the advantage of the large-kernel model still hold?

Recommended envs for compiling 19_large_depthwise_conv2d_torch_extension?

Hi! Congrats on the wonderful work. I'm wondering what the recommended environment is for compiling 19_large_depthwise_conv2d_torch_extension.

My environment is as follows:
python 3.7
CUDA 10.2
gcc 5.4.0
torch 1.11.0
But when I run "./setup.py install --user", I get errors like:

lib/python3.7/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h:57:59: error: invalid static_cast from type ‘const torch::OrderedDict<std::basic_tring<char>, at::Tensor>’ to type ‘torch::OrderedDict<std::basic_string<char>, at::Tensor>&’
...
/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 1756, in _run_ninja_build                                                                          raise RuntimeError(message) from e 
RuntimeError: Error compiling objects for extension 

About seg fine-tuning

Hello, I want to fine-tune on the segmentation task, but I can't find the some_config.py referenced in the command line. Do I need to write it myself?

DepthWiseConv2dImplicitGEMM has no 'padding' attribute (it is effectively zero)

When I ran replknet.py directly using DepthWiseConv2dImplicitGEMM, I got this error:

RuntimeError: The size of tensor a (56) must match the size of tensor b (26) at non-singleton dimension 3

which looks like a padding issue with the large kernel sizes. I then checked replknet.py and depthwise_conv2d_implicit_gemm.py and found there is no padding parameter.

While debugging, I found that DepthWiseConv2dImplicitGEMM has no padding attribute (it is effectively zero). As a result, when model.structural_reparam() and ReparamLargeKernelConv.merge_kernel() are called, an incorrect conv2d module is created: the condition for using DepthWiseConv2dImplicitGEMM is missed and the code falls back to an nn.Conv2d with padding 0. (A possible workaround is sketched below.)
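Not a confirmed fix, but assuming the extension internally uses "same" padding of kernel_size // 2 (which is how replknet.py appears to use it), a thin wrapper that exposes a padding attribute could let merge_kernel() treat it like an ordinary conv module:

    from depthwise_conv2d_implicit_gemm import DepthWiseConv2dImplicitGEMM

    class DWConvImplicitGEMMWithPadding(DepthWiseConv2dImplicitGEMM):
        """Hypothetical wrapper: same op, but with an explicit padding attribute."""
        def __init__(self, channels, kernel_size, bias=False):
            super().__init__(channels, kernel_size, bias=bias)
            # the op pads implicitly; record the equivalent explicit padding
            self.padding = (kernel_size // 2, kernel_size // 2)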

Error when using the provided PyTorch implementation of DWConv

I am using the provided DWConv implementation but get the following error.
Traceback:

    self._scaler.scale(loss).backward(create_graph=create_graph)
  File "/mnt/cache/share/spring/conda_envs/miniconda3/envs/s0.3.5/lib/python3.7/site-packages/torch/_tensor.py", line 363, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
  File "/mnt/cache/share/spring/conda_envs/miniconda3/envs/s0.3.5/lib/python3.7/site-packages/torch/autograd/__init__.py", line 175, in backward
    allow_unreachable=True, accumulate_grad=True)  # Calls into the C++ engine to run the backward pass
  File "/mnt/cache/share/spring/conda_envs/miniconda3/envs/s0.3.5/lib/python3.7/site-packages/torch/autograd/function.py", line 253, in apply
    return user_fn(self, *args)
  File "/mnt/cache/share/spring/conda_envs/miniconda3/envs/s0.3.5/lib/python3.7/site-packages/torch/cuda/amp/autocast_mode.py", line 135, in decorate_bwd
    return bwd(*args, **kwargs)
  File "/mnt/cache/liujihao/.local/lib/python3.7/site-packages/depthwise_conv2d_implicit_gemm-0.0.0-py3.7-linux-x86_64.egg/depthwise_conv2d_implicit_gemm.py", line 25
, in backward
    dx = _extension.backward_data_fp32(grad, w)
RuntimeError: input must be contiguous

Any intuition on how to solve this problem?

No module named 'timm.optim.novograd'

File "D:\A_File\Project\RepLKNet-pytorch-main\optim_factory.py", line 19, in
from timm.optim.novograd import NovoGrad
ModuleNotFoundError: No module named 'timm.optim.novograd'
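For anyone hitting this: newer timm releases appear to have removed the timm.optim.novograd module, which is why the import in optim_factory.py fails. A hedged workaround (besides pinning an older timm release) is to guard the import so the rest of the optimizer factory still works when NovoGrad is not requested:

    # guarded import; NovoGrad simply becomes unavailable on newer timm versions
    try:
        from timm.optim.novograd import NovoGrad
    except ImportError:
        NovoGrad = None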

The feasibility of using dilation kernels

Thank you for your nice work!
As we know, a large convolution kernel has many parameters, which constrains its use in lightweight models. So I think it would be a good idea to replace large kernels with dilated kernels (see the sketch after this list). For example, a 31×31 kernel could be replaced with options such as:
16×16, dilation=2
11×11, dilation=3
7×7, dilation=5, etc.
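A minimal sketch of what those replacements could look like as depthwise convolutions (my own illustration, not from the paper); the effective span is (k - 1) * dilation + 1, so each option below covers 31 pixels:

    import torch.nn as nn

    def dilated_dw_conv(channels, k, dilation):
        padding = ((k - 1) * dilation) // 2   # keep the spatial size unchanged
        return nn.Conv2d(channels, channels, k, padding=padding,
                         dilation=dilation, groups=channels, bias=False)

    conv_a = dilated_dw_conv(64, 16, 2)  # span (16-1)*2+1 = 31
    conv_b = dilated_dw_conv(64, 11, 3)  # span (11-1)*3+1 = 31
    conv_c = dilated_dw_conv(64, 7, 5)   # span (7-1)*5+1 = 31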

NotImplementedError HELP PLEASE!

When I use this pretrained model as the backbone of my change-detection model, I get this error; 31B, 31L, and XL all produce it. Here is my runtime traceback; it looks like the forward pass is the problem.

Traceback (most recent call last):
File "training.py", line 249, in
run()
File "training.py", line 233, in run
train(
File "training.py", line 169, in train
training_phase(epc)
File "training.py", line 108, in training_phase
it_loss = evaluate(reference, testimg, mask)
File "training.py", line 83, in evaluate
generated_mask = model(reference, testimg).squeeze(1)
File "C:\Anaconda3\envs\TinyCD\lib\site-packages\torch\nn\modules\module.py", line 1110, in _call_impl
return forward_call(*input, **kwargs)
File "D:\test\Tiny_model_4_CD\models\change_classifier.py", line 63, in forward
features = self._encode(ref, test)
File "D:\test\Tiny_model_4_CD\models\change_classifier.py", line 70, in _encode
ref, test = layer(ref), layer(test)
File "C:\Anaconda3\envs\TinyCD\lib\site-packages\torch\nn\modules\module.py", line 1110, in _call_impl
return forward_call(*input, **kwargs)
File "C:\Anaconda3\envs\TinyCD\lib\site-packages\torch\nn\modules\module.py", line 201, in _forward_unimplemented
raise NotImplementedError
NotImplementedError

Porting to TF: need some info on preprocessing

Could you please tell us what preprocessing you used for training/evaluating the models?

Normalization style:

  • tf: img / 127.5 - 1.
  • torch: (img / 255. - [0.485, 0.456, 0.406]) / [0.229, 0.224, 0.225]
  • caffe: img - [103.939, 116.779, 123.68]

Image crop/resize:

  • crop size if used
  • resize interpolation: bilinear / bicubic

Did you use the same preprocessing for the 1k and 21k models?

depthwise_conv2d_implicit_gemm installation

Thank you for your valuable work!!!

I ran setup.py with "python3 ./setup.py install --user" and installed depthwise_conv2d_implicit_gemm successfully. However, "depthwise_conv2d_implicit_gemm" and "_depthwise_conv2d_implicit_gemm_C" still cannot be imported. I came here for some help. Thanks a lot.

The net does not seem to work well with the pretrained weights replknet31_base_224_pt1k_basecls.pkl

I trained on two binary classification datasets, one of construction cracks vs. background and the other of cats vs. dogs. Both experiments used the default parameters and replknet31_base_224_pt1k_basecls.pkl. The loss did not decline and the evaluation accuracy stayed unchanged, equal to the proportion of one particular class, which means RepLKNet showed no classification capability after this training. (Training curve screenshot attached; the percentage of cat pictures is also about 49%.)

Training for downstream tasks

Thank you for your nice work. One question: when training for downstream tasks, is the procedure to train RepLKNet, plug the backbone into the downstream task, and then run inference after re-parameterization (sketched below)? Is that right?
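My reading of the usual re-parameterization workflow (a hedged sketch, not an authoritative answer from the authors): keep the parallel small-kernel branches during both pre-training and downstream fine-tuning, and merge them only once, for inference. Using the helpers that appear in this repo:

    from replknet import create_RepLKNet31B  # adjust the import to wherever replknet.py lives

    backbone = create_RepLKNet31B(small_kernel_merged=False)  # branches kept for training
    # ... build the downstream model around `backbone` and fine-tune it as usual ...

    # after fine-tuning, merge the re-parameterizable branches once, then run inference only
    backbone.structural_reparam()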

KeyError: "CascadeRCNN: 'RepLKNet is not in the models registry

2022-04-08 13:36:40,827 - mmdet - INFO - Set random seed to 687365578, deterministic: False
Traceback (most recent call last):
  File "/home/lbc/.local/lib/python3.7/site-packages/mmcv/utils/registry.py", line 52, in build_from_cfg
    return obj_cls(**args)
  File "/home/lbc/.local/lib/python3.7/site-packages/mmdet/models/detectors/cascade_rcnn.py", line 28, in __init__
    init_cfg=init_cfg)
  File "/home/lbc/.local/lib/python3.7/site-packages/mmdet/models/detectors/two_stage.py", line 32, in __init__
    self.backbone = build_backbone(backbone)
  File "/home/lbc/.local/lib/python3.7/site-packages/mmdet/models/builder.py", line 20, in build_backbone
    return BACKBONES.build(cfg)
  File "/home/lbc/.local/lib/python3.7/site-packages/mmcv/utils/registry.py", line 212, in build
    return self.build_func(*args, **kwargs, registry=self)
  File "/home/lbc/.local/lib/python3.7/site-packages/mmcv/cnn/builder.py", line 27, in build_model_from_cfg
    return build_from_cfg(cfg, registry, default_args)
  File "/home/lbc/.local/lib/python3.7/site-packages/mmcv/utils/registry.py", line 45, in build_from_cfg
    f'{obj_type} is not in the {registry.name} registry')
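For reference (a general mmdet/mmcv note, not specific guidance from the authors): this KeyError usually means the module that registers the RepLKNet backbone in the BACKBONES registry was never imported. mmdet configs can force that import via custom_imports; the module path below is an assumption and should point at wherever the RepLKNet backbone wrapper lives in your setup:

    # add to the mmdet config that uses the RepLKNet backbone
    custom_imports = dict(
        imports=['mmdet.models.backbones.replknet'],  # hypothetical module path
        allow_failed_imports=False)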

Failed when installing 19_large_depthwise_conv2d_torch_extension

Hello, I tried many environments, including Ubuntu 20.04/18.04 with CUDA 11.1/11.3 + PyTorch 1.10, but all of them failed.

Could you share your detailed environment (OS version, gcc version, CUDA driver version, nvcc version, PyTorch version, and so on)? It would be even better if you could provide a Dockerfile.

Two errors during compilation of 19_large_depthwise_conv2d_torch_extension

My environment:
python 3.8.8
cuda 11.1
pytorch 1.7.1/1.8.1/1.9 all failed

2 errors detected in the compilation of "forward_fp32.cu".
error: command '/usr/local/cuda-11.1/bin/nvcc' failed with exit status 1

forward_fp32.cu(212): error: more than one instance of constructor "cutlass::Tensor4DCoord::Tensor4DCoord" matches the argument list:
            function "cutlass::Tensor4DCoord::Tensor4DCoord(cutlass::Tensor4DCoord::Index, cutlass::Tensor4DCoord::Index, cutlass::Tensor4DCoord::Index, cutlass::Tensor4DCoord::Index)"
            function "cutlass::Tensor4DCoord::Tensor4DCoord(cutlass::Tensor4DCoord::LongIndex, cutlass::Tensor4DCoord::LongIndex, cutlass::Tensor4DCoord::LongIndex, cutlass::Tensor4DCoord::LongIndex)"
            argument types are: (int64_t, int64_t, int64_t, int)

forward_fp32.cu(232): error: no instance of constructor "cutlass::conv::kernel::ImplicitBatchedGemmTnDepthwiseConvolution<Mma_, Epilogue_, ThreadblockSwizzle_, ConvOperator, ConvProblemSize_>::Arguments::Arguments [with Mma_=cutlass::conv::thread
block::MmaTnPrecompPipelined<ThreadblockShape, cutlass::conv::threadblock::Dwconv2dTileIterator<cutlass::MatrixShape<64, 8>
, float, cutlass::layout::TensorNCHW, cutlass::transform::PitchLinearStripminedThreadMap<cutlass::layout::PitchLinearShape<
8, 64>, 128, 1>, 1, 0>, cutlass::conv::threadblock::RegularTileIteratorTransposed<cutlass::MatrixShape<64, 8>, float, cutla
ss::layout::ColumnMajor, 1, cutlass::conv::threadblock::DefaultMmaCore<ThreadblockShape, WarpShape, cutlass::gemm::GemmShap
e<1, 1, 1>, float, cutlass::layout::TensorNCHW, 1, float, cutlass::layout::TensorNCHW, 1, ElementDst, LayoutDst, cutlass::a
rch::OpClassSimt, 2, cutlass::arch::OpMultiplyAdd, true, cutlass::conv::ImplicitGemmMode::GEMM_TN, cutlass::arch::CacheOper
ation::Global, cutlass::arch::CacheOperation::Global>::TransposedPitchLinearThreadMapVec, 4>, cutlass::conv::threadblock::D
wconv2dTileFilterIteratorFpropPrecomp<cutlass::MatrixShape<8, 128>, float, cutlass::layout::TensorNCHW, cutlass::conv::thre
adblock::PitchLinearStripminedThreadMapStrided<cutlass::layout::PitchLinearShape<128, 8>, 128, 1>, 1>, cutlass::transform::
threadblock::RegularTileIterator<cutlass::MatrixShape<8, 128>, float, cutlass::layout::RowMajor, 0, cutlass::conv::threadbl
ock::PitchLinearStripminedThreadMapStrided<cutlass::layout::PitchLinearShape<128, 8>, 128, 1>, 4>, ElementDst, LayoutDst, c
utlass::gemm::threadblock::MmaPolicy<cutlass::gemm::warp::MmaSimt<WarpShape, float, cutlass::layout::ColumnMajor, float, cu
tlass::layout::RowMajor, ElementDst, cutlass::layout::RowMajor, cutlass::gemm::warp::MmaSimtPolicy<cutlass::MatrixShape<8,
4>, cutlass::layout::RowMajorInterleaved<2>, cutlass::gemm::GemmShape<4, 4, 1>>, 1, cutlass::ComplexTransform::kNone, cutla
ss::ComplexTransform::kNone, __nv_bool>, cutlass::MatrixShape<4, 0>, cutlass::MatrixShape<0, 0>, 1>, cutlass::NumericArrayC
onverter<float, float, 4, cutlass::FloatRoundStyle::round_to_nearest>, cutlass::NumericArrayConverter<float, float, 8, cutl
ass::FloatRoundStyle::round_to_nearest>, __nv_bool>, Epilogue_=cutlass::epilogue::threadblock::ConvolutionEpilogue<Threadbl
ockShape, cutlass::layout::TensorNCHW, 1, cutlass::gemm::warp::MmaSimt<WarpShape, float, cutlass::layout::ColumnMajor, floa
t, cutlass::layout::RowMajor, ElementDst, cutlass::layout::RowMajor, cutlass::gemm::warp::MmaSimtPolicy<cutlass::MatrixShap
e<8, 4>, cutlass::layout::RowMajorInterleaved<2>, cutlass::gemm::GemmShape<4, 4, 1>>, 1, cutlass::ComplexTransform::kNone,
cutlass::ComplexTransform::kNone, __nv_bool>, cutlass::epilogue::threadblock::Dwconv2dPredicatedTileIterator<cutlass::epilo
gue::threadblock::OutputTileOptimalThreadMap<cutlass::epilogue::threadblock::OutputTileShape<128, 1, 8, 1, 1>, cutlass::epi
logue::threadblock::OutputTileShape<1, 4, 2, 1, 8>, 128, 1, 32>, cutlass::layout::TensorNCHW, ElementDst>, cutlass::epilogu
e::warp::FragmentIteratorSimt<WarpShape, cutlass::gemm::thread::Mma<cutlass::gemm::GemmShape<8, 8, 1>, float, cutlass::layo
ut::ColumnMajor, float, cutlass::layout::RowMajor, ElementDst, cutlass::layout::RowMajor, cutlass::arch::OpMultiplyAdd, __n
v_bool>, cutlass::layout::RowMajor, cutlass::gemm::warp::MmaSimtPolicy<cutlass::MatrixShape<8, 4>, cutlass::layout::RowMajo
rInterleaved<2>, cutlass::gemm::GemmShape<4, 4, 1>>, cutlass::epilogue::warp::SimtPolicy<WarpShape, cutlass::gemm::thread::
Mma<cutlass::gemm::GemmShape<8, 8, 1>, float, cutlass::layout::ColumnMajor, float, cutlass::layout::RowMajor, ElementDst, c
utlass::layout::RowMajor, cutlass::arch::OpMultiplyAdd, __nv_bool>, cutlass::layout::RowMajor, cutlass::gemm::warp::MmaSimt
Policy<cutlass::MatrixShape<8, 4>, cutlass::layout::RowMajorInterleaved<2>, cutlass::gemm::GemmShape<4, 4, 1>>>>, cutlass::
epilogue::warp::TileIteratorSimt<WarpShape, cutlass::gemm::thread::Mma<cutlass::gemm::GemmShape<8, 8, 1>, float, cutlass::l
ayout::ColumnMajor, float, cutlass::layout::RowMajor, ElementDst, cutlass::layout::RowMajor, cutlass::arch::OpMultiplyAdd,
__nv_bool>, ElementDst, cutlass::layout::RowMajor, cutlass::gemm::warp::MmaSimtPolicy<cutlass::MatrixShape<8, 4>, cutlass::
layout::RowMajorInterleaved<2>, cutlass::gemm::GemmShape<4, 4, 1>>>, cutlass::epilogue::threadblock::SharedLoadIterator<cut
lass::epilogue::threadblock::OutputTileOptimalThreadMap<cutlass::epilogue::threadblock::OutputTileShape<128, 1, 8, 1, 1>, c
utlass::epilogue::threadblock::OutputTileShape<1, 4, 2, 1, 8>, 128, 1, 32>::CompactedThreadMap, ElementDst, 4>, cutlass::ep
ilogue::threadblock::Dwconv2dBiasTileIterator<cutlass::layout::TensorNCHW, ElementDst, 1>, EpilogueOp, cutlass::MatrixShape
<0, 17>, false>, ThreadblockSwizzle_=SwizzleThreadBlock, ConvOperator=cutlass::conv::Operator::kFprop, ConvProblemSize_=cut
lass::conv::Conv2dProblemSize]" matches the argument list
argument types are: ({...}, cutlass::TensorRef<ElementSrc, LayoutSrc>, cutlass::TensorRef<ElementSrc, LayoutSrc>, long, long, cutlass::TensorRef<ElementSrc, LayoutSrc>, {...})

setup.py build error

F:\Program Files (x86)\Microsoft Visual Studio\2019\Enterprise\VC\Tools\MSVC\14.29.30133\include\xutility(4430): error: function "torch::OrderedDict<Key, Value>::Item::operator=(const torch::OrderedDict<std
::string, at::Tensor>::Item &) [with Key=std::string, Value=at::Tensor]" (declared implicitly) cannot be referenced -- it is a deleted function
detected during:
instantiation of "_OutIt std::_Move_unchecked(_InIt, _InIt, _OutIt) [with _InIt=torch::OrderedDict<std::string, at::Tensor>::Item *, _OutIt=torch::OrderedDict<std::string, at::Tensor>::Item *]"
F:\Program Files (x86)\Microsoft Visual Studio\2019\Enterprise\VC\Tools\MSVC\14.29.30133\include\vector(1419): here
instantiation of "std::vector<_Ty, _Alloc>::iterator std::vector<_Ty, _Alloc>::erase(std::vector<_Ty, _Alloc>::const_iterator) [with _Ty=torch::OrderedDict<std::string, at::Tensor>::Item, _Alloc
=std::allocator<torch::OrderedDict<std::string, at::Tensor>::Item>]"
D:\env\pytorh\lib\site-packages\torch\include\torch/csrc/api/include/torch/ordered_dict.h(419): here
instantiation of "void torch::OrderedDict<Key, Value>::erase(const Key &) [with Key=std::string, Value=at::Tensor]"
D:\env\pytorh\lib\site-packages\torch\include\torch\csrc\api\include\torch/nn/modules/container/parameterdict.h(51): here

F:\Program Files (x86)\Microsoft Visual Studio\2019\Enterprise\VC\Tools\MSVC\14.29.30133\include\xutility(4430): error: function "torch::OrderedDict<Key, Value>::Item::operator=(const torch::OrderedDict<std
::string, std::shared_ptr<torch::nn::Module>>::Item &) [with Key=std::string, Value=std::shared_ptr<torch::nn::Module>]" (declared implicitly) cannot be referenced -- it is a deleted function
detected during:
instantiation of "_OutIt std::_Move_unchecked(_InIt, _InIt, _OutIt) [with _InIt=torch::OrderedDict<std::string, std::shared_ptrtorch::nn::Module>::Item *, _OutIt=torch::OrderedDict<std::string
, std::shared_ptrtorch::nn::Module>::Item *]"
F:\Program Files (x86)\Microsoft Visual Studio\2019\Enterprise\VC\Tools\MSVC\14.29.30133\include\vector(1419): here
instantiation of "std::vector<_Ty, _Alloc>::iterator std::vector<_Ty, _Alloc>::erase(std::vector<_Ty, _Alloc>::const_iterator) [with _Ty=torch::OrderedDict<std::string, std::shared_ptr<torch::nn
::Module>>::Item, _Alloc=std::allocator<torch::OrderedDict<std::string, std::shared_ptr<torch::nn::Module>>::Item>]"
D:\env\pytorh\lib\site-packages\torch\include\torch/csrc/api/include/torch/ordered_dict.h(419): here
instantiation of "void torch::OrderedDict<Key, Value>::erase(const Key &) [with Key=std::string, Value=std::shared_ptrtorch::nn::Module]"
D:\env\pytorh\lib\site-packages\torch\include\torch\csrc\api\include\torch/nn/modules/container/moduledict.h(196): here

IndexError: list index out of range

Hi,

Thank you for sharing your interesting work. I run into this error when trying to run "./setup.py install --user":

No CUDA runtime is found, using CUDA_HOME='/home/shiweil/cuda-10.3'
running install
running bdist_egg
running egg_info
writing depthwise_conv2d_implicit_gemm.egg-info/PKG-INFO
writing dependency_links to depthwise_conv2d_implicit_gemm.egg-info/dependency_links.txt
writing top-level names to depthwise_conv2d_implicit_gemm.egg-info/top_level.txt
/home/shiweil/miniconda3/envs/pt1.10_cuda11.3/lib/python3.8/site-packages/torch/utils/cpp_extension.py:387: UserWarning: Attempted to use ninja as the BuildExtension backend but we could not find ninja.. Falling back to using the slow distutils backend.
warnings.warn(msg.format('we could not find ninja.'))
reading manifest file 'depthwise_conv2d_implicit_gemm.egg-info/SOURCES.txt'
writing manifest file 'depthwise_conv2d_implicit_gemm.egg-info/SOURCES.txt'
installing library code to build/bdist.linux-x86_64/egg
running install_lib
running build_py
running build_ext
building '_depthwise_conv2d_implicit_gemm_C' extension
gcc -pthread -B /home/shiweil/miniconda3/envs/pt1.10_cuda11.3/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I. -I/gpfs/home5/shiweil/Projects/cutlass/include -I/gpfs/home5/shiweil/Projects/cutlass/tools/library/include -I/gpfs/home5/shiweil/Projects/cutlass/tools/util/include -I/gpfs/home5/shiweil/Projects/cutlass/examples/common -I/home/shiweil/miniconda3/envs/pt1.10_cuda11.3/lib/python3.8/site-packages/torch/include -I/home/shiweil/miniconda3/envs/pt1.10_cuda11.3/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -I/home/shiweil/miniconda3/envs/pt1.10_cuda11.3/lib/python3.8/site-packages/torch/include/TH -I/home/shiweil/miniconda3/envs/pt1.10_cuda11.3/lib/python3.8/site-packages/torch/include/THC -I/home/shiweil/cuda-10.3/include -I/home/shiweil/miniconda3/envs/pt1.10_cuda11.3/include/python3.8 -c frontend.cpp -o build/temp.linux-x86_64-3.8/frontend.o -g -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE="_gcc" -DPYBIND11_STDLIB="_libstdcpp" -DPYBIND11_BUILD_ABI="_cxxabi1011" -DTORCH_EXTENSION_NAME=_depthwise_conv2d_implicit_gemm_C -D_GLIBCXX_USE_CXX11_ABI=0 -std=c++14
cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
Traceback (most recent call last):
File "./setup.py", line 9, in
setup(
File "/home/shiweil/miniconda3/envs/pt1.10_cuda11.3/lib/python3.8/site-packages/setuptools/init.py", line 153, in setup
return distutils.core.setup(**attrs)
File "/home/shiweil/miniconda3/envs/pt1.10_cuda11.3/lib/python3.8/distutils/core.py", line 148, in setup
dist.run_commands()
File "/home/shiweil/miniconda3/envs/pt1.10_cuda11.3/lib/python3.8/distutils/dist.py", line 966, in run_commands
self.run_command(cmd)
File "/home/shiweil/miniconda3/envs/pt1.10_cuda11.3/lib/python3.8/distutils/dist.py", line 985, in run_command
cmd_obj.run()
File "/home/shiweil/miniconda3/envs/pt1.10_cuda11.3/lib/python3.8/site-packages/setuptools/command/install.py", line 67, in run
self.do_egg_install()
File "/home/shiweil/miniconda3/envs/pt1.10_cuda11.3/lib/python3.8/site-packages/setuptools/command/install.py", line 109, in do_egg_install
self.run_command('bdist_egg')
File "/home/shiweil/miniconda3/envs/pt1.10_cuda11.3/lib/python3.8/distutils/cmd.py", line 313, in run_command
self.distribution.run_command(command)
File "/home/shiweil/miniconda3/envs/pt1.10_cuda11.3/lib/python3.8/distutils/dist.py", line 985, in run_command
cmd_obj.run()
File "/home/shiweil/miniconda3/envs/pt1.10_cuda11.3/lib/python3.8/site-packages/setuptools/command/bdist_egg.py", line 164, in run
cmd = self.call_command('install_lib', warn_dir=0)
File "/home/shiweil/miniconda3/envs/pt1.10_cuda11.3/lib/python3.8/site-packages/setuptools/command/bdist_egg.py", line 150, in call_command
self.run_command(cmdname)
File "/home/shiweil/miniconda3/envs/pt1.10_cuda11.3/lib/python3.8/distutils/cmd.py", line 313, in run_command
self.distribution.run_command(command)
File "/home/shiweil/miniconda3/envs/pt1.10_cuda11.3/lib/python3.8/distutils/dist.py", line 985, in run_command
cmd_obj.run()
File "/home/shiweil/miniconda3/envs/pt1.10_cuda11.3/lib/python3.8/site-packages/setuptools/command/install_lib.py", line 11, in run
self.build()
File "/home/shiweil/miniconda3/envs/pt1.10_cuda11.3/lib/python3.8/distutils/command/install_lib.py", line 107, in build
self.run_command('build_ext')
File "/home/shiweil/miniconda3/envs/pt1.10_cuda11.3/lib/python3.8/distutils/cmd.py", line 313, in run_command
self.distribution.run_command(command)
File "/home/shiweil/miniconda3/envs/pt1.10_cuda11.3/lib/python3.8/distutils/dist.py", line 985, in run_command
cmd_obj.run()
File "/home/shiweil/miniconda3/envs/pt1.10_cuda11.3/lib/python3.8/site-packages/setuptools/command/build_ext.py", line 79, in run
_build_ext.run(self)
File "/home/shiweil/miniconda3/envs/pt1.10_cuda11.3/lib/python3.8/distutils/command/build_ext.py", line 340, in run
self.build_extensions()
File "/home/shiweil/miniconda3/envs/pt1.10_cuda11.3/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 741, in build_extensions
build_ext.build_extensions(self)
File "/home/shiweil/miniconda3/envs/pt1.10_cuda11.3/lib/python3.8/distutils/command/build_ext.py", line 449, in build_extensions
self._build_extensions_serial()
File "/home/shiweil/miniconda3/envs/pt1.10_cuda11.3/lib/python3.8/distutils/command/build_ext.py", line 474, in _build_extensions_serial
self.build_extension(ext)
File "/home/shiweil/miniconda3/envs/pt1.10_cuda11.3/lib/python3.8/site-packages/setuptools/command/build_ext.py", line 202, in build_extension
_build_ext.build_extension(self, ext)
File "/home/shiweil/miniconda3/envs/pt1.10_cuda11.3/lib/python3.8/distutils/command/build_ext.py", line 528, in build_extension
objects = self.compiler.compile(sources,
File "/home/shiweil/miniconda3/envs/pt1.10_cuda11.3/lib/python3.8/distutils/ccompiler.py", line 574, in compile
self._compile(obj, src, ext, cc_args, extra_postargs, pp_opts)
File "/home/shiweil/miniconda3/envs/pt1.10_cuda11.3/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 489, in unix_wrap_single_compile
cflags = unix_cuda_flags(cflags)
File "/home/shiweil/miniconda3/envs/pt1.10_cuda11.3/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 456, in unix_cuda_flags
cflags + _get_cuda_arch_flags(cflags))
File "/home/shiweil/miniconda3/envs/pt1.10_cuda11.3/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1626, in _get_cuda_arch_flags
arch_list[-1] += '+PTX'
IndexError: list index out of range

Do you know how to fix this?

Many thanks,
Shiwei
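One observation (a general torch.utils.cpp_extension note, not an official answer): the log starts with "No CUDA runtime is found", and _get_cuda_arch_flags() then crashes on an empty architecture list. Setting TORCH_CUDA_ARCH_LIST explicitly before building usually avoids exactly this IndexError; for example at the top of setup.py ("8.0" is only an example, use your GPU's compute capability):

    import os
    # make the target architecture explicit so cpp_extension does not need to
    # detect a GPU at build time; adjust the value for your hardware
    os.environ.setdefault("TORCH_CUDA_ARCH_LIST", "8.0")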

depthwise_conv2d_implicit_gemm slower than nn.Conv2d

🐛 Describe the bug
Calling depthwise_conv2d_implicit_gemm.DepthWiseConv2dImplicitGEMM, on CUDA, is orders of magnitude slower than calling torch.nn.Conv2d.

I have installed it according to the README. (Benchmark screenshot attached.)

cc: @DingXiaoH
Versions
torch 1.8.2+cuda11.1
cuda-11.1.1 + cudnn-8.1.1
both A100 and V100

training problems

Hi, this is wonderful work!
But I have two problems:

  1. Why is the training time so long?
  2. Too much video memory is occupied.

CUDA out of memory when testing on Cityscapes

When I run the semantic segmentation test following your instructions, the following errors are displayed:
//
(open-mmlab) liugengyuan@liugengyuan-Lenovo-Legion-R9000P2021H:~/mmsegmentation$ python -m torch.distributed.launch --nproc_per_node=1 tools/test.py configs/replknet/RepLKNet-31B_1Kpretrain_upernet_80k_cityscapes_769.py RepLKNet-31B_ImageNet-1K_UperNet_Cityscapes.pth --launcher pytorch --eval mIoU

"CLASSES" not found in meta, use dataset.CLASSES instead
"PALETTE" not found in meta, use dataset.PALETTE instead
[ ] 0/500, elapsed: 0s, ETA:/home/liugengyuan/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/utils/checkpoint.py:25: UserWarning: None of the inputs have requires_grad=True. Gradients will be None
warnings.warn("None of the inputs have requires_grad=True. Gradients will be None")
/home/liugengyuan/mmsegmentation/mmseg/ops/wrappers.py:23: UserWarning: When align_corners=True, the output would more aligned if input size (6, 6) is x+1 and out size (33, 65) is nx+1
f'When align_corners={align_corners}, '
Traceback (most recent call last):
File "tools/test.py", line 320, in
main()
File "tools/test.py", line 297, in main
format_args=eval_kwargs)
File "/home/liugengyuan/mmsegmentation/mmseg/apis/test.py", line 208, in multi_gpu_test
result = model(return_loss=False, rescale=True, **data)
File "/home/liugengyuan/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/home/liugengyuan/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/nn/parallel/distributed.py", line 886, in forward
output = self.module(*inputs[0], **kwargs[0])
File "/home/liugengyuan/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/home/liugengyuan/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/mmcv/runner/fp16_utils.py", line 110, in new_func
return old_func(*args, **kwargs)
File "/home/liugengyuan/mmsegmentation/mmseg/models/segmentors/base.py", line 110, in forward
return self.forward_test(img, img_metas, **kwargs)
File "/home/liugengyuan/mmsegmentation/mmseg/models/segmentors/base.py", line 92, in forward_test
return self.simple_test(imgs[0], img_metas[0], **kwargs)
File "/home/liugengyuan/mmsegmentation/mmseg/models/segmentors/encoder_decoder.py", line 262, in simple_test
seg_logit = self.inference(img, img_meta, rescale)
File "/home/liugengyuan/mmsegmentation/mmseg/models/segmentors/encoder_decoder.py", line 247, in inference
seg_logit = self.whole_inference(img, img_meta, rescale)
File "/home/liugengyuan/mmsegmentation/mmseg/models/segmentors/encoder_decoder.py", line 206, in whole_inference
seg_logit = self.encode_decode(img, img_meta)
File "/home/liugengyuan/mmsegmentation/mmseg/models/segmentors/encoder_decoder.py", line 74, in encode_decode
out = self._decode_head_forward_test(x, img_metas)
File "/home/liugengyuan/mmsegmentation/mmseg/models/segmentors/encoder_decoder.py", line 96, in _decode_head_forward_test
seg_logits = self.decode_head.forward_test(x, img_metas, self.test_cfg)
File "/home/liugengyuan/mmsegmentation/mmseg/models/decode_heads/decode_head.py", line 222, in forward_test
return self.forward(inputs)
File "/home/liugengyuan/mmsegmentation/mmseg/models/decode_heads/uper_head.py", line 138, in forward
output = self._forward_feature(inputs)
File "/home/liugengyuan/mmsegmentation/mmseg/models/decode_heads/uper_head.py", line 132, in _forward_feature
fpn_outs = torch.cat(fpn_outs, dim=1)
RuntimeError: CUDA out of memory. Tried to allocate 1.01 GiB (GPU 0; 5.78 GiB total capacity; 2.33 GiB already allocated; 578.75 MiB free; 3.34 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
//

I have only just started with deep learning, so I don't know much about some of this. I use a single GPU; is it because I don't have enough video memory? How can I solve this problem? Thank you!

bias

Why should bias be set to False? Thank you!

RuntimeError: Error compiling objects for extension

When I execute python setup.py install in Windows 10, I get an error.

Traceback (most recent call last):
File "D:\Program-Codes\Python-Codes\RepLKNet-pytorch-main\RepLKNet-pytorch-main\cutlass\examples\19_large_depthwise_conv2d_torch_extension\setup.py", line 9, in
setup(
File "D:\Progra-Envir\Anaconda\envs\pytorch\lib\site-packages\setuptools_init_.py", line 87, in setup
return distutils.core.setup(**attrs)
File "D:\Progra-Envir\Anaconda\envs\pytorch\lib\site-packages\setuptools_distutils\core.py", line 148, in setup
return run_commands(dist)
File "D:\Progra-Envir\Anaconda\envs\pytorch\lib\site-packages\setuptools_distutils\core.py", line 163, in run_commands
dist.run_commands()
File "D:\Progra-Envir\Anaconda\envs\pytorch\lib\site-packages\setuptools_distutils\dist.py", line 967, in run_commands
self.run_command(cmd)
File "D:\Progra-Envir\Anaconda\envs\pytorch\lib\site-packages\setuptools\dist.py", line 1214, in run_command
super().run_command(command)
File "D:\Progra-Envir\Anaconda\envs\pytorch\lib\site-packages\setuptools_distutils\dist.py", line 986, in run_command
cmd_obj.run()
File "D:\Progra-Envir\Anaconda\envs\pytorch\lib\site-packages\setuptools\command\install.py", line 74, in run
self.do_egg_install()
File "D:\Progra-Envir\Anaconda\envs\pytorch\lib\site-packages\setuptools\command\install.py", line 123, in do_egg_install
self.run_command('bdist_egg')
File "D:\Progra-Envir\Anaconda\envs\pytorch\lib\site-packages\setuptools_distutils\cmd.py", line 313, in run_command
self.distribution.run_command(command)
File "D:\Progra-Envir\Anaconda\envs\pytorch\lib\site-packages\setuptools\dist.py", line 1214, in run_command
super().run_command(command)
File "D:\Progra-Envir\Anaconda\envs\pytorch\lib\site-packages\setuptools_distutils\dist.py", line 986, in run_command
cmd_obj.run()
File "D:\Progra-Envir\Anaconda\envs\pytorch\lib\site-packages\setuptools\command\bdist_egg.py", line 165, in run
cmd = self.call_command('install_lib', warn_dir=0)
File "D:\Progra-Envir\Anaconda\envs\pytorch\lib\site-packages\setuptools\command\bdist_egg.py", line 151, in call_command
self.run_command(cmdname)
File "D:\Progra-Envir\Anaconda\envs\pytorch\lib\site-packages\setuptools_distutils\cmd.py", line 313, in run_command
self.distribution.run_command(command)
File "D:\Progra-Envir\Anaconda\envs\pytorch\lib\site-packages\setuptools\dist.py", line 1214, in run_command
super().run_command(command)
File "D:\Progra-Envir\Anaconda\envs\pytorch\lib\site-packages\setuptools_distutils\dist.py", line 986, in run_command
cmd_obj.run()
File "D:\Progra-Envir\Anaconda\envs\pytorch\lib\site-packages\setuptools\command\install_lib.py", line 11, in run
self.build()
File "D:\Progra-Envir\Anaconda\envs\pytorch\lib\site-packages\setuptools_distutils\command\install_lib.py", line 107, in build
self.run_command('build_ext')
File "D:\Progra-Envir\Anaconda\envs\pytorch\lib\site-packages\setuptools_distutils\cmd.py", line 313, in run_command
self.distribution.run_command(command)
File "D:\Progra-Envir\Anaconda\envs\pytorch\lib\site-packages\setuptools\dist.py", line 1214, in run_command
super().run_command(command)
File "D:\Progra-Envir\Anaconda\envs\pytorch\lib\site-packages\setuptools_distutils\dist.py", line 986, in run_command
cmd_obj.run()
File "D:\Progra-Envir\Anaconda\envs\pytorch\lib\site-packages\setuptools\command\build_ext.py", line 79, in run
_build_ext.run(self)
File "D:\Progra-Envir\Anaconda\envs\pytorch\lib\site-packages\setuptools_distutils\command\build_ext.py", line 339, in run
self.build_extensions()
File "D:\Progra-Envir\Anaconda\envs\pytorch\lib\site-packages\torch\utils\cpp_extension.py", line 765, in build_extensions
build_ext.build_extensions(self)
File "D:\Progra-Envir\Anaconda\envs\pytorch\lib\site-packages\setuptools_distutils\command\build_ext.py", line 448, in build_extensions
self._build_extensions_serial()
File "D:\Progra-Envir\Anaconda\envs\pytorch\lib\site-packages\setuptools_distutils\command\build_ext.py", line 473, in _build_extensions_serial
self.build_extension(ext)
File "D:\Progra-Envir\Anaconda\envs\pytorch\lib\site-packages\setuptools\command\build_ext.py", line 202, in build_extension
_build_ext.build_extension(self, ext)
File "D:\Progra-Envir\Anaconda\envs\pytorch\lib\site-packages\setuptools_distutils\command\build_ext.py", line 528, in build_extension
objects = self.compiler.compile(sources,
File "D:\Progra-Envir\Anaconda\envs\pytorch\lib\site-packages\torch\utils\cpp_extension.py", line 738, in win_wrap_ninja_compile
_write_ninja_file_and_compile_objects(
File "D:\Progra-Envir\Anaconda\envs\pytorch\lib\site-packages\torch\utils\cpp_extension.py", line 1487, in _write_ninja_file_and_compile_objects
_run_ninja_build(
File "D:\Progra-Envir\Anaconda\envs\pytorch\lib\site-packages\torch\utils\cpp_extension.py", line 1824, in _run_ninja_build
raise RuntimeError(message) from e
RuntimeError: Error compiling objects for extension

Has anyone solved this problem? Thank you.

Plug-and-play model

Hello, and thank you for your excellent work. I have a question: can the large-kernel idea in your paper be transferred to ResNet to replace its 3×3 convolutions? Thanks.

How to extract intermediate features?

I am trying to make some changes to RepLKNet, so I separated the network into several parts using .children(). My code is shown below:

import torch
import torch.nn as nn
import torchvision.utils
import torchvision.models as tv_models

from networks import replknet 

if __name__ == "__main__":

   basenet = replknet.create_RepLKNet31B(small_kernel_merged=False,use_checkpoint=True)

   self_stem_block = list(basenet.children())[0]
   self_main_block_0 = list(basenet.children())[1][0]
   self_main_block_1 = list(basenet.children())[1][1]
   self_main_block_2 = list(basenet.children())[1][2]
   self_main_block_3 = list(basenet.children())[1][3]
   self_out_conv = list(basenet.children())[2]
   self_sync_bn = list(basenet.children())[3]
   self_avg_pool = list(basenet.children())[4]
   self_classifier = list(basenet.children())[5]

   x = torch.ones(1,3,224,224).cuda()

   x = self_stem_block(x)

Then it gives this error:

Traceback (most recent call last):
  File "test_load_model.py", line 66, in <module>
    x = self_stem_block(x)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 201, in _forward_unimplemented
    raise NotImplementedError
NotImplementedError

It seems that self_stem_block is an nn.ModuleList, so it does not have a forward function.

Does anyone know how to extract intermediate features?

Any suggestion is appreciated.
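One way that avoids calling the ModuleList containers directly (a sketch under assumptions, not the official API; the stage names below are guesses, so print the keys of basenet.named_modules() to find the real ones) is to register forward hooks on the submodules whose outputs you want:

    import torch
    from networks import replknet

    basenet = replknet.create_RepLKNet31B(small_kernel_merged=False, use_checkpoint=False).cuda().eval()

    features = {}
    def save_to(name):
        def hook(module, inputs, output):
            features[name] = output
        return hook

    for name, module in basenet.named_modules():
        if name in {"stages.0", "stages.1", "stages.2", "stages.3"}:  # hypothetical names
            module.register_forward_hook(save_to(name))

    with torch.no_grad():
        basenet(torch.ones(1, 3, 224, 224).cuda())

    for name, feat in features.items():
        print(name, tuple(feat.shape))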

Validation metric difference between saving state_dict() and the whole model.

Thanks for your excellent work!

I use RepLKNet as the backbone of my depth-estimation network. After validating the model at training time, I save it and immediately load it to validate again, but I get different validation metrics from those at training time.

I use the standard PyTorch way to save the state_dict() of RepLKNet; when I instead use torch.save() to save the whole model rather than only the backbone's state_dict(), the problem disappears. Why does this happen? Looking forward to your reply.
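A hedged diagnostic sketch (model and the checkpoint path below are placeholders for your own code): loading with strict=False and printing the incompatible keys makes silently dropped parameters or buffers visible, which is a common reason why reloading only a backbone state_dict gives different metrics than saving the whole model; also make sure the network is in eval() mode for both validations:

    import torch

    state = torch.load("replknet_backbone.pth", map_location="cpu")  # placeholder path
    result = model.backbone.load_state_dict(state, strict=False)     # `model` is your own network
    print("missing keys   :", result.missing_keys)
    print("unexpected keys:", result.unexpected_keys)
    model.eval()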
