
Comments (9)

keyu-tian commented on July 19, 2024

Thank you! 200 epochs of pretraining may be insufficient. You may need to adjust some hyperparameters, for example doubling the learning rate or decreasing the drop path rate.

For a hybrid CNN-transformer backbone, SparK can be applied directly, because SparK does not change the model architecture or parameters. You can refer to our SparK code to use sparse layers in the CNN part (as we do in https://github.com/keyu-tian/SparK/blob/main/pretrain/encoder.py#L165) and use a multi-scale decoder to reconstruct images (as in https://github.com/keyu-tian/SparK/blob/main/pretrain/decoder.py).

from spark.

bollossom commented on July 19, 2024

OK, thank you very much!


bollossom commented on July 19, 2024

Hello, during pre-training with the sparse convolution you provided, one epoch takes about 100 minutes. Is there anything I can do to speed this up?
The training setup: a model with a total size of about 70M, trained on 8 A100s with 60G memory.
For example, would it help to use the sparse convolution implementation in MinkowskiEngine, or to write a custom CUDA operator?


keyu-tian commented on July 19, 2024

@bollossom Can you provide details on the model type, dataset size, input size, batch size, and GPU utilization? Also, what does "60G video memory" mean?

Generally, I believe using MinkowskiEngine won't speed things up much, because 1) the masked images are far denser than 3D point clouds, and 2) it lacks optimized implementations of sparse depthwise convolution, sparse group norm, etc.


bollossom commented on July 19, 2024

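OK. First, I apologize for my earlier inaccurate description. Our model is MC-MAE [base] with SparK, using a batch size of 512, pretrained on ImageNet on 8 GPUs with 64 GB of memory each. One pretraining epoch currently takes about 100 minutes. Is there a good way to speed this up?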


keyu-tian commented on July 19, 2024

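That training speed suggests there is a bug somewhere. It is worth investigating, for example by checking GPU utilization and GPU memory usage, or by using the PyTorch profiler to log which part is slow.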
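As a first check before reaching for a full profiler, a plain timing harness can show whether the loop is waiting on data or on compute. The sketch below uses only the Python standard library and is a stand-in for the PyTorch profiler mentioned above, not a replacement for it. The names `time_training_loop`, `slow_loader`, and `fast_step` are hypothetical and do not come from the SparK code base.

```python
import time
from typing import Callable, Dict, Iterable


def time_training_loop(batches: Iterable,
                       train_step: Callable[[object], None],
                       max_steps: int = 50) -> Dict[str, float]:
    """Split wall-clock time into 'waiting for the next batch' and 'running the step'.

    Note (assumption about usage): on a GPU, kernels run asynchronously, so
    train_step should synchronize the device before returning (for example with
    torch.cuda.synchronize()), otherwise compute time is under-reported.
    """
    totals = {"data_wait_s": 0.0, "compute_s": 0.0, "steps": 0}
    it = iter(batches)
    for _ in range(max_steps):
        t0 = time.perf_counter()
        try:
            batch = next(it)          # time spent here is data loading / augmentation
        except StopIteration:
            break
        t1 = time.perf_counter()
        train_step(batch)             # time spent here is forward / backward / optimizer
        t2 = time.perf_counter()
        totals["data_wait_s"] += t1 - t0
        totals["compute_s"] += t2 - t1
        totals["steps"] += 1
    return totals


# Toy demonstration: a deliberately slow "loader" and a trivial "step".
def slow_loader(n: int):
    for i in range(n):
        time.sleep(0.01)              # simulates a slow data pipeline
        yield i


def fast_step(batch) -> None:
    _ = batch * 2                     # placeholder for the real training step


stats = time_training_loop(slow_loader(5), fast_step)
print(stats)
```

If `data_wait_s` dominates, the bottleneck is likely the input pipeline (disk, decoding, too few workers) rather than the sparse layers; if `compute_s` dominates, a profiler run on the model itself is the next step.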
For reference, ConvNeXt-Base with bs=4096 on 32 A100s takes about 5 minutes per epoch.
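To put the two figures in this thread side by side, the following sketch converts minutes per epoch into an average per-GPU throughput. It assumes "ImageNet" here means the ImageNet-1k training split of 1,281,167 images, which the thread does not state explicitly. The function name `images_per_sec_per_gpu` is made up for illustration.

```python
# Back-of-the-envelope per-GPU throughput from the numbers quoted in this thread.
# Assumption: the dataset is the ImageNet-1k training split (1,281,167 images).
IMAGENET_1K_TRAIN_IMAGES = 1_281_167


def images_per_sec_per_gpu(minutes_per_epoch: float,
                           num_gpus: int,
                           dataset_size: int = IMAGENET_1K_TRAIN_IMAGES) -> float:
    """Average number of images each GPU processes per second over one epoch."""
    return dataset_size / (minutes_per_epoch * 60.0) / num_gpus


# Reference run: ConvNeXt-Base, bs=4096, 32 A100s, about 5 minutes per epoch.
reference = images_per_sec_per_gpu(minutes_per_epoch=5, num_gpus=32)
# Reported run: MC-MAE [base], bs=512, 8 GPUs, about 100 minutes per epoch.
reported = images_per_sec_per_gpu(minutes_per_epoch=100, num_gpus=8)

print(f"reference: {reference:.1f} img/s per GPU")
print(f"reported:  {reported:.1f} img/s per GPU")
print(f"gap:       {reference / reported:.1f}x")
```

The ratio does not depend on the dataset size: the reported run is about 5x slower per GPU than the reference. The two runs use different models and possibly different hardware, so this is only a rough indication, but a gap of that size is in line with the suggestion to look for a bug.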


bollossom commented on July 19, 2024

OK, thank you for your careful guidance. For spark-ResNet-101 with bs=4096 on 32 A100s, roughly how many minutes does one pretraining epoch take?


keyu-tian commented on July 19, 2024

About the same as ConvNeXt-Base.


bollossom commented on July 19, 2024

OK, thank you.

