
Comments (8)

Laenita avatar Laenita commented on August 19, 2024

Oh, and the newer GPU and the much weaker one take the same length of time to train, so there is a bottleneck somewhere.

from darts.

madtoinou avatar madtoinou commented on August 19, 2024

Hi @Laenita,

Would you mind sharing the values of the parameters, so that we can get an idea of the number of parameters/size of the model?

Is the GPU acceleration being used at 1% for both the old and the new devices?

The pl_trainer_kwargs argument looks good; this is what PyTorch Lightning expects to enable this acceleration. I would recommend looking up their documentation, as this is what Darts relies on for the deep learning models.
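
For reference, a minimal sketch of the kind of dictionary usually passed as pl_trainer_kwargs. The keys `accelerator` and `devices` are PyTorch Lightning Trainer arguments; the helper name `build_trainer_kwargs` is hypothetical and only there for illustration, and the exact values used in the original post are not shown in this thread.

```python
def build_trainer_kwargs(use_gpu, device_ids=None):
    """Build a dict that could be passed as `pl_trainer_kwargs` (hypothetical helper)."""
    if use_gpu:
        # Assumption: a single GPU with index 0 unless told otherwise.
        return {"accelerator": "gpu", "devices": device_ids or [0]}
    return {"accelerator": "cpu"}


pl_trainer_kwargs = build_trainer_kwargs(use_gpu=True)
print(pl_trainer_kwargs)
```

The resulting dictionary would then be handed to the model constructor through its pl_trainer_kwargs argument.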

Laenita avatar Laenita commented on August 19, 2024

Hi @madtoinou

Of course, here are the parameters for my model; I hope this helps:
input_length_chunk = 20
forecasting_horizon = 3
number_stacks = 4
number_blocks = 5
number_layers = 5
batch_size = 64
dropout_rate = 0.1
number_epochs = 180
number_epochs_val_period = 1
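
As a rough sense of scale for these values, the sketch below counts the weights and biases of the fully connected layers only. It rests on assumptions that are not stated in the thread: that each block is a stack of fully connected layers, that number_blocks is counted per stack, and that the layer width is 256 (an illustrative value). The helper `mlp_params` is hypothetical, and output heads are ignored.

```python
def mlp_params(input_size, width, num_layers):
    """Weights plus biases of `num_layers` fully connected layers of equal width."""
    first = input_size * width + width
    hidden = (num_layers - 1) * (width * width + width)
    return first + hidden


# Values taken from the list above.
input_length_chunk = 20
number_stacks = 4
number_blocks = 5
number_layers = 5

# Assumption: the layer width is not given in the thread; 256 is illustrative.
layer_width = 256

per_block = mlp_params(input_length_chunk, layer_width, number_layers)
total = number_stacks * number_blocks * per_block
print(f"{per_block:,} parameters per block, {total:,} in total")
```

Under these assumptions the model has a few million parameters, which is small for a modern GPU.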

And yes, both the old and the newer (and much faster) GPUs show only 1% utilisation and take the same time to train the same model, which indicates that something is wrong and that the GPU is heavily under-utilised.
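
One way to check whether data loading is the bottleneck is to time how long it takes just to fetch batches, without any training step. The sketch below is a generic illustration using only the standard library; `seconds_per_batch` and `slow_batches` are hypothetical names, and `slow_batches` merely simulates a slow loader.

```python
import time
from itertools import islice


def seconds_per_batch(batches, max_batches=50):
    """Average wall-clock time needed to fetch one batch from an iterable."""
    start = time.perf_counter()
    fetched = sum(1 for _ in islice(batches, max_batches))
    elapsed = time.perf_counter() - start
    return elapsed / fetched if fetched else 0.0


def slow_batches(count, delay):
    """Stand-in for a data loader that needs `delay` seconds per batch."""
    for index in range(count):
        time.sleep(delay)
        yield index


print(f"{seconds_per_batch(slow_batches(5, 0.01)):.4f} s per batch")
```

If fetching a batch takes about as long as a full training step, the GPU is probably waiting on the data pipeline rather than computing.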

Also, num_loader_workers=1 is not working at all for me: it takes more than an hour with num_loader_workers > 0.

Thanks for your assistance!

igorrivin avatar igorrivin commented on August 19, 2024

Yes, I have the same problem: I am told that num_loader_workers is not a legit parameter.

madtoinou avatar madtoinou commented on August 19, 2024

Hi @igorrivin & @Laenita,

As mentioned in another thread, PR #2295 is adding support for those arguments. Maybe try installing this branch or copying the changes, and see if it solves the bottleneck?

Laenita avatar Laenita commented on August 19, 2024

Hi @madtoinou

I have copied the changes from PR #2295.
But now, whenever I add persistent_workers=True and num_loader_workers=16 (or even just 1), it gets stuck on "Sanity Checking". Did I maybe miss anything? Thank you for your assistance!
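
For context, in plain PyTorch the corresponding DataLoader arguments are `num_workers` and `persistent_workers`, and the DataLoader raises an error if `persistent_workers=True` is combined with `num_workers=0`. The sketch below mirrors that constraint in pure Python; `check_loader_kwargs` is a hypothetical helper, and how these values reach the DataLoader in Darts depends on the version and on the changes in the PR.

```python
def check_loader_kwargs(kwargs):
    """Reject worker settings that PyTorch's DataLoader would refuse (hypothetical helper)."""
    workers = kwargs.get("num_workers", 0)
    if workers < 0:
        raise ValueError("num_workers must be non-negative")
    if kwargs.get("persistent_workers", False) and workers == 0:
        raise ValueError("persistent_workers requires num_workers > 0")
    return kwargs


loader_kwargs = check_loader_kwargs(
    {"batch_size": 64, "num_workers": 1, "persistent_workers": True}
)
print(loader_kwargs)
```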

madtoinou avatar madtoinou commented on August 19, 2024

Which sanity checking are you referring to?

Laenita avatar Laenita commented on August 19, 2024

Hi @madtoinou, the best explanation I can give is this PNG, which shows the model first going into a Sanity Checking phase before starting training:
[Image: Sanity Checking]
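
For context, the sanity check is a PyTorch Lightning step that runs a few validation batches before training starts; it is controlled by the Trainer argument `num_sanity_val_steps` (default 2). A minimal sketch of disabling it through the trainer kwargs, to see whether training itself also hangs, is given below. The helper `without_sanity_check` is hypothetical, and the base dictionary is an assumed example rather than the configuration used in this thread.

```python
def without_sanity_check(trainer_kwargs):
    """Return a copy of the trainer kwargs with the sanity check disabled."""
    updated = dict(trainer_kwargs)
    updated["num_sanity_val_steps"] = 0
    return updated


# Assumption: an example GPU configuration, not the one from the thread.
base_kwargs = {"accelerator": "gpu", "devices": [0]}
debug_kwargs = without_sanity_check(base_kwargs)
print(debug_kwargs)
```

If training proceeds with the check disabled, that would suggest the stall is tied to the validation data loader rather than to the model.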
