Need help to speed up model fitting: how can we use parallel processing on different cores to fit all the models at the same time?

soumyabrrl commented on July 17, 2024
Need help to speed up model fitting: how can we use parallel processing on different cores to fit all the models at the same time?

from hmsc.

Comments (4)

jarioksa commented on July 17, 2024

I understand you want to run several separate processes simultaneously, each with a different model setting. Look at the standard package parallel (it is in the default installation of R). If you used lapply to launch these processes sequentially, you can similarly use parLapply (which works on all platforms, including Windows) or mclapply (which relies on forking and so runs in parallel only on systems other than Windows), and there are other alternatives as well. See the help for these commands and the parallel package documentation in general, including its vignette.

You should take care of a couple of details:

  1. Your sampling runs are of very different lengths, so they will finish at very different times. If you launch more processes than there are cores (CPUs) in your system, the default is to wait for the last one to end before launching a new batch of processes on all CPUs. A short process will wait until the longest one ends. So you should turn off prescheduling (see the manual pages for the parallel commands, for example the mc.preschedule argument of mclapply or the load-balancing parLapplyLB), or you may run slower than in sequential processing (where the last process takes ~90% of the running time, and all previous ones ran in the 10% before that).
  2. You should think about how to set random number seeds for each parallel process. Look at the documentation of the parallel package, in particular the vignette.
  3. If you launch several model runs in parallel, you may run into trouble if you also run chains in parallel within each of these processes. Running parallel processes within parallel processes may be suppressed in your system, in which case you gain nothing. On the other hand, if parallel processing within parallel processes is allowed, you may run out of free processors, the processes will compete for them, your system can stall, and the runs can be much slower than sequential runs of the models.
  4. All the usual caveats of parallel processing apply. See the parallel vignette, and in particular make sure that you have sufficient memory for handling several models in parallel, or your system may stall.
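The points above about scheduling and seeds can be sketched with the parallel package as follows. This is a minimal illustration and not a recommended setup: fit_one is a hypothetical stand-in for a function that would wrap one model run (for example a sampleMcmc call), and the number of workers and the seed are arbitrary assumptions.

```r
library(parallel)

# Hypothetical stand-in for one model run. In real use this would wrap a
# call such as sampleMcmc(...) with the given thinning.
fit_one <- function(thin) {
  Sys.sleep(0.001 * thin)            # pretend that more thinning runs longer
  list(thin = thin, draw = runif(1))
}

thins <- c(1, 10, 100)               # assumed settings, one task per value

cl <- makeCluster(2)                 # two worker processes (assumed available)
clusterSetRNGStream(cl, iseed = 1)   # separate random number stream per worker

# Load-balanced variant of parLapply: a worker picks up the next task as
# soon as it is free, so short runs do not wait for the longest one.
fits <- parLapplyLB(cl, thins, fit_one)
stopCluster(cl)

sapply(fits, function(f) f$thin)     # results come back in the order of 'thins'
```

With load balancing, which task lands on which worker can vary between runs, so setting a seed inside each task may be needed if exact reproducibility matters.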

jarioksa commented on July 17, 2024

I do think that parallel processing of your models makes little sense, and that sequential processing is a sensible strategy. You increase thin 10-fold in each model, and this gives an approximately 10-fold increase in the running time of each model. So the relative running times are 1, 10, 100, 1000, 10000, etc., and all previous models together took only about 11% of the time needed to run your current model. This means that the theoretical maximum saving in computing time is about 11%, because the longest-running model sets the total time. Further, the purpose of this sequence of models is not to run them all, but to find sufficient thinning. When you run models sequentially from faster to slower, you can analyse the last finished model while the next one is running, and if the diagnostics indicate that thinning is sufficient, you can keep that model and terminate the running process (which would take nearly 10 times longer than all models so far). The purpose is not to run all these models, but to stop as soon as you can. This makes little sense in parallel processing.
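The arithmetic behind the 11% figure can be checked with a few lines of R. The five models and the exact 10-fold steps are assumptions taken from the example sequence above.

```r
rel_time <- 10^(0:4)                 # relative running times 1, 10, ..., 10000
earlier  <- sum(head(rel_time, -1))  # all models before the longest one: 1111

earlier / max(rel_time)              # earlier models relative to the longest: 0.1111
max(rel_time) / sum(rel_time)        # share of a sequential run spent on the longest model
```

Even with unlimited cores, the parallel run cannot finish before the longest model does, so the saving is bounded by the time taken by the earlier models.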

Parallel processing can make sense if you have alternative model structures (different fixed effects, random effects, etc.) and you really need to run all these models to compare them later. In that case you can use the parallelization tools provided by your operating environment and launch several models in parallel. That is a service above and outside our Hmsc package, so read the guides to the parallel package.

soumyabrrl commented on July 17, 2024

Thanks, Jari, for the directions. My system has only a moderate setup, so the runs were taking a colossal amount of time. I asked because I want to move the thinning runs to a remote supercomputer facility with multiple cores that can be used for parallel processing.

jarioksa commented on July 17, 2024

Parallel processing can be performed at various levels in Hmsc:

  1. You can launch several sampleMcmc models with different definitions. This must be done at the level of the R shell, and the starting point is to use functions in the parallel package to launch the sampleMcmc calls.
  2. You can run several chains in parallel in sampleMcmc using the argument nParallel. Each chain must be run sequentially, but the chains run in parallel.
  3. You may have parallelized linear algebra libraries (BLAS) in your system. In a supercomputer you probably have such optimized libraries, but you can also use them in smaller systems (and they may already be in use, depending on your sysadmin, who may be you). Most of the time in sampleMcmc is spent in matrix algebra, and these parallelized libraries can give a considerable speed-up even when you know nothing about them or their existence.
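As a rough sketch of the second level, chains can be run in parallel through the nParallel argument of sampleMcmc. The simulated data, the model formula, and the sampler settings below are illustrative assumptions only, chosen to keep the run short, and are not recommended values.

```r
library(Hmsc)

set.seed(1)
n <- 50
# Simulated data for illustration: one covariate, three species
XData <- data.frame(x1 = rnorm(n))
Y <- matrix(rnorm(n * 3), nrow = n, ncol = 3,
            dimnames = list(NULL, c("sp1", "sp2", "sp3")))

m <- Hmsc(Y = Y, XData = XData, XFormula = ~ x1, distr = "normal")

# Two chains, each run sequentially, but the two chains run in parallel
fit <- sampleMcmc(m, samples = 100, transient = 50, thin = 1,
                  nChains = 2, nParallel = 2, verbose = 0)

length(fit$postList)   # one list of posterior samples per chain
```

For the third level, sessionInfo() reports which BLAS and LAPACK libraries the running R session is linked against, which is a quick way to see whether an optimized library is already in use.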
