Comments (7)

florianhartig commented on July 20, 2024

Hello Rui,

The likelihood is automatically vectorised in createBayesianSetup, so if you set parallel = T, your likelihood should be automatically parallelised. In this case, you should see several R sessions open on your system.

If your likelihood is very cheap to compute, this will not show up as much additional CPU load, because the CPUs are idle most of the time and the time is spent on communication within the socket cluster. In practice, I have found that likelihood parallelisation makes sense for likelihoods with an evaluation time of more than about 50 ms. For faster likelihoods, it makes more sense to parallelise the MCMC chains.
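A quick way to apply the rule of thumb above is to time a few likelihood calls before choosing a strategy. The following is a minimal base-R sketch; the `likelihood` function, its `Sys.sleep` delay, and the parameter vector are hypothetical stand-ins, and the 50 ms threshold is only the rough figure quoted above.

```r
# Hypothetical likelihood; replace it with your own function.
likelihood <- function(par) {
  Sys.sleep(0.08)  # stand-in for an expensive model run (about 80 ms)
  sum(dnorm(par, mean = 0, sd = 1, log = TRUE))
}

# Average wall-clock time per evaluation over a few calls.
nCalls <- 5
elapsed <- system.time(
  for (i in seq_len(nCalls)) likelihood(c(0, 0))
)[["elapsed"]]
msPerCall <- 1000 * elapsed / nCalls

# Rough threshold of ~50 ms taken from the comment above.
strategy <- if (msPerCall > 50) {
  "parallelise the likelihood"
} else {
  "parallelise the MCMC chains"
}
cat(sprintf("%.0f ms per call: %s\n", msPerCall, strategy))
```

Timing several calls and averaging smooths out one-off start-up costs, which matters when the model is close to the threshold.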

Best
Florian

from bayesiantools.

lirui0321 commented on July 20, 2024

Hi Florian,

Many thanks for helping me address my question!

It takes ~1 min to compute my likelihood function, in which ODEs are solved multiple times against different data sets. I started an MCMC run of 10000 iterations 2 days ago, when I posted this question, and it had only finished 2000 iterations by this morning. The package returned the message "parallel function execution created with39cores", and as you mentioned, R opened multiple sessions after that. However, the total CPU usage is less than 10%, so it seems that the computation has not taken advantage of parallelization. I am wondering whether I coded the likelihood function wrongly, such that the computation somehow has to be done sequentially. I remember that when I use the package "DEoptim" for parallel computation, it requires, as an argument, a list of the names of the packages and functions used in my likelihood calculation. Does BayesianTools have a similar requirement?

Rui

florianhartig commented on July 20, 2024

Hello Rui,

You can control the package export by hand, but by default in BT your entire environment (data + packages) is exported, so be careful about what you have in your environment, or control the export by hand.
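As a minimal sketch of controlling the export by hand, the `parallelOptions` argument of `createBayesianSetup` can name the packages and variables sent to the workers. The data object `obs`, the likelihood, the parameter bounds, and the iteration count below are hypothetical, and the exact handling of `parallelOptions` may differ between BT versions, so check `?createBayesianSetup` for yours.

```r
library(BayesianTools)

# Hypothetical data object that the likelihood needs on every worker.
obs <- rnorm(50, mean = 1, sd = 2)

likelihood <- function(par) {
  # par[1] = mean, par[2] = standard deviation
  sum(dnorm(obs, mean = par[1], sd = par[2], log = TRUE))
}

# Export only what the likelihood uses instead of the whole environment.
opts <- list(
  packages  = list("stats"),
  variables = list("obs"),
  dlls      = NULL
)

setup <- createBayesianSetup(
  likelihood      = likelihood,
  lower           = c(-10, 0.01),
  upper           = c(10, 10),
  parallel        = TRUE,
  parallelOptions = opts
)

# Short illustrative run; real applications need far more iterations.
out <- runMCMC(setup, sampler = "DEzs",
               settings = list(iterations = 300, message = FALSE))

stopParallel(setup)  # close the worker sessions again
```

Restricting the export keeps worker start-up cheap when the interactive session holds large objects that the likelihood never touches.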

What algorithm are you using? Note that parallelisation can only work up to the number of internal chains in your algorithm - so if you run a DEzs with 3 internal chains, it doesn't help if you have 39 cores, it will still only use 3 cores at a time.

If you have a computer with 40 cores, I would think the best use is to run three independent MCMC chains (has to be done by hand) and then set up the DEzs with parallel chains to make best use of your hardware.
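Running the independent chains by hand could look roughly like the sketch below, which follows the pattern in the BT vignette on interfacing a model. The helper `runOne`, the test density, the number of chains, and the iteration count are illustrative assumptions. For simplicity each worker evaluates its likelihood sequentially here, so this shows only the outer level of parallelisation.

```r
library(BayesianTools)

# Cheap test density shipped with BT, standing in for a real model.
ll <- generateTestDensityMultiNormal(sigma = "no correlation")
setup <- createBayesianSetup(likelihood = ll,
                             lower = rep(-10, 3), upper = rep(10, 3))

# One worker per independent MCMC run.
nChains <- 3
cl <- parallel::makeCluster(nChains)
parallel::clusterEvalQ(cl, library(BayesianTools))

settings <- list(iterations = 3000, nrChains = 1, message = FALSE)

# Hypothetical helper: each worker runs one complete DEzs sampler.
runOne <- function(i, setup, settings) {
  runMCMC(bayesianSetup = setup, sampler = "DEzs", settings = settings)
}
chains <- parallel::parLapply(cl, seq_len(nChains), runOne, setup, settings)
parallel::stopCluster(cl)

# Combine the independent runs for joint convergence checks.
out <- createMcmcSamplerList(chains)
gelmanDiagnostics(out)
```

Combining the runs with `createMcmcSamplerList` lets the usual BT summaries and diagnostics treat them like the result of a single call with `nrChains = 3`.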

Best
F

lirui0321 commented on July 20, 2024

Hi Florian,

Please forgive my naive MCMC questions. I am using a Metropolis / AM sampler. Does this mean that I can use at most 3 cores if I only have 3 MCMC chains? And if I switch to the DEzs sampler, can I maximize CPU use by increasing the number of internal chains of the DEzs algorithm? Is this understanding correct?

Could you provide an example of how to set the number of internal chains vs. independent MCMC chains? I believe that I can change the number of MCMC chains by specifying "nrChains" in the runMCMC settings list. How do I change the number of DEzs internal chains?

In addition, is there any general guidance on when to use which sampling algorithm?

Thank you!
Rui

florianhartig commented on July 20, 2024

Hi Rui,

MCMCs are usually not parallelizable, because the next step depends on the previous step.

There are only a few specific things that can be parallelised, e.g. you can do parallel proposals for all samplers that apply rejection (basically all samplers in BT, but this is not implemented in BT), or you can calculate the chains in population MCMCs such as DEzs in parallel.

If you want to use a large number of cores, you should probably go for an SMC, see our recent paper: Speich, M., Dormann, C. F., & Hartig, F. (2021). Sequential Monte-Carlo algorithms for Bayesian model calibration–A review and method comparison. Ecological Modelling, 455, 109608. https://doi.org/10.1016/j.ecolmodel.2021.109608 The code for this is in a branch of the BT GitHub repo; I haven't managed to merge it into the main branch yet.

As a default for most users with complicated models or runtime problems, I would recommend the following:

  1. Use DEzs and possibly increase the number of internal chains (this is set by the z-matrix, see help of DEzs)
  2. Turn on parallelisation
  3. If you want to run several independent MCMC chains for convergence checks (recommended), run this in parallel as well, see https://cran.r-project.org/web/packages/BayesianTools/vignettes/InterfacingAModel.html#parallelization
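Steps 1 and 2 above might be combined as in the following sketch. It assumes that DEzs takes its number of internal chains from the number of rows of a `startValue` matrix, which should be checked against the DEzs help for your BT version. The likelihood, its artificial delay, the bounds, and `nInternal` are hypothetical.

```r
library(BayesianTools)

# Hypothetical slow likelihood (about 60 ms per call) for two parameters.
slowLikelihood <- function(par) {
  Sys.sleep(0.06)
  sum(dnorm(par, mean = 0, sd = 1, log = TRUE))
}

# Step 2: let BT evaluate the likelihood on a socket cluster.
setup <- createBayesianSetup(
  likelihood = slowLikelihood,
  lower      = rep(-5, 2),
  upper      = rep(5, 2),
  parallel   = TRUE
)

# Step 1: more internal chains, so more proposals can be evaluated at once.
# Assumption: one row of startValue corresponds to one internal chain.
nInternal <- 6
settings <- list(
  iterations = 300,  # illustrative only, far too few for real inference
  startValue = setup$prior$sampler(nInternal),
  message    = FALSE
)

out <- runMCMC(bayesianSetup = setup, sampler = "DEzs", settings = settings)
summary(out)

stopParallel(setup)  # close the worker sessions again
```

With this setup the number of cores doing useful work at any time is bounded by `nInternal`, which is why raising it only pays off when that many cores are actually available.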

There is an open issue, #181, to improve the documentation on parallelisation, and I'll take this as a nudge to bump it up the priority list.

lirui0321 commented on July 20, 2024

Many thanks for the detailed explanation, Florian!

Out of curiosity, is there a plan to include additional "popular" samplers in BayesianTools, for example the Gibbs and NUTS samplers offered in BUGS and Stan? In your opinion, what are the advantages and disadvantages of these samplers compared with the SMC and DE samplers you've included in BayesianTools?

florianhartig commented on July 20, 2024

No, currently my idea is that BT will only include "black box samplers" that do not require either derivatives or the structure of the likelihood. For Gibbs or NUTS, you would need a metalanguage such as in JAGS or STAN that allows the sampler to understand the mathematical structure of the likelihood.
