florianhartig / bayesiantools Goto Github PK

View Code? Open in Web Editor NEW

115.0 14.0 29.0 3.94 MB

General-Purpose MCMC and SMC Samplers and Tools for Bayesian Statistics

Home Page: https://cran.r-project.org/web/packages/BayesianTools/index.html

R 82.38% C++ 0.39% C 0.09% Shell 0.34% RMarkdown 16.80%

cran mcmc smc bayes ecological-models systems-biology optimization

bayesiantools's People

Stargazers

Watchers

bayesiantools's Issues

zUpdateFrequency

currently we have a fixed update interval for the z-Matrix (memory) - the idea is to set this higher for memory / speed reasons in cases where the algorithm needs many iterations, but what I don't like about a fixed interval is that the z-matrix gets filled slowly in the beginning of the MCMC.

Hence, what I would like to do is

fill the Z matrix every iteration until it is full (so we define a parameter max size), after that we can update less frequently

 - if we wanted, can take care that the sample stays representative over the chain, so, for example, if Z gets full, throw 1/2 of Z, and then update with a thinning of 2, so that new values are not more dense than old values, and if it's full again through out again 1/2, but update with a thinning of 1/3 and so on.

 - however, we could program it such that the more recent parts of the chain have a higher representation in the Z matrix, with the idea that they are better converged. However, this might lead to similar problems in a detailed balance proof than for the standard adaptive Metropolis?

in any case, the MCMC chain can then be written parallel and has it's own thinning interval.

References for MCMC samplers

I added a bunch of references for the MCMC samplers in the Vignette, Stefan, can you have a look whether this is the actual reference that you used, and also, could you add the references to the help files where appropriate.

Otherwise, I think the vignette is now cleaned up and, while not great, I think it's OK for a first CRAN release.

DEzs error message with missing boundaries

If lower / upper are not provided in the prior, I get a cryptic error message

# Create a general prior distribution by specifying an arbitrary density function and a
# corresponding sampling function
density = function(par){
  d1 = dunif(par[1], -2,6, log =TRUE)
  d2 = dnorm(par[2], mean= 2, sd = 3, log =TRUE)
  return(d1 + d2)
}

# The sampling is optional but recommended because the MCMCs can generate automatic starting
# conditions if this is provided
sampler = function(n=1){
  d1 = runif(n, -2,6)
  d2 = rnorm(n, mean= 2, sd = 3)
  return(cbind(d1,d2))
}

prior <- createPrior(density = density, sampler = sampler, 
                     lower = NULL, upper = NULL, best = NULL)


# Use this prior in an MCMC 

ll <- function(x) sum(dnorm(x, log = T)) # multivariate normal ll
bayesianSetup <- createBayesianSetup(likelihood = ll, prior = prior)
out <- runMCMC(bayesianSetup = bayesianSetup)

Error in colnames<-(*tmp*, value = c("LP", "LL", "LPr")) :
length of 'dimnames' [2] not equal to array extent

changing to

prior <- createPrior(density = density, sampler = sampler, 
                     lower = c(-3,-3), upper = c(3,3), best = NULL)

Seems to me this is a bug, because a sampling function is provided, so there should be no need for boundaries.

If fixing this, can you have a look to fix this as early as possible? Probably the DEzs wants to access some length or names from the BayesianSetup that depend on the boundaries that are absent in this case. We should take care to fix the problem there, not in the DEzs

Rejection rates in DEzs

I have the experience (Dani and Francesco reporte this as well) that the rejection rate of DEzs in realistic application seems inefficiently high (say, > 95%). I guess the first thing would be to find out why.

One suspicion of mine is that at least parts of this is due to unreasonable starting points, which leave a trace of unreasonable chain element in memory. I wonder if #8 would improve this

Implement quantile residuals

Add support for quantile residuals via DHARMa, in particular in plotTimeSeriesResults where the plotResiduals is still not implemented

new dll registration recommendation in R 3.4

seems since R 3.4.0, dlls are supposed to be registered differently - I'm getting following note

* checking compiled code ... NOTE
File 'BayesianTools/libs/i386/BayesianTools.dll':
  Found no calls to: 'R_registerRoutines', 'R_useDynamicSymbols'
File 'BayesianTools/libs/x64/BayesianTools.dll':
  Found no calls to: 'R_registerRoutines', 'R_useDynamicSymbols'

It is good practice to register native routines and to disable symbol
search.

See 'Writing portable packages' in the 'Writing R Extensions' manual.

Should implement solution in http://stackoverflow.com/questions/42313373/r-cmd-check-note-found-no-calls-to-r-registerroutines-r-usedynamicsymbols

Parallelization of nrChains command

Currently, only internal calculations of the MCMC samplers use parallelization, the nrChains call is not automatically parallelized.

The current workaround for a slow model, running 3 chains is

start 3 independent R instances
if using a sampler like as DEzs that can use parallelization, turn on parallelization in each R instance (see help, but, e.g. DEzs in standard settings can make use of 3 cores per sampler)
After all 3 samplers have finished, save results, read them in, and combine them with createMcmcSamplerList()

However, if parallel = T, we could also add the option to run multiple chains in parallel, instead of the internal parallelization. This may be interesting in some situations

Alternatively, we could try to maintain both the internal and between-chain parallelization, but needs some thinking about how to do this

Improve SMC

At them moment, if iterations = 10, then the likelihood will be divided in 10 equal chunks that are used for the iterative filtering. There was the idea to make the iterations adaptive, such that the interval is always small enough to avoid particle depletion, and large enough so that particles are filtered.

bayesianSetup / parallel handling

The treatment of the parallel option is not properly encapsulated, e.g.

tmp <- generateParallelExecuter(likelihood, parallel, parallelOptions) 
     parallelLikelihood <- tmp$parallelFun
     cl <- tmp$cl
-    
-    }
+    parallel = T
+  }
   parallelDensity<- function(x, ...){`

and

  }else{
     restart <- FALSE
+    if(is.null(settings$parallel)) settings$parallel <- bayesianSetup$parallel
+    if(is.numeric(settings$parallel)) settings$parallel <- TRUE
     setup <- checkBayesianSetup(bayesianSetup, parallel = settings$parallel)
     settings <- applySettingsDefault(settings = settings, sampler = sampler, check = TRUE)
   }

I guess core of the problem is that the user can either provide a class or a function in the setup.

Solution could maybe be to remove the parallel parameter altogether and always read it from the likelihood?

On the other hand, I think the sampler was supposed to be able to overrule the parallel setting in the setup ...

Internal functions in package index

Some (or all?) internal functions are listed in the package index
even though they don't have the @export label.
Might be somewhat confusing for the user.

check thin argument in getPredictiveIntervals

The programming on the thinning in getPredictiveIntervals and related functions doesn't seem to be save / elegant. Revise

Remaining warnings

In the travis log https://travis-ci.org/florianhartig/BayesianTools/jobs/193218694, I still see

generateTestDensityMultiNormal: multiple local function definitions for

  ‘out’ with different formal arguments

getSample.data.frame: possible error in getSample(as.matrix(data),

  parametersOnly = parametersOnly, coda = coda, start = start, end =

  end, thin = thin, whichParameters = whichParameters,

  includesProbabilities = includesProbabilities, reportDiagnostics =

  reportDiagnostics): unused argument (includesProbabilities =

  includesProbabilities)

Extend getSample to coda

See started version in new branch https://github.com/florianhartig/BayesianTools/tree/getSample

Error message after increasing the number of iterations

Hi Florian,

First I'd very like to thank you for creating this package. It's powerful and helps my current project a lot! I really like it.

But I receive this error message while I run my code and I wonder if I could ask for a hand on it.

I am currently running a hierarchical Bayesian model based on DRAM sampler. The code is running ok when I only test it with 5000 iterations. But it returns below error when I increases the number of iterations to 70000:

> **Error in proposal[j, ] = mcmcSampler$proposalGenerator$returnProposal(x = mcmcSampler$current,  : 
>   number of items to replace is not a multiple of replacement length
> In addition: Warning message:
> In x + move * scale :
>   longer object length is not a multiple of shorter object length**

As I couldn't quite understand what this error message means, would you mind helping me on this?

Thank you very much in advance!

Best wishes
Yi-Hsiu

Create shorthand for samplers

Given that we have purged the alternative samplers in #19, we could emulate their functionality by allowing the user to call

runMCMC(..., sample = "DRAM", ...)

which would then call the Metropolis sampler with standard DRAM settings. This would probably be more convenient for inexperienced users than choosing these settings by hand.

Extend possibilities to calculate marginal likelihood

Following discussion here http://stats.stackexchange.com/questions/209810/computation-of-the-marginal-likelihood-from-mcmc-samples/210196

Longterm: switch to coda format?

Having the chains directly written in the coda format
could save some time when using coda diagnostics.

check out CMA-ES

Add documentation / references for the Metropolis sampers

Stefan, can you check references / documentation for the Metropolis samplers, and add them in Vignette and Help

getSample doesn't properly propagate the thinning to coda objects

If plotting the results of a long population MCMCs (e.g. DEzs), automatic thinning is performed, but the thin argument is not properly forwarded by the getSample function.

The problem seems in to be in the general implementation of the getSample function.

This seems to be only a problem of MCMC lists, a single sampler behaves as expected

ll <- generateTestDensityMultiNormal(sigma = "no correlation")
bayesianSetup = createBayesianSetup(likelihood = ll, lower = rep(-10, 3), upper = rep(10, 3))
settings = list(iterations = 30000)
out <- runMCMC(bayesianSetup = bayesianSetup, sampler = "DEzs", settings = settings)
plot(out)
x = getSample(out, coda = T, thin = "auto")

Check all plots for par / oldpar

Unit tests

All functions / combinations of functions
Exact convergence
Scaling for large dimensions
Robustness to issues (problem in Likelihood, etc)

feature request: custom proposals?

Hi -- I have been starting to use BayesianTools and it is AMAZING. Congrats on such an exciting tool.

I am sure you folks are very busy, and so I totally understand if this idea goes far down the list of your priorities, but: I wonder if it might be possible to allow users to add some custom proposals / moves to the default ones available for a particular MCMC. For complex likelihood models with many parameters (~100), changing the parameters one-at-a-time has its limits. I know that you have DREAM and DREAMzs in there, and I've been experimenting with DREAMzs, but it still seems to mix very poorly. I suspect it might help a lot if the user, knowing something about their model/data, could put in a proposal that would, say, move a block of parameters in the same direction at once, or move some parameters up while moving other parameters down.

This might be as simple as letting the user input some supplementary, pre-defined Z matrices to represent these moves, e.g. the user puts in 0.5 for parameters that they think should move together in a particular proposal, 0s for those that shouldn't, and (say) -0.5 for parameters that should move in the opposite direction from each other. (This all assumes I'm understanding the Z-matrix correctly, as a correlation matrix for the parameters -- which I may not.)

Anyway, like I said, feel free to ignore the idea, but I wanted to put it out there at least, in case someone gets interested in it, or perhaps might get interested in it in the future.

Cheers, Nick

Complete author tags

@stefan-paul , I noted that we don't have author tags for all functions, can you have a look if all functions that you programmed are tagged, and in particular about the DE / DEzs I wasn't so sure how much is from you and how much from Francesco, in doubt I guess you should add both of you

Change Bayesian Setup internal structure

0.2 release major structural change -

I have realized that we are missing 2-3 functions / data items in the BayesianSetup that would be useful. Those are

The raw data
The option to get the likelihood for each data point (provisionally implemented for the WAIC, but we should check if this structure is optimal)
The option to sample from the likelihood (for Bayesian p-values etc.)

It doesn't seem like a big deal to add it, but should give some thought in how to best do this

Finalize package for 0.1.2

Tankred, if you work today, can you finalize the help things and then run all local tests as well as http://win-builder.r-project.org tests and report back if everything is fine? Note that warnings in the tests are treated as errors, i.e. there must be no errors or warnings.

Another bug in getSample with numSamples and population chains

9957088 fixed #36 , but brought another bug in getSample to the surface, which currently let's the automatic tests fail.

I haven't managed to fully comprehend what's going on, but the problem seems to be that the numSamples parameter used in the unit tests hadn't been implemented properly across all possible S3 implementations of getSample

How to forward thin argument in the samplers?

The thin argument in the samplers is not properly passed on, e.g.

library(BayesianTools)
ll <- generateTestDensityMultiNormal(sigma = "no correlation")
bayesianSetup = createBayesianSetup(likelihood = ll, lower = rep(-10, 3), upper = rep(10, 3))
settings = list(iterations = 10000, thin = 10)
mcmcOutput <- runMCMC(bayesianSetup = bayesianSetup, sampler = "DEzs", settings = settings)
plot(mcmcOutput)

we should fix this.

As a general note: together with taking care of the thinning in getSample, coda compatibility and so on, this is getting all rather cumbersome. I wonder if we were not better off to create our own mcmc chain class for the 0.2 release.

Example for SMC with multiple likelihoods

Create a examples that shows how to use the SMC to filter multiple data types / likelihods, in the sense of

a) create setup (prior + data1)
b) SMC1
c) create new setup (flat prior + data 2) - note, the the prior is not really flat, it's just that if the SMC is initialized with the prior, so you don't want to add the prior again the the BayesianSetup
d) initialize new SMC2 with setup and result from SMC1 as output

(check if this all works as expected, especially the init of d). If that all works, add this to the vignette.

createMixWithDefaults

I have come to think that this is a useless function, students in our last school were more confused by this than when coding this by hand.

Remove all use of the function in the package
Add a deprecation warning
We'll remove the function with version 0.2

Error when running R CMD - problem in test?

# plot prior predictive distribution and prior predictive simulations
plotTimeSeriesResults(sampler = out, model = createPredictions, observed = referenceData[,1], error = createError, prior = T, main = "Prior predictive")
#Error in plotTimeSeriesResults(sampler = out, model = createPredictions,  : 
#  T used instead of TRUE

Implement calculating prior predictive distribution

For the predictive plots such as plotTimeSeriesError, we could easily implement the prior predictive as well, instead of the posterior predictive

Passing dotdotdot to likelihood function

Hi Florian,
First, thanks a lot for creating and providing BT! Especially, the implementation of Vrugts' DREAM is very helpful for my work!

As far as I understand your code, the likelihood function only takes care about a vector (or matrix in case of parallelization) of parameter values. Did you consider to pass an arbitrary list of objects to the logdensity function, such as ll = logDensity(x, ...) (aka "dotdotdot")?

Best regards from Münster:)
Dominik

Collect items / todo for BT 0.1.1 Release

I have created a milestone for the next CRAN release that I want to submit at the end of the month

https://github.com/florianhartig/BayesianTools/milestone/1

The main reason is that I want to push the bugfixes with the helpfiles to CRAN. If you can think of anything else that we should do before this release, please assign to the milestone

Add examples to help

Tankred, it couldn't hurt to have a few more examples in the help files.

Example files should be in inst/examples/functionNameHelp.R, and then linked in Roxygen.

Examples should run fast, or set to skip on CRAN

Should we continue support for M, AM, DRAM etc?

Those samplers are redundant because they can all be emulated by Metropolis - should we remove them?

Include PED?

See http://stats.stackexchange.com/questions/242564/dic-vs-ped-in-jags

combineChains

combineChains function seems not existing.
even though the function is in the help.

Add reverse suggests / improves

parallel tempering

test https://en.wikipedia.org/wiki/Parallel_tempering

Large amounts of warnings in RStudio Check

There are many warnings in RStudio Check that need to be resolved for CRAN.

Mostly, these are import problems with

no visible global function definition for ...

see solution here http://stackoverflow.com/questions/31132552/no-visible-global-function-definition-for-median

Inconsistency with iter and burnin

For the population MCMCs (DE / DREAM), iter is divided by the population size, so that iterations = 3000 with DEzs will run 3 chains a 1000 iterations.

For the burn-in argument, however, this is not the case, so, for example, iterations = 3000 and burn-in = 2000 produces an error for DEzs.

I would suggest to adapt the burnin to behave in the same way as iterations

Create suite of test functions

It would be useful to have a better suite of test functions, which cover both

a) typical optimization / MCMC problems, such as https://en.wikipedia.org/wiki/Test_functions_for_optimization , https://de.mathworks.com/matlabcentral/fileexchange/23147-test-functions-for-global-optimization-algorithms

b) typical problems in the likelihood, i.e. model crashes, NA, plateaus,

examples dir

Non-standard file/directory found at top level:

  ‘examples’

move one level up / outside the package?

Benchmark against spBayes

Benchmark against the sampler of https://cran.r-project.org/web/packages/spBayes/index.html

CRAN release

See checklist https://github.com/florianhartig/DHARMa/blob/master/DHARMa/util/CRAN.md

numSamples rounding

getSample(numSamples = 50) will deliver something > 50, depending on the combinations of nrChains and internal chains, as we ceil the number if we have to divide 50 / chains / internalChains ... we could just leave it like that, I'm just wondering if a user would ever rely on the fact to get exactly 50 back, and if we should thus drop the excess samples at the end?

interations = untilConvergence

I got a few comments that it would be nice to set iterations until convergence in runMCMC.

I guess though that this would require an outer wrapper in runMCMC to stop, check convergence, and restart for n chains.

What do you guys think?

Move numSamples to external function

Error in testthat

For some reason I get an error in the following code:

  ll = function(x, sum = TRUE) {
    if(sum) sum(dnorm(x, log = T))
    else dnorm(x, log = T)
  }
  
  setup = createBayesianSetup(ll, lower = c(-10,-10), upper = c(10,10), parallel = 2)

 out <- runMCMC(setup, sampler = "DEzs", settings = list(iterations = 1000))

Yet e.g. with the test functions of the package parallelization works fine:

ll <- generateTestDensityMultiNormal(sigma = "no correlation")
setup <- createBayesianSetup(likelihood = ll, lower = rep(-10, 3), 
                                     upper = rep(10, 3), parallel = 2)

out <- runMCMC(setup, sampler = "DEzs", settings = list(iterations = 1000))

@florianhartig Do you have any idea why this behaves like this?
I also tested other parallel setups (e.g. VSEM as in the example file) and they worked fine.

In Travis this is also only a problem in R:oldrel and R:devel.
https://travis-ci.org/florianhartig/BayesianTools/builds/193218693

florianhartig / bayesiantools Goto Github PK

bayesiantools's People

Stargazers

Watchers

Forkers

bayesiantools's Issues

Recommend Projects

Recommend Topics

Recommend Org