ropensci / stantargets

Reproducible Bayesian data analysis pipelines with targets and cmdstanr

Home Page: https://docs.ropensci.org/stantargets

License: Other

R 98.60% Stan 0.07% TeX 1.33%

r rstats reproducibility high-performance-computing stan bayesian statistics targets make rstats-package

stantargets's Introduction

stantargets

Bayesian data analysis usually incurs long runtimes and cumbersome custom code, and the process of prototyping and deploying custom Stan models can become a daunting software engineering challenge. To ease this burden, the stantargets R package creates Stan pipelines that are concise, efficient, scalable, and tailored to the needs of Bayesian statisticians. Leveraging targets, stantargets pipelines automatically parallelize the computation and skip expensive steps when the results are already up to date. Minimal custom user-side code is required, and there is no need to manually configure branching, so stantargets is easier to use than targets and cmdstanr directly. stantargets can access all of cmdstanr's major algorithms (MCMC, variational Bayes, and optimization), and it supports both single-fit workflows and multi-rep simulation studies.

Prerequisites

The prerequisites of the targets R package.
Basic familiarity with targets: watch minutes 7 through 40 of this video, then read this chapter of the user manual.
Familiarity with Bayesian statistics and Stan. Prior knowledge of cmdstanr helps.

How to get started

Read the stantargets introduction and simulation vignettes, and use https://docs.ropensci.org/stantargets/ as a reference while constructing your own workflows. Visit https://github.com/wlandau/stantargets-example-validation for an example project based on the simulation vignette. The example has an RStudio Cloud workspace which allows you to run the project in a web browser.

Example projects

Description	Link
Validating a minimal Stan model	https://github.com/wlandau/targets-stan
Using Target Markdown and `stantargets` to validate a Bayesian longitudinal model for clinical trial data analysis	https://github.com/wlandau/rmedicine2021-pipeline

Installation

Install the GitHub development version to access the latest features and patches.

remotes::install_github("ropensci/stantargets")

The CmdStan command line interface is also required.

cmdstanr::install_cmdstan()

If you have problems installing CmdStan, please consult the installation guide of cmdstanr and the installation guide of CmdStan. Alternatively, the Stan discourse is a friendly place to ask Stan experts for help.

Usage

First, write a _targets.R file that loads your packages, defines a function to generate Stan data, and lists a pipeline of targets. The target list can call target factories like tar_stan_mcmc() as well as ordinary targets with tar_target(). The following minimal example is simple enough to contain entirely within the _targets.R file, but for larger projects, you may wish to store functions in separate files as in the targets-stan example.

# _targets.R
library(targets)
library(stantargets)

# Simulate a small regression dataset with n observations.
generate_data <- function(n = 10) {
  true_beta <- stats::rnorm(n = 1, mean = 0, sd = 1)
  x <- seq(from = -1, to = 1, length.out = n)
  y <- stats::rnorm(n, x * true_beta, 1)
  list(n = n, x = x, y = y, true_beta = true_beta)
}

list(
  tar_stan_mcmc(
    name = example,
    stan_files = "x.stan",
    data = generate_data()
  )
)
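
The pipeline above expects a Stan model file named x.stan next to _targets.R, and this page does not show that file. As a minimal sketch, assuming a one-parameter regression whose data block matches the list returned by generate_data() (the model body is an illustrative assumption, not necessarily the file used in the vignette), the file could be written from R like this:

```r
# Hypothetical contents for x.stan: a regression with a single slope `beta`.
# The data block declares every element returned by generate_data().
stan_code <- c(
  "data {",
  "  int<lower=1> n;",
  "  vector[n] x;",
  "  vector[n] y;",
  "  real true_beta;",
  "}",
  "parameters {",
  "  real beta;",
  "}",
  "model {",
  "  y ~ normal(x * beta, 1);",
  "  beta ~ normal(0, 1);",
  "}"
)

# Write the model to the working directory so stan_files = "x.stan" resolves.
writeLines(stan_code, "x.stan")
```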

Run tar_visnetwork() to check _targets.R for correctness, then call tar_make() to run the pipeline. Access the results using tar_read(), e.g. tar_read(example_summary_x). Visit the introductory vignette to read more about this example.

How it works behind the scenes

stantargets supports specialized target factories that create ensembles of target objects for cmdstanr workflows. These target factories abstract away the details of targets and cmdstanr and make both packages easier to use. For details, please read the introductory vignette.

Help

Please first read the help guide to learn how best to ask for help.

If you have trouble using stantargets, you can ask for help in the GitHub discussions forum. Because the purpose of stantargets is to combine targets and cmdstanr, your issue may have something to do with one of the latter two packages, a dependency of targets, or Stan itself. When you troubleshoot, peel back as many layers as possible to isolate the problem. For example, if the issue comes from cmdstanr, create a reproducible example that directly invokes cmdstanr without invoking stantargets. The GitHub discussion and issue forums of those packages, as well as the Stan discourse, are great resources.
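
As a rough sketch of "peeling back layers", the following calls cmdstanr directly with no targets or stantargets code involved. It assumes CmdStan is installed and that a model file x.stan with data fields n, x, y, and true_beta exists in the working directory; the data values are made up for illustration.

```r
library(cmdstanr)

# Compile the same Stan file the pipeline uses, bypassing stantargets.
model <- cmdstan_model("x.stan")

# A small fixed dataset keeps the example quick and reproducible.
set.seed(1)
n <- 10
x <- seq(from = -1, to = 1, length.out = n)
data <- list(n = n, x = x, y = rnorm(n, x * 0.5, 1), true_beta = 0.5)

# If this call alone reproduces the error, the problem most likely
# lies in cmdstanr or Stan rather than in stantargets.
fit <- model$sample(data = data, seed = 1, chains = 1)
fit$summary()
```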

Participation

Development is a community effort, and we welcome discussion and contribution. Please note that this package is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.

Citation

citation("stantargets")
#> 
#> To cite stantargets in publications use:
#> 
#>   Landau, W. M., (2021). The stantargets R package: a workflow
#>   framework for efficient reproducible Stan-powered Bayesian data
#>   analysis pipelines. Journal of Open Source Software, 6(60), 3193,
#>   https://doi.org/10.21105/joss.03193
#> 
#> A BibTeX entry for LaTeX users is
#> 
#>   @Article{,
#>     title = {The stantargets {R} package: a workflow framework for efficient reproducible {S}tan-powered {B}ayesian data analysis pipelines},
#>     author = {William Michael Landau},
#>     journal = {Journal of Open Source Software},
#>     year = {2021},
#>     volume = {6},
#>     number = {60},
#>     pages = {3193},
#>     url = {https://doi.org/10.21105/joss.03193},
#>   }

stantargets's Issues

Accept multiple models

Prework

I understand and agree to the code of conduct and contributing guidelines.
If there is already a relevant issue, whether open or closed, comment on the existing thread instead of posting a new issue.
New features take time and effort to create, and they take even more effort to maintain. So if the purpose of the feature is to resolve a struggle you are encountering personally, please consider first posting a "trouble" or "other" issue so we can discuss your use case and search for existing solutions first.

Proposal

It is often desirable to run multiple models on the same dataset. Let's accept multiple Stan files and turn them into the appropriate targets.
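
A sketch of what this proposal could look like from the user's side. It assumes that stan_files accepts a character vector and that output targets are suffixed with each model's name (in the way example_summary_x is named in the README); y.stan is a hypothetical second model that reads the same data.

```r
# _targets.R
library(targets)
library(stantargets)

generate_data <- function(n = 10) {
  true_beta <- stats::rnorm(n = 1, mean = 0, sd = 1)
  x <- seq(from = -1, to = 1, length.out = n)
  y <- stats::rnorm(n, x * true_beta, 1)
  list(n = n, x = x, y = y, true_beta = true_beta)
}

list(
  tar_stan_mcmc(
    name = example,
    # Two models fit to one shared dataset; y.stan is hypothetical.
    stan_files = c("x.stan", "y.stan"),
    data = generate_data()
  )
)
```

Under those assumptions, the results would be read with tar_read(example_summary_x) and tar_read(example_summary_y).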

R CMD check error

Both in this repo and seen on https://github.com/r-universe/ropensci/actions/runs/2365300041 via https://ropensci.r-universe.dev/ui#builds

Noting this in case it's relevant.

Separate file logs for stdout and stderr

Prework

I understand and agree to the code of conduct and contributing guidelines.
If there is already a relevant issue, whether open or closed, comment on the existing thread instead of posting a new issue.
New features take time and effort to create, and they take even more effort to maintain. So if the purpose of the feature is to resolve a struggle you are encountering personally, please consider first posting a "trouble" or "other" issue so we can discuss your use case and search for existing solutions first.

Proposal

This would replace the log argument, which only applies to stdout.

More tar_stan_mcmc_rep*() functions

Prework

I understand and agree to the code of conduct and contributing guidelines.
If there is already a relevant issue, whether open or closed, comment on the existing thread instead of posting a new issue.
New features take time and effort to create, and they take even more effort to maintain. So if the purpose of the feature is to resolve a struggle you are encountering personally, please consider first posting a "trouble" or "other" issue so we can discuss your use case and search for existing solutions first.

Proposal

We currently have tar_stan_mcmc_rep_summary() to get multiple $summary()s, but users may want to return the actual draws or sampler diagnostics from simulation studies.
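
For context, a minimal sketch of the existing summary-only workflow. The batches, reps, and data_copy argument names reflect my understanding of the function's signature and should be treated as assumptions; x.stan is the model file from the README example.

```r
# _targets.R
library(targets)
library(stantargets)

generate_data <- function(n = 10) {
  true_beta <- stats::rnorm(n = 1, mean = 0, sd = 1)
  x <- seq(from = -1, to = 1, length.out = n)
  y <- stats::rnorm(n, x * true_beta, 1)
  list(n = n, x = x, y = y, true_beta = true_beta)
}

list(
  tar_stan_mcmc_rep_summary(
    name = example,
    stan_files = "x.stan",
    data = generate_data(),   # re-run to simulate a new dataset for each rep
    batches = 2,              # assumed: number of branch targets
    reps = 5,                 # assumed: simulated datasets per batch
    data_copy = "true_beta"   # assumed: scalar copied into each summary row
  )
)
```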

Customized summaries

Prework

I understand and agree to the code of conduct and contributing guidelines.
If there is already a relevant issue, whether open or closed, comment on the existing thread instead of posting a new issue.
New features take time and effort to create, and they take even more effort to maintain. So if the purpose of the feature is to resolve a struggle you are encountering personally, please consider first posting a "trouble" or "other" issue so we can discuss your use case and search for existing solutions first.

Proposal

Use the ... argument. At the very least, users should be able to set credible levels.
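
To make "set credible levels" concrete, here is a sketch that uses the posterior package directly rather than stantargets. It assumes posterior is installed and uses its bundled example draws; how such summary functions would be forwarded through stantargets is left open here.

```r
library(posterior)

# Example draws that ship with the posterior package.
draws <- example_draws("eight_schools")

# Request the posterior mean plus a 95% interval instead of the
# default 90% interval produced by quantile2().
interval_summary <- summarise_draws(
  draws,
  "mean",
  ~quantile2(.x, probs = c(0.025, 0.975))
)
interval_summary
```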

optimization

Prework

I understand and agree to the code of conduct and contributing guidelines.
If there is already a relevant issue, whether open or closed, comment on the existing thread instead of posting a new issue.
New features take time and effort to create, and they take even more effort to maintain. So if the purpose of the feature is to resolve a struggle you are encountering personally, please consider first posting a "trouble" or "other" issue so we can discuss your use case and search for existing solutions first.

Proposal

Let's think about target archetypes for optimization. A tar_stan_mle() function could run optimization and return a CmdStanMLE object, as well as define targets for various outputs and summaries. tar_stan_mle_rep*() functions could iterate over multiple simulated datasets and produce summaries.

R Markdown error pkgdown

From https://ropensci.r-universe.dev/builds

I have nothing more specific than

 Quitting from lines 434-435 [unnamed-chunk-27] (introduction.Rmd)
--------------------------------------------------------------------------------
---

Does it ring a bell? 😅 If not happy to help investigate.

Set up discussions

Prework

Read and agree to the code of conduct and contributing guidelines.
If there is already a relevant issue, whether open or closed, comment on the existing thread instead of posting a new issue.
For any problem you identify, post a minimal reproducible example like this one so the maintainer can troubleshoot. A reproducible example is:
- Runnable: post enough R code and data so any onlooker can create the error on their own computer.
- Minimal: reduce runtime wherever possible and remove complicated details that are irrelevant to the issue at hand.
- Readable: format your code according to the tidyverse style guide.

Description

Reconfigure issue templates to offload to the discussions: https://github.com/ropensci/targets/tree/main/.github/ISSUE_TEMPLATE
Edit the contributing guide, e.g. ropensci/targets@ae50ff0
Customize issue categories: https://github.com/ropensci/targets/discussions

Suppress more output if quiet is TRUE

Prework

Read and agree to the code of conduct and contributing guidelines.
If there is already a relevant issue, whether open or closed, comment on the existing thread instead of posting a new issue.
For any problem you identify, post a minimal reproducible example so the maintainer can troubleshoot. A reproducible example is:
- Runnable: post enough R code and data so any onlooker can create the error on their own computer.
- Minimal: reduce runtime wherever possible and remove complicated details that are irrelevant to the issue at hand.
- Readable: format your code according to the tidyverse style guide.

Description

Some distracting messages are still getting through.

issue labels

Add a forced compilation target if compile = "original"

Prework

I understand and agree to the code of conduct and contributing guidelines.
If there is already a relevant issue, whether open or closed, comment on the existing thread instead of posting a new issue.
New features take time and effort to create, and they take even more effort to maintain. So if the purpose of the feature is to resolve a struggle you are encountering personally, please consider first posting a "trouble" or "other" issue so we can discuss your use case and search for existing solutions first.

Proposal

That should keep us safe from race conditions in parallel computing use cases where the workers share the same models. The key will be to force compilation to override the decisions that cmdstanr makes.

Add Pathfinder fitting method

Prework

[X] I understand and agree to the help guide.
[X] I understand and agree to the contributing guidelines.
[X] New features take time and effort to create, and they take even more effort to maintain. So if the purpose of the feature is to resolve a struggle you are encountering personally, please consider first posting a "trouble" or "other" issue so we can discuss your use case and search for existing solutions first.

Proposal

cmdstanr recently added a new fitting method called Pathfinder. I've searched the repo and can't find any mention of this method, so I don't think it's possible to use stantargets with the pathfinder method.

It would be great to add a tar_stan_pathfinder() (or similar) to enable this functionality in stantargets. I think this would be a fairly minor edit of https://github.com/ropensci/stantargets/blob/main/R/tar_stan_vb.R as the inputs and outputs are similar to tar_stan_vb(). I could potentially take this on at some point, but not for the next few weeks.

Thanks

Love the package, thanks @wlandau!

Intermittent segfaults in tar_stan_mcmc_rep()

Not sure why it's happening. Seems to come up in the vignettes/mcmc_rep.Rmd.

Integration with the Laplace method

https://mc-stan.org/cmdstanr/reference/model-method-laplace.html

variational inference

Prework

I understand and agree to the code of conduct and contributing guidelines.
If there is already a relevant issue, whether open or closed, comment on the existing thread instead of posting a new issue.
New features take time and effort to create, and they take even more effort to maintain. So if the purpose of the feature is to resolve a struggle you are encountering personally, please consider first posting a "trouble" or "other" issue so we can discuss your use case and search for existing solutions first.

Proposal

Let's think about target archetypes for variational inference. tar_stan_vi() could act like tar_stan_mcmc(), producing a CmdStanFit object and other targets with various output artifacts. Similarly, we could have replication-friendly functions like tar_stan_vi_rep_summary() to output a bunch of summaries over simulated datasets.

generated quantities

Prework

I understand and agree to the code of conduct and contributing guidelines.
If there is already a relevant issue, whether open or closed, comment on the existing thread instead of posting a new issue.
New features take time and effort to create, and they take even more effort to maintain. So if the purpose of the feature is to resolve a struggle you are encountering personally, please consider first posting a "trouble" or "other" issue so we can discuss your use case and search for existing solutions first.

Proposal

Let's think about target archetypes for generated quantities. A tar_stan_gq() function could act similarly to tar_stan_mcmc(), defining a target that accepts a CmdStanMCMC object and some data and returns posterior predictive samples. If we want a multi-rep version of this for simulation studies, we should also implement a tar_stan_mcmc_rep_mcmc() method to give us a dynamic list of CmdStanMCMC objects.

Automatically join non-scalars from data to summaries

Prework

I understand and agree to the code of conduct and contributing guidelines.
If there is already a relevant issue, whether open or closed, comment on the existing thread instead of posting a new issue.
New features take time and effort to create, and they take even more effort to maintain. So if the purpose of the feature is to resolve a struggle you are encountering personally, please consider first posting a "trouble" or "other" issue so we can discuss your use case and search for existing solutions first.

Proposal

Should really help with simulation-based validation techniques. Currently, data_copy is limited to scalars. If we instead allow a special .join_data list in the Stan dataset and automatically join its contents next to the analogously named variables in the summary output, we can check non-scalars too. How it works in jagstargets: https://wlandau.github.io/jagstargets/articles/mcmc_rep.html

pkgdown site build error

Prework

Read and agree to the code of conduct and contributing guidelines.
Confirm that your issue is most likely a genuine bug in stantargets and not a known limitation, usage error, or bug in another package that stantargets depends on.
If there is already a relevant issue, whether open or closed, comment on the existing thread instead of posting a new issue.
[N/A] Post a minimal reproducible example like this one so the maintainer can troubleshoot the problems you identify. A reproducible example is:
- [N/A] Runnable: post enough R code and data so any onlooker can create the error on their own computer.
- [N/A] Minimal: reduce runtime wherever possible and remove complicated details that are irrelevant to the issue at hand.
- [N/A] Readable: format your code according to the tidyverse style guide.

Description

The pkgdown site of stantargets appears to have trouble building in Jenkins: https://dev.ropensci.org/blue/organizations/jenkins/stantargets/detail/stantargets/55/pipeline. @jeroen, anything in particular I should fix on my end?

Update with tarchetypes CRAN feedback

Prework

Read and agree to the code of conduct and contributing guidelines.
If there is already a relevant issue, whether open or closed, comment on the existing thread instead of posting a new issue.
For any problems you identify, post a minimal reproducible example like this one so the maintainer can troubleshoot. A reproducible example is:
- Runnable: post enough R code and data so any onlooker can create the error on their own computer.
- Minimal: reduce runtime wherever possible and remove complicated details that are irrelevant to the issue at hand.
- Readable: format your code according to the tidyverse style guide.

Description

use tempfiles more
ensure Rd files have @return explaining target objects.

Speed up testing

Prework

Read and agree to the code of conduct and contributing guidelines.
If there is already a relevant issue, whether open or closed, comment on the existing thread instead of posting a new issue.
For any problems you identify, post a minimal reproducible example like this one so the maintainer can troubleshoot. A reproducible example is:
- Runnable: post enough R code and data so any onlooker can create the error on their own computer.
- Minimal: reduce runtime wherever possible and remove complicated details that are irrelevant to the issue at hand.
- Readable: format your code according to the tidyverse style guide.

Description

Before any of the main tests, compile the example model once and stash it in tempdir().
Copy the compiled model to the testing directory.
Add an environment variable to skip tests with compile = "copy".
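
The three steps above could be sketched in base R as follows. The helper names, the environment variable EXAMPLE_SKIP_SLOW_TESTS, and the stand-in "compiler" are all hypothetical; a real test suite would compile the Stan model where the placeholder file is written.

```r
# Hypothetical helper: build an expensive artifact once per R session,
# keep it in tempdir(), and copy it wherever a test needs it.
cached_copy <- function(name, build, dest_dir) {
  cache <- file.path(tempdir(), name)
  if (!file.exists(cache)) {
    build(cache) # e.g. compile the example Stan model to this path
  }
  dir.create(dest_dir, showWarnings = FALSE, recursive = TRUE)
  dest <- file.path(dest_dir, name)
  file.copy(cache, dest, overwrite = TRUE)
  dest
}

# Hypothetical switch: returns TRUE when slow tests should be skipped.
skip_slow_tests <- function() {
  identical(Sys.getenv("EXAMPLE_SKIP_SLOW_TESTS"), "true")
}

# Demonstration with a stand-in compiler that only writes a placeholder file.
builds <- 0
fake_compile <- function(path) {
  builds <<- builds + 1
  writeLines("compiled model placeholder", path)
}
first <- cached_copy("example_model", fake_compile, file.path(tempdir(), "test1"))
second <- cached_copy("example_model", fake_compile, file.path(tempdir(), "test2"))
```

In a fresh R session, the second call reuses the cached file, so the stand-in compiler runs only once.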

New replication functions for generated quantities

Prework

I understand and agree to the code of conduct and contributing guidelines.
If there is already a relevant issue, whether open or closed, comment on the existing thread instead of posting a new issue.
New features take time and effort to create, and they take even more effort to maintain. So if the purpose of the feature is to resolve a struggle you are encountering personally, please consider first posting a "trouble" or "other" issue so we can discuss your use case and search for existing solutions first.

Proposal

tar_stan_gq_rep_draws() and tar_stan_gq_rep_summary(). Should accept one fitted_parameters object and multiple models.

GitHub interactions are temporarily limited because the maintainer is out of office.

Vacation mode

When this issue is open, vacation mode is turned on. That means GitHub interactions are temporarily limited, so users cannot open or comment on issues or discussions until I return and re-enable interactions (see the return date below). When this issue is closed, vacation mode is turned off and interactions are re-enabled.

Thanks

Vacation mode helps me rest because it prevents tasks from piling up in my absence. Thank you for your patience and understanding.

Day of my return

Already returned.

Vacation mode source code

https://github.com/wlandau/dotfiles/blob/main/github/vacation.R

Terminology: "SBC checking"

cf. openpharma/brms.mmrm#91

Enable existing cloud integration via targets

Prework

I understand and agree to the code of conduct and contributing guidelines.
If there is already a relevant issue, whether open or closed, comment on the existing thread instead of posting a new issue.
New features take time and effort to create, and they take even more effort to maintain. So if the purpose of the feature is to resolve a struggle you are encountering personally, please consider first posting a "trouble" or "other" issue so we can discuss your use case and search for existing solutions first.

Proposal

From @mbjoseph in #37: tar_stan_mcmc() etc. have no exposed format argument, so cloud storage is not supported. On the other hand, we don't want to completely expose formats because we want to make efficient choices for several targets automatically on behalf of the user. So we could support a simplified abstraction that chooses to store locally or to a cloud provider (currently only AWS S3).

Use cmdstanr built-in examples

Prework

Read and agree to the code of conduct and contributing guidelines.
If there is already a relevant issue, whether open or closed, comment on the existing thread instead of posting a new issue.
For any problem you identify, post a minimal reproducible example so the maintainer can troubleshoot. A reproducible example is:
- Runnable: post enough R code and data so any onlooker can create the error on their own computer.
- Minimal: reduce runtime wherever possible and remove complicated details that are irrelevant to the issue at hand.
- Readable: format your code according to the tidyverse style guide.

Description

Replace tar_stan_file_example() and possibly tar_stan_data_example() with cmdstanr examples where possible.

tar_stan_compile fails with default cloud repository

Prework

Read and agree to the code of conduct and contributing guidelines.
Confirm that your issue is most likely a genuine bug in stantargets and not a known limitation, usage error, or bug in another package that stantargets depends on.
If there is already a relevant issue, whether open or closed, comment on the existing thread instead of posting a new issue.
Post a minimal reproducible example like this one so the maintainer can troubleshoot the problems you identify. A reproducible example is:
- Runnable: post enough R code and data so any onlooker can create the error on their own computer.
- Minimal: reduce runtime wherever possible and remove complicated details that are irrelevant to the issue at hand.
- Readable: format your code according to the tidyverse style guide.

Description

tar_stan_compile does not include a repository argument. When the default repository is either "aws" or "gcp", calls fail with the error "_store_ the condition has length > 1".

Expected result

Model compiles successfully & output is stored.

I believe the intention is for the tar_target_raw within tar_stan_compile to use repository = "local" (like the internal calls for name_file in tar_stan_mcmc). Adding this argument to the relevant tar_target_raw fixes the problem, yet I can't quite work out why it shouldn't work with tar_target_raw autodetecting the repository, so I've not submitted the PR that would fix it, for fear that I haven't solved the root cause.
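
To illustrate the kind of change described, here is a standalone sketch of a file target pinned to local storage. It is not the actual stantargets source; the target name and command are placeholders, and only the repository = "local" argument reflects the fix discussed above.

```r
library(targets)

# Placeholder file target: tracks model.stan on local disk even when the
# pipeline-wide default repository is a cloud provider.
model_file <- tar_target_raw(
  name = "model_file",
  command = quote(file.path("model.stan")),
  format = "file",
  repository = "local"
)
```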

Diagnostic information

Stantargets: 092b0b9
Targets: ropensci/targets@f37af16

Reproducible example

library(targets)
tar_script({
    stanfile <- "model.stan"
    tar_option_set(
        resources = tar_resources(
            gcp = tar_resources_gcp(bucket = "my-gcp-bucket"), 
            aws = tar_resources_aws(bucket = "my-aws-bucket")
        ), 
        cue = tar_cue(mode = "always"), 
        format = "qs", 
        repository = "gcp"  # Same error with "aws"
    )
    cmdstanr::write_stan_file("\n        data {\n          int<lower=1> N;\n          vector[N] x;\n          vector[N] y;\n        }\n\n        parameters {\n          real alpha;\n          real beta;\n          real<lower=0> sigma;\n        }\n\n        model {\n          alpha ~ std_normal();\n          beta ~ std_normal();\n          sigma ~ std_normal();\n          y ~ normal(alpha + x * beta , sigma);\n        }\n    ", 
        dir = getwd(), 
        basename = stanfile)
    list(
        stantargets::tar_stan_compile(
            model, 
            stanfile
        )
    )
})
tar_make()
#> • start target model
#> ✖ error target model
#> • end pipeline: 8.632 seconds
#> Error : _store_ the condition has length > 1
#> ✖ Problem with the pipeline.
#> Error:
#> ! problem with the pipeline.

Created on 2022-08-02 by the reprex package (v2.0.1)

R version 4.2.0 (2022-04-22)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)

Matrix products: default
BLAS:   /apps/R/4.2.0/lib64/R/lib/libRblas.so
LAPACK: /apps/R/4.2.0/lib64/R/lib/libRlapack.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8    LC_PAPER=en_US.UTF-8      
 [8] LC_NAME=C                  LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] stantargets_0.0.5.9000         fstcore_0.9.12                 testthat_3.1.4                 future.batchtools_0.10.0-9000  batchtools_0.9.16              future_1.26.1                 
 [7] tarchetypes_0.6.0.9000         rlang_1.0.4                    googleCloudStorageR_0.7.0.9000 dplyr_1.0.9                    targets_0.12.1.9000           

loaded via a namespace (and not attached):
  [1] backports_1.4.1        bigrquery_1.4.0        googleAuthR_2.0.0.9000 igraph_1.3.2           paws.common_0.3.17     lazyeval_0.2.2         RApiSerialize_0.1.0    listenv_0.8.0         
  [9] usethis_2.1.6          ggplot2_3.3.6          inline_0.3.19          digest_0.6.29          htmltools_0.5.2        fansi_1.0.3            magrittr_2.0.3         checkmate_2.1.0       
 [17] memoise_2.0.1          covr_3.5.1             base64url_1.4          tzdb_0.3.0             remotes_2.4.2          globals_0.15.1         readr_2.1.2            RcppParallel_5.1.5    
 [25] matrixStats_0.62.0     R.utils_2.12.0         askpass_1.1            prettyunits_1.1.1      colorspace_2.0-3       rappdirs_0.3.3         xfun_0.31              callr_3.7.1           
 [33] crayon_1.5.1           jsonlite_1.8.0         roxygen2_7.2.0         stringfish_0.15.7      brew_1.0-7             glue_1.6.2             xmlparsedata_1.0.5     gtable_0.3.0          
 [41] gargle_1.2.0           V8_4.2.0               distributional_0.3.0   R.cache_0.15.0         pkgbuild_1.3.1         rstan_2.26.13          abind_1.4-5            scales_1.2.0          
 [49] DBI_1.1.3              Rcpp_1.0.9             progress_1.2.2         bit_4.0.4              praise_1.0.0           clisymbols_1.2.0       stats4_4.2.0           StanHeaders_2.26.13   
 [57] xopen_1.0.0            rex_1.2.1              httr_1.4.3             posterior_1.2.2        ellipsis_0.3.2         pkgconfig_2.0.3        loo_2.5.1              R.methodsS3_1.8.2     
 [65] farver_2.1.1           dbplyr_2.2.1           utf8_1.2.2             tidyselect_1.1.2       labeling_0.4.2         munsell_0.5.0          tools_4.2.0            cachem_1.0.6          
 [73] cli_3.3.0              generics_0.1.3         devtools_2.4.3         evaluate_0.15          stringr_1.4.0          fastmap_1.1.0          paws_0.1.12            yaml_2.3.5            
 [81] processx_3.7.0         knitr_1.39             bit64_4.0.5            fs_1.5.2               zip_2.2.0              purrr_0.3.4            mime_0.12              R.oo_1.25.0           
 [89] xml2_1.3.3             brio_1.1.3             compiler_4.2.0         rstudioapi_0.13        curl_4.3.2             reprex_2.0.1           tibble_3.1.7           stringi_1.7.8         
 [97] highr_0.9              cyclocomp_1.1.0        ps_1.7.1               desc_1.4.1.9000        whoami_1.3.0           styler_1.7.0           paws.storage_0.1.12    commonmark_1.8.0      
[105] goodpractice_1.0.3     tensorA_0.36.2         vctrs_0.4.1            pillar_1.7.0           lifecycle_1.0.1        data.table_1.14.2      rcmdcheck_1.4.0        R6_2.5.1              
[113] qs_0.25.3              gridExtra_2.3          parallelly_1.32.0      sessioninfo_1.2.2      codetools_0.2-18       assertthat_0.2.1       pkgload_1.3.0          openssl_2.0.2         
[121] rprojroot_2.0.3        withr_2.5.0            parallel_4.2.0         hms_1.1.1              lintr_3.0.0            fst_0.9.8              grid_4.2.0             tidyr_1.2.0           
[129] waldo_0.4.0            rmarkdown_2.14         cmdstanr_0.5.2.1       lubridate_1.8.0

Include runnable pipelines in all help file examples

Prework

Read and agree to the code of conduct and contributing guidelines.
If there is already a relevant issue, whether open or closed, comment on the existing thread instead of posting a new issue.
For any problem you identify, post a minimal reproducible example so the maintainer can troubleshoot. A reproducible example is:
- Runnable: post enough R code and data so any onlooker can create the error on their own computer.
- Minimal: reduce runtime wherever possible and remove complicated details that are irrelevant to the issue at hand.
- Readable: format your code according to the tidyverse style guide.

Description

Each example should set up a pipeline and run it. Enclose it in an if(FALSE) { } block to reduce check time.
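
A sketch of what such a help file example might look like. The helper names tar_stan_example_file() and tar_stan_example_data() are assumptions about the package API (another issue on this page refers to them as tar_stan_file_example() and tar_stan_data_example()), so adjust them to whatever the installed version exports.

```r
# Guarded so the example is parsed during checks but never executed.
if (FALSE) {
  targets::tar_dir({ # run inside a temporary directory
    # Assumed helper: writes a small example Stan model to `path`.
    stantargets::tar_stan_example_file(path = "x.stan")
    targets::tar_script({
      library(stantargets)
      list(
        tar_stan_mcmc(
          name = example,
          stan_files = "x.stan",
          # Assumed helper: simulates data matching the example model.
          data = tar_stan_example_data()
        )
      )
    })
    targets::tar_make()
  })
}
```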

Incorporate jagstargets review feedback

Prework

Read and agree to the code of conduct and contributing guidelines.
If there is already a relevant issue, whether open or closed, comment on the existing thread instead of posting a new issue.
For any problems you identify, post a minimal reproducible example like this one so the maintainer can troubleshoot. A reproducible example is:
- Runnable: post enough R code and data so any onlooker can create the error on their own computer.
- Minimal: reduce runtime wherever possible and remove complicated details that are irrelevant to the issue at hand.
- Readable: format your code according to the tidyverse style guide.

Description

ropensci/software-review#425

Expose .cores argument for summary

Prework

I understand and agree to the code of conduct and contributing guidelines.
If there is already a relevant issue, whether open or closed, comment on the existing thread instead of posting a new issue.
New features take time and effort to create, and they take even more effort to maintain. So if the purpose of the feature is to resolve a struggle you are encountering personally, please consider first posting a "trouble" or "other" issue so we can discuss your use case and search for existing solutions first.

Proposal

posterior::summarise_draws has a .cores option which offers big speedups for summarising larger posteriors. Currently it is not accessible via summary_args.

It'd be very helpful to expose a summary_cores option to tar_stan_summary and tar_stan_mcmc, which could default to 1 to match posterior::summarise_draws defaults. In my fork I've included parallel::detectCores triggered by summary_cores = NULL. I was half way through submitting it as a PR, but saw the prework so raised here first. Currently tested with tar_make_future on slurm (need to set appropriate ncpus in the default resources if called from tar_stan_mcmc).
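
A minimal sketch of the idea, using posterior directly. The resolve_cores() helper and the summary_cores name are hypothetical stand-ins for the proposed argument, and this is not the code from the fork mentioned above.

```r
library(posterior)

# Hypothetical resolution of a `summary_cores` argument:
# NULL means "detect", anything else is used as given.
resolve_cores <- function(summary_cores = 1L) {
  if (is.null(summary_cores)) {
    detected <- parallel::detectCores()
    if (is.na(detected)) 1L else as.integer(detected)
  } else {
    as.integer(summary_cores)
  }
}

draws <- example_draws("eight_schools")

# .cores = 1 matches the summarise_draws() default; larger values
# spread the per-variable summaries across processes.
core_summary <- summarise_draws(draws, .cores = resolve_cores(1L))
```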

Happy to submit it as a PR if you'd like, but you may foresee problems that I can't.

Thanks

Integration with brms

Prework

Read and agree to the code of conduct and contributing guidelines.
If there is already a relevant issue, whether open or closed, comment on the existing thread instead of posting a new issue.
Post a minimal reproducible example so the maintainer can troubleshoot the problems you identify. A reproducible example is:
- Runnable: post enough R code and data so any onlooker can create the error on their own computer.
- Minimal: reduce runtime wherever possible and remove complicated details that are irrelevant to the issue at hand.
- Readable: format your code according to the tidyverse style guide.

Description

Hi Will - I've been a user of drake for half a year now, and am just starting to try out targets so was excited to see your post on the Stan discourse about stantargets.

I'm testing stantargets as part of a Bayesian workflow [Gelman et al. 2020], for which I was interested in the model specification being part of the pipeline: specifically, integrating brms's ability to convert a formula into Stan code (brms::make_stancode) and to appropriately shape the data (brms::make_standata).

Intuitively this seemed like it should fit well with the targetopia/pipeline framework, but my attempt at an implementation had two conflicts with tar_stan_mcmc:

- The stan_files argument does not appear to accept a target as an argument - in the implementation below I'm defining this file path as a target so that it is shared between defining the model (brms::make_stancode) and running the model (tar_stan_mcmc).
- The data argument does not appear to support arguments of the class standata which are returned by calls to brms::make_standata.

I've indicated my current workarounds in the comments - I'd welcome your views as to their optimality, and any indications as to whether it'd be possible to support this type of workflow more intuitively, or alternatively views as to why this isn't desired!

Reproducible example

library(targets)

tar_script({
  library(targets)
  library(stantargets)
  
  options(crayon.enabled = FALSE)
  tar_option_set(
    packages = c("tibble", "brms"),
    memory = "transient",
    garbage_collection = TRUE
  )
  
  tar_pipeline(
    tar_target(
      data,
      tibble(x = rnorm(10), y = 0.5 * rnorm(10))
    ),
    tar_target(
      stan_model_path,
      "x.stan",
      format = "file"
    ),
    tar_target(
      stan_formula,
      brmsformula(formula = y ~ x)
    ),
    tar_target(
      stan_model,
      make_stancode(
        formula = stan_formula,
        data = data,
        save_model = stan_model_path)
    ),
    tar_target(
      stan_data,
      make_standata(
        formula = stan_formula,
        data = data
      )
    ),
    tar_stan_mcmc(
      example,
      # this fails - explicitly providing the path x.stan works, so long as the
      # stan_model target has run. But this incurs duplication/implicit dependency,
      # and breaks the pipeline workflow.
      stan_files = stan_model_path,
      # this fails - the data argument expects the object returned to be of
      # type list(), whereas brms::make_standata returns type standata
      data = stan_data,
      refresh = 0,
      init = 1,
      show_messages = FALSE,
      chains = 4,
      parallel_chains = 4,
      iter_warmup = 200,
      iter_sampling = 400
    )
  )
})

Created on 2020-12-17 by the reprex package (v0.3.0)
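For the second conflict, one possible workaround is to strip the standata class before the object reaches tar_stan_mcmc. The sketch below uses a mock object rather than a real brms::make_standata result, and it assumes that result is a named list carrying an extra "standata" class. The helper as_plain_list is hypothetical.

```r
# Mock of what brms::make_standata is assumed to return: a named list
# with an additional "standata" class (an assumption, not checked against brms).
mock_standata <- structure(
  list(N = 10L, Y = rnorm(10), K = 1L),
  class = c("standata", "list")
)

# Hypothetical helper: drop the class attribute but keep names and values.
as_plain_list <- function(x) {
  stopifnot(is.list(x))
  unclass(x)
}

plain <- as_plain_list(mock_standata)
class(plain)
```

In the pipeline, the helper could wrap the call inside the stan_data target, for example as_plain_list(make_standata(...)). Whether that is enough for tar_stan_mcmc to accept the data has not been verified here.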
