
flyaflya / causact

causact: R package to accelerate computational Bayesian inference workflows in R through interactive visualization of models and their output.

Home Page: http://causact.com

License: Other

R 95.76% TeX 3.91% Rez 0.34%
bayesian-inference dags posterior-probability probabilistic-graphical-models probabilistic-programming rstats-package

causact's Introduction

Badges: R-CMD-check, CRAN status, Codecov test coverage, Lifecycle: Maturing, DOI

causact

Accelerate computational Bayesian inference workflows in R through interactive modelling, visualization, and inference. The package leverages directed acyclic graphs (DAGs) to create a unified language for business stakeholders, statisticians, and programmers. Due to its visual nature and simple model construction, causact serves as a great entry point for newcomers to computational Bayesian inference.

The causact package supports both foundational and more advanced Bayesian models. Introductory models are well covered, but multivariate distributions such as the multivariate normal, multinomial, or Dirichlet may not work as expected. Enhancements to support these more intricate models are in progress.

Proficiency in R is the only requirement for using this package. It also functions as an introductory probabilistic programming language, using the numpyro Python package for Bayesian inference without requiring any syntax beyond R and the package itself. The package also streamlines the indexing of categorical variables, which is often a complex syntax hurdle for those new to computational Bayesian methods.
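To make the indexing hurdle concrete, here is a minimal base-R sketch of the idea (it is not causact's internal code): a categorical variable is turned into an integer index, and that index selects one parameter per category for every observation. The car models and the `theta` values are made up for illustration.

```r
# Hypothetical data: one car model per observation
car_model <- c("Toyota Corolla", "Kia Forte", "Toyota Corolla", "Jeep Wrangler")

# Convert the categorical variable to an integer index (1, 2, 3, ...)
car_factor <- factor(car_model)
x <- as.integer(car_factor)

# One made-up parameter per category, in the order of levels(car_factor)
theta <- c(0.84, 0.25, 0.21)
names(theta) <- levels(car_factor)

# Indexing expands the per-category parameters to one value per observation
theta_per_obs <- theta[x]
print(theta_per_obs)
```

Writing this indexing by hand for every plate is the step the package aims to take off the user's plate.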

For enhanced learning, the causact package for Bayesian inference is featured in A Business Analyst's Introduction to Business Analytics available at https://www.causact.com/.

Feedback and encouragement are appreciated via GitHub issues.

Installation Guide

To install the causact package, follow the steps outlined below:

CRAN Release Version (Recommended)

For the current stable release, which is tailored to integrate with Python’s numpyro package, employ the following command:

install.packages("causact")

Then, see Essential Dependencies if you want to be able to automate sampling using the numpyro package.

Development Release

If you want the most recent development version (not recommended), execute the following:

install.packages("remotes")
remotes::install_github("flyaflya/causact")

Essential Dependencies

To harness the full potential of causact for DAG visualization and Bayesian posterior analysis, it’s vital to ensure proper integration with the numpyro package. Given the Python-based nature of numpyro, a few essential dependencies must be in place. Execute the following commands after installing causact:

library(causact)
install_causact_deps()

If prompted, respond with Y to any inquiries related to installing miniconda.

Note: If opting for installation on Posit Cloud, temporarily adjust your project’s RAM to 4GB during the installation process (remember to APPLY CHANGES). This preemptive measure helps avoid encountering an Error: Error creating conda environment [exit code 137]. After installation, feel free to revert the settings to 1GB of RAM.

Note: The September 11, 2023 release of reticulate (v1.32) introduced an issue that produces the error TypeError: the first argument must be callable when using dag_numpyro() on Windows. If you experience this, install the development version of reticulate by following the steps below:

  1. Install Rtools using the installer at: https://cran.r-project.org/bin/windows/Rtools/

  2. Run this to get the dev version of reticulate:

# install DEV version of reticulate
# install.packages("pak") #uncomment as needed
pak::pak("rstudio/reticulate")
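Before switching to the development version, it may help to check whether a machine matches the combination described in the note. The sketch below is only an illustration: `is_affected_setup()` is a hypothetical helper, and it flags nothing beyond the reticulate 1.32 series on Windows, since the note does not say which other releases behave the same way.

```r
# Hypothetical helper: flags only the combination described in the note
# (reticulate 1.32.x on Windows). Other versions are not assessed.
is_affected_setup <- function(reticulate_version, os_name) {
  v <- package_version(reticulate_version)
  identical(os_name, "Windows") && v >= "1.32" && v < "1.33"
}

# Check the current machine only if reticulate is installed
if (requireNamespace("reticulate", quietly = TRUE)) {
  current <- as.character(utils::packageVersion("reticulate"))
  print(is_affected_setup(current, Sys.info()[["sysname"]]))
}
```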

Backward Compatibility (Not Advised)

If you still rely on the dag_greta() function for legacy code, consider installing v0.4.2 of the causact package. This approach is not recommended for general use:

### EXERCISE CAUTION BEFORE EXECUTING THESE LINES
### Only proceed if dag_greta() is integral to your existing codebase
install.packages("remotes")
remotes::install_github("flyaflya/causact@v0.4.2")

Your judicious choice of installation method will ensure a seamless and effective integration of the causact package into your computational toolkit.

Usage

Example taken from https://www.causact.com/graphical-models-tell-joint-distribution-stories.html#graphical-models-tell-joint-distribution-stories with the package's dag_foo() functions further described here:

https://www.causact.com/causact-quick-inference-with-generative-dags.html#causact-quick-inference-with-generative-dags

Create beautiful model visualizations.

library(causact)
#> Initializing python, numpyro, and other dependencies. This may take up to 15 seconds...
#> Initializing python, numpyro, and other dependencies. This may take up to 15 seconds...COMPLETED!
#> 
#> Attaching package: 'causact'
#> The following objects are masked from 'package:stats':
#> 
#>     binomial, poisson
#> The following objects are masked from 'package:base':
#> 
#>     beta, gamma
graph = dag_create() %>%
  dag_node(descr = "Get Card", label = "y",
           rhs = bernoulli(theta),
           data = carModelDF$getCard) %>%
  dag_node(descr = "Card Probability", label = "theta",
           rhs = beta(2,2),
           child = "y") %>%
  dag_plate(descr = "Car Model", label = "x",  
            data = carModelDF$carModel,  
            nodeLabels = "theta",  
            addDataNode = TRUE)  
graph %>% dag_render()

Hide model complexity, as appropriate, from domain experts and other less statistically minded stakeholders.

graph %>% dag_render(shortLabel = TRUE)

Get posterior while automatically running the underlying numpyro code

drawsDF = graph %>% dag_numpyro()
drawsDF  ### see top of data frame
#> # A tibble: 4,000 × 4
#>    theta_Toyota.Corolla theta_Subaru.Outback theta_Kia.Forte theta_Jeep.Wrangler
#>                   <dbl>                <dbl>           <dbl>               <dbl>
#>  1                0.210                0.609           0.192               0.844
#>  2                0.227                0.563           0.239               0.818
#>  3                0.215                0.612           0.256               0.849
#>  4                0.204                0.646           0.253               0.874
#>  5                0.192                0.642           0.303               0.847
#>  6                0.166                0.607           0.255               0.843
#>  7                0.218                0.621           0.220               0.843
#>  8                0.196                0.646           0.251               0.841
#>  9                0.218                0.606           0.223               0.863
#> 10                0.203                0.583           0.241               0.863
#> # ℹ 3,990 more rows
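For a Beta prior with Bernoulli observations, the posterior is also available in closed form, which gives a quick sanity check on sampled draws like those above. The sketch below uses made-up counts (they are not taken from carModelDF) and the Beta(2, 2) prior from the model; the conjugate update is Beta(a + successes, b + failures).

```r
# Made-up counts for one car model (NOT taken from carModelDF)
n_trials  <- 50
n_success <- 12

# Beta(2, 2) prior, as in the model above
prior_a <- 2
prior_b <- 2

# Conjugate update: Beta(a + successes, b + failures)
post_a <- prior_a + n_success
post_b <- prior_b + (n_trials - n_success)

# Analytic posterior mean and a 90% equal-tailed credible interval
post_mean <- post_a / (post_a + post_b)
post_ci   <- qbeta(c(0.05, 0.95), post_a, post_b)
print(c(mean = post_mean, lower = post_ci[1], upper = post_ci[2]))
```

With real data, the mean of the corresponding theta column should land close to the analytic posterior mean.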

Note: if you have used older versions of causact, please know that dag_numpyro() is a drop-in replacement for dag_greta().

Get quick view of posterior distribution

drawsDF %>% dagp_plot()

Credible interval plots.
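The intervals shown in such a plot can also be computed directly from the draws. The following is a minimal sketch using a stand-in data frame of made-up draws, so it runs without the Python dependencies; `interval_summary()` is a hypothetical helper, not a causact function, and the 90% level is an arbitrary choice.

```r
set.seed(1)

# Stand-in for the output of dag_numpyro(): 4,000 made-up draws per parameter
draws <- data.frame(
  theta_A = rbeta(4000, 14, 40),
  theta_B = rbeta(4000, 30, 20)
)

# Hypothetical helper: posterior mean and equal-tailed interval per column
interval_summary <- function(df, prob = 0.90) {
  tail_prob <- (1 - prob) / 2
  t(sapply(df, function(col) {
    q <- quantile(col, c(tail_prob, 1 - tail_prob), names = FALSE)
    c(mean = mean(col), lower = q[1], upper = q[2])
  }))
}

summary_mat <- interval_summary(draws)
print(round(summary_mat, 3))
```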

OPTIONAL: See numpyro code without executing it (for debugging or learning)

numpyroCode = graph %>% dag_numpyro(mcmc = FALSE)
#> 
#> ## The below code will return a posterior distribution 
#> ## for the given DAG. Use dag_numpyro(mcmc=TRUE) to return a
#> ## data frame of the posterior distribution: 
#> reticulate::py_run_string("
#> import numpy as np
#> import numpyro as npo
#> import numpyro.distributions as dist
#> import pandas as pd
#> import arviz as az
#> from jax import random
#> from numpyro.infer import MCMC, NUTS
#> from jax.numpy import transpose as t
#> from jax.numpy import (exp, log, log1p, expm1, abs, mean,
#>                  sqrt, sign, round, concatenate, atleast_1d,
#>                  cos, sin, tan, cosh, sinh, tanh,
#>                  sum, prod, min, max, cumsum, cumprod )
#> ## note that above is from JAX numpy package, not numpy.
#> 
#> y = np.array(r.carModelDF.getCard)   #DATA
#> x      = pd.factorize(np.array(r.carModelDF.carModel),use_na_sentinel=True)[0]   #DIM
#> x_dim  = len(np.unique(x))   #DIM
#> x_crd  = pd.factorize(np.array(r.carModelDF.carModel),use_na_sentinel=True)[1]   #DIM
#> def graph_model(y,x):
#>  ## Define random variables and their relationships
#>  with npo.plate('x_dim',x_dim):
#>      theta = npo.sample('theta', dist.Beta(2,2))
#> 
#>  y = npo.sample('y', dist.Bernoulli(theta[x]),obs=y)
#> 
#> 
#> # computationally get posterior
#> mcmc = MCMC(NUTS(graph_model), num_warmup = 1000, num_samples = 4000)
#> rng_key = random.PRNGKey(seed = 1234567)
#> mcmc.run(rng_key,y,x)
#> drawsDS = az.from_numpyro(mcmc,
#>  coords = {'x_dim': x_crd},
#>  dims = {'theta': ['x_dim']}
#>  ).posterior
#> # prepare xarray dataset for export to R dataframe
#> dimensions_to_keep = ['chain','draw','x_dim']
#> drawsDS = drawsDS.squeeze(drop = True ).drop_dims([dim for dim in drawsDS.dims if dim not in dimensions_to_keep])
#> # unstack plate variables to flatten dataframe as needed
#> for x_da in drawsDS['x_dim']:
#>  new_varname = f'theta_{x_da.values}'
#>  drawsDS = drawsDS.assign(**{new_varname:drawsDS['theta'].sel(x_dim = x_da)})
#> drawsDS = drawsDS.drop_dims(['x_dim'])
#> drawsDF = drawsDS.squeeze().to_dataframe()"
#> ) ## END PYTHON STRING
#> drawsDF = reticulate::py$drawsDF

Getting Help and Suggesting Improvements

Whether you encounter a clear bug, have a suggestion for improvement, or just have a question, we are thrilled to help you out. In all cases, please file a GitHub issue. If reporting a bug, please include a minimal reproducible example.

Contributing

We welcome help turning causact into the most intuitive and fastest method of converting stakeholder narratives about data-generating processes into actionable insight from posterior distributions. If you want to help us achieve this vision, we welcome your contributions after reading the new contributor guide. Please note that this project is released with a Contributor Code of Conduct. By participating in this project you agree to abide by its terms.

Further Usage

For more info, see A Business Analyst's Introduction to Business Analytics available at https://www.causact.com. You can also check out the package’s vignette: vignette("narrative-to-insight-with-causact"). Two additional examples are shown below.

Prosocial Chimpanzees Example from Statistical Rethinking

McElreath, Richard. Statistical rethinking: A Bayesian course with examples in R and Stan. Chapman and Hall/CRC, 2018.

library(tidyverse)
library(causact)

# data object used below, chimpanzeesDF, is built-in to causact package

graph = dag_create() %>%
  dag_node("Pull Left Handle","L",
           rhs = bernoulli(p),
           data = causact::chimpanzeesDF$pulled_left) %>%
  dag_node("Probability of Pull", "p",
           rhs = 1 / (1 + exp(-((alpha + gamma + beta)))),
           child = "L") %>%
  dag_node("Actor Intercept","alpha",
           rhs = normal(alphaBar, sigma_alpha),
           child = "p") %>%
  dag_node("Block Intercept","gamma",
           rhs = normal(0,sigma_gamma),
           child = "p") %>%
  dag_node("Treatment Intercept","beta",
           rhs = normal(0,0.5),
           child = "p") %>%
  dag_node("Actor Population Intercept","alphaBar",
           rhs = normal(0,1.5),
           child = "alpha") %>%
  dag_node("Actor Variation","sigma_alpha",
           rhs = exponential(1),
           child = "alpha") %>%
  dag_node("Block Variation","sigma_gamma",
           rhs = exponential(1),
           child = "gamma") %>%
  dag_plate("Observation","i",
            nodeLabels = c("L","p")) %>%
  dag_plate("Actor","act",
            nodeLabels = c("alpha"),
            data = chimpanzeesDF$actor,
            addDataNode = TRUE) %>%
  dag_plate("Block","blk",
            nodeLabels = c("gamma"),
            data = chimpanzeesDF$block,
            addDataNode = TRUE) %>%
  dag_plate("Treatment","trtmt",
            nodeLabels = c("beta"),
            data = chimpanzeesDF$treatment,
            addDataNode = TRUE)
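The rhs of the "Probability of Pull" node is the inverse-logit function, which maps any real-valued linear predictor to a probability. A small base-R sketch with made-up predictor values shows the mapping and confirms that it matches base R's `plogis()`:

```r
# Inverse-logit written out as in the "Probability of Pull" node
inv_logit <- function(x) 1 / (1 + exp(-x))

# Made-up values standing in for alpha + gamma + beta
linear_predictor <- c(-2, 0, 0.5, 3)

p <- inv_logit(linear_predictor)
print(round(p, 3))

# Base R's plogis() computes the same function
stopifnot(isTRUE(all.equal(p, plogis(linear_predictor))))
```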

See graph

graph %>% dag_render(width = 2000, height = 800)

Communicate with stakeholders for whom the statistics might be distracting

graph %>% dag_render(shortLabel = TRUE)

Compute posterior

drawsDF = graph %>% dag_numpyro()

Visualize posterior

drawsDF %>% dagp_plot()

Eight Schools Example from Bayesian Data Analysis

Gelman, Andrew, Hal S. Stern, John B. Carlin, David B. Dunson, Aki Vehtari, and Donald B. Rubin. Bayesian data analysis. Chapman and Hall/CRC, 2013.

library(tidyverse)
library(causact)

# data object used below, schoolsDF, is built-in to causact package

graph = dag_create() %>%
  dag_node("Treatment Effect","y",
           rhs = normal(theta, sigma),
           data = causact::schoolsDF$y) %>%
  dag_node("Std Error of Effect Estimates","sigma",
           data = causact::schoolsDF$sigma,
           child = "y") %>%
  dag_node("Exp. Treatment Effect","theta",
           child = "y",
           rhs = avgEffect + schoolEffect) %>%
  dag_node("Pop Treatment Effect","avgEffect",
           child = "theta",
           rhs = normal(0,30)) %>%
  dag_node("School Level Effects","schoolEffect",
           rhs = normal(0,30),
           child = "theta") %>%
  dag_plate("Observation","i",nodeLabels = c("sigma","y","theta")) %>%
  dag_plate("School Name","school",
            nodeLabels = "schoolEffect",
            data = causact::schoolsDF$schoolName,
            addDataNode = TRUE)
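One way to read the graph above is as a recipe for simulating data. The sketch below runs that recipe once in base R: draw the population and school-level effects from their priors, add them, then draw an observed effect for each school. The standard errors are the values commonly quoted for the eight schools example and are an assumption here; check schoolsDF for the values the package ships.

```r
set.seed(42)

n_schools <- 8
# Standard errors commonly quoted for the eight schools example (assumed here)
sigma <- c(15, 10, 16, 11, 9, 11, 10, 18)

# Priors from the graph: normal(0, 30) for both effects
avgEffect    <- rnorm(1, mean = 0, sd = 30)
schoolEffect <- rnorm(n_schools, mean = 0, sd = 30)

# Deterministic node: expected treatment effect per school
theta <- avgEffect + schoolEffect

# Observed treatment effect: normal(theta, sigma)
y_sim <- rnorm(n_schools, mean = theta, sd = sigma)
print(round(y_sim, 1))
```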

See graph

graph %>% dag_render()

Compute posterior

drawsDF = graph %>% dag_numpyro()

Visualize posterior

drawsDF %>% dagp_plot()

causact's People

Contributors

davisvaughan, flyaflya, jaredws, joethorley, mddapena


causact's Issues

dag_greta() gives warning about ... is not empty

After running dag_greta(), I get the following warning. Everything still seems to work.

Warning message:
... is not empty.

We detected these problematic arguments:

  • needs_dots

These dots only exist to allow future extensions and should be empty.
Did you misspecify an argument?

Validate that causact_graph is passed to dag_ functions

Oftentimes, I mistakenly assign the output of dag_render() to a graph like this:

library(tidyverse)
library(greta)
library(causact)
### create simple beta-Bernoulli model with plate
graph = dag_create() %>%
  dag_node("Get Card","x",
            rhs = bernoulli(theta)) %>%
  dag_render()

### these give unhelpful errors
graph %>% dag_render()
graph %>% dag_greta()

For all dag_ functions, the class of the first argument should be verified as a valid causact_graph.
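A check of this kind could look roughly like the sketch below. `assert_causact_graph()` is a hypothetical helper written for illustration, not the package's actual implementation, and the stand-in objects exist only so the example runs without causact installed.

```r
# Hypothetical helper, not part of causact's exported API
assert_causact_graph <- function(graph) {
  if (!inherits(graph, "causact_graph")) {
    stop("Expected a causact_graph object (e.g. from dag_create()), ",
         "but got an object of class: ",
         paste(class(graph), collapse = ", "),
         call. = FALSE)
  }
  invisible(graph)
}

# Stand-in objects so the sketch runs without causact installed
fake_graph  <- structure(list(), class = "causact_graph")
not_a_graph <- list(widget = TRUE)

assert_causact_graph(fake_graph)  # passes silently
result <- tryCatch(assert_causact_graph(not_a_graph),
                   error = function(e) conditionMessage(e))
print(result)
```

Calling such a helper at the top of each dag_ function would replace the unhelpful downstream error with a message that names the actual problem.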

Converting Beta to Laplace?

Running
library(tidyverse)
library(causact)

set.seed(2023)
n_subjects <- 20

df <- tibble(id = 1:(n_subjects)) %>%
  mutate(
    cond = sample(c('ctrl','trtmt'), nrow(.), replace = TRUE),
    trials = sample(5:25, nrow(.), replace = TRUE),
    lin_pred = -1 + if_else(cond == 'trtmt', 1, 0),
    prob = 1 / (1 + exp(-lin_pred)),
    successes = rbinom(nrow(.), trials, prob)
  )

graph <- dag_create() %>%
  dag_node("Successes","s", rhs = binomial(trials,prob), data = df$successes) %>%
  dag_node("Probability","prob", child = "s", rhs = beta(a,b)) %>%
  dag_node("a","a", child = "prob", rhs = exponential(1)) %>%
  dag_node("b","b", child = "prob", rhs = exponential(1)) %>%
  dag_node("Trials","trials", child = "s", data = df$trials) %>%
  dag_plate("Cond","cond", nodeLabels = c("a","b"), data = df$cond, addDataNode = TRUE) %>%
  dag_plate("Observation","i", nodeLabels = c("s","trials","prob","cond"))
graph %>% dag_render()
drawsDF <- graph %>% dag_numpyro()

gives an error:
Error in py_run_string_impl(code, local, convert) :
ValueError: BinomialProbs distribution got invalid probs parameter.

When I look at the generated numpyro code, it is substituting 'Laplace' for Beta. When I change it back to Beta, the numpyro code seems to run OK.

causact_0.5.2

Release causact 0.3.3

Prepare for release:

  • devtools::build_readme()
  • Check current CRAN check results
  • devtools::check(remote = TRUE, manual = TRUE)
  • devtools::check_win_devel()
  • rhub::check_for_cran()
  • revdepcheck::revdep_check(num_workers = 4)
  • Update cran-comments.md
  • Polish NEWS

Submit to CRAN:

  • usethis::use_version('patch')
  • devtools::submit_cran()
  • Approve email

Wait for CRAN...

  • Accepted 🎉
  • usethis::use_github_release()
  • usethis::use_dev_version()

PYTHON DEPENDENCIES installed successfully but dag_numpyro() not working....

By following the instructions carefully, step by step, I have successfully installed the new version of causact and the Python dependencies. However, dag_numpyro() does not work. Every time I try #### STEP 3: TEST THE INSTALLATION, it gives me the following error:

drawsDF = graph %>% dag_numpyro() ## see "sample:..."
Error in py_run_string_impl(code, local, convert) :
TypeError: the first argument must be callable
Run reticulate::py_last_error() for details.

I've also tried uninstalling and reinstalling everything (R, RStudio, causact, and the dependencies), but still no luck!
Any suggestions? Thanks!

P.S. Last year, I installed and used v0.4.2 of the causact package and its dependencies without any problem.

Simple graphs without rhs issue warning message

dag_create() %>% dag_node(descr = "Age") %>% dag_render() triggers a warning message due to changes in dplyr 1.0.

Warning message:
... is not empty.

We detected these problematic arguments:

  • needs_dots

These dots only exist to allow future extensions and should be empty.
Did you misspecify an argument?

dagp_plot is not scalable

The dagp_plot() plotting will not scale well beyond around 60-100+ parameters. An intelligent way to show a subset of data when there are too many parameters is required.
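One possible approach, sketched below under stated assumptions, is to rank parameters by how wide their posterior intervals are and keep only the top few before plotting. `top_k_by_interval_width()` is a hypothetical helper, the draws are made up, and the ranking criterion is just one of several reasonable choices (a name pattern or a user-supplied list would work too).

```r
set.seed(7)

# Stand-in for a drawsDF with many parameters (made-up draws)
n_params <- 120
draws <- as.data.frame(matrix(rnorm(1000 * n_params), ncol = n_params))
names(draws) <- sprintf("theta_%03d", seq_len(n_params))

# Hypothetical helper: keep the k parameters with the widest 90% intervals
top_k_by_interval_width <- function(df, k = 20) {
  widths <- vapply(df, function(col) {
    diff(quantile(col, c(0.05, 0.95), names = FALSE))
  }, numeric(1))
  keep <- names(sort(widths, decreasing = TRUE))[seq_len(min(k, ncol(df)))]
  df[, keep, drop = FALSE]
}

subsetDF <- top_k_by_interval_width(draws, k = 20)
print(dim(subsetDF))
```

The reduced data frame could then be passed on for plotting in place of the full set of draws.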

Create/Organize/Publish more examples

Currently, you can find the files testRedesign.R, rethinkingExamples.R, and rethinkingCafe.R, which are used to show off causact functionality.

An examples folder should be created to consolidate/organize these examples and add various R files implementing other popular Bayesian examples from here:

https://greta-stats.org/articles/example_models.html#advanced-bayesian-models (all the examples below the advanced Bayesian model section have not been implemented in causact)

An rmd file for showing off these examples might be helpful.

Release causact 0.4.1

Prepare for release:

Submit to CRAN:

  • usethis::use_version('patch')
  • devtools::submit_cran()
  • Approve email

Wait for CRAN...

  • Accepted 🎉
  • usethis::use_github_release()
  • usethis::use_dev_version()

Release causact 0.5.3

Prepare for release:

  • git pull
  • Check current CRAN check results
  • Polish NEWS
  • urlchecker::url_check()
  • devtools::build_readme()
  • devtools::check(remote = TRUE, manual = TRUE)
  • devtools::check_win_devel()
  • revdepcheck::revdep_check(num_workers = 4)
  • Update cran-comments.md
  • git push

Submit to CRAN:

  • usethis::use_version('patch')
  • devtools::submit_cran()
  • Approve email

Wait for CRAN...

  • Accepted 🎉
  • usethis::use_github_release()
  • usethis::use_dev_version(push = TRUE)

Miscellaneous requests

Hi, I am the reviewer from JOSS (openjournals/joss-reviews#4415). Here are some suggestions for your package (to be continued). I will check it off once you have completed the task.

  • add Community guidelines: Are there clear guidelines for third parties wishing to 1) Contribute to the software 2) Report issues or problems with the software 3) Seek support
  • Please add at least one unit test for all exported functions.
  • There is no vignette available for this package. Are eightSchools.R and gretaExamples.R example scripts? If so, please create a folder called vignettes and move these two files there.

README should have lifecycle badge to indicate status

In particular further down the README it states that

> NOTE: Package is under active development. Breaking changes are to be expected.  Feedback and encouragement is appreciated via github issues or Twitter (https://twitter.com/preposterior).

however this information should be immediately conveyed to users via a lifecycle badge at the top of the README.

See for example https://lifecycle.r-lib.org/articles/stages.html

openjournals/joss-reviews#4415

Release causact 0.4.2

Prepare for release:

  • Check current CRAN check results
  • Polish NEWS
  • devtools::build_readme()
  • urlchecker::url_check()
  • devtools::check(remote = TRUE, manual = TRUE)
  • devtools::check_win_devel()
  • rhub::check_for_cran()
  • revdepcheck::revdep_check(num_workers = 4)
  • Update cran-comments.md

Submit to CRAN:

  • usethis::use_version('patch')
  • devtools::submit_cran()
  • Approve email

Wait for CRAN...

  • Accepted 🎉
  • usethis::use_github_release()
  • usethis::use_dev_version()

random vectors for multivariate explanatory variables

There does not seem to be an obvious mechanism to define random vectors in a DAG, for example, to use in a multivariate regression. Is there an option/ability to apply dag_plate to index over the X_i's in a multivariate random vector (where the underlying data object would then be a design matrix).

Ability to combine graphs in the workflow.

Currently, a graph like this:

image

requires a lot of code. It would be nice to be able to break the code up into smaller chunks and then combine the chunks in a subsequent step.

Here is the code to recreate the picture:

library(tidyverse)
library(greta)
library(causact)

# Simulated cafe data

a <- 3.5 # average morning wait time 14.1
b <- (-1) # average difference afternoon wait time
sigma_a <- 1 # std dev in intercepts
sigma_b <- 0.5 # std dev in slopes
rho <- (-0.7) # correlation between intercepts and slopes

# Vector of means

Mu <- c( a , b )

# Covariance matrix

sigmas <- c(sigma_a,sigma_b) # standard deviations
Rho <- matrix( c(1,rho,rho,1) , nrow=2 ) # correlation matrix

# now matrix multiply to get covariance matrix

Sigma <- diag(sigmas) %*% Rho %*% diag(sigmas)

# Simulate cafes

N_cafes <- 20
library(MASS)
set.seed(5) # used to replicate example
vary_effects <- mvrnorm( N_cafes , Mu , Sigma )

a_cafe <- vary_effects[,1]
b_cafe <- vary_effects[,2]

ggplot(data.frame(a_cafe = a_cafe, b_cafe = b_cafe), aes(x=a_cafe,y=b_cafe)) + geom_point()

# Simulate observations

N_visits <- 10
afternoon <- rep(0:1,N_visits*N_cafes/2)
cafe_id <- rep( 1:N_cafes , each=N_visits )
mu <- a_cafe[cafe_id] + b_cafe[cafe_id] * afternoon
sigma <- 0.5 # std dev within cafes
wait <- rnorm( N_visits * N_cafes , mu , sigma )
waitDF <- data.frame( cafe=cafe_id , afternoon=afternoon , wait=wait )

# create DAG using causact package

graph = dag_create() %>%
  dag_node("Obs Wait Time","x",
           data = waitDF$wait,
           rhs = normal(mu,sig)) %>%
  dag_node("Exp Obs Wait Time","mu",
           rhs = alpha_cafe + beta_cafe * afternoon,
           child = "x") %>%
  dag_node("Exp Morning Wait Time","alpha_cafe",
           child = "mu",
           rhs = Y[,1],
           extract = TRUE) %>%
  dag_node("Exp Wait Time Diff.","beta_cafe",
           child = "mu",
           rhs = Y[,2],
           extract = TRUE) %>%
  dag_node("Cafe","cafe",
           child = "mu",
           data = waitDF$cafe) %>%
  dag_node("Afternoon","afternoon",
           child = "mu",
           data = waitDF$afternoon) %>%
  dag_node("Exp Wait and Wait Diff","Y",
           rhs = multivariate_normal(Mean,Sigma),
           child = c("alpha_cafe", "beta_cafe")) %>%
  dag_node("Uncorr. Exp. Effects","Mean",
           rhs = cbind(alpha,beta),
           child = "Y") %>%
  dag_node("Covar Matrix","Sigma",
           rhs = Sigmas %*% Rho %*% Sigmas,
           child = "Y") %>%
  dag_node("Uncorr. Morning Wait","alpha",
           rhs = normal(0,10),
           child = "Mean",
           extract = FALSE) %>%
  dag_node("Uncorr. Wait Diff","beta",
           rhs = normal(0,10),
           child = "Mean",
           extract = FALSE) %>%
  dag_node("Uncorr Std Devs","Sigmas",
           child = "Sigma",
           rhs = makeDiagMatrix(c(sig_a,sig_b))) %>%
  dag_node("Obs Wait Std Dev","sig",
           child = "x",
           rhs = cauchy(0,1,truncation = c(0,Inf))) %>%
  dag_node("Morning Wait Std Dev","sig_a",
           child = "Sigmas",
           rhs = cauchy(0,1,truncation = c(0,Inf))) %>%
  dag_node("Wait Diff Std Dev","sig_b",
           child = "Sigmas",
           rhs = cauchy(0,1,truncation = c(0,Inf))) %>%
  dag_node("Wait Corr. Matrix","Rho",
           child = "Sigma",
           rhs = lkj_correlation(2)) %>%
  dag_plate("Observation","j",
            nodeLabels = c("x","mu","cafe","afternoon")) %>%
  dag_plate("Cafes","cafe",
            nodeLabels = c("alpha","beta","alpha_cafe","beta_cafe"),
            data = waitDF$cafe)

graph %>% dag_render(width = 2400, height = 600)
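The covariance matrix in the simulation above is the matrix product of diag(sigmas), Rho, and diag(sigmas). As a quick worked check, using the same values as the simulation, the result should have the variances on the diagonal and rho times the two standard deviations off the diagonal:

```r
sigma_a <- 1      # std dev in intercepts
sigma_b <- 0.5    # std dev in slopes
rho     <- -0.7   # correlation between intercepts and slopes

sigmas <- c(sigma_a, sigma_b)
Rho    <- matrix(c(1, rho, rho, 1), nrow = 2)

# Matrix products (%*%), not element-wise multiplication
Sigma <- diag(sigmas) %*% Rho %*% diag(sigmas)

# Element-by-element expectation: variances and rho * sigma_a * sigma_b
expected <- matrix(c(sigma_a^2,               rho * sigma_a * sigma_b,
                     rho * sigma_a * sigma_b, sigma_b^2),
                   nrow = 2)
stopifnot(isTRUE(all.equal(Sigma, expected)))
print(Sigma)
```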

Release causact 0.3.1

Prepare for release:

  • Check that description is informative
  • Check licensing of included files
  • devtools::build_readme()
  • usethis::use_cran_comments()
  • devtools::check(remote = TRUE, manual = TRUE)
  • devtools::check_win_devel()
  • rhub::check_for_cran()
  • Update cran-comments.md
  • Draft blog post

Submit to CRAN:

  • usethis::use_version('minor')
  • devtools::submit_cran()
  • Approve email

Wait for CRAN...

  • Accepted 🎉
  • usethis::use_github_release()
  • usethis::use_dev_version()
  • Update install instructions in README
  • Finish blog post
  • Tweet
  • Add link to blog post in pkgdown news menu

Release causact 0.5.0

Prepare for release:

  • Check current CRAN check results
  • Polish NEWS
  • devtools::build_readme()
  • urlchecker::url_check()
  • devtools::check(remote = TRUE, manual = TRUE)
  • devtools::check_win_devel()
  • rhub::check_for_cran()
  • revdepcheck::revdep_check(num_workers = 4)
  • Update cran-comments.md
  • Draft blog post

Submit to CRAN:

  • usethis::use_version('minor')
  • devtools::submit_cran()
  • Approve email

Wait for CRAN...

  • Accepted 🎉
  • usethis::use_github_release()
  • usethis::use_dev_version()
  • Finish blog post
  • Tweet
  • Add link to blog post in pkgdown news menu

Error with installation of numpyro

Following installation instructions leads to:

> drawsDF = graph %>% dag_numpyro() ## see "sample:..."
Error in py_run_string_impl(code, local, convert) : 
  ModuleNotFoundError: No module named 'jax.linear_util'

Release causact 0.5.4

Prepare for release:

  • git pull
  • Check current CRAN check results
  • Polish NEWS
  • urlchecker::url_check()
  • devtools::build_readme()
  • devtools::check(remote = TRUE, manual = TRUE)
  • devtools::check_win_devel()
  • revdepcheck::revdep_check(num_workers = 4)
  • Update cran-comments.md
  • git push

Submit to CRAN:

  • usethis::use_version('patch')
  • devtools::submit_cran()
  • Approve email

Wait for CRAN...

  • Accepted 🎉
  • usethis::use_github_release()
  • usethis::use_dev_version(push = TRUE)

Release causact 0.3.2

Prepare for release:

  • devtools::build_readme()
  • Check current CRAN check results
  • devtools::check(remote = TRUE, manual = TRUE)
  • devtools::check_win_devel()
  • rhub::check_for_cran()
  • revdepcheck::revdep_check(num_workers = 4)
  • Update cran-comments.md
  • Polish NEWS

Submit to CRAN:

  • usethis::use_version('patch')
  • devtools::submit_cran()
  • Approve email

Wait for CRAN...

  • Accepted 🎉
  • usethis::use_github_release()
  • usethis::use_dev_version()

Create more comprehensive readme.md

The current readme.md file is very short and gives the user no guidance about installation, usage, etc. See the ggplot2, DiagrammeR, greta, etc. READMEs for examples.

Release causact 0.5.1

Prepare for release:

  • git pull
  • Check current CRAN check results
  • Polish NEWS
  • urlchecker::url_check()
  • devtools::build_readme()
  • devtools::check(remote = TRUE, manual = TRUE)
  • devtools::check_win_devel()
  • revdepcheck::revdep_check(num_workers = 4)
  • Update cran-comments.md
  • git push

Submit to CRAN:

  • usethis::use_version('patch')
  • devtools::submit_cran()
  • Approve email

Wait for CRAN...

  • Accepted 🎉
  • usethis::use_github_release()
  • usethis::use_dev_version(push = TRUE)

Error when trying to merge nodes

I suspect this is not a bug but rather either something that just can not be done, or I am doing something wrong.

This first code block works fine. Basic problem is to have linear predictors for coefficients of a beta distribution for the probability parameter of some binomial trials under two conditions.

library(tidyverse)
library(causact)

set.seed(2023)
n_subjects <- 200

df <- tibble(id = 1:(n_subjects)) %>%
  mutate(
    cond = sample(c('ctrl','trtmt'), nrow(.), replace = TRUE),
    disc_cov = sample(c('a','b','c'), nrow(.), replace = TRUE),
    cont_cov = rnorm(nrow(.), 0, 1),
    trials = sample(5:25, nrow(.), replace = TRUE),
    lin_pred = 0.2*cont_cov + 0.3*case_when(disc_cov=='a' ~ 1, disc_cov=='b' ~ 0, disc_cov=='c' ~ -1) + if_else(cond=='trtmt',1,0),
    prob = 1 / (1 + exp(-lin_pred)),
    successes = rbinom(nrow(.), trials, prob)
  ) %>%
  select(successes, cond, trials)

graph <- dag_create() %>%
  dag_node("Successes", "s", rhs = binomial(trials, prob), data = df$successes) %>%
  dag_node("Probability", "prob", child = "s", rhs = beta(lin_pred_a, lin_pred_b)) %>%
  dag_node("Linear Predictor A", "lin_pred_a", rhs = c_cond_a, child = "prob") %>%
  dag_node("Linear Predictor B", "lin_pred_b", rhs = c_cond_b, child = "prob") %>%
  dag_node("Cond Coefficient A", "c_cond_a", rhs = exponential(1), child = c("lin_pred_a")) %>%
  dag_node("Cond Coefficient B", "c_cond_b", rhs = exponential(1), child = c("lin_pred_b")) %>%
  dag_node("Trials", "trials", child = "s", data = df$trials) %>%
  dag_plate("Cond A", "cond_a", nodeLabels = c("c_cond_a"), data = df$cond, addDataNode = TRUE) %>%
  dag_plate("Cond B", "cond_b", nodeLabels = c("c_cond_b"), data = df$cond, addDataNode = TRUE) %>%
  dag_plate("Observation", "i", nodeLabels = c("s", "trials", "prob", "lin_pred_a", "lin_pred_b","cond_a","cond_b"))

graph %>% dag_render()
drawsDF <- graph %>% dag_numpyro()
drawsDF %>% dagp_plot()

Plots, etc of the above look ok.

But, this code

graph <- dag_create() %>%
  dag_node("Successes", "s", rhs = binomial(trials, prob), data = df$successes) %>%
  dag_node("Probability", "prob", child = "s", rhs = beta(lin_pred_a, lin_pred_b)) %>%
  dag_node("Linear Predictor A", "lin_pred_a", rhs = c_cond_a, child = "prob") %>%
  dag_node("Linear Predictor B", "lin_pred_b", rhs = c_cond_b, child = "prob") %>%
  dag_node("Cond Coefficient A", "c_cond_a", rhs = exponential(1), child = c("lin_pred_a")) %>%
  dag_node("Cond Coefficient B", "c_cond_b", rhs = exponential(1), child = c("lin_pred_b")) %>%
  dag_node("Trials", "trials", child = "s", data = df$trials) %>%
  dag_plate("Cond", "cond", nodeLabels = c("c_cond_a", "c_cond_b"), data = df$cond, addDataNode = TRUE) %>%
  dag_plate("Observation", "i", nodeLabels = c("s", "trials", "prob", "lin_pred_a", "lin_pred_b","cond"))

gives an error : Error in dag_edge(., from = childrenNewEdgeDF$parentNodeID, to = childrenNewEdgeDF$childID) :
You have attempted to connect 8 to itself, but this would create a cycle and is not a Directed Acyclic Graph. You cannot connect a node to itself in a DAG.

What I am trying to do is merge the two nodes "cond_a" and "cond_b" from the first example into one node "cond" in the second, as both refer to the same data field (df$cond)

I suspect this is not an actual bug - but maybe there is a way to do it?

Log more issues/bugs

Field test the causact package. Every time I try a new model, I inevitably find an issue that is not supported by the current code base. Please try new models and report bugs.

Release causact 0.5.5

Prepare for release:

  • git pull
  • Check current CRAN check results
  • Polish NEWS
  • urlchecker::url_check()
  • devtools::build_readme()
  • devtools::check(remote = TRUE, manual = TRUE)
  • devtools::check_win_devel()
  • revdepcheck::revdep_check(num_workers = 4)
  • Update cran-comments.md
  • git push

Submit to CRAN:

  • usethis::use_version('patch')
  • devtools::submit_cran()
  • Approve email

Wait for CRAN...

  • Accepted 🎉
  • usethis::use_github_release()
  • usethis::use_dev_version(push = TRUE)

Make causact more generic

Is there a plan to make causact more generic? By that I mean entertaining two scenarios:

  1. Make causact able to generate static plots for any kind of generative DAG, i.e., not restricted to the node types currently supported by greta.

  2. Make causact able to generate (and possibly run) code for other PPLs (i.e., Stan, Numpyro, Turing.jl, etc). Perhaps a plugin architecture could be devised so that interested parties could develop code translation plugins for their PPL of choice.

Keep up the good work!!! :)
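
To make the plugin idea in scenario 2 concrete, here is a minimal base-R sketch of what a translator registry could look like. Everything in it is hypothetical: register_translator(), translate_dag(), the toy DAG representation (a named list of strings), and the toy "stan" plugin are illustrations only and are not part of causact.

```r
# Hypothetical registry: maps a PPL name to a function that turns a DAG
# description into model code. None of these names exist in causact.
translators <- new.env(parent = emptyenv())

register_translator <- function(ppl, fn) {
  stopifnot(is.character(ppl), length(ppl) == 1L, is.function(fn))
  assign(ppl, fn, envir = translators)
  invisible(ppl)
}

translate_dag <- function(dag, ppl) {
  # Fail loudly when nobody has registered a plugin for this PPL.
  if (!exists(ppl, envir = translators, inherits = FALSE)) {
    stop("No translator registered for '", ppl, "'")
  }
  get(ppl, envir = translators)(dag)
}

# Toy DAG: node label -> right-hand side, both as plain strings (an assumption;
# causact's real graph object is structured differently).
toy_dag <- list(mu = "normal(0, 1)", y = "normal(mu, 1)")

# Toy plugin that emits Stan-like sampling statements.
register_translator("stan", function(dag) {
  paste0(names(dag), " ~ ", unlist(dag), ";")
})

stan_lines <- translate_dag(toy_dag, "stan")
cat(stan_lines, sep = "\n")
```

With a design along these lines, third parties would only need to call the registration function from their own package; the core package would not need to know about each PPL in advance.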

Release causact 0.4.0

Prepare for release:

  • devtools::build_readme()
  • Check current CRAN check results
  • devtools::check(remote = TRUE, manual = TRUE)
  • devtools::check_win_devel()
  • rhub::check_for_cran()
  • revdepcheck::revdep_check(num_workers = 4)
  • Update cran-comments.md
  • Polish NEWS
  • Draft blog post

Submit to CRAN:

  • usethis::use_version('minor')
  • devtools::submit_cran()
  • Approve email

Wait for CRAN...

  • Accepted 🎉
  • usethis::use_github_release()
  • usethis::use_dev_version()
  • Finish blog post
  • Tweet
  • Add link to blog post in pkgdown news menu

Handling deterministic child node

See the issue described here: https://github.com/prabinov42/Greta_causact_question/blob/main/greta_causact_2.md

If a child node is deterministic and not used as a parent node, causact incorrectly labels it as a distribution.

Temporary fix (use mcmc=FALSE):

graph %>% dag_greta(mcmc=FALSE)

Then delete the line distribution(lbs) <- kgs * 2.2 #LIKELIHOOD from the output code, so that you are left with the following:

## The below greta code will return a posterior distribution 
## for the given DAG. Either copy and paste this code to use greta
## directly, evaluate the output object using 'eval', or 
## or (preferably) use dag_greta(mcmc=TRUE) to return a data frame of
## the posterior distribution: 
lbs <- as_data(df$weight_lbs)   #DATA
mu     <- normal(mean = 90, sd = 10)      #PRIOR
sigma  <- exponential(rate = 1/20)        #PRIOR
kgs    <- normal(mean = mu, sd = sigma)   #PRIOR
lbs    <- kgs * 2.2   #OPERATION
#### DELETED LINE LOCATION ##########
gretaModel  <- model(kgs,mu,sigma)   #MODEL
meaningfulLabels(graph)
draws       <- mcmc(gretaModel)              #POSTERIOR
drawsDF     <- replaceLabels(draws) %>% as.matrix() %>%
                dplyr::as_tibble()           #POSTERIOR
tidyDrawsDF <- drawsDF %>% addPriorGroups()  #POSTERIOR

Run the above code to get drawsDF. If you now do drawsDF %>% dagp_plot(), you will see kgs as a column.
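
If the manual deletion has to be repeated often, the same step can be scripted. The sketch below assumes the code printed by dag_greta(mcmc=FALSE) has been pasted by hand into a character vector, one element per line; generated_code and drop_likelihood() are hypothetical names used only for illustration, and the vector is abbreviated.

```r
# Hypothetical: lines printed by dag_greta(mcmc = FALSE), pasted in by hand
# (abbreviated to the lines that matter here).
generated_code <- c(
  "lbs <- as_data(df$weight_lbs)   #DATA",
  "mu     <- normal(mean = 90, sd = 10)      #PRIOR",
  "sigma  <- exponential(rate = 1/20)        #PRIOR",
  "kgs    <- normal(mean = mu, sd = sigma)   #PRIOR",
  "lbs    <- kgs * 2.2   #OPERATION",
  "distribution(lbs) <- kgs * 2.2   #LIKELIHOOD",
  "gretaModel  <- model(kgs,mu,sigma)   #MODEL"
)

# Drop any line that assigns a distribution to the given deterministic node.
# `node` is assumed to be a plain identifier with no regex metacharacters.
drop_likelihood <- function(code_lines, node) {
  pattern <- paste0("^\\s*distribution\\(", node, "\\)\\s*<-")
  code_lines[!grepl(pattern, code_lines)]
}

cleaned_code <- drop_likelihood(generated_code, "lbs")
cat(cleaned_code, sep = "\n")
```

The cleaned lines still have to be run by hand, as described above.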

Create a dag_stan() workflow

Currently, greta is the only computational engine that is supported. Using the rstan package, a dag_stan() function should also be feasible to implement.

dag_render hangs

While trying to build a simple multilevel model, I came across this weird behaviour.
Of the three very similar models below, the last one hangs (i.e. doesn't render); the other two are fine.

library(tidyverse)
library(greta)
library(causact)

df <- tribble(
  ~time, ~subject, ~outcome, ~condition, ~covariate,
  11, "anbkv", 3, "treatment", "b",
  12, "anbkv", 3, "treatment", "b",
  13, "anbkv", 4, "treatment", "b",
  14, "anbkv", 4, "treatment", "b",
  15, "anbkv", 4, "treatment", "b",
  11, "avpov", 3, "control", "b",
  12, "avpov", 3, "control", "b",
  13, "avpov", 3, "control", "b",
  14, "avpov", 3, "control", "b",
  15, "avpov", 3, "control", "b",
  11, "bkydk", 2, "treatment", "a",
  12, "bkydk", 3, "treatment", "a",
  13, "bkydk", 3, "treatment", "a",
  14, "bkydk", 3, "treatment", "a",
  15, "bkydk", 3, "treatment", "a",
  11, "fryng", 3, "control", "a",
  12, "fryng", 3, "control", "a",
  13, "fryng", 3, "control", "a",
  14, "fryng", 2, "control", "a",
  15, "fryng", 3, "control", "a",
  11, "fygyl", 3, "treatment", "b",
  12, "fygyl", 3, "treatment", "b",
  13, "fygyl", 4, "treatment", "b",
  14, "fygyl", 3, "treatment", "b",
  15, "fygyl", 3, "treatment", "b",
  11, "fzvmk", 3, "control", "a",
  12, "fzvmk", 2, "control", "a",
  13, "fzvmk", 2, "control", "a",
  14, "fzvmk", 2, "control", "a",
  15, "fzvmk", 2, "control", "a",
  11, "gjleq", 3, "treatment", "a",
  12, "gjleq", 3, "treatment", "a",
  13, "gjleq", 3, "treatment", "a",
  14, "gjleq", 3, "treatment", "a",
  15, "gjleq", 3, "treatment", "a",
  11, "gnwyo", 3, "control", "b",
  12, "gnwyo", 3, "control", "b",
  13, "gnwyo", 3, "control", "b",
  14, "gnwyo", 3, "control", "b",
  15, "gnwyo", 2, "control", "b"
)

# Example 1 - works fine
dag_create() %>%
  dag_node("outcome", "o", rhs = normal(mu, sd), data = df$outcome) %>%
  dag_node("mu", "mu", rhs = normal(2.5, 2), child = "o") %>%
  dag_node("sd", "sd", rhs = exponential(1), child = "o") %>%
  dag_plate("Condition Effect", "i", nodeLabels = c("mu"), data = df$condition) %>%
  dag_plate("Subject Effect", "s", nodeLabels = c("mu", "sd"), data = df$subject) %>%  #sd here
  dag_plate("Observation", "n", nodeLabels = c("o")) %>%
  dag_render() # renders

# Example 2 - works fine
dag_create() %>%
  dag_node("outcome", "o", rhs = normal(mu, sd), data = df$outcome) %>%
  dag_node("mu", "mu", rhs = normal(2.5, 2), child = "o") %>%
  dag_node("sd", "sd", rhs = exponential(1), child = "o") %>%
  dag_plate("Condition Effect", "i", nodeLabels = c("mu", "sd"), data = df$condition) %>%   #sd here
  dag_plate("Subject Effect", "s", nodeLabels = c("mu"), data = df$subject) %>%
  dag_plate("Observation", "n", nodeLabels = c("o")) %>%
  dag_render() # renders

# Example 3 - hangs
dag_create() %>%
  dag_node("outcome", "o", rhs = normal(mu, sd), data = df$outcome) %>%
  dag_node("mu", "mu", rhs = normal(2.5, 2), child = "o") %>%
  dag_node("sd", "sd", rhs = exponential(1), child = "o") %>%
  dag_plate("Condition Effect", "i", nodeLabels = c("mu", "sd"), data = df$condition) %>%  #sd here
  dag_plate("Subject Effect", "s", nodeLabels = c("mu", "sd"), data = df$subject) %>%      #AND sd here
  dag_plate("Observation", "n", nodeLabels = c("o")) %>%
  dag_render() # hangs

sessionInfo()
R version 4.1.2 (2021-11-01)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Monterey 12.1

Matrix products: default
LAPACK: /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRlapack.dylib

locale:
[1] en_CA.UTF-8/en_CA.UTF-8/en_CA.UTF-8/C/en_CA.UTF-8/en_CA.UTF-8

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] causact_0.4.0 greta_0.3.1 bayesplot_1.8.1 janitor_2.1.0 forcats_0.5.1 stringr_1.4.0 dplyr_1.0.7
[8] purrr_0.3.4 readr_2.1.1 tidyr_1.1.4 tibble_3.1.6 ggplot2_3.3.5 tidyverse_1.3.1

loaded via a namespace (and not attached):
[1] fs_1.5.1 lubridate_1.8.0 webshot_0.5.2 RColorBrewer_1.1-2 progress_1.2.2 httr_1.4.2
[7] rprojroot_2.0.2 R.cache_0.15.0 tools_4.1.2 backports_1.4.0 utf8_1.2.2 R6_2.5.1
[13] DBI_1.1.1 colorspace_2.0-2 withr_2.4.3 tidyselect_1.1.1 prettyunits_1.1.1 compiler_4.1.2
[19] cli_3.1.0 rvest_1.0.2 xml2_1.3.3 scales_1.1.1 ggridges_0.5.3 tfruns_1.5.0
[25] systemfonts_1.0.2 digest_0.6.29 R.utils_2.11.0 rmarkdown_2.11 svglite_2.0.0 base64enc_0.1-3
[31] pkgconfig_2.0.3 htmltools_0.5.2 parallelly_1.29.0 styler_1.6.2 highr_0.9 dbplyr_2.1.1
[37] fastmap_1.1.0 htmlwidgets_1.5.4 rlang_0.4.12 readxl_1.3.1 rstudioapi_0.13 visNetwork_2.1.0
[43] generics_0.1.1 jsonlite_1.7.2 tensorflow_2.6.0 R.oo_1.24.0 magrittr_2.0.1 kableExtra_1.3.4
[49] Matrix_1.3-4 Rcpp_1.0.7 munsell_0.5.0 fansi_0.5.0 clipr_0.7.1 reticulate_1.22
[55] R.methodsS3_1.8.1 lifecycle_1.0.1 stringi_1.7.6 whisker_0.4 yaml_2.2.1 snakecase_0.11.0
[61] plyr_1.8.6 grid_4.1.2 parallel_4.1.2 listenv_0.8.0 crayon_1.4.2 lattice_0.20-45
[67] haven_2.4.3 cowplot_1.1.1 hms_1.1.1 knitr_1.36 pillar_1.6.4 igraph_1.2.9
[73] datapasta_3.1.0 codetools_0.2-18 reprex_2.0.1 glue_1.5.1 evaluate_0.14 modelr_0.1.8
[79] png_0.1-7 vctrs_0.3.8 tzdb_0.2.0 cellranger_1.1.0 gtable_0.3.0 rematch2_2.1.2
[85] future_1.23.0 assertthat_0.2.1 xfun_0.28 broom_0.7.10 coda_0.19-4 viridisLite_0.4.0
[91] DiagrammeR_1.0.6.1 globals_0.14.0 ellipsis_0.3.2 here_1.0.1
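
The only difference between Example 3 and the two working examples is that both "mu" and "sd" sit on both data-indexed plates. A small base-R diagnostic, sketched below, tabulates plate membership so that such overlaps are visible before calling dag_render(). It is not a fix, and it is only an assumption drawn from the three examples that the overlap is what triggers the hang. The plates list is copied by hand from Example 3 and is not read from the causact graph object.

```r
# Plate descriptor -> node labels, copied by hand from Example 3.
plates <- list(
  i = c("mu", "sd"),  # Condition Effect
  s = c("mu", "sd"),  # Subject Effect
  n = c("o")          # Observation
)

# Cross-tabulate nodes against plates.
nodes <- unlist(plates, use.names = FALSE)
plate_ids <- rep(names(plates), lengths(plates))
membership <- table(node = nodes, plate = plate_ids)

# Nodes that appear on more than one plate.
shared <- rownames(membership)[rowSums(membership > 0) > 1]

print(membership)
print(shared)
```

For Examples 1 and 2 the same check reports only "mu" as shared, whereas Example 3 reports both "mu" and "sd".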
