metrumresearchgroup / bbr
R interface for model and project management
Home Page: https://metrumresearchgroup.github.io/bbr/
License: Other
User should have comprehensive documentation of package functions and their usage. Documentation should satisfy the following requirements:
Implementation Details
- Function documentation written with the roxygen2 package
- HTML documentation built with the pkgdown package and hosted through GitHub at a site linked in the rbabylon README

No tests necessary because this only touches documentation.
As a developer, I would like to know that all rbabylon
tests pass on all three of the following:
Currently the "oldest MPN snapshot that is supported" is 2020-03-24. Note that this does not mean earlier snapshots or versions of dependencies will not work, only that this is the oldest snapshot that has been tested and guaranteed to work.
Currently copy_model_from() optionally updates the $PROB line to reflect the new description and run_id. @callistosp has voiced that it would be nice if it would also update all other occurrences of the run_id throughout the control stream. He has some ad-hoc code which does this:

```r
txt <- readLines(mod2) %>%
  gsub(paste0('RUN# ', runno), paste0('RUN# ', runno2), .) %>%
  gsub(paste0(runno, '.MSF'), paste0(runno2, '.MSF'), .) %>%
  gsub(paste0(runno, '.ext'), paste0(runno2, '.ext'), .) %>%
  gsub(paste0(runno, '.tab'), paste0(runno2, '.tab'), .) %>%
  gsub(paste0(runno, 'par.tab'), paste0(runno2, 'par.tab'), .)
```
We need to consider whether we want this to be in the copy_model_from() function, or as a helper that could be used like:

```r
mod102 <- copy_model_from(mod101, .new_model = "102", .description = "new model desc...") %>%
  update_mod_file()
```

We also need to consider that the run_id will be in different places in the file depending on the estimation method, etc.
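That ad-hoc code could be generalized into the helper discussed above. A minimal sketch follows: update_mod_file() is the hypothetical name from the proposed pipeline, and replace_run_id() is an illustrative internal that is not part of any current rbabylon API.

```r
# Sketch of the proposed helper. update_mod_file() is the hypothetical name
# from the pipeline above; replace_run_id() is an illustrative internal that
# generalizes the ad-hoc gsub chain. Not part of the current rbabylon API.
replace_run_id <- function(txt, parent_id, new_id) {
  patterns <- c(
    paste0("RUN# ", parent_id),
    paste0(parent_id, ".MSF"),
    paste0(parent_id, ".ext"),
    paste0(parent_id, ".tab"),
    paste0(parent_id, "par.tab")
  )
  for (pattern in patterns) {
    replacement <- sub(parent_id, new_id, pattern, fixed = TRUE)
    txt <- gsub(pattern, replacement, txt, fixed = TRUE)
  }
  txt
}

update_mod_file <- function(ctl_path, parent_id, new_id) {
  writeLines(replace_run_id(readLines(ctl_path), parent_id, new_id), ctl_path)
  invisible(ctl_path)
}
```

Keeping the substitution logic in a pure function like replace_run_id() would make the estimation-method edge cases easy to unit test against example control streams.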
User should be able to use the based_on field to unambiguously refer to models, even in cases where nested directories may contain different models with the same base filename.

Requirements
- The based_on field stores file paths relative to the YAML the field is stored in (without extensions).
- run_log() output is structured in such a way that the user can trace model ancestry unambiguously.

User should be able to specify the indices associated with $OMEGA and $SIGMA blocks that are parsed from the control stream with the param_labels()
function. This is necessary for, among other things, joining the parameter labels to the parameter estimates (the tibble output from model_summary() %>% param_estimates()
).
Requirements
- Indices included in the param_labels() output tibble
- An apply_indices() helper to add indices to OMEGA and SIGMA blocks for joining to param_estimates() output
- A block() helper for creating logical vectors that specify which elements are diagonal

A user should be able to set a global modeling directory so that they do not need to pass the modeling directory path to each modeling function call. If this global path is set, a user should also be able to specify a model with numeric input, which would be converted to a file path by adding it to the end of the specified global model directory.
Requirements
- Numeric input is accepted by read_model(), new_model(), copy_model_from(), submit_model(), or model_summary().

As a user, I would like to be able to summarize multiple models in batch and have some subset of the information in those summaries extracted into a tibble similar to bbi_run_log_df (the tibble output from run_log()). I would also like to be able to easily append that table onto a bbi_run_log_df.
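The global-modeling-directory behavior described earlier could be handled with a small resolver. The option name "rbabylon.model_directory" appears elsewhere in these issues, but resolve_model_path() itself is purely illustrative:

```r
# Illustrative sketch of resolving numeric model input against a global
# modeling directory. The option name "rbabylon.model_directory" appears
# elsewhere in these issues; resolve_model_path() itself is hypothetical.
resolve_model_path <- function(.mod) {
  if (is.numeric(.mod)) {
    root <- getOption("rbabylon.model_directory")
    if (is.null(root)) {
      stop("Numeric model input requires options(rbabylon.model_directory = ...) to be set")
    }
    return(file.path(root, .mod))
  }
  .mod
}
```

Each modeling function (read_model(), submit_model(), etc.) could call such a resolver on its first argument, so character paths continue to pass through untouched.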
Model metadata is stored in a YAML file that is read when the model is passed to submit_model(). This will be the preferred (recommended) method because it will track added metadata along with the model run. The YAML contains the following fields:
- model_path -- path to the control stream file
- description -- text description of this modeling run. This is analogous to the $PROBLEM line in the control stream.
- based_on -- model run id's for any model that this model is "based on" or inherits from
- tags -- user-defined tags for grouping and organizing model runs
- bbi_args -- any arguments that can be passed through to bbi via the submit_model(.args = list()) argument can also be passed here. Arguments passed in the YAML will override arguments passed in the function call. Both will override any default or CLI arguments.

When a model is copied:
- The $PROBLEM line will be filled with the new model run id and description. The rest of the new control stream file will be an exact copy of the parent.
- The model_path field in the new YAML will point to the newly copied control stream.
- The description field in the new YAML will be filled with the user-provided description.
- The based_on field in the new YAML will be filled with the run id from the parent model.
- Tags can optionally be carried into the tags field in the new model YAML.

A user can use either a .yaml or .yml file for the model file.
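For concreteness, a hypothetical model spec using the fields listed above might look like the following. The specific values, and especially the bbi_args keys shown, are illustrative only, not a statement of bbi's actual flags:

```yaml
# Illustrative model spec; field names follow the list above.
model_path: 102.ctl
description: new model desc...
based_on:
  - "101"
tags:
  - iteration 3
bbi_args:
  overwrite: true
  threads: 4
```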
Requirements
Currently some of the unit tests are failing on the 2020-06-08 MPN snapshot, both in R 4.0 and R 3.6. We need to do the following:
The relevant docker container from drone is here:

```shell
docker pull 906087756158.dkr.ecr.us-east-1.amazonaws.com/mpn-complete:2020-06-08
```
We had previously been using 2020-03-24
in drone, but we suspect it started failing between the 2020-05-09
and 2020-05-27
snapshots.
A user should be prevented from using a version of bbi
that is too old to be compatible with the version of rbabylon
they are using.
The tibble output from run_log()
contains all information necessary to call model_summary()
on each model in the log. The model_summaries()
function should have a method that accepts a run_log tibble and gets the summary for each model.
There are two obvious options, and it's possible that we want both:
- list_of_models %>% model_summaries()
- A summary_log() function that would return a tibble containing relevant information (objective function value, parameter estimates, etc.) for each model in the input tibble. @janelleh had something like that in a Google Sheet that she had created custom and, I believe, is currently filling manually.

The user should be able to check on the status of model runs from rbabylon and reliably get one of the following responses: not submitted, in queue, running, finished, failed.
- The status-checking function (e.g. check_nonmem_progress) should accept either a model object or a valid path to the model (YAML, ctl, or output dir).
- model_summary() should call this under the hood and return a meaningful response for models that are not successfully finished.
- run_log() should call this under the hood and add a column for model_status reflecting the status of each model. Consider if this can/should be done in parallel.
- This will likely require looking at output folders, as well as something like qstat (or maybe top for the local models?). It is not clear how this should be implemented, but it will likely require some consultation with both the scientists and @shairozan to consider all the edge cases.
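The file-system portion of that check might look something like the sketch below. The function name comes from the issue; the marker files (bbi_config.json, OUTPUT) and the mapping to statuses are assumptions, and distinguishing failed from queued would still need qstat or similar:

```r
# Illustrative sketch of the file-based part of the status check. The
# marker files and status mapping here are assumptions, not a final design;
# distinguishing "failed" and grid queueing would need qstat or similar.
check_nonmem_progress <- function(output_dir) {
  if (!dir.exists(output_dir)) return("not submitted")
  if (file.exists(file.path(output_dir, "bbi_config.json"))) return("finished")
  if (file.exists(file.path(output_dir, "OUTPUT"))) return("running")
  "in queue"
}
```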
- Creates a babylon.yaml file in a specified directory
- Wraps the bbi init CLI command and correctly parses through the NONMEM directory

The code that builds a model object should contain a model_type field that will be used to specify different classes for bbi_nonmem_model and bbi_stan_model. The model_type field will be required to be in a YAML for it to be a valid model spec file.
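A minimal sketch of dispatching on that field when building the model object. create_model_object() and the extra bbi_model parent class are illustrative, and only the model_type values implied by the class names above are handled:

```r
# Sketch: assign the model class from the model_type field of the YAML spec.
# The "nonmem" / "stan" values and the shared bbi_model parent class are
# assumptions for illustration.
create_model_object <- function(spec) {
  if (is.null(spec$model_type)) {
    stop("Model spec must contain a model_type field")
  }
  cls <- switch(spec$model_type,
    nonmem = "bbi_nonmem_model",
    stan   = "bbi_stan_model",
    stop("Unsupported model_type: ", spec$model_type)
  )
  structure(spec, class = c(cls, "bbi_model", "list"))
}
```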
Create a function which takes a run_results
object and outputs a parameter table. Ideally can be implemented in a pipeline using purrr::map
to process multiple runs in parallel.
Proposed syntax:

```r
param_tab(res)
read_results(101) %>% param_tab()
param_tab(res, latex = TRUE)
```
Input:
run_results
object created from read_results()
Output:
Table containing pertinent parameter information for the given run. Default output style is data.frame
, can alternatively request LaTeX formatted table.
Example row:
Parameter | Unit | Struct | Estimate | 95% CI
--- | --- | --- | --- | ---
CL | L/hr | \theta_{1} | 1.2 | [0.9 - 1.4]
Future extensions:
There are open questions about how to flexibly include different amounts of information: for example, easily generating a parameter table for the structural parameters separate from a parameter table for covariate effects only. This will be dealt with in future issues.
A user should be able to use the paths included in the run_log()
output to unambiguously find model input and output files. A user should be able to do this, even when the models were run on a different machine or in a different directory than the one in which they are currently being read (by run_log()
).
A user should be able to create a bbi_nonmem_model
object either by specifying a path to a control stream and giving a description, or by pointing to a valid YAML model spec file.
This model object should be able to do the following:
- Be passed to submit_model() in order to trigger a NONMEM run
- Be passed to model_summary() to gather output information on a finished model run

There will also eventually be a bbi_stan_model class that will conform to the same interface.

Results may be generated after a call to submit_model(); however, for long-running processes or modeling jobs queued to the grid, having the result object from the submission is not always possible.
A user should be able to get to the summary by passing the original model object that was passed to submit_model()
or just a path to the output directory.
User should be able to make modifications to the model object and relevant YAML file with composable functions that take either the model object or a character string pointing to the YAML. The following fields should be able to be modified:
Currently, add_tags()
and other similar helper functions built on modify_model_field
expect either a single character object or a vector of characters, e.g. mod %>% add_tags(c("tag1", "tag2"))
. It would be nice to use the following syntax instead: mod %>% add_tags("tag1", "tag2")
Use ...
and typecast to character vector to evaluate all arguments as potential tags.
```r
add_tags <- function(.mod, ...) {
  .mod <- modify_model_field(.mod = .mod,
                             .field = YAML_TAGS,
                             .value = c(...),
                             .append = TRUE)
  return(.mod)
}
```
- Builds the bbi call (e.g. bbi nonmem run [model]) and passes it to a subprocess
- Runs in either local or sge mode
- Takes a path to a babylon.yaml, falling back to looking in the model directory by default
- bbi_nonmem_spec
User should be able to specify a custom name for the .ext file when calling model_summary(). This should be passed in via .bbi_args.
Currently when you run submit_model() on a model which has already run (i.e. a results folder exists in your model directory), the model is not run. No warning is given that the model is not being submitted to run. The only way to get around this is to delete the results folder.

- An .overwrite argument to submit_model() would be useful for this situation. This argument would default to FALSE to prevent people from accidentally overwriting results.
- A warning should be raised when submit_model(..., .overwrite = FALSE) is run but the model will not be submitted due to a results folder already existing.

The tibble output by run_log()
currently contains many list columns (for example, tags and decisions) which can have multiple entries for a single row. This makes it difficult to interact with the log without doing some dplyr gymnastics. It is likely that the vctrs
package could be a good approach for this.
The run_log()
output tibble should have methods like filter()
(others?) that can operate on these list columns. For example:
run_log() %>% has_tag("iteration 3")
or run_log() %>% filter("iteration 3" %in% tags)
should return the rows in the run log that contain the tag "iteration 3"
. We need to decide which paradigm (implementing filter()
method or having our own methods like has_tag()
) is a better approach.

As mentioned above, the vctrs package seems like it could be a good option for implementing this. Specifically, run_log() should be modified to have columns that are vctrs objects instead of list columns. We should investigate that and, if it does seem like a good fit, make recommendations for where else this could be used.
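A minimal base-R sketch of the has_tag() idea, assuming the log's tags column is a list column of character vectors. The function name comes from the issue; the implementation is illustrative only:

```r
# Illustrative has_tag() helper from the discussion above: keep rows of a
# run-log data frame whose list column `tags` contains the given tag.
# The function name comes from the issue; the implementation is a sketch.
has_tag <- function(.log_df, .tag) {
  keep <- vapply(.log_df$tags, function(x) .tag %in% x, logical(1))
  .log_df[keep, , drop = FALSE]
}
```

With this in place, run_log() %>% has_tag("iteration 3") works without any dplyr gymnastics over the list column.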
A user should be able to pass a list of model objects (or a character vector of paths) to the submit_models() function. The implemented code should convert them so as to call bbi in as few system calls as possible.
- Accepts either a _result object or a file path

Create a function with similar syntax to the existing jsonlite::read_json() function which takes a JSON object output from Babylon as its argument and creates a run_results object.
Proposed syntax:

```r
res <- read_results("examples/101.json")
res <- read_results(101, dir = "examples")
```
Input:
A JSON file containing information from a NONMEM run. This will be produced by a call to `bbi nonmem summary --json`.
Output:
A list of lists for post-processing: theta_df, omega_df, sigma_df, run_details, etc.
Converting each of these to separate data.frames suitable for downstream use (e.g. get_estimate_table(), get_shrinkage_table()) will be part of future issues.
A user should be able to call copy_model_from()
without worrying about accidentally overwriting an existing model file.
Requirements
#79 could just be a specialized case of this general functionality
As a user, I would like to be able to summarize multiple models in batch.
The model YAML must serve as a "source of truth" for the data that it stores about the model. The user needs to be sure that any model object held in memory is up-to-date with the YAML on disk.
Requirements
- A reconcile_yaml() function that can be used to pull changes from the YAML into the model object.

Just an idea that came out of our meeting today (@riggsmm and Adam L):
Something like: document records that get commented out at run time with `C`. Or maybe check to see if the data set has changed and document then?
Calling submit_model(.wait = FALSE) dies with the following error:

```
Error in strict_mode_error(err_msg) :
  Process object must have the following named elements to be converted to an S3 object of class `babylon_process`: `process, stdout, bbi, cmd_args, working_dir` but the following keys are missing: `stdout`
Object has the following keys: process, bbi, cmd_args, working_dir
```
When a model is still running and model_summary() is called on the results object, the OUTPUT file is tailed multiple times until either 1) the model completes, or 2) the number of attempts (30?) is exceeded. This results in the tail of the OUTPUT file being continuously printed to the console:
```
iteration 324 MCMCOBJ= 29416.952359567695
iteration 325 MCMCOBJ= 29413.373263451384
iteration 326 MCMCOBJ= 29490.598695438199
iteration 327 MCMCOBJ= 29430.406526993076
iteration 328 MCMCOBJ= 29416.239276600747
iteration 329 MCMCOBJ= 29428.134365432856
iteration 330 MCMCOBJ= 29445.970213943467
iteration 331 MCMCOBJ= 29435.262195693984
---
Model is still running. Tail of `106/OUTPUT` file:
---
...
iteration 322 MCMCOBJ= 29399.495862674936
iteration 323 MCMCOBJ= 29437.381530674444
iteration 324 MCMCOBJ= 29416.952359567695
iteration 325 MCMCOBJ= 29413.373263451384
iteration 326 MCMCOBJ= 29490.598695438199
iteration 327 MCMCOBJ= 29430.406526993076
iteration 328 MCMCOBJ= 29416.239276600747
iteration 329 MCMCOBJ= 29428.134365432856
iteration 330 MCMCOBJ= 29445.970213943467
iteration 331 MCMCOBJ= 29435.262195693984
---
Model is still running. Tail of `106/OUTPUT` file:
---
```
If the model is still running, the tail of OUTPUT should only be printed a single time.
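One way to fix this is to track whether the tail has already been printed while polling. The sketch below is illustrative: wait_for_model() is a hypothetical name, and detecting completion via a "Stop Time" line in OUTPUT is an assumption about what NONMEM writes there.

```r
# Sketch: poll a running model, printing the tail of OUTPUT only once.
# wait_for_model() is illustrative; detecting completion by a "Stop Time"
# line in OUTPUT is an assumption about the file's contents.
wait_for_model <- function(output_file, max_tries = 30, sleep_sec = 1) {
  printed <- FALSE
  for (i in seq_len(max_tries)) {
    if (!file.exists(output_file)) break
    lines <- readLines(output_file, warn = FALSE)
    if (any(grepl("Stop Time", lines, fixed = TRUE))) return(invisible(TRUE))
    if (!printed) {
      message("Model is still running. Tail of `", output_file, "` file:")
      message(paste(utils::tail(lines, 10), collapse = "\n"))
      printed <- TRUE  # guard: never print the tail again on later polls
    }
    Sys.sleep(sleep_sec)
  }
  invisible(FALSE)
}
```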
@shairozan can you think of a good way to test whether the flags here are in sync with what's in babylon? Like, is there a way for bbi --help
to return something machine readable that I could test against?
@dpastoor is that kind of testing necessary? It does seem like we should have some way to make sure these flags stay in sync, but it's not obvious to me how to do it.
The user should only need to create one babylon.yaml configuration file, and it should be found by all bbi
calls that need it automatically (if it is in the model directory).
Requirements:
- If the user passes a path to a babylon.yaml, rbabylon will look for it (and error if it can't be found) before calling out to bbi.
- If no path is passed, babylon.yaml will be looked for in getOption("rbabylon.model_directory").
- The user should not need to pass the config to submit_model() calls explicitly.

The model spec YAML contains the following fields:
- model_path -- character scalar with path to the control stream
- description -- character scalar with text description of the model
- based_on -- (optional) list of model run id's that are parents of the model. If not provided, the model is assumed to be a root node.
- tags -- (optional) list of any user tags associated with the model
- bbi_args -- (optional) list of arguments passed through to babylon for this model run

The run log should contain a run_id column that can be mapped to any mention of a given model in the based_on column.

The link at the bottom of the README page links to the below page, which cannot be found:
User should be able to parse parameter labels from control stream, similar to deprecated functionality from tidynm package.
Requirements
per discussion
https://www.mail-archive.com/[email protected]/msg06885.html
In a model where the $SIGMA starting values were not specified as a block, displaying the parameter estimates with the following command resulted in the table below. After talking with Matt and @callistosp, the inclusion of SIGMA(2,1) is not the preferred behavior.
```r
model_summary(mod104) %>%
  param_estimates()
```
names | estimate | stderr | fixed
--- | --- | --- | ---
OMEGA(3,3) | 0.1693070 | 1.97401e-02 | 0
SIGMA(1,1) | 0.0397749 | 1.25788e-03 | 0
SIGMA(2,1) | 0.0000000 | 1.00000e+10 | 1
SIGMA(2,2) | 0.8410060 | 2.87706e+00 | 0
In the .ctl file the starting values for $SIGMA were not specified as a block:

```
$SIGMA
0.05   ; 1 pro error
0.05   ; 2 add error
```
```
FINAL PARAMETER ESTIMATE
SIGMA - CORR MATRIX FOR RANDOM EFFECTS - EPSILONS ***
            EPS1      EPS2
 EPS1
+        1.99E-01
 EPS2
+        0.00E+00  9.17E-01

STANDARD ERROR OF ESTIMATE
SIGMA - CORR MATRIX FOR RANDOM EFFECTS - EPSILONS ***
            EPS1      EPS2
 EPS1
+        3.15E-03
 EPS2
+        ......... 1.57E+00
```
As a user, I would like the bbi_config_log_df
tibble to contain two new columns which verify (using the md5 digests and file paths already present in that tibble) whether the file at the relevant file path matches the md5 digest stored in the tibble. If not, this would indicate that the model is "out-of-date" and should be refreshed in the appropriate way (i.e. probably re-run).
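Those two columns could be computed directly from the digests already in the tibble. The sketch below assumes hypothetical column names (model_path / model_md5, data_path / data_md5), since the issue does not specify them:

```r
# Illustrative "out-of-date" check: re-hash the files on disk and compare
# against the md5 digests stored in the config log. The column names here
# are assumptions; tools::md5sum() is base R.
add_md5_checks <- function(config_df) {
  config_df$model_has_changed <-
    unname(tools::md5sum(config_df$model_path)) != config_df$model_md5
  config_df$data_has_changed <-
    unname(tools::md5sum(config_df$data_path)) != config_df$data_md5
  config_df
}
```

A TRUE in either new column would flag the row as out-of-date and in need of a re-run.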
Additional testing/banging on using a single root, where the model path is relative to that root, e.g. submit_model("subdir/model").

Given there should be no reason you can't just give submit_models() one model, is there a reason to keep submit_model()?
Case: a user calls run_log() to aggregate all models, then wants to extract key information out of the model(s). Currently this could be done with something along the lines of:

```r
run_log() %>%
  pull(absolute_model_path) %>%
  map_df(~ read_model(.x) %>% model_summary())
```

In this case, the model YAML needs to be processed twice. It's not that much time, however given the case of multiple pipes or otherwise it could add up unnecessarily.

New:

```r
run_log() %>%
  pull(.mod) %>%
  map_df(~ model_summary(.x))
```

Better, given model_summary() can handle a vector/list of model objects:

```r
run_log() %>% pull(.mod) %>% model_summary()
```