metrumresearchgroup / bbr
R interface for model and project management
Home Page: https://metrumresearchgroup.github.io/bbr/
License: Other
User should have comprehensive documentation of package functions and their usage. Documentation should satisfy the following requirements:
Implementation Details
- Function documentation written with the roxygen2 package
- HTML documentation built with the pkgdown package and hosted through GitHub at a site linked in the rbabylon README

No tests necessary because this only touches documentation.
As a developer, I would like to know that all rbabylon
tests pass on all three of the following:
Currently the "oldest MPN snapshot that is supported" is 2020-03-24. Note that this does not mean earlier snapshots or versions of dependencies will not work, only that this is the oldest snapshot that has been tested and guaranteed to work.
Currently copy_model_from() optionally updates the $PROB line to reflect the new description and run_id. @callistosp has voiced that it would be nice if it would also update all other occurrences of the run_id throughout the control stream. He has some ad-hoc code which does this:

```r
txt <- readLines(mod2) %>%
  gsub(paste0('RUN# ', runno), paste0('RUN# ', runno2), .) %>%
  gsub(paste0(runno, '.MSF'), paste0(runno2, '.MSF'), .) %>%
  gsub(paste0(runno, '.ext'), paste0(runno2, '.ext'), .) %>%
  gsub(paste0(runno, '.tab'), paste0(runno2, '.tab'), .) %>%
  gsub(paste0(runno, 'par.tab'), paste0(runno2, 'par.tab'), .)
```
We need to consider whether we want this to be in the copy_model_from() function, or as a helper that could be used like:

```r
mod102 <- copy_model_from(mod101, .new_model = "102", .description = "new model desc...") %>%
  update_mod_file()
```

We also need to consider that the run_id will be in different places in the file depending on the estimation method, etc.
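That ad-hoc code could be generalized into the helper discussed above. A minimal sketch follows: update_mod_file() is the hypothetical name from the proposed pipeline, and replace_run_id() is an illustrative internal that is not part of any current rbabylon API.

```r
# Sketch of the proposed helper. update_mod_file() is the hypothetical name
# from the pipeline above; replace_run_id() is an illustrative internal that
# generalizes the ad-hoc gsub chain. Not part of the current rbabylon API.
replace_run_id <- function(txt, parent_id, new_id) {
  patterns <- c(
    paste0("RUN# ", parent_id),
    paste0(parent_id, ".MSF"),
    paste0(parent_id, ".ext"),
    paste0(parent_id, ".tab"),
    paste0(parent_id, "par.tab")
  )
  for (pattern in patterns) {
    replacement <- sub(parent_id, new_id, pattern, fixed = TRUE)
    txt <- gsub(pattern, replacement, txt, fixed = TRUE)
  }
  txt
}

update_mod_file <- function(ctl_path, parent_id, new_id) {
  writeLines(replace_run_id(readLines(ctl_path), parent_id, new_id), ctl_path)
  invisible(ctl_path)
}
```

Keeping the substitution logic in a pure function like replace_run_id() would make the estimation-method edge cases easy to unit test against example control streams.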
User should be able to use the based_on field to unambiguously refer to models, even in cases where nested directories may contain different models with the same base filename.

Requirements
- The based_on field stores file paths relative to the YAML the field is stored in (without extensions).
- run_log() output is structured in such a way that the user can trace model ancestry unambiguously.

User should be able to specify the indices associated with $OMEGA and $SIGMA blocks that are parsed from the control stream with the param_labels()
function. This is necessary for, among other things, joining the parameter labels to the parameter estimates (the tibble output from model_summary() %>% param_estimates()
).
Requirements
- Indices included in the param_labels() output tibble
- An apply_indices() helper to add indices to OMEGA and SIGMA blocks for joining to param_estimates() output
- A block() helper for creating logical vectors that specify which elements are diagonal

A user should be able to set a global modeling directory so that they do not need to pass the modeling directory path to each modeling function call. If this global path is set, a user should also be able to specify a model with numeric input, which would be converted to a file path by adding it to the end of the specified global model directory.
Requirements
- Numeric input is accepted by read_model(), new_model(), copy_model_from(), submit_model(), or model_summary().

As a user, I would like to be able to summarize multiple models in batch and have some subset of the information in those summaries extracted into a tibble similar to bbi_run_log_df (the tibble output from run_log()). I would also like to be able to easily append that table onto a bbi_run_log_df.
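The global-modeling-directory behavior described earlier could be handled with a small resolver. The option name "rbabylon.model_directory" appears elsewhere in these issues, but resolve_model_path() itself is purely illustrative:

```r
# Illustrative sketch of resolving numeric model input against a global
# modeling directory. The option name "rbabylon.model_directory" appears
# elsewhere in these issues; resolve_model_path() itself is hypothetical.
resolve_model_path <- function(.mod) {
  if (is.numeric(.mod)) {
    root <- getOption("rbabylon.model_directory")
    if (is.null(root)) {
      stop("Numeric model input requires options(rbabylon.model_directory = ...) to be set")
    }
    return(file.path(root, .mod))
  }
  .mod
}
```

Each modeling function (read_model(), submit_model(), etc.) could call such a resolver on its first argument, so character paths continue to pass through untouched.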
Model metadata is stored in a YAML file that is read when the model is passed to submit_model(). This will be the preferred (recommended) method because it will track added metadata along with the model run. The YAML contains the following fields:
- model_path -- path to the control stream file
- description -- text description of this modeling run. This is analogous to the $PROBLEM line in the control stream.
- based_on -- model run id's for any model that this model is "based on" or inherits from
- tags -- user-defined tags for grouping and organizing model runs
- bbi_args -- any arguments that can be passed through to bbi via the submit_model(.args = list()) argument can also be passed here. Arguments passed in the YAML will override arguments passed in the function call. Both will override any default or CLI arguments.

When a model is copied:
- The $PROBLEM line will be filled with the new model run id and description. The rest of the new control stream file will be an exact copy of the parent.
- The model_path field in the new YAML will point to the newly copied control stream.
- The description field in the new YAML will be filled with the user-provided description.
- The based_on field in the new YAML will be filled with the run id from the parent model.
- Tags can optionally be carried into the tags field in the new model YAML.

A user can use either a .yaml or .yml file for the model file.
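For concreteness, a hypothetical model spec using the fields listed above might look like the following. The specific values, and especially the bbi_args keys shown, are illustrative only, not a statement of bbi's actual flags:

```yaml
# Illustrative model spec; field names follow the list above.
model_path: 102.ctl
description: new model desc...
based_on:
  - "101"
tags:
  - iteration 3
bbi_args:
  overwrite: true
  threads: 4
```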
Requirements
Currently some of the unit tests are failing on the 2020-06-08 MPN snapshot, both in R 4.0 and R 3.6. We need to do the following:
The relevant docker container from drone is here:

```shell
docker pull 906087756158.dkr.ecr.us-east-1.amazonaws.com/mpn-complete:2020-06-08
```
We had previously been using 2020-03-24
in drone, but we suspect it started failing between the 2020-05-09
and 2020-05-27
snapshots.
A user should be prevented from using a version of bbi
that is too old to be compatible with the version of rbabylon
they are using.
The tibble output from run_log()
contains all information necessary to call model_summary()
on each model in the log. The model_summaries()
function should have a method that accepts a run_log tibble and gets the summary for each model.
There are two obvious options, and it's possible that we want both:
- list_of_models %>% model_summaries()
- A summary_log() function that would return a tibble containing relevant information (objective function value, parameter estimates, etc.) for each model in the input tibble. @janelleh had something like that in a Google Sheet that she had created custom and, I believe, is currently filling manually.

The user should be able to check on the status of model runs from rbabylon and reliably get one of the following responses: not submitted, in queue, running, finished, failed.
- The status-checking function (e.g. check_nonmem_progress) should accept either a model object or a valid path to the model (YAML, ctl, or output dir).
- model_summary() should call this under the hood and return a meaningful response for models that are not successfully finished.
- run_log() should call this under the hood and add a column for model_status reflecting the status of each model. Consider if this can/should be done in parallel.
- This will likely require looking at output folders, as well as something like qstat (or maybe top for the local models?). It is not clear how this should be implemented, but it will likely require some consultation with both the scientists and @shairozan to consider all the edge cases.
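The file-system portion of that check might look something like the sketch below. The function name comes from the issue; the marker files (bbi_config.json, OUTPUT) and the mapping to statuses are assumptions, and distinguishing failed from queued would still need qstat or similar:

```r
# Illustrative sketch of the file-based part of the status check. The
# marker files and status mapping here are assumptions, not a final design;
# distinguishing "failed" and grid queueing would need qstat or similar.
check_nonmem_progress <- function(output_dir) {
  if (!dir.exists(output_dir)) return("not submitted")
  if (file.exists(file.path(output_dir, "bbi_config.json"))) return("finished")
  if (file.exists(file.path(output_dir, "OUTPUT"))) return("running")
  "in queue"
}
```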
- Creates a babylon.yaml file in a specified directory
- Wraps the bbi init CLI command and correctly parses through the NONMEM directory

The code that builds a model object should contain a model_type field that will be used to specify different classes for bbi_nonmem_model and bbi_stan_model. The model_type field will be required to be in a YAML for it to be a valid model spec file.
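A minimal sketch of dispatching on that field when building the model object. create_model_object() and the extra bbi_model parent class are illustrative, and only the model_type values implied by the class names above are handled:

```r
# Sketch: assign the model class from the model_type field of the YAML spec.
# The "nonmem" / "stan" values and the shared bbi_model parent class are
# assumptions for illustration.
create_model_object <- function(spec) {
  if (is.null(spec$model_type)) {
    stop("Model spec must contain a model_type field")
  }
  cls <- switch(spec$model_type,
    nonmem = "bbi_nonmem_model",
    stan   = "bbi_stan_model",
    stop("Unsupported model_type: ", spec$model_type)
  )
  structure(spec, class = c(cls, "bbi_model", "list"))
}
```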
Create a function which takes a run_results
object and outputs a parameter table. Ideally can be implemented in a pipeline using purrr::map
to process multiple runs in parallel.
Proposed syntax:

```r
param_tab(res)
read_results(101) %>% param_tab()
param_tab(res, latex = TRUE)
```
Input:
run_results
object created from read_results()
Output:
Table containing pertinent parameter information for the given run. Default output style is data.frame
, can alternatively request LaTeX formatted table.
Example row:
Parameter | Unit | Struct | Estimate | 95% CI
--- | --- | --- | --- | ---
CL | L/hr | \theta_{1} | 1.2 | [0.9 - 1.4]
Future extensions:
There are open questions about how to flexibly include different amounts of information: for example, easily generating a parameter table for the structural parameters separate from a parameter table for covariate effects only. This will be dealt with in future issues.
A user should be able to use the paths included in the run_log()
output to unambiguously find model input and output files. A user should be able to do this, even when the models were run on a different machine or in a different directory than the one in which they are currently being read (by run_log()
).
A user should be able to create a bbi_nonmem_model
object either by specifying a path to a control stream and giving a description, or by pointing to a valid YAML model spec file.
This model object should be able to do the following:
- Be passed to submit_model() in order to trigger a NONMEM run
- Be passed to model_summary() to gather output information on a finished model run

There will also eventually be a bbi_stan_model class that will conform to the same interface.

Results may be generated after a call to submit_model(); however, for long-running processes or modeling jobs queued to the grid, having the result object from the submission is not always possible.
A user should be able to get to the summary by passing the original model object that was passed to submit_model()
or just a path to the output directory.
User should be able to make modifications to the model object and relevant YAML file with composable functions that take either the model object or a character string pointing to the YAML. The following fields should be able to be modified:
Currently, add_tags()
and other similar helper functions built on modify_model_field
expect either a single character object or a vector of characters, e.g. mod %>% add_tags(c("tag1", "tag2"))
. It would be nice to use the following syntax instead: mod %>% add_tags("tag1", "tag2")
Use ...
and typecast to character vector to evaluate all arguments as potential tags.
```r
add_tags <- function(.mod, ...) {
  .mod <- modify_model_field(.mod = .mod,
                             .field = YAML_TAGS,
                             .value = c(...),
                             .append = TRUE)
  return(.mod)
}
```
- Builds the bbi call (e.g. bbi nonmem run [model]) and passes it to a subprocess
- Runs in either local or sge mode
- Takes a path to a babylon.yaml, falling back to looking in the model directory by default
- bbi_nonmem_spec
User should be able to specify a custom name for the .ext file when calling model_summary(). This should be passed in via .bbi_args.
Currently when you run submit_model() on a model which has already run (i.e. a results folder exists in your model directory), the model is not run. No warning is given that the model is not being submitted to run. The only way to get around this is to delete the results folder.

- An .overwrite argument to submit_model() would be useful for this situation. This argument would default to FALSE to prevent people from accidentally overwriting results.
- A warning should be raised when submit_model(..., .overwrite = FALSE) is run but the model will not be submitted due to a results folder already existing.

The tibble output by run_log()
currently contains many list columns (for example, tags and decisions) which can have multiple entries for a single row. This makes it difficult to interact with the log without doing some dplyr gymnastics. It is likely that the vctrs
package could be a good approach for this.
The run_log()
output tibble should have methods like filter()
(others?) that can operate on these list columns. For example:
run_log() %>% has_tag("iteration 3")
or run_log() %>% filter("iteration 3" %in% tags)
should return the rows in the run log that contain the tag "iteration 3"
. We need to decide which paradigm (implementing filter()
method or having our own methods like has_tag()
) is a better approach.

As mentioned above, the vctrs package seems like it could be a good option for implementing this. Specifically, run_log() should be modified to have columns that are vctrs objects instead of list columns. We should investigate that and, if it does seem like a good fit, make recommendations for where else this could be used.
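A minimal base-R sketch of the has_tag() idea, assuming the log's tags column is a list column of character vectors. The function name comes from the issue; the implementation is illustrative only:

```r
# Illustrative has_tag() helper from the discussion above: keep rows of a
# run-log data frame whose list column `tags` contains the given tag.
# The function name comes from the issue; the implementation is a sketch.
has_tag <- function(.log_df, .tag) {
  keep <- vapply(.log_df$tags, function(x) .tag %in% x, logical(1))
  .log_df[keep, , drop = FALSE]
}
```

With this in place, run_log() %>% has_tag("iteration 3") works without any dplyr gymnastics over the list column.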
A user should be able to pass a list of model objects (or a character vector of paths) to the submit_models() function. The implemented code should convert them so as to call bbi in as few system calls as possible.
- Accepts either a _result object or a file path

Create a function with similar syntax to the existing jsonlite::read_json() function which takes a JSON object output from Babylon as its argument and creates a run_results object.
Proposed syntax:

```r
res <- read_results("examples/101.json")
res <- read_results(101, dir = "examples")
```
Input:
A JSON file containing information from a NONMEM run. This will be produced by a call to `bbi nonmem summary --json`.
Output:
A list of lists for post-processing: theta_df, omega_df, sigma_df, run_details, etc.
Converting each of these to separate data.frames suitable for downstream use (e.g. get_estimate_table(), get_shrinkage_table()) will be part of future issues.
A user should be able to call copy_model_from()
without worrying about accidentally overwriting an existing model file.
Requirements
#79 could just be a specialized case of this general functionality
As a user, I would like to be able to summarize multiple models in batch.
The model YAML must serve as a "source of truth" for the data that it stores about the model. The user needs to be sure that any model object held in memory is up-to-date with the YAML on disk.
Requirements
- A reconcile_yaml() function that can be used to pull changes from the YAML into the model object.

Just an idea that came out of our meeting today (@riggsmm and Adam L):
Something like: document records that get commented out at run time with `C`. Or maybe check to see if the data set has changed and document then?
Calling submit_model(.wait = FALSE) dies with the following error:

```
Error in strict_mode_error(err_msg) :
  Process object must have the following named elements to be converted to an S3 object of class `babylon_process`: `process, stdout, bbi, cmd_args, working_dir` but the following keys are missing: `stdout`
Object has the following keys: process, bbi, cmd_args, working_dir
```
When a model is still running and model_summary() is called on the results object, the OUTPUT file is tailed multiple times until either 1) the model completes, or 2) the number of attempts (30?) is exceeded. This results in the tail of the OUTPUT file being continuously printed to the console:
```
iteration 324 MCMCOBJ= 29416.952359567695
iteration 325 MCMCOBJ= 29413.373263451384
iteration 326 MCMCOBJ= 29490.598695438199
iteration 327 MCMCOBJ= 29430.406526993076
iteration 328 MCMCOBJ= 29416.239276600747
iteration 329 MCMCOBJ= 29428.134365432856
iteration 330 MCMCOBJ= 29445.970213943467
iteration 331 MCMCOBJ= 29435.262195693984
---
Model is still running. Tail of `106/OUTPUT` file:
---
...
iteration 322 MCMCOBJ= 29399.495862674936
iteration 323 MCMCOBJ= 29437.381530674444
iteration 324 MCMCOBJ= 29416.952359567695
iteration 325 MCMCOBJ= 29413.373263451384
iteration 326 MCMCOBJ= 29490.598695438199
iteration 327 MCMCOBJ= 29430.406526993076
iteration 328 MCMCOBJ= 29416.239276600747
iteration 329 MCMCOBJ= 29428.134365432856
iteration 330 MCMCOBJ= 29445.970213943467
iteration 331 MCMCOBJ= 29435.262195693984
---
Model is still running. Tail of `106/OUTPUT` file:
---
```
If the model is still running, the tail of OUTPUT should only be printed a single time.
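One way to fix this is to track whether the tail has already been printed while polling. The sketch below is illustrative: wait_for_model() is a hypothetical name, and detecting completion via a "Stop Time" line in OUTPUT is an assumption about what NONMEM writes there.

```r
# Sketch: poll a running model, printing the tail of OUTPUT only once.
# wait_for_model() is illustrative; detecting completion by a "Stop Time"
# line in OUTPUT is an assumption about the file's contents.
wait_for_model <- function(output_file, max_tries = 30, sleep_sec = 1) {
  printed <- FALSE
  for (i in seq_len(max_tries)) {
    if (!file.exists(output_file)) break
    lines <- readLines(output_file, warn = FALSE)
    if (any(grepl("Stop Time", lines, fixed = TRUE))) return(invisible(TRUE))
    if (!printed) {
      message("Model is still running. Tail of `", output_file, "` file:")
      message(paste(utils::tail(lines, 10), collapse = "\n"))
      printed <- TRUE  # guard: never print the tail again on later polls
    }
    Sys.sleep(sleep_sec)
  }
  invisible(FALSE)
}
```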
@shairozan can you think of a good way to test whether the flags here are in sync with what's in babylon? Like, is there a way for bbi --help
to return something machine readable that I could test against?
@dpastoor is that kind of testing necessary? It does seem like we should have some way to make sure these flags stay in sync, but it's not obvious to me how to do it.
The user should only need to create one babylon.yaml configuration file, and it should be found by all bbi
calls that need it automatically (if it is in the model directory).
Requirements:
- If the user passes a path to a babylon.yaml, rbabylon will look for it (and error if it can't be found) before calling out to bbi.
- If no path is passed, babylon.yaml will be looked for in getOption("rbabylon.model_directory").
- The user should not need to pass the config to submit_model() calls explicitly.

The model spec YAML contains the following fields:
- model_path -- character scalar with path to the control stream
- description -- character scalar with text description of the model
- based_on -- (optional) list of model run id's that are parents of the model. If not provided, the model is assumed to be a root node.
- tags -- (optional) list of any user tags associated with the model
- bbi_args -- (optional) list of arguments passed through to babylon for this model run

The run log should contain a run_id column that can be mapped to any mention of a given model in the based_on column.

The link at the bottom of the README page links to the below page, which cannot be found:
User should be able to parse parameter labels from control stream, similar to deprecated functionality from tidynm package.
Requirements
per discussion
https://www.mail-archive.com/[email protected]/msg06885.html
In a model where the $SIGMA starting values were not specified as a block, displaying the parameter estimates with the following command resulted in the table below. After talking with Matt and @callistosp, the inclusion of SIGMA(2,1) is not the preferred behavior.
```r
model_summary(mod104) %>%
  param_estimates()
```
names | estimate | stderr | fixed
--- | --- | --- | ---
OMEGA(3,3) | 0.1693070 | 1.97401e-02 | 0
SIGMA(1,1) | 0.0397749 | 1.25788e-03 | 0
SIGMA(2,1) | 0.0000000 | 1.00000e+10 | 1
SIGMA(2,2) | 0.8410060 | 2.87706e+00 | 0
In the .ctl file the starting values for $SIGMA were not specified as a block:

```
$SIGMA
0.05   ; 1 pro error
0.05   ; 2 add error
```
```
FINAL PARAMETER ESTIMATE
SIGMA - CORR MATRIX FOR RANDOM EFFECTS - EPSILONS ***
            EPS1      EPS2
 EPS1
+        1.99E-01
 EPS2
+        0.00E+00  9.17E-01

STANDARD ERROR OF ESTIMATE
SIGMA - CORR MATRIX FOR RANDOM EFFECTS - EPSILONS ***
            EPS1      EPS2
 EPS1
+        3.15E-03
 EPS2
+        ......... 1.57E+00
```
As a user, I would like the bbi_config_log_df
tibble to contain two new columns which verify (using the md5 digests and file paths already present in that tibble) whether the file at the relevant file path matches the md5 digest stored in the tibble. If not, this would indicate that the model is "out-of-date" and should be refreshed in the appropriate way (i.e. probably re-run).
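Those two columns could be computed directly from the digests already in the tibble. The sketch below assumes hypothetical column names (model_path / model_md5, data_path / data_md5), since the issue does not specify them:

```r
# Illustrative "out-of-date" check: re-hash the files on disk and compare
# against the md5 digests stored in the config log. The column names here
# are assumptions; tools::md5sum() is base R.
add_md5_checks <- function(config_df) {
  config_df$model_has_changed <-
    unname(tools::md5sum(config_df$model_path)) != config_df$model_md5
  config_df$data_has_changed <-
    unname(tools::md5sum(config_df$data_path)) != config_df$data_md5
  config_df
}
```

A TRUE in either new column would flag the row as out-of-date and in need of a re-run.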
Additional testing/banging on using a single root, where the model path is relative to that root, e.g. submit_model("subdir/model").

Given there should be no reason you can't just give submit_models() one model, is there a reason to keep submit_model()?
Case: a user calls run_log() to aggregate all models, then wants to extract key information out of the model(s). Currently this could be done with something along the lines of:

```r
run_log() %>%
  pull(absolute_model_path) %>%
  map_df(~ read_model(.x) %>% model_summary())
```

In this case, the model YAML needs to be processed twice. It's not that much time, however given the case of multiple pipes or otherwise it could add up unnecessarily.

New:

```r
run_log() %>%
  pull(.mod) %>%
  map_df(~ model_summary(.x))
```

Better, given model_summary() can handle a vector/list of model objects:

```r
run_log() %>% pull(.mod) %>% model_summary()
```