pharmaR / riskassessment

Risk Assessment Demo App: https://rinpharma.shinyapps.io/riskassessment

Home Page: https://pharmar.github.io/riskassessment/

License: Other

Languages: R 94.69%, TeX 3.65%, CSS 1.23%, JavaScript 0.36%, HTML 0.04%, Lua 0.03%
Topics: packages, r, shiny, validation

riskassessment's Introduction

The {riskassessment} application

[Badges: pharmaverse · Lifecycle: experimental · R-CMD-check · Coverage status]

riskassessment is an R package containing a shiny front-end that augments the utility of the riskmetric package within an organizational context. We're honored to announce that this app was recently awarded the title of "Best App" at Shiny Conf 2023 (see the Recognition section below).


{riskassessment} app

riskmetric is a framework to quantify an R package’s “risk of use” by assessing a number of meaningful metrics designed to evaluate package development best practices, code documentation, community engagement, and development sustainability. Together, the riskassessment app and the riskmetric package aim to provide some context for validation within regulated industries.

The app extends the functionality of riskmetric by allowing the reviewer to:

  • analyze riskmetric output without needing to write R code
  • contribute personalized comments on the value of individual metrics
  • categorize a package with an overall assessment (i.e., low, medium, or high risk), based either on subjective opinion or on user consensus tabulated after evaluating the metric output
  • download static reports containing the package risk score, metric outputs, reviewer summaries & comments, and more
  • store assessments in a database for future viewing and historical backup
  • authenticate users, with privileges & admin-defined roles governing user management and the tasks performed in the app

Echoing {riskmetric}'s Approach to Validation

Validation can serve as an umbrella for various terms, and admittedly, companies will diverge on what the "correct approach" may be. The riskassessment app is built on a riskmetric foundation, whose developers follow the validation philosophy proposed in this white paper published by the R Validation Hub. As such, riskassessment and riskmetric are only designed to support decision making from that viewpoint. Fully establishing the robustness and reliability of any software may (and likely will) require deeper inspection by the reviewing party.

Note: Development of both riskassessment and riskmetric was made possible thanks to the R Validation Hub, a collaboration to support the adoption of R within a biopharmaceutical regulatory setting.

Usage

If you are new to the riskassessment app, welcome! We'd highly encourage you to start by exploring the demo version of the app currently deployed on shinyapps.io. There, you'll find a number of pre-loaded packages just waiting to be assessed. Hands-on experience will help you become familiar with the general layout of the app as you poke around and explore.

With that said, you should immediately recognize that the app requires authentication, since its intended use is within an organization. There are several pre-defined roles, but the most important is the admin user. By default, the admin can add/delete users, download an entire copy of the database, modify the metric weights used in the calculation of risk scores, define custom decision categories, and set automation rules based on risk scores. The demo version of the app prompts you with instructions for logging in initially. However, if you launch the app.R file locally, the admin user will have to use the initial password QWERTY1. After logging in with this credential, the app will immediately prompt you to change your password and repeat the process with your new credentials.

If you want a quick overview of the project and a demo of the application, we highly recommend the following video walkthrough from Shiny Conf 2023, where riskassessment was voted "Best App" by conference attendees! The app was also featured at RStudio::Global 2021.


riskassessment at shinyConf 2023

Installation

We recommend running/deploying this application in a controlled development environment. Of course, you can install the latest version from GitHub using the code below, but doing so doesn't take into consideration other environment dependencies…

# DON'T RUN THIS CODE! There's a better way!
remotes::install_github("pharmaR/riskmetric")
remotes::install_github("pharmaR/riskassessment")

# Run the application 
riskassessment::run_app()

For example, what if you are using a different version of riskmetric than our dev team? In that case, the development team can't guarantee the app's stability, so we recommend you clone the repo's R project locally instead. Once cloned/forked, run the following code to take advantage of our renv.lock file, which sets up the project dependencies:

# First, clone the repo from GitHub, then...
# Get dependencies synced using {renv}
renv::activate()
renv::restore()

After this step is complete, you can simply run the contents of app.R to launch and/or deploy the application with default settings! For more information on our dev philosophy as it pertains to package management, please read the "Using renv" article. Then learn how to move the app's configuration away from the defaults by reading the "Deployment" guide, which covers how to use the app's configuration file to tailor the app to your needs.
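For reference, a typical golem-style app.R looks roughly like the sketch below (this assumes the repo follows the standard {golem} template; check the repo's actual app.R for the real contents):

# Sketch of a standard {golem} app.R (an assumption, not the repo's exact file)
pkgload::load_all(export_all = FALSE, helpers = FALSE, attach_testthat = FALSE)
options("golem.app.prod" = TRUE)
riskassessment::run_app()  # pass run_app() arguments here to change defaults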

User Guides and User Feedback

We're constantly improving the app and its documentation. Please explore the user guides developed to date, available on the riskassessment documentation site. Be sure to read the 'Get Started' tab and perhaps another article or two!

Of course, if you ever have specific feedback for the developers, or if you encounter a problem or bug within the app, we recommend opening a new issue on GitHub; we'll address it promptly.

We also want to align with our users on big-picture, strategic topics. Specifically, we want to hear from groups who've built (or are currently building) their R-package validation process, whether you use riskmetric / riskassessment or not! Ideally, our goal is to form a consensus among companies regarding their validation approach so we can make riskmetric and riskassessment better. For example, we'd love to understand how users are currently weighting the metrics used to calculate a package's risk score. We'd also love to learn whether companies leverage certain risk-score thresholds to make GxP-environment inclusion (or exclusion) decisions for a package. To facilitate gathering this information, we've created an incredibly brief questionnaire to let us know where you stand.

Deployment

As you might expect, certain deployment environments offer persistent storage and others do not. For example, shinyapps.io does not, which means even our demo app hosted there contains a package database that can't be permanently altered. That's a real limitation, since an organization needs to continually add new packages, publish comments, and make decisions about packages. Thus, we'd recommend exploring these deployment options (which allow persistent storage):

  • Shiny Server

  • Posit Connect

  • ShinyProxy

For more information on each of these, we highly recommend reading our ‘Deployment’ article.

Recognition

In March 2023, Appsilon hosted the 2nd annual Shiny Conf, a fully virtual event boasting approximately 4k registrants. Aaron Clark, package maintainer and R Validation Hub Executive member, presented the {riskassessment} app in the "Shiny Showcase" among 20+ other app submissions. At the end of the conference, {riskassessment} was awarded the title of "Best App" by popular vote.


Shiny Conf 2023 winner

riskassessment's People

Contributors

aaron-clark, anjanadevikondisetti, aravindfl1412, borgmaan, eduardodudu, imran3004, jeff-thompson12, marlycormar, mayagans, narayanan-iyer-pfizer, pandeyfission, robert-krajcik, scottschumacker, xyarz


riskassessment's Issues

Update code after covr_coverage is implemented in riskmetric

The covr_coverage metric hasn't been implemented yet in riskmetric. Once it is, we should review the app's code. In particular, we should review the following lines.

Saves coverage info to the db. Notice how the information is split, which may no longer be correct.

https://github.com/pharmaR/risk_assessment/blob/d148fae2b3a469d712a95384ccf562516fdf5a31/Modules/dbupload.R#L182-L186

Reads coverage info from the db. Notice how the information is split, which may no longer be correct.

https://github.com/pharmaR/risk_assessment/blob/d148fae2b3a469d712a95384ccf562516fdf5a31/Server/testing_metrics.R#L25

Download-all-reports section doesn't update

The section on downloading all reports doesn't update when another CSV file is uploaded.

How to reproduce the issue

  1. Run the app.
  2. Upload a CSV with a package not in the db.
  3. Repeat step 2.

Package data stored in database is stale / out of date

I haven't uploaded a CSV of packages to the app in quite some time. So, when working on the CommunityUsageMetrics tab (specifically, the Number of Downloads per Month plot), I noticed that when I selected the dplyr package, I was only getting results as recent as July 2020. I confirmed this by running the SQL query below, which produces the resulting data frame:

dplyr_cu <- db_fun("SELECT * FROM CommunityUsageMetrics WHERE cum_id ='dplyr'")

[screenshot: dplyr query result, with data only through July 2020]

I had a hunch that the db only seeks "fresh" data once I upload dplyr in a CSV on the Upload Package tab. I tested re-uploading, and nothing changed (probably because dplyr already exists in my db). However, when I uploaded a new package that doesn't yet exist in my db (stringr) and re-ran the SQL query, I found that more recent data populated the table, up to Nov 2020:

stringr_cu <- db_fun("SELECT * FROM CommunityUsageMetrics WHERE cum_id ='stringr'")

[screenshot: stringr query result, with data through Nov 2020]

The fact that the package data used by the app doesn't update may pose a problem when app users return to packages uploaded a long time ago to re-assess their risk. All of their metrics could have changed after a few months - not just community usage metrics.

So the question becomes: when do we update this information? We could determine whether the package data stored in the db is old, and if so, render a button that says "Update Metrics" on the side panel & the db dashboard. I don't think we'd want to update metrics automatically, since that would (1) waste resources and (2) completely erase the context in which the package was originally reviewed, if applicable.
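For illustration, a staleness check along these lines could gate such a button (a sketch; db_fun() is the app's existing query helper, while the date_added column on Packageinfo and the 30-day threshold are assumptions):

# Sketch: should we offer an "Update Metrics" button for this package?
metrics_are_stale <- function(pkg_name, max_age_days = 30) {
  res <- db_fun(paste0(
    "SELECT date_added FROM Packageinfo WHERE package = '", pkg_name, "'"
  ))
  if (nrow(res) == 0) return(TRUE)  # not in the db yet
  as.numeric(Sys.Date() - as.Date(res$date_added[1])) > max_age_days
}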

Thoughts?

Automate adding custom metrics

So far the app shows only the metrics that are hard-coded in the app. However, we would like to automatically show all the metrics riskmetric reports for a package.
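A sketch of what that could look like, using the pkg_ref() / pkg_assess() / pkg_score() pipeline from the riskmetric README (the exact columns of the scored result may vary across riskmetric versions):

library(riskmetric)

# Score a package with whatever metrics this riskmetric version provides
scored <- pkg_score(pkg_assess(pkg_ref("riskassessment")))

# One column per metric (plus bookkeeping columns such as package/version),
# so the UI could render names(scored) instead of a hard-coded metric list
names(scored)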

Overall app review

Check the app in its entirety

  • The downloaded html report doesn't look like the preview. Make them match (won't do for now, since the infoboxes from shinydashboard don't work well with rmarkdown).
  • Check wording in each tab.
  • Check wording in the help button and make changes accordingly.
  • Related issues: #20, #6.

Testing Metrics

I tried looking at the testing metrics for several packages, but this doesn't seem to work and only returns

Metric is not applicable for this source of package

Is this a feature yet to be added? Where is the app calling covr (or should riskmetric be doing that)?

i (?) - button next to Assessment Criteria

There's an i icon next to Assessment Criteria at the top left of the application. On hover, the i produces a question mark, but clicking it doesn't do anything. Is this intentional?

Package dependencies

Should we create a report for each dependency? Or should we leave this decision to riskmetric?

Selecting Testing Metrics tab returns error message

This is a bug, regardless of how we recode the testing_metrics.R and/or the tm_report.R

Warning in .testLength(param = num, len = 1, arg = arg) :
  NAs introduced by coercion
Warning: Error in if: missing value where TRUE/FALSE needed
  99: .testInterval
  98: amAngularGauge
  95: func
  82: origRenderFunc
  81: output$test_coverage
   1: runApp

Allow other pkg versions besides most recent version

Fix version numbering

  • The version dropdown should display the version(s) uploaded by the user (not the most recent version of the package).
  • The db should be able to store different versions of the same package. Hence, some tables (like Packageinfo) should be modified accordingly (e.g., Packageinfo should have a primary key - package name - and a secondary key - package version). The update and insert statements may need to change as well.

Is it necessary to store a DB?

I would like to open for discussion the idea of not including the DB in the repo.

Downsides of having the DB as part of the repo:

  • unintentionally committing the DB with user information,
  • an unnecessary extra file (we can always create the DB, or use an existing one, when the app is run; see the sketch below), and
  • the end user will most likely use a DB stored somewhere else.
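For reference, bootstrapping the database at startup is cheap with RSQLite, since it creates the file on first connect. A minimal sketch (the Packageinfo schema here is a placeholder, not the app's real one):

library(DBI)

# RSQLite creates the file if it doesn't exist, so the app can build
# its own database on first run instead of shipping one in the repo
con <- dbConnect(RSQLite::SQLite(), "risk_assessment_app.db")
if (!"Packageinfo" %in% dbListTables(con)) {
  dbExecute(con, "CREATE TABLE Packageinfo (package TEXT, version TEXT)")
}
dbDisconnect(con)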

Asynchronous assessment

Make the assessment asynchronous so that the user can see metrics for some of the packages while the rest load.
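One way to sketch this is with {future} and {promises}, assuming riskmetric's pkg_ref() / pkg_assess() pipeline (the shiny reactive wiring is omitted):

library(future)
library(promises)
plan(multisession)

# Run each package assessment in a background R session so the UI
# stays responsive; results resolve as they arrive
assess_async <- function(pkg_name) {
  future_promise({
    riskmetric::pkg_assess(riskmetric::pkg_ref(pkg_name))
  })
}

assess_async("dplyr") %...>% (function(assessment) {
  # update the relevant output / reactiveVal here
})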

Adding comment results in warning message

Warning in result_fetch(res@ptr, n = n) : SQL statements must be issued with dbExecute() or dbSendStatement() instead of dbGetQuery() or dbSendQuery().

In sidebar.R, the function db_fun() needs to be replaced with db_ins().
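The distinction DBI draws: dbGetQuery() is for statements that return rows (SELECT), while dbExecute() is for statements that don't (INSERT/UPDATE/DELETE). A sketch of the write path (the Comments table and columns are made up for illustration):

library(DBI)

con <- dbConnect(RSQLite::SQLite(), "risk_assessment_app.db")

# Writes go through dbExecute(); bound parameters also avoid SQL injection
dbExecute(
  con,
  "INSERT INTO Comments (package, comment) VALUES (?, ?)",
  params = list("dplyr", "Looks well maintained.")
)
dbDisconnect(con)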

Packages dashboard & multiple downloads

Multiple report downloads & packages dashboard

  • Create a dashboard showing all packages in the db.
  • Allow selection of packages to download multiple reports.

(issue follows from discussion here #1)

Update x-axis on `Number of Downloads` plot

I've found a solution that addresses our need for a line plot that operates more like a timeline plot (i.e., allows for panning, etc.). It appears plotly has those functionalities built in! Please take a look and play with the widget on this page, which includes the code to reproduce it. I also like how we can add buttons at the top for common timespans (like 1yr, 2yr, etc.).

https://plotly.com/r/range-slider/

Currently, the plot looks like this.
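From the linked page, the relevant pieces are the rangeslider and rangeselector entries in the x-axis layout. Roughly (a sketch adapted from the plotly docs, with made-up download data):

library(plotly)

dates <- seq(as.Date("2019-01-01"), as.Date("2020-12-01"), by = "month")
downloads <- rpois(length(dates), 5000)  # fake monthly download counts

plot_ly(x = dates, y = downloads, type = "scatter", mode = "lines") %>%
  layout(
    xaxis = list(
      rangeslider = list(type = "date"),   # draggable slider under the plot
      rangeselector = list(buttons = list( # quick-zoom buttons at the top
        list(count = 1, label = "1yr", step = "year", stepmode = "backward"),
        list(count = 2, label = "2yr", step = "year", stepmode = "backward"),
        list(step = "all")
      ))
    )
  )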

Integration with riskmetric


  • Remove the hardcoded commit number.
  • Remove the web-scraping code and obtain any needed information from riskmetric.
  • Add new metrics to the 'Maintenance Metrics' tab accordingly.
  • Fix code broken by this update.
  • Related issues: #49, #45, #42.

Default report name

The reports all default to the same name, 'Report.html'. Ideally, the default name should contain the package name, for example '[pkg_name]_[pkg_version]_risk_assessment'.
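In shiny terms, this is just the filename argument of downloadHandler(); a sketch (the input names are hypothetical):

# Sketch: build the default file name from the selected package
output$download_report <- downloadHandler(
  filename = function() {
    paste0(input$pkg_name, "_", input$pkg_version, "_risk_assessment.html")
  },
  content = function(file) {
    # render the report to `file` here, e.g. via rmarkdown::render()
  }
)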

Upload/Save of a single package can take over 10s

Going back to branch master, deleting the database, and loading the single dplyr package, I get:
Time difference of 11.43519 secs

Switching back to branch fix_versioningII, deleting the database, and loading the same dplyr package:
Time difference of 8.961269 secs
Doing it again (there might be some internet delay time in here):
Time difference of 7.695863 secs

If I comment out these two functions:

          # metric_mm_tm_Info_upload_to_DB(new_package,new_version)
          # metric_cum_Info_upload_to_DB(new_package,new_version)

I can get: Time difference of 0.165879 secs
So clearly this is where the majority of time is being spent.

I suggest using the R profiler (Rprof) to study where the time is actually spent and to figure out what, if anything, we can do to reduce the elapsed times.
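For reference, a minimal Rprof() session looks like this (using one of the slow upload helpers above as the example call):

Rprof("upload_profile.out")
metric_mm_tm_Info_upload_to_DB(new_package, new_version)  # slow call under study
Rprof(NULL)

# Summarize where the time actually went
summaryRprof("upload_profile.out")$by.self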

Modularize UI and Server code?

This is a stylistic question/proposal:

Going through the source code, it looks like each component within the application has a matching file name within the UI and Server folders. I don't think it would be difficult to convert these files to shiny modules to be conditionally rendered within app.R, but this assumes we'd want to adopt a shiny module framework; see the sketch below.
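For concreteness, a minimal module pair might look like this (names are illustrative, not taken from the app):

# Sketch: a UI/Server file pair converted into a shiny module
metricsTabUI <- function(id) {
  ns <- NS(id)
  tagList(plotOutput(ns("metric_plot")))
}

metricsTabServer <- function(id, pkg_data) {
  moduleServer(id, function(input, output, session) {
    output$metric_plot <- renderPlot(plot(pkg_data()))
  })
}

# In app.R: metricsTabUI("maintenance") / metricsTabServer("maintenance", pkg_data)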

Thoughts?

return code from dbExecute?

From: https://rdrr.io/cran/DBI/man/dbExecute.html

dbExecute() always returns a scalar numeric that specifies the number of rows affected by the
statement. An error is raised when issuing a statement over a closed or invalid connection, if the
syntax of the statement is invalid, or if the statement is not a non-NA string.

Example:

library(DBI)

con <- dbConnect(RSQLite::SQLite(), ":memory:")

dbWriteTable(con, "cars", head(cars, 3))
dbReadTable(con, "cars")   # there are 3 rows
rc <- dbExecute(
  con,
  "INSERT INTO cars (speed, dist) VALUES (1, 1), (2, 2), (3, 3)"
)
print(paste("rc was", rc))
# [1] "rc was 3"
dbReadTable(con, "cars")   # there are now 6 rows
dbDisconnect(con)

So there's no "return code" per se, just the count of affected rows.

Loading a non-existent package yields many unhelpful warning messages

I had a non-existent package (Cumulonimbus) in my packages.csv file.
From the console:

> runApp()
[1] "Log file set to loggit.json"

Listening on http://127.0.0.1:4108
{"timestamp": "2020-09-10T16:39:53-0400", "log_lvl": "ERROR", "log_msg": "Error in extracting general info of the package Cumulonimbus info Error in open.connection(x__COMMA__ __DBLQUOTE__rb__DBLQUOTE__): HTTP error 404.__LF__", "app": "fileupload-webscraping"}
Warning in error_handler(x, ...) :
  no available scoring algorithm for metric of class "pkg_metric_error", returning default score of 0.
Warning in error_handler(x, ...) :
  no available scoring algorithm for metric of class "pkg_metric_error", returning default score of 0.
Warning in error_handler(x, ...) :
  no available scoring algorithm for metric of class "pkg_metric_error", returning default score of 0.
Warning in error_handler(x, ...) :
  no available scoring algorithm for metric of class "pkg_metric_error", returning default score of 0.
Warning in error_handler(x, ...) :
  no available scoring algorithm for metric of class "pkg_metric_error", returning default score of 0.
Warning in error_handler(x, ...) :
  no available scoring algorithm for metric of class "pkg_metric_error", returning default score of 0.
Warning in error_handler(x, ...) :
  no available scoring algorithm for metric of class "pkg_metric_error", returning default score of 0.
Warning in error_handler(x, ...) :
  no available scoring algorithm for metric of class "pkg_metric_error", returning default score of 0.
Warning in error_handler(x, ...) :
  no available scoring algorithm for metric of class "pkg_metric_error", returning default score of 0.
Warning in error_handler(x, ...) :
  no available scoring algorithm for metric of class "pkg_metric_error", returning default score of 0.
Warning in for (i in 1:length(args)) { :
  closing unused connection 3 (https://cran.r-project.org/web/packages/Cumulonimbus)
{"timestamp": "2020-09-10T16:39:55-0400", "log_lvl": "ERROR", "log_msg": "Error in extracting cum metric info of the package: Cumulonimbus info Error in open.connection(x__COMMA__ __DBLQUOTE__rb__DBLQUOTE__): HTTP error 404.__LF__", "app": "fileupload-webscraping"}

I'm thinking it would be better to compare against the packages the user has already installed. If a package isn't installed, its record is dropped and no 404 error messages get produced, since dbupload.R won't see it. I am also supplying the version of the package the user has installed. Maybe add a column for the latest version of the package that dbupload.R collects?

In uploadpackage.R:

  # names(pkgs_file) <- tolower(names(pkgs_file))
  pkgs_file$package <- trimws(pkgs_file$package)
  print(paste(pkgs_file$package, collapse = ","))
  pkgs_file$version <- trimws(pkgs_file$version)

  # Keep only packages that are actually installed, warning about the rest.
  # Filtering up front avoids the index-shifting bug of dropping rows from
  # pkgs_file inside a loop that walks those same row indices.
  installed <- pkgs_file$package %in% rownames(installed.packages())
  for (pkg in pkgs_file$package[!installed]) {
    message(paste("Package", pkg, "not installed. Check your spelling."))
  }
  pkgs_file <- pkgs_file[installed, ]

  # Record the locally installed version of each remaining package
  pkgs_file$version <- vapply(
    pkgs_file$package,
    function(pkg) as.character(packageVersion(pkg)),
    character(1)
  )

Improve Name for 'Package Review History' Dashboard

Currently, we are using two words that are undesirable: History and Reviewed. These imply that the package(s) have already been checked by someone, when in reality we're just displaying all packages that have been uploaded to the db.

Potential alternatives could be something like the following:

  • "All Uploaded Packages"
  • "Previously Uploaded Packages"

These terms show up in two places in the UI that need to be changed.


Using SQLite command line

Showing some commands below.
Note that forward slashes are used when opening the database:

.open "C:/Users/rkrajcik/OneDrive - Biogen/Documents/R/RiskAssessment/risk_assessment-master/database.sqlite"

Warning message There are 1 result in use. The connection will be released when they are closed

This occurs in module DB.R

See the following discussion on StackOverflow
https://stackoverflow.com/questions/51213886/disconnect-dbi-rsqlite-within-a-function-in-r

SQL queries are typically a three-step process (ignoring connection management):

  • send the query, accept the returned "result" object
  • fetch the result using the "result" object
  • clear the "result" object

The third step is important: uncleared results represent resources that are still being held for that query. Some database connections do not permit multiple simultaneous uncleared results, allowing just one query at a time.

Instead of coding

con <- dbConnect(RSQLite::SQLite(), "./risk_assessment_app.db")
q <- dbSendQuery(con, "select * from MaintenanceMetrics")
q <- dbFetch(q)
q
dbDisconnect(con)

it should be coded using dbClearResult() before dbDisconnect():

con <- dbConnect(RSQLite::SQLite(), "./risk_assessment_app.db")
res <- dbSendQuery(con, "select * from MaintenanceMetrics")
q <- dbFetch(res)
q
dbClearResult(res)
dbDisconnect(con)

or use dbGetQuery(), which fetches and clears the result in one step:

con <- dbConnect(RSQLite::SQLite(), "./risk_assessment_app.db")
q <- dbGetQuery(con, "select * from MaintenanceMetrics")
dbDisconnect(con)

Remove code duplication

There is a LOT of code duplication in the app. We should create functions/modules to address this issue.

Related open issues: #32, #33, #34

File cum_report.R repeats code from other tabs

As pointed out by @MayaGans, the files cum_report.R (inside UI and Server) repeat code from other tabs, including communityusage_metrics.R. Ideally, we should have modules (or even functions) that we can reuse as needed throughout the app.

It is not hard to find where the duplication occurs; e.g., the variables are named no_of_downloads vs. no_of_downloads1.
