pharmaR / riskassessment

Risk Assessment Demo App: https://rinpharma.shinyapps.io/riskassessment

Home Page: https://pharmar.github.io/riskassessment/

License: Other

Languages: R 94.69%, TeX 3.65%, CSS 1.23%, JavaScript 0.36%, HTML 0.04%, Lua 0.03%
Topics: packages, r, shiny, validation

riskassessment's Introduction

The {riskassessment} application

[Badges: pharmaverse · Lifecycle: experimental · R-CMD-check · Coverage status]

riskassessment is an R package containing a shiny front-end that augments the utility of the riskmetric package within an organizational context. We're honored to announce that this app was recently awarded the title of "Best App" at Shiny Conf 2023 (see the Recognition section below).


{riskassessment} app

riskmetric is a framework to quantify an R package’s “risk of use” by assessing a number of meaningful metrics designed to evaluate package development best practices, code documentation, community engagement, and development sustainability. Together, the riskassessment app and the riskmetric package aim to provide some context for validation within regulated industries.

The app extends the functionality of riskmetric by allowing the reviewer to:

  • analyze riskmetric output without needing to write R code
  • contribute personalized comments on the value of individual metrics
  • categorize a package with an overall assessment (i.e., low, medium, or high risk), based either on subjective opinion or on user consensus tabulated after evaluating the metric output
  • download static reports containing the package risk score, metric outputs, reviewer summaries & comments, and more
  • store assessments in a database for future viewing and historical backup
  • authenticate users, with privileges & admin-defined roles governing user management and the tasks performed in the app

Echoing {riskmetric}'s Approach to Validation

Validation can serve as an umbrella for various terms, and admittedly, companies will diverge on what the "correct approach" may be. The riskassessment app is built on a riskmetric foundation, whose developers follow the validation philosophy proposed in this white paper published by the R Validation Hub. As such, riskassessment and riskmetric are only designed to support decision making from that viewpoint. Fully establishing the robustness and reliability of any software may (and likely will) require deeper inspection by the reviewing party.

Note: Development of both riskassessment and riskmetric was made possible thanks to the R Validation Hub, a collaboration to support the adoption of R within a biopharmaceutical regulatory setting.

Usage

If you are new to the riskassessment app, welcome! We'd highly encourage you to start by exploring the demo version of the app currently deployed on shinyapps.io. There, you'll find a number of pre-loaded packages just waiting to be assessed. Hands-on experience will help you become familiar with the general layout of the app as you poke around and explore.

With that said, you should immediately recognize that the app requires authentication, since its intended use is within an organization. There are several pre-defined roles, but the most important is the admin user. By default, the admin can add/delete users, download an entire copy of the database, modify the metric weights used in the calculation of risk scores, define custom decision categories, and set automation rules based on risk scores. The demo version of the app prompts you with instructions for logging in initially. However, if you launch the app.R file locally, the admin user will have to use the initial password QWERTY1. After logging in with this credential, the app will immediately prompt you to change your password and repeat the process with your new credentials.

If you want a quick overview of the project and a demo of the application, we highly recommend the following video walkthrough from Shiny Conf 2023, where riskassessment was voted "Best App" by conference attendees! The app was also featured at RStudio::Global 2021.


riskassessment at shinyConf 2023

Installation

We recommend running/deploying this application in a controlled development environment. Of course, you can install the latest version from GitHub using the code below, but doing so doesn't take into consideration other environment dependencies…

# DON'T RUN THIS CODE! There's a better way!
remotes::install_github("pharmaR/riskmetric")
remotes::install_github("pharmaR/riskassessment")

# Run the application 
riskassessment::run_app()

For example, what if you are using a different version of riskmetric than our dev team? In that case, the development team can't guarantee the app's stability, so we recommend you clone the repo's R project locally instead. Once cloned/forked, run the following code to take advantage of our renv.lock file, which sets up the project dependencies:

# First, clone the repo from GitHub, then...
# Get dependencies synced using {renv}
renv::activate()
renv::restore()

After this step is complete, you can simply run the contents of app.R to launch and/or deploy the application with default settings! For more information on our dev philosophy as it pertains to package management, please read the "Using renv" article. Then learn how to move the app's configuration away from the defaults by reading the "Deployment" guide, which covers how to use the app's configuration file to tailor the app to your needs.
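For reference, a typical golem-style app.R looks roughly like the sketch below (this assumes the repo follows the standard {golem} template; check the repo's actual app.R for the real contents):

# Sketch of a standard {golem} app.R (an assumption, not the repo's exact file)
pkgload::load_all(export_all = FALSE, helpers = FALSE, attach_testthat = FALSE)
options("golem.app.prod" = TRUE)
riskassessment::run_app()  # pass run_app() arguments here to change defaults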

User Guides and User Feedback

We're constantly improving the app and its documentation. Please explore the user guides developed to date, available on the riskassessment documentation site. Be sure to read the 'Get Started' tab and perhaps another article or two!

Of course, if you ever have specific feedback for the developers, or if you encounter a problem or bug within the app, we recommend opening a new issue on GitHub; we'll address it promptly.

We also want to align with our users on big-picture, strategic topics. Specifically, we want to hear from groups who've built (or are currently building) their R-package validation process, whether you use riskmetric / riskassessment or not! Ideally, our goal is to form a consensus among companies regarding their validation approach so we can make riskmetric and riskassessment better. For example, we'd love to understand how users are currently weighting the metrics used to calculate a package's risk score. We'd also love to learn whether companies leverage certain risk-score thresholds to make GxP-environment inclusion (or exclusion) decisions for a package. To facilitate gathering this information, we've created an incredibly brief questionnaire to let us know where you stand.

Deployment

As you might expect, certain deployment environments offer persistent storage and others do not. For example, shinyapps.io does not, which means even our demo app hosted there contains a package database that can't be permanently altered. That's a real limitation, since an organization needs to continually add new packages, publish comments, and make decisions about packages. Thus, we'd recommend exploring these deployment options (which allow persistent storage):

  • Shiny Server

  • Posit Connect

  • ShinyProxy

For more information on each of these, we highly recommend reading our ‘Deployment’ article.

Recognition

In March 2023, Appsilon hosted the 2nd annual Shiny Conf, a fully virtual event boasting approximately 4k registrants. Aaron Clark, package maintainer and R Validation Hub Executive member, presented the {riskassessment} app in the "Shiny Showcase" among 20+ other app submissions. At the end of the conference, {riskassessment} was awarded the title of "Best App" by popular vote.


Shiny Conf 2023 winner

riskassessment's People

Contributors

aaron-clark, anjanadevikondisetti, aravindfl1412, borgmaan, eduardodudu, imran3004, jeff-thompson12, marlycormar, mayagans, narayanan-iyer-pfizer, pandeyfission, robert-krajcik, scottschumacker, xyarz


riskassessment's Issues

Update code after covr_coverage is implemented in riskmetric

The covr_coverage metric hasn't been implemented yet in riskmetric. Once it is, we should review the app's code. In particular, we should review the following lines.

Saves coverage info to the db. Notice how the information is split, which may no longer be correct.

https://github.com/pharmaR/risk_assessment/blob/d148fae2b3a469d712a95384ccf562516fdf5a31/Modules/dbupload.R#L182-L186

Reads coverage info from the db. Notice how the information is split, which may no longer be correct.

https://github.com/pharmaR/risk_assessment/blob/d148fae2b3a469d712a95384ccf562516fdf5a31/Server/testing_metrics.R#L25

Download-all-reports section doesn't update

The section on downloading all reports doesn't update when another CSV file is uploaded.

How to reproduce the issue

  1. Run the app.
  2. Upload a CSV with a package not in the db.
  3. Repeat step 2.

Package data stored in database is stale / out of date

I haven't uploaded a CSV of packages to the app in quite some time. So, when working on the CommunityUsageMetrics tab (specifically, the Number of Downloads per Month plot), I noticed that when I selected the dplyr package, I was only getting results as recent as July 2020. I confirmed this by running the SQL query below, which produces the resulting data frame:

dplyr_cu <- db_fun("SELECT * FROM CommunityUsageMetrics WHERE cum_id ='dplyr'")

[screenshot: dplyr query result, with data only through July 2020]

I had a hunch that the db only seeks "fresh" data once I upload dplyr in a CSV on the Upload Package tab. I tested re-uploading, and nothing changed (probably because dplyr already exists in my db). However, when I uploaded a new package that doesn't yet exist in my db (stringr) and re-ran the SQL query, I found that more recent data populated the table, up to Nov 2020:

stringr_cu <- db_fun("SELECT * FROM CommunityUsageMetrics WHERE cum_id ='stringr'")

[screenshot: stringr query result, with data through Nov 2020]

The fact that the package data used by the app doesn't update may pose a problem when app users return to packages uploaded a long time ago to re-assess their risk. All of their metrics could have changed after a few months - not just community usage metrics.

So the question becomes: when do we update this information? We could determine whether the package data stored in the db is old, and if so, render a button that says "Update Metrics" on the side panel & the db dashboard. I don't think we'd want to update metrics automatically, since that would (1) waste resources and (2) completely erase the context in which the package was originally reviewed, if applicable.
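For illustration, a staleness check along these lines could gate such a button (a sketch; db_fun() is the app's existing query helper, while the date_added column on Packageinfo and the 30-day threshold are assumptions):

# Sketch: should we offer an "Update Metrics" button for this package?
metrics_are_stale <- function(pkg_name, max_age_days = 30) {
  res <- db_fun(paste0(
    "SELECT date_added FROM Packageinfo WHERE package = '", pkg_name, "'"
  ))
  if (nrow(res) == 0) return(TRUE)  # not in the db yet
  as.numeric(Sys.Date() - as.Date(res$date_added[1])) > max_age_days
}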

Thoughts?

Automate adding custom metrics

So far the app shows only the metrics that are hard-coded in the app. However, we would like to automatically show all the metrics riskmetric reports for a package.
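A sketch of what that could look like, using the pkg_ref() / pkg_assess() / pkg_score() pipeline from the riskmetric README (the exact columns of the scored result may vary across riskmetric versions):

library(riskmetric)

# Score a package with whatever metrics this riskmetric version provides
scored <- pkg_score(pkg_assess(pkg_ref("riskassessment")))

# One column per metric (plus bookkeeping columns such as package/version),
# so the UI could render names(scored) instead of a hard-coded metric list
names(scored)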

Overall app review

Check the app in its entirety

  • The downloaded html report doesn't look like the preview. Make them match (won't do for now, since the infoboxes from shinydashboard don't work well with rmarkdown).
  • Check wording in each tab.
  • Check wording in the help button and make changes accordingly.
  • Related issues: #20, #6.

Testing Metrics

I tried looking at the testing metrics for several packages, but this doesn't seem to work and only returns

Metric is not applicable for this source of package

Is this a feature yet to be added? Where is the app calling covr (or should riskmetric be doing that)?

i (?) - button next to Assessment Criteria

There's an i icon next to Assessment Criteria at the top left of the application. On hover, the i produces a question mark, but clicking it doesn't do anything. Is this intentional?

Package dependencies

Should we create a report for each dependency? Or should we leave this decision to riskmetric?

Selecting Testing Metrics tab returns error message

This is a bug, regardless of how we recode the testing_metrics.R and/or the tm_report.R

Warning in .testLength(param = num, len = 1, arg = arg) :
  NAs introduced by coercion
Warning: Error in if: missing value where TRUE/FALSE needed
  99: .testInterval
  98: amAngularGauge
  95: func
  82: origRenderFunc
  81: output$test_coverage
   1: runApp

Allow other pkg versions besides most recent version

Fix version numbering

  • The version dropdown should display the version(s) uploaded by the user (not the most recent version of the package).
  • The db should be able to store different versions of the same package. Hence, some tables (like Packageinfo) should be modified accordingly (e.g., Packageinfo should have a primary key - package name - and a secondary key - package version). The update and insert statements may need to change as well.

Is it necessary to store a DB?

I would like to open for discussion the idea of not including the DB in the repo.

Downsides of having the DB as part of the repo:

  • unintentionally committing the DB with user information,
  • an unnecessary extra file (we can always create the DB, or use an existing one, when the app is run; see the sketch below), and
  • the end user will most likely use a DB stored somewhere else.
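For reference, bootstrapping the database at startup is cheap with RSQLite, since it creates the file on first connect. A minimal sketch (the Packageinfo schema here is a placeholder, not the app's real one):

library(DBI)

# RSQLite creates the file if it doesn't exist, so the app can build
# its own database on first run instead of shipping one in the repo
con <- dbConnect(RSQLite::SQLite(), "risk_assessment_app.db")
if (!"Packageinfo" %in% dbListTables(con)) {
  dbExecute(con, "CREATE TABLE Packageinfo (package TEXT, version TEXT)")
}
dbDisconnect(con)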

Asynchronous assessment

Make the assessment asynchronous so that the user can see metrics for some of the packages while the rest load.
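One way to sketch this is with {future} and {promises}, assuming riskmetric's pkg_ref() / pkg_assess() pipeline (the shiny reactive wiring is omitted):

library(future)
library(promises)
plan(multisession)

# Run each package assessment in a background R session so the UI
# stays responsive; results resolve as they arrive
assess_async <- function(pkg_name) {
  future_promise({
    riskmetric::pkg_assess(riskmetric::pkg_ref(pkg_name))
  })
}

assess_async("dplyr") %...>% (function(assessment) {
  # update the relevant output / reactiveVal here
})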

Adding comment results in warning message

Warning in result_fetch(res@ptr, n = n) : SQL statements must be issued with dbExecute() or dbSendStatement() instead of dbGetQuery() or dbSendQuery().

In sidebar.R, the function db_fun() needs to be replaced with db_ins().
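The distinction DBI draws: dbGetQuery() is for statements that return rows (SELECT), while dbExecute() is for statements that don't (INSERT/UPDATE/DELETE). A sketch of the write path (the Comments table and columns are made up for illustration):

library(DBI)

con <- dbConnect(RSQLite::SQLite(), "risk_assessment_app.db")

# Writes go through dbExecute(); bound parameters also avoid SQL injection
dbExecute(
  con,
  "INSERT INTO Comments (package, comment) VALUES (?, ?)",
  params = list("dplyr", "Looks well maintained.")
)
dbDisconnect(con)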

Packages dashboard & multiple downloads

Multiple report downloads & packages dashboard

  • Create a dashboard showing all packages in the db.
  • Allow selection of packages to download multiple reports.

(issue follows from discussion here #1)

Update x-axis on `Number of Downloads` plot

I've found a solution that addresses our need for a line plot that operates more like a timeline plot (i.e., allows for panning, etc.). It appears plotly has those functionalities built in! Please take a look and play with the widget on this page, which includes the code to reproduce it. I also like how we can add buttons at the top for common timespans (like 1yr, 2yr, etc.).

https://plotly.com/r/range-slider/

Currently, the plot looks like this.
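From the linked page, the relevant pieces are the rangeslider and rangeselector entries in the x-axis layout. Roughly (a sketch adapted from the plotly docs, with made-up download data):

library(plotly)

dates <- seq(as.Date("2019-01-01"), as.Date("2020-12-01"), by = "month")
downloads <- rpois(length(dates), 5000)  # fake monthly download counts

plot_ly(x = dates, y = downloads, type = "scatter", mode = "lines") %>%
  layout(
    xaxis = list(
      rangeslider = list(type = "date"),   # draggable slider under the plot
      rangeselector = list(buttons = list( # quick-zoom buttons at the top
        list(count = 1, label = "1yr", step = "year", stepmode = "backward"),
        list(count = 2, label = "2yr", step = "year", stepmode = "backward"),
        list(step = "all")
      ))
    )
  )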

Integration with riskmetric


  • Remove the hardcoded commit number.
  • Remove the web-scraping code and obtain any needed information from riskmetric.
  • Add new metrics to the 'Maintenance Metrics' tab accordingly.
  • Fix code broken by this update.
  • Related issues: #49, #45, #42.

Default report name

The reports all default to the same name, 'Report.html'. Ideally, the default name should contain the package name, for example '[pkg_name]_[pkg_version]_risk_assessment'.
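In shiny terms, this is just the filename argument of downloadHandler(); a sketch (the input names are hypothetical):

# Sketch: build the default file name from the selected package
output$download_report <- downloadHandler(
  filename = function() {
    paste0(input$pkg_name, "_", input$pkg_version, "_risk_assessment.html")
  },
  content = function(file) {
    # render the report to `file` here, e.g. via rmarkdown::render()
  }
)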

Upload/Save of a single package can take over 10s

Going back to branch master, deleting the database, and loading the single dplyr package, I get:
Time difference of 11.43519 secs

Switching back to branch fix_versioningII, deleting the database, and loading the same dplyr package:
Time difference of 8.961269 secs
Doing it again (there might be some internet delay time in here):
Time difference of 7.695863 secs

If I comment out these two functions:

          # metric_mm_tm_Info_upload_to_DB(new_package,new_version)
          # metric_cum_Info_upload_to_DB(new_package,new_version)

I can get: Time difference of 0.165879 secs
So clearly this is where the majority of time is being spent.

I suggest using the R profiler (Rprof) to study where the time is actually spent and to figure out what, if anything, we can do to reduce the elapsed times.
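For reference, a minimal Rprof() session looks like this (using one of the slow upload helpers above as the example call):

Rprof("upload_profile.out")
metric_mm_tm_Info_upload_to_DB(new_package, new_version)  # slow call under study
Rprof(NULL)

# Summarize where the time actually went
summaryRprof("upload_profile.out")$by.self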

Modularize UI and Server code?

This is a stylistic question/proposal:

Going through the source code, it looks like each component within the application has a matching file name within the UI and Server folders. I don't think it would be difficult to convert these files to shiny modules to be conditionally rendered within app.R, but this assumes we'd want to adopt a shiny module framework; see the sketch below.
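For concreteness, a minimal module pair might look like this (names are illustrative, not taken from the app):

# Sketch: a UI/Server file pair converted into a shiny module
metricsTabUI <- function(id) {
  ns <- NS(id)
  tagList(plotOutput(ns("metric_plot")))
}

metricsTabServer <- function(id, pkg_data) {
  moduleServer(id, function(input, output, session) {
    output$metric_plot <- renderPlot(plot(pkg_data()))
  })
}

# In app.R: metricsTabUI("maintenance") / metricsTabServer("maintenance", pkg_data)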

Thoughts?

return code from dbExecute?

From: https://rdrr.io/cran/DBI/man/dbExecute.html

dbExecute() always returns a scalar numeric that specifies the number of rows affected by the
statement. An error is raised when issuing a statement over a closed or invalid connection, if the
syntax of the statement is invalid, or if the statement is not a non-NA string.

Example:

library(DBI)

con <- dbConnect(RSQLite::SQLite(), ":memory:")

dbWriteTable(con, "cars", head(cars, 3))
dbReadTable(con, "cars")   # there are 3 rows
rc <- dbExecute(
  con,
  "INSERT INTO cars (speed, dist) VALUES (1, 1), (2, 2), (3, 3)"
)
print(paste("rc was", rc))
# [1] "rc was 3"
dbReadTable(con, "cars")   # there are now 6 rows
dbDisconnect(con)

So there's no "return code" per se, just the count of affected rows.

Loading a non-existent package yields many unhelpful warning messages

I had a non-existent package (Cumulonimbus) in my packages.csv file.
From the console:

> runApp()
[1] "Log file set to loggit.json"

Listening on http://127.0.0.1:4108
{"timestamp": "2020-09-10T16:39:53-0400", "log_lvl": "ERROR", "log_msg": "Error in extracting general info of the package Cumulonimbus info Error in open.connection(x__COMMA__ __DBLQUOTE__rb__DBLQUOTE__): HTTP error 404.__LF__", "app": "fileupload-webscraping"}
Warning in error_handler(x, ...) :
  no available scoring algorithm for metric of class "pkg_metric_error", returning default score of 0.
Warning in error_handler(x, ...) :
  no available scoring algorithm for metric of class "pkg_metric_error", returning default score of 0.
Warning in error_handler(x, ...) :
  no available scoring algorithm for metric of class "pkg_metric_error", returning default score of 0.
Warning in error_handler(x, ...) :
  no available scoring algorithm for metric of class "pkg_metric_error", returning default score of 0.
Warning in error_handler(x, ...) :
  no available scoring algorithm for metric of class "pkg_metric_error", returning default score of 0.
Warning in error_handler(x, ...) :
  no available scoring algorithm for metric of class "pkg_metric_error", returning default score of 0.
Warning in error_handler(x, ...) :
  no available scoring algorithm for metric of class "pkg_metric_error", returning default score of 0.
Warning in error_handler(x, ...) :
  no available scoring algorithm for metric of class "pkg_metric_error", returning default score of 0.
Warning in error_handler(x, ...) :
  no available scoring algorithm for metric of class "pkg_metric_error", returning default score of 0.
Warning in error_handler(x, ...) :
  no available scoring algorithm for metric of class "pkg_metric_error", returning default score of 0.
Warning in for (i in 1:length(args)) { :
  closing unused connection 3 (https://cran.r-project.org/web/packages/Cumulonimbus)
{"timestamp": "2020-09-10T16:39:55-0400", "log_lvl": "ERROR", "log_msg": "Error in extracting cum metric info of the package: Cumulonimbus info Error in open.connection(x__COMMA__ __DBLQUOTE__rb__DBLQUOTE__): HTTP error 404.__LF__", "app": "fileupload-webscraping"}

I'm thinking it would be better to compare against the packages the user has already installed. If a package isn't installed, its record is dropped and no 404 error messages get produced, since dbupload.R won't see it. I am also supplying the version of the package the user has installed. Maybe add a column for the latest version of the package that dbupload.R collects?

In uploadpackage.R:

  # names(pkgs_file) <- tolower(names(pkgs_file))
  pkgs_file$package <- trimws(pkgs_file$package)
  print(paste(pkgs_file$package, collapse = ","))
  pkgs_file$version <- trimws(pkgs_file$version)

  # Keep only packages that are actually installed, warning about the rest.
  # Filtering up front avoids the index-shifting bug of dropping rows from
  # pkgs_file inside a loop that walks those same row indices.
  installed <- pkgs_file$package %in% rownames(installed.packages())
  for (pkg in pkgs_file$package[!installed]) {
    message(paste("Package", pkg, "not installed. Check your spelling."))
  }
  pkgs_file <- pkgs_file[installed, ]

  # Record the locally installed version of each remaining package
  pkgs_file$version <- vapply(
    pkgs_file$package,
    function(pkg) as.character(packageVersion(pkg)),
    character(1)
  )

Improve Name for 'Package Review History' Dashboard

Currently, we are using two words that are undesirable: History and Reviewed. These imply that the package(s) have already been checked by someone, when in reality we're just displaying all packages that have been uploaded to the db.

Potential alternatives could be something like the following:

  • "All Uploaded Packages"
  • "Previously Uploaded Packages"

These terms show up in two places in the UI that need to be changed.


Using SQLite command line

Showing some commands below.
Note that forward slashes are used when opening the database:

.open "C:/Users/rkrajcik/OneDrive - Biogen/Documents/R/RiskAssessment/risk_assessment-master/database.sqlite"

Warning message There are 1 result in use. The connection will be released when they are closed

This occurs in module DB.R

See the following discussion on StackOverflow
https://stackoverflow.com/questions/51213886/disconnect-dbi-rsqlite-within-a-function-in-r

SQL queries are typically a three-step process (ignoring connection management):

  • send the query, accept the returned "result" object
  • fetch the result using the "result" object
  • clear the "result" object

The third step is important: uncleared results represent resources that are still being held for that query. Some database connections do not permit multiple simultaneous uncleared results, allowing just one query at a time.

Instead of coding

con <- dbConnect(RSQLite::SQLite(), "./risk_assessment_app.db")
q <- dbSendQuery(con, "select * from MaintenanceMetrics")
q <- dbFetch(q)
q
dbDisconnect(con)

it should be coded using dbClearResult() before dbDisconnect():

con <- dbConnect(RSQLite::SQLite(), "./risk_assessment_app.db")
res <- dbSendQuery(con, "select * from MaintenanceMetrics")
q <- dbFetch(res)
q
dbClearResult(res)
dbDisconnect(con)

or use dbGetQuery(), which fetches and clears the result in one step:

con <- dbConnect(RSQLite::SQLite(), "./risk_assessment_app.db")
q <- dbGetQuery(con, "select * from MaintenanceMetrics")
dbDisconnect(con)

Remove code duplication

There is a LOT of code duplication in the app. We should create functions/modules to address this issue.

Related open issues: #32, #33, #34

File cum_report.R repeats code from other tabs

As pointed out by @MayaGans, the files cum_report.R (inside UI and Server) repeat code from other tabs, including communityusage_metrics.R. Ideally, we should have modules (or even functions) that we can reuse as needed throughout the app.

It is not hard to find where the duplication occurs; e.g., the variables are named no_of_downloads vs. no_of_downloads1.
