rsquaredacademy / blorr

Tools for developing binary logistic regression models

Home Page: https://blorr.rsquaredacademy.com/

License: Other

R 100.00%

logistic-regression-models regression rstats

blorr's Introduction

blorr

Tools for building binary logistic regression models

Overview

Tools designed to make it easier for users, particularly beginner and intermediate R users, to build logistic regression models. Includes comprehensive regression output, variable selection procedures, model validation techniques and a ‘shiny’ app for interactive model building.

Installation

# Install blorr from CRAN
install.packages("blorr")

# Install development version from GitHub
# install.packages("devtools")
devtools::install_github("rsquaredacademy/blorr")

# Install the development version from `rsquaredacademy` universe
install.packages("blorr", repos = "https://rsquaredacademy.r-universe.dev")

Articles

A Short Introduction to the blorr Package

Usage

blorr uses the consistent prefix blr_* for all functions, which makes tab completion easy.

library(blorr)
library(magrittr)

Bivariate Analysis

blr_bivariate_analysis(hsb2, honcomp, female, prog, race, schtyp)
#>                          Bivariate Analysis                           
#> ---------------------------------------------------------------------
#> Variable    Information Value    LR Chi Square    LR DF    LR p-value 
#> ---------------------------------------------------------------------
#>  female           0.10              3.9350          1        0.0473   
#>   prog            0.43              16.1450         2        3e-04    
#>   race            0.33              11.3694         3        0.0099   
#>  schtyp           0.00              0.0445          1        0.8330   
#> ---------------------------------------------------------------------

Weight of Evidence & Information Value

blr_woe_iv(hsb2, prog, honcomp)
#>                            Weight of Evidence                             
#> -------------------------------------------------------------------------
#> levels    count_0s    count_1s    dist_0s    dist_1s        woe      iv   
#> -------------------------------------------------------------------------
#>   1          38          7           0.26       0.13       0.67     0.08  
#>   2          65          40          0.44       0.75      -0.53     0.17  
#>   3          44          6           0.30       0.11       0.97     0.18  
#> -------------------------------------------------------------------------
#> 
#>       Information Value       
#> -----------------------------
#> Variable    Information Value 
#> -----------------------------
#>   prog           0.4329       
#> -----------------------------
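The weight of evidence and information value columns can be reproduced by hand from the counts in the table. The sketch below assumes the common definitions, WoE = log(dist_0s / dist_1s) and IV = (dist_0s - dist_1s) * WoE. These match the printed figures, but they are not taken from the blorr source.

```r
# Counts per level of prog, copied from the Weight of Evidence table
count_0s <- c(38, 65, 44)
count_1s <- c(7, 40, 6)

# Share of all 0s and of all 1s that fall in each level
dist_0s <- count_0s / sum(count_0s)
dist_1s <- count_1s / sum(count_1s)

# Weight of evidence per level and its contribution to the information value
woe <- log(dist_0s / dist_1s)
iv  <- (dist_0s - dist_1s) * woe

print(round(data.frame(levels = 1:3, dist_0s, dist_1s, woe, iv), 2))

# Total information value for prog (reported above as 0.4329)
total_iv <- sum(iv)
print(round(total_iv, 4))
```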

Model

# create model using glm
model <- glm(honcomp ~ female + read + science, data = hsb2,
             family = binomial(link = 'logit'))

Regression Output

blr_regress(model)
#>                              Model Overview                              
#> ------------------------------------------------------------------------
#> Data Set    Resp Var    Obs.    Df. Model    Df. Residual    Convergence 
#> ------------------------------------------------------------------------
#>   data      honcomp     200        199           196            TRUE     
#> ------------------------------------------------------------------------
#> 
#>                     Response Summary                     
#> --------------------------------------------------------
#> Outcome        Frequency        Outcome        Frequency 
#> --------------------------------------------------------
#>    0              147              1              53     
#> --------------------------------------------------------
#> 
#>                   Maximum Likelihood Estimates                    
#> -----------------------------------------------------------------
#>  Parameter     DF    Estimate    Std. Error    z value    Pr(>|z|) 
#> -----------------------------------------------------------------
#> (Intercept)    1     -12.7772       1.9755    -6.4677      0.0000 
#>   female1      1      1.4825        0.4474     3.3139       9e-04 
#>    read        1      0.1035        0.0258     4.0186       1e-04 
#>   science      1      0.0948        0.0305     3.1129      0.0019 
#> -----------------------------------------------------------------
#> 
#>  Association of Predicted Probabilities and Observed Responses  
#> ---------------------------------------------------------------
#> % Concordant          0.8561          Somers' D        0.7147   
#> % Discordant          0.1425          Gamma            0.7136   
#> % Tied                0.0014          Tau-a            0.2794   
#> Pairs                  7791           c                0.8568   
#> ---------------------------------------------------------------

Model Fit Statistics

blr_model_fit_stats(model)
#>                               Model Fit Statistics                                
#> ---------------------------------------------------------------------------------
#> Log-Lik Intercept Only:      -115.644    Log-Lik Full Model:              -80.118 
#> Deviance(196):                160.236    LR(3):                            71.052 
#>                                          Prob > LR:                         0.000 
#> MCFadden's R2                   0.307    McFadden's Adj R2:                 0.273 
#> ML (Cox-Snell) R2:              0.299    Cragg-Uhler(Nagelkerke) R2:        0.436 
#> McKelvey & Zavoina's R2:        0.518    Efron's R2:                        0.330 
#> Count R2:                       0.810    Adj Count R2:                      0.283 
#> BIC:                          181.430    AIC:                             168.236 
#> ---------------------------------------------------------------------------------
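Several of these statistics follow directly from the two log-likelihoods. The sketch below uses the usual textbook formulas with the rounded values printed above. It is an illustration, not the package's code, so small differences in the last digit (for example in BIC) come from rounding of the inputs.

```r
ll_null <- -115.644  # Log-Lik Intercept Only
ll_full <- -80.118   # Log-Lik Full Model
k <- 4               # estimated parameters: intercept + 3 predictors
n <- 200             # observations

lr           <- 2 * (ll_full - ll_null)              # likelihood ratio statistic
mcfadden     <- 1 - ll_full / ll_null                # McFadden's R2
mcfadden_adj <- 1 - (ll_full - k) / ll_null          # McFadden's adjusted R2
cox_snell    <- 1 - exp(-lr / n)                     # ML (Cox-Snell) R2
nagelkerke   <- cox_snell / (1 - exp(2 * ll_null / n))
aic          <- -2 * ll_full + 2 * k
bic          <- -2 * ll_full + k * log(n)

print(round(c(LR = lr, McFadden = mcfadden, McFaddenAdj = mcfadden_adj,
              CoxSnell = cox_snell, Nagelkerke = nagelkerke,
              AIC = aic, BIC = bic), 3))
```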

Confusion Matrix

blr_confusion_matrix(model)
#> Confusion Matrix and Statistics 
#> 
#>           Reference
#> Prediction   0   1
#>          0 135  26
#>          1  12  27
#> 
#> 
#>                 Accuracy : 0.8100 
#>      No Information Rate : 0.7350 
#> 
#>                    Kappa : 0.4673 
#> 
#> McNemars's Test P-Value  : 0.0350 
#> 
#>              Sensitivity : 0.5094 
#>              Specificity : 0.9184 
#>           Pos Pred Value : 0.6923 
#>           Neg Pred Value : 0.8385 
#>               Prevalence : 0.2650 
#>           Detection Rate : 0.1350 
#>     Detection Prevalence : 0.1950 
#>        Balanced Accuracy : 0.7139 
#>                Precision : 0.6923 
#>                   Recall : 0.5094 
#> 
#>         'Positive' Class : 1
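The summary statistics can be checked against the four cells of the matrix. A minimal sketch, assuming 1 is the positive class (as stated in the output) and the standard definitions of each measure:

```r
# Cells of the confusion matrix (rows = prediction, columns = reference)
tn <- 135  # predicted 0, reference 0
fn <- 26   # predicted 0, reference 1
fp <- 12   # predicted 1, reference 0
tp <- 27   # predicted 1, reference 1
n  <- tn + fn + fp + tp

accuracy    <- (tp + tn) / n
sensitivity <- tp / (tp + fn)   # also reported as Recall
specificity <- tn / (tn + fp)
ppv         <- tp / (tp + fp)   # Pos Pred Value, also reported as Precision
npv         <- tn / (tn + fn)   # Neg Pred Value
balanced    <- (sensitivity + specificity) / 2

# Cohen's kappa: agreement beyond what the row and column totals imply
p_chance <- ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n^2
kappa    <- (accuracy - p_chance) / (1 - p_chance)

print(round(c(Accuracy = accuracy, Kappa = kappa, Sensitivity = sensitivity,
              Specificity = specificity, PPV = ppv, NPV = npv,
              Balanced = balanced), 4))
```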

Hosmer Lemeshow Test

blr_test_hosmer_lemeshow(model)
#>            Partition for the Hosmer & Lemeshow Test            
#> --------------------------------------------------------------
#>                         def = 1                 def = 0        
#> Group    Total    Observed    Expected    Observed    Expected 
#> --------------------------------------------------------------
#>   1       20         0          0.16         20        19.84   
#>   2       20         0          0.53         20        19.47   
#>   3       20         2          0.99         18        19.01   
#>   4       20         1          1.64         19        18.36   
#>   5       21         3          2.72         18        18.28   
#>   6       19         3          4.05         16        14.95   
#>   7       20         7          6.50         13        13.50   
#>   8       20         10         8.90         10        11.10   
#>   9       20         13        11.49         7          8.51   
#>  10       20         14        16.02         6          3.98   
#> --------------------------------------------------------------
#> 
#>      Goodness of Fit Test      
#> ------------------------------
#> Chi-Square    DF    Pr > ChiSq 
#> ------------------------------
#>   4.4998      8       0.8095   
#> ------------------------------
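The test statistic is the sum of (observed - expected)^2 / expected over both outcome columns of the partition table, compared against a chi-square distribution with (groups - 2) degrees of freedom. The sketch below recomputes it from the rounded expected counts shown above, so it agrees with the printed 4.4998 only to about two decimal places.

```r
# Observed and expected counts per group, copied from the partition table
obs_1 <- c(0, 0, 2, 1, 3, 3, 7, 10, 13, 14)
exp_1 <- c(0.16, 0.53, 0.99, 1.64, 2.72, 4.05, 6.50, 8.90, 11.49, 16.02)
obs_0 <- c(20, 20, 18, 19, 18, 16, 13, 10, 7, 6)
exp_0 <- c(19.84, 19.47, 19.01, 18.36, 18.28, 14.95, 13.50, 11.10, 8.51, 3.98)

# Pearson-type statistic summed over both outcomes
chi_sq <- sum((obs_1 - exp_1)^2 / exp_1) + sum((obs_0 - exp_0)^2 / exp_0)

# Degrees of freedom: number of groups minus 2
df <- length(obs_1) - 2
p_value <- pchisq(chi_sq, df = df, lower.tail = FALSE)

print(round(c(chi_sq = chi_sq, df = df, p_value = p_value), 4))
```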

Gains Table

blr_gains_table(model)
#>    decile total  1  0       ks tp  tn  fp fn sensitivity specificity accuracy
#> 1       1    20 14  6 22.33346 14 141   6 39    26.41509    95.91837     77.5
#> 2       2    20 13  7 42.09986 27 134  13 26    50.94340    91.15646     80.5
#> 3       3    20 10 10 54.16506 37 124  23 16    69.81132    84.35374     80.5
#> 4       4    20  7 13 58.52907 44 111  36  9    83.01887    75.51020     77.5
#> 5       5    20  3 17 52.62482 47  94  53  6    88.67925    63.94558     70.5
#> 6       6    20  3 17 46.72058 50  77  70  3    94.33962    52.38095     63.5
#> 7       7    20  1 19 35.68220 51  58  89  2    96.22642    39.45578     54.5
#> 8       8    20  2 18 27.21088 53  40 107  0   100.00000    27.21088     46.5
#> 9       9    20  0 20 13.60544 53  20 127  0   100.00000    13.60544     36.5
#> 10     10    20  0 20  0.00000 53   0 147  0   100.00000     0.00000     26.5
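The ks, sensitivity, specificity and accuracy columns are all cumulative quantities derived from the per-decile counts of 1s and 0s. A small sketch that rebuilds them from the first columns of the table, assuming deciles are ordered from highest to lowest predicted probability:

```r
# Per-decile counts of 1s and 0s, copied from the gains table
ones  <- c(14, 13, 10, 7, 3, 3, 1, 2, 0, 0)
zeros <- c(6, 7, 10, 13, 17, 17, 19, 18, 20, 20)

tp <- cumsum(ones)          # 1s captured up to and including each decile
fp <- cumsum(zeros)         # 0s captured up to and including each decile
tn <- sum(zeros) - fp
fn <- sum(ones) - tp

sensitivity <- 100 * tp / sum(ones)
specificity <- 100 * tn / sum(zeros)
accuracy    <- 100 * (tp + tn) / sum(ones + zeros)

# KS: gap between the cumulative share of 1s and the cumulative share of 0s
ks <- 100 * (tp / sum(ones) - fp / sum(zeros))

print(data.frame(decile = 1:10, ks = round(ks, 5), tp, tn, fp, fn,
                 sensitivity, specificity, accuracy))

# Decile with the largest separation between the two classes
print(which.max(ks))
```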

Lift Chart

model %>%
  blr_gains_table() %>%
  plot()

ROC Curve

model %>%
  blr_gains_table() %>%
  blr_roc_curve()

KS Chart

model %>%
  blr_gains_table() %>%
  blr_ks_chart()

Lorenz Curve

blr_lorenz_curve(model)

Getting Help

If you encounter a bug, please file a minimal reproducible example using reprex on GitHub. For questions and clarifications, use StackOverflow.

Code of Conduct

Please note that this project is released with a Contributor Code of Conduct. By participating in this project you agree to abide by its terms.

blorr's People

Forkers

guhjy romainfrancois kaushikmanikonda

blorr's Issues

Automated report

Generate automated report for logistic regression. Check the report package.

Interactive tutorial

Add an interactive tutorial using the learnr package.

tibble package breaking changes

== CHECK RESULTS ========================================

checking examples ... ERROR

Running examples in ‘blorr-Ex.R’ failed
The error most likely occurred in:

> ### Name: blr_multi_model_fit_stats
> ### Title: Multi model fit statistics
> ### Aliases: blr_multi_model_fit_stats blr_multi_model_fit_stats.default
>
> ### ** Examples
>
> model <- glm(honcomp ~ female + read + science, data = hsb2,
+ family = binomial(link = 'logit'))
>
> model2 <- glm(honcomp ~ female + read + math, data = hsb2,
+ family = binomial(link = 'logit'))
>
> blr_multi_model_fit_stats(model, model2)
Error: Columns 1, 2 must be named.
Use .name_repair to specify repair.
Execution halted

checking tests ... ERROR
Running the tests in ‘tests/testthat.R’ failed.
Last 13 lines of output:
                       Added/
  Step    Variable    Removed        AIC           BIC           C(p)
  ----------------------------------------------------------------------
     1       x6       addition     18869.627     18885.434    18865.6270
     2       x1       addition     18571.376     18595.087    18565.3760
     3       x3       addition     18016.724     18048.338    18008.7240
     4       x2       addition     16642.374     16681.891    16632.3740
     5       x5       addition     16640.883     16688.304    16628.8830
     6       x6       removal      16639.219     16678.736    16629.2190
  ----------------------------------------------------------------------

══ testthat results ═══════════════════════════════════════════════════════════
OK: 76 SKIPPED: 28 FAILED: 1
1. Error: blr_multi_model_fit_stats prints the correct output
(@test-model-fit-stats.R#154)

  Error: testthat unit tests failed
  Execution halted

Feature: Shiny App

Add a shiny app for users to explore the package interactively.

Influence Diagnostics

Add function blr_influence_diag() to create a panel of plots where the following are plotted against the observation id:

pearson residual
deviance residual
leverage
ci displacement c
ci displacement cbar
delta deviance
delta chisquare

Documentation

Add the following:

Contributing Guide
Issue Template

0.2.0 Checklist

Prepare for release:

devtools::check_win_devel()
rhub::check_for_cran()
Polish NEWS

Perform release:

Wait for CRAN...

Tag release
Bump dev version

Template from r-lib/usethis#338

Return current version

Add a function to return the current version of the package on CRAN and GitHub.

Force variables to be included in all models

Users should be able to specify variables which must be included in the models using the include argument and it should be available in the following procedures:

Bivariate analysis for multiple variables

blr_bivariate_analysis() should be able to accommodate multiple predictors.

Feature: Forward Selection Method

Add a function blr_step_forward() for forward selection of predictors. It should include the following arguments:

model: a binary logistic regression model
include: the predictors to be included in the forward selection
enter: significance level for entering the model
stop: number of predictors to be added to the model before stopping forward selection
details: if TRUE, model summary will be printed to the console after each step. Default value is FALSE
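The requested behaviour can be sketched in base R. The helper below, forward_select(), is hypothetical and is not blorr's blr_step_forward(). It borrows only the enter argument from the list above, uses a likelihood-ratio chi-square test as the entry criterion (an assumption), and runs on simulated data.

```r
# Hypothetical helper: p-value based forward selection for a binomial glm.
# `enter` is the significance level a predictor must meet to enter the model.
forward_select <- function(data, response, candidates, enter = 0.05) {
  selected <- character(0)
  current <- glm(reformulate("1", response), data = data, family = binomial)
  repeat {
    remaining <- setdiff(candidates, selected)
    if (length(remaining) == 0) break
    # Likelihood-ratio p-value for adding each remaining predictor on its own
    pvals <- vapply(remaining, function(v) {
      cand <- glm(reformulate(c(selected, v), response),
                  data = data, family = binomial)
      pchisq(deviance(current) - deviance(cand),
             df = df.residual(current) - df.residual(cand),
             lower.tail = FALSE)
    }, numeric(1))
    if (min(pvals) > enter) break
    # Add the most significant predictor and refit
    selected <- c(selected, remaining[which.min(pvals)])
    current <- glm(reformulate(selected, response),
                   data = data, family = binomial)
  }
  list(selected = selected, model = current)
}

# Simulated data: x1 drives the response, x2 is pure noise
set.seed(42)
n <- 200
sim <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
sim$y <- rbinom(n, 1, plogis(2 * sim$x1))

result <- forward_select(sim, "y", c("x1", "x2"))
print(result$selected)
```

Returning the fitted glm object alongside the selected names keeps the result usable with any function that accepts a glm model.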

Feature: Specification Error

Add a test (similar to ) to detect specification error.

Import shiny app from xplorerr

Import the shiny app from the xplorerr package.

Remove shiny app

Move shiny app to xplorerr package.

Variable selection procedures should return the final model

The following variable selection procedures should return the final model as an object of class glm():

#45

Feature: Backward Elimination Method

Add a function blr_step_backward() for backward elimination. It should include the following arguments:

model: a binary logistic regression model
include: predictors to be included at the beginning of backward elimination process
retain: significance level at which the predictors will be retained in the model
stop: number of predictors to be eliminated

Check odds ratio when weights are specified

Check the odds ratio estimates and confidence intervals when weights are specified in the regression model.

Use rlang for errors and warnings

Use rlang equivalents for errors, warnings and messages.

Forthcoming release of ggplot2 and blorr

We are contacting you because you are the maintainer of blorr, which imports ggplot2 and uses vdiffr to manage visual test cases. The upcoming release of ggplot2 includes several improvements to plot rendering, including the ability to specify lineend and linejoin in geom_rect() and geom_tile(), and improved rendering of text. These improvements will result in subtle changes to your vdiffr doppelgangers when the new version is released.

Because vdiffr test cases do not run on CRAN by default, your CRAN checks will still pass. However, we suggest updating your visual test cases with the new version of ggplot2 as soon as possible to avoid confusion. You can install the development version of ggplot2 using remotes::install_github("tidyverse/ggplot2").

If you have any questions, let me know!

Feature: Diagnostics vs Leverage Plots

CI Displacements C vs leverage
Diff Chi Square vs leverage
Diff Deviance vs leverage
fitted values vs leverage

Best subset regression

Similar to this.

WOE for multiple variables

Add a function to print WOE & IV for multiple variables.

Multiple Plot Options

Users should be able to select plot from the following libraries:

ggplot2 (default)
plotly
rbokeh

To arrange plots, use grid_plot() in rbokeh and subplot() in plotly.

Print method for two way segmentation

Add a print method for the two way segmentation of the response variable.

Combining plots

Use patchwork to combine plots instead of gridExtra.

Feature: Collinearity Diagnostics

Add a function blr_collinearity_diagnostics() (similar to ) to detect multicollinearity.

Capture rate by decile

Add a plot for visualizing the capture rate by decile.

0.2.1 Checklist

Prepare for release:

devtools::check_win_devel()
rhub::check_for_cran()
Polish NEWS

Perform release:

Wait for CRAN...

Tag release
Bump dev version

Template from r-lib/usethis#338

Issues with installation

Hi,
I failed to install the package from CRAN/Github. Here are the few errors I found:

Error in loadNamespace(i, c(lib.loc, .libPaths()), versionCheck = vI[[i]]) : there is no package called 'rio'
Installed rio and tried again
Error in loadNamespace(i, c(lib.loc, .libPaths()), versionCheck = vI[[i]]) : there is no package called 'carData'
Installed carData and tried again
Error : .onLoad failed in loadNamespace() for 'checkmate', details:
call: NULL
error: 'import' is not an exported object from 'namespace:backports'
ERROR: lazy loading failed for package 'blorr'

What shall I do now?

sessionInfo() # For Reference
R version 3.3.3 (2017-03-06)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)

locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
[1] stats graphics grDevices utils datasets methods base

loaded via a namespace (and not attached):
[1] httr_1.3.1 R6_2.2.2 tools_3.3.3 withr_2.1.1 curl_3.1 memoise_1.0.0
[7] knitr_1.19 git2r_0.18.0 digest_0.6.12 devtools_1.12.0

Compare model fit statistics for two models

blr_model_fit_stats() should compare model fit statistics for different models.

Feature: Stepwise Selection

Add a function blr_stepwise() for stepwise selection of predictors. It should include the following arguments:

model: a binary logistic regression model
include: predictors to be included in the model at the beginning of the stepwise process
enter: significance level at which the predictor will enter the model
retain: significance level at which the predictor will be retained in the model

Use symbols only in interactive session

Use symbols such as tick and cross only if the function is used in an interactive session.
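One way to express this is to gate the symbols on base R's interactive(). The helper name status_symbol() and the plain-text fallbacks are hypothetical and chosen only for illustration.

```r
# Hypothetical helper: return a tick or cross in interactive sessions and a
# plain ASCII marker otherwise (for example under R CMD check or Rscript).
status_symbol <- function(ok, fancy = interactive()) {
  if (fancy) {
    if (ok) "\u2714" else "\u2716"
  } else {
    if (ok) "[OK]" else "[FAIL]"
  }
}

# Forcing the non-interactive branch shows the fallback text
print(status_symbol(TRUE, fancy = FALSE))
print(status_symbol(FALSE, fancy = FALSE))
```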

Feature: Residual Diagnostics Plot

Add the following plots for residual diagnostics:

Use C++ to improve computation of concordance and discordance

The R code for computing concordance and discordance is extremely slow because of its nested for loops. Rewrite the code in C++ to reduce the computation time.
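For reference, the quantity being computed compares every (1, 0) pair of observations by predicted probability. The sketch below is a vectorised base R version built on outer(). It is not the package's implementation, the function name concordance_pairs() is hypothetical, and it trades memory for speed by materialising all pairs at once.

```r
# Hypothetical helper: share of concordant, discordant and tied pairs.
# y is the 0/1 response, p the predicted probability of a 1.
concordance_pairs <- function(y, p) {
  p1 <- p[y == 1]
  p0 <- p[y == 0]
  # Matrix of differences for every (1, 0) pair of observations
  d <- outer(p1, p0, "-")
  n_pairs <- length(d)
  c(pairs      = n_pairs,
    concordant = sum(d > 0) / n_pairs,   # the 1 has the higher probability
    discordant = sum(d < 0) / n_pairs,   # the 0 has the higher probability
    tied       = sum(d == 0) / n_pairs)
}

# Toy example: 3 ones and 3 zeros give 9 pairs
y <- c(1, 1, 0, 0, 1, 0)
p <- c(0.9, 0.4, 0.3, 0.4, 0.7, 0.8)
res <- concordance_pairs(y, p)
print(res)
```

With 53 ones and 147 zeros, as in the regression output earlier, the matrix has 53 * 147 = 7791 entries, the same number reported in the Pairs row.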

README template

Use the standard template for README:

Overview
Installation
Shiny App
Usage
Articles
Features
Getting Help
Code of Conduct

0.3.0 Checklist

Prepare for release:

devtools::check_win_devel()
rhub::check_for_cran()
Polish NEWS

Perform release:

Wait for CRAN...

Tag release
Bump dev version
Publish blog post
Share on Twitter

Template from r-lib/usethis#338

All possible regression

Similar to this

Use line chart for woe

Use a line chart instead of a bar chart for weight of evidence.

Stepwise logistic regression model selection based on Chi Square Test

Hi,

I was wondering whether it would be possible to create a function that does model selection based on p values (like ols_step_both_p from the olsrr package), specifically p values from an ANOVA/Chi Square Test, similar to the STEPWISE procedure in SAS (https://support.sas.com/documentation/cdl/en/statug/63033/HTML/default/viewer.htm#statug_logistic_sect029.htm).

Thank you!
Aaron

Error in gains table computation

library(blorr)
model <- glm(honcomp ~ female + read + science, data = hsb2[-200, ], family = binomial(link = "logit"))
blr_gains_table(model)
#> Error: `.data` must have 199 rows, not 200
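The message suggests that a 200-row data set is being combined with predictions from a model fitted on 199 rows. That reading is an assumption, since the issue does not state the cause. The sketch below uses the built-in mtcars data rather than hsb2 and shows that taking both the response and the fitted probabilities from the model object keeps the lengths aligned.

```r
# Fit on a subset, mirroring the hsb2[-200, ] call with built-in data
train <- mtcars[-32, ]
model <- glm(am ~ wt, data = train, family = binomial(link = "logit"))

# Response and fitted probabilities taken from the model itself
response  <- model.response(model.frame(model))
predicted <- fitted(model)

# Both have 31 elements, while the full data set has 32 rows
print(c(response = length(response), predicted = length(predicted),
        full_data = nrow(mtcars)))
```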

name repair problem with gains_table_prep()

library(blorr)

model <- glm(
  honcomp ~ female + read + science, data = hsb2,
  family = binomial(link = "logit")
)
gtable <- blr_gains_table(model, hsb2)
#> New names:
#> * value -> value...1
#> * value -> value...2
#> Error: Can't subset columns that don't exist.
#> x Column `value` doesn't exist.

Created on 2020-04-29 by the reprex package (v0.3.0)

This then is one failure we see when testing this package against dplyr 1.0.0. This apparently comes from these lines in gains_table_prep():

  response %>%
    as_tibble() %>%
    bind_cols(predict.glm(model, newdata = data, type = "response") %>%
                as_tibble())

because both have a value column, so name repair kicks in.
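One way to avoid the collision is to name both columns explicitly when they are combined. The sketch below uses a base R data.frame and the built-in mtcars data to stay self-contained. The column names response and predicted are illustrative and are not the names used inside gains_table_prep().

```r
model <- glm(am ~ wt, data = mtcars, family = binomial(link = "logit"))

# Name each column explicitly so no automatic name repair is needed
gains_input <- data.frame(
  response  = model$y,
  predicted = unname(predict(model, newdata = mtcars, type = "response"))
)

print(names(gains_input))
print(head(gains_input, 3))
```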

0.2.2 Release Checklist

Prepare for release:

Check that description is informative
Check licensing of included files
devtools::check_win_devel()
rhub::check_for_cran()

Perform release:

Wait for CRAN...

Template from r-lib/usethis#338

Plot limitations

Currently, the end user cannot modify the appearance of the plots generated by blorr. The plots are useful for absolute beginners while others would want to generate the plots using their own code. To address this issue, all the functions used in generating data for the plots must be exported. Those users who are well versed with R plotting libraries will then be able to use the above exported functions to prep the data and then use their favorite library to generate the plots. The functions used for preparing data for the following plots must be exported:

blr_roc_curve
blr_lorenz_curve
blr_ks_chart
blr_decile_capture_rate
blr_decile_lift_chart

Fit Diagnostics

Add blr_fit_diag() to create a panel of plots where the following are plotted against the fitted values:

delta deviance
delta chi square
leverage
ci displacement c

Leverage Diagnostics

Add blr_leverage_diag() to create a panel of plots where the following are plotted against leverage:

delta deviance
delta chi square
ci displacement c
fitted values

Decile lift chart

Add a decile-wise lift chart.

Feature: Diagnostics vs Fitted Values Plots

CI Displacements C vs fitted values
Diff Chi Square vs fitted values
Diff Deviance vs fitted values

Option to return plot object

Return plot objects instead of printing. Use the argument print_plot with the default value TRUE.

Plots in stepwise selection procedure

Currently, the end user can view the plot only for AIC in the following:

blr_step_aic_backward()
blr_step_aic_forward()
blr_step_aic_both()

The plots must be generated for BIC and deviance as well and the user must be able to choose the plot to be displayed.

Automated report

Integrate the logistic regression report template from reportr.
