blorr's Introduction

blorr

Tools for building binary logistic regression models

Overview

Tools designed to make it easier for users, particularly beginner/intermediate R users, to build logistic regression models. Includes comprehensive regression output, variable selection procedures, model validation techniques and a ‘shiny’ app for interactive model building.

Installation

# Install blorr from CRAN
install.packages("blorr")

# Install development version from GitHub
# install.packages("devtools")
devtools::install_github("rsquaredacademy/blorr")

# Install the development version from `rsquaredacademy` universe
install.packages("blorr", repos = "https://rsquaredacademy.r-universe.dev")

Articles

Usage

blorr uses the consistent prefix blr_* for easy tab completion.

library(blorr)
library(magrittr)

Bivariate Analysis

blr_bivariate_analysis(hsb2, honcomp, female, prog, race, schtyp)
#>                          Bivariate Analysis                           
#> ---------------------------------------------------------------------
#> Variable    Information Value    LR Chi Square    LR DF    LR p-value 
#> ---------------------------------------------------------------------
#>  female           0.10              3.9350          1        0.0473   
#>   prog            0.43              16.1450         2        3e-04    
#>   race            0.33              11.3694         3        0.0099   
#>  schtyp           0.00              0.0445          1        0.8330   
#> ---------------------------------------------------------------------
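The LR columns appear to be a standard likelihood-ratio test of each predictor on its own against the intercept-only model. Assuming that, the minimal sketch below reproduces the idea in base R with the built-in mtcars data, so it runs without blorr; `lr_bivariate()` is a hypothetical helper, not part of blorr's API.

```r
# Hypothetical helper (not a blorr function): likelihood-ratio test of one
# predictor against the intercept-only logistic model.
lr_bivariate <- function(data, response, predictor) {
  null_fit <- glm(reformulate("1", response), data = data,
                  family = binomial(link = "logit"))
  full_fit <- glm(reformulate(predictor, response), data = data,
                  family = binomial(link = "logit"))
  lr <- null_fit$deviance - full_fit$deviance        # LR chi-square statistic
  df <- null_fit$df.residual - full_fit$df.residual  # extra parameters used
  data.frame(variable = predictor, lr_chisq = lr, lr_df = df,
             lr_p = pchisq(lr, df, lower.tail = FALSE))
}

# cyl is treated as a factor, so the test has 2 degrees of freedom
cars <- transform(mtcars, cyl = factor(cyl))
lr_bivariate(cars, "am", "cyl")
```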

Weight of Evidence & Information Value

blr_woe_iv(hsb2, prog, honcomp)
#>                            Weight of Evidence                             
#> -------------------------------------------------------------------------
#> levels    count_0s    count_1s    dist_0s    dist_1s        woe      iv   
#> -------------------------------------------------------------------------
#>   1          38          7           0.26       0.13       0.67     0.08  
#>   2          65          40          0.44       0.75      -0.53     0.17  
#>   3          44          6           0.30       0.11       0.97     0.18  
#> -------------------------------------------------------------------------
#> 
#>       Information Value       
#> -----------------------------
#> Variable    Information Value 
#> -----------------------------
#>   prog           0.4329       
#> -----------------------------
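The woe and iv columns are consistent with woe = ln(dist_0s / dist_1s) and iv = (dist_0s - dist_1s) × woe, with the information value being the sum of the per-level iv values. The sketch below recomputes them from the counts in the table; the formulas are inferred from the printed numbers rather than taken from blorr's source.

```r
# Counts of honcomp = 0 and honcomp = 1 for each level of prog,
# copied from the blr_woe_iv() output above
count_0s <- c(38, 65, 44)
count_1s <- c(7, 40, 6)

dist_0s <- count_0s / sum(count_0s)   # share of all 0s falling in each level
dist_1s <- count_1s / sum(count_1s)   # share of all 1s falling in each level

woe <- log(dist_0s / dist_1s)         # weight of evidence per level
iv  <- (dist_0s - dist_1s) * woe      # information value contribution per level

round(woe, 2)
round(sum(iv), 3)                     # total information value for prog
```

Rounded, the results match the woe column (0.67, -0.53, 0.97) and the total information value of about 0.433.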

Model

# create model using glm
model <- glm(honcomp ~ female + read + science, data = hsb2,
             family = binomial(link = 'logit'))

Regression Output

blr_regress(model)
#>                              Model Overview                              
#> ------------------------------------------------------------------------
#> Data Set    Resp Var    Obs.    Df. Model    Df. Residual    Convergence 
#> ------------------------------------------------------------------------
#>   data      honcomp     200        199           196            TRUE     
#> ------------------------------------------------------------------------
#> 
#>                     Response Summary                     
#> --------------------------------------------------------
#> Outcome        Frequency        Outcome        Frequency 
#> --------------------------------------------------------
#>    0              147              1              53     
#> --------------------------------------------------------
#> 
#>                   Maximum Likelihood Estimates                    
#> -----------------------------------------------------------------
#>  Parameter     DF    Estimate    Std. Error    z value    Pr(>|z|) 
#> -----------------------------------------------------------------
#> (Intercept)    1     -12.7772       1.9755    -6.4677      0.0000 
#>   female1      1      1.4825        0.4474     3.3139       9e-04 
#>    read        1      0.1035        0.0258     4.0186       1e-04 
#>   science      1      0.0948        0.0305     3.1129      0.0019 
#> -----------------------------------------------------------------
#> 
#>  Association of Predicted Probabilities and Observed Responses  
#> ---------------------------------------------------------------
#> % Concordant          0.8561          Somers' D        0.7147   
#> % Discordant          0.1425          Gamma            0.7136   
#> % Tied                0.0014          Tau-a            0.2794   
#> Pairs                  7791           c                0.8568   
#> ---------------------------------------------------------------
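These association measures are defined over all pairs made of one event (honcomp = 1) and one non-event. A minimal base-R sketch using standard textbook definitions follows; `concordance_stats()` is hypothetical and the toy data are made up for illustration. With the numbers above, for example, Pairs = 147 × 53 = 7791 and c = % Concordant + 0.5 × % Tied = 0.8561 + 0.0007 = 0.8568.

```r
# Hypothetical helper (not blorr's code): concordance measures over all
# (event, non-event) pairs of observations.
concordance_stats <- function(y, p) {
  p1 <- p[y == 1]                      # predicted probabilities for events
  p0 <- p[y == 0]                      # predicted probabilities for non-events
  diffs <- outer(p1, p0, "-")          # one entry per (event, non-event) pair
  pairs <- length(diffs)
  conc <- sum(diffs > 0) / pairs       # event scored higher: concordant
  disc <- sum(diffs < 0) / pairs       # event scored lower: discordant
  tied <- sum(diffs == 0) / pairs
  n <- length(y)
  c(concordant = conc, discordant = disc, tied = tied, pairs = pairs,
    c        = conc + 0.5 * tied,
    somers_d = conc - disc,
    gamma    = (conc - disc) / (conc + disc),
    tau_a    = (conc - disc) * pairs / (n * (n - 1) / 2))
}

# Toy data: 3 events and 3 non-events, giving 9 pairs
concordance_stats(y = c(0, 0, 1, 1, 0, 1),
                  p = c(0.1, 0.4, 0.35, 0.8, 0.5, 0.5))
```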

Model Fit Statistics

blr_model_fit_stats(model)
#>                               Model Fit Statistics                                
#> ---------------------------------------------------------------------------------
#> Log-Lik Intercept Only:      -115.644    Log-Lik Full Model:              -80.118 
#> Deviance(196):                160.236    LR(3):                            71.052 
#>                                          Prob > LR:                         0.000 
#> MCFadden's R2                   0.307    McFadden's Adj R2:                 0.273 
#> ML (Cox-Snell) R2:              0.299    Cragg-Uhler(Nagelkerke) R2:        0.436 
#> McKelvey & Zavoina's R2:        0.518    Efron's R2:                        0.330 
#> Count R2:                       0.810    Adj Count R2:                      0.283 
#> BIC:                          181.430    AIC:                             168.236 
#> ---------------------------------------------------------------------------------
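Most of these statistics follow directly from the two log-likelihoods. The sketch below recomputes several of them from the printed values using the usual definitions, assuming n = 200 observations and k = 4 estimated parameters (intercept plus three predictors); small differences in the last digit come from the rounded inputs.

```r
# Log-likelihoods taken from the output above
ll_null <- -115.644   # intercept-only model
ll_full <- -80.118    # full model
n <- 200              # observations
k <- 4                # estimated parameters: intercept, female, read, science

lr         <- 2 * (ll_full - ll_null)          # likelihood-ratio statistic, LR(3)
dev_full   <- -2 * ll_full                     # residual deviance
mcfadden   <- 1 - ll_full / ll_null
mcf_adj    <- 1 - (ll_full - k) / ll_null
cox_snell  <- 1 - exp(-lr / n)
nagelkerke <- cox_snell / (1 - exp(2 * ll_null / n))  # Cox-Snell rescaled to max 1
aic        <- dev_full + 2 * k
bic        <- dev_full + k * log(n)

round(c(lr = lr, mcfadden = mcfadden, mcf_adj = mcf_adj,
        cox_snell = cox_snell, nagelkerke = nagelkerke,
        aic = aic, bic = bic), 3)
```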

Confusion Matrix

blr_confusion_matrix(model)
#> Confusion Matrix and Statistics 
#> 
#>           Reference
#> Prediction   0   1
#>          0 135  26
#>          1  12  27
#> 
#> 
#>                 Accuracy : 0.8100 
#>      No Information Rate : 0.7350 
#> 
#>                    Kappa : 0.4673 
#> 
#> McNemars's Test P-Value  : 0.0350 
#> 
#>              Sensitivity : 0.5094 
#>              Specificity : 0.9184 
#>           Pos Pred Value : 0.6923 
#>           Neg Pred Value : 0.8385 
#>               Prevalence : 0.2650 
#>           Detection Rate : 0.1350 
#>     Detection Prevalence : 0.1950 
#>        Balanced Accuracy : 0.7139 
#>                Precision : 0.6923 
#>                   Recall : 0.5094 
#> 
#>         'Positive' Class : 1
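Every statistic above is a function of the four cells of the table. This base-R sketch recomputes the main ones from the printed counts with standard definitions; stats::mcnemar.test() applies a continuity correction by default.

```r
# Counts from the confusion matrix above (rows = prediction, columns = reference)
cm <- matrix(c(135, 12, 26, 27), nrow = 2,
             dimnames = list(Prediction = c("0", "1"), Reference = c("0", "1")))

tp <- cm["1", "1"]; tn <- cm["0", "0"]
fp <- cm["1", "0"]; fn <- cm["0", "1"]
n  <- sum(cm)

accuracy    <- (tp + tn) / n
sensitivity <- tp / (tp + fn)          # also reported as recall
specificity <- tn / (tn + fp)
ppv         <- tp / (tp + fp)          # also reported as precision
npv         <- tn / (tn + fn)

# Cohen's kappa: observed agreement corrected for chance agreement
expected   <- sum(rowSums(cm) * colSums(cm)) / n^2
kappa_stat <- (accuracy - expected) / (1 - expected)

# McNemar's test on the off-diagonal (disagreeing) counts
mcnemar_p <- mcnemar.test(cm)$p.value

round(c(accuracy = accuracy, sensitivity = sensitivity, specificity = specificity,
        ppv = ppv, npv = npv, kappa = kappa_stat, mcnemar_p = mcnemar_p), 4)
```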

Hosmer Lemeshow Test

blr_test_hosmer_lemeshow(model)
#>            Partition for the Hosmer & Lemeshow Test            
#> --------------------------------------------------------------
#>                         def = 1                 def = 0        
#> Group    Total    Observed    Expected    Observed    Expected 
#> --------------------------------------------------------------
#>   1       20         0          0.16         20        19.84   
#>   2       20         0          0.53         20        19.47   
#>   3       20         2          0.99         18        19.01   
#>   4       20         1          1.64         19        18.36   
#>   5       21         3          2.72         18        18.28   
#>   6       19         3          4.05         16        14.95   
#>   7       20         7          6.50         13        13.50   
#>   8       20         10         8.90         10        11.10   
#>   9       20         13        11.49         7          8.51   
#>  10       20         14        16.02         6          3.98   
#> --------------------------------------------------------------
#> 
#>      Goodness of Fit Test      
#> ------------------------------
#> Chi-Square    DF    Pr > ChiSq 
#> ------------------------------
#>   4.4998      8       0.8095   
#> ------------------------------
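The Hosmer-Lemeshow statistic is a Pearson chi-square computed over both outcome columns of every group and compared against a chi-square distribution with (groups - 2) degrees of freedom. The sketch below applies that definition to the partition table; because the expected counts are printed to two decimals, it gives about 4.49 rather than exactly 4.4998.

```r
# Observed and expected counts per group, copied from the partition table above
obs_1 <- c(0, 0, 2, 1, 3, 3, 7, 10, 13, 14)
exp_1 <- c(0.16, 0.53, 0.99, 1.64, 2.72, 4.05, 6.50, 8.90, 11.49, 16.02)
obs_0 <- c(20, 20, 18, 19, 18, 16, 13, 10, 7, 6)
exp_0 <- c(19.84, 19.47, 19.01, 18.36, 18.28, 14.95, 13.50, 11.10, 8.51, 3.98)

# Pearson chi-square summed over both outcome columns of every group
hl_stat <- sum((obs_1 - exp_1)^2 / exp_1 + (obs_0 - exp_0)^2 / exp_0)
hl_df   <- length(obs_1) - 2          # number of groups minus 2
hl_p    <- pchisq(hl_stat, df = hl_df, lower.tail = FALSE)

c(chi_square = hl_stat, df = hl_df, p_value = hl_p)
```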

Gains Table

blr_gains_table(model)
#>    decile total  1  0       ks tp  tn  fp fn sensitivity specificity accuracy
#> 1       1    20 14  6 22.33346 14 141   6 39    26.41509    95.91837     77.5
#> 2       2    20 13  7 42.09986 27 134  13 26    50.94340    91.15646     80.5
#> 3       3    20 10 10 54.16506 37 124  23 16    69.81132    84.35374     80.5
#> 4       4    20  7 13 58.52907 44 111  36  9    83.01887    75.51020     77.5
#> 5       5    20  3 17 52.62482 47  94  53  6    88.67925    63.94558     70.5
#> 6       6    20  3 17 46.72058 50  77  70  3    94.33962    52.38095     63.5
#> 7       7    20  1 19 35.68220 51  58  89  2    96.22642    39.45578     54.5
#> 8       8    20  2 18 27.21088 53  40 107  0   100.00000    27.21088     46.5
#> 9       9    20  0 20 13.60544 53  20 127  0   100.00000    13.60544     36.5
#> 10     10    20  0 20  0.00000 53   0 147  0   100.00000     0.00000     26.5
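The columns of the gains table can be derived from the per-decile counts of 1s and 0s through cumulative sums: tp and fp are the running totals of 1s and 0s, and fn and tn are what remains. The sketch below rebuilds several columns from the counts printed above; the KS statistic is the largest ks value (58.53, at decile 4).

```r
# Number of 1s and 0s in each decile, copied from the gains table above
ones  <- c(14, 13, 10, 7, 3, 3, 1, 2, 0, 0)
zeros <- c(6, 7, 10, 13, 17, 17, 19, 18, 20, 20)

tp <- cumsum(ones)             # 1s captured up to and including this decile
fp <- cumsum(zeros)            # 0s captured up to and including this decile
fn <- sum(ones) - tp           # 1s not yet captured
tn <- sum(zeros) - fp          # 0s not yet captured

gains <- data.frame(
  decile      = seq_along(ones),
  ks          = 100 * (tp / sum(ones) - fp / sum(zeros)),
  sensitivity = 100 * tp / sum(ones),
  specificity = 100 * tn / sum(zeros),
  accuracy    = 100 * (tp + tn) / (sum(ones) + sum(zeros))
)
gains
max(gains$ks)                  # KS statistic: the largest gap between the two
```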

Lift Chart

model %>%
  blr_gains_table() %>%
  plot()

ROC Curve

model %>%
  blr_gains_table() %>%
  blr_roc_curve()
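An ROC curve plots sensitivity against 1 - specificity over a set of cut-offs. Assuming the curve is drawn through the ten decile cut-offs of the gains table, the sketch below computes the trapezoidal area under it from the cumulative counts in that table. Because observations within a decile are effectively treated as tied, the result (about 0.853) is slightly below the c statistic of 0.8568 reported by blr_regress().

```r
# Cumulative counts of 1s (tp) and 0s (fp) by decile, from the gains table,
# with a leading 0 for the point (0, 0)
tp <- c(0, 14, 27, 37, 44, 47, 50, 51, 53, 53, 53)
fp <- c(0, 6, 13, 23, 36, 53, 70, 89, 107, 127, 147)

tpr <- tp / max(tp)            # sensitivity at each cut-off
fpr <- fp / max(fp)            # 1 - specificity at each cut-off

# Area under the piecewise-linear ROC curve (trapezoidal rule)
auc <- sum(diff(fpr) * (head(tpr, -1) + tail(tpr, -1)) / 2)
round(auc, 3)
```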

KS Chart

model %>%
  blr_gains_table() %>%
  blr_ks_chart()

Lorenz Curve

blr_lorenz_curve(model)

Getting Help

If you encounter a bug, please file a minimal reproducible example using reprex on GitHub. For questions and clarifications, use StackOverflow.

Code of Conduct

Please note that this project is released with a Contributor Code of Conduct. By participating in this project you agree to abide by its terms.

blorr's People

Contributors

aravindhebbali


blorr's Issues

tibble package breaking changes

== CHECK RESULTS ========================================

  • checking examples ... ERROR

    Running examples in ‘blorr-Ex.R’ failed
    The error most likely occurred in:
    
    > ### Name: blr_multi_model_fit_stats
    > ### Title: Multi model fit statistics
    > ### Aliases: blr_multi_model_fit_stats blr_multi_model_fit_stats.default
    >
    > ### ** Examples
    >
    > model <- glm(honcomp ~ female + read + science, data = hsb2,
    + family = binomial(link = 'logit'))
    >
    > model2 <- glm(honcomp ~ female + read + math, data = hsb2,
    + family = binomial(link = 'logit'))
    >
    > blr_multi_model_fit_stats(model, model2)
    Error: Columns 1, 2 must be named.
    Use .name_repair to specify repair.
    Execution halted
    
  • checking tests ... ERROR
    Running the tests in ‘tests/testthat.R’ failed.
    Last 13 lines of output:
                           Added/
      Step    Variable    Removed        AIC           BIC           C(p)
      ----------------------------------------------------------------------
         1       x6       addition     18869.627     18885.434    18865.6270
         2       x1       addition     18571.376     18595.087    18565.3760
         3       x3       addition     18016.724     18048.338    18008.7240
         4       x2       addition     16642.374     16681.891    16632.3740
         5       x5       addition     16640.883     16688.304    16628.8830
         6       x6       removal      16639.219     16678.736    16629.2190
      ----------------------------------------------------------------------
    

testthat results
═══════════════════════════════════════════════════════════
OK: 76 SKIPPED: 28 FAILED: 1
1. Error: blr_multi_model_fit_stats prints the correct output
(@test-model-fit-stats.R#154)

  Error: testthat unit tests failed
  Execution halted

Influence Diagnostics

Add a function blr_influence_diag() to create a panel of plots in which each of the following is plotted against the observation id:

  • pearson residual
  • deviance residual
  • leverage
  • ci displacement c
  • ci displacement cbar
  • delta deviance
  • delta chisquare
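All of these quantities can be computed from a fitted glm object with base R. The sketch below uses the definitions of C, CBar, delta deviance and delta chi-square commonly used for logistic regression diagnostics (for example in SAS PROC LOGISTIC), assuming blr_influence_diag() would follow the same conventions; `influence_measures()` is a hypothetical helper, and each column could then be plotted against `id` to build the panel.

```r
# Hypothetical helper (not blorr's implementation): influence measures for a
# logistic regression model, one row per observation.
influence_measures <- function(model) {
  r <- residuals(model, type = "pearson")    # Pearson residual
  d <- residuals(model, type = "deviance")   # deviance residual
  h <- hatvalues(model)                      # leverage
  data.frame(
    id          = seq_along(r),
    pearson     = r,
    deviance    = d,
    leverage    = h,
    c           = r^2 * h / (1 - h)^2,       # CI displacement C
    cbar        = r^2 * h / (1 - h),         # CI displacement CBar
    delta_dev   = d^2 + r^2 * h / (1 - h),   # change in deviance (d^2 + CBar)
    delta_chisq = r^2 / (1 - h),             # change in Pearson chi-square
    row.names   = NULL
  )
}

fit <- glm(am ~ hp + wt, data = mtcars, family = binomial(link = "logit"))
diag_df <- influence_measures(fit)
head(diag_df)
```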

Documentation

Add the following:

  • Contributing Guide
  • Issue Template

0.2.0 Checklist

Prepare for release:

  • devtools::check_win_devel()
  • rhub::check_for_cran()
  • Polish NEWS

Perform release:

  • Bump version (in DESCRIPTION and NEWS)
  • devtools::check_win_devel() (again!)
  • devtools::submit_cran()
  • pkgdown::build_site()
  • Approve email

Wait for CRAN...

  • Tag release
  • Bump dev version

Template from r-lib/usethis#338

Force variables to be included in all models

Users should be able to specify variables that must be included in all models using the include argument, which should be available in the following procedures:

  • blr_step_p_forward()
  • blr_step_aic_forward()
  • blr_step_p_backward()
  • blr_step_aic_backward()
  • blr_step_p_both()
  • blr_step_aic_both()

Feature: Forward Selection Method

Add a function blr_step_forward() for forward selection of predictors. It should include the following arguments:

  • model: a binary logistic regression model
  • include: the predictors to be included in the forward selection
  • enter: significance level for entering the model
  • stop: number of predictors to be added to the model before stopping forward selection
  • details: if TRUE, model summary will be printed to the console after each step. Default value is FALSE
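A minimal sketch of the proposed procedure, assuming forward selection is driven by likelihood-ratio p-values from stats::add1(). `step_forward_sketch()` and its `data`, `response` and `candidates` arguments are hypothetical and only mirror the argument list above; this is not blorr's implementation.

```r
# Hypothetical sketch (not blorr's implementation): p-value based forward
# selection for a logistic model, using likelihood-ratio tests from add1().
step_forward_sketch <- function(data, response, candidates, include = character(0),
                                enter = 0.3, stop = Inf, details = FALSE) {
  fit <- function(vars) {
    form <- reformulate(if (length(vars)) vars else "1", response)
    glm(form, data = data, family = binomial(link = "logit"))
  }
  selected  <- include
  remaining <- setdiff(candidates, include)
  model     <- fit(selected)
  added     <- 0
  while (length(remaining) > 0 && added < stop) {
    scope <- as.formula(paste("~ . +", paste(remaining, collapse = " + ")))
    tests <- add1(model, scope = scope, test = "Chisq")
    pvals <- tests[remaining, "Pr(>Chi)"]
    if (min(pvals) > enter) break        # no candidate is significant enough
    best      <- remaining[which.min(pvals)]
    selected  <- c(selected, best)
    remaining <- setdiff(remaining, best)
    model     <- fit(selected)
    added     <- added + 1
    if (details) print(summary(model))   # model summary after each step
  }
  list(variables = selected, model = model)
}

res <- step_forward_sketch(mtcars, "am", c("wt", "hp", "drat"), enter = 0.3)
res$variables
```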

Feature: Backward Elimination Method

Add a function blr_step_backward() for backward elimination. It should include the following arguments:

  • model: a binary logistic regression model
  • include: predictors to be included at the beginning of backward elimination process
  • retain: significance level at which the predictors will be retained in the model
  • stop: number of predictors to be eliminated
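A matching sketch for backward elimination, assuming stats::drop1() likelihood-ratio p-values decide which predictor to remove. The helper is hypothetical: it starts from the `include` predictors, removes the least significant one while its p-value exceeds `retain`, and stops after `stop` removals.

```r
# Hypothetical sketch (not blorr's implementation): p-value based backward
# elimination for a logistic model, using likelihood-ratio tests from drop1().
step_backward_sketch <- function(data, response, include, retain = 0.3, stop = Inf) {
  fit <- function(vars) {
    form <- reformulate(if (length(vars)) vars else "1", response)
    glm(form, data = data, family = binomial(link = "logit"))
  }
  kept    <- include
  removed <- character(0)
  model   <- fit(kept)
  while (length(kept) > 0 && length(removed) < stop) {
    tests <- drop1(model, test = "Chisq")
    pvals <- tests[kept, "Pr(>Chi)"]
    if (max(pvals) <= retain) break      # every remaining predictor is retained
    worst   <- kept[which.max(pvals)]
    kept    <- setdiff(kept, worst)
    removed <- c(removed, worst)
    model   <- fit(kept)
  }
  list(variables = kept, removed = removed, model = model)
}

res <- step_backward_sketch(mtcars, "am", c("wt", "hp", "drat"), retain = 0.3)
res$removed
```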

Forthcoming release of ggplot2 and blorr

We are contacting you because you are the maintainer of blorr, which imports ggplot2 and uses vdiffr to manage visual test cases. The upcoming release of ggplot2 includes several improvements to plot rendering, including the ability to specify lineend and linejoin in geom_rect() and geom_tile(), and improved rendering of text. These improvements will result in subtle changes to your vdiffr doppelgangers when the new version is released.

Because vdiffr test cases do not run on CRAN by default, your CRAN checks will still pass. However, we suggest updating your visual test cases with the new version of ggplot2 as soon as possible to avoid confusion. You can install the development version of ggplot2 using remotes::install_github("tidyverse/ggplot2").

If you have any questions, let me know!

Multiple Plot Options

Users should be able to choose the plotting library from the following:

  • ggplot2 (default)
  • plotly
  • rbokeh

To arrange plots, use grid_plot() in rbokeh and subplot() in plotly.

0.2.1 Checklist

Prepare for release:

  • devtools::check_win_devel()
  • rhub::check_for_cran()
  • Polish NEWS

Perform release:

  • Bump version (in DESCRIPTION and NEWS)
  • devtools::check_win_devel() (again!)
  • devtools::submit_cran()
  • pkgdown::build_site()
  • Approve email

Wait for CRAN...

  • Tag release
  • Bump dev version

Template from r-lib/usethis#338

Issues with installation

Hi,
I could not install the package from CRAN or GitHub. Here are the errors I encountered:

  1. Error in loadNamespace(i, c(lib.loc, .libPaths()), versionCheck = vI[[i]]) : there is no package called 'rio'
    Installed rio and tried again

  2. Error in loadNamespace(i, c(lib.loc, .libPaths()), versionCheck = vI[[i]]) : there is no package called 'carData'
    Installed carData and tried again

  3. Error : .onLoad failed in loadNamespace() for 'checkmate', details:
    call: NULL
    error: 'import' is not an exported object from 'namespace:backports'
    ERROR: lazy loading failed for package 'blorr'

What shall I do now?

sessionInfo() # For Reference
R version 3.3.3 (2017-03-06)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)

locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
[1] stats graphics grDevices utils datasets methods base

loaded via a namespace (and not attached):
[1] httr_1.3.1 R6_2.2.2 tools_3.3.3 withr_2.1.1 curl_3.1 memoise_1.0.0
[7] knitr_1.19 git2r_0.18.0 digest_0.6.12 devtools_1.12.0

Feature: Stepwise Selection

Add a function blr_stepwise() for stepwise selection of predictors. It should include the following arguments:

  • model: a binary logistic regression model
  • include: predictors to be included in the model at the beginning of the stepwise process
  • enter: significance level at which the predictor will enter the model
  • retain: significance level at which the predictor will be retained in the model
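Stepwise selection can be sketched by alternating the two procedures: after each entry step, any predictor whose p-value has risen above `retain` is removed. The helper below is hypothetical and assumes add1()/drop1() likelihood-ratio tests; `max_steps` is an added guard against a predictor repeatedly entering and leaving the model.

```r
# Hypothetical sketch (not blorr's implementation): p-value based stepwise
# selection. Each step enters the most significant candidate (p < enter) and
# then removes the least significant predictor if its p-value exceeds retain.
step_both_sketch <- function(data, response, candidates, include = character(0),
                             enter = 0.1, retain = 0.3, max_steps = 50) {
  fit <- function(vars) {
    form <- reformulate(if (length(vars)) vars else "1", response)
    glm(form, data = data, family = binomial(link = "logit"))
  }
  selected <- include
  model    <- fit(selected)
  for (i in seq_len(max_steps)) {        # guard against add/remove cycles
    remaining <- setdiff(candidates, selected)
    if (length(remaining) == 0) break
    scope  <- as.formula(paste("~ . +", paste(remaining, collapse = " + ")))
    p_add  <- add1(model, scope = scope, test = "Chisq")[remaining, "Pr(>Chi)"]
    if (min(p_add) >= enter) break       # nothing left to enter
    selected <- c(selected, remaining[which.min(p_add)])
    model    <- fit(selected)
    p_drop <- drop1(model, test = "Chisq")[selected, "Pr(>Chi)"]
    if (max(p_drop) > retain) {          # removal step
      selected <- selected[-which.max(p_drop)]
      model    <- fit(selected)
    }
  }
  list(variables = selected, model = model)
}

res <- step_both_sketch(mtcars, "am", c("wt", "hp", "drat"), enter = 0.1, retain = 0.3)
res$variables
```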

Feature: Residual Diagnostics Plot

Add the following plots for residual diagnostics:

  • pearson standardised residual vs fitted values
  • pearson standardised residual vs case number
  • deviance residual vs fitted values
  • deviance residual vs case number
  • leverage vs fitted values
  • leverage vs case number
  • CI Displacements C vs case number
  • CI Displacements CBar vs case number
  • Diff Chi Square vs case number
  • Diff Deviance vs case number
  • DfBetas vs case number
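Most of these plots can be produced from base R accessors on a glm object: rstandard(type = "pearson") for standardised Pearson residuals, residuals(type = "deviance"), hatvalues() for leverage and dfbetas(). The sketch below (a hypothetical `residual_panel()`, not blorr code) draws the first six plots in the list with base graphics on the built-in mtcars data.

```r
# Hypothetical sketch (not blorr's code): residual diagnostics for a logistic
# model drawn with base graphics, two rows of three panels.
residual_panel <- function(model) {
  fitted_p <- fitted(model)
  case     <- seq_along(fitted_p)
  std_pear <- rstandard(model, type = "pearson")   # standardised Pearson residual
  dev_res  <- residuals(model, type = "deviance")
  lev      <- hatvalues(model)
  old <- par(mfrow = c(2, 3))
  on.exit(par(old))                                # restore the layout afterwards
  plot(fitted_p, std_pear, xlab = "Fitted values", ylab = "Std. Pearson residual")
  plot(case, std_pear, xlab = "Case number", ylab = "Std. Pearson residual")
  plot(fitted_p, dev_res, xlab = "Fitted values", ylab = "Deviance residual")
  plot(case, dev_res, xlab = "Case number", ylab = "Deviance residual")
  plot(fitted_p, lev, xlab = "Fitted values", ylab = "Leverage")
  plot(case, lev, xlab = "Case number", ylab = "Leverage")
  invisible(data.frame(fitted = fitted_p, std_pearson = std_pear,
                       deviance = dev_res, leverage = lev))
}

fit <- glm(am ~ hp + wt, data = mtcars, family = binomial(link = "logit"))
res <- residual_panel(fit)
# DfBetas: one column per coefficient, each of which can be plotted against case number
head(dfbetas(fit))
```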

README template

Use the standard template for README:

  • Overview
  • Installation
  • Shiny App
  • Usage
  • Articles
  • Features
  • Getting Help
  • Code of Conduct

0.3.0 Checklist

Prepare for release:

  • devtools::check_win_devel()
  • rhub::check_for_cran()
  • Polish NEWS

Perform release:

  • Bump version (in DESCRIPTION and NEWS)
  • devtools::check_win_devel() (again!)
  • devtools::submit_cran()
  • pkgdown::build_site()
  • Approve email

Wait for CRAN...

  • Tag release
  • Bump dev version
  • Publish blog post
  • Share on Twitter

Template from r-lib/usethis#338

Error in gains table computation

library(blorr)
model <- glm(honcomp ~ female + read + science, data = hsb2[-200, ], family = binomial(link = "logit"))
blr_gains_table(model)
#> Error: `.data` must have 199 rows, not 200

name repair problem with gains_table_prep()

library(blorr)

model <- glm(
  honcomp ~ female + read + science, data = hsb2,
  family = binomial(link = "logit")
)
gtable <- blr_gains_table(model, hsb2)
#> New names:
#> * value -> value...1
#> * value -> value...2
#> Error: Can't subset columns that don't exist.
#> x Column `value` doesn't exist.

Created on 2020-04-29 by the reprex package (v0.3.0)

This is one of the failures we see when testing this package against dplyr 1.0.0. It appears to come from these lines in gains_table_prep():

  response %>%
    as_tibble() %>%
    bind_cols(predict.glm(model, newdata = data, type = "response") %>%
                as_tibble())

Both tibbles have a column named value, so name repair kicks in.
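One way around this kind of name clash is to name the columns explicitly and take the response and the predictions from the same rows, which also keeps them the same length (the kind of mismatch that appears to be behind the row-count error in the previous issue). The base-R helper below is a hypothetical sketch of that idea, not the fix that was applied in blorr.

```r
# Hypothetical helper (not blorr's code): build the response/probability table
# with explicit column names, so no name repair is ever triggered.
prep_gains_data <- function(model, data = NULL) {
  if (is.null(data)) {
    # model.frame() holds exactly the rows used in the fit, so the response
    # and the fitted probabilities always have the same length
    response <- model.response(model.frame(model))
    prob     <- fitted(model)
  } else {
    # score new data: response column named by the model formula
    response <- data[[all.vars(formula(model))[1]]]
    prob     <- predict(model, newdata = data, type = "response")
  }
  data.frame(response = response, prob = unname(prob), row.names = NULL)
}

fit <- glm(vs ~ mpg, data = mtcars[-32, ], family = binomial(link = "logit"))
nrow(prep_gains_data(fit))           # 31 rows, from the fitted data
nrow(prep_gains_data(fit, mtcars))   # 32 rows, scored on new data
```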

0.2.2 Release Checklist

Prepare for release:

  • Check that description is informative
  • Check licensing of included files
  • devtools::check_win_devel()
  • rhub::check_for_cran()

Perform release:

  • Bump version (in DESCRIPTION and NEWS)
  • devtools::check_win_devel() (again!)
  • devtools::submit_cran()
  • pkgdown::build_site()
  • Approve email

Wait for CRAN...

  • Tag release
  • Bump dev version
  • Write blog post
  • Tweet
  • Add link to blog post in pkgdown news menu

Template from r-lib/usethis#338

Plot limitations

Currently, end users cannot modify the appearance of the plots generated by blorr. The plots are useful for absolute beginners, but other users may prefer to generate plots with their own code. To address this, all the functions used to generate the data behind the plots must be exported. Users who are well versed in R plotting libraries can then use these exported functions to prepare the data and plot it with their favorite library. The data-preparation functions for the following plots must be exported:

  • blr_roc_curve
  • blr_lorenz_curve
  • blr_ks_chart
  • blr_decile_capture_rate
  • blr_decile_lift_chart

Fit Diagnostics

Add blr_fit_diag() to create a panel of plots where the following are plotted against the fitted values:

  • delta deviance
  • delta chi square
  • leverage
  • ci displacement c

Leverage Diagnostics

Add blr_leverage_diag() to create a panel of plots where the following are plotted against leverage:

  • delta deviance
  • delta chi square
  • ci displacement c
  • fitted values

Plots in stepwise selection procedure

Currently, the end user can view the plot only for AIC in the following functions:

  • blr_step_aic_backward()
  • blr_step_aic_forward()
  • blr_step_aic_both()

Plots must also be generated for BIC and deviance, and the user must be able to choose which plot is displayed.

Automated report

Integrate the logistic regression report template from reportr.
