alexpghayes / modeltests

test infrastructure for modeling related tasks

License: Other

modeltests's Introduction

Hi, I'm Alex 👋

I'm a PhD candidate in the University of Wisconsin-Madison statistics program. My github is a mixture of research code, #rstats ✨ contributions, and personal data analysis projects. I write long-form explainers on my blog, https://www.alexpghayes.com/.

Research software

fastadi performs self-tuning matrix completion via adaptive thresholding, often outperforming softImpute. See the paper for algorithmic and theoretical details. I have also extended this algorithm to work with matrices where the entire upper triangle is observed as part of some work on citation networks.
aPPR helps you calculate approximate personalized pageranks from large graphs, including those that can only be queried via an API. aPPR additionally performs degree correction and regularization, allowing users to recover blocks from stochastic blockmodels. Read the paper.
vsp performs semi-parametric estimation of latent factors in random-dot product graphs by computing varimax rotations of the spectral embeddings of graphs. The resulting factors are sparse and interpretable. Read the paper.
fastRG samples random-dot product graphs much faster than naive sampling procedures and is especially useful when running simulation studies. See the paper for a description of the fastRG core algorithm.

#rstats

I am involved in a number of open source projects in the tidyverse and tidymodels orbits. I previously maintained the broom package, which currently has ~6 million downloads, and for my contributions am an author on the tidyverse paper. I intermittently participate in the Stan and ROpenSci communities as well.

Teaching materials

classic stats formulas

Other projects

distributions3

Please get in touch if...

you'd like to hire me for a research or data science for social good internship,
you want to discuss design of statistical modeling software,
you want to collaborate on a research project, or
you want to write an explainer together.

Outside of R, I'm a proficient Python user, and can pull together enough SQL, C++, and Julia to get things done.

I am responsive via email.

Last updated 2023-10-20.

modeltests's Issues

Update the argument and column glossaries

Currently there's a lot of filler and incomplete text. These need to be updated.

Test that `newdata` argument always defaults to `NULL` for `augment()`

It's sane
It makes it easy to determine whether or not to use data or newdata behavior
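
A minimal sketch of what such a test could look like, using only base R's `formals()`. Both `augment_sketch()` and `newdata_defaults_to_null()` are hypothetical names made up for illustration; neither is part of modeltests or broom.

```r
# Hypothetical augment method, used only for illustration.
augment_sketch <- function(x, data = NULL, newdata = NULL, ...) {
  # A NULL default makes the data / newdata decision a simple is.null() check.
  if (is.null(newdata)) "data behavior" else "newdata behavior"
}

# Hypothetical helper: TRUE when `newdata` is an argument whose default is NULL.
newdata_defaults_to_null <- function(f) {
  args <- formals(f)
  "newdata" %in% names(args) && is.null(args[["newdata"]])
}

newdata_defaults_to_null(augment_sketch)
```

A method that declares `newdata = 1`, or has no `newdata` argument at all, would fail this check.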

Add tests for input validation

`tidy()`

conf.int should be logical
conf.level should be greater than zero and less than 1
quick should be logical
etc

`glance()`

All arguments should be logical (I think? Pretty sure the only arguments are TRUE/FALSE switches that add more columns to the output)

`augment()`

data / newdata must be coercible to tibbles
type.predict and type.residuals should be validated with match.arg() against the options accepted by the function they get passed to

All

All named arguments should get evaluated (see #6)
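
The validation rules listed above could be sketched in base R roughly as follows. This is an illustration under assumptions, not the checks modeltests actually runs: `validate_tidy_args()` and `augment_type_sketch()` are hypothetical names, and the `"link"` / `"response"` options are just example values.

```r
# Hypothetical validator for the tidy() arguments discussed above.
validate_tidy_args <- function(conf.int = FALSE, conf.level = 0.95) {
  if (!is.logical(conf.int) || length(conf.int) != 1 || is.na(conf.int)) {
    stop("`conf.int` must be TRUE or FALSE.", call. = FALSE)
  }
  if (!is.numeric(conf.level) || length(conf.level) != 1 ||
      is.na(conf.level) || conf.level <= 0 || conf.level >= 1) {
    stop("`conf.level` must be a single number strictly between 0 and 1.",
         call. = FALSE)
  }
  invisible(TRUE)
}

# Hypothetical augment-style function: match.arg() rejects any value
# that is not one of the declared choices.
augment_type_sketch <- function(x, type.predict = c("link", "response"), ...) {
  type.predict <- match.arg(type.predict)
  type.predict
}

validate_tidy_args(conf.int = TRUE, conf.level = 0.9)
augment_type_sketch(NULL, type.predict = "response")
```

A test suite could then assert that calls such as `validate_tidy_args(conf.level = 1)` or `augment_type_sketch(NULL, type.predict = "bogus")` raise an error.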

check_arguments default values

In check_arguments():

Check that arguments other than x and ... have default values (NULL if there is no meaningful option)
conf.int argument defaults to FALSE
conf.level argument defaults to 0.95
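
One way these checks might be expressed with base R's `formals()` is sketched below. `args_without_defaults()`, `good_tidier()` and `bad_tidier()` are hypothetical names used only for illustration; this is not the body of `check_arguments()`.

```r
# Hypothetical helper: names of arguments (other than `x` and `...`)
# that have no default value.
args_without_defaults <- function(f) {
  args <- formals(f)
  candidates <- setdiff(names(args), c("x", "..."))
  # An argument without a default is stored as the empty symbol.
  missing_default <- vapply(
    candidates,
    function(nm) identical(args[[nm]], quote(expr = )),
    logical(1)
  )
  candidates[missing_default]
}

# Hypothetical tidiers used to exercise the helper.
good_tidier <- function(x, conf.int = FALSE, conf.level = 0.95, ...) NULL
bad_tidier  <- function(x, conf.int, conf.level = 0.9, ...) NULL

args_without_defaults(good_tidier)  # no offending arguments
args_without_defaults(bad_tidier)   # flags conf.int

# The specific defaults can be compared directly.
identical(formals(good_tidier)[["conf.int"]], FALSE)
identical(formals(good_tidier)[["conf.level"]], 0.95)
```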

Release modeltests 0.1.6

Prepare for release:

Submit to CRAN:

usethis::use_version('patch')
devtools::submit_cran()
Approve email

Wait for CRAN...

Accepted 🎉
usethis::use_github_release()
usethis::use_dev_version(push = TRUE)

possible release ahead of upcoming broom release

Hey Alex! Hope you're taking care.

I'm looking towards sending out a broom release in the coming weeks to handle a breakage in the survival tidiers. I'd like to also include the changes from tidymodels/broom#1191, which would be best-tested with the changes from #38 in this repo. Would you be game to send out a modeltests release sometime soon?

Export helper functions for tidier methods

An idea that we tossed around at the tidymodels meeting last week and thought this might be a good solution for…

One major change for broom after 0.7.0 will be the effort to encourage maintainers to implement new tidier methods in the model-owning package rather than broom. This article was written to hopefully ease that process.

At the same time, we’ve deprecated some helper functions from broom and are writing more that (I believe) we don’t plan to export.

It would be great if we could also supply some of these helper functions to people implementing tidier methods for packages external to broom. Not only does it make implementation easier for others, but hopefully prevents people from simply re-exporting lm or glm tidiers for new model classes. There are a few different ways this could come together.

Export from broom: similar issues as the existing helper functions—more maintainability burden and broom is a heavy dependency to ask package maintainers to take on.
Export from a new package: a package seems overboard for this purpose.
Export from modeltests: this would mean, for many maintainers, moving modeltests from Suggests to Imports, which might be undesirable for some. Even with these functions added, though, modeltests should still be a relatively light dependency.

An idea for modeltests, then:

Remove the dplyr dependency from modeltests (doesn’t seem like there’s any grouped stuff in the package, so this should be relatively straightforward.)
Add (revised) broom helpers from utilities.R to modeltests. This would probably exclude augment_columns() and augment_newdata() for now until that rewrite is wrapped up.
Add an article to tidymodels.org / extend the current one to show how each of the helpers can be used in context.

Thoughts?

cc @topepo

Check for `Inf` and `NaN` values in tidy output

In check_tibble()
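
A rough sketch of such a check in base R, assuming the tidy output is a data frame. `nonfinite_columns()` is a hypothetical helper, not an existing modeltests function, and the example data is made up.

```r
# Hypothetical helper: names of numeric columns containing Inf, -Inf or NaN.
# Plain NA values are deliberately not flagged.
nonfinite_columns <- function(df) {
  numeric_names <- names(df)[vapply(df, is.numeric, logical(1))]
  bad <- vapply(
    numeric_names,
    function(nm) any(is.infinite(df[[nm]]) | is.nan(df[[nm]])),
    logical(1)
  )
  numeric_names[bad]
}

# Made-up tidy-style output with one Inf and one NaN.
tidy_output <- data.frame(
  term      = c("(Intercept)", "x"),
  estimate  = c(1, Inf),
  std.error = c(NA, 0.5),
  statistic = c(NaN, 2)
)

nonfinite_columns(tidy_output)
```

Here `estimate` and `statistic` are reported, while `std.error` is not, because `is.nan(NA)` is `FALSE`.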

New augment test for model.frame missing original data

Suppose a user has a dataset with missing values. Now they fit a model on that dataset with na.action = na.omit.

augment(fit)

will augment the complete cases of the data. Now, what happens when:

augment(fit, data = original_data)

The user expects they'll get the original data with NAs for relevant columns. We never test for this.

Additionally, matching residuals(model) to the dataframe original_data is likely to be complicated and lots of effort.

Some options:

Error when data is passed a dataset with missing values
Simply don't add influence measures / use the data argument. Only a couple of lm lookalikes actually add additional columns for data beyond those added for newdata
???
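
The row mismatch behind this issue can be reproduced with base R alone. The data frame below is made up for illustration, and the `na.exclude` refit at the end is just one base R behavior worth knowing about, not a proposed fix for augment().

```r
# Made-up data with missing values in both the response and the predictor.
original_data <- data.frame(
  y = c(1.2, 2.3, NA, 4.1, 5.0),
  x = c(1, 2, 3, NA, 5)
)

fit <- lm(y ~ x, data = original_data, na.action = na.omit)

nrow(original_data)      # 5 rows, two of them incomplete
nrow(model.frame(fit))   # 3: only the complete cases are kept
length(residuals(fit))   # 3: cannot be column-bound to original_data as-is

# With na.exclude, residuals() pads the dropped rows with NA.
fit_padded <- lm(y ~ x, data = original_data, na.action = na.exclude)
length(residuals(fit_padded))  # 5
```

A test for this case would need to decide which of these two shapes `augment(fit, data = original_data)` is expected to return.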

cc @dgrtwo

Final release checks

cc @simonpcouch

Passes R CMD check locally on my laptop
check_rhub()
check_win_*()

I'll tag you again once I get the results, at which point I'll also submit. Also, please feel free to add yourself as an author or contributor!

Add test that tidy respects conf.int/conf.level arguments

i.e. when conf.int = FALSE:

never get conf.low or conf.high columns

when conf.int = TRUE:

always get conf.low and conf.high columns

What should happen when conf.level = 0.9 is passed but conf.int still defaults to FALSE?

test that different confidence levels result in different confidence intervals
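
The expectations above could be exercised along these lines. To keep the example free of package dependencies, `tidy_sketch()` is a hypothetical stand-in for a real tidier, built on base R's `coef(summary())` and `confint()`; it is not broom's implementation.

```r
# Hypothetical tidier for lm objects, for illustration only.
tidy_sketch <- function(x, conf.int = FALSE, conf.level = 0.95, ...) {
  co <- coef(summary(x))
  out <- data.frame(
    term      = rownames(co),
    estimate  = unname(co[, "Estimate"]),
    std.error = unname(co[, "Std. Error"]),
    stringsAsFactors = FALSE
  )
  if (conf.int) {
    ci <- confint(x, level = conf.level)
    out$conf.low  <- unname(ci[, 1])
    out$conf.high <- unname(ci[, 2])
  }
  out
}

fit <- lm(mpg ~ wt, data = mtcars)

no_ci <- tidy_sketch(fit)
ci_95 <- tidy_sketch(fit, conf.int = TRUE)
ci_50 <- tidy_sketch(fit, conf.int = TRUE, conf.level = 0.5)

# conf.int = FALSE: no interval columns.
any(c("conf.low", "conf.high") %in% names(no_ci))
# conf.int = TRUE: both interval columns are present.
all(c("conf.low", "conf.high") %in% names(ci_95))
# A lower confidence level gives a narrower interval.
all(ci_50$conf.high - ci_50$conf.low < ci_95$conf.high - ci_95$conf.low)
```

In this sketch, passing `conf.level` while `conf.int` stays `FALSE` is silently ignored, which is one possible answer to the open question above.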

Add test that augment respects na.action

There's an old old version of this somewhere but I haven't looked into it yet.

Test the exported check_* functions

...who tests the testers?

installation trouble: "augment" not exported from modelgenerics?

I tried this:

devtools::install_github("alexpghayes/modelgenerics")
devtools::install_github("alexpghayes/modeltests")

First call works, but second fails with

Error : object ‘augment’ is not exported by 'namespace:modelgenerics'

Am I missing something obvious? (R version "(2018-07-26 r75007)", x86_64-pc-linux-gnu)

Add augment tests for when both data and newdata take on default values

i.e. somebody calls augment(model)

Release modeltests 0.1.5

Prepare for release:

Submit to CRAN:

usethis::use_version('patch')
devtools::submit_cran()
Approve email

Wait for CRAN...

Accepted 🎉
usethis::use_github_release()
usethis::use_dev_version(push = TRUE)

check_augment_no_data() advice is bad

Sometimes augment only accepts a data argument. The recommended error message, Must specify either "data" or "newdata" argument., should be changed.

check_tibble(method = "augment") column name tests

Checking augment output against the column names of the input data will result in test failures in cases like the following:

fit <- lm(mpg ~ log(hp), mtcars)
broom::augment(fit)
#> # A tibble: 32 x 10
#>    .rownames   mpg log.hp. .fitted .se.fit .resid   .hat .sigma .cooksd
#>  * <chr>     <dbl>   <dbl>   <dbl>   <dbl>  <dbl>  <dbl>  <dbl>   <dbl>
#>  1 Mazda RX4  21      4.70    22.0   0.614 -1.04  0.0360   3.29 2.01e-3
#>  2 Mazda RX…  21      4.70    22.0   0.614 -1.04  0.0360   3.29 2.01e-3
#>  3 Datsun 7…  22.8    4.53    23.9   0.715 -1.05  0.0487   3.29 2.83e-3
#>  4 Hornet 4…  21.4    4.70    22.0   0.614 -0.644 0.0360   3.29 7.64e-4
#>  5 Hornet S…  18.7    5.16    17.0   0.669  1.65  0.0427   3.28 6.07e-3
#>  6 Valiant    18.1    4.65    22.5   0.637 -4.44  0.0387   3.19 3.94e-2
#>  7 Duster 3…  14.3    5.50    13.4   0.950  0.876 0.0860   3.29 3.77e-3
#>  8 Merc 240D  24.4    4.13    28.2   1.09  -3.82  0.113    3.21 9.92e-2
#>  9 Merc 230   22.8    4.55    23.6   0.699 -0.822 0.0466   3.29 1.65e-3
#> 10 Merc 280   19.2    4.81    20.8   0.579 -1.64  0.0319   3.28 4.37e-3
#> # ... with 22 more rows, and 1 more variable: .std.resid <dbl>

Created on 2018-07-30 by the reprex package (v0.2.0).

Not precisely sure how we get to the final augmented column names, but it seems like make.names() gets called at some point.
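
The renaming seen in the output above is consistent with what base R's `make.names()` does to a term such as `log(hp)`. The snippet below only demonstrates that function; whether augment() calls it directly or reaches it indirectly is an assumption not confirmed here.

```r
# Characters that are invalid in a syntactic name are replaced by ".".
make.names("log(hp)")

# Applying the same transformation to the expected names would let a
# column-name check compare like with like.
expected <- c("mpg", "log(hp)")
make.names(expected)
```

The first call returns `"log.hp."`, which matches the `log.hp.` column in the augment() output.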

New Column Name Request

Following https://www.tidymodels.org/learn/develop/broom/#glossaries ,

I'd like to request the column name "construct.type" to be added. The cSEM package models both common factors (latent variables) and composite variables.

Unlike lavaan, whether a construct is a common factor or composite variable cannot be inferred by the operation (op) associated between the indicators and its construct in cSEM due to the nature of the statistical methods.

Export test based on ellipsis package?

Once https://github.com/hadley/ellipsis is more rugged, potentially export a test that checks all dots get evaluated.