
modeltests's Introduction

Hi, I'm Alex 👋

I'm a PhD candidate in the University of Wisconsin-Madison statistics program. My GitHub is a mixture of research code, #rstats ✨ contributions, and personal data analysis projects. I write long-form explainers on my blog, https://www.alexpghayes.com/.

Research software

  • fastadi performs self-tuning matrix completion via adaptive thresholding, often outperforming softImpute. See the paper for algorithmic and theoretical details. I have also extended this algorithm to work with matrices where the entire upper triangle is observed as part of some work on citation networks.

  • aPPR helps you calculate approximate personalized pageranks from large graphs, including those that can only be queried via an API. aPPR additionally performs degree correction and regularization, allowing users to recover blocks from stochastic blockmodels. Read the paper.

  • vsp performs semi-parametric estimation of latent factors in random-dot product graphs by computing varimax rotations of the spectral embeddings of graphs. The resulting factors are sparse and interpretable. Read the paper.

  • fastRG samples random-dot product graphs much faster than naive sampling procedures and is especially useful when running simulation studies. See the paper for a description of the fastRG core algorithm.

#rstats

I am involved in a number of open source projects in the tidyverse and tidymodels orbits. I previously maintained the broom package, which currently has ~6 million downloads, and for my contributions am an author on the tidyverse paper. I intermittently participate in the Stan and ROpenSci communities as well.

Teaching materials

Other projects

Please get in touch if...

  • you'd like to hire me for a research or data science for social good internship,
  • you want to discuss design of statistical modeling software,
  • you want to collaborate on a research project, or
  • you want to write an explainer together.

Outside of R, I'm a proficient Python user, and can pull together enough SQL, C++, and Julia to get things done.

I am responsive via email.

Last updated 2023-10-20.


modeltests's Issues

Add tests for input validation

tidy()

  • conf.int should be logical
  • conf.level should be greater than zero and less than 1
  • quick should be logical
  • etc

glance()

  • All arguments should be logical (I think? Pretty sure the only arguments are TRUE/FALSE switches that add more columns to the output)

augment()

  • data / newdata must be coercible to tibbles
  • type.predict and type.residuals should match.arg on the function they get passed to

All

  • All named arguments should get evaluated (see #6)
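The checks listed above could be prototyped along these lines. This is only a sketch in base R: validate_tidy_args() and errors() are hypothetical helpers made up for illustration, not functions from modeltests or broom, and the error wording is an assumption.

```r
# Hypothetical validator for the tidy() arguments discussed above.
# Not part of modeltests or broom.
validate_tidy_args <- function(conf.int = FALSE, conf.level = 0.95) {
  is_flag <- function(x) is.logical(x) && length(x) == 1L && !is.na(x)

  if (!is_flag(conf.int)) {
    stop("`conf.int` must be a single TRUE or FALSE.", call. = FALSE)
  }
  if (!is.numeric(conf.level) || length(conf.level) != 1L ||
      is.na(conf.level) || conf.level <= 0 || conf.level >= 1) {
    stop("`conf.level` must be a single number strictly between 0 and 1.",
         call. = FALSE)
  }
  invisible(TRUE)
}

# Hypothetical helper: TRUE if evaluating `expr` signals an error.
errors <- function(expr) inherits(try(expr, silent = TRUE), "try-error")

# Valid input passes; invalid input is rejected.
stopifnot(
  !errors(validate_tidy_args(TRUE, 0.9)),
  errors(validate_tidy_args("yes")),
  errors(validate_tidy_args(TRUE, 1)),
  errors(validate_tidy_args(TRUE, 0))
)
```

A real test would call the tidier under test with bad input and expect an error, rather than calling a standalone validator.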

check_arguments default values

In check_arguments():

  • Check arguments other than x and ... have default arguments (NULL if there is no meaningful option)
  • conf.int argument defaults to FALSE
  • conf.level argument defaults to 0.95
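One possible way to inspect defaults is through formals(). The sketch below is an assumption about how such a check might look, not the actual implementation of check_arguments(); missing_defaults(), good_tidier(), and bad_tidier() are hypothetical names.

```r
# Hypothetical helper: names of arguments (other than x and ...) that
# have no default value. The real check_arguments() may work differently.
missing_defaults <- function(f, exempt = c("x", "...")) {
  args <- as.list(formals(f))
  args <- args[setdiff(names(args), exempt)]
  # An argument without a default is stored as the empty symbol.
  no_default <- vapply(
    args,
    function(a) is.symbol(a) && !nzchar(as.character(a)),
    logical(1)
  )
  names(args)[no_default]
}

# Toy tidier signatures used only for illustration.
good_tidier <- function(x, conf.int = FALSE, conf.level = 0.95, ...) NULL
bad_tidier  <- function(x, conf.int, conf.level = 0.95, ...) NULL

stopifnot(
  length(missing_defaults(good_tidier)) == 0,
  identical(missing_defaults(bad_tidier), "conf.int"),
  # The specific defaults requested in this issue.
  identical(formals(good_tidier)$conf.int, FALSE),
  identical(formals(good_tidier)$conf.level, 0.95)
)
```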

Release modeltests 0.1.6

Prepare for release:

  • git pull
  • Check current CRAN check results
  • Polish NEWS
  • urlchecker::url_check()
  • devtools::build_readme()
  • devtools::check(remote = TRUE, manual = TRUE)
  • devtools::check_win_devel()
  • revdepcheck::revdep_check(num_workers = 4)
  • Update cran-comments.md
  • git push

Submit to CRAN:

  • usethis::use_version('patch')
  • devtools::submit_cran()
  • Approve email

Wait for CRAN...

  • Accepted 🎉
  • usethis::use_github_release()
  • usethis::use_dev_version(push = TRUE)

Export helper functions for tidier methods

An idea that we tossed around at the tidymodels meeting last week and thought this might be a good solution for…

One major change for broom after 0.7.0 will be the effort to encourage maintainers to implement new tidier methods in the model-owning package rather than broom. This article was written to hopefully ease that process.

At the same time, we’ve deprecated some helper functions from broom and are writing more that (I believe) we don’t plan to export.

It would be great if we could also supply some of these helper functions to people implementing tidier methods for packages external to broom. Not only does it make implementation easier for others, but hopefully prevents people from simply re-exporting lm or glm tidiers for new model classes. There are a few different ways this could come together.

  1. Export from broom: similar issues as the existing helper functions—more maintainability burden and broom is a heavy dependency to ask package maintainers to take on.
  2. Export from a new package: a package seems overboard for this purpose.
  3. Export from modeltests: this would mean, for many maintainers, moving modeltests from Suggests to Imports, which might be undesirable for some. Even with these functions added, though, modeltests should still be a relatively light dependency.

An idea for modeltests, then:

  • Write out the dplyr dependency from modeltests (doesn’t seem like there’s any grouped stuff in the package, so this should be relatively straightforward.)
  • Add (revised) broom helpers from utilities.R to modeltests. This would probably exclude augment_columns() and augment_newdata() for now until that rewrite is wrapped up.
  • Add an article to tidymodels.org / extend the current one to show how each of the helpers can be used in context.

Thoughts?

cc @topepo

New augment test for model.frame missing original data

Suppose a user has a dataset with missing values. Now they fit a model on that dataset with na.action = na.omit.

augment(fit)

will augment the complete cases of the data. Now, what happens when:

augment(fit, data = original_data)

The user expects they'll get the original data with NAs for relevant columns. We never test for this.

Additionally, matching residuals(model) to the dataframe original_data is likely to be complicated and lots of effort.

Some options:

  • Error when data is passed a dataset with missing values
  • Simply don't add influence measures / use the data argument. Only a couple of lm lookalikes actually add columns for data beyond those added for newdata
  • ???
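To make the mismatch concrete, here is a minimal base R sketch for lm only. The toy data frame is made up, and the padding approach assumes that na.action(fit) records positions in the original data (which holds when no subset argument is used). Other model classes may not store this information, which is part of why a general solution is hard.

```r
# Toy data with one missing predictor value (made up for illustration).
df <- data.frame(y = c(1.0, 2.1, 2.9, 4.2, 5.1), x = c(1, 2, NA, 4, 5))
fit <- lm(y ~ x, data = df, na.action = na.omit)

# The residuals cover only the complete cases, so they no longer line up
# with the rows of the original data.
stopifnot(length(residuals(fit)) == 4, nrow(df) == 5)

# Pad the residuals back out to the original rows using the recorded
# na.action indices (empty when nothing was dropped).
dropped <- as.integer(na.action(fit))
keep <- setdiff(seq_len(nrow(df)), dropped)
resid_full <- rep(NA_real_, nrow(df))
resid_full[keep] <- residuals(fit)
augmented <- cbind(df, .resid = resid_full)

# For lm, na.exclude produces the same padded residuals directly.
fit_excl <- lm(y ~ x, data = df, na.action = na.exclude)
stopifnot(isTRUE(all.equal(unname(residuals(fit_excl)), resid_full)))
```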

cc @dgrtwo

Final release checks

cc @simonpcouch

  • Passes R CMD check locally on my laptop
  • check_rhub()
  • check_win_*()

I'll tag you again once I get the results, at which point I'll also submit. Also, please feel free to add yourself as an author or contributor!

Add test that tidy respects conf.int/conf.level arguments

i.e. when conf.int = FALSE:

  • never get conf.low or conf.high columns

when conf.int = TRUE:

  • always get conf.low and conf.high columns

What should happen when conf.level = 0.9 is passed but conf.int still defaults to FALSE?

  • test that different confidence levels result in different confidence intervals
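The tests described above might look roughly like the following. To keep the sketch self-contained it uses tidy_sketch(), a hypothetical stand-in tidier written in base R, rather than a real broom method. A real test would apply the same assertions to the tidier being checked.

```r
# Hypothetical stand-in for a tidy() method, for illustration only.
tidy_sketch <- function(fit, conf.int = FALSE, conf.level = 0.95) {
  co <- coef(summary(fit))
  out <- data.frame(
    term = rownames(co),
    estimate = unname(co[, "Estimate"]),
    std.error = unname(co[, "Std. Error"]),
    stringsAsFactors = FALSE
  )
  if (conf.int) {
    ci <- confint(fit, level = conf.level)
    out$conf.low <- unname(ci[, 1])
    out$conf.high <- unname(ci[, 2])
  }
  out
}

fit <- lm(mpg ~ wt, data = mtcars)
ci_cols <- c("conf.low", "conf.high")

no_ci <- tidy_sketch(fit)
ci_95 <- tidy_sketch(fit, conf.int = TRUE)
ci_90 <- tidy_sketch(fit, conf.int = TRUE, conf.level = 0.90)

stopifnot(
  # conf.int = FALSE: never get interval columns.
  !any(ci_cols %in% names(no_ci)),
  # conf.int = TRUE: always get interval columns.
  all(ci_cols %in% names(ci_95)),
  # A lower confidence level gives strictly narrower intervals.
  all(ci_90$conf.low > ci_95$conf.low),
  all(ci_90$conf.high < ci_95$conf.high)
)
```

In this sketch, passing conf.level while conf.int stays FALSE is silently ignored. That is one possible answer to the open question above, not a recommendation.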

installation trouble: "augment" not exported from modelgenerics?

I tried this:

devtools::install_github("alexpghayes/modelgenerics")
devtools::install_github("alexpghayes/modeltests")

First call works, but second fails with

Error : object ‘augment’ is not exported by 'namespace:modelgenerics'

Am I missing something obvious? (R version "(2018-07-26 r75007)", x86_64-pc-linux-gnu)

Release modeltests 0.1.5

Prepare for release:

  • git pull
  • Check current CRAN check results
  • Polish NEWS
  • urlchecker::url_check()
  • devtools::build_readme()
  • devtools::check(remote = TRUE, manual = TRUE)
  • devtools::check_win_devel()
  • revdepcheck::revdep_check(num_workers = 4)
  • Update cran-comments.md
  • git push

Submit to CRAN:

  • usethis::use_version('patch')
  • devtools::submit_cran()
  • Approve email

Wait for CRAN...

  • Accepted 🎉
  • usethis::use_github_release()
  • usethis::use_dev_version(push = TRUE)

check_augment_no_data() advice is bad

Sometimes augment only accepts a data argument. The recommended error message (Must specify either "data" or "newdata" argument.) should be changed.

check_tibble(method = "augment") column name tests

Checking augment output against the column names of the input data will result in test failures in cases like the following:

fit <- lm(mpg ~ log(hp), mtcars)
broom::augment(fit)
#> # A tibble: 32 x 10
#>    .rownames   mpg log.hp. .fitted .se.fit .resid   .hat .sigma .cooksd
#>  * <chr>     <dbl>   <dbl>   <dbl>   <dbl>  <dbl>  <dbl>  <dbl>   <dbl>
#>  1 Mazda RX4  21      4.70    22.0   0.614 -1.04  0.0360   3.29 2.01e-3
#>  2 Mazda RX…  21      4.70    22.0   0.614 -1.04  0.0360   3.29 2.01e-3
#>  3 Datsun 7…  22.8    4.53    23.9   0.715 -1.05  0.0487   3.29 2.83e-3
#>  4 Hornet 4…  21.4    4.70    22.0   0.614 -0.644 0.0360   3.29 7.64e-4
#>  5 Hornet S…  18.7    5.16    17.0   0.669  1.65  0.0427   3.28 6.07e-3
#>  6 Valiant    18.1    4.65    22.5   0.637 -4.44  0.0387   3.19 3.94e-2
#>  7 Duster 3…  14.3    5.50    13.4   0.950  0.876 0.0860   3.29 3.77e-3
#>  8 Merc 240D  24.4    4.13    28.2   1.09  -3.82  0.113    3.21 9.92e-2
#>  9 Merc 230   22.8    4.55    23.6   0.699 -0.822 0.0466   3.29 1.65e-3
#> 10 Merc 280   19.2    4.81    20.8   0.579 -1.64  0.0319   3.28 4.37e-3
#> # ... with 22 more rows, and 1 more variable: .std.resid <dbl>

Created on 2018-07-30 by the reprex package (v0.2.0).

Not precisely sure how we get to the final augmented column names, but it seems like make.names() gets called at some point.
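The renaming is consistent with what make.names() does to a non-syntactic name. The base R sketch below shows only that transformation. It does not establish where in the augment code path the call actually happens.

```r
fit <- lm(mpg ~ log(hp), data = mtcars)

# The model frame keeps the term as written in the formula.
mf_names <- names(model.frame(fit))

# make.names() replaces characters that are invalid in syntactic names
# with dots, which matches the column name seen in the augment output.
fixed <- make.names(mf_names)

stopifnot(
  identical(mf_names, c("mpg", "log(hp)")),
  identical(fixed, c("mpg", "log.hp."))
)
```

A column name check that compares output names against make.names(names(data)), or that skips transformed terms, would avoid the false failure described above. Which is preferable is a design choice for the package.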

New Column Name Request

Following https://www.tidymodels.org/learn/develop/broom/#glossaries ,

I'd like to request the column name "construct.type" to be added. The cSEM package models both common factors (latent variables) and composite variables.

Unlike lavaan, whether a construct is a common factor or composite variable cannot be inferred by the operation (op) associated between the indicators and its construct in cSEM due to the nature of the statistical methods.
