TidyX
Hosts
Ellis Hughes and Patrick Ward.
Ellis has been working with R since 2015 and has a background working as a statistical programmer in support of both Statistical Genetics and HIV Vaccines. He also runs the Seattle UseR Group.
Patrick's current work centers on research and development in professional sport, with an emphasis on data analysis in American football. Previously, he was a sport scientist within the Nike Sports Research Lab. His research interests include training and competition analysis as they apply to athlete health, injury, and performance.
Description
The goal of TidyX is to explain how R code works. We are focusing on submissions to the #TidyTuesday Project to help promote the great work being done there.
In this repository, you will find copies of the code we've explained, and the code we wrote to show the concept on a new dataset.
To submit code for review, email us at [email protected]
To watch more episodes, go to our YouTube channel.
TidyX Episodes
-
Episode 1: Introduction and Treemaps!
- UseR Highlighted: Courtney Gerver
- Original Tweet
- Source Code
-
Episode 2: The Office, Sentiment, and Wine
- UseR Highlighted: Robin Sifre
- Original Tweet
- Source Code
-
Episode 3: TBI, Polar Plots and the NBA
- UseR Highlighted: Raniere Silva
- Original Tweet
- Source Code
-
Episode 4: A New Hope, {Patchwork} and Interactive Plots
- UseR Highlighted: Maggie Sogin
- Original Tweet
- Source Code
-
Episode 5: Tour de France and {gganimate}
- UseR Highlighted: Owen Churches
- Original Tweet
- Source Code
-
- UseR Highlighted: Priya Shukla
- Original Tweet
- Source Code
-
- UseR Highlighted: Danielle Barnas
- Original Tweet
- Source Code
-
Episode 8: Broadway Line Tracing
- UseR Highlighted: Jake Kaupp
- Original Tweet
- Source Code
-
Episode 9: Tables and Animal Crossing
- UseR Highlighted: Ted Laderas
- Original Tweet
- Source Code
-
Episode 10: Volcanoes and Plotly
- Ellis and Patrick explore this week's TidyTuesday dataset!
-
Episode 11: Time Series and Bayes
- UseR Highlighted: Eric Ekholm
- Original Tweet
- Source Code
-
Episode 12: Cocktails with Thomas Mock
- UseR Highlighted: Joshua de la Bruere
- Original Tweet
- Source Code
-
Episode 13: Marble Races and Bump Plots
- UseR Highlighted: Cédric Scherer
- Original Tweet
- Source Code
-
Episode 14: African American Achievements
- UseR Highlighted: Catriona Cunningham
- Original Tweet
- Source Code
-
Episode 15: Juneteenth and Census Tables
- Ellis and Patrick show US Census tables in a report, broken down into divisions, and highlight values using {colortable}.
- Source Code
-
Episode 16: Caribou Migrations and NBA Shots on Basket
- UseR Highlighted: Jihong Zhang
- Original Tweet
- [Source Code](https://github.com/thebioengineer/TidyX/blob/master/TidyTuesday_Explained/016-Caribou_Migrations_and_Spatial_Analysis/Jihong%20Zhang%20-%20Caribou%20Migration%20Map.Rmd)
-
Episode 17: Uncanny X-men and Feature Engineering
- UseR Highlighted: Rebecca Stevick
- Original Tweet
- Source Code
-
Episode 18: Coffee and Random Forest
- UseR Highlighted: Nyssa Silbiger
- Original Tweet
- Source Code
-
Episode 19: Astronauts and Dashboards
- UseR Highlighted: Lauren Pandori
- Original Tweet
- Source Code
-
Episode 20: Cocktails with David Robinson
- UseR Highlighted: David Robinson
- Original Tweet
- Source Code
-
- UseR Highlighted: Roman Link
- Original Tweet
- Source Code
-
Episode 22: European Energy and Ball Hogs
- UseR Highlighted: Kelly Cotton
- Original Tweet
- Source Code
-
Episode 23: Mailbag and Expected Wins
- Ellis and Patrick go into the mailbag and focus on a recent request about loops and functions.
- Source Code
-
Episode 24: Waffle plots and Shiny
- UseR Highlighted: Jared Braggins
- Original Tweet
- Source Code
-
- This is the start of a series of episodes covering more in-depth uses for {shiny}, an R package for creating web applications by Joe Cheng. In this episode we cover the basics of Shiny and explain the concept of reactive programming.
- Source Code
-
Episode 26: Labels and ShinyCARMELO - Part 1
- UseR Highlighted: Mr. Ochiwar
- Original Tweet
- Source Code
-
Episode 27: LIX and ShinyCARMELO - Part 2
- UseR Highlighted: Leon Jessen
- Original Tweet
- Source Code
-
Episode 28: Nearest Neighbors and ReactiveValues
- This week Ellis and Patrick explore how to perform career analysis and projections using the KNN algorithm. Using those concepts, we jump into part three of our Shiny demo series, where we have Shiny execute a KNN for our input players. We show how to create an action button to execute our code, and how to use reactiveValues to store the results and then plot them!
- Source Code
-
Episode 29: Palettes and Random Effects
- UseR Highlighted: Kaylea Haynes
- Original Tweet
- Source Code
-
- Patrick and Ellis were inspired by all the sentiment analysis performed for #TidyTuesday this week, so we decided to look at tweets and comment on additional things to be aware of when doing sentiment analysis. Using {rtweet}, we pull over 50,000 tweets that used the #Debate2020 hashtag and discuss how important context is to the analysis.
- Source Code
-
- This week's #TidyTuesday dataset was on NCAA Women's Basketball Tournament appearances. Patrick and Ellis have shown in the past how tables can be used for data visualization, and wanted to learn about another table package. {reactable} is a really cool looking package, so we spend some time showing how to use it, apply column definitions, and even apply HTML widgets within the table!
- Source Code
-
Episode 32: Shiny with Eric Nantz
-
This week's #TidyTuesday dataset was a super fun one. Ellis and Patrick are joined by Eric Nantz, who created a Shiny app to explore and animate the data. We talk through several new Shiny concepts, like using {golem}, {crosstalk}, and other Shiny packages like {bs4Dash}!
-
UseR Highlighted: Eric Nantz
-
-
Episode 33: Beer and State Maps
- UseR Highlighted: Richard Bamattre
- Original Tweet
- Tweet Source Code
- TidyX Source Code
-
- UseR Highlighted: Florence V. Dubois
- Original Tweet
- Tweet Source Code
- TidyX Source Code
-
- UseR Highlighted: Henry Wakefield
- Original Tweet
- Tweet Source Code
- TidyX Source Code
-
- This week's #TidyTuesday dataset was on mobile and landline subscriptions across the world. We saw lots of animated plots this week and wanted to add our own. Using {plotly}, we make an interactive plot that animates across time to show how GDP is related to the raw subscription numbers. We also do some exploration with line plots.
- Source Code
-
- Looking back at one's code can show you just how far you have come. Sparked by a conversation between Ben Baldwin (@benbaldwin), Patrick, and Ellis, this week's episode is on code review and refactoring. Ben went into his past and furnished a set of code for us to try to refactor. In the spirit of things, neither of us looked closely at the code ahead of time, and we recorded our initial reactions and our process of refactoring Ben's code into a function that could be applied to multiple datasets!
- UseR Highlighted: Ben Baldwin
- Original Tweet
- Tweet Source Code
- TidyX Source Code
-
- UseR Highlighted: Tobias Stalder
- Original Tweet
- Tweet Source Code
-
Episode 39: Imputing Missingness
- This week we reach into our mailbag to answer a request from Eric Fletcher (@iamericfletcher) on imputing NAs. In this video we scrape 2013 draft data and use various techniques to impute missing times for the three-cone event. We also attempt to discuss Missing Completely at Random (MCAR), Missing at Random (MAR), and Missing Not at Random (MNAR), but we decide at the end to leave it to the professionals.
- Source Code
-
Episode 40: Inspiring Women and Plotly
- UseR Highlighted: Jackie Torres
- Original Tweet
- Tweet Source Code
- TidyX Source Code
-
Episode 41: Worm Charts with Alice Sweeting
-
Alice Sweeting (@alicesweeting) joins us as a guest explainer this week! We are very excited to have her on as she explains how she worked through creating a worm chart of a Super Netball game! She talks with us about the common techniques she uses to process data, mixing base R with the tidyverse. Then we spend some time discussing Alice's background, current role, and advice for folks looking to get started in sports analytics or R programming in general.
-
UseR Highlighted: Alice Sweeting
-
-
Episode 42: Highlighting Lines
- UseR Highlighted: Peter
- Original Tweet
- Tweet Source Code
- TidyX Source Code
-
Episode 43: Funnel Plots, Plotly, and Hockey
- With no #TidyTuesday dataset this week, we decide to continue working through our learning of {plotly}, this time using a tool known as a funnel plot.
- Source Code
-
Episode 44: Transit Costs, steps, and Plotly Maps
- UseR Highlighted: Martin Devaux
- Original Tweet
- Blog Post
- TidyX Source Code
-
Episode 45: NHL Pythagorean Wins and Regression
- This week we reflect back on the past year and combine techniques from multiple episodes. We scrape multiple tables from the Hockey Reference website, use regular expressions to clean and organize the data, and use for loops to determine the optimal Pythagorean win exponent. We visualize the data using several different techniques, like scatter and lollipop charts. We show some fun tools for regularizing values for linear regressions and show how to predict and visualize the results.
- Source Code
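As a rough illustration of the for-loop search described in Episode 45, the sketch below tries a grid of candidate exponents and keeps the one with the lowest mean absolute error. The team totals, the grid range, and the choice of error metric are all assumptions made for this example and are not taken from the episode's code.

```r
# Hypothetical team totals (not real NHL data): goals for, goals against, win %
teams <- data.frame(
  gf      = c(250, 230, 210, 200, 190),
  ga      = c(200, 215, 210, 225, 240),
  win_pct = c(0.61, 0.53, 0.50, 0.44, 0.38)
)

# Candidate exponents to try (assumed search range)
exponents <- seq(1, 4, by = 0.01)
mae <- numeric(length(exponents))

# For each exponent, compute the Pythagorean expectation
# gf^k / (gf^k + ga^k) and its mean absolute error against the observed win %
for (i in seq_along(exponents)) {
  k <- exponents[i]
  expected <- teams$gf^k / (teams$gf^k + teams$ga^k)
  mae[i] <- mean(abs(expected - teams$win_pct))
}

# The exponent with the smallest error on this toy data
best_exponent <- exponents[which.min(mae)]
best_exponent
```

A grid search like this is easy to read and to plot (error against exponent), which is one reason it can be a friendlier starting point than a numerical optimizer.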
-
Episode 46: Circle Plots, NHL Salaries, and Logistic Regression
- UseR Highlighted: Natalie O'Shea
- Original Tweet
- Tweet Source Code
- TidyX Source Code
-
Episode 47: NHL Win Probabilities and GT Tables
- This week we play with a new technique for optimizing, the optim function! We scrape the 2019-2020 NHL season to generate power rankings for every NHL team and home-ice-edge. We can use this to then predict team winning probability! We then combine that with season summary data to generate a pretty GT table!
- Source Code
-
Episode 48: NBA Point Simulations
- In this episode we show how to scrape the current NBA season's scores and then build a simple game simulator. Using {purrr} with some base R functions, we generate outputs and show how to simulate thousands of games to generate outcome predictions.
- Source Code
-
Episode 49: MLB Batting Simulations
- We continue looking at simulations this week, but this time for individual players. Using {Lahman}, we pull the 2019 MLB player batting stats, and visualize the stats using histograms and density plots. Next, to generate confidence intervals around their batting averages we use rbinom() combined with techniques from the {tidyverse} to make simulation easy. Finally we visualize the data using {gt} combined with {sparklines}.
- Source Code
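The rbinom() idea from Episode 49 can be sketched in a few lines of base R. The player below is hypothetical (150 hits in 500 at-bats, not pulled from {Lahman}), and taking the middle 95% of simulated averages is one simple way to form an interval, assumed here for illustration rather than copied from the episode.

```r
set.seed(49)  # make the simulation reproducible

# Hypothetical player: 150 hits in 500 at-bats (not taken from {Lahman})
hits    <- 150
at_bats <- 500
avg     <- hits / at_bats

# Simulate 10,000 seasons of 500 at-bats, treating the observed
# average as the probability of a hit in each at-bat
sim_hits <- rbinom(n = 10000, size = at_bats, prob = avg)
sim_avg  <- sim_hits / at_bats

# The middle 95% of the simulated averages gives an approximate interval
ci <- quantile(sim_avg, probs = c(0.025, 0.975))
ci
```

Players with fewer at-bats get wider intervals under this approach, because `size` directly controls how much the simulated averages vary.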
-
Episode 50: MLB Batting Simulations
- Another MLB batting episode. This time we use the James-Stein estimator (paper below) to apply a shrinkage estimate to player batting averages to get a "true" estimate, removing luck. Using {Lahman}, we pull the 2018 MLB player batting stats and explain how to implement the estimator. Next, we compare the estimates against the 2019 season. Finally, we visualize the data using {gt}, with header spans and cell styling. For the grand finale, we combine this gt table of batting averages with plots using {patchwork}!
- Source Code
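To make the shrinkage idea in Episode 50 concrete, here is a minimal sketch of one common form of the James-Stein estimator, which pulls each average toward the grand mean. The batting averages are made up, every player is assumed to have the same number of at-bats, and the binomial variance approximation and positive-part clamp are choices made for this example; the episode's own implementation may differ.

```r
# Hypothetical early-season batting averages, each over the same number of at-bats
at_bats <- 45
avg <- c(0.400, 0.378, 0.356, 0.333, 0.311, 0.289, 0.267, 0.244, 0.222, 0.200)

k          <- length(avg)
grand_mean <- mean(avg)

# Assumed sampling variance of a single average (binomial approximation)
sigma2 <- grand_mean * (1 - grand_mean) / at_bats

# Shrinkage factor; the (k - 3) form is used when shrinking toward the grand mean
shrink <- 1 - (k - 3) * sigma2 / sum((avg - grand_mean)^2)
shrink <- max(shrink, 0)  # positive-part variant: never shrink past the mean

# Each estimate moves toward the grand mean by the same proportion
js_avg <- grand_mean + shrink * (avg - grand_mean)
round(js_avg, 3)
```

The noisier the individual averages are relative to the spread between players, the smaller `shrink` becomes and the closer every estimate sits to the grand mean.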
-
Episode 51: Deploying Models with Shiny
- Sharing the results of a modeling effort is an important skill for any data scientist. However, just sharing the weights of each predictor is often not enough to get buy-in from stakeholders who are understandably skeptical of your results. Using the power of Shiny, you can show your stakeholders exactly how your model interprets the inputs and then predicts the results. In this episode, we use the {palmerpenguins} package with {randomForest} to build a model that predicts the species of a new penguin. With Shiny, we then deploy our model so users can enter a new penguin's attributes and see whether the model thinks it is an Adelie, Chinstrap, or Gentoo! The output is a boxplot indicating the model's probability for each species given the inputs.
- Source Code
-
Episode 52: Too Many Gentoo with Xaringan
- There are too many Gentoo, your PI proclaims. In this week's episode, Patrick and Ellis talk about how to use the {xaringan} package to produce reproducible HTML presentations using R Markdown syntax. We discuss how we looked at "raw" tech data and used summary statistics to compare against the gold standard {palmerpenguins} package from Dr. Allison Horst and Dr. Alison Hill, with data from Dr. Kristen Gorman. We use last week's highly powerful machine learning model to generate predictions of species, and generate a confusion matrix of our data vs. the predictions. Finally, we talk about the value of basing your presentation on Rmd and being able to update it at the click of a button.
- Source Code