thebioengineer / tidyx


tidyx's Issues

customising a full report

I need this myself, and I think it would be a good idea to show how to customise an R Markdown report to follow branding guidelines. This would cover the colour palette, setting headers and footers, section pictures, referencing charts and tables, and so on. I have been researching the topic and still haven't figured it all out.

tidymodels versions

I was inspired by your pitch f/x videos to make several tidymodels versions. If interested, you can take a look here:
https://github.com/datadavidz/pitch_fx

moneyball_part2 is the KNN/UMAP
moneyball_part3 is the Decision Tree/Random Forest
moneyball_part4 is the xgboost
moneyball_part5 is the Naive Bayes

Disclaimer: I am a novice at tidymodels.

I found the results similar/same as what you showed using caret or the native packages. I like the consistency in the tidymodels framework when switching across different models.

-Dave

data cleaning

Data cleaning is super important. We should do a series on data cleaning for some messy data set.

Question on Episode 124

I probably didn't understand the problem completely, but after watching Episode 124 I was curious: why not just make a two-column tibble, then group_by fruit and summarize with toString?

library(tidyverse)
#> Warning: package 'ggplot2' was built under R version 4.2.1
#> Warning: package 'tibble' was built under R version 4.2.1
#> Warning: package 'tidyr' was built under R version 4.2.1
#> Warning: package 'dplyr' was built under R version 4.2.1

lst1 <- list(winter = c("apple", "orange", "banana"), 
            summer = c("pear", "kiwi", "grape", "apple"),
            spring = c("cherry", "apple", "grape", "banana"))

lst1 |> 
 enframe(name = "season", value = "fruit") |>
 unnest(fruit) |>
 group_by(fruit) |>
 summarize(season = toString(season))
#> # A tibble: 7 × 2
#>   fruit  season                
#>   <chr>  <chr>                 
#> 1 apple  winter, summer, spring
#> 2 banana winter, spring        
#> 3 cherry spring                
#> 4 grape  summer, spring        
#> 5 kiwi   summer                
#> 6 orange winter                
#> 7 pear   summer

Created on 2022-11-15 by the reprex package (v2.0.0)

Databases

Using databases, accessing, writing, joins, etc.

Feature engineering

A cool segment would be to go over feature engineering approaches and variable selection.

Multi User DB editing

Hi Patrick and Ellis,

Loving the series on using databases. If possible, could you cover a situation where multiple people are simultaneously editing data in the same database?

Using the example from episode 75: two different coaches, in different shiny sessions, happen to be going through and providing comments on the same game at the same time. How would the following situation resolve?

(events happen in order)

Both coaches (on separate machines) open up the shiny app.
Both coaches are presented with pbp DT in the app that has no comments.
Coach 1 adds a comment to play 1 and commits it to the db.
Coach 2 then also wants to give a comment to play 1. Keeping in mind, when this coach pulled the data from the DB, there was no comment for play 1.
Coach 2 then writes their comment, and presses commit.

In the above situation, I think your code as of episode 75 would mean that any of Coach 1's comments made between Coach 2 pulling the data from the database and Coach 2 pressing their commit button get overwritten by Coach 2's commit.

There are a few different solutions that could work for this specific situation: give each coach a specific column, store comments made by different coaches in different tables, etc.

I'm interested to see if there's a reasonable way, similar to episode 73, to do some sort of reactive polling so that interacting with the database feels similar to multiple people editing a Google Sheet at once. I understand it isn't a realistic use case to see what other users are typing character by character, but it would help to at least see whether the data in your session of the app is "out of date", meaning someone else has edited, commented on, or added to that data.
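
One common way to detect the overwrite described above is optimistic locking: each row carries a version counter, and a commit only succeeds if the version is still the one the session originally read. The sketch below is not the code from episode 75; the table `pbp`, its columns, and the helper `commit_comment()` are hypothetical names, and an in-memory SQLite database (via DBI and RSQLite) stands in for the real one.

```r
library(DBI)

# In-memory SQLite database standing in for the shared database (assumption).
con <- dbConnect(RSQLite::SQLite(), ":memory:")
dbExecute(con, "CREATE TABLE pbp (
  play_id INTEGER PRIMARY KEY,
  comment TEXT,
  version INTEGER NOT NULL
)")
dbExecute(con, "INSERT INTO pbp (play_id, comment, version) VALUES (1, NULL, 0)")

# Both coaches pull the data while play 1 has no comment (version 0).
coach1_seen <- dbGetQuery(con, "SELECT play_id, version FROM pbp WHERE play_id = 1")
coach2_seen <- dbGetQuery(con, "SELECT play_id, version FROM pbp WHERE play_id = 1")

# Hypothetical helper: write the comment only if nobody has changed the row
# since this session read it. Returns TRUE if the commit was applied.
commit_comment <- function(con, play_id, comment, seen_version) {
  rows_changed <- dbExecute(
    con,
    "UPDATE pbp SET comment = ?, version = version + 1
     WHERE play_id = ? AND version = ?",
    params = list(comment, play_id, seen_version)
  )
  rows_changed == 1
}

# Coach 1 commits first; Coach 2's later commit is based on stale data.
ok1 <- commit_comment(con, 1, "Good spacing", coach1_seen$version)
ok2 <- commit_comment(con, 1, "Late rotation", coach2_seen$version)

final_comment <- dbGetQuery(con, "SELECT comment FROM pbp WHERE play_id = 1")$comment
dbDisconnect(con)
```

Here Coach 2's commit is rejected rather than silently overwriting Coach 1's comment, so the app could respond to a `FALSE` result by re-reading the row and telling the user their data is out of date. Polling the `version` column on a timer would be one way to show that warning before the user presses commit.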

Thanks again for all your time and effort with the TidyX series!

REF: Episode 132 - How to include AND, OR, NOT ops in the [SEARCH] field of the Output Shiny Form?

Hi Ellis & Pat,

Episode 132: EXCELLENT!

How can AND, OR, and NOT operators be included in a query in the [SEARCH] field (shown at the top of the Output Results Shiny Form)?

I saw that a space between two words in [SEARCH] acts as an AND operator, e.g. [ fran big ]. Great!

But what about a more complex SEARCH query, e.g.:
[ bu OR han ] ## search and show only rows with "bulls" OR with "Hanson"?

SFd99
latest R, Rstudio + Ubuntu Linux 20.04
San Francisco
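
As a rough illustration of the AND/OR/NOT behaviour asked about above, the base R sketch below filters a data frame with a small query grammar: a space means AND, the word `OR` separates alternatives, and `NOT` negates the term that follows it. The helper `search_rows()` and the `teams` data are hypothetical and are not taken from Episode 132; how the app's own search box is implemented is not confirmed here.

```r
# Hypothetical helper: keep rows whose text matches the query.
# Grammar assumed for this sketch: "a b" = a AND b, "a OR b" = either,
# "NOT a" = rows that do not contain a. Matching is case-insensitive.
search_rows <- function(df, query) {
  # Collapse each row into one lower-case string so all columns are searched.
  row_text <- tolower(do.call(paste, c(df, sep = " ")))

  match_group <- function(group) {
    tokens <- strsplit(trimws(group), "\\s+")[[1]]
    keep <- rep(TRUE, length(row_text))
    negate <- FALSE
    for (tok in tokens) {
      if (tok == "NOT") {
        negate <- TRUE
        next
      }
      hit <- grepl(tolower(tok), row_text, fixed = TRUE)
      keep <- keep & (if (negate) !hit else hit)
      negate <- FALSE
    }
    keep
  }

  # Split into OR-separated groups, then combine the groups with logical OR.
  groups <- strsplit(query, "\\s+OR\\s+")[[1]]
  keep <- Reduce(`|`, lapply(groups, match_group))
  df[keep, , drop = FALSE]
}

# Made-up example data for illustration only.
teams <- data.frame(
  team  = c("Bulls", "Lakers", "Celtics"),
  coach = c("Smith", "Hanson", "Jones"),
  stringsAsFactors = FALSE
)

either <- search_rows(teams, "bu OR han")    # rows mentioning "bu" or "han"
neither <- search_rows(teams, "bu NOT smith") # "bu" but without "smith"
```

If the results table is rendered with the DT package, setting `options = list(search = list(regex = TRUE))` in `datatable()` may be a lighter route, since a regular expression such as `bu|han` then acts as an OR in the built-in search box.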

Models of p(hit) given throw type

like in episode 45

Real messy data

publishedweek2420212.xlsx

This is a classic example of what the UK Government statistics organisation, the Office for National Statistics (ONS), produces: great data, but in a very "untidy" spreadsheet. For example, look at the "weekly figures 2021" sheet. The first column holds both age groups (combined and by gender) and regions. The other column headings are dates.

line tracing

Shiny data dictionary creator

An idea you might want to tackle sometime is a Shiny app that generates a data dictionary (name, description, data type) from a data file uploaded by the user. Maybe something like the data dictionaries that come with TidyTuesday datasets? As an option, a user would also be able to add a description to the variables in the dictionary, or, if labels were assigned, these would serve as the description field. For factor variables, the different values could also be shown. Such a tool would be very useful for non-data-scientist researchers to document their data in order to facilitate data re-use via open data repositories. Keep up the good work!
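
The core of such an app could be a single function that turns a data frame into a dictionary table, which a Shiny app would then display and let the user edit. The base R sketch below is one possible shape for it: `make_data_dictionary()` and the `players` data are hypothetical, and it assumes variable labels, when present, are stored in a `"label"` attribute on the column.

```r
# Hypothetical helper: one dictionary row per column of the data frame.
make_data_dictionary <- function(df) {
  data.frame(
    name = names(df),
    # First class of each column, e.g. "numeric", "character", "factor".
    type = vapply(df, function(x) class(x)[1], character(1)),
    # Use the "label" attribute as the description when one is set.
    description = vapply(df, function(x) {
      lbl <- attr(x, "label", exact = TRUE)
      if (is.null(lbl)) "" else as.character(lbl)[1]
    }, character(1)),
    # For factors, list the possible values.
    values = vapply(df, function(x) {
      if (is.factor(x)) paste(levels(x), collapse = ", ") else ""
    }, character(1)),
    row.names = NULL,
    stringsAsFactors = FALSE
  )
}

# Made-up example data for illustration only.
players <- data.frame(
  name = c("A", "B"),
  position = factor(c("guard", "center")),
  height_cm = c(190, 211),
  stringsAsFactors = FALSE
)
attr(players$height_cm, "label") <- "Height in centimetres"

dict <- make_data_dictionary(players)
```

In a Shiny app, the uploaded file would be read into a data frame, passed through a function like this, and the empty `description` cells left for the user to fill in before export.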

produce excel report with formatted fill colors

Hi Ellis and Patrick! I have a question that might be interesting for a future episode :)

It relates to Excel files that we produce on a weekly basis, where cells that have a different value from the previous week are highlighted with Excel's fill format. I have been going through the openxlsx documentation but can't find anything that addresses this. Perhaps you can help :)

Cheers and thanks!
Adrian
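
One possible approach with openxlsx is to work out which cells changed in R and then apply a fill style to exactly those cells with `addStyle()`. The sketch below uses made-up `last_week` and `this_week` data frames and assumes both have the same rows and columns in the same order, with no missing values; real reports would need to be aligned (for example by a key column) first.

```r
library(openxlsx)

# Made-up weekly data for illustration only.
last_week <- data.frame(
  player = c("A", "B", "C"), minutes = c(30, 25, 18), points = c(12, 8, 4),
  stringsAsFactors = FALSE
)
this_week <- data.frame(
  player = c("A", "B", "C"), minutes = c(30, 28, 18), points = c(15, 8, 4),
  stringsAsFactors = FALSE
)

# Compare column by column, then get the row/column index of each changed cell.
is_changed <- mapply(function(new, old) new != old, this_week, last_week)
changed <- which(is_changed, arr.ind = TRUE)

wb <- createWorkbook()
addWorksheet(wb, "report")
writeData(wb, "report", this_week)

# Yellow fill for changed cells; +1 on rows because row 1 holds the header.
highlight <- createStyle(fgFill = "#FFFF00")
addStyle(
  wb, "report", highlight,
  rows = changed[, "row"] + 1,
  cols = changed[, "col"],
  gridExpand = FALSE  # treat rows/cols as cell pairs, not a grid
)

out_file <- file.path(tempdir(), "weekly_report.xlsx")
saveWorkbook(wb, out_file, overwrite = TRUE)
```

With this toy data, the cells for B's minutes and A's points are filled. openxlsx also offers `conditionalFormatting()`, but that compares values inside the workbook, so computing the differences in R first tends to be simpler when last week's data lives in a separate file.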
