openintrostat / oilabs

🛑 This package has been deprecated; its datasets and functionality have been moved to the openintro package.

Home Page: https://github.com/OpenIntroStat/openintro

License: Creative Commons Zero v1.0 Universal

R 100.00%

oilabs's People

beanumber ameliamn mine-cetinkaya-rundel davidaarmstrong ismayc drbaguiar reedies merico34 kralljr andrewpbray mason-datamaterials tanmayg3 astronomerforfun

oilabs's Issues

add ames data from confidence intervals lab to package

Sampling dist plot for the inference function

Hey @mine-cetinkaya-rundel, when I run the following code,

library(oilabs)
library(dplyr)
us12 <- 
  atheism %>% 
  filter(nationality == "United States" & year == "2012")
inference(y = us12$response, est = "proportion", type = "ci", 
          method = "theoretical", success = "atheist")

I get a plot of the distribution of the sample on the left side of what looks to be a 1 by 2 plot. Is the approximated normal sampling distribution supposed to be plotted on the right side?

Add data and documentation for intro-to-r

I'm working on it.

lazyData = FALSE?

Is there a reason why we have lazyData = FALSE in the DESCRIPTION? Hadley advises against this:

I recommend that you always include LazyData: true in your DESCRIPTION. devtools::create() does this for you.

devtools::use_readme_rmd()?

Or similar to provide installation instructions and example use cases.

plot_ss() and locator() function issue on RStudio Server

The plot_ss() function in the simple linear regression lab is not placing the two points that define the LS line exactly where I click, but only on RStudio Server and not Desktop. This has to do with the locator() function. Oddly however, this bug only occurs part of the time for me. Here's a MWE:

# Test plot_ss
library(mosaic)
library(oilabs)
data(mlb11)
plot_ss(x = at_bats, y = runs, data = mlb11)
# Trying to locate (4,4)
plot(1:10, 1:10)
locator(1)

Are any other users of RStudio Server getting this bug?

calc_streak() doesn't actually take a data.frame

The Help for calc_streak() says:

x A data frame or character vector of hits ("H") and misses ("M").

But it won't actually work with a data frame argument.

kobe_basket %>% select(shot) %>% calc_streak()
Error in calc_streak(.) : 
  Input should only contain hits ("H") and misses ("M")

Could it?
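
One way it could, as a minimal sketch: unwrap a one-column data frame before the hit/miss check runs. The helper name as_shot_vector() is hypothetical and not part of oilabs, and the toy data frame only stands in for kobe_basket %>% select(shot).

```r
# Hypothetical helper (not part of oilabs): accept either a character vector
# or a one-column data frame and return a validated character vector.
as_shot_vector <- function(x) {
  if (is.data.frame(x)) {
    if (ncol(x) != 1) {
      stop("Data frame input must have exactly one column")
    }
    x <- x[[1]]  # works for both data.frame and tibble
  }
  x <- as.character(x)
  if (!all(x %in% c("H", "M"))) {
    stop('Input should only contain hits ("H") and misses ("M")')
  }
  x
}

# Toy data standing in for a one-column selection of shots.
shots <- data.frame(shot = c("H", "M", "M", "H"), stringsAsFactors = FALSE)
as_shot_vector(shots)
```

calc_streak() could call a helper like this on its argument first, so the documented data frame input and the existing vector input would share one validation path.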

Data in oilabs, openintro -- open to some renaming/recoding?

I don't know how married you are to the current names and coding scheme, but I would love to see some changes:

Don't use 0/1 coding for things like male/female, or at least have an additional variable that doesn't use an arbitrary assignment like this. (Perhaps using male or female as the 0/1 -- or better TRUE/FALSE -- code and sex coded with "male"/"female" or "M"/"F" or something similar.)
Don't skimp on characters. Make things clearer by using a few more letters. Examples: bdims -> BodyDims, hgt -> height, etc.
I love using capitalization for data frame names and lower case for variable names, but I realize that many do not use that convention. For teaching, I think it is very handy to have data sets clearly distinguished from variables.
Don't mask data sets from other packages -- especially not datasets. datasets definitely has dibs. So do other packages that have been on CRAN for a while.
Put data into a separate package from functions. That makes maintenance easier and makes it easier for people to use your data sets in other contexts without having to put your functions on their search path.
I haven't done a check for consistent naming and coding, but if it is inconsistent, I recommend moving toward consistency unless there is a good reason not to do so.

could plot_ss support the formula interface?

This seems like it wouldn't be that hard to do. Could we write a plot_ss.formula() method that takes a formula as its first argument and a data frame as its data argument?
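
A rough sketch of what such a method might look like. The names plot_ss_sketch() and plot_ss_sketch.formula() are hypothetical so they do not clash with the real plot_ss(); the sketch only draws the least squares line and returns the residual sum of squares, and does not attempt the interactive point-picking. mtcars is used purely as stand-in data.

```r
# Hypothetical generic, named differently from oilabs::plot_ss on purpose.
plot_ss_sketch <- function(x, ...) UseMethod("plot_ss_sketch")

# Formula method: `x` is a formula of the form y ~ x, `data` a data frame.
plot_ss_sketch.formula <- function(x, data, ...) {
  mf <- model.frame(x, data = data)  # column 1 = response, column 2 = predictor
  if (ncol(mf) != 2) {
    stop("Formula must have the form y ~ x")
  }
  plot(mf[[2]], mf[[1]], xlab = names(mf)[2], ylab = names(mf)[1], ...)
  fit <- lm(x, data = data)
  abline(fit)
  # Return the residual sum of squares without printing it.
  invisible(sum(residuals(fit)^2))
}
```

With this in place, a call such as plot_ss_sketch(runs ~ at_bats, data = mlb11) would dispatch on the formula, assuming mlb11 is loaded.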

Release to CRAN?

@andrewpbray @mine-cetinkaya-rundel Since we're using this throughout the stats courses, I think it's best if we get the package on CRAN (ideally by EOY). I can help with the process if necessary.

Besides making it more accessible for students who want to experiment outside of the course, it will also make the documentation easier to access from within DataCamp (since all CRAN packages automatically appear on RDocumentation.org).

Thoughts on what needs to happen before it's ready for release?

add plot_ci function from confidence intervals lab

make plot_ci() work with formula interface?

In addition to:

plot_ci(lo, hi, m)

could we also have

plot_ci( ~ lo + hi + m, data = dataframe)

?
Could this be achieved by making plot_ci() an S3 generic with methods for numeric and formula?
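
A minimal sketch of that S3 layout, under some assumptions: lo and hi are vectors of interval bounds, m is a single reference value, and in the formula version m may be either a column of data or a variable in the formula's environment. The name plot_ci_sketch() is hypothetical and the drawing code is only a guess at what the lab's plot looks like.

```r
# Hypothetical generic; dispatches on the class of the first argument.
plot_ci_sketch <- function(lo, ...) UseMethod("plot_ci_sketch")

# Numeric method: draw one horizontal segment per interval.
plot_ci_sketch.numeric <- function(lo, hi, m, ...) {
  stopifnot(length(lo) == length(hi), length(m) == 1)
  covers <- lo <= m & m <= hi
  k <- seq_along(lo)
  plot(range(c(lo, hi, m)), c(1, length(lo)), type = "n",
       xlab = "Interval", ylab = "Sample", ...)
  segments(lo, k, hi, k, col = ifelse(covers, "black", "red"))
  abline(v = m, lty = 2)
  invisible(covers)  # TRUE where the interval contains m
}

# Formula method: look up the three names, then hand off to the numeric method.
plot_ci_sketch.formula <- function(lo, data, ...) {
  vars <- all.vars(lo)
  if (length(vars) != 3) {
    stop("Formula must look like ~ lo + hi + m")
  }
  vals <- lapply(vars, function(v) eval(as.name(v), data, environment(lo)))
  # Assumption: if m is a column, it holds one repeated value; use the first.
  plot_ci_sketch(vals[[1]], vals[[2]], vals[[3]][1], ...)
}
```

Under these definitions, plot_ci_sketch(lo, hi, m) and plot_ci_sketch(~ lo + hi + m, data = dataframe) would end up in the same drawing code.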

make calc_streak use rle() instead of custom code?

This should be a one-liner but it's not due to the way we're interpreting streaks of length 0.

Perhaps someone more clever than me can figure this out.
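
One possible rle() version, as a sketch. It assumes a streak is the number of hits between consecutive misses, with an implicit miss before the first shot and after the last one, so two misses in a row produce a streak of length 0. That definition is an assumption here, and calc_streak_rle() is a hypothetical name.

```r
# Hypothetical rle()-based variant; not the package's implementation.
calc_streak_rle <- function(x) {
  # Pad with a miss on each side so every streak is bounded by misses.
  runs <- rle(c("M", as.character(x), "M"))
  pieces <- Map(
    function(value, len) {
      if (value == "H") {
        len               # a run of hits is one streak of that length
      } else {
        rep(0L, len - 1L) # k misses in a row hide k - 1 empty streaks
      }
    },
    runs$values, runs$lengths
  )
  unlist(pieces, use.names = FALSE)
}

calc_streak_rle(c("H", "M", "M", "H", "H", "M"))
```

The zero-length streaks are what keep this from being a one-liner: rle() never reports them, so they have to be recovered from the lengths of the miss runs.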

Add data and documentation for intro-to-data

I'm on it.

Add data and documentation for probability

I'm working on it.

make inference() go away

See (beanumber/oiLabs-mosaic#2)

No offense to whoever wrote this, but I think I hate inference(). It's the worst: an undocumented, magic, black box function.

I'll offer two potential solutions:

Document this function thoroughly
Make it go away.

I favor the latter. The question is does this foster or impede understanding of statistical concepts? I argue the latter. Is it realistic to suggest that if you want to do inference, you can just plug into a mysterious function? Or should we be reinforcing a conceptual understanding of inference by breaking the procedures into small steps? And isn't it important that a student specify how she is doing inference?

I don't teach with this anyway. If you want to do a t-test, use t.test(). If you want to find a p-value in a normal sampling distribution, use pnorm(). If you want to find a p-value in a t-distribution, use pt(). If you want to find a p-value in a data-generated sampling distribution, use mosaic::pdata().
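
To make the small-steps approach concrete, here is a sketch in base R. It uses the built-in mtcars data purely as a stand-in (fuel economy by transmission type) and works through a Welch two-sample t-test by hand before comparing with t.test().

```r
# Stand-in data: mpg for automatic (am == 0) and manual (am == 1) cars.
auto   <- mtcars$mpg[mtcars$am == 0]
manual <- mtcars$mpg[mtcars$am == 1]

# Step 1: per-group variance of the sample mean, and the standard error.
v1 <- var(auto) / length(auto)
v2 <- var(manual) / length(manual)
se <- sqrt(v1 + v2)

# Step 2: the test statistic for the difference in means.
t_stat <- (mean(auto) - mean(manual)) / se

# Step 3: Welch-Satterthwaite degrees of freedom.
df <- (v1 + v2)^2 /
  (v1^2 / (length(auto) - 1) + v2^2 / (length(manual) - 1))

# Step 4: two-sided p-value from the t distribution.
p_by_hand <- 2 * pt(-abs(t_stat), df = df)

# The same test in one call (t.test() uses the Welch version by default).
p_builtin <- t.test(auto, manual)$p.value
all.equal(p_by_hand, p_builtin)
```

Each step is visible and named, which is the point: the student has to say which standard error, which reference distribution, and which tail.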

On a slightly more esoteric level, I'm not sure that any of the functions in this package are really useful. I don't mind teaching students about functions in dplyr or mosaic because I know those packages are widely used and likely to be well-supported in the future. But the functions in this package never get used outside of these labs, so are they really necessary?

/diatribe

cleanup after 98616d5e

There are a lot of things that need to be cleaned up here, mostly having to do with documentation.

Error in FUN(X[[1L]], ...)

When I try to run the inference function to compare mean sleep hours between people who have a job and people who do not, I receive the following error.

> inference(y = d$sleep_hrs, 
+           x = d$job, 
+           est = "mean", 
+           type = "ht", 
+           null = 0, 
+           alternative = "twosided", 
+           method = "theoretical")
Response variable: numerical, Explanatory variable: categorical
Difference between two means
Summary statistics:
n_no = 12, mean_no = 7.2083, 
Error in FUN(X[[1L]], ...) : could not find function "FUN"

I believe the error occurs in the following code of the inference function. It looks like it doesn't find the sd function.

 # summary statistics and eda plots
            cat("Summary statistics:\n")
            if (est == "mean") {
                for (i in 1:x_levels) {
                  cat("n_", names(by(y, x, length))[i], " = ", by(y, x, length)[i], ", ", sep = "")
                  cat("mean_", names(by(y, x, mean))[i], " = ", round(by(y, x, mean)[i], 4), ", ", 
                    sep = "")
                  cat("sd_", names(by(y, x, sd))[i], " = ", round(by(y, x, sd)[i], 4), "\n", sep = "")
                }
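
Without diagnosing the error itself, here is a sketch of how the same per-group summaries could be computed outside inference() as a workaround. The data frame below is made up and only stands in for d; stats::sd is namespaced so that no other object named sd can be picked up by mistake, which is a precaution rather than a confirmed cause.

```r
# Made-up data standing in for the reporter's `d`.
d <- data.frame(
  sleep_hrs = c(7, 8, 6, 7.5, 6.5, 9),
  job       = c("no", "no", "no", "yes", "yes", "yes")
)

# One tapply() call per statistic instead of repeated by() calls in a loop.
n_by    <- tapply(d$sleep_hrs, d$job, length)
mean_by <- tapply(d$sleep_hrs, d$job, mean)
sd_by   <- tapply(d$sleep_hrs, d$job, stats::sd)

# Print in the same layout as the inference() summary.
cat("Summary statistics:\n")
for (g in names(n_by)) {
  cat("n_", g, " = ", n_by[[g]], ", ",
      "mean_", g, " = ", round(mean_by[[g]], 4), ", ",
      "sd_", g, " = ", round(sd_by[[g]], 4), "\n", sep = "")
}
```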

data set issues?

It looks like you have some issues with data sets. Example: I see bdims in both openintro and oilabs, but it doesn't appear to be exported from oilabs.

> library(oilabs)
> head(bdims)
Error in head(bdims) : object 'bdims' not found
> data(bdims, package = "oilabs")
> head(bdims)
# A tibble: 6 x 25
  bia.di bii.di bit.di che.de che.di elb.di wri.di kne.di ank.di sho.gi che.gi wai.gi nav.gi
   <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>
1   42.9   26     31.5   17.7   28     13.1   10.4   18.8   14.1   106.   89.5   71.5   74.5
2   43.7   28.5   33.5   16.9   30.8   14     11.8   20.6   15.1   110.   97     79     86.5
3   40.1   28.2   33.3   20.9   31.7   13.9   10.9   19.7   14.1   115.   97.5   83.2   82.9
4   44.3   29.9   34     18.4   28.2   13.9   11.2   20.9   15     104.   97     77.8   78.8
5   42.5   29.9   34     21.5   29.4   15.2   11.6   20.7   14.9   108.   97.5   80     82.5
6   43.3   27     31.5   19.6   31.3   14     11.5   18.8   13.9   120.   99.9   82.5   80.1
# ... with 12 more variables: hip.gi <dbl>, thi.gi <dbl>, bic.gi <dbl>, for.gi <dbl>,
#   kne.gi <dbl>, cal.gi <dbl>, ank.gi <dbl>, wri.gi <dbl>, age <int>, wgt <dbl>, hgt <dbl>,
#   sex <fct>

I recommend putting the data in just one place and using lazy data.

unable to install.packages("oilabs") in rstudio.cloud (R3.6.0)

I receive the following warning:

Warning in install.packages :
package 'oilabs' is not available (for R version 3.6.0)

Does this pass CRAN checks?

I suspect not.