openintrostat / oilabs

🛑 This package has been deprecated and datasets and functionality have been moved to the openintro package

Home Page: https://github.com/OpenIntroStat/openintro

License: Creative Commons Zero v1.0 Universal

R 100.00%

oilabs's People

Contributors

ameliamn, andrewpbray, beanumber, ismayc, mine-cetinkaya-rundel, rudeboybert

oilabs's Issues

Sampling dist plot for the inference function

Hey @mine-cetinkaya-rundel , when I run the following code,

library(oilabs)
library(dplyr)
us12 <- 
  atheism %>% 
  filter(nationality == "United States" & year == "2012")
inference(y = us12$response, est = "proportion", type = "ci", 
          method = "theoretical", success = "atheist")

I get a plot of the sample distribution on the left side of what looks to be a 1-by-2 plot layout. Is the approximated normal sampling distribution supposed to be plotted on the right side?

lazyData = FALSE?

Is there a reason why we have lazyData = FALSE in the DESCRIPTION? Hadley advises against this:

I recommend that you always include LazyData: true in your DESCRIPTION. devtools::create() does this for you.

plot_ss() and locator() function issue on RStudio Server

The plot_ss() function in the simple linear regression lab does not place the two points that define the LS line exactly where I click, but only on RStudio Server, not on Desktop. This has to do with the locator() function. Oddly, however, the bug occurs only part of the time for me. Here's a MWE:

# Test plot_ss
library(mosaic)
library(oilabs)
data(mlb11)
plot_ss(x = at_bats, y = runs, data = mlb11)
# Trying to locate (4,4)
plot(1:10, 1:10)
locator(1)

Are any other users of RStudio Server getting this bug?

calc_streak() doesn't actually take a data.frame

The Help for calc_streak() says:

x A data frame or character vector of hits ("H") and misses ("M").

But it won't actually work with a data frame argument.

kobe_basket %>% select(shot) %>% calc_streak()
Error in calc_streak(.) : 
  Input should only contain hits ("H") and misses ("M")

Could it?
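
One way this could work is sketched below, using base R only. The name calc_streak_any() is hypothetical and is not part of oilabs. The sketch assumes a streak is the number of hits between consecutive misses, and that a data frame input has exactly one column.

```r
# Hypothetical wrapper, not part of oilabs: accept either a character
# vector or a one-column data frame of "H"/"M" values.
calc_streak_any <- function(x) {
  if (is.data.frame(x)) {
    if (ncol(x) != 1) {
      stop("A data frame input must have exactly one column")
    }
    x <- x[[1]]  # pull the single column out as a plain vector
  }
  x <- as.character(x)
  if (!all(x %in% c("H", "M"))) {
    stop('Input should only contain hits ("H") and misses ("M")')
  }
  # Pad with a miss on each side so every streak is bounded, then count
  # the hits that fall between consecutive misses.
  padded <- c("M", x, "M")
  miss_positions <- which(padded == "M")
  diff(miss_positions) - 1
}

# Made-up shot sequence, for illustration only.
shots <- c("H", "M", "M", "H", "H", "M", "H")
from_vector <- calc_streak_any(shots)
from_frame  <- calc_streak_any(data.frame(shot = shots))
identical(from_vector, from_frame)
```

With a wrapper like this, a piped call such as kobe_basket %>% select(shot) %>% calc_streak_any() would receive a one-column data frame and unwrap it before counting.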

Data in oilabs, openintro -- open to some renaming/recoding?

I don't know how married you are to the current names and coding scheme, but I would love to see some changes:

  1. Don't use 0/1 coding for things like male/female, or at least have an additional variable that doesn't use an arbitrary assignment like this. (Perhaps using male or female as the 0/1 -- or better TRUE/FALSE -- code and sex coded with "male"/"female" or "M"/"F" or something similar.)

  2. Don't skimp on characters. Make things clearer by using a few more letters. Examples: bdims -> BodyDims, hgt -> height, etc.

  3. I love using capitalization for data frame names and lower case for variable names, but I realize that many do not use that convention. For teaching, I think it is very handy to have data sets clearly distinguished from variables.

  4. Don't mask data sets from other packages -- especially not from datasets. datasets definitely has dibs. So do other packages that have been on CRAN for a while.

  5. Put data into a separate package from functions. That makes maintenance easier and makes it easier for people to use your data sets in other contexts without having to put your functions on their search path.

  6. I haven't done a check for consistent naming and coding, but if it is inconsistent, I recommend moving toward consistency unless there is a good reason not to do so.
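
As a rough illustration of points 1 to 3, the sketch below recodes a small data frame in base R. The data frame is made up for this example and is not the real bdims. The mapping of 1 to male and 0 to female is an assumption of the sketch.

```r
# Toy data, invented for illustration only (not the real bdims).
bdims_toy <- data.frame(hgt = c(174.0, 160.2, 181.5), sex = c(1, 0, 1))

# Assumption for this sketch: 1 means male and 0 means female.
BodyDims <- data.frame(
  height = bdims_toy$hgt,                      # spelled-out variable name
  sex    = factor(bdims_toy$sex,
                  levels = c(0, 1),
                  labels = c("female", "male")),  # labeled, not 0/1
  male   = bdims_toy$sex == 1                  # explicit TRUE/FALSE indicator
)

str(BodyDims)
```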

Release to CRAN?

@andrewpbray @mine-cetinkaya-rundel Since we're using this throughout the stats courses, I think it's best if we get the package on CRAN (ideally by EOY). I can help with the process if necessary.

Besides making it more accessible for students who want to experiment outside of the course, it will also make the documentation easier to access from within DataCamp (since all CRAN packages automatically appear on RDocumentation.org).

Thoughts on what needs to happen before it's ready for release?

make plot_ci() work with formula interface?

In addition to:

plot_ci(lo, hi, m)

could we also have

plot_ci( ~ lo + hi + m, data = dataframe)

?
Could this be achieved by making plot_ci() an S3 generic with methods for numeric and formula?
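
A minimal sketch of that dispatch pattern follows. The generic ci_bounds() is a hypothetical stand-in that only collects the three inputs; it is not plot_ci() itself and draws nothing. It assumes the formula names exactly three columns in the order lo, hi, m.

```r
# Hypothetical generic for illustration; not an oilabs function.
ci_bounds <- function(x, ...) UseMethod("ci_bounds")

# Numeric method: mirrors the plot_ci(lo, hi, m) calling style.
ci_bounds.numeric <- function(x, hi, m, ...) {
  list(lo = x, hi = hi, m = m)
}

# Formula method: mirrors plot_ci(~ lo + hi + m, data = dataframe).
ci_bounds.formula <- function(x, data, ...) {
  vars <- all.vars(x)  # variable names in the order they appear
  if (length(vars) != 3) {
    stop("Formula must name exactly three variables: ~ lo + hi + m")
  }
  ci_bounds.numeric(data[[vars[1]]], hi = data[[vars[2]]], m = data[[vars[3]]])
}

# Made-up interval bounds, for illustration only.
intervals <- data.frame(lo = c(1, 2), hi = c(3, 4), m = c(2.5, 2.5))
from_numeric <- ci_bounds(intervals$lo, intervals$hi, intervals$m)
from_formula <- ci_bounds(~ lo + hi + m, data = intervals)
identical(from_numeric, from_formula)
```

In a real implementation the numeric method would hold the plotting code, and the formula method would stay a thin adapter like the one shown here.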

make inference() go away

See (beanumber/oiLabs-mosaic#2)

No offense to whoever wrote this, but I think I hate inference(). It's the worst: an undocumented, magic, black box function.

I'll offer two potential solutions:

  1. Document this function thoroughly
  2. Make it go away.

I favor the latter. The question is: does this foster or impede understanding of statistical concepts? I argue the latter. Is it realistic to suggest that if you want to do inference, you can just plug into a mysterious function? Or should we be reinforcing a conceptual understanding of inference by breaking the procedures into small steps? And isn't it important that a student specify how she is doing inference?

I don't teach with this anyway. If you want to do a t-test, use t.test(). If you want to find a p-value in a normal sampling distribution, use pnorm(). If you want to find a p-value in a t-distribution, use pt(). If you want to find a p-value in a data-generated sampling distribution, use mosaic::pdata().
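
To make the "small steps" idea concrete, here is a sketch that computes a two-sample p-value by hand with pt() and compares it with t.test(). It uses the built-in sleep data purely as a convenient example and treats the two groups as independent samples, which is an assumption of this sketch (the original study was paired).

```r
# Built-in `sleep` data: extra hours of sleep under two drugs.
g1 <- sleep$extra[sleep$group == "1"]
g2 <- sleep$extra[sleep$group == "2"]

# Step 1: per-group variance of the sample mean.
v1 <- var(g1) / length(g1)
v2 <- var(g2) / length(g2)

# Step 2: test statistic for the difference in means.
t_stat <- (mean(g1) - mean(g2)) / sqrt(v1 + v2)

# Step 3: Welch-Satterthwaite degrees of freedom.
df <- (v1 + v2)^2 / (v1^2 / (length(g1) - 1) + v2^2 / (length(g2) - 1))

# Step 4: two-sided p-value from the t-distribution.
p_manual <- 2 * pt(-abs(t_stat), df)

# The same test in one call; t.test() uses the Welch procedure by default.
p_builtin <- t.test(g1, g2)$p.value

all.equal(p_manual, p_builtin)
```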

On a slightly more esoteric level, I'm not sure that any of the functions in this package are really useful. I don't mind teaching students about functions in dplyr or mosaic because I know those packages are widely used and likely to be well supported in the future. But the functions in this package never get used outside of these labs, so are they really necessary?

/diatribe

cleanup after 98616d5e

There are a lot of things that need to be cleaned up here, mostly having to do with documentation.

Error in FUN(X[[1L]], ...)

When I run the inference function to compare mean sleep hours between people who do and do not have a job, I receive the following error.

> inference(y = d$sleep_hrs, 
+           x = d$job, 
+           est = "mean", 
+           type = "ht", 
+           null = 0, 
+           alternative = "twosided", 
+           method = "theoretical")
Response variable: numerical, Explanatory variable: categorical
Difference between two means
Summary statistics:
n_no = 12, mean_no = 7.2083, 
Error in FUN(X[[1L]], ...) : could not find function "FUN"

I believe the error occurs in the following code of the inference function. It looks like it doesn't find the sd function.

 # summary statistics and eda plots
            cat("Summary statistics:\n")
            if (est == "mean") {
                for (i in 1:x_levels) {
                  cat("n_", names(by(y, x, length))[i], " = ", by(y, x, length)[i], ", ", sep = "")
                  cat("mean_", names(by(y, x, mean))[i], " = ", round(by(y, x, mean)[i], 4), ", ", 
                    sep = "")
                  cat("sd_", names(by(y, x, sd))[i], " = ", round(by(y, x, sd)[i], 4), "\n", sep = "")
                }
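
The cause is not confirmed in this thread. Assuming the failure is a name-lookup problem for sd (for example, a non-function object named sd shadowing the function inside inference()), one possible workaround is to compute each statistic once and refer to stats::sd explicitly. The helper group_summary() below is hypothetical, and the built-in ToothGrowth data stands in for the reporter's data, which is not available here.

```r
# Hypothetical helper, not the oilabs implementation: per-group n, mean,
# and sd, computed once each and printed in the same style as inference().
group_summary <- function(y, x) {
  x <- factor(x)
  out <- data.frame(
    level = levels(x),
    n     = as.vector(tapply(y, x, length)),
    mean  = as.vector(tapply(y, x, mean)),
    # Namespace-qualified so a local object called `sd` cannot shadow it.
    sd    = as.vector(tapply(y, x, stats::sd)),
    stringsAsFactors = FALSE
  )
  cat("Summary statistics:\n")
  for (i in seq_len(nrow(out))) {
    cat("n_", out$level[i], " = ", out$n[i], ", ",
        "mean_", out$level[i], " = ", round(out$mean[i], 4), ", ",
        "sd_", out$level[i], " = ", round(out$sd[i], 4), "\n", sep = "")
  }
  invisible(out)
}

# Stand-in data: tooth length by supplement type.
res <- group_summary(ToothGrowth$len, ToothGrowth$supp)
```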

data set issues?

It looks like you have some issues with data sets. Example: I see bdims in both openintro and oilabs, but it doesn't appear to be exported from oilabs.

> library(oilabs)
> head(bdims)
Error in head(bdims) : object 'bdims' not found
> data(bdims, package = "oilabs")
> head(bdims)
# A tibble: 6 x 25
  bia.di bii.di bit.di che.de che.di elb.di wri.di kne.di ank.di sho.gi che.gi wai.gi nav.gi
   <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>
1   42.9   26     31.5   17.7   28     13.1   10.4   18.8   14.1   106.   89.5   71.5   74.5
2   43.7   28.5   33.5   16.9   30.8   14     11.8   20.6   15.1   110.   97     79     86.5
3   40.1   28.2   33.3   20.9   31.7   13.9   10.9   19.7   14.1   115.   97.5   83.2   82.9
4   44.3   29.9   34     18.4   28.2   13.9   11.2   20.9   15     104.   97     77.8   78.8
5   42.5   29.9   34     21.5   29.4   15.2   11.6   20.7   14.9   108.   97.5   80     82.5
6   43.3   27     31.5   19.6   31.3   14     11.5   18.8   13.9   120.   99.9   82.5   80.1
# ... with 12 more variables: hip.gi <dbl>, thi.gi <dbl>, bic.gi <dbl>, for.gi <dbl>,
#   kne.gi <dbl>, cal.gi <dbl>, ank.gi <dbl>, wri.gi <dbl>, age <int>, wgt <dbl>, hgt <dbl>,
#   sex <fct>

I recommend putting the data in just one place and using lazy data.
