openintrostat / openintro

📦 R package for data and supplemental functions for OpenIntro resources

Home Page: http://openintrostat.github.io/openintro/

License: GNU General Public License v3.0

R 100.00%

data openintro rstats rstats-package

openintro's Issues

[Bug]: fastfood data has incorrect salad variable

Contact Details

[email protected]

Bug

The fastfood data set has a salad variable with all 515 values "Other". Looking at the item descriptions, it does appear that there are actual salads in the data set.

Reproducible Example

library(openintro)
#> Loading required package: airports
#> Loading required package: cherryblossom
#> Loading required package: usdata
table(fastfood$salad)
#>
#> Other
#> 515

Expected Behavior

I expected some foods to be classified as salads and others not.
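
One way a variable like this could be derived is by matching on the item names. The following is a minimal sketch on a toy data frame; the item names are invented for illustration, and this is not necessarily how the package builds `salad`.

```r
# Toy stand-in for the fastfood data; these item names are made up.
toy <- data.frame(
  item = c("Grilled Chicken Salad", "Cheeseburger", "Side salad", "Fries"),
  stringsAsFactors = FALSE
)

# Flag any item whose name contains "salad", ignoring case.
toy$salad <- ifelse(grepl("salad", toy$item, ignore.case = TRUE), "Salad", "Other")

salad_counts <- table(toy$salad)
print(salad_counts)
```

A plain substring match would likely need review by hand, since an item can mention salad without being one.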

Session Info

No response

Additional context

No response

chapter 5 - anova - bat10 dataset not found

Hi,

I can't find the bat10 dataset used in the ANOVA section of chapter 5, Inference for numerical data.

Where can I download it, please?

rosling_responses mentioned in text but not present in package

On page 191 the Fourth Edition of the textbook mentions the rosling_responses data set:

"We will use the rosling_responses data set to evaluate the hypothesis test ..."

Use of the texttt font for "rosling_responses" suggests that such a data set exists in the package, but it doesn't.
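
A quick way to check whether a name is listed among a package's data sets is sketched below. `has_dataset()` is a hypothetical helper, not part of openintro, and the demo runs against the built-in datasets package so that it works without openintro installed.

```r
# Hypothetical helper: TRUE if `name` is listed among the data sets of `pkg`.
has_dataset <- function(name, pkg) {
  items <- data(package = pkg)$results[, "Item"]
  # Entries such as "beaver1 (beavers)" carry a file name in parentheses; drop it.
  name %in% sub(" .*$", "", items)
}

has_dataset("cars", "datasets")
has_dataset("rosling_responses", "datasets")
```

With openintro installed, the same call with `pkg = "openintro"` would answer the question for the package itself.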

ask documentation

it says "something is wrong" but nothing is wrong

yrbss isn't in the OpenIntro packages

Hi,
the yrbss data is used in the OpenIntro text.
The yrbss data is available to download on the Github site.
So far as I can tell, the yrbss data hasn't been added to the OpenIntro packages.
Should it be?

Remove message that appears when package loads

Referring to the text that says "Please visit openintro.org for free statistics". It shows up in the compiled markdown documents (as shown below), and yes, it's possible to mute that with the message = FALSE option in the chunk, but I think we want to be careful about teaching those to students who are new to R.

Leaving the issue here to be considered before the next version of the package...
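
For reference, base R can also silence this kind of message at the call site, provided the package emits it with `packageStartupMessage()`. The sketch below uses a stand-in function, `announce()`, because how openintro actually emits its message is an assumption here.

```r
# Stand-in for a package startup hook; `announce()` is hypothetical.
announce <- function() {
  packageStartupMessage("Please visit openintro.org for free statistics materials")
  invisible(TRUE)
}

# The message is muffled, and the value of the expression is still returned.
ok <- suppressPackageStartupMessages(announce())

# The same wrapper applied to a real package, guarded in case it is not installed.
if (requireNamespace("openintro", quietly = TRUE)) {
  suppressPackageStartupMessages(library(openintro))
}
```

This still asks students to type an extra wrapper, so it does not remove the case for dropping the message from the package.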

Add Nick Paterno to the package author list before next CRAN release

Nick has added a handful of data sets.

As per the Korean font error

Hi,

When I render an image containing Korean characters, the characters come out broken.
- myPDF in variable.R

However, when I test with CairoPDF, for instance, the Korean characters are rendered correctly, but there are width and height issues.

I think other Asian characters will have similar issues when using the openintro package.

Thank you.

Make sure to alias countyComplete with county_complete

nuff said

dotPlot() collides with mosaic::dotPlot()

From looking at your examples, I'm not exactly sure what the purpose of your dotPlot() is supposed to be, but it is unfortunate that you have chosen a name that conflicts with the version in the mosaic package, which makes the kind of dot plot often seen in introductory statistics courses.

mosaic::dotPlot( ~ rnorm(500), width = 0.1)

ipo data format and examples need revisiting

Revisit the format of the data and the examples provided

Adjust treeDiag() function to have an option to output counts

For example, rather than give probabilities, show how 1000 cases would cascade through the tree.
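
The arithmetic behind this request can be sketched as follows. The probabilities and branch names are hypothetical, and this does not touch `treeDiag()` itself; it only shows how counts follow from the branch probabilities.

```r
n <- 1000  # number of cases entering the tree

# Hypothetical first-level probabilities.
p_first <- c(yes = 0.3, no = 0.7)

# Hypothetical second-level probabilities, conditional on the first branch
# (one row per first-level branch, each row sums to 1).
p_second <- rbind(
  yes = c(pos = 0.9, neg = 0.1),
  no  = c(pos = 0.2, neg = 0.8)
)

# Cases reaching each first-level branch.
first_counts <- n * p_first

# Cases reaching each leaf: row i of p_second is scaled by first_counts[i].
leaf_counts <- first_counts * p_second

print(round(first_counts))
print(round(leaf_counts))
```

With probabilities that do not divide evenly into `n`, the counts would need a rounding rule that keeps each level summing to the total.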

find source of murders dataset

it's not documented

add "babies" dataset referenced in EOCE 6.1 of ISRS

Unless I am missing it, this is neither the births nor the ncbirths data set.

Side note: Is it necessary to have both births and ncbirths?

code by chapter

Is there a place where I can find the R code by chapter for the OpenIntro book?

Description sentence

Make dataset one word in DESCRIPTION

ami_occurrences - need better description

Are observations people, days, something else? Docs will need to be updated accordingly.

yrbss documentation

Do we know which year's survey is included in this dataset? Also, do we know if the variable called gender is what's identified in the 2017 data documentation as sex?

I'm happy to do a PR to clarify those things if we can track them down.

Push new package version to CRAN?

@mine-cetinkaya-rundel can we push a new version of openintro to CRAN? We're using the latest version from GitHub in your DataCamp courses, but I think there's been some confusion among students since it differs from what's on CRAN.

Why mask data sets in datasets?

This seems unnecessary and confusing:

library(openintro)
## Please visit openintro.org for free statistics materials
## 
## Attaching package: ‘openintro’
## 
## The following objects are masked from ‘package:datasets’:
## 
##     cars, chickwts, trees
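
While the masking is in place, the base versions stay reachable through a qualified name. A minimal sketch, using a made-up local `cars` object as a stand-in for the package's copy:

```r
# Stand-in for a package that ships its own `cars`; this toy object is made up.
cars <- data.frame(type = c("small", "large"), stringsAsFactors = FALSE)

# The unqualified name now finds the local object first...
names(cars)

# ...while the qualified name still reaches the copy in the datasets package.
base_cars <- datasets::cars
names(base_cars)
```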

Add a page with csv download for all data

This would be helpful for non-R users of the datasets.

@DavidDiez I know you host these on openintro.org but keeping synced seems a challenge. I could automate it here and post on the package websites and openintro.org could point to them. Or I suppose you could build the page on your end based on the automatically generated files in this repo as well. We should discuss which approach is preferable, but at least automatically generating files as we update the package seems like a good idea.
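
A rough sketch of what the automated export could look like is below. `export_package_data()` is a hypothetical helper, not an existing function in this repo, and the demo runs on the built-in datasets package and a temporary directory so it works without openintro installed.

```r
# Hypothetical helper: write every data frame shipped with `pkg` to a CSV file.
export_package_data <- function(pkg, out_dir) {
  dir.create(out_dir, showWarnings = FALSE, recursive = TRUE)
  items <- data(package = pkg)$results[, "Item"]
  # Entries such as "beaver1 (beavers)" carry a file name in parentheses; drop it.
  data_names <- unique(sub(" .*$", "", items))
  written <- character()
  for (nm in data_names) {
    # Same lookup that `pkg::name` performs; skip anything that cannot be found.
    obj <- tryCatch(getExportedValue(pkg, nm), error = function(e) NULL)
    if (is.data.frame(obj)) {
      write.csv(obj, file.path(out_dir, paste0(nm, ".csv")), row.names = FALSE)
      written <- c(written, nm)
    }
  }
  written
}

out_dir <- file.path(tempdir(), "csv-export")
written <- export_package_data("datasets", out_dir)
head(written)
```

Note that `row.names = FALSE` drops row names, which would lose information for data sets that store labels there; that choice would need a decision before using something like this for real.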

Add additional citation to BAC

openintro/R/data-bac.R

Line 15 in 7b3c5f2

#' @source J. Malkevitch and L.M. Lesser. For All Practical Purposes:

From Jack Miller:

The blood alcohol data set has been around since 1992 and appeared in the Electronic Encyclopedia of Statistical Examples and Exercises. I worked on EESEE and used the data sets at OSU, so I am very familiar with that particular citation. :-) Here is a URL for that particular "story" in EESEE: http://bcs.whfreeman.com/WebPub/Statistics/shared_resources/EESEE/BloodAlcoholContent/index.html.

This change will need to propagate to IMS and other books that reference this dataset as well.

Contact Details

No response

Source

The aldrin dataset from slides is not in the package.

Source is at https://github.com/OpenIntroStat/openintro-statistics-slides/tree/master/Chp%207/7-5_anova/figures/aldrin.

Desired Solution

No response

Alternative Solutions

No response

Additional context

No response

Email data corrections

In both email and email50 there are variables in the docs that don't exist in the data: period_mess and signoff -- should be removed from docs
email50 example code yields FALSE (random sampling change might be the cause?)
In both datasets indicator variables should be factors
cc is numeric, not indicator
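
The factor conversion could look roughly like this. The data frame below is a toy with invented values, and which columns count as indicators is an assumption for illustration; `cc` is left numeric, as noted above.

```r
# Toy stand-in for the email data; the values are invented.
toy_email <- data.frame(
  spam        = c(0, 1, 0),
  to_multiple = c(1, 0, 0),
  cc          = c(0, 2, 5)
)

# Assumed list of 0/1 indicator columns; cc is a count and stays numeric.
indicator_cols <- c("spam", "to_multiple")

# Convert each indicator to a factor with both levels present, even if unused.
toy_email[indicator_cols] <- lapply(toy_email[indicator_cols], factor, levels = c(0, 1))

str(toy_email)
```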

Rewrite examples in mlbbat10

Currently in \dontrun{}, but they need to be checked and rewritten.

qqnormsim() ideas

Use scales = "free" or, better, add a scales argument that defaults to "free". [Otherwise a sample with an outlier will cause the other plots to look quite different from how they would look if they were generated in isolation.]
Don't hard-code the number of simulations. Let 8 be the default if you like.
Rename the first argument? It's a bit of an odd name. But I'm guessing it will typically be used without naming, so this is not such a big deal.
Consider a version that doesn't label the original data but makes it one of the samples (randomly selecting which location). Not sure of the best way to do the "reveal".
Perhaps add a seed argument that sets the seed used. That would solve the reveal issue in one way, since the plot could be generated again with the original data set distinguished.
Complete the documentation and include examples.
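
Several of these ideas can be sketched in base R without touching the plotting code. `qq_sim_data()` below is a hypothetical helper, not the current `qqnormsim()` implementation; it only prepares the panel data, with a configurable number of simulations, an optional seed, and the real data hidden in a random panel.

```r
# Hypothetical helper: build panel data for a simulated normal Q-Q comparison.
qq_sim_data <- function(x, n_sim = 8, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  x <- x[!is.na(x)]
  n <- length(x)
  # Pick which panel will hold the real data.
  real_slot <- sample.int(n_sim + 1, 1)
  panels <- vector("list", n_sim + 1)
  for (i in seq_len(n_sim + 1)) {
    # Simulated panels draw from a normal with the sample's mean and sd.
    values <- if (i == real_slot) x else rnorm(n, mean = mean(x), sd = sd(x))
    panels[[i]] <- data.frame(panel = i, value = values)
  }
  out <- do.call(rbind, panels)
  # Stored for the "reveal"; rerunning with the same seed gives the same slot.
  attr(out, "real_panel") <- real_slot
  out
}

a <- qq_sim_data(datasets::cars$dist, seed = 1)
b <- qq_sim_data(datasets::cars$dist, seed = 1)
identical(a, b)
attr(a, "real_panel")
```

The result could then be faceted by `panel` with free scales in whatever plotting system the function uses.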
