
nycflights13's Introduction

tidyverse

Overview

The tidyverse is a set of packages that work in harmony because they share common data representations and API design. The tidyverse package is designed to make it easy to install and load core packages from the tidyverse in a single command.

If you’d like to learn how to use the tidyverse effectively, the best place to start is R for Data Science (2e).

Installation

# Install from CRAN
install.packages("tidyverse")
# Install the development version from GitHub
# install.packages("pak")
pak::pak("tidyverse/tidyverse")

If you’re compiling from source, you can run pak::pkg_system_requirements("tidyverse") to see the complete set of system packages needed on your machine.

Usage

library(tidyverse) will load the core tidyverse packages: dplyr, forcats, ggplot2, lubridate, purrr, readr, stringr, tibble, and tidyr.

You also get a condensed summary of conflicts with other packages you have loaded:

library(tidyverse)
#> ── Attaching core tidyverse packages ─────────────────── tidyverse 2.0.0.9000 ──
#> ✔ dplyr     1.1.3     ✔ readr     2.1.4
#> ✔ forcats   1.0.0     ✔ stringr   1.5.0
#> ✔ ggplot2   3.4.4     ✔ tibble    3.2.1
#> ✔ lubridate 1.9.3     ✔ tidyr     1.3.0
#> ✔ purrr     1.0.2     
#> ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
#> ✖ dplyr::filter() masks stats::filter()
#> ✖ dplyr::lag()    masks stats::lag()
#> ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

You can see conflicts created later with tidyverse_conflicts():

library(MASS)
#> 
#> Attaching package: 'MASS'
#> The following object is masked from 'package:dplyr':
#> 
#>     select
tidyverse_conflicts()
#> ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
#> ✖ dplyr::filter() masks stats::filter()
#> ✖ dplyr::lag()    masks stats::lag()
#> ✖ MASS::select()  masks dplyr::select()
#> ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
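
When a function is masked, a namespace-qualified call still reaches the version you want. The following is a minimal base R sketch of that idea, not taken from the README: a hypothetical global `filter()` stands in for a masking function such as dplyr::filter(), and `stats::filter()` is then called explicitly.

```r
# Hypothetical stand-in for a masking function such as dplyr::filter():
# a global `filter` hides stats::filter() in the same way.
filter <- function(x, ...) stop("this is not stats::filter()")

# A namespace-qualified call always reaches the stats version,
# whatever is masking it on the search path.
moving_avg <- stats::filter(1:5, rep(1 / 3, 3))
print(as.numeric(moving_avg))

# Remove the stand-in again so later code is unaffected.
rm(filter)
```

The same pattern applies to the conflict reported above: writing dplyr::select() or MASS::select() makes the intended function explicit regardless of load order.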

And you can check that all tidyverse packages are up-to-date with tidyverse_update():

tidyverse_update()
#> The following packages are out of date:
#>  * broom (0.4.0 -> 0.4.1)
#>  * DBI   (0.4.1 -> 0.5)
#>  * Rcpp  (0.12.6 -> 0.12.7)
#>  
#> Start a clean R session then run:
#> install.packages(c("broom", "DBI", "Rcpp"))

Packages

As well as the core tidyverse, installing this package also installs a selection of other packages that you’re likely to use frequently, but probably not in every analysis. This includes packages for:

  • Working with specific types of vectors:

    • hms, for times.
  • Importing other types of data:

    • feather, for sharing with Python and other languages.
    • haven, for SPSS, SAS and Stata files.
    • httr, for web APIs.
    • jsonlite, for JSON.
    • readxl, for .xls and .xlsx files.
    • rvest, for web scraping.
    • xml2, for XML.
  • Modelling:

    • modelr, for modelling within a pipeline.
    • broom, for turning models into tidy data.

Code of Conduct

Please note that the tidyverse project is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.

nycflights13's People

Contributors

balthasars, bbrewington, beanumber, elben10, hadley, hughparsonage, ianmcook, jozefhajnala, krlmlr, rmcd1024, seankross, sjackman

nycflights13's Issues

Query: Package documentation reference to American Airways

Hello,

The description section of the planes data set CRAN documentation includes a note about 'American Airways' (AA) and 'Envoy Air' (MQ). Should this note refer to 'American Airlines' instead?

The airlines data set associates the code AA with 'American Airlines' (and MQ with 'Envoy Air'), so it is easy to infer that this is just a typo. However, it took me a while to understand the anti_join() documentation example, since the American Airways observations do not appear in any data set, even when I match on the carrier variable using the other join functions.
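
For readers following along, here is a minimal base R sketch of what an anti-join does. It assumes the documentation example matches flights to planes on tailnum rather than on carrier, which the issue does not state. The toy tables and tail numbers are invented stand-ins for the real data.

```r
# Toy stand-ins for nycflights13::flights and nycflights13::planes
# (all values invented for illustration).
flights_toy <- data.frame(
  carrier = c("AA", "AA", "MQ", "UA", "DL"),
  tailnum = c("X001", "X002", "X003", "N100", "N200"),
  stringsAsFactors = FALSE
)
planes_toy <- data.frame(tailnum = c("N100", "N200"), stringsAsFactors = FALSE)

# Base R anti-join: keep flights whose tailnum has no match in planes.
# With dplyr this would be anti_join(flights_toy, planes_toy, by = "tailnum").
unmatched <- flights_toy[!flights_toy$tailnum %in% planes_toy$tailnum, ]

# Tabulate the unmatched flights by carrier code.
unmatched_by_carrier <- table(unmatched$carrier)
print(unmatched_by_carrier)
```

Under that assumption, the carrier codes only show up after the anti-join on tailnum, which would explain why joining on carrier alone finds nothing unusual.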

Thanks a lot for your time and I hope that everyone is safe and healthy in these times.

Best,

Sicabí

Release nycflights13 1.0.0

  • devtools::check_win_devel()
  • rhub::check_for_cran()
  • revdepcheck::revdep_check(num_workers = 4)
  • Polish NEWS
  • Bump version (in DESCRIPTION and NEWS)
  • devtools::check_win_devel() (again!)
  • devtools::submit_cran()
  • Approve email
  • Tag release
  • Bump dev version
  • Tweet

Template from r-lib/usethis#338

Release nycflights13 1.0.1

Prepare for release:

Submit to CRAN:

  • usethis::use_version('patch')
  • devtools::submit_cran()
  • Approve email

Wait for CRAN...

  • Accepted 🎉
  • usethis::use_github_release()
  • usethis::use_dev_version()

Upkeep for nycflights13 (2022)

2022

  • Handle and close any still-open master --> main issues
  • usethis:::use_codecov_badge("tidyverse/nycflights13")
  • Update pkgdown site using instructions at https://tidytemplate.tidyverse.org
  • Update lifecycle badges with more accessible SVGs: usethis::use_lifecycle()

2023

  • Update email addresses *@rstudio.com -> *@posit.co
  • Update copyright holder in DESCRIPTION: person("Posit Software, PBC", role = c("cph", "fnd"))
  • Run devtools::document() to re-generate package-level help topic with DESCRIPTION changes
  • use_tidy_logo()
  • usethis::use_tidy_coc()
  • Use pak::pak("org/pkg") in README
  • Consider running use_tidy_dependencies() and/or replace compat files with use_standalone()
  • Use cli errors or file an issue if you don't have time to do it now
  • use_standalone("r-lib/rlang", "types-check") instead of home grown argument checkers;
    or file an issue if you don't have time to do it now
  • Add alt-text to pictures, plots, etc; see https://posit.co/blog/knitr-fig-alt/ for examples

Eternal

  • use_package("R", "Depends", "3.6")
  • usethis::use_tidy_description()
  • usethis::use_tidy_github_actions()
  • devtools::build_readme()
  • Re-publish released site if needed

Created on 2023-10-30 with usethis::use_tidy_upkeep_issue(), using usethis v2.2.2.9000

airport code "BFT" is not unique?

> filter(airports, faa == "BFT")
Source: local data frame [2 x 7]

  faa               name      lat       lon alt tz dst
1 BFT           Beaufort 32.47741 -80.72316  37 -5   A
2 BFT BFT County Airport 32.41083 -80.63500 500 -5   A

Obviously, this is an upstream issue, but might we want to filter these out? A hacky temporary solution is:

  filter(name != "Beaufort") %>%

in the creation of the airports table.
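
A minimal base R sketch of how such duplicates could be detected before applying the filter suggested above. The two BFT rows are copied from the output in this issue, the XYZ row is invented for contrast, and the variable names are hypothetical.

```r
# The two BFT rows quoted in the issue, plus one invented row (XYZ).
airports_toy <- data.frame(
  faa  = c("BFT", "BFT", "XYZ"),
  name = c("Beaufort", "BFT County Airport", "Hypothetical Field"),
  lat  = c(32.47741, 32.41083, 40.0),
  lon  = c(-80.72316, -80.63500, -75.0),
  stringsAsFactors = FALSE
)

# Codes that occur more than once.
dup_codes <- unique(airports_toy$faa[duplicated(airports_toy$faa)])
print(dup_codes)

# Base R equivalent of the issue's filter(name != "Beaufort").
airports_fixed <- airports_toy[airports_toy$name != "Beaufort", ]
print(airports_fixed)
```

Checking `duplicated()` first shows whether BFT is the only affected code before any rows are dropped by name.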

Dillant Hopkins Airport lat and long are switched and lat * -1

I was looking at the airports data and noticed that Dillant Hopkins Airport was at a latitude of 72.27, i.e. the furthest north. I knew there were airports in Alaska, so this seemed incorrect. The lat and lon in this file show a lat of 42.89 and a long of -72.27, which is correct. The values are shown incorrectly in the csv and are also incorrect when you load the data in R (screenshots not preserved).

I didn't have time to look into where or why this is happening, but wanted to report it: Vermont is not the furthest north airport :-)
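
A rough base R sketch of how a swapped and negated coordinate pair could be flagged and reversed. It assumes the stored longitude is 42.89, which follows from the issue title but is not stated in the text. The positive-longitude check is only an illustrative heuristic, and the second row is invented.

```r
# Invented rows; the first mimics the reported problem, assuming the stored
# longitude is 42.89 (the issue only states the stored latitude, 72.27).
airports_toy <- data.frame(
  name = c("Dillant Hopkins Airport", "Hypothetical Field"),
  lat  = c(72.27, 40.00),
  lon  = c(42.89, -75.00),
  stringsAsFactors = FALSE
)

# Heuristic (an assumption, not a rule from the package): a positive longitude
# is suspicious when every airport is expected to lie west of Greenwich.
suspect <- airports_toy$lon > 0

# Undo "switched, and lat * -1": the true lat is the stored lon,
# and the true lon is the negated stored lat.
fixed <- airports_toy
fixed$lat[suspect] <- airports_toy$lon[suspect]
fixed$lon[suspect] <- -airports_toy$lat[suspect]
print(fixed)
```

A check like this would need adjusting for any airports that legitimately have positive longitudes.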

distance is not flown distance

The dataset documentation says

distance Distance flown

But according to BTS's glossary, distance is "Distance between airports (miles)".
This is rarely the flown distance, because flights do not follow great-circle paths.
Also, some flights are

  • diverted to a different airport, so DivDistance will cater for that case.
  • cancelled, e.g. flight 4412 on 2013-01-30

HTH
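
To make the distinction concrete, the distance between two airports can be approximated as a great-circle distance with the haversine formula. This is a sketch under assumptions: the function name, the Earth radius of 3958.8 miles, and the rounded JFK and LAX coordinates are illustrative choices and do not come from the BTS glossary.

```r
# Great-circle distance in statute miles via the haversine formula.
haversine_miles <- function(lat1, lon1, lat2, lon2, radius_miles = 3958.8) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  # pmin() guards against rounding pushing sqrt(a) slightly above 1.
  2 * radius_miles * asin(pmin(1, sqrt(a)))
}

# Approximate, rounded coordinates for JFK and LAX (assumed values).
jfk_lax <- haversine_miles(40.64, -73.78, 33.94, -118.41)
print(round(jfk_lax))
```

A value computed this way depends only on the two endpoints, so it is the same for every flight on a route and says nothing about the path actually flown.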

weather.r requires revision

weather.r no longer runs correctly as the mesonet file format appears to have changed slightly. I modified the code and was planning to issue a pull request, but the time_hour value in the file is different. I presume this occurred because no tz was specified but I wanted to ask before

query about including both incoming and outgoing flights

I realize that there are serious space limitations on the size of packages on CRAN. But would it be feasible within those constraints to include both incoming and outgoing flights? This would allow a whole series of questions to be answered about differences between departures and arrivals at JFK, for example.

Download functions in /data-raw have old links

The links used in airlines, flights, and planes are no longer valid

Can be fixed immediately:

I'm working on fixing:

  • flights: bts.gov no longer uses the same URL format. Instead of the year and month being specified in the URL, there is now an input form, and all parameters are passed through it

Customizing airports and years with the groundcontrol package

I wrote a package groundcontrol that adapted the code in nycflights13 to allow the user to create a package like this one, but specify the airports, year, and whether they want to include flights to or from those airports. Would you have any interest in (a) including those functions inside this package, or (b) sharing a common codebase?

Hourly precipitation calculation in weather.R incorrect

When aggregated, the hourly precipitation numbers in the weather dataframe do not match official NOAA daily totals for the same locations. The calculation of hourly precipitation is somewhat involved. The issue is that hourly cumulative totals reset at 51 minutes (this is not invariably true, but it appears true for the 3 NY airports in 2013). My pull request #26 addresses this.

See this page for an example using ASOS data to match NOAA daily totals.
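
The reset behaviour described above can be sketched in base R as follows. This is not the code from pull request #26. It assumes each observation carries a running total since the last reset, that the reset happens right after the minute-51 report, and that the largest running total in a window is that window's precipitation. All observations are invented.

```r
# Invented observations: `precip` is the running total since the last reset.
obs <- data.frame(
  time = as.POSIXct(
    c("2013-01-01 09:51", "2013-01-01 10:15", "2013-01-01 10:40",
      "2013-01-01 10:51", "2013-01-01 11:20", "2013-01-01 11:51"),
    tz = "UTC"
  ),
  precip = c(0.00, 0.02, 0.05, 0.07, 0.01, 0.03)
)

# Shift by 8 minutes so that hh:52 through (hh+1):51 falls into one clock
# hour, then truncate to the hour to get a window label.
window <- format(obs$time + 8 * 60, "%Y-%m-%d %H:00", tz = "UTC")

# The largest running total in each window is taken as its precipitation.
hourly <- tapply(obs$precip, window, max)
daily_total <- sum(hourly)
print(hourly)
print(daily_total)
```

Summing the raw observations instead would count the partial totals at 10:15 and 10:40 again, which is one way an aggregate could drift from the official daily figure.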

add cancellation status?

The dataset is fabulous and works extremely well for teaching purposes. But would it be possible to add cancellation status? It appears as if cancelled flights are not included.

weather uses two timezones - not clear which matches flights

In weather, the time_hour variable is offset by five hours from the time displayed across the year, month, day, and hour variables.

It is not clear which time matches the times in flights (where year, month, day, hour, and time_hour all agree). Given the offset, it is possible that time_hour is in the America/New_York timezone and the other variables are in UTC.
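
A small base R sketch of how one instant reads differently in UTC and in America/New_York. It only illustrates the offset and makes no claim about how the weather table was actually built. The helper name and the example instants are hypothetical.

```r
# Hour-of-day difference between UTC and New York for one instant.
utc_minus_nyc_hours <- function(instant) {
  hour_utc <- as.integer(format(instant, "%H", tz = "UTC"))
  hour_nyc <- as.integer(format(instant, "%H", tz = "America/New_York"))
  hour_utc - hour_nyc
}

# Noon UTC in winter (EST) and in summer (EDT).
winter <- as.POSIXct("2013-01-01 12:00:00", tz = "UTC")
summer <- as.POSIXct("2013-07-01 12:00:00", tz = "UTC")

offset_winter <- utc_minus_nyc_hours(winter)
offset_summer <- utc_minus_nyc_hours(summer)
print(c(winter = offset_winter, summer = offset_summer))
```

Because the offset changes with daylight saving time, comparing a winter row and a summer row of weather is one way to tell a fixed five-hour shift from a true America/New_York conversion.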

Move `master` branch to `main`

The master branch of this repository will soon be renamed to main, as part of a coordinated change across several GitHub organizations (including, but not limited to: tidyverse, r-lib, tidymodels, and sol-eng). We anticipate this will happen by the end of September 2021.

That will be preceded by a release of the usethis package, which will gain some functionality around detecting and adapting to a renamed default branch. There will also be a blog post at the time of this master --> main change.

The purpose of this issue is to:

  • Help us firm up the list of targeted repositories
  • Make sure all maintainers are aware of what's coming
  • Give us an issue to close when the job is done
  • Give us a place to put advice for collaborators re: how to adapt

message id: euphoric_snowdog

query about including both incoming vs. outgoing flights

Is it feasible to include both incoming and outgoing flights in the nycflights13 package? I know that large packages are frowned upon by CRAN, but could an exception be made?

Is the problem one of a single large table? Could the "flights" table be split into two parts (and built together at package installation time)?

This might allow a version of the package on GitHub to include more cities and years without running into the "GitHub doesn't like files greater than 50MB" problem.

If you have general advice on providing access to larger datasets via R packages hosted on GitHub, I'd be all ears (and suspect that @beanumber and @rpruim would as well!).

Why is this code not showing any output?

flights_sml <- select(flights, year:day, ends_with("delay"), distance, air_time)

flights_sml <- select(flights, year:day, ends_with("delay"), distance, air_time)
mutate(flights_sml, gain = dep_delay - arr_delay, speed = distance/air_time * 60)

# A tibble: 336,776 × 9
    year month   day dep_delay arr_delay distance air_time  gain speed
 1  2013     1     1         2        11     1400      227    -9  370.
 2  2013     1     1         4        20     1416      227   -16  374.
 3  2013     1     1         2        33     1089      160   -31  408.
 4  2013     1     1        -1       -18     1576      183    17  517.
 5  2013     1     1        -6       -25      762      116    19  394.
 6  2013     1     1        -4        12      719      150   -16  288.
 7  2013     1     1        -5        19     1065      158   -24  404.
 8  2013     1     1        -3       -14      229       53    11  259.
 9  2013     1     1        -3        -8      944      140     5  405.
10  2013     1     1        -2         8      733      138   -10  319.
# ℹ 336,766 more rows
# ℹ Use print(n = ...) to see more rows

flights_sml <- select(flights, year:day, ends_with("delay"), distance, air_time)
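
One likely explanation is that an assignment in R returns its value invisibly, so a line that only assigns prints nothing, whereas the unassigned mutate() call above does print. A minimal base R sketch of that behaviour, using two rows of delay values from the output above and transform() as a stand-in for mutate():

```r
# Two rows of delay values taken from the output above.
delays <- data.frame(dep_delay = c(2, 4), arr_delay = c(11, 20))

# An assignment stores the result silently: nothing is printed here.
delays_gain <- transform(delays, gain = dep_delay - arr_delay)

# Printing has to be requested, either explicitly ...
print(delays_gain)

# ... or by wrapping the assignment in parentheses.
(delays_gain <- transform(delays, gain = dep_delay - arr_delay))
```

Typing the object's name on its own line at the console has the same effect as print().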
