
gapminder's Introduction

gapminder


This is a data package with an excerpt from the Gapminder data. The main object in this package is the gapminder data frame or "tibble". There are other goodies, such as the data in tab-delimited form, a larger unfiltered dataset, premade color schemes for the countries and continents, and ISO 3166-1 country codes. The primary use case is teaching and writing examples.

Installation

Install gapminder from CRAN:

install.packages("gapminder")

Quick look

Here we do a bit of data aggregation and plotting with the gapminder data:

library(gapminder)
library(dplyr)
library(ggplot2)

aggregate(lifeExp ~ continent, gapminder, median)
#>   continent lifeExp
#> 1    Africa 47.7920
#> 2  Americas 67.0480
#> 3      Asia 61.7915
#> 4    Europe 72.2410
#> 5   Oceania 73.6650

gapminder %>%
  filter(year == 2007) %>%
  group_by(continent) %>%
  summarise(lifeExp = median(lifeExp))
#> # A tibble: 5 × 2
#>   continent lifeExp
#>   <fct>       <dbl>
#> 1 Africa       52.9
#> 2 Americas     72.9
#> 3 Asia         72.4
#> 4 Europe       78.6
#> 5 Oceania      80.7

ggplot(gapminder, aes(x = continent, y = lifeExp)) +
  geom_boxplot(outlier.colour = "hotpink") +
  geom_jitter(position = position_jitter(width = 0.1, height = 0), alpha = 1 / 4)

For more, see the Get started vignette.
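Beyond the main data frame, the introduction mentions a larger unfiltered dataset, color schemes and ISO country codes. Here is a quick sketch of where to find them. It assumes the object names gapminder_unfiltered, country_colors, continent_colors and country_codes exported by current releases of the package:

```r
library(gapminder)

# The unfiltered version has more rows than the main data frame
nrow(gapminder)
nrow(gapminder_unfiltered)

# Named character vectors of hex colors, keyed by country / continent name
head(country_colors)
continent_colors

# ISO 3166-1 codes: columns country, iso_alpha (alpha-3) and iso_num (numeric)
head(country_codes)
```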

gapminder's People

Contributors

aammd, jennybc, rudeboybert, wibeasley


gapminder's Issues

Documenting the sources of the gapminder data

First, thanks for providing this dataset for use in R. It is something that I use increasingly for good examples of data vis and data science in my teaching.

I'm writing several different things relating to the gapminder data, which originally came from Hans Rosling's site, http://gapminder.org.

For stuff only in R, using the variables and the time points you selected, the documentation in the package is sufficient. But it is not sufficient if I want to discuss the historical development of the moving bubble chart and the data available now on http://gapminder.org, or at an earlier stage.

Could you please add a details section to the documentation describing how the current gapminder data was derived from the original sources? For example, you selected only the years 1952-2007 from ranges of years that went back, I think, to 1807. If there is an original source file you used, from which you then selected just the main variables you've included, you could document that too.

thanks,
-Michael

Possible enhancement: include ISO 3166-1 alpha-3 codes

Hello Jenny Bryan,

Thank you for creating this package.

I would like to know whether it is possible to include ISO 3166-1 alpha-3 codes for the countries. I use your package to experiment with and teach the tidyverse ecosystem.

I would also like to use your package to teach how to create maps, but the country names in gapminder::gapminder don't match well when merging with datasets that contain the geometries needed to draw maps.

For example, a country like the USA appears as "United States" in one dataset and "United States of America" in another, so rows are lost in the join. It is better to identify a country by a standard code:

library(tidyverse)
library(rnaturalearth)

gapminder::gapminder %>%
  filter(year == 2007) %>% 
  right_join(y = ne_countries(scale = 10,
                              type = "countries",
                              returnclass = "sf") %>%
               select(adm0_a3, admin, geometry),
             by = c("country" = "admin")) %>% 
  ggplot() +
  geom_sf(aes(geometry = geometry, fill = lifeExp))

Created on 2021-08-10 by the reprex package (v2.0.1)
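Current releases of gapminder ship ISO 3166-1 codes as the country_codes data frame, with columns country, iso_alpha and iso_num. Below is a minimal sketch of attaching these codes to the 2007 rows with base R's merge(). A spatial layer could then be joined on the code rather than on the country name:

```r
library(gapminder)

# Keep one row per country (the 2007 time point)
gap_2007 <- subset(gapminder, year == 2007)

# Attach ISO codes by country name; merge() converts both inputs to data frames
gap_iso <- merge(gap_2007, country_codes, by = "country")

# The alpha-3 code can now serve as the join key for a geometry layer
gap_iso[gap_iso$country == "United States", c("country", "iso_alpha", "lifeExp")]
```

With rnaturalearth, for example, the spatial join could use by.x = "iso_alpha" and by.y = "adm0_a3". Natural Earth's adm0_a3 codes may not coincide with ISO alpha-3 for every country, so a few countries could still need manual matching.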

Verify Kuwait GDP per capita

I came across this while playing around with the Software Carpentry lesson materials:

library(gapminder)
library(ggplot2)
ggplot(data = gapminder, aes(x = year, y = gdpPercap, group = country)) +
  geom_line(aes(color = continent)) +
  geom_line(data = gapminder[gapminder$country == "Kuwait", ], aes(color = country), linewidth = 2)

I suspect something wonky has happened with the gdpPercap data for Kuwait. I cannot reproduce this using data from gapminder.org. The data doesn't match "Income per person (GDP/capita, PPP$ inflation-adjusted)", whose trend suggests that the gdpPercap data may have been flipped over the time points in the dataset.
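To look at the suspect series directly, the sketch below pulls Kuwait's gdpPercap by year. It also prints the series reversed, so both orders can be compared by eye against the gapminder.org series. That series must be downloaded separately and is not shown here:

```r
library(gapminder)

# Kuwait's GDP per capita, one row per five-year time point
kuwait <- subset(gapminder, country == "Kuwait", select = c(year, gdpPercap))
kuwait

# The same values in reverse time order, to eyeball the "flipped" hypothesis
rev(kuwait$gdpPercap)
```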

Clarify licensing of Gapminder data and this package

@billymeinke found this during rrhack

Gapminder dataset terms of use (data is CC BY 3.0):
https://docs.google.com/document/pub?id=1POd-pBMc5vDXAmxrpGjPLaCSDSWuxX6FLQgq5DhlUhM

"Gapminder Content: All Content (other than computer software) owned by Gapminder and made available by Gapminder on the Websites or through the Services is licensed under the Creative Commons Attribution 3.0 Unported license, unless marked otherwise."

I need to determine if this changes the license on this package and/or wording about how Gapminder data is licensed.

Could there be errors in this data set?

Hello! I was pointed to this repo by this Twitter exchange:
https://twitter.com/R_Graph_Gallery/status/920074231269941248

I'm working on a lesson for my students, and took some inspiration from
https://python-graph-gallery.com/341-python-gapminder-animation/
... which uses your data.

A line plot of all life expectancies shows a dramatic drop for one country in 1977 and another in 1992. The first corresponds to Cambodia, but the value (31.2) is not consistent with the actual life expectancy in Cambodia during the crisis in the 70s, which was around 20 years!

Have a look at my draft:
http://go.gwu.edu/engcomp2lesson4

It's an unexecuted Jupyter notebook (we only push outputs once a lesson is finalized, to avoid diff bloat).

When I look at the text data in this repo, I find the same: Cambodia in 1977 = 31.2
However, various sources report a life expectancy there in 1977 that was < 20.
For example: https://data.worldbank.org/country/cambodia

The other dip is Rwanda in 1992 = 23.6
But the World Bank gives 28.1
https://data.worldbank.org/country/rwanda

So I wonder: did something go awry when preparing this data set?
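The sketch below confirms the values quoted above straight from the package and shows which country-year holds the overall minimum life expectancy. It only checks what the package contains, not whether the source figures are right:

```r
library(gapminder)

# The two country-years flagged in this issue
flagged <- subset(gapminder,
                  (country == "Cambodia" & year == 1977) |
                    (country == "Rwanda" & year == 1992),
                  select = c(country, year, lifeExp))
flagged

# The single lowest life expectancy anywhere in the data frame
gapminder[which.min(gapminder$lifeExp), c("country", "year", "lifeExp")]
```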

Source for the GDP data?

Hello, this is such an amazing resource! I've been hoping to extend the data forward in time. Your documentation is incredibly detailed, but the part covering the GDP data seems less so, presumably because of difficulties with the structure of that data. Do you happen to remember which of the many GDP time series from Gapminder you downloaded?

Move `master` branch to `main`

Cc @jennybc

The master branch of this repository will soon be renamed to main, as part of a coordinated change across several GitHub organizations (including, but not limited to: tidyverse, r-lib, tidymodels, and sol-eng). We anticipate this will happen by the end of September 2021.

That will be preceded by a release of the usethis package, which will gain some functionality around detecting and adapting to a renamed default branch. There will also be a blog post at the time of this master --> main change.

The purpose of this issue is to:

  • Help us firm up the list of targeted repositories
  • Make sure all maintainers are aware of what's coming
  • Give us an issue to close when the job is done
  • Give us a place to put advice for collaborators re: how to adapt

message id: entire_lizard

country outlines

I think it would be really cool to include an sf object with country outlines in gapminder. Right now we have to pull outlines from maptools, rnaturalearth or spData. It would be great to have gapminder data that can be plotted directly with geom_sf. Let me know if you support the idea and would like a PR.

README suggests data for 1955 is available in gapminder

The README says the following, but data for 1955 is not available in the gapminder data frame. I suggest correcting it to 1957:

Package contains two main data frames or tibbles:

  • gapminder: 12 rows for each country (1952, 1955, …, 2007). It’s a subset of …
  • gapminder_unfiltered: more lightly filtered and therefore about twice as many rows.
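The time points actually present can be checked directly. Here is a short sketch, assuming gapminder is installed:

```r
library(gapminder)

# Distinct years in the filtered data frame
sort(unique(gapminder$year))

# Rows per country; every country should have one row per time point
table(table(as.character(gapminder$country)))
```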

Facing an issue while loading gapminder.

Hello, I am new to R. I wanted to do some data visualization using the gapminder datasets. I was able to install the package without any issues, but I get this error when loading it:

Error in loadNamespace(j <- i[[1L]], c(lib.loc, .libPaths()), versionCheck = vI[[j]]) : there is no package called ‘pillar’
Error: package or namespace load failed for ‘gapminder’

I have also tried installing pillar separately, and got the following errors:

Error : object 'glue_collapse' is not exported by 'namespace:glue'
ERROR: lazy loading failed for package 'cli'
removing 'C:/Users/BabaAkk/Documents/R/win-library/3.3/cli'
Warning in install.packages :
  running command '"C:/PROGRA~1/R/R-33~1.3/bin/x64/R" CMD INSTALL -l "C:\Users\BabaAkk\Documents\R\win-library\3.3" C:\Users\BabaAkk\AppData\Local\Temp\RtmpUFQ9Dg/downloaded_packages/cli_2.0.2.tar.gz' had status 1
Warning in install.packages :
  installation of package ‘cli’ had non-zero exit status
ERROR: dependencies 'cli', 'vctrs' are not available for package 'pillar'
removing 'C:/Users/BabaAkk/Documents/R/win-library/3.3/pillar'
Warning in install.packages :
  running command '"C:/PROGRA~1/R/R-33~1.3/bin/x64/R" CMD INSTALL -l "C:\Users\BabaAkk\Documents\R\win-library\3.3" C:\Users\BabaAkk\AppData\Local\Temp\RtmpUFQ9Dg/downloaded_packages/pillar_1.4.3.tar.gz' had status 1
Warning in install.packages :
  installation of package ‘pillar’ had non-zero exit status

Could you please help?

Add the data in various untidy forms?

I'm watching @aammd present parts of a new SWC lesson based on Gapminder, specifically on tidying data. Maybe I should put some untidy versions of the data in the package itself, either as data frames or as delimited files?
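To illustrate one untidy form, base R's reshape() from the stats package can spread lifeExp into one column per year. This is a sketch of one possible wide layout, not a format the package is known to ship:

```r
library(gapminder)

# Keep the ID columns plus one measurement; reshape() is safest on a plain data frame
gap_long <- as.data.frame(gapminder[, c("country", "continent", "year", "lifeExp")])

# Wide (untidy) form: one row per country, columns lifeExp.1952 ... lifeExp.2007
gap_wide <- reshape(gap_long,
                    idvar = c("country", "continent"),
                    timevar = "year",
                    direction = "wide")
dim(gap_wide)
names(gap_wide)[1:4]

# A delimited-file version could then be written out; the file name is illustrative
# write.csv(gap_wide, "gapminder_wide_lifeExp.csv", row.names = FALSE)
```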

Can you add a "Cite this package" section in README?

I see a brief note at the end of the README that says that the package is under CC-BY. I also see under "Releases" that Zenodo integration is turned on. So I assume there is a DOI for this package.

It would be nice to have the complete citation (with year and DOI) so we don't have to dig around so much (I still can't find the DOI).

Thanks!
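Independent of the README, base R's citation() builds an entry for any installed package. It uses the package's CITATION file if there is one, and DESCRIPTION metadata otherwise, so whether a DOI appears depends on that metadata:

```r
# Citation entry for the installed package (CITATION file or DESCRIPTION fallback)
cit <- citation("gapminder")
print(cit)

# BibTeX form, e.g. for a reference manager
toBibtex(cit)
```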

Imputation of China at 1952 causes the pop and gdpPercap fields to be different than rest of values

When cleaning the gapminder data, an imputation is being done to fill missing values for China in 1952. The pop field ends up being a fraction (while all other pop are integers) and the gdpPercap has many decimal digits (while all other gdpPercap have 6).

As a result, gapminder does not survive a dput()/dget() round trip unchanged.

Reproducible code:

library(gapminder)
gDat <- gapminder
dput(gDat, "gdat.dput")
gDat2 <- dget("gdat.dput")
identical(gDat, gDat2)

The result is FALSE. But if we round the population to an integer and gdpPercap to 6 decimal places for China in 1952 (row 289), we get TRUE:

library(gapminder)
gDat <- gapminder
gDat[289, "pop"] <- round(gDat[289, "pop"])
gDat[289, "gdpPercap"] <- round(gDat[289, "gdpPercap"], 6)
dput(gDat, "gdat.dput")
gDat2 <- dget("gdat.dput")
identical(gDat, gDat2)
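Rounding the China 1952 values is one fix. Another is to make the round trip itself exact. By default, dput() writes doubles with 15 significant digits, so values that need more digits do not come back identical.

The sketch below uses a hypothetical one-row data frame and adds dput()'s "hexNumeric" control option, which writes doubles in exact hexadecimal form. It assumes R 3.5.0 or later:

```r
# Hypothetical stand-in for the imputed values: a double needing > 15 significant digits
df <- data.frame(pop = 1 / 3)

# Default control: doubles are deparsed to 15 significant digits, so this is FALSE
f_default <- tempfile(fileext = ".dput")
dput(df, f_default)
identical(df, dget(f_default))

# dput()'s default options plus "hexNumeric": doubles are written exactly, so TRUE
f_exact <- tempfile(fileext = ".dput")
dput(df, f_exact,
     control = c("keepNA", "keepInteger", "niceNames", "showAttributes", "hexNumeric"))
identical(df, dget(f_exact))
```

The trade-off is readability: the hexadecimal file is exact but harder to eyeball than the decimal default.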

data after 2007

Hi,

May I ask whether there is any plan to update the data through 2018? The data in the package still ends in 2007.

Thank you

Move China 1952 data imputation into separate script and document the how

Data imputation seems to me like a fairly fundamental manipulation of a dataset, so where and how it is done should be as transparent as possible. Right now this step is lumped together with the standardization to every five years in 07_filter-every-five-years, and the file name gives no hint that this is where to find it.

Also, the imputation steps, while evident from studying the code, don't seem to be summarized anywhere in plain English (although I may well be missing where they are). I'd suggest adding that summary at the top of a stand-alone script for this step.

Upkeep for gapminder

Pre-history

  • usethis::use_readme_rmd()
  • usethis::use_roxygen_md()
  • usethis::use_github_links()
  • usethis::use_pkgdown_github_pages()
  • usethis::use_tidy_github_labels()
  • usethis::use_tidy_style()
  • usethis::use_tidy_description()
  • urlchecker::url_check()

2020

  • usethis::use_package_doc()
    Consider letting usethis manage your @importFrom directives here.
    usethis::use_import_from() is handy for this.
  • usethis::use_testthat(3) and upgrade to 3e, testthat 3e vignette
  • Align the names of R/ files and test/ files for workflow happiness.
    The docs for usethis::use_r() include a helpful script.
    usethis::rename_files() may be useful.

2021

  • usethis::use_tidy_dependencies()
  • usethis::use_tidy_github_actions() and update artisanal actions to use setup-r-dependencies
  • Remove check environments section from cran-comments.md
  • Bump required R version in DESCRIPTION to 3.5
  • Use lifecycle instead of artisanal deprecation messages, as described in Communicate lifecycle changes in your functions

2022

2023

Necessary:

Optional:

  • Review 2022 checklist to see if you completed the pkgdown updates
  • Prefer pak::pak("org/pkg") over devtools::install_github("org/pkg") in README
  • Consider running use_tidy_dependencies() and/or replace compat files with use_standalone()
  • use_standalone("r-lib/rlang", "types-check") instead of home grown argument checkers
  • Add alt-text to pictures, plots, etc; see https://posit.co/blog/knitr-fig-alt/ for examples

Plot circles incorrectly scaled?

The README.Rmd has this code for a ggplot2 plot:

ggplot(subset(gapminder, year == 2007 & continent != "Oceania"),
       aes(x = gdpPercap, y = lifeExp)) +
  scale_x_log10(limits = c(150, 115000)) + ylim(c(16, 96)) +
  geom_point(aes(size = sqrt(pop/pi)), pch = 21, color = 'grey20',
             show.legend = FALSE) + scale_size_continuous(range = c(1, 40)) +
  facet_wrap(~ continent) + coord_fixed(ratio = 1/43) +
  aes(fill = country) + scale_fill_manual(values = country_colors) +
  theme_bw() + theme(strip.text = element_text(size = rel(1.1)))

I think the aes(size = sqrt(pop/pi)) should be aes(size = pop), and scale_size_continuous(range=c(1,40)) should be scale_size_area(max_size=40).

Also, color = 'grey20' should be unnecessary, shouldn't it?
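Here is a sketch of the plot with the two changes suggested above: size mapped to pop, and scale_size_area(max_size = 40). ggplot2's size scales already map values to point area, so taking sqrt() first compresses the differences twice. scale_size_area() additionally maps a population of 0 to size 0. The grey20 border is kept as in the original:

```r
library(ggplot2)
library(gapminder)

p <- ggplot(subset(gapminder, year == 2007 & continent != "Oceania"),
            aes(x = gdpPercap, y = lifeExp)) +
  scale_x_log10(limits = c(150, 115000)) +
  ylim(c(16, 96)) +
  # Circle area proportional to population; fill comes from the package's palette
  geom_point(aes(size = pop, fill = country), pch = 21, color = "grey20",
             show.legend = FALSE) +
  scale_size_area(max_size = 40) +
  scale_fill_manual(values = country_colors) +
  facet_wrap(~ continent) +
  coord_fixed(ratio = 1 / 43) +
  theme_bw() +
  theme(strip.text = element_text(size = rel(1.1)))
p
```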

foo

library(ggplot2)
library(gapminder)
ggplot(subset(gapminder, continent != "Oceania"),
       aes(x = year, y = lifeExp, group = country, color = country)) +
  geom_line(lwd = 1, show.legend = FALSE) + facet_wrap(~ continent) +
  scale_color_manual(values = country_colors) +
  theme_bw() + theme(strip.text = element_text(size = rel(1.1)))

Created on 2018-02-06 by the reprex package (v0.2.0).

Clear warnings from readr

When I revisit data cleaning, deal with the integer-ness (or lack thereof) of pop.

I thought my only problem was with the imputed China value. See #4.

But I just did this, which shows my work is not yet done:

> library(readr)
> gap_tsv <- read_delim(gap_tsv, "\t")
Warning message:
11 problems parsing '/Users/jenny/resources/R/libraryCRAN/gapminder/gapminder.tsv'. See problems(...) for more details. 
> problems(gap_tsv)
Source: local data frame [11 x 4]

   row col               expected  actual
1  289   5 no trailing characters .999989
2  697   5 no trailing characters .72e+08
3  698   5 no trailing characters .09e+08
4  699   5 no trailing characters .54e+08
5  700   5 no trailing characters .06e+08
6  701   5 no trailing characters .67e+08
7  702   5 no trailing characters .34e+08
8  703   5 no trailing characters .08e+08
9  704   5 no trailing characters .88e+08
10 705   5 no trailing characters .72e+08
11 706   5 no trailing characters .59e+08
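One way to silence these warnings on the reading side is to declare pop as double instead of letting readr guess integer. This assumes readr is installed. The fallback path covers releases that installed the TSV at the package root, as in the transcript above; the extdata/ location for newer releases is an assumption:

```r
library(readr)

# Locate the TSV shipped with gapminder (extdata/ assumed, package root as fallback)
gap_tsv <- system.file("extdata", "gapminder.tsv", package = "gapminder")
if (!nzchar(gap_tsv)) gap_tsv <- system.file("gapminder.tsv", package = "gapminder")

# Declare pop as double so values like 1.72e+08 or a fractional imputed value
# are not rejected by an integer guess; other columns are still guessed
gap <- read_tsv(gap_tsv, col_types = cols(pop = col_double()))
problems(gap)
```

This only hides the symptom when reading. Making pop genuinely integer during data cleaning, as planned above, is the more thorough fix.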

Release gapminder 1.0.0

Prepare for release:

  • git pull
  • Check current CRAN check results
  • Polish NEWS
  • urlchecker::url_check()
  • devtools::build_readme()
  • devtools::check(remote = TRUE, manual = TRUE)
  • devtools::check_win_devel()
  • revdepcheck::revdep_check(num_workers = 4)
  • Update cran-comments.md
  • git push

Submit to CRAN:

  • usethis::use_version('major')
  • devtools::submit_cran()
  • Approve email

Wait for CRAN...

  • Accepted 🎉
  • usethis::use_github_release()
  • usethis::use_dev_version(push = TRUE)
  • Finish blog post
  • Tweet
  • Add link to blog post in pkgdown news menu
