
stefan-schroedl / plotluck

Stars: 49 | Watchers: 4 | Forks: 9 | Size: 1.15 MB

R tool for automated creation of ggplots. Examines one, two, or three variables and creates, based on their characteristics, a scatter, violin, box, bar, density, hex or spine plot, or a heat map. Also automates handling of observation weights, log-scaling of axes, reordering of factor levels, and overlays of smoothing curves and median lines.

License: Other

R 100.00%
r ggplot2 data-visualization plot heatmap violin-plot spine-plot scatter-plot box-plot boxplot density-plot visualization hexbin scatter cleveland automatic

plotluck's Introduction

ggplot2 version of "I'm Feeling Lucky!"

Purpose

For exploratory data analysis in R, let users focus on what to plot, not how.

Installation

To install the latest development branch:

install.packages('devtools')  # provides install_github()
devtools::install_github("stefan-schroedl/plotluck")

Motivation

Imagine you have been given a new R data frame and would like to get an overview of the distributions of its columns, or see how each column interacts with a specific target column. Typically, you would have to go through each column and create a 1D or 2D plot depending on its type (e.g., a scatter plot for two numeric variables, or a box plot for one factor and one numeric variable). After looking at it, you might realize that outliers make it hard to see most of the data, so you plot it again with a logarithmic axis transform. Or, in the case of a box plot with many factor levels, you might want to sort the levels by the y-value first.

Plotluck is a tool for exploratory data visualization in R that automates such steps. It creates complete graphics based on ggplot2; the only things that have to be specified are the data frame, a formula, and optionally a weight column.
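The manual decisions described above can be sketched in base R. This is a hypothetical simplification: the function name `choose_plot_type` and its rules are illustrative assumptions, not plotluck's actual logic (which also chooses among violin, hex and spine plots and takes data size into account). It maps column classes to a plot type, then sorts factor levels by the median of the y-variable with `stats::reorder()`.

```r
# Hypothetical sketch: choose a plot type from the classes of one or two
# columns. Illustrative only; not plotluck's actual decision rules.
choose_plot_type <- function(x, y = NULL) {
  if (is.null(y)) {
    # One variable: show its distribution
    return(if (is.numeric(x)) "density" else "bar")
  }
  if (is.numeric(x) && is.numeric(y)) return("scatter")
  if (is.numeric(x) != is.numeric(y)) return("box")  # one factor, one numeric
  "heat map"                                         # two factors
}

mt <- datasets::mtcars
choose_plot_type(mt$mpg)                           # "density"
choose_plot_type(mt$wt, mt$mpg)                    # "scatter"
choose_plot_type(factor(mt$cyl), mt$mpg)           # "box"
choose_plot_type(factor(mt$cyl), factor(mt$gear))  # "heat map"

# Sort factor levels by the median of the y-variable before drawing a box
# plot, so that levels appear in order of increasing median mpg.
cyl_sorted <- stats::reorder(factor(mt$cyl), mt$mpg, FUN = median)
levels(cyl_sorted)                                 # "8" "6" "4"
```

Sorting by the median rather than the mean keeps the order robust to the same outliers that motivate log-scaling.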

Example

library(plotluck)
data(diamonds, package='ggplot2')
plotluck(diamonds, price~cut+color)

(Figure: output of plotluck(diamonds, price~cut+color).)

Features

  • Automatic determination of the type of plot, based on the data types of the columns. Supports scatter, box, violin, bar, density, hex, and spine plots, as well as heat maps.
  • Overlays of smoothing curves and median lines.
  • Automatic reordering of factor levels according to the dependent variable.
  • Automatic axis scaling (logarithmic or log-modulus) when appropriate.
  • Correct handling and visualization of instance weights.
  • Support for missing values in factors.
  • If the data set is too large to plot, sampling is applied.
  • You can also create a grid of plots:
    • Distribution of each column in a data frame;
    • One target column against all others, ordering plots by degree of dependency (conditional entropy);
    • All pairs of columns.
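Two of the techniques in this list can be sketched in base R. The log-modulus transform `sign(x) * log(1 + |x|)` extends the logarithm to zero and negative values; the dependency ranking computes the conditional entropy H(target | column) on discretized columns and sorts ascending, so the most informative columns come first. The helper names, the quantile binning into 5 bins, and the use of bits are assumptions for illustration; plotluck's own binning and estimator may differ.

```r
# Log-modulus transform: sign(x) * log(1 + |x|). Unlike log(), it is defined
# for zero and negative values, and it is strictly increasing.
log_modulus <- function(x) sign(x) * log1p(abs(x))

# Hypothetical helper: turn a column into a factor. Numeric columns with more
# than `bins` distinct values are cut at their quantiles; others become factors.
discretize <- function(v, bins = 5) {
  if (!is.numeric(v) || length(unique(v)) <= bins) return(factor(v))
  probs <- seq(0, 1, length.out = bins + 1)
  breaks <- unique(stats::quantile(v, probs = probs, na.rm = TRUE))
  cut(v, breaks = breaks, include.lowest = TRUE)
}

# Conditional entropy H(Y | X) in bits, from the empirical joint distribution:
# H(Y | X) = -sum over (x, y) of p(x, y) * log2(p(x, y) / p(x)).
cond_entropy <- function(y, x) {
  p_xy <- prop.table(table(x, y))      # rows index x, columns index y
  p_x <- rowSums(p_xy)                 # marginal p(x)
  contrib <- p_xy * log2(p_xy / p_x)   # p_x recycles down columns, i.e. per row
  -sum(contrib[p_xy > 0])              # empty cells contribute 0
}

# Hypothetical helper: rank all other columns by H(target | column);
# lower values mean the column tells more about the target.
rank_by_dependency <- function(data, target, bins = 5) {
  y <- discretize(data[[target]], bins)
  others <- setdiff(names(data), target)
  h <- vapply(others,
              function(col) cond_entropy(y, discretize(data[[col]], bins)),
              numeric(1))
  sort(h)
}

round(log_modulus(c(-100, -1, 0, 1, 100)), 3)
rank_by_dependency(datasets::mtcars, "mpg")
```

Columns with more distinct levels tend to receive lower conditional entropy, so a fair comparison needs a comparable number of bins per column.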

What plotluck is not built for

Plotluck is designed for generic out-of-the-box plotting and is not suitable for producing the more specialized types of plots that arise in specific application domains (e.g., association, stem-and-leaf, and star plots, or geographic maps). It is restricted to at most three variables. Parallel plots with variables on different scales (such as time series of multiple related signals) are not supported.

Learn More

You can find more examples in tests/testthat/test_plotluck.R.

More background is given in the vignette.

plotluck's People

Contributors

cregouby, stefan-schroedl


plotluck's Issues

Requires R >= 3.3.1

Is this an actual requirement (i.e., does the package really need a feature introduced in this version), or will it also run on older versions, e.g., 3.1?

Consider swapping order of formula, data args of plotluck for easier piping

I realise that having fn(formula, data) is traditional, and follows the pattern of many functions in the stats and lattice packages.

However, many modern packages use data as the first argument (notably ggplot2 and the tidyverse packages). This makes it easier to pipe commands using magrittr. For example, I can do

library(ggplot2)
library(magrittr)

iris %>%
  ggplot(aes(Petal.Length, Species)) +
  geom_point()

Since the data argument isn't first, piping isn't as clean with plotluck.

It would be really useful if you could either

  1. swap the order of formula and data arguments in the plotluck function, or
  2. if you don't want to break backwards compatibility, introduce a new function (luckplot?) which does the same thing but has the arguments reversed.
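Option 2 can be prototyped generically without touching the original function. The sketch below uses a hypothetical helper `data_first()` that wraps any formula-first function accepting a named `data` argument. It is demonstrated on `stats::lm` and `stats::aggregate` rather than on plotluck, and uses R's native pipe `|>` (R >= 4.1.0), which, like magrittr's `%>%`, inserts the left-hand side as the first argument.

```r
# Hypothetical helper: wrap a formula-first function `f` so that the data
# frame becomes the first argument. Assumes `f` accepts a named `data` argument.
data_first <- function(f) {
  force(f)
  function(data, formula, ...) f(formula, data = data, ...)
}

lm_df  <- data_first(stats::lm)
agg_df <- data_first(stats::aggregate)

# The native pipe passes the left-hand side as the first argument.
fit   <- datasets::mtcars |> lm_df(mpg ~ wt)
means <- datasets::iris |> agg_df(Petal.Length ~ Species, FUN = mean)

coef(fit)
means
```

A wrapper like this keeps backwards compatibility, at the cost of a second name for the same functionality.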

ggplot2 3.3.4 deprecation message: The `<scale>` argument of `guides()` cannot be `FALSE`. Use "none" instead as of ggplot2 3.3.4.

Current behavior

The `<scale>` argument of `guides()` cannot be `FALSE`. Use "none" instead as of ggplot2 3.3.4.
ℹ The deprecated feature was likely used in the plotluck package.

Expected behavior

No such warning.

ReprEx

plotluck::plotluck(mtcars, mpg~.)
#> Warning: The `<scale>` argument of `guides()` cannot be `FALSE`. Use "none" instead as
#> of ggplot2 3.3.4.
#> ℹ The deprecated feature was likely used in the plotluck package.
#>   Please report the issue at
#>   <https://github.com/stefan-schroedl/plotluck/issues>.
#> Warning: The dot-dot notation (`..scaled..`) was deprecated in ggplot2 3.4.0.
#> ℹ Please use `after_stat(scaled)` instead.
#> ℹ The deprecated feature was likely used in the plotluck package.
#>   Please report the issue at
#>   <https://github.com/stefan-schroedl/plotluck/issues>.
#> `geom_smooth()` using method = 'loess' and formula = 'y ~ x'
#> `geom_smooth()` using method = 'loess' and formula = 'y ~ x'
#> `geom_smooth()` using method = 'loess' and formula = 'y ~ x'
#> Warning in simpleLoess(y, x, w, span, degree = degree, parametric =
#> parametric, : pseudoinverse used at 4
#> Warning in simpleLoess(y, x, w, span, degree = degree, parametric =
#> parametric, : neighborhood radius 2
#> Warning in simpleLoess(y, x, w, span, degree = degree, parametric =
#> parametric, : reciprocal condition number 1.8444e-17
#> Warning in predLoess(object$y, object$x, newx = if
#> (is.null(newdata)) object$x else if (is.data.frame(newdata))
#> as.matrix(model.frame(delete.response(terms(object)), : pseudoinverse used at 4
#> Warning in predLoess(object$y, object$x, newx = if
#> (is.null(newdata)) object$x else if (is.data.frame(newdata))
#> as.matrix(model.frame(delete.response(terms(object)), : neighborhood radius 2
#> Warning in predLoess(object$y, object$x, newx = if
#> (is.null(newdata)) object$x else if (is.data.frame(newdata))
#> as.matrix(model.frame(delete.response(terms(object)), : reciprocal condition
#> number 1.8444e-17
#> `geom_smooth()` using method = 'loess' and formula = 'y ~ x'
#> `geom_smooth()` using method = 'loess' and formula = 'y ~ x'
#> `geom_smooth()` using method = 'loess' and formula = 'y ~ x'

Created on 2023-01-07 by the reprex package (v2.0.1)
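The change the warning asks for looks like this in plain ggplot2. This illustrates the API change only; it is not a patch to plotluck's source, whose `guides()` calls are not shown here. Passing the string "none" suppresses the legend for one aesthetic, either in `guides()` or via a scale's `guide` argument.

```r
library(ggplot2)

# Deprecated since ggplot2 3.3.4 (triggers the warning above):
#   guides(colour = FALSE)
# Replacement: the string "none" removes the legend for that aesthetic.
p <- ggplot(mtcars, aes(wt, mpg, colour = factor(cyl))) +
  geom_point() +
  guides(colour = "none")

# Equivalent at the scale level:
p2 <- ggplot(mtcars, aes(wt, mpg, colour = factor(cyl))) +
  geom_point() +
  scale_colour_discrete(guide = "none")
```

To hide every legend at once, `theme(legend.position = "none")` is the usual alternative.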

ggplot2 3.4.0 deprecation warning: The dot-dot notation (`..scaled..`) was deprecated in ggplot2 3.4.0. Please use `after_stat(scaled)` instead.

Current behavior

When running the excellent {plotluck} functions on basic datasets, I get:

 The dot-dot notation (`..scaled..`) was deprecated in ggplot2 3.4.0. 
Please use `after_stat(scaled)` instead.

Expected behavior

No such warning.

ReprEx

plotluck::plotluck(mtcars, mpg~.)
#> (Output identical to the reprex in the previous issue: the same two deprecation warnings, geom_smooth() messages and loess warnings.)

Created on 2023-01-07 by the reprex package (v2.0.1)
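Likewise, the `..scaled..` notation maps to `after_stat(scaled)`, available since ggplot2 3.3.0. The sketch below is a standalone ggplot2 example, not plotluck's code; `scaled` is the density estimate computed by `stat_density()`, rescaled so that its maximum is 1.

```r
library(ggplot2)

# Deprecated since ggplot2 3.4.0:
#   geom_density(aes(y = ..scaled..))
# Replacement: after_stat() refers to a variable computed by the stat.
p <- ggplot(mtcars, aes(mpg)) +
  geom_density(aes(y = after_stat(scaled)))

# `scaled` is the density divided by its maximum, so the curve peaks at 1.
max(layer_data(p)$y)
```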
