has2k1 / plotnine

A Grammar of Graphics for Python

Home Page: https://plotnine.org

License: MIT License

Makefile 0.17% Python 99.83%

plotting grammar graphics python data-analysis

plotnine's Issues

Create gallery

Examples of extensions:

https://github.com/sphinx-doc/sphinx/tree/master/sphinx/ext

A gallery extension

https://github.com/sphinx-gallery/sphinx-gallery

A custom gallery extension
https://github.com/mwaskom/seaborn/blob/master/doc/sphinxext/plot_generator.py builds https://github.com/mwaskom/seaborn/tree/master/examples which results in https://seaborn.pydata.org/examples/index.html

Styling
http://matplotlib.org/devdocs/tutorials/index.html

Will have to come up with a custom solution.

Boxplot fails for categories that have only one sample

When a boxplot maps a continuous value against a categorical x-axis, it fails if any category has only one sample in the data frame.

For example, this works fine:

import numpy as np
import pandas as pd
from plotnine import ggplot, aes, geom_boxplot

df = pd.DataFrame(
    {
        'weight': np.random.normal(size=20),
        # Creating two categories, one with 18 samples, one with 2 samples
        'category': pd.Categorical(18 * [0] + 2 * [1], categories=[0,1], ordered=True)
    }
)

(
    ggplot(df, aes(x='category', y='weight'))
    + geom_boxplot()
)

Producing the following plot:

However, this example fails:

df = pd.DataFrame(
    {
        'weight': np.random.normal(size=20),
        # Creating two categories, one with 19 samples, one with 1 sample
        'category': pd.Categorical(19 * [0] + 1 * [1], categories=[0,1], ordered=True)
    }
)

(
    ggplot(df, aes(x='category', y='weight'))
    + geom_boxplot()
)

Below is the trace from the error:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
~/.virtualenvs/pandas/lib/python3.6/site-packages/IPython/core/formatters.py in __call__(self, obj)
    691                 type_pprinters=self.type_printers,
    692                 deferred_pprinters=self.deferred_printers)
--> 693             printer.pretty(obj)
    694             printer.flush()
    695             return stream.getvalue()

~/.virtualenvs/pandas/lib/python3.6/site-packages/IPython/lib/pretty.py in pretty(self, obj)
    378                             if callable(meth):
    379                                 return meth(obj, self, cycle)
--> 380             return _default_pprint(obj, self, cycle)
    381         finally:
    382             self.end_group()

~/.virtualenvs/pandas/lib/python3.6/site-packages/IPython/lib/pretty.py in _default_pprint(obj, p, cycle)
    493     if _safe_getattr(klass, '__repr__', None) is not object.__repr__:
    494         # A user-provided repr. Find newlines and replace them with p.break_()
--> 495         _repr_pprint(obj, p, cycle)
    496         return
    497     p.begin_group(1, '<')

~/.virtualenvs/pandas/lib/python3.6/site-packages/IPython/lib/pretty.py in _repr_pprint(obj, p, cycle)
    691     """A pprint that just redirects to the normal repr function."""
    692     # Find newlines and replace them with p.break_()
--> 693     output = repr(obj)
    694     for idx,output_line in enumerate(output.splitlines()):
    695         if idx:

~/.virtualenvs/pandas/lib/python3.6/site-packages/plotnine/ggplot.py in __repr__(self)
     81         Print/show the plot
     82         """
---> 83         self.draw()
     84         plt.show()
     85         return '<ggplot: (%d)>' % self.__hash__()

~/.virtualenvs/pandas/lib/python3.6/site-packages/plotnine/ggplot.py in draw(self)
    138         # assign a default theme
    139         self = deepcopy(self)
--> 140         self._build()
    141 
    142         # If no theme we use the default

~/.virtualenvs/pandas/lib/python3.6/site-packages/plotnine/ggplot.py in _build(self)
    234 
    235         # Apply and map statistics
--> 236         layers.compute_statistic(layout)
    237         layers.map_statistic(self)
    238 

~/.virtualenvs/pandas/lib/python3.6/site-packages/plotnine/layer.py in compute_statistic(self, layout)
     92     def compute_statistic(self, layout):
     93         for l in self:
---> 94             l.compute_statistic(layout)
     95 
     96     def map_statistic(self, plot):

~/.virtualenvs/pandas/lib/python3.6/site-packages/plotnine/layer.py in compute_statistic(self, layout)
    369         data = self.stat.use_defaults(data)
    370         data = self.stat.setup_data(data)
--> 371         data = self.stat.compute_layer(data, params, layout)
    372         self.data = data
    373 

~/.virtualenvs/pandas/lib/python3.6/site-packages/plotnine/stats/stat.py in compute_layer(cls, data, params, layout)
    194             return cls.compute_panel(pdata, pscales, **params)
    195 
--> 196         return groupby_apply(data, 'PANEL', fn)
    197 
    198     @classmethod

~/.virtualenvs/pandas/lib/python3.6/site-packages/plotnine/utils.py in groupby_apply(df, cols, func, *args, **kwargs)
    615         # do not mark d as a slice of df i.e no SettingWithCopyWarning
    616         d.is_copy = None
--> 617         lst.append(func(d, *args, **kwargs))
    618     return pd.concat(lst, axis=axis, ignore_index=True)
    619 

~/.virtualenvs/pandas/lib/python3.6/site-packages/plotnine/stats/stat.py in fn(pdata)
    192                 return pdata
    193             pscales = layout.get_scales(pdata['PANEL'].iat[0])
--> 194             return cls.compute_panel(pdata, pscales, **params)
    195 
    196         return groupby_apply(data, 'PANEL', fn)

~/.virtualenvs/pandas/lib/python3.6/site-packages/plotnine/stats/stat.py in compute_panel(cls, data, scales, **params)
    221         for _, old in data.groupby('group'):
    222             old.is_copy = None
--> 223             new = cls.compute_group(old, scales, **params)
    224             unique = uniquecols(old)
    225             missing = unique.columns.difference(new.columns)

~/.virtualenvs/pandas/lib/python3.6/site-packages/plotnine/stats/stat_boxplot.py in compute_group(cls, data, scales, **params)
     69         labels = ['x', 'y']
     70         X = np.array(data[labels])
---> 71         res = boxplot_stats(X, whis=params['coef'], labels=labels)[1]
     72         try:
     73             n = data['weight'].sum()

~/.virtualenvs/pandas/lib/python3.6/site-packages/matplotlib/cbook.py in boxplot_stats(X, whis, bootstrap, labels, autorange)
   1998         labels = repeat(None)
   1999     elif len(labels) != ncols:
-> 2000         raise ValueError("Dimensions of labels and X must be compatible")
   2001 
   2002     input_whis = whis

ValueError: Dimensions of labels and X must be compatible

This can be worked around by removing the classes with only a single sample from the data frame and overlaying only those with a geom_point(), as there is no interesting boxplot for them anyway, but it's a bit of a hassle and would be nicer if it just worked.
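The workaround described above could be sketched roughly as follows. This is a minimal sketch, not plotnine's own fix: `split_by_group_size` and `boxplot_with_singletons` are hypothetical helper names, and the plotting helper assumes plotnine is installed and that passing `data=` to `geom_point` is enough to draw the leftover single-sample categories.

```python
import numpy as np
import pandas as pd


def split_by_group_size(df, column, min_size=2):
    """Split df into rows whose category has at least min_size samples,
    and rows whose category has fewer (hypothetical helper)."""
    counts = df[column].value_counts()
    big_enough = counts[counts >= min_size].index
    mask = df[column].isin(big_enough)
    return df[mask], df[~mask]


def boxplot_with_singletons(df, x, y):
    """Boxplots for well-populated categories, points for the rest.

    plotnine is imported here so the data helper above can be used
    (and checked) without it.
    """
    from plotnine import ggplot, aes, geom_boxplot, geom_point

    multi, single = split_by_group_size(df, x)
    return (
        ggplot(multi, aes(x=x, y=y))
        + geom_boxplot()
        + geom_point(data=single)
    )


# Same shape as the failing example: 19 samples in one category, 1 in the other
df = pd.DataFrame(
    {
        "weight": np.random.normal(size=20),
        "category": pd.Categorical(19 * [0] + 1 * [1], categories=[0, 1], ordered=True),
    }
)
multi, single = split_by_group_size(df, "category")
print(len(multi), len(single))
```

Keeping the split in its own function makes it easy to inspect which categories were pulled out before anything is plotted.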

Can you demonstrate a method to add a watermark to an image?

Here are a couple of examples (images not included): one with the watermark in the bottom left corner, and one with it across the middle.

Putting tick labels in LaTeX-formatted scientific notation?

Is there a way to put tick labels in real scientific notation using LaTeX formatting? I want something like 1e7 to appear as a 10 with the superscript exponent 7.
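One possible approach, sketched here under assumptions: matplotlib renders text enclosed in dollar signs with its mathtext engine, so a label such as `$10^{7}$` is drawn as 10 with a superscript 7. The functions below only build such strings; `sci_label` and `sci_labels` are hypothetical names, and whether a given plotnine version accepts a callable like `sci_labels` for a scale's `labels` argument is an assumption to verify.

```python
import math


def sci_label(value):
    """Format one number as a mathtext string in scientific notation."""
    if value == 0:
        return "$0$"
    exponent = math.floor(math.log10(abs(value)))
    mantissa = value / 10 ** exponent
    if math.isclose(mantissa, 1.0):
        # Exact power of ten: drop the mantissa entirely
        return f"$10^{{{exponent}}}$"
    return f"${mantissa:g} \\times 10^{{{exponent}}}$"


def sci_labels(breaks):
    """Format a list of break values (hypothetical labels callable)."""
    return [sci_label(b) for b in breaks]


print(sci_labels([1e7, 2500, 0]))
```

If the scale does accept a callable, something like `scale_y_continuous(labels=sci_labels)` would be the place to try it.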

Add geom_map geometry for plotting geographic maps

ggplot2 has geom_map for plotting geography. I can imagine a nice interface with geopandas, which has gpd.GeoDataFrame.plot() through matplotlib. Here's a demo from the docs:

  ggplot(crimesm, aes(map_id = state)) +
    geom_map(aes(fill = value), map = states_map) +
    expand_limits(x = states_map$long, y = states_map$lat) +
    facet_wrap( ~ variable)

when the default scale could be either discrete or continuous

I have a request to support a frequent use case. (Apologies if I have just missed the way to do this, I looked!) There exist cases where a column could reasonably map to either a discrete or a continuous scale, and I would like a user option to specify which one to use.

A common example is a data column that is integer type (defaults to continuous scale), but actually only takes on a handful of values in practice (meaning it could be reasonably mapped to a discrete scale). There is not an exact number of distinct values that can differentiate whether or not the integer variable should be treated as discrete or continuous, it ultimately depends on the user's need. From what I can see, plotnine assigns the default scale based on the pandas data type in the data frame, and thus the only way to change the default behavior is to mutate the data frame itself before creating the plot object. Obviously, mutating the dataframe works fine as a workaround.

Here is my example:

import random
import pandas as pd
from plotnine import ggplot, aes, geom_text, geom_line

df = pd.DataFrame({'a': [random.uniform(0,1) for i in range(15)], 
                   'b': [random.uniform(0,1) for i in range(15)], 
                   'c': [random.randint(0,5) for i in range(15)]
})

ggplot(df, aes('a', 'b', color='c')) + geom_text(aes(label='c')) + geom_line(aes(group='c'))

FYI: in R::ggplot2 I would accomplish this by casting the variable to a factor in the aes:

ggplot(df, aes(a, b, color=factor(c))) + ...

In case it isn't clear, this matters to me because it affects how I can manipulate the color scale. For example, when the integer code is binary, the default behavior assigns the colors purple and yellow (the extremes of the 'viridis' color scale). Yellow is an unfortunately hard-to-see default, and changing it is harder than it seems it should be.
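The data-frame workaround mentioned in this request can be kept out of the original data by converting a copy just before plotting. This is a minimal sketch: `as_discrete` is a hypothetical helper name, not a plotnine function. Another report on this page passes `color='factor(gear)'` as an aesthetic string, which suggests an in-aes cast may also be possible, but whether it was available in the reporter's version is an assumption.

```python
import pandas as pd


def as_discrete(df, column):
    """Return a copy of df with `column` converted to a pandas Categorical.

    The original data frame is left untouched.
    """
    return df.assign(**{column: pd.Categorical(df[column])})


df = pd.DataFrame(
    {
        "a": [0.1, 0.5, 0.9, 0.4],
        "b": [0.3, 0.2, 0.8, 0.6],
        "c": [0, 1, 0, 1],  # integer codes that should be treated as discrete
    }
)
discrete_df = as_discrete(df, "c")
print(df["c"].dtype, discrete_df["c"].dtype)
```

Passing `discrete_df` to `ggplot` instead of `df` should then select a discrete color scale by default, since the column is no longer an integer type.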

How does this package differ from ggpy?

Can you please give a quick overview of how plotnine differs from ggpy, and what are the pros and cons of each approach?

Cannot map to computed aesthetics if they are part of a larger expression

import pandas as pd
import numpy as np
from plotnine import *

df = pd.DataFrame({'x': [1, 2, 2, 3, 3, 3, 4, 4, 4, 4]})
ggplot(df) + geom_bar(aes(x='x', fill='..count..'))           # good
ggplot(df) + geom_bar(aes(x='x', fill='np.log(..count..)'))   # bad
ggplot(df) + geom_bar(aes(x='x', fill='..count.. + 2'))       # bad

AttributeError: 'numpy.float64' object has no attribute 'lower'

Problem likely in scales.add_defaults

Are vector image file formats supported? SVG, EPS, etc.

Are vector graphics formats supported when rendering or exporting images? Specifically, does the library support SVG files?

free_y ignored with facet_grid('. ~ var'), but works with facet_grid('var ~ .') and facet_wrap()

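import numpy as np
import pandas as pd
from plotnine import ggplot, aes, geom_point, facet_grid, facet_wrap

df = pd.DataFrame({'x': np.random.rand(100), 'y': np.random.rand(100)})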
df['color'] = np.floor(df['y'] * 3).astype('str')

# Bad: free_y ignored
ggplot(df, aes('x', 'y', color='color')) + geom_point() + facet_grid('. ~ color', scales='free_y')

# Good
ggplot(df, aes('x', 'y', color='color')) + geom_point() + facet_grid('color ~ .', scales='free_y')

# Good
ggplot(df, aes('x', 'y', color='color')) + geom_point() + facet_wrap('color', scales='free_y')

Y-axis labels duplicate 0 with log10 axis

I think there is a formatting issue where very small numbers get printed as 0 after an axis has been transformed with log10. See the example below, where 0 appears twice on the y-axis.

import pandas as pd
from plotnine import ggplot, aes, scale_y_log10, geom_boxplot

reprex = pd.DataFrame({'value': [0.0000000001,0.00000000001,1,5,100000,4739273,11,0.0001,0.00000000001, 0.00001,0.00001,1,5,100000,4739273,11,0.0001,0.00001],
                           'cat':['1','1','1','1','1','1','1','1','1','2','2','2','2','2','2','2','2','2']})

p = ggplot(reprex, aes('cat', 'value')) + scale_y_log10() + geom_boxplot()

Compare to ggplot2, which formats the numbers in scientific notation.

Create scale_color_palettable

Maybe it should be a function that returns a continuous or discrete scale depending on the arguments which are used to pick a palette from palettable.

Related to: has2k1/mizani#2

Alpha for points is ignored in aes() call.

df = pd.DataFrame(np.random.random([10, 2]), columns=['X', 'Y'])
ggplot(df, aes(x='X', y='Y', alpha=0.1)) + geom_point()

produces this:

How do I transform tick labels without explicitly specifying breaks?

Let's say I have a scatter plot where the x-axis data are pandas.Timestamp objects. By default, plotnine renders the x tick labels on my plot such that they run into each other.

I'd like to do an arbitrary transformation on the x-axis tick labels. For example, I'd like the tick labels to be blank except for the first day of the year, in which case I want the tick label to just be the year, so "2017-03-04" would transform to "", and "2016-01-01" would transform to "2016".

However, I don't want to explicitly specify the breaks. Instead, I want plotnine to use its default algorithm for determining where to put the breaks.

Is this possible to do with plotnine?
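The label transformation described in this question can be written as a plain function from breaks to strings. This is a minimal sketch: `year_start_labels` is a hypothetical name, and it assumes the scale's `labels` argument accepts a callable that receives the automatically computed breaks as date-like objects (anything with `year`, `month` and `day` attributes, which includes pandas.Timestamp).

```python
from datetime import date


def year_start_labels(breaks):
    """Label only breaks that fall on January 1st, using just the year."""
    labels = []
    for b in breaks:
        if b.month == 1 and b.day == 1:
            labels.append(str(b.year))
        else:
            labels.append("")
    return labels


print(year_start_labels([date(2017, 3, 4), date(2016, 1, 1)]))
```

Under that assumption, `scale_x_datetime(labels=year_start_labels)` would leave the break positions to plotnine and change only the text.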

Can plotnine make a matrix of scatterplots?

Something similar to R's pairs or plotmatrix functions; see e.g. here?

qplot: Axis labels can be extracted from pandas.Series names

In the command qplot(x=trips.grut_length, y=trips.erut_length), trips.grut_length and trips.erut_length are instances of pandas.Series and each has a name. But qplot seems to show their str() as the axis labels instead.

geom_polygon aes group

I attempted to use the geom_polygon method by adapting a ggplot2 geom_polygon example. I omitted the group aesthetic which is not implemented in plotnine. I want to adapt some of my R ggplot2 geo visualization code, but I need to understand plotnine's implementation of geom_polygon first. I can contribute some geo examples to plotnine. Here is a notebook that I saved as a gist.
https://gist.github.com/stoneyv/df80e7cdfcd64ad6199c6faccccd215d
I pip installed source cloned from master into a conda virtual environment. I am able to successfully produce example visualizations from the notebooks in the plotnine-examples repo. I will try and use the debugger pdb or pudb to step through plotnine this week.

Is there an alternative to the 'width' argument for boxplots?

I want to draw a boxplot inside a violin plot.

In R, this can be achieved with code like this:

ggplot(df, aes(x=series,y=value)) +
  geom_violin() +
  geom_boxplot(width=0.2)

In Python with plotnine, I cannot find the width argument, so the boxplots cannot be made narrower. I looked through the plotnine documentation but did not find it. Does anybody know of a different way to do this with plotnine?

Thanks for any suggestion!

How to draw subplots?

plotnine is made using matplotlib as the back-end, so I'm guessing there must be a way to draw subplots (without using faceting).

Is there a way to do so? I'd be happy to contribute to the documentation if someone points out a solution.

Boxplots don't get drawn properly when there are zeros and it is log transformed

I came across an issue where some parts of the box plot don't get drawn when I log transform data that had a lot of zeroes in it. The image below illustrates what I mean. Having a quick look at the code I can see that cbook.boxplot_stats is returning -inf for some things like IQR or min values which I guess is causing weirdness.

import pandas as pd
from plotnine import *

reprex = pd.DataFrame({'value': [0,0,1,5,100000,4739273,11,0.0001,0, 0.00001,0.00001,1,5,100000,4739273,11,0.0001,0.00001], 
                       'cat':['1','1','1','1','1','1','1','1','1','2','2','2','2','2','2','2','2','2']})
ggplot(reprex, aes('cat', 'value')) + scale_y_log10() + geom_boxplot()
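The -inf values mentioned above come directly from taking the logarithm of zero. A small sketch with the first category's values shows this, along with one possible workaround (dropping non-positive values before plotting). Dropping rows changes the data being summarised, so whether it is appropriate is a judgment call, and this is not presented as plotnine's behaviour.

```python
import numpy as np

# Values of category '1' from the example above
values = np.array([0, 0, 1, 5, 100000, 4739273, 11, 0.0001, 0])

# log10(0) is -inf; silence the divide-by-zero warning for the demo
with np.errstate(divide="ignore"):
    logged = np.log10(values)

print(logged.min())  # -inf

# Possible workaround: keep only strictly positive values
positive = values[values > 0]
logged_positive = np.log10(positive)
print(np.isfinite(logged_positive).all())  # True
```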

Use aliased imports in the gallery

I would like to suggest using "import plotnine as p9" (or something like that) in the gallery code instead of "from plotnine import *", and then using "p9" as a prefix for all identifiers defined by plotnine. That would make it much clearer which of them come from plotnine and which don't.

Transforming the axis with my function

Hello,

I am trying to do a custom scaling of the x-axis. I see that the scale_x_continuous function accepts a "trans" parameter, which can be a function. Here's my function (pretty simple)

def scale_frets(frets):
     return ([math.log(x + 1, 1.3) for x in frets])

It's not clear how I should use this. Should I be calling by a string representation of the name, like:

scale_x_continuous(trans = 'scale_frets', breaks = list(range(0,1)), minor_breaks = list(range(0,22)), limits=(0, 21))

Thank you.
(p.s. in case you're wondering, I'm making a graphical representation of a guitar neck. I'm not sure my scaling function is correct, but once I get this function working I'll play around with it until it looks right.)
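A scale transformation generally needs both a forward function and its inverse, so that breaks and labels can be mapped back to data values. The sketch below pairs the function from the question with an inverse; `unscale_frets` and `BASE` are hypothetical names. How the pair is wrapped into a transformation object depends on the plotnine/mizani version and is not shown. It is an assumption here that a string passed to `trans` is looked up among built-in transformations rather than user functions.

```python
import math

BASE = 1.3  # logarithm base used in the question


def scale_frets(frets):
    """Forward transform: log base 1.3 of (x + 1)."""
    return [math.log(x + 1, BASE) for x in frets]


def unscale_frets(values):
    """Inverse transform: undo scale_frets."""
    return [BASE ** v - 1 for v in values]


frets = list(range(0, 22))
round_trip = unscale_frets(scale_frets(frets))

# The round trip should reproduce the original fret numbers
print(all(math.isclose(a, b, abs_tol=1e-9) for a, b in zip(frets, round_trip)))
```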

[Question] Is there a way to get the fig, ax from a ggplot?

A silly question, but I've been googling around and digging into the source code ever since I discovered plotnine, with no luck.

I've used matplotlib for a long time. From time to time, I come across ggplot2 code or figures from R. I understand neither ggplot2 nor R, so I just copy the code, change a few lines to match my data, and draw the figure. If that doesn't give me what I want, and I can't do it the matplotlib way either, I'm out of options. I also have to export my data from Python to R.

plotnine is very useful to me for drawing ggplot2-style figures in Python. It would be even more useful if I could get the figure and axes from plotnine's ggplot object and work with them directly.

Parsing issue w/ '~class' in facet_wrap?

I found this issue when working through Chapter 3 of the "R for Data Science" book (using plotnine). I was working within a Jupyter notebook, using Python 3.6.

%matplotlib inline
from plotnine import *
from plotnine.data import *

ggplot(data=mpg) + geom_point(mapping=aes(x='displ', y='hwy')) + facet_wrap('~class')

I get the interesting error:

  File "<string>", line 1
    class
    ^
SyntaxError: unexpected EOF while parsing

Replacing '~class' with '~cyl' gives me a lovely plot, and no error.

To see if Jupyter was contributing to the problem, I created a simple script:

# foo.py
from plotnine import *
from plotnine.data import *

(ggplot(data=mpg) + geom_point(mapping=aes(x='displ', y='hwy')) 
 + facet_wrap('~class')).save('foo.png')

I get a somewhat more involved traceback, but essentially the same error:

$ python foo.py
/home/grant/Envs/py36/lib64/python3.6/site-packages/statsmodels/compat/pandas.py:56: FutureWarning: The pandas.core.datetools module is deprecated and will be removed in a future version. Please use the pandas.tseries module instead.
from pandas.core import datetools
Traceback (most recent call last):
  File "foo.py", line 5, in <module>
    + facet_wrap('~class')).save('foo.png')
  File "/home/grant/Envs/py36/lib64/python3.6/site-packages/plotnine/ggplot.py", line 571, in save
    raise err
  File "/home/grant/Envs/py36/lib64/python3.6/site-packages/plotnine/ggplot.py", line 568, in save
    _save()
  File "/home/grant/Envs/py36/lib64/python3.6/site-packages/plotnine/ggplot.py", line 533, in _save
    fig = figure[0] = self.draw()
  File "/home/grant/Envs/py36/lib64/python3.6/site-packages/plotnine/ggplot.py", line 141, in draw
    self._build()
  File "/home/grant/Envs/py36/lib64/python3.6/site-packages/plotnine/ggplot.py", line 222, in _build
    layout.setup(layers, self)
  File "/home/grant/Envs/py36/lib64/python3.6/site-packages/plotnine/facets/layout.py", line 59, in setup
    self.layout = self.facet.compute_layout(data)
  File "/home/grant/Envs/py36/lib64/python3.6/site-packages/plotnine/facets/facet_wrap.py", line 73, in compute_layout
    self.vars, drop=self.drop)
  File "/home/grant/Envs/py36/lib64/python3.6/site-packages/plotnine/facets/facet.py", line 519, in combine_vars
    for df in data if df is not None]
  File "/home/grant/Envs/py36/lib64/python3.6/site-packages/plotnine/facets/facet.py", line 519, in <listcomp>
    for df in data if df is not None]
  File "/home/grant/Envs/py36/lib64/python3.6/site-packages/plotnine/facets/facet.py", line 630, in eval_facet_vars
    res = env.eval(name, inner_namespace=data)
  File "/home/grant/Envs/py36/lib64/python3.6/site-packages/patsy/eval.py", line 164, in eval
    code = compile(expr, source_name, "eval", self.flags, False)
  File "<string>", line 1
    class
    ^
SyntaxError: unexpected EOF while parsing

Again, if I replace facet_wrap('~class') with facet_wrap('~cyl'), I get a lovely plot and no error.

I'm really looking forward to using plotnine (and finally learning ggplot). Thanks for all of your hard work!
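The traceback ends in `compile(expr, source_name, "eval", ...)`, which suggests the facet variable is compiled as a Python expression. `class` is a reserved keyword, so it cannot be compiled that way, while `cyl` is an ordinary name. The sketch below reproduces that distinction with the standard library only; `is_evaluable_name` and `safe_column_name` are hypothetical helpers, and renaming the column is just one possible workaround.

```python
import keyword


def is_evaluable_name(name):
    """Return True if `name` compiles as a Python expression."""
    try:
        compile(name, "<string>", "eval")
    except SyntaxError:
        return False
    return True


def safe_column_name(name):
    """Append an underscore to names that clash with Python keywords."""
    return name + "_" if keyword.iskeyword(name) else name


print(is_evaluable_name("class"), is_evaluable_name("cyl"))
print(safe_column_name("class"), safe_column_name("cyl"))
```

With pandas, a rename such as `mpg.rename(columns={'class': 'class_'})` followed by `facet_wrap('~class_')` would sidestep the keyword.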

AttributeError: 'module' object has no attribute 'viewkeys'

I am getting the following issue:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-4-1c35145c0e99> in <module>()
----> 1 (ggplot(mtcars, aes('wt', 'mpg', color='factor(gear)')) + geom_point())

/Users/fred/anaconda/lib/python2.7/site-packages/plotnine/geoms/geom.pyc in __init__(self, *args, **kwargs)
     37 
     38         # separate aesthetics and parameters
---> 39         self.aes_params = copy_keys(kwargs, {}, self.aesthetics())
     40         self.params = copy_keys(kwargs, deepcopy(self.DEFAULT_PARAMS))
     41         self.mapping = kwargs['mapping']

/Users/fred/anaconda/lib/python2.7/site-packages/plotnine/geoms/geom.pyc in aesthetics(cls)
     83         Return all the aesthetics for this geom
     84         """
---> 85         main = six.viewkeys(cls.DEFAULT_AES) | cls.REQUIRED_AES
     86         other = {'group'}
     87         # Need to recognize both spellings

AttributeError: 'module' object has no attribute 'viewkeys'

when trying to run something as simple as:
(ggplot(mtcars, aes('wt', 'mpg', color='factor(gear)')) + geom_point())

Animating p9 figures using ArtistAnimation class of matplotlib

Hi,

I'm trying to make animations using p9 figures, but I end up with matplotlib exceptions. I was wondering whether there is a convenient way to make animations out of plotnine plots.

Here is what I've tried so far:

from plotnine import *
from plotnine.data import *

from matplotlib.animation import ArtistAnimation
import matplotlib.pyplot as plt
import numpy as np

def plot1(y):
    return plt.scatter(y[:, 0], y[:, 1], c='black'),

def plot2(y):
    return (qplot(y[:, 0], y[:, 1], xlab='x', ylab='y') +
            theme_minimal()).draw(),

# Use mtcars as toy data
X = mtcars[['disp', 'hp']].values

# Add little noise to make animation cool
data = [X+np.random.normal(0, 1, (X.shape[0], X.shape[1])) for _ in range(50)]

fig = plt.figure(figsize=(8, 8))
artists = [plot1(x) for x in data]
ani = ArtistAnimation(fig, artists, interval=100, repeat_delay=500)
ani.save('/tmp/animation.mp4')

fig = plt.figure(figsize=(8, 8))
artists = [plot2(x) for x in data]
ani = ArtistAnimation(fig, artists, interval=100, repeat_delay=500)
ani.save('/tmp/animation2.mp4')

Here is the exception I get:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-39-98cef8b8381e> in <module>()
     25 artists = [plot2(x) for x in data]
     26 ani = ArtistAnimation(fig, artists, interval=100, repeat_delay=500)
---> 27 ani.save('/tmp/animation2.mp4')

~/.miniconda3/lib/python3.6/site-packages/matplotlib/animation.py in save(self, filename, writer, fps, dpi, codec, bitrate, extra_args, metadata, extra_anim, savefig_kwargs)
   1055                 for anim in all_anim:
   1056                     # Clear the initial frame
-> 1057                     anim._init_draw()
   1058                 for data in zip(*[a.new_saved_frame_seq()
   1059                                   for a in all_anim]):

~/.miniconda3/lib/python3.6/site-packages/matplotlib/animation.py in _init_draw(self)
   1374         # Flush the needed figures
   1375         for fig in figs:
-> 1376             fig.canvas.draw_idle()
   1377 
   1378     def _pre_draw(self, framedata, blit):

AttributeError: 'NoneType' object has no attribute 'canvas'

Is there a way to get "proper" artists for ArtistAnimation class?

scale_x_datetime **kwargs for breaks and labels

Hi,

Is it possible to define the **kwargs for scale_x_datetime in the same way as the Python ggplot library, for example using breaks='1 week' and labels='%W' ? The ggplot library is using date_breaks and date_format helpers to achieve this goal, is there an equivalent in plotnine?

The code below throws an error: PlotnineError: 'Breaks and labels have unequal lengths'

Using plotnine.__version__ = '0.2.1'

import random
import pandas as pd
from plotnine import *

n = 100
df = pd.DataFrame({'date': pd.date_range(start='2017-01-01', periods=n), 
                   'value': [random.randrange(0, 100) for x in range(n)]})

ggplot(df, aes('date', 'value')) + \
    geom_line() + \
    scale_x_date(breaks='1 week', labels='%W') + \
    scale_y_continuous()

This is what I am referring to, from the ggplot scales docs:

ggplot(meat, aes('date','beef')) + \
    geom_line() + \
    scale_x_date(breaks=date_breaks('10 years'),
                 labels=date_format('%B %-d, %Y'))

Thanks for your work, great library coverage compared to the original ggplot2 in R.

Facets don't work with non-mapped geoms

In ggplot2, it's possible to do this:

df = data.frame(a = rep(c(1,2), 100), b=rnorm(200), c=rnorm(200))
ggplot(df, aes(x=b, y=c)) + geom_point() + geom_abline(intercept=0, slope=1) + facet_wrap('a')

in plotnine, the equivalent doesn't work:

df = pd.DataFrame(dict(a=['a','b'] * 100, b=np.random.random(200), c=np.random.random(200)))
ggplot(df, aes(x='b', y='c')) + geom_point() + \
    geom_abline(intercept=0, slope=1) + \
    facet_wrap('a')


Out[38]: ---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
~/miniconda3/envs/science/lib/python3.6/site-packages/IPython/core/formatters.py in __call__(self, obj)
    691                 type_pprinters=self.type_printers,
    692                 deferred_pprinters=self.deferred_printers)
--> 693             printer.pretty(obj)
    694             printer.flush()
    695             return stream.getvalue()

~/miniconda3/envs/science/lib/python3.6/site-packages/IPython/lib/pretty.py in pretty(self, obj)
    378                             if callable(meth):
    379                                 return meth(obj, self, cycle)
--> 380             return _default_pprint(obj, self, cycle)
    381         finally:
    382             self.end_group()

~/miniconda3/envs/science/lib/python3.6/site-packages/IPython/lib/pretty.py in _default_pprint(obj, p, cycle)                                                                                                 
    493     if _safe_getattr(klass, '__repr__', None) is not object.__repr__:
    494         # A user-provided repr. Find newlines and replace them with p.break_()
--> 495         _repr_pprint(obj, p, cycle)
    496         return
    497     p.begin_group(1, '<')

~/miniconda3/envs/science/lib/python3.6/site-packages/IPython/lib/pretty.py in _repr_pprint(obj, p, cycle)                                                                                                    
    691     """A pprint that just redirects to the normal repr function."""
    692     # Find newlines and replace them with p.break_()
--> 693     output = repr(obj)
    694     for idx,output_line in enumerate(output.splitlines()):
    695         if idx:

~/miniconda3/envs/science/lib/python3.6/site-packages/plotnine/ggplot.py in __repr__(self)
     82         Print/show the plot
     83         """
---> 84         self.draw()
     85         plt.show()
     86         return '<ggplot: (%d)>' % self.__hash__()

~/miniconda3/envs/science/lib/python3.6/site-packages/plotnine/ggplot.py in draw(self)
    139         # assign a default theme
    140         self = deepcopy(self)
--> 141         self._build()
    142 
    143         # If no theme we use the default

~/miniconda3/envs/science/lib/python3.6/site-packages/plotnine/ggplot.py in _build(self)
    220         # Initialise panels, add extra data for margins & missing
    221         # facetting variables, and add on a PANEL variable to data
--> 222         layout.setup(layers, self)
    223 
    224         # Compute aesthetics to produce data with generalised

~/miniconda3/envs/science/lib/python3.6/site-packages/plotnine/facets/layout.py in setup(self, layers, plot)                                                                                                  
     57         # Generate panel layout
     58         data = self.facet.setup_data(data)
---> 59         self.layout = self.facet.compute_layout(data)
     60         self.layout = self.coord.setup_layout(self.layout)
     61         self.check_layout()

~/miniconda3/envs/science/lib/python3.6/site-packages/plotnine/facets/facet_wrap.py in compute_layout(self, data)                                                                                             
     71 
     72         base = combine_vars(data, self.plot.environment,
---> 73                             self.vars, drop=self.drop)
     74         n = len(base)
     75         dims = wrap_dims(n, self.nrow, self.ncol)

~/miniconda3/envs/science/lib/python3.6/site-packages/plotnine/facets/facet.py in combine_vars(data, environment, vars, drop)                                                                                 
    520     # For each layer, compute the facet values
    521     values = [eval_facet_vars(df, vars, environment)
--> 522               for df in data if df is not None]
    523 
    524     # Form the base data frame which contains all combinations

~/miniconda3/envs/science/lib/python3.6/site-packages/plotnine/facets/facet.py in <listcomp>(.0)
    520     # For each layer, compute the facet values
    521     values = [eval_facet_vars(df, vars, environment)
--> 522               for df in data if df is not None]
    523 
    524     # Form the base data frame which contains all combinations

~/miniconda3/envs/science/lib/python3.6/site-packages/plotnine/facets/facet.py in eval_facet_vars(data, vars, env)
    637             res = data[name]
    638         else:
--> 639             res = env.eval(name, inner_namespace=data)
    640         facet_vals[name] = res
    641 

~/miniconda3/envs/science/lib/python3.6/site-packages/patsy/eval.py in eval(self, expr, source_name, inner_namespace)
    164         code = compile(expr, source_name, "eval", self.flags, False)
    165         return eval(code, {}, VarLookupDict([inner_namespace]
--> 166                                             + self._namespaces))
    167 
    168     @classmethod

<string> in <module>()

NameError: name 'a' is not defined

If the geom_abline call is removed, this works correctly. I think generally, if a variable isn't available to facet on, then any geoms with that variable missing should be plotted with all available data.

Cannot plot data if it all lies between integer exponents of log 10 (scale_*_log10)

I'm using plotnine v0.2.1 and attempting to make a log-log plot:

import pandas as pd
import matplotlib.pyplot as plt
from plotnine import *

df = pd.read_json("""{"index":{"0":49,"1":99,"2":199},"h":{"0":0.5,"1":0.25,"2":0.125},"t":{"0":25.0,"1":25.0,"2":25.0},"vals":{"0":[-60.7488535913,-3.2056655669],"1":[-47.4324189895,-2.3908288923],"2":[-35.7413627393,-1.8986336166]},"x":{"0":-60.7488535913,"1":-47.4324189895,"2":-35.7413627393},"y":{"0":-3.2056655669,"1":-2.3908288923,"2":-1.8986336166},"mag":{"0":60.8333749219,"1":47.4926355764,"2":35.7917563145}}""")

#Works
ggplot(df, aes(x='h', y='mag')) + geom_line()

#Doesn't work
ggplot(df, aes(x='h', y='mag')) + geom_line() + scale_y_log10() + scale_y_log10()

#Works
plt.loglog(df['h'],df['mag'])
plt.show()

However, it fails with the error:

... ggplot(df, aes(x='h', y='mag')) + geom_line() + scale_y_log10() + scale_y_log10()
Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
  File "/usr/local/lib/python3.5/dist-packages/plotnine/ggplot.py", line 84, in __repr__
    self.draw()
  File "/usr/local/lib/python3.5/dist-packages/plotnine/ggplot.py", line 141, in draw
    self._build()
  File "/usr/local/lib/python3.5/dist-packages/plotnine/ggplot.py", line 264, in _build
    layout.setup_panel_params(self.coordinates)
  File "/usr/local/lib/python3.5/dist-packages/plotnine/facets/layout.py", line 183, in setup_panel_params
    self.panel_scales_y[j])
  File "/usr/local/lib/python3.5/dist-packages/plotnine/coords/coord_cartesian.py", line 68, in setup_panel_params
    out.update(train(scale_y, self.limits.xlim, 'y'))
  File "/usr/local/lib/python3.5/dist-packages/plotnine/coords/coord_cartesian.py", line 57, in train
    out = scale.break_info(rangee)
  File "/usr/local/lib/python3.5/dist-packages/plotnine/scales/scale.py", line 535, in break_info
    labels = self.get_labels(major)
  File "/usr/local/lib/python3.5/dist-packages/plotnine/scales/scale.py", line 629, in get_labels
    labels = self.trans.format(breaks)
  File "/usr/local/lib/python3.5/dist-packages/mizani/formatters.py", line 378, in _log_format
    dmin = np.log(np.min(x))/np.log(base)
  File "/usr/lib/python3/dist-packages/numpy/core/fromnumeric.py", line 2352, in amin
    out=out, **kwargs)
  File "/usr/lib/python3/dist-packages/numpy/core/_methods.py", line 29, in _amin
    return umr_minimum(a, axis, None, out, keepdims)
ValueError: zero-size array to reduction operation minimum which has no identity

I think all of the values used in the log-log plot should be legal and, indeed, log-log plotting works in pyplot (I mention this not to hold pyplot over you, but to rule out data issues).

What about polar coords?

I need a pie, but without polar coords I can't cook it.

How to set lower axis limit only

To better appreciate relative differences, having plots starting at 0 is often required (e.g. https://stackoverflow.com/questions/11214012/set-only-lower-bound-of-a-limit-for-ggplot)

In ggplot2 there are several ways to achieve this, e.g.
the 'expand_limits' function,
scale_y_continuous(limits=c(0, NA))
or ylim(c(0, NA)).

However I did not manage to replicate this function in plotnine, apart from calculating the limits 'manually' beforehand (see below). Is there any nicer way to do this?

from plotnine import *
from plotnine.data import mtcars

ymax = mtcars['mpg'].max()*1.1
(ggplot(mtcars, aes('wt', 'mpg'))
 + geom_point()
 + ylim(0, ymax))

True/False both plot as False

import pandas as pd
from plotnine import *

df = pd.DataFrame({'x': [1, 2], 'y': [True, False]})
ggplot(df, aes('x', 'y')) + geom_point()

Strip text box doesn't adapt to angle

A ggplot worth a thousand words.

import pandas as pd
from sklearn import datasets
from plotnine import *
iris = datasets.load_iris()

iris_df = pd.concat([pd.DataFrame(iris.data, columns=iris.feature_names),
                     pd.DataFrame(iris.target, columns=['Species'])], axis=1)
iris_m = iris_df.melt(id_vars=['Species'])
iris_m.Species = iris_m.Species.astype('category')

ggplot(iris_m, aes(x='value')) +\
    facet_grid('variable ~ .', scales='free') +\
    geom_histogram(aes(fill='Species'), bins=30) +\
    theme(strip_text_y=element_text(angle=0))

Similar behavior with strip_text_x.

legend_key not impacting theme

Yesterday I wrote a blog post creating a standard visualization from my research in plotnine; the final plot / code is available here. Everything works great, but there's one thing that I couldn't get to work: the legend_key element I passed to the final call to theme() didn't actually change the plot aesthetics. My understanding is that the way I wrote the code, i.e.

legend_key=element_rect(fill='white', color='white')

should've made each of the line marker glyphs in the legend have a white background. But they remained grey. Is this a misunderstanding on my part, or a glitch?

scale differs when starting at zero

First of all, thanks for the great implementation of ggplot2 for Python.

Unfortunately, I might have spotted an issue with the scaling of a smoothed facet plot. The following code creates a plot that starts at 0.01 based on the data:

plot = (ggplot(df_docs_model) +
    aes(x='quarter_num', y='value', color='factor(topic)') +
    geom_smooth(method='loess', span = 0.1, alpha=0.2, size=2, show_legend=False) + 
    scale_x_continuous(breaks=list(range(1994, 2019, 2))) +
    scale_y_continuous(breaks=[y / 100 for y in range(0, 15, 1)]) +
    facet_wrap('~topic'))

When I set the scale manually to make sure the y-axis starts at zero, the axis adjusts accordingly. As you can tell from the plots, however, the plotted values differ between the two figures.
To set the y-scale I used this statement:

    scale_y_continuous(expand = (0,0), limits = (0,0.1))

I suspect that the statement above restricts the data (only values between 0 and 0.1 are considered) instead of limiting the scale. Do I use the wrong statement? Or is this an issue in the present implementation? Thanks for any clarification.

Ordering facet sub plots

Is it possible to arrange the subplots in an arbitrary order? Currently the subplots are ordered alphabetically. I do not know R, but I understand that in ggplot2 you can solve the problem by ordering the "levels", which correspond to pandas categoricals.

Thanks and amazing package!

geom_bar() broken when used with ylim() or lims(y=)

geom_bar() combined with ylim() or lims(y=()) is broken. If the limits do not include 0, we get a blank plot. Here is an example:

df = pd.DataFrame({
    'variable': ['gender', 'gender', 'age', 'age', 'age', 'income', 'income', 'income', 'income'],
    'category': ['Female', 'Male', '1-24', '25-54', '55+', 'Lo', 'Lo-Med', 'Med', 'High'],
    'value': [60, 40, 50, 30, 20, 10, 25, 25, 40],
})
df['variable'] = pd.Categorical(df['variable'], categories=['gender', 'age', 'income'])
dodge_text = position_dodge(width=0.9)                              # new

# Works
(ggplot(df, aes(x='variable', y='value', fill='category'))
 + geom_bar(stat='identity', position='dodge', show_legend=False)   # modified                                            
)

# Works
(ggplot(df, aes(x='variable', y='value', fill='category'))
 + geom_bar(stat='identity', position='dodge', show_legend=False)   # modified            
 + lims(y=(-5, 60))
)

# Does NOT work

(ggplot(df, aes(x='variable', y='value', fill='category'))
 + geom_bar(stat='identity', position='dodge', show_legend=False)   # modified            
 + lims(y=(15, 60))
)

Scaling dates

How can I achieve this ggplot2 functionality in plotnine?

ggplot() +
     geom_line(data=df, aes(x=date, y=value)) +
     scale_x_date(date_breaks = "1 week")

Right now, my x-axis defaults to hundreds of dates squeezed together, which is completely unreadable.

Thanks!

geom_smooth messes up groupings for numeric columns

I've run into some issues with smoothing over data with numeric columns for colors.

Smoothing (and potentially other methods) seems to run into problems related to DISCRETE_COLUMNS.
I've attached a small notebook printout detailing the failure case.

plotnine_group_smooth.pdf

Simple way to add watermarks to figures

Possible syntax

(ggplot()
 ...
 + watermark('image.png', ...)
)

See: #41

Reorder geom_bar based on values

How can I reorder the bars according to their values?

Example

df = pd.DataFrame([('a', 1), ('b', 20), ('c', 5)], columns=['category', 'value'])
ggplot(df, aes('category', 'value')) + geom_bar(stat='identity')

Bars are ordered A, B, C. Instead I want this reordered on the values, thus B, C, A.

Here is a thread on how this is achieved in R, but my question is how this can be done in plotnine
https://stackoverflow.com/questions/33613385/sort-bar-chart-by-sum-of-values-in-ggplot

Is there an equivalent reorder function?

Exception in stat_density when x limits are set

from plotnine import *
from plotnine.data import diamonds

(ggplot(diamonds)
  + aes('depth', fill='cut', color='cut')
  + geom_density(alpha=0.1)
  + xlim(55, 70)
)

`strip_text` sizing in `facet_wrap` is off

The sizing of the strips with the labels above the plots in facet_wrap is inconsistent. If I facet_wrap a plot with just two rows, I get what seem to me to be overly narrow strips such that the label text is at the edges of the strip.

I can increase the strip height with + theme(strip_text=element_text(lineheight=1.8)). But then the strips become far too tall when there are more rows in the faceted plot.

I have played around with this some, and I think the ultimate problem is that the way that the height of the strip text is computed must depend on the total height of the faceted plots, such that plots with more rows end up with taller strip texts.

Let me know if you want me to post example images.

Example:

from plotnine import *
import pandas as pd

pdat = pd.DataFrame({'x': range(10), 'y': range(10)})

p = (ggplot(pdat, aes(x='x', y='y'))+
   geom_smooth(alpha=0.1))
p.draw()


p = (ggplot(pdat, aes(x='x', y='y'))+
   geom_line(alpha=0.1))
p.draw()

How to specify origin in geom_histogram

Thanks for your work! I have used plotnine in Kaggle kernels and it seems quite good.
However, does geom_histogram support the origin parameter now? For example:

ggplot(train[~train['Age'].isnull()], aes('Age', fill='factor(Survived)')) \
+ geom_histogram(binwidth=1, alpha=0.5, position='identity', origin=0) \
+ scale_fill_manual(values=[ns_color, s_color])

will give the following error.

---------------------------------------------------------------------------
PlotnineError                             Traceback (most recent call last)
<ipython-input-29-d387ec79f952> in <module>()
      1 # fill can be used to group variables. The fill (grouping) variable must be categorical in pandas.
      2 # Otherwise, we should transform it with astype or simple just 'factor(variable)'
----> 3 ggplot(train[~train['Age'].isnull()], aes('Age', fill='factor(Survived)')) + geom_histogram(binwidth=1, alpha=0.5, position='identity', origin=0) + scale_fill_manual(values=[ns_color, s_color])

/opt/conda/lib/python3.6/site-packages/plotnine/geoms/geom.py in __init__(self, *args, **kwargs)
     43         self._stat = stat.from_geom(self)
     44         self._position = position.from_geom(self)
---> 45         self.verify_arguments(kwargs)     # geom, stat, layer
     46 
     47     @staticmethod

/opt/conda/lib/python3.6/site-packages/plotnine/geoms/geom.py in verify_arguments(self, kwargs)
    249             msg = ("Parameters {}, are not understood by "
    250                    "either the geom, stat or layer.")
--> 251             raise PlotnineError(msg.format(unknown))
    252 
    253     def handle_na(self, data):

PlotnineError: "Parameters {'origin'}, are not understood by either the geom, stat or layer."

unable to combine geom_vline with facet_wrap

I can create a plot that either facets, or has geom_vline but not both. Examples below:

# this works
g = (pn.ggplot(df, pn.aes('spread_pct'))
     #+ pn.geom_vline(xintercept=0.5)
     + pn.geom_histogram()
     + pn.facet_wrap('~optionType'))
print(g)

# this works
g = (pn.ggplot(df, pn.aes('spread_pct'))
     + pn.geom_vline(xintercept=0.5)
     + pn.geom_histogram())
     #+ pn.facet_wrap('~optionType'))
print(g)

# this doesn't work
g = (pn.ggplot(df, pn.aes('spread_pct'))
     + pn.geom_vline(xintercept=0.5)
     + pn.geom_histogram()
     + pn.facet_wrap('~optionType'))
print(g)

This produces a NameError, see the traceback below:

---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-252-a98d15d74a1d> in <module>()
      3      + pn.geom_histogram()
      4      + pn.facet_wrap('~optionType'))
----> 5 print(g)

~/anaconda3/envs/pymc3/lib/python3.6/site-packages/plotnine/ggplot.py in __repr__(self)
     82         Print/show the plot
     83         """
---> 84         self.draw()
     85         plt.show()
     86         return '<ggplot: (%d)>' % self.__hash__()

~/anaconda3/envs/pymc3/lib/python3.6/site-packages/plotnine/ggplot.py in draw(self)
    139         # assign a default theme
    140         self = deepcopy(self)
--> 141         self._build()
    142 
    143         # If no theme we use the default

~/anaconda3/envs/pymc3/lib/python3.6/site-packages/plotnine/ggplot.py in _build(self)
    220         # Initialise panels, add extra data for margins & missing
    221         # facetting variables, and add on a PANEL variable to data
--> 222         layout.setup(layers, self)
    223 
    224         # Compute aesthetics to produce data with generalised

~/anaconda3/envs/pymc3/lib/python3.6/site-packages/plotnine/facets/layout.py in setup(self, layers, plot)
     57         # Generate panel layout
     58         data = self.facet.setup_data(data)
---> 59         self.layout = self.facet.compute_layout(data)
     60         self.layout = self.coord.setup_layout(self.layout)
     61         self.check_layout()

~/anaconda3/envs/pymc3/lib/python3.6/site-packages/plotnine/facets/facet_wrap.py in compute_layout(self, data)
     71 
     72         base = combine_vars(data, self.plot.environment,
---> 73                             self.vars, drop=self.drop)
     74         n = len(base)
     75         dims = wrap_dims(n, self.nrow, self.ncol)

~/anaconda3/envs/pymc3/lib/python3.6/site-packages/plotnine/facets/facet.py in combine_vars(data, environment, vars, drop)
    517     # For each layer, compute the facet values
    518     values = [eval_facet_vars(df, vars, environment)
--> 519               for df in data if df is not None]
    520 
    521     # Form the base data frame which contains all combinations

~/anaconda3/envs/pymc3/lib/python3.6/site-packages/plotnine/facets/facet.py in <listcomp>(.0)
    517     # For each layer, compute the facet values
    518     values = [eval_facet_vars(df, vars, environment)
--> 519               for df in data if df is not None]
    520 
    521     # Form the base data frame which contains all combinations

~/anaconda3/envs/pymc3/lib/python3.6/site-packages/plotnine/facets/facet.py in eval_facet_vars(data, vars, env)
    628 
    629     for name in vars:
--> 630         res = env.eval(name, inner_namespace=data)
    631         facet_vals[name] = res
    632 

~/anaconda3/envs/pymc3/lib/python3.6/site-packages/patsy/eval.py in eval(self, expr, source_name, inner_namespace)
    164         code = compile(expr, source_name, "eval", self.flags, False)
    165         return eval(code, {}, VarLookupDict([inner_namespace]
--> 166                                             + self._namespaces))
    167 
    168     @classmethod

<string> in <module>()

NameError: name 'optionType' is not defined

Insets

Think about insets plots.

Potential syntax

# 1.
p_inset = ggplot() + ...
p = ggplot() + ... + inset(p_inset)

# 2.
p_inset = ggplot(inset=True) + ...
p = ggplot() + ... + p_inset

The main plot and inset plot should have the same number of panels.
Set a location or n locations for the inset panels.
