has2k1 / plotnine
A Grammar of Graphics for Python
Home Page: https://plotnine.org
License: MIT License
Examples of extensions:
A gallery extension
A custom gallery extension
https://github.com/mwaskom/seaborn/blob/master/doc/sphinxext/plot_generator.py builds https://github.com/mwaskom/seaborn/tree/master/examples which results in https://seaborn.pydata.org/examples/index.html
Styling
http://matplotlib.org/devdocs/tutorials/index.html
Will have to come up with a custom solution.
Creating a boxplot of a continuous value against a categorical x-axis fails when any category has only one sample in the data frame.
For example, this works fine:
import numpy as np
import pandas as pd
from plotnine import ggplot, aes, geom_boxplot
df = pd.DataFrame(
    {
        'weight': np.random.normal(size=20),
        # Two categories: one with 18 samples, one with 2 samples
        'category': pd.Categorical(18 * [0] + 2 * [1], categories=[0, 1], ordered=True)
    }
)
(
    ggplot(df, aes(x='category', y='weight'))
    + geom_boxplot()
)
However, this example fails:
df = pd.DataFrame(
    {
        'weight': np.random.normal(size=20),
        # Two categories: one with 19 samples, one with 1 sample
        'category': pd.Categorical(19 * [0] + 1 * [1], categories=[0, 1], ordered=True)
    }
)
(
    ggplot(df, aes(x='category', y='weight'))
    + geom_boxplot()
)
Below is the trace from the error:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
~/.virtualenvs/pandas/lib/python3.6/site-packages/IPython/core/formatters.py in __call__(self, obj)
691 type_pprinters=self.type_printers,
692 deferred_pprinters=self.deferred_printers)
--> 693 printer.pretty(obj)
694 printer.flush()
695 return stream.getvalue()
~/.virtualenvs/pandas/lib/python3.6/site-packages/IPython/lib/pretty.py in pretty(self, obj)
378 if callable(meth):
379 return meth(obj, self, cycle)
--> 380 return _default_pprint(obj, self, cycle)
381 finally:
382 self.end_group()
~/.virtualenvs/pandas/lib/python3.6/site-packages/IPython/lib/pretty.py in _default_pprint(obj, p, cycle)
493 if _safe_getattr(klass, '__repr__', None) is not object.__repr__:
494 # A user-provided repr. Find newlines and replace them with p.break_()
--> 495 _repr_pprint(obj, p, cycle)
496 return
497 p.begin_group(1, '<')
~/.virtualenvs/pandas/lib/python3.6/site-packages/IPython/lib/pretty.py in _repr_pprint(obj, p, cycle)
691 """A pprint that just redirects to the normal repr function."""
692 # Find newlines and replace them with p.break_()
--> 693 output = repr(obj)
694 for idx,output_line in enumerate(output.splitlines()):
695 if idx:
~/.virtualenvs/pandas/lib/python3.6/site-packages/plotnine/ggplot.py in __repr__(self)
81 Print/show the plot
82 """
---> 83 self.draw()
84 plt.show()
85 return '<ggplot: (%d)>' % self.__hash__()
~/.virtualenvs/pandas/lib/python3.6/site-packages/plotnine/ggplot.py in draw(self)
138 # assign a default theme
139 self = deepcopy(self)
--> 140 self._build()
141
142 # If no theme we use the default
~/.virtualenvs/pandas/lib/python3.6/site-packages/plotnine/ggplot.py in _build(self)
234
235 # Apply and map statistics
--> 236 layers.compute_statistic(layout)
237 layers.map_statistic(self)
238
~/.virtualenvs/pandas/lib/python3.6/site-packages/plotnine/layer.py in compute_statistic(self, layout)
92 def compute_statistic(self, layout):
93 for l in self:
---> 94 l.compute_statistic(layout)
95
96 def map_statistic(self, plot):
~/.virtualenvs/pandas/lib/python3.6/site-packages/plotnine/layer.py in compute_statistic(self, layout)
369 data = self.stat.use_defaults(data)
370 data = self.stat.setup_data(data)
--> 371 data = self.stat.compute_layer(data, params, layout)
372 self.data = data
373
~/.virtualenvs/pandas/lib/python3.6/site-packages/plotnine/stats/stat.py in compute_layer(cls, data, params, layout)
194 return cls.compute_panel(pdata, pscales, **params)
195
--> 196 return groupby_apply(data, 'PANEL', fn)
197
198 @classmethod
~/.virtualenvs/pandas/lib/python3.6/site-packages/plotnine/utils.py in groupby_apply(df, cols, func, *args, **kwargs)
615 # do not mark d as a slice of df i.e no SettingWithCopyWarning
616 d.is_copy = None
--> 617 lst.append(func(d, *args, **kwargs))
618 return pd.concat(lst, axis=axis, ignore_index=True)
619
~/.virtualenvs/pandas/lib/python3.6/site-packages/plotnine/stats/stat.py in fn(pdata)
192 return pdata
193 pscales = layout.get_scales(pdata['PANEL'].iat[0])
--> 194 return cls.compute_panel(pdata, pscales, **params)
195
196 return groupby_apply(data, 'PANEL', fn)
~/.virtualenvs/pandas/lib/python3.6/site-packages/plotnine/stats/stat.py in compute_panel(cls, data, scales, **params)
221 for _, old in data.groupby('group'):
222 old.is_copy = None
--> 223 new = cls.compute_group(old, scales, **params)
224 unique = uniquecols(old)
225 missing = unique.columns.difference(new.columns)
~/.virtualenvs/pandas/lib/python3.6/site-packages/plotnine/stats/stat_boxplot.py in compute_group(cls, data, scales, **params)
69 labels = ['x', 'y']
70 X = np.array(data[labels])
---> 71 res = boxplot_stats(X, whis=params['coef'], labels=labels)[1]
72 try:
73 n = data['weight'].sum()
~/.virtualenvs/pandas/lib/python3.6/site-packages/matplotlib/cbook.py in boxplot_stats(X, whis, bootstrap, labels, autorange)
1998 labels = repeat(None)
1999 elif len(labels) != ncols:
-> 2000 raise ValueError("Dimensions of labels and X must be compatible")
2001
2002 input_whis = whis
ValueError: Dimensions of labels and X must be compatible
This can be worked around by removing the classes with only a single sample from the data frame and overlaying only those with a geom_point(), as there is no interesting boxplot for them anyway, but it's a bit of a hassle and it would be nicer if it just worked.
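The workaround described above amounts to splitting the rows by how many samples their category has. A minimal sketch of that split, using plain Python records rather than a data frame; `split_by_group_size` is a hypothetical helper, not part of plotnine:

```python
from collections import Counter


def split_by_group_size(rows, key, min_size=2):
    """Split rows into (large, small) by the number of rows sharing rows[key]."""
    counts = Counter(row[key] for row in rows)
    # Categories with enough samples for a boxplot
    large = [row for row in rows if counts[row[key]] >= min_size]
    # Categories with too few samples, to be drawn as points instead
    small = [row for row in rows if counts[row[key]] < min_size]
    return large, small


rows = [
    {"category": 0, "weight": 0.1},
    {"category": 0, "weight": 0.4},
    {"category": 0, "weight": -0.2},
    {"category": 1, "weight": 0.5},
]
large, small = split_by_group_size(rows, "category")
```

Here `large` would feed the boxplot layer and `small` the point layer.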
Is there a way to put tick labels in real scientific notation using LaTeX formatting? I want something like 1e7 to appear as a 10 with the superscript exponent 7.
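One possible approach is to build the label strings yourself. This is only a sketch: it assumes the scale's `labels` argument accepts a callable that maps a list of break values to a list of strings, and that matplotlib renders text wrapped in `$...$` as mathtext. `sci_label` and `sci_labels` are hypothetical helper names.

```python
import math


def sci_label(value):
    """Format a number as a mathtext power of ten, e.g. 1e7 -> $10^{7}$."""
    if value == 0:
        return "0"
    sign = "-" if value < 0 else ""
    exponent = math.floor(math.log10(abs(value)))
    # Round to absorb floating point noise in the mantissa
    mantissa = round(abs(value) / 10 ** exponent, 6)
    if mantissa >= 10:
        # log10 landed just below an exact power of ten
        mantissa /= 10
        exponent += 1
    if mantissa == 1:
        return f"${sign}10^{{{exponent}}}$"
    return f"${sign}{mantissa:g} \\times 10^{{{exponent}}}$"


def sci_labels(breaks):
    """Label every break; this is the callable that would be handed to a scale."""
    return [sci_label(b) for b in breaks]
```

If the assumption about `labels` holds, `sci_labels` could be passed as the `labels` argument of a continuous scale.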
ggplot2 has geom_map for plotting geography. I can imagine a nice interface with geopandas, which has gpd.GeoDataFrame.plot() through matplotlib. Here's a demo from the docs:
ggplot(crimesm, aes(map_id = state)) +
geom_map(aes(fill = value), map = states_map) +
expand_limits(x = states_map$long, y = states_map$lat) +
facet_wrap( ~ variable)
I have a request to support a frequent use case. (Apologies if I have just missed the way to do this, I looked!) There exist cases where a column could reasonably map to either a discrete or a continuous scale, and I would like a user option to specify which one to use.
A common example is a data column that is integer type (defaults to continuous scale), but actually only takes on a handful of values in practice (meaning it could be reasonably mapped to a discrete scale). There is not an exact number of distinct values that can differentiate whether or not the integer variable should be treated as discrete or continuous, it ultimately depends on the user's need. From what I can see, plotnine assigns the default scale based on the pandas data type in the data frame, and thus the only way to change the default behavior is to mutate the data frame itself before creating the plot object. Obviously, mutating the dataframe works fine as a workaround.
Here is my example:
import pandas as pd
import random
from plotnine import ggplot, aes, geom_text, geom_line
df = pd.DataFrame({'a': [random.uniform(0, 1) for i in range(15)],
                   'b': [random.uniform(0, 1) for i in range(15)],
                   'c': [random.randint(0, 5) for i in range(15)]
                   })
ggplot(df, aes('a', 'b', color='c')) + geom_text(aes(label='c')) + geom_line(aes(group='c'))
FYI: in R::ggplot2 I would accomplish this by casting the variable to a factor in the aes:
ggplot(df, aes(a, b, color=factor(c))) + ...
In case it isn't clear, this matters to me because it affects how I can manipulate the color scale. For example, when the integer code is binary, the default behavior is to assign the colors purple and yellow (the extremes of the 'viridis' color scale). That yellow is an unfortunately hard-to-see default, and changing it is harder than it seems it should be.
Can you please give a quick overview of how plotnine differs from ggpy, and what are the pros and cons of each approach?
import pandas as pd
import numpy as np
from plotnine import *
df = pd.DataFrame({'x': [1, 2, 2, 3, 3, 3, 4, 4, 4, 4]})
ggplot(df) + geom_bar(aes(x='x', fill='..count..')) # good
ggplot(df) + geom_bar(aes(x='x', fill='np.log(..count..)')) # bad
ggplot(df) + geom_bar(aes(x='x', fill='..count.. + 2')) # bad
AttributeError: 'numpy.float64' object has no attribute 'lower'
Problem likely in scales.add_defaults
Rendering/exporting images: are vector graphic formats supported? Specifically, does the library support SVG files?
df = pd.DataFrame({'x': np.random.rand(100), 'y': np.random.rand(100)})
df['color'] = np.floor(df['y'] * 3).astype('str')
# Bad: free_y ignored
ggplot(df, aes('x', 'y', color='color')) + geom_point() + facet_grid('. ~ color', scales='free_y')
# Good
ggplot(df, aes('x', 'y', color='color')) + geom_point() + facet_grid('color ~ .', scales='free_y')
# Good
ggplot(df, aes('x', 'y', color='color')) + geom_point() + facet_wrap('color', scales='free_y')
I think there is a weird formatting issue where very small numbers get printed as 0 after an axis has been transformed with log10. See the example below, which produces two 0 labels on the y-axis:
import pandas as pd
from plotnine import ggplot, aes, scale_y_log10, geom_boxplot
reprex = pd.DataFrame({'value': [0.0000000001,0.00000000001,1,5,100000,4739273,11,0.0001,0.00000000001, 0.00001,0.00001,1,5,100000,4739273,11,0.0001,0.00001],
'cat':['1','1','1','1','1','1','1','1','1','2','2','2','2','2','2','2','2','2']})
p = ggplot(reprex, aes('cat', 'value')) + scale_y_log10() + geom_boxplot()
Compare to ggplot2, which transforms the numbers into scientific notation.
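Whether this is what happens inside plotnine is not confirmed here, but the symptom is what fixed-precision formatting produces for very small numbers. The sketch below only illustrates that effect and contrasts it with Python's general format; `fixed_labels` and `general_labels` are hypothetical names.

```python
def fixed_labels(breaks, digits=2):
    """Format with a fixed number of decimals; tiny values collapse to zero."""
    return [f"{b:.{digits}f}" for b in breaks]


def general_labels(breaks):
    """Format with the general format, which switches to exponent notation."""
    return [f"{b:g}" for b in breaks]


breaks = [1e-10, 1e-5, 1]
fixed = fixed_labels(breaks)      # ['0.00', '0.00', '1.00']
general = general_labels(breaks)  # ['1e-10', '1e-05', '1']
```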
Maybe it should be a function that returns a continuous or discrete scale depending on the arguments which are used to pick a palette from palettable.
Related to: has2k1/mizani#2
Let's say I have a scatter plot where the x-axis data are pandas.Timestamp objects. By default, plotnine renders the x tick labels on my plot such that they run into each other.
I'd like to do an arbitrary transformation on the x-axis tick labels. For example, I'd like the tick labels to be blank except for the first day of the year, in which case I want the tick label to just be the year, so "2017-03-04" would transform to "", and "2016-01-01" would transform to "2016".
However, I don't want to explicitly specify the breaks. Instead, I want plotnine to use its default algorithm for determining where to put the breaks.
Is this possible to do with plotnine?
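The label rule described above can be written as a function from breaks to labels. This is a sketch under the assumption that the date scale's `labels` argument accepts such a callable while `breaks` is left at its default; `year_start_labels` is a hypothetical name.

```python
from datetime import date


def year_start_labels(breaks):
    """Label a break with its year if it falls on 1 January, else leave it blank."""
    labels = []
    for b in breaks:
        # Works for any object with month/day/year attributes,
        # such as datetime.date, datetime.datetime or pandas.Timestamp
        if b.month == 1 and b.day == 1:
            labels.append(str(b.year))
        else:
            labels.append("")
    return labels


example = year_start_labels([date(2017, 3, 4), date(2016, 1, 1)])
```

Note that if none of the default breaks happens to fall on 1 January, every label comes out blank, so the rule may need loosening (for example, labelling the first break seen in each year).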
Something similar to R's pairs or plotmatrix functions; see e.g. here?
I attempted to use the geom_polygon method by adapting a ggplot2 geom_polygon example. I omitted the group aesthetic which is not implemented in plotnine. I want to adapt some of my R ggplot2 geo visualization code, but I need to understand plotnine's implementation of geom_polygon first. I can contribute some geo examples to plotnine. Here is a notebook that I saved as a gist.
https://gist.github.com/stoneyv/df80e7cdfcd64ad6199c6faccccd215d
I pip installed source cloned from master into a conda virtual environment. I am able to successfully produce example visualizations from the notebooks in the plotnine-examples repo. I will try and use the debugger pdb or pudb to step through plotnine this week.
I want to make boxplot integrated in a violinplot.
In R, this can be achieved with code like this:
ggplot(df, aes(x=series,y=value)) +
geom_violin() +
geom_boxplot(width=0.2)
In Python with plotnine I miss the width argument, so the boxplots cannot be made narrower. I looked through the plotnine documentation but did not find it. Does anybody know if there is a different way to do this with plotnine?
Thanks for any suggestion!
plotnine is made using matplotlib as the back-end, so I'm guessing there must be a way to draw subplots (without using faceting).
Is there a way to do so? I'd be happy to contribute to the documentation if someone points out a solution.
I came across an issue where some parts of the box plot don't get drawn when I log transform data that had a lot of zeroes in it. The image below illustrates what I mean. Having a quick look at the code I can see that cbook.boxplot_stats is returning -inf for some things like IQR or min values, which I guess is causing weirdness.
import pandas as pd
from plotnine import *
reprex = pd.DataFrame({'value': [0,0,1,5,100000,4739273,11,0.0001,0, 0.00001,0.00001,1,5,100000,4739273,11,0.0001,0.00001],
'cat':['1','1','1','1','1','1','1','1','1','2','2','2','2','2','2','2','2','2']})
ggplot(reprex, aes('cat', 'value')) + scale_y_log10() + geom_boxplot()
I would like to suggest using "import plotnine as p9" (or something like that) in the gallery code instead of "from plotnine import *", and then using "p9" as a prefix for all identifiers defined by plotnine. That would make it much clearer which of them come from plotnine and which don't.
Hello,
I am trying to do a custom scaling of the x-axis. I see that the scale_x_continuous function accepts a "trans" parameter, which can be a function. Here's my function (pretty simple)
import math
def scale_frets(frets):
    return [math.log(x + 1, 1.3) for x in frets]
It's not clear how I should use this. Should I be calling by a string representation of the name, like:
scale_x_continuous(trans = 'scale_frets', breaks = list(range(0,1)), minor_breaks = list(range(0,22)), limits=(0, 21))
Thank you.
(p.s. in case you're wondering, I'm making a graphical representation of a guitar neck. I'm not sure my scaling function is correct, but once I get this function working I'll play around with it until it looks right.)
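A custom axis transformation generally needs an inverse as well as the forward function, so that positions computed in transformed space can be mapped back to data values. The sketch below shows such a pair for the function in the question. How the pair is registered with `scale_x_continuous` depends on the plotnine/mizani version and is deliberately not shown; `frets_transform` and `frets_inverse` are hypothetical names.

```python
import math

BASE = 1.3  # base taken from the question above


def frets_transform(frets):
    """Forward transform: log of (x + 1) in the chosen base."""
    return [math.log(x + 1, BASE) for x in frets]


def frets_inverse(values):
    """Inverse transform: undo the log and the +1 shift."""
    return [BASE ** v - 1 for v in values]


frets = [0, 1, 5, 21]
round_trip = frets_inverse(frets_transform(frets))
```

A quick round trip like the one above is a cheap way to check that the two functions really are inverses before wiring them into a scale.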
A silly question, but I've been googling around and digging into the source code since the day I learned about plotnine, with no luck.
I've used matplotlib for a long time. From time to time, I come across ggplot2 code or figures from R. I understand neither ggplot2 nor R, so I just copy the code, change a few lines to match my data, and draw the figure. If I can't get what I want this way, and can't do it the matplotlib way either, I'm out of options. I also have to export my data from Python to R.
plotnine is very useful for me to draw some ggplot2 figure in python. It will be more useful if I can just get the figure and axis from plotnine's ggplot object, and do something with them.
I found this issue when working through Chapter 3 of the "R for Data Science" book (using plotnine). I was working within a Jupyter notebook, using Python 3.6.
%matplotlib inline
from plotnine import *
from plotnine.data import *
ggplot(data=mpg) + geom_point(mapping=aes(x='displ', y='hwy')) + facet_wrap('~class')
I get the interesting error:
File "", line 1
class
^
SyntaxError: unexpected EOF while parsing
Replacing '~class' with '~cyl' gives me a lovely plot, and no error.
To see if Jupyter was contributing to the problem, I created a simple script:
# foo.py
from plotnine import *
from plotnine.data import *
(ggplot(data=mpg) + geom_point(mapping=aes(x='displ', y='hwy'))
+ facet_wrap('~class')).save('foo.png')
I get a somewhat more involved traceback, but essentially the same error:
$ python foo.py
/home/grant/Envs/py36/lib64/python3.6/site-packages/statsmodels/compat/pandas.py:56: FutureWarning: The pandas.core.datetools module is deprecated and will be removed in a future version. Please use the pandas.tseries module instead.
from pandas.core import datetools
Traceback (most recent call last):
File "foo.py", line 5, in
+ facet_wrap('~class')).save('foo.png')
File "/home/grant/Envs/py36/lib64/python3.6/site-packages/plotnine/ggplot.py", line 571, in save
raise err
File "/home/grant/Envs/py36/lib64/python3.6/site-packages/plotnine/ggplot.py", line 568, in save
_save()
File "/home/grant/Envs/py36/lib64/python3.6/site-packages/plotnine/ggplot.py", line 533, in _save
fig = figure[0] = self.draw()
File "/home/grant/Envs/py36/lib64/python3.6/site-packages/plotnine/ggplot.py", line 141, in draw
self._build()
File "/home/grant/Envs/py36/lib64/python3.6/site-packages/plotnine/ggplot.py", line 222, in _build
layout.setup(layers, self)
File "/home/grant/Envs/py36/lib64/python3.6/site-packages/plotnine/facets/layout.py", line 59, in setup
self.layout = self.facet.compute_layout(data)
File "/home/grant/Envs/py36/lib64/python3.6/site-packages/plotnine/facets/facet_wrap.py", line 73, in compute_layout
self.vars, drop=self.drop)
File "/home/grant/Envs/py36/lib64/python3.6/site-packages/plotnine/facets/facet.py", line 519, in combine_vars
for df in data if df is not None]
File "/home/grant/Envs/py36/lib64/python3.6/site-packages/plotnine/facets/facet.py", line 519, in
for df in data if df is not None]
File "/home/grant/Envs/py36/lib64/python3.6/site-packages/plotnine/facets/facet.py", line 630, in eval_facet_vars
res = env.eval(name, inner_namespace=data)
File "/home/grant/Envs/py36/lib64/python3.6/site-packages/patsy/eval.py", line 164, in eval
code = compile(expr, source_name, "eval", self.flags, False)
File "", line 1
class
^
SyntaxError: unexpected EOF while parsing
Again, if I replace facet_wrap('~class') with facet_wrap('~cyl'), I get a lovely plot and no error.
I'm really looking forward to using plotnine (and finally learning ggplot). Thanks for all of your hard work!
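The traceback shows the facet variable being compiled as a Python expression, and `class` is a reserved word in Python, which would explain why `cyl` works and `class` does not. One possible workaround, sketched here with a hypothetical helper `safe_column_names`, is to rename such columns before plotting:

```python
import keyword


def safe_column_names(columns, suffix="_"):
    """Map column names that are Python keywords to names that can be evaluated."""
    return {
        name: name + suffix if keyword.iskeyword(name) else name
        for name in columns
    }


mapping = safe_column_names(["class", "cyl", "displ"])
```

The resulting mapping could be passed to pandas' `DataFrame.rename(columns=mapping)`, after which the plot would facet on `class_` instead of `class`.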
I am getting the following issue:
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-4-1c35145c0e99> in <module>()
----> 1 (ggplot(mtcars, aes('wt', 'mpg', color='factor(gear)')) + geom_point())
/Users/fred/anaconda/lib/python2.7/site-packages/plotnine/geoms/geom.pyc in __init__(self, *args, **kwargs)
37
38 # separate aesthetics and parameters
---> 39 self.aes_params = copy_keys(kwargs, {}, self.aesthetics())
40 self.params = copy_keys(kwargs, deepcopy(self.DEFAULT_PARAMS))
41 self.mapping = kwargs['mapping']
/Users/fred/anaconda/lib/python2.7/site-packages/plotnine/geoms/geom.pyc in aesthetics(cls)
83 Return all the aesthetics for this geom
84 """
---> 85 main = six.viewkeys(cls.DEFAULT_AES) | cls.REQUIRED_AES
86 other = {'group'}
87 # Need to recognize both spellings
AttributeError: 'module' object has no attribute 'viewkeys'
when trying to run something as simple as:
(ggplot(mtcars, aes('wt', 'mpg', color='factor(gear)')) + geom_point())
Hi,
I'm trying to make animations using p9 figures, but I end up with matplotlib exceptions. I was wondering whether there is a convenient way to make animations out of plotnine plots.
Here is what I've tried so far:
from plotnine import *
from plotnine.data import *
from matplotlib.animation import ArtistAnimation
import matplotlib.pyplot as plt
import numpy as np
def plot1(y):
    return plt.scatter(y[:, 0], y[:, 1], c='black'),
def plot2(y):
    return (qplot(y[:, 0], y[:, 1], xlab='x', ylab='y') +
            theme_minimal()).draw(),
# Use mtcars as toy data
X = mtcars[['disp', 'hp']].as_matrix()
# Add little noise to make animation cool
data = [X+np.random.normal(0, 1, (X.shape[0], X.shape[1])) for _ in range(50)]
fig = plt.figure(figsize=(8, 8))
artists = [plot1(x) for x in data]
ani = ArtistAnimation(fig, artists, interval=100, repeat_delay=500)
ani.save('/tmp/animation.mp4')
fig = plt.figure(figsize=(8, 8))
artists = [plot2(x) for x in data]
ani = ArtistAnimation(fig, artists, interval=100, repeat_delay=500)
ani.save('/tmp/animation2.mp4')
Here is the exception I get:
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-39-98cef8b8381e> in <module>()
25 artists = [plot2(x) for x in data]
26 ani = ArtistAnimation(fig, artists, interval=100, repeat_delay=500)
---> 27 ani.save('/tmp/animation2.mp4')
~/.miniconda3/lib/python3.6/site-packages/matplotlib/animation.py in save(self, filename, writer, fps, dpi, codec, bitrate, extra_args, metadata, extra_anim, savefig_kwargs)
1055 for anim in all_anim:
1056 # Clear the initial frame
-> 1057 anim._init_draw()
1058 for data in zip(*[a.new_saved_frame_seq()
1059 for a in all_anim]):
~/.miniconda3/lib/python3.6/site-packages/matplotlib/animation.py in _init_draw(self)
1374 # Flush the needed figures
1375 for fig in figs:
-> 1376 fig.canvas.draw_idle()
1377
1378 def _pre_draw(self, framedata, blit):
AttributeError: 'NoneType' object has no attribute 'canvas'
Is there a way to get "proper" artists for ArtistAnimation class?
Hi,
Is it possible to define the **kwargs for scale_x_datetime in the same way as the Python ggplot library, for example using breaks='1 week' and labels='%W'? The ggplot library uses the date_breaks and date_format helpers to achieve this; is there an equivalent in plotnine?
The code below throws an error: PlotnineError: 'Breaks and labels have unequal lengths'
Using plotnine.__version__ = '0.2.1'
import random
import pandas as pd
from plotnine import ggplot, aes, geom_line, scale_x_date, scale_y_continuous
n = 100
df = pd.DataFrame({'date': pd.date_range(start='2017-01-01', periods=n),
                   'value': [random.randrange(0, 100) for x in range(n)]})
ggplot(df, aes('date', 'value')) + \
    geom_line() + \
    scale_x_date(breaks='1 week', labels='%W') + \
    scale_y_continuous()
This is what I am referring to, from the ggplot scales docs:
ggplot(meat, aes('date','beef')) + \
geom_line() + \
scale_x_date(breaks=date_breaks('10 years'),
labels=date_format('%B %-d, %Y'))
Thanks for your work, great library coverage compared to the original ggplot2 in R.
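Since the error complains about breaks and labels of unequal length, one way to sidestep it may be to compute both lists explicitly so that they match by construction. This is a sketch only, assuming the date scale accepts a list of dates for `breaks` and a list of strings for `labels`; `weekly_breaks` and `week_labels` are hypothetical helpers.

```python
from datetime import date, timedelta


def weekly_breaks(start, end):
    """Return dates one week apart, from start up to and including end."""
    breaks = []
    current = start
    while current <= end:
        breaks.append(current)
        current += timedelta(weeks=1)
    return breaks


def week_labels(breaks, fmt="%W"):
    """Format every break with strftime, giving one label per break."""
    return [b.strftime(fmt) for b in breaks]


breaks = weekly_breaks(date(2017, 1, 1), date(2017, 1, 31))
labels = week_labels(breaks)
```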
In ggplot2, it's possible to do this:
df = data.frame(a = rep(c(1,2), 100), b=rnorm(200), c=rnorm(200))
ggplot(df, aes(x=b, y=c)) + geom_point() + geom_abline(intercept=0, slope=1) + facet_wrap('a')
in plotnine, the equivalent doesn't work:
df = pd.DataFrame(dict(a=['a','b'] * 100, b=np.random.random(200), c=np.random.random(200)))
ggplot(df, aes(x='b', y='c')) + geom_point() + \
geom_abline(intercept=0, slope=1) + \
facet_wrap('a')
Out[38]: ---------------------------------------------------------------------------
NameError Traceback (most recent call last)
~/miniconda3/envs/science/lib/python3.6/site-packages/IPython/core/formatters.py in __call__(self, obj)
691 type_pprinters=self.type_printers,
692 deferred_pprinters=self.deferred_printers)
--> 693 printer.pretty(obj)
694 printer.flush()
695 return stream.getvalue()
~/miniconda3/envs/science/lib/python3.6/site-packages/IPython/lib/pretty.py in pretty(self, obj)
378 if callable(meth):
379 return meth(obj, self, cycle)
--> 380 return _default_pprint(obj, self, cycle)
381 finally:
382 self.end_group()
~/miniconda3/envs/science/lib/python3.6/site-packages/IPython/lib/pretty.py in _default_pprint(obj, p, cycle)
493 if _safe_getattr(klass, '__repr__', None) is not object.__repr__:
494 # A user-provided repr. Find newlines and replace them with p.break_()
--> 495 _repr_pprint(obj, p, cycle)
496 return
497 p.begin_group(1, '<')
~/miniconda3/envs/science/lib/python3.6/site-packages/IPython/lib/pretty.py in _repr_pprint(obj, p, cycle)
691 """A pprint that just redirects to the normal repr function."""
692 # Find newlines and replace them with p.break_()
--> 693 output = repr(obj)
694 for idx,output_line in enumerate(output.splitlines()):
695 if idx:
~/miniconda3/envs/science/lib/python3.6/site-packages/plotnine/ggplot.py in __repr__(self)
82 Print/show the plot
83 """
---> 84 self.draw()
85 plt.show()
86 return '<ggplot: (%d)>' % self.__hash__()
~/miniconda3/envs/science/lib/python3.6/site-packages/plotnine/ggplot.py in draw(self)
139 # assign a default theme
140 self = deepcopy(self)
--> 141 self._build()
142
143 # If no theme we use the default
~/miniconda3/envs/science/lib/python3.6/site-packages/plotnine/ggplot.py in _build(self)
220 # Initialise panels, add extra data for margins & missing
221 # facetting variables, and add on a PANEL variable to data
--> 222 layout.setup(layers, self)
223
224 # Compute aesthetics to produce data with generalised
~/miniconda3/envs/science/lib/python3.6/site-packages/plotnine/facets/layout.py in setup(self, layers, plot)
57 # Generate panel layout
58 data = self.facet.setup_data(data)
---> 59 self.layout = self.facet.compute_layout(data)
60 self.layout = self.coord.setup_layout(self.layout)
61 self.check_layout()
~/miniconda3/envs/science/lib/python3.6/site-packages/plotnine/facets/facet_wrap.py in compute_layout(self, data)
71
72 base = combine_vars(data, self.plot.environment,
---> 73 self.vars, drop=self.drop)
74 n = len(base)
75 dims = wrap_dims(n, self.nrow, self.ncol)
~/miniconda3/envs/science/lib/python3.6/site-packages/plotnine/facets/facet.py in combine_vars(data, environment, vars, drop)
520 # For each layer, compute the facet values
521 values = [eval_facet_vars(df, vars, environment)
--> 522 for df in data if df is not None]
523
524 # Form the base data frame which contains all combinations
~/miniconda3/envs/science/lib/python3.6/site-packages/plotnine/facets/facet.py in <listcomp>(.0)
520 # For each layer, compute the facet values
521 values = [eval_facet_vars(df, vars, environment)
--> 522 for df in data if df is not None]
523
524 # Form the base data frame which contains all combinations
~/miniconda3/envs/science/lib/python3.6/site-packages/plotnine/facets/facet.py in eval_facet_vars(data, vars, env)
637 res = data[name]
638 else:
--> 639 res = env.eval(name, inner_namespace=data)
640 facet_vals[name] = res
641
~/miniconda3/envs/science/lib/python3.6/site-packages/patsy/eval.py in eval(self, expr, source_name, inner_namespace)
164 code = compile(expr, source_name, "eval", self.flags, False)
165 return eval(code, {}, VarLookupDict([inner_namespace]
--> 166 + self._namespaces))
167
168 @classmethod
<string> in <module>()
NameError: name 'a' is not defined
If the geom_abline call is removed, this works correctly. I think generally, if a variable isn't available to facet on, then any geoms with that variable missing should be plotted with all available data.
I'm using plotnine v0.2.1 and attempting to make a log-log plot:
import pandas as pd
import matplotlib.pyplot as plt
from plotnine import *
df = pd.read_json("""{"index":{"0":49,"1":99,"2":199},"h":{"0":0.5,"1":0.25,"2":0.125},"t":{"0":25.0,"1":25.0,"2":25.0},"vals":{"0":[-60.7488535913,-3.2056655669],"1":[-47.4324189895,-2.3908288923],"2":[-35.7413627393,-1.8986336166]},"x":{"0":-60.7488535913,"1":-47.4324189895,"2":-35.7413627393},"y":{"0":-3.2056655669,"1":-2.3908288923,"2":-1.8986336166},"mag":{"0":60.8333749219,"1":47.4926355764,"2":35.7917563145}}""")
#Works
ggplot(df, aes(x='h', y='mag')) + geom_line()
#Doesn't work
ggplot(df, aes(x='h', y='mag')) + geom_line() + scale_y_log10() + scale_y_log10()
#Works
plt.loglog(df['h'],df['mag'])
plt.show()
However, it fails with the error:
... ggplot(df, aes(x='h', y='mag')) + geom_line() + scale_y_log10() + scale_y_log10()
Traceback (most recent call last):
File "<stdin>", line 2, in <module>
File "/usr/local/lib/python3.5/dist-packages/plotnine/ggplot.py", line 84, in __repr__
self.draw()
File "/usr/local/lib/python3.5/dist-packages/plotnine/ggplot.py", line 141, in draw
self._build()
File "/usr/local/lib/python3.5/dist-packages/plotnine/ggplot.py", line 264, in _build
layout.setup_panel_params(self.coordinates)
File "/usr/local/lib/python3.5/dist-packages/plotnine/facets/layout.py", line 183, in setup_panel_params
self.panel_scales_y[j])
File "/usr/local/lib/python3.5/dist-packages/plotnine/coords/coord_cartesian.py", line 68, in setup_panel_params
out.update(train(scale_y, self.limits.xlim, 'y'))
File "/usr/local/lib/python3.5/dist-packages/plotnine/coords/coord_cartesian.py", line 57, in train
out = scale.break_info(rangee)
File "/usr/local/lib/python3.5/dist-packages/plotnine/scales/scale.py", line 535, in break_info
labels = self.get_labels(major)
File "/usr/local/lib/python3.5/dist-packages/plotnine/scales/scale.py", line 629, in get_labels
labels = self.trans.format(breaks)
File "/usr/local/lib/python3.5/dist-packages/mizani/formatters.py", line 378, in _log_format
dmin = np.log(np.min(x))/np.log(base)
File "/usr/lib/python3/dist-packages/numpy/core/fromnumeric.py", line 2352, in amin
out=out, **kwargs)
File "/usr/lib/python3/dist-packages/numpy/core/_methods.py", line 29, in _amin
return umr_minimum(a, axis, None, out, keepdims)
ValueError: zero-size array to reduction operation minimum which has no identity
I think all of the values used in the log-log plot should be legal and, indeed, log-log plotting works in pyplot (I mention this not to lord pyplot over you, but to try to rule out data issues).
I need a pie, but without polar coords I can't cook it.
To better appreciate relative differences, having plots starting at 0 is often required (e.g. https://stackoverflow.com/questions/11214012/set-only-lower-bound-of-a-limit-for-ggplot)
In ggplot2 there are several ways to achieve this, e.g. the 'expand_limits' function, scale_y_continuous(limits=c(0, NA)) or ylim(c(0, NA)).
However I did not manage to replicate this function in plotnine, apart from calculating the limits 'manually' beforehand (see below). Is there any nicer way to do this?
from plotnine import *
from plotnine.data import mtcars
ymax = mtcars['mpg'].max()*1.1
(ggplot(mtcars, aes('wt', 'mpg'))
 + geom_point()
 + ylim(0, ymax))
A ggplot worth a thousand words.
import pandas as pd
from sklearn import datasets
from plotnine import *
iris = datasets.load_iris()
iris_df = pd.concat([pd.DataFrame(iris.data, columns=iris.feature_names),
pd.DataFrame(iris.target, columns=['Species'])], axis=1)
iris_m = iris_df.melt(id_vars=['Species'])
iris_m.Species = iris_m.Species.astype('category')
ggplot(iris_m, aes(x='value')) +\
facet_grid('variable ~ .', scales='free') +\
geom_histogram(aes(fill='Species'), bins=30) +\
theme(strip_text_y=element_text(angle=0))
Similar behavior with strip_text_x.
Yesterday I wrote a blog post creating a standard visualization from my research in plotnine; the final plot / code is available here. Everything works great, but there's one thing that I couldn't get to work: the legend_key element I passed to the final call to theme() didn't actually change the plot aesthetics. My understanding is that the way I wrote the code, i.e.
legend_key=element_rect(fill='white', color='white')
should've made each of the line marker glyphs in the legend have a white background. But, they remained grey. Is this a mis-understanding on my part, or a glitch?
First of all, thanks for the great implementation of ggplot2 for Python.
Unfortunately, I might have spotted an issue with the scaling of a smoothed facet plot. The following code creates a plot that starts at 0.01 based on the data:
plot = (ggplot(df_docs_model) +
aes(x='quarter_num', y='value', color='factor(topic)') +
geom_smooth(method='loess', span = 0.1, alpha=0.2, size=2, show_legend=False) +
scale_x_continuous(breaks=list(range(1994, 2019, 2))) +
scale_y_continuous(breaks=[y / 100 for y in range(0, 15, 1)]) +
facet_wrap('~topic'))
When I set the scale manually to make sure the y-axis starts at zero, the axis adjusts accordingly. As you can tell from the plots, however, the plotted values differ between the two figures.
To set the y-scale I used this statement:
scale_y_continuous(expand = (0,0), limits = (0,0.1))
I suspect that the statement above restricts the data (only values between 0 and 0.1 are considered) instead of limiting the scale. Do I use the wrong statement? Or is this an issue in the present implementation? Thanks for any clarification.
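If that suspicion is right (it is an assumption here; ggplot2's continuous scales do by default drop out-of-range values before statistics are computed), a smoother or any other summary would see different data once limits are set. A toy sketch of the effect, using a mean in place of the loess fit and hypothetical helper names:

```python
def censor(values, limits):
    """Replace values outside the limits with None, as a censoring scale would."""
    low, high = limits
    return [v if low <= v <= high else None for v in values]


def mean(values):
    """Mean of the values that survived censoring."""
    kept = [v for v in values if v is not None]
    return sum(kept) / len(kept)


values = [0.02, 0.05, 0.08, 0.12, 0.15]
mean_all = mean(values)                            # about 0.084
mean_censored = mean(censor(values, (0, 0.1)))     # about 0.05
```

Zooming the view instead of limiting the scale (in ggplot2 this is what coord_cartesian does) would leave the statistic computed on all values.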
Is it possible to order plots by some arbitrary order? Now the sub plots are ordered alphabetically. I do not know R but I understand that in ggplot2 you can solve the problem by ordering "levels" that correspond to pandas categoricals.
Thanks and amazing package!
geom_bar() along with ylim(), or lims(y=()) is broken. If limit does not include 0 in its range, we get a blank plot. Here is one example:
df = pd.DataFrame({
'variable': ['gender', 'gender', 'age', 'age', 'age', 'income', 'income', 'income', 'income'],
'category': ['Female', 'Male', '1-24', '25-54', '55+', 'Lo', 'Lo-Med', 'Med', 'High'],
'value': [60, 40, 50, 30, 20, 10, 25, 25, 40],
})
df['variable'] = pd.Categorical(df['variable'], categories=['gender', 'age', 'income'])
dodge_text = position_dodge(width=0.9) # new
# Works
(ggplot(df, aes(x='variable', y='value', fill='category'))
+ geom_bar(stat='identity', position='dodge', show_legend=False) # modified
)
# Works
(ggplot(df, aes(x='variable', y='value', fill='category'))
+ geom_bar(stat='identity', position='dodge', show_legend=False) # modified
+ lims(y=(-5, 60))
)
# Does NOT work
(ggplot(df, aes(x='variable', y='value', fill='category'))
+ geom_bar(stat='identity', position='dodge', show_legend=False) # modified
+ lims(y=(15, 60))
)
How can I achieve this ggplot2 functionality in plotnine?
ggplot() +
geom_line(data=df, aes(x=date, y=value)) +
scale_x_date(date_breaks = "1 week")
Right now, my x-axis defaults to hundreds of dates smashed together, which is completely unreadable.
Thanks!
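One possible direction, sketched under assumptions: the weekly break positions can be computed with pandas and handed to the datetime scale. The data below is a made-up stand-in for the poster's df, and the plotnine lines are comments because I have not verified them against a particular plotnine version. The form I recall from the plotnine examples is a breaks callable from mizani passed to scale_x_datetime.

```python
import pandas as pd

# Hypothetical daily series standing in for the poster's data.
df = pd.DataFrame({
    'date': pd.date_range('2018-01-01', periods=60, freq='D'),
    'value': range(60),
})

# One break every 7 days, starting at the first date in the data.
weekly_breaks = pd.date_range(df['date'].min(), df['date'].max(), freq='7D')
print(len(weekly_breaks), weekly_breaks[0], weekly_breaks[-1])

# Assumed plotnine usage (not run here):
#   from mizani.breaks import date_breaks
#   ggplot(df, aes(x='date', y='value')) + geom_line() \
#       + scale_x_datetime(breaks=date_breaks('1 week'))
# or, with the explicit positions computed above:
#       + scale_x_datetime(breaks=list(weekly_breaks))
```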
I've run into some issues with smoothing over data with numeric columns for colors.
Smoothing (and potentially other methods) seems to run into problems related to DISCRETE_COLUMNS.
I've attached a small notebook printout detailing the failure case.
How can I reorder the bars of a bar chart according to their values?
Example
df = pd.DataFrame([('a', 1), ('b', 20), ('c', 5)], columns=['category', 'value'])
ggplot(df, aes('category', 'value')) + geom_bar(stat='identity')
The bars are ordered a, b, c. Instead, I want them ordered by value, i.e. b, c, a.
Here is a thread on how this is achieved in R, but my question is how this can be done in plotnine:
https://stackoverflow.com/questions/33613385/sort-bar-chart-by-sum-of-values-in-ggplot
Is there an equivalent reorder function?
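A minimal sketch of one way to get that order, assuming plotnine draws a categorical x axis in category order: sort the labels by value in pandas and make the column an ordered categorical. This is a pandas workaround, not necessarily an equivalent of R's reorder. The plot call from the question is kept as a comment.

```python
import pandas as pd

df = pd.DataFrame([('a', 1), ('b', 20), ('c', 5)], columns=['category', 'value'])

# Labels sorted by descending value.
order = df.sort_values('value', ascending=False)['category'].tolist()

# Store that order in the column itself as an ordered categorical.
df['category'] = pd.Categorical(df['category'], categories=order, ordered=True)

# The original plot call is then unchanged:
#   ggplot(df, aes('category', 'value')) + geom_bar(stat='identity')
```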
(ggplot(diamonds)
+ aes('depth', fill='cut', color='cut')
+ geom_density(alpha=0.1)
+ xlim(55, 70)
)
The sizing of the strips with the labels above the plots in facet_wrap is inconsistent. If I facet_wrap a plot with just two rows, I get what seem to me to be overly narrow strips such that the label text is at the edges of the strip. I can increase the strip height with + theme(strip_text=element_text(lineheight=1.8)). But then the strips become far too tall when there are more rows in the faceted plot.
I have played around with this some, and I think the ultimate problem is that the way that the height of the strip text is computed must depend on the total height of the faceted plots, such that plots with more rows end up with taller strip texts.
Let me know if you want me to post example images.
I haven't really been able to figure out why, but the fill in ribbon plots in faceted charts doesn't seem to work.
Whenever I try to use the alpha argument to change transparency, especially with geom_point, transparency is left unchanged. Could be a bug?
Seaborn has the nice feature that you can extract matplotlib Axes objects from a FacetGrid, which allows you to make low-level adjustments in matplotlib if necessary.
Is something like this possible in plotnine?
The 'alpha' transparency parameter seems to be currently ignored with 'geom_smooth'.
For 'geom_line' it seems to work as expected.
Example:
from plotnine import *
import pandas as pd
pdat = pd.DataFrame({'x': range(10), 'y': range(10)})
p = (ggplot(pdat, aes(x='x', y='y'))+
geom_smooth(alpha=0.1))
p.draw()
p = (ggplot(pdat, aes(x='x', y='y'))+
geom_line(alpha=0.1))
p.draw()
Thanks for your work! I have used plotnine in Kaggle kernels and it seems quite good.
However, does geom_histogram support the origin parameter now? For example:
ggplot(train[~train['Age'].isnull()], aes('Age', fill='factor(Survived)')) \
+ geom_histogram(binwidth=1, alpha=0.5, position='identity', origin=0) \
+ scale_fill_manual(values=[ns_color, s_color])
will give the following error.
---------------------------------------------------------------------------
PlotnineError Traceback (most recent call last)
<ipython-input-29-d387ec79f952> in <module>()
1 # fill can be used to group variables. The fill (grouping) variable must be categorical in pandas.
2 # Otherwise, we should transform it with astype or simple just 'factor(variable)'
----> 3 ggplot(train[~train['Age'].isnull()], aes('Age', fill='factor(Survived)')) + geom_histogram(binwidth=1, alpha=0.5, position='identity', origin=0) + scale_fill_manual(values=[ns_color, s_color])
/opt/conda/lib/python3.6/site-packages/plotnine/geoms/geom.py in __init__(self, *args, **kwargs)
43 self._stat = stat.from_geom(self)
44 self._position = position.from_geom(self)
---> 45 self.verify_arguments(kwargs) # geom, stat, layer
46
47 @staticmethod
/opt/conda/lib/python3.6/site-packages/plotnine/geoms/geom.py in verify_arguments(self, kwargs)
249 msg = ("Parameters {}, are not understood by "
250 "either the geom, stat or layer.")
--> 251 raise PlotnineError(msg.format(unknown))
252
253 def handle_na(self, data):
PlotnineError: "Parameters {'origin'}, are not understood by either the geom, stat or layer."
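For context, ggplot2 deprecated origin in favour of boundary, and plotnine's stat_bin lists boundary and center among its parameters, so boundary=0 is the likely replacement (an assumption I have not run against this exact version). The numpy sketch below, with made-up ages, shows what aligning bins of width 1 to a boundary of 0 amounts to.

```python
import numpy as np

ages = np.array([0.4, 2.5, 2.9, 17.0, 22.3, 22.8, 35.1])  # made-up values
binwidth, boundary = 1, 0

# Bin edges sit at boundary + k * binwidth, so with boundary=0 every
# edge falls on a whole number.
lo = boundary + np.floor((ages.min() - boundary) / binwidth) * binwidth
hi = boundary + np.ceil((ages.max() - boundary) / binwidth) * binwidth
edges = np.arange(lo, hi + binwidth, binwidth)

counts, _ = np.histogram(ages, bins=edges)
print(edges[:4], counts[:4])

# Assumed plotnine spelling (not run here):
#   geom_histogram(binwidth=1, boundary=0, alpha=0.5, position='identity')
```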
I can create a plot that either facets or has geom_vline, but not both. Examples below:
# this works
g = (pn.ggplot(df, pn.aes('spread_pct'))
#+ pn.geom_vline(xintercept=0.5)
+ pn.geom_histogram()
+ pn.facet_wrap('~optionType'))
print(g)
# this works
g = (pn.ggplot(df, pn.aes('spread_pct'))
+ pn.geom_vline(xintercept=0.5)
+ pn.geom_histogram())
#+ pn.facet_wrap('~optionType'))
print(g)
# this doesn't work
g = (pn.ggplot(df, pn.aes('spread_pct'))
+ pn.geom_vline(xintercept=0.5)
+ pn.geom_histogram()
+ pn.facet_wrap('~optionType'))
print(g)
This produces a NameError, see the traceback below:
---------------------------------------------------------------------------
NameError Traceback (most recent call last)
<ipython-input-252-a98d15d74a1d> in <module>()
3 + pn.geom_histogram()
4 + pn.facet_wrap('~optionType'))
----> 5 print(g)
~/anaconda3/envs/pymc3/lib/python3.6/site-packages/plotnine/ggplot.py in __repr__(self)
82 Print/show the plot
83 """
---> 84 self.draw()
85 plt.show()
86 return '<ggplot: (%d)>' % self.__hash__()
~/anaconda3/envs/pymc3/lib/python3.6/site-packages/plotnine/ggplot.py in draw(self)
139 # assign a default theme
140 self = deepcopy(self)
--> 141 self._build()
142
143 # If no theme we use the default
~/anaconda3/envs/pymc3/lib/python3.6/site-packages/plotnine/ggplot.py in _build(self)
220 # Initialise panels, add extra data for margins & missing
221 # facetting variables, and add on a PANEL variable to data
--> 222 layout.setup(layers, self)
223
224 # Compute aesthetics to produce data with generalised
~/anaconda3/envs/pymc3/lib/python3.6/site-packages/plotnine/facets/layout.py in setup(self, layers, plot)
57 # Generate panel layout
58 data = self.facet.setup_data(data)
---> 59 self.layout = self.facet.compute_layout(data)
60 self.layout = self.coord.setup_layout(self.layout)
61 self.check_layout()
~/anaconda3/envs/pymc3/lib/python3.6/site-packages/plotnine/facets/facet_wrap.py in compute_layout(self, data)
71
72 base = combine_vars(data, self.plot.environment,
---> 73 self.vars, drop=self.drop)
74 n = len(base)
75 dims = wrap_dims(n, self.nrow, self.ncol)
~/anaconda3/envs/pymc3/lib/python3.6/site-packages/plotnine/facets/facet.py in combine_vars(data, environment, vars, drop)
517 # For each layer, compute the facet values
518 values = [eval_facet_vars(df, vars, environment)
--> 519 for df in data if df is not None]
520
521 # Form the base data frame which contains all combinations
~/anaconda3/envs/pymc3/lib/python3.6/site-packages/plotnine/facets/facet.py in <listcomp>(.0)
517 # For each layer, compute the facet values
518 values = [eval_facet_vars(df, vars, environment)
--> 519 for df in data if df is not None]
520
521 # Form the base data frame which contains all combinations
~/anaconda3/envs/pymc3/lib/python3.6/site-packages/plotnine/facets/facet.py in eval_facet_vars(data, vars, env)
628
629 for name in vars:
--> 630 res = env.eval(name, inner_namespace=data)
631 facet_vals[name] = res
632
~/anaconda3/envs/pymc3/lib/python3.6/site-packages/patsy/eval.py in eval(self, expr, source_name, inner_namespace)
164 code = compile(expr, source_name, "eval", self.flags, False)
165 return eval(code, {}, VarLookupDict([inner_namespace]
--> 166 + self._namespaces))
167
168 @classmethod
<string> in <module>()
NameError: name 'optionType' is not defined
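Reading the traceback, the facet variable is evaluated against each layer's data, and the geom_vline layer built from xintercept=0.5 presumably has no optionType column. An untested workaround, under that assumption, is to give geom_vline its own small data frame that carries the facet variable. The data below is a hypothetical stand-in for the poster's df.

```python
import pandas as pd

# Hypothetical stand-in for the poster's data.
df = pd.DataFrame({
    'spread_pct': [0.1, 0.4, 0.6, 0.9],
    'optionType': ['call', 'put', 'call', 'put'],
})

# One row per facet value, each with the same intercept.
vlines = pd.DataFrame({
    'optionType': df['optionType'].unique(),
    'xintercept': 0.5,
})
print(vlines)

# Assumed usage (not run here):
#   pn.ggplot(df, pn.aes('spread_pct'))
#   + pn.geom_vline(pn.aes(xintercept='xintercept'), data=vlines)
#   + pn.geom_histogram()
#   + pn.facet_wrap('~optionType')
```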
Think about inset plots.
Potential syntax
# 1.
p_inset = ggplot() + ...
p = ggplot() + ... + inset(p_inset)
# 2.
p_inset = ggplot(inset=True) + ...
p = ggplot() + ... + p_inset