ds-skills-seaborn-color-style-al's Introduction

Pandas Review

import pandas as pd

Previewing the file

df = pd.read_csv('cdc_death_stats.csv')
df.head()
   Notes    State  State Code  Ten-Year Age Groups  Ten-Year Age Groups Code  Gender  Gender Code                              Race  Race Code  Deaths  Population  Crude Rate
0    NaN  Alabama           1             < 1 year                         1  Female            F  American Indian or Alaska Native     1002-5      14      3579.0  Unreliable
1    NaN  Alabama           1             < 1 year                         1  Female            F         Asian or Pacific Islander       A-PI      24      7443.0       322.5
2    NaN  Alabama           1             < 1 year                         1  Female            F         Black or African American     2054-5    2093    169339.0      1236.0
3    NaN  Alabama           1             < 1 year                         1  Female            F                             White     2106-3    2144    347921.0       616.2
4    NaN  Alabama           1             < 1 year                         1    Male            M         Asian or Pacific Islander       A-PI      33      7366.0       448.0
type(df)
pandas.core.frame.DataFrame

Series

#A Series is pandas' name for a single column of a DataFrame

#Preview a column (Pandas Series)
df.State.head() #the .head() method works for Series as well!
0    Alabama
1    Alabama
2    Alabama
3    Alabama
4    Alabama
Name: State, dtype: object
#You can only use the above syntax if your column name has no spaces or special characters
#and does not clash with an existing DataFrame attribute or method.
#The syntax below always works.
df['State'].tail() #The general form for calling a column
4110    Wyoming
4111    Wyoming
4112    Wyoming
4113    Wyoming
4114    Wyoming
Name: State, dtype: object
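
To make the difference between the two column syntaxes concrete, here is a minimal sketch on a small made-up DataFrame (the values and the "count" column are hypothetical and do not come from cdc_death_stats.csv). It shows that both forms agree for a simple name, and that bracket access is the safe choice when a name has a space or clashes with a DataFrame method:

```python
import pandas as pd

# Toy data: hypothetical values, not taken from cdc_death_stats.csv
toy = pd.DataFrame({
    "State": ["Alabama", "Alaska"],
    "State Code": [1, 2],
    "count": [3, 5],
})

# For a simple column name, attribute and bracket access return the same Series
same = toy.State.equals(toy["State"])

# A name containing a space cannot be written as an attribute, so use brackets
state_codes = toy["State Code"]

# "count" is also a DataFrame method, so toy.count is the method, not the column
attribute_is_method = callable(toy.count)
count_column = toy["count"]

print(same, attribute_is_method)
print(count_column.tolist())
```

Because of cases like the last one, the bracket form is usually the safer habit in scripts.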

Subsetting the DataFrame

Retrieve Column Names of DataFrame

df.columns
Index(['Notes', 'State', 'State Code', 'Ten-Year Age Groups',
       'Ten-Year Age Groups Code', 'Gender', 'Gender Code', 'Race',
       'Race Code', 'Deaths', 'Population', 'Crude Rate'],
      dtype='object')

Subsetting the DataFrame by Columns

df[df.columns[1:4]].head()
     State  State Code  Ten-Year Age Groups
0  Alabama           1             < 1 year
1  Alabama           1             < 1 year
2  Alabama           1             < 1 year
3  Alabama           1             < 1 year
4  Alabama           1             < 1 year
cols = ['Notes', 'State', 'Population']
df[cols].head()
   Notes    State  Population
0    NaN  Alabama        3579
1    NaN  Alabama        7443
2    NaN  Alabama      169339
3    NaN  Alabama      347921
4    NaN  Alabama        7366
df[['Gender', 'Deaths']].head()
   Gender  Deaths
0  Female      14
1  Female      24
2  Female    2093
3  Female    2144
4    Male      33
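
One detail worth checking when subsetting by columns is the type you get back. The sketch below uses a small made-up DataFrame (hypothetical values) to show that a single label returns a Series, a list of labels returns a DataFrame, and that .loc offers an equivalent spelling:

```python
import pandas as pd

# Toy data: hypothetical values for illustration only
toy = pd.DataFrame({
    "Gender": ["Female", "Male"],
    "Deaths": [14, 33],
    "Population": [3579, 7366],
})

one_column = toy["Deaths"]          # single label -> Series
one_column_frame = toy[["Deaths"]]  # list with one label -> DataFrame

# .loc with "all rows" (:) and a list of labels selects the same columns
two_columns = toy.loc[:, ["Gender", "Deaths"]]

print(type(one_column).__name__, type(one_column_frame).__name__)
print(list(two_columns.columns))
```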

Subsetting Rows using Conditionals

#Only display data where the State column is New York and the Deaths column is greater than 50.
ny_50plus = df[(df['State']=='New York')
  & (df['Deaths']>50)]
print(len(df))
print(len(ny_50plus))
ny_50plus.head()
4115
82
      Notes     State  State Code  Ten-Year Age Groups  Ten-Year Age Groups Code  Gender  Gender Code                       Race  Race Code  Deaths  Population  Crude Rate
2606    NaN  New York          36             < 1 year                         1  Female            F  Asian or Pacific Islander       A-PI     485    168826.0       287.3
2607    NaN  New York          36             < 1 year                         1  Female            F  Black or African American     2054-5    3767    467735.0       805.4
2608    NaN  New York          36             < 1 year                         1  Female            F                      White     2106-3    6505   1456339.0       446.7
2610    NaN  New York          36             < 1 year                         1    Male            M  Asian or Pacific Islander       A-PI     626    179832.0       348.1
2611    NaN  New York          36             < 1 year                         1    Male            M  Black or African American     2054-5    4654    485909.0       957.8
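
The conditional above works by building a boolean mask, one True/False per row, and then indexing with it. The parentheses around each comparison matter because & binds more tightly than == and > in Python. Here is a minimal sketch on made-up rows (hypothetical values), with DataFrame.query shown as an alternative spelling of the same filter:

```python
import pandas as pd

# Toy data: hypothetical values for illustration only
toy = pd.DataFrame({
    "State": ["New York", "New York", "Ohio", "Ohio"],
    "Deaths": [485, 20, 75, 10],
})

# Each comparison yields a boolean Series; & combines them element-wise
mask = (toy["State"] == "New York") & (toy["Deaths"] > 50)
with_mask = toy[mask]

# The same filter written as a query string
with_query = toy.query("State == 'New York' and Deaths > 50")

print(mask.tolist())
print(len(toy), len(with_mask))
```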

Groupby

#Grouping by a single feature
grouped = df.groupby('State')['Deaths'].sum()
grouped.head()
State
Alabama        860780
Alaska          63334
Arizona        838094
Arkansas       522914
California    4307061
Name: Deaths, dtype: int64
#Grouping by multiple features and resetting the index
grouped = df.groupby(['Gender', 'Race'])['Deaths'].sum().reset_index()
grouped.head()
   Gender                              Race    Deaths
0  Female  American Indian or Alaska Native    120827
1  Female         Asian or Pacific Islander    417760
2  Female         Black or African American   2601979
3  Female                             White  19427767
4    Male  American Indian or Alaska Native    145492
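
To see what reset_index() is doing here, it helps to look at the result before and after. Grouping by two features produces a Series with a two-level index; resetting turns those levels back into ordinary columns. The sketch below uses made-up rows (hypothetical values) and also shows as_index=False, which gives the flat form directly:

```python
import pandas as pd

# Toy data: hypothetical values for illustration only
toy = pd.DataFrame({
    "Gender": ["Female", "Female", "Male", "Male"],
    "Race": ["White", "White", "White", "Asian or Pacific Islander"],
    "Deaths": [10, 20, 5, 7],
})

# Grouping by two columns gives a Series indexed by (Gender, Race)
indexed = toy.groupby(["Gender", "Race"])["Deaths"].sum()

# reset_index() moves the group keys back into regular columns
flat = indexed.reset_index()

# as_index=False skips the intermediate indexed form
flat_direct = toy.groupby(["Gender", "Race"], as_index=False)["Deaths"].sum()

print(indexed.index.nlevels)
print(list(flat.columns))
print(flat["Deaths"].tolist())
```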

Matplotlib Review

Thus far we've primarily worked with the pyplot module within matplotlib.
Also recall the IPython magic command for displaying graphs inline within notebooks:

import matplotlib.pyplot as plt
%matplotlib inline

A simple plot

# df.Population = df.Population.astype(int)
to_plot = df.groupby('State').Deaths.sum().sort_values(ascending=False)
to_plot.head(2)
State
California    4307061
Florida       3131111
Name: Deaths, dtype: int64
to_plot.head(10).plot(kind='barh')
<matplotlib.axes._subplots.AxesSubplot at 0x10da3d198>

[Figure: horizontal bar chart of total deaths for the ten states with the most deaths]
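
The AxesSubplot line printed above is the matplotlib Axes object that .plot() returns. Capturing it lets you add labels and a title afterwards. The following is a minimal sketch on made-up totals (hypothetical values); the Agg backend line is only there so the code also runs outside a notebook and can be dropped when using %matplotlib inline:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this runs outside a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Toy totals: hypothetical values for illustration only
totals = pd.Series({"California": 30, "Florida": 20, "Texas": 10}, name="Deaths")

fig, ax = plt.subplots()

# barh draws the first row at the bottom, so sort ascending to put the largest bar on top
totals.sort_values().plot(kind="barh", ax=ax)

# The Axes object is where labels and titles are set
ax.set_xlabel("Deaths")
ax.set_title("Deaths by state (toy data)")
```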

Seaborn

Another very useful package that sits on top of matplotlib is seaborn. Seaborn helps with figure aesthetics and gives your graphs better default styling.

import seaborn as sns

Seaborn styles

One easy thing to do is change the figure aesthetic of all future graphs. You can do this by setting a seaborn style with one line:

sns.set_style('darkgrid')

Then simply rerun our previous code:

to_plot.head(10).plot(kind='barh')
<matplotlib.axes._subplots.AxesSubplot at 0x1a1aeb1710>

[Figure: the same bar chart redrawn with the seaborn darkgrid style]

Voila! Notice that nice background thanks to our seaborn style!
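
Since sns.set_style changes every later figure, it can be handy to apply a style to just one plot instead. A minimal sketch of that, assuming seaborn's axes_style context manager and using made-up bar values (hypothetical): the style inside the with block is temporary, and the global darkgrid style is restored afterwards.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this runs outside a notebook
import matplotlib.pyplot as plt
import seaborn as sns

sns.set_style("darkgrid")                    # global: applies to all later figures
grid_globally = plt.rcParams["axes.grid"]    # darkgrid turns the grid on

with sns.axes_style("white"):                # temporary: only inside this block
    grid_inside = plt.rcParams["axes.grid"]  # the white style has no grid
    fig, ax = plt.subplots()
    ax.barh(["a", "b", "c"], [1, 2, 3])      # toy values for illustration

grid_after = plt.rcParams["axes.grid"]       # darkgrid is in effect again

print(grid_globally, grid_inside, grid_after)
```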

Seaborn Color Palettes

Another nice feature is color palettes! Here are a few examples:

current_palette = sns.color_palette() #Save a color palette to a variable
sns.palplot(current_palette) #Preview color palette

[Figure: swatches of the current default color palette]

sns.palplot(sns.color_palette("Paired"))

[Figure: swatches of the "Paired" color palette]

sns.palplot(sns.color_palette("Blues"))

[Figure: swatches of the "Blues" color palette]

And there are many, many more! For a more complete description of the color palettes available in seaborn, check out the documentation: https://seaborn.pydata.org/tutorial/color_palettes.html

Applying a color palette to our previous example:

color_palette = sns.color_palette("RdBu_r", 10) #The number represents how many colors you want
to_plot.head(10).plot(kind='barh', color = color_palette)
<matplotlib.axes._subplots.AxesSubplot at 0x1a1b4e38d0>

[Figure: the same bar chart with bars colored from the "RdBu_r" palette]
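
It can help to look at what a palette actually is: a list of (red, green, blue) tuples with values between 0 and 1, one per requested color. The sketch below uses made-up labels and values (hypothetical) and passes the palette to matplotlib's own barh, which is a swapped-in alternative to the pandas .plot call above, so that each bar takes one color from the list:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this runs outside a notebook
import matplotlib.pyplot as plt
import seaborn as sns

# Ask for exactly as many colors as there are bars
palette = sns.color_palette("RdBu_r", 3)
hex_codes = palette.as_hex()  # the same colors as "#rrggbb" strings

# Toy data: hypothetical values for illustration only
labels = ["California", "Florida", "Texas"]
values = [30, 20, 10]

fig, ax = plt.subplots()
bars = ax.barh(labels, values, color=list(palette))  # one color per bar

print(len(palette))
print(list(hex_codes))
```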

ds-skills-seaborn-color-style-al's People

Contributors

mathymitchell
