Pandas Review

import pandas as pd

Previewing the file

df = pd.read_csv('cdc_death_stats.csv')
df.head()

	Notes	State	State Code	Ten-Year Age Groups	Ten-Year Age Groups Code	Gender	Gender Code	Race	Race Code	Deaths	Population	Crude Rate
0	NaN	Alabama	1	< 1 year	1	Female	F	American Indian or Alaska Native	1002-5	14	3579.0	Unreliable
1	NaN	Alabama	1	< 1 year	1	Female	F	Asian or Pacific Islander	A-PI	24	7443.0	322.5
2	NaN	Alabama	1	< 1 year	1	Female	F	Black or African American	2054-5	2093	169339.0	1236.0
3	NaN	Alabama	1	< 1 year	1	Female	F	White	2106-3	2144	347921.0	616.2
4	NaN	Alabama	1	< 1 year	1	Male	M	Asian or Pacific Islander	A-PI	33	7366.0	448.0

type(df)

pandas.core.frame.DataFrame

Series

#A Series is pandas' name for a single column of a DataFrame

#Preview a column (Pandas Series)
df.State.head() #the .head() method works for Series as well!

0    Alabama
1    Alabama
2    Alabama
3    Alabama
4    Alabama
Name: State, dtype: object

#The attribute syntax above only works if the column name has no spaces or special characters
#and does not clash with an existing DataFrame attribute or method. The syntax below always works.
df['State'].tail() #The general form for calling a column

4110    Wyoming
4111    Wyoming
4112    Wyoming
4113    Wyoming
4114    Wyoming
Name: State, dtype: object
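To make the difference between the two column-access styles concrete, here is a minimal sketch. The `toy` DataFrame and its values are made up for illustration and only stand in for the CDC file; the column named `count` is a hypothetical example of a name that clashes with a DataFrame method.

```python
import pandas as pd

# Toy data standing in for the CDC file; values are made up for illustration.
toy = pd.DataFrame({
    "State": ["Alabama", "Alaska"],
    "State Code": [1, 2],
    "count": [10, 20],  # hypothetical column that clashes with DataFrame.count
})

state_attr = toy.State      # attribute access: fine for a plain identifier
state_item = toy["State"]   # bracket access: always works
codes = toy["State Code"]   # name contains a space, so brackets are required
counts = toy["count"]       # toy.count would be the method, not the column

print(state_attr.equals(state_item))  # both styles return the same Series
print(callable(toy.count))            # the attribute is the method here
```

When in doubt, the bracket form is the safer default, since it never depends on how the column happens to be named.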

Subsetting the DataFrame

Retrieve Column Names of DataFrame

df.columns

Index(['Notes', 'State', 'State Code', 'Ten-Year Age Groups',
       'Ten-Year Age Groups Code', 'Gender', 'Gender Code', 'Race',
       'Race Code', 'Deaths', 'Population', 'Crude Rate'],
      dtype='object')

Subsetting the DataFrame by Columns

df[df.columns[1:4]].head()

	State	State Code	Ten-Year Age Groups
0	Alabama	1	< 1 year
1	Alabama	1	< 1 year
2	Alabama	1	< 1 year
3	Alabama	1	< 1 year
4	Alabama	1	< 1 year
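Slicing `df.columns` selects columns by position. As a rough sketch of equivalent spellings, the example below uses a small made-up `toy` DataFrame (not the CDC data) and compares the slice with `.iloc` and `.loc`.

```python
import pandas as pd

# Toy data with made-up values; only the column layout matters here.
toy = pd.DataFrame({
    "Notes": [None, None],
    "State": ["Alabama", "Alaska"],
    "State Code": [1, 2],
    "Gender": ["Female", "Male"],
    "Deaths": [14, 33],
})

by_slice = toy[toy.columns[1:4]]         # columns at positions 1, 2 and 3
by_iloc = toy.iloc[:, 1:4]               # same selection, purely positional
by_label = toy.loc[:, "State":"Gender"]  # label slices include the end label

print(list(by_slice.columns))
```

Note that the positional slice `1:4` excludes position 4, while the label slice `"State":"Gender"` includes `Gender`.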

cols = ['Notes', 'State', 'Population']
df[cols].head()

	Notes	State	Population
0	NaN	Alabama	3579
1	NaN	Alabama	7443
2	NaN	Alabama	169339
3	NaN	Alabama	347921
4	NaN	Alabama	7366

df[['Gender', 'Deaths']].head()

	Gender	Deaths
0	Female	14
1	Female	24
2	Female	2093
3	Female	2144
4	Male	33

Subsetting Rows using Conditionals

#Only keep rows where the State column is New York and the Deaths column is greater than 50.
ny_50plus = df[(df['State']=='New York')
  & (df['Deaths']>50)]

print(len(df))
print(len(ny_50plus))
ny_50plus.head()

4115
82

	Notes	State	State Code	Ten-Year Age Groups	Ten-Year Age Groups Code	Gender	Gender Code	Race	Race Code	Deaths	Population	Crude Rate
2606	NaN	New York	36	< 1 year	1	Female	F	Asian or Pacific Islander	A-PI	485	168826.0	287.3
2607	NaN	New York	36	< 1 year	1	Female	F	Black or African American	2054-5	3767	467735.0	805.4
2608	NaN	New York	36	< 1 year	1	Female	F	White	2106-3	6505	1456339.0	446.7
2610	NaN	New York	36	< 1 year	1	Male	M	Asian or Pacific Islander	A-PI	626	179832.0	348.1
2611	NaN	New York	36	< 1 year	1	Male	M	Black or African American	2054-5	4654	485909.0	957.8
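The conditional filter above is built from boolean Series. The sketch below splits it into steps on a small made-up `toy` DataFrame; the `query` call is shown as one possible alternative spelling, not as the method used in this lesson.

```python
import pandas as pd

# Toy data with made-up values for illustration.
toy = pd.DataFrame({
    "State": ["New York", "New York", "Alabama", "New York"],
    "Deaths": [485, 12, 2093, 51],
})

is_ny = toy["State"] == "New York"  # boolean Series, one value per row
over_50 = toy["Deaths"] > 50

ny_50plus = toy[is_ny & over_50]    # & is the element-wise AND
same_rows = toy.query("State == 'New York' and Deaths > 50")

print(ny_50plus.index.tolist())     # the original row labels are kept
```

When the conditions are written inline, each one needs its own parentheses because `&` binds more tightly than `==` and `>`. The filtered result keeps the original index labels, which is why the output above starts at row 2606 rather than 0.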

Groupby

#Grouping by a single feature
grouped = df.groupby('State')['Deaths'].sum()
grouped.head()

State
Alabama        860780
Alaska          63334
Arizona        838094
Arkansas       522914
California    4307061
Name: Deaths, dtype: int64

#Grouping by multiple features and resetting the index
grouped = df.groupby(['Gender', 'Race'])['Deaths'].sum().reset_index()
grouped.head()

	Gender	Race	Deaths
0	Female	American Indian or Alaska Native	120827
1	Female	Asian or Pacific Islander	417760
2	Female	Black or African American	2601979
3	Female	White	19427767
4	Male	American Indian or Alaska Native	145492
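To show what `reset_index()` changes, here is a small sketch on made-up data. The `toy` DataFrame and its shortened category labels are hypothetical; `as_index=False` is shown as an alternative that should give the same flat layout in one step.

```python
import pandas as pd

# Toy data with made-up values and shortened labels for illustration.
toy = pd.DataFrame({
    "Gender": ["Female", "Female", "Male", "Male", "Female"],
    "Race": ["White", "White", "White", "Asian", "Asian"],
    "Deaths": [10, 5, 7, 3, 2],
})

# Without reset_index, the group keys become a two-level index on a Series.
as_series = toy.groupby(["Gender", "Race"])["Deaths"].sum()

# reset_index moves the keys back into ordinary columns of a DataFrame.
as_frame = as_series.reset_index()

# as_index=False keeps the keys as columns from the start.
also_frame = toy.groupby(["Gender", "Race"], as_index=False)["Deaths"].sum()

print(as_series.loc[("Female", "White")])
print(list(as_frame.columns))
```

The flat DataFrame form is usually easier to filter, merge, or pass to plotting functions than the multi-level Series.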

Matplotlib Review

Thus far we've primarily worked with the pyplot module within matplotlib.
Also recall the IPython magic command for displaying graphs within notebooks:

import matplotlib.pyplot as plt
%matplotlib inline

A simple plot

# df.Population = df.Population.astype(int)
to_plot = df.groupby('State').Deaths.sum().sort_values(ascending=False)
to_plot.head(2)

State
California    4307061
Florida       3131111
Name: Deaths, dtype: int64

to_plot.head(10).plot(kind='barh')

<matplotlib.axes._subplots.AxesSubplot at 0x10da3d198>
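The pandas `.plot` call returns a matplotlib Axes, so it can be labeled like any other plot. The sketch below assumes a three-state subset built from totals shown in the outputs above; the `Agg` backend line is only there so the code also runs outside a notebook and is not needed with `%matplotlib inline`.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; unnecessary inside a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Three state totals copied from the outputs above.
totals = pd.Series(
    {"California": 4307061, "Florida": 3131111, "Alabama": 860780},
    name="Deaths",
)

fig, ax = plt.subplots()
# An ascending sort puts the largest bar at the top of a horizontal bar chart.
totals.sort_values().plot(kind="barh", ax=ax)
ax.set_xlabel("Deaths")
ax.set_title("Total deaths by state")
```

Passing `ax=ax` is a design choice: it keeps a handle on the Axes so titles and labels can be added afterwards.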

Seaborn

Another very useful package that sits on top of matplotlib is seaborn. Seaborn helps with figure aesthetics and gives your graphs better default styling.

import seaborn as sns

Seaborn styles

One easy thing to do is change the figure aesthetic of all future graphs. You can do this by setting a seaborn style with one line:

sns.set_style('darkgrid')

Then simply rerunning our previous code:

to_plot.head(10).plot(kind='barh')

<matplotlib.axes._subplots.AxesSubplot at 0x1a1aeb1710>

Voila! Notice that nice background thanks to our seaborn style!
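`sns.set_style` changes the style for every later figure. If you only want a style for one plot, one option is `sns.axes_style`, which can be used as a context manager. This is a minimal sketch; the choice of `'whitegrid'` is just an example, and the `Agg` backend line is only needed outside a notebook.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; unnecessary inside a notebook
import matplotlib.pyplot as plt
import seaborn as sns

# axes_style returns the style as a dict of matplotlib settings.
style = sns.axes_style("darkgrid")
print(style["axes.grid"])

# Used in a with-block, the style applies only to figures created inside it.
with sns.axes_style("whitegrid"):
    fig, ax = plt.subplots()
    ax.plot([1, 2, 3], [1, 4, 9])
```

The built-in style names are `darkgrid`, `whitegrid`, `dark`, `white`, and `ticks`.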

Seaborn Color Palettes

Another nice feature is color palettes! Here are a few examples:

current_palette = sns.color_palette() #Save a color palette to a variable
sns.palplot(current_palette) #Preview color palette

sns.palplot(sns.color_palette("Paired"))

sns.palplot(sns.color_palette("Blues"))

And there are many more! For a more complete description of the color palettes available in seaborn, check out the documentation here: https://seaborn.pydata.org/tutorial/color_palettes.html

Applying a color palette to our previous example:

color_palette = sns.color_palette("RdBu_r", 10) #The number represents how many colors you want
to_plot.head(10).plot(kind='barh', color = color_palette)

<matplotlib.axes._subplots.AxesSubplot at 0x1a1b4e38d0>
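It can help to look at what a palette actually contains before passing it to `color=`. The following sketch inspects the same `"RdBu_r"` palette; the variable names are illustrative, and the `Agg` backend line is only needed outside a notebook.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; unnecessary inside a notebook
import seaborn as sns

palette = sns.color_palette("RdBu_r", 10)  # a list of 10 (r, g, b) tuples
hex_codes = palette.as_hex()               # the same colors as '#rrggbb' strings

print(len(palette))
print(palette[0])
```

Because the palette is an ordinary list of colors, requesting 10 colors gives one color per bar in the 10-bar chart above.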
