Typically, whenever we are handed a new dataset or problem our first step is to begin to wrap our head around the problem and the data associated with it. This is where some of our previous tools such as the measure of center and dispersion will come in handy. From there, we can continue to dissect the problem by employing various techniques and algorithms to further analyze the dataset from various angles.
import pandas as pd
df = pd.read_csv('train.csv')
print(len(df))
df.head()
891
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked | |
---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | 0 | 3 | Braund, Mr. Owen Harris | male | 22.0 | 1 | 0 | A/5 21171 | 7.2500 | NaN | S |
1 | 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Th... | female | 38.0 | 1 | 0 | PC 17599 | 71.2833 | C85 | C |
2 | 3 | 1 | 3 | Heikkinen, Miss. Laina | female | 26.0 | 0 | 0 | STON/O2. 3101282 | 7.9250 | NaN | S |
3 | 4 | 1 | 1 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | female | 35.0 | 1 | 0 | 113803 | 53.1000 | C123 | S |
4 | 5 | 0 | 3 | Allen, Mr. William Henry | male | 35.0 | 0 | 0 | 373450 | 8.0500 | NaN | S |
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 891 entries, 0 to 890
Data columns (total 12 columns):
PassengerId 891 non-null int64
Survived 891 non-null int64
Pclass 891 non-null int64
Name 891 non-null object
Sex 891 non-null object
Age 714 non-null float64
SibSp 891 non-null int64
Parch 891 non-null int64
Ticket 891 non-null object
Fare 891 non-null float64
Cabin 204 non-null object
Embarked 889 non-null object
dtypes: float64(2), int64(5), object(5)
memory usage: 83.6+ KB
Along the diagonal are the distributions for each of our variables. Every other cell shows a correlation plot of one feature against another feature.
%matplotlib inline
pd.plotting.scatter_matrix(df, figsize=(10,10));
(Any appropriate graph works including pie chart or bar chart or other appropriate.)
#Your code here
#Your code here
#Your code here
- 0-12 years old
- 12-18 years old
- 18-25 years old
- 25-35 years old
- 35-45 years old
- 45-65 years old
- 65+ years old
#Your code here
#Your code here
# Your code here
# Your code here
# Your code here