The purpose of this project is to analyze data on GDP and life expectancy from the World Health Organization and the World Bank to try to identify the relationship between the GDP and life expectancy of six countries.
Files: life_expectancy
For my data mining I used two sources:
- GDP source: World Bank
- Life expectancy data source: World Health Organization
Six countries are represented in the data: Chile, China, Germany, Mexico, the US, and Zimbabwe. Looking over the data, the column names are inconsistent. Life expectancy at birth (years) is descriptive but long, so I changed that column name to LEABY.
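The rename step can be sketched with pandas as follows. This is a minimal illustration, not the project's actual code: the rows are placeholder values rather than the real dataset, and the `Country`, `Year`, and `GDP` column names are assumptions based on the description above.

```python
import pandas as pd

# Placeholder rows, NOT the real dataset. Column names other than
# "Life expectancy at birth (years)" are assumed for illustration.
df = pd.DataFrame({
    "Country": ["Country A", "Country B"],
    "Year": [2000, 2000],
    "Life expectancy at birth (years)": [75.0, 50.0],
    "GDP": [1.0e11, 1.0e10],
})

# Rename the long column to the short label used in the rest of the analysis.
df = df.rename(columns={"Life expectancy at birth (years)": "LEABY"})

print(df.columns.tolist())
```

`rename` returns a new DataFrame, so the result is assigned back to `df` instead of relying on in-place modification.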
After the initial exploration, I analyzed the data in more detail.
The plot below shows the distribution of GDP. The distribution is strongly right-skewed: most of the values are on the left-hand side, with a long tail to the right.
Next, I examined the distribution of LEABY. It is strongly left-skewed: most of the values are on the right-hand side. This is almost the opposite of what was observed in the GDP column.
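Skew can also be checked numerically rather than only by eye. The sketch below uses pandas' `Series.skew()`, which is positive for a right skew and negative for a left skew. The two series are made-up values chosen only to show the two shapes; they are not taken from the dataset.

```python
import pandas as pd

# Made-up values that only illustrate the two shapes; not real data.
right_skewed = pd.Series([1, 1, 2, 2, 3, 20])       # long tail to the right
left_skewed = pd.Series([40, 75, 76, 77, 78, 79])   # long tail to the left

# Series.skew() returns the sample skewness:
# positive for a right skew, negative for a left skew.
gdp_like_skew = right_skewed.skew()
leaby_like_skew = left_skewed.skew()

print(gdp_like_skew, leaby_like_skew)
```

On the real data, the same call would be applied to the GDP and LEABY columns.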
Next, I broke the data up by country to find the average LEABY and GDP for each one. For life expectancy, all of the countries except Zimbabwe have values in the mid-to-high 70s. This probably explains the skew in the distribution from before!
For the average GDP by country, the US has a much higher value than the rest of the countries. In this bar plot, Zimbabwe is not even visible and Chile is just barely seen. China, Germany, and Mexico are relatively close to one another.
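The per-country averages described above can be computed with a `groupby`. This is a sketch under assumptions: the country labels and numbers are placeholders, and the column names `Country`, `LEABY`, and `GDP` follow the naming used in this write-up.

```python
import pandas as pd

# Placeholder rows, not the real dataset.
df = pd.DataFrame({
    "Country": ["A", "A", "B", "B"],
    "LEABY": [78.0, 80.0, 44.0, 56.0],
    "GDP": [1.0e11, 3.0e11, 4.0e9, 6.0e9],
})

# One row per country, with the mean of each numeric column.
means = df.groupby("Country")[["LEABY", "GDP"]].mean()

print(means)
```

The resulting table has one row per country and can be passed directly to a bar plot.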
Another way to compare the data is to visualize the distribution for each country and look for patterns in the shapes. Below, country is on the x-axis and the distributions of the numeric columns, GDP and LEABY, are on the y-axis. In the GDP plot on the left, China and the US have a relatively wide range, whereas Zimbabwe, Chile, and Mexico have shorter ranges. In the LEABY plot, most of the countries have short ranges, except for Zimbabwe, whose range spans from the high 30s to the high 60s.
Swarm plots are another way to show distributions, and they can complement box and violin plots. First the stand-alone swarm plot is shown, and then it is overlaid on a violin plot.
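The ranges read off the plots can be cross-checked with a small aggregation. The sketch below computes the minimum, maximum, and range of LEABY per country; the values and country labels are placeholders, not the real data.

```python
import pandas as pd

# Placeholder rows, not the real dataset.
df = pd.DataFrame({
    "Country": ["A", "A", "A", "B", "B", "B"],
    "LEABY": [78.0, 79.5, 81.0, 44.0, 52.0, 60.0],
})

# Minimum and maximum per country, then the width of that interval.
spread = df.groupby("Country")["LEABY"].agg(["min", "max"])
spread["range"] = spread["max"] - spread["min"]

print(spread)
```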
In the case of the GDP plot on the left, Chile and Zimbabwe have a vertical line of dots that illustrates the number of data points that fall around their values. This detail would have been lost in the box plot, unless the reader is very adept at data visualizations.
Next, I explored GDP and LEABY over the years through line charts. Below, the countries are separated by color, and one can see that the US and China saw substantial gains between 2000 and 2015. China's GDP grew roughly tenfold, from a little over one trillion dollars to about eleven trillion dollars, in that time span. The rest of the countries did not see increases of this magnitude.
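To put a number on "increase" for each country, one option is to compare the first and last year's GDP. This is a sketch, assuming a `Year` column alongside `Country` and `GDP`; the rows are placeholders and the `growth_factor` name is introduced here only for illustration.

```python
import pandas as pd

# Placeholder rows, not the real dataset.
df = pd.DataFrame({
    "Country": ["A", "A", "A", "B", "B", "B"],
    "Year": [2000, 2008, 2015, 2000, 2008, 2015],
    "GDP": [100.0, 400.0, 1000.0, 200.0, 250.0, 300.0],
})

# Sort so that "first" and "last" correspond to the earliest and latest year.
ordered = df.sort_values(["Country", "Year"])
endpoints = ordered.groupby("Country")["GDP"].agg(["first", "last"])

# Ratio of final to initial GDP (hypothetical helper column).
endpoints["growth_factor"] = endpoints["last"] / endpoints["first"]

print(endpoints)
```

A ratio makes countries with very different GDP levels comparable, which the shared y-axis of a single line chart does not.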
Faceted line charts by country show a different aspect of the data. In the individual plots, each country has its own y-axis, which makes it easier to compare the shape of GDP over the years without forcing every country onto the same scale. In the combined chart, the other countries' GDP growth looked modest compared to China and the US, but all of the countries did experience growth from the year 2000.
The charts below show life expectancy over the years. It is evident that every country has been increasing their life expectancy, but Zimbabwe has seen the greatest increase after a bit of a dip around 2004.
After breaking out life expectancy by country, it is apparent that Chile and Mexico had dips in their life expectancy around the same time, which could be looked into further. Also, the seemingly linear changes were in reality not as smooth for some of the countries.
The next step was to explore the relationship between GDP and LEABY. The chart below is consistent with the previous charts: GDP for Zimbabwe stays flat while its life expectancy goes up. The other countries exhibit a rise in life expectancy as GDP goes up. The US and China seem to have very similar slopes in their relationship between GDP and life expectancy.
Looking at the individual countries, most of them, such as the US, Mexico, and Zimbabwe, have linear relationships between GDP and life expectancy. China, on the other hand, has a slightly exponential curve, and Chile's looks a bit logarithmic. In general, though, one can see GDP and life expectancy increasing together, exhibiting a positive correlation.
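The positive correlation can be quantified per country with the Pearson coefficient, which is the default for pandas' `Series.corr`. The sketch below uses placeholder values and country labels, not the real data. Pearson correlation measures linear association only, so it would not distinguish the exponential-looking and logarithmic-looking curves mentioned above.

```python
import pandas as pd

# Placeholder rows, not the real dataset.
df = pd.DataFrame({
    "Country": ["A"] * 4 + ["B"] * 4,
    "GDP": [1.0, 2.0, 3.0, 4.0, 1.0, 2.0, 3.0, 4.0],
    "LEABY": [70.0, 72.0, 74.0, 76.0, 50.0, 49.0, 53.0, 52.0],
})

# Pearson correlation between GDP and LEABY, computed within each country.
correlations = {
    country: group["GDP"].corr(group["LEABY"])
    for country, group in df.groupby("Country")
}

print(correlations)
```

Values close to 1 indicate a strong positive linear relationship; values near 0 indicate little linear relationship.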
- Life expectancy increased over time in all six nations, with Zimbabwe having the greatest increase.
- GDP has increased for all countries in our list, especially for China.
- There is a positive correlation between GDP and life expectancy for countries in our list.
- Average life expectancy was in the mid-to-high 70s for all of the countries except Zimbabwe, where it was 50.
- The distribution of life expectancy was left-skewed, meaning most of the observations were on the right side.