yudhisteer / 100-data-viz

This repository explores fun datasets to extract insights and to choose meaningful data visualizations for storytelling.

Jupyter Notebook 74.64% R 1.72% HTML 23.39% CSS 0.02% JavaScript 0.24%
analytics business-analytics data-analysis data-science data-visualization insights

100-data-viz's Issues

A brief history of police shootings

Today I explore data on U.S. police shootings from 2015–2022. It is based on the Washington Post’s database, which records every fatal shooting in the United States by a police officer in the line of duty since Jan. 1, 2015.

For each person killed, the dataset records their name, age, race, and gender; whether they were fleeing; whether they were armed; their threat level; and whether they had mental health issues. It also records how the officer used force (shot, or shot and Tasered), the date and location of the incident, and whether the officer had a body camera on.

The purpose of this analysis is to be able to answer the questions below:

  • Has the number of fatal shootings by U.S. police decreased over the years?
  • Are young people more likely to be shot than old people? Is it the same for all genders?
  • Did the assailants really mean harm, such that the officer was justified in shooting in self-defense?
  • How is race related to these shootings?
  • Do Black Americans get killed more than White Americans?

Has the number of fatal shootings by U.S. police decreased over the years?

The simple answer is no! The bar chart shows that the number of fatal shootings has stayed nearly constant over the years. In fact, the highest number of deaths in a single month occurred within the last 2 years, and in 2021 alone, 1,055 people were killed.

Image

I inverted the bar chart, rounded the corners, and used red color to represent blood dripping. The inspiration comes from Simon Scarr’s amazing graph on Iraq’s bloody toll.
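The monthly totals behind a chart like this take only a few lines of standard-library Python. This is a minimal sketch, not the author's notebook code: it assumes the Washington Post CSV has a `date` column in `YYYY-MM-DD` format, and the helper names (`load_dates`, `monthly_counts`, `yearly_counts`, `peak_month`) are hypothetical.

```python
import csv
from collections import Counter
from datetime import date


def load_dates(path, column="date"):
    """Read incident dates from a CSV file (column name and ISO format are assumptions)."""
    with open(path, newline="", encoding="utf-8") as f:
        return [date.fromisoformat(row[column]) for row in csv.DictReader(f)]


def monthly_counts(dates):
    """Number of fatal shootings per (year, month): one bar per month."""
    return Counter((d.year, d.month) for d in dates)


def yearly_counts(dates):
    """Number of fatal shootings per calendar year."""
    return Counter(d.year for d in dates)


def peak_month(dates):
    """The (year, month) with the most shootings, paired with its count."""
    return monthly_counts(dates).most_common(1)[0]
```

Negating the monthly counts before plotting is one simple way to make the bars hang downward like the inverted chart.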

Are young people more likely to be shot than old people? Is it the same for all genders?

The dataset is not balanced by gender: many more males than females were killed.

Image

Image

However, age shows almost no difference between genders. The median age of the victims is 35 for males and 36 for females, as indicated by the dashed lines, and more than half of the victims were between 20 and 40 years old.
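A minimal sketch of the two summary numbers in this paragraph, using only the standard library. It assumes each record is a `(gender, age)` pair with `None` for a missing age; the function names are hypothetical, and treating the 20 to 40 range as inclusive is my assumption.

```python
from statistics import median


def median_age_by_gender(records):
    """Median age per gender from (gender, age) pairs, skipping missing ages."""
    ages = {}
    for gender, age in records:
        if age is None:  # missing ages are dropped rather than imputed
            continue
        ages.setdefault(gender, []).append(age)
    return {g: median(a) for g, a in ages.items()}


def share_in_range(ages, low=20, high=40):
    """Fraction of known ages with low <= age <= high (inclusive bounds assumed)."""
    known = [a for a in ages if a is not None]
    return sum(low <= a <= high for a in known) / len(known)
```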

Did the assailants really mean harm, such that the officer was justified in shooting in self-defense?

To answer this question, we need to look at whether the victim was armed. First, most of the victims were shot, while some were both shot and Tasered.

Image

Second, more than half of the victims had a gun, and many others had a knife.

Image

Notice that some of the victims were unarmed, and for others the weapon was undetermined, meaning it is not known whether they had a weapon at all. There were also cases where the weapon was unknown: the person was armed, but the weapon was never identified, so it could have been a toy gun or a harmless object. So there have been cases where the shooting could not be justified, although this depends on the context of each situation.
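Because "unarmed", "undetermined", and "unknown weapon" mean different things, it helps to make the distinction explicit when counting. A sketch, assuming the raw `armed` column uses the labels `unarmed`, `undetermined`, and `unknown weapon` (check these against the actual data); treating blank values as undetermined is my own choice, and the function names are hypothetical.

```python
from collections import Counter

# Assumed raw labels in the `armed` column.
UNARMED = "unarmed"
UNDETERMINED = "undetermined"      # not known whether the person had a weapon
UNKNOWN_WEAPON = "unknown weapon"  # armed, but the weapon was not identified


def armed_category(value):
    """Map a raw `armed` value to one of four coarse buckets."""
    v = (value or "").strip().lower()
    if v in ("", UNDETERMINED):  # blanks treated as undetermined (assumption)
        return "undetermined"
    if v == UNARMED:
        return "unarmed"
    if v == UNKNOWN_WEAPON:
        return "armed, weapon unknown"
    return "armed, weapon known"


def category_shares(values):
    """Share of victims falling into each bucket."""
    counts = Counter(armed_category(v) for v in values)
    total = sum(counts.values())
    return {k: c / total for k, c in counts.items()}
```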

How is race related to these shootings?

I grouped the data by race to see how the number of victims relates to body-camera use, fleeing, mental illness, and threat level. I applied a log transformation to the counts so that smaller groups remain visible next to much larger ones.
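One way to build the grouped, log-transformed counts, as a sketch. The column names `race` and `body_camera` are assumptions about the dataset, and so is the exact transform: the post does not say which log was used, so this uses log10(1 + count), where the +1 keeps empty groups at 0 instead of undefined.

```python
import math
from collections import Counter


def crosstab(rows, row_key, col_key):
    """Count rows for each (row_key, col_key) pair, e.g. (race, body_camera)."""
    return Counter((r[row_key], r[col_key]) for r in rows)


def log_counts(counts):
    """log10(1 + count) per group; compresses large counts (base and offset assumed)."""
    return {k: math.log10(1 + c) for k, c in counts.items()}
```

The same `crosstab` call with `flee`, `signs_of_mental_illness`, or `threat_level` as the second key (again, assumed column names) would cover the other groupings.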

We observe that no matter the race, the number of victims is higher when the body camera is Off.

Image

For Black, Hispanic, and White Americans, victims who had a history of mental health issues, had expressed suicidal intentions, or were experiencing mental distress at the time of the shooting were more numerous when the body camera was off.

Image

Not fleeing was more common in all races.

Image

For Black, Hispanic, and White Americans, the most common threat level was "attack", the highest threat category.

Image

Do Black Americans get killed more than White Americans?

From the first boxplot, we observe that more White Americans than Black Americans were killed in absolute terms; however, this does not take the population of each race into account.

Image

The U.S. currently has 41.6 million Black Americans, 62.57 million Hispanic Americans, and 231.9 million White Americans. Dividing the number of victims of each race by that race's population shows that Black Americans are killed by police at more than twice the rate of White Americans. Hispanic Americans are also killed at a higher rate than White Americans.
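The per-capita comparison is a simple ratio: for each race, rate = victims / population, and the comparison with White Americans is rate_group / rate_White. A sketch using the population figures quoted above (in millions); the victim counts must come from the dataset and are deliberately not filled in here, and the function names are hypothetical.

```python
# Population figures quoted in the text, in millions.
POPULATION_MILLIONS = {"Black": 41.6, "Hispanic": 62.57, "White": 231.9}


def rate_per_million(victims, population_millions=POPULATION_MILLIONS):
    """Victims per million residents for every race present in both mappings."""
    return {
        race: count / population_millions[race]
        for race, count in victims.items()
        if race in population_millions
    }


def rate_ratio(rates, group, baseline="White"):
    """How many times higher the rate for `group` is than for `baseline`."""
    return rates[group] / rates[baseline]
```

A ratio above 2 for `rate_ratio(rates, "Black")` is what "more than twice the rate" means here.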

Image

My first recommendation would be a law requiring police officers to keep their body cameras on at all times. Pressing the On button of a camera has fewer consequences than pressing the trigger.

Second, we observed that a certain number of people were either unarmed or it was undetermined whether they were armed. I realize that many more variables have to be taken into account, but when the assailants are unarmed, a Taser or another non-lethal weapon would be more “ethical” to use.

Finally, the number of police shootings, irrespective of race, has not decreased in the last 7 years. Carrying a gun is so common in the U.S. that Walmart now has posters advising customers to “refrain from openly carrying a firearm.” Gun control for the general public would definitely be a solution.

This is #day3 of my #100dataviz projects on data science and storytelling with data. I welcome feedback or ideas on any topics which you would want me to explore. Full code on GitHub soon. Thank you for reading!

Betting on the right team

Continuing from my previous post on extracting nitty-gritty insights from the FIFA World Cup data, I would now like to analyze which team I would bet my money on.

Compared to a boxplot, a violin plot shows the shape of the distribution more clearly. For example, the Away Team Goals have bimodal distributions in 1934, 1962, 1966, and 1978. The Home Team Goals have long tails beyond the third quartile, but these shrink after 1982. From then on, the distributions for both teams look similar.

Image

Then, a stacked bar chart shows how often each country won the World Cup, and also how often it came close. Brazil is the most successful World Cup team, with five titles. But what is more interesting is how many times Germany finished as runner-up (2nd place) or in 3rd place.
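The stacked bar chart needs one count per country and finishing position. A sketch, assuming one record per tournament with `Winner`, `Runners-Up`, `Third`, and `Fourth` keys (an assumption about the source file); the optional `aliases` mapping is a hypothetical way to merge names such as a separate West Germany entry into "Germany" if the data records them separately.

```python
from collections import Counter, defaultdict

PLACES = ("Winner", "Runners-Up", "Third", "Fourth")  # assumed column names


def placement_table(tournaments, aliases=None):
    """Return {country: Counter({place: times})}, one stacked bar per country."""
    aliases = aliases or {}
    table = defaultdict(Counter)
    for t in tournaments:
        for place in PLACES:
            country = aliases.get(t[place], t[place])  # merge alternate names
            table[country][place] += 1
    return dict(table)
```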

In the previous post, we observed that a German holds the record for most goals scored by a single player. Now, we see that Germany reached the finals and semi-finals the greatest number of times. This is clearly an indicator of the strength of the team.

Image

I acknowledge that no statistical inference has been done on the players and teams over the years to show mathematically which team to bet on, but the analysis suggests that Germany is a ferocious team. Hence, I will act on my “hunch” and bet on Germany for the 2022 World Cup.

This is #day2 of my #100dataviz projects on data science and storytelling with data. I welcome feedback of any kind or ideas on any topics which you would want me to explore. Thank you for reading!

Nitty-gritty insights on FIFA World Cup

With the 2022 FIFA World Cup in less than 3 months, I decided to look at the past World Cup data to extract some nitty-gritty insights.

After some data cleaning and preprocessing, I plotted a ridge plot of the number of goals scored over the years for a few selected countries. Brazil, Germany, France, and Uruguay have heavy right tails, which suggests that these countries score more goals than the others.

Image

The boxplots suggest that the average number of goals scored per country (white dot) is higher for Brazil, Argentina, and Uruguay, whose means are pulled to the right by some outliers.
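The "pulled to the right by outliers" effect is easy to check numerically: a few high-scoring matches raise the mean but barely move the median. A sketch, assuming goals are already grouped per country as lists of per-match goal counts; the function names are hypothetical.

```python
from statistics import mean, median


def center_summary(goals_by_country):
    """{country: [goals per match]} -> {country: (mean, median)}."""
    return {c: (mean(g), median(g)) for c, g in goals_by_country.items()}


def mean_pulled_up(goals):
    """True when high-scoring outliers pull the mean above the median."""
    return mean(goals) > median(goals)
```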

Image

Next, we plot a stacked area chart of the total goals scored by all countries from 1930 to 2014. First, the area drops to zero for 1942 and 1946, because those World Cups were canceled due to the Second World War.

Second, there is a clear increasing trend in the total number of goals scored by all countries. Note, however, that the number of qualifying teams has not always been constant: 32 teams qualified for the 2022 World Cup, but earlier tournaments had fewer, so part of this trend reflects more teams playing more matches.
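A sketch of both points: summing goals per edition while filling every four-year slot without a tournament with 0 (which is what produces the gap), and, as my own addition rather than part of the original analysis, goals per match as one way to compare editions with different numbers of teams. It assumes each match is a `(year, home_goals, away_goals)` tuple; the function names are hypothetical.

```python
from collections import Counter


def goals_per_edition(matches, first=1930, last=2014, step=4):
    """Total goals per World Cup year; years with no matches (1942, 1946) get 0."""
    totals = Counter()
    for year, home, away in matches:
        totals[year] += home + away
    return {y: totals.get(y, 0) for y in range(first, last + 1, step)}


def goals_per_match(matches):
    """Average goals per match in each year that had matches."""
    totals, games = Counter(), Counter()
    for year, home, away in matches:
        totals[year] += home + away
        games[year] += 1
    return {y: totals[y] / games[y] for y in totals}
```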

Image

Finally, we want to gauge the strength of a country through the maximum number of goals scored by its players. I plotted a strip plot for a few selected countries.

Germany is in 1st position, with Miroslav Klose scoring a total of 16 goals. Brazil is in 2nd position, with Ronaldo's 15 goals, and in 3rd position is Germany again, with Gerd Müller's 14 goals.
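The ranking above is over players (Germany appears twice), while the strip plot compares each country's best scorer. A sketch of both views, assuming per-player World Cup totals are available as `(name, country, goals)` tuples; the function names are hypothetical.

```python
def top_players(players, n=3):
    """The n highest-scoring players overall, as (name, country, goals)."""
    return sorted(players, key=lambda p: p[2], reverse=True)[:n]


def top_scorer_by_country(players):
    """{country: (name, goals)} for the highest scorer of each country."""
    best = {}
    for name, country, goals in players:
        if country not in best or goals > best[country][1]:
            best[country] = (name, goals)
    return best
```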

Note how the distribution of points for France, Peru, England, and Spain differs from the rest: the maximum number of goals for each of these countries looks like an outlier. It is as if once in a lifetime a gifted player is born who breaks all records by a significant margin.
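The outliers here are judged by eye. One conventional way to make that check explicit, swapped in by me rather than taken from the original analysis, is Tukey's fences: flag values more than 1.5 × IQR beyond the first or third quartile. A sketch using `statistics.quantiles` (its default "exclusive" method); the input would be the goal totals of one country's players.

```python
from statistics import quantiles


def iqr_outliers(values, k=1.5):
    """Values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences)."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]
```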

Image

This is #day1 of my #100dataviz projects on data science and storytelling with data. This was inspired by Hannah Yan Han. Her articles have been a great source of inspiration. I welcome feedback of any kind or ideas on any topics which you would want me to explore. Thank you for reading!
