
stevensalazarm / apache-spark-car-accidents-in-ny


An Apache-Spark application that infers qualitative data regarding the car accidents in New York City.

Java 100.00%
polimi apache-spark big-data spark-java car-accidents pandas jupyter-notebook

Car Accidents in New York

The goal of this project is to infer qualitative data regarding the car accidents in New York City. Specifically, it answers the following three queries:

Query 1

Number of lethal accidents per week throughout the entire dataset.
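As an illustration only, the weekly count could be sketched in plain Java (this is not the project's Spark code). The sketch assumes that DATE is formatted as MM/dd/yyyy, that weeks follow ISO-8601 numbering, and that an accident is lethal when its number of persons killed is greater than zero. The class `LethalPerWeek` and its `Accident` type are hypothetical names introduced for the example.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.temporal.IsoFields;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

class LethalPerWeek {
    // Assumed layout of the DATE column, e.g. "12/31/2018".
    static final DateTimeFormatter DATE_FORMAT = DateTimeFormatter.ofPattern("MM/dd/yyyy");

    // Hypothetical minimal view of one dataset row.
    static final class Accident {
        final String date;
        final int personsKilled;

        Accident(String date, int personsKilled) {
            this.date = date;
            this.personsKilled = personsKilled;
        }
    }

    // Builds a sortable "YYYY-Www" key from the ISO week-based year and week number.
    static String weekKey(String date) {
        LocalDate day = LocalDate.parse(date, DATE_FORMAT);
        int year = day.get(IsoFields.WEEK_BASED_YEAR);
        int week = day.get(IsoFields.WEEK_OF_WEEK_BASED_YEAR);
        return String.format(Locale.ROOT, "%d-W%02d", year, week);
    }

    // Counts the accidents with at least one death, grouped by week key.
    static Map<String, Long> lethalAccidentsPerWeek(List<Accident> accidents) {
        Map<String, Long> counts = new TreeMap<>();
        for (Accident a : accidents) {
            if (a.personsKilled > 0) {
                counts.merge(weekKey(a.date), 1L, Long::sum);
            }
        }
        return counts;
    }
}
```

Using the week-based year rather than the calendar year keeps weeks that straddle New Year in one group: under ISO numbering, 12/31/2018 falls in week 1 of 2019.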

Query 2

Number of accidents and percentage of lethal accidents per contributing factor in the dataset.

That is, for each contributing factor, we want to know how many accidents were due to that factor and what percentage of those accidents were lethal.
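A plain-Java sketch of this aggregation, for illustration only (it is not the project's Spark code). It assumes that the contributing factors of an accident are already available as one list, that a factor listed twice on the same accident counts once, and that an accident is lethal when at least one person was killed. `FactorStats`, `Accident`, and the factor strings used with it are hypothetical.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

class FactorStats {
    // Hypothetical minimal row: contributing factors plus number of deaths.
    static final class Accident {
        final List<String> factors;
        final int personsKilled;

        Accident(List<String> factors, int personsKilled) {
            this.factors = factors;
            this.personsKilled = personsKilled;
        }
    }

    // Per-factor totals: index 0 = accidents, index 1 = lethal accidents.
    static Map<String, long[]> countByFactor(List<Accident> accidents) {
        Map<String, long[]> totals = new TreeMap<>();
        for (Accident a : accidents) {
            // Assumption: a factor repeated on one accident is counted once.
            Set<String> distinct = new LinkedHashSet<>(a.factors);
            for (String factor : distinct) {
                long[] t = totals.computeIfAbsent(factor, k -> new long[2]);
                t[0]++;
                if (a.personsKilled > 0) {
                    t[1]++;
                }
            }
        }
        return totals;
    }

    // Percentage of a factor's accidents that were lethal.
    static double percentLethal(long[] totals) {
        return totals[0] == 0 ? 0.0 : 100.0 * totals[1] / totals[0];
    }
}
```

An accident with several distinct factors contributes to each of them, so the per-factor accident counts can add up to more than the number of rows.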

Query 3

Number of accidents and average number of lethal accidents per week per borough.

That is, for each borough, we want to know how many accidents there were in that borough each week, as well as the average number of lethal accidents that the borough had per week.
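One possible reading of this query, sketched in plain Java for illustration (it is not the project's Spark code). The sketch assumes that each row already carries a week key, and that the average is the number of lethal accidents in a borough divided by the number of weeks in which that borough has at least one accident; the project may define the denominator differently. `BoroughWeekly` and `Accident` are hypothetical names.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class BoroughWeekly {
    // Hypothetical minimal row: borough, precomputed week key, number of deaths.
    static final class Accident {
        final String borough;
        final String week;
        final int personsKilled;

        Accident(String borough, String week, int personsKilled) {
            this.borough = borough;
            this.week = week;
            this.personsKilled = personsKilled;
        }
    }

    // borough -> (week -> number of accidents)
    static Map<String, Map<String, Long>> accidentsPerWeek(List<Accident> accidents) {
        Map<String, Map<String, Long>> result = new TreeMap<>();
        for (Accident a : accidents) {
            result.computeIfAbsent(a.borough, b -> new TreeMap<>())
                  .merge(a.week, 1L, Long::sum);
        }
        return result;
    }

    // borough -> lethal accidents divided by the number of weeks observed for that borough
    static Map<String, Double> averageLethalPerWeek(List<Accident> accidents) {
        Map<String, Map<String, Long>> weekly = accidentsPerWeek(accidents);
        Map<String, Long> lethal = new TreeMap<>();
        for (Accident a : accidents) {
            if (a.personsKilled > 0) {
                lethal.merge(a.borough, 1L, Long::sum);
            }
        }
        Map<String, Double> averages = new TreeMap<>();
        for (Map.Entry<String, Map<String, Long>> entry : weekly.entrySet()) {
            long lethalCount = lethal.getOrDefault(entry.getKey(), 0L);
            averages.put(entry.getKey(), (double) lethalCount / entry.getValue().size());
        }
        return averages;
    }
}
```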

Solution

The dataset that is used to perform the three queries is available at NYPD_Motor_Vehicle_Collisions.

To answer the queries, the following points were taken into account:

  • Some rows in the dataset contain incorrect values, because they do not satisfy the following identities:

# Persons Injured = # Cyclist Injured + # Pedestrians Injured + # Motorist Injured

and

# Persons Killed = # Cyclist Killed + # Pedestrians Killed + # Motorist Killed

  • The dataset does not provide YEAR and WEEK as columns, so both values had to be calculated from DATE.
  • There are 5 columns with the same domain, called CONTRIBUTING FACTOR X, which are merged into a single array column.
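Two of these points, the consistency check on the totals and the merging of the CONTRIBUTING FACTOR X columns, can be sketched in plain Java as follows. This is an illustration, not the project's Spark code: `RowPreparation` and its methods are hypothetical, and skipping null or blank factor values when merging is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

class RowPreparation {
    // True when a total (injured or killed) equals the sum of its three parts.
    static boolean isConsistent(int persons, int cyclists, int pedestrians, int motorists) {
        return persons == cyclists + pedestrians + motorists;
    }

    // Merges the CONTRIBUTING FACTOR X values of one row into a single list.
    // Assumption: null and blank values carry no information and are skipped.
    static List<String> mergeFactors(String... factorColumns) {
        List<String> merged = new ArrayList<>();
        for (String factor : factorColumns) {
            if (factor != null && !factor.trim().isEmpty()) {
                merged.add(factor.trim());
            }
        }
        return merged;
    }
}
```

The same `isConsistent` check applies to both identities: once with the injured counts and once with the killed counts.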

Usage

Local Mode

  1. Clone the project.

  2. Download the dataset.

  3. Move the dataset into the files directory of the project.

  4. Open the project with Eclipse or any IDE that supports Maven.

  5. Run as Java Application without passing any parameter.

Cluster Mode

  1. Download Spark 2.4+ and configure it as you like.

  2. Start Spark and make sure that you connect to at least one worker.

  3. Clone the project.

  4. Download the dataset.

  5. Move the dataset into the files directory of the project.

  6. Compile the project with Maven:

    mvn package

  7. Submit the project through spark-submit, for example:

    spark-submit --class it.polimi.middleware.spark.car.accidents.CarAccidentsCache car_accidents.jar spark://master_ip:port dataset_directory/ test_number

Results

Query 2 chart: the decimal number above each bar indicates the percentage of accidents that were also lethal.

Query 3 chart: the decimal number above each bar indicates the average number of lethal accidents that the borough had in a week.
