TwitterAnalysis

This is a simple Spark program. It takes Tweet JSON files as input, maps them to Tweet objects, analyzes those objects, and writes the collected information as output. The implementation is built specifically for AWS (S3 and EMR): it reads its input from, and writes its output to, an S3 bucket, and it is meant to run on an EMR cluster, whose setup is covered in more depth below.

Installation and Usage

  1. Clone this repo.
  2. (Optional) To make edits, open the Maven project in any IDE (Java 8 required).
  3. Build the Maven project into a jar file. As long as Spark is provided on EMR, you do not need to include Spark itself in the jar.
  4. Upload the jar to an S3 bucket and note its path for later.
  5. Make sure the input JSON files are in the input path and the output directory exists in the output path, as specified in the application.properties file.
  6. Deploy an EMR cluster.
  7. Add a step to the cluster, using the specifications in steps 8-11.
  8. Step type: Spark application
  9. Deploy mode: cluster
  10. Spark-submit args: --class pickle.plaza.TwitterMain (the application's main class; pickle.plaza is the package that contains TwitterMain)
  11. Spark application location: the S3 path of the jar uploaded in step 4 (e.g. s3://twitter-redshift-json/apps/twitteranalysis.jar)
  12. Let the application run. You can view its progress if you are SSH'd into the EMR cluster and proxied.
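
The EMR step settings above (deploy mode, main class, jar location) correspond roughly to a single spark-submit invocation. The following is a minimal sketch of that mapping, not part of this project: the class name `EmrStepArgs` and the method `sparkSubmitArgs` are hypothetical helpers introduced only for illustration, and the jar path is the example path from the steps above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of this repo): assembles the spark-submit
// arguments that the EMR step settings in the list above describe.
public class EmrStepArgs {

    // mainClass:   fully qualified main class, e.g. pickle.plaza.TwitterMain
    // jarLocation: S3 path of the uploaded application jar
    static List<String> sparkSubmitArgs(String mainClass, String jarLocation) {
        List<String> args = new ArrayList<>();
        args.add("spark-submit");
        args.add("--deploy-mode");   // "Deploy mode = cluster"
        args.add("cluster");
        args.add("--class");         // "Spark submit args"
        args.add(mainClass);
        args.add(jarLocation);       // "Spark application location"
        return args;
    }

    public static void main(String[] argv) {
        List<String> args = sparkSubmitArgs(
                "pickle.plaza.TwitterMain",
                "s3://twitter-redshift-json/apps/twitteranalysis.jar");
        // Print the assembled command line as a single string.
        System.out.println(String.join(" ", args));
    }
}
```

Seeing the settings as one command line makes it easier to check that the main class and the jar path match what was built and uploaded before the step is added to the cluster.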
