streaming-data-pipeline

Streaming pipeline repo for the data engineering training program.

See the README in each producer and consumer directory for its setup instructions.

# Local environment setup

### Prerequisites

  • Make sure you have sbt installed.
  • Make sure you have docker installed and running.
  • Make sure no previous instance of ZooKeeper, Kafka, or Spark is running before you execute the script; otherwise it won't be able to allocate the ports it needs.
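
The prerequisites above can be checked before running anything. The following is a minimal sketch, not part of the repo: the function names are hypothetical, and the port list assumes the usual defaults (ZooKeeper 2181, Kafka 9092, Spark master 7077), which may differ from the ports this setup actually maps.

```shell
#!/usr/bin/env bash
# Preflight sketch for the prerequisites above (hypothetical helper, not in the repo).
# ASSUMPTION: 2181, 9092 and 7077 are the common default ports for ZooKeeper,
# Kafka and the Spark master; adjust them to whatever this setup really uses.

# port_in_use HOST PORT: succeeds if something accepts TCP connections on PORT.
port_in_use() {
  # /dev/tcp is a bash feature; the subshell closes the socket when it exits.
  (exec 3<>"/dev/tcp/$1/$2") 2>/dev/null
}

# missing_tools TOOL...: prints every listed tool that is not on PATH.
missing_tools() {
  local tool
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
}

# preflight: reports every failed prerequisite and returns non-zero if any failed.
preflight() {
  local status=0 tool port
  for tool in $(missing_tools sbt docker); do
    echo "missing: $tool"
    status=1
  done
  # "docker info" fails when the client cannot reach a running daemon.
  docker info >/dev/null 2>&1 || { echo "docker daemon is not running"; status=1; }
  for port in 2181 9092 7077; do
    if port_in_use 127.0.0.1 "$port"; then
      echo "port $port is already in use"
      status=1
    fi
  done
  return "$status"
}
```

After sourcing the file, `preflight && ./sbin/buildAndRunLocal.sh` would start the stack only when every check passes.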

### Steps

  1. Run ./sbin/buildAndRunLocal.sh. This creates various Docker containers (each with an independent purpose) for running and testing this setup on your local machine.

  2. If everything is up and running, you should be able to see data in Hadoop. To check for data:

    1. docker ps | grep hadoop - you should see at least one container referencing hadoop (you can ignore hadoop_seed for now)
    2. docker exec -it $CONTAINER_ID bash, where $CONTAINER_ID is the ID of that hadoop container
    3. /usr/local/hadoop/bin/hadoop fs -ls /free2wheelers/stationMart/data
    4. Tada! We have data! If you don't, something went wrong; check "Considerations".
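
The data check above can also be run non-interactively. This is a rough sketch under one assumption: that the container to inspect has "hadoop" in its name (the steps only say it "references" hadoop). The function names are hypothetical; the two paths are the ones given in the steps.

```shell
#!/usr/bin/env bash
# Sketch of the manual check above (hypothetical helper, not in the repo).
# ASSUMPTION: the target container's *name* contains "hadoop".

# pick_hadoop_container: reads "ID NAME" lines on stdin and prints the ID of the
# first container whose name mentions hadoop but is not the hadoop_seed one.
pick_hadoop_container() {
  awk '$2 ~ /hadoop/ && $2 !~ /hadoop_seed/ { print $1; exit }'
}

# list_station_mart_data: lists the output directory inside that container.
list_station_mart_data() {
  local container_id
  container_id="$(docker ps --format '{{.ID}} {{.Names}}' | pick_hadoop_container)"
  if [ -z "$container_id" ]; then
    echo "no running hadoop container found" >&2
    return 1
  fi
  docker exec "$container_id" \
    /usr/local/hadoop/bin/hadoop fs -ls /free2wheelers/stationMart/data
}
```

Calling `list_station_mart_data` after the stack is up prints the directory listing, or fails with a message if no matching container is running.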

### Considerations

  • Your Docker machine may need at least 2 CPUs, 4 GiB of memory, and 512 MiB of swap; remember to click "Apply & Restart" after changing these settings.
  • While the script is running, run docker stats for some insight into resource usage.
  • There's a script for stopping: ./sbin/stopAndRemoveLocal.sh. Try stopping and restarting.
  • If you're interested in execution logs, run docker logs $CONTAINER_ID.
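
The troubleshooting commands above could be wrapped as small helpers. This is only a sketch: the function names and the default of 50 log lines are hypothetical, and it assumes the two sbin scripts are run from the repo root and exit non-zero on failure.

```shell
#!/usr/bin/env bash
# Troubleshooting sketch for the considerations above (hypothetical helpers).

# restart_pipeline [REPO_ROOT]: stop the local stack, then bring it back up.
# The start script runs only if the stop script succeeded.
restart_pipeline() {
  local repo_root="${1:-.}"
  # The subshell keeps the caller's working directory unchanged.
  ( cd "$repo_root" && ./sbin/stopAndRemoveLocal.sh && ./sbin/buildAndRunLocal.sh )
}

# resource_snapshot: print one round of per-container CPU and memory figures.
resource_snapshot() {
  docker stats --no-stream
}

# recent_logs CONTAINER_ID [LINES]: show the last LINES log lines (default 50).
recent_logs() {
  docker logs --tail "${2:-50}" "$1"
}
```

For example, `restart_pipeline .` from the repo root followed by `resource_snapshot` gives a quick view of whether the containers are short on memory.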
