This is the repository for the blog post Data Engineering Project: Stream Edition.
You will need to install:

- docker (make sure to have docker-compose as well)
- pgcli, to connect to our postgres instance
- git, to clone the starter repo
- Optional: tmux
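A quick way to confirm the required tools are installed is to check that each one is on your PATH. This is a minimal sketch; the tool names are taken from the list above, so adjust it if your binaries are named differently (e.g. `docker compose` as a plugin instead of `docker-compose`).

```shell
# Sanity-check that the required tools are installed and on PATH
missing=0
for tool in docker docker-compose pgcli git; do
  if command -v "$tool" >/dev/null 2>&1; then
    echo "$tool: found"
  else
    echo "$tool: missing"
    missing=$((missing + 1))
  fi
done
echo "$missing required tool(s) missing"
```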
The data is generated by a data generation script at src/main/scala/com.startdataengineering/ServerLogGenerator.scala.
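As a rough illustration of what the generator produces, a single server-log event could look something like the JSON line below. The field names here are hypothetical placeholders; the real schema lives in ServerLogGenerator.scala.

```shell
# Hypothetical example of one generated server-log event, for illustration only;
# the real field names and values are defined in ServerLogGenerator.scala
event_id=$$                                 # placeholder id (process id here)
event_ts=$(date -u +%Y-%m-%dT%H:%M:%SZ)     # event time in UTC
log_line="{\"event_id\": \"$event_id\", \"timestamp\": \"$event_ts\"}"
echo "$log_line"
```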
Everything is dockerized. Run the commands below from the project directory.
docker-compose up -d # -d means run in detached mode (in the background)
docker ps # display all running containers
Do some manual checks using:
docker exec -t beginner_de_project_stream_kafka_1 kafka-console-consumer.sh --bootstrap-server :9092 --topic server-logs --from-beginning --max-messages 10 # used to check the first 10 messages in the server-logs topic
docker exec -t beginner_de_project_stream_kafka_1 kafka-console-consumer.sh --bootstrap-server :9092 --topic alerts --from-beginning --max-messages 10 # used to check the first 10 messages in the alerts topic
and
pgcli -h localhost -p 5432 -U startdataengineer events
The password is password.
select * from server_log limit 5; -- should match the first 5 from the server-logs topic
select count(*) from server_log; -- should be 100000
\q -- to exit pgcli
When you are done, take down all the running containers from the project directory using:
docker-compose down
website: https://www.startdataengineering.com/
twitter: https://twitter.com/start_data_eng