hanand1979 / gcp-black-friday-analytics

This project forked from lorenzoridi/gcp-black-friday-analytics

License: Apache License 2.0

gcp-black-friday-analytics

Analyze Black Friday tweets with a Serverless Data Processing pipeline on Google Cloud Platform.

This repository hosts the codebase for the example posted:

The architecture consists of:

  • a Google Container Engine cluster running a Python application that gathers tweets and sends them to Google Cloud Pub/Sub;
  • a Google Cloud Pub/Sub topic;
  • a Google Cloud Dataflow pipeline that reads from the Pub/Sub topic and uses the Natural Language API to retrieve the sentiment of each tweet;
  • a Google BigQuery dataset containing two tables: one for "raw" tweets and one for "annotated" tweets.
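
As a rough sketch, once the pipeline has been provisioned, the components listed above can be inspected from Cloud Shell with standard gcloud and bq listing commands. The run helper and the DRY_RUN switch are assumptions added for illustration and are not part of the repository; with DRY_RUN=1 (the default here) the commands are only printed, so the snippet can be read or tried without touching a project.

```shell
#!/usr/bin/env bash
# Sketch: list the resources behind each architecture component.
# Assumes an authenticated gcloud/bq environment such as Cloud Shell.
# DRY_RUN and run() are illustrative helpers, not part of the repository.
set -eu

DRY_RUN="${DRY_RUN:-1}"   # 1 = print the commands instead of running them

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

run gcloud container clusters list   # cluster running the tweet collector
run gcloud pubsub topics list        # topic that receives the tweets
run gcloud dataflow jobs list        # Dataflow pipeline job
run bq ls                            # BigQuery datasets in the current project
```

Set DRY_RUN=0 to actually run the listing commands against the currently configured project.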

Setup

The repository contains a bash script that automates most of the work. However, a few steps still have to be done manually:

  • Create a new Google Cloud Platform project (see https://support.google.com/cloud/answer/6251787?hl=en for instructions).
  • Enable the Natural Language API from the Cloud Console (https://console.cloud.google.com/apis/api/language.googleapis.com/overview).
  • Open Google Cloud Shell.
  • Within Cloud Shell, clone the Git repository: git clone https://github.com/LorenzoRidiNoovle/gcp-black-friday-analytics.git gcp-black-friday-analytics
  • Set the Google Cloud Platform zone: gcloud config set compute/zone <COMPUTE_ZONE> (replace <COMPUTE_ZONE> with your preferred zone; you can list the available zones with gcloud compute zones list).
  • Create a GCS bucket as a staging location for the Dataflow deployment: gsutil mb -l <LOCATION> gs://gcp-black-friday-analytics-staging (replace <LOCATION> with your preferred location: US, EU or ASIA. Choose the location that matches the compute zone you selected in the previous step).
  • Replace all occurrences of <YOUR_PROJECT_ID> with your actual Project ID in the Kubernetes YAML file. If you are running commands from Cloud Shell, this one-liner does it: sed -i -- 's@<YOUR_PROJECT_ID>@'"$DEVSHELL_PROJECT_ID"'@g' gcp-black-friday-analytics/k8s-twitter-to-pubsub/twitter-stream.yaml
  • Create a Twitter application and paste the required information into the gcp-black-friday-analytics/k8s-twitter-to-pubsub/twitter-stream.yaml file. Use your preferred text editor, such as vi or nano: nano gcp-black-friday-analytics/k8s-twitter-to-pubsub/twitter-stream.yaml
  • Launch the start.sh script to provision and start the processing pipeline: bash gcp-black-friday-analytics/start.sh
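
The command-line steps above can be collected into one script, sketched below under a few assumptions. The DRY_RUN switch, the run and substitute_project_id helpers, the example zone europe-west1-b and the fallback project ID my-project-id are all illustrative and do not come from the repository. The sed call uses GNU sed syntax, as available in Cloud Shell. Creating the project, enabling the Natural Language API and entering the Twitter application details remain manual.

```shell
#!/usr/bin/env bash
# Sketch of the command-line setup steps. DRY_RUN, the helper names and the
# default values below are assumptions for illustration only.
set -eu

DRY_RUN="${DRY_RUN:-1}"                             # 1 = print commands only
COMPUTE_ZONE="${COMPUTE_ZONE:-europe-west1-b}"      # example zone, pick your own
LOCATION="${LOCATION:-EU}"                          # US, EU or ASIA
PROJECT_ID="${DEVSHELL_PROJECT_ID:-my-project-id}"  # set automatically in Cloud Shell
REPO_DIR="gcp-black-friday-analytics"
YAML_FILE="$REPO_DIR/k8s-twitter-to-pubsub/twitter-stream.yaml"

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Replace every <YOUR_PROJECT_ID> placeholder in file $1 with project ID $2.
substitute_project_id() {
  sed -i -- 's@<YOUR_PROJECT_ID>@'"$2"'@g' "$1"
}

run git clone https://github.com/LorenzoRidiNoovle/gcp-black-friday-analytics.git "$REPO_DIR"
run gcloud config set compute/zone "$COMPUTE_ZONE"
run gsutil mb -l "$LOCATION" gs://gcp-black-friday-analytics-staging
run substitute_project_id "$YAML_FILE" "$PROJECT_ID"
# Edit $YAML_FILE by hand to add the Twitter application details, then:
run bash "$REPO_DIR/start.sh"
```

With the default DRY_RUN=1 the script only prints what it would do, which makes it easy to review the zone, location and project ID before running it for real with DRY_RUN=0.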

Credits

The Python application that collects tweets and publishes them to Pub/Sub comes from the really nice example "Real-Time Data Analysis with Kubernetes, Cloud Pub/Sub, and BigQuery", published here: https://cloud.google.com/solutions/real-time/kubernetes-pubsub-bigquery.
