gcp-datalow-assignment-zenith-sample's Introduction

GCP Dataflow Assignment Zenith

This repository contains the code for a GCP assignment set by Zenith System Solution.

Requirement

Build a data pipeline on Google Cloud Platform (GCP).
The source code can be in Java or Python, and it should come with the procedure for setting up GCP Dataflow.

Acceptance Criteria:

  1. Create a BigQuery table with the same fields as the CSV file.
  2. Manually upload rh_201810.csv to a GCP bucket.
  3. Create a Java or Python project (run in GCP Dataflow) that loads the data from the GCP bucket into BigQuery.

Table Schema

Field Name   Type     Mode
TIMESTAMP    DATE     NULLABLE
CUSTOMER     STRING   NULLABLE
POINTNAME    STRING   NULLABLE
VALUE        FLOAT    NULLABLE
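
As a rough illustration of how this schema maps onto the data, the sketch below writes the four fields in the JSON layout that BigQuery schema files use and converts one CSV line into a row keyed by those field names. It is not taken from the repository: the `parse_row` helper, the column order, the date format and the sample line are all assumptions made for the example.

```python
import csv
import json

# The table schema above, in the name/type/mode layout of a BigQuery JSON schema file.
SCHEMA = [
    {"name": "TIMESTAMP", "type": "DATE", "mode": "NULLABLE"},
    {"name": "CUSTOMER", "type": "STRING", "mode": "NULLABLE"},
    {"name": "POINTNAME", "type": "STRING", "mode": "NULLABLE"},
    {"name": "VALUE", "type": "FLOAT", "mode": "NULLABLE"},
]


def parse_row(line):
    """Convert one CSV line into a dict keyed by the schema's field names.

    Assumptions (not confirmed by the repository): columns appear in schema
    order, and dates are already written as YYYY-MM-DD.
    """
    values = next(csv.reader([line]), [])
    if len(values) != len(SCHEMA):
        raise ValueError(f"expected {len(SCHEMA)} columns, got {len(values)}")

    row = {}
    for field, raw in zip(SCHEMA, values):
        raw = raw.strip()
        if raw == "":
            # Every field is NULLABLE, so an empty cell becomes a null.
            row[field["name"]] = None
        elif field["type"] == "FLOAT":
            row[field["name"]] = float(raw)
        else:
            row[field["name"]] = raw
    return row


if __name__ == "__main__":
    print(json.dumps(SCHEMA, indent=2))
    # Hypothetical sample line; the real contents of rh_201810.csv are not shown here.
    print(parse_row("2018-10-01,acme,point-1,12.5"))
```

Keeping the schema as plain data means the same list can be dumped to a schema file or reused when validating rows.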

Source Code

1. create_gcp_table.py - Creates the table with the schema defined above.
2. push_data_to_bucket.py - Uploads the file to the storage bucket.

Authentication: The Python programs point GOOGLE_APPLICATION_CREDENTIALS at auth.json. Make sure the key file is placed in the root directory of the Python program.
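
A minimal sketch of that authentication step, assuming the key file is named auth.json and sits in the directory the script is run from. Google's client libraries read the GOOGLE_APPLICATION_CREDENTIALS environment variable as a path to a service-account key file. The `use_local_key` helper is hypothetical and is not the repository's code.

```python
import os
from pathlib import Path


def use_local_key(key_name="auth.json", base_dir=None):
    """Point GOOGLE_APPLICATION_CREDENTIALS at a key file and return its path.

    Hypothetical helper: base_dir defaults to the current working directory,
    on the assumption that the script is run from the project root.
    """
    base = Path(base_dir) if base_dir else Path.cwd()
    key_path = (base / key_name).resolve()
    if not key_path.is_file():
        raise FileNotFoundError(f"service account key not found: {key_path}")
    # The variable holds the path to the key file, not the file's contents.
    os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = str(key_path)
    return key_path


if __name__ == "__main__":
    print(use_local_key())
```

Failing early when the key is missing gives a clearer error than letting the first API call fail.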

3. Dataflow Pipeline - Java-based Maven project
The Java code is written with Apache Beam. It builds a pipeline and submits it as a Dataflow job that reads the CSV file from Cloud Storage and writes it to BigQuery.
Pass the command-line arguments below to run the job on Dataflow. If the arguments are not supplied, the pipeline uses the Beam direct runner.
--project=<YOUR_PROJECT_ID> --stagingLocation=<STAGING_LOCATION_IN_CLOUD_STORAGE> --runner=DataflowRunner
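
The fallback described above can be sketched in Python with the standard `argparse` module. This is only a stand-in that mirrors the three flags listed: the repository's pipeline is Java, and how it actually parses its options is not shown here. The `parse_pipeline_args` helper and the sample values are hypothetical.

```python
import argparse


def parse_pipeline_args(argv=None):
    """Parse the three flags listed above, defaulting to the direct runner."""
    parser = argparse.ArgumentParser(description="CSV to BigQuery pipeline options")
    parser.add_argument("--project", help="GCP project ID")
    parser.add_argument("--stagingLocation", help="staging location in Cloud Storage")
    # With no --runner flag the pipeline runs locally on the direct runner.
    parser.add_argument("--runner", default="DirectRunner")
    # Unrecognised flags are ignored here so that extra options do not abort the run.
    args, _unknown = parser.parse_known_args(argv)
    return args


if __name__ == "__main__":
    local = parse_pipeline_args([])
    print(local.runner)

    # Hypothetical project and bucket names, for illustration only.
    remote = parse_pipeline_args([
        "--project=my-project",
        "--stagingLocation=gs://my-bucket/staging",
        "--runner=DataflowRunner",
    ])
    print(remote.runner, remote.project, remote.stagingLocation)
```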

The application assumes that GOOGLE_APPLICATION_CREDENTIALS is set in the environment. If it is not, set it by running the command below.

export GOOGLE_APPLICATION_CREDENTIALS=/Users/username/key.json

Contributors

amarkum
