dataflow-gcs-cf's Introduction

Prerequisites:

  1. Have a GCP project with a linked billing account

  2. Open Cloud Shell

  3. Have pip installed

curl https://bootstrap.pypa.io/get-pip.py -o get-pip.py
sudo apt-get install python3 python3-setuptools
sudo python3 get-pip.py

Set variables

export project="your-gcp-project-id" #change this to your project
export region="your-gcp-region" #for example us-central1
export bq_dataset="dataflow_example"
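
Every later command depends on these three variables, so it can help to confirm they hold real values before continuing. The following is a minimal sketch, not part of the repository: the helper name `check_vars` is hypothetical, and it assumes any value still starting with `your-gcp-` is an unedited placeholder.

```shell
# Hypothetical helper: returns non-zero if any required variable is empty
# or still set to one of the "your-gcp-..." placeholders from above.
check_vars() {
  status=0
  for name in project region bq_dataset; do
    # Look up the value of the variable whose name is stored in $name.
    eval "value=\${$name:-}"
    case "$value" in
      ""|your-gcp-*)
        echo "variable '$name' is not set to a real value" >&2
        status=1
        ;;
    esac
  done
  return "$status"
}

# Example call:
# check_vars || echo "fix the exports above before continuing"
```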

Create two buckets: one for the files and one for the template

gsutil mb -p $project -c regional -l $region -b on gs://$project-df-template/
gsutil mb -p $project -c regional -l $region -b on gs://$project-df-files/

Create the BigQuery dataset

bq mk --location=us --dataset $project:$bq_dataset

Clone the repository files (into your home directory, since the later steps use ~/dataflow-gcs-cf)

git clone https://github.com/thomas-vl/dataflow-gcs-cf.git

Deploy the Dataflow template

sudo pip3 install 'apache-beam[gcp]'
cd ~/dataflow-gcs-cf/dataflow
python3 -m main --output $project:$bq_dataset.example --runner DataflowRunner --project $project --region $region \
 --staging_location gs://$project-df-template/staging --temp_location gs://$project-df-template/temp \
 --template_location gs://$project-df-template/templates/df-bq

Validate that the template file exists:

gsutil ls gs://$project-df-template/templates/
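
For scripted runs, the listing above can be replaced by a check that returns an exit code. This is a sketch under assumptions: the helper name `template_exists` is hypothetical, and the object path is taken from the `--template_location` value used in the previous step.

```shell
# Hypothetical helper: succeeds only if the template object exists.
# $1 is the project ID used as the bucket-name prefix.
template_exists() {
  gsutil -q stat "gs://$1-df-template/templates/df-bq"
}

# Example call, left commented out because it needs real GCP access:
# template_exists "$project" && echo "template found" || echo "template missing"
```

`gsutil -q stat` prints nothing and reports the result only through its exit status, which makes it convenient inside `if` or `&&` chains.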

Deploy the function (from the cloud-functions folder)

gcloud services enable dataflow.googleapis.com --project $project
cd ~/dataflow-gcs-cf/cloud-functions
gcloud functions deploy start_dataflow --runtime python37 --trigger-resource $project-df-files --trigger-event google.storage.object.finalize --project $project --region $region

Upload the file (this triggers the function)

cd ~/dataflow-gcs-cf/
gsutil cp titanic.csv gs://$project-df-files
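
Once the upload finishes, one way to check the result is to list the Dataflow jobs and count the rows in the output table. This is a sketch, assuming the job started by the function writes to the `example` table named in the template step; the helper name `count_query` and the `row_count` alias are hypothetical.

```shell
# Hypothetical helper: builds a standard-SQL row-count query for the output table.
# $1 is the project ID, $2 is the dataset name.
# Backticks are needed because project IDs usually contain hyphens.
count_query() {
  printf 'SELECT COUNT(*) AS row_count FROM `%s.%s.example`' "$1" "$2"
}

# Example calls, left commented out because they need real GCP access:
# gcloud dataflow jobs list --project "$project"
# bq --project_id="$project" query --use_legacy_sql=false "$(count_query "$project" "$bq_dataset")"
```

The job can take several minutes to start and finish, so the table may be empty or missing right after the upload.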
