This project forked from googlecloudplatform/dataproc-templates

Dataproc templates and pipelines for solving simple in-cloud data tasks

License: Apache License 2.0

Dataproc Templates

Dataproc templates are designed to address various in-cloud data tasks, including data import/export/backup/restore and bulk API operations. These templates leverage the power of Google Cloud's Dataproc, supporting both Dataproc Serverless and Dataproc clusters.

Google provides this collection of pre-implemented Dataproc templates as a reference and for easy customization. (Video Link)

Dataproc Templates (Java - Spark)

Please refer to the Dataproc Templates (Java - Spark) README for more information.

Dataproc Templates (Python - PySpark)

Please refer to the Dataproc Templates (Python - PySpark) README for more information.

Dataproc Templates (Notebooks)

Please refer to the Dataproc Templates (Notebooks) README for more information.

Getting Started

  1. Clone this repository

     git clone https://github.com/GoogleCloudPlatform/dataproc-templates.git
    
  2. Obtain authentication credentials

    Create local credentials by running the following command and completing the OAuth 2.0 flow (read more about the command here).

     gcloud auth application-default login
    

    Or manually set the GOOGLE_APPLICATION_CREDENTIALS environment variable to point to a service account key JSON file path.

    Learn more at Setting Up Authentication for Server to Server Production Applications.

Note: Application Default Credentials can find credentials implicitly when the application runs on Compute Engine, Kubernetes Engine, App Engine, or Cloud Functions.
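The credential options in step 2 can be hard to tell apart when a job fails to authenticate. The sketch below is a rough diagnostic, not part of this repository: it reports which source Application Default Credentials would most likely pick up, using only the Python standard library. The function name `describe_adc_source` is hypothetical, and the file location assumes the default gcloud configuration directory on Linux or macOS (Windows and a customized `CLOUDSDK_CONFIG` are not handled).

```python
import os
from pathlib import Path


def describe_adc_source(env=None, home=None):
    """Return a label for where ADC would most likely find credentials.

    Hypothetical helper for illustration; it only inspects the environment
    and the file system and never loads or validates any credentials.
    """
    env = os.environ if env is None else env
    home = Path.home() if home is None else Path(home)

    # 1. An explicitly configured key file is checked first.
    key_path = env.get("GOOGLE_APPLICATION_CREDENTIALS")
    if key_path:
        return f"service account key file: {key_path}"

    # 2. File written by `gcloud auth application-default login`
    #    (assumed default location on Linux/macOS).
    user_file = home / ".config" / "gcloud" / "application_default_credentials.json"
    if user_file.is_file():
        return f"user credentials: {user_file}"

    # 3. Otherwise ADC relies on the service account attached to the
    #    Google Cloud resource the code runs on.
    return "attached service account (metadata server), if running on Google Cloud"


if __name__ == "__main__":
    print(describe_adc_source())
```

Running the script on a workstation before submitting a template shows whether the key file variable or the gcloud login file would be used; on Compute Engine or a similar environment with neither present, it reports the attached service account case described in the note above.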

  3. Execute a template

    Follow the guide that matches your use case: the Java - Spark, Python - PySpark, or Notebooks README referenced above.

Flow diagram

The flow diagram below shows the execution flow for Dataproc Templates:

Dataproc templates flow diagram

Contributing

See the contributing instructions to get started.

License

All solutions within this repository are provided under the Apache 2.0 license. Please see the LICENSE file for more detailed terms and conditions.

Disclaimer

This repository and its contents are not an official Google product.

Contact

Share your feedback, ideas, and thoughts through the feedback form.

Questions, issues, and comments should be directed to [email protected]

Contributors

tanyarw, hhasija, shashank-google, vanshaj-bhatia, nj1973, surjits254, nilofreitas, anish97ind, poojabasker20, saumyasinha-google, franklinwhaite, balajiss2, shubhamgoogle, naveenkm13, ankuljain09, vsinghal202, ppaglilla, sjlva, mugdhapattnaik, anshumanwins, chakresh84, somanishivam, mokhahmed, varunika, shradha-tyagi, tims, satpreetmakhija, nikhil6790, ajayydv, snrssc
