Examples demonstrating how to use OCI Data Flow

Home Page: https://www.oracle.com/in/big-data/data-flow/

License: Universal Permissive License v1.0

Oracle Cloud Infrastructure Data Flow Samples

This repository provides examples demonstrating how to use Oracle Cloud Infrastructure Data Flow, a service that lets you run any Apache Spark Application at any scale with no infrastructure to deploy or manage.

What is Oracle Cloud Infrastructure Data Flow

Oracle Cloud Infrastructure (OCI) Data Flow is a cloud-based serverless platform with a rich user interface. It allows Spark developers and data scientists to create, edit, and run Spark jobs at any scale without the need for clusters, an operations team, or highly specialized Spark knowledge. Being serverless means there is no infrastructure for you to deploy or manage. It is entirely driven by REST APIs, giving you easy integration with applications or workflows. You can:

  • Connect to Apache Spark data sources.

  • Create reusable Apache Spark applications.

  • Launch Apache Spark jobs in seconds.

  • Manage all Apache Spark applications from a single platform.

  • Process data in the Cloud or on-premises in your data center.

  • Create Big Data building blocks that you can easily assemble into advanced Big Data applications.

Before you Begin

You must have set up your tenancy and be able to access Data Flow:

  • Set up tenancy: Before Data Flow can run, you must grant permissions that allow effective log capture and run management. See the Set Up Administration section of the Data Flow Service Guide and follow the instructions given there.
  • Access Data Flow: Refer to the section on how to access Data Flow.

Sample Examples

Example | Description | Python | Java | Scala
CSV to Parquet | This application shows how to use PySpark to convert CSV data stored in OCI Object Store to Apache Parquet format, which is then written back to Object Store. | CSV to Parquet | CSV to Parquet | CSV to Parquet
Load to ADW | This application shows how to read a file from OCI Object Store, perform some transformation, and write the results to an Autonomous Data Warehouse instance. | Load to ADW | Load to ADW | Load to ADW
Structured Streaming Kafka Word Count | This Structured Streaming application shows how to read a Kafka stream and calculate word frequencies over a one-minute window. | Structured Kafka Word Count | Structured Kafka Word Count
Random Forest Regression | This application shows how to build a model and make predictions using Random Forest Regression. | Random Forest Regression
Oracle NoSQL Database cloud service | This application shows how to interface with Oracle NoSQL Database cloud service. | Oracle NoSQL Database cloud service
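
To give a feel for the CSV to Parquet example, here is a minimal PySpark sketch. It is not the code shipped in this repository. The bucket, namespace, and object names are hypothetical placeholders, and the helper names (`oci_uri`, `csv_to_parquet`) are made up for illustration. It assumes paths use the `oci://bucket@namespace/path` form understood by the OCI HDFS connector.

```python
def oci_uri(bucket: str, namespace: str, path: str) -> str:
    """Build an oci://bucket@namespace/path URI for an Object Store object."""
    return f"oci://{bucket}@{namespace}/{path.lstrip('/')}"


def csv_to_parquet(spark, input_path: str, output_path: str) -> None:
    """Read a CSV file with a header row and write it back as Parquet."""
    df = (
        spark.read
        .option("header", "true")       # first line holds column names
        .option("inferSchema", "true")  # let Spark guess column types
        .csv(input_path)
    )
    df.write.mode("overwrite").parquet(output_path)


def main() -> None:
    # Imported here so the helpers above can be used without Spark installed.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("csv_to_parquet_sketch").getOrCreate()
    # Hypothetical bucket, namespace, and object names; replace with your own.
    source = oci_uri("my-bucket", "my-namespace", "input/data.csv")
    target = oci_uri("my-bucket", "my-namespace", "output/data.parquet")
    csv_to_parquet(spark, source, target)
    spark.stop()
```

Calling `main()` from a script submitted to Spark would run the conversion. The sample READMEs remain the reference for the actual applications.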

For step-by-step instructions, see the README files included with each sample.

Running the Samples

These samples show how to use the OCI Data Flow service and are meant to be deployed to and run from Oracle Cloud. You can optionally test these applications locally before you deploy them. When they are ready, you can deploy them to Data Flow without any need to reconfigure them, make code changes, or apply deployment profiles. To test these applications locally, Apache Spark needs to be installed. Before you run an application locally, set up the prerequisites described in the Setup locally section.
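
One way to keep an application unchanged between a local test and a Data Flow run is to pass data locations as arguments instead of hard-coding them. The sketch below illustrates that idea under stated assumptions. The `--input` and `--output` flags and the function names are hypothetical and are not taken from the samples.

```python
import argparse


def parse_args(argv=None):
    """Parse the data locations so the same script works locally and in the cloud."""
    parser = argparse.ArgumentParser(description="Path-parameterized Spark job (sketch)")
    parser.add_argument("--input", required=True, help="local path or oci:// URI to read")
    parser.add_argument("--output", required=True, help="local path or oci:// URI to write")
    return parser.parse_args(argv)


def run(argv=None) -> None:
    args = parse_args(argv)
    # Imported here so argument parsing can be exercised without Spark installed.
    from pyspark.sql import SparkSession

    # No master is set in code; the launcher (spark-submit or the service) decides it.
    spark = SparkSession.builder.appName("portable_job_sketch").getOrCreate()
    lines = spark.read.text(args.input)
    print(f"Read {lines.count()} lines from {args.input}")
    lines.write.mode("overwrite").text(args.output)
    spark.stop()
```

With a local Spark installation, a script that calls `run()` could be started with something like `spark-submit app.py --input ./data.txt --output ./out`. When run in Data Flow, the same arguments would carry Object Store locations instead.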

MLflow Tracking Server

To set up an MLflow Tracking Server, refer to the dataflow-mlflow-integration section.

Install Spark

To install Spark, visit spark.apache.org and pick the installation path that best suits your environment.

Documentation

You can find the online documentation for Oracle Cloud Infrastructure Data Flow at docs.oracle.com.

Get Support

Security

Please consult the security guide for our responsible security vulnerability disclosure process.

Contributing

This project welcomes contributions from the community. Before submitting a pull request, please review our contribution guide.

License

See LICENSE
