This project forked from isopropylcyanide/flink-couchbase-data-sink

Flink-couchbase-data-starter

A Flink job that reads a JSON file as its source (either once or by continuous polling) and dumps it to Couchbase as a sink using the asynchronous Couchbase SDK.


  • A file contains the list of JSON documents, each with an id, that need to be inserted into Couchbase

  • A Flink job takes the file as its source and dumps it to a Couchbase sink

  • The Couchbase sink writes the incoming documents to the cluster specified in the config files

  • [Optional] The Flink job has the ability to poll the file for changes at an interval specified in the config
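The optional polling step can be pictured with a small standard-library sketch. This is not the project's code: `ChangePoller` and `pollOnce` are hypothetical names, and detecting changes through the file's modification time is an assumption. Flink's own `readFile` API offers `FileProcessingMode.PROCESS_ONCE` and `FileProcessingMode.PROCESS_CONTINUOUSLY` for the same purpose; the README does not say which mechanism the job relies on.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.FileTime;
import java.util.List;
import java.util.Optional;

/** Hypothetical helper: re-reads a file only when its modification time changes. */
final class ChangePoller {
    private final Path file;
    private FileTime lastSeen; // null until the first successful read

    ChangePoller(Path file) {
        this.file = file;
    }

    /** Returns the file's lines if it changed since the previous call, otherwise empty. */
    Optional<List<String>> pollOnce() {
        try {
            FileTime modified = Files.getLastModifiedTime(file);
            if (modified.equals(lastSeen)) {
                return Optional.empty(); // nothing new to hand to the sink
            }
            lastSeen = modified;
            return Optional.of(Files.readAllLines(file));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A one-time run would call `pollOnce()` a single time; a continuous run would call it repeatedly, waiting for the configured duration between calls.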


Prerequisites

  • Java 1.8
  • Apache Flink 1.6.0
  • Couchbase Server

Config properties

Edit the following properties to match your target instance

| Property | Value |
| --- | --- |
| couchbase.node | Location of the Couchbase cluster. Defaults to localhost |
| couchbase.username | Username for the Couchbase dashboard |
| couchbase.password | Password for the Couchbase dashboard |
| startup.documents.path | Path of the JSON document file. By default, the file is in src/main/resources |
| startup.documents.poll.continuous | Flag that enables or disables polling. Defaults to false |
| startup.documents.poll.duration | Interval in ms at which the file is polled for changes, if polling is enabled |
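As a rough illustration of how these keys and their documented defaults might be read, here is a minimal sketch using `java.util.Properties`. It assumes the config file uses the standard Java properties format, which the README does not state. The class `StarterConfig` and its fields are hypothetical and not taken from the project source.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.Properties;

/** Hypothetical holder for the properties listed above; not the project's own class. */
final class StarterConfig {
    final String node;
    final String username;
    final String password;
    final String documentsPath;
    final boolean pollContinuous;
    final long pollDurationMs;

    private StarterConfig(Properties p) {
        // Defaults documented in the table: localhost, and polling disabled.
        this.node = p.getProperty("couchbase.node", "localhost").trim();
        this.username = p.getProperty("couchbase.username");
        this.password = p.getProperty("couchbase.password");
        this.documentsPath = p.getProperty("startup.documents.path");
        this.pollContinuous = Boolean.parseBoolean(
                p.getProperty("startup.documents.poll.continuous", "false").trim());
        // Assumption: 0 stands for "no interval configured"; the README names no default.
        this.pollDurationMs = Long.parseLong(
                p.getProperty("startup.documents.poll.duration", "0").trim());
    }

    /** Parses key=value pairs from the reader and applies the defaults above. */
    static StarterConfig load(Reader reader) {
        Properties p = new Properties();
        try {
            p.load(reader);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new StarterConfig(p);
    }
}
```

For example, `StarterConfig.load(new java.io.FileReader(path))` would read a file at a hypothetical `path`; any key missing from the file falls back to the default shown in the constructor.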

Setting up the project

  # Start the Couchbase server instance
  $ sudo /etc/init.d/couchbase-server start

  # Create a bucket: open the Couchbase dashboard at http://127.0.0.1:8091 (change the port accordingly),
  # enter your credentials and create a bucket called "data"
  
  # Start the Flink cluster from the FLINK_BIN directory
  $ start-cluster.sh
  
  # Package the jar, then submit the job by supplying the jar path. The config lies in src/main/resources
  $ flink run -c com.aman.flink.job.FlinkDatabaseStartupJob <jar-location> --config <config-file-location>
  
  # Verify the documents were inserted properly: open the dashboard at http://127.0.0.1:8091
  # and check the documents in the bucket "data"
  

Note: Replace .sh files with .bat files when working in a Windows environment.



Flink

  • Open-source platform for distributed stream and batch data processing.
  • Provides data distribution, communication, and fault tolerance for distributed computations over data streams.
  • Builds batch processing on top of the streaming engine, overlaying native iteration support, managed memory, and program optimization.

Couchbase

  • Open source, distributed, NoSQL document-oriented engagement database.
  • Exposes a fast key-value store with managed cache for sub-millisecond data operations.
  • Specialized to provide low-latency data management for large-scale interactive web, mobile, and IoT applications.

Contributors

  • isopropylcyanide
