
ev2900 / flink_kinesis_data_analytics


Apache Flink examples designed to be run by AWS Kinesis Data Analytics (KDA).

Python 100.00%
flink-examples flink-stream-processing flink-streaming kinesis flink flink-sql flinksql aws

flink_kinesis_data_analytics's Introduction

Kinesis Data Analytics Lab


Processing real-time data via Kinesis Data Analytics - Apache Flink

Youtube video(s)

  1. Send Data to Kinesis from a Python Script
  2. Optional - Send Data to Kinesis from a KDA Notebook
  3. Create a Kinesis Data Analytics Studio and Upload a Notebook
  4. Running the Interactive Flink Zeppelin Notebook
  5. Deploy a Kinesis Data Analytics Studio Notebook

Data Producer

Note: if you want to get started without setting up a Kinesis Data Stream and loading data into it (or setting up a data simulator), use the sql_1.13_DataGen.zpln notebook. This Zeppelin notebook uses the Flink DataGen connector to generate data within the notebook itself, without needing a connection to Kinesis or Kafka.
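As a rough illustration of what a DataGen-backed table looks like, the sketch below builds a Flink SQL CREATE TABLE statement as a Python string. The table name, column names and value ranges are hypothetical and are not taken from sql_1.13_DataGen.zpln; only the connector options ('connector', 'rows-per-second', 'fields.<name>.min', 'fields.<name>.max') come from the Flink DataGen connector.

```python
def datagen_table_ddl(table_name, rows_per_second=5):
    """Return a Flink SQL CREATE TABLE statement backed by the DataGen connector.

    The columns below are made up for illustration; adjust them to your needs.
    """
    return f"""
CREATE TABLE {table_name} (
    passenger_count INT,
    fare_amount DOUBLE,
    pickup_time AS LOCALTIMESTAMP
) WITH (
    'connector' = 'datagen',
    'rows-per-second' = '{rows_per_second}',
    'fields.passenger_count.min' = '1',
    'fields.passenger_count.max' = '6',
    'fields.fare_amount.min' = '2.5',
    'fields.fare_amount.max' = '80.0'
)
""".strip()


# Print the statement so it can be pasted into a Flink SQL paragraph.
print(datagen_table_ddl("demo_trips"))
```

In a Studio notebook the printed statement would typically be run in a Flink SQL paragraph, after which the table can be queried like any other streaming table.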

To get started with Apache Flink via Kinesis Data Analytics (KDA), a Kinesis Data Stream with sample data is required. The kinesis_data_producer folder provides two Python scripts that read the CSV file yellow_tripdata_2020-01.csv in the data folder and stream each line of the file as a JSON record/message to a specified Kinesis Data Stream.

Two variations of this Python data producer are provided.

The two scripts/programs are very similar. A few differences exist depending on whether you want to run the producer application(s) from your local computer/laptop or from Cloud9.

For a step-by-step walkthrough, view the Youtube video Send Data to Kinesis from a Python Script.
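The producer flow described above (read the CSV, send each line as a JSON record) can be sketched roughly as follows. This is not the code from the kinesis_data_producer folder; the function names, the partition key, the region and the stream name are assumptions made for illustration. The Kinesis client is passed in so the logic can be tried without AWS access; main() shows how a real boto3 client might be wired in.

```python
import csv
import json


def csv_lines_as_json(lines):
    """Yield each CSV data row as a JSON string keyed by the header names.

    `lines` is any iterable of text lines, e.g. an open file.
    """
    for row in csv.DictReader(lines):
        yield json.dumps(row)


def send_to_kinesis(client, stream_name, json_records, partition_key="taxi"):
    """Send records one at a time with put_record; return how many were sent."""
    sent = 0
    for record in json_records:
        client.put_record(
            StreamName=stream_name,
            Data=record.encode("utf-8"),
            PartitionKey=partition_key,  # hypothetical fixed key
        )
        sent += 1
    return sent


def main():
    # boto3 is a third-party package and needs AWS credentials configured.
    import boto3

    # Region and stream name are placeholders; use your own values.
    kinesis = boto3.client("kinesis", region_name="us-east-1")
    with open("data/yellow_tripdata_2020-01.csv", newline="") as handle:
        total = send_to_kinesis(kinesis, "my-taxi-stream", csv_lines_as_json(handle))
    print(f"sent {total} records")
```

Calling main() starts the upload. A fixed partition key keeps the sketch short but sends every record to the same shard; a per-record key would spread load across shards.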

An alternative method to send sample data to a Kinesis Data Stream - without the need to set up the Python data producer(s) described above - is to use the Nyc_Taxi_Produce_KDA_Zeppelin_Notebook.zpln notebook in KDA Studio. This notebook can be uploaded and has instructions to send sample data from S3 to a Kinesis Data Stream.

To benefit the most from the sample Flink code / labs provided, it is important that you can easily start and stop a Python data producer.
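One way to make a producer easy to start and stop is to check a stop flag between records. The sketch below is an assumption about how this could be done, not how the repository's scripts work; run_producer and its parameters are hypothetical names, and send stands in for whatever function delivers a record to the stream.

```python
import threading


def run_producer(send, records, stop_event, delay_seconds=0.0):
    """Call send(record) for each record until stop_event is set.

    Returns the number of records sent. The optional delay throttles the
    producer and is cut short as soon as the stop flag is raised.
    """
    sent = 0
    for record in records:
        if stop_event.is_set():
            break
        send(record)
        sent += 1
        if delay_seconds:
            stop_event.wait(delay_seconds)
    return sent


# Example: run the producer in a background thread and stop it on demand.
stop = threading.Event()
received = []
worker = threading.Thread(
    target=run_producer,
    args=(received.append, iter(range(1000)), stop, 0.01),
)
worker.start()
stop.set()      # request a stop; the loop exits before its next send
worker.join()
```

Clearing the event with stop.clear() and starting a new thread resumes sending, which is convenient when watching how a running Flink query reacts to data arriving and pausing.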

Interactive KDA Flink Zeppelin Notebook(s)

The interactive_KDA_flink_zeppelin_notebook folder provides Zeppelin notebooks that are designed to work with Kinesis Data Analytics Studio. Deploy a Kinesis Data Analytics Studio instance and upload the Zeppelin (.zpln) notebook(s).

Note - within the interactive_KDA_flink_zeppelin_notebook folder are subfolders. Use the one that matches the version of Flink your notebook is configured to use. I would recommend using Flink v1.13.

To upload the notebook

upload_notebook

Once the notebook is uploaded and opened in Zeppelin, run it one cell at a time.

interactive_notebook

For a step-by-step walkthrough of running the notebook, view the Youtube video Running the Interactive Flink Zeppelin Notebook.

Deployable KDA Flink Zeppelin Notebook(s)

Kinesis Data Analytics Studio provides an excellent development environment. When you are ready to deploy your application, Kinesis Data Analytics Studio has a mechanism to build and deploy your notebook code as a long-running Kinesis Data Analytics application.

To deploy your notebook

Ensure that when you created your notebook environment you configured the Deploy as application configuration - optional setting with a valid S3 bucket.

deploy_config

To access this configuration menu during the creation of your studio notebook, select Create with custom settings instead of the default Quick create with sample code. Follow the setup prompts and, on Step 3 - Configure, select an S3 bucket for the Deploy as application configuration - optional setting.

With this configured, in your Zeppelin notebook select Build deployable and export to Amazon S3.

build_action

Once the build is complete, select Deploy deployable as Kinesis Analytics application.

deploy_action

When the deployment is complete, you will see the application under the analytics application section of Kinesis Data Analytics.

deployed

Future Improvements Planned for this Repository

