danhnguyen123 / data-on-eks

This project forked from awslabs/data-on-eks

DoEKS is a tool to build, deploy and scale Data Platforms on Amazon EKS

Home Page: https://awslabs.github.io/data-on-eks/

License: Apache License 2.0

Shell 14.75% JavaScript 1.24% Python 10.53% TypeScript 0.52% CSS 0.28% PLpgSQL 13.97% HCL 44.31% Jupyter Notebook 13.50% Dockerfile 0.89%

Data on EKS

(pronounced Do.eks)

Build, Scale, and Optimize Data & AI/ML Platforms on Amazon EKS 🚀

Welcome to the Data on EKS repository, a comprehensive resource for scaling your data and machine learning workloads on Amazon EKS and unlocking the power of Gen AI. Harness the capabilities of AWS Trainium, AWS Inferentia and NVIDIA GPUs to scale and optimize your Gen AI workloads with ease.

This open-source tool offers a comprehensive collection of Terraform Blueprints, featuring industry best practices, to deploy end-to-end solutions on Amazon EKS with advanced logging and observability. Dive into a diverse range of practical examples, showcasing the potential and flexibility of running AI/ML workloads on EKS, including Apache Spark, PyTorch, TensorFlow, XGBoost, and more. Unlock valuable insights from benchmark reports and access expert guidance to optimize your data solutions. Discover how to create robust clusters for Amazon EMR on EKS, Apache Spark, Apache Flink, Apache Kafka, and Apache Airflow, while exploring machine learning platforms like Ray, Kubeflow, and JupyterHub, along with accelerators such as NVIDIA GPUs, AWS Trainium, and AWS Inferentia on EKS.

Note: DoEKS is actively being developed for various patterns. To see what features are in progress, please check out the issues section of our repository.

πŸ—οΈ Architecture

The diagram below showcases the wide array of open-source data tools, Kubernetes operators, and frameworks supported by DoEKS. It also highlights the seamless integration of AWS Data Analytics managed services with the powerful capabilities of DoEKS open-source tools.

[Image: DoEKS architecture diagram]

🌟 Features

The Data on EKS (DoEKS) solution is organized into the following focus areas.

🎯 Data Analytics on EKS

🎯 AI/ML on EKS

🎯 Streaming Platforms on EKS

🎯 Scheduler Workflow Platforms on EKS

🎯 Distributed Databases & Query Engine on EKS

πŸƒβ€β™€οΈGetting Started

In this repository, you'll find a variety of deployment blueprints for creating Data/ML platforms with Amazon EKS clusters. These examples are just a small selection of the available blueprints - visit the DoEKS website for the complete list of options.

🚀 JupyterHub on EKS 👈 This blueprint deploys a self-managed JupyterHub on EKS with Amazon Cognito authentication.

🚀 Ray on EKS 👈 This blueprint deploys the Ray Operator on EKS with sample scripts.

🚀 Trainium/Inferentia with TorchX and Volcano on EKS 👈 This blueprint deploys a Gen AI setup on EKS with sample training scripts.

🚀 EMR-on-EKS with Karpenter 👈 Start here if you are new to EMR on EKS. This blueprint deploys an EMR on EKS cluster and uses Karpenter to scale Spark jobs.

🚀 Spark Operator with Apache YuniKorn on EKS 👈 This blueprint deploys an EKS cluster and uses Spark Operator and Apache YuniKorn to run self-managed Spark jobs.

🚀 Self-managed Airflow on EKS 👈 This blueprint sets up self-managed Apache Airflow on an Amazon EKS cluster, following best practices.

🚀 Argo Workflows on EKS 👈 This blueprint sets up self-managed Argo Workflows on an Amazon EKS cluster, following best practices.

🚀 Kafka on EKS 👈 This blueprint deploys a self-managed Kafka cluster on EKS using the popular Strimzi Kafka operator.
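
Because the blueprints listed above are Terraform-based, deploying one generally follows the standard Terraform workflow. The sketch below is only an illustration under stated assumptions: `BLUEPRINT_DIR` is a hypothetical placeholder (this README does not give the directory for any blueprint), the `run` and `deploy_blueprint` helpers are made up for this example, and individual blueprints may ship their own install scripts or require extra variables. Treat the DoEKS website as the authoritative source for each blueprint's steps.

```shell
#!/bin/sh
# Generic Terraform workflow for a single blueprint (illustrative sketch only).
set -eu

# Hypothetical placeholder: replace with the blueprint's real directory.
BLUEPRINT_DIR="${BLUEPRINT_DIR:-path/to/blueprint}"
# DRY_RUN=1 (default) only prints the commands; set DRY_RUN=0 to execute them.
DRY_RUN="${DRY_RUN:-1}"

# Print a command, and execute it only when DRY_RUN is 0.
run() {
  echo "+ $*"
  if [ "$DRY_RUN" = "0" ]; then
    "$@"
  fi
}

deploy_blueprint() {
  # Clone the upstream repository this project was forked from.
  run git clone https://github.com/awslabs/data-on-eks.git
  # Download providers and modules for the chosen blueprint.
  run terraform -chdir="data-on-eks/${BLUEPRINT_DIR}" init
  # Preview the resources Terraform would create.
  run terraform -chdir="data-on-eks/${BLUEPRINT_DIR}" plan
  # Create the resources (Terraform asks for confirmation).
  run terraform -chdir="data-on-eks/${BLUEPRINT_DIR}" apply
}

deploy_blueprint
```

Running the script as-is prints the four commands without executing anything, which makes it easy to review them first. Setting `DRY_RUN=0` and a real `BLUEPRINT_DIR` runs them, and that requires `git`, `terraform`, and AWS credentials to be configured.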

🗂️ Documentation

For instructions on how to deploy Data on EKS patterns and run sample tests, visit the DoEKS website.

πŸ† Motivation

Kubernetes is a widely adopted system for orchestrating containerized software at scale. As more users migrate their data and machine learning workloads to Kubernetes, they often face the complexity of managing the Kubernetes ecosystem and selecting the right tools and configurations for their specific needs.

At AWS, we understand the challenges users encounter when deploying and scaling data workloads on Kubernetes. To simplify the process and enable users to quickly conduct proof-of-concepts and build production-ready clusters, we have developed Data on EKS (DoEKS). DoEKS offers opinionated open-source blueprints that provide end-to-end logging and observability, making it easier for users to deploy and manage Spark on EKS, Kubeflow, MLflow, Airflow, Presto, Kafka, Cassandra, and other data workloads. With DoEKS, users can confidently leverage the power of Kubernetes for their data and machine learning needs without getting overwhelmed by its complexity.

🤝 Support & Feedback

DoEKS is maintained by AWS Solutions Architects and is not an AWS service. Support is provided on a best-effort basis by the Data on EKS Blueprints community. If you have feedback, feature ideas, or wish to report bugs, please use the Issues section of this GitHub repository.

🔐 Security

See CONTRIBUTING for more information.

💼 License

This library is licensed under the Apache 2.0 License.

🙌 Community

We welcome all individuals who are enthusiastic about data on Kubernetes to become a part of this open source community. Your contributions and participation are invaluable to the success of this project.

Built with ❤️ at AWS.
