pradithya / flyte

This project forked from flyteorg/flyte

Kubernetes-native workflow automation platform for complex, mission-critical data and ML processes at scale. It has been battle-tested at Lyft, Spotify, Freenome, and others and is truly open-source.

Home Page: https://flyte.org

License: Apache License 2.0

Makefile 2.19% Shell 20.29% Python 51.23% HCL 14.10% Dockerfile 2.43% Mustache 9.06% CSS 0.70%

Flyte

Flyte is a workflow automation platform for complex, mission-critical data and ML processes at scale.

💥 Introduction

Flyte is a structured programming and distributed processing platform that enables highly concurrent, scalable, and maintainable workflows for machine learning and data processing. It is a fabric that connects disparate computation backends using a type-safe data dependency graph. It records all changes to a pipeline, making it possible to rewind time. It also stores a history of all executions and provides an intuitive UI, CLI, and REST/gRPC API to interact with the computation.

Flyte is more than a workflow engine -- it uses a workflow as a core concept and a task (a single unit of execution) as a top-level concept. Multiple tasks arranged in a data producer-consumer order create a workflow.

Workflows and tasks can be written in any language, with out-of-the-box support for Python, Java, and Scala.
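
To make the producer-consumer idea concrete, here is a minimal, library-free sketch in plain Python. It is not the flytekit API: `run_workflow`, the `steps` layout, and the three task functions are hypothetical names invented for illustration. Each task is an ordinary function, and the workflow is just the ordered wiring of one task's output into the next task's input.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical, simplified model of "tasks arranged in producer-consumer order".
# This is NOT the flytekit API; it only illustrates the concept.
Step = Tuple[Callable[..., object], Sequence[str], str]


def run_workflow(steps: Sequence[Step], inputs: Dict[str, object]) -> Dict[str, object]:
    """Run each task in order, feeding it the named values it consumes."""
    values = dict(inputs)
    for task, consumes, produces in steps:
        args = [values[name] for name in consumes]
        values[produces] = task(*args)
    return values


def load_numbers(n: int) -> List[int]:
    """Task 1: produce the integers 1..n."""
    return list(range(1, n + 1))


def square_all(xs: List[int]) -> List[int]:
    """Task 2: consume a list of integers and produce their squares."""
    return [x * x for x in xs]


def total(xs: List[int]) -> int:
    """Task 3: consume a list of integers and produce their sum."""
    return sum(xs)


# The workflow: (task, names of the values it consumes, name of the value it produces).
steps = [
    (load_numbers, ["n"], "numbers"),
    (square_all, ["numbers"], "squares"),
    (total, ["squares"], "result"),
]

result = run_workflow(steps, {"n": 4})
print(result["result"])  # 1 + 4 + 9 + 16 = 30
```

In Flyte itself the dependency graph is derived from how data flows between tasks rather than from a hand-written step list; the sketch only shows why a chain of typed inputs and outputs is enough to define a workflow.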

⏳ Five Reasons to Use Flyte

  • Kubernetes-Native Workflow Automation Platform
  • Ergonomic SDKs in Python, Java & Scala
  • Versioned & Auditable
  • Reproducible Pipelines
  • Strong Data Typing

🚀 Quick Start

With Docker and Flytectl installed, run the following command:

  flytectl sandbox start

This creates a local Flyte sandbox. Once the sandbox is ready, you should see the following message: Flyte is ready! Flyte UI is available at http://localhost:30081/console.

Visit http://localhost:30081/console to view the Flyte dashboard.
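
If you would rather check from a script than from a browser, a small sketch like the following can confirm that the console URL responds. It uses only the Python standard library; the function name `console_is_up` and the `opener` parameter (there to make the check testable) are assumptions made for this example, and the URL is the one printed by the sandbox message above.

```python
import urllib.error
import urllib.request

# URL reported by the sandbox once it is ready.
CONSOLE_URL = "http://localhost:30081/console"


def console_is_up(url=CONSOLE_URL, opener=urllib.request.urlopen, timeout=5.0):
    """Return True if an HTTP GET on the console URL ends with a 2xx status."""
    try:
        with opener(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, or an HTTP error status.
        return False


if __name__ == "__main__":
    print("console reachable:", console_is_up())
```

A `False` result right after `flytectl sandbox start` may simply mean the sandbox is still starting, so retry after a short wait.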

Here's a quick visual tour of the console.

Flyte console Example

To dig deeper into Flyte, refer to the Documentation.

⭐️ Current Deployments & Contributors

🔥 Features

  • Used at scale in production by 500+ users at Lyft, with more than 1 million executions and 40+ million container executions per month
  • A data-aware platform
  • Enables collaboration across your organization by:
    • Executing distributed data pipelines/workflows
    • Reusing tasks across projects, users, and workflows
    • Making it easy to stitch together workflows from different teams and domain experts
    • Backtracing to a specified workflow
    • Comparing results of training workflows over time and across pipelines
    • Sharing workflows and tasks across your teams
    • Simplifying the complexity of multi-step, multi-owner workflows
  • Quick registration -- start locally and scale to the cloud instantly
  • Centralized Inventory constituting Tasks, Workflows and Executions
  • gRPC / REST interface to define and execute tasks and workflows
  • Type-safe construction of pipelines -- each task has an interface characterized by its inputs and outputs, so illegal construction of pipelines fails during declaration rather than at runtime
  • Supports multiple data types for machine learning and data processing pipelines, such as Blobs (images, arbitrary files), Directories, Schema (columnar structured data), collections, maps, etc.
  • Memoization and Lineage tracking
  • Provides logging and observability
  • Workflow features:
    • Start with one task, convert to a pipeline, attach multiple schedules, trigger using a programmatic API, or on-demand
    • Parallel step execution
    • Extensible backend to add customized plugin experience (with simplified user experience)
    • Branching
    • Inline subworkflows (a workflow can be embedded within one node of the top-level workflow)
    • Distributed remote child workflows (a remote workflow can be triggered and statically verified at compile time)
    • Array Tasks (map a function over a large dataset -- ensures controlled execution of thousands of containers)
    • Dynamic workflow creation and execution with runtime type safety
    • Container side plugins with first class support in Python
    • Pre-alpha: arbitrary flytekit-less containers supported (RawContainer)
  • Guaranteed reproducibility of pipelines via:
    • Versioned data, code and models
    • Automatically tracked executions
    • Declarative pipelines
  • Multi-cloud support (AWS, GCP and others)
  • Extensible core, modularized, and deep observability
  • No single point of failure; resilient by design
  • Automated notifications to Slack, Email, and PagerDuty
  • Multi K8s cluster support
  • Out-of-the-box support to run Spark jobs on K8s, Hive queries, etc.
  • Snappy Console
  • Python CLI and Golang CLI (flytectl)
  • Written in Golang and optimized for the performance of large running jobs
  • Grafana templates (user/system observability)
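
The type-safe construction feature listed above (illegal pipelines fail during declaration rather than at runtime) can be illustrated with a small sketch built on Python's standard type hints. This is not how Flyte implements it: `connect` and the three example tasks are hypothetical names, and the check is a plain equality comparison of annotations. The point is only that a mismatch can be caught while wiring tasks together, before any task runs.

```python
from typing import Callable, get_type_hints


def connect(producer: Callable, consumer: Callable, param: str) -> None:
    """Raise TypeError if the producer's return type does not match the
    annotation of the consumer parameter it is being wired to."""
    produced = get_type_hints(producer).get("return")
    expected = get_type_hints(consumer).get(param)
    if produced is None or expected is None:
        raise TypeError("both sides of a connection must be annotated")
    if produced != expected:
        raise TypeError(
            f"{producer.__name__} returns {produced}, "
            f"but {consumer.__name__}({param}) expects {expected}"
        )


def count_rows(path: str) -> int:
    """Hypothetical task: count the lines in a file."""
    with open(path) as handle:
        return sum(1 for _ in handle)


def describe(n: int) -> str:
    """Hypothetical task: turn a row count into a message."""
    return f"{n} rows"


def shout(text: str) -> str:
    """Hypothetical task: upper-case a string."""
    return text.upper()


# Valid wiring: count_rows produces an int and describe consumes an int.
connect(count_rows, describe, "n")

# Invalid wiring: shout expects a str but count_rows produces an int.
# The error is raised here, without executing any task.
try:
    connect(count_rows, shout, "text")
except TypeError as err:
    print("rejected at declaration:", err)
```

Note that `count_rows` is never called, so no file is opened: the whole check works from the task interfaces alone.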

In Progress

  • Demos: distributed PyTorch, feature engineering, etc.
  • Integrations: Great Expectations, Feast
  • Least-privilege Minimal Helm Chart
  • Relaunch execution in recover mode
  • Documentation as code

🔌 Available Plugins

📦 Component Repos

Repo           | Language       | Purpose                                       | Status
flyte          | Kustomize, RST | deployment, documentation, issues             | Production-grade
flyteidl       | Protobuf       | interface definitions                         | Production-grade
flytepropeller | Go             | execution engine                              | Production-grade
flyteadmin     | Go             | control plane                                 | Production-grade
flytekit       | Python         | Python SDK and tools                          | Production-grade
flyteconsole   | TypeScript     | admin console                                 | Production-grade
datacatalog    | Go             | manage input & output artifacts               | Production-grade
flyteplugins   | Go             | Flyte plugins                                 | Production-grade
flytestdlib    | Go             | standard library                              | Production-grade
flytesnacks    | Python         | examples, tips, and tricks                    | Incubating
flytekit-java  | Java/Scala     | Java & Scala SDK for authoring Flyte workflows | Incubating
flytectl       | Go             | a standalone Flyte CLI                        | Incomplete

🔩 Production K8s Operators

Repo  | Language | Purpose
Spark | Go       | Apache Spark batch
Flink | Go       | Apache Flink streaming

🤝 Community & Resources

Here are some resources to help you learn more about Flyte.

Communication Channels

Biweekly Community Sync

  • 📣 Flyte OSS Community Sync: every other Tuesday, 9am-10am PDT. Check out the calendar and register to stay up to date with our meeting times, or simply join us on Zoom.
  • Upcoming meeting agenda, previous meeting notes and a backlog of topics are captured in this document.
  • If you'd like to revisit any previous community sync meetings, you can access the video recordings on Flyte's YouTube channel.

Conference Talks

  • KubeCon 2019 - Flyte: Cloud Native Machine Learning and Data Processing Platform (video | deck)
  • KubeCon 2019 - Running Large-Scale Stateful Workloads on Kubernetes at Lyft (video)
  • re:Invent 2019 - Implementing ML Workflows with Kubernetes and Amazon SageMaker (video)
  • Cloud-native machine learning at Lyft with AWS Batch and Amazon EKS (video)
  • OSS + ELC NA 2020 (splash)
  • Datacouncil (video | splash)
  • FB AI@Scale - Making MLOps & DataOps a reality
  • GAIC 2020
  • OSPOCon 2021 - Catch a variety of Flyte talks; the final schedule and topics are to be released soon.

Blog Posts

Podcasts

💖 Top Contributors

A big thank you to the community for making Flyte possible!
