
aporia-ai / mlplatform-workshop


🍫 Example code for a basic ML Platform based on Pulumi, FastAPI, DVC, MLFlow and more

Home Page: https://www.aporia.com/blog/building-an-ml-platform-from-scratch/

License: MIT License

TypeScript 76.49% Dockerfile 3.37% Makefile 4.22% Python 15.92%
mlops machine-learning devops

mlplatform-workshop's Introduction

ML Platform Workshop

This repo contains example code for a (very basic) ML platform.

  • The model-template directory contains an example for a Cookiecutter-based template that data scientists can clone to start a new project.
  • The infra directory contains Pulumi code that spins up the shared infrastructure of the ML platform, such as Kubernetes, MLFlow, etc.

Made with ❤️ by Aporia


Why?

As data science teams mature and their models reach production, proper infrastructure becomes crucial. Leading companies with massive engineering teams, such as Uber, Netflix and Airbnb, have built multiple solutions for their infrastructure and call the combination an “ML Platform”.

We hope this repo can help you get started with building your own ML platform ❤️

Architecture

Based on the following projects: Pulumi, FastAPI, DVC, MLFlow, and more.

When building your own ML platform, do not take these tools for granted! Check out alternatives and find the best tools that solve each one of your problems.

What's missing from this?

Well... a lot actually. Here's a partial list:

  • HTTPS & Authentication
  • Environments (staging, production)
  • Common library for preprocessing, postprocessing, etc.
  • Model input & validation
  • Training orchestration
  • and probably much more!

We would love your help!

mlplatform-workshop's People

Contributors

alongubkin, nicarod


mlplatform-workshop's Issues

Can you please update the index.ts code?

There are a lot of errors caused by Kubernetes 1.22 and the Traefik Helm charts.

Can you please update the code to reflect the current versions? Alon's tutorial on YouTube uses different code and runs differently.

Thanks! This will be a great help for everyone working on this in 2023!
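One way to see which parts of a deployment are affected is to scan the rendered manifests for API versions that Kubernetes 1.22 removed. The sketch below is only an illustration, not code from this repo: the `Manifest` shape, the `findRemovedApis` helper, and the sample `traefik` Deployment are all hypothetical, and the table lists only a few of the removed APIs.

```typescript
// Hypothetical pre-flight check; none of these names come from the repo.
interface Manifest {
  apiVersion: string;
  kind: string;
  metadata: { name: string };
}

interface Finding {
  name: string;
  kind: string;
  found: string;
  replacement: string;
}

// Partial list of "apiVersion/Kind" pairs removed in Kubernetes 1.22,
// mapped to the apiVersion that replaces them.
const REMOVED_IN_1_22 = new Map<string, string>([
  ["apiextensions.k8s.io/v1beta1/CustomResourceDefinition", "apiextensions.k8s.io/v1"],
  ["networking.k8s.io/v1beta1/Ingress", "networking.k8s.io/v1"],
  ["extensions/v1beta1/Ingress", "networking.k8s.io/v1"],
]);

// Return one finding per manifest that still uses a removed API version.
function findRemovedApis(manifests: Manifest[]): Finding[] {
  const findings: Finding[] = [];
  for (const m of manifests) {
    const replacement = REMOVED_IN_1_22.get(`${m.apiVersion}/${m.kind}`);
    if (replacement !== undefined) {
      findings.push({
        name: m.metadata.name,
        kind: m.kind,
        found: m.apiVersion,
        replacement,
      });
    }
  }
  return findings;
}

// Example input: one CRD name taken from the error output in the issue below,
// plus a made-up Deployment that is unaffected.
const rendered: Manifest[] = [
  {
    apiVersion: "apiextensions.k8s.io/v1beta1",
    kind: "CustomResourceDefinition",
    metadata: { name: "middlewares.traefik.containo.us" },
  },
  { apiVersion: "apps/v1", kind: "Deployment", metadata: { name: "traefik" } },
];

console.log(findRemovedApis(rendered));
```

Note that swapping the `apiVersion` string alone is usually not enough for CRDs, because `apiextensions.k8s.io/v1` also requires a structural schema. Moving to a chart version that already ships `v1` CRDs is likely the simpler route.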

Unable to deploy

Hi there

I appreciate the great work of posting and uploading this. I'm struggling to deploy the infrastructure as is.

I've created a domain in Route 53 and inserted the domain name and the zone ID in the code. I also had to change the Postgres version from 11.10 to 11, because 11.10 threw an error. Now I'm getting a bunch of errors related to Traefik.

The diagnostics from the Pulumi run are below:

`

kubernetes:apiextensions.k8s.io/v1beta1:CustomResourceDefinition (middlewares.traefik.containo.us):
warning: apiVersion "apiextensions.k8s.io/v1beta1/CustomResourceDefinition" was removed in Kubernetes 1.22. Use "apiextensions.k8s.io/v1/CustomResourceDefinition" instead.

error: resource middlewares.traefik.containo.us was not successfully created by the Kubernetes API server : apiVersion "apiextensions.k8s.io/v1beta1/CustomResourceDefinition" was removed in Kubernetes 1.22. Use "apiextensions.k8s.io/v1/CustomResourceDefinition" instead.

kubernetes:apiextensions.k8s.io/v1beta1:CustomResourceDefinition (ingressroutes.traefik.containo.us):
warning: apiVersion "apiextensions.k8s.io/v1beta1/CustomResourceDefinition" was removed in Kubernetes 1.22. Use "apiextensions.k8s.io/v1/CustomResourceDefinition" instead.
error: resource ingressroutes.traefik.containo.us was not successfully created by the Kubernetes API server : apiVersion "apiextensions.k8s.io/v1beta1/CustomResourceDefinition" was removed in Kubernetes 1.22. Use "apiextensions.k8s.io/v1/CustomResourceDefinition" instead.

kubernetes:apiextensions.k8s.io/v1beta1:CustomResourceDefinition (tlsoptions.traefik.containo.us):
warning: apiVersion "apiextensions.k8s.io/v1beta1/CustomResourceDefinition" was removed in Kubernetes 1.22. Use "apiextensions.k8s.io/v1/CustomResourceDefinition" instead.
error: resource tlsoptions.traefik.containo.us was not successfully created by the Kubernetes API server : apiVersion "apiextensions.k8s.io/v1beta1/CustomResourceDefinition" was removed in Kubernetes 1.22. Use "apiextensions.k8s.io/v1/CustomResourceDefinition" instead.

kubernetes:traefik.containo.us/v1alpha1:IngressRoute (traefik-dashboard):
warning: This resource contains Helm hooks that are not currently supported by Pulumi. The resource will be created, but any hooks will not be executed. Hooks support is tracked at https://github.com/pulumi/pulumi-kubernetes/issues/555 -- This warning can be disabled by setting the PULUMI_K8S_SUPPRESS_HELM_HOOK_WARNINGS environment variable
error: creation of resource default/traefik-dashboard failed because the Kubernetes API server reported that the apiVersion for this resource does not exist. Verify that any required CRDs have been created: no matches for kind "IngressRoute" in version "traefik.containo.us/v1alpha1"

kubernetes:apiextensions.k8s.io/v1beta1:CustomResourceDefinition (ingressroutetcps.traefik.containo.us):
warning: apiVersion "apiextensions.k8s.io/v1beta1/CustomResourceDefinition" was removed in Kubernetes 1.22. Use "apiextensions.k8s.io/v1/CustomResourceDefinition" instead.
error: resource ingressroutetcps.traefik.containo.us was not successfully created by the Kubernetes API server : apiVersion "apiextensions.k8s.io/v1beta1/CustomResourceDefinition" was removed in Kubernetes 1.22. Use "apiextensions.k8s.io/v1/CustomResourceDefinition" instead.

kubernetes:traefik.containo.us/v1alpha1:Middleware (mlflow-strip-prefix):
error: creation of resource mlflow/mlflow-strip-prefix-33bf2e4f failed because the Kubernetes API server reported that the apiVersion for this resource does not exist. Verify that any required CRDs have been created: no matches for kind "Middleware" in version "traefik.containo.us/v1alpha1"

kubernetes:traefik.containo.us/v1alpha1:Middleware (mlflow-trailing-slash):
error: creation of resource mlflow/mlflow-trailing-slash-0d17ce3f failed because the Kubernetes API server reported that the apiVersion for this resource does not exist. Verify that any required CRDs have been created: no matches for kind "Middleware" in version "traefik.containo.us/v1alpha1"

kubernetes:apiextensions.k8s.io/v1beta1:CustomResourceDefinition (ingressrouteudps.traefik.containo.us):
warning: apiVersion "apiextensions.k8s.io/v1beta1/CustomResourceDefinition" was removed in Kubernetes 1.22. Use "apiextensions.k8s.io/v1/CustomResourceDefinition" instead.
error: resource ingressrouteudps.traefik.containo.us was not successfully created by the Kubernetes API server : apiVersion "apiextensions.k8s.io/v1beta1/CustomResourceDefinition" was removed in Kubernetes 1.22. Use "apiextensions.k8s.io/v1/CustomResourceDefinition" instead.

kubernetes:apiextensions.k8s.io/v1beta1:CustomResourceDefinition (traefikservices.traefik.containo.us):
warning: apiVersion "apiextensions.k8s.io/v1beta1/CustomResourceDefinition" was removed in Kubernetes 1.22. Use "apiextensions.k8s.io/v1/CustomResourceDefinition" instead.
error: resource traefikservices.traefik.containo.us was not successfully created by the Kubernetes API server : apiVersion "apiextensions.k8s.io/v1beta1/CustomResourceDefinition" was removed in Kubernetes 1.22. Use "apiextensions.k8s.io/v1/CustomResourceDefinition" instead.

pulumi:pulumi:Stack (ml-infra-dev):
error: update failed
error: Error: invocation of kubernetes:helm:template returned an error: error reading from server: read tcp 127.0.0.1:53702->127.0.0.1:53700: use of closed network connection
    at Object.callback (/Users/Programming/projects/ml-pipeline/ml-infra/node_modules/@pulumi/runtime/invoke.ts:172:33)
    at Object.onReceiveStatus (/Users/Programming/projects/ml-pipeline/ml-infra/node_modules/@grpc/grpc-js/src/client.ts:338:26)
    at Object.onReceiveStatus (/Users/Programming/projects/ml-pipeline/ml-infra/node_modules/@grpc/grpc-js/src/client-interceptors.ts:426:34)
    at Object.onReceiveStatus (/Users/Programming/projects/ml-pipeline/ml-infra/node_modules/@grpc/grpc-js/src/client-interceptors.ts:389:48)
    at /Users/Programming/projects/ml-pipeline/ml-infra/node_modules/@grpc/grpc-js/src/call-stream.ts:276:24
    at processTicksAndRejections (node:internal/process/task_queues:77:11)

I0108 12:50:16.966189    2838 request.go:682] Waited for 1.032607959s due to client-side throttling, not priority and fairness, request: GET:https://2DD036C575E8E66E280A78402AFB414F.gr7.us-west-2.eks.amazonaws.com/apis/apps/v1?timeout=32s
I0108 12:50:27.164277    2838 request.go:682] Waited for 4.434218083s due to client-side throttling, not priority and fairness, request: GET:https://2DD036C575E8E66E280A78402AFB414F.gr7.us-west-2.eks.amazonaws.com/apis/batch/v1beta1?timeout=32s
I0108 12:50:37.364251    2838 request.go:682] Waited for 1.036100542s due to client-side throttling, not priority and fairness, request: GET:https://2DD036C575E8E66E280A78402AFB414F.gr7.us-west-2.eks.amazonaws.com/apis/rbac.authorization.k8s.io/v1?timeout=32s
I0108 12:50:47.564198    2838 request.go:682] Waited for 4.433233583s due to client-side throttling, not priority and fairness, request: GET:https://2DD036C575E8E66E280A78402AFB414F.gr7.us-west-2.eks.amazonaws.com/apis/storage.k8s.io/v1?timeout=32s
I0108 12:50:57.764186    2838 request.go:682] Waited for 1.032272375s due to client-side throttling, not priority and fairness, request: GET:https://2DD036C575E8E66E280A78402AFB414F.gr7.us-west-2.eks.amazonaws.com/apis/node.k8s.io/v1beta1?timeout=32s
I0108 12:51:07.964178    2838 request.go:682] Waited for 4.432762875s due to client-side throttling, not priority and fairness, request: GET:https://2DD036C575E8E66E280A78402AFB414F.gr7.us-west-2.eks.amazonaws.com/apis/flowcontrol.apiserver.k8s.io/v1beta2?timeout=32s
I0108 12:51:20.563948    2838 request.go:682] Waited for 1.09841225s due to client-side throttling, not priority and fairness, request: GET:https://2DD036C575E8E66E280A78402AFB414F.gr7.us-west-2.eks.amazonaws.com/apis/batch/v1beta1?timeout=32s
I0108 12:51:34.163858    2838 request.go:682] Waited for 1.102064041s due to client-side throttling, not priority and fairness, request: GET:https://2DD036C575E8E66E280A78402AFB414F.gr7.us-west-2.eks.amazonaws.com/apis/rbac.authorization.k8s.io/v1?timeout=32s

kubernetes:apiextensions.k8s.io/v1beta1:CustomResourceDefinition (tlsstores.traefik.containo.us):
warning: apiVersion "apiextensions.k8s.io/v1beta1/CustomResourceDefinition" was removed in Kubernetes 1.22. Use "apiextensions.k8s.io/v1/CustomResourceDefinition" instead.
error: resource tlsstores.traefik.containo.us was not successfully created by the Kubernetes API server : apiVersion "apiextensions.k8s.io/v1beta1/CustomResourceDefinition" was removed in Kubernetes 1.22. Use "apiextensions.k8s.io/v1/CustomResourceDefinition" instead.`

I'd appreciate it if someone could guide me through fixing this.
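The output above mixes two kinds of failure, and it can help to separate them before deciding what to fix. The following is a rough sketch only: `classifyErrors` and the `Cause` labels are hypothetical names, and the matching relies on nothing more than the phrases that appear in this particular log.

```typescript
// Hypothetical helper for triaging Pulumi diagnostics like the ones above.
type Cause = "removed-api" | "missing-crd" | "other";

interface Diagnostic {
  resource: string;
  cause: Cause;
}

// Header lines in the log look like: <type token> (<resource name>):
const HEADER = /^(\S+) \((\S+)\):$/;

function classifyErrors(log: string): Diagnostic[] {
  const results: Diagnostic[] = [];
  let current: string | null = null;

  for (const raw of log.split("\n")) {
    const line = raw.trim();

    // A header line starts a new resource section.
    const header = HEADER.exec(line);
    if (header) {
      current = header[2] ?? null;
      continue;
    }

    // Only "error:" lines inside a resource section are classified.
    if (current === null || !line.startsWith("error:")) continue;

    let cause: Cause = "other";
    if (line.includes("was removed in Kubernetes")) {
      cause = "removed-api";
    } else if (line.includes("Verify that any required CRDs have been created")) {
      cause = "missing-crd";
    }
    results.push({ resource: current, cause });
  }
  return results;
}
```

Run against this log, the CRD resources (for example `middlewares.traefik.containo.us`) would fall under `removed-api`, while `traefik-dashboard` and the two `mlflow-*` middlewares would fall under `missing-crd`. The second group is most likely a downstream effect of the first: the `IngressRoute` and `Middleware` kinds cannot exist until their CRDs are created, so fixing the CRD API version (probably by using a newer Traefik chart) should be tried first.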
