
community-infra's Introduction

Kubeflow is the cloud-native platform for machine learning operations: pipelines, training, and deployment.


Documentation

Please refer to the official docs at kubeflow.org.

Working Groups

The Kubeflow community is organized into working groups (WGs), each with associated repositories, that focus on specific pieces of the ML platform.

Get Involved

Please refer to the Community page.

community-infra's People

Contributors

andreyvelich, bobgy, jlewi, karlschriek, terrytangyuan, yuzisun

community-infra's Issues

CNRM unhealthy

ACM shows the following error.

kf-kcc-admin   KNV2010: unable to create resource: KNV2010: failed to create "iam.cnrm.cloud.google.com/v1beta1, Kind=IAMPolicyMember", "github-probots/jlewi-editor": Internal error occurred: failed calling webhook "iam-validation.cnrm.cloud.google.com": Post https://cnrm-validating-webhook.cnrm-system.svc:443/iam-validation?timeout=30s: no endpoints available for service "cnrm-validating-webhook"
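
The error names the in-cluster service that has no endpoints. As a minimal sketch (the helper name `webhook_service` is hypothetical, and it assumes the usual `<service>.<namespace>.svc` form of in-cluster service hostnames), the service and namespace can be pulled out of the webhook URL to build the commands one might run to check on it:

```python
from urllib.parse import urlsplit


def webhook_service(url):
    """Split an in-cluster webhook URL into (service, namespace).

    Assumes the hostname has the form <service>.<namespace>.svc[...].
    """
    host = urlsplit(url).hostname or ""
    parts = host.split(".")
    if len(parts) < 3 or parts[2] != "svc":
        raise ValueError(f"not an in-cluster service URL: {url}")
    return parts[0], parts[1]


# URL copied from the error message above.
url = "https://cnrm-validating-webhook.cnrm-system.svc:443/iam-validation?timeout=30s"
service, namespace = webhook_service(url)

# Print the checks rather than running them; they need cluster access.
print(f"kubectl get endpoints {service} -n {namespace}")
print(f"kubectl get pods -n {namespace}")
```

If the endpoints list is empty, the pods backing the webhook in that namespace are probably not running or not ready.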

GCP Resources for GSoC students

The IAMPolicyMember YAML file that @jlewi uploaded grants owner access to [email protected]. Should I add user:<my google account email> to the members list and open a PR?

However, the file notes that:

  # Note: You can't grant owner permissions this way
  # because external owners must be invited. We have a couple options
  # for deploying Kubeflow. 
  #
  # 1. We can grant owners to groups
  # 2. We can invite owners through cloud console

What is the right approach for getting access to the community's GCP resources in this case?
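
For illustration only, here is a rough sketch of what a non-owner grant could look like as a Config Connector IAMPolicyMember manifest. This is not the file referenced above. The helper `iam_policy_member`, the project ID `example-project`, the member `user:student@example.com`, the resource name, and the choice of `roles/editor` are all assumptions. The sketch refuses `roles/owner`, mirroring the note in the file.

```python
import json


def iam_policy_member(name, namespace, member, role, project_id):
    """Build an IAMPolicyMember manifest as a plain dict (hypothetical helper)."""
    if role == "roles/owner":
        # Mirrors the note in the file: owners must be invited or granted
        # through a group, not through this resource.
        raise ValueError("owner cannot be granted this way")
    return {
        "apiVersion": "iam.cnrm.cloud.google.com/v1beta1",
        "kind": "IAMPolicyMember",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "member": member,
            "role": role,
            "resourceRef": {
                "kind": "Project",
                "external": f"projects/{project_id}",
            },
        },
    }


# All values below are placeholders, not real accounts or projects.
manifest = iam_policy_member(
    name="student-editor",
    namespace="example-project",
    member="user:student@example.com",
    role="roles/editor",
    project_id="example-project",
)
print(json.dumps(manifest, indent=2))
```

The JSON output can be committed or applied like a YAML manifest, since Kubernetes tooling accepts both. Whether editor access is appropriate here is a question for the project admins.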

ACM sync is failing: no ContainerCluster CRD

Config Management Errors:
kf-kcc-admin KNV1021: No CustomResourceDefinition is defined for the type "ContainerCluster.container.cnrm.cloud.google.com" in the cluster.
Resource types that are not native Kubernetes objects must have a CustomResourceDefinition.

source: namespaces/kf-infra-gitops/cluster.yaml
namespace: kf-infra-gitops
metadata.name: kf-org-admin
group: container.cnrm.cloud.google.com
version: v1beta1
kind: ContainerCluster

Need additional kf-kcc admins

We need additional kf-kcc admins who can help teams relying on the community infra.

The current kf-kcc admins are listed here:
https://github.com/kubeflow/internal-acls/blob/master/kf-kcc-admins.members.txt

I think most of the non-Googlers (and lots of the Googlers) are no longer actively involved in the project so that list needs to be pruned.
@jinchihe is still active but currently on vacation.

@kubeflow/automl-leads
@kubeflow/training-leads
@kubeflow/kfserving-owners

One or more of you folks probably need to join the group in order to help yourselves and others.

ACM Installs an old version of KCC

We currently have ACM
gcr.io/config-management-release/nomos:v1.3.1-rc.1

It looks like this bundles an old version of Config Connector:
gcr.io/config-management-release/cnrm-controller:cac1dbb

This version lacks support for many resources, e.g. CloudService.

How do we surface sync errors

Anyone can create PRs to create or modify infrastructure.

Unfortunately, we don't have a good way to surface sync errors to those users. For example, if someone submits a PR to create a project, creating that project might fail because the project ID is not unique. This error is readily visible if we run

kubectl describe ....

I don't think we want to give GKE view access to everyone as there is sensitive information like secrets.
Likewise, I'm not sure we want to give log viewer access to everyone.

Could we set up a Stackdriver sink that exports only a subset of logs (e.g. relevant K8s events) and dumps them to some world-readable location, e.g. BigQuery?

related: kubeflow/testing#737
kubeflow/testing#736
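
To make the filtering idea concrete, here is a minimal sketch of the selection and redaction step. It does not use the real Stackdriver entry schema: the entry shape (plain dicts with `log`, `namespace`, `kind`, `name`, `reason`, `message`, and `timestamp` keys), the helper names, and the sample data are all hypothetical. The idea is an allowlist: keep only event entries from chosen namespaces, copy only known-safe fields, and emit newline-delimited JSON, which BigQuery can load.

```python
import json

# Only these fields are copied to the public output; everything else is dropped.
PUBLIC_FIELDS = ("timestamp", "namespace", "kind", "name", "reason", "message")


def public_events(entries, namespaces):
    """Yield redacted copies of event entries from the allowed namespaces."""
    allowed = set(namespaces)
    for entry in entries:
        if entry.get("log") != "events":
            continue  # skip container logs, audit logs, and so on
        if entry.get("namespace") not in allowed:
            continue
        yield {field: entry.get(field) for field in PUBLIC_FIELDS}


def to_json_lines(rows):
    """Serialize rows as newline-delimited JSON."""
    return "\n".join(json.dumps(row, sort_keys=True) for row in rows)


# Hypothetical sample entries; "raw" stands in for sensitive payloads.
sample = [
    {"log": "events", "namespace": "kf-infra-gitops", "kind": "Project",
     "name": "demo", "reason": "UpdateFailed",
     "message": "project id is not unique",
     "timestamp": "2020-07-01T00:00:00Z", "raw": {"token": "s3cr3t"}},
    {"log": "stdout", "namespace": "kf-infra-gitops", "message": "debug output"},
    {"log": "events", "namespace": "kube-system", "kind": "Pod",
     "name": "dns", "reason": "Started", "message": "started"},
]

rows = list(public_events(sample, ["kf-infra-gitops"]))
print(to_json_lines(rows))
```

An allowlist of fields is safer than a blocklist here, since unknown fields are dropped by default. Event messages themselves may still contain sensitive details, so they would need review before anything is made world readable.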
