
Comments (7)

flaviodsr commented on May 22, 2024
  1. don't encode the registry hostname explicitly in the workflow and runnable definitions. Using a placeholder attribute value that says "this image comes from the built-in docker-registry" instead of pointing to it explicitly can help with the identity crisis.

Just a note that, while working on the workflows, I am not explicitly setting the registry. Because the image name is one of the inputs, the workflow expects the registry to be part of the image name. For example, to use the internal FuseML registry, the image name should be registry.fuseml-registry/mlflow/trainer:v1
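
To illustrate what "the registry is part of the image name" means in practice, here is a minimal sketch that splits an image name into its registry part and the rest. The function name split_image_name is hypothetical, and the rule it applies (the first path component counts as a registry if it contains a dot or a colon, or is "localhost") is the convention commonly used for Docker-style image references, not something taken from the FuseML code.

```python
from typing import Tuple


def split_image_name(image: str) -> Tuple[str, str]:
    """Split an image name into (registry, remainder).

    Hypothetical helper: returns an empty registry when the first path
    component does not look like a registry host.
    """
    first, sep, rest = image.partition("/")
    # A leading component is treated as a registry host only if it
    # contains a dot or a port separator, or is exactly "localhost".
    if sep and ("." in first or ":" in first or first == "localhost"):
        return first, rest
    return "", image


registry, remainder = split_image_name("registry.fuseml-registry/mlflow/trainer:v1")
print(registry)   # registry.fuseml-registry
print(remainder)  # mlflow/trainer:v1
```

With an input such as mlflow/trainer:v1 the registry part comes back empty, which is the case where a workflow would have to fall back to some default.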

from fuseml.

stefannica commented on May 22, 2024

Moving out of MVP scope due to time constraints and because this is not a must-have for the MVP. This issue does not impair functionality, and a workaround is already in place for the Tekton-related issue.


stefannica commented on May 22, 2024

Moving to V2 for further investigation. If it turns out to be too difficult to implement, or it requires adding TLS support to part of the FuseML components, we should do it later.


stefannica commented on May 22, 2024

Adding notes from an older issue regarding the removal of Quarks from FuseML.

  • Quarks is used to generate certificate requests for CA certificates and automatically get them signed by the kubernetes API server. Only the quarks secret functionality is used (https://quarks.suse.dev/docs/quarks-secret/). This is something that can also be done manually through the use of kubernetes CertificateSigningRequest (more here: https://kubernetes.io/docs/reference/access-authn-authz/certificate-signing-requests/). Let's call these cluster CA certificates.
  • cluster CA certificates are used to further generate and sign server certificates for the docker-registry component. They need to be accepted by the container engine running on all kubernetes nodes, because the kubernetes cluster will interact directly with the docker registry (i.e. pull images from it), which is why it is critical that they be signed by the kubernetes API server. Apparently, the root CA certificate used by the kubernetes API to sign cluster CA certificates is already installed on all nodes, so all signed certificates are immediately valid across the entire cluster
  • cluster CA certificates also need to be installed on all containers and external machines that interact with the docker registry: tekton pipeline containers that build and publish container images for instance, but this is less critical than the previous point about instant validation by kubernetes nodes. The fact that they are cluster CA certificates is irrelevant for clients other than the container engines
  • removing quarks requires that we find another way to generate and maintain cluster CA certificates. Some thoughts on that:
    • we cannot generate self-signed CA certificates: they have to be manually installed on all kubernetes nodes
    • we could generate and sign cluster CA certificates directly in the helm chart. The helm chart needs to interact with the k8s API to both create CertificateSigningRequests and accept them, so that might be a problem.
    • the certificate-manager k8s service might provide the solution, if it supports generating cluster CA certificates (to be researched), but this is not as lightweight as simply using the k8s built-in CertificateSigningRequests feature. It creates a hard dependency from FuseML to certificate-manager and it complicates the installation process. On the other hand, we'll probably need to use certificate-manager or another external certificate management tool at some point anyway, to generate certificates for TLS support
    • using istio for certificate management is also an option. Furthermore, istio has some integration with the kubernetes CSR feature, albeit currently experimental (to be researched). More on that here: https://istio.io/latest/docs/tasks/security/cert-management/custom-ca-k8s/
    • as a last resort: there might be another kubernetes service or application out there that we can use that is not related to cloudfoundry and is concerned exclusively with managing cluster CA certificates (unlike quarks). This project looks promising (read the linked medium article for more info): https://github.com/Trendyol/k8s-webhook-certificator

Looking towards the future, we should try to take into account the multi-cluster use case, where several clusters may consume images hosted in a single docker registry server. Cluster CA certificates are only valid in the cluster that was used to generate them.


flaviodsr commented on May 22, 2024

This issue covers 2 different, but related, problems:

  1. Remove FuseML's dependency on quarks

    I was able to get FuseML working without quarks by using Trow, although it should also work with docker registry as long as it is set up properly. The catch is that the registry is accessed via http by the kubernetes nodes and via https by the pods. By default, docker/crictl considers any registry on localhost insecure, so pulling an image from localhost/127.0.0.1, for example, is done through http. That is why the trainer step succeeds in starting the pod with an image from the FuseML registry even when the node does not have the registry certificate: the pull is done via http.

    The assumption that a certificate signed by the cluster CA is trusted by the kubernetes nodes is not correct. To confirm that, I configured an ingress gateway/virtual service with SSL passthrough and tried pulling an image from one of the kubernetes nodes using the istio gateway URL; it failed with "certificate signed by unknown authority". Given that, I am not really sure why the certificate needs to be signed by the cluster CA. I suspect it is because the pods mount the cluster CA and consequently trust any certificate signed by it. For FuseML this would be needed if kaniko did not have an option to skip SSL verification.
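
The http/https split described above can be sketched as a small decision function. This is only an illustration of the rule as stated in this comment (loopback registries are treated as insecure and pulled over http), not the actual docker or crictl implementation. The function name pull_scheme and the port numbers in the example are hypothetical, and IPv6 literals are deliberately not handled.

```python
import ipaddress


def pull_scheme(registry_host: str) -> str:
    """Return the scheme a container runtime would use under the
    'loopback registries are insecure by default' rule.

    Hypothetical helper; real runtimes also honor explicit
    insecure-registry configuration, which is ignored here.
    """
    # Drop an optional ":port" suffix (assumes a hostname or IPv4 address).
    host = registry_host.split(":")[0]
    if host == "localhost":
        return "http"
    try:
        if ipaddress.ip_address(host).is_loopback:
            return "http"
    except ValueError:
        # Not an IP address: a regular hostname, pulled over https.
        pass
    return "https"


print(pull_scheme("127.0.0.1:30500"))           # http
print(pull_scheme("registry.fuseml-registry"))  # https
```

This is why the same registry ends up with two identities: the nodes reach it through a loopback address over http, while the pods reach it by its service name over https.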

    Some notes about quarks-secret. This is what happens when a quarks CRD (signerType: cluster) is created:
    1. CFSSL is used to create a private key and a Certificate Signing Request [1]
    2. The private key is stored in a kubernetes secret (to be used later) and a kubernetes Certificate Signing Request is created from the Certificate Signing Request created in step 1 [2]
    3. Once the kubernetes CSR is created, the operator's CSR reconciler approves it [3]. It is important to note that, because the operator did not specify signerName, the certificate is signed by kubernetes.io/legacy-unknown, which has some implications
    4. Upon approval, the status.certificate field of the kubernetes CSR is populated with the signed certificate PEM data. With that, the reconciler proceeds by creating a secret that stores [4]:
    - certificate: the signed certificate provided by the kubernetes CSR
    - ca: the root CA (fetched from the service account token secret [5])
    - private_key: the private key generated in step 1 and stored in step 2
    - is_ca: false
    5. Finally, the kubernetes CSR and the secret holding the private key (created in step 2) are deleted

    Besides the certificate signed by the kubernetes CA, FuseML also makes use of quarks-secret's ability to maintain a copy of its secrets/certificates across different namespaces.
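
For anyone reproducing these steps without quarks, the kubernetes side of steps 2 and 4 can be sketched as follows. This is a minimal sketch, not the quarks-secret code: the helper names build_csr_manifest and signed_certificate, the resource name, and the signer name example.com/registry-signer are all hypothetical. One assumption to keep in mind is that the certificates.k8s.io/v1 API requires signerName to be set explicitly, so the implicit kubernetes.io/legacy-unknown behavior noted in step 3 does not carry over to it. Generating the private key and the PEM-encoded request (step 1) and approving the CSR (step 3) are out of scope here.

```python
import base64
import json
from typing import Optional


def build_csr_manifest(name: str, csr_pem: str, signer_name: str) -> dict:
    """Build a CertificateSigningRequest manifest from a PEM-encoded CSR."""
    return {
        "apiVersion": "certificates.k8s.io/v1",
        "kind": "CertificateSigningRequest",
        "metadata": {"name": name},
        "spec": {
            # The API expects the PEM request base64-encoded.
            "request": base64.b64encode(csr_pem.encode("ascii")).decode("ascii"),
            "signerName": signer_name,
            "usages": ["digital signature", "key encipherment", "server auth"],
        },
    }


def signed_certificate(csr_object: dict) -> Optional[str]:
    """Return the signed certificate PEM once status.certificate is set."""
    encoded = csr_object.get("status", {}).get("certificate")
    return base64.b64decode(encoded).decode("ascii") if encoded else None


# Placeholder PEM text; a real request would come from step 1.
placeholder_pem = "-----BEGIN CERTIFICATE REQUEST-----\n...\n-----END CERTIFICATE REQUEST-----\n"
manifest = build_csr_manifest("fuseml-registry", placeholder_pem, "example.com/registry-signer")
print(json.dumps(manifest, indent=2))
```

The manifest can then be submitted with any kubernetes client. Once the request is approved and signed, signed_certificate returns the PEM data that the reconciler in step 4 copies into the final secret.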

  2. Use the same URL to reference the registry from the kubernetes nodes and pods

    This is a more complex issue. AFAIK the only way to make that possible is to expose the registry externally (via ingress, gateway, ...). However, this means that the kubernetes nodes no longer pull the image from localhost, which forces the pull to go through https. Consequently, unless the registry uses a trusted certificate, this cannot be done while keeping the requirement of not touching the kubernetes nodes (to install a custom CA when using self-signed certificates).

  [1] https://github.com/cloudfoundry-incubator/quarks-secret/blob/129013cd23f9578d3eb0a1612bd8ab2dc340adfc/pkg/credsgen/in_memory_generator/certificate.go#L42
  [2] https://github.com/cloudfoundry-incubator/quarks-secret/blob/129013cd23f9578d3eb0a1612bd8ab2dc340adfc/pkg/kube/controllers/quarkssecret/certificates.go#L211
  [3] https://github.com/cloudfoundry-incubator/quarks-secret/blob/129013cd23f9578d3eb0a1612bd8ab2dc340adfc/pkg/kube/controllers/quarkssecret/certificatesigningrequest_reconciler.go#L166
  [4] https://github.com/cloudfoundry-incubator/quarks-secret/blob/129013cd23f9578d3eb0a1612bd8ab2dc340adfc/pkg/kube/controllers/quarkssecret/certificatesigningrequest_reconciler.go#L71
  [5] https://github.com/cloudfoundry-incubator/quarks-secret/blob/129013cd23f9578d3eb0a1612bd8ab2dc340adfc/pkg/kube/controllers/quarkssecret/certificatesigningrequest_reconciler.go#L256


flaviodsr commented on May 22, 2024

Reopening, as "Use the same URL to reference the registry from the kubernetes nodes and pods" is not really fixed.


stefannica commented on May 22, 2024

The double identity problem can only be solved when we add TLS support to FuseML and the services that are part of it, trow included. Even so, valid certificates will be required for the exposed endpoints; otherwise certificates will need to be installed on the k8s nodes manually. A mixed solution should also be investigated, where the current approach is reused if valid certificates are not provided.
