Comments (7)
- don't encode the registry hostname explicitly in the workflow and runnable definitions. Using a placeholder attribute value that says "this image comes from the built-in docker-registry" instead of pointing to it explicitly can help with the identity crisis.
Just a note that, while working on the workflows, I am not explicitly setting the registry. The image name is one of the inputs, and the workflow expects the registry to be part of it. For example, to use the internal FuseML registry, the image name should be registry.fuseml-registry/mlflow/trainer:v1
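To illustrate how the registry ends up embedded in the image name, here is a minimal sketch that splits an image reference into its registry and the rest. The function name `split_image_name` is hypothetical, and the rule it applies (the first path component counts as a registry if it contains a dot or colon, or is `localhost`; otherwise `docker.io` is assumed) is a simplified version of the usual Docker convention, not FuseML code.

```python
from typing import Tuple


def split_image_name(image: str) -> Tuple[str, str]:
    """Split an image reference into (registry, remainder).

    Simplified heuristic: the first path component is treated as a
    registry host only if it looks like one (contains '.' or ':',
    or is exactly 'localhost'). Otherwise docker.io is assumed.
    """
    first, sep, rest = image.partition("/")
    if sep and ("." in first or ":" in first or first == "localhost"):
        return first, rest
    return "docker.io", image


registry, remainder = split_image_name("registry.fuseml-registry/mlflow/trainer:v1")
print(registry)   # registry.fuseml-registry
print(remainder)  # mlflow/trainer:v1
```

With this convention, changing the registry a workflow uses means changing the image name input rather than the workflow definition.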
from fuseml.
Moving out of MVP scope due to time constraints and because this is not a must-have for MVP. This issue does not impair functionality, and a workaround is already in place to address the Tekton-related issue.
from fuseml.
Moving to V2 for further investigation. If it turns out to be too difficult to implement, or it requires adding TLS support to part of the FuseML components, we should do it later.
from fuseml.
Adding notes from older issue regarding removal of Quarks from FuseML.
- Quarks is used to generate certificate requests for CA certificates and automatically get them signed by the kubernetes API server. Only the quarks secret functionality is used (https://quarks.suse.dev/docs/quarks-secret/). This is something that can also be done manually through the use of kubernetes CertificateSigningRequest (more here: https://kubernetes.io/docs/reference/access-authn-authz/certificate-signing-requests/). Let's call these cluster CA certificates.
- cluster CA certificates are used to further generate and sign server certificates for the docker-registry component. They need to be accepted by the container engine running on all kubernetes nodes, because the kubernetes cluster will interact directly with the docker registry (i.e. pull images from it), which is why it is critical that they be signed by the kubernetes API server. Apparently, the root CA certificate used by the kubernetes API to sign cluster CA certificates is already installed on all nodes, so all signed certificates are immediately valid across the entire cluster.
- cluster CA certificates also need to be installed on all containers and external machines that interact with the docker registry, for instance the tekton pipeline containers that build and publish container images. This is less critical than the previous point about instant validation by kubernetes nodes. The fact that they are cluster CA certificates is irrelevant for clients other than the container engines.
- removing quarks requires that we find another way to generate and maintain cluster CA certificates. Some thoughts on that:
  - we cannot generate self-signed CA certificates: they have to be manually installed on all kubernetes nodes
  - we could generate and sign cluster CA certificates directly in the helm chart. The helm chart needs to interact with the k8s API to both create CertificateSigningRequests and accept them, so that might be a problem.
  - the certificate-manager k8s service might provide the solution, if it supports generating cluster CA certificates (to be researched), but this is not as lightweight as simply using the k8s built-in CertificateSigningRequests feature. It creates a hard dependency from FuseML to certificate-manager and it complicates the installation process. On the other hand, we'll probably need to use certificate-manager or another external certificate management tool at some point anyway, to generate certificates for TLS support
  - using istio for certificate management is also an option. Furthermore, istio has some integration with the kubernetes CSR feature, albeit currently experimental (to be researched). More on that here: https://istio.io/latest/docs/tasks/security/cert-management/custom-ca-k8s/
  - as a last resort: there might be another kubernetes service or application out there that we can use that is not related to cloudfoundry and is concerned exclusively with managing cluster CA certificates (unlike quarks). This project looks promising (read the linked medium article for more info): https://github.com/Trendyol/k8s-webhook-certificator
Looking towards the future, we should try to take into account the multi-cluster use-case, where several clusters may consume images hosted in a single docker registry server. Generating cluster CA certificates only works in the cluster that is used to generate them.
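As a rough illustration of the "do it manually with a kubernetes CertificateSigningRequest" option mentioned in the notes, the sketch below only builds the manifest for such a request. It assumes the `certificates.k8s.io/v1` API, where `signerName` is mandatory. The signer `example.com/registry-signer`, the object name and the PEM body are placeholders, not values used by FuseML or quarks, and generating the actual key and CSR is left out.

```python
import base64
import json


def build_csr_manifest(name, csr_pem, signer_name, usages):
    """Return a CertificateSigningRequest manifest as a plain dict.

    spec.request must hold the base64-encoded PEM of the PKCS#10 request.
    """
    return {
        "apiVersion": "certificates.k8s.io/v1",
        "kind": "CertificateSigningRequest",
        "metadata": {"name": name},
        "spec": {
            "request": base64.b64encode(csr_pem.encode("ascii")).decode("ascii"),
            "signerName": signer_name,
            "usages": list(usages),
        },
    }


# Placeholder PEM: a real request would be produced by a tool such as openssl or CFSSL.
placeholder_pem = (
    "-----BEGIN CERTIFICATE REQUEST-----\n"
    "PLACEHOLDER\n"
    "-----END CERTIFICATE REQUEST-----\n"
)

manifest = build_csr_manifest(
    name="fuseml-registry",                    # hypothetical object name
    csr_pem=placeholder_pem,
    signer_name="example.com/registry-signer", # hypothetical custom signer
    usages=["digital signature", "key encipherment", "server auth"],
)
print(json.dumps(manifest, indent=2))
```

The resulting JSON could be applied with `kubectl apply -f` and approved with `kubectl certificate approve fuseml-registry`. Whether the issued certificate is then trusted by the nodes is a separate question.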
from fuseml.
This issue covers 2 different, but related, problems:
- Remove FuseML dependency from quarks
I was able to get FuseML working without quarks by using Trow, although it should also work with docker registry as long as it is set up properly. The catch is that the registry is accessed via http by the kubernetes nodes and via https by the pods. By default, docker/crictl considers any registry on localhost insecure, so pulling an image from localhost/127.0.0.1, for example, is done over http. That is why the trainer step manages to start the pod using an image from the FuseML registry even when the node does not have the registry certificate: the pull goes over http.
The assumption that a certificate signed by the cluster CA is trusted by the kubernetes nodes is not correct. To confirm that, I configured an ingress gateway/virtual service with SSL passthrough and tried pulling an image from one of the kubernetes nodes using the istio gateway URL. It failed with "certificate signed by unknown authority". Given that, I am not sure why the certificate needs to be signed by the cluster CA. I suspect it is the pods that mount the cluster CA and consequently trust any certificate signed by it. For FuseML this would only be needed if kaniko did not have an option to skip SSL verification.
Some notes about quarks-secrets. This is what happens when a quarks CRD (`signerType: cluster`) is created:
1. CFSSL is used to create a private key and a Certificate Signing Request [1]
2. The private key is stored in a kubernetes secret (to be used later) and a kubernetes Certificate Signing Request is created from the Certificate Signing Request generated in step 1 [2]
3. Once the kubernetes CSR is created, the operator CSR reconciler approves the kubernetes CSR [3]. It is important to note that, as the operator did not specify `signerName`, the certificate is signed by `kubernetes.io/legacy-unknown`, which has some implications
4. Upon the CSR approval, the `status.certificate` field of the kubernetes CSR is populated with the signed certificate PEM data. With that, the reconciler proceeds by creating a secret that stores [4]:
  - `certificate`: the signed certificate provided by the kubernetes CSR
  - `ca`: the root CA (fetched from the service account token secret [5])
  - `private_key`: the private key created in step 1
  - `is_ca`: false
5. Finally, the kubernetes CSR and the secret holding the private key (created in step 2) are deleted
Besides the certificate signed by the kubernetes CA, FuseML also makes use of the quarks-secrets ability to maintain a copy of its secrets/certificates across different namespaces.
- Use the same URL to reference the registry from the kubernetes nodes and pods
This is a more complex issue. AFAIK the only way to make that possible is to expose the registry externally (via ingress, gateway, ...). However, this means that the kubernetes nodes no longer pull the image from localhost, which forces the pull to go through https. Consequently, unless the registry uses a trusted certificate, this cannot be done while keeping the requirement of not touching the kubernetes nodes (to install a custom CA when using self-signed certificates).
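The localhost behaviour described above can be sketched as a small check. This is only an illustration of the rule as stated in this comment (a registry on localhost or a loopback address is treated as insecure and pulled over http), not the actual docker or crictl implementation, and real runtimes may be configured differently. The function name and the port numbers in the usage lines are hypothetical.

```python
import ipaddress


def is_loopback_registry(host: str) -> bool:
    """Return True if the registry host is localhost or a loopback IP.

    Accepts 'host', 'host:port', 'ipv4:port' and '[ipv6]:port' forms.
    """
    try:
        if host.startswith("["):
            # Bracketed IPv6 literal, e.g. [::1]:5000
            name = host[1:host.index("]")]
        elif host.count(":") == 1:
            # hostname:port or IPv4:port
            name = host.split(":", 1)[0]
        else:
            name = host
        if name == "localhost":
            return True
        return ipaddress.ip_address(name).is_loopback
    except ValueError:
        # Not an IP address (a regular DNS name) or malformed input
        return False


print(is_loopback_registry("127.0.0.1:30500"))           # node-side pull, plain http
print(is_loopback_registry("registry.fuseml-registry"))  # pod-side pull, https
```

Under this rule, exposing the registry through an ingress or gateway hostname moves node-side pulls out of the loopback case, which is why they would then require a certificate the node trusts.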
[1] https://github.com/cloudfoundry-incubator/quarks-secret/blob/129013cd23f9578d3eb0a1612bd8ab2dc340adfc/pkg/credsgen/in_memory_generator/certificate.go#L42
[2] https://github.com/cloudfoundry-incubator/quarks-secret/blob/129013cd23f9578d3eb0a1612bd8ab2dc340adfc/pkg/kube/controllers/quarkssecret/certificates.go#L211
[3] https://github.com/cloudfoundry-incubator/quarks-secret/blob/129013cd23f9578d3eb0a1612bd8ab2dc340adfc/pkg/kube/controllers/quarkssecret/certificatesigningrequest_reconciler.go#L166
[4] https://github.com/cloudfoundry-incubator/quarks-secret/blob/129013cd23f9578d3eb0a1612bd8ab2dc340adfc/pkg/kube/controllers/quarkssecret/certificatesigningrequest_reconciler.go#L71
[5] https://github.com/cloudfoundry-incubator/quarks-secret/blob/129013cd23f9578d3eb0a1612bd8ab2dc340adfc/pkg/kube/controllers/quarkssecret/certificatesigningrequest_reconciler.go#L256
from fuseml.
Reopening, as "Use the same URL to reference the registry from the kubernetes nodes and pods" is not really fixed.
from fuseml.
The double identity problem can only be solved when we add TLS support to FuseML and the services that are part of it, Trow included. Even so, valid certificates will be required for the exposed endpoints; otherwise certificates will need to be installed on the k8s nodes manually. A mixed solution should also be investigated, where the current approach is reused if valid certificates are not provided.
from fuseml.