
catalogd's Introduction

catalogd

Catalogd is a Kubernetes extension that unpacks file-based catalog (FBC) content for on-cluster clients. Currently, catalogd unpacks FBC content that is packaged and distributed as container images. The catalogd road map includes plans for unpacking other content sources, such as Git repositories and OCI artifacts. For more information, see the catalogd issues page.

Catalogd helps customers discover installable content by hosting catalog metadata for Kubernetes extensions, such as Operators and controllers. For more information on the Operator Lifecycle Manager (OLM) v1 suite of microservices, see the documentation for the Operator Controller.

Quickstart DEMO

(Quickstart demo available as an asciicast recording.)

Quickstart Steps

Procedure steps marked with an asterisk (*) are likely to change with future API updates.

NOTE: The examples below use the -k flag in curl to skip TLS certificate validation. This is for demonstration purposes only; do not skip certificate validation in production.

  1. To install catalogd, navigate to the releases page, and follow the install instructions included in the release you want to install.

  2. Create a ClusterCatalog object that points to the OperatorHub Community catalog by running the following command:

    $ kubectl apply -f - << EOF
    apiVersion: olm.operatorframework.io/v1alpha1
    kind: ClusterCatalog
    metadata:
      name: operatorhubio
    spec:
      source:
        type: image
        image:
          ref: quay.io/operatorhubio/catalog:latest
    EOF
  3. Verify the ClusterCatalog object was created successfully by running the following command:

    $ kubectl describe clustercatalog/operatorhubio

    Example output

    Name:         operatorhubio
    Namespace:    
    Labels:       <none>
    Annotations:  <none>
    API Version:  olm.operatorframework.io/v1alpha1
    Kind:         ClusterCatalog
    Metadata:
      Creation Timestamp:  2023-06-23T18:35:13Z
      Generation:          1
      Managed Fields:
        API Version:  olm.operatorframework.io/v1alpha1
        Fields Type:  FieldsV1
        fieldsV1:
          f:metadata:
            f:annotations:
              .:
              f:kubectl.kubernetes.io/last-applied-configuration:
          f:spec:
            .:
            f:source:
              .:
              f:image:
                .:
                f:ref:
              f:type:
        Manager:      kubectl-client-side-apply
        Operation:    Update
        Time:         2023-06-23T18:35:13Z
        API Version:  olm.operatorframework.io/v1alpha1
        Fields Type:  FieldsV1
        fieldsV1:
          f:status:
            .:
            f:conditions:
            f:phase:
        Manager:         manager
        Operation:       Update
        Subresource:     status
        Time:            2023-06-23T18:35:43Z
      Resource Version:  1397
      UID:               709cee9d-c669-46e1-97d0-e97dcce8f388
    Spec:
      Source:
        Image:
          Ref:  quay.io/operatorhubio/catalog:latest
        Type:   image
    Status:
      Conditions:
        Last Transition Time:  2023-06-23T18:35:13Z
        Message:               
        Reason:                Unpacking
        Status:                False
        Type:                  Unpacked
      Phase:                   Unpacking
    Events:                    <none>
  4. Port forward the catalogd-catalogserver service in the olmv1-system namespace:

    $ kubectl -n olmv1-system port-forward svc/catalogd-catalogserver 8080:443
  5. Run the following command to get a list of packages:

    $ curl -k https://localhost:8080/catalogs/operatorhubio/all.json | jq -s '.[] | select(.schema == "olm.package") | .name'

    Example output

      % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                    Dload  Upload   Total   Spent    Left  Speed
    100  110M  100  110M    0     0   112M      0 --:--:-- --:--:-- --:--:--  112M
    "ack-acm-controller"
    "ack-apigatewayv2-controller"
    "ack-applicationautoscaling-controller"
    "ack-cloudtrail-controller"
    "ack-cloudwatch-controller"
    "ack-dynamodb-controller"
    "ack-ec2-controller"
    "ack-ecr-controller"
    "ack-eks-controller"
    "ack-elasticache-controller"
    "ack-emrcontainers-controller"
    "ack-eventbridge-controller"
    "ack-iam-controller"
    "ack-kinesis-controller"
    ...
  6. Run the following command to get a list of channels for the ack-acm-controller package:

    $ curl -k https://localhost:8080/catalogs/operatorhubio/all.json | jq -s '.[] | select(.schema == "olm.channel") | select(.package == "ack-acm-controller") | .name'

    Example output

      % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                    Dload  Upload   Total   Spent    Left  Speed
    100  110M  100  110M    0     0   115M      0 --:--:-- --:--:-- --:--:--  116M
    "alpha"
  7. Run the following command to get a list of bundles belonging to the ack-acm-controller package:

    $ curl -k https://localhost:8080/catalogs/operatorhubio/all.json | jq -s '.[] | select(.schema == "olm.bundle") | select(.package == "ack-acm-controller") | .name'

    Example output

      % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                    Dload  Upload   Total   Spent    Left  Speed
    100  110M  100  110M    0     0   122M      0 --:--:-- --:--:-- --:--:--  122M
    "ack-acm-controller.v0.0.1"
    "ack-acm-controller.v0.0.2"
    "ack-acm-controller.v0.0.4"
    "ack-acm-controller.v0.0.5"
    "ack-acm-controller.v0.0.6"
    "ack-acm-controller.v0.0.7"

Contributing

Thanks for your interest in contributing to catalogd!

catalogd is in the very early stages of development, and a more in-depth contributing guide will come in the near future.

In the meantime, it is assumed you know how to contribute to open source projects in general, and this guide will only cover how to manually test your changes (there is no automated testing yet).

If you have any questions, feel free to reach out to us in the Kubernetes Slack channel #olm-dev or create an issue.

Testing Local Changes

Prerequisites

Test it out

make run

This will build a local container image of the catalogd controller, create a new KIND cluster, and deploy the image onto that cluster.

catalogd's People

Contributors

anik120, ankitathomas, dependabot[bot], dtfranz, everettraven, grokspawn, itroyano, joelanford, justinkuli, kevinrizza, lalatendumohanty, m1kola, michaelryanpeter, ncdc, oceanc80, rashmigottipati, tmshort, trgeiger, varshaprasad96


catalogd's Issues

Disable etcd and apiserver

For now, the custom apiserver and its corresponding etcd instance are not being used. To temporarily limit overhead we should disable them until catalogd is updated to properly use the custom apiserver.

We could simply comment out the following sections with some TODO comments to uncomment them when we are back to using an apiserver:

  • - ../apiserver
    - ../etcd
  • catalogd/Makefile

    Lines 81 to 83 in bc7778c

    .PHONY: build-server
    build-server: fmt vet ## Build api-server binary.
    CGO_ENABLED=0 GOOS=linux go build -tags $(GO_BUILD_TAGS) $(VERSION_FLAGS) -o bin/apiserver cmd/apiserver/main.go
  • catalogd/Makefile

    Lines 97 to 103 in bc7778c

    .PHONY: docker-build-server
    docker-build-server: build-server test ## Build docker image with the apiserver.
    docker build -f apiserver.Dockerfile -t ${SERVER_IMG}:${IMG_TAG} .
    .PHONY: docker-push-server
    docker-push-server: ## Push docker image with the apiserver.
    docker push ${SERVER_IMG}
  • $(KIND) load docker-image $(SERVER_IMG):${IMG_TAG} --name $(KIND_CLUSTER_NAME)
  • cd config/apiserver && $(KUSTOMIZE) edit set image apiserver=${SERVER_IMG}:${IMG_TAG}
  • kubectl wait --for=condition=Available --namespace=$(CATALOGD_NAMESPACE) deployment/catalogd-apiserver --timeout=60s
  • catalogd/.goreleaser.yml

    Lines 24 to 36 in bc7778c

    - id: catalogd-server
    main: ./cmd/apiserver/
    binary: bin/apiserver
    tags: $GO_BUILD_TAGS
    goos:
    - linux
    goarch:
    - amd64
    - arm64
    - ppc64le
    - s390x
    ldflags:
    - -X main.Version={{ .Version }}
  • catalogd/.goreleaser.yml

    Lines 58 to 77 in bc7778c

    - image_templates:
    - "{{ .Env.APISERVER_IMAGE_REPO }}:{{ .Env.IMAGE_TAG }}-amd64"
    dockerfile: apiserver.Dockerfile
    goos: linux
    goarch: amd64
    - image_templates:
    - "{{ .Env.APISERVER_IMAGE_REPO }}:{{ .Env.IMAGE_TAG }}-arm64"
    dockerfile: apiserver.Dockerfile
    goos: linux
    goarch: arm64
    - image_templates:
    - "{{ .Env.APISERVER_IMAGE_REPO }}:{{ .Env.IMAGE_TAG }}-ppc64le"
    dockerfile: apiserver.Dockerfile
    goos: linux
    goarch: ppc64le
    - image_templates:
    - "{{ .Env.APISERVER_IMAGE_REPO }}:{{ .Env.IMAGE_TAG }}-s390x"
    dockerfile: apiserver.Dockerfile
    goos: linux
    goarch: s390x

(I've tried to capture all the places but there could be more)

Unpack job does not honor cluster pull configuration

When we create the unpack job, we directly run opm render <catalogImage> in a single container, which means that container pulls and reads the catalog image directly. If we want to take advantage of cluster image pull configurations, we should refactor this job into two containers: one container is the catalog image itself, from which we copy the FBCs into a shared emptyDir volume; the other container renders that shared directory.

Note that this only makes sense in the case that the catalog image is a standard docker or OCI image containing FBC data.
If we're using other container formats or FBC data sources, this strategy would not work.
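
One way to express the proposed two-container shape is to use an init container for the copy step. Everything in this sketch is illustrative: the Job and container names, the opm image reference, and the assumption that the catalog image keeps its FBC under /configs are not taken from actual catalogd manifests:

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: unpack-operatorhubio   # illustrative name
spec:
  template:
    spec:
      restartPolicy: Never
      # The init container IS the catalog image; the kubelet pulls it, so
      # cluster pull secrets/mirrors apply. It only copies the FBC out
      # into the shared volume.
      initContainers:
      - name: extract-catalog
        image: quay.io/operatorhubio/catalog:latest
        command: ["cp", "-r", "/configs/.", "/fbc"]   # /configs is assumed
        volumeMounts:
        - name: fbc
          mountPath: /fbc
      # The render container never pulls the catalog image itself; it
      # renders the copied directory instead.
      containers:
      - name: render
        image: quay.io/operator-framework/opm:latest   # illustrative
        command: ["opm", "render", "/fbc"]
        volumeMounts:
        - name: fbc
          mountPath: /fbc
      volumes:
      - name: fbc
        emptyDir: {}
```
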

Longer term, I think this highlights the need for a rukpak-esque spec.source union type in the catalog source API.

Enhance the logic for syncing changes to catalog contents

Follow up to #65 as per #65 (comment)

The current logic doesn't allow for issuing updates to already existing unpacked catalog contents: we always try to create the resource and fail if it already exists. It should be improved to update the contents when they have been modified in the source for the parent Catalog resource.
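
The desired behavior is a create-or-update (upsert) rather than a bare create. A minimal, framework-free Python sketch of the pattern; the in-memory `Store` stands in for the API server, and in the real controller this would be done with a Kubernetes client:

```python
class AlreadyExists(Exception):
    """Stands in for the API server's AlreadyExists error."""

class Store:
    """Toy stand-in for the API server: create fails on duplicates."""
    def __init__(self):
        self.objects = {}

    def create(self, name, content):
        if name in self.objects:
            raise AlreadyExists(name)
        self.objects[name] = content

    def update(self, name, content):
        self.objects[name] = content

def create_or_update(store, name, content):
    """The current logic only calls create() and fails on re-reconcile;
    the fix is to fall back to update() when the object already exists."""
    try:
        store.create(name, content)
        return "created"
    except AlreadyExists:
        if store.objects[name] != content:
            store.update(name, content)
            return "updated"
        return "unchanged"

store = Store()
print(create_or_update(store, "pkg-a", {"v": 1}))  # created
print(create_or_update(store, "pkg-a", {"v": 2}))  # updated
print(create_or_update(store, "pkg-a", {"v": 2}))  # unchanged
```
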

The example catalogd unpack failed --> "status.phase: Failing"

Installed catalogd according to the docs/demo: https://github.com/operator-framework/catalogd/blob/main/docs/demo.gif
The catalog operatorhubio initially failed to unpack; the bundle content was eventually unpacked, but its status is still Failing.
The steps:
1.$git clone https://github.com/operator-framework/catalogd.git
2.$cd catalogd/
3.

$make kind-cluster
......
Set kubectl context to "kind-catalogd"
You can now use your cluster with:

kubectl cluster-info --context kind-catalogd

Thanks for using kind! 😊
/home/jfan/projects/src/github.com/catalogd/hack/tools/bin/kind export kubeconfig --name catalogd
Set kubectl context to "kind-catalogd"
$kubectl cluster-info --context kind-catalogd
Kubernetes control plane is running at https://127.0.0.1:46095
CoreDNS is running at https://127.0.0.1:46095/api/v1/namespaces/kube-system/services/kube-dns:dns/proxy

To further debug and diagnose cluster problems, use 'kubectl cluster-info dump'.
$make install
......
deployment.apps/catalogd-controller-manager created
kubectl wait --for=condition=Available --namespace=catalogd-system deployment/catalogd-controller-manager --timeout=60s
deployment.apps/catalogd-controller-manager condition met
$kubectl get crds -A
NAME                                           CREATED AT
bundlemetadata.catalogd.operatorframework.io   2023-05-24T07:25:58Z
catalogs.catalogd.operatorframework.io         2023-05-24T07:25:58Z
packages.catalogd.operatorframework.io         2023-05-24T07:25:58Z
$kubectl apply -f config/samples/core_v1beta1_catalog.yaml
catalog.catalogd.operatorframework.io/operatorhubio created
$kubectl get catalog -A
NAME             AGE
operatorhubio   21s
$kubectl wait --for=condition=Ready catalog operatorhubio
error: timed out waiting for the condition on catalogs/operatorhubio
$kubectl get catalog operatorhubio -o yaml
......
status:
  conditions:
  - lastTransitionTime: "2023-05-25T08:38:18Z"
    message: 'create bundle metadata objects: creating bundlemetadata "cloud-native-postgresql.v1.10.0":
      BundleMetadata.catalogd.operatorframework.io "cloud-native-postgresql.v1.10.0"
      is invalid: spec.properties[4].value: Invalid value: "string": spec.properties[4].value
      in body must be of type object: "string"'
    reason: UnpackFailed
    status: "False"
    type: Unpacked
  phase: Failing


$kubectl get catalog operatorhubio -o yaml
apiVersion: catalogd.operatorframework.io/v1beta1
kind: Catalog
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"catalogd.operatorframework.io/v1beta1","kind":"Catalog","metadata":{"annotations":{},"name":"operatorhubio"},"spec":{"source":{"image":{"ref":"quay.io/operatorhubio/catalog:latest"},"type":"image"}}}
  creationTimestamp: "2023-05-25T08:38:18Z"
  generation: 1
  name: operatorhubio
  resourceVersion: "2019"
  uid: 891f895b-4926-4fa8-aa20-92670e392783
spec:
  source:
    image:
      ref: quay.io/operatorhubio/catalog:latest
    type: image
status:
  conditions:
  - lastTransitionTime: "2023-05-25T08:38:18Z"
    message: 'create package objects: creating package "ack-acm-controller": packages.catalogd.operatorframework.io
      "ack-acm-controller" already exists'
    reason: UnpackFailed
    status: "False"
    type: Unpacked
  phase: Failing
$kubectl wait --for=condition=Ready catalog operatorhubio
error: timed out waiting for the condition on catalogs/operatorhubio

Flesh out CONTRIBUTING.md file

To help new contributors get started, we should include a CONTRIBUTING.md file that outlines the process for making a contribution to catalogd.

How to handle non-unique catalog items?

In #76 , we updated catalogd to prefix the names of Package and BundleMetadata objects with the metadata.name of the Catalog from which they are derived.

This solved the problem of making it possible to distinguish between these duplicate names, but it raises another question: Do we need mechanisms to help clients disambiguate? For example, the operator-controller project has an Operator API with spec.package which is expected to be a simple package name (e.g. etcd, not operatorhubio-catalog-etcd).

Does catalogd need to do anything here to help clients disambiguate or get a non-ambiguous view (e.g. maybe via filtering?)

This is a bug in catalogd. We should not require `relatedImages` to be populated. In fact, I would argue that `[{}]` is more invalid because it is saying "here's a related image. its image is `""` and its name is `""`."

Originally posted by @joelanford in operator-framework/operator-controller#250 (comment)

Ensure HTTP endpoints are accessible outside the cluster

Following #113 we should ensure that the necessary HTTP endpoints for getting catalog contents are accessible from outside the cluster. A user shouldn't have to perform any magic to curl the endpoint containing a Catalog's contents.

In essence, a user should be able to fetch the catalog content via the command line by:

  • Running kubectl get catalog {catalogName} -o yaml to fetch Catalog details
  • Running curl {Catalog.Status.contentURL} to fetch all the contents of the Catalog

Update `CatalogSource` controller logging

We should make sure that we are logging useful information throughout the reconciliation process. The zap logger that is used has configurable levels of verbosity and we should make sure we are taking advantage of that to allow for more or less verbose log configuration.

Issue created due to #30 (comment)

Remove Catalogmetadata CRD and serve CatalogMetadata with aggregated apiserver

Motivation:

The child resources created by the Catalog controller are registered as CRDs, which are served by the main kube apiserver. These resources should instead be served by an aggregated apiservice, so that the consumption and delivery of these API objects can be customized according to catalogd's use cases (e.g., custom storage for the objects so that the main etcd storage is not misused, or custom CRUD permissions such as "only the catalog controller can CRUD the objects").

Additional background: https://hackmd.io/-Rrvhgb3Q7m9HYZW2gxQtQ

Goal:

Configure the apiserver to administer the child resources, and get rid of the CRDs (except for the CatalogSource CRD, since that doesn't pose any resource constraint risk for the main apiserver)

Improve `CatalogSource` controller error handling during reconciliation

Upon reconciliation of a CatalogSource resource, the CatalogSource controller does a few things:

  • Creates a Job to unpack catalog contents
  • Once unpack Job finishes, reads logs from the Job Pod to get the catalog contents
  • Creates Package CRs for each package in the catalog
  • Creates BundleMetadata CRs for each bundle in the catalog

Currently, the CatalogSource controller's error handling is very basic and should be updated to handle the following scenarios:

  • IF the unpack Job cannot be created
    • Update the CatalogSource resource status indicating the unpack failure
  • IF the unpack Job's Pod's logs cannot be read
    • Update the CatalogSource resource status indicating the unpack failure
    • IF the error is because the Pod no longer exists, requeue to attempt the unpacking process again, as the Job could have been cleaned up by another process
  • IF a Package CR cannot be created
    • IF caused by the Package already existing - continue
    • ELSE (?)
      • Update the CatalogSource resource status to indicate failure to create child resources
      • Clean up already created child Package CRs
  • IF a BundleMetadata CR cannot be created
    • IF caused by the BundleMetadata already existing - continue
    • ELSE (?)
      • Update the CatalogSource resource status to indicate failure to create child resources
      • Clean up already created child BundleMetadata & Package CRs

Note: All the scenarios here are just proposed solutions. Ones marked with (?) are ones I feel could have better solutions.

Cleanup after catalogsource is deleted

Issue:
The Package and BundleMetadata objects don't seem to be deleted when the CatalogSource is deleted.

Steps to reproduce:

  • Follow the steps in the README.md.
  • Create a catsrc CR with image being: docker.io/anik120/community-operator-index:v4.11
  • Delete the catsrc CR
  • The packages and bundlemetadata are expected to be cleaned up, but they still persist.

The owner references don't seem to be set in the packages:

➜  catalogd git:(poc/catsrc-v2) ✗ k describe packages.core.catalogd.io zookeeper-operator
Name:         zookeeper-operator
Namespace:
Labels:       <none>
Annotations:  <none>
API Version:  core.catalogd.io/v1beta1
Kind:         Package
Metadata:
  Creation Timestamp:  2023-03-30T16:30:51Z
  Generation:          1
  Managed Fields:
    API Version:  core.catalogd.io/v1beta1
    Fields Type:  FieldsV1
    fieldsV1:
      f:spec:
        .:
        f:catalogSource:
        f:channels:
        f:defaultChannel:
        f:description:
        f:icon:
      f:status:
    Manager:         manager
    Operation:       Update
    Time:            2023-03-30T16:30:51Z
  Resource Version:  1489
  UID:               8eab1b43-2623-46f8-b423-9e264c8c8a4f
Spec:
  Catalog Source:  catalogsource-sample
  Channels:
    Entries:
      Name:         zookeeper-operator.v0.10.3
      Name:         zookeeper-operator.v0.11.0
      Replaces:     zookeeper-operator.v0.10.3
    Name:           alpha
  Default Channel:  alpha
  Description:
  Icon:
Status:
Events:  <none>
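
For the garbage collector to clean these up, each child object would need an ownerReference pointing at its parent CatalogSource. A sketch of what that metadata could look like, with illustrative values (note that owner references only trigger cascading deletion when the owner is cluster-scoped like the owned object, or in the same namespace):

```yaml
apiVersion: core.catalogd.io/v1beta1
kind: Package
metadata:
  name: zookeeper-operator
  ownerReferences:
  - apiVersion: core.catalogd.io/v1beta1
    kind: CatalogSource
    name: catalogsource-sample
    uid: 00000000-0000-0000-0000-000000000000  # UID of the parent CatalogSource (placeholder)
    controller: true
    blockOwnerDeletion: true
```
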

Refactor manifests and manifest generation

Currently there are a lot of manual changes that need to be made to the manifests (mostly related to RBAC) as the controller and API implementations are modified. We should probably use kustomize to make this process as automated as possible (i.e., just run a make target).

The Operator-SDK testdata could be used as a reference to understand how the end result of this refactor could look:

catalogd doesn't support multiple catalogs

When there is more than one catalog, "oc get packages" doesn't show all packages.

  1. create catalog test-catalog-xzha
zhaoxia@xzha-mac catalogd % cat catalogd-xzha.yaml 
apiVersion: catalogd.operatorframework.io/v1alpha1
kind: Catalog
metadata:
  name: test-catalog-xzha
spec:
  source:
    type: image
    image:
      ref: quay.io/olmqe/nginxolm-operator-index:catalogd-1
  2. get packages
zhaoxia@xzha-mac catalogd % oc get packages
NAME                               AGE
test-catalog-xzha-nginx-operator   19s
  3. create catalog community-operator
zhaoxia@xzha-mac catalogd % cat catalogd-community.yaml 
apiVersion: catalogd.operatorframework.io/v1alpha1
kind: Catalog
metadata:
  name: community-operator
spec:
  source:
    type: image
    image:
      ref: registry.redhat.io/redhat/community-operator-index:v4.14 
  4. get packages
zhaoxia@xzha-mac catalogd % oc get packages
NAME                                                           AGE
community-operator-3scale-community-operator                   3s
community-operator-ack-acm-controller                          3s
community-operator-ack-apigatewayv2-controller                 3s
community-operator-ack-applicationautoscaling-controller       3s
...

zhaoxia@xzha-mac catalogd % oc get packages| grep test-catalog-xzha
zhaoxia@xzha-mac catalogd % 

cannot get packages from test-catalog-xzha

  5. If I delete the pods of all catalogs
zhaoxia@xzha-mac catalogd % oc delete pod test-catalog-xzha  
pod "test-catalog-xzha" deleted
zhaoxia@xzha-mac catalogd % oc delete pod community-operator 
pod "community-operator" deleted

After the pods restart, the output of "oc get packages" is different
zhaoxia@xzha-mac catalogd % oc get pod
NAME                                           READY   STATUS      RESTARTS   AGE
catalogd-controller-manager-5c7768b8b4-vc7bx   2/2     Running     0          5m30s
community-operator                             0/1     Completed   0          26s
test-catalog-xzha                              0/1     Completed   0          26s
zhaoxia@xzha-mac catalogd % oc get packages
NAME                               AGE
test-catalog-xzha-nginx-operator   17s

Use FBC as the only API for exposing catalog content on cluster

Motivation:

Right now, the content of a file-based catalog (distributed in an OCI container image) is exposed on cluster using the Package and BundleMetadata APIs. However, this translation layer adds another set of APIs that must be maintained and supported alongside the FBC API, without adding any extra value to the goal of exposing the catalog metadata on cluster. Since the FBC API itself is
a) already an API we will be maintaining and providing support guarantees for, and
b) sufficient to communicate all of the information in the catalog to on-cluster consumers (as opposed to the legacy sqlite db API, which needed the translation in order to be interpretable/consumable),
the FBC API should be exposed on cluster without any translation.

Goal:

One way to go about this is consolidating the Package/BundleMetadata APIs into a single CatalogMetadata API that contains the schemas in its spec field.

For eg, for a package foo that has bundles foo.v0.1.0 and foo.v0.2.0, the following CatalogMetadata objects would be created:

kind: CatalogMetadata
metadata: 
    labels: 
       catalog: bar
       name: foo.v0.1.0
       package: foo
       schema: olm.bundle
    name: foo.v0.1.0
spec:
   catalog: bar
   package: foo
   schema: olm.bundle
   content: 
      <fbc blob rendered as yaml>
       .
       .
kind: CatalogMetadata
metadata: 
    labels: 
       catalog: bar
       name: foo.v0.2.0
       package: foo
       schema: olm.bundle
    name: foo.v0.2.0
spec:
   catalog: bar
   package: foo
   schema: olm.bundle
   content: 
      <fbc blob rendered as yaml>
       .
       .
kind: CatalogMetadata
metadata: 
    labels: 
       catalog: bar
       name: foo
       package: foo
       schema: olm.package
    name: foo
spec:
   catalog: bar
   package: foo
   schema: olm.package
   content: 
      <fbc blob rendered as yaml>
       .
       .

This would allow clients to query the content on cluster in a Kube-native way by taking advantage of label selectors, and to read the content off of the CatalogMetadata objects.

$ kubectl get catalogmetadata -l package=foo
NAME         CATALOG       SCHEMA        PACKAGE              
foo           bar         olm.package    foo   
foo.v0.1.0    bar         olm.bundle     foo 
foo.v0.2.0    bar         olm.bundle     foo   

Acceptance Criteria:

  • The Package and BundleMetadata APIs are removed and replaced with the CatalogMetadata API
  • The spec.content field of the API contains individual FBC blobs
  • The labels mentioned in the description are implemented for each CR

Optimize `CatalogSource` controller memory consumption

Based on the initial findings from #3 we noticed that there is a fairly large and persistent memory consumption increase (~232Mi) in the CatalogSource controller after reconciling a CatalogSource CR and creating the child Package and BundleMetadata CRs.

This increase in memory consumption is likely caused by the CatalogSource controller caching these child resources (which there can be a large number of). We should explore methods we can use to decrease the memory consumption to prevent the CatalogSource controller from hogging large amounts of memory.

Additional Context:

Refactor `CatalogSource` controller

Currently the CatalogSource controller has its own implementation structure that doesn't match that of existing OLMv1 controllers. In the interest of consistency between OLMv1 controller implementations, we should update the CatalogSource controller's implementation to be more like:

The catalog example 'catalog-sample' can't be Ready

Installed catalogd according to the docs/demo: https://github.com/operator-framework/catalogd/blob/main/docs/demo.gif
But the catalog catalog-sample never becomes Ready.
The steps:
1.$git clone https://github.com/operator-framework/catalogd.git
2.$cd catalogd/
3.

$make kind-cluster
......
Set kubectl context to "kind-catalogd"
You can now use your cluster with:

kubectl cluster-info --context kind-catalogd

Thanks for using kind! 😊
/home/jfan/projects/src/github.com/catalogd/hack/tools/bin/kind export kubeconfig --name catalogd
Set kubectl context to "kind-catalogd"
$kubectl cluster-info --context kind-catalogd
Kubernetes control plane is running at https://127.0.0.1:46095
CoreDNS is running at https://127.0.0.1:46095/api/v1/namespaces/kube-system/services/kube-dns:dns/proxy

To further debug and diagnose cluster problems, use 'kubectl cluster-info dump'.
$make install
......
cd config/manager && /home/jfan/projects/src/github.com/catalogd/hack/tools/bin/kustomize edit set image controller=quay.io/operator-framework/catalogd-controller:devel
/home/jfan/projects/src/github.com/catalogd/hack/tools/bin/kustomize build config/default | kubectl apply -f -
namespace/catalogd-system created
customresourcedefinition.apiextensions.k8s.io/bundlemetadata.catalogd.operatorframework.io created
customresourcedefinition.apiextensions.k8s.io/catalogs.catalogd.operatorframework.io created
customresourcedefinition.apiextensions.k8s.io/packages.catalogd.operatorframework.io created
serviceaccount/catalogd-controller-manager created
role.rbac.authorization.k8s.io/catalogd-leader-election-role created
clusterrole.rbac.authorization.k8s.io/catalogd-manager-role created
clusterrole.rbac.authorization.k8s.io/catalogd-metrics-reader created
clusterrole.rbac.authorization.k8s.io/catalogd-proxy-role created
rolebinding.rbac.authorization.k8s.io/catalogd-leader-election-rolebinding created
clusterrolebinding.rbac.authorization.k8s.io/catalogd-manager-rolebinding created
clusterrolebinding.rbac.authorization.k8s.io/catalogd-proxy-rolebinding created
service/catalogd-controller-manager-metrics-service created
deployment.apps/catalogd-controller-manager created
kubectl wait --for=condition=Available --namespace=catalogd-system deployment/catalogd-controller-manager --timeout=60s
deployment.apps/catalogd-controller-manager condition met
$kubectl get crds -A
NAME                                           CREATED AT
bundlemetadata.catalogd.operatorframework.io   2023-05-24T07:25:58Z
catalogs.catalogd.operatorframework.io         2023-05-24T07:25:58Z
packages.catalogd.operatorframework.io         2023-05-24T07:25:58Z
$kubectl apply -f config/samples/core_v1beta1_catalog.yaml
catalog.catalogd.operatorframework.io/catalog-sample created
$kubectl get catalog -A
NAME             AGE
catalog-sample   21s
$kubectl wait --for=condition=Ready catalog catalog-sample
error: timed out waiting for the condition on catalogs/catalog-sample
$kubectl get packages
No resources found
$kubectl get bundlemetadata
No resources found
$kubectl get catalog catalog-sample -o yaml
apiVersion: catalogd.operatorframework.io/v1beta1
kind: Catalog
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"catalogd.operatorframework.io/v1beta1","kind":"Catalog","metadata":{"annotations":{},"labels":{"app.kuberentes.io/managed-by":"kustomize","app.kubernetes.io/created-by":"catalogd","app.kubernetes.io/instance":"catalog-sample","app.kubernetes.io/name":"catalog","app.kubernetes.io/part-of":"catalogd"},"name":"catalog-sample"},"spec":{"image":"quay.io/operatorhubio/catalog:latest","pollingInterval":"45m"}}
  creationTimestamp: "2023-05-24T07:30:00Z"
  generation: 2
  labels:
    app.kuberentes.io/managed-by: kustomize
    app.kubernetes.io/created-by: catalogd
    app.kubernetes.io/instance: catalog-sample
    app.kubernetes.io/name: catalog
    app.kubernetes.io/part-of: catalogd
  name: catalog-sample
  resourceVersion: "1398"
  uid: 6e0fe471-2698-4ed9-af59-e93a1c3fd146
spec:
  image: quay.io/operatorhubio/catalog:latest
  pollingInterval: 45m0s

Write e2e tests

There are currently no e2e tests for catalogd; we should write some.

Create an unpacking interface

Add a Rukpak-like spec.source union type to the Catalog API, which would allow catalog filesystems to be sourced in a variety of ways. It would also give catalogd developers an extensible API surface to add new source mechanisms in the future.

The above quote was an idea mentioned in the OLMv1 Milestone 3 ideation discussion. We should implement an unpacking interface that allows us to easily support different sources of information for catalogs

An example of what this could look like on a CatalogSource CR in yaml:

apiVersion: catalogd.operatorframework.io/v1beta1
kind: CatalogSource
metadata: 
  name: catalogsource-sample
spec:
  source:
    type: catalogd.source.image
    image: quay.io/operatorhubio/catalog:latest

Acceptance Criteria:

  • Unit tests
  • An interface implemented for an unpacking source
  • An image unpacking source implementation
  • Updating the CatalogSource controller to use the unpacking interface instead of the currently hardcoded unpacking process
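
The interface itself could be as small as one method, keyed off spec.source.type. A Python sketch of the shape (the names `Unpacker`, `ImageUnpacker`, and `UNPACKERS` are hypothetical, and the real implementation would be a Go interface in the controller):

```python
from abc import ABC, abstractmethod

class Unpacker(ABC):
    """Hypothetical unpacking interface: given the source spec from a
    CatalogSource CR, return the unpacked FBC content."""
    @abstractmethod
    def unpack(self, source: dict) -> str: ...

class ImageUnpacker(Unpacker):
    """Stub image-source implementation; a real one would pull the image
    and read its FBC directory."""
    def unpack(self, source: dict) -> str:
        return f"fbc-from:{source['image']}"

# Registry keyed by spec.source.type; supporting a new source kind
# (git, http, ...) means registering another Unpacker here.
UNPACKERS = {"catalogd.source.image": ImageUnpacker()}

def unpack_catalog(spec: dict) -> str:
    src = spec["source"]
    return UNPACKERS[src["type"]].unpack(src)

spec = {"source": {"type": "catalogd.source.image",
                   "image": "quay.io/operatorhubio/catalog:latest"}}
print(unpack_catalog(spec))  # fbc-from:quay.io/operatorhubio/catalog:latest
```

The controller would then depend only on the `Unpacker` interface instead of a hardcoded image unpacking process.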

Rename `CatalogSource` to `OperatorCatalog`

Motivation:

CatalogSource is a hangover from OLM v0.

"Here's how you can add this operator catalog to your cluster" sounds less confusing than "Here's how you can add this catalog source to your cluster", possibly since catalogSource is trying to encompass engineering implementation details, that our customers don't need to be exposed to.

Goal:

This is a proposal to rename CatalogSource to OperatorCatalog in catalogd.

(Idea): Make catalogd CRs "schema based"

In the Operator Framework community call, one of the things mentioned and briefly discussed was creating the CRs in a way that allowed for easier extensibility.

I'm going to break down my proposed solution into a section for each CR, showing how they could look if we went this route.

CatalogSource

Taking a schema-based approach here would let us more easily extend the ways in which catalog information can be retrieved. This loosely ties in the concept of "sources" for catalogs (git, http, image, etc.), but that concept is likely to be discussed in a separate issue. If the change to the CatalogSource CR is better suited to that work, we can move this item there.

All that being said, here is what a schema-based CatalogSource CR in its simplest form could look like:

apiVersion: catalogsource.catalogd.io/v1
kind: CatalogSource
metadata:
  name: sample-catalogsource
spec:
  source:
    type: "catalogd.source.image"
    config:
      image: docker.io/repo/some-index-image:tag

The benefit of taking this approach is that it makes the way in which a catalog is fetched highly extensible and configurable. rukpak already has a similar concept of sources.

Package

Taking a schema-based approach to the Package CR would allow for a wider range of catalogd use cases. A "package" could expose a standard interface for basic package information (like its name) while each package keeps its own information structure. This more generic interface makes the Package CR resilient to changes in how package data is formatted (i.e., if the olm.package spec changed, we wouldn't have to change anything).

A schema-based Package CR in its simplest form could look like:

apiVersion: package.catalogd.io/v1
kind: Package
metadata:
  name: some-package
spec:
  schema: "olm.package"
  config:
    channels: ...
    description: ...
    icon: ...
    ...

BundleMetadata

Taking a schema-based approach to the BundleMetadata CR has the same benefit as the schema-based Package CR: if the olm.bundle spec were to change, we could handle it without code changes.

A schema-based BundleMetadata CR in its simplest form could look like:

apiVersion: bundlemetadata.catalogd.io/v1
kind: BundleMetadata
metadata:
  name: some-bundle
spec:
  schema: "olm.bundle"
  config:
    properties: ...
    constraints: ...
    ...

Serve locally stored FBC content via a server

As part of implementing the RFC for catalogd's content storage and serving mechanism we should import the RukPak storage library (dependent on operator-framework/rukpak#656). Once imported we will need to create a custom implementation of the Storage interface.

The custom Storage implementation should:

  • Store all FBC JSON blobs in a single {catalogName}/all.json file
  • Serve the {catalogName}/all.json file via a Go http.FileServer handler

Note: If operator-framework/rukpak#656 is not accepted by the RukPak maintainers, it would be acceptable to copy the Storage interface into a catalogd package. This is not ideal, but it is a last-resort option if the maintainers decline to externalize the storage package.

Acceptance Criteria

  • Storage interface implementation that:
    • Store all FBC JSON blobs in a single {catalogName}/all.json file
    • Serve the {catalogName}/all.json file via a Go http.FileServer handler
  • Unit tests

Optimize memory usage of unpacking FBC from pod logs

In these lines, we copy the logs to a byte buffer, and then turn around and create a reader from the byte buffer.

We may be able to avoid the buffer entirely by feeding the podLog stream directly into declcfg.LoadReader.

Acceptance Criteria:

Catalogd installation fails on 1.26 k8s cluster

What did you do?

Followed the steps in the README to deploy this project. The exact steps were:

1. Create a kind cluster.
2. Install cert-manager
3. Apply crds: k apply -f config/crd/bases
4. Install config files: k apply -f config/

The etcd and catalogd-apiserver pods fail with an OOM error. This happens even before creating a CR that specifies the catalog image we want to use.

This does not happen after downgrading clusters to k8s 1.24 or 1.25.

Introduce a means to track catalog content changes

Currently, catalogs provide no easy way to tell when their contents change on an update. This makes it difficult to know when updates may be available for operators on the cluster. Blindly re-running resolution for all installed operators on a schedule could work around this, but a field that signals whether catalog contents have changed would cut down on that wasted periodic resolution.

Requirements:
Provide an easy way to track catalog content change, such as a field or annotation with a content hash.

Goals:
Communicate catalog content change without requiring catalog consumers to go through all of the catalog contents

Non-goals:
Provide specifics of contents affected by a catalog change

Add version information

Currently, neither the catalogd-controller nor the catalogd-server contains version information. We should provide a way to print the version (a flag, subcommand, etc.) and capture it at build time.

References:

Update the README

The README is starting to become out of date. We should spend some time updating it to be consistent with the current state of catalogd.

Update the `Catalog` controller to use a `Storage` interface

As part of implementing the RFC for catalogd's content storage and serving mechanism and following #113 we need to update the Catalog controller to use the Storage interface for storing catalog content after it is unpacked.

This functionality should be feature gated.

Acceptance Criteria

  • Updates Catalog controller to have and use a Storage interface
  • Adds a feature gate to enable use of the functionality
  • Updates unit/e2e tests as necessary

Introduce spec.Source type in the CatalogSource CRD

Catalog images are exclusively container images hosted in image registries today. There's growing evidence to suggest that allowing catalog content to be packaged and published in other ways (such as OCI artifacts, git repositories, or even ConfigMaps added to the cluster) could streamline the process of building and maintaining these catalogs, improving quality of life for catalog authors.

Introducing a new spec.Source field in the CatalogSource CRD, with an image value that invokes the unpacking job that exists today, will lay the groundwork for introducing new spec.Source types in the near term.

Create an `OwnerReference` based watch on the unpack job

In the controller's Reconcile function, the existing logic creates the unpack job and then fetches the job's pods. If parsing the unpack logs fails because the pod is not yet present, we requeue after 10s. This is essentially polling every 10s even when no events come in, and it ignores any events that arrive within that window.

Ideally, we should reconcile when the job's state changes instead of polling. The better solution is to keep the OwnerReference on the job (which we already set) and create an OwnerReference-based watch on the job so that we reconcile whenever its state changes. We may need to do the same for the underlying pod.

Also, instead of using RequeueAfter with a fixed 10s, we should just return the error, which triggers exponential backoff.

See: #34 (comment)

`Package` resource naming clash if same named `Package` exists in multiple `CatalogSource`s

Currently the CatalogSource controller creates Package resources when reconciling a CatalogSource CR. It doesn't apply any special naming logic to ensure the uniqueness of the Package resources it creates, which means that if multiple catalogs define a package with the same name (but technically different contents), there will be a naming clash, resulting in an error when creating the corresponding Package resource.

We should evaluate what our options are to prevent this naming clash.

One suggestion is to see what options we have to specify the Package admission criteria on our custom apiserver. For more context see: #1 (comment)

Explore alternative methods for unpacking catalog data

Following #65, it is probably worth exploring what alternative unpacking methods exist, separate from copying implementations from rukpak.

Some potential alternatives:

  • Make catalogd a rukpak provisioner
  • Make the rukpak unpacking implementation a library

Create a minimal client library

Following #113, and to help catalogd clients transition from the old world of serving catalog contents via Kubernetes CustomResources, we should create a minimal client library that can be used to fetch catalog information and return it as a list of Go objects.

Ensure the `CatalogSource` controller is properly setting owner references

Currently the CatalogSource controller creates Package and BundleMetadata resources as part of reconciling a CatalogSource CR but doesn't set owner references on the child resources.

We should update the CatalogSource controller to properly set owner references on the resources it creates to ensure the Kubernetes garbage collection will automatically delete these resources if the owning CatalogSource CR is deleted.

Additional Context

Non-atomic syncing of Package/BundleMetadata/CatalogMetadata resources for a particular catalog.

When the resources (Package+BundleMetadata / CatalogMetadata) are being synced for one catalog and an error is encountered in the middle of an operation, e.g. in the middle of a loop where existing Packages are being deleted, the cluster ends up in a state where some of the resources have been operated on while the remaining resources are in the previous, stale state. This gives clients an incorrect view of the world.

Instead, operations should be atomic: either an operation succeeds wholly on all of the resources it was supposed to operate on, or none of the resources are updated at all and an error is surfaced about the failure.

Update unit tests for the catalog controller

The existing unit tests for the catalog controller don't cover the sync logic for BundleMetadata and Package resources (or CatalogMetadata, once the CatalogMetadata API PR merges).

Based on this review suggestion, it would be nice to have the following to fully test the sync logic:

  • Add some other blobs to the test catalog that have (a) just schema, (b) schema and name, (c) schema and package, (d) schema, name, and package, and where all of those have other non-meta fields. This would make sure that we treat FBC meta objects opaquely
  • Update this test to check the full contents of what is synced.
  • Change the Catalog to point to a different image, which would result in existing catalog metadatas being deleted/updated, and at least one new metadata created.

Don't unpack again if unchanged

Creating this issue to continue discussion from #30 (comment)

Currently the CatalogSource controller attempts to unpack the catalog contents and recreate the metadata on cluster on every reconciliation, even if the content has already been unpacked. Running the unpacking logic on every reconciliation loop causes unnecessary overhead. Instead, we should explore ways to signal that the unpacking process has already completed so that it can be skipped.

Some potential solutions:

  • Update the CatalogSource status to also communicate the observed generation and/or the hash of the catalog contents (maybe image sha if we can easily fetch that info?). If there is a change in generation / catalog contents then unpack again
    • Downside: We can't determine the hash of the catalog contents without running the unpack job again so that would still be a process that is run every time (although maybe we gate it with a generation change?)
  • Add a label/annotation to signal it has been unpacked and doesn't need to run the unpack process again
