projectsyn / component-rook-ceph

Commodore component to manage Rook.io rook-ceph operator, Ceph cluster, and CSI drivers

License: BSD 3-Clause "New" or "Revised" License

Makefile 12.48% Jsonnet 85.03% Go 2.49%
commodore-component rook rook-ceph csi-driver storage

component-rook-ceph's Introduction

Commodore Component: Rook Ceph

This is a Commodore Component for Rook Ceph.

This repository is part of Project Syn. For documentation on Project Syn and this component, see syn.tools.

Documentation

The rendered documentation for this component is available on the Commodore Components Hub.

Documentation for this component is written using Asciidoc and Antora. It can be found in the docs folder. We use the Divio documentation structure to organize our documentation.

Run the make docs-serve command in the root of the project, and then browse to http://localhost:2020 to see a preview of the current state of the documentation.

After writing the documentation, please use the make docs-vale command and correct any warnings raised by the tool.

Contributing and license

This library is licensed under BSD-3-Clause. For information about how to contribute, see CONTRIBUTING.

component-rook-ceph's People

Contributors

arska, bastjan, debakelorakel, glrf, haasad, kidswiss, megian, renovate[bot], simu, vshn-renovate


component-rook-ceph's Issues

Properly expose `storageClassDeviceSet` in component parameters

Context

Currently, the component offers parameters to configure some values of the single default storageClassDeviceSet entry in the CephCluster CR spec, cf.

ceph_cluster:
  name: cluster
  namespace: syn-rook-ceph-${rook_ceph:ceph_cluster:name}
  node_count: 3
  block_storage_class: localblock
  # Configure volume size here, if block storage PVs are provisioned
  # dynamically
  block_volume_size: 1
  # set to true if backing storage is SSD
  tune_fast_device_class: false
  # Control placement of osd pods.
  osd_placement: {}
  # Mark OSDs as portable (doesn't bind OSD to a host)
  osd_portable: false
and
- name: ${rook_ceph:ceph_cluster:name}
  count: ${rook_ceph:ceph_cluster:node_count}
  volumeClaimTemplates:
    - spec:
        storageClassName: ${rook_ceph:ceph_cluster:block_storage_class}
        volumeMode: Block
        accessModes:
          - ReadWriteOnce
        resources:
          requests:
            storage: ${rook_ceph:ceph_cluster:block_volume_size}
  encrypted: true
  tuneFastDeviceClass: ${rook_ceph:ceph_cluster:tune_fast_device_class}
  placement: ${rook_ceph:ceph_cluster:osd_placement}
  portable: ${rook_ceph:ceph_cluster:osd_portable}

We should refactor this config so that the component simply provides a parameter ceph_cluster.storageClassDeviceSet which is used verbatim as the first entry of the cephClusterSpec value storage.storageClassDeviceSets. This would make it much easier to add additional configuration through the config hierarchy, e.g. annotations on the PVC template to force Rook to use a specific OSD device class (cf. https://github.com/rook/rook/blob/1db2ecf99b77394258c458ed6782ad26ebe8255b/deploy/examples/cluster-on-pvc.yaml#L123-L124)

In fact, we should probably also handle the pvcTemplates for the first storageClassDeviceSet as a separate parameter, since each storageClassDeviceSet has an array of pvcTemplates which is also not adjustable in the hierarchy.
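
A minimal sketch of what the refactored parameters could look like (ceph_cluster.storageClassDeviceSet and ceph_cluster.pvcTemplates are proposed names, not existing component parameters; the values shown mirror the current defaults):

ceph_cluster:
  # used verbatim as the first entry of storage.storageClassDeviceSets
  storageClassDeviceSet:
    name: ${rook_ceph:ceph_cluster:name}
    count: ${rook_ceph:ceph_cluster:node_count}
    encrypted: true
    tuneFastDeviceClass: false
    portable: false
    placement: {}
  # rendered into the device set's volumeClaimTemplates, so that e.g. PVC
  # annotations for a specific OSD device class can be added in the hierarchy
  pvcTemplates:
    - metadata:
        annotations: {}
      spec:
        storageClassName: localblock
        volumeMode: Block
        accessModes:
          - ReadWriteOnce
        resources:
          requests:
            storage: 1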

Alternatives

  • Continue adding fields to ceph_cluster which allow users to set individual fields in the first entry of storage.storageClassDeviceSets.
  • Completely refactor the parameters to generate the list of storageClassDeviceSets from a map in the component parameters

Rook creating a mon canary deployment led to duplicate mon endpoint entries

The Rook operator somehow created a monitor canary deployment after the node was drained. This wasn't prevented by the following config:

$ kubectl -n syn-rook-ceph-cluster get cephcluster cluster -o jsonpath={.spec.mon.allowMultiplePerNode}
false

Because we run the monitors on the host network rather than the Kubernetes SDN, the monitor ports were already occupied and the mon canary deployment got stuck in the Pending state. As a result a new monitor was about to be deployed, and Rook added the monitor endpoint twice to the config map.

kubectl -n syn-rook-ceph-cluster get configmap rook-ceph-mon-endpoints -o jsonpath={.data.csi-cluster-config-json}
[{"clusterID":"syn-rook-ceph-cluster","monitors":["172.18.200.162:6789","172.18.200.146:6789","172.18.200.132:6789","172.18.200.132:6789"],"namespace":""}]

This caused Ceph components to crash, because the number of monitor endpoints was wrong:

FAILED ceph_assert(addr_mons.count(a) == 0)

Removing the duplicate IPs from the configmap rook-ceph-mon-endpoints and updating maxMonId to n-1 resolved the issue, and the components started without problems.

Edit (@bastjan): the identifier-to-id mapping is <idchar> - 'a', i.e. mon d has id 3.
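
A hedged outline of the manual cleanup described above (the key names come from the rook-ceph-mon-endpoints configmap; verify against the actual cluster state before editing):

$ kubectl -n syn-rook-ceph-cluster edit configmap rook-ceph-mon-endpoints
# - drop the duplicated IP from both the "data" and the "csi-cluster-config-json" keys
# - set maxMonId to n-1, where n is the number of distinct mons that remain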

This ended up in:

Ceph stays in Warning state due to 'X OSDs or CRUSH {nodes, device-classes} have NOOUT flags set' 

This couldn't be reproduced with Rook v1.9.10.

Potential related issues:

Steps to Reproduce the Problem

  1. Use Rook version 1.6.5
  2. Drain a node

Resolved in the component version v3.4.1 #88.

Reduce or remove MON and OSD alerts during maintenance

Context

Maintenance causes MONs and OSDs to be restarted.
This is a regular process and not an issue, as long as only an acceptable number of components are down at the same time.

Currently we get P1 alerts for MONs and OSDs that are down because of the regular maintenance process.
This misleads the operator, because the alert is not actionable and recovers automatically as the maintenance progresses.

Implementation idea

  • Relax the alerts so they're P3 rather than P1. This still causes noise.
  • Relax the time a MON or OSD can be down before an alert fires. This increases the delay in a real incident.
  • Find a way to only count MON and OSD downtime when more instances are down than the minimum needed to keep the service healthy (see the sketch below).
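
A hedged sketch of the third idea as a Prometheus alert rule that only fires once the mon quorum is actually at risk (the expression, duration, and severity are illustrative, not the component's current rules):

- alert: CephMonQuorumAtRisk
  # With N mons, quorum needs floor(N/2) + 1 members; fire only once the
  # number of mons still in quorum has dropped to that minimum or below.
  expr: count(ceph_mon_quorum_status == 1) <= floor(count(ceph_mon_metadata) / 2) + 1
  for: 15m
  labels:
    severity: warning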

Reevaluate default resource requests

Context

We've seen that the Ceph cluster doesn't really use the resources we request during normal operation, requiring us to provision relatively large storage nodes, which are then mostly idle. This is not great for financial reasons. We should reevaluate the component's default resource requests and limits based on actual usage numbers from production environments.

Since Rook now provides a rook-ceph-cluster Helm chart which defines default resource requests for the Ceph components, we should also check whether those defaults are suitable for us. See https://github.com/rook/rook/blob/1ae867049b49079b76696e68ee9b8f30216528bd/deploy/charts/rook-ceph-cluster/values.yaml#L233-L289
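
For reference, the chart structures these requests per daemon under cephClusterSpec.resources; a hedged example of what a reduced override could look like in that shape (the numbers below are placeholders, not recommendations):

cephClusterSpec:
  resources:
    mon:
      requests:
        cpu: 500m
        memory: 1Gi
    osd:
      requests:
        cpu: "1"
        memory: 4Gi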

Alternatives

Don't do anything.

Custom labels for rook-ceph alerts

Context

To identify OnCall-relevant alerts, we use alert labels which are then used for Opsgenie alert routing. To route rook-ceph alerts to OnCall, the component needs to support custom alert labels.
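
A hypothetical illustration of the desired end state: a rook-ceph alert rule carrying an extra routing label (the label name syn_team, its value, and the exact expression are placeholders, not existing component configuration):

- alert: CephMonDown
  expr: ceph_health_detail{name="MON_DOWN"} == 1
  labels:
    severity: critical
    # custom label consumed by the Opsgenie routing rules
    syn_team: oncall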

Upgrade to Rook v1.9

Context

Upgrade the component to use Rook v1.9 by default. Please note that Rook has updated the bundled Prometheus alerts for 1.9, and we'll need to ensure the runbooks bundled with the component are updated as part of this issue. With the upgrade to Rook 1.9, we should also upgrade the default ceph-csi version to v3.6 as that's the default version which ships with Rook 1.9.

Out of scope

  • Upgrade to Ceph v17

Acceptance criteria

  • Component installs Rook 1.9 by default
  • Alert runbooks are upgraded to match the new set of alerts shipped by Rook

Improve metrics scraping configuration for CephCSI drivers

Summary

Review and improve metrics scraping config for Ceph CSI drivers

Background

The current implementation for the metrics scraping config for the CSI drivers hasn't been reviewed or updated for Rook (and associated CephCSI) upgrades. Rook 1.9 / CephCSI 3.6 moved one of the metrics endpoints to an optional side-car container which can be enabled via Helm value enableLiveness of the Rook operator Helm chart (cf. #90). Additionally, it appears that the CephFS grpc metrics endpoint is deprecated, cf.

$ kubectl -n syn-rook-ceph-operator logs csi-cephfsplugin-zg9dc csi-cephfsplugin 
W1005 09:09:33.674333 1754966 driver.go:150] EnableGRPCMetrics is deprecated
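
A hedged sketch of enabling the liveness sidecar mentioned above through the Rook operator Helm chart values (assuming the component passes these values through to the chart):

csi:
  # deploy the liveness metrics sidecar alongside the CSI plugin pods
  enableLiveness: true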

Goal

We understand the available metrics for the Ceph CSI drivers and provide an appropriate default config in the component.

Action Required: Fix Renovate Configuration

There is an error with this repository's Renovate configuration that needs to be fixed. As a precaution, Renovate will stop PRs until it is resolved.

Location: renovate.json
Error type: The renovate configuration file contains some invalid settings
Message: Invalid configuration option: packageRules[0].customChangelogUrl

Default Kubelet and Ceph available-disk thresholds do not match

By default the Kubelet ships with imageGCHighThresholdPercent (default 85) and imageGCLowThresholdPercent (default 80). This means the image garbage collector only starts dropping images once less than 15% of the disk is free.

Ceph has a default threshold of 30% free disk space, below which the monitors start complaining about not enough disk space with HEALTH_WARN.

This leads to flapping alerts, because the Ceph alert is triggered while the Kubelet hasn't started the cleanup yet.

Steps to Reproduce the Problem

  1. Install the component rook-ceph
  2. Wait until a Kubernetes node uses more than 70% of its disk for container images

Actual Behavior

The Ceph mon starts to complain with a HEALTH_WARN.

Expected Behavior

The Ceph mon never complains, because the Kubelet image garbage collector cleans up before the Ceph threshold is reached.
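
A minimal KubeletConfiguration sketch that would align the two thresholds, assuming image GC should start before Ceph's 30%-free warning (the exact percentages are placeholders to be tuned):

apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
# start image GC at 65% disk usage (35% free), i.e. before Ceph warns at 30% free
imageGCHighThresholdPercent: 65
imageGCLowThresholdPercent: 60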

Rook incorrectly creates new MON deployments during maintenance

Sometimes during cluster maintenance (on cloudscale.ch), the Rook-Ceph operator creates new mon deployments when a storage node is marked as unschedulable, instead of just waiting for the node to come back after maintenance.

Possible root causes

One configuration which can cause the observed issues is:

kubectl --as=cluster-admin -n syn-rook-ceph-cluster patch cephcluster cluster --type=json \
  -p '[{
    "op": "replace",
    "path": "/spec/healthCheck/daemonHealth/mon",
    "value": {
      "disabled": false,
      "interval": "10s",
      "timeout": "10s"
    }
  }]'

This configures the operator to treat mons as failed after 10 seconds (down from the default 10 minutes). The config is intended to be used when replacing storage nodes (see e.g. https://kb.vshn.ch/oc4/how-tos/cloudscale/replace-storage-node.html#_remove_the_old_mon) and should be reverted once the mon has been moved to the new storage node. During maintenance, this config causes the operator to treat mons on cordoned nodes as failed after 10s, which triggers the creation of a replacement mon.
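
A hypothetical revert once the mon has been moved, dropping the override so the operator falls back to its default health check timeouts:

$ kubectl --as=cluster-admin -n syn-rook-ceph-cluster patch cephcluster cluster --type=json \
  -p '[{
    "op": "remove",
    "path": "/spec/healthCheck/daemonHealth/mon"
  }]'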

Steps to Reproduce the Problem

TBD: some combination of cordoning, draining, and restarting nodes while observing the Rook operator create an unnecessary new mon.

Actual Behavior

A new mon gets created on a node which already has a mon and is added to the monmap configmap. This causes lots of issues, because the resulting mon configuration can't work: the mons bind to host ports in our setup.

Expected Behavior

No new mon is created when a node is unschedulable due to node maintenance or similar.
