In order to support kubernetes clusters where the pods have access to cvmfs, it would

Helm charts for cvmfs-enabled clusters about cvmfs HOT 9 OPEN

cvmfs commented on June 24, 2024

Helm charts for cvmfs-enabled clusters

from cvmfs.

Comments (9)

fbarreir commented on June 24, 2024

Hi @rptaylor

My squid chart is very basic and I only use it in a very small cluster, so it's OK for what I need. For my larger clusters I run squid separately from the Kubernetes cluster. I was not aware of the sciencebox charts until this ticket. I didn't look at @ebocchi's squid chart, but I read his CVMFS client chart and liked many parts of it.

While the squid chart is not my first priority (I'm interested in the CVMFS client chart :), I agree that there is a lot of redundant work and many similar charts around. @rptaylor I see you are on the agenda for the CVMFS Workshop; that would be a good place to discuss this. Are you going in person or remote?

ebocchi commented on June 24, 2024

I am more than happy to donate and move "my" charts from ScienceBox to here. I fully agree there is some redundant work which should be consolidated.

@rptaylor, I don't see a problem in changing the squid chart to use a StatefulSet instead of a Deployment, with PVCs for persistent caching. I wonder if the latter is really needed, though: in my experience (it was some time ago, I admit), restarting a squid process would invalidate the on-disk cache. That's mainly the reason why there are no PVCs for now.
We can also look at how to expose port 3401 (SNMP over UDP, I guess) for monitoring.
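
To make the monitoring idea concrete, here is a minimal sketch of a Service manifest exposing 3401/UDP, built as a plain Python dict and printed as JSON (which `kubectl apply -f -` accepts). It is not taken from any of the charts discussed here; the Service name, the `app` label, and the choice of Service type are all assumptions for illustration.

```python
import json


def squid_snmp_service(name="frontier-squid-snmp",
                       app_label="frontier-squid",
                       service_type="ClusterIP"):
    """Build a Service manifest exposing squid's SNMP port (3401/UDP).

    The name, label, and service type are hypothetical placeholders;
    external monitoring would likely need NodePort or LoadBalancer.
    """
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": name},
        "spec": {
            "type": service_type,
            # Must match the labels on the squid pods (assumed here).
            "selector": {"app": app_label},
            "ports": [
                {
                    "name": "snmp",
                    # Service ports default to TCP, so UDP must be explicit.
                    "protocol": "UDP",
                    "port": 3401,
                    "targetPort": 3401,
                }
            ],
        },
    }


if __name__ == "__main__":
    print(json.dumps(squid_snmp_service(), indent=2))
```

Squid itself would also need SNMP enabled and restricted in its own configuration; that part is outside this sketch.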

jblomer commented on June 24, 2024

Links from Enrico:

CVMFS Dockerfile: https://github.com/sciencebox/cvmfs
CVMFS Helm chart: https://github.com/sciencebox/charts/tree/master/cvmfs
Squid Dockerfile: https://github.com/sciencebox/frontier-squid
Squid Helm chart: https://github.com/sciencebox/charts/tree/master/frontier-squid

ebocchi commented on June 24, 2024

There was also this OSG repo for frontier-squid in Docker, but it seems to be archived now.

The Helm chart in ScienceBox uses a DaemonSet. One pain point is mounting several repos in the same container: autofs is a bad idea, and it is best practice to have a single process (PID 1) running in each container. I (horribly) worked around it with supervisord to mount multiple repos. In the past I was doing something very similar to your mount_cvmfs.sh.

Happy to further discuss this and contribute to better charts!

fbarreir commented on June 24, 2024

For context, I use Kubernetes as a batch system for ATLAS. Harvester (a component of PanDA) uses the Kubernetes Python API to submit and monitor pods with an ATLAS job running inside. I install CVMFS through a fork of Igor's PRP OSG driver (mentioned by Jakob above as a third-party driver): https://github.com/PanDAWMS/prp-osg-cvmfs
I made some changes based on my operational experience and added a basic Helm chart. Ricardo told me that my version is used for the CERN IT cloud deployments, so it works OK, but not perfectly.

My fork also includes a Frontier squid deployment. I don't use that part so often, but maybe there's something that you want to extract from there.

For ATLAS we run the CVMFS pod with 7 repositories (atlas, atlas-nightlies, atlas-condb, sft, sft-nightlies, unpacked, grid). My main operational problem is relatively frequent crashes in a small fraction of the mountpoints, in particular while the CVMFS pod is starting up. This produces a lousy failure rate when ramping up a dynamic cluster, and it gets into territory that I'm not an expert in. This is the part where I think collaboration with the CVMFS team, and ultimately getting an official product, would be very useful.
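
One way to catch broken mountpoints before jobs land on a node is a small probe that tries to list each repository directory. The following is only a sketch under assumptions, not part of the driver or chart mentioned above: it assumes repositories are mounted under `/cvmfs/<repository name>`, that autofs is not in use (so an unmounted directory simply appears empty), and the function name is hypothetical.

```python
import os


def failed_mountpoints(repos, root="/cvmfs"):
    """Return the repositories whose mountpoint cannot be listed or is empty.

    `repos` holds repository directory names as they appear under `root`.
    """
    failed = []
    for repo in repos:
        path = os.path.join(root, repo)
        try:
            # Listing the directory forces an access through the FUSE mount.
            # A crashed FUSE mount typically raises OSError here, and a
            # directory that was never mounted is empty or missing.
            if not os.listdir(path):
                failed.append(repo)
        except OSError:
            failed.append(repo)
    return failed


if __name__ == "__main__":
    # Fully qualified names are an assumption; the list above uses short names.
    broken = failed_mountpoints(["atlas.cern.ch", "sft.cern.ch"])
    print("failed mountpoints:", broken)
    raise SystemExit(1 if broken else 0)
```

A script like this could back a readiness probe on the CVMFS pod, so that a node is only considered ready once every repository is accessible.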

As a side note, I also observed that when I fill up an 80 vCPU node with many (40-80) single-core jobs, the CVMFS plugin starts to consume a lot of CPU and the jobs have very low CPU efficiency. I'm not sure if this is related to the Kubernetes plugin or is a general issue.

I'm happy to have an in-person or Zoom discussion to go through any details, and to contribute wherever I can. In the meantime I'm going to look at the links above; I was not aware of Enrico's solution.

elmsheus commented on June 24, 2024

@fbarreir, we've started a separate thread about the high CVMFS load observed in ATLAS software builds in #2879. This might be the same as, or similar to, the issue that you describe for the 80 vCPU case.

DrDaveD commented on June 24, 2024

> There was also this osg repo for frontier-squid in Docker, but it seems archived now.

It has just moved, to https://github.com/opensciencegrid/images/tree/main/opensciencegrid/frontier-squid

rptaylor commented on June 24, 2024

@fbarreir I saw your squid chart. Is it substantially different from the sciencebox one maintained by @ebocchi, or was there missing functionality in the sciencebox one?
I also looked at the OSG chart.

None of these frontier-squid charts can be scaled up for higher availability and load handling while also avoiding the problem of cold caches (emptyDir volumes) by using PVCs for storage. That is only possible with a StatefulSet (with volumeClaimTemplates) instead of a Deployment.

I wonder if we could consolidate our efforts on maintaining and developing a frontier-squid Helm chart, and in which direction.
I am mostly interested in a StatefulSet with PVCs, and possibly external access, but only for monitoring on port 3401.
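
To illustrate what I mean, here is a rough sketch of such a StatefulSet, written as a Python dict and dumped as JSON rather than as a real Helm template. Every concrete value is a placeholder assumption: the image reference, the cache path, the storage size, and the replica count are not taken from any of the charts mentioned. Port 3128 is squid's default proxy port.

```python
import json


def squid_statefulset(name="frontier-squid",
                      replicas=2,
                      image="example.org/frontier-squid:latest",  # placeholder
                      cache_size="50Gi",                          # placeholder
                      cache_path="/var/cache/squid"):             # assumed path
    """Build a StatefulSet manifest in which each replica gets its own PVC."""
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "StatefulSet",
        "metadata": {"name": name},
        "spec": {
            # A StatefulSet refers to a governing headless Service by name.
            "serviceName": name,
            "replicas": replicas,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [{
                        "name": "squid",
                        "image": image,
                        "ports": [
                            {"name": "proxy", "containerPort": 3128,
                             "protocol": "TCP"},
                            {"name": "snmp", "containerPort": 3401,
                             "protocol": "UDP"},
                        ],
                        # Mount name must match the claim template below.
                        "volumeMounts": [
                            {"name": "cache", "mountPath": cache_path}
                        ],
                    }],
                },
            },
            # One PVC per replica, kept across pod restarts, so a restarted
            # pod reattaches its previous cache volume instead of an emptyDir.
            "volumeClaimTemplates": [{
                "metadata": {"name": "cache"},
                "spec": {
                    "accessModes": ["ReadWriteOnce"],
                    "resources": {"requests": {"storage": cache_size}},
                },
            }],
        },
    }


if __name__ == "__main__":
    print(json.dumps(squid_statefulset(), indent=2))
```

Whether squid actually reuses the on-disk cache after a restart is a separate question that this sketch does not answer.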

rptaylor commented on June 24, 2024

I liked the sciencebox squid chart; it is pretty simple and has a readiness probe.

> Are you going in person or remote?

That would be nice, but I'm not sure either will work, considering travel and time zones :/
