Helm charts for cvmfs-enabled clusters (cvmfs, open)

cvmfs commented on June 24, 2024
Helm charts for cvmfs-enabled clusters

from cvmfs.

Comments (9)

fbarreir commented on June 24, 2024

Hi @rptaylor

My squid chart is very basic and I only use it in a very small cluster, so it's fine for what I need. For my larger clusters I run squid outside the Kubernetes cluster. I was not aware of the sciencebox charts until this ticket. I didn't look at @ebocchi's squid chart, but I read his CVMFS client chart and liked many parts of it.

While the squid chart is not my first priority (I'm interested in the CVMFS client chart :), I agree that there is a lot of redundant work and many similar charts around. @rptaylor I see you are on the agenda for the CVMFS Workshop; that would be a good place to discuss this. Are you going in person or remote?

ebocchi commented on June 24, 2024

I am more than happy to donate and move "my" charts from ScienceBox to here. I fully agree there is some redundant work which should be consolidated.

@rptaylor, I don't see a problem in changing the squid chart to use StatefulSet instead of Deployment, and PVC for persistent caching. I wonder if the latter is really needed, though, as from my experience (it was some time ago, I admit) restarting a squid process would invalidate the on-disk cache. That's mainly the reason why there are no PVCs for now.
We can also look at how to expose port 3401 (SNMP on UDP, I guess) for monitoring.

jblomer commented on June 24, 2024

Links from Enrico:

ebocchi commented on June 24, 2024

There was also this OSG repo for frontier-squid in Docker, but it seems archived now.

The Helm chart in ScienceBox uses a DaemonSet. One pain point is mounting several repos in the same container: Autofs is a bad idea, and it is best practice to have one process (pid 1) running in one container. I (horribly) worked around it with supervisord to mount multiple repos. In the past I was doing something very similar to your mount_cvmfs.sh.
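To make the "mount several repos without autofs" idea concrete, here is a minimal sketch of what such a mount helper could look like. It is not the ScienceBox supervisord setup nor the mount_cvmfs.sh script mentioned above; the function names, the repository list, and the /cvmfs root are assumptions for illustration. It relies only on the standard `mount -t cvmfs <repo> <mountpoint>` form.

```python
import os
import subprocess
from typing import List

# Hypothetical repository list, for illustration only.
EXAMPLE_REPOS = ["atlas.cern.ch", "sft.cern.ch"]


def build_mount_commands(repos: List[str], root: str = "/cvmfs") -> List[List[str]]:
    """Return one explicit `mount -t cvmfs` argv per repository (no autofs)."""
    return [["mount", "-t", "cvmfs", repo, os.path.join(root, repo)] for repo in repos]


def mount_all(repos: List[str], root: str = "/cvmfs", dry_run: bool = True) -> None:
    """Create each mountpoint and mount it; with dry_run=True only print the commands.

    Real mounting needs root privileges and a configured CVMFS client.
    """
    for argv in build_mount_commands(repos, root):
        if dry_run:
            print(" ".join(argv))
            continue
        os.makedirs(argv[-1], exist_ok=True)  # last argv element is the mountpoint
        subprocess.run(argv, check=True)      # raises CalledProcessError on failure


if __name__ == "__main__":
    mount_all(EXAMPLE_REPOS, dry_run=True)
```

With `dry_run=True` this only prints the commands, which makes it easy to compare against whatever an entrypoint script currently does.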

Happy to further discuss this and contribute to better charts!

fbarreir commented on June 24, 2024

For context, I use Kubernetes as a batch system for ATLAS. Harvester (a component of PanDA) uses the Kubernetes Python API to submit and monitor pods with an ATLAS job running inside. I install CVMFS through a fork of Igor's PRP OSG driver (mentioned by Jakob above as 3rd party driver): https://github.com/PanDAWMS/prp-osg-cvmfs
I made some changes based on my operational experience and added a basic Helm chart. Ricardo told me that my version is used for the CERN IT cloud deployments, so it works OK, but not perfectly.

My fork also includes a Frontier squid deployment. I don't use that part so often, but maybe there's something that you want to extract from there.

For ATLAS we run the CVMFS pod with 7 repositories (atlas, atlas-nightlies, atlas-condb, sft, sft-nightlies, unpacked, grid). My main operational problem is that a small fraction of the mountpoints crash relatively frequently, in particular while the CVMFS pod is starting up. This produces a lousy failure rate when ramping up a dynamic cluster, and it gets into territory that I'm not an expert in. This is the part where I think that collaboration with the CVMFS team, and ultimately getting an official product, would be very useful.
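One way to catch the crashed mountpoints early would be a small check that a probe or a job wrapper runs before work starts. The following is only a sketch under assumptions: the function name is hypothetical, it assumes explicit (non-autofs) mounts under /cvmfs so that an unmounted repository shows up as an empty or missing directory, and the fully qualified repository names are my guess at the short names listed above.

```python
import os
from typing import List

# Assumed fully qualified names for the short names in the comment above.
ASSUMED_REPOS = [
    "atlas.cern.ch", "atlas-nightlies.cern.ch", "atlas-condb.cern.ch",
    "sft.cern.ch", "sft-nightlies.cern.ch", "unpacked.cern.ch", "grid.cern.ch",
]


def failed_mountpoints(repos: List[str], root: str = "/cvmfs") -> List[str]:
    """Return the repositories whose mountpoint cannot be listed or is empty.

    A crashed FUSE mount typically raises OSError on access; a directory that
    was never mounted is simply empty. Both are reported as failures.
    """
    failed = []
    for repo in repos:
        try:
            entries = os.listdir(os.path.join(root, repo))
        except OSError:
            failed.append(repo)
            continue
        if not entries:
            failed.append(repo)
    return failed


if __name__ == "__main__":
    print("failed mountpoints:", failed_mountpoints(ASSUMED_REPOS))
```

A readiness probe could wrap this and exit non-zero when the returned list is not empty, so that jobs are not scheduled against a half-mounted node.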

As a side note, I also observed that when I fill up an 80-vCPU node with many (40-80) single-core jobs, the CVMFS plugin starts to consume a lot of CPU and the jobs have a very low CPU efficiency. I'm not sure if this is related to the Kubernetes plugin or a general issue.

I'm happy to have an in-person or Zoom discussion to go through any details, and to contribute wherever I can. In the meantime I'm going to look at the links above; I was not aware of Enrico's solution.

elmsheus commented on June 24, 2024

@fbarreir, we've started a separate thread about the high CVMFS load observed in ATLAS software builds in #2879. This might be the same as, or similar to, the issue that you describe for the 80-vCPU case.

DrDaveD commented on June 24, 2024

There was also this OSG repo for frontier-squid in Docker, but it seems archived now.

It has just moved to https://github.com/opensciencegrid/images/tree/main/opensciencegrid/frontier-squid

rptaylor commented on June 24, 2024

@fbarreir I saw your squid chart. Is it substantially different from the sciencebox one maintained by @ebocchi, or was there missing functionality in the sciencebox one?
I also looked at the OSG chart.

None of these frontier-squid charts can both be scaled up for higher availability and load handling and avoid the problem of cold caches (emptyDir volumes) by using PVCs for storage. This is only possible by using a StatefulSet (with volumeClaimTemplates) instead of a Deployment.

I wonder if we could consolidate our efforts on maintaining and developing a frontier-squid helm chart, and in which direction.
I am mostly interested in a StatefulSet with PVCs, and possibly external access, but only for monitoring on port 3401.
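To illustrate the shape I have in mind, here is a rough sketch that builds such a StatefulSet manifest as a Python dict and prints it as JSON (which kubectl also accepts). It is not taken from any of the existing charts: the function name, image, cache path, cache size, and the proxy port 3128 are placeholder assumptions, and UDP for 3401 is assumed as well.

```python
import json


def squid_statefulset(name="frontier-squid", replicas=2,
                      image="example.org/frontier-squid:latest",  # placeholder image
                      cache_path="/var/cache/squid",              # assumed cache directory
                      cache_size="10Gi"):                         # assumed PVC size
    """Build a StatefulSet manifest with one cache PVC per replica."""
    labels = {"app": name}
    container = {
        "name": "squid",
        "image": image,
        "ports": [
            {"name": "proxy", "containerPort": 3128, "protocol": "TCP"},  # assumed port
            {"name": "snmp", "containerPort": 3401, "protocol": "UDP"},   # monitoring
        ],
        # Must match the volumeClaimTemplates entry name below.
        "volumeMounts": [{"name": "cache", "mountPath": cache_path}],
    }
    return {
        "apiVersion": "apps/v1",
        "kind": "StatefulSet",
        "metadata": {"name": name, "labels": labels},
        "spec": {
            "serviceName": name,  # headless Service that governs the pods
            "replicas": replicas,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {"containers": [container]},
            },
            # Each replica gets its own PVC, so the cache survives pod restarts.
            "volumeClaimTemplates": [{
                "metadata": {"name": "cache"},
                "spec": {
                    "accessModes": ["ReadWriteOnce"],
                    "resources": {"requests": {"storage": cache_size}},
                },
            }],
        },
    }


if __name__ == "__main__":
    print(json.dumps(squid_statefulset(), indent=2))
```

In a Helm chart the same structure would live in a template, with replicas, image, and cache size exposed as values. Whether the on-disk cache is actually reused after a squid restart still needs checking.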

rptaylor commented on June 24, 2024

I liked the sciencebox squid chart; it is pretty simple and has a readiness probe.

Are you going in person or remote?

That would be nice, but I'm not really sure if either will work, considering travel and time zones :/
