
werkt commented on July 3, 2024

Oh man, those are some gems. Sorry for the difficulty. I'll paraphrase here and update the docs where this prose appears:

How should we go about learning how to set up buildfarm? The Quickstart was easy enough, but I'm finding it difficult to go beyond the Quickstart. I'm trying to read the documentation, but finding it very challenging. The following might as well be written in a foreign language because I can't make sense of these sentences:

The topics covered below definitely don't describe setting up your first cluster. We'd recommend taking the helm charts or k8s configs (assuming that's how you got started) and inflating them into a cluster, but we do want you to make the choices that work for you as you expand in different directions (storage, execution, schedulers, etc.). The system is also flexible enough to just run the server and workers on more computers that can see each other on a network.

The realization of an operation’s execution root with the execution filesystem constitutes a transaction that the operating directory for an action will appear, be writable for outputs, and released and be made unavailable as it proceeds and exits the pipeline.

"The buildfarm worker creates a unique directory where all of an action's inputs can be found, laid out exactly as the action describes them. When the worker runs the action's command, the command can write to the output paths, also taken from that description. The worker creates this directory before execution and deletes it afterward. The complicated bit: while that directory exists, we prevent the CAS from removing any of the input files in it (this is eviction) when it is trying to free storage for new content. Before or after the lifetime of this directory, the CAS might evict its inputs."
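
A toy model can make the "complicated bit" concrete. The sketch below is not buildfarm's implementation (the real worker is written in Java and manages files on disk); it assumes an in-memory LRU store and models the execution directory as a context manager that pins each input blob with a reference count for as long as the directory exists. `PinningCAS` and `execution_directory` are hypothetical names.

```python
from collections import OrderedDict
from contextlib import contextmanager


class PinningCAS:
    """Hypothetical in-memory CAS with LRU eviction that skips pinned blobs."""

    def __init__(self, max_size_bytes):
        self.max_size_bytes = max_size_bytes
        self.blobs = OrderedDict()  # digest -> bytes, least recently used first
        self.pins = {}              # digest -> number of live execution directories using it
        self.size = 0

    def put(self, digest, data):
        if digest in self.blobs:
            self.blobs.move_to_end(digest)
            return
        self._make_room(len(data))
        self.blobs[digest] = data
        self.size += len(data)

    def _make_room(self, needed):
        # Evict least recently used blobs first, but never a pinned one.
        for digest in list(self.blobs):
            if self.size + needed <= self.max_size_bytes:
                return
            if self.pins.get(digest, 0) == 0:
                self.size -= len(self.blobs.pop(digest))
        if self.size + needed > self.max_size_bytes:
            raise RuntimeError("not enough unpinned space for the new blob")

    @contextmanager
    def execution_directory(self, input_digests):
        # Before execution: every input must be present, then pin each one.
        missing = [d for d in input_digests if d not in self.blobs]
        if missing:
            raise KeyError(missing)
        for d in input_digests:
            self.pins[d] = self.pins.get(d, 0) + 1
        try:
            yield {d: self.blobs[d] for d in input_digests}
        finally:
            # After execution: release the pins so the inputs become evictable again.
            for d in input_digests:
                self.pins[d] -= 1
                if self.pins[d] == 0:
                    del self.pins[d]
```

For example, with `max_size_bytes=10`, two 5-byte blobs `a` and `b`, and `a` pinned by an open `execution_directory`, storing a third 5-byte blob evicts `b` even though `a` is older. Once the `with` block exits, `a` can be evicted again.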

Workers can delegate to successive storage declarations (FILESYSTEM or GRPC), with read-through or expiration waterfall if configured, but only the first storage entry will be used for Executions.

"You can tell a worker's primary CAS ('A' for discussion here) about another CAS ('B' here) to alter how it delivers its content (blobs). 'B' can be another on-disk CAS, or a remote CAS URL (like --remote_cache=grpcs://host for bazel). In either case, when CAS 'A' does not contain a blob, it will try to retrieve it from CAS 'B', and when 'B' has it, 'A' will retain a copy of the content, presuming 'A' to be a desirable place to store it for the future; this behavior can be controlled with configuration. Requests for missing blobs are answered the same way: if either CAS 'A' or CAS 'B' contains a blob, it will not be reported as missing. Another configuration allows CAS 'A' to write blobs that expire from it to CAS 'B': when data pushed into CAS 'A' makes it overflow its maxSizeBytes, the blobs evicted to make room can be cascaded into CAS 'B' before they are deleted to free their space."
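
The delegation described above can be sketched as a chain of stores. This is a simplified in-memory model, not buildfarm's code: `DelegatingCAS`, `read_through`, and `cascade_expired` are hypothetical names standing in for the real configuration options, and 'B' is assumed to be any object with the same `get`/`put`/`find_missing` methods (in practice it could be another on-disk CAS or a remote gRPC one).

```python
from collections import OrderedDict


class DelegatingCAS:
    """Hypothetical CAS 'A' that can delegate to another CAS 'B'."""

    def __init__(self, max_size_bytes, delegate=None,
                 read_through=True, cascade_expired=True):
        self.max_size_bytes = max_size_bytes
        self.delegate = delegate                # CAS 'B', or None
        self.read_through = read_through        # keep copies of blobs fetched from 'B'
        self.cascade_expired = cascade_expired  # write evicted blobs into 'B'
        self.blobs = OrderedDict()              # digest -> bytes, least recently used first
        self.size = 0

    def get(self, digest):
        if digest in self.blobs:
            self.blobs.move_to_end(digest)
            return self.blobs[digest]
        if self.delegate is None:
            return None
        data = self.delegate.get(digest)
        # Retain a local copy, presuming 'A' is a good place to keep it.
        if data is not None and self.read_through and len(data) <= self.max_size_bytes:
            self.put(digest, data)
        return data

    def put(self, digest, data):
        if len(data) > self.max_size_bytes:
            raise ValueError("blob larger than max_size_bytes")
        if digest in self.blobs:
            self.blobs.move_to_end(digest)
            return
        while self.size + len(data) > self.max_size_bytes:
            old_digest, old_data = self.blobs.popitem(last=False)  # evict the LRU blob
            self.size -= len(old_data)
            if self.cascade_expired and self.delegate is not None:
                self.delegate.put(old_digest, old_data)  # waterfall into 'B' before dropping
        self.blobs[digest] = data
        self.size += len(data)

    def find_missing(self, digests):
        # A blob is reported missing only if neither 'A' nor 'B' holds it.
        missing = [d for d in digests if d not in self.blobs]
        if missing and self.delegate is not None:
            missing = self.delegate.find_missing(missing)
        return missing
```

The two flags mirror the two configurations described: with `read_through` off, 'A' serves blobs from 'B' without keeping copies, and with `cascade_expired` off, evicted blobs are simply dropped instead of being pushed into 'B'.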

"The primary CAS (the first one specified) must be a FILESYSTEM type, and all inputs for the actions that a worker fetches or executes at one time must fit within its configured maxSizeBytes."
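
Both constraints in that sentence lend themselves to a quick check. The sketch below assumes a worker's storage is given as an ordered list of dicts with hypothetical `type` and `max_size_bytes` keys (not buildfarm's actual config schema), and it counts an input shared by several in-flight actions only once, on the assumption that a content-addressed store keeps one copy per digest.

```python
def validate_worker_storage(storages):
    """Check that the first (primary) storage entry is a FILESYSTEM CAS."""
    if not storages or storages[0].get("type") != "FILESYSTEM":
        raise ValueError("the first storage entry must be of type FILESYSTEM")
    return storages[0]


def inputs_fit(in_flight_actions, max_size_bytes):
    """Return True if the inputs of all concurrently fetched/executing actions fit.

    in_flight_actions: iterable of dicts mapping input digest -> size in bytes.
    """
    unique_inputs = {}
    for inputs in in_flight_actions:
        unique_inputs.update(inputs)  # a digest shared by several actions counts once
    return sum(unique_inputs.values()) <= max_size_bytes


primary = validate_worker_storage([
    {"type": "FILESYSTEM", "max_size_bytes": 100},
    {"type": "GRPC"},
])
# Inputs a, b and c total 40 + 30 + 30 = 100 bytes, so this prints True.
print(inputs_fit([{"a": 40, "b": 30}, {"b": 30, "c": 30}], primary["max_size_bytes"]))
```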

Entry eviction is a registrable event for use as a storage for the delegated ActionCache, and Writes may be completed asynchronously by concurrent independent upload completion of an entry.

The first sentence is just restating the 'waterfall cascade' from 'A' to 'B' above, and actually mislabels it as 'ActionCache' (it just holds blobs).

"If two writes to the same blob name (Digest) are happening at the same time on a worker, the first one to complete will signal the others that they don't need to write any more data. This can happen at any point while a write is still in progress, so we prevent more writes than necessary from completing or staying active."
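
One way to picture this is a per-digest completion signal that every in-progress write checks between chunks. The sketch below is a hypothetical in-memory model built on Python's `threading.Event`; it is not buildfarm's implementation, and `WriteDeduplicator` is an illustrative name.

```python
import threading


class WriteDeduplicator:
    """Hypothetical model: the first write of a digest to finish stops the rest."""

    def __init__(self):
        self._lock = threading.Lock()
        self._done = {}   # digest -> Event set once any write completes (never cleaned up here)
        self.stored = {}  # digest -> bytes

    def _event_for(self, digest):
        with self._lock:
            return self._done.setdefault(digest, threading.Event())

    def write(self, digest, chunks):
        """Return True if this writer stored the blob, False if it stopped early."""
        done = self._event_for(digest)
        buffer = bytearray()
        for chunk in chunks:
            if done.is_set():
                return False  # another write already completed: send no more data
            buffer.extend(chunk)
        with self._lock:
            if done.is_set():
                return False
            self.stored[digest] = bytes(buffer)
            done.set()  # signal every other in-progress write of this digest
        return True
```

A writer that loses the race returns `False` at its next chunk boundary instead of sending the rest of its data, and any write of that digest that starts after the first completion stops immediately.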

Is the documentation intended for end-users? Maybe I'm just out of my depths. Thanks.

It is tricky, but some of these details are largely promises. There's not a whole lot of information here to act on: this mostly describes how we go about the business of storing data and the guarantees we make in order to perform executions. It serves as a reference for how buildfarm operates, for those who need to know.

A better approach is to start with a goal/problem - speed up the build, support more active users, store cache for longer, perform executions for more platforms, keep the actions you execute more isolated - and try to achieve/solve them with either questions here or by reading relevant documentation. Establish a requirement supporting 'learning buildfarm' and hopefully some of the docs we've got will lay out a map for how to do it. And if they don't, I'll write new ones :)

from bazel-buildfarm.

mtanida commented on July 3, 2024

Thank you so much for taking the time to write this up, @werkt! I really appreciate it, and your explanation is much easier to understand. IMO, all documentation should be written in this plain English style... but I understand people are often busy, so they focus on completeness and correctness over clarity. I am often guilty of that myself.

To answer your question, yes - I tried all of the ways of running buildfarm found in the Quickstart, including the helm install. The helm install worked fine... but we are currently on Ubuntu 22.04 and using gcc11, so I ran into some build errors.

"A better approach is to start with a goal/problem - speed up the build, support more active users, store cache for longer, perform executions for more platforms, keep the actions you execute more isolated - and try to achieve/solve them with either questions here or by reading relevant documentation. Establish a requirement supporting 'learning buildfarm' and hopefully some of the docs we've got will lay out a map for how to do it. And if they don't, I'll write new ones :)"

These words of guidance are very helpful. Thank you.
