Comments (2)
Oh man, those are some gems. Sorry for the difficulty, I'll paraphrase here and update the docs where this prose appears:
How should we go about learning how to set up buildfarm? The Quickstart was easy enough, but I'm finding it difficult to go beyond the Quickstart. I'm trying to read the documentation, but finding it very challenging. The following might as well be written in a foreign language because I can't make sense of these sentences:
The topics covered below definitely don't describe setting up your first cluster. We can recommend that you take the helm charts or k8s configs (assuming that's how you got started) and inflate them to a cluster, but we do want you to make the choices that work for you to expand in different directions (storage, execution, schedulers, etc.). The system is flexible enough to just run the server and workers on more computers that can see each other on a network.
The realization of an operation’s execution root with the execution filesystem constitutes a transaction that the operating directory for an action will appear, be writable for outputs, and released and be made unavailable as it proceeds and exits the pipeline.
"The buildfarm worker creates a unique directory where all of an action's inputs can be found, just as the action described them. When the worker runs the action's command, it can write to the output paths, also taken from the action's description. The worker will create this directory before execution, and delete it afterward. The complicated bit: while that directory exists, we prevent the CAS from removing any of the input files in it to make room for new content (this removal is eviction). Before or after the lifetime of this directory, the CAS might evict its inputs."
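A rough way to picture that lifetime is a toy CAS that pins inputs while an execution directory exists. This is only a sketch under stated assumptions, not buildfarm's implementation: `ToyCas`, `exec_root`, the pin counts, and the oldest-first eviction order are all hypothetical names and choices made for illustration.

```python
import tempfile
from collections import OrderedDict
from contextlib import contextmanager
from pathlib import Path


class ToyCas:
    """Hypothetical in-memory CAS with a size limit and pinned entries."""

    def __init__(self, max_size_bytes):
        self.max_size_bytes = max_size_bytes
        self.blobs = OrderedDict()  # digest -> bytes, oldest first
        self.pins = {}              # digest -> number of live exec roots using it
        self.size = 0

    def put(self, digest, data):
        if digest in self.blobs:
            return
        # Evict the oldest unpinned blobs until the new one fits.
        for victim in list(self.blobs):
            if self.size + len(data) <= self.max_size_bytes:
                break
            if self.pins.get(victim, 0) > 0:
                continue  # input of a live exec root: not evictable
            self.size -= len(self.blobs.pop(victim))
        if self.size + len(data) > self.max_size_bytes:
            raise MemoryError("nothing left to evict")
        self.blobs[digest] = data
        self.size += len(data)

    @contextmanager
    def exec_root(self, input_digests):
        """Create a directory holding the inputs; pin them until it is deleted."""
        missing = [d for d in input_digests if d not in self.blobs]
        if missing:
            raise KeyError(missing)
        for d in input_digests:
            self.pins[d] = self.pins.get(d, 0) + 1
        try:
            with tempfile.TemporaryDirectory(prefix="exec-root-") as root:
                for d in input_digests:
                    (Path(root) / d).write_bytes(self.blobs[d])
                yield Path(root)
        finally:
            # The directory is gone; its inputs become evictable again.
            for d in input_digests:
                self.pins[d] -= 1
```

Inside `with cas.exec_root([...]) as root:` the command would run with `root` as its working directory. A `put` that needs space during that window skips the pinned inputs and evicts something else.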
Workers can delegate to successive storage declarations (FILESYSTEM or GRPC), with read-through or expiration waterfall if configured, but only the first storage entry will be used for Executions.
"You can tell a worker's primary CAS ('A' for discussion here) about another CAS ('B' here), to alter how it delivers its content (blobs). 'B' can be another on-disk CAS, or a remote CAS URL (like --remote_cache=grpcs://host for bazel). In either case, when CAS 'A' does not contain a blob, it will try to retrieve it from CAS 'B'. When that succeeds, 'A' will retain a copy of the content, presuming CAS 'A' to be a desirable place to store it for the future; this can be controlled with configuration. Requests for missing blobs behave the same way: if CAS 'A' or CAS 'B' contains a blob, it will not be reported as missing. Another configuration allows CAS 'A' to write blobs that expire from it to CAS 'B': when data pushed into CAS 'A' makes it overflow its maxSizeBytes, the blobs that we evict to make room can be cascaded into CAS 'B' before they are deleted to free their space."
"The primary CAS - the first one specified - must be a FILESYSTEM type, and all inputs for the actions that a worker fetches or executes at one time must fit within the maxSizeBytes that it has available."
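The 'A' and 'B' relationship above can be sketched in a few lines. This is a minimal illustration, not buildfarm's code or configuration: `DictCas`, `DelegatingCas`, and the `read_through` and `waterfall` flags are hypothetical names standing in for whatever the real settings are called.

```python
from collections import OrderedDict


class DictCas:
    """Hypothetical stand-in for CAS 'B' (on-disk or remote)."""

    def __init__(self):
        self.blobs = {}

    def get(self, digest):
        return self.blobs.get(digest)

    def put(self, digest, data):
        self.blobs[digest] = data

    def contains(self, digest):
        return digest in self.blobs


class DelegatingCas:
    """Hypothetical CAS 'A': size-limited, with an optional delegate 'B'."""

    def __init__(self, max_size_bytes, delegate=None,
                 read_through=True, waterfall=False):
        self.max_size_bytes = max_size_bytes
        self.delegate = delegate
        self.read_through = read_through  # keep a copy of blobs fetched from 'B'
        self.waterfall = waterfall        # push evicted blobs into 'B'
        self.blobs = OrderedDict()        # digest -> bytes, oldest first
        self.size = 0

    def get(self, digest):
        if digest in self.blobs:
            return self.blobs[digest]
        if self.delegate is None:
            return None
        data = self.delegate.get(digest)
        if data is not None and self.read_through:
            self.put(digest, data)
        return data

    def find_missing(self, digests):
        # A blob present in either 'A' or 'B' is not reported as missing.
        return [d for d in digests
                if d not in self.blobs
                and not (self.delegate is not None and self.delegate.contains(d))]

    def put(self, digest, data):
        if digest in self.blobs:
            return
        while self.blobs and self.size + len(data) > self.max_size_bytes:
            victim, victim_data = self.blobs.popitem(last=False)
            self.size -= len(victim_data)
            if self.waterfall and self.delegate is not None:
                self.delegate.put(victim, victim_data)  # cascade before dropping
        self.blobs[digest] = data
        self.size += len(data)
```

With `read_through=True`, a miss in 'A' that hits in 'B' leaves a copy in 'A'. With `waterfall=True`, whatever 'A' evicts to stay under its size limit lands in 'B' first.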
Entry eviction is a registrable event for use as a storage for the delegated ActionCache, and Writes may be completed asynchronously by concurrent independent upload completion of an entry.
The first sentence is just restating the 'waterfall cascade' from 'A' to 'B' above, and actually mislabels it as 'ActionCache' (it just holds blobs).
"If two writes to the same blob name (Digest) are happening at the same time on a worker, the first one to complete will signal the rest that they don't need to write any more data, and this can happen at any point while the write is still in progress - we will prevent more writes than necessary from completing or being active."
Is the documentation intended for end-users? Maybe I'm just out of my depth. Thanks.
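One way to picture that behavior is a store where every in-progress write checks, before each chunk, whether another write for the same digest has already finished. This is a sketch under assumptions, not how buildfarm does it internally: `BlobWrites`, its `write` method, and the boolean return value are hypothetical.

```python
import threading


class BlobWrites:
    """Hypothetical store where the first completed write per digest wins."""

    def __init__(self):
        self._lock = threading.Lock()
        self._committed = {}  # digest -> bytes

    def is_committed(self, digest):
        with self._lock:
            return digest in self._committed

    def read(self, digest):
        with self._lock:
            return self._committed.get(digest)

    def write(self, digest, chunks):
        """Consume chunks; return True if this call stored the blob,
        False if a concurrent write for the same digest finished first."""
        buffer = bytearray()
        for chunk in chunks:
            if self.is_committed(digest):
                return False  # someone else completed: stop sending data
            buffer.extend(chunk)
        with self._lock:
            if digest in self._committed:
                return False
            self._committed[digest] = bytes(buffer)
            return True
```

A writer that gets `False` back can treat its upload as done, since the content for that digest is already stored in full.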
It is tricky, but some of these details are largely promises - there's not a whole lot of information here to act on, this mostly describes how we go about the business of storing data, and the guarantees we make to perform executions - it serves as a reference for how buildfarm operates, for those who need to know.
A better approach is to start with a goal/problem - speed up the build, support more active users, store cache for longer, perform executions for more platforms, keep the actions you execute more isolated - and try to achieve/solve them with either questions here or by reading relevant documentation. Establish a requirement supporting 'learning buildfarm' and hopefully some of the docs we've got will lay out a map for how to do it. And if they don't, I'll write new ones :)
from bazel-buildfarm.
Thank you so much for taking the time to write this up, @werkt! I really appreciate it! And your explanation is much easier to understand. IMO, I think all documentation should be written in this plain English style... but I understand people are often busy so they focus on completeness and correctness over clarity. I am often guilty of that myself.
To answer your question, yes - I tried all of the ways of running buildfarm found in the Quickstart, including the helm install. Helm install worked fine... but we are currently on Ubuntu 22.04 and using gcc11 so I ran into some build errors.
A better approach is to start with a goal/problem - speed up the build, support more active users, store cache for longer, perform executions for more platforms, keep the actions you execute more isolated - and try to achieve/solve them with either questions here or by reading relevant documentation. Establish a requirement supporting 'learning buildfarm' and hopefully some of the docs we've got will lay out a map for how to do it. And if they don't, I'll write new ones :)
These words of guidance are very helpful. Thank you.