Comments (3)
Capturing some miscellaneous thoughts from interacting with the matrices on rapidsai/shared-workflows#166.
In general, I think whatever decisions we make about the support matrix should be:
- encouraged via shared variables in configuration
- enforced in `shared-workflows` CI
idea 1: matrices in workflow configs should make more use of shared variables
For example, it seems that we only do `conda` builds with 1 Python version, want to cover one minor version per CUDA major version, and want to cover the previous and current CUDA major versions (code link).
Today that looks like:
export MATRIX="
- { CUDA_VER: '11.8.0', LINUX_VER: 'ubuntu22.04', ARCH: 'amd64', PY_VER: '3.10' }
- { CUDA_VER: '11.8.0', LINUX_VER: 'ubuntu22.04', ARCH: 'arm64', PY_VER: '3.10' }
- { CUDA_VER: '12.0.1', LINUX_VER: 'ubuntu22.04', ARCH: 'amd64', PY_VER: '3.10' }
- { CUDA_VER: '12.0.1', LINUX_VER: 'ubuntu22.04', ARCH: 'arm64', PY_VER: '3.10' }
"
The intention there would be clearer, in my opinion, like this:
BUILD_OS="ubuntu22.04"
CUDA_PREVIOUS="11.8.0"
CUDA_CURRENT="12.0.1"
PYTHON_VERSION="3.10"
export MATRIX="
- { CUDA_VER: '${CUDA_PREVIOUS}', LINUX_VER: '${BUILD_OS}', ARCH: 'amd64', PY_VER: '${PYTHON_VERSION}' }
- { CUDA_VER: '${CUDA_PREVIOUS}', LINUX_VER: '${BUILD_OS}', ARCH: 'arm64', PY_VER: '${PYTHON_VERSION}' }
- { CUDA_VER: '${CUDA_CURRENT}', LINUX_VER: '${BUILD_OS}', ARCH: 'amd64', PY_VER: '${PYTHON_VERSION}' }
- { CUDA_VER: '${CUDA_CURRENT}', LINUX_VER: '${BUILD_OS}', ARCH: 'arm64', PY_VER: '${PYTHON_VERSION}' }
"
That'd make the patterns more obvious and reduce the diffs for changes like updating CUDA or Python version.
If the post-processing of those variables that's already done with `yq` / `jq` (code link) also resolved duplicates, then for other matrices you also wouldn't have to think about the difference between the state "we're only supporting 1 minor version of CUDA like `12.0.1`" and "we're supporting 2 minor versions like `12.0.1` and `12.2.2`".
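As a rough illustration of the duplicate-resolution idea, here is a minimal sketch in plain bash and `awk`. The variable names `CUDA_CURRENT_OLDEST` and `CUDA_CURRENT_LATEST` and the `dedupe_matrix` helper are hypothetical, not taken from `shared-workflows`, and the real post-processing is done with `yq` / `jq` rather than `awk`.

```shell
#!/usr/bin/env bash
# Hypothetical shared variables (names are assumptions for illustration only).
BUILD_OS="ubuntu22.04"
PYTHON_VERSION="3.10"
CUDA_CURRENT_OLDEST="12.0.1"
CUDA_CURRENT_LATEST="12.0.1"   # change to e.g. 12.2.2 to cover two minor versions

MATRIX="
- { CUDA_VER: '${CUDA_CURRENT_OLDEST}', LINUX_VER: '${BUILD_OS}', ARCH: 'amd64', PY_VER: '${PYTHON_VERSION}' }
- { CUDA_VER: '${CUDA_CURRENT_LATEST}', LINUX_VER: '${BUILD_OS}', ARCH: 'amd64', PY_VER: '${PYTHON_VERSION}' }
"

# Drop blank lines and keep only the first occurrence of each entry.
# This is a purely textual comparison, one matrix entry per line.
dedupe_matrix() {
  printf '%s\n' "$1" | awk 'NF && !seen[$0]++'
}

DEDUPED_MATRIX="$(dedupe_matrix "${MATRIX}")"
printf '%s\n' "${DEDUPED_MATRIX}"
```

With both variables set to the same version, the two entries collapse into one, so the matrix definition itself would not need to change when a second minor version is added or dropped. Because the comparison is textual, entries that differ only in whitespace or key order would not be treated as duplicates; deduplicating after parsing (for example with `jq`'s `unique` filter) would avoid that limitation.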
idea 2: desirable properties of matrices should be enforced in CI
Here are some constraints I can imagine being desirable:
- "no identical matrix configurations"
- "no more than `n` total `arm64` jobs"
- "at least 1 job for each of the operating systems we support"
- "at least 1 test job on PRs per CUDA `{major}.{minor}` that we support"
Instead of relying on code comments or convention, I think it's worth considering whether those constraints could be enforced in `shared-workflows` CI.
I'm imagining here a little script that reads in the matrix configuration, renders the full matrices, and then asserts all the conditions we want to be true and raises a big loud error if any are violated.
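To make that concrete, here is a minimal sketch of such a check in bash. It assumes the matrix has already been rendered to one flow-style entry per line, as in the examples above. The `check_matrix` function and the `MAX_ARM64_JOBS`, `REQUIRED_OS`, and `REQUIRED_CUDA` settings are hypothetical names and values chosen for illustration; nothing like them exists in `shared-workflows` today as far as I know.

```shell
#!/usr/bin/env bash
# Hypothetical policy settings (assumptions, not real shared-workflows config).
MAX_ARM64_JOBS=2
REQUIRED_OS="ubuntu22.04"      # space-separated list of supported operating systems
REQUIRED_CUDA="11.8 12.0"      # space-separated list of supported CUDA {major}.{minor}

# Returns 0 if every constraint holds, 1 otherwise. Errors go to stderr.
check_matrix() {
  local entries dupes arm64_count os cuda errors=0
  # Keep only non-blank lines: one matrix entry per line.
  entries="$(printf '%s\n' "$1" | awk 'NF')"

  # Constraint: no identical matrix configurations.
  dupes="$(printf '%s\n' "${entries}" | sort | uniq -d)"
  if [ -n "${dupes}" ]; then
    echo "ERROR: duplicate matrix entries:" >&2
    printf '%s\n' "${dupes}" >&2
    errors=1
  fi

  # Constraint: no more than MAX_ARM64_JOBS total arm64 jobs.
  arm64_count="$(printf '%s\n' "${entries}" | grep -cF "ARCH: 'arm64'" || true)"
  if [ "${arm64_count}" -gt "${MAX_ARM64_JOBS}" ]; then
    echo "ERROR: ${arm64_count} arm64 jobs exceeds the limit of ${MAX_ARM64_JOBS}" >&2
    errors=1
  fi

  # Constraint: at least 1 job for each supported operating system.
  for os in ${REQUIRED_OS}; do
    if ! printf '%s\n' "${entries}" | grep -qF "LINUX_VER: '${os}'"; then
      echo "ERROR: no job for operating system ${os}" >&2
      errors=1
    fi
  done

  # Constraint: at least 1 job per supported CUDA {major}.{minor}.
  for cuda in ${REQUIRED_CUDA}; do
    if ! printf '%s\n' "${entries}" | grep -qF "CUDA_VER: '${cuda}."; then
      echo "ERROR: no job for CUDA ${cuda}" >&2
      errors=1
    fi
  done

  return "${errors}"
}

GOOD_MATRIX="
- { CUDA_VER: '11.8.0', LINUX_VER: 'ubuntu22.04', ARCH: 'amd64', PY_VER: '3.10' }
- { CUDA_VER: '12.0.1', LINUX_VER: 'ubuntu22.04', ARCH: 'arm64', PY_VER: '3.10' }
"
check_matrix "${GOOD_MATRIX}" && echo "matrix OK"
```

In CI, the caller would exit non-zero when `check_matrix` fails so that the job is marked as failed. A real implementation would probably parse the YAML properly instead of matching text, but the shape would be the same: render, assert, fail loudly.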
from build-planning.
One other passing thought: right now, all wheels are tested against only the latest driver supported on the NVIDIA-hosted runners.
We should consider how much, if any, coverage of older drivers we want when testing wheels.
> Instead of relying on code comments or convention, I think it's worth considering whether those constraints could be enforced in shared-workflows CI.
> I'm imagining here a little script that reads in the matrix configuration, renders the full matrices, and then asserts all the conditions we want to be true and raises a big loud error if any are violated.
I really like this idea in theory. I don't know how well it'll work in practice, but I definitely think it's worth experimenting with.