
dependency-file-generator's Introduction

rapids-dependency-file-generator

rapids-dependency-file-generator is a Python CLI tool that generates conda environment.yaml files and requirements.txt files from a single YAML file, typically named dependencies.yaml.

When installed, it makes the rapids-dependency-file-generator CLI command available, which parses a dependencies.yaml configuration file and generates the appropriate conda environment.yaml and requirements.txt dependency files.


Installation

rapids-dependency-file-generator is available on PyPI. To install, run:

pip install rapids-dependency-file-generator

Usage

When rapids-dependency-file-generator is invoked, it reads a dependencies.yaml file from the current directory and generates the corresponding dependency files.

The dependencies.yaml file has the following characteristics:

  • it is intended to be committed to the root directory of repositories
  • it can define matrices that enable the output dependency files to vary according to any arbitrary specification (or combination of specifications), including CUDA version, machine architecture, Python version, etc.
  • it contains bifurcated lists of dependencies based on each dependency's purpose (e.g. build, runtime, test). The bifurcated dependency lists are merged according to the description in the How Dependency Lists Are Merged section below.

dependencies.yaml Format

The Examples section below has instructions on where example dependencies.yaml files and their corresponding output can be viewed.

The dependencies.yaml file has three relevant top-level keys: files, channels, and dependencies. These keys are described in detail below.

files Key

The top-level files key is responsible for determining the following:

  • which types of dependency files should be generated (conda environment.yaml files, requirements.txt files, and/or pyproject.toml files)
  • where the generated files should be written to (relative to the dependencies.yaml file)
  • which variant files should be generated (based on the provided matrix)
  • which of the dependency lists from the top-level dependencies key should be included in the generated files

Here is an example of what the files key might look like:

files:
  all: # used as the prefix for the generated dependency file names for conda or requirements files (has no effect on pyproject.toml files)
    output: [conda, requirements] # which dependency file types to generate. required, can be "conda", "requirements", "pyproject", "none" or a list of non-"none" values
    conda_dir: conda/environments # where to put conda environment.yaml files. optional, defaults to "conda/environments"
    requirements_dir: python/cudf # where to put requirements.txt files. optional, but recommended. defaults to "python"
    pyproject_dir: python/cudf # where to put pyproject.toml files. optional, but recommended. defaults to "python"
    matrix: # (optional) contains an arbitrary set of key/value pairs to determine which dependency files should be generated. These values are included in the output filename.
      cuda: ["11.5", "11.6"] # which CUDA version variant files to generate.
      arch: [x86_64] # which architecture version variant files to generate. This value should be the result of running the `arch` command on a given machine.
    includes: # a list of keys from the `dependencies` section which should be included in the generated files
      - build
      - test
      - runtime
  build: # multiple `files` children keys can be specified
    output: requirements
    conda_dir: conda/environments
    requirements_dir: python/cudf
    matrix:
      cuda: ["11.5"]
      arch: [x86_64]
      py: ["3.8"]
    includes:
      - build

The result of the above configuration is that the following dependency files would be generated:

  • conda/environments/all_cuda-115_arch-x86_64.yaml
  • conda/environments/all_cuda-116_arch-x86_64.yaml
  • python/cudf/requirements_all_cuda-115_arch-x86_64.txt
  • python/cudf/requirements_all_cuda-116_arch-x86_64.txt
  • python/cudf/requirements_build_cuda-115_arch-x86_64_py-38.txt

The all*.yaml and requirements_all*.txt files would include the contents of the build, test, and runtime dependency lists from the top-level dependencies key. The requirements_build*.txt file would only include the contents of the build dependency list from the top-level dependencies key.

The value of output can also be none as shown below.

files:
  test:
    output: none
    includes:
      - test

When output: none is used, the conda_dir, requirements_dir and matrix keys can be omitted. The use case for output: none is described in the Additional CLI Notes section below.

extras

A given file may include an extras entry that provides inputs specific to a particular file type.

Here is an example:

files:
  build:
    output: pyproject
    includes: # a list of keys from the `dependencies` section which should be included in the generated files
      - build
    extras:
      table: table_name
      key: key_name

Currently the supported extras by file type are:

  • pyproject.toml
    • table: The table in pyproject.toml where the dependencies should be written. Acceptable values are "build-system", "project", and "project.optional-dependencies".
    • key: The key corresponding to the dependency list in table. This may only be provided for the "project.optional-dependencies" table since the key name is fixed for "build-system" ("requires") and "project" ("dependencies"). Note that this implicitly prohibits including optional dependencies via an inline table under the "project" table.
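For example, with table: "project.optional-dependencies" and key: "test", the generated dependencies would land in a pyproject.toml section roughly like this (a sketch; the package shown is illustrative):

[project.optional-dependencies]
test = [
    "pytest",
] # This list was generated by `rapids-dependency-file-generator`.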

channels Key

The top-level channels key specifies the channels that should be included in any generated conda environment.yaml files.

It might look like this:

channels:
  - rapidsai
  - conda-forge

In the absence of a channels key, some sensible defaults for RAPIDS will be used (see constants.py).

dependencies Key

The top-level dependencies key is where the bifurcated dependency lists should be specified.

Underneath the dependencies key are sets of key-value pairs. For each pair, the key can be arbitrarily named, but should match an item from the includes list of any files entry.

The value of each key-value pair can have the following children keys:

  • common - contains dependency lists that are the same across all matrix variations
  • specific - contains dependency lists that are specific to a particular matrix combination

The values of each of these keys are described in detail below.

common Key

The common key contains a list of objects with the following keys:

  • output_types - a list of output types (e.g. "conda" for environment.yaml files or "requirements" for requirements.txt files) for the packages in the packages key
  • packages - a list of packages to be included in the generated output file

specific Key

The specific key contains a list of objects with the following keys:

  • output_types - same as output_types for the common key above
  • matrices - a list of objects (described below) which define packages that are specific to a particular matrix combination

matrices Key

Each list item under the matrices key contains a matrix key and a packages key. The matrix key is used to define which matrix combinations from files.[*].matrix will use the associated packages. The packages key is a list of packages to be included in the generated output file for a matching matrix. This is elaborated on in How Dependency Lists Are Merged.

An example of the above structure is shown below:

dependencies:
  build: # dependency list name
    common: # dependencies common among all matrix variations
      - output_types: [conda, requirements] # the output types this list item should apply to
        packages:
          - common_build_dep
      - output_types: conda
        packages:
          - cupy
          - pip: # supports `pip` key for conda environment.yaml files
              - some_random_dep
    specific: # dependencies specific to a particular matrix combination
      - output_types: conda # dependencies specific to conda environment.yaml files
        matrices:
          - matrix:
              cuda: "11.5"
            packages:
              - cudatoolkit=11.5
          - matrix:
              cuda: "11.6"
            packages:
              - cudatoolkit=11.6
          - matrix: # an empty matrix entry serves as a fallback if there are no other matrix matches
            packages:
              - cudatoolkit
      - output_types: [conda, requirements]
        matrices:
          - matrix: # dependencies specific to x86_64 and 11.5
              cuda: "11.5"
              arch: x86_64
            packages:
              - a_random_x86_115_specific_dep
          - matrix: # an empty matrix/packages entry prevents an error from being thrown for combinations other than 11.5 and x86_64
            packages:
      - output_types: requirements # dependencies specific to requirements.txt files
        matrices:
          - matrix:
              cuda: "11.5"
            packages:
              - another_random_dep=11.5.0
          - matrix:
              cuda: "11.6"
            packages:
              - another_random_dep=11.6.0
  test:
    common:
      - output_types: [conda, requirements]
        packages:
          - pytest

How Dependency Lists Are Merged

The information from the top-level files and dependencies keys is used to determine which dependencies should be included in the final output of the generated dependency files.

Consider the following top-level files key configuration:

files:
  all:
    output: conda
    conda_dir: conda/environments
    requirements_dir: python/cudf
    matrix:
      cuda: ["11.5", "11.6"]
      arch: [x86_64]
    includes:
      - build
      - test

In this example, rapids-dependency-file-generator will generate two conda environment files: conda/environments/all_cuda-115_arch-x86_64.yaml and conda/environments/all_cuda-116_arch-x86_64.yaml.

Since the output value is conda, rapids-dependency-file-generator will iterate through any dependencies.build.common and dependencies.test.common list entries and use the packages of any entry whose output_types key is conda or [conda, ...].

Further, for the 11.5 and x86_64 matrix combination, any build.specific and test.specific list items whose output_types include conda and whose matrices list items match any of the definitions below would also be merged:

specific:
  - output_types: conda
    matrices:
      - matrix:
          cuda: "11.5"
        packages:
          - some_dep1
          - some_dep2
# or
specific:
  - output_types: conda
    matrices:
      - matrix:
          cuda: "11.5"
          arch: "x86_64"
        packages:
          - some_dep1
          - some_dep2
# or
specific:
  - output_types: conda
    matrices:
      - matrix:
          arch: "x86_64"
        packages:
          - some_dep1
          - some_dep2

Every matrices list must have a match for a given input matrix (only the first matching matrix in the list of matrices will be used). If no matches are found for a particular matrix combination, an error will be thrown. In instances where an error should not be thrown, an empty matrix and packages list item can be used:

- output_types: conda
  matrices:
    - matrix:
        cuda: "11.5"
        arch: x86_64
        py: "3.8"
      packages:
        - a_very_specific_115_x86_38_dep
    - matrix: # an empty matrix entry serves as a fallback if there are no other matrix matches
      packages:

Merged dependency lists are sorted and deduped.
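For instance, combining the files example above with the dependencies and channels examples from earlier, the cuda=11.5/arch=x86_64 output file conda/environments/all_cuda-115_arch-x86_64.yaml would contain roughly the following (a sketch; the generator's header comment is omitted):

name: all_cuda-115_arch-x86_64
channels:
  - rapidsai
  - conda-forge
dependencies:
  - a_random_x86_115_specific_dep
  - common_build_dep
  - cudatoolkit=11.5
  - cupy
  - pytest
  - pip:
      - some_random_dep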

Additional CLI Notes

Invoking rapids-dependency-file-generator without any arguments is meant to be the default behavior for RAPIDS developers. It will generate all of the necessary dependency files as specified in the top-level files configuration.

However, there are CLI arguments that can augment the files configuration values before the files are generated.

Consider the example when output: none is used:

files:
  test:
    output: none
    includes:
      - test

The test file generated by the configuration above is useful for CI, but it might not make sense to commit those files to a repository. In such a scenario, the following CLI arguments can be used:

ENV_NAME="cudf_test"

rapids-dependency-file-generator \
  --file-key "test" \
  --output "conda" \
  --matrix "cuda=11.5;arch=$(arch)" > env.yaml
mamba env create --name "$ENV_NAME" --file env.yaml
mamba activate "$ENV_NAME"

# install cudf packages built in CI and test them in newly created environment...

The --file-key argument is passed the test key name from the files configuration. Additional flags are used to generate a single dependency file. When the CLI is used in this fashion, it will print to stdout instead of writing the resulting contents to the filesystem.

The --file-key, --output, and --matrix flags must be used together. --matrix may be an empty string if the file that should be generated does not depend on any specific matrix variations.

Where multiple values for the same key are passed to --matrix, e.g. cuda_suffixed=true;cuda_suffixed=false, only the last value will be used.

The --prepend-channel argument accepts additional channels to use, like rapids-dependency-file-generator --prepend-channel my_channel --prepend-channel my_other_channel. If both --output and --prepend-channel are provided, the output format must be conda. Prepending channels can be useful for adding local channels with packages to be tested in CI workflows.

Running rapids-dependency-file-generator -h will show the most up-to-date CLI arguments.

Examples

The tests/examples directory has example dependencies.yaml files along with their corresponding output files.

To create new example tests do the following:

  • Create a new directory with a dependencies.yaml file in tests/examples
  • Ensure the output directories (conda_dir, requirements_dir, etc.) are set to write to output/actual
  • Run rapids-dependency-file-generator --config tests/examples/<new_folder_name>/dependencies.yaml to generate the initial output files
  • Manually inspect the generated files for correctness
  • Copy the contents of output/actual to output/expected, so it will be committed to the repository and used as a baseline for future changes
  • Add the new folder name to test_examples.py
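As a rough sketch of that workflow (the example name my_example is hypothetical):

mkdir tests/examples/my_example
# write tests/examples/my_example/dependencies.yaml with output dirs pointing at output/actual
rapids-dependency-file-generator --config tests/examples/my_example/dependencies.yaml
# after manually inspecting the generated files, promote them to the expected baseline
cp -r tests/examples/my_example/output/actual tests/examples/my_example/output/expected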

dependency-file-generator's People

Contributors

ajschmidt8, ayodeawe, bdice, jameslamb, kylefromnvidia, mdemoret-nv, renovate[bot], semantic-release-bot, trxcllnt, vyasr


dependency-file-generator's Issues

pkg_resources is deprecated and removed in Python 3.12

CI failed with the error:

ModuleNotFoundError: No module named 'pkg_resources'

https://github.com/rapidsai/cuxfilter/actions/runs/6400270892/job/17373650755?pr=542#step:6:172

This is because the style check job used Python 3.12 which removed setuptools, pkg_resources, and some other libraries by default.

We need to migrate this code to use importlib.resources instead.
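A minimal sketch of the replacement, using the importlib.resources.files API (the package and file names mirror those used elsewhere in this project):

from importlib.resources import files

# Read the packaged schema without pkg_resources (requires Python >= 3.9).
SCHEMA = files("rapids_dependency_file_generator").joinpath("schema.json").read_bytes()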

Enforce `mypy` checks

Description

The Python code in this project should be checked with mypy. It isn't currently checked by any static type-checker.

Benefits of this work

  • validates the type hints, improving their usefulness as documentation
  • increases release confidence via improving the chance of some types of bugs being caught during development
  • reduces the effort required to review PRs

Acceptance Criteria

  • mypy is configured to run over this project's Python code via pre-commit

Approach

Keep as much configuration as possible in pyproject.toml, the rest in pre-commit's config. e.g.

https://github.com/rapidsai/cudf/blob/149253b2e9f3801fdcc88c17e31a25788fe6381a/pyproject.toml#L3

https://github.com/rapidsai/cudf/blob/149253b2e9f3801fdcc88c17e31a25788fe6381a/.pre-commit-config.yaml#L32

Notes

As of this writing, the latest version of mypy (v1.10), run like this in a Python 3.10 environment ...

mypy \
    --ignore-missing-imports \
    --explicit-package-bases \
    ./src

... yields the following

src/rapids_dependency_file_generator/_config.py:7: error: Library stubs not installed for "yaml"  [import-untyped]
src/rapids_dependency_file_generator/_config.py:7: note: Hint: "python3 -m pip install types-PyYAML"
src/rapids_dependency_file_generator/_config.py:7: note: (or run "mypy --install-types" to install all missing stub packages)
src/rapids_dependency_file_generator/_config.py:7: note: See https://mypy.readthedocs.io/en/stable/running_mypy.html#missing-imports
src/rapids_dependency_file_generator/_config.py:171: error: Argument 1 to "_parse_outputs" has incompatible type "object"; expected "str | list[str]"  [arg-type]
src/rapids_dependency_file_generator/_config.py:173: error: No overload variant of "list" matches argument type "object"  [call-overload]
src/rapids_dependency_file_generator/_config.py:173: note: Possible overload variants:
src/rapids_dependency_file_generator/_config.py:173: note:     def [_T] __init__(self) -> list[_T]
src/rapids_dependency_file_generator/_config.py:173: note:     def [_T] __init__(self, Iterable[_T], /) -> list[_T]
src/rapids_dependency_file_generator/_config.py:174: error: "object" has no attribute "items"  [attr-defined]
src/rapids_dependency_file_generator/_config.py:175: error: Argument 1 to "Path" has incompatible type "object"; expected "str | PathLike[str]"  [arg-type]
src/rapids_dependency_file_generator/_config.py:176: error: Argument 1 to "Path" has incompatible type "object"; expected "str | PathLike[str]"  [arg-type]
src/rapids_dependency_file_generator/_config.py:177: error: Argument 1 to "Path" has incompatible type "object"; expected "str | PathLike[str]"  [arg-type]
src/rapids_dependency_file_generator/_config.py:185: error: Argument "pip" to "PipRequirements" has incompatible type "str"; expected "list[str]"  [arg-type]
src/rapids_dependency_file_generator/_config.py:195: error: "object" has no attribute "__iter__"; maybe "__dir__" or "__str__"? (not iterable)  [attr-defined]
src/rapids_dependency_file_generator/_config.py:208: error: "object" has no attribute "__iter__"; maybe "__dir__" or "__str__"? (not iterable)  [attr-defined]
src/rapids_dependency_file_generator/_config.py:244: error: "object" has no attribute "items"  [attr-defined]
src/rapids_dependency_file_generator/_config.py:246: error: "object" has no attribute "items"  [attr-defined]
src/rapids_dependency_file_generator/_rapids_dependency_file_generator.py:10: error: Library stubs not installed for "yaml"  [import-untyped]
src/rapids_dependency_file_generator/_rapids_dependency_file_generator.py:33: error: Value of type variable "AnyStr" of "walk" cannot be "PathLike[Any] | Any"  [type-var]
src/rapids_dependency_file_generator/_rapids_dependency_file_generator.py:34: error: Need type annotation for "fn"  [var-annotated]
src/rapids_dependency_file_generator/_rapids_dependency_file_generator.py:34: error: Argument 1 to "filter" has incompatible type "Callable[[Any], Any]"; expected "Callable[[PathLike[Any] | Any], TypeGuard[Never]]"  [arg-type]
src/rapids_dependency_file_generator/_rapids_dependency_file_generator.py:34: error: Item "PathLike[Any]" of "PathLike[Any] | Any" has no attribute "endswith"  [union-attr]
src/rapids_dependency_file_generator/_rapids_dependency_file_generator.py:65: error: Argument 1 to "append" of "list" has incompatible type "dict[str, list[str]]"; expected "str"  [arg-type]
src/rapids_dependency_file_generator/_rapids_dependency_file_generator.py:66: error: Incompatible return value type (got "list[str]", expected "list[str | dict[str, str]]")  [return-value]
src/rapids_dependency_file_generator/_rapids_dependency_file_generator.py:66: note: "List" is invariant -- see https://mypy.readthedocs.io/en/stable/common_issues.html#variance
src/rapids_dependency_file_generator/_rapids_dependency_file_generator.py:66: note: Consider using "Sequence" instead, which is covariant
src/rapids_dependency_file_generator/_rapids_dependency_file_generator.py:66: note: Perhaps you need a type annotation for "deduped"? Suggestion: "list[str | dict[str, str]]"
src/rapids_dependency_file_generator/_rapids_dependency_file_generator.py:69: error: "Generator" expects 3 type arguments, but 1 given  [type-arg]
src/rapids_dependency_file_generator/_rapids_dependency_file_generator.py:144: error: Argument 1 to "join" of "str" has incompatible type "list[str | dict[str, list[str]]]"; expected "Iterable[str]"  [arg-type]
src/rapids_dependency_file_generator/_rapids_dependency_file_generator.py:183: error: Incompatible types in assignment (expression has type "Item | Container", variable has type "TOMLDocument")  [assignment]
src/rapids_dependency_file_generator/_rapids_dependency_file_generator.py:188: error: "TOMLDocument" has no attribute "is_super_table"  [attr-defined]
src/rapids_dependency_file_generator/_rapids_dependency_file_generator.py:191: error: Incompatible types in assignment (expression has type "Item | Container", variable has type "TOMLDocument")  [assignment]
src/rapids_dependency_file_generator/_rapids_dependency_file_generator.py:425: error: Argument "dependencies" to "make_dependency_file" has incompatible type "list[str | dict[str, str]]"; expected "list[str | dict[str, list[str]]]"  [arg-type]
src/rapids_dependency_file_generator/_rapids_dependency_file_generator.py:426: error: Argument "extras" to "make_dependency_file" has incompatible type "FileExtras | None"; expected "FileExtras"  [arg-type]
Found 26 errors in 2 files (checked 7 source files)

And another 25 or so errors if using mypy --strict.

AttributeError with Python 3.8

I get an AttributeError with Python 3.8, but not with 3.10.

Repro:

conda create -n rdfg_test python=3.8
conda activate rdfg_test
pip install rapids-dependency-file-generator
rapids-dependency-file-generator --help

Error:

Traceback (most recent call last):
  File "/home/dagardner/work/conda/envs/rdfg_test/bin/rapids-dependency-file-generator", line 5, in <module>
    from rapids_dependency_file_generator.cli import main
  File "/home/dagardner/work/conda/envs/rdfg_test/lib/python3.8/site-packages/rapids_dependency_file_generator/cli.py", line 12, in <module>
    from .rapids_dependency_file_validator import validate_dependencies
  File "/home/dagardner/work/conda/envs/rdfg_test/lib/python3.8/site-packages/rapids_dependency_file_generator/rapids_dependency_file_validator.py", line 12, in <module>
    importlib.resources.files(__package__).joinpath("schema.json").read_bytes()
AttributeError: module 'importlib.resources' has no attribute 'files'
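importlib.resources.files() was only added in Python 3.9, so a fallback is needed on 3.8. A minimal sketch of one possible fix, assuming the importlib_resources backport is an acceptable extra dependency:

import sys

if sys.version_info >= (3, 9):
    from importlib.resources import files
else:
    # Backport package providing the same files() API on older Pythons.
    from importlib_resources import files

SCHEMA = files("rapids_dependency_file_generator").joinpath("schema.json").read_bytes()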

Implement Semantic Release

It'd be cool if we could implement semantic-release for this repository.

Doing so would allow us to pin to a major version of rapids-dependency-file-generator in our CI images to prevent releases with breaking changes from breaking CI.

Support meta.yaml, pyproject.toml and versions.json

In addition to conda requirements files, the rapids dependency file generator needs to support a few additional use cases. In order of priority, these are:

  • pyproject.toml: We need to be able to encode build-time dependencies into pyproject.toml files for Python packages. This will become much more important when we start officially supporting pip installation. At present most requirements information encoded in setup.py or pyproject.toml is essentially ignored by all our conda-only build processes, but that is going to change soon.
  • meta.yaml: We need to be able to write out the appropriate sections of meta.yaml. Currently the generator will produce the env files that are helpful for developers or for use in different CI stages, but the conda package definition files are still manually maintained and could go out of sync.
  • versions.json: We need to be able to write out versions to some sort of dependency file that can be handled in CMake. This necessity arises because there are some dependencies (treelite for cuML is the one that comes to mind) that are needed both at C++ compile time -- which means they are brought in via CPM.cmake -- and at Python run/test time -- which means they must be handled by setuptools. I propose a versions.json file because that is already essentially supported out of the box by rapids-cmake. I'll note that at this time treelite is the only obvious example of this that I see, so supporting this is not crucial. However, I think it would be minimal extra work on top of what we would already have to do to enable pyproject.toml, so I think it would be worth it. If we go this route, we should follow the rapids-cmake spec for versions.json.

Enabling the above will require making the following changes (in roughly this order based on which tasks depend on others as well as their relative priorities):

  1. Generalizing the dependencies.yml file format. Currently, the file format hardcodes the conda_and_requirements key as the way to specify requirements that are common to both types of dependency files. Instead of this, we will need to modify the file to have something like a file_types key that accepts an array of values and writes those requirements to all the associated files. That would allow one set of entries to be shared across an arbitrary subset of different files.
  2. Enabling read-modify-write behavior rather than a simple overwrite behavior in the generator. Currently the generator simply reads dependencies.yml, generates complete dependency lists, and then writes out those lists to the appropriate files, overwriting any preexisting files. While that is appropriate for requirements.txt and environment.yml files, that is not appropriate for pyproject.toml files or meta.yaml files. The latter types of files could contain additional information beyond a simple list of dependencies and that information must persist.
  3. Implement writing of TOML files. The TOML package is the easiest choice here.
  4. Write out the meta.yaml dependencies sections into new sub yml files, then use conda-build's jinja functions to read those files in for each section.
  5. Implement writing of JSON files. We can just use the built-in json module.
  6. Update rapids-cmake to support reading arbitrary JSON files. That should just require extracting some of the logic from rapids_cpm_load_preset_versions into a separate function.

Some additional notes and caveats:

  • The version styles used by CPM.cmake do not support the sorts of complex version string used by setuptools/conda. Instead these version strings are hardcoded to a specific number. However, I do not think this is an issue because in cases where we hardcode a dependency for CMake we should force Python to use the same exact version (no fuzzy deps), so those dependencies can always be written as $PKG=$VERSION rather than $PKG>x,<y etc.
  • CPM downloads require a git tag/branch. I'm not yet sure how we would want to support this. It may require adding some keys to the appropriate section. The best outcome here is likely that we will maintain separate sections for versions.json and Python requirements for a given dependency, and the versions.json section will support the additional keys like git tag/branch, but then we'll use a YAML anchor inside dependencies.yml containing the version so that there remains only a single source of truth.
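For reference, a hypothetical versions.json entry for treelite, loosely following the rapids-cmake spec (the version pin shown is illustrative):

{
  "packages": {
    "Treelite": {
      "version": "3.2.0",
      "git_url": "https://github.com/dmlc/treelite.git",
      "git_tag": "${version}"
    }
  }
}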

Some suggested updates for scikit-build-core

I was looking at the config and noticed some possible updates, but didn't know where to make them. For configs like this:

[build-system]
build-backend = "scikit_build_core.build"
requires = [
    "cmake>=3.26.4",
    "cython>=3.0.3",
    "ninja",
    "numpy==1.23.*",
    "pyarrow==16.0.0.*",
    "rmm==24.6.*",
    "scikit-build-core[pyproject]>=0.7.0",
] # This list was generated by `rapids-dependency-file-generator`. To make changes, edit ../../dependencies.yaml and run `rapids-dependency-file-generator`.

The cmake and ninja items should be removed. The scikit_build_core.build backend adds them intelligently based on what's already on the system and what you set in tool.scikit-build.

If you increase the scikit-build-core version here to >=0.9, you can drop the [pyproject] extra; it's included now.

And, for that section:

[tool.scikit-build]
build-dir = "build/{wheel_tag}"
cmake.build-type = "Release"
cmake.minimum-version = "3.26.4"
ninja.make-fallback = true
sdist.exclude = ["*tests*"]
sdist.reproducible = true
wheel.packages = ["cudf"]

[tool.scikit-build.metadata.version]
provider = "scikit_build_core.metadata.regex"
input = "cudf/VERSION"
regex = "(?P<value>.*)"

I'd recommend setting the minimum-version to 0.9 and changing cmake.minimum-version = "3.26.4" to cmake.version = ">=3.26.4" (the minimum-version value will enforce this if set high enough). This also protects the defaults, in case you want to remove a few of them. The minimal configuration would then be:

[tool.scikit-build]
minimum-version = "0.9"
build-dir = "build/{wheel_tag}"
cmake.version = ">=3.26.4"
sdist.exclude = ["*tests*"]
wheel.packages = ["cudf"]

[tool.scikit-build.metadata.version]
provider = "scikit_build_core.metadata.regex"
input = "cudf/VERSION"
regex = "(?P<value>.*)"

Raise on duplicate keys

Recently we had an issue in cuCIM where the same key appeared twice (rapidsai/cucim#757). Unfortunately, we didn't realize it until something didn't behave correctly.

It would be helpful to catch these cases in advance by having RDFG raise an error if it sees duplicate keys.
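One possible implementation sketch (not the project's actual code): PyYAML allows a custom loader that rejects duplicate keys while constructing mappings.

import yaml

class UniqueKeyLoader(yaml.SafeLoader):
    """A SafeLoader that raises instead of silently keeping the last duplicate key."""

    def construct_mapping(self, node, deep=False):
        seen = set()
        for key_node, _ in node.value:
            key = self.construct_object(key_node, deep=deep)
            if key in seen:
                raise ValueError(f"Duplicate key found in dependencies.yaml: {key!r}")
            seen.add(key)
        return super().construct_mapping(node, deep=deep)

# Usage: yaml.load(text, Loader=UniqueKeyLoader) raises on duplicate keys.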

Add support for release assets

#29 adds a schema file for dependencies.json. One of the keys in this file is the schema id, which is linked to the package version. We should look into using https://github.com/semantic-release/git to support updating the version in this schema id upon release so that it stays consistent when a PR merges. Without that, the PR merge will immediately bump the package version and invalidate the id.

Ensure that versions are PEP 440 compliant

In the long run all of our tooling will need to generate and consume PEP 440 compliant versions. Currently conda supports PEP 440 along with various extensions to the syntax for historical reasons (and in some cases, convenience), but that will eventually become untenable as wheel builds and pip packaging become more central to our workflows. Once dfg becomes a true single source of truth for versions, it would also make sense to add version linting as a validation step to ensure that all versions in dependencies.yaml are compliant. Version normalization is fairly easy to do by simply constructing a packaging.version.Version and ensuring that the string representations have not changed (there may also be a helper function to check this directly).
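The normalization check described above is only a few lines (a sketch using the packaging library):

from packaging.version import InvalidVersion, Version

def is_pep440_normalized(version_str: str) -> bool:
    """True iff the string parses as PEP 440 and is already in normalized form."""
    try:
        return str(Version(version_str)) == version_str
    except InvalidVersion:
        return False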

Support matrix globs

Feature proposal: matrix: entries should be interpreted as globs, so that matrix: cuda: "11.*" can match any of 11.2, 11.4, 11.5, 11.8. This will be really helpful when delineating CUDA 11 from CUDA 12, so that we don't have to specify every version individually if the match is major-version-only and not specific to minor versions.

We would keep the current behavior of taking the first match. If a specific match is to take precedence, it should be listed before the glob match. e.g. 11.8 before 11.*.

Example:

  cuda-python:
    specific:
      - output_types: conda
        matrices:
          - matrix:
              cuda: "11.*"
            packages:
              - cuda-python>=11.7.0
          - matrix:
              cuda: "12.*"
            packages:
              - cuda-python>=12.1.0
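If globs were matched with the standard library's fnmatch (an assumption about the implementation, not a settled design), the semantics would be:

from fnmatch import fnmatch

fnmatch("11.5", "11.*")  # True: matrix entry "11.*" matches CUDA 11.5
fnmatch("12.0", "11.*")  # False
fnmatch("11.8", "11.8")  # True: exact entries keep working unchanged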

Support matrix entries for pyproject.toml

Currently dfg explicitly prohibits matrix entries when generating pyproject.toml files. This partially makes sense: there can only ever be one pyproject.toml file at a time, so there is no value in being able to generate multiple files in one run (i.e. when dfg is run without arguments). However, there is a use case for being able to control the matrix entry used from the command line. One application of this is if we have different Python package requirements for different CUDA versions (e.g. cupy-cuda11x vs cupy-cuda12x). We would like for the packaging process to be able to leverage dfg to generate the appropriate dependency list before a package is built in this case. To support this we should allow specifying multiple matrix entries in the dependencies section even if one of the output_types is pyproject.toml. We should continue to prohibit a matrix entry in the files section, instead only generating by default the file corresponding to the default matrix entries. Note that this will require a fallback to be present for all dependency lists that specify pyproject as an output type and have a matrix section.

dfg does not clean up nonexistent lists in dependencies.yaml

The pyproject.toml support is based on overwriting the file in place. If a particular table/key that was previously being overwritten is removed altogether from dependencies.yaml, however, the resulting list will not be removed from the file. For instance, if all test dependencies were to be removed from a dependencies.yaml file, the project.optional-dependencies.test key would continue to exist in the pyproject.toml file.

Currently this problem is fairly academic since there are almost no cases where a project has 0 dependencies in any section, and it is even less likely that a project with nonzero dependencies removes all of them. I am documenting this issue for now, but there is no immediate rush to fix it. The solution would likely be to erase all dependency-related keys at the beginning of every run, similar to how the cli cleans up files.

Initial feedback

I'm just using this issue to track some initial feedback that I have on the code in this repo. I will attempt to make PRs for some things that are obvious improvements and can clean up this list as I go, but this is a useful starting point for collecting various unrelated thoughts.

  • pytorch is listed twice in the build list
  • The supported set of matrix components and the way that they are mapped to the final file/env name is currently an implementation detail. That should be documented somewhere, as well as a process for extending the matrix if we need to add additional components.
  • The includes subsection of the envs section is documented as "the names of the keys whose lists should be included in the final environment file". Where are these keys searched? Both the specifics/$MATRIX section and at the top level (so that a top-level build is found)? I think it would probably be cleaner if the top-level search was instead moved into a generics or general section to more directly parallel the specifics section. Alternatively, we could do something like includes/$MATRIX/build for matrix-specific build requirements, and includes/build for matrix-agnostic (aka generic) build requirements. Thoughts?
  • Is there ever a case where we need separate channels for separate environments? For example, could we need rapidsai vs rapidsai-nightly? If so, is that supported?
  • I would recommend putting configuration information and constants like the default channels, the env name (e.g. the string f"cuda-{cuda_version}-arch-{aarch}"), etc into a single module where it's easy to look up and modify rather than integrating them directly into the code.

One bigger picture question, which I know that I have returned to a few times with @ajschmidt8 but I think is worth raising here since the generator offers some new potential solutions: how do we want to handle requirements.txt files for non-conda users? I was originally thinking that we'd need to maintain those requirements separately. However, looking at the generator, I'm wondering if we could simply introduce an extra section or two that corresponds to pip-specific requirements, use the same sort of merging logic, and then at the end use a separate file generator (not PyYAML) to write out a requirements.txt-compliant file?

pre-commit hook will not detect new files

The pre-commit hook is designed around deleting all previously existing generated files and then generating new ones. If the end result is only a modification or deletion of existing files, this change will be noted by pre-commit and trigger hook failure, and developers may then simply stage the new changes and proceed. If, however, the hook produces new files, for instance if a new file key or matrix entry is added to dependencies.yaml, the file creation will not be treated as a failure of the hook. In order to remedy this, we need to modify rapids-dependency-file-generator to return a nonzero exit code if it produces new files. If we do not want to have dfg always behave this way (e.g. if we expect to script with it outside of pre-commit and don't want to be stopped by nonzero exit codes in some cases) we could either use a command line argument or simply provide an additional command line entry point just for the pre-commit hook.

The most elegant solution here would be to use GitPython to check the list of untracked files before and after, but we could also use the same os.walk + text parse strategy used for cleaning to get the list of generated files before and after. I assume we would prefer that to introducing the GitPython dependency, since we already took that approach for cleaning.
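A sketch of the os.walk-based approach (the HEADER string and find_generated_files helper are assumptions mirroring the header scan the cleaning logic already performs):

import os
import sys

HEADER = "This file is generated by `rapids-dependency-file-generator`"  # assumed marker

def find_generated_files(root="."):
    """Collect files carrying the generator's header, like the cleaner does."""
    found = []
    for dirpath, _, filenames in os.walk(root):
        for fn in filenames:
            if not fn.endswith((".txt", ".yaml")):
                continue
            path = os.path.join(dirpath, fn)
            try:
                with open(path) as f:
                    if HEADER in f.read():
                        found.append(path)
            except UnicodeDecodeError:
                pass
    return found

before = set(find_generated_files("."))
# ... run the generator here ...
new_files = set(find_generated_files(".")) - before
if new_files:
    print("New generated files detected:")
    print("\n".join(sorted(new_files)))
    sys.exit(1)  # signal pre-commit hook failure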

Accept empty `--matrix` values as CLI flag

Currently, if you run the following command:

rapids-dependency-file-generator \
  --output conda \
  --file_key test \
  --matrix "" 

you'll get an error that says:

Traceback (most recent call last):
  File "/home/aj/miniconda3/bin/rapids-dependency-file-generator", line 33, in <module>
    sys.exit(load_entry_point('rapids-dependency-file-generator', 'console_scripts', 'rapids-dependency-file-generator')())
  File "/home/aj/code/nvidia/dependency-file-generator/src/rapids_dependency_file_generator/cli.py", line 63, in main
    args = validate_args(argv)
  File "/home/aj/code/nvidia/dependency-file-generator/src/rapids_dependency_file_generator/cli.py", line 44, in validate_args
    raise ValueError(
ValueError: The following arguments must be used together:
  --file_key
  --output
  --matrix

Empty --matrix values should probably be allowed. One use case is for environments generated for the check_style.sh scripts, where most of the dependencies will just be pure Python dependencies that don't vary per CUDA version, Python version, or architecture.

Make the boundary between public and private in the API clearer

Details

The ongoing work on rapids-build-backend relies on using rapids-dependency-file-generator as an importable module, not a CLI, like this

from rapids_dependency_file_generator.cli import generate_matrix
from rapids_dependency_file_generator.constants import default_pyproject_dir
from rapids_dependency_file_generator.rapids_dependency_file_generator import (
    get_requested_output_types,
    make_dependency_files,
)

ref: https://github.com/rapidsai/rapids-build-backend/pull/17/files#r1542068869

Given that, I think some work should be done to make the boundary between public and private parts of the API here clearer.

Approach

  • use __all__ entries in every module of the library
    • to limit which things should be imported with * imports
  • use an __all__ entry in the top-level __init__.py
    • to define what is importable directly from the rapids_dependency_file_generator module without needing to know about submodules
    • to limit what is imported by from rapids_dependency_file_generator import *
  • prefix all internal-only functions, classes, methods, and other objects with _

Benefits of this work

Clarifies the boundary between public and private, reducing the need for tight coordination between rapids-build-backend and rapids-dependency-file-generator for some types of changes.

Makes it easier for tools to help enforce those boundaries.

Other notes

Consider the following.

only '__version__' is importable from the top-level namespace today
pip install .
python -c "from rapids_dependency_file_generator import *; print(dir())"
['__annotations__', '__builtins__', '__doc__', '__loader__', '__name__', '__package__', '__spec__', '__version__']

Only __version__ is exported from the top-level module

That means that things like rapids-build-backend that want to import other things have imports coupled to the exact file layout in this project, like this.

from rapids_dependency_file_generator.rapids_dependency_file_generator import (
    get_requested_output_types,
    make_dependency_files,
)

In my opinion, it'd be better to re-export that stuff from the main module, so that rapids-build-backend could do this

from rapids_dependency_file_generator import (
    get_requested_output_types,
    make_dependency_files
)

And so we could freely re-arrange the modules inside this library without breaking rapids-build-backend.

Exposing submodule import paths is really helpful in projects with large APIs like sklearn or scipy, but for this small library I think it'd be better to push downstream consumers to just import from the top-level namespace.

submodules are re-exporting all their imports

Consider the following

pip install .
python -c "from rapids_dependency_file_generator.cli import *; print(dir())"
['Output', '__annotations__', '__builtins__', '__doc__', '__loader__', '__name__', '__package__', '__spec__', 'argparse', 'default_dependency_file_path', 'delete_existing_files', 'generate_matrix', 'load_config_from_file', 'main', 'make_dependency_files', 'os', 'validate_args', 'version', 'warnings']

It's technically possible right now to do something like this:

from rapids_dependency_file_generator.cli import argparse

That should be discouraged, via an __all__ entry in src/rapids_dependency_file_generator/cli.py (and all the other submodules).
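A sketch of what that could look like for cli.py, using the public names visible in the dir() output above:

# src/rapids_dependency_file_generator/cli.py
__all__ = [
    "generate_matrix",
    "main",
    "validate_args",
]

With that in place, from rapids_dependency_file_generator.cli import * would no longer leak incidental imports like argparse and os.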

Rendering a nested pyproject.toml file to stdout is broken

Discovered this today while working on rapidsai/cudf#15245.

On the latest version of dependency-file-generator (v1.13.4), rendering a nested pyproject.toml to stdout fails.

Reproducible Example

git clone git@github.com:rapidsai/cudf.git
cd cudf
git checkout b810113d

rapids-dependency-file-generator \
  --output pyproject \
  --file-key py_run_cudf \
  --matrix "cuda=12.2"

Fails like this:

Traceback (most recent call last):
  File "/Users/jlamb/miniforge3/envs/rapids-dev/lib/python3.10/site-packages/rapids_dependency_file_generator/_rapids_dependency_file_generator.py", line 183, in make_dependency_file
    table = table[section]
  File "/Users/jlamb/miniforge3/envs/rapids-dev/lib/python3.10/site-packages/tomlkit/container.py", line 624, in __getitem__
    item = self.item(key)
  File "/Users/jlamb/miniforge3/envs/rapids-dev/lib/python3.10/site-packages/tomlkit/container.py", line 466, in item
    raise NonExistentKey(key)
tomlkit.exceptions.NonExistentKey: 'Key "project" does not exist.'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/jlamb/miniforge3/envs/rapids-dev/bin/rapids-dependency-file-generator", line 10, in <module>
    sys.exit(main())
  File "/Users/jlamb/miniforge3/envs/rapids-dev/lib/python3.10/site-packages/rapids_dependency_file_generator/_cli.py", line 151, in main
    make_dependency_files(parsed_config, file_keys, output, matrix, args.prepend_channels, to_stdout)
  File "/Users/jlamb/miniforge3/envs/rapids-dev/lib/python3.10/site-packages/rapids_dependency_file_generator/_rapids_dependency_file_generator.py", line 419, in make_dependency_files
    contents = make_dependency_file(
  File "/Users/jlamb/miniforge3/envs/rapids-dev/lib/python3.10/site-packages/rapids_dependency_file_generator/_rapids_dependency_file_generator.py", line 188, in make_dependency_file
    if not table.is_super_table():
AttributeError: 'TOMLDocument' object has no attribute 'is_super_table'

Notes

I believe the specific issue is right here:

output_dir = "." if to_stdout else get_output_dir(file_type, parsed_config.path, file_config)

[FEA] Conditional Channel Support

We're having issues with our DGL dependency: DGL uses conda labels to split CUDA versions, so we need a different channel specification per CUDA version, which dependencies.yaml currently does not allow. Adding support would let us better handle DGL testing for CUDA 12.

Related: dmlc/dgl#7344 rapidsai/cugraph#4346
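One hypothetical shape this could take (entirely illustrative; no such syntax exists today) is allowing channels alongside packages in a matrix entry:

specific:
  - output_types: conda
    matrices:
      - matrix:
          cuda: "12.*"
        channels: # hypothetical key, not currently supported
          - dglteam/label/cu121 # illustrative label-scoped channel
        packages:
          - dgl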

Tool fails when using `--clean` if files with non-unicode characters exist in the repo

If you run rapids-dependency-file-generator --clean in a repo containing any file that is not valid UTF-8, it will generate the following exception:

Traceback (most recent call last):
  File "${VIRTUAL_ENV}/bin/rapids-dependency-file-generator", line 8, in <module>
    sys.exit(main())
  File "${VIRTUAL_ENV}/lib/python3.10/site-packages/rapids_dependency_file_generator/cli.py", line 103, in main
    delete_existing_files(args.clean)
  File "${VIRTUAL_ENV}/lib/python3.10/site-packages/rapids_dependency_file_generator/rapids_dependency_file_generator.py", line 40, in delete_existing_files
    if HEADER in f.read():
  File "${VIRTUAL_ENV}/lib/python3.10/codecs.py", line 322, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xa0 in position 6038: invalid start byte

There should be a try block around reading the file to ignore files that cannot be decoded. Something like this fixed it for me:

def delete_existing_files(root="."):
    """Delete any files generated by this generator.

    This function can be used to clean up a directory tree before generating a new set
    of files from scratch.

    Parameters
    ----------
    root : str
        The path to the root of the directory tree to search for files to delete.
    """
    for dirpath, _, filenames in os.walk(root):
        for fn in filter(
            lambda fn: fn.endswith(".txt") or fn.endswith(".yaml"), filenames
        ):
            try:
                with open(file_path := os.path.join(dirpath, fn)) as f:
                    if HEADER in f.read():
                        os.remove(file_path)
            except UnicodeDecodeError:
                pass

idea: linting dependencies.yaml files via '--strict'

Description

I just manually updated a LOT of dependencies.yaml files across RAPIDS (rapidsai/build-planning#31), and found myself checking for some things by eye that should be possible to enforce with a linter.

This issue proposes adding a --strict argument to rapids-dependency-file-generator, which enforces more standardization across dependencies.yaml files.

Benefits of this work

  • greater standardization of dependencies.yaml files should reduce the effort to do all-of-RAPIDS migrations (either manually or with rapids-reviser)
  • may help to reduce complexity of the files
  • could save some reviewer effort (e.g. the need to leave comments that are about personal preference and style)

Acceptance Criteria

  • rapids-dependency-file-generator enforces some linting checks on dependencies.yaml files
  • that enforcement is opt-in (off by default)
  • that enforcement is deployed across all RAPIDS repos .pre-commit-config.yaml configurations using rapids-dependency-file-generator as a hook

Approach

I'm proposing that if a flag --strict is passed to rapids-dependency-file-generator, it check the content of the YAML file passed to --config and exit with a non-zero code if any of a set of opt-in linting rules are violated.

I'll list some initial ideas I have here. I'm sure others will have more. For most of these, I have lightly-held opinions about which should be the preferred pattern, and care more that there be some preferred pattern and a tool to automatically enforce that preference.

  • preferring indented lists to inline [] lists
# this
packages:
  - rmm-cu12

# not this
packages: [rmm-cu12]
  • preferring indented maps to inline {} maps
# this
matrix:
  cuda: "12.*"

# not this:
matrix: {"cuda": "12.*"}
  • preferring explicit : null to implicit (which might catch some mistakes)
# this
- matrix: null
  packages: null

# not this
- matrix:
  packages:
  • disallowing any anchors that are never re-used
# failing if this is never met by a corresponding '*cupy_cu12'
- matrix:
    cuda: "12.*"
  packages:
    - &cupy_cu12 cupy-cuda12x>=12.0.0

Notes

This proposal is inspired by mypy --strict (mypy docs).

And by conversations like this: rapidsai/rmm#1627 (comment)

Remove "--file_key" and "--prepend_channels"

Description

In #71, the arguments --file_key and --prepend-channels were deprecated.

Uses of them across RAPIDS should be removed, and then they and the corresponding deprecation warnings here should be removed.

Benefits of this work

  • simplifies rapids-dependency-file-generator
  • removes a source of noise in CI logs across RAPIDS projects

Acceptance Criteria

  • 0 non-archived projects in the rapidsai org use --file_key or --prepend-channels in their calls to rapids-dependency-file-generator
  • arguments --file_key and --prepend-channels do not exist in rapids-dependency-file-generator

Approach

Across the rapidsai GitHub org:

  • replace --file_key with --file-key (note the -)
  • replace --prepend-channels with one or more --prepend-channel

For example this:

rapids-dependency-file-generator \
  --output conda \
  --file_key test_cpp \
  --matrix "cuda=12.2" \
  --prepend-channels "${CPP_CHANNEL};pytorch"

Should become this:

rapids-dependency-file-generator \
  --output conda \
  --file-key test_cpp \
  --matrix "cuda=12.2" \
  --prepend-channel "${CPP_CHANNEL}" \
  --prepend-channel "pytorch"

Notes

N/A

Make the `--matrix` flag value be YAML

Right now the --matrix flag accepts ; delimited key/value pairs like this: --matrix "cuda=11.5;arch=$(arch)"

We should switch the --matrix value to be YAML to be consistent with dependencies.yaml.

e.g. --matrix '{"cuda": "11.6", "arch": "x86_64"}'

Dependency Dashboard

This issue lists Renovate updates and detected dependencies. Read the Dependency Dashboard docs to learn more.


Detected dependencies

github-actions
.github/actions/semantic-release/action.yaml
  • actions/setup-node v4
.github/workflows/build-test.yaml
  • actions/checkout v4
  • actions/upload-artifact v4
  • actions/checkout v4
  • actions/upload-artifact v4
  • python 3.9
.github/workflows/fix-style.yaml
  • actions/checkout v4
  • actions/setup-python v5
  • EndBug/add-and-commit v9
.github/workflows/release.yaml
  • actions/checkout v4
  • actions/checkout v4
  • actions/download-artifact v4
  • actions/download-artifact v4
.github/workflows/semantic-pr-title.yaml
  • amannn/action-semantic-pull-request v5
npm
package.json
  • @semantic-release/exec ^6.0.3
  • @semantic-release/git ^10.0.1
  • semantic-release ^24.0.0
pep621
pyproject.toml


Generated environments need unique names

The generated conda environment files have names like name: all_cuda-115_arch-x86_64. If I install this environment with conda, it is not clear whether the dependency list came from rmm or cudf or another library, and the names will conflict between libraries.

We should make these names unique, perhaps by including a name: field in the dependencies.yaml file that contains a string like "rmm" or "cudf". Then the generated environments would be named like rmm_all_cuda-115_arch-x86_64.
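A sketch of the proposed configuration (the top-level name key is hypothetical, not a current feature):

name: rmm # hypothetical key used to prefix generated environment names
files:
  all:
    output: conda
    matrix:
      cuda: ["11.5"]
      arch: [x86_64]
    includes:
      - test

This would produce an environment named rmm_all_cuda-115_arch-x86_64.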

(Also as a side note, we can't parse the file names / environment names to retrieve the matrix values because the underscore separator is used in x86_64, making it impossible to split on _. 🙃)
