anemoi-models's Introduction

anemoi-models

DISCLAIMER This project is BETA and will be Experimental for the foreseeable future. Interfaces and functionality are likely to change, and the project itself may be scrapped. DO NOT use this software in any project/software that is operational.

Miscellaneous tools for training data-driven weather forecasts.

Documentation

The documentation can be found at https://anemoi-models.readthedocs.io/.

Install

Install via pip with:

$ pip install anemoi-models

License

Copyright 2022, European Centre for Medium Range Weather Forecasts.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.

In applying this licence, ECMWF does not waive the privileges and immunities
granted to it by virtue of its status as an intergovernmental organisation
nor does it submit to any jurisdiction.

anemoi-models's People

Contributors

b8raoult, gmertes, jesperdramsch, jpxkqx, mchantry, ssmmnn11, theissenhelen

Forkers

leifdenby metno

anemoi-models's Issues

Proposal of expansion of anemoi-wide pre-commit hooks and ruff ruleset for code quality, security and maintainability

Is your feature request related to a problem? Please describe.

Maintainability and code quality are important for keeping anemoi alive and well and for avoiding technical debt.

Pre-commit hooks are an easy way to automatically flag, and where possible fix, these issues. Beyond that, we can run pre-commit in GitHub Actions to check code compliance automatically on every change.

We already implement many pre-commit hooks, but from experience I would propose the following hooks in addition:

Describe the solution you'd like

  1. Use pygrep-hooks to enforce type annotations instead of type comments, check for blanket noqa's (they should be specific to what they exempt) and check for uses of the deprecated log.warn
-   repo: https://github.com/pre-commit/pygrep-hooks
    rev: v1.10.0  # Use the ref you want to point at
    hooks:
    -   id: python-use-type-annotations # Enforce type annotations over type comments
    -   id: python-check-blanket-noqa # Check for blanket "# noqa" without specific codes
    -   id: python-no-log-warn # Check for deprecated log.warn
  2. Use docsig to check docstrings against function signature
-   repo: https://github.com/jshwi/docsig # Check docstrings against function sig
    rev: v0.44.2
    hooks:
    -   id: docsig
        args:
        - --ignore-no-params # Allow docstrings without parameters
        - --check-dunders    # Check dunder methods
        - --check-overridden # Check overridden methods
        - --check-protected  # Check protected methods
        - --check-class      # Check class docstrings
        - --disable=E113     # Disable empty docstrings
        - --summary          # Print a summary

Currently there is no formal check that docstrings actually match the function they describe. With the settings above, docstrings are only checked when they document parameters, and a missing docstring does not cause a failure, which makes for a fairly lenient setup. (Whether we want to enforce docstrings and parameter descriptions is open for discussion.)

  3. I would suggest expanding the ruff ruleset to ALL, which extends checking to a wide variety of possible problems.

By default, ruff enables rules from E (pycodestyle errors) and F (Pyflakes).

There are many more rulesets that improve the overall code quality and should be considered. We could also enable those explicitly, or simply rely on ALL to include the full expansion of community accepted best practices.

In my experiments I also disabled some specific rules:

"E203", whitespace before punctuation

"D100", missing docstring in public module
"D101", missing docstring in public class
"D102", missing docstring in public method
"D103", missing docstring in public function
"D104", missing docstring in public package
"D105", missing docstring in magic method
"D401", First line of docstring written in imperative mood

"S101", Use of assert (asserts are usually good in our case)

"PT018", Composite pytest asserts

These exemptions are open for discussion, since missing docstrings may not actually be acceptable in the framework.

Describe alternatives you've considered

An alternative would be to enable each ruleset we want individually:

Currently in AIFS we are working with these:

select = [
    "A",    # flake8 builtins
    "B",    # flake8-bugbear
    "D",    # pydocstyle
    "E",    # pycodestyle error
    "W",    # pycodestyle warning
    "F",    # Pyflakes
    "UP",   # Pyupgrade
    "SIM",  # flake8-simplify
    "N",    # pep8 naming
    "YTT",  # flake8-2020
    "S",    # bandit
    "COM",  # Commas
    "C4",   # Comprehensions
    "DTZ",  # Datetimes
    "ISC",  # Implicit string concatenation
    "ICN",  # Import conventions
    "LOG",  # Logging
    "PIE",  # Misc lints
    "T20",  # Print statements
    "PT",   # Pytest
    "Q",    # Quotes
    "RSE",  # Raises
    "TID",  # Tidy imports
    "PTH",  # Use Pathlib
    "PGH",  # Pygrep hooks
    "R",    # Refactor
    "FLY",  # Fstrings
    "PERF", # Performance linting
    "FURB", # Modernising code
    "RUF",  # Ruff specific
    "NPY",  # NumPy
    # "PL",  # Pylint
    # "TD",  # Todos
    # "FBT", # Boolean traps
    # "CPY", # Copyright
]

Only four rulesets are disabled here, and they would still be useful for a framework. The boolean traps ruleset might make coding harder, however, and might necessitate refactoring parts of the code.

Alternatively, we can introduce the standalone pre-commit hooks that these rulesets reimplement:

  1. Use bandit for common code vulnerabilities (implemented in ruleset S):
-   repo: https://github.com/pycqa/bandit # Check code for common security issues
    rev: 1.7.7
    hooks:
    -   id: bandit
  2. Use docformatter to enforce consistent formatting of docstrings (ruleset from pydocstyle D, E, W):
-   repo: https://github.com/PyCQA/docformatter # Format docstrings
    rev: v1.7.5
    hooks:
    -   id: docformatter
        args:
        - -s numpy
        - --black
        - --in-place
  3. Use pyupgrade to automatically upgrade Python syntax from older patterns (e.g. upgrading from percent style formatting '%s %s' % (a, b) to '{} {}'.format(a, b))
-   repo: https://github.com/asottile/pyupgrade # Upgrade Python syntax
    rev: v3.15.1
    hooks:
    -   id: pyupgrade
        args:
        - --py38-plus

Additional context

Bandit or ruleset S especially may expose vulnerabilities in code, which makes our life as maintainers a little easier.

Organisation

ECMWF

Error messages

Is your feature request related to a problem? Please describe.

When using the GNN architecture with a graph object without connections within hidden nodes, we get an error like the following, which is not representative of what happens:

Error executing job with overrides: []
Error in call to target 'anemoi.models.layers.processor.GNNProcessor':
KeyError('edge_length')

Describe the solution you'd like

Raise a more meaningful error if the subgraph is not correct.
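
A minimal sketch of what such a check could look like, assuming the graph can be treated as a mapping from edge types to attribute dictionaries. The helper name `validate_subgraph`, the `(src, "to", dst)` key layout and the list of required attributes are hypothetical and not part of the anemoi-models API.

```python
# Hypothetical helper: the graph layout and names are assumptions for illustration.
REQUIRED_EDGE_ATTRIBUTES = ("edge_length",)


def validate_subgraph(graph: dict, src: str, dst: str) -> None:
    """Raise a descriptive error if the src -> dst subgraph is missing or incomplete."""
    key = (src, "to", dst)
    if key not in graph:
        available = ", ".join(str(k) for k in graph) or "none"
        raise ValueError(
            f"Graph has no edges {src!r} -> {dst!r}, which the processor needs. "
            f"Available edge types: {available}."
        )
    # Report every missing attribute at once instead of failing on the first lookup.
    missing = [name for name in REQUIRED_EDGE_ATTRIBUTES if name not in graph[key]]
    if missing:
        raise ValueError(f"Edges {src!r} -> {dst!r} lack required attributes: {missing}.")
```

Called before the processor is built, a check like this would replace the bare KeyError('edge_length') with a message that names the missing subgraph.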

Describe alternatives you've considered

No response

Additional context

No response

Organisation

ECMWF

Add configurability to dropout in MultiHeadSelfAttention Module

Is your feature request related to a problem? Please describe.

Currently, the MultiHeadSelfAttention module has a fixed dropout rate of 0.00, which limits the ability to tune this hyperparameter for different use cases. This lack of configurability can hinder model optimization and performance, especially in scenarios where overfitting may occur due to smaller datasets.

Describe the solution you'd like

I would like to see the addition of a configurable dropout parameter to the MultiHeadSelfAttention module. This parameter should allow users to specify the dropout rate when initialising the module, enabling better customisation and optimization of the model.
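
To illustrate the requested behaviour, here is a framework-free sketch of a dropout step with a configurable rate, applied to one row of attention weights. The function name `attention_dropout` and the parameter `dropout_p` are assumptions for illustration, not the module's actual interface. The default of 0.0 reproduces the current fixed behaviour.

```python
import random


def attention_dropout(weights, dropout_p=0.0, training=True, rng=None):
    """Apply inverted dropout to a list of attention weights.

    Hypothetical sketch: dropout_p defaults to 0.0, i.e. no dropout.
    """
    if not 0.0 <= dropout_p < 1.0:
        raise ValueError(f"dropout_p must be in [0, 1), got {dropout_p}")
    # Dropout is only active during training and for a non-zero rate.
    if not training or dropout_p == 0.0:
        return list(weights)
    rng = rng or random.Random()
    keep = 1.0 - dropout_p
    # Kept weights are rescaled by 1 / keep so the expected value is unchanged.
    return [w / keep if rng.random() < keep else 0.0 for w in weights]
```

In a real module the rate would presumably be a constructor argument that is validated once and then passed on to the attention computation.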

Describe alternatives you've considered

No response

Additional context

No response

Organisation

No response

Remapping of one input variable to multiple new ones

Is your feature request related to a problem? Please describe.

Specific problem: We want the model to treat a variable x in degrees in range [0,360] as cos(x) and sin(x) inside the model and map the output back to the representation in degree.

Describe the solution you'd like

Preprocessor:
The model should handle the remapped variables, and the training loss must be calculated on the remapped set of variables. The validation loss should be calculated on the original variable, which will also be output in inference. Therefore, this remapping should be implemented as a preprocessor that allows the remapping of one variable to several new ones.

Data Indices: The preprocessors are the first layers of the model, so the input variables are the same.
After the preprocessors, the set of variables has changed. Therefore, this information needs to be included in the data indices.
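
As a rough sketch of the idea, the forward and inverse mapping and the resulting change in the variable list could look as follows. All function names, the `cos_`/`sin_` prefixes and the example variable names are hypothetical. The inverse uses `atan2`, so the decoded angle lies in [0, 360).

```python
import math


def remap_names(names, variable):
    """Replace `variable` by its cos_/sin_ components, keeping the other names in order."""
    out = []
    for name in names:
        out.extend([f"cos_{name}", f"sin_{name}"] if name == variable else [name])
    return out


def encode_degrees(x_deg):
    """Map an angle in degrees to the pair (cos(x), sin(x))."""
    rad = math.radians(x_deg)
    return math.cos(rad), math.sin(rad)


def decode_degrees(cos_x, sin_x):
    """Map (cos(x), sin(x)) back to degrees in [0, 360)."""
    return math.degrees(math.atan2(sin_x, cos_x)) % 360.0
```

For example, `remap_names(["10u", "mwd", "2t"], "mwd")` yields a four-entry list, which is the information the data indices would need to carry after the preprocessor.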

Describe alternatives you've considered

Changing the set of variables when creating the dataset is not an option since the model output needs to contain the variables we are interested in for inference.

Additional context

No response

Organisation

ECMWF

Activation functions for bounding the output of the model

Is your feature request related to a problem? Please describe.

The output of some specific variables, total precipitation for example, is not bounded. Therefore, the model sometimes outputs negative values, which is not physical.

Describe the solution you'd like

Adding a bounding strategy to the model via the usage of activation functions.
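
One possible shape for this, as a plain-Python sketch: apply an activation only to the output variables that need a lower bound of zero. The function `bound_outputs` and the variable names in the usage note are hypothetical. ReLU gives a hard bound, while softplus is a smooth alternative that stays strictly positive.

```python
import math


def relu(x):
    """Hard lower bound at zero."""
    return max(0.0, x)


def softplus(x):
    """Smooth lower bound at zero, written to avoid overflow for large |x|."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def bound_outputs(values, names, bounded, activation=relu):
    """Apply `activation` to the variables listed in `bounded`, leave the rest untouched."""
    return [activation(v) if n in bounded else v for v, n in zip(values, names)]
```

With `bound_outputs([-0.2, -3.0], ["tp", "2t"], {"tp"})` only the precipitation value is clipped; the temperature value passes through unchanged.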

Describe alternatives you've considered

No response

Additional context

No response

Organisation

ECMWF

[Feature] Add tendency training

Is your feature request related to a problem? Please describe.

Allow optional training with optimisation towards tendency as opposed to state

Describe the solution you'd like

Adapt logic to allow the objective to be training towards tendencies
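
As a minimal sketch of the difference between the two objectives, assuming states are flat lists of values: the tendency target is the change between consecutive states, and the predicted state is recovered by adding the predicted tendency to the current state. The function names are hypothetical, and any normalisation of tendencies is left out.

```python
def to_tendency(state_now, state_next):
    """Training target when optimising towards tendencies: the step-to-step change."""
    return [nxt - now for now, nxt in zip(state_now, state_next)]


def from_tendency(state_now, tendency):
    """Recover the next state from the current state and a predicted tendency."""
    return [now + t for now, t in zip(state_now, tendency)]
```

The loss would then be computed between the predicted tendency and `to_tendency(state_now, state_next)` rather than between states.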

Describe alternatives you've considered

NA

Additional context

Has been shown to improve results with a diffusion model targeting atmosphere - this will be useful for investigating whether or not improvements extend to other models and other domains such as ocean and wave.

Besides implementing the logic, key things to consider:

  1. Any adjustments required to Imputer logic?
  2. [Please Suggest More]

Organisation

ECMWF

Support Limited Area Models

Is your feature request related to a problem? Please describe.

Limited Area Modelling (LAM) is a use case that reuses much of the functionality of global weather models, such as the model architectures.

It would be interesting to extend the capabilities of anemoi-models to support LAM.

The main difference with respect to the current use case is that some input nodes are not part of the output state. These nodes are the boundary forcing.

Describe the solution you'd like

Define an output mask with the data nodes part of the output state. This output mask can be defined based on:

  • A node attribute in the graph
  • Simple rule, like all nodes with incoming connections from the encoder.
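
Both options can be sketched in a few lines, assuming nodes are indexed from 0 and edges are (source, target) index pairs. The function names and the `in_output` attribute are hypothetical; which edge set to use for the rule-based variant is left to the caller.

```python
def mask_from_attribute(node_attrs, attribute="in_output"):
    """Option 1: read the mask from a per-node attribute (missing means False)."""
    return [bool(attrs.get(attribute, False)) for attrs in node_attrs]


def mask_from_incoming_edges(num_nodes, edges):
    """Option 2: a node is part of the output if it receives at least one edge."""
    targets = {dst for _, dst in edges}
    return [i in targets for i in range(num_nodes)]


def apply_mask(values, mask):
    """Keep only the values of nodes that belong to the output state."""
    return [v for v, keep in zip(values, mask) if keep]
```

Nodes left out of the mask would be the boundary forcing: they are part of the input but not of the output state.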

Describe alternatives you've considered

No response

Additional context

This supposes that the LAM data and the boundary forcing are passed together.

Organisation

ECMWF
