aai-institute / sensai

The Python library for sensible AI.

Home Page: https://aai-institute.github.io/sensAI/docs/

License: Other

Languages: Python 90.51%, Jupyter Notebook 8.83%, Shell 0.61%, HTML 0.05%
Topics: artificial-intelligence, machine-learning, python, transferlab

sensai's People

Contributors: dependabot[bot], mischapanch, opcode81, schroedk, uejuli

sensai's Issues

Generalize SpanningTree and graph-related methods

Currently, we support graph-related functionality only for geospatial data. If it is ever needed in a more general context (e.g. for arbitrary Euclidean or even non-Euclidean data, arbitrary triangulations, and so on), the classes there should be refactored and extended. Here is a snippet of a conversation highlighting some of the problems with the current structure:


It is related to coordinates, since a Delaunay triangulation only makes sense for Euclidean points. If we want functionality related only to the graph representation, it should not be tightly coupled to a special type of graph (one representing a Delaunay triangulation). The class

class SpanningTree:
    """
    Wrapper around a tree-finding algorithm that will be applied on the Delaunay graph of the datapoints
    """
    def __init__(self, datapoints: np.ndarray, tree_finder: Callable[[nx.Graph], nx.Graph] = nx.minimum_spanning_tree):

should not depend on an np.ndarray, which has to be an array of Euclidean coordinates (since a Delaunay triangulation is computed), but rather on an nx.Graph. Moreover, the parameter tree_finder makes it possible to construct a subgraph that may not be a spanning tree at all. In fact, the class only provides convenience methods for inspecting a weighted (nx) graph. So either the class should be

class WeightedGraphWrapper:
    """
    Wrapper around a nx.Graph
    """
    def __init__(self, graph: nx.Graph):

Or, if you really would like to have a spanning tree, it may look like

class SpanningTree:
    def __init__(self, graph: nx.Graph, mode="min"):
        self.tree = nx.minimum_spanning_tree if mode == "min" else ...

(or, alternatively, with an enum for min/max). In any case, I am not sure this is enough functionality to justify a class.
As of now, it is tightly coupled to Euclidean point graphs (more specifically, Delaunay graphs), so I would put it into geoanalytics. What do you think?

Originally posted by @schroedk in #50 (comment)
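The enum-based variant above could be fleshed out roughly as follows. This is a minimal sketch of the proposed refactoring, not existing sensai code; the names `SpanningTreeMode` and `total_weight` are illustrative assumptions:

```python
from enum import Enum

import networkx as nx


class SpanningTreeMode(Enum):
    MIN = "min"
    MAX = "max"


class SpanningTree:
    """
    Spanning tree of an arbitrary weighted nx.Graph, decoupled from any
    Delaunay triangulation or Euclidean coordinates.
    """
    def __init__(self, graph: nx.Graph, mode: SpanningTreeMode = SpanningTreeMode.MIN):
        if mode == SpanningTreeMode.MIN:
            self.tree = nx.minimum_spanning_tree(graph)
        else:
            self.tree = nx.maximum_spanning_tree(graph)

    def total_weight(self) -> float:
        # sum edge weights, defaulting to 1.0 for unweighted edges
        return sum(w for _, _, w in self.tree.edges.data("weight", default=1.0))
```

A caller would then be responsible for building the graph (e.g. from a Delaunay triangulation in geoanalytics), keeping this class purely graph-based.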

Rule-Based Models' fit interface

The overrides of the rule-based models violate the Liskov substitution principle; the specific fit methods should be deleted.
It is also unexpected that the fitPreprocessors argument is completely ignored for rule-based models in VectorModel.fit.

Maybe add a formatter?

This becomes especially relevant in case external devs want to contribute.

I personally grew to like the black formatter (after an initial phase of disbelief), which I use together with isort in git hooks through pre-commit.

Support passing separate validation data set to pytorch-lightning based models

With the higher-level vector model interfaces, validation during the training of neural networks can currently only be performed with a train-test split. The lower-level NNOptimizer, on the other hand, supports passing a separate validation set. Pytorch-lightning's Trainer class also has similar capabilities, which are currently disabled in sensai because of VectorModel's fit interface. I ran into this myself because I want to fit pytorch-lightning models to a data frame where the validation set cannot be created by splitting (due to data leakage).

Note that it is not sufficient to use evaluators for performing validation on separate sets since this way the validation set cannot be used for early stopping.

I propose to address this by relaxing VectorModel's interface, allowing a validation set to be passed in fit and fit_classifier.

Something like fit(X, Y, X_validation=None, Y_validation=None)

@opcode81 @schroedk
If you think this is reasonable, I will prepare a PR in the next few days.
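The relaxed interface could look roughly like this. This is a sketch of the proposal only, not the actual sensai API; the hook name `_fit` and the way the validation pair is threaded through are assumptions for illustration:

```python
class VectorModel:
    def fit(self, X, Y, X_validation=None, Y_validation=None):
        # bundle the optional validation pair and pass it down to the
        # model-specific fitting logic
        validation = (X_validation, Y_validation) if X_validation is not None else None
        self._fit(X, Y, validation=validation)

    def _fit(self, X, Y, validation=None):
        # subclass hook: a pytorch-lightning based model would forward
        # `validation` to its Trainer (enabling early stopping); models that
        # cannot use it may simply ignore it
        raise NotImplementedError
```

This keeps the existing two-argument call pattern working unchanged while letting lightning-based models receive a genuinely separate validation set.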

Do not track rst-files in repository

In the current workflow, one needs to manually update the rst files (using the script build_scripts/update_docu.py). While the script generates rst files for new modules, it does not delete rst files for modules that no longer exist. So either the build (hopefully) breaks at the documentation build step, or the resulting documentation references non-existent code.

@MischaPanch, is there any reason we track rst files in the repository instead of generating them on the fly? In a different project, we use a script for exactly this task, which is called in the build pipeline, so the documentation always reflects the actual code base.

Support models that can input and predict multidimensional tensors

This issue is only about setting up models and training; the evaluation of such models will be handled in a separate issue. Thus, it is mainly about writing the TensorModel abstraction and some implementations of it.

The goal is to enable training of autoencoders, prediction of 2-dimensional tensors for geospatial analysis and so on.

I am not entirely sure how to deal with models that take multi-dimensional data and predict scalars, like CNN-based classifiers and regressors (a very common use case). VectorModel does not feel like the right fit, nor does TensorModel (because it will focus on tensor-valued prediction). @opcode81 Do you have an idea about that?
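To make the discussion concrete, a minimal sketch of what the TensorModel abstraction could look like is given below. This is not existing sensai code; the class and method names are assumptions, and real implementations (autoencoders, 2-d geospatial predictors) would live in subclasses:

```python
from abc import ABC, abstractmethod

import numpy as np


class TensorModel(ABC):
    """
    Base class for models whose inputs and predictions are tensors of
    arbitrary rank (first axis = samples).
    """

    @abstractmethod
    def _fit(self, x: np.ndarray, y: np.ndarray) -> None:
        ...

    @abstractmethod
    def _predict(self, x: np.ndarray) -> np.ndarray:
        ...

    def fit(self, x: np.ndarray, y: np.ndarray) -> None:
        # remember the per-sample output tensor shape for later checks
        self._output_shape = y.shape[1:]
        self._fit(x, y)

    def predict(self, x: np.ndarray) -> np.ndarray:
        y = self._predict(x)
        assert y.shape[1:] == self._output_shape, "prediction shape mismatch"
        return y
```

An autoencoder, for instance, would be a subclass where the output shape equals the input shape.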

Support conda for venv configuration

We should consider switching to conda for venv configuration (using the YAML-based environment specification: https://docs.conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html#sharing-an-environment).
Installing libraries like tensorflow via conda has considerable advantages. In particular, the conda-based installation will install CUDA and CuDNN in versions that match the tf installation, whereas the pip-based installation requires that the matching installations be provided externally, which is extremely painful.

Originally posted by @opcode81 in #21 (comment)

Fix publish package github workflow

For some reason it stopped working. It should be triggered when releases are created, but this did not happen for the last releases (which were themselves created by a workflow, so maybe that's the reason). Because of that, I uploaded the last release to PyPI manually.

Update dependencies that are safe

I think we can safely update to the newest versions of everything apart from tf, torch and lgbm. Only pandas would see a major-version bump, but I am fairly confident it does not break any of our code.

Write evaluation methods for TensorModel

This will include new metrics (like intersection over union, structural similarity indices and so on) and also new visualization methods for the special case of 2-dimensional data.
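One of the proposed metrics, intersection over union for binary 2-d masks, could be sketched as follows in plain numpy. This is an illustrative sketch, not sensai's eventual implementation:

```python
import numpy as np


def intersection_over_union(pred: np.ndarray, target: np.ndarray) -> float:
    """IoU for boolean masks of identical shape; defined as 1.0 if both are empty."""
    pred, target = pred.astype(bool), target.astype(bool)
    union = np.logical_or(pred, target).sum()
    if union == 0:
        return 1.0
    return float(np.logical_and(pred, target).sum() / union)
```

The same pattern (an element-wise comparison reduced to a scalar) would extend to structural similarity indices, though those require windowed statistics rather than simple set operations.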

Refactoring of the clustering package, bundling geo-analytics stuff in a separate optional package

ClusteringModel is not sufficiently general; it should be renamed to EuclidianClusterer, as it assumes that data points lie in a Euclidean space. All subclasses should be named accordingly.

Since geopandas is a problematic dependency (see #45), all geo-analytics-related code should be bundled in a new package geoanalytics.
This package should also contain all the utils that depend on geopandas.

Restructuring of the package:
clustering/base/clustering.py -> clustering/clustering_base.py
clustering/sklearn_clustering.py: shall contain sklearn base classes and specialisations (using prefix "Sk" not "SK")
clustering/coordinate_clustering: move to geoanalytics
