SimbaML is an all-in-one framework for integrating prior knowledge from ODE models into the ML process through synthetic data augmentation. It allows for the convenient generation of realistic synthetic data by sparsifying simulation output and adding noise. Furthermore, our framework provides customizable pipelines for various ML experiments, such as transfer learning.
This issue was opened to provide more thorough documentation for SimbaML and to make the project ready for publication.
To-Dos:
- [ ] Move example to Quickstart
- [ ] Add Installation to the landing page of Read the Docs
- [ ] Remove example from GitHub and add a link to the documentation instead
- [ ] Include link to paper
Include the ability to set the relative and absolute tolerances of the ODE solver. This is useful for tasks that require higher solver precision.
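A minimal sketch of passing the tolerances through, assuming the simulations are solved with scipy's solve_ivp (rtol and atol are scipy's real parameters; the decay model and the hard-coded values are only illustrative):

```python
from scipy.integrate import solve_ivp

def exponential_decay(t, y):
    """Toy ODE: dy/dt = -0.5 * y."""
    return -0.5 * y

# solve_ivp accepts relative and absolute tolerances directly; a SimbaML
# system model could forward user-supplied values like these.
solution = solve_ivp(
    exponential_decay,
    t_span=(0, 10),
    y0=[2.0],
    rtol=1e-8,   # tighter than scipy's default of 1e-3
    atol=1e-10,  # tighter than scipy's default of 1e-6
)
print(solution.y[0, -1])
```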
So far, users can only define a test_split by specifying the percentage of the dataset to hold out as a test set. In certain situations, however, it should also be possible to define a test_split by specifying the index at which the split is made. This is not only handier when the user has a specific time point in mind, but also avoids the rounding mistakes that can occur with the percentage-based approach SimbaML currently uses.
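A minimal sketch of the two split modes; the function and parameter names are illustrative, not SimbaML's actual API:

```python
import pandas as pd

def train_test_split(data: pd.DataFrame, ratio: float = None, index: int = None):
    """Split a time series into train and test sets, either by a
    percentage of rows (ratio) or at an exact row position (index)."""
    if index is not None:
        # Exact split point: no rounding involved.
        split_at = index
    else:
        # Percentage-based split: rounding can shift the boundary by a row.
        split_at = round(len(data) * (1 - ratio))
    return data.iloc[:split_at], data.iloc[split_at:]
```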
In some scenarios, the user might want a larger test set when fine-tuning on the real-world dataset than when pre-training on the synthetic dataset. It is therefore useful to allow setting two different test_splits, one for pre-training and one for fine-tuning.
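A hypothetical configuration sketch; the keys are illustrative, not SimbaML's actual config schema:

```python
# Separate test splits for the synthetic pre-training data and the
# real-world fine-tuning data (key names are hypothetical).
pipeline_config = {
    "pretraining": {"test_split": 0.1},  # small test set on synthetic data
    "finetuning": {"test_split": 0.3},   # larger test set on real-world data
}
```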
To reuse models trained with SimbaML for further investigations, users should have the option to export trained models by configuring an export_model_path.
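A sketch of what the export could look like for a PyTorch Lightning backend. export_model_path is the proposed option and DenseModel is a stand-in model; Trainer.save_checkpoint and load_from_checkpoint are Lightning's existing APIs:

```python
import torch
from torch import nn
import pytorch_lightning as pl
from torch.utils.data import DataLoader, TensorDataset

class DenseModel(pl.LightningModule):
    """Minimal stand-in for a SimbaML regression model."""
    def __init__(self):
        super().__init__()
        self.net = nn.Linear(8, 1)
    def training_step(self, batch, batch_idx):
        x, y = batch
        return nn.functional.mse_loss(self.net(x), y)
    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters())

data = DataLoader(TensorDataset(torch.randn(64, 8), torch.randn(64, 1)), batch_size=16)
model = DenseModel()
trainer = pl.Trainer(max_epochs=1, enable_progress_bar=False)
trainer.fit(model, data)

# export_model_path is the proposed (hypothetical) option name; saving
# itself can reuse Lightning's existing checkpoint mechanism.
export_model_path = "exports/dense_model.ckpt"
trainer.save_checkpoint(export_model_path)

# Restore later for further investigation:
restored = DenseModel.load_from_checkpoint(export_model_path)
```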
In transfer learning it is common to freeze and add layers when training on the target dataset. A subsequent fine-tuning step with unfrozen layers is also a common way to adapt the model to the final dataset. SimbaML should support both.
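For reference, this is the standard freeze/add/unfreeze pattern in plain PyTorch that SimbaML would need to support; all module names and sizes are illustrative:

```python
import torch
from torch import nn

# Stand-in for a network pre-trained on synthetic data.
backbone = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, 16))

# Step 1: freeze the pre-trained layers.
for param in backbone.parameters():
    param.requires_grad = False

# Step 2: add a new head and train only it on the real-world target data.
model = nn.Sequential(backbone, nn.ReLU(), nn.Linear(16, 1))
optimizer = torch.optim.Adam(
    [p for p in model.parameters() if p.requires_grad], lr=1e-3
)

# Step 3 (fine-tuning): unfreeze everything and continue with a small
# learning rate to adjust the whole model to the final dataset.
for param in model.parameters():
    param.requires_grad = True
optimizer = torch.optim.Adam(model.parameters(), lr=1e-5)
```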
So far, SimbaML uses a model_to_transfer_learning_model function to turn any model into a transfer learning model (see model_to_transfer_learning_model.py). As this has negative side effects (such as non-transfer-learning models carrying transfer-learning-specific model parameters), we would prefer to remove this conversion and define transfer learning models as stand-alone models.
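A sketch of what a stand-alone transfer learning model could look like, assuming the PyTorch Lightning backend; the class and parameter names are hypothetical:

```python
import torch
from torch import nn
import pytorch_lightning as pl

class TransferLearningModel(pl.LightningModule):
    """Hypothetical stand-alone transfer learning model. The transfer
    learning behaviour lives in this class itself, so ordinary models no
    longer carry transfer-learning-specific parameters."""

    def __init__(self, finetuning_learning_rate: float = 1e-5):
        super().__init__()
        # Transfer-learning-specific parameters stay local to this class.
        self.finetuning_learning_rate = finetuning_learning_rate
        self.net = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, 1))

    def training_step(self, batch, batch_idx):
        x, y = batch
        return nn.functional.mse_loss(self.net(x), y)

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=self.finetuning_learning_rate)

# Usage: instantiate directly instead of converting an existing model.
model = TransferLearningModel(finetuning_learning_rate=1e-4)
```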
Finding the optimal number of workers for the PyTorch Lightning models is key to optimizing their training performance on local machines and clusters. It is therefore important that users can manually select num_workers for each model (see the sketch after the to-do list).
To-Do:
- [ ] Add an option to manually set num_workers
- [ ] Set the default to the number of CPUs of the current machine
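A minimal sketch of the proposed default, using PyTorch's real num_workers parameter:

```python
import os
import torch
from torch.utils.data import DataLoader, TensorDataset

# Proposed default: one worker per CPU (os.cpu_count() may return None,
# hence the fallback); a user-supplied value would override this.
num_workers = os.cpu_count() or 1

dataset = TensorDataset(torch.randn(100, 8), torch.randn(100, 1))
loader = DataLoader(dataset, batch_size=16, num_workers=num_workers)
```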
Printing the progress of the PyTorch Lightning models can produce large output files when training on clusters. It would therefore be useful to let users decide whether or not to print a progress bar for the trained models.
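A sketch using enable_progress_bar, which is PyTorch Lightning's existing Trainer flag; the user-facing SimbaML option name is hypothetical:

```python
import pytorch_lightning as pl

# Hypothetical user-facing option; SimbaML could forward it to Lightning's
# Trainer so cluster log files stay small.
show_progress_bar = False  # e.g. disable when training on a cluster

trainer = pl.Trainer(max_epochs=10, enable_progress_bar=show_progress_bar)
```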