SimbaML is an all-in-one framework for integrating prior knowledge from ODE models into the ML process through synthetic data augmentation. It allows for the convenient generation of realistic synthetic data by sparsifying simulation output and adding noise. Furthermore, our framework provides customizable pipelines for various ML experiments, such as transfer learning.
This issue was opened to provide more thorough documentation for SimbaML and to make the project ready for publication.
To-Dos:
- [ ] Move example to Quickstart
- [ ] Add Installation to the landing page of Read the Docs
- [ ] Remove example from GitHub and add a link to the documentation instead
- [ ] Include link to paper
Include the ability to set the relative and absolute tolerances of the ODE solver. This is useful for tasks that require higher solver precision.
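A minimal sketch of passing the tolerances through, assuming the simulations are solved with scipy's solve_ivp (rtol and atol are scipy's real parameters; the decay model and the hard-coded values are only illustrative):

```python
from scipy.integrate import solve_ivp

def exponential_decay(t, y):
    """Toy ODE: dy/dt = -0.5 * y."""
    return -0.5 * y

# solve_ivp accepts relative and absolute tolerances directly; a SimbaML
# system model could forward user-supplied values like these.
solution = solve_ivp(
    exponential_decay,
    t_span=(0, 10),
    y0=[2.0],
    rtol=1e-8,   # tighter than scipy's default of 1e-3
    atol=1e-10,  # tighter than scipy's default of 1e-6
)
print(solution.y[0, -1])
```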
So far, users can only define a test_split by specifying the percentage of the dataset to hold out as a test set. In certain situations, however, it should also be possible to define a test_split by specifying the index at which the split is made. This is not only handier when the user has a specific time point in mind, but also avoids the rounding mistakes that can occur with the percentage-based approach SimbaML currently uses.
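A minimal sketch of the two split modes; the function and parameter names are illustrative, not SimbaML's actual API:

```python
import pandas as pd

def train_test_split(data: pd.DataFrame, ratio: float = None, index: int = None):
    """Split a time series into train and test sets, either by a
    percentage of rows (ratio) or at an exact row position (index)."""
    if index is not None:
        # Exact split point: no rounding involved.
        split_at = index
    else:
        # Percentage-based split: rounding can shift the boundary by a row.
        split_at = round(len(data) * (1 - ratio))
    return data.iloc[:split_at], data.iloc[split_at:]
```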
In some scenarios, the user might want a larger test set when fine-tuning on the real-world dataset than when pre-training on the synthetic dataset. It is therefore useful to allow setting two different test_splits, one for pre-training and one for fine-tuning.
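A hypothetical configuration sketch; the keys are illustrative, not SimbaML's actual config schema:

```python
# Separate test splits for the synthetic pre-training data and the
# real-world fine-tuning data (key names are hypothetical).
pipeline_config = {
    "pretraining": {"test_split": 0.1},  # small test set on synthetic data
    "finetuning": {"test_split": 0.3},   # larger test set on real-world data
}
```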
To reuse models trained with SimbaML for further investigations, users should have the option to export trained models by configuring an export_model_path.
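A sketch of what the export could look like for a PyTorch Lightning backend. export_model_path is the proposed option and DenseModel is a stand-in model; Trainer.save_checkpoint and load_from_checkpoint are Lightning's existing APIs:

```python
import torch
from torch import nn
import pytorch_lightning as pl
from torch.utils.data import DataLoader, TensorDataset

class DenseModel(pl.LightningModule):
    """Minimal stand-in for a SimbaML regression model."""
    def __init__(self):
        super().__init__()
        self.net = nn.Linear(8, 1)
    def training_step(self, batch, batch_idx):
        x, y = batch
        return nn.functional.mse_loss(self.net(x), y)
    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters())

data = DataLoader(TensorDataset(torch.randn(64, 8), torch.randn(64, 1)), batch_size=16)
model = DenseModel()
trainer = pl.Trainer(max_epochs=1, enable_progress_bar=False)
trainer.fit(model, data)

# export_model_path is the proposed (hypothetical) option name; saving
# itself can reuse Lightning's existing checkpoint mechanism.
export_model_path = "exports/dense_model.ckpt"
trainer.save_checkpoint(export_model_path)

# Restore later for further investigation:
restored = DenseModel.load_from_checkpoint(export_model_path)
```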
In transfer learning it is common to freeze and add layers when training on the target dataset. A subsequent fine-tuning step with unfrozen layers is also a common way to adapt the model to the final dataset. SimbaML should support both.
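For reference, this is the standard freeze/add/unfreeze pattern in plain PyTorch that SimbaML would need to support; all module names and sizes are illustrative:

```python
import torch
from torch import nn

# Stand-in for a network pre-trained on synthetic data.
backbone = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, 16))

# Step 1: freeze the pre-trained layers.
for param in backbone.parameters():
    param.requires_grad = False

# Step 2: add a new head and train only it on the real-world target data.
model = nn.Sequential(backbone, nn.ReLU(), nn.Linear(16, 1))
optimizer = torch.optim.Adam(
    [p for p in model.parameters() if p.requires_grad], lr=1e-3
)

# Step 3 (fine-tuning): unfreeze everything and continue with a small
# learning rate to adjust the whole model to the final dataset.
for param in model.parameters():
    param.requires_grad = True
optimizer = torch.optim.Adam(model.parameters(), lr=1e-5)
```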
So far, SimbaML uses a model_to_transfer_learning_model function to turn any model into a transfer learning model (see model_to_transfer_learning_model.py). As this has negative side effects (such as non-transfer-learning models carrying transfer-learning-specific model parameters), we would prefer to remove this conversion and define transfer learning models as stand-alone models.
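A sketch of what a stand-alone transfer learning model could look like, assuming the PyTorch Lightning backend; the class and parameter names are hypothetical:

```python
import torch
from torch import nn
import pytorch_lightning as pl

class TransferLearningModel(pl.LightningModule):
    """Hypothetical stand-alone transfer learning model. The transfer
    learning behaviour lives in this class itself, so ordinary models no
    longer carry transfer-learning-specific parameters."""

    def __init__(self, finetuning_learning_rate: float = 1e-5):
        super().__init__()
        # Transfer-learning-specific parameters stay local to this class.
        self.finetuning_learning_rate = finetuning_learning_rate
        self.net = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, 1))

    def training_step(self, batch, batch_idx):
        x, y = batch
        return nn.functional.mse_loss(self.net(x), y)

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=self.finetuning_learning_rate)

# Usage: instantiate directly instead of converting an existing model.
model = TransferLearningModel(finetuning_learning_rate=1e-4)
```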
Finding the optimal number of workers for the PyTorch Lightning models is key to optimizing their training performance on local machines and clusters. It is therefore important that users can manually select num_workers for each model (see the sketch after the to-do list).
To-Do:
- [ ] Add an option to manually set num_workers
- [ ] Set the default to the number of CPUs of the current machine
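A minimal sketch of the proposed default, using PyTorch's real num_workers parameter:

```python
import os
import torch
from torch.utils.data import DataLoader, TensorDataset

# Proposed default: one worker per CPU (os.cpu_count() may return None,
# hence the fallback); a user-supplied value would override this.
num_workers = os.cpu_count() or 1

dataset = TensorDataset(torch.randn(100, 8), torch.randn(100, 1))
loader = DataLoader(dataset, batch_size=16, num_workers=num_workers)
```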
Printing the progress of the PyTorch Lightning models can produce large output files when training on clusters. It would therefore be useful to let users decide whether or not to print a progress bar for the trained models.
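A sketch using enable_progress_bar, which is PyTorch Lightning's existing Trainer flag; the user-facing SimbaML option name is hypothetical:

```python
import pytorch_lightning as pl

# Hypothetical user-facing option; SimbaML could forward it to Lightning's
# Trainer so cluster log files stay small.
show_progress_bar = False  # e.g. disable when training on a cluster

trainer = pl.Trainer(max_epochs=10, enable_progress_bar=show_progress_bar)
```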