autogoal / autogoal

A Python framework for program synthesis with a focus on Automated Machine Learning.

License: MIT License

Python 96.42% Makefile 1.74% Dockerfile 0.30% HTML 0.40% Shell 0.90% PowerShell 0.24%

automl framework machine-learning optimization program-synthesis python3

autogoal's Introduction

AutoGOAL is a Python library for automatically finding the best way to solve a given task. It has been designed mainly for Automated Machine Learning (aka AutoML) but it can be used in any scenario where you have several possible ways to solve a given task.

Technically speaking, AutoGOAL is a framework for program synthesis, i.e., finding the best program to solve a given problem, provided that the user can describe the space of all possible programs. AutoGOAL provides a set of low-level components to define different spaces and efficiently search in them. In the specific context of machine learning, AutoGOAL also provides high-level components that can be used as a black-box in almost any type of problem and dataset format.
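To make the idea of "describing a space of programs and searching in it" concrete, here is a minimal sketch in plain Python. It does not use AutoGOAL's API: the search space, the toy data, and the names `sample_program`, `fitness`, and `random_search` are all assumptions made for illustration.

```python
import random

# Hypothetical search space: every "program" is a (scale, threshold) pair.
SCALES = [0.5, 1.0, 2.0]
THRESHOLDS = [1.0, 2.0, 3.0, 4.0]

# Toy labelled data: the label is 1 when the value is large.
DATA = [(0.5, 0), (1.0, 0), (1.5, 0), (2.5, 1), (3.0, 1), (4.0, 1)]


def sample_program(rng):
    """Draw one program uniformly at random from the space."""
    return rng.choice(SCALES), rng.choice(THRESHOLDS)


def run_program(program, x):
    """Execute the program: scale the input, then threshold it."""
    scale, threshold = program
    return 1 if x * scale >= threshold else 0


def fitness(program):
    """Accuracy of the program on the toy data."""
    hits = sum(run_program(program, x) == y for x, y in DATA)
    return hits / len(DATA)


def random_search(iterations, seed=0):
    """Keep the best program seen among `iterations` random samples."""
    rng = random.Random(seed)
    best, best_score = None, -1.0
    for _ in range(iterations):
        candidate = sample_program(rng)
        score = fitness(candidate)
        if score > best_score:
            best, best_score = candidate, score
    return best, best_score


best, score = random_search(200)
print(best, score)
```

The user only supplies the space (`sample_program`) and the objective (`fitness`); the search loop is generic. AutoGOAL's own search strategies are more elaborate than this uniform random search.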

⭐ Quickstart

AutoGOAL is first and foremost a framework for Automated Machine Learning. As such, it comes pre-packaged with hundreds of low-level machine learning algorithms that can be automatically assembled into pipelines for different problems.

The core of this functionality lies in the AutoML class.

To illustrate how simple it is to use, we will load a dataset and run an automatic classifier on it. The following code will run for approximately 5 minutes on a classic dataset.

from autogoal.datasets import cars
from autogoal.kb import (MatrixContinuousDense, 
                         Supervised, 
                         VectorCategorical)
from autogoal.ml import AutoML

# Load dataset
X, y = cars.load()

# Instantiate AutoML and define input/output types
automl = AutoML(
    input=(MatrixContinuousDense, 
           Supervised[VectorCategorical]),
    output=VectorCategorical
)

# Run the pipeline search process
automl.fit(X, y)

# Report the best pipeline
print(automl.best_pipeline_)
print(automl.best_score_)

Sensible defaults are defined for each of the many parameters of AutoML. Make sure to read the documentation for more information.

⚙️ Installation

The easiest way to get AutoGOAL up and running with all the dependencies is to pull the development Docker image, which is somewhat big:

docker pull autogoal/autogoal

Instructions for setting up Docker are available here.

Once you have the development image downloaded, you can fire up a console and use AutoGOAL interactively.

If you prefer to not use Docker, or you don't want all the dependencies, you can also install AutoGOAL directly with pip:

pip install autogoal

This will install the core library but you won't be able to use any of the underlying machine learning algorithms until you install the corresponding optional dependencies. You can install them all with:

pip install autogoal[contrib]

To pick exactly which dependencies you want, read the dependencies section.

⚠️ NOTE: By installing through pip you will get the latest release version of AutoGOAL, while by installing through Docker, you will get the latest development version.

The development version is mostly up-to-date with the main branch, hence it will probably contain more features, but also more bugs, than the release version.

💻 CLI

You can use AutoGOAL directly from the CLI. To see options just type:

autogoal

Using the CLI you can train and use AutoML models, download datasets and inspect the contrib libraries without writing a single line of code.

🤩 Demo

An online demo app is available at autogoal.github.io/demo. This app showcases the main features of AutoGOAL in interactive case studies.

To run the demo locally, simply type:

docker run -p 8501:8501 autogoal/autogoal

And navigate to localhost:8501.

⚖️ API stability

We make a conscious effort to maintain a consistent public API across versions, but the private API can change at any time. In general, everything you can import from autogoal without underscores is considered public.

For example:

# "clean" imports are part of the public API
from autogoal import optimize   
from autogoal.ml import AutoML  
from autogoal.contrib.sklearn import find_classes

# public members of public types as well
automl = AutoML(...)
automl.fit(...) 

# underscored imports are part of the private API
from autogoal.ml._automl import ...
from autogoal.contrib.sklearn._generated import ...

# as well as private members of any type
automl._input_type(...)

These are our consistency rules:

Major breaking changes are introduced between major version updates, e.g., x.0 and y.0. These can be additions, removals, or modifications of any kind in any part of the API.
Between minor version updates, e.g., 1.x and 1.y, you can expect to find new functionality, but anything you can use from the public API will still be there with a consistent semantic (save for bugfixes).
Between micro version updates, e.g., 1.3.x and 1.3.y, the public API is frozen even for additions.
The private API can be changed at all times.
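The rules above can be sketched as two small helpers. This is only an illustration of the stated policy, not code shipped with AutoGOAL: the function names and the returned labels are assumptions, and treating every 0.x version as unstable follows the public beta note in this section.

```python
def is_public(import_path):
    """A dotted path is public when no component starts with an underscore."""
    return not any(part.startswith("_") for part in import_path.split("."))


def parse_version(version):
    """Split 'major.minor.micro' into a tuple of ints; missing parts are 0."""
    parts = [int(p) for p in version.split(".")]
    parts += [0] * (3 - len(parts))
    return tuple(parts[:3])


def public_api_change(old, new):
    """Describe what the rules allow for the public API between two versions."""
    o, n = parse_version(old), parse_version(new)
    if o[0] == 0 or n[0] == 0:
        return "unstable"        # public beta (0.x): anything can change
    if o[0] != n[0]:
        return "breaking"        # major update: any kind of change
    if o[1] != n[1]:
        return "additions-only"  # minor update: new functionality only
    return "frozen"              # micro update: no public API changes


print(is_public("autogoal.ml.AutoML"))       # public import
print(is_public("autogoal.ml._automl"))      # private import
print(public_api_change("1.3.0", "1.3.4"))   # micro update
```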

⚠️ While AutoGOAL is in public beta (versions 0.x), the public API is considered unstable and thus everything can change. However, we try to keep breaking changes to a minimum.

📚 Documentation

This documentation is available online at autogoal.github.io. Check the following sections:

User Guide: Step-by-step showcase of everything you need to know to use AutoGOAL.
Examples: The best way to learn AutoGOAL through practice.
API: Details about the public API for AutoGOAL.

The HTML version can be deployed offline by downloading the AutoGOAL Docker image and running:

docker run -p 8000:8000 autogoal/autogoal mkdocs serve -a 0.0.0.0:8000

And navigating to localhost:8000.

📃 Publications

If you use AutoGOAL in academic research, please cite the following paper:

@article{estevez2020general,
  title={General-purpose hierarchical optimisation of machine learning pipelines with grammatical evolution},
  author={Est{\'e}vez-Velarde, Suilan and Guti{\'e}rrez, Yoan and Almeida-Cruz, Yudivi{\'a}n and Montoyo, Andr{\'e}s},
  journal={Information Sciences},
  year={2020},
  publisher={Elsevier},
  doi={10.1016/j.ins.2020.07.035}
}

The technologies and theoretical results leading up to AutoGOAL have been presented at different venues:

Optimizing Natural Language Processing Pipelines: Opinion Mining Case Study marks the inception of the idea of using evolutionary optimization with a probabilistic search space for pipeline optimization.
AutoML Strategy Based on Grammatical Evolution: A Case Study about Knowledge Discovery from Text applied probabilistic grammatical evolution with a custom-made grammar in the context of entity recognition in medical text.
General-purpose Hierarchical Optimisation of Machine Learning Pipelines with Grammatical Evolution presents a more uniform framework with different grammars in different problems, from tabular datasets to natural language processing.
Solving Heterogeneous AutoML Problems with AutoGOAL is the first actual description of AutoGOAL as a framework, unifying the ideas presented in the previous papers.

🤝 Contribution

Code is licensed under MIT. Read the details in the collaboration section.

This project follows the all-contributors specification. Any contribution will be given credit, from fixing typos, to reporting bugs, to implementing new core functionalities.

Here are all the current contributions.

🙏 Thanks!

Suilan Estevez-Velarde 💻 ⚠️ 🤔 📖
Alejandro Piad 💻 ⚠️ 📖
Yudivián Almeida Cruz 🤔 📖
ygutierrez 🤔 📖
Ernesto Luis Estevanell Valladares 💻 ⚠️
Alexander Gonzalez 💻 ⚠️
Anshu Trivedi 💻
Alex Coto 📖
Guillermo Blanco 🐛 💻 📖
yacth 🐛 💻
Brandon Fergerson 🐛
Aditya Nikhil 🐛
lucas-FP 🐛 💻
Leynier Gutiérrez González 📖
Ender Minyard 📖


autogoal's Issues

Add support for Convolutional Neural Networks

The objective of this issue is to add Convolutional Neural Networks (CNNs), mainly to solve problems like image segmentation. For now, we are using the KerasImageClassifier class as a guide; we think the final approach for CNNs will be close to it.

Update `dependencies.md`

Since #17 the current explanation in docs/dependencies.md is no longer true. We need to update that to reflect:

How to install all dependencies
Which contribs are actually defined in pyproject.toml

Identify text language

Use numpy, SciPy, nltk, scikit-learn, and a Bayesian network to identify the language of a text (Spanish and English, with the possibility of adding more).

Add DrQA system from Facebook

From the project page:

"DrQA is a system for reading comprehension applied to open-domain question answering. In particular, DrQA is targeted at the task of "machine reading at scale" (MRS)."

The idea is to integrate this system into Autogoal.

Add rich logger

Using rich, we can add a very cool logger that supersedes what we now have with a combination of ConsoleLogger and ProgressLogger. This new logger should:

Log each pipeline evaluated.
Log progress bars for overall progress, the current generation, and the current pipeline.
Optionally save the log to an HTML file at the end.

Add algorithms to detect, highlight and correct grammatical errors on natural language text.

Add algorithms to detect, highlight and correct grammatical errors on natural language text using Gramformer.

Add an algorithm for voice command recognition

Add a machine learning algorithm that performs voice command recognition. The algorithm's run method will receive the type VectorContinuos and return the type Word, both defined in the _semantics module.
The dataset www.tensorflow.org/datasets/catalog/speech_commands will be used to develop and test the algorithm.

[Feature Request] Save trained classifier

Hi!

After fitting to find the best pipeline, you can save/load it with the appropriate methods, but this only saves the best pipeline found so far.

It would be nice to be able to save the fitted classifier; otherwise, the best pipeline has to be retrained every time, which is very time-consuming.

I have tried to do something similar with:

import pickle
best_pipe = classifier.best_pipeline_.sampler_.replay()
with open('./trained_classifier.pkl', 'wb') as f:
    pickle.dump(best_pipe, f)

But it fails to load it again later with a maximum recursion limit error. Raising the recursion limit doesn't solve it either.

What do you think? Would this be a desirable feature or is out of scope?

Best,

Low-level annotation for "SubsetOf"

The idea is to be able to annotate things like:

class A:
    def __init__(self, features:Subset('A','B','C')):
        self.features = features

And features is a list that can contain any valid subset from ['A', 'B', 'C'] (maybe with a kwarg option to allow/disallow the empty set).

There is something like this already in the class https://github.com/autogoal/autogoal/blob/main/autogoal/grammar/_cfg.py#L89 but it requires that the options are themselves production items, hence it tries to recurse into building a grammar for each sub-item. That is useful when what you want is a subset of objects that are themselves algorithms, but not when you want simply to select objects (strings) from a list. Hence, it has to be adapted to this "simpler" use case.
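A minimal sketch of the "simpler" use case described in this issue, written in plain Python. The `Subset` class, its `allow_empty` keyword, and the `sample` and `validate` methods are hypothetical; this is not how AutoGOAL's grammar module implements its annotations.

```python
import random


class Subset:
    """Hypothetical annotation: a list holding any subset of fixed options."""

    def __init__(self, *options, allow_empty=True):
        if not options:
            raise ValueError("Subset needs at least one option")
        self.options = list(options)
        self.allow_empty = allow_empty

    def sample(self, rng=None):
        """Include each option with probability 1/2, keeping the original order."""
        rng = rng or random.Random()
        while True:
            chosen = [opt for opt in self.options if rng.random() < 0.5]
            # Resample when the empty set was drawn but is not allowed.
            if chosen or self.allow_empty:
                return chosen

    def validate(self, value):
        """Check that `value` is a duplicate-free subset of the options."""
        items = list(value)
        if not items and not self.allow_empty:
            return False
        return len(set(items)) == len(items) and set(items) <= set(self.options)


class A:
    def __init__(self, features: Subset("A", "B", "C")):
        self.features = features


# Read the annotation back and use it to build a valid instance.
annotation = A.__init__.__annotations__["features"]
a = A(annotation.sample(random.Random(42)))
print(a.features)
```

Because the options are plain values, nothing here recurses into building a grammar for each item, which is the difference from the existing class that the issue points out.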

Integration for SparseML

Add integration for sparsification algorithms using SparseML library.

Enrich documentation

Hello AutoGoal Team!

Congrats for the project! It looks amazing!

I was tinkering around with the library and have several considerations:

The documentation is still a bit rough. Are you planning on growing it? For example, the only way (that I am aware of) to tell a ConsoleLogger from a MemoryLogger is to read the source code; no description is provided in the API section.
Right now, examples are mixed into the web page. While this is nice for reading, it is less so for testing. I found the plain Python files inside the examples folder, but linking those directly from the examples web page would be great, just like Google does (a nice little button with a <full source code link>). What do you think?

I could happily contribute on those things :)

Best,

Cars dataset not existing

The uci_cars dataset isn't present in its folder.
Using:
from autogoal.datasets import cars

Gives us:
[Errno 2] No such file or directory: '/home/coder/autogoal/autogoal/datasets/data/uci_cars.zip'

Refactor examples to use the RichLogger

Error when saving models

This error appears when saving the pipeline's models

PicklingError Traceback (most recent call last)
in ()
1 with open('model_reviews.pkl', 'wb') as f:
----> 2 model.save(f)
3
4 print('Done!')

/usr/local/lib/python3.7/dist-packages/autogoal/ml/_automl.py in save(self, fp)
102 """
103 self._check_fitted()
--> 104 pickle.Pickler(fp).dump(self)
105
106 @classmethod

PicklingError: Can't pickle <class 'abc.SeqAlgorithm[TabTokenizer]'>: attribute lookup SeqAlgorithm[TabTokenizer] on abc failed
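One plausible reading of this error is that the wrapper class is created at runtime under a name (`SeqAlgorithm[TabTokenizer]`) that cannot be looked up as a module attribute, which is how `pickle` locates classes. The sketch below reproduces that failure with the standard library only. The factory `make_seq_algorithm`, the `REGISTRY` dictionary, and the `to_state`/`from_state` workaround are assumptions for illustration, not AutoGOAL's internals or its actual fix.

```python
import pickle


class TabTokenizer:
    """Stand-in for the real tokenizer; splits text on a separator."""

    def __init__(self, sep="\t"):
        self.sep = sep

    def run(self, text):
        return text.split(self.sep)


# Hypothetical registry so a wrapper can be rebuilt from a plain name.
REGISTRY = {"TabTokenizer": TabTokenizer}


def make_seq_algorithm(inner_cls):
    """Build a wrapper class at runtime that applies `inner_cls` to each item."""

    def __init__(self, **params):
        self.params = params
        self.inner = inner_cls(**params)

    def run(self, items):
        return [self.inner.run(item) for item in items]

    def to_state(self):
        # Only built-in types, so pickle never has to look up the class.
        return {"inner": inner_cls.__name__, "params": self.params}

    name = f"SeqAlgorithm[{inner_cls.__name__}]"
    return type(name, (), {"__init__": __init__, "run": run, "to_state": to_state})


def from_state(state):
    """Rebuild the wrapper from the dictionary produced by `to_state`."""
    return make_seq_algorithm(REGISTRY[state["inner"]])(**state["params"])


algorithm = make_seq_algorithm(TabTokenizer)(sep="\t")

try:
    pickle.dumps(algorithm)
    error = None
except pickle.PicklingError as exc:
    error = exc  # the generated class name cannot be resolved on its module

# Workaround: pickle a plain description and rebuild the object on load.
restored = from_state(pickle.loads(pickle.dumps(algorithm.to_state())))
print(type(restored).__name__, restored.run(["a\tb"]))
```

Serializing a description of the object rather than the object itself sidesteps the class lookup, at the cost of keeping a registry of known inner classes.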

Dev environment tests fail

Hi,

To get started with #23, I created the dev environment following the guides in the README. Unfortunately, the tests fail.
Steps to reproduce:

Fork the project
Clone the forked project
make pull
make shell
make test (inside the docker container)

Error:

========================================================================================================== short test summary info ==========================================================================================================
FAILED tests/contrib/test_contrib.py::test_create_grammar_for_generated_class[CountVectorizerNoTokenize] - ValueError: Cannot find compatible implementations for interface <class 'types.Algorithm[List(Word()), List(Word())]'>
======================================================================================== 1 failed, 166 passed, 134 deselected, 7 warnings in 30.00s =========================================================================================
makefile:98: recipe for target 'test' failed
make: *** [test] Error 1

Any idea what went wrong?

Best,

Add support for RL based on AlphaZero for 2-Player Board Games with Perfect Information

Create a mechanism to define the rules of 2-player board games with perfect information, such as Tic-Tac-Toe, Hex, Blue, Chess, Go, etc., and support finding an "expert"-level DNN model trained from a completely blank state, without any input data other than the game rules themselves. The model training would be based on the AlphaZero algorithm, and the amount of time and memory available for training must also be given as input parameters.

(version 0.6.0) Error when saving models (other)

PicklingError Traceback (most recent call last)
in ()
1 with open('model_reviews.pkl', 'wb') as f:
----> 2 model.save(f)
3
4 print('Done!')

/usr/local/lib/python3.7/dist-packages/autogoal/ml/_automl.py in save(self, fp)
102 """
103 self._check_fitted()
--> 104 pickle.Pickler(fp).dump(self)
105
106 @classmethod

PicklingError: Can't pickle <class 'abc.SeqAlgorithm[WhitespaceTokenizer]'>: attribute lookup SeqAlgorithm[WhitespaceTokenizer] on abc failed

Speech recognition with hmmlearn

Add some of the functionality of the hmmlearn library for using Hidden Markov Models (HMM) in speech recognition. Also add algorithms to extract features from audio data that can be used as input to HMM algorithms.

Just a typo

This is easy to fix. I just noticed a typo in the README.

User Guide: Step-by-step showcase of everything you need to know to use AuoGOAL.

Pull request: #115

Named Entity Detection

Add spaCy algorithms to find entities in unstructured text, based on spaCy's NLP algorithms and using the en_core_web_sm model to process texts and search for entities.

Check pythonic style in CI GitHub Action

Add and force the use of formatting and style rules using flake8, black, isort, etc.

Refactor logging and callbacks

The problem

Right now we have a Logger class that receives callbacks from SearchAlgorithm. Implementations of this include logging to terminal and files, as well as saving the intermediate steps. The Logger name is misleading. This is actually a Callback since logging is just one of the things that could be done.

Likewise, there are many things that can be implemented as callbacks and are now hard-coded, e.g., the early stopping criteria, model serialization, etc.

Finally, we already have a pretty nice logging system implemented with rich, but many places inside the library are not aware of this mechanism. There are lots of prints and custom progress bars around there.

Proposed solution:

Refactor the Logger class to be called Callback and move it to its own namespace.
Refactor all the custom print and progress bars to use the new logging system in autogoal.logging and rich progress bars.
Implement as callbacks all the hard-coded behaviour inside SearchAlgorithm that can be extracted, such as early stopping, saving the best model and saving the search algorithm progress.
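The proposal can be sketched in plain Python as follows. The hook names (`begin`, `eval_solution`, `update_best`, `should_stop`), the `EarlyStopping` and `History` classes, and the `run_search` loop are all hypothetical; they show the shape of the refactoring, not the interface AutoGOAL ended up with.

```python
class Callback:
    """Hypothetical base class: every hook is optional and does nothing."""

    def begin(self):
        pass

    def eval_solution(self, solution, fitness):
        pass

    def update_best(self, solution, fitness):
        pass

    def should_stop(self):
        return False


class EarlyStopping(Callback):
    """Stop after `patience` consecutive evaluations without improvement."""

    def __init__(self, patience):
        self.patience = patience
        self.stale = 0

    def eval_solution(self, solution, fitness):
        self.stale += 1

    def update_best(self, solution, fitness):
        self.stale = 0

    def should_stop(self):
        return self.stale >= self.patience


class History(Callback):
    """Record every improvement, e.g. as a basis for saving the best model."""

    def __init__(self):
        self.best = []

    def update_best(self, solution, fitness):
        self.best.append((solution, fitness))


def run_search(candidates, fitness_fn, callbacks):
    """Toy search loop that delegates all side behaviour to callbacks."""
    best, best_fn = None, float("-inf")
    for callback in callbacks:
        callback.begin()
    for solution in candidates:
        fn = fitness_fn(solution)
        for callback in callbacks:
            callback.eval_solution(solution, fn)
        if fn > best_fn:
            best, best_fn = solution, fn
            for callback in callbacks:
                callback.update_best(solution, fn)
        if any(callback.should_stop() for callback in callbacks):
            break
    return best, best_fn


history = History()
result = run_search([1, 3, 2, 2, 2, 10], lambda x: x, [EarlyStopping(3), history])
print(result, history.best)
```

With this layout, logging is just one more `Callback` subclass, and the search loop itself no longer needs to know about stopping criteria or serialization.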

Fix keras CRF import error

Add Text classification to autogoal using fastText from Facebook's AI Research (FAIR) lab.

Integrate Data Augmentations with AugLy

AugLy is a data augmentation library that currently supports four modalities (audio, image, text and video) and over 100 augmentations.

Objective: Integrate AugLy into the AutoGOAL pipeline.

PyPI Library - Augly

Add GAN

Add rules and algorithms to handle GANs in AutoGOAL.

Add Algorithm for Image generation using GAN

Automatic release to PyPI and Docker Hub

The purpose is to automatically build a Python package with Poetry and publish it to PyPI on every release. Also, create a custom image on Docker Hub.

Add Text-To-Speech Model

Integrate Text-To-Speech Model using the PyPI library TTS.

Add configuration for `pip install autogoal[all]`

The installation command 'pip install autogoal[all]' misses sklearn and tqdm packages

Support for GANs module

Hey there,
it's been a while since I talked about this idea of integrating GANs with AutoGoal.
It is a very simple idea for integrating a GANs module with AutoGoal, which would look like this:

from autogoal.ml import GANS

# initialize the model
gan = GANS.dcgan()

# load the training data
data = gan.load_data(use_mnist=True)

# get samples from the data object
samples = gan.get_samples(data, n_samples=5)

# train the model
gan.fit(train_ds=data)

# get generated samples from the model
generated_samples = gan.generate_samples(n_samples=5)

First, I am thinking of starting with a simple GAN architecture and then slowly adding more GAN architectures such as Pix2Pix, WGAN, CycleGAN...

Support I need:
I have the raw code ready to run GANs. I need your help turning this raw code into a class and then into a nice module that can be easily accessed from AutoGoal as shown above.

@apiad, I hope you like it. I am also hoping to get your support in building this module.

Looking forward to your reply!

Write design and contributing guides

load_cfg

Since a generate_cfg function exists, a matching load_cfg function is needed.

Move the API documentation to use `illiterate`

SeqAlgorithm problems

I am currently working on my graduation course's final project and introducing new methods to deal with Portuguese text representation in Autogoal. I already had some working code, but everything stopped working when I updated the autogoal framework a couple of weeks ago.

I correctly renamed all the Lists to Seq algorithms, changed all other types that got new names and started using the Supervised type in the input. Unfortunately, I still couldn't find valid pipelines with the small registry that I was using (it was the same code and same registry that was working some weeks before).

So I started debugging and noticed that only the first algorithm of the graph was being wrapped by the SeqAlgorithm. In line 494 of algorithm.py the traversed algorithms are the ones in registry and not the ones in the pool with wrapped SeqAlgorithms. Changing that line to iterate over pool kind of solved the issue.

Now the pipelines were found, but whenever the sampler tried to initialize an algorithm that was wrapped by the SeqAlgorithm and had parameters in the __init__ method, an error was thrown because those parameters were not passed.

This was a trickier one to solve. I noticed that in line 315 of _cfg.py the signature being retrieved for SeqAlgorithms did not contain the inner class __init__ parameters. I solved this one by creating a new method get_inner_signature in SeqAlgorithms that returns the inner class's __init__ signature. So back at line 315 of _cfg.py I could check if the class had that method implemented and, if so, use it instead of the __init__ signature.
The problem now is that the pipelines I get after these changes use a lot of unnecessary SeqAlgorithms that don't make sense.

So now I am in a spot where I do not know whether I was doing something wrong in the first place and these changes were not actually corrections, or whether they were really necessary and I still need to dig deeper. If someone could point me in the right direction, I would be very grateful.
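The signature workaround described in this issue can be sketched with the standard `inspect` module. Only the method name `get_inner_signature` comes from the issue; `Tokenizer`, `wrap_in_seq`, and `constructor_signature` are hypothetical stand-ins, and the real code in `_cfg.py` may look quite different.

```python
import inspect


class Tokenizer:
    """Stand-in inner algorithm with constructor parameters."""

    def __init__(self, lowercase: bool = True, min_length: int = 1):
        self.lowercase = lowercase
        self.min_length = min_length

    def run(self, text):
        words = text.lower().split() if self.lowercase else text.split()
        return [w for w in words if len(w) >= self.min_length]


def wrap_in_seq(inner_cls):
    """Hypothetical wrapper that applies `inner_cls` to every item of a sequence."""

    class SeqWrapper:
        def __init__(self, *args, **kwargs):
            # The wrapper's own signature hides the inner parameters.
            self.inner = inner_cls(*args, **kwargs)

        @classmethod
        def get_inner_signature(cls):
            return inspect.signature(inner_cls.__init__)

        def run(self, items):
            return [self.inner.run(item) for item in items]

    return SeqWrapper


def constructor_signature(cls):
    """Prefer the inner signature when the class exposes one."""
    if hasattr(cls, "get_inner_signature"):
        return cls.get_inner_signature()
    return inspect.signature(cls.__init__)


SeqTokenizer = wrap_in_seq(Tokenizer)
params = [
    name
    for name in constructor_signature(SeqTokenizer).parameters
    if name != "self"
]
print(params)
```

A sampler that reads `constructor_signature` instead of `__init__` directly sees the inner parameters and can therefore pass them when instantiating the wrapped algorithm.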

Add Stacking ensemble

Adding this issue to keep track of this feature.

Add CLI option to train / evaluate / save pipeline directly from the terminal

What I envision is something like this:

$ python -m autogoal automl fit --input-type "List(Sentence())" --output-type "CategoricalVector()" \
    --file CSV_FILE --target COLUMN_NAME --model MODEL_NAME

This would instantiate an AutoML class, load the corresponding CSV file, and run fit, using the COLUMN_NAME as the target. Once finished, it would dump the trained pipeline to MODEL_NAME.pickle or something similar.

Then we could do something like:

$ python -m autogoal automl predict --file CSV_FILE --model MODEL_NAME --output OUTPUT_FILE

This would unpickle MODEL_NAME.pickle and execute it on each element of the CSV_FILE and write the results to OUTPUT_FILE.

Additionally, we could have options like --timeout, --memory, etc., to configure the AutoML parameters.
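A possible shape for this command-line interface, sketched with the standard `argparse` module. The option names are taken from the commands envisioned in this issue; the parser layout, the `build_parser` function, and the integer types for `--timeout` and `--memory` are assumptions, and the CLI that AutoGOAL actually ships may be built differently.

```python
import argparse


def build_parser():
    """Build a parser for `autogoal automl fit` and `autogoal automl predict`."""
    parser = argparse.ArgumentParser(prog="autogoal")
    commands = parser.add_subparsers(dest="command", required=True)

    automl = commands.add_parser("automl", help="train and use AutoML models")
    actions = automl.add_subparsers(dest="action", required=True)

    fit = actions.add_parser("fit", help="search for a pipeline and save it")
    fit.add_argument("--input-type", required=True)
    fit.add_argument("--output-type", required=True)
    fit.add_argument("--file", required=True, help="CSV file with the data")
    fit.add_argument("--target", required=True, help="name of the target column")
    fit.add_argument("--model", required=True, help="name of the saved model")
    fit.add_argument("--timeout", type=int, default=None)
    fit.add_argument("--memory", type=int, default=None)

    predict = actions.add_parser("predict", help="apply a saved pipeline")
    predict.add_argument("--file", required=True)
    predict.add_argument("--model", required=True)
    predict.add_argument("--output", required=True)
    return parser


example = build_parser().parse_args(
    [
        "automl", "fit",
        "--input-type", "List(Sentence())",
        "--output-type", "CategoricalVector()",
        "--file", "reviews.csv",
        "--target", "label",
        "--model", "reviews",
    ]
)
print(example.action, example.input_type, example.model)
```

The parsed type strings would still have to be turned into AutoGOAL types before instantiating AutoML; that step is left out here.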

Add CLI option for inspecting the contrib database

The idea is to have something like

$ python -m autogoal contrib list

This would show all the algorithms currently available under contrib that can be imported with the currently installed dependencies, and warnings for all the dependencies that need to be installed.

The algorithms can be obtained by calling autogoal.contrib.find_classes().

$ python -m autogoal contrib search [--include KEYWORD] [--exclude KEYWORD] [--input INPUT_TYPE] [--output OUTPUT_TYPE]

This would list the algorithms that match the given keywords (e.g., "Embedding") and optionally matching the input and output types.

The algorithms can be obtained by calling autogoal.contrib.find_classes(include="...", exclude="..."). For matching input and output types, some coding would be necessary, but I think the Interface class solves most of the problem.
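The "some coding" needed for the type filter might look like the sketch below, which reads the annotations of each algorithm's `run` method. Everything here is hypothetical: the toy types and algorithms, the `search` function, and the exact-match comparison, which a real implementation would replace with AutoGOAL's own notion of type compatibility.

```python
import inspect


class Sentence:
    """Hypothetical semantic type for a piece of text."""


class Vector:
    """Hypothetical semantic type for a dense vector."""


class TfIdfEmbedding:
    def run(self, input: Sentence) -> Vector:
        raise NotImplementedError


class CountEmbedding:
    def run(self, input: Sentence) -> Vector:
        raise NotImplementedError


class SentenceLength:
    def run(self, input: Sentence) -> int:
        raise NotImplementedError


ALGORITHMS = [TfIdfEmbedding, CountEmbedding, SentenceLength]


def search(classes, include=None, exclude=None, input_type=None, output_type=None):
    """Filter classes by keyword in the name and by `run` annotations."""
    results = []
    for cls in classes:
        name = cls.__name__.lower()
        if include and include.lower() not in name:
            continue
        if exclude and exclude.lower() in name:
            continue
        signature = inspect.signature(cls.run)
        # First parameter after `self` is taken as the algorithm's input.
        first_arg = list(signature.parameters.values())[1]
        if input_type is not None and first_arg.annotation is not input_type:
            continue
        if output_type is not None and signature.return_annotation is not output_type:
            continue
        results.append(cls)
    return results


for cls in search(ALGORITHMS, include="Embedding", output_type=Vector):
    print(cls.__name__)
```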

Example `movie_reviews` not working

Hello,

Following the sklearn integration example from the user guide I encountered some errors.
At first,

python docs/examples/sklearn_simple_grammar.py
<Pipeline>      := Pipeline (vectorizer=<Vectorizer>, decomposer=<Decomposer>, classifier=<Classifier>)
<Vectorizer>    := <Count> | <TfIdf>
<Count>         := Count (ngram=<Count_ngram>)
<Count_ngram>   := discrete (min=1, max=3)
<TfIdf>         := TfIdf (ngram=<TfIdf_ngram>, use_idf=<TfIdf_use_idf>)
<TfIdf_ngram>   := discrete (min=1, max=3)
<TfIdf_use_idf> := boolean ()
<Decomposer>    := <NoDec> | <SVD>
<NoDec>         := NoDec ()
<SVD>           := SVD (n=<SVD_n>)
<SVD_n>         := discrete (min=50, max=200)
<Classifier>    := <LR> | <SVM> | <DT>
<LR>            := LR (penalty=<LR_penalty>, reg=<LR_reg>)
<LR_penalty>    := categorical (options=['l1', 'l2'])
<LR_reg>        := continuous (min=0.1, max=10)
<SVM>           := SVM (kernel=<SVM_kernel>, reg=<SVM_reg>)
<SVM_kernel>    := categorical (options=['rbf', 'linear', 'poly'])
<SVM_reg>       := continuous (min=0.1, max=10)
<DT>            := DT (criterion=<DT_criterion>)
<DT_criterion>  := categorical (options=['gini', 'entropy'])
Pipeline(classifier=LR(penalty='l2', reg=3.5043071948123306),
         decomposer=NoDec(), vectorizer=TfIdf(ngram=1, use_idf=True))
Pipeline(classifier=SVM(kernel='poly', reg=9.486662105103457),
         decomposer=NoDec(), vectorizer=Count(ngram=3))
Pipeline(classifier=LR(penalty='l2', reg=3.7888734971195985),
         decomposer=SVD(n=117), vectorizer=Count(ngram=2))
Pipeline(classifier=DT(criterion='gini'), decomposer=SVD(n=131),
         vectorizer=Count(ngram=2))
Pipeline(classifier=LR(penalty='l1', reg=1.2230686432095683),
         decomposer=NoDec(), vectorizer=TfIdf(ngram=3, use_idf=False))
Pipeline(classifier=DT(criterion='gini'), decomposer=SVD(n=111),
         vectorizer=Count(ngram=3))
Pipeline(classifier=SVM(kernel='rbf', reg=6.572569333421884),
         decomposer=SVD(n=89), vectorizer=Count(ngram=1))
Pipeline(classifier=DT(criterion='entropy'), decomposer=NoDec(),
         vectorizer=Count(ngram=1))
Pipeline(classifier=LR(penalty='l1', reg=7.546789410143803),
         decomposer=SVD(n=140), vectorizer=Count(ngram=1))
Pipeline(classifier=LR(penalty='l1', reg=4.0054809098376065),
         decomposer=SVD(n=113), vectorizer=Count(ngram=2))
Traceback (most recent call last):
  File "docs/examples/sklearn_simple_grammar.py", line 298, in <module>
    logger = ProgressLogger(log_solutions=True)
TypeError: ProgressLogger() takes no arguments

Then, removing the parameter to the ProgressLogger, I get the following error:

python docs/examples/sklearn_simple_grammar.py
<Pipeline>      := Pipeline (vectorizer=<Vectorizer>, decomposer=<Decomposer>, classifier=<Classifier>)
<Vectorizer>    := <Count> | <TfIdf>
<Count>         := Count (ngram=<Count_ngram>)
<Count_ngram>   := discrete (min=1, max=3)
<TfIdf>         := TfIdf (ngram=<TfIdf_ngram>, use_idf=<TfIdf_use_idf>)
<TfIdf_ngram>   := discrete (min=1, max=3)
<TfIdf_use_idf> := boolean ()
<Decomposer>    := <NoDec> | <SVD>
<NoDec>         := NoDec ()
<SVD>           := SVD (n=<SVD_n>)
<SVD_n>         := discrete (min=50, max=200)
<Classifier>    := <LR> | <SVM> | <DT>
<LR>            := LR (penalty=<LR_penalty>, reg=<LR_reg>)
<LR_penalty>    := categorical (options=['l1', 'l2'])
<LR_reg>        := continuous (min=0.1, max=10)
<SVM>           := SVM (kernel=<SVM_kernel>, reg=<SVM_reg>)
<SVM_kernel>    := categorical (options=['rbf', 'linear', 'poly'])
<SVM_reg>       := continuous (min=0.1, max=10)
<DT>            := DT (criterion=<DT_criterion>)
<DT_criterion>  := categorical (options=['gini', 'entropy'])
Pipeline(classifier=LR(penalty='l1', reg=5.0100641442497995),
         decomposer=NoDec(), vectorizer=Count(ngram=1))
Pipeline(classifier=DT(criterion='entropy'), decomposer=SVD(n=132),
         vectorizer=TfIdf(ngram=1, use_idf=True))
Pipeline(classifier=SVM(kernel='poly', reg=4.920899781533629),
         decomposer=NoDec(), vectorizer=TfIdf(ngram=1, use_idf=False))
Pipeline(classifier=SVM(kernel='poly', reg=6.2324098495719165),
         decomposer=SVD(n=168), vectorizer=TfIdf(ngram=3, use_idf=False))
Pipeline(classifier=LR(penalty='l2', reg=1.1722788593996094),
         decomposer=NoDec(), vectorizer=TfIdf(ngram=3, use_idf=True))
Pipeline(classifier=LR(penalty='l1', reg=2.6947947887881387),
         decomposer=SVD(n=70), vectorizer=TfIdf(ngram=3, use_idf=True))
Pipeline(classifier=SVM(kernel='poly', reg=6.555178265496012),
         decomposer=SVD(n=161), vectorizer=TfIdf(ngram=3, use_idf=True))
Pipeline(classifier=SVM(kernel='poly', reg=0.11487924961813162),
         decomposer=NoDec(), vectorizer=Count(ngram=3))
Pipeline(classifier=SVM(kernel='poly', reg=3.3642041546657793),
         decomposer=SVD(n=69), vectorizer=TfIdf(ngram=3, use_idf=True))
Pipeline(classifier=DT(criterion='entropy'), decomposer=NoDec(),
         vectorizer=Count(ngram=1))
Restricting memory to 4294967296

Current Gen 100%|██████████| 1/1 [00:00<00:00, 41.58 evals/s]
Best: 0.000   0%|▍         |    2/1000 [00:00<00:13, 78.01 evals/s]
Traceback (most recent call last):
  File "docs/examples/sklearn_simple_grammar.py", line 301, in <module>
    best_rand, fn_rand = random_search.run(1000, logger=logger)
  File "/data/gblanco/virtualenvs/race-experiments/lib/python3.8/site-packages/autogoal/search/_base.py", line 112, in run
    raise e from None
  File "/data/gblanco/virtualenvs/race-experiments/lib/python3.8/site-packages/autogoal/search/_base.py", line 105, in run
    fn = self._fitness_fn(solution)
  File "/data/gblanco/virtualenvs/race-experiments/lib/python3.8/site-packages/autogoal/utils/_process.py", line 75, in __call__
    return self.run_restricted(*args, **kwargs)
  File "/data/gblanco/virtualenvs/race-experiments/lib/python3.8/site-packages/autogoal/utils/_process.py", line 124, in run_restricted
    raise result
AttributeError: 'Pipeline' object has no attribute 'send'

Original Traceback (most recent call last):
  File "/data/gblanco/virtualenvs/race-experiments/lib/python3.8/site-packages/autogoal/utils/_process.py", line 40, in _restricted_function
    result = self.function(*args, **kwargs)
  File "/data/gblanco/virtualenvs/race-experiments/lib/python3.8/site-packages/autogoal/datasets/movie_reviews.py", line 37, in fitness_fn
    pipeline.send("train")
AttributeError: 'Pipeline' object has no attribute 'send'

Inspecting the code, I see that fitness_fn calls send on the pipeline. Isn't it a scikit-learn Pipeline? In that case, it has neither a send nor a run method.

What is the purpose of those send calls? Maybe notify the worker process?

Best,

Refactor unit-testing for all contrib imports

Refactor dependencies

Correctly separate dependencies into:

core (i.e., without any ML framework)
development (e.g., pytest, coverage, mkdocs, etc.)
extras (one group for each contrib module)

Extraction of meta-features for selection of clustering algorithms

The idea of meta-learning is to learn from experience already acquired on previous tasks. This technique, in the form of extracting meta-features from the dataset, can give us information about the most suitable clustering algorithm to use.

Serialization functionality for arbitrary pipelines

To fix the save/load functionality once and for all, we should add save and load methods to each class in the whole path from AutoML down to single algorithms. This way, libraries with custom serialization protocols (looking at you, tensorflow) can do their thing, and we can fall back to pickle for the most common cases.

We have to see if there is a way to present a transparent file-system-like interface to the pipelines, so they can create folders and files as they wish, but from the outside we only have a single file. Dunno if we can do this with zip files.
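The zip-file idea is feasible with the standard library. In the sketch below, each component writes whatever files it wants into its own "folder", while the caller only ever sees one file-like object. The `Folder` class, the toy `Scaler` and `Vocabulary` components, and the `steps.txt` index are assumptions for illustration, not AutoGOAL's serialization format.

```python
import io
import pickle
import zipfile


class Folder:
    """Hypothetical view of a sub-directory inside a single zip archive."""

    def __init__(self, archive, prefix=""):
        self.archive = archive
        self.prefix = prefix

    def subfolder(self, name):
        return Folder(self.archive, f"{self.prefix}{name}/")

    def write(self, name, data):
        self.archive.writestr(self.prefix + name, data)

    def read(self, name):
        return self.archive.read(self.prefix + name)


class Scaler:
    """Component that falls back to pickle for its state."""

    def __init__(self, factor):
        self.factor = factor

    def save(self, folder):
        folder.write("state.pkl", pickle.dumps(self.factor))

    @classmethod
    def load(cls, folder):
        return cls(pickle.loads(folder.read("state.pkl")))


class Vocabulary:
    """Component with its own format: one word per line in a text file."""

    def __init__(self, words):
        self.words = words

    def save(self, folder):
        folder.write("vocab.txt", "\n".join(self.words))

    @classmethod
    def load(cls, folder):
        return cls(folder.read("vocab.txt").decode("utf-8").split("\n"))


class Pipeline:
    STEP_TYPES = {"Scaler": Scaler, "Vocabulary": Vocabulary}

    def __init__(self, steps):
        self.steps = steps

    def save(self, fp):
        with zipfile.ZipFile(fp, "w") as archive:
            root = Folder(archive)
            root.write("steps.txt", "\n".join(type(s).__name__ for s in self.steps))
            for index, step in enumerate(self.steps):
                step.save(root.subfolder(f"step_{index}"))

    @classmethod
    def load(cls, fp):
        with zipfile.ZipFile(fp) as archive:
            root = Folder(archive)
            names = root.read("steps.txt").decode("utf-8").split("\n")
            return cls(
                [
                    cls.STEP_TYPES[name].load(root.subfolder(f"step_{index}"))
                    for index, name in enumerate(names)
                ]
            )


buffer = io.BytesIO()
Pipeline([Scaler(2.5), Vocabulary(["hello", "world"])]).save(buffer)
buffer.seek(0)
restored = Pipeline.load(buffer)
print(restored.steps[0].factor, restored.steps[1].words)
```

A component backed by a library with its own save format could write its files through the same `Folder` interface, which is the point of giving every class its own `save` and `load`.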

Add Stacking Ensemble

It is an algorithm that receives two "level 0" algorithms and one "level 1" algorithm and returns an algorithm that combines the two level-0 ones.
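A minimal sketch of stacking in plain Python, with two level-0 models and one level-1 model. The toy classes `ThresholdClassifier` and `VoteTable` and the `Stacking` interface are assumptions for illustration. For brevity the level-1 model is fitted on in-sample level-0 predictions, whereas practical stacking usually relies on out-of-fold predictions to limit overfitting.

```python
from collections import Counter, defaultdict


class ThresholdClassifier:
    """Level-0 model: predicts 1 when one feature exceeds its training mean."""

    def __init__(self, feature):
        self.feature = feature
        self.threshold = 0.0

    def fit(self, X, y):
        self.threshold = sum(row[self.feature] for row in X) / len(X)
        return self

    def predict(self, X):
        return [int(row[self.feature] > self.threshold) for row in X]


class VoteTable:
    """Level-1 model: maps each tuple of level-0 outputs to its most common label."""

    def fit(self, X, y):
        counts = defaultdict(Counter)
        for row, label in zip(X, y):
            counts[tuple(row)][label] += 1
        self.table = {key: c.most_common(1)[0][0] for key, c in counts.items()}
        self.default = Counter(y).most_common(1)[0][0]
        return self

    def predict(self, X):
        return [self.table.get(tuple(row), self.default) for row in X]


class Stacking:
    """Combine two level-0 models through a level-1 model."""

    def __init__(self, first, second, final):
        self.level0 = [first, second]
        self.final = final

    def _meta_features(self, X):
        # One column per level-0 model, transposed into one row per sample.
        columns = [model.predict(X) for model in self.level0]
        return [list(row) for row in zip(*columns)]

    def fit(self, X, y):
        for model in self.level0:
            model.fit(X, y)
        self.final.fit(self._meta_features(X), y)
        return self

    def predict(self, X):
        return self.final.predict(self._meta_features(X))


# The label is 1 only when both features are high, so neither
# level-0 model can solve the task alone.
X = [[0, 0], [0, 1], [1, 0], [1, 1]]
y = [0, 0, 0, 1]
ensemble = Stacking(ThresholdClassifier(0), ThresholdClassifier(1), VoteTable())
print(ensemble.fit(X, y).predict(X))
```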

Add integration tests based on the examples

We need a final layer of testing that consists of executing the code examples, at least for one generation, to guarantee that no major disruptions are being caused. This would not run on GitHub Actions (they are very costly) but should be run prior to any release.

Add Image-To-Text Model

Integrate Image-To-Text Model using the PyPI library pytesseract.

Change the dataset storage to a folder outside the project

The problem:
Right now the call to autogoal.download(dataset) stores each dataset inside a data folder in the project source.
This is problematic for many reasons: First, data shouldn't be in the project source folder (even if .gitignored). Second, when autogoal is installed with pip it lands in site-packages and that location is often write-protected against non-root users.

Proposed solution:
Similar to what nltk or gensim do, simply create a .autogoal folder in $HOME and store everything there.
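The proposed solution can be sketched with `pathlib`. The `.autogoal` folder in the home directory comes from this issue; the `datasets` sub-folder, the function names, and the optional `home` argument (which only exists to make the code easy to test) are assumptions.

```python
from pathlib import Path


def datasets_dir(home=None):
    """Return (and create if needed) the folder where datasets are stored."""
    base = Path(home) if home is not None else Path.home()
    path = base / ".autogoal" / "datasets"
    path.mkdir(parents=True, exist_ok=True)
    return path


def dataset_path(name, home=None):
    """Location of one dataset file inside the per-user storage folder."""
    return datasets_dir(home) / name
```

Calling `dataset_path("uci_cars.zip")` would then resolve to a file under the user's home directory, which is writable even when the package itself is installed in a protected site-packages folder.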

Add Deep Face Recognition and Facial Attribute Analysis

Add Text Translation to autogoal using Transformer

From the Transformers Repository:
State-of-the-art Natural Language Processing for PyTorch and TensorFlow 2.0.

Transformers provides thousands of pretrained models to perform tasks on texts such as classification, information extraction, question answering, summarization, translation, text generation and more in over 100 languages. Its aim is to make cutting-edge NLP easier to use for everyone.

Transformers provides APIs to quickly download and use those pretrained models on a given text, fine-tune them on your own datasets and then share them with the community on their model hub. At the same time, each python module defining an architecture is fully standalone and can be modified to enable quick research experiments.

Transformers is backed by the three most popular deep learning libraries — Jax, PyTorch and TensorFlow — with a seamless integration between them. It's straightforward to train your models with one before loading them for inference with the other.

Add DeepMatcher to Autogoal

From the project page:

"DeepMatcher is a Python package for performing entity and text matching using deep learning. It provides built-in neural networks and utilities that enable you to train and apply state-of-the-art deep learning models for entity matching in less than 10 lines of code. The models are also easily customizable - the modular design allows any subcomponent to be altered or swapped out for a custom implementation."

@dgc9715 and I would like to add this to Autogoal.

Correct handling of dataset download error in demo

When a dataset has not been downloaded, the demo raises an error. The user should be better informed; perhaps the demo could even show a button to download the dataset.
