Hello, I would like to ask about the training performance reported o

Training estimation report about gama HOT 5 CLOSED

openml-labs commented on August 17, 2024

Training estimation report

from gama.

Comments (5)

PGijsbers commented on August 17, 2024

Hi!

The output

accuracy: 0.951048951048951
log loss: 0.1111237013184977

from this example is given by the last two lines of the corresponding script:

print("accuracy:", accuracy_score(y_test, label_predictions))
print("log loss:", log_loss(y_test, probability_predictions))

In the example, the reported performance is a test performance, not a training performance, as the test set was held out during training.

Does that answer the question?
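To make the distinction between training and test performance concrete, here is a minimal sketch in plain Python. It does not use GAMA or the example script: the 1-nearest-neighbour classifier, the helper names, and the data are all hypothetical and chosen only for illustration. Because the toy model memorises its training points, its score on the training data is perfect, while the held-out points give a more honest estimate.

```python
# Toy illustration only: a 1-nearest-neighbour classifier on made-up 1-D data.
# None of these names come from GAMA or the example script.

def predict_1nn(train_X, train_y, x):
    """Return the label of the training point closest to x."""
    nearest = min(range(len(train_X)), key=lambda i: abs(train_X[i] - x))
    return train_y[nearest]

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Hypothetical training data with some label noise around 2.0 and 3.0.
X_train = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
y_train = [0, 0, 1, 0, 1, 1]

# Hypothetical held-out data that the model never sees while "training".
X_test = [0.4, 1.9, 3.1, 4.6]
y_test = [0, 0, 1, 1]

train_pred = [predict_1nn(X_train, y_train, x) for x in X_train]
test_pred = [predict_1nn(X_train, y_train, x) for x in X_test]

train_accuracy = accuracy(y_train, train_pred)  # scored on data the model memorised
test_accuracy = accuracy(y_test, test_pred)     # scored on held-out data

print("training accuracy:", train_accuracy)
print("test accuracy:", test_accuracy)
```

The gap between the two numbers is why a score computed on held-out data is reported as test performance rather than training performance.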

iXanthos commented on August 17, 2024

Ah, yes, I am sorry about that; I got confused. I meant the training performance, not the test performance.

Thank you.

PGijsbers commented on August 17, 2024

Sorry for the slow replies. To get the training score, just call score with the training data: automl.score(X_train, y_train). If you want to access the 5-fold cross-validation score that was found during the search process, it's a little trickier (making it more easily accessible is in the works). Assuming you have the (default) BestFit postprocessing:

max(map(lambda ind: ind.fitness.values, automl._final_pop))

should output a (score, length) tuple, e.g. (-0.0908208740423977, -2), where the first value is the score for the given metric. The score is negated if the metric is a loss, because internally the scorers follow scikit-learn's convention of always ensuring that "bigger is better".
I hope this answers the question!
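As a rough sketch of how such a tuple could be read, the snippet below uses plain Python tuples as stand-ins for ind.fitness.values. Only (-0.0908208740423977, -2) comes from the comment above; the other tuples are hypothetical. Python compares tuples element by element, so max picks the highest score first and uses the second value only to break ties.

```python
# Stand-ins for `ind.fitness.values`; only the second tuple appears in the thread,
# the rest are made up for illustration.
fitness_values = [
    (-0.25, -3),
    (-0.0908208740423977, -2),
    (-0.0908208740423977, -4),
    (-0.5, -1),
]

# Tuples compare lexicographically: first by score, then by the second entry.
best_score, best_length = max(fitness_values)

# Assuming the metric was a loss, flip the sign to recover the usual value.
cv_loss = -best_score
# The second entry is negated as well, so flip it to get a positive count.
pipeline_steps = -best_length

print("cross-validation loss:", cv_loss)
print("pipeline steps:", pipeline_steps)
```

If the metric were one where higher is already better (accuracy, for example), the score would presumably be stored without negation and should be read as-is.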

iXanthos commented on August 17, 2024

Ah, I see, great answer. Yes, the CV score is what I was after, as this is the training performance estimate. What is the length in the output tuple, and why is it negative? Also, I came across an error when using automl.score(). Should I post it here or open a new issue?

Thank you for your help

PGijsbers commented on August 17, 2024

Please open a new issue with a minimal working example included 👍
Length is the number of steps in the pipeline (not including imputation and categorical encoding). Internally, all scores are represented so that they can be maximized, which is why quantities that should be minimized (loss, length) are negated.
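The negation described here can be sketched in the forward direction too. The helper below is hypothetical and not part of GAMA: it builds a (score, length) tuple in which larger is always better, so a single max over such tuples prefers lower loss and, among equal losses, the shorter pipeline. The candidate names and numbers are made up.

```python
def to_internal_fitness(metric_value, n_steps, greater_is_better):
    """Build a (score, length) tuple in which larger is always better.

    Illustrative helper only; it is not a GAMA function.
    """
    score = metric_value if greater_is_better else -metric_value
    return (score, -n_steps)  # fewer steps is better, so length is negated

# Hypothetical candidates: name -> (log loss, number of pipeline steps).
candidates = {"short": (0.10, 2), "long": (0.10, 5), "worse": (0.30, 1)}

internal = {
    name: to_internal_fitness(loss, steps, greater_is_better=False)
    for name, (loss, steps) in candidates.items()
}

# Because every entry is "bigger is better", max alone ranks the candidates.
best_name = max(internal, key=internal.get)
print(best_name, internal[best_name])

# A metric that is already "bigger is better" keeps its sign.
accuracy_fitness = to_internal_fitness(0.95, 3, greater_is_better=True)
```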
