
evaluationengine's Introduction

License

OpenML: Open Machine Learning

Welcome to the OpenML GitHub page! 🎉


Who are we?

We are a group of people who are excited about open science, open data and machine learning. We want to make machine learning and data analysis simple, accessible, collaborative and open with an optimal division of labour between computers and humans.

What is OpenML?

Want to learn about OpenML or get involved? Please do and get in touch in case of questions or comments! 📨

OpenML is an online machine learning platform for sharing and organizing data, machine learning algorithms and experiments. It is designed to create a frictionless, networked ecosystem that you can readily integrate into your existing processes/code/environments, allowing people all over the world to collaborate and build directly on each other’s latest ideas, data and results, irrespective of the tools and infrastructure they happen to use.

As an open science platform, OpenML provides important benefits for the science community and beyond.

Benefits for Science

Many sciences have made significant breakthroughs by adopting online tools that help organize, structure and analyze scientific data. Indeed, any shared idea, question, observation or tool may be noticed by someone who has just the right expertise to spark new ideas, answer open questions, reinterpret observations or reuse data and tools in unexpected new ways. Therefore, sharing research results and collaborating online as a (possibly cross-disciplinary) team enables scientists to quickly build on and extend the results of others, fostering new discoveries.

Moreover, ever larger studies become feasible as a lot of data are already available. Questions such as “Which hyperparameter is important to tune?”, “Which is the best known workflow for analyzing this data set?” or “Which data sets are similar in structure to my own?” can be answered in minutes by reusing prior experiments, instead of spending days setting up and running new experiments.

Benefits for Scientists

Scientists can also benefit personally from using OpenML. For example, they can save time, because OpenML assists in many routine and tedious duties: finding data sets, tasks, flows and prior results, setting up experiments and organizing all experiments for further analysis. Moreover, new experiments are immediately compared to the state of the art without always having to rerun other people’s experiments.

Another benefit is that linking one’s results to those of others has a large potential for new discoveries (see, for instance, Feurer et al. 2015; Post et al. 2016; Probst et al. 2017), leading to more publications and more collaboration with other scientists all over the world.

Finally, OpenML can help scientists to reinforce their reputation by making their work (published or not) visible to a wide group of people and by showing how often one’s data, code and experiments are downloaded or reused in the experiments of others.

Benefits for Society

OpenML also provides a useful learning and working environment for students, citizen scientists and practitioners. Students and citizen scientists can easily explore the state of the art and work together with top minds by contributing their own algorithms and experiments. Teachers can challenge their students by letting them compete on OpenML tasks or by reusing OpenML data in assignments. Finally, machine learning practitioners can explore and reuse the best solutions for specific analysis problems, interact with the scientific community or efficiently try out many possible approaches.


Get involved

OpenML has grown into quite a big project. We could use many more hands to help us out 🔧.

  • You want to contribute?: Awesome! Check out our wiki page on how to contribute or get in touch. There may be unexpected ways in which you could help. We are open to any ideas.
  • You want to support us financially?: YES! Getting funding through conventional channels is very competitive, and we are happy about every small contribution. Please send an email to [email protected]!

GitHub organization structure

OpenML's code is distributed over different repositories to simplify development. Please see their individual READMEs and issue trackers if you would like to contribute. These are the most important ones:

  • openml/OpenML: The OpenML web application, including the REST API.
  • openml/openml-python: The Python API, to talk to OpenML from Python scripts (including scikit-learn).
  • openml/openml-r: The R API, to talk to OpenML from R scripts (including mlr).
  • openml/java: The Java API, to talk to OpenML from Java applications.
  • openml/openml-weka: The WEKA plugin, to talk to OpenML from the WEKA toolbox.

evaluationengine's People

Contributors

janvanrijn, joaquinvanschoren


evaluationengine's Issues

AutoCorrelation meta-feature: order of instances

I am looking into how AutoCorrelation is computed, and it looks like the order of instances matters for this meta-feature. Isn't this a bit strange? The order of instances is something that changes often. Or is this intended for time-based datasets? But shouldn't we then first order by the time column?
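
To illustrate the order dependence, here is a minimal sketch (not the engine's actual implementation) of a lag-1 autocorrelation over the target values in the order the instances appear; shuffling the rows changes the result:

// Minimal sketch, not the engine's actual code: lag-1 autocorrelation of the
// target values in the order the instances appear.
public class AutoCorrelationSketch {
    static double lag1AutoCorrelation(double[] target) {
        int n = target.length;
        double mean = 0.0;
        for (double v : target) mean += v;
        mean /= n;

        double num = 0.0, den = 0.0;
        for (int i = 0; i < n; i++) {
            den += (target[i] - mean) * (target[i] - mean);
            if (i < n - 1) num += (target[i] - mean) * (target[i + 1] - mean);
        }
        return num / den;
    }

    public static void main(String[] args) {
        double[] ordered  = {1, 2, 3, 4, 5, 6, 7, 8};
        double[] shuffled = {5, 1, 8, 3, 7, 2, 6, 4};
        System.out.println(lag1AutoCorrelation(ordered));   // strongly positive for the sorted order
        System.out.println(lag1AutoCorrelation(shuffled));  // very different after shuffling the rows
    }
}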

Compute feature distributions for nominal attributes

Currently the website shows no information on nominal attributes in a dataset with a numeric target.

See: https://www.openml.org/d/41022
Season, Series,... have no information on their distribution.

Looking at the code, we could extend models.AttributeStatistics
with a new function that returns something of the form
[[v1,v2,v3],[123],[234],[354]], i.e. the list of possible values and their corresponding counts.
That way they are in the same format as the class distributions.

Of course, something like [[v1,v2,v3],[123,234,354]] is also fine.
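
As a rough illustration of the proposal, here is a hypothetical sketch (the names are illustrative, not the actual models.AttributeStatistics API) that counts the occurrences of each nominal value and returns the [[values], [counts]] shape:

// Hypothetical sketch of the proposed helper (illustrative names only):
// count occurrences of each nominal value and return
// [[v1, v2, ...], [count1, count2, ...]].
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class NominalDistributionSketch {
    static List<List<Object>> nominalDistribution(List<String> values) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String v : values) {
            if (v == null) continue; // skip missing values
            counts.merge(v, 1, Integer::sum);
        }
        List<List<Object>> result = new ArrayList<>();
        result.add(new ArrayList<>(counts.keySet()));   // possible values
        result.add(new ArrayList<>(counts.values()));   // corresponding counts
        return result;
    }

    public static void main(String[] args) {
        List<String> season = List.of("Summer", "Winter", "Summer", "Spring", "Summer");
        System.out.println(nominalDistribution(season)); // [[Summer, Winter, Spring], [3, 1, 1]]
    }
}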

Add MCC to evaluation measures for classification

As requested in openml/OpenML#191

I received a request to add the Matthews correlation coefficient to the evaluation engine for classification. It can be straightforwardly derived from the confusion table: MCC = (TP * TN - FP * FN) / sqrt((TP + FP) * (TP + FN) * (TN + FP) * (TN + FN)).

I propose we check if it is available in Weka, in that case it should be easy to add.
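
For reference, a minimal standalone sketch of the formula above (not tied to the engine's internal classes), computed in double precision to avoid integer overflow under the square root:

// MCC from a binary confusion table, per the formula above.
public class MccSketch {
    static double mcc(long tp, long tn, long fp, long fn) {
        double num = (double) tp * tn - (double) fp * fn;
        double den = Math.sqrt((double) (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn));
        return den == 0.0 ? 0.0 : num / den; // convention: 0 when a margin of the table is empty
    }

    public static void main(String[] args) {
        System.out.println(mcc(90, 5, 3, 2)); // example confusion table
    }
}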

regression datasets handled as classification datasets

Please check the following:
7be41b0

numValues used to be 0 for regression. Is it now 1?

This trips up the AttributeStatistics, which wants to compute the class distribution of a feature if numClasses > 0.

https://github.com/openml/EvaluationEngine/blob/master/src/main/java/org/openml/webapplication/models/AttributeStatistics.java#L56

The result is that regression datasets aren't parsed correctly anymore, e.g.:

[19-06-2019 23:30:14] [OK] [Process Dataset] Processing dataset 41936 - obtaining features.
java.lang.ArrayIndexOutOfBoundsException: Index 3600 out of bounds for length 1
	at org.openml.webapplication.models.AttributeStatistics.addValue(AttributeStatistics.java:72)
	at org.openml.webapplication.features.ExtractFeatures.getFeatures(ExtractFeatures.java:79)
	at org.openml.webapplication.ProcessDataset.process(ProcessDataset.java:55)
	at org.openml.webapplication.ProcessDataset.<init>(ProcessDataset.java:32)
	at org.openml.webapplication.Main.main(Main.java:115)
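
A hypothetical sketch of the kind of guard that would avoid this (not the actual AttributeStatistics code): only allocate and index a per-class array when the target is genuinely nominal, so a numeric (regression) target can never trigger the out-of-bounds access above.

// Hypothetical illustration only, not the real AttributeStatistics class.
public class ClassDistributionGuardSketch {
    private final int[] classCounts; // null for numeric (regression) targets

    ClassDistributionGuardSketch(boolean targetIsNominal, int numClasses) {
        this.classCounts = (targetIsNominal && numClasses > 0) ? new int[numClasses] : null;
    }

    void addValue(double classValue) {
        // ... update the usual min/max/mean/stdev accumulators here ...
        if (classCounts != null) {
            classCounts[(int) classValue]++; // only reached for nominal targets
        }
    }
}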

Unclear how to create the evaluation engine jar file for the server

I tried exporting the Evaluation Engine to a jar (including all required libraries), but now I get xstream errors when trying to evaluate a run. What am I doing wrong?

[30-09-2018 00:49:42] [OK] [Process Run] Start processing run: 24360
com.thoughtworks.xstream.converters.ConversionException: name : name
---- Debugging information ----
message             : name
cause-exception     : java.lang.IllegalArgumentException
cause-message       : name
class               : org.openml.apiconnector.xml.Run
required-type       : org.openml.apiconnector.xml.Run
converter-type      : com.thoughtworks.xstream.converters.reflection.ReflectionConverter
path                : /oml:run/oml:uploader_name
version             : not available
-------------------------------
	at com.thoughtworks.xstream.core.TreeUnmarshaller.convert(TreeUnmarshaller.java:79)
	at com.thoughtworks.xstream.core.AbstractReferenceUnmarshaller.convert(AbstractReferenceUnmarshaller.java:65)
	at com.thoughtworks.xstream.core.TreeUnmarshaller.convertAnother(TreeUnmarshaller.java:66)
	at com.thoughtworks.xstream.core.TreeUnmarshaller.convertAnother(TreeUnmarshaller.java:50)
	at com.thoughtworks.xstream.core.TreeUnmarshaller.start(TreeUnmarshaller.java:134)
	at com.thoughtworks.xstream.core.AbstractTreeMarshallingStrategy.unmarshal(AbstractTreeMarshallingStrategy.java:32)
	at com.thoughtworks.xstream.XStream.unmarshal(XStream.java:1185)
	at com.thoughtworks.xstream.XStream.unmarshal(XStream.java:1169)
	at com.thoughtworks.xstream.XStream.fromXML(XStream.java:1040)
	at com.thoughtworks.xstream.XStream.fromXML(XStream.java:1031)
	at org.openml.apiconnector.io.HttpConnector.wrapHttpResponse(HttpConnector.java:116)
	at org.openml.apiconnector.io.HttpConnector.doApiRequest(HttpConnector.java:85)
        ...

Caused by: java.lang.IllegalArgumentException: name
	at sun.misc.URLClassPath$Loader.getResource(URLClassPath.java:729)
	at sun.misc.URLClassPath.getResource(URLClassPath.java:239)
	at java.net.URLClassLoader$1.run(URLClassLoader.java:365)
        ...

{"error":"name : name
---- Debugging information ----
message             : name
cause-exception     : java.lang.IllegalArgumentException
cause-message       : name
class               : org.openml.apiconnector.xml.Run
required-type       : org.openml.apiconnector.xml.Run
converter-type      : com.thoughtworks.xstream.converters.reflection.ReflectionConverter
path                : /oml:run/oml:uploader_name
version             : not available
-------------------------------"}

OpenML Feature mismatch data

The OpenML Dataset Features have their nominal_values normalized, while the data itself is not:

import openml
dresses = openml.datasets.get_dataset(23381)
df, *_ = dresses.get_data(dataset_format="dataframe")
print(dresses.features[1].nominal_values)
print(df[dresses.features[1].name].unique())

output:

['Average', 'high', 'low', 'Medium', 'very-high']

['Low', 'High', 'Average', 'Medium', 'very-high', 'low', 'high', NaN]
Categories (7, object): ['Average' < 'high' < 'High' < 'low' < 'Low' < 'Medium' < 'very-high']

edit: This is a problem already in the XML as @mfeurer indicates below.

confusion matrix makes no sense as evaluation measure

There's "confusion matrix" in the evaluation measure drop-down on the frontend. I'm not sure if that's a frontend issue or a backend issue. This makes no sense in the context of the leaderboard and is not computed anyway.

NullPointerExceptions in dataset processing when target cannot be found.

Some datasets fail during processing. They all fail with a NullPointerException that occurs while setting the target feature.

Checking the datasets (uploaded by Guillaume), they have a feature such as:
@Attribute "Is Public Domain" {True, False}
while in the database this is stored as 'Is_Public_Domain'.
I assume the underscores are added during dataset upload.

What is more troubling is that there seems to be no trace of this error. The message says that the error is marked in the database but I could not find it. I assume it tries to store an error message but that fails, too?

[23-12-2018 23:23:43] [OK] [Process Dataset] Processing dataset 41249 - obtaining features.
java.lang.NullPointerException
	at weka.core.Instances.setClass(Instances.java:1532)
	at org.openml.webapplication.features.ExtractFeatures.getFeatures(ExtractFeatures.java:42)
	at org.openml.webapplication.ProcessDataset.process(ProcessDataset.java:65)
	at org.openml.webapplication.ProcessDataset.<init>(ProcessDataset.java:41)
	at org.openml.webapplication.Main.main(Main.java:120)
[23-12-2018 23:23:43] [Error] [Process Dataset] Error while processing dataset. Marking this in database.
[23-12-2018 23:23:43] [Error] [Process Dataset] Dataset 41249 - Error: null
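
A hypothetical sketch of a more defensive target lookup (not the actual ExtractFeatures code): try the stored name, fall back to a space/underscore-normalized match, and fail with a clear error message instead of a NullPointerException:

// Illustrative sketch only; uses the standard Weka Instances/Attribute API.
import weka.core.Attribute;
import weka.core.Instances;

public class TargetLookupSketch {
    static void setTarget(Instances dataset, String targetName) {
        Attribute target = dataset.attribute(targetName);
        if (target == null) {
            // Fall back: the upload may have replaced spaces with underscores (or vice versa).
            target = dataset.attribute(targetName.replace('_', ' '));
        }
        if (target == null) {
            throw new IllegalArgumentException(
                "Target feature '" + targetName + "' not found in dataset " + dataset.relationName());
        }
        dataset.setClass(target);
    }
}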

Evaluation Engine policy on datasets without a default target

Several datasets do not have a specific target. Also, multitask datasets do not have a single target, which complicates the calculation of meta-features such as: classcount, entropy, landmarkers and mean mutual information. Several things that we can do:

  • in case of no single/valid class, do not calculate these features
  • define meta-features on task level. We should do so anyway at some point. This does not solve the multitarget problem though
  • ... ?

@mfeurer @amueller @joaquinvanschoren @berndbischl @giuseppec @ja-thomas

meta-features store vector of numbers instead of aggregates

Currently we store (for numeric columns):

  • Mean of X of numeric atts
  • Stdev of X of numeric atts
  • Quartile {1, 2, 3} of X of numeric atts
  • Min of X of numeric atts
  • Max of X of numeric atts

Where X = {mean, stdev, kurtosis, skewness}. Something similar for information theoretic measures of nominal atts.

This selection is arbitrary and not well supported in the literature.

Much better would be to store a vector of each value per attribute, giving researchers the possibility to calculate these values client-side.
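
To make the difference concrete, an illustrative sketch (not the engine's code) of the two alternatives: collapsing one value per numeric attribute into a handful of aggregates versus storing the raw per-attribute vector and letting clients aggregate:

// Illustrative comparison of the current and proposed approaches.
import java.util.Arrays;

public class MetaFeatureAggregationSketch {
    public static void main(String[] args) {
        // One value per numeric attribute, e.g. the mean of each attribute.
        double[] meanPerAttribute = {0.1, 2.5, 3.7, 10.2, 0.9};

        // Current approach: store only aggregates of this vector.
        double[] sorted = meanPerAttribute.clone();
        Arrays.sort(sorted);
        double min = sorted[0];
        double max = sorted[sorted.length - 1];
        double mean = Arrays.stream(meanPerAttribute).average().orElse(Double.NaN);
        double median = sorted[sorted.length / 2]; // simplified quartile 2
        System.out.printf("min=%.2f max=%.2f mean=%.2f median=%.2f%n", min, max, mean, median);

        // Proposed approach: store the vector itself and let researchers aggregate client-side.
        System.out.println(Arrays.toString(meanPerAttribute));
    }
}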

Evaluation measures duplicated or not present / no measure for imbalanced data available

Related: #20

Currently no measure is computed that's useful for highly imbalanced classes.
Take for example sick:
https://www.openml.org/t/3021

I would like to see the "mean" measures be computed in particular (they also are helpful for comparison with D3M, cc @joaquinvanschoren).

On the other hand, the "weighted" measures are not computed but seem to be duplicates of the measure without prefix, which is also weighted by class size:
https://www.openml.org/a/evaluation-measures/mean-weighted-f-measure
https://www.openml.org/a/evaluation-measures/f-measure

Though that's not entirely clear from the documentation. If the f-measure documentation is actually accurate (which I don't think it is), that would be worse because it's unclear for which class the f-measure is reported.
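
To make the "mean" versus "weighted" distinction concrete, here is a sketch of the textbook definitions (unweighted macro average over classes versus average weighted by class size). This is not necessarily how the engine currently computes them; that is exactly what the documentation leaves unclear.

// Macro ("mean") vs. class-size-weighted F1 from per-class counts.
public class FMeasureAveragingSketch {
    static double f1(long tp, long fp, long fn) {
        double precision = tp + fp == 0 ? 0.0 : (double) tp / (tp + fp);
        double recall = tp + fn == 0 ? 0.0 : (double) tp / (tp + fn);
        return precision + recall == 0 ? 0.0 : 2 * precision * recall / (precision + recall);
    }

    public static void main(String[] args) {
        // Per-class counts for a highly imbalanced 2-class problem (like 'sick').
        long[] tp = {3500, 30}, fp = {40, 200}, fn = {200, 40}, support = {3700, 70};

        double macro = 0.0, weighted = 0.0, total = 0.0;
        for (int c = 0; c < tp.length; c++) {
            double f = f1(tp[c], fp[c], fn[c]);
            macro += f / tp.length;        // every class counts equally
            weighted += f * support[c];    // large classes dominate
            total += support[c];
        }
        weighted /= total;
        System.out.printf("macro (mean) F1 = %.3f, weighted F1 = %.3f%n", macro, weighted);
    }
}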

Compute histograms for numeric attributes

Currently the website shows a box plot for numeric attributes. This does not always look good, plus it hides a lot of information.

It would be better to store a histogram of the distribution. This can be computed beforehand.
I.e. Something like this: https://www.mathworks.com/help/examples/matlab/win64/AdjustHistogramPropertiesExample_01.png

For categorical targets we could also compute it per class value: https://3qeqpr26caki16dnhd19sv6by6v-wpengine.netdna-ssl.com/wp-content/uploads/2014/03/histograms.png

Looking at the code, we could extend models.AttributeStatistics
with a new function that returns something of the form
[[b1,b2,b3],[123],[234],[354]], where b1, b2 are the bucket values.

For categorical targets, we could compute something like
[[b1,b2,b3],[123,12,23],[234,23,34],[354,34,45]] for a 3-class dataset.

What do you think would be the best way to implement this?
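
A hypothetical sketch of one way to do this (the names are illustrative, not a proposal for the exact models.AttributeStatistics API): equal-width bucketing that returns the bucket boundaries plus, per class, the count of values falling into each bucket.

// Equal-width histogram per class; illustrative names only.
import java.util.Arrays;

public class HistogramSketch {
    static int[][] histogramPerClass(double[] values, int[] classIndex, int numClasses,
                                     int numBuckets, double[] bucketEdgesOut) {
        double min = Arrays.stream(values).min().orElse(0.0);
        double max = Arrays.stream(values).max().orElse(1.0);
        double width = (max - min) / numBuckets;
        for (int b = 0; b <= numBuckets; b++) bucketEdgesOut[b] = min + b * width;

        int[][] counts = new int[numClasses][numBuckets];
        for (int i = 0; i < values.length; i++) {
            int bucket = width == 0 ? 0 : Math.min((int) ((values[i] - min) / width), numBuckets - 1);
            counts[classIndex[i]][bucket]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        double[] values = {1.0, 1.5, 2.0, 5.0, 8.0, 9.5};
        int[] classes = {0, 0, 1, 1, 0, 1};
        double[] edges = new double[4]; // 3 buckets -> 4 edges
        int[][] counts = histogramPerClass(values, classes, 2, 3, edges);
        System.out.println(Arrays.toString(edges));       // bucket boundaries
        System.out.println(Arrays.deepToString(counts));  // counts per class per bucket
    }
}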

Add deployment documentation

I feel unsure how to create a new .jar that runs correctly on the server.
Is it just a fat jar? Which versions of the dependent libraries can/should we use?

Unclear how to compile the code

Using Java 1.8 and the latest apiconnector from Maven, I get several compilation errors. For instance:

The method getNominalValues() is undefined for the type DataFeature.Feature
The constructor DataFeature(Integer, int, DataFeature.Feature[], String) is undefined
Unhandled exception type JSONException

Probably a version mismatch but it is not clear what to do exactly.
