openml / openml
Open Machine Learning
Home Page: https://openml.org
License: BSD 3-Clause "New" or "Revised" License
Some of the tables use reserved names like "class" and "type". This is bad practice because it can cause conflicts in some programming languages. In Ruby, for example, the column "class" in the "Algorithm" table shadows the built-in "Class".
My request: rename the "class" columns to something like "algorithm_class", and the "type" columns (tables: cvrun, task_type_estimation_procedure, math_function, experiment_variable, queries, task_type_prediction_feature) to something else.
Hi,
I'm having trouble uploading a run. I keep on getting error 207:
"File upload failed. One of the files uploaded has a problem"
Is it possible to provide more information? E.g., which file has a problem, or even what the problem is. I think it's the output_files (in my case, a single .arff). Which format is expected?
Thanks in advance,
Dominik
These should probably be removed very soon. I think they are also flagged as "original".
did  name                                                 NumberOfFeatures  NumberOfInstances
28   cl2_view2_combined_and_view3                         0                 0
30   cl2_view3_names                                      0                 0
35   cl3_view2_combined_and_view3                         0                 0
37   cl3_view3                                            0                 0
53   CoEPrA-2006_Classification_001_Calibration_Peptides  0                 0
55   CoEPrA-2006_Classification_002_Calibration_Peptides  0                 0
57   CoEPrA-2006_Classification_003_Calibration_Peptides  0                 0
The already emailed example queries for the web interface should be collected in the wiki and also extended! We can only design the interface in a good way, if we collect reasonable things that people want to do with it.
There are 43 data sets (for isOriginal = 'true') that have neither features nor instances. What's up with them? They have names like these:
cl1_view2, cl1_view2_combined, cl1_view2_combined_and_raw_data, cl1_view2_combined_and_view3, CoEPrA-2006_Classification_001_Calibration_Data, ...
Can they be deleted or labeled as "not original"?
There is another data set without features and instances called "eucalyptus". Well, at least this is what the server tells me.
http://expdb.cs.kuleuven.be/expdb/api/?f=openml.evaluation.measures
For quite a lot of the measures it is unclear what they mean exactly, or it is clear but it does not make sense to ask the client to optimize them in a task.
Examples:
a) How is kohavi_wolpert_bias_squared defined exactly?
b) How is the client supposed to optimize for "confusion_matrix"?
Solution:
Right now, we have user-defined versioning for datasets and implementations, which means that users have to keep track of versions and have to select/invent a versioning system which will lead to a variety of versioning schemes on the server.
It would be better if OpenML could take care of versioning.
We can then remove the version field altogether. The user just provides a name for his dataset/implementation, the server then checks if that name exists, and if not, assigns version number 1 and stores a hash computed on the uploaded code. If the dataset/implementation is uploaded again, and the hash has changed, a new version number is assigned and the new id is returned.
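The hash-based versioning described above could be sketched as follows. This is a minimal illustration in Python, not the actual server code; the function and registry names are hypothetical.

```python
import hashlib

# Hypothetical in-memory stand-in for the server database:
# name -> ordered list of content hashes seen so far.
_registry = {}

def register(name, code):
    """Return (version, is_new) for an uploaded dataset/implementation."""
    digest = hashlib.sha256(code.encode("utf-8")).hexdigest()
    hashes = _registry.setdefault(name, [])
    if digest in hashes:
        # Unchanged content: reuse the existing version number.
        return hashes.index(digest) + 1, False
    # Changed (or first) content: assign the next version number.
    hashes.append(digest)
    return len(hashes), True

print(register("my_algo", "print('v1')"))  # (1, True)
print(register("my_algo", "print('v1')"))  # (1, False) - same hash, same version
print(register("my_algo", "print('v2')"))  # (2, True)  - changed code, new version
```

The user never picks a version; re-uploading identical content is idempotent, and any change to the code automatically yields a new version number.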
Comments, please :)
Allow to return a model built on the input data. This is useful for people actually interested in what is hidden in the input data. We don't want to force people to use PMML, so a model can be anything, such as a WEKA model file (.model) or an R data object (binary). Ideally, we can catalogue commonly used model formats (i.e. 'Weka model', 'R data object', ...) and describe then on the webpage, so that people know what to do with these model files.
I would propose to make this an optional output for the classification/regression task, thus:
'model' -> POSTed file with the model.
'model_format' -> a string with the model format. Can be free text, people can add a description afterwards on the website.
You can see for yourself, e.g., tasks don't even appear in it.
When I am searching for tasks I see these:
iris-weka.RemovePercentage-P:20
What are they? Should they be removed?
Best,
Bernd
R cannot parse this, fix pls
http://expdb.cs.kuleuven.be/expdb/api/?f=openml.data.description&data_id=61
0000-00-00 00:00:00
See subject ;)
See:
http://expdb.cs.kuleuven.be/expdb/api/index.php#openml_evaluation_measures
Naming format is wrong here, measure names should be lower case.
Also: why have this twice anyway? It's probably best to remove the example output and just provide a link to the API call; that gives all the needed info.
Also: area_under_ROC_curve
should probably be area_under_roc_curve
Currently, all results shown in both the implementation and dataset detail pages implicitly belong to the 'Supervised classification' task with 10x10 CV. It would be good to show that.
Maybe we should add a dropdown box showing the different tasks for which results can be returned? It is possible that the same dataset is used in more than one task.
Bernd mentioned he would like to store the features selected in a run.
I would like to start a discussion about how to do this.
Thanks,
Joaquin
should be 2 reps of 10CV but is:
, , = TRAIN
1 2 3 4 5 6 7 8 9 10
1 142 135 126 135 135 135 135 135 135 135
2 11 135 126 135 135 135 135 135 135 270
, , = TEST
1 2 3 4 5 6 7 8 9 10
1 15 15 15 15 15 15 15 15 15 15
2 2 15 13 15 15 15 15 15 15 30
Hi,
I just downloaded all implementations that are stored on the server at the moment. To do so, I made an SQL query and downloaded a .csv table with the names and versions of all implementations. Here are some issues/questions:
The second point is obviously the most problematic one. Should it be forbidden to use "<" and ">", or are there ways to parse an XML file that contains these in its contents?
Hi all,
We're working on the WEKA-plugin and had the following question: Say you have an ensemble method, such as Bagging, and a base-learner like a decision tree.
It is currently possible to store this either as:
I believe KNIME and Rapidminer would store these as separate subcomponents of the workflow. How are things currently handled in R? Do you use option 1 or 2?
I have a slight preference for the first method, mainly because it becomes easier to compare implementations (e.g. Bagging_J48 vs Bagging_OneR), even between environments (weka.Bagging_J48 vs KNIME.Bagging_J48_workflow), and to track the effect of parameters: I can track the effect of a J48 parameter easily without having to interpret strings.
This is indeed harder for us to implement because WEKA is kind of quirky in this area, but overall I think it makes things easier and more comparable.
Thanks,
Joaquin
Make Datasets searchable for
And/or provide a table with the most essential data characteristics for each available dataset. This would also be my preferred overview when clicking Search -> Datasets.
Download an implementation
"The implementation is returned by the server hosting it. This can be OpenML, but also any other code repository. Try it now"
"Try it now" links to http://expdb.cs.kuleuven.be/expdb/data/uci/nominal/anneal.arff
which is a data set.
Hi,
during a test today I simply uploaded the same run (exactly the same object) three times and this was possible.
Do we really want this? I haven't fully thought this through; I'm mainly posting this as a question. But in 99% of cases this is a user error that we should catch, I would suggest...
Hi,
the problem seems to be the comments in the oml:description tag.
Probably because there can be all kinds of weird characters in there - and apparently there already are.
R does not parse the whole XML at all, but tells me:
xmlParseEntityRef: no name
xmlParseEntityRef: no name
xmlParseEntityRef: no name
Error: 1: xmlParseEntityRef: no name
If I nearly completely remove the contents of oml:description I can parse again, so the problem is definitely located there.
Any ideas?
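In libxml2-based parsers (which R's XML package uses), "xmlParseEntityRef: no name" is exactly what a raw, unescaped "&" in element content produces. A sketch in Python showing the failure and the fix (the description text here is made up for illustration):

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# A raw '&' in element content makes the XML not well-formed.
bad = ("<oml:description xmlns:oml='http://openml.org/openml'>"
       "cats & dogs</oml:description>")
try:
    ET.fromstring(bad)
except ET.ParseError as e:
    print("parse failed:", e)

# Escaping the text content before building the XML fixes it:
good = ("<oml:description xmlns:oml='http://openml.org/openml'>"
        + escape("cats & dogs <html>") +
        "</oml:description>")
elem = ET.fromstring(good)
print(elem.text)  # cats & dogs <html>
```

So the server should escape (or CDATA-wrap) description text on output; the parser then sees the original characters after unescaping.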
Data set description contains:
<oml:upload_date>0000-00-00 00:00:00</oml:upload_date>
In R this produces:
Error in as.POSIXlt.character(x, tz, ...) :
character string is not in a standard unambiguous format
Discovered while unit testing all tasks.
In the JSON output of the data qualities, no type information for the columns is given when we query directly through the API / SQL.
Every column has an undefined type and every value is encoded as a string, even if it is a number.
Can this be corrected?
Currently we use a trick in R so we do not have to convert manually.
Here is the API call
"http://www.openml.org/api_query/?q=SELECT%20d.name%20AS%20dataset,%20MAX(IF(dq.quality='NumberOfFeatures',%20dq.value,%20NULL))%20AS%20NumberOfFeatures,MAX(IF(dq.quality='NumberOfInstances',%20dq.value,%20NULL))%20AS%20NumberOfInstances,MAX(IF(dq.quality='NumberOfClasses',%20dq.value,%20NULL))%20AS%20NumberOfClasses,MAX(IF(dq.quality='MajorityClassSize',%20dq.value,%20NULL))%20AS%20MajorityClassSize,MAX(IF(dq.quality='MinorityClassSize',%20dq.value,%20NULL))%20AS%20MinorityClassSize,MAX(IF(dq.quality='NumberOfInstancesWithMissingValues',%20dq.value,%20NULL))%20AS%20NumberOfInstancesWithMissingValues,MAX(IF(dq.quality='NumberOfMissingValues',%20dq.value,%20NULL))%20AS%20NumberOfMissingValues,MAX(IF(dq.quality='NumberOfNumericFeatures',%20dq.value,%20NULL))%20AS%20NumberOfNumericFeatures,MAX(IF(dq.quality='NumberOfSymbolicFeatures',%20dq.value,%20NULL))%20AS%20NumberOfSymbolicFeatures%20FROM%20dataset%20d,%20data_quality%20dq%20WHERE%20d.did%20=%20dq.data%20AND%20d.isOriginal%20=%20'true'%20GROUP%20BY%20dataset"
I want to see how many observations, features, types of features, NAs and so on are in a data set, so I can choose the correct sets for my study.
I also want to query that table in R to "compute" on it.
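Until the server emits typed JSON, clients have to coerce the string-encoded values themselves. A minimal sketch of that workaround (the payload below mimics the shape of the qualities output; it is not real server data):

```python
import json

# Every value arrives as a string, numeric or not:
raw = json.loads('{"NumberOfInstances": "150", "NumberOfFeatures": "5", '
                 '"MajorityClassSize": "50", "name": "iris"}')

def coerce(value):
    """Turn numeric-looking strings into int/float; leave the rest alone."""
    try:
        f = float(value)
    except (TypeError, ValueError):
        return value
    return int(f) if f.is_integer() else f

qualities = {k: coerce(v) for k, v in raw.items()}
print(qualities)
# {'NumberOfInstances': 150, 'NumberOfFeatures': 5, 'MajorityClassSize': 50, 'name': 'iris'}
```

If the API returned proper JSON numbers, this whole conversion step (and the R trick mentioned above) would be unnecessary.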
Every implementation needs a name, a version and a description, but there are many implementations that do not contain all of these (most have name = version = ""). I only checked the first few algorithms, however.
Additionally, the implementation "weka.AODE(1.8.2.3)" is not parseable.
Implementations can currently be uploaded in many different ways. While this makes it easier for users to upload implementations, it makes it harder for other users to download and use those implementations. Hence, it would be good to define an interface for uploaded implementations that is simple enough for uploaders to provide, and that will allow downloaders to easily run the algorithm. It also allows us to provide further services on OpenML, such as automatically running implementations on the server.
We won't enforce this interface, but suggest it as a 'best practice', and state it as a prerequisite for more advanced OpenML services. We should adhere to it for our own plugins and provide clear examples for users to look at.
As usual, in what follows an implementation can be a script, program, or workflow, depending on its environment.
The interface:
For the common case of running well-known library algorithms, an implementation will be a wrapper/adapter that handles the conversion from an OpenML task to the required inputs for the library algorithm and interprets its (intermediate) outputs to produce the expected outputs.
I believe it is also best that the implementation description lists the task_types that it supports. Bernd also previously suggested that implementations report which types of data they can/cannot handle.
Comments, please :)
Evaluation measures in ExpDB are CamelCase. Should become lower_case.
Hi,
Search - Datasets - select one - select a run / impl
If you click on "General information" of the implementation it would be nice to see the uploader displayed.
Yes, minor point for now.
This is probably also relevant for similar displays of other objects?
When thinking about uploading our first experiments, I noticed that sometimes I may not want to upload either a source file or a binary file.
This mainly concerns applying "standard methods" from libraries. E.g., when I apply the libsvm implementation in the R package e1071, I only need to know the package name and the version number. Uploading the package itself (in binary or source form) makes no sense, this is hosted on the official CRAN R package server.
I could upload a very short piece of code that uses this package and produces the desired predictions. Actually, there are a few more subtle questions involved here and it might be easier to discuss them briefly on Skype; I would like to hear your opinions on this.
The question basically is how much we want to enable users who download implementations to rerun the experiments in a convenient fashion.
The data set description seems to be wrong. E.g., it says there are 798 instances but the data set has 898 rows.
Here you can find the same inconsistencies:
http://mldata.org/repository/data/viewslug/datasets-uci-anneal/
(tabs "summary" vs. "data")
I think this is what Bernd meant when he said someone should check all the data sets. Actually, the correctness of the data characteristics is way more important than the description. Let's check it:
[...]
NumberOfInstancesWithMissingValues: 0
NumberOfMissingValues: 0
[...]
This is obviously wrong. I think we have to add a slot in the data set description for how missing values are signified. Also, the server should transform them into the desired representation (e.g., "NA") before computing the data qualities.
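The normalization step suggested above could look like this. A sketch with made-up rows, assuming the dataset description declares "?" as its missing-value token (as ARFF does):

```python
# Token declared in the (proposed) data set description slot:
MISSING_TOKEN = "?"

rows = [
    ["1.0", "?",   "red"],
    ["2.5", "3.1", "?"],
    ["0.7", "1.2", "blue"],
]

# Server-side normalization before computing data qualities:
normalized = [[None if v == MISSING_TOKEN else v for v in row] for row in rows]

NumberOfMissingValues = sum(v is None for row in normalized for v in row)
NumberOfInstancesWithMissingValues = sum(
    any(v is None for v in row) for row in normalized)
print(NumberOfMissingValues, NumberOfInstancesWithMissingValues)  # 2 2
```

Computing the qualities on the raw strings instead (where "?" looks like an ordinary value) is exactly how the counts end up wrongly reported as 0.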
http://openml.org/learn
Sharing a run
Both links in: (Response / XSD Schema)
Returned file: Response
The response file returned depends on the task type. For supervised classification, the API will also compute evaluation measures based on the uploaded predictions and return them as part of the response file. See the XSD Schema for details.
Currently, an uploaded result could be the result of running an implementation with default parameters, running an implementation that does internal parameter optimisation, running an implementation many times in a parameter sweep, or running an implementation with 'magically optimised' parameters.
When ranking implementations based on their evaluations, an unfair advantage will be given to parameter sweeps (data leakage).
Thus, it has been suggested that, during upload, users should flag the run with one of the following cases:
With the latter, a short notice should indicate that this optimization must have been done internally using only the training set(s).
I do think that, even with default parameter settings, the parameter settings should be uploaded with the run.
Comments, please :).
The way I understand it:
Impl ID = name + version (both user-chosen)
When uploading, the server tells me whether this combo is already in use and therefore not possible.
Could we please specify somewhere in the docs what chars are actually allowed for id and version? Do we really allow:
name = "Jörg's cool algorithm^2" ?
Some clarification on how we are reimplementing code uploads/checks:
There will be 2 API calls:
'implementation.upload' (exists)
This call has as a required argument POST description: an XML file containing the implementation meta data. Currently, this XML file contains a field 'version', but this was ignored at upload time. The reason for this was that we don't want to force the user to provide a version number. Therefore, the server would pick a version number (1,2,3,...).
However, it often makes sense for users to include some kind of versioning. For instance, if I maintain my code on GitHub I may want to add the version hash so I can revisit the code as it was at the time of upload.
Therefore, we will do the following. The description XML will have the following fields:
Plugins can decide freely how to handle this: if there is a good versioning system already, use that; if not, maybe take a hash of the source code. As long as changes to the code correspond to changes in the version number.
What will happen is that the server will store this info, and then associate a 'label' with each upload (1,2,3,...) linked one-to-one to the user version number/hash. This label is merely aesthetic: in the web interface, you will see both the upload counter and the user-defined version number/hash value. If no version number is given, the server will compute a hash based on the uploaded code. The library-name-version combo will be linked to a unique implementation id.
If you try to upload an implementation with the same library, name, and version, the server will say that there is already an implementation with those keys and return its id.
When you want to check what the id is of an already uploaded implementation, there will be the following api:
'implementation.getid' (or 'implementation.check')
arguments:
GET library_name
GET implementation_name
GET version (user-defined version number, e.g. the hash value)
Based on that info, the server will return the corresponding implementation id. If no match can be found, it will tell you that that implementation is unknown.
Sounds ok?
Cheers,
Joaquin
Client programmers want to (and should) check their parsers through unit tests with different examples of task.xml, dataset.xml, and so on.
Therefore, the server needs to provide examples of different complexity for each of these.
It is probably best to have the server provide them through the standard API calls and, for now, just tell the client programmers how to access them. We might reserve special IDs for these "testing calls" for now, e.g.
(???) task_id = 100001 to 100005 are examples to test tasks for now (???)
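A client-side parser test could then look like the sketch below. The fixture XML here is hypothetical and inlined; with reserved test IDs, the test would instead fetch the fixture from the server via the normal API call.

```python
import unittest
import xml.etree.ElementTree as ET

# Hypothetical minimal task.xml fixture (structure assumed, not taken
# from the real schema):
SAMPLE_TASK = """<oml:task xmlns:oml="http://openml.org/openml">
  <oml:task_id>100001</oml:task_id>
  <oml:task_type>Supervised Classification</oml:task_type>
</oml:task>"""

NS = {"oml": "http://openml.org/openml"}

class TaskParserTest(unittest.TestCase):
    def test_parse_task(self):
        root = ET.fromstring(SAMPLE_TASK)
        self.assertEqual(root.find("oml:task_id", NS).text, "100001")
        self.assertEqual(root.find("oml:task_type", NS).text,
                         "Supervised Classification")

if __name__ == "__main__":
    unittest.main()
```

The value of server-provided fixtures is that every client (R, WEKA, Java, ...) tests against the same canonical examples rather than hand-rolled ones.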
I have a problem uploading implementations to the server.
I downloaded weka.AODE(1.8.2.3) from the server, changed the name and version in the XML file, and tried to upload it again. This doesn't work yet; I always get this error:
"Problem validating uploaded description file
XML does not correspond to XSD schema."
The XML looks like this now:
<oml:implementation xmlns:oml="http://openml.org/openml">
<oml:name>testestest</oml:name><oml:version>1.0</oml:version><oml:description>test</oml:description></oml:implementation>
What is wrong?
We need to be able to at least specify:
Parameter name
Parameter data type
Bonus points (need not be done at once)
Simple constraints like box-constraints
My CSV file is always empty, no matter which query I run.
I tried:
select * from implementation
Hey,
I discovered a few data sets that have special characters (";", "?", ...) or spaces in some of their column names. Some names even start with a number, which is also not okay in R.
It would be great if the server could check for those problems and resolve them somehow.
Thanks in advance,
Dominik
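A server-side cleanup like the one requested above could be sketched as follows. This is an illustration, not the actual server logic; the replacement rules mirror what R's make.names does.

```python
import re

def sanitize(name):
    """Make a column name a valid R identifier (sketch)."""
    # Replace anything that is not a letter, digit, '_' or '.'
    # (';', '?', spaces, ...) with an underscore:
    cleaned = re.sub(r"[^A-Za-z0-9_.]", "_", name)
    # R identifiers may not start with a digit; prefix with 'X',
    # which is also R's make.names convention:
    if re.match(r"^[0-9]", cleaned):
        cleaned = "X" + cleaned
    return cleaned

for raw in ["petal width", "income;eur", "2ndFeature", "what?"]:
    print(raw, "->", sanitize(raw))
# petal width -> petal_width
# income;eur -> income_eur
# 2ndFeature -> X2ndFeature
# what? -> what_
```

One open design question is whether the server should rewrite names on upload or only expose a sanitized view, since renaming silently could surprise uploaders.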
I'm a bit confused. Task 1 used to be based on the Iris data set. Now it's annealing? Did you guys change the tasks? So,... the results that are provided by the new API call (http://www.openml.org/api/?f=openml.task.results&task_id=1) don't belong to the displayed data descriptions, right?
For a given task, I would like to get (on the client):
a) What runs / implementations are available?
b) What are their performance metric values?
c) The complete predictions. It would be sufficient to get them for just a selected implementation / run, because I could always loop through this.
Tried to list all tasks on openml.org
Search -> Tasks -> Supervised Classif
Hit Search to list all tasks: Server error
I then typed "iris" in "Datasets": Server error
It should be clear from an uploaded run how parameters were chosen. We previously agreed on the following three cases:
We should add a field/flag to report this, e.g.
parameter_setting_type = [manual, sweep, optimized]
In cases 1 and 2, the parameter settings should be uploaded with the run. This is already supported.
In case 3, the optimized parameters are fold/repeat specific, and should thus be added to the predictions file. This can simply be an extra column in the predictions arff file. I propose a simple key-value format, maybe json, that can then be stored as a string:
{"parameter_name_1":0.4, "parameter_name_2":123}
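Producing and consuming such a key-value cell is straightforward; a sketch using the parameter names from the example above:

```python
import json

# Fold/repeat-specific optimized parameters for one row of the
# predictions ARFF (names taken from the example above):
fold_params = {"parameter_name_1": 0.4, "parameter_name_2": 123}

# Serialize to a single string cell for the extra ARFF column:
cell = json.dumps(fold_params)
print(cell)  # {"parameter_name_1": 0.4, "parameter_name_2": 123}

# The server (or an evaluating client) reads it back per row:
restored = json.loads(cell)
assert restored["parameter_name_1"] == 0.4
```

One caveat for the ARFF side: the string would need proper quoting/escaping of the embedded double quotes, which standard ARFF writers handle.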
We can thus extend the classification/regression task with the following:
For at least 1-2 projects I would like to have larger data sets on OpenML.
So with more than 10K-50K observations.
Some are available here:
http://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/
http://mldata.org/
http://www.cs.ubc.ca/labs/beta/Projects/autoweka/datasets/
Issue: We might need to support another data format, especially w.r.t. sparse data.
There is HDF5.
There is also a converter:
http://mldata.org/about/hdf5/
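To illustrate the sparse-data concern: for large, mostly-zero data, dense storage wastes space, so formats like sparse ARFF and libsvm store only the non-zero entries. A minimal sketch of that idea with a made-up matrix:

```python
# Mostly-zero toy matrix:
dense = [
    [0, 0,   3.5, 0],
    [0, 1.0, 0,   0],
    [0, 0,   0,   0],
]

# Keep only the non-zero entries as (row, col, value) triplets - the
# coordinate representation underlying sparse ARFF / libsvm files:
triplets = [(i, j, v)
            for i, row in enumerate(dense)
            for j, v in enumerate(row) if v != 0]
print(triplets)  # [(0, 2, 3.5), (1, 1, 1.0)]

# Reconstructing the dense matrix from the triplets:
n_rows, n_cols = len(dense), len(dense[0])
rebuilt = [[0] * n_cols for _ in range(n_rows)]
for i, j, v in triplets:
    rebuilt[i][j] = v
assert rebuilt == dense
```

For the data sets linked above, supporting such a format natively (or via the HDF5 converter) would avoid forcing dense ARFF on data that is 99% zeros.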
Can you look into the general data format issue server-side? Then we can upload some data sets.
This is more of a very general design question.
Would it make sense to have a general OpenML Java base library that contains all the common objects as Java classes and offers common functionality like downloading, parsing, and uploading?
This would make it very simple for the next person to connect another Java-based toolkit to OpenML.
Or do you guys already do that?
In some data sets there are factor variables that have only one level. Sometimes there are two or more levels but all examples belong to the same level. I'm not quite sure where we should fix this. For machine learning, such a factor is useless and might lead to errors. Either the server deletes those factors or we do it locally. What do you think is better?
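Detecting such degenerate factors is cheap on either side; a sketch with made-up columns:

```python
# Toy factor columns (made-up data):
columns = {
    "color": ["red", "red", "red"],        # one observed level -> useless
    "size":  ["small", "large", "small"],  # informative, keep
}

# A factor is degenerate if all observed values fall in a single level
# (this also covers declared-but-unused extra levels):
constant = [name for name, values in columns.items()
            if len(set(values)) <= 1]
print(constant)  # ['color']

kept = {name: v for name, v in columns.items() if name not in constant}
```

Doing this server-side would fix it once for all clients, but it changes the data set; doing it client-side keeps the data intact and leaves the decision to the learner.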
There are 2 implementation schemas.
a) https://raw.github.com/openml/OpenML/master/XML/Schemas/implementation_upload.xsd
b)
https://raw.github.com/openml/OpenML/master/XML/Schemas/implementation.xsd
I understand that a) is uploaded by the user, and b) is returned by the server when you request an implementation.
The problem is that they share about 90% of their XML fields, but the schemas are already inconsistent. Could they be made consistent?
Also we noticed this:
<xs:element name="version" minOccurs="0" type="xs:string"/>
minOccurs="0" is wrong, isn't it?
Things are coming together nicely, but there are also many new things planned. Bernd suggested we define what features should be in a 1.0 version, and finish that as soon as possible, making sure it works so that we can really start spreading the word.
I'm just making a list here, most of which is already done. Feel free to add/remove. Paraphrasing Linus Torvalds, 'suggestions are welcome, but we won't promise we'll implement them :-)'.
https://github.com/openml/OpenML
Then click:
Service: openml.authenticate
Service: openml.data.upload
There are probably a few more!