
openml-java's Introduction


OpenML: Open Machine Learning

Welcome to the OpenML GitHub page! 🎉


Who are we?

We are a group of people who are excited about open science, open data and machine learning. We want to make machine learning and data analysis simple, accessible, collaborative and open with an optimal division of labour between computers and humans.

What is OpenML?

Want to learn about OpenML or get involved? Please do and get in touch in case of questions or comments! 📨

OpenML is an online machine learning platform for sharing and organizing data, machine learning algorithms and experiments. It is designed to create a frictionless, networked ecosystem that you can readily integrate into your existing processes, code and environments, allowing people all over the world to collaborate and build directly on each other’s latest ideas, data and results, irrespective of the tools and infrastructure they happen to use.

As an open science platform, OpenML provides important benefits for the science community and beyond.

Benefits for Science

Many sciences have made significant breakthroughs by adopting online tools that help organize, structure and analyze scientific data. Indeed, any shared idea, question, observation or tool may be noticed by someone who has just the right expertise to spark new ideas, answer open questions, reinterpret observations or reuse data and tools in unexpected new ways. Therefore, sharing research results and collaborating online as a (possibly cross-disciplinary) team enables scientists to quickly build on and extend the results of others, fostering new discoveries.

Moreover, ever larger studies become feasible because a lot of data is already available. Questions such as “Which hyperparameter is important to tune?”, “Which is the best known workflow for analyzing this data set?” or “Which data sets are similar in structure to my own?” can be answered in minutes by reusing prior experiments, instead of spending days setting up and running new experiments.

Benefits for Scientists

Scientists can also benefit personally from using OpenML. For example, they can save time, because OpenML assists in many routine and tedious duties: finding data sets, tasks, flows and prior results, setting up experiments and organizing all experiments for further analysis. Moreover, new experiments are immediately compared to the state of the art without always having to rerun other people’s experiments.

Another benefit is that linking one’s results to those of others has a large potential for new discoveries (see, for instance, Feurer et al. 2015; Post et al. 2016; Probst et al. 2017), leading to more publications and more collaboration with other scientists all over the world.

Finally, OpenML can help scientists to reinforce their reputation by making their work (published or not) visible to a wide group of people and by showing how often one’s data, code and experiments are downloaded or reused in the experiments of others.

Benefits for Society

OpenML also provides a useful learning and working environment for students, citizen scientists and practitioners. Students and citizen scientists can easily explore the state of the art and work together with top minds by contributing their own algorithms and experiments. Teachers can challenge their students by letting them compete on OpenML tasks or by reusing OpenML data in assignments. Finally, machine learning practitioners can explore and reuse the best solutions for specific analysis problems, interact with the scientific community or efficiently try out many possible approaches.


Get involved

OpenML has grown into quite a big project. We could use many more hands to help us out 🔧.

  • Want to contribute? Awesome! Check out our wiki page on how to contribute, or get in touch. There may be unexpected ways in which you could help. We are open to any ideas.
  • Want to support us financially? YES! Getting funding through conventional channels is very competitive, and we are happy about every small contribution. Please send an email to [email protected]!

GitHub organization structure

OpenML's code is distributed over different repositories to simplify development. Please see their individual READMEs and issue trackers if you would like to contribute. These are the most important ones:

  • openml/OpenML: The OpenML web application, including the REST API.
  • openml/openml-python: The Python API, to talk to OpenML from Python scripts (including scikit-learn).
  • openml/openml-r: The R API, to talk to OpenML from R scripts (including mlr).
  • openml/java: The Java API, to talk to OpenML from Java programs.
  • openml/openml-weka: The WEKA plugin, to talk to OpenML from the WEKA toolbox.

openml-java's People

Contributors

arlindkadra, jaksmid, janvanrijn, joaquinvanschoren, mwever, williamraynaut


openml-java's Issues

Improving dependencies in flows

From @DraXus on November 26, 2014 12:17

Currently, flow dependencies are limited to the software version. However, some flows require additional dependencies. For example, flow 191 also requires installing multiBoostAB from the package manager.

Copied from original issue: openml/OpenML#161

ignore_attribute in DataSetDescription null.

The value of ignore_attribute in org.openml.apiconnector.xml.DataSetDescription is always null, since the XML parser tries to access a field named "ignore_attribute" instead of a tag named "oml:ignore_attribute".
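The same kind of name mismatch can be reproduced with the JDK's own DOM parser. This is a minimal sketch, not the apiconnector's XStream mapping: with a parser that is not namespace-aware (the DocumentBuilderFactory default), element names keep their "oml:" prefix, so a lookup for "ignore_attribute" finds nothing while "oml:ignore_attribute" matches. The class name, the helper methods and the sample XML (including its namespace URI) are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

public class PrefixedTagLookup {

    // Parses XML without namespace awareness (the factory default), so tag names keep their prefix.
    static Document parse(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            return builder.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    // Returns the text of the first element whose tag name matches exactly, or null if there is none.
    static String firstText(Document doc, String tagName) {
        NodeList nodes = doc.getElementsByTagName(tagName);
        return nodes.getLength() == 0 ? null : nodes.item(0).getTextContent();
    }

    public static void main(String[] args) {
        // Hypothetical sample, shaped like a prefixed data set description.
        String xml = "<oml:data_set_description xmlns:oml=\"http://openml.org/openml\">"
                + "<oml:ignore_attribute>id</oml:ignore_attribute>"
                + "</oml:data_set_description>";
        Document doc = parse(xml);
        System.out.println(firstText(doc, "ignore_attribute"));     // null: unprefixed name does not match
        System.out.println(firstText(doc, "oml:ignore_attribute")); // id
    }
}
```

With a namespace-aware factory (setNamespaceAware(true)), getElementsByTagNameNS(namespaceUri, "ignore_attribute") would match on the local name instead of the prefixed one.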

apiconnector lib in Weka project differs from the actual version used

From @DraXus on December 6, 2014 12:18

I was getting errors when compiling the Weka package. Apparently, the apiconnector.jar file provided in the lib folder is a different version than the one used in the project.

I've generated a new jar file from the current code and it is working fine so far. I can update the jar file in the repository, although I'm not sure it is the best way to maintain compatibility between versions. What do you think?

Copied from original issue: openml/OpenML#163

Broken method

org.openml.apiconnector.xml.RunList.getRuns() returns null even if the list is well formed and contains runs

Changing "runs" to "run" at lines 50 & 53 seems to solve the issue. Just an XStream typo?

Add static code analysis to CI

Sonarcloud.io hosts SonarQube, a static code analysis tool.

Using a plugin for Travis and Maven, we could integrate our projects with sonarcloud.io to benefit from the code analysis. Sonarcloud.io helps improve overall code quality by finding bugs (e.g. leaking resources), security-critical issues and duplicated code. Furthermore, it can summarize test coverage.

It supports various languages, among others Python, R and Java, and can easily be set up together with Travis. I have already tried configuring it for a fork. To get a feel for it, have a look at this report: https://sonarcloud.io/dashboard?id=org.openml%3Aapiconnector

In GitHub, one can also set quality gates to prevent "bad" code from getting into the dev/master branch.

Null value for number_missing_values

When using the data features, while passing through each feature, I found a null value for number_missing_values. The following code can replicate the problem:

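import org.openml.apiconnector.io.OpenmlConnector;
import org.openml.apiconnector.settings.Settings;
import org.openml.apiconnector.xml.DataFeature;
import org.openml.apiconnector.xml.DataFeature.Feature;

// Reproduction: fetch the features of dataset 2 with caching disabled and fail
// on the first feature whose number_missing_values statistic is null.
OpenmlConnector connector = new OpenmlConnector("https://www.openml.org/", "9ed41f60b87fbe17054397936b96212d");
Settings.CACHE_ALLOWED = false;
DataFeature dataFeatures = connector.dataFeatures(2);
for (Feature feature : dataFeatures.getFeatures()) {
    if (feature.getNumberOfMissingValues() == null) {
        throw new IllegalArgumentException("number_missing_values is null for a feature of dataset 2");
    }
}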
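Until the statistic is always populated, client code has to treat it as nullable. A minimal sketch of doing that without crashing, assuming a hypothetical FeatureStats record in place of the apiconnector's Feature class: features whose count is absent are collected for reporting, and the count is wrapped in an Optional so an absent value is never auto-unboxed into a NullPointerException or silently read as zero. The feature names in main are made up.

```java
import java.util.List;
import java.util.Optional;
import java.util.stream.Collectors;

public class MissingStatsReport {

    // Hypothetical mirror of one feature's statistics; null means the value was not supplied.
    record FeatureStats(String name, Integer numberOfMissingValues) {}

    // Names of features whose missing-value count is absent, so they can be reported instead of failing.
    static List<String> featuresWithoutCount(List<FeatureStats> features) {
        return features.stream()
                .filter(f -> f.numberOfMissingValues() == null)
                .map(FeatureStats::name)
                .collect(Collectors.toList());
    }

    // Wraps the nullable Integer so callers must decide explicitly what "unknown" means.
    static Optional<Integer> missingCount(FeatureStats feature) {
        return Optional.ofNullable(feature.numberOfMissingValues());
    }

    public static void main(String[] args) {
        List<FeatureStats> features = List.of(
                new FeatureStats("age", 0),
                new FeatureStats("class", null));
        System.out.println(featuresWithoutCount(features));             // [class]
        System.out.println(missingCount(features.get(1)).isPresent()); // false
    }
}
```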

Names of functions inconsistent and against conventions

OpenmlConnector:
Almost all methods start with openMl. This is generally not recommended, as it should be clear from the object type itself what the method belongs to.
It is like a class Person having methods such as getPersonName.

Methods also usually start with a noun followed by a verb,
e.g. dataUpload. This could lead to thinking that you are creating some sort of precooked data-upload object (especially when the return data type is called DataUpload).
It is usually recommended to start with the verb, e.g. uploadData.
This would also be consistent with other methods in the class that already start with a verb.

There are multiple ways to go: we could break the API, or we could still support the old API and mark the old methods as obsolete.
Ref: http://www.iwombat.com/standards/JavaStyleGuide.html
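The second option (keep the old names working but mark them as obsolete) could look like the minimal sketch below, using Java's @Deprecated annotation together with the @deprecated Javadoc tag. Only the names dataUpload and uploadData come from the issue; the class, the UploadResult record and the String parameter are hypothetical placeholders, not the apiconnector's real signatures.

```java
public class NamingMigrationSketch {

    // Hypothetical result type standing in for whatever the real upload call returns.
    record UploadResult(int id) {}

    private int nextId = 1;

    // New, verb-first name: the preferred entry point.
    public UploadResult uploadData(String description) {
        // Placeholder for the real HTTP call; here it just hands out increasing ids.
        return new UploadResult(nextId++);
    }

    /**
     * Old noun-first name, kept so existing callers still compile.
     *
     * @deprecated use {@link #uploadData(String)} instead.
     */
    @Deprecated
    public UploadResult dataUpload(String description) {
        return uploadData(description);
    }

    public static void main(String[] args) {
        NamingMigrationSketch api = new NamingMigrationSketch();
        System.out.println(api.uploadData("iris").id()); // 1
        System.out.println(api.dataUpload("iris").id()); // 2, same code path via delegation
    }
}
```

Because the old method only delegates, there is a single implementation to maintain, and callers get a compiler warning pointing them to the new name.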

OpenMl connector tightly coupled

OpenMlConnector:
relies heavily on the static method HttpConnector.doApiRequest.
This is hard to test and mock.
IMHO it would be better to call some interface, maybe even using dependency injection.
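A minimal sketch of that suggestion, assuming a hypothetical ApiRequester interface placed in front of the HTTP layer: the connector receives an implementation through its constructor (constructor injection), so a test can pass in a fake instead of going over the network. ApiRequester, Connector, FakeRequester, dataDescriptionXml and the "data/61" endpoint string are all illustrative names, not apiconnector API; a production implementation would presumably delegate to the existing static HttpConnector.doApiRequest, whose real parameters are not shown here.

```java
import java.util.ArrayList;
import java.util.List;

public class InjectableConnectorSketch {

    // Hypothetical seam replacing direct calls to a static request method.
    interface ApiRequester {
        String doApiRequest(String endpoint);
    }

    // Hypothetical connector that gets its transport from the caller.
    static class Connector {
        private final ApiRequester requester;

        Connector(ApiRequester requester) {
            this.requester = requester;
        }

        String dataDescriptionXml(int datasetId) {
            return requester.doApiRequest("data/" + datasetId);
        }
    }

    // Test double: records requested endpoints and returns a canned response.
    static class FakeRequester implements ApiRequester {
        final List<String> endpoints = new ArrayList<>();

        @Override
        public String doApiRequest(String endpoint) {
            endpoints.add(endpoint);
            return "<oml:data_set_description/>";
        }
    }

    public static void main(String[] args) {
        FakeRequester fake = new FakeRequester();
        Connector connector = new Connector(fake);
        System.out.println(connector.dataDescriptionXml(61)); // <oml:data_set_description/>
        System.out.println(fake.endpoints);                   // [data/61]
    }
}
```

Constructor injection like this needs no framework; a DI container could wire it later, but is not required for testability.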

Rename to openml-java

Would it make sense to rename this repo to 'openml-java' to make it consistent with the others?

Error while connecting using API

Hi everyone.
I'm trying to connect and download a dataset from my Java application, following the Java API documentation, but I'm getting an error. Can someone please figure out why it's throwing an error?

[image]

This is the error.

[image 1]
[image 2]

[MOA] NullPointerException using OpenmlTaskEvaluator

From @DraXus on February 15, 2015 18:20

I got the following error when running tasks in MOA (the latest version from the OpenML website).

I tried different tasks and configurations:
openml.OpenmlDataStreamClassification -t 2177 -e openml.OpenmlTaskEvaluator
openml.OpenmlDataStreamClassification -l functions.NoChange -t 2172 -e openml.OpenmlTaskEvaluator

Failure reason: null
*** STACK TRACE ***
java.lang.NullPointerException
    at java.util.Arrays$ArrayList.<init>(Arrays.java:2842)
    at java.util.Arrays.asList(Arrays.java:2828)
    at moa.evaluation.LearningEvaluation.<init>(LearningEvaluation.java:53)
    at moa.tasks.openml.OpenmlDataStreamClassification.doMainTask(OpenmlDataStreamClassification.java:175)
    at moa.tasks.MainTask.doTaskImpl(MainTask.java:50)
    at moa.tasks.AbstractTask.doTask(AbstractTask.java:57)
    at moa.tasks.TaskThread.run(TaskThread.java:76)

Meanwhile, the console log output looks fine, without errors:

[15-02-2015 18:11:27] [OK] [Authenticate] Authentication successfull. 
[15-02-2015 18:11:28] [INFO] [ARFF Cache] Stored dataset dataset_4_labor.arff to cache. 
[15-02-2015 18:11:28] [OK] [Download] Obtained Stream Header. 

However, it works if BasicClassificationPerformanceEvaluator is selected instead.
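The top frames of the trace are inside Arrays.asList, which throws a NullPointerException when it is handed a null array. The sketch below reproduces that behaviour and shows a guard. The String[] parameter stands in for whatever array LearningEvaluation receives at line 53, which the report does not show, so the idea that OpenmlTaskEvaluator returned a null measurement array there is an assumption; unguarded and guarded are hypothetical helper names.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class NullArrayGuard {

    // Arrays.asList rejects a null array with a NullPointerException, as in the trace's top frames.
    static List<String> unguarded(String[] measurements) {
        return Arrays.asList(measurements);
    }

    // Guarded variant: a null array (no measurements produced) becomes an empty list.
    static List<String> guarded(String[] measurements) {
        return measurements == null ? Collections.emptyList() : Arrays.asList(measurements);
    }

    public static void main(String[] args) {
        try {
            unguarded(null);
        } catch (NullPointerException e) {
            System.out.println("NPE from Arrays.asList(null)");
        }
        System.out.println(guarded(null).size()); // 0
    }
}
```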

Copied from original issue: openml/OpenML#173

[Weka] IndexOutOfBoundsException when obtaining folds

From @DraXus on February 14, 2015 15:17

The following error is shown when trying to run task 17 in Weka 3.7.12 using Naive Bayes as classifier.

[14-02-2015 13:41:27] [INFO] [ARFF Cache] Stored dataset 17 to cache.
[14-02-2015 13:41:27] [INFO] [ARFF Cache] Stored splits 17 to cache.
[14-02-2015 13:41:27] [INFO] [Splits] Obtaining folds for Task 17 (bridges) with weka.classifiers.bayes.NaiveBayes - Repeat 0
java.lang.IndexOutOfBoundsException: Index: 107, Size: 107
    at java.util.ArrayList.rangeCheck(ArrayList.java:635)
    at java.util.ArrayList.get(ArrayList.java:411)
    at weka.core.Instances.instance(Instances.java:768)
    at org.openml.weka.experiment.TaskResultProducer.doRun(TaskResultProducer.java:248)
    at org.openml.weka.experiment.TaskBasedExperiment.nextIteration(TaskBasedExperiment.java:173)
    at org.openml.weka.gui.OpenmlRunPanel$ExperimentRunner.run(OpenmlRunPanel.java:197)
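"Index: 107, Size: 107" means the code asked for row 107 of a 107-row dataset, whose valid indices are 0 to 106. A hypothetical pre-check like the one below would list out-of-range split indices before Instances.instance(...) is called. Why the split refers to index 107 in the first place (for example, splits built for a different version of the dataset, or an off-by-one in indexing) is not established by the report; SplitIndexCheck and outOfRange are illustrative names.

```java
import java.util.ArrayList;
import java.util.List;

public class SplitIndexCheck {

    // Returns every row index outside [0, datasetSize), i.e. each index that would trigger
    // "IndexOutOfBoundsException: Index: i, Size: datasetSize" on list access.
    static List<Integer> outOfRange(int[] splitRowIds, int datasetSize) {
        List<Integer> bad = new ArrayList<>();
        for (int rowId : splitRowIds) {
            if (rowId < 0 || rowId >= datasetSize) {
                bad.add(rowId);
            }
        }
        return bad;
    }

    public static void main(String[] args) {
        // 107 rows, as in the report: valid indices are 0..106, so 107 is one past the end.
        int[] splitRowIds = {0, 53, 106, 107};
        System.out.println(outOfRange(splitRowIds, 107)); // [107]
    }
}
```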

Copied from original issue: openml/OpenML#172

Using OpenML Java apiconnector in Matlab

I tried to use Java apiconnector in Matlab R2012a but got the following error:

Java exception occurred:
java.lang.NoSuchMethodError:
com.thoughtworks.xstream.io.xml.DomDriver.<init>(Ljava/lang/String;Lcom/thoughtworks/xstream/io/naming/NameCoder;)V
 at org.openml.apiconnector.xstream.XstreamXmlMapping.getInstance(XstreamXmlMapping.java:63)
 at org.openml.apiconnector.io.HttpConnector.doApiRequest(HttpConnector.java:30)
 at org.openml.apiconnector.io.ApiSessionHash.openmlAuthenticate(ApiSessionHash.java:162)
 at org.openml.apiconnector.io.ApiSessionHash.update(ApiSessionHash.java:84)
 at org.openml.apiconnector.io.ApiSessionHash.set(ApiSessionHash.java:71)
 at org.openml.apiconnector.io.OpenmlConnector.<init>(OpenmlConnector.java:89)

This problem occurs because an older version of xstream is loaded by Matlab in the static java class path and therefore it's not using the xstream library provided in the dynamic java class path. I've been googling and couldn't find any elegant solution. Apparently, the only workaround would be to replace the xstream.jar file in the "jarext" Matlab folder with the new version, but that could lead to internal Matlab problems.

So at the moment it's not possible to use the Java apiconnector in Matlab, but I would like to leave this comment here for future reference.
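To confirm which xstream jar actually wins on the class path, a generic JVM diagnostic is to ask the loaded class for its code source. This sketch uses only JDK calls (Class.forName and getProtectionDomain().getCodeSource()); the WhichJar name is made up, and it only gives a meaningful answer when run inside the same JVM and class path as the failing code, which in MATLAB's case depends on the local setup.

```java
import java.security.CodeSource;

public class WhichJar {

    // Reports where a class was loaded from: a jar or directory URL, the bootstrap loader, or not found.
    static String locate(String className) {
        try {
            // initialize = false: only load the class, do not run its static initializers.
            Class<?> cls = Class.forName(className, false, WhichJar.class.getClassLoader());
            CodeSource source = cls.getProtectionDomain().getCodeSource();
            if (source == null || source.getLocation() == null) {
                return "<bootstrap or unknown>";
            }
            return source.getLocation().toString();
        } catch (ClassNotFoundException e) {
            return "<not found>";
        }
    }

    public static void main(String[] args) {
        // In the scenario above, this should point at the xstream jar on MATLAB's static class path.
        System.out.println(locate("com.thoughtworks.xstream.XStream"));
        System.out.println(locate("java.lang.String")); // <bootstrap or unknown>
    }
}
```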

Java docs

It seems the (outdated) Javadocs are currently hard-coded and hosted on the web server:
https://www.openml.org/docs/

Wouldn't it be better to host these on Maven Central (if possible) and link there? That is where a javadoc jar is available.

Incomplete Stats in OpenML Features using Java API

Using the latest Java API (ver. 1.0.13 from Maven), we are facing an issue with the dataFeatures class methods that return statistics about the features in a dataset. Whenever we call a method to retrieve feature statistics (e.g. getNumberOfDistinctValues()), we get a null value, for example with dataset_id = 967 or 21.

Are those methods to retrieve such statistics about features fully implemented in the current version of the API or are they still under development and shouldn't be used?
