
fracpete / python-weka-wrapper

Python 2.7 wrapper for Weka using javabridge.

License: GNU General Public License v3.0

Makefile 0.11% Java 1.37% Python 98.18% HTML 0.34%
python27 weka java machine-learning

python-weka-wrapper's Introduction

END OF LIFE

Python 2.7 reached its end-of-life in 2020. You should consider using the Python 3 version of this library, as the Python 2.7 version will no longer receive updates!

python-weka-wrapper

Python wrapper for the Java machine learning workbench Weka using the javabridge library.

Requirements:

  • Python 2.7 (for Python 3 version see here)
    • javabridge (>= 1.0.14)
    • matplotlib (optional)
    • pygraphviz (optional)
    • PIL (optional)
  • Oracle JDK 1.8+

Uses:

  • Weka (3.9.3)

Installation

Detailed instructions and links to videos on installing the library are located here.

Please note that you need a build environment to compile some libraries from source.

Forum

You can post questions, patches or enhancement requests in the following Google Group:

https://groups.google.com/forum/#!forum/python-weka-wrapper

Examples

See python-weka-wrapper-examples repository for example code on the various APIs. Also, check out the sphinx documentation in the doc directory. You can generate HTML documentation using the make html command in the doc directory.

Available online documentation:

python-weka-wrapper's People

Contributors

fracpete

python-weka-wrapper's Issues

Would be nice to have it python 3 compatible

@fracpete can't install in an easy way with python 3.x
$ pip install python-weka-wrapper

Downloading/unpacking python-weka-wrapper
  Downloading python-weka-wrapper-0.1.12.tar.gz (43kB): 43kB downloaded
  Running setup.py egg_info for package python-weka-wrapper
    Traceback (most recent call last):
      File "<string>", line 16, in <module>
      File "C:\Users\[...]\AppData\Local\Temp\pip_build_[...]\python-weka-wrapper\setup.py", line 36
        print "Downloading '" + url + "' to '" + outfile + "'"
                            ^
    SyntaxError: invalid syntax
    Complete output from command python setup.py egg_info:
    Traceback (most recent call last):
      File "<string>", line 16, in <module>
      File "C:\Users\[...]\AppData\Local\Temp\pip_build_[...]\python-weka-wrapper\setup.py", line 36
        print "Downloading '" + url + "' to '" + outfile + "'"
                            ^
    SyntaxError: invalid syntax
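
The SyntaxError comes from the Python 2 print statement in setup.py, which Python 3 no longer accepts because print is a function there. A minimal sketch of the same message written so that it parses under both versions; the names url and outfile come from the traceback, while the helper function and the example values are hypothetical:

```python
from __future__ import print_function  # no-op on Python 3, enables print() on 2.7


def download_message(url, outfile):
    """Build the same message that setup.py prints before downloading."""
    return "Downloading '" + url + "' to '" + outfile + "'"


# Hypothetical values, only for illustration.
print(download_message("http://example.org/weka.zip", "weka.zip"))
```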

Some common functions in Weka not wrapped by python-weka-wrapper

I see that the API provided by python-weka-wrapper is only a subset of what is offered by Weka. This is not a huge problem as the missing functions can be easily added manually:

Instances.insertAttributeAt = lambda self, att, index: javabridge.call(self.jobject, "insertAttributeAt", "(Lweka/core/Attribute;I)V", att.jobject, index)
Attribute.create_relational = staticmethod(lambda name, instances: Attribute(javabridge.make_instance("weka/core/Attribute", "(Ljava/lang/String;Lweka/core/Instances;)V", name, instances.jobject)))

Still, I think it would be a good idea if common methods like these were available from the start. I am not familiar with the Weka API and have been exploring it through python-weka-wrapper. Only later did I find out that certain functionality I needed was in fact supported by Weka but just hasn't been made available through the wrapper.

weka/core/packages.py - fails with new package manager

Newer versions of Weka use different package manager classes (weka.core.packageManagement.Package), so enforcing the old class (org.pentaho.packageManagement.Package) raises the following error:

Exception in thread "Thread-1" java.lang.NoClassDefFoundError: Lorg/pentaho/packageManagement/Package;
Caused by: java.lang.ClassNotFoundException: org.pentaho.packageManagement.Package
    at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
Traceback (most recent call last):
  File "/Users/sonnguyen/Bitbucket/conll2016-jaist/implicit-sdp/weka-test/test2.py", line 6, in <module>
    items = packages.all_packages()
  File "//anaconda/lib/python2.7/site-packages/weka/core/packages.py", line 241, in all_packages
    result.append(Package(pkge))
  File "//anaconda/lib/python2.7/site-packages/weka/core/packages.py", line 34, in __init__
    self.enforce_type(jobject, "org.pentaho.packageManagement.Package")
  File "//anaconda/lib/python2.7/site-packages/weka/core/classes.py", line 572, in enforce_type
    if not cls.check_type(jobject, intf_or_class, jni_intf_or_class):
  File "//anaconda/lib/python2.7/site-packages/weka/core/classes.py", line 556, in check_type
    return javabridge.is_instance_of(jobject, jni_intf_or_class)
  File "//anaconda/lib/python2.7/site-packages/javabridge/jutil.py", line 806, in is_instance_of
    raise JavaException(jexception)
javabridge.jutil.JavaException: Lorg/pentaho/packageManagement/Package;
Failed to get class Lorg/pentaho/packageManagement/Package;

test_cv method does not take parameter "random"

Traceback (most recent call last):
  File "weka_modeling.py", line 191, in <module>
    main(args)
  File "weka_modeling.py", line 184, in main
    weka_modeling.evaluation()
  File "weka_modeling.py", line 173, in evaluation
    self.evaluate_models(save_plots_dir=self.data_dir)
  File "weka_modeling.py", line 128, in evaluate_models
    test_data = self.data.test_cv(cv_plot, 0, Random(random))
TypeError: test_cv() takes exactly 3 arguments (4 given)
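
The error message suggests that test_cv accepts only the number of folds and the fold index (plus self), so the call would presumably be test_cv(cv_plot, 0). A fold split is deterministic once the row order is fixed, so any randomization has to happen once, before splitting. The following is a pure-Python illustration of that idea on a plain list; it is not the wrapper's implementation, and the function names are hypothetical:

```python
import random


def fold_bounds(num_rows, num_folds, fold):
    """Return (start, size) of the contiguous test slice for one fold."""
    base, extra = divmod(num_rows, num_folds)
    start = fold * base + min(fold, extra)   # earlier folds absorb the remainder
    size = base + (1 if fold < extra else 0)
    return start, size


def cv_test_split(rows, num_folds, fold):
    """Rows held out for testing in the given fold."""
    start, size = fold_bounds(len(rows), num_folds, fold)
    return rows[start:start + size]


def cv_train_split(rows, num_folds, fold):
    """All rows that are not in the test slice of the given fold."""
    start, size = fold_bounds(len(rows), num_folds, fold)
    return rows[:start] + rows[start + size:]


rows = list(range(10))
random.Random(1).shuffle(rows)  # randomize once, then split deterministically
for fold in range(3):
    print(cv_train_split(rows, 3, fold))
    print(cv_test_split(rows, 3, fold))
```

Because the shuffle happens before the split, the train and test slices of the same fold never overlap, and the test slices of all folds together cover every row exactly once.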

Learning curve for multiple test sets

Upgrade method plot_learning_curve in module weka.plot.classifiers to handle a list of test sets. This will allow plotting of curves for the train and test sets in the same plot.

Instance class: set_value method fails

This is the code:

loader = Loader(classname="weka.core.converters.ArffLoader")
data = loader.load_file("test.arff")
class_index = data.num_attributes() - 1
data.set_class_label(class_index)
index = 3
print data.get_instance(index).get_value(class_index)
instance = data.get_instance(index)
instance.set_value(class_index, 1.0)

This is the output

0.0
Exception in thread "Thread-0" java.lang.NoSuchMethodError: value
E
======================================================================
ERROR: test_cluster (__main__.TestPsuedoLabel)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test.py", line 106, in test_func
    private_func()
  File "private.py", line 88, in private_func
    instance.set_value(class_index, 1) private_func
  File "/usr/local/lib/python2.7/dist-packages/python_weka_wrapper-0.1.13-py2.7.egg/weka/core/dataset.py", line 486, in set_value
    javabridge.call(self.jobject, "value", "(ID)V", index, value)
  File "/usr/local/lib/python2.7/dist-packages/javabridge/jutil.py", line 826, in call
    fn = make_call(o, method_name, sig)
  File "/usr/local/lib/python2.7/dist-packages/javabridge/jutil.py", line 789, in make_call
    raise JavaException(jexception)
JavaException: value

Original post:

https://groups.google.com/forum/#!topic/python-weka-wrapper/-SMVULCyjdk

package support

list available/installed packages, install/uninstall package

can not get predictions

evl.crossvalidate_model(cls, data, 2, Random(1))
evl.predictions()

Exception in thread "Thread-0" java.lang.NoSuchMethodError: predictions

JavaException Traceback (most recent call last)
in ()
----> 1 evl.predictions()

/usr/local/lib/python2.7/dist-packages/python_weka_wrapper-0.1.15-py2.7.egg/weka/classifiers.pyc in predictions(self)
1369 """
1370 preds = javabridge.get_collection_wrapper(
-> 1371 javabridge.call(self.jobject, "predictions", "()Ljava/util/ArrayList;"))
1372 result = []
1373 for pred in preds:

/usr/local/lib/python2.7/dist-packages/javabridge/jutil.pyc in call(o, method_name, sig, *args)
824 '''
825 env = get_env()
--> 826 fn = make_call(o, method_name, sig)
827 args_sig = split_sig(sig[1:sig.find(')')])
828 ret_sig = sig[sig.find(')')+1:]

/usr/local/lib/python2.7/dist-packages/javabridge/jutil.pyc in make_call(o, method_name, sig)
787 if method_id is None:
788 if jexception is not None:
--> 789 raise JavaException(jexception)
790 raise JavaError('Could not find method name = "%s" '
791 'with signature = "%s"' % (method_name, sig))

JavaException: predictions

problem with starting use weka wrapper

Hello Fracpete,
Hope you're fine!

I'm new to machine learning systems and am now trying to implement one!

I succeeded in building a classifier for image classification by following a tutorial and adding many things to make it run. Now I want to evaluate the classifier by plotting the confusion matrix.
I have read a lot about confusion matrices and now understand the theory behind them!

The classifier gives me some files (like X.svm.range, X.svm.model, X.svm.scale and a codebook). My classifier uses a bag of visual words with SIFT.

My problem is that I don't know which of these files I should use to start my evaluation.
I know I have to split my dataset into a training set and a test set (like the iris dataset used in almost all examples). I have also installed python-weka-wrapper (I'm using Python 2.7), but don't know where to start!

Do you have any clue or any link that could help me from the very beginning?
Just let me know if you need more details.

Thanks!
Edess

number of selected attributes is wrong in this example

it should be 5 instead of 4, am I right?

data = loader.load_file(data_dir + "vote.arff")
data.class_is_last()

from weka.attribute_selection import ASSearch, ASEvaluation, AttributeSelection
search = ASSearch(classname="weka.attributeSelection.BestFirst", options=["-D", "1", "-N", "5"])
evaluator = ASEvaluation(classname="weka.attributeSelection.CfsSubsetEval", options=["-P", "1", "-E", "1"])
attsel = AttributeSelection()
attsel.search(search)
attsel.evaluator(evaluator)
attsel.select_attributes(data)

print("# attributes: " + str(attsel.number_attributes_selected))
print("selected: " + str(attsel.selected_attributes))
print("result string:\n" + attsel.results_string)

incremental classifiers

add support for incremental classifiers:

  • Loader: get_structure, next_instance (iterator?)
  • Classifier: update_classifier(Instance)
  • Evaluation: evaluate_model_once(Instance)

java.lang.NoSuchFieldError: TAGS_EVALUATION when using MultiSearch

Hello,

When initializing MultiSearch exactly according to the docs, I get

Exception in thread "Thread-0" java.lang.NoSuchFieldError: TAGS_EVALUATION
Traceback (most recent call last):
  File "crossvalfolds-weka.py", line 131, in <module>
    "-num-slots", "1", "-S", "1"])
  File "/home/tomas/anaconda/lib/python2.7/site-packages/weka/classifiers.py", line 404, in __init__
    self.tags_evaluation = Tags.get_tags("weka.classifiers.meta.MultiSearch", "TAGS_EVALUATION")
  File "/home/tomas/anaconda/lib/python2.7/site-packages/weka/core/classes.py", line 1356, in get_tags
    return Tags(jobject=javabridge.get_static_field(classname, field, "[Lweka/core/Tag;"))
  File "/home/tomas/anaconda/lib/python2.7/site-packages/javabridge/jutil.py", line 956, in get_static_field
    raise JavaException(jexception)
javabridge.jutil.JavaException: TAGS_EVALUATION

My code

from weka.classifiers import MultiSearch
....
multi = MultiSearch(
  options=["-sample-size", "100.0", "-initial-folds", "2", "-subsequent-folds", "2",
      "-num-slots", "1", "-S", "1"])

MultiSearch freshly installed with

java -cp /usr/local/lib/python2.7/dist-packages/weka/lib/weka.jar weka.core.WekaPackageManager -install-package https://github.com/fracpete/multisearch-weka-package/releases/download/v2016.1.14/multisearch-2016.1.14.zip

Other possibly relevant installation steps:

pip install javabridge
pip install python-weka-wrapper

Weka version 3.7.13

Thanks!

Configuring Time Series Forecasting for Python

Any clue on how to install the required packages for time series forecasting?

I tried downloading the zip file and adding the package, but when I run it I still get errors.

>>> import weka.classifiers.timeseries
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ImportError: No module named timeseries

Using deprecated version of Numpy

While trying to install on Fedora using pip, I got the following warning and error messages:

/usr/lib64/python2.7/site-packages/numpy/core/include/numpy/npy_1_7_deprecated_api.h:15:2: warning: #warning "Using deprecated NumPy API, disable it by " "#defining NPY_NO_DEPRECATED_API NPY_1_7_API_VERSION" [-Wcpp]

#warning "Using deprecated NumPy API, disable it by " \

^

_javabridge.c:342:17: fatal error: jni.h: No such file or directory

#include "jni.h"

             ^

compilation terminated.

error: command 'gcc' failed with exit status 1


Cleaning up...
Command /usr/bin/python -c "import setuptools;file='/tmp/pip_build_root/javabridge/setup.py';exec(compile(open(file).read().replace('\r\n', '\n'), file, 'exec'))" install --record /tmp/pip-n6PFgI-record/install-record.txt --single-version-externally-managed failed with error code 1 in /tmp/pip_build_root/javabridge
Storing complete log in /root/.pip/pip.log

Create an instance from a numpy array and classify it

Hello,
I would like to, given a numpy array (a vector, with a number of attribute values), classify it.
However, I don't know how to convert that vector into an instance and pass it to the classifier:
I get an error saying

"JavaException: Instance doesn't have access to a dataset!"

It seems I have to add that instance to a dataset, but I don't know why. Do I have to create a fake dataset with only one instance and then classify it? That instance won't have a class either.
This is what my code looks like:

loader = Loader(classname="weka.core.converters.ArffLoader")    
data = loader.load_file(TRAINING_ARFF)
data.class_is_last()

cls = Classifier(classname="weka.classifiers.trees.J48", options = ["-C", "0.3", "-M", "10"])
cls.build_classifier(data)

instance_vector = Instance.create_instance(numpy_vector)

predicted_class = cls.classify_instance(instance_vector)  # "class" is a reserved word in Python

Many thanks in advance!
Cheers,
Ignasi

MultiSearch prop value for J48 decision tree parameters

I am trying to optimize the "C" and "R" parameters of J48. It seems I am not getting the corresponding property values right. For "C" I use "classifier.C", since the SMOreg example also has a "C" parameter (same command-line option "-C"), which is referred to as "classifier.C":

  mparam.prop = "classifier.C"
  lparam.prop = "classifier.R"

With the setting above, I get the following error from the build_classifier() method

java.beans.IntrospectionException: Method not found: isC
at java.beans.PropertyDescriptor.<init>(PropertyDescriptor.java:107)
at java.beans.PropertyDescriptor.<init>(PropertyDescriptor.java:71)
at weka.core.PropertyPath.find(PropertyPath.java:386)
at weka.core.PropertyPath.find(PropertyPath.java:410)
at weka.core.SetupGenerator.setup(SetupGenerator.java:592)
at weka.classifiers.meta.MultiSearch.findBest(MultiSearch.java:1444)
at weka.classifiers.meta.MultiSearch.buildClassifier(MultiSearch.java:1481)

Same exception for isR.

The full code

multi = MultiSearch(
options=["-sample-size", "100.0", "-initial-folds", "2", "-subsequent-folds", "2",
      "-num-slots", "1", "-S", "1"])
multi.evaluation = "CC"
mparam = MathParameter()
mparam.prop = "classifier.C"
mparam.minimum = 0.1
mparam.maximum = 0.5
mparam.step = 0.1
mparam.base = 10.0
mparam.expression = "pow(BASE,I)"
lparam = ListParameter()
#Reduced error pruning
lparam.prop = "classifier.R"
lparam.values = [True, False]
multi.parameters = [mparam, lparam]
clOpt = Classifier(classname="weka.classifiers.trees.J48", options=["-C", "0.3"])
multi.classifier = clOpt
multi.build_classifier(train)

The code above is largely copied and pasted from the example documentation. The roles of multi.evaluation and mparam.expression are not clear to me, but I doubt they have anything to do with the error.

Thanks for your continuing support.
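
The "Method not found: isC" message comes from java.beans.PropertyDescriptor, which treats the last element of the property path as a JavaBeans property name and looks for matching accessor methods, not for a command-line flag. The sketch below only reproduces that naming rule in pure Python; the helper is hypothetical. Assuming J48 exposes getConfidenceFactor and getReducedErrorPruning, the property paths would presumably be "classifier.confidenceFactor" and "classifier.reducedErrorPruning" rather than "classifier.C" and "classifier.R".

```python
def bean_accessor_names(prop):
    """Accessor names that JavaBeans introspection derives from a property name.

    The reader is looked up as is<Name> first and get<Name> as a fallback,
    the writer as set<Name>.
    """
    capitalized = prop[:1].upper() + prop[1:]
    return {
        "read": ["is" + capitalized, "get" + capitalized],
        "write": "set" + capitalized,
    }


# "C" is a command-line flag, so none of the derived methods exist on J48.
print(bean_accessor_names("C"))
# A real bean property name maps onto existing getter/setter names.
print(bean_accessor_names("confidenceFactor"))
```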

add unit tests

add unit tests for core, plot and Weka-related classes to ensure correctness and avoid regressions

Create Instances from "Scratch"

Thanks for the excellent project, which looks like it will work nicely for me except for one thing: I need to be able to create some weka.core.Instances from, say, a python list of lists. I'm happy to go through and create all the attributes and whatnot by hand, but it doesn't look like there's an easy way to do this.

Is the only natural way of creating a training set through the loader methods (which would in general mean writing the dataset out to a file and then reading it back in), or is there another way?
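
For reference, the file round trip mentioned above can be sketched in a few lines: render the list of lists as ARFF text, write it to a file, and load that file with the Loader. The helper below is hypothetical and not part of the wrapper; it assumes that every attribute is numeric and that attribute names contain no spaces or special characters.

```python
def to_arff(relation, attribute_names, rows):
    """Render numeric rows as ARFF text (all attributes declared numeric)."""
    lines = ["@relation " + relation, ""]
    for name in attribute_names:
        lines.append("@attribute " + name + " numeric")
    lines.append("")
    lines.append("@data")
    for row in rows:
        if len(row) != len(attribute_names):
            raise ValueError("row length does not match number of attributes")
        lines.append(",".join(str(value) for value in row))
    return "\n".join(lines) + "\n"


# Hypothetical data: two numeric attributes, two rows.
print(to_arff("demo", ["x", "y"], [[1, 2], [3.5, 4]]))
```

The resulting text can be written to a temporary .arff file and passed to loader.load_file as in the other examples on this page.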

set_property: type parameter

add a "type" parameter that defines which type to convert the value to

Example:
J48's confidenceFactor is float, but Python uses double

The following converts the string "0.3" into a float before setting the property:
cls = Classifier("weka.classifiers.trees.J48")
cls.set_property("confidenceFactor", "0.3", "f")
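
Why the distinction matters can be shown without Weka: a Python float is a 64-bit double, and narrowing it to a 32-bit Java float changes the stored value. The helper name below is hypothetical and only illustrates the conversion that an "f" type would imply.

```python
import struct


def to_java_float(value):
    """Parse value and round it to 32-bit precision, as a Java float stores it."""
    return struct.unpack("f", struct.pack("f", float(value)))[0]


as_double = float("0.3")         # what Python uses by default
as_float = to_java_float("0.3")  # what a float-typed property would hold
print(repr(as_double))
print(repr(as_float))
```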

Evaluating on provided set of instances

I am trying to evaluate a model on a provided set of instances. This boils down to invoking the static evaluateModel method from Weka.

While this is not supported in python-weka-wrapper, creating such a call should not be difficult. I tried the following:

classifier = Classifier(classname="weka.classifiers.trees.J48", options=["-C", "0.3"])
classifier.build_classifier(train)
...

javabridge.static_call(
        "Lweka/classifiers/Evaluation;", "evaluateModel",
        "(Lweka/classifiers/Classifier;Lweka/core/Instances;[Ljava/lang/Object;)[D",
        classifier.jobject, test.jobject)

where
test is of type weka.core.Instances
classifier is of type Classifier

I am getting the following error: Could not find method name = evaluateModel with signature = (Lweka/classifiers/Classifier;Lweka/core/Instances;[Ljava/lang/Object;)[D

I've also tried explicitly adding the empty third parameter with None or [], which did not help.

Signature looks fine to me:

  • Lweka/classifiers/Classifier; - Classifier object
  • Lweka/core/Instances; - instances object
  • [Ljava/lang/Object; - varargs
  • [D - array of doubles return type
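
The breakdown above can be checked mechanically with a small decoder for JNI method signatures. This is a pure-Python sketch with a hypothetical helper name; it is not part of javabridge and only parses the string, it does not look up any Java method.

```python
PRIMITIVES = {
    "Z": "boolean", "B": "byte", "C": "char", "S": "short",
    "I": "int", "J": "long", "F": "float", "D": "double", "V": "void",
}


def parse_jni_signature(sig):
    """Split a JNI method signature into (argument types, return type)."""
    if not sig.startswith("("):
        raise ValueError("signature must start with '('")

    def read_type(pos):
        dims = 0
        while sig[pos] == "[":            # each '[' adds one array dimension
            dims += 1
            pos += 1
        if sig[pos] == "L":               # object type: L<class path>;
            end = sig.index(";", pos)
            name = sig[pos + 1:end].replace("/", ".")
            pos = end + 1
        else:                             # single-letter primitive code
            name = PRIMITIVES[sig[pos]]
            pos += 1
        return name + "[]" * dims, pos

    pos = 1
    args = []
    while sig[pos] != ")":
        type_name, pos = read_type(pos)
        args.append(type_name)
    return_type, _ = read_type(pos + 1)
    return args, return_type


print(parse_jni_signature(
    "(Lweka/classifiers/Classifier;Lweka/core/Instances;[Ljava/lang/Object;)[D"))
```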

Thanks for any pointers

Python 3 migration

migration to Python 3:

  • does javabridge support Python 3?
  • migrate python-weka-wrapper examples
  • migrate wekamooc examples
  • migrate documentation

Weka packages: alternative path

WinPython is a self contained Python distribution, which changes the home directory of the user. Instead of only accepting True or False for the packages parameter when starting up the JVM, an alternative path should get accepted as well.

Error in example.rst

Thanks for the help on the previous issue.
I believe I spotted a possible error in this section of the documentation:

evl.crossvalidate_model(cls, data, 10, Random(1))

should be

evl.crossvalidate_model(fc, data, 10, Random(1))

Instances and ARFF file dataset manipulations

Hi
looking at the WEKA source code I found a number of additional implemented features that allow for dataset manipulations. Can that work in python too, or is it missing intentionally? Is there a plan to include support for this in the wrapper?

Some examples can be found in the docs below:

Loader "loads" a non existent file without warning

Below is a minimal example (also in loaderBug.py) of what I think is a bug.
The example loads an existing file (into trainData) and a non-existent file (into testData).
However, both datasets contain the trainData information (the example prints num_instances for both, and they are unexpectedly the same).

#!/usr/bin/python
# -*- coding: utf-8 -*-

import weka.core.jvm as jvm
from weka.core.converters import Loader

jvm.logger.setLevel(jvm.logging.INFO)
jvm.start()

loader = Loader(classname="weka.core.converters.ArffLoader")
trainData = loader.load_file('segment-challenge.arff')
testData = loader.load_file('this-file-not-exist')  # silent fail

print(trainData.num_instances)  # True answer (1500)
print(testData.num_instances)   # False answer (1500)

jvm.stop()
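
Until the loader reports missing files itself, a caller can guard against the silent failure by checking the path before loading. A minimal sketch with a hypothetical helper that is not part of the wrapper:

```python
import os


def require_file(path):
    """Return path unchanged if it names an existing file, else raise IOError."""
    if not os.path.isfile(path):
        raise IOError("File not found: " + path)
    return path


# Intended use with the loader from the example above (not executed here):
# testData = loader.load_file(require_file('this-file-not-exist'))
```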

experiment support

allow configuration and execution of experiments using the Experimenter framework

Simple workflow support

Basic workflow support:

  • LoadFile, LoadDB, SaveFile, SaveDB
  • SetClass
  • ApplyFilter
  • SelectAttributes
  • EvaluateClassifier/Clusterer
  • BuildClassifier/Clusterer
  • ...

set_class_index(self,index)

How about adding the function below to weka.core.dataset.py? Why is such a function not available in Python when it is available in Java?
I'm just curious why it's not available; I'll immediately close this issue if you want.

def set_class_index(self, index):
    """
    Sets the index value as class attribute (convenience method).
    """
    self.class_index = index

double_matrix_to_ndarray falsely assumes square matrices

The implementation of double_matrix_to_ndarray assumes that the passed Java matrix is a square matrix. This may be true for confusion matrices, but it is definitely an error for Classifier.distributions_for_instances. When there are more instances than class labels, distributions_for_instances works but returns a numpy array that is too large. When there are fewer instances than class labels, distributions_for_instances throws an exception:

File "C:\Program Files (x86)\Python27\lib\site-packages\weka\classifiers.py", line 129, in distributions_for_instances
return arrays.double_matrix_to_ndarray(self.__distributions(data.jobject))
File "C:\Program Files (x86)\Python27\lib\site-packages\weka\core\types.py", line 74, in double_matrix_to_ndarray
result[i][n] = element
IndexError: index 3 is out of bounds for axis 0 with size 3
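
A sketch of the shape logic a fix would need: take the number of rows and the number of columns separately instead of assuming they are equal. Nested Python lists stand in for the Java double[][] and the numpy array here, and the function names are hypothetical; with numpy, the result would presumably be allocated with numpy.zeros((rows, cols)).

```python
def matrix_shape(matrix):
    """Return (rows, cols) of a rectangular list-of-lists matrix."""
    rows = len(matrix)
    cols = len(matrix[0]) if rows else 0
    return rows, cols


def copy_matrix(matrix):
    """Copy a rectangular matrix element by element into a rows x cols result."""
    rows, cols = matrix_shape(matrix)
    result = [[0.0] * cols for _ in range(rows)]
    for i, row in enumerate(matrix):
        for n, element in enumerate(row):
            result[i][n] = element
    return result


# Two instances with three class labels each: fewer rows than columns.
print(copy_matrix([[0.1, 0.2, 0.7], [0.5, 0.25, 0.25]]))
```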

speed up often used methods

Speed up the following methods using method objects (javabridge.make_call):

  • Classifier.classify_instance
  • Classifier.distribution_for_instance
  • Clusterer.cluster_instance
  • Clusterer.distribution_for_instance
  • Filter.input
  • Filter.output
  • Instance.class_index
  • Instance.is_missing
  • Instance.get/set_value
  • Instance.get/set_string_value
  • Instance.weight
  • Instances.attribute
  • Instances.num_attributes
  • Instances.num_instances
  • Instances.get_instance
  • Instances.class_index

Example issues in windows

When running this code,

from weka.classifiers import Classifier
cls = Classifier(classname="weka.classifiers.trees.J48")
print(cls.to_help())

I have this error message:
cannot find classname attribute for Classifier __init__

It seems it cannot load the class in Python
