pystruct / pystruct

Simple structured learning framework for python

Home Page: http://pystruct.github.io

License: BSD 2-Clause "Simplified" License

Python 98.30% Shell 1.05% Makefile 0.07% Cython 0.59%

pystruct's Introduction

PyStruct

PyStruct aims at being an easy-to-use structured learning and prediction library. Currently it implements only max-margin methods and a perceptron, but other algorithms might follow.

The goal of PyStruct is to provide a well-documented tool for researchers as well as non-experts to make use of structured prediction algorithms. The design tries to stay as close as possible to the interface and conventions of scikit-learn.

You can install pystruct using

pip install pystruct

Some of the functionality (namely OneSlackSSVM and NSlackSSVM) requires that cvxopt is installed. See the installation instructions for more details.

The full documentation and installation instructions can be found at the website: http://pystruct.github.io

You can contact the authors either via the mailing list or on github.

Currently the project is mostly maintained by Andreas Mueller, but contributions are very welcome.

Jean-Luc Meunier (Naver Labs Europe) contributed a new model and did some maintenance, in the course of the EU READ project. See READ_Contribution.md

pystruct's People

Contributors

amueller, bjanssen, derthorsten, eduardozamudio, fgregg, iver56, jlmeunier, jnothman, kondra, larsmans, lemonlison, shengshuyang, tesla1060, thomasp6t, vene, zaxtax

pystruct's Issues

Gradient computation in SubgradientStructuredSVM

Hi Andreas,

Hope you don’t mind if I ask some questions here. :) Those two are probably bugs, or result of my misunderstanding.

  1. When you compute the gradient in SubgradientStructuredSVM::_solve_subgradient, instead of

    grad = (psi_matrix - w / self.C / 2.) 
    

    it should be

    grad = (psi_matrix - w / self.C) 
    

    Otherwise you optimize the objective with a different value of C.

  2. In the same function, when you compute the non-adagrad update, you store it in grad_old instead of self.grad_old, thus ignoring momentum.
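
For reference, the first point can be checked against the usual C-parameterized structured SVM objective. This is only a sketch: it assumes that psi_matrix holds the summed joint-feature differences ψ(x_i, y_i) − ψ(x_i, ŷ_i), where ŷ_i is the loss-augmented prediction, and that the objective is divided by C before differentiating. The scaling actually used in the library may differ.

```latex
\begin{aligned}
\text{objective:}\quad & \tfrac{1}{2}\lVert w\rVert^{2} + C\sum_i \xi_i(w),
\qquad \xi_i(w) = \max_{y}\bigl[\Delta(y_i, y) - w^{\top}\bigl(\psi(x_i, y_i) - \psi(x_i, y)\bigr)\bigr] \\
\text{divided by } C:\quad & \tfrac{1}{2C}\lVert w\rVert^{2} + \sum_i \xi_i(w) \\
\text{descent direction:}\quad & \sum_i \bigl[\psi(x_i, y_i) - \psi(x_i, \hat{y}_i)\bigr] - \frac{w}{C}
\end{aligned}
```

Under these assumptions, a regularizer term of w / (2C) in the update corresponds to the regularizer (1/(4C))‖w‖², that is, to training with 2C instead of C.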

Windows XP installation failed

Hi
When using pip install pystruct, I receive the following error:

C:\Program Files\Microsoft Visual Studio 9.0\VC\INCLUDE\xlocale(342) : warning C4530: C++ exception handler used, but unwind semantics are not enabled. Specify /EHsc

ad3/FactorGraph.cpp(21) : fatal error C1083: Cannot open include file: 'sys/time.h': No such file or directory

Is that because sys/time.h is not available on win32?

implement warm-start for inference

There should be a method to warm-start inference procedures from past iterations.
We need to cache the last result for each example and feed it to the inference procedure.
This is basically independent of the learner.
Maybe it could be completely put into the model.

First, we need to implement it in the inference procedures, though.
The LP should be able to benefit, and also opengm.
A slight API crux is that the result alone is not enough to warm-start, we also need dual solutions, messages, etc. depending on the method.

Helper functions for pretty returning of parameters

It would be nice if the user could get potentials returned as separate matrices of potential types.

I.e., for graph_crf, instead of

[1, 2, -1, -2, 4, 5, 6]

we get

[numpy.array([[ 1,  2],
              [-1, -2]]),
 numpy.array([[4, 5],
              [5, 6]])
]
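
A minimal sketch of such a helper, using only NumPy. The function name unpack_graph_crf_weights is hypothetical, and the layout is an assumption read off the example above: the unary block (n_states x n_features) comes first, followed by the upper triangle of a symmetric pairwise matrix.

```python
import numpy as np


def unpack_graph_crf_weights(w, n_states, n_features):
    """Split a flat weight vector into (unary, pairwise) matrices.

    Hypothetical helper. Assumed layout: unary weights first, then the
    upper triangle (row by row) of a symmetric pairwise matrix.
    """
    w = np.asarray(w, dtype=float)
    n_unary = n_states * n_features
    n_pairwise = n_states * (n_states + 1) // 2
    if w.size != n_unary + n_pairwise:
        raise ValueError("expected %d weights, got %d"
                         % (n_unary + n_pairwise, w.size))

    unary = w[:n_unary].reshape(n_states, n_features)

    # Fill the upper triangle, then mirror it to get a symmetric matrix.
    upper = np.zeros((n_states, n_states))
    upper[np.triu_indices(n_states)] = w[n_unary:]
    pairwise = upper + upper.T - np.diag(np.diag(upper))
    return unary, pairwise


unary, pairwise = unpack_graph_crf_weights([1, 2, -1, -2, 4, 5, 6],
                                           n_states=2, n_features=2)
```

With the vector from the example, unary is [[1, 2], [-1, -2]] and pairwise is [[4, 5], [5, 6]].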


Learning 2D interactions

I am trying to reproduce the sample code in plot_grid_crf.py with my own data. My array X has shape (2, 720, 960, 12): two training images of size 720 (rows) x 960 (columns), with class-wise probabilities along the final axis (3). Similarly, my array Y has shape (2, 720, 960) and holds the pixel-wise ground truth.

When I define the clf and crf objects the same way as the sample code, I get the following error :

Traceback (most recent call last):

  File "<ipython-input-5-3c2334b8ac36>", line 1, in <module>
    runfile('/home/prassanna/Development/workspace/Semantic-texton-forests/scripts/tempcrf.py', wdir='/home/prassanna/Development/workspace/Semantic-texton-forests/scripts')

  File "/usr/local/lib/python2.7/dist-packages/spyderlib/widgets/externalshell/sitecustomize.py", line 601, in runfile
    execfile(filename, namespace)

  File "/usr/local/lib/python2.7/dist-packages/spyderlib/widgets/externalshell/sitecustomize.py", line 73, in execfile
    builtins.execfile(filename, *where)

  File "/home/prassanna/Development/workspace/Semantic-texton-forests/scripts/tempcrf.py", line 158, in <module>
    clf.fit(X, Y)

  File "/usr/local/lib/python2.7/dist-packages/pystruct/learners/one_slack_ssvm.py", line 440, in fit
    joint_feature_gt = self.model.batch_joint_feature(X, Y)

  File "/usr/local/lib/python2.7/dist-packages/pystruct/models/base.py", line 40, in batch_joint_feature
    joint_feature_ += self.joint_feature(x, y)

  File "/usr/local/lib/python2.7/dist-packages/pystruct/models/graph_crf.py", line 194, in joint_feature
    unary_marginals[gx, y] = 1

IndexError: index 3 is out of bounds for axis 1 with size 3

I tried the same with -1 * log probabilities instead of probabilities and got the same issue again. However, it works great with your sample data. I do not understand what's causing the problem.

Any help would be appreciated...

-Semicolon Warrior

Stack result of predict if possible

I think it would be nice to ensure the result of predict is an array if this is possible for the model. This would make integration of the multi-class and multi-label algorithms into scikit-learn smoother.
Maybe we could just hack it in by seeing if it is possible and leaving it if not.
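
A minimal sketch of the "see if it is possible" idea. The helper name stack_if_possible is hypothetical and not part of pystruct; it stacks the per-example predictions into one array only when all of them have the same shape, and otherwise returns them unchanged.

```python
import numpy as np


def stack_if_possible(predictions):
    """Return a single array if all predictions share one shape.

    Hypothetical helper: falls back to the original list when the
    per-example outputs have different shapes (e.g. graphs of different sizes).
    """
    shapes = {np.shape(p) for p in predictions}
    if len(shapes) == 1:
        return np.array(predictions)
    return predictions


same_size = stack_if_possible([np.array([0, 1]), np.array([1, 1])])
mixed_size = stack_if_possible([np.array([0, 1]), np.array([1, 1, 0])])
```

Here same_size is a (2, 2) array, while mixed_size stays a list of two arrays.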

SubgradientSSVM trained with n_jobs > 1 performing worse than SubgradientSSVM trained with n_jobs==1 after same number of iterations

While writing some tests for SaveLogger functionality with n_jobs > 1 (currently my pool attribute causes this to fail), I came across what I think is a bug.
There are significant differences in performance for SubgradientSSVM between training with n_jobs==1 and n_jobs > 1 in the current code using Parallel. I haven't tested whether the Pool implementation shows the same behavior. It seems worse with more cores:

on my local machine, 4 cores:

local$ python subgradient_ssvm_bug.py
0.973684210526
0.921052631579

on aws server, 16 cores:

aws-server$ python subgradient_ssvm_bug.py
0.973684210526
0.710526315789

import numpy as np
from sklearn.datasets import load_iris
from sklearn.cross_validation import train_test_split
from pystruct.models import GraphCRF
from pystruct.learners import SubgradientSSVM

if __name__ == '__main__':
    iris = load_iris()
    X, y = iris.data, iris.target

    X_ = [(np.atleast_2d(x), np.empty((0, 2), dtype=np.int)) for x in X]
    Y = y.reshape(-1, 1)

    X_train, X_test, y_train, y_test = train_test_split(X_, Y, random_state=1)

    pbl = GraphCRF(n_features=4, n_states=3, inference_method='unary')

    svm = SubgradientSSVM(pbl, max_iter=100)
    svm.fit(X_train, y_train)
    print svm.score(X_test, y_test)

    svm_par = SubgradientSSVM(pbl, max_iter=100, n_jobs=-1)
    svm_par.fit(X_train, y_train)
    print svm_par.score(X_test, y_test)

Add unit tests for standard datasets

We should test that they can be loaded and also processed.
The Cython code in loss-augmented prediction broke the snakes example, as it used unsigned char for y :-/

Manage inference packages

Installing the inference packages needs to be far less painful.
Possibilities:

  • include the DAI wrappers, check if dai is installed
  • include ad3
  • include scripts to fetch other solvers
  • implement message passing?

Cannot install on MacOSX

Hello! I'm very interested in using pystruct, and I am attempting to install it on MacOSX 10.8.4. My python info is:

Python 2.7.5 (v2.7.5:ab05e7dd2788, May 13 2013, 13:18:45)
[GCC 4.2.1 (Apple Inc. build 5666) (dot 3)] on darwin

I used easy_install to install pip, and the following happens when I attempt to install pystruct:

max$ sudo pip install pystruct
Password:
Downloading/unpacking pystruct
  Downloading pystruct-0.1.1.tar.gz (6.4MB): 6.4MB downloaded
  Running setup.py egg_info for package pystruct

    warning: no previously-included files matching '*.pyc' found under directory 'doc'
    warning: no previously-included files matching '*.pyo' found under directory 'doc'
    warning: no previously-included files matching '*.pyc' found under directory 'tests'
    warning: no previously-included files matching '*.pyo' found under directory 'tests'
    no previously-included directories found matching 'docs/_build'
    no previously-included directories found matching 'docs/auto_examples'
    no previously-included directories found matching 'docs/generated'
Downloading/unpacking ad3 (from pystruct)
  Downloading ad3-2.0.tar.gz (518kB): 518kB downloaded
  Running setup.py egg_info for package ad3

Downloading/unpacking pyqpbo (from pystruct)
  Downloading pyqpbo-0.1.tar.gz (76kB): 76kB downloaded
  Running setup.py egg_info for package pyqpbo

Installing collected packages: pystruct, ad3, pyqpbo
  Running setup.py install for pystruct

    warning: no previously-included files matching '*.pyc' found under directory 'doc'
    warning: no previously-included files matching '*.pyo' found under directory 'doc'
    warning: no previously-included files matching '*.pyc' found under directory 'tests'
    warning: no previously-included files matching '*.pyo' found under directory 'tests'
    no previously-included directories found matching 'docs/_build'
    no previously-included directories found matching 'docs/auto_examples'
    no previously-included directories found matching 'docs/generated'
    building 'pystruct.models.utils' extension
    clang -fno-strict-aliasing -fno-common -dynamic -g -Os -pipe -fno-common -fno-strict-aliasing -fwrapv -mno-fused-madd -DENABLE_DTRACE -DMACOSX -DNDEBUG -Wall -Wstrict-prototypes -Wshorten-64-to-32 -DNDEBUG -g -Os -Wall -Wstrict-prototypes -DENABLE_DTRACE -arch i386 -arch x86_64 -pipe -I/System/Library/Frameworks/Python.framework/Versions/2.7/include/python2.7 -c src/utils.c -o build/temp.macosx-10.8-intel-2.7/src/utils.o
    clang: error: no such file or directory: 'src/utils.c'
    clang: error: no input files
    error: command 'clang' failed with exit status 1
    Complete output from command /usr/bin/python -c "import setuptools;__file__='/private/tmp/pip_build_root/pystruct/setup.py';exec(compile(open(__file__).read().replace('\r\n', '\n'), __file__, 'exec'))" install --record /tmp/pip-8TrbT9-record/install-record.txt --single-version-externally-managed:
    running install

running build

running build_py

creating build

creating build/lib.macosx-10.8-intel-2.7

creating build/lib.macosx-10.8-intel-2.7/pystruct

copying pystruct/__init__.py -> build/lib.macosx-10.8-intel-2.7/pystruct

copying pystruct/plot_learning.py -> build/lib.macosx-10.8-intel-2.7/pystruct

creating build/lib.macosx-10.8-intel-2.7/pystruct/learners

copying pystruct/learners/__init__.py -> build/lib.macosx-10.8-intel-2.7/pystruct/learners

copying pystruct/learners/downhill_simplex_ssvm.py -> build/lib.macosx-10.8-intel-2.7/pystruct/learners

copying pystruct/learners/latent_structured_svm.py -> build/lib.macosx-10.8-intel-2.7/pystruct/learners

copying pystruct/learners/n_slack_ssvm.py -> build/lib.macosx-10.8-intel-2.7/pystruct/learners

copying pystruct/learners/one_slack_ssvm.py -> build/lib.macosx-10.8-intel-2.7/pystruct/learners

copying pystruct/learners/ssvm.py -> build/lib.macosx-10.8-intel-2.7/pystruct/learners

copying pystruct/learners/structured_perceptron.py -> build/lib.macosx-10.8-intel-2.7/pystruct/learners

copying pystruct/learners/subgradient_latent_ssvm.py -> build/lib.macosx-10.8-intel-2.7/pystruct/learners

copying pystruct/learners/subgradient_ssvm.py -> build/lib.macosx-10.8-intel-2.7/pystruct/learners

copying pystruct/learners/svm.py -> build/lib.macosx-10.8-intel-2.7/pystruct/learners

creating build/lib.macosx-10.8-intel-2.7/pystruct/inference

copying pystruct/inference/__init__.py -> build/lib.macosx-10.8-intel-2.7/pystruct/inference

copying pystruct/inference/inference_methods.py -> build/lib.macosx-10.8-intel-2.7/pystruct/inference

copying pystruct/inference/linear_programming.py -> build/lib.macosx-10.8-intel-2.7/pystruct/inference

creating build/lib.macosx-10.8-intel-2.7/pystruct/models

copying pystruct/models/__init__.py -> build/lib.macosx-10.8-intel-2.7/pystruct/models

copying pystruct/models/base.py -> build/lib.macosx-10.8-intel-2.7/pystruct/models

copying pystruct/models/chain_crf.py -> build/lib.macosx-10.8-intel-2.7/pystruct/models

copying pystruct/models/crf.py -> build/lib.macosx-10.8-intel-2.7/pystruct/models

copying pystruct/models/edge_feature_graph_crf.py -> build/lib.macosx-10.8-intel-2.7/pystruct/models

copying pystruct/models/graph_crf.py -> build/lib.macosx-10.8-intel-2.7/pystruct/models

copying pystruct/models/grid_crf.py -> build/lib.macosx-10.8-intel-2.7/pystruct/models

copying pystruct/models/latent_graph_crf.py -> build/lib.macosx-10.8-intel-2.7/pystruct/models

copying pystruct/models/latent_grid_crf.py -> build/lib.macosx-10.8-intel-2.7/pystruct/models

copying pystruct/models/latent_node_crf.py -> build/lib.macosx-10.8-intel-2.7/pystruct/models

copying pystruct/models/multilabel_svm.py -> build/lib.macosx-10.8-intel-2.7/pystruct/models

copying pystruct/models/setup.py -> build/lib.macosx-10.8-intel-2.7/pystruct/models

copying pystruct/models/unstructured_svm.py -> build/lib.macosx-10.8-intel-2.7/pystruct/models

creating build/lib.macosx-10.8-intel-2.7/pystruct/utils

copying pystruct/utils/__init__.py -> build/lib.macosx-10.8-intel-2.7/pystruct/utils

copying pystruct/utils/backports.py -> build/lib.macosx-10.8-intel-2.7/pystruct/utils

copying pystruct/utils/graph.py -> build/lib.macosx-10.8-intel-2.7/pystruct/utils

copying pystruct/utils/inference.py -> build/lib.macosx-10.8-intel-2.7/pystruct/utils

copying pystruct/utils/logging.py -> build/lib.macosx-10.8-intel-2.7/pystruct/utils

copying pystruct/utils/plotting.py -> build/lib.macosx-10.8-intel-2.7/pystruct/utils

creating build/lib.macosx-10.8-intel-2.7/pystruct/datasets

copying pystruct/datasets/__init__.py -> build/lib.macosx-10.8-intel-2.7/pystruct/datasets

copying pystruct/datasets/letters.py -> build/lib.macosx-10.8-intel-2.7/pystruct/datasets

copying pystruct/datasets/scene.py -> build/lib.macosx-10.8-intel-2.7/pystruct/datasets

copying pystruct/datasets/synthetic_grids.py -> build/lib.macosx-10.8-intel-2.7/pystruct/datasets

creating build/lib.macosx-10.8-intel-2.7/pystruct/tests

copying pystruct/tests/__init__.py -> build/lib.macosx-10.8-intel-2.7/pystruct/tests

copying pystruct/tests/test_libraries.py -> build/lib.macosx-10.8-intel-2.7/pystruct/tests

creating build/lib.macosx-10.8-intel-2.7/pystruct/tests/test_learners

copying pystruct/tests/test_learners/__init__.py -> build/lib.macosx-10.8-intel-2.7/pystruct/tests/test_learners

copying pystruct/tests/test_learners/test_binary_svm.py -> build/lib.macosx-10.8-intel-2.7/pystruct/tests/test_learners

copying pystruct/tests/test_learners/test_crammer_singer_svm.py -> build/lib.macosx-10.8-intel-2.7/pystruct/tests/test_learners

copying pystruct/tests/test_learners/test_edge_feature_graph_learning.py -> build/lib.macosx-10.8-intel-2.7/pystruct/tests/test_learners

copying pystruct/tests/test_learners/test_graph_svm.py -> build/lib.macosx-10.8-intel-2.7/pystruct/tests/test_learners

copying pystruct/tests/test_learners/test_latent_node_crf_learning.py -> build/lib.macosx-10.8-intel-2.7/pystruct/tests/test_learners

copying pystruct/tests/test_learners/test_latent_svm.py -> build/lib.macosx-10.8-intel-2.7/pystruct/tests/test_learners

copying pystruct/tests/test_learners/test_n_slack_ssvm.py -> build/lib.macosx-10.8-intel-2.7/pystruct/tests/test_learners

copying pystruct/tests/test_learners/test_one_slack_ssvm.py -> build/lib.macosx-10.8-intel-2.7/pystruct/tests/test_learners

copying pystruct/tests/test_learners/test_perceptron.py -> build/lib.macosx-10.8-intel-2.7/pystruct/tests/test_learners

copying pystruct/tests/test_learners/test_primal_dual.py -> build/lib.macosx-10.8-intel-2.7/pystruct/tests/test_learners

copying pystruct/tests/test_learners/test_structured_perceptron.py -> build/lib.macosx-10.8-intel-2.7/pystruct/tests/test_learners

copying pystruct/tests/test_learners/test_subgradient_latent_svm.py -> build/lib.macosx-10.8-intel-2.7/pystruct/tests/test_learners

copying pystruct/tests/test_learners/test_subgradient_svm.py -> build/lib.macosx-10.8-intel-2.7/pystruct/tests/test_learners

creating build/lib.macosx-10.8-intel-2.7/pystruct/tests/test_models

copying pystruct/tests/test_models/__init__.py -> build/lib.macosx-10.8-intel-2.7/pystruct/tests/test_models

copying pystruct/tests/test_models/test_chain_crf.py -> build/lib.macosx-10.8-intel-2.7/pystruct/tests/test_models

copying pystruct/tests/test_models/test_directional_crf.py -> build/lib.macosx-10.8-intel-2.7/pystruct/tests/test_models

copying pystruct/tests/test_models/test_edge_feature_graph_crf.py -> build/lib.macosx-10.8-intel-2.7/pystruct/tests/test_models

copying pystruct/tests/test_models/test_graph_crf.py -> build/lib.macosx-10.8-intel-2.7/pystruct/tests/test_models

copying pystruct/tests/test_models/test_grid_crf.py -> build/lib.macosx-10.8-intel-2.7/pystruct/tests/test_models

copying pystruct/tests/test_models/test_latent_crf.py -> build/lib.macosx-10.8-intel-2.7/pystruct/tests/test_models

copying pystruct/tests/test_models/test_latent_node_crf.py -> build/lib.macosx-10.8-intel-2.7/pystruct/tests/test_models

copying pystruct/tests/test_models/test_multilabel_problem.py -> build/lib.macosx-10.8-intel-2.7/pystruct/tests/test_models

creating build/lib.macosx-10.8-intel-2.7/pystruct/tests/test_inference

copying pystruct/tests/test_inference/__init__.py -> build/lib.macosx-10.8-intel-2.7/pystruct/tests/test_inference

copying pystruct/tests/test_inference/test_exact_inference.py -> build/lib.macosx-10.8-intel-2.7/pystruct/tests/test_inference

creating build/lib.macosx-10.8-intel-2.7/pystruct/tests/test_utils

copying pystruct/tests/test_utils/__init__.py -> build/lib.macosx-10.8-intel-2.7/pystruct/tests/test_utils

copying pystruct/tests/test_utils/test_utils_inference.py -> build/lib.macosx-10.8-intel-2.7/pystruct/tests/test_utils

running egg_info

writing requirements to pystruct.egg-info/requires.txt

writing pystruct.egg-info/PKG-INFO

writing top-level names to pystruct.egg-info/top_level.txt

writing dependency_links to pystruct.egg-info/dependency_links.txt

warning: manifest_maker: standard file '-c' not found



reading manifest file 'pystruct.egg-info/SOURCES.txt'

reading manifest template 'MANIFEST.in'

warning: no previously-included files matching '*.pyc' found under directory 'doc'

warning: no previously-included files matching '*.pyo' found under directory 'doc'

warning: no previously-included files matching '*.pyc' found under directory 'tests'

warning: no previously-included files matching '*.pyo' found under directory 'tests'

no previously-included directories found matching 'docs/_build'

no previously-included directories found matching 'docs/auto_examples'

no previously-included directories found matching 'docs/generated'

writing manifest file 'pystruct.egg-info/SOURCES.txt'

copying pystruct/datasets/letters.pickle -> build/lib.macosx-10.8-intel-2.7/pystruct/datasets

copying pystruct/datasets/scene.pickle -> build/lib.macosx-10.8-intel-2.7/pystruct/datasets

running build_ext

building 'pystruct.models.utils' extension

creating build/temp.macosx-10.8-intel-2.7

creating build/temp.macosx-10.8-intel-2.7/src

clang -fno-strict-aliasing -fno-common -dynamic -g -Os -pipe -fno-common -fno-strict-aliasing -fwrapv -mno-fused-madd -DENABLE_DTRACE -DMACOSX -DNDEBUG -Wall -Wstrict-prototypes -Wshorten-64-to-32 -DNDEBUG -g -Os -Wall -Wstrict-prototypes -DENABLE_DTRACE -arch i386 -arch x86_64 -pipe -I/System/Library/Frameworks/Python.framework/Versions/2.7/include/python2.7 -c src/utils.c -o build/temp.macosx-10.8-intel-2.7/src/utils.o

clang: error: no such file or directory: 'src/utils.c'

clang: error: no input files

error: command 'clang' failed with exit status 1

----------------------------------------
Cleaning up...
Command /usr/bin/python -c "import setuptools;__file__='/private/tmp/pip_build_root/pystruct/setup.py';exec(compile(open(__file__).read().replace('\r\n', '\n'), __file__, 'exec'))" install --record /tmp/pip-8TrbT9-record/install-record.txt --single-version-externally-managed failed with error code 1 in /private/tmp/pip_build_root/pystruct
Storing complete log in /Users/max/Library/Logs/pip.log

Any help would be greatly appreciated. Thank you!

speed up tests

I hate slow tests. Currently they take ~15 minutes on travis. WUT?
They should probably also be more systematic. I'm not sure why I test some models for lp and ad3. Not sure there is any gain there (though lp always returns "marginals", which ad3 doesn't do any more).

get rid of n_features

probably by inserting a function into the model interface that infers n_features and is used by the learner to allocate w.

Problem with sample data

Hi, I am using Windows XP, for the record.
I have encountered the following problem while trying to load the sample data

from pystruct.datasets import load_letters
letters = load_letters()
---------------------------------------------------------------------------
EOFError                                  Traceback (most recent call last)
<ipython-input-2-5bb547bc0e79> in <module>()
----> 1 letters = load_letters()

C:\Python27\lib\site-packages\pystruct\datasets\letters.pyc in load_letters()
     16     module_path = dirname(__file__)
     17     data_file = open(join(module_path, 'letters.pickle'))
---> 18     data = cPickle.load(data_file)
     19     # we add an easy to use image representation:
     20     data['images'] = [np.hstack([l.reshape(16, 8) for l in word])

EOFError: 
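
One plausible cause, which is an assumption and not confirmed in this issue: the traceback shows letters.pickle being opened without a mode argument, and on Windows under Python 2 the default text mode can end the read early when the binary data contains certain bytes. The sketch below shows the binary-mode pattern on a temporary file; load_pickle_binary is a hypothetical helper, not the library's loader.

```python
import os
import pickle
import tempfile


def load_pickle_binary(path):
    """Load a pickle file in binary mode ('rb'), which is safe on all platforms."""
    with open(path, "rb") as f:
        return pickle.load(f)


# Round-trip a payload containing bytes that text mode may mangle on Windows.
payload = {"data": bytes(bytearray([0x1A, 0x0D, 0x0A])), "labels": [1, 2, 3]}
fd, tmp_path = tempfile.mkstemp(suffix=".pickle")
with os.fdopen(fd, "wb") as f:
    pickle.dump(payload, f, protocol=2)

restored = load_pickle_binary(tmp_path)
os.remove(tmp_path)
```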

Block-coordinate Frank-Wolfe algorithm

There's a paper at ICML 2013 "Block-Coordinate Frank-Wolfe Optimization for Structural SVMs".

The block-coordinate variant should be very easy to implement for you since it is basically equivalent to the stochastic subgradient based solver, except that the step size is tuned in closed form. The paper uses a formulation based on lambda rather than C, but I think one only needs to replace lambda everywhere by 1 and to clip gamma to [0, C] instead of [0, 1] (cf. Algorithm 4).

Also this algorithm can be seen as a batch version of structured passive-aggressive. @vene told me he was thinking of implementing passive-aggressive but I think this block-coordinate algorithm should be better, since it can use the knowledge of previous iterations, unlike passive-aggressive.
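
To make the closed-form step size concrete, here is a minimal NumPy sketch of the block-coordinate update for a plain multi-class problem with 0/1 loss. It follows the lambda-based formulation of the paper (gamma clipped to [0, 1]), not the C-based variant suggested above, and it is not pystruct's implementation. The function names, the toy data and all parameter values are assumptions made for illustration.

```python
import numpy as np


def joint_feature(x, y, n_classes):
    """Multi-class joint feature: x placed in the block of class y."""
    phi = np.zeros((n_classes, x.shape[0]))
    phi[y] = x
    return phi.ravel()


def bcfw_multiclass(X, Y, n_classes, lam=0.01, n_passes=50, seed=0):
    """Sketch of block-coordinate Frank-Wolfe with exact line search."""
    rng = np.random.RandomState(seed)
    n, d = X.shape
    w = np.zeros(n_classes * d)
    w_blocks = np.zeros((n, n_classes * d))  # per-example contributions to w
    ell_blocks = np.zeros(n)                 # per-example loss terms
    for _ in range(n_passes * n):
        i = rng.randint(n)
        x, y = X[i], Y[i]
        # Loss-augmented decoding: score of each class plus 0/1 loss.
        scores = w.reshape(n_classes, d).dot(x) + (np.arange(n_classes) != y)
        y_hat = int(np.argmax(scores))
        psi = joint_feature(x, y, n_classes) - joint_feature(x, y_hat, n_classes)
        w_s = psi / (lam * n)
        ell_s = float(y_hat != y) / n
        # Closed-form step size, clipped to [0, 1].
        diff = w_blocks[i] - w_s
        denom = lam * diff.dot(diff)
        if denom > 0:
            gamma = (lam * diff.dot(w) - ell_blocks[i] + ell_s) / denom
            gamma = min(max(gamma, 0.0), 1.0)
        else:
            gamma = 0.0
        w_i_new = (1.0 - gamma) * w_blocks[i] + gamma * w_s
        w += w_i_new - w_blocks[i]
        w_blocks[i] = w_i_new
        ell_blocks[i] = (1.0 - gamma) * ell_blocks[i] + gamma * ell_s
    return w.reshape(n_classes, d)


# Toy data: three well-separated 2D clusters plus a constant bias feature.
rng = np.random.RandomState(42)
centers = np.array([[0.0, 0.0], [3.0, 3.0], [-3.0, 3.0]])
Y_toy = np.repeat(np.arange(3), 20)
X_toy = np.hstack([centers[Y_toy] + 0.3 * rng.randn(60, 2), np.ones((60, 1))])

W = bcfw_multiclass(X_toy, Y_toy, n_classes=3)
accuracy = np.mean(np.argmax(X_toy.dot(W.T), axis=1) == Y_toy)
```

Apart from the line search that sets gamma, each iteration does the same work as a stochastic subgradient step: one loss-augmented decoding and one sparse update of w.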

Encode symmetric matrices using scipy.spatial

Symmetric weight-matrices are currently transformed to flat vectors using boolean masks. I would rather use the scipy methods. I'll try to do that today. It will change the memory layout of the (flat) weights, though.
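
The change in memory layout can be illustrated with plain NumPy. The sketch below compares a boolean lower-triangular mask with np.triu_indices; both function names are hypothetical, and neither is claimed to be what pystruct or scipy does internally.

```python
import numpy as np


def compress_with_mask(sym):
    """Flatten a symmetric matrix using a boolean lower-triangular mask."""
    mask = np.tri(sym.shape[0], dtype=bool)  # lower triangle incl. diagonal
    return sym[mask]


def compress_with_triu(sym):
    """Flatten a symmetric matrix using upper-triangle indices."""
    return sym[np.triu_indices(sym.shape[0])]


sym = np.array([[1, 2, 3],
                [2, 4, 5],
                [3, 5, 6]])

flat_mask = compress_with_mask(sym)   # order: (0,0), (1,0), (1,1), (2,0), ...
flat_triu = compress_with_triu(sym)   # order: (0,0), (0,1), (0,2), (1,1), ...
```

Both vectors hold the same six values, but in a different order, so weights stored with one scheme cannot be read back with the other.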

Nodes would be better called examples

In http://pystruct.github.io/generated/pystruct.models.GraphCRF.html#pystruct.models.GraphCRF

Node features are given as a tuple of shape (n_nodes, n_features), An instance x is represented as a tuple (features, edges) where edges is an array of shape (n_edges, 2), representing the graph.

Might be better as something like

Examples, i.e. X, are given as a tuple of length n_examples. An example, x, is represented as a tuple (features, edges) where features is a numpy array of shape (n_nodes, attributes), and edges is an array of shape (n_edges, 2), representing the graph.

Labels, Y, are given as a tuple of length n_examples. Each label, y, in Y is given by a numpy array of shape (n_nodes,).
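
As an illustration of the proposed wording, the following sketch builds two small examples in that format using only NumPy. All values are made up, and the consistency checks at the end are illustrative, not something the library is known to perform.

```python
import numpy as np

# Example a: a chain of 3 nodes, each with 2 attributes.
features_a = np.array([[0.1, 0.9], [0.8, 0.2], [0.5, 0.5]])
edges_a = np.array([[0, 1], [1, 2]])
y_a = np.array([1, 0, 1])

# Example b: 2 nodes connected by a single edge.
features_b = np.array([[0.3, 0.7], [0.6, 0.4]])
edges_b = np.array([[0, 1]])
y_b = np.array([1, 0])

X = ((features_a, edges_a), (features_b, edges_b))  # length n_examples
Y = (y_a, y_b)                                      # length n_examples

# Basic consistency checks between features, edges and labels.
for (features, edges), y in zip(X, Y):
    n_nodes = features.shape[0]
    assert y.shape == (n_nodes,)
    assert edges.shape[1] == 2
    assert edges.max() < n_nodes
```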

Compare constraint caching with SVM^struct

When doing benchmarks, I found out that SVM^struct's one-slack solver can benefit from constraint caching for multi-class SVMs. I don't understand that, as I would have thought finding the most violated constraint would be less expensive than evaluating all cached inference results.
Needs investigation.

Timestamps weird

The current timestamps_ attribute of the learners starts with one absolute time stamp.
That is pretty confusing. The first entry should be stored in a separate attribute such that timestamps_ only contains the actually relative times.
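
A minimal sketch of the proposed split, assuming the list has the form described above (one absolute time stamp followed by times relative to it). The helper name split_timestamps and the sample numbers are hypothetical.

```python
def split_timestamps(timestamps):
    """Split [absolute_start, rel_1, rel_2, ...] into (absolute_start, [rel_1, ...])."""
    if not timestamps:
        return None, []
    return timestamps[0], list(timestamps[1:])


# Made-up values: an absolute epoch time followed by relative seconds.
recorded = [1380000000.0, 0.5, 1.2, 2.9]
start_time, relative_times = split_timestamps(recorded)
```

After the split, relative_times can be plotted against the objective curve directly, without special-casing the first entry.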

Stopping tolerance scales with C

As the objective scales with C, the stopping tolerance of some estimators does, too.
Maybe the way the objective is scaled is not that great an idea after all?

website: fix links to classes in examples

There is this fancy script that creates links for all classes and functions in the examples to the documentation.
In scikit-learn that works for classes, in pystruct it doesn't :-/

Remaining issues in Frank-Wolfe implementation

There are still some minor issues that need to be fixed in the FrankWolfeSSVM:

  • don't recompute inference for duality gap in batch case
  • increase test coverage
  • benchmark
  • rename estimator to BCFW (should we? that is the name used in the paper)
  • implement averaging for batch version
  • Allow mini-batches for parallel inference on multi-core computers.

Add Latent SVM

We should have a separate Latent SVM model as in the digits example.
Basically it would only need slight modifications from the CrammerSinger model, with additions from LatentGraphCRF. That would make the example much faster and can be used in many cases.

add node types

The GraphCRF should allow different nodes to take different types. This is more of a long-term goal as it is somewhat non-trivial, though.

UnboundLocalError when fitting example with only one label

When fitting a data set with only one label (all entries in y are equal), the following error is raised:

UnboundLocalError: local variable 'objective' referenced before assignment

backtrace:

Training 1-slack dual structural SVM
iteration 0
no additional constraints
---------------------------------------------------------------------------
UnboundLocalError                         Traceback (most recent call last)
[..]

/usr/local/lib/python2.7/dist-packages/pystruct/learners/one_slack_ssvm.pyc in fit(self, X, Y, constraints, warm_start, initialize)
    520         primal_objective = self._objective(X, Y)
    521         self.primal_objective_curve_.append(primal_objective)
--> 522         self.objective_curve_.append(objective)
    523         self.cached_constraint_.append(False)
    524 

It is reasonable that the model cannot be fit to this kind of labeled data but this situation should be caught and a more meaningful error should be raised.
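
One way such a check could look, as a sketch only. The function check_label_variety is hypothetical and is not part of pystruct; it assumes each y in Y is an array-like of integer labels.

```python
import numpy as np


def check_label_variety(Y):
    """Raise a clear error if the training labels contain fewer than two classes."""
    labels = np.unique(np.hstack([np.ravel(y) for y in Y]))
    if labels.size < 2:
        raise ValueError(
            "Training data contains only one label (%r); at least two "
            "distinct labels are needed to fit the model." % labels.tolist())
    return labels


ok_labels = check_label_variety([np.array([0, 1, 1]), np.array([1, 0])])

try:
    check_label_variety([np.array([2, 2]), np.array([2])])
    error_message = None
except ValueError as exc:
    error_message = str(exc)
```

A learner could call a check like this at the start of fit, before any constraints are generated.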

Remove EdgeType CRF

As this is more readily implemented using EdgeFeatureGraphCRF.
Need to rewrite DirectionalGridCRF before, though :-/

Weighted loss

It would be nice to be able to pass node weights for the Hamming loss function, i.e. make some labels more important than others. For example, if we are labeling superpixels of an image instead of pixels, we want to minimize the number of wrong pixels, not superpixels.
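
A minimal sketch of what a weighted Hamming loss could compute, using the superpixel case as the example. The function name and the numbers are hypothetical; the weights are taken to be the superpixel sizes in pixels.

```python
import numpy as np


def weighted_hamming_loss(y_true, y_pred, weights):
    """Sum of the weights of all nodes whose label is wrong."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    weights = np.asarray(weights, dtype=float)
    return float(np.sum(weights * (y_true != y_pred)))


# Three superpixels covering 10, 200 and 5 pixels; the large one is mislabeled.
y_true = [0, 1, 1]
y_pred = [0, 0, 1]
sizes = [10, 200, 5]

unweighted = weighted_hamming_loss(y_true, y_pred, np.ones(3))  # 1 wrong superpixel
weighted = weighted_hamming_loss(y_true, y_pred, sizes)         # 200 wrong pixels
```

The same mistake counts as 1 in the unweighted loss and as 200 in the weighted one, which is what the pixel-level objective asks for.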

[bug] image_segmentation example

I've downloaded pickle files and I've tried to run this example, but I've got an error:

Traceback (most recent call last):
  File "image_segmentation.py", line 30, in <module>
    data_train = cPickle.load(open("data_train.pickle"))
ImportError: No module named latent_crf_experiments.utils

I haven't found such a name in pystruct sources.
pystruct was installed with the command python setup.py install --user

Test failure in latent SVMs

Got this from a fresh clone. Maybe assert_array_almost_equal should be used here?

======================================================================
FAIL: test_latent_svm.test_with_crosses_bad_init
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/nose/case.py", line 197, in runTest
    self.test(*self.arg)
  File "/scratch/apps/src/pystruct/tests/test_learners/test_latent_svm.py", line 80, in test_with_crosses_bad_init
    assert_array_equal(np.array(Y_pred), Y)
  File "/usr/lib/pymodules/python2.7/numpy/testing/utils.py", line 719, in assert_array_equal
    verbose=verbose, header='Arrays are not equal')
  File "/usr/lib/pymodules/python2.7/numpy/testing/utils.py", line 645, in assert_array_compare
    raise AssertionError(msg)
AssertionError: 
Arrays are not equal

(mismatch 2.5%)
 x: array([[[0, 0, 0, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 1, 0, 0],
        [0, 0, 0, 0, 1, 1, 1, 0],...
 y: array([[[0, 0, 0, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 1, 0, 0],
        [0, 0, 0, 0, 1, 1, 1, 0],...

Citation guidance

It would be good to add a note in the README about how we want this project to be cited in academic work.
