
scikit-tensor's Introduction

scikit-tensor


scikit-tensor is a Python module for multilinear algebra and tensor factorizations. Currently, scikit-tensor supports basic tensor operations such as folding/unfolding and tensor-matrix and tensor-vector products, as well as the following tensor factorizations:

  • Canonical / Parafac Decomposition
  • Tucker Decomposition
  • RESCAL
  • DEDICOM
  • INDSCAL

Moreover, all operations support both dense and sparse tensors.
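
Both tensor types can be constructed directly; a minimal sketch (the constructors are used the same way in the issues further down, and the data here is arbitrary):

import numpy as np
from sktensor import dtensor, sptensor

# dense tensor: wrap a NumPy array
D = dtensor(np.arange(24).reshape(2, 3, 4))

# sparse tensor: one subscript array per mode, the corresponding values, and a shape
S = sptensor(([0, 1, 2], [3, 2, 0], [2, 2, 2]), [1.0, 1.0, 1.0], shape=(10, 20, 5))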

Dependencies

The required dependencies to build the software are NumPy >= 1.3 and SciPy >= 0.7.

Usage

Example script to decompose sensory bread data (available from http://www.models.life.ku.dk/datasets) using CP-ALS:

import logging
from scipy.io.matlab import loadmat
from sktensor import dtensor, cp_als

# Set logging to DEBUG to see CP-ALS information
logging.basicConfig(level=logging.DEBUG)

# Load Matlab data and convert it to dense tensor format
mat = loadmat('../data/sensory-bread/brod.mat')
T = dtensor(mat['X'])

# Decompose tensor using CP-ALS
P, fit, itr, exectimes = cp_als(T, 3, init='random')
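
The first return value, P, is the fitted rank-3 decomposition; fit, itr and exectimes are diagnostics. A minimal sketch of inspecting the result (the attribute names U and lmbda are assumed from the library's Kruskal-tensor representation, not documented in this README):

# fit: final fit of the model, itr: number of iterations, exectimes: per-iteration timings
print(fit, itr)
print([u.shape for u in P.U])   # one factor matrix per mode (assumed attribute)
print(P.lmbda)                  # component weights (assumed attribute)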

Install

This package uses distutils, which is the default way of installing Python modules. The use of virtual environments is recommended.

pip install scikit-tensor

To install in development mode

git clone [email protected]:mnick/scikit-tensor.git
pip install -e scikit-tensor/

Contributing & Development

scikit-tensor is still an extremely young project, and I'm happy for any contributions (patches, code, bugfixes, documentation, whatever) to get it to a stable and useful point. Feel free to get in touch with me via email (mnick AT mit DOT edu) or directly via GitHub.

Development is synchronized via git. To clone this repository, run

git clone git://github.com/mnick/scikit-tensor.git

Authors

Maximilian Nickel: Web, Email (mnick AT mit DOT edu), Twitter

License

scikit-tensor is licensed under the GPLv3

Related Projects

  • Matlab Tensor Toolbox: A Matlab toolbox for tensor factorizations and tensor operations, freely available for research and evaluation.
  • Matlab Tensorlab: A Matlab toolbox for tensor factorizations, complex optimization, and tensor optimization, freely available for non-commercial academic research.

scikit-tensor's People

Contributors

gitter-badger, kastnerkyle, lukovnikov, marco-santoni, mnick, nils-werner, panisson


scikit-tensor's Issues

Make API sklearn compatible

sktensor's API should be compatible with sklearn's matrix factorization API for simpler/more consistent usage.
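
A purely illustrative sketch of what such a wrapper could look like (the class name and attribute names below are hypothetical and not part of sktensor):

from sktensor import dtensor, cp_als

class CPDecomposition(object):
    def __init__(self, rank=3, init='random'):
        self.rank = rank
        self.init = init

    def fit(self, X):
        # X: a dense NumPy ndarray; cp_als returns the decomposition plus diagnostics
        P, self.fit_, self.n_iter_, self.exectimes_ = cp_als(
            dtensor(X), self.rank, init=self.init)
        self.decomposition_ = P
        return self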

Error with Python 3.4 in cp_als: unsupported operand types for +:'range' and 'range'

Reproduction script is the example in the README

Python 3.4.1 |Anaconda 2.0.0 (64-bit)| (default, May 19 2014, 13:02:41)
[GCC 4.1.2 20080704 (Red Hat 4.1.2-54)] on linux

git commit ef063d0

Works fine in 2.7 (I switched to a 2.7 environment using Anaconda)

Traceback (most recent call last):
  File "tmp.py", line 13, in <module>
    P, fit, itr, exectimes = cp_als(T, 3, init='random')
  File "/home/kkastner/src/scikit-tensor/sktensor/cp.py", line 143, in als
    Unew = X.uttkrp(U, n)
  File "/home/kkastner/src/scikit-tensor/sktensor/dtensor.py", line 162, in uttkrp
    order = range(n) + range(n + 1, self.ndim)
TypeError: unsupported operand type(s) for +: 'range' and 'range'

I will take a look and see if I can submit a PR
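
A likely fix (a sketch, not verified here): materialize the ranges before concatenating, since Python 3 range objects do not support +:

# sktensor/dtensor.py, uttkrp(): works on both Python 2 and 3
order = list(range(n)) + list(range(n + 1, self.ndim))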

Installing error

Hello, I tried to use
pip2 install scikit-tensor
to install, but it failed. The error information is:

Collecting scikit-tensor
Using cached scikit-tensor-0.1.tar.gz
Complete output from command python setup.py egg_info:
setuptools module not found.
Install setuptools if you want to enable 'python setup.py develop'.
Traceback (most recent call last):
  File "", line 1, in
  File "/tmp/pip-build-6yTuyk/scikit-tensor/setup.py", line 53, in
    require('numpy', 'scipy', 'nose')
  File "/usr/lib/python2.7/dist-packages/pkg_resources/__init__.py", line 943, in require
    needed = self.resolve(parse_requirements(requirements))
  File "/usr/lib/python2.7/dist-packages/pkg_resources/__init__.py", line 829, in resolve
    raise DistributionNotFound(req, requirers)
pkg_resources.DistributionNotFound: The 'nose' distribution was not found and is required by the application

How to install successfully? Thank you!
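
A workaround that often helps with this report (an assumption, since setup.py imports its build requirements at egg_info time): install the build requirements into the environment first, then retry:

pip2 install setuptools numpy scipy nose
pip2 install scikit-tensor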

Sparse tensor with Tucker

I need to run Tucker with a sparse tensor.
I got this error: TypeError: 'numpy.int32' object is not iterable
when I run this code:

import numpy as np  # added so the snippet is self-contained (np.float is used below)
from sktensor import tucker_hooi
from sktensor import sptensor

S = sptensor(([0, 1, 2], [3, 2, 0], [2, 2, 2]), [1, 1, 1],
             shape=(10, 20, 5), dtype=np.float)

tucker_hooi(X=S, rank=[5, 5, 4], init='nvecs')

Can you help me, please?
Thanks.

TypeError when result of sptensor.ttv(vectors) is a sptensor

When applying ttv (tensor times vector) between a sparse tensor and a set of vectors, if the result is a sparse tensor, I get the following error:

"TypeError: arange: scalar arguments expected instead of a tuple."

The error can be reproduced with this test case:

    # imports added so the test case is self-contained (assert helpers assumed from nose.tools)
    import numpy as np
    from numpy import array, zeros, allclose
    from nose.tools import assert_equal, assert_true
    from sktensor import sptensor

    def test_ttv():
        subs = (
            array([0, 1, 0, 5, 7, 8]),
            array([2, 0, 4, 5, 3, 9]),
            array([0, 1, 2, 2, 1, 0])
        )
        vals = array([1, 1, 1, 1, 1, 1])
        S = sptensor(subs, vals, shape=[10, 10, 3])

        sttv = S.ttv((zeros(10), zeros(10)), modes=[0, 1])
        assert_equal(type(sttv), sptensor)
        assert_true(allclose(zeros(3), sttv.vals))
        assert_true(allclose(np.arange(3), sttv.subs))

2D array in example code

I'm going through the cp_sensory_bread_data.py example:

import logging
from scipy.io.matlab import loadmat
from sktensor import dtensor, cp_als

# Set logging to DEBUG to see CP-ALS information
logging.basicConfig(level=logging.DEBUG)

# Load Matlab data and convert it to dense tensor format
mat = loadmat('../data/sensory-bread/brod.mat')
T = dtensor(mat['X'])

# Decompose tensor using CP-ALS
P, fit, itr, exectimes = cp_als(T, 3, init='random')

If I try:

mat['X'].shape
--> (10, 88)
T.shape
--> (10, 88)

Shouldn't the tensor have a third dimension?
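
One possible explanation (an assumption based on the reported shape, not verified against the dataset's documentation): the .mat file stores the tensor in matricized form, and 88 = 11 x 8, so the array may need to be reshaped back to three modes before wrapping it in a dtensor:

# hypothetical fix: the 10 x 11 x 8 shape and the column-major (MATLAB) memory order are assumptions
T = dtensor(mat['X'].reshape(10, 11, 8, order='F'))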

cp.py _init has off-by-one error

Specifically, lines 198 and 201 seem to consider too-small ranges when initialising, leaving Uinit[0] with null values.

I didn't look into why this seems to work when using default settings, but it did crash my custom code using another matrix factorization method, leading me to believe that Uinit[0] really is meant to be initialized.

I.e., it should be "for n in range(0, N)".

scikit-tensor is creating nans / infinities along the way

Note that I changed /Library/Python/2.7/site-packages/scikit_tensor-0.1-py2.7.egg/sktensor/cp.py:147 to use nan_to_num: Y = nan_to_num(Y). While I was already doing this here, this lib may be doing operations that create additional NaNs or infinities. Instead of dying, it now reports:

/Library/Python/2.7/site-packages/scikit_tensor-0.1-py2.7.egg/sktensor/cp.py:157: RuntimeWarning: invalid value encountered in divide U[n] = Unew / lmbda
/Library/Python/2.7/site-packages/scikit_tensor-0.1-py2.7.egg/sktensor/cp.py:156: RuntimeWarning: invalid value encountered in less lmbda[lmbda < 1] = 1

BUT: it keeps going!

Implementation of Unfolding Clarification

I noticed that the results of the unfolding function are not consistent with the tensor paper by Kolda and Bader. Example: T = dtensor([[[1, 0], [1, -1]], [[-1, 1], [1, 0]]]). The unfolding-1 output is [[1, 1, 0, -1], [-1, 1, 1, 0]], but I think it should be [[1, 0, -1, 1], [1, -1, 1, 0]]; thus, functions related to unfolding, like ttm, are affected.

`cp_als` fails with `sptensor`---intended behavior?

It's possible this is intended behavior. If so, my apologies, and feel free to close the issue. The input tensor I am using can be downloaded as a pickle file.

>>> import pickle
>>> with open('test-S.pkl', 'rb') as f:
...     S = pickle.load(f)
... 
>>> P, fit, itr, exectimes = cp_als(S, 3, init='random')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/yacin/work/tensors/venv/local/lib/python2.7/site-packages/sktensor/cp.py", line 143, in als
    Unew = X.uttkrp(U, n)
  File "/home/yacin/work/tensors/venv/local/lib/python2.7/site-packages/sktensor/sptensor.py", line 224, in uttkrp
    TZ = self.ttv(Z, mode, without=True)
  File "/home/yacin/work/tensors/venv/local/lib/python2.7/site-packages/sktensor/core.py", line 127, in ttv
    return self._ttv_compute(v, dims, vidx, remdims)
  File "/home/yacin/work/tensors/venv/local/lib/python2.7/site-packages/sktensor/sptensor.py", line 159, in _ttv_compute
    nvals = nvals * w[idx]
IndexError: too many indices for array
>>> P, fit, itr, exectimes = cp_als(dtensor(S.toarray()), 3, init='random')
>>>

Installation error: SyntaxError: Missing parentheses in call to 'print'. Did you mean print(mod.__version__)?

I'm getting a fatal installation error when installing scikit-tensor on my Mac:

~$ pip3 --version
pip 10.0.1 from /Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pip (python 3.6)
~$ pip3 install scikit-tensor
Collecting scikit-tensor
Using cached https://files.pythonhosted.org/packages/e9/5e/2ce76cc8f9da0517085e17cd70210ed996aeb8f972e7080d0bc89d82bbd9/scikit-tensor-0.1.tar.gz
Complete output from command python setup.py egg_info:
Traceback (most recent call last):
  File "", line 1, in
  File "/private/var/folders/4y/cwpdv5dd37q0djhsgnr558m0000b4c/T/pip-install-4m41b2gw/scikit-tensor/setup.py", line 79
    print mod.__version__
          ^
SyntaxError: Missing parentheses in call to 'print'. Did you mean print(mod.__version__)?

----------------------------------------

Command "python setup.py egg_info" failed with error code 1 in /private/var/folders/4y/cwpdv5dd37q0djhsgnr558m0000b4c/T/pip-install-4m41b2gw/scikit-tensor/
~$

Sptensor wrong check.

In line 71 of sptensor.py, I believe the condition to check should be len(subs) == len(vals). The reason is that we want to check that the number of provided subscripts equals the number of values. Currently, it checks that the dimension of the tensor equals the number of values, which probably isn't right. It only works in the provided example because the dimension, the number of subscripts, and the number of values are all three. Correct me if I am wrong.

Item access in sptensor always returns zero

When I try to access a non-zero item in an sptensor, like x[a, b, c], I always get zero.

Accessing it after calling the toarray() method works fine. I think this may be a bug.
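
A minimal sketch of what the report describes (the constructor is the one used elsewhere in these issues; the specific indices are illustrative):

from sktensor import sptensor

# store three non-zero entries, then read one back directly and via toarray()
S = sptensor(([0, 1, 2], [3, 2, 0], [2, 2, 2]), [1, 1, 1], shape=(10, 20, 5))
print(S[1, 2, 2])            # reported to return 0 even though a value was stored here
print(S.toarray()[1, 2, 2])  # reported to return the stored value, 1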

setup can't find virtualenv packages

Hey - spent a few minutes trying to track this down but couldn't. When I use various methods to try to install this package, the setup utilities cannot find the dependencies that have definitely been installed.

I'm using a virtualenv. Here's the traceback:

(virtual-env-name)jstrong:~/src/scikit-tensor$ pip install -e .
Obtaining file:///[~]/src/scikit-tensor
    Complete output from command python setup.py egg_info:
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "[~]/src/scikit-tensor/setup.py", line 47, in <module>
        require('numpy', 'scipy', 'nose')
      File "build/bdist.linux-x86_64/egg/pkg_resources.py", line 648, in require
      File "build/bdist.linux-x86_64/egg/pkg_resources.py", line 546, in resolve
    pkg_resources.DistributionNotFound: numpy

    ----------------------------------------
Command "python setup.py egg_info" failed with error code 1 in [~]/src/scikit-tensor/

numpy definitely installed:

(virtual-env-name)jstrong:~/src/scikit-tensor$ pip freeze | grep numpy
numpy==1.11.1

fwiw also tried python setup.py develop and python setup.py install, same problem.

Install not seeming to work.

When I run "pip install scikit-tensor", I get:

Collecting scikit-tensor
Using cached scikit-tensor-0.1.tar.gz
Complete output from command python setup.py egg_info:
Traceback (most recent call last):
  File "", line 1, in
  File "/tmp/pip-build-n2fgdtqa/scikit-tensor/setup.py", line 79
    print mod.__version__
          ^
SyntaxError: Missing parentheses in call to 'print'

----------------------------------------

Command "python setup.py egg_info" failed with error code 1 in /tmp/pip-build-n2fgdtqa/scikit-tensor/

Where could this issue be coming from?
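
The setup.py in the 0.1 release on PyPI uses Python 2 print statements, so it cannot be installed under Python 3. As the "update pypi version" issue below notes, the current master does install under Python 3, so one workaround (assuming a git install is acceptable) is:

pip install git+https://github.com/mnick/scikit-tensor.git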

TypeError in sptensor.uttkrp when ttv returns a sptensor

This error occurs when calling uttkrp and when ttv (inside uttkrp) returns a sptensor.
It can be reproduced with this test case:

def test_uttkrp():
    # mysetup() is assumed to return subs, vals, shape for a 5-mode sparse tensor
    subs, vals, shape = mysetup()
    S = sptensor(subs, vals, shape)
    U = []
    for shp in (25, 11, 18, 7, 2):
        U.append(np.zeros((shp, 5)))
    SU = S.uttkrp(U, mode=0)
    assert_equal(SU.shape, (25, 5))

The error is "TypeError: float() argument must be a string or a number", and it happens in sptensor.py, line 191, in uttkrp:

    V[:, r] = self.ttv(Z, mode, without=True)

It seems that, when the result of ttv is a sptensor, it cannot be assigned to V using this operation.

"Maximum allowed dimension exceeded" error

Hi, I tried to decompose my personal 17x640x16200 3D tensor but unfortunately I get "Maximum allowed dimension exceeded" error. In particular, the problem is detected in khatrirao module in core.py where M in computed, obtaining an extremely huge M value of 507060240091291760598681282150400000000000000000.
Is there a fix or a solution a this kind of problem?
Thanks in advance.
Vincenzo Cappelluti.

Tucker-2 decomposition

I am confused about how to apply a Tucker-2 decomposition with scikit-tensor; it seems that it only supports the standard Tucker decomposition?
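
One possible way to emulate Tucker-2 (a sketch only, not an official feature of the library): compress only two modes and request full rank along the third, then absorb that mode's factor into the core.

# Sketch: emulate Tucker-2 by keeping full rank along mode 2.
# init='random' is assumed to be accepted by tucker_hooi, as it is by cp_als;
# it also sidesteps the full-rank eigsh limitation discussed in a later issue.
import numpy as np
from sktensor import dtensor, tucker_hooi

T = dtensor(np.random.rand(20, 30, 5))
result = tucker_hooi(T, rank=[4, 4, T.shape[2]], init='random')
# The mode-2 factor is orthogonal rather than the identity; multiplying it back
# into the core leaves only modes 0 and 1 effectively compressed.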

Batch_size problem in using ttm in a custom layer in keras

I have a custom layer in keras where the input is a 2D tensor, but the layer takes a batch of those, and the shape it gets is (?, 61, 80). The '?' part is the batch_size. While using the ttm function of sktensor I am getting the error below. Please help.

The custom layer is

def call(self, inputs):
    num, n, m = inputs.shape
    print(inputs.shape)
    (k1, k2) = self.output_dim
    input_tensor = inputs

    print(input_tensor)

    input_tensor = dtensor(input_tensor)
    kernel1 = np.array(self.W1)
    kernel2 = np.array(self.W2)

    feed_forward_product = input_tensor.ttm([kernel1, kernel2], mode=0, transp=False, without=True)

    feed_forward_product = np.array(feed_forward_product)
    result = K.tanh(feed_forward_product)

    return result

Variables W1 and W2 are declared beforehand.
The error I am getting is:

x_train shape: (269, 61, 80)
269 train samples
70 test samples
<tf.Variable 'neural_tensor_layer_2/W1:0' shape=(40, 61) dtype=float32_ref>
<tf.Variable 'neural_tensor_layer_2/W2:0' shape=(40, 80) dtype=float32_ref>
(?, 61, 80)
Tensor("sequential_2_input:0", shape=(?, 61, 80), dtype=float32)
Traceback (most recent call last):
  File "", line 1, in
    runfile('/home/hanumant/Documents/December_experiments/python_codes/TFNN_v2.py', wdir='/home/hanumant/Documents/December_experiments/python_codes')
  File "/home/hanumant/.conda/envs/myenv/lib/python3.5/site-packages/spyder_kernels/customize/spydercustomize.py", line 668, in runfile
    execfile(filename, namespace)
  File "/home/hanumant/.conda/envs/myenv/lib/python3.5/site-packages/spyder_kernels/customize/spydercustomize.py", line 108, in execfile
    exec(compile(f.read(), filename, 'exec'), namespace)
  File "/home/hanumant/Documents/December_experiments/python_codes/TFNN_v2.py", line 79, in
    validation_data=(np.array(x_test),np.array(y_test)))
  File "/home/hanumant/.conda/envs/myenv/lib/python3.5/site-packages/keras/engine/training.py", line 952, in fit
    batch_size=batch_size)
  File "/home/hanumant/.conda/envs/myenv/lib/python3.5/site-packages/keras/engine/training.py", line 677, in _standardize_user_data
    self._set_inputs(x)
  File "/home/hanumant/.conda/envs/myenv/lib/python3.5/site-packages/keras/engine/training.py", line 589, in _set_inputs
    self.build(input_shape=(None,) + inputs.shape[1:])
  File "/home/hanumant/.conda/envs/myenv/lib/python3.5/site-packages/keras/engine/sequential.py", line 221, in build
    x = layer(x)
  File "/home/hanumant/.conda/envs/myenv/lib/python3.5/site-packages/keras/engine/base_layer.py", line 457, in call
    output = self.call(inputs, **kwargs)
  File "/home/hanumant/Documents/December_experiments/python_codes/Neural_Tensor_layer.py", line 138, in call
    feed_forward_product=input_tensor.ttm([kernel1, kernel2],mode=0, transp=False, without=True)
  File "/home/hanumant/Downloads/scikit-tensor-master/scikit-tensor/sktensor/core.py", line 99, in ttm
    dims, vidx = check_multiplication_dims(mode, self.ndim, len(V), vidx=True, without=without)
  File "/home/hanumant/Downloads/scikit-tensor-master/scikit-tensor/sktensor/core.py", line 256, in check_multiplication_dims
    raise ValueError('More multiplicants than dimensions')
ValueError: More multiplicants than dimensions

Please help.

update pypi version

The current version on PyPI is almost 3 years old and doesn't support Python 3 - any chance of having it updated?

pip3 installs just fine from the current master, so it should be just a simple upload - but it would save any depending package from hardcoding the dependency links.

Cheers, and thanks for the package!

sktensor.tucker.hooi doesn't return fit, itr and exectimes

These three metrics are returned by the CP-ALS function, but not by HOOI.

I believe it is a simple addition, since fit, itr and exectimes are already calculated in the hooi() function.

Maybe sktensor/tests/test_tucker_hooi.py also needs to be changed.

hosvd fails for sptensor when using full rank along a mode

Hi there, nice package.

I'd like to compute the full hosvd of a large sparse tensor. I thought I could use the method tucker.hosvd, but it breaks for me.

Here is a small example:

# imports added so the example is self-contained
from sktensor import sptensor
import sktensor.tucker

T = sptensor(([0,1,2], [0,0,2], [0,2,0]), (1.0,2.0,3.0), (3,3,3))
sktensor.tucker.hosvd(T, (3,3,3))

which results in:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/jackkamm/.local/lib/python2.7/site-packages/scikit_tensor-0.1-py2.7.egg/sktensor/tucker.py", line 132, in hosvd
    U[d] = array(nvecs(X, d, rank[d]), dtype=dtype)
  File "/Users/jackkamm/.local/lib/python2.7/site-packages/scikit_tensor-0.1-py2.7.egg/sktensor/core.py", line 283, in nvecs
    _, U = eigsh(Y, rank, which='LM')
  File "/Users/jackkamm/anaconda/lib/python2.7/site-packages/scipy/sparse/linalg/eigen/arpack/arpack.py", line 1487, in eigsh
    raise ValueError("k must be between 1 and ndim(A)-1")
ValueError: k must be between 1 and ndim(A)-1

The issue appears to be that, if T is a sptensor, then scipy.sparse.linalg.eigsh is used to get the eigendecomposition of the matrix unfolding. However, eigsh cannot return all eigenvalues. And while eigsh works well for the first few eigenvalues of a large sparse matrix, it appears to work poorly for the lower eigenvalues (or at least, that is what I have read online about the Implicitly Restarted Lanczos Method, which eigsh uses).

I'd prefer not to convert the sptensor to a dtensor, since the tensor dimensions are quite large.

For now I will just use my own code to get the full hosvd. But I would prefer to have the default of nvecs and hosvd always use scipy.linalg.eigh, with a user option to use eigsh instead. If you like, I can submit a PR with this change.
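
For reference, a rough sketch of the dense-eigh fallback this issue proposes, written directly against NumPy/SciPy rather than the library internals (the function name and the use of the subs/vals/shape attributes are assumptions; the Gram matrix Y is invariant to the column ordering of the unfolding, so the exact unfolding convention does not matter here):

import numpy as np
from scipy.sparse import coo_matrix
from scipy.linalg import eigh

def mode_d_nvecs_dense(subs, vals, shape, d, rank):
    # Mode-d unfolding as a sparse matrix: rows are the mode-d index,
    # columns are a flattened index over the remaining modes.
    other = [m for m in range(len(shape)) if m != d]
    cols = np.ravel_multi_index(tuple(subs[m] for m in other),
                                tuple(shape[m] for m in other))
    ncols = int(np.prod([shape[m] for m in other]))
    Xd = coo_matrix((vals, (subs[d], cols)), shape=(shape[d], ncols)).tocsr()
    # Dense Gram matrix: only shape[d] x shape[d], so small when the mode dimension is small.
    Y = (Xd @ Xd.T).toarray()
    w, U = eigh(Y)  # full eigendecomposition, unlike scipy.sparse.linalg.eigsh
    return U[:, np.argsort(w)[::-1][:rank]]

With the small example above, this could be called as mode_d_nvecs_dense(T.subs, T.vals, T.shape, 0, 3), assuming the subs and vals attributes used in other issues on this page.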
