arthurmensch / modl

Randomized online matrix factorization

Home Page: https://arthurmensch.github.io/modl

License: BSD 2-Clause "Simplified" License

Python 80.53% Makefile 0.59% Shell 1.74% C 17.15%

modl's People

Contributors

Contributors: arthurmensch, kamalakerdadi, lesteve

modl's Issues

Installing on Mac OSX

I got as far as cloning the repository and installing the requirements, and then running

pip install .

Causes:

Processing /Users/seanlaw/Git/modl
    Complete output from command python setup.py egg_info:
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/var/folders/32/vghdf9rd4w1b2s9cpkf2v9nrcbdy1n/T/pip-jhMnUZ-build/setup.py", line 12, in <module>
        LONG_DESCRIPTION = open('README.rst').read()
    IOError: [Errno 2] No such file or directory: 'README.rst'

    ----------------------------------------
Command "python setup.py egg_info" failed with error code 1 in /var/folders/32/vghdf9rd4w1b2s9cpkf2v9nrcbdy1n/T/pip-jhMnUZ-build/

I tried renaming the README in setup.py but still encountered problems.
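A setup.py can avoid this failure by resolving the README relative to the directory containing setup.py (rather than the current working directory) and falling back to an empty long description when the file is absent. This is a minimal sketch under that assumption; `read_long_description` is a hypothetical helper, not code from modl's setup.py.

```python
import os


def read_long_description(filename="README.rst", base_dir=None):
    """Return the README contents, or an empty string if the file is missing.

    base_dir defaults to the directory containing this setup.py, so the
    result does not depend on where pip happens to run it from.
    """
    if base_dir is None:
        base_dir = os.path.dirname(os.path.abspath(__file__))
    path = os.path.join(base_dir, filename)
    try:
        with open(path) as f:
            return f.read()
    except IOError:  # on Python 3, FileNotFoundError is a subclass of IOError
        return ""
```

In setup.py this would replace the failing line with `LONG_DESCRIPTION = read_long_description()`; pointing `filename` at whatever README the repository actually ships avoids the mismatch described above.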

conftest.py broken. Can't run tests

(py3k) elvis@middle-earth:~/CODE/FORKED/modl$ make test
py.test --pyargs modl
Traceback (most recent call last):
  File "/home/elvis/anaconda2/envs/py3k/lib/python3.6/site-packages/_pytest/config.py", line 325, in _getconftestmodules
    return self._path2confmods[path]
KeyError: local('/home/elvis/CODE/FORKED/modl/modl')

During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/home/elvis/anaconda2/envs/py3k/lib/python3.6/site-packages/_pytest/config.py", line 356, in _importconftest
    return self._conftestpath2mod[conftestpath]
KeyError: local('/home/elvis/CODE/FORKED/modl/modl/conftest.py')

During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/home/elvis/anaconda2/envs/py3k/lib/python3.6/site-packages/_pytest/config.py", line 362, in _importconftest
    mod = conftestpath.pyimport()
  File "/home/elvis/anaconda2/envs/py3k/lib/python3.6/site-packages/py/_path/local.py", line 662, in pyimport
    __import__(modname)
  File "/home/elvis/CODE/FORKED/modl/modl/__init__.py", line 1, in <module>
    from .decomposition.dict_fact import DictFact
  File "/home/elvis/CODE/FORKED/modl/modl/decomposition/__init__.py", line 1, in <module>
    from .dict_fact import DictFact
  File "/home/elvis/CODE/FORKED/modl/modl/decomposition/dict_fact.py", line 13, in <module>
    from modl.utils.randomkit import RandomState
  File "/home/elvis/CODE/FORKED/modl/modl/utils/randomkit/__init__.py", line 1, in <module>
    from .random_fast import RandomState
ModuleNotFoundError: No module named 'modl.utils.randomkit.random_fast'
ERROR: could not load /home/elvis/CODE/FORKED/modl/modl/conftest.py

Removing the offending file modl/conftest.py and rerunning the tests works fine.
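The traceback ends in a missing compiled module, random_fast, which typically means the Cython extensions were not built in the tree being imported (for a source checkout, `python setup.py build_ext --inplace` is the standard setuptools/distutils way to build them in place). The hypothetical helper below reports which modules fail to import, so the cause is visible before pytest tries to load conftest.py; the module list is an assumption taken from this traceback.

```python
import importlib


def missing_modules(module_names):
    """Return the names from module_names that cannot be imported.

    Any ImportError (ModuleNotFoundError included), even one raised by a
    parent package's __init__, marks the module as missing.
    """
    missing = []
    for name in module_names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Hypothetical check: the compiled module named in the traceback above.
    absent = missing_modules(["modl.utils.randomkit.random_fast"])
    if absent:
        print("Not importable (extensions not built?):", ", ".join(absent))
```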

Running collaborative filtering examples.

vagrant@deep-learning:~/modl$ python examples/predict_recsys.py
Centering data
Sparsity: 0.0335217561064 X.nnz: 800167 N features: 3952 n_samples 6040
Iteration 0
('Test RMSE: ', 39.876633749822602)
Iteration 120
('Test RMSE: ', 48.440039171205285)
Iteration 720
('Test RMSE: ', 26.586114834502165)
Iteration 3630
('Test RMSE: ', 592.60091116572414)
Iteration 18110
('Test RMSE: ', 29932.618585930453)
('Final test RMSE:', 34662.550885905417)
Time : 1111.89 s

Unfortunately, the RMSE is increasing :(
Any suggestions? Am I doing something wrong?
Thanks.
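For reference, test RMSE in collaborative filtering is usually computed only on the held-out ratings that are actually observed, after adding back any offsets removed during centering. The sketch below does this from coordinate arrays (for a scipy.sparse matrix X these are `X.tocoo().row`, `.col` and `.data`); `sparse_rmse` and its arguments are hypothetical and are not the code in examples/predict_recsys.py. As a sanity check, the printed shape (6040 samples, 3952 features) matches MovieLens 1M, whose ratings lie between 1 and 5, so an RMSE near 40 already at iteration 0 suggests a scaling or centering mismatch worth ruling out before tuning the optimizer.

```python
import numpy as np


def sparse_rmse(rows, cols, ratings, U, V, row_mean=None):
    """RMSE of the low-rank prediction U @ V on observed entries only.

    rows, cols, ratings : 1-D arrays giving the held-out (row, col, value) triples
    U, V                : factors of shapes (n_samples, k) and (k, n_features)
    row_mean            : optional per-row offsets removed during centering;
                          they are added back so predictions use the rating scale
    """
    # Evaluate only the needed entries instead of forming U @ V densely.
    pred = np.sum(U[rows] * V[:, cols].T, axis=1)
    if row_mean is not None:
        pred = pred + row_mean[rows]
    return float(np.sqrt(np.mean((pred - ratings) ** 2)))
```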

Error when running py.test --pyargs modl

/usr/lib/python2.7/site-packages/py/_path/common.py:367: in visit
    for x in Visitor(fil, rec, ignore, bf, sort).gen(self):
/usr/lib/python2.7/site-packages/py/_path/common.py:406: in gen
    if p.check(dir=1) and (rec is None or rec(p))])
/usr/lib/python2.7/site-packages/_pytest/main.py:676: in _recurse
    ihook = self.gethookproxy(path.dirpath())
/usr/lib/python2.7/site-packages/_pytest/main.py:587: in gethookproxy
    my_conftestmodules = pm._getconftestmodules(fspath)
/usr/lib/python2.7/site-packages/_pytest/config.py:339: in _getconftestmodules
    mod = self._importconftest(conftestpath)
/usr/lib/python2.7/site-packages/_pytest/config.py:364: in _importconftest
    raise ConftestImportFailure(conftestpath, sys.exc_info())
E   ConftestImportFailure: ImportError('No module named concurrent.futures',)
E     File "/usr/lib64/python2.7/site-packages/modl/__init__.py", line 1, in <module>
E       from .dict_fact import DictFact
E     File "/usr/lib64/python2.7/site-packages/modl/dict_fact.py", line 1, in <module>
E       from concurrent.futures import ThreadPoolExecutor
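`concurrent.futures` is in the standard library only from Python 3.2; on Python 2.7 it comes from the separate `futures` backport on PyPI. A guarded import like the sketch below turns the opaque conftest failure into an actionable message. The message wording and the `run_in_threads` helper are illustrative, not part of modl.

```python
try:
    from concurrent.futures import ThreadPoolExecutor
except ImportError:  # Python 2.7 without the "futures" backport installed
    raise ImportError(
        "concurrent.futures is required; on Python 2.7 install the "
        "'futures' backport (pip install futures)."
    )


def run_in_threads(func, items, n_jobs=2):
    """Apply func to every item with a thread pool; results keep input order."""
    with ThreadPoolExecutor(max_workers=n_jobs) as executor:
        return list(executor.map(func, items))
```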

enet_projection in dictionary update: hack or not hack ?

The code from lines 604 to 622 of dict_fact.py is really weird. Are we formally computing the projection of the atom onto the elastic-net unit ball (radius 1), or is it some kind of hack? For example, the call to enet_projection passes radius=self.comp_norm_[k].

In any case, a detailed comment explaining what is being done would help for future maintenance / extension.
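To make the question concrete, here is one way to compute an exact Euclidean projection onto an elastic-net ball, assuming the constraint set {d : l1_ratio * ||d||_1 + (1 - l1_ratio) * ||d||_2^2 <= radius}, a common form in dictionary learning. Whether modl's enet_projection uses this parametrization, and in particular how its radius argument enters, is an assumption to verify against dict_fact.py; `enet_ball_projection` is a hypothetical name. The KKT conditions reduce the projection to a soft-threshold followed by a uniform shrinkage, with the Lagrange multiplier located by bisection.

```python
import numpy as np


def enet_ball_projection(v, radius=1.0, l1_ratio=0.5, n_iter=100):
    """Euclidean projection of a 1-D vector v onto
    {x : l1_ratio * ||x||_1 + (1 - l1_ratio) * ||x||_2 ** 2 <= radius}.

    KKT: x(mu) = soft_threshold(v, mu * l1_ratio) / (1 + 2 * mu * (1 - l1_ratio))
    for a multiplier mu >= 0. The constraint value is non-increasing in mu,
    so the smallest feasible mu is found by bisection.
    """
    v = np.asarray(v, dtype=float)
    if radius <= 0:
        raise ValueError("radius must be positive")

    def x_of(mu):
        shrunk = np.sign(v) * np.maximum(np.abs(v) - mu * l1_ratio, 0.0)
        return shrunk / (1.0 + 2.0 * mu * (1.0 - l1_ratio))

    def constraint(x):
        return l1_ratio * np.abs(x).sum() + (1.0 - l1_ratio) * np.dot(x, x)

    if constraint(v) <= radius:
        return v.copy()  # already feasible: the projection is v itself

    # Double an upper bound until x(hi) is feasible; lo stays infeasible.
    lo, hi = 0.0, 1.0
    while constraint(x_of(hi)) > radius:
        lo, hi = hi, 2.0 * hi
    for _ in range(n_iter):
        mid = 0.5 * (lo + hi)
        if constraint(x_of(mid)) > radius:
            lo = mid
        else:
            hi = mid
    return x_of(hi)  # hi is always feasible
```

Under this parametrization, passing radius=self.comp_norm_[k] would project onto a ball sized by the stored norm of atom k rather than onto the unit ball, which is exactly the kind of detail such a comment could spell out.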

DictFact does not converge on Faces Decomposition dataset when initialized with random component matrix

I tried to test the partial_fit method on the Faces decomposition dataset from scikit-learn.
Before using partial_fit, one must use the prepare method to initialize the component matrix. It works fine when I initialize with the data matrix,

estimator.prepare(n_samples=n_samples, X=data)

or with another method's output,

dic_init = np.array(FastICAEstimator.components_, dtype=np.float32)
estimator.prepare(n_samples=n_samples, X=dic_init)

But when I initialize with random noise, using

estimator.prepare(n_samples=n_samples, n_features=n_features, dtype=np.float32)

then the output still looks like white noise and the convergence plot is flat.

I attach the script I used (the lines that save figures can probably be removed) and the figures it generated: test_partial_fit_modl.zip

Thanks!
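Since initialization from an explicit array works, one way to separate "random initialization" from "the built-in random path" is to build a random dictionary yourself and pass it through the same `estimator.prepare(n_samples=n_samples, X=...)` call used above with FastICA components. The hypothetical helper below draws Gaussian atoms and scales each to unit L2 norm; whether `prepare` normalizes its own random atoms, and whether unit norm matches the scale of the data, are assumptions.

```python
import numpy as np


def random_unit_dictionary(n_components, n_features, random_state=None,
                           dtype=np.float32):
    """Gaussian random dictionary with each atom (row) scaled to unit L2 norm."""
    rng = np.random.RandomState(random_state)
    D = rng.standard_normal((n_components, n_features))
    D /= np.linalg.norm(D, axis=1, keepdims=True)
    return D.astype(dtype)
```

Usage would mirror the working case: `dict_init = random_unit_dictionary(n_components, n_features, random_state=0)` followed by `estimator.prepare(n_samples=n_samples, X=dict_init)`. If this converges while the built-in random initialization does not, the difference lies in how that path scales its atoms.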

py.test import failure -> wrong seed ValueError

Hi,
I was trying to run py.test after installing on an Ubuntu 14.04 Vagrant box:

vagrant@deep-learning:~/modl$ py.test --pyargs modl
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/_pytest/config.py", line 362, in _importconftest
    mod = conftestpath.pyimport()
  File "/usr/local/lib/python2.7/dist-packages/py/_path/local.py", line 662, in pyimport
    __import__(modname)
  File "/home/vagrant/modl/modl/__init__.py", line 1, in <module>
    from .dict_fact import DictFact
  File "/home/vagrant/modl/modl/dict_fact.py", line 8, in <module>
    from .utils.randomkit import RandomState
  File "/home/vagrant/modl/modl/utils/randomkit/__init__.py", line 1, in <module>
    from .random_fast import RandomState
ImportError: No module named random_fast                     
ERROR: could not load /home/vagrant/modl/modl/conftest.py

Could you please advise?

Numpy datatype issue

Hi,

I was just trying to run the face_decompose example, but I get the output below. Is there an issue with how data types are recognized in NumPy? I'm using numpy/sklearn/nilearn and have never experienced this before.

thx

Nico

Dataset consists of 400 faces
Extracting the top 18 MODL...

TypeError Traceback (most recent call last)
in <module>()
93 random_state=2,
94 callback=cb)
---> 95 estimator.fit(data)
96 train_time = (time.time() - t0)
97 print("done in %0.3fs" % train_time)

/home/nfarrugi/anaconda2/envs/fmri_new/lib/python2.7/site-packages/modl/dict_fact.pyc in fit(self, X, y)
244 Dataset to learn the dictionary from
245 """
--> 246 X = self._prefit(X, reset=True)
247 if self.max_n_iter > 0:
248 while self.n_iter_[0] + self.batch_size - 1 < self.max_n_iter:

/home/nfarrugi/anaconda2/envs/fmri_new/lib/python2.7/site-packages/modl/dict_fact.pyc in _prefit(self, X, reset, check_input)
262 "with backend == 'python'")
263
--> 264 self._init(X)
265 self._init_arrays(X)
266 if check_input:

/home/nfarrugi/anaconda2/envs/fmri_new/lib/python2.7/site-packages/modl/dict_fact.pyc in _init(self, X)
184 self.counter_ = np.zeros(n_cols + 1, dtype='int')
185
--> 186 self.n_iter_ = np.zeros(1, dtype=np.long)
187
188 self.code_ = np.zeros((self.n_samples_, self.n_components))

TypeError: data type "long" not understood
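The failing line uses `np.long`, an alias whose availability and meaning have varied across NumPy and Python versions; as the traceback shows, this environment rejects it. Naming a fixed-width integer type avoids the ambiguity. The sketch assumes a 64-bit counter is acceptable to the rest of dict_fact (including any Cython code that reads `n_iter_`), which should be checked; `init_iteration_counter` is a hypothetical helper.

```python
import numpy as np


def init_iteration_counter():
    """Return a one-element iteration counter with an explicit 64-bit dtype.

    np.int64 is understood by every NumPy release, unlike the np.long alias.
    """
    return np.zeros(1, dtype=np.int64)


n_iter_ = init_iteration_counter()
n_iter_[0] += 10  # illustrative: e.g. after processing 10 samples
```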

Heuristic when a dictionary atom is too small

When an atom has a small norm, you want to remove it, replace it with another atom, and set the corresponding code to 0 (this is what is done in _update_dictionary).

In the online version, you maintain two statistics, A and B, and everything happens as if A were the code and B the data. However, when a dictionary atom is too small, setting the corresponding code to zero is not the same thing as replacing a line of A with zero. Do you agree?

Hugo
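The point can be checked numerically under the usual online dictionary-learning convention, where codes alpha_t (length k) and samples x_t (length p) give A = sum_t alpha_t alpha_t^T (k x k) and B = sum_t x_t alpha_t^T (p x k). Zeroing component j of every past code zeroes row j and column j of A, and column j of B, so clearing a single line of A is not equivalent. modl may store these arrays with a different layout (for example transposed), so this sketch illustrates the algebra rather than modl's code; `sufficient_statistics` is a hypothetical helper.

```python
import numpy as np


def sufficient_statistics(codes, X):
    """A = sum_t alpha_t alpha_t^T and B = sum_t x_t alpha_t^T.

    codes : (n_samples, k) array, row t is alpha_t
    X     : (n_samples, p) array, row t is x_t
    """
    A = codes.T @ codes  # (k, k)
    B = X.T @ codes      # (p, k)
    return A, B


rng = np.random.RandomState(0)
codes = rng.standard_normal((20, 4))
X = rng.standard_normal((20, 6))
j = 2  # index of the atom judged too small

# Reference: statistics recomputed as if alpha_j had been 0 for every sample.
codes_zeroed = codes.copy()
codes_zeroed[:, j] = 0.0
A_ref, B_ref = sufficient_statistics(codes_zeroed, X)

# Editing the stored statistics: row j AND column j of A, plus column j of B.
A, B = sufficient_statistics(codes, X)
A[j, :] = 0.0
A[:, j] = 0.0
B[:, j] = 0.0

# Clearing only one line of A leaves the symmetric column j untouched.
A_row_only, _ = sufficient_statistics(codes, X)
A_row_only[j, :] = 0.0

print(np.allclose(A, A_ref), np.allclose(B, B_ref), np.allclose(A_row_only, A_ref))
# expected: True True False
```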

Maybe Cython is not indispensable?

Beyond the call to sklearn's enet_coordinate_descent_gram in dict_fact_fast.pyx, I wonder whether the remaining Cython code is really necessary. Without benchmarks, I would have thought the main bottleneck is the coordinate descent, which is already cythonized in sklearn, rather than the for loop over range(batch_size), etc. But I may be wrong...

super() takes at least 1 argument (0 given)

I am running Debian 8 (Jessie). When I tried to run the tests, I got:

anaconda2/lib/python2.7/site-packages/modl/tests/test_dict_completion.py:54: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = DictCompleter(alpha=None, backend=None, batch_size=None, beta=None,
       cal...chs=None, offset=None,
       projection=None, random_state=None, verbose=None)
alpha = 0.001, beta = 0.0, n_components = 3, learning_rate = 1.0, batch_size = 1, offset = 0, projection = 'partial'
fit_intercept = False, dict_init = None, l1_ratio = 0, max_n_iter = 100, n_epochs = 1, random_state = 0, verbose = 0, backend = 'python'
debug = False, detrend = True, crop = None, callback = None

    def __init__(self, alpha=1.0, beta=.0,
                 n_components=30, learning_rate=1.,
                 batch_size=1, offset=0,
                 projection='partial',
                 fit_intercept=False, dict_init=None, l1_ratio=0,
                 max_n_iter=0,
                 n_epochs=1,
                 random_state=None, verbose=0, backend='c', debug=False,
                 detrend=False,
                 crop=None,
                 callback=None):
>       super().__init__(alpha=alpha,
                         n_components=n_components,
                         # Hyper-parameters
                         learning_rate=learning_rate,
                         batch_size=batch_size,
                         offset=offset,
                         # Reduction parameter
                         reduction=1,
                         projection=projection,
                         fit_intercept=fit_intercept,
                         # Dict parameter
                         dict_init=dict_init,
                         l1_ratio=l1_ratio,
                         # For variance reduction
                         n_samples=None,
                         # Generic parameters
                         max_n_iter=max_n_iter,
                         n_epochs=n_epochs,
                         random_state=random_state,
                         verbose=verbose,
                         backend=backend,
                         debug=debug,
                         callback=callback)
E       TypeError: super() takes at least 1 argument (0 given)
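Zero-argument super() exists only in Python 3, and the paths above (anaconda2, python2.7) show the tests running under Python 2.7. Passing the class and the instance explicitly works on both versions. `BaseFactorizer` and `Completer` below are hypothetical stand-ins for the DictCompleter hierarchy, not modl classes.

```python
class BaseFactorizer(object):  # inheriting from object: new-style class on Python 2
    def __init__(self, alpha=1.0, n_components=30, reduction=1):
        self.alpha = alpha
        self.n_components = n_components
        self.reduction = reduction


class Completer(BaseFactorizer):
    def __init__(self, alpha=1.0, n_components=30):
        # super(Class, self) works on Python 2 and 3; bare super() is Python 3 only.
        super(Completer, self).__init__(alpha=alpha,
                                        n_components=n_components,
                                        reduction=1)


model = Completer(alpha=0.001, n_components=3)
```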

svd and nmf for a matrix

Hello Arthur!

Is it possible to compute the SVD of a random matrix with the modl package?
Could you show me an example?

Thank you in advance,
Olexiy
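The snippet below does not use modl; it shows plain NumPy baselines for both factorizations in the title: a truncated SVD via `np.linalg.svd`, and a small NMF using the Lee and Seung multiplicative updates for the Frobenius loss. The function names are illustrative, and whether modl's estimators expose an equivalent is a separate question for the maintainers.

```python
import numpy as np


def truncated_svd(M, k):
    """Rank-k SVD factors of M: U (m x k), s (k,), Vt (k x n)."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return U[:, :k], s[:k], Vt[:k]


def nmf_multiplicative(M, k, n_iter=200, random_state=0, eps=1e-10):
    """Nonnegative M ~ W @ H via Lee-Seung multiplicative updates."""
    rng = np.random.RandomState(random_state)
    m, n = M.shape
    W = rng.rand(m, k)
    H = rng.rand(k, n)
    for _ in range(n_iter):
        # Ratios of nonnegative terms keep W and H nonnegative;
        # eps guards against division by zero.
        H *= (W.T @ M) / (W.T @ W @ H + eps)
        W *= (M @ H.T) / (W @ H @ H.T + eps)
    return W, H


rng = np.random.RandomState(42)
M = rng.rand(30, 20)             # random nonnegative matrix
U, s, Vt = truncated_svd(M, 5)
M_svd = U @ np.diag(s) @ Vt      # best rank-5 approximation (Eckart-Young)
W, H = nmf_multiplicative(M, 5)
M_nmf = W @ H                    # rank-5 nonnegative approximation
```

Because the truncated SVD is the best rank-5 approximation in Frobenius norm, its error is a lower bound for any rank-5 factorization, including the NMF.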

setup install fails for cython < 0.25

python setup.py install fails with error message

error: Command "gcc -pthread -fno-strict-aliasing -g -O2 -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I/volatile/home/edohmato/anaconda2/lib/python2.7/site-packages/numpy/core/include -I/volatile/home/edohmato/anaconda2/lib/python2.7/site-packages/numpy/core/include -I/volatile/home/edohmato/anaconda2/include/python2.7 -c modl/utils/math/enet.c -o build/temp.linux-x86_64-2.7/modl/utils/math/enet.o" failed with exit status 1

For the record, upgrading Cython to the latest version (0.25.2), e.g. with pip install cython --upgrade, fixes the problem.
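A setup.py could fail early with a clearer message than the gcc error. This sketch takes 0.25 as the minimum only because of this report; `check_cython_version` and `parse_version` are hypothetical helpers. Versions are compared as integer tuples rather than strings, because as strings "0.9" sorts after "0.25".

```python
import re

MIN_CYTHON = (0, 25)  # assumption: lowest version reported to work here


def parse_version(version):
    """'0.25.2' -> (0, 25, 2); a suffix such as 'b1' in '0.29b1' is dropped."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def check_cython_version(version, minimum=MIN_CYTHON):
    if parse_version(version) < minimum:
        raise RuntimeError(
            "Cython %s is too old; upgrade with: pip install cython --upgrade"
            % version)


if __name__ == "__main__":
    try:
        import Cython
    except ImportError:
        print("Cython is not installed")
    else:
        check_cython_version(Cython.__version__)
```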
