arthurmensch / modl
Randomized online matrix factorization
Home Page: https://arthurmensch.github.io/modl
License: BSD 2-Clause "Simplified" License
I got as far as cloning the repository and installing the requirements, but then
pip install .
fails with:
Processing /Users/seanlaw/Git/modl
Complete output from command python setup.py egg_info:
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/var/folders/32/vghdf9rd4w1b2s9cpkf2v9nrcbdy1n/T/pip-jhMnUZ-build/setup.py", line 12, in <module>
LONG_DESCRIPTION = open('README.rst').read()
IOError: [Errno 2] No such file or directory: 'README.rst'
----------------------------------------
Command "python setup.py egg_info" failed with error code 1 in /var/folders/32/vghdf9rd4w1b2s9cpkf2v9nrcbdy1n/T/pip-jhMnUZ-build/
I tried changing the README filename referenced in setup.py, but I still ran into problems.
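One common way to make this failure mode non-fatal is to read the long description defensively, so a missing README in pip's temporary build copy does not abort egg_info. This is a sketch, not modl's actual setup.py:

```python
# Defensive long-description read for setup.py: fall back gracefully when
# the README is missing from the build tree (as happens when pip's copied
# sdist omits it). Sketch only; modl's real setup.py differs.
import os

def read_long_description(candidates=('README.rst', 'README.md')):
    """Return the contents of the first README found, or an empty string."""
    for name in candidates:
        if os.path.exists(name):
            with open(name) as f:
                return f.read()
    return ''  # setup() accepts an empty long_description

LONG_DESCRIPTION = read_long_description()
```

The underlying cause is usually that README.rst is not listed in MANIFEST.in, so pip's temporary copy of the source tree lacks it; adding `include README.rst` to MANIFEST.in is the more direct fix.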
(py3k) elvis@middle-earth:~/CODE/FORKED/modl$ make test
py.test --pyargs modl
Traceback (most recent call last):
File "/home/elvis/anaconda2/envs/py3k/lib/python3.6/site-packages/_pytest/config.py", line 325, in _getconftestmodules
return self._path2confmods[path]
KeyError: local('/home/elvis/CODE/FORKED/modl/modl')
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/elvis/anaconda2/envs/py3k/lib/python3.6/site-packages/_pytest/config.py", line 356, in _importconftest
return self._conftestpath2mod[conftestpath]
KeyError: local('/home/elvis/CODE/FORKED/modl/modl/conftest.py')
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/elvis/anaconda2/envs/py3k/lib/python3.6/site-packages/_pytest/config.py", line 362, in _importconftest
mod = conftestpath.pyimport()
File "/home/elvis/anaconda2/envs/py3k/lib/python3.6/site-packages/py/_path/local.py", line 662, in pyimport
__import__(modname)
File "/home/elvis/CODE/FORKED/modl/modl/__init__.py", line 1, in <module>
from .decomposition.dict_fact import DictFact
File "/home/elvis/CODE/FORKED/modl/modl/decomposition/__init__.py", line 1, in <module>
from .dict_fact import DictFact
File "/home/elvis/CODE/FORKED/modl/modl/decomposition/dict_fact.py", line 13, in <module>
from modl.utils.randomkit import RandomState
File "/home/elvis/CODE/FORKED/modl/modl/utils/randomkit/__init__.py", line 1, in <module>
from .random_fast import RandomState
ModuleNotFoundError: No module named 'modl.utils.randomkit.random_fast'
ERROR: could not load /home/elvis/CODE/FORKED/modl/modl/conftest.py
Removing the offending file modl/conftest.py
and rerunning the tests works fine.
Installation seems to work all the way, up to:
py.test --pyargs modl
I've uploaded the logfile,
Thanks
Nico
vagrant@deep-learning:~/modl$ python examples/predict_recsys.py
Centering data
Sparsity: 0.0335217561064 X.nnz: 800167 N features: 3952 n_samples 6040
Iteration 0
('Test RMSE: ', 39.876633749822602)
Iteration 120
('Test RMSE: ', 48.440039171205285)
Iteration 720
('Test RMSE: ', 26.586114834502165)
Iteration 3630
('Test RMSE: ', 592.60091116572414)
Iteration 18110
('Test RMSE: ', 29932.618585930453)
('Final test RMSE:', 34662.550885905417)
Time : 1111.89 s
Unfortunately, the RMSE is increasing :(
Any suggestions? Am I doing something wrong?
Thanks.
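A test RMSE diverging to this magnitude usually means the predictions themselves are blowing up (for instance, a step-size schedule that is too aggressive for the data scale) rather than a bug in the evaluation. For reference, a minimal sketch of how test RMSE is typically computed for matrix completion, over observed entries only (illustrative names, not the actual predict_recsys.py code):

```python
# Minimal sketch of test-set RMSE for matrix completion: the error is
# measured only on the observed (nonzero) test entries. Names here are
# illustrative, not modl's actual recsys evaluation code.
import numpy as np
import scipy.sparse as sp

def masked_rmse(X_test, X_pred):
    """RMSE over the nonzero (observed) entries of the sparse test matrix."""
    X_test = sp.csr_matrix(X_test)
    rows, cols = X_test.nonzero()
    pred = np.asarray(X_pred)[rows, cols]
    true = np.asarray(X_test[rows, cols]).ravel()
    return np.sqrt(np.mean((true - pred) ** 2))

# Toy check: entries outside the test support are ignored.
X = sp.csr_matrix(np.array([[5.0, 0.0], [0.0, 3.0]]))
print(masked_rmse(X, np.array([[5.0, 1.0], [9.0, 3.0]])))  # 0.0
```

If this masked RMSE is stable but the script's reported RMSE explodes, the divergence is in the learned factors, and lowering the learning rate (or increasing regularization) is the first thing to try.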
/usr/lib/python2.7/site-packages/py/_path/common.py:367: in visit
for x in Visitor(fil, rec, ignore, bf, sort).gen(self):
/usr/lib/python2.7/site-packages/py/_path/common.py:406: in gen
if p.check(dir=1) and (rec is None or rec(p))])
/usr/lib/python2.7/site-packages/_pytest/main.py:676: in _recurse
ihook = self.gethookproxy(path.dirpath())
/usr/lib/python2.7/site-packages/_pytest/main.py:587: in gethookproxy
my_conftestmodules = pm._getconftestmodules(fspath)
/usr/lib/python2.7/site-packages/_pytest/config.py:339: in _getconftestmodules
mod = self._importconftest(conftestpath)
/usr/lib/python2.7/site-packages/_pytest/config.py:364: in _importconftest
raise ConftestImportFailure(conftestpath, sys.exc_info())
E ConftestImportFailure: ImportError('No module named concurrent.futures',)
E File "/usr/lib64/python2.7/site-packages/modl/__init__.py", line 1, in <module>
E from .dict_fact import DictFact
E File "/usr/lib64/python2.7/site-packages/modl/dict_fact.py", line 1, in <module>
E from concurrent.futures import ThreadPoolExecutor
The code from lines 604 to 622 of dict_fact.py is really weird. Are we formally computing the projection of the atom onto the elastic-net unit ball of radius 1, or is it some kind of hack? For example, the call to enet_projection has radius=self.comp_norm_[k].
In any case, a detailed comment explaining what is being done would help future maintenance and extension.
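For reference, one common convention for the elastic-net ball (as in Mairal et al.'s online dictionary learning, which modl appears to follow; the exact weighting used by enet_projection should be confirmed against the code) is

```latex
\mathcal{B}_r \;=\; \bigl\{\, d \in \mathbb{R}^p \;:\; (1-\gamma)\,\lVert d\rVert_2^2 + \gamma\,\lVert d\rVert_1 \le r \,\bigr\},
\qquad
\operatorname{proj}_{\mathcal{B}_r}(u) \;=\; \operatorname*{arg\,min}_{d \in \mathcal{B}_r} \lVert u - d \rVert_2^2 .
```

Under this reading, radius=self.comp_norm_[k] projects atom k onto a ball whose radius equals that atom's stored norm rather than 1, which may well be deliberate (to avoid shrinking atoms that are already feasible), but that is exactly the kind of intent the requested comment should state.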
Hi, I am trying to install modl and get the above error. I checked under modl/datasets and there is no directory called tests. Could you please help me get past this and install modl?
Thanks,
Swetha.
I tried to test the partial_fit method on the Faces decomposition dataset from scikit-learn. Before using partial_fit, one must use the prepare method to initialize the component matrix. It works fine when I initialize with the data matrix,
estimator.prepare(n_samples=n_samples, X=data)
or with another method's output,
dic_init = np.array(FastICAEstimator.components_, dtype=np.float32)
estimator.prepare(n_samples=n_samples, X=dic_init)
But when I initialize with random noise, using
estimator.prepare(n_samples=n_samples, n_features=n_features, dtype=np.float32)
the output still looks like white noise and the convergence plot is flat.
I attach the script I used (the lines for saving figures can perhaps be removed) and the figures it generated: test_partial_fit_modl.zip
Thanks!
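One hypothetical workaround, given that the explicit-X calls above are reported to work, is to pass an explicit random matrix as X rather than letting prepare() draw its own initialization. The unit-norm scaling below is an assumption about what a reasonable dictionary init looks like, not modl's documented behaviour:

```python
# Hypothetical workaround: build an explicit random init and pass it as X
# to prepare(), mirroring the FastICA-initialized call that works. The
# unit-norm row scaling is an assumption, not modl's internal behaviour.
import numpy as np

rng = np.random.RandomState(0)
n_components, n_features = 18, 4096

dic_init = rng.randn(n_components, n_features).astype(np.float32)
dic_init /= np.linalg.norm(dic_init, axis=1, keepdims=True)  # unit-norm atoms

# estimator.prepare(n_samples=n_samples, X=dic_init)  # as with the FastICA init
print(dic_init.shape)  # (18, 4096)
```

If this explicit random init converges while the n_features-based call does not, that would localize the problem to prepare()'s internal random initialization (e.g. its scaling).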
Hi,
I was trying to run py.test after install on Ubuntu 14.04 vagrant box
vagrant@deep-learning:~/modl$ py.test --pyargs modl
Traceback (most recent call last):
File "/usr/local/lib/python2.7/dist-packages/_pytest/config.py", line 362, in _importconftest
mod = conftestpath.pyimport()
File "/usr/local/lib/python2.7/dist-packages/py/_path/local.py", line 662, in pyimport
__import__(modname)
File "/home/vagrant/modl/modl/__init__.py", line 1, in <module>
from .dict_fact import DictFact
File "/home/vagrant/modl/modl/dict_fact.py", line 8, in <module>
from .utils.randomkit import RandomState
File "/home/vagrant/modl/modl/utils/randomkit/__init__.py", line 1, in <module>
from .random_fast import RandomState
ImportError: No module named random_fast
ERROR: could not load /home/vagrant/modl/modl/conftest.py
Could you please advise?
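random_fast is a compiled Cython extension, so this ImportError usually means the tests are importing the source checkout, in which the extensions were never built. A sketch of the usual fixes, assuming a standard setuptools/Cython layout:

```shell
# Option 1: build the Cython extensions inside the source tree, then test.
python setup.py build_ext --inplace
py.test --pyargs modl

# Option 2: run the tests from outside the checkout, so the installed,
# compiled modl package is imported instead of the source directory.
cd /tmp && py.test --pyargs modl
```

This also explains the earlier report where deleting modl/conftest.py "fixed" the tests: removing it stopped pytest from importing the uncompiled in-tree package.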
The commands
make download-movielens1m
make download-movielens10m
can't download the datasets: both return a 404 error for the URLs
http://www.mblondel.org/data/movielens1m.tar.bz2
http://www.mblondel.org/data/movielens10m.tar.bz2
Hi,
I was just trying to run the face_decompose example, but I get the output below. It seems there is an issue recognizing datatypes in numpy? I'm using numpy/sklearn/nilearn and have never experienced this before.
Thanks,
Nico
Dataset consists of 400 faces
Extracting the top 18 MODL...
TypeError Traceback (most recent call last)
in ()
93 random_state=2,
94 callback=cb)
---> 95 estimator.fit(data)
96 train_time = (time.time() - t0)
97 print("done in %0.3fs" % train_time)
/home/nfarrugi/anaconda2/envs/fmri_new/lib/python2.7/site-packages/modl/dict_fact.pyc in fit(self, X, y)
244 Dataset to learn the dictionary from
245 """
--> 246 X = self._prefit(X, reset=True)
247 if self.max_n_iter > 0:
248 while self.n_iter[0] + self.batch_size - 1 < self.max_n_iter:
/home/nfarrugi/anaconda2/envs/fmri_new/lib/python2.7/site-packages/modl/dict_fact.pyc in _prefit(self, X, reset, check_input)
262 "with backend == 'python'")
263
--> 264 self._init(X)
265 self._init_arrays(X)
266 if check_input:
/home/nfarrugi/anaconda2/envs/fmri_new/lib/python2.7/site-packages/modl/dict_fact.pyc in _init(self, X)
184 self.counter = np.zeros(n_cols + 1, dtype='int')
185
--> 186 self.n_iter_ = np.zeros(1, dtype=np.long)
187
188 self.code_ = np.zeros((self.n_samples_, self.n_components))
TypeError: data type "long" not understood
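np.long was just an alias for the platform/Python 2 "long" and is not understood by every NumPy build (the alias was later removed from NumPy's namespace in 1.24). An explicitly sized dtype is portable:

```python
# Portable fix sketch: replace the platform-dependent np.long alias with
# an explicitly sized integer dtype.
import numpy as np

n_iter_ = np.zeros(1, dtype=np.int64)  # instead of dtype=np.long
print(n_iter_.dtype)  # int64
```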
When an atom has a small norm, you want to remove it, replace it by another atom, and set the corresponding code to 0 (this is what is done in _update_dictionary).
In the online version, you maintain two statistics A and B, and everything happens as if A were the code and B the data. However, when a dictionary atom is too small, setting the corresponding code to zero and replacing a line of A by zero are not the same thing. Do you agree?
Hugo
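For reference, in the standard online dictionary-learning formulation (Mairal et al., 2010), which modl's A/B statistics appear to follow, the updates are

```latex
A_t = A_{t-1} + \alpha_t \alpha_t^{\top}, \qquad
B_t = B_{t-1} + x_t \alpha_t^{\top}, \qquad
D_t = \operatorname*{arg\,min}_{D \in \mathcal{C}}
      \tfrac{1}{2}\operatorname{Tr}\bigl(D^{\top} D A_t\bigr)
      - \operatorname{Tr}\bigl(D^{\top} B_t\bigr).
```

Under this formulation the concern seems well founded: zeroing row and column k of A removes atom k's influence from the quadratic term only, whereas retroactively setting the code alpha_k to zero would also zero column k of B, since that column accumulates the cross-products x_t alpha_{t,k}.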
Beyond the call to sklearn's enet_coordinate_descent_gram in dict_fact_fast.pyx, I wonder whether the remaining Cython code is really necessary. Without benchmarks, I would have thought the main bottleneck is the coordinate descent, which is already Cythonized in sklearn, and not the for loop over range(batch_size), etc. But I may be wrong...
I am running on Debian 8 (Jessie).
While trying to run the tests, the following fails:
anaconda2/lib/python2.7/site-packages/modl/tests/test_dict_completion.py:54:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
self = DictCompleter(alpha=None, backend=None, batch_size=None, beta=None,
cal...chs=None, offset=None,
projection=None, random_state=None, verbose=None)
alpha = 0.001, beta = 0.0, n_components = 3, learning_rate = 1.0, batch_size = 1, offset = 0, projection = 'partial'
fit_intercept = False, dict_init = None, l1_ratio = 0, max_n_iter = 100, n_epochs = 1, random_state = 0, verbose = 0, backend = 'python'
debug = False, detrend = True, crop = None, callback = None
def __init__(self, alpha=1.0, beta=.0,
n_components=30, learning_rate=1.,
batch_size=1, offset=0,
projection='partial',
fit_intercept=False, dict_init=None, l1_ratio=0,
max_n_iter=0,
n_epochs=1,
random_state=None, verbose=0, backend='c', debug=False,
detrend=False,
crop=None,
callback=None):
> super().__init__(alpha=alpha,
n_components=n_components,
# Hyper-parameters
learning_rate=learning_rate,
batch_size=batch_size,
offset=offset,
# Reduction parameter
reduction=1,
projection=projection,
fit_intercept=fit_intercept,
# Dict parameter
dict_init=dict_init,
l1_ratio=l1_ratio,
# For variance reduction
n_samples=None,
# Generic parameters
max_n_iter=max_n_iter,
n_epochs=n_epochs,
random_state=random_state,
verbose=verbose,
backend=backend,
debug=debug,
callback=callback)
E TypeError: super() takes at least 1 argument (0 given)
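The failing call uses Python 3's zero-argument super(), which is a SyntaxError-free but invalid form under Python 2: there, the class and instance must be passed explicitly. A minimal illustration with toy classes (DictCompleter's real base class differs):

```python
# Python-2-compatible super() needs explicit arguments. Toy classes for
# illustration only; modl's actual class hierarchy differs.
class Base(object):
    def __init__(self, alpha=1.0):
        self.alpha = alpha

class Derived(Base):
    def __init__(self, alpha=1.0):
        # Works on both Python 2 and 3:
        super(Derived, self).__init__(alpha=alpha)
        # The failing form, valid on Python 3 only:
        # super().__init__(alpha=alpha)

d = Derived(alpha=0.5)
print(d.alpha)  # 0.5
```

So either DictCompleter.__init__ should call super(DictCompleter, self).__init__(...), or the package should declare itself Python-3-only.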
Hello Arthur!
Is it possible to get SVD results for a random matrix via the modl package?
Could you show me an example?
Thank you in advance,
Olexiy
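A note on the question: DictFact learns a factorization X ≈ code · components, i.e. a low-rank approximation, not an SVD with orthonormal factors. For an actual truncated SVD of a random matrix, NumPy already suffices; the sketch below shows the NumPy baseline, with a rough (hypothetical, unverified) DictFact counterpart commented for comparison:

```python
# Truncated SVD of a random matrix with NumPy. modl's DictFact yields a
# comparable low-rank factorization, but its factors are not orthogonal.
import numpy as np

rng = np.random.RandomState(0)
X = rng.randn(100, 50)

k = 10  # number of singular triplets to keep
U, s, Vt = np.linalg.svd(X, full_matrices=False)
X_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k]  # best rank-k approximation

# Roughly comparable modl usage (a sketch, not an SVD):
# from modl.decomposition.dict_fact import DictFact
# est = DictFact(n_components=k).fit(X)
# X_approx = est.transform(X) @ est.components_

print(np.linalg.norm(X - X_k) / np.linalg.norm(X))
```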
python setup.py install
fails with the error message:
error: Command "gcc -pthread -fno-strict-aliasing -g -O2 -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I/volatile/home/edohmato/anaconda2/lib/python2.7/site-packages/numpy/core/include -I/volatile/home/edohmato/anaconda2/lib/python2.7/site-packages/numpy/core/include -I/volatile/home/edohmato/anaconda2/include/python2.7 -c modl/utils/math/enet.c -o build/temp.linux-x86_64-2.7/modl/utils/math/enet.o" failed with exit status 1
For the record, upgrading Cython to the latest version (0.25.2), e.g. with pip install cython --upgrade, fixes the problem.