arthurmensch / modl
Randomized online matrix factorization
Home Page: https://arthurmensch.github.io/modl
License: BSD 2-Clause "Simplified" License
I got as far as cloning the repository and installing the requirements, but then
pip install .
fails with:
Processing /Users/seanlaw/Git/modl
Complete output from command python setup.py egg_info:
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/var/folders/32/vghdf9rd4w1b2s9cpkf2v9nrcbdy1n/T/pip-jhMnUZ-build/setup.py", line 12, in <module>
LONG_DESCRIPTION = open('README.rst').read()
IOError: [Errno 2] No such file or directory: 'README.rst'
----------------------------------------
Command "python setup.py egg_info" failed with error code 1 in /var/folders/32/vghdf9rd4w1b2s9cpkf2v9nrcbdy1n/T/pip-jhMnUZ-build/
I tried changing the README filename referenced in setup.py, but I still ran into problems.
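One common way to make this failure mode non-fatal is to read the long description defensively, so a missing README in pip's temporary build copy does not abort egg_info. This is a sketch, not modl's actual setup.py:

```python
# Defensive long-description read for setup.py: fall back gracefully when
# the README is missing from the build tree (as happens when pip's copied
# sdist omits it). Sketch only; modl's real setup.py differs.
import os

def read_long_description(candidates=('README.rst', 'README.md')):
    """Return the contents of the first README found, or an empty string."""
    for name in candidates:
        if os.path.exists(name):
            with open(name) as f:
                return f.read()
    return ''  # setup() accepts an empty long_description

LONG_DESCRIPTION = read_long_description()
```

The underlying cause is usually that README.rst is not listed in MANIFEST.in, so pip's temporary copy of the source tree lacks it; adding `include README.rst` to MANIFEST.in is the more direct fix.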
(py3k) elvis@middle-earth:~/CODE/FORKED/modl$ make test
py.test --pyargs modl
Traceback (most recent call last):
File "/home/elvis/anaconda2/envs/py3k/lib/python3.6/site-packages/_pytest/config.py", line 325, in _getconftestmodules
return self._path2confmods[path]
KeyError: local('/home/elvis/CODE/FORKED/modl/modl')
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/elvis/anaconda2/envs/py3k/lib/python3.6/site-packages/_pytest/config.py", line 356, in _importconftest
return self._conftestpath2mod[conftestpath]
KeyError: local('/home/elvis/CODE/FORKED/modl/modl/conftest.py')
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/elvis/anaconda2/envs/py3k/lib/python3.6/site-packages/_pytest/config.py", line 362, in _importconftest
mod = conftestpath.pyimport()
File "/home/elvis/anaconda2/envs/py3k/lib/python3.6/site-packages/py/_path/local.py", line 662, in pyimport
__import__(modname)
File "/home/elvis/CODE/FORKED/modl/modl/__init__.py", line 1, in <module>
from .decomposition.dict_fact import DictFact
File "/home/elvis/CODE/FORKED/modl/modl/decomposition/__init__.py", line 1, in <module>
from .dict_fact import DictFact
File "/home/elvis/CODE/FORKED/modl/modl/decomposition/dict_fact.py", line 13, in <module>
from modl.utils.randomkit import RandomState
File "/home/elvis/CODE/FORKED/modl/modl/utils/randomkit/__init__.py", line 1, in <module>
from .random_fast import RandomState
ModuleNotFoundError: No module named 'modl.utils.randomkit.random_fast'
ERROR: could not load /home/elvis/CODE/FORKED/modl/modl/conftest.py
Removing the offending file modl/conftest.py
and rerunning the tests works fine.
Installation seems to work all the way, up to:
py.test --pyargs modl
I've uploaded the logfile,
Thanks
Nico
vagrant@deep-learning:~/modl$ python examples/predict_recsys.py
Centering data
Sparsity: 0.0335217561064 X.nnz: 800167 N features: 3952 n_samples 6040
Iteration 0
('Test RMSE: ', 39.876633749822602)
Iteration 120
('Test RMSE: ', 48.440039171205285)
Iteration 720
('Test RMSE: ', 26.586114834502165)
Iteration 3630
('Test RMSE: ', 592.60091116572414)
Iteration 18110
('Test RMSE: ', 29932.618585930453)
('Final test RMSE:', 34662.550885905417)
Time : 1111.89 s
Unfortunately, the RMSE is increasing :(
Any suggestions? Am I doing something wrong?
Thanks.
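A test RMSE diverging to this magnitude usually means the predictions themselves are blowing up (for instance, a step-size schedule that is too aggressive for the data scale) rather than a bug in the evaluation. For reference, a minimal sketch of how test RMSE is typically computed for matrix completion, over observed entries only (illustrative names, not the actual predict_recsys.py code):

```python
# Minimal sketch of test-set RMSE for matrix completion: the error is
# measured only on the observed (nonzero) test entries. Names here are
# illustrative, not modl's actual recsys evaluation code.
import numpy as np
import scipy.sparse as sp

def masked_rmse(X_test, X_pred):
    """RMSE over the nonzero (observed) entries of the sparse test matrix."""
    X_test = sp.csr_matrix(X_test)
    rows, cols = X_test.nonzero()
    pred = np.asarray(X_pred)[rows, cols]
    true = np.asarray(X_test[rows, cols]).ravel()
    return np.sqrt(np.mean((true - pred) ** 2))

# Toy check: entries outside the test support are ignored.
X = sp.csr_matrix(np.array([[5.0, 0.0], [0.0, 3.0]]))
print(masked_rmse(X, np.array([[5.0, 1.0], [9.0, 3.0]])))  # 0.0
```

If this masked RMSE is stable but the script's reported RMSE explodes, the divergence is in the learned factors, and lowering the learning rate (or increasing regularization) is the first thing to try.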
/usr/lib/python2.7/site-packages/py/_path/common.py:367: in visit
for x in Visitor(fil, rec, ignore, bf, sort).gen(self):
/usr/lib/python2.7/site-packages/py/_path/common.py:406: in gen
if p.check(dir=1) and (rec is None or rec(p))])
/usr/lib/python2.7/site-packages/_pytest/main.py:676: in _recurse
ihook = self.gethookproxy(path.dirpath())
/usr/lib/python2.7/site-packages/_pytest/main.py:587: in gethookproxy
my_conftestmodules = pm._getconftestmodules(fspath)
/usr/lib/python2.7/site-packages/_pytest/config.py:339: in _getconftestmodules
mod = self._importconftest(conftestpath)
/usr/lib/python2.7/site-packages/_pytest/config.py:364: in _importconftest
raise ConftestImportFailure(conftestpath, sys.exc_info())
E ConftestImportFailure: ImportError('No module named concurrent.futures',)
E File "/usr/lib64/python2.7/site-packages/modl/__init__.py", line 1, in <module>
E from .dict_fact import DictFact
E File "/usr/lib64/python2.7/site-packages/modl/dict_fact.py", line 1, in <module>
E from concurrent.futures import ThreadPoolExecutor
The code from lines 604 to 622 of dict_fact.py is really weird. Are we formally computing the projection of the atom onto the elastic-net unit ball of radius 1, or is it some kind of hack? For example, the call to enet_projection has radius=self.comp_norm_[k].
In any case, a detailed comment explaining what is being done would help future maintenance and extension.
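For reference, one common convention for the elastic-net ball (as in Mairal et al.'s online dictionary learning, which modl appears to follow; the exact weighting used by enet_projection should be confirmed against the code) is

```latex
\mathcal{B}_r \;=\; \bigl\{\, d \in \mathbb{R}^p \;:\; (1-\gamma)\,\lVert d\rVert_2^2 + \gamma\,\lVert d\rVert_1 \le r \,\bigr\},
\qquad
\operatorname{proj}_{\mathcal{B}_r}(u) \;=\; \operatorname*{arg\,min}_{d \in \mathcal{B}_r} \lVert u - d \rVert_2^2 .
```

Under this reading, radius=self.comp_norm_[k] projects atom k onto a ball whose radius equals that atom's stored norm rather than 1, which may well be deliberate (to avoid shrinking atoms that are already feasible), but that is exactly the kind of intent the requested comment should state.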
Hi, I am trying to install modl and get the above error. I checked under modl/datasets and there is no directory called tests. Could you please help me get past this and install modl?
Thanks,
Swetha.
I tried to test the partial_fit method on the Faces decomposition dataset from scikit-learn. Before using partial_fit, one must use the prepare method to initialize the component matrix. It works fine when I initialize with the data matrix,
estimator.prepare(n_samples=n_samples, X=data)
or with another method's output,
dic_init = np.array(FastICAEstimator.components_, dtype=np.float32)
estimator.prepare(n_samples=n_samples, X=dic_init)
But when I initialize with random noise, using
estimator.prepare(n_samples=n_samples, n_features=n_features, dtype=np.float32)
the output still looks like white noise and the convergence plot is flat.
I attach the script I used (the lines for saving figures can perhaps be removed) and the figures it generated: test_partial_fit_modl.zip
Thanks!
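One hypothetical workaround, given that the explicit-X calls above are reported to work, is to pass an explicit random matrix as X rather than letting prepare() draw its own initialization. The unit-norm scaling below is an assumption about what a reasonable dictionary init looks like, not modl's documented behaviour:

```python
# Hypothetical workaround: build an explicit random init and pass it as X
# to prepare(), mirroring the FastICA-initialized call that works. The
# unit-norm row scaling is an assumption, not modl's internal behaviour.
import numpy as np

rng = np.random.RandomState(0)
n_components, n_features = 18, 4096

dic_init = rng.randn(n_components, n_features).astype(np.float32)
dic_init /= np.linalg.norm(dic_init, axis=1, keepdims=True)  # unit-norm atoms

# estimator.prepare(n_samples=n_samples, X=dic_init)  # as with the FastICA init
print(dic_init.shape)  # (18, 4096)
```

If this explicit random init converges while the n_features-based call does not, that would localize the problem to prepare()'s internal random initialization (e.g. its scaling).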
Hi,
I was trying to run py.test after install on Ubuntu 14.04 vagrant box
vagrant@deep-learning:~/modl$ py.test --pyargs modl
Traceback (most recent call last):
File "/usr/local/lib/python2.7/dist-packages/_pytest/config.py", line 362, in _importconftest
mod = conftestpath.pyimport()
File "/usr/local/lib/python2.7/dist-packages/py/_path/local.py", line 662, in pyimport
__import__(modname)
File "/home/vagrant/modl/modl/__init__.py", line 1, in <module>
from .dict_fact import DictFact
File "/home/vagrant/modl/modl/dict_fact.py", line 8, in <module>
from .utils.randomkit import RandomState
File "/home/vagrant/modl/modl/utils/randomkit/__init__.py", line 1, in <module>
from .random_fast import RandomState
ImportError: No module named random_fast
ERROR: could not load /home/vagrant/modl/modl/conftest.py
Could you please advise?
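random_fast is a compiled Cython extension, so this ImportError usually means the tests are importing the source checkout, in which the extensions were never built. A sketch of the usual fixes, assuming a standard setuptools/Cython layout:

```shell
# Option 1: build the Cython extensions inside the source tree, then test.
python setup.py build_ext --inplace
py.test --pyargs modl

# Option 2: run the tests from outside the checkout, so the installed,
# compiled modl package is imported instead of the source directory.
cd /tmp && py.test --pyargs modl
```

This also explains the earlier report where deleting modl/conftest.py "fixed" the tests: removing it stopped pytest from importing the uncompiled in-tree package.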
The commands
make download-movielens1m
make download-movielens10m
can't download the datasets: both return a 404 error for the URLs
http://www.mblondel.org/data/movielens1m.tar.bz2
http://www.mblondel.org/data/movielens10m.tar.bz2
Hi,
I was just trying to run the face_decompose example, but I get the output below. It seems there is an issue recognizing datatypes in numpy? I'm using numpy/sklearn/nilearn and have never experienced this before.
Thanks,
Nico
Dataset consists of 400 faces
Extracting the top 18 MODL...
TypeError Traceback (most recent call last)
in ()
93 random_state=2,
94 callback=cb)
---> 95 estimator.fit(data)
96 train_time = (time.time() - t0)
97 print("done in %0.3fs" % train_time)
/home/nfarrugi/anaconda2/envs/fmri_new/lib/python2.7/site-packages/modl/dict_fact.pyc in fit(self, X, y)
244 Dataset to learn the dictionary from
245 """
--> 246 X = self._prefit(X, reset=True)
247 if self.max_n_iter > 0:
248 while self.n_iter[0] + self.batch_size - 1 < self.max_n_iter:
/home/nfarrugi/anaconda2/envs/fmri_new/lib/python2.7/site-packages/modl/dict_fact.pyc in _prefit(self, X, reset, check_input)
262 "with backend == 'python'")
263
--> 264 self._init(X)
265 self._init_arrays(X)
266 if check_input:
/home/nfarrugi/anaconda2/envs/fmri_new/lib/python2.7/site-packages/modl/dict_fact.pyc in _init(self, X)
184 self.counter = np.zeros(n_cols + 1, dtype='int')
185
--> 186 self.n_iter_ = np.zeros(1, dtype=np.long)
187
188 self.code_ = np.zeros((self.n_samples_, self.n_components))
TypeError: data type "long" not understood
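np.long was just an alias for the platform/Python 2 "long" and is not understood by every NumPy build (the alias was later removed from NumPy's namespace in 1.24). An explicitly sized dtype is portable:

```python
# Portable fix sketch: replace the platform-dependent np.long alias with
# an explicitly sized integer dtype.
import numpy as np

n_iter_ = np.zeros(1, dtype=np.int64)  # instead of dtype=np.long
print(n_iter_.dtype)  # int64
```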
When an atom has a small norm, you want to remove it, replace it by another atom, and set the corresponding code to 0 (this is what is done in _update_dictionary).
In the online version, you maintain two statistics A and B, and everything happens as if A were the code and B the data. However, when a dictionary atom is too small, setting the corresponding code to zero and replacing a line of A by zero are not the same thing. Do you agree?
Hugo
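For reference, in the standard online dictionary-learning formulation (Mairal et al., 2010), which modl's A/B statistics appear to follow, the updates are

```latex
A_t = A_{t-1} + \alpha_t \alpha_t^{\top}, \qquad
B_t = B_{t-1} + x_t \alpha_t^{\top}, \qquad
D_t = \operatorname*{arg\,min}_{D \in \mathcal{C}}
      \tfrac{1}{2}\operatorname{Tr}\bigl(D^{\top} D A_t\bigr)
      - \operatorname{Tr}\bigl(D^{\top} B_t\bigr).
```

Under this formulation the concern seems well founded: zeroing row and column k of A removes atom k's influence from the quadratic term only, whereas retroactively setting the code alpha_k to zero would also zero column k of B, since that column accumulates the cross-products x_t alpha_{t,k}.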
Beyond the call to sklearn's enet_coordinate_descent_gram in dict_fact_fast.pyx, I wonder whether the remaining Cython code is really necessary. Without benchmarks, I would have thought the main bottleneck is the coordinate descent, which is already Cythonized in sklearn, and not the for loop over range(batch_size), etc. But I may be wrong...
I am running on Debian 8 (Jessie).
While trying to run the tests, the following fails:
anaconda2/lib/python2.7/site-packages/modl/tests/test_dict_completion.py:54:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
self = DictCompleter(alpha=None, backend=None, batch_size=None, beta=None,
cal...chs=None, offset=None,
projection=None, random_state=None, verbose=None)
alpha = 0.001, beta = 0.0, n_components = 3, learning_rate = 1.0, batch_size = 1, offset = 0, projection = 'partial'
fit_intercept = False, dict_init = None, l1_ratio = 0, max_n_iter = 100, n_epochs = 1, random_state = 0, verbose = 0, backend = 'python'
debug = False, detrend = True, crop = None, callback = None
def __init__(self, alpha=1.0, beta=.0,
n_components=30, learning_rate=1.,
batch_size=1, offset=0,
projection='partial',
fit_intercept=False, dict_init=None, l1_ratio=0,
max_n_iter=0,
n_epochs=1,
random_state=None, verbose=0, backend='c', debug=False,
detrend=False,
crop=None,
callback=None):
> super().__init__(alpha=alpha,
n_components=n_components,
# Hyper-parameters
learning_rate=learning_rate,
batch_size=batch_size,
offset=offset,
# Reduction parameter
reduction=1,
projection=projection,
fit_intercept=fit_intercept,
# Dict parameter
dict_init=dict_init,
l1_ratio=l1_ratio,
# For variance reduction
n_samples=None,
# Generic parameters
max_n_iter=max_n_iter,
n_epochs=n_epochs,
random_state=random_state,
verbose=verbose,
backend=backend,
debug=debug,
callback=callback)
E TypeError: super() takes at least 1 argument (0 given)
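The failing call uses Python 3's zero-argument super(), which is a SyntaxError-free but invalid form under Python 2: there, the class and instance must be passed explicitly. A minimal illustration with toy classes (DictCompleter's real base class differs):

```python
# Python-2-compatible super() needs explicit arguments. Toy classes for
# illustration only; modl's actual class hierarchy differs.
class Base(object):
    def __init__(self, alpha=1.0):
        self.alpha = alpha

class Derived(Base):
    def __init__(self, alpha=1.0):
        # Works on both Python 2 and 3:
        super(Derived, self).__init__(alpha=alpha)
        # The failing form, valid on Python 3 only:
        # super().__init__(alpha=alpha)

d = Derived(alpha=0.5)
print(d.alpha)  # 0.5
```

So either DictCompleter.__init__ should call super(DictCompleter, self).__init__(...), or the package should declare itself Python-3-only.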
Hello Arthur!
Is it possible to get SVD results for a random matrix via the modl package?
Could you show me an example?
Thank you in advance,
Olexiy
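A note on the question: DictFact learns a factorization X ≈ code · components, i.e. a low-rank approximation, not an SVD with orthonormal factors. For an actual truncated SVD of a random matrix, NumPy already suffices; the sketch below shows the NumPy baseline, with a rough (hypothetical, unverified) DictFact counterpart commented for comparison:

```python
# Truncated SVD of a random matrix with NumPy. modl's DictFact yields a
# comparable low-rank factorization, but its factors are not orthogonal.
import numpy as np

rng = np.random.RandomState(0)
X = rng.randn(100, 50)

k = 10  # number of singular triplets to keep
U, s, Vt = np.linalg.svd(X, full_matrices=False)
X_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k]  # best rank-k approximation

# Roughly comparable modl usage (a sketch, not an SVD):
# from modl.decomposition.dict_fact import DictFact
# est = DictFact(n_components=k).fit(X)
# X_approx = est.transform(X) @ est.components_

print(np.linalg.norm(X - X_k) / np.linalg.norm(X))
```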
python setup.py install
fails with the error message:
error: Command "gcc -pthread -fno-strict-aliasing -g -O2 -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I/volatile/home/edohmato/anaconda2/lib/python2.7/site-packages/numpy/core/include -I/volatile/home/edohmato/anaconda2/lib/python2.7/site-packages/numpy/core/include -I/volatile/home/edohmato/anaconda2/include/python2.7 -c modl/utils/math/enet.c -o build/temp.linux-x86_64-2.7/modl/utils/math/enet.o" failed with exit status 1
For the record, upgrading Cython to the latest version (0.25.2), e.g. with pip install cython --upgrade, fixes the problem.