
ibayer / fastFM


fastFM: A Library for Factorization Machines

Home Page: http://ibayer.github.io/fastFM

License: Other

Languages: Makefile 0.29%, Python 98.41%, Shell 1.30%
Topics: machine-learning, recommender-system, factorization-machines, matrix-factorization


fastfm's Issues

Pip install: Problems on OSX (Symbol not found: _cs_di_norm)

I installed from PyPI ("pip install fastfm") and get the following error:

Python 3.5.1 (default, Dec  9 2015, 00:25:02) 
[GCC 4.2.1 Compatible Apple LLVM 7.0.0 (clang-700.1.76)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import ffm
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ImportError: dlopen(/Users/merrellb/.virtualenv/python3/lib/python3.5/site-packages/ffm.cpython-35m-darwin.so, 2): Symbol not found: _cs_di_norm
  Referenced from: /Users/merrellb/.virtualenv/python3/lib/python3.5/site-packages/ffm.cpython-35m-darwin.so
  Expected in: flat namespace
 in /Users/merrellb/.virtualenv/python3/lib/python3.5/site-packages/ffm.cpython-35m-darwin.so
>>> 

Issues compiling with OS X El Capitan (10.11.3)

I've tried both GCC 5.3.0 and Apple LLVM version 7.0.2 and get the same results:

$ make
( cd fastFM-core ; /Applications/Xcode.app/Contents/Developer/usr/bin/make lib )
( cd src ; /Applications/Xcode.app/Contents/Developer/usr/bin/make lib )
gcc -std=c99 -fPIC -g -Wall -O3 -I/include -I./src -I../externals/CXSparse/Include -I   -c -o kmath.o kmath.c
kmath.c:23:9: warning: 'M_SQRT2' macro redefined [-Wmacro-redefined]
#define M_SQRT2 1.41421356237309504880  /*-- sqrt(2) */
        ^
/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.11.sdk/usr/include/math.h:709:9: note: 
      previous definition is here
#define M_SQRT2     1.41421356237309504880168872420969808   /* sqrt(2)        */
        ^
1 warning generated.
Undefined symbols for architecture x86_64:
  "_main", referenced from:
     implicit entry/start for main executable
ld: symbol(s) not found for architecture x86_64
clang: error: linker command failed with exit code 1 (use -v to see invocation)
make[2]: *** [kmath.o] Error 1
make[1]: *** [lib] Error 2
make: *** [all] Error 2

sgd.FMRegression always predicts NaN values

I trained an SGD FM regression model, and when I made predictions with it, all the predicted values were NaN. A quick look at the model's learned coefficients (i.e., w0_, w_, and V_) shows that most of them are NaN.

from __future__ import division
from fastFM import sgd

model = sgd.FMRegression()
model.fit(Xtrain, ytrain)
model.predict(Xvalid)  # all predicted values are NaNs.

# Both Xtrain and Xvalid are CSC sparse matrices.

MCMC and ALS work fine and actually produce reasonably good predictions for my particular task. Any thoughts on possible problems with SGD?

OS: Ubuntu 16.04
Python version: 2.7.12
fastFM version: 0.2.5
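
A hedged experiment that sometimes helps with SGD divergence (not a confirmed fix; the constructor parameters follow the sgd examples quoted elsewhere on this page, and Xtrain/ytrain/Xvalid are the matrices from the report above):

from fastFM import sgd

# Sketch only: SGD can diverge (yielding NaN coefficients) when the step size
# is too large for the feature/target scale. Lowering step_size and raising
# n_iter is one thing worth trying.
model = sgd.FMRegression(n_iter=100000, init_stdev=0.01, rank=8,
                         l2_reg_w=0.1, l2_reg_V=0.1, step_size=0.001)
model.fit(Xtrain, ytrain)            # Xtrain, ytrain: CSC matrices as above
print(model.predict(Xvalid)[:10])    # check whether predictions are finite now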

Error Installing on CentOS and RHEL platforms

I followed all the install directions. Running make for the C parts seemed to finish without error, but running python setup.py install produces the following error. I am on 64-bit Linux (Mint). Any ideas?

python setup.py install
running install
running build
running build_py
running build_ext
skipping 'fastFM/ffm.c' Cython extension (up-to-date)
building 'ffm' extension
gcc -pthread -fno-strict-aliasing -g -O2 -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -IfastFM/ -IfastFM-core/include/ -IfastFM-core/externals/CXSparse/Include/ -I/usr/include/ -I/home/anaconda/lib/python2.7/site-packages/numpy/core/include -I/home/anaconda/include/python2.7 -c fastFM/ffm.c -o build/temp.linux-x86_64-2.7/fastFM/ffm.o
In file included from /home/anaconda/lib/python2.7/site-packages/numpy/core/include/numpy/ndarraytypes.h:1781:0,
from /home/anaconda/lib/python2.7/site-packages/numpy/core/include/numpy/ndarrayobject.h:18,
from /home/anaconda/lib/python2.7/site-packages/numpy/core/include/numpy/arrayobject.h:4,
from fastFM/ffm.c:252:
/home/anaconda/lib/python2.7/site-packages/numpy/core/include/numpy/npy_1_7_deprecated_api.h:15:2: warning: #warning "Using deprecated NumPy API, disable it by " "#defining NPY_NO_DEPRECATED_API NPY_1_7_API_VERSION" [-Wcpp]
#warning "Using deprecated NumPy API, disable it by "
^
In file included from /home/anaconda/lib/python2.7/site-packages/numpy/core/include/numpy/ndarrayobject.h:27:0,
from /home/anaconda/lib/python2.7/site-packages/numpy/core/include/numpy/arrayobject.h:4,
from fastFM/ffm.c:252:
/home/anaconda/lib/python2.7/site-packages/numpy/core/include/numpy/__multiarray_api.h:1634:1: warning: ‘_import_array’ defined but not used [-Wunused-function]
_import_array(void)
^
gcc -pthread -shared -L/home/anaconda/lib -Wl,-rpath=/home/anaconda/lib,--no-as-needed build/temp.linux-x86_64-2.7/fastFM/ffm.o -LfastFM/ -LfastFM-core/bin/ -LfastFM-core/externals/CXSparse/Lib/ -L/usr/lib/ -L/usr/lib/atlas-base/ -L/home/anaconda/lib -lm -lfastfm -lcxsparse -lcblas -lpython2.7 -o build/lib.linux-x86_64-2.7/ffm.so
/usr/bin/ld: cannot find -lfastfm
collect2: error: ld returned 1 exit status
error: command 'gcc' failed with exit status 1

nosetest failed

ubgpu@ubgpu:~/github$ sudo pip install -e fastFM/
Obtaining file:///home/ubgpu/github/fastFM
Installing collected packages: fastFM
Running setup.py develop for fastFM
Successfully installed fastFM
ubgpu@ubgpu:~/github$

ubgpu@ubgpu:~/github/fastFM/fastFM/tests$ sudo pip install nose
Requirement already satisfied (use --upgrade to upgrade): nose in /usr/lib/python3/dist-packages
ubgpu@ubgpu:~/github/fastFM/fastFM/tests$
ubgpu@ubgpu:~/github/fastFM/fastFM/tests$
ubgpu@ubgpu:~/github/fastFM/fastFM/tests$ sudo pip2 install nose
You are using pip version 7.0.3, however version 7.1.0 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.
Requirement already satisfied (use --upgrade to upgrade): nose in /usr/lib/python2.7/dist-packages
ubgpu@ubgpu:~/github/fastFM/fastFM/tests$
ubgpu@ubgpu:~/github/fastFM/fastFM/tests$
ubgpu@ubgpu:~/github/fastFM/fastFM/tests$
ubgpu@ubgpu:~/github/fastFM/fastFM/tests$ nosetests

EEEEEEEEE

ERROR: Failure: ImportError (No module named fastFM)

Traceback (most recent call last):
File "/usr/lib/python2.7/dist-packages/nose/loader.py", line 411, in loadTestsFromName
addr.filename, addr.module)
File "/usr/lib/python2.7/dist-packages/nose/importer.py", line 47, in importFromPath
return self.importFromDir(dir_path, fqname)
File "/usr/lib/python2.7/dist-packages/nose/importer.py", line 94, in importFromDir
mod = load_module(part_fqname, fh, filename, desc)
File "/home/ubgpu/github/fastFM/fastFM/tests/test_als.py", line 7, in
from fastFM import als
ImportError: No module named fastFM

ERROR: Failure: ImportError (No module named fastFM)

Traceback (most recent call last):
File "/usr/lib/python2.7/dist-packages/nose/loader.py", line 411, in loadTestsFromName
addr.filename, addr.module)
File "/usr/lib/python2.7/dist-packages/nose/importer.py", line 47, in importFromPath
return self.importFromDir(dir_path, fqname)
File "/usr/lib/python2.7/dist-packages/nose/importer.py", line 94, in importFromDir
mod = load_module(part_fqname, fh, filename, desc)
File "/home/ubgpu/github/fastFM/fastFM/tests/test_base.py", line 7, in
from fastFM import als
ImportError: No module named fastFM

ERROR: Failure: ImportError (No module named fastFM.datasets)

Traceback (most recent call last):
File "/usr/lib/python2.7/dist-packages/nose/loader.py", line 411, in loadTestsFromName
addr.filename, addr.module)
File "/usr/lib/python2.7/dist-packages/nose/importer.py", line 47, in importFromPath
return self.importFromDir(dir_path, fqname)
File "/usr/lib/python2.7/dist-packages/nose/importer.py", line 94, in importFromDir
mod = load_module(part_fqname, fh, filename, desc)
File "/home/ubgpu/github/fastFM/fastFM/tests/test_datasets.py", line 4, in
from fastFM.datasets import make_user_item_regression
ImportError: No module named fastFM.datasets

ERROR: Failure: ImportError (No module named ffm)

Traceback (most recent call last):
File "/usr/lib/python2.7/dist-packages/nose/loader.py", line 411, in loadTestsFromName
addr.filename, addr.module)
File "/usr/lib/python2.7/dist-packages/nose/importer.py", line 47, in importFromPath
return self.importFromDir(dir_path, fqname)
File "/usr/lib/python2.7/dist-packages/nose/importer.py", line 94, in importFromDir
mod = load_module(part_fqname, fh, filename, desc)
File "/home/ubgpu/github/fastFM/fastFM/tests/test_ffm.py", line 8, in
import ffm
ImportError: No module named ffm

ERROR: Failure: ImportError (No module named fastFM)

Traceback (most recent call last):
File "/usr/lib/python2.7/dist-packages/nose/loader.py", line 411, in loadTestsFromName
addr.filename, addr.module)
File "/usr/lib/python2.7/dist-packages/nose/importer.py", line 47, in importFromPath
return self.importFromDir(dir_path, fqname)
File "/usr/lib/python2.7/dist-packages/nose/importer.py", line 94, in importFromDir
mod = load_module(part_fqname, fh, filename, desc)
File "/home/ubgpu/github/fastFM/fastFM/tests/test_mcmc.py", line 7, in
from fastFM import mcmc
ImportError: No module named fastFM

ERROR: Failure: ImportError (No module named fastFM)

Traceback (most recent call last):
File "/usr/lib/python2.7/dist-packages/nose/loader.py", line 411, in loadTestsFromName
addr.filename, addr.module)
File "/usr/lib/python2.7/dist-packages/nose/importer.py", line 47, in importFromPath
return self.importFromDir(dir_path, fqname)
File "/usr/lib/python2.7/dist-packages/nose/importer.py", line 94, in importFromDir
mod = load_module(part_fqname, fh, filename, desc)
File "/home/ubgpu/github/fastFM/fastFM/tests/test_ranking.py", line 6, in
from fastFM import bpr
ImportError: No module named fastFM

ERROR: Failure: ImportError (No module named fastFM)

Traceback (most recent call last):
File "/usr/lib/python2.7/dist-packages/nose/loader.py", line 411, in loadTestsFromName
addr.filename, addr.module)
File "/usr/lib/python2.7/dist-packages/nose/importer.py", line 47, in importFromPath
return self.importFromDir(dir_path, fqname)
File "/usr/lib/python2.7/dist-packages/nose/importer.py", line 94, in importFromDir
mod = load_module(part_fqname, fh, filename, desc)
File "/home/ubgpu/github/fastFM/fastFM/tests/test_sgd.py", line 7, in
from fastFM import sgd
ImportError: No module named fastFM

ERROR: Failure: ImportError (No module named fastFM.bpr)

Traceback (most recent call last):
File "/usr/lib/python2.7/dist-packages/nose/loader.py", line 411, in loadTestsFromName
addr.filename, addr.module)
File "/usr/lib/python2.7/dist-packages/nose/importer.py", line 47, in importFromPath
return self.importFromDir(dir_path, fqname)
File "/usr/lib/python2.7/dist-packages/nose/importer.py", line 94, in importFromDir
mod = load_module(part_fqname, fh, filename, desc)
File "/home/ubgpu/github/fastFM/fastFM/tests/test_transform.py", line 10, in
from fastFM.bpr import FMRecommender
ImportError: No module named fastFM.bpr

ERROR: Failure: ImportError (No module named fastFM.utils)

Traceback (most recent call last):
File "/usr/lib/python2.7/dist-packages/nose/loader.py", line 411, in loadTestsFromName
addr.filename, addr.module)
File "/usr/lib/python2.7/dist-packages/nose/importer.py", line 47, in importFromPath
return self.importFromDir(dir_path, fqname)
File "/usr/lib/python2.7/dist-packages/nose/importer.py", line 94, in importFromDir
mod = load_module(part_fqname, fh, filename, desc)
File "/home/ubgpu/github/fastFM/fastFM/tests/test_utils.py", line 5, in
from fastFM.utils import kendall_tau
ImportError: No module named fastFM.utils


Ran 9 tests in 0.305s

FAILED (errors=9)
ubgpu@ubgpu:~/github/fastFM/fastFM/tests$

Problem running MCMC example code

Hi,

I'm trying to run your example code for the MCMC method. I downloaded the 100k MovieLens dataset and generated libSVM-format versions of the u1.base and u1.test files using the srendle/libFM script, but I'm getting the following error when trying to load the data with sklearn.datasets.load_svmlight_file:

ValueError: Feature indices in SVMlight/LibSVM data file should be sorted and unique.

I tried re-sorting the rows by the index of both features (user id, item id), but that didn't help. I'm sure it's some silly mistake I'm making. Could you help me with this issue, please? Could I perhaps get the files you use in the example so I can check the format?

Thanks.

Regards.
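
For reference, a minimal sketch of writing (user, item, rating) triples in a libSVM layout that load_svmlight_file accepts: per row, feature indices must be strictly increasing and unique, so one common layout puts users at indices 1..n_users and items at n_users+1..n_users+n_items. This is an illustration only, not the exact script used for the published example; write_libsvm and its arguments are hypothetical names.

# Sketch: write one-hot (user, item) rows with sorted, unique feature indices.
def write_libsvm(path, ratings, n_users):
    with open(path, "w") as f:
        for user, item, rating in ratings:   # 1-based user/item ids assumed
            # user index < n_users + item index, so indices are sorted per row
            f.write("%g %d:1 %d:1\n" % (rating, user, n_users + item))

# afterwards:
# from sklearn.datasets import load_svmlight_file
# X, y = load_svmlight_file("u1_base.libsvm")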

Cannot compile fastFM on Windows 10

Good morning,
Unfortunately, I was not able to compile and install fastFM on my Windows 10 x64 machine.
I am using Python 2.7.11 |Anaconda 2.3.0 (64-bit)| (default, Feb 16 2016, 09:58:36) [MSC v.1500 64 bit (AMD64)].
I tried with different compilers: g++, w64-mingw32, and Microsoft Visual C++ Compiler for Python 2.7.
Your help would be greatly appreciated.
Regards

error: command 'x86_64-linux-gnu-gcc' failed with exit status 1

I followed all the build instructions and have all requirements up to date; however, when I run 'make' I get the following error:

/usr/bin/ld: cannot find -lgsl
/usr/bin/ld: cannot find -lgslcblas
collect2: error: ld returned 1 exit status
error: command 'x86_64-linux-gnu-gcc' failed with exit status 1
make: *** [all] Error 1

Understanding MCMC and n_more_iter

MCMC seems to be sensitive to the number of iterations between fit_predict calls. With n_more_iter=1, the best RMSE is 0.884, reached at step 27. With n_more_iter=10, the RMSE is 0.860 and still improving after 10 calls (equivalent to 100 iterations at a step of 1). The documentation mentions "We can warm_start every fastFM model which allows us to calculate custom statistics during the model fitting process efficiently", which seems to suggest this sensitivity shouldn't be present.

Code excerpts and results below:

fm = mcmc.FMRegression(n_iter=0, rank=10)
fm.fit_predict(X_train, y_train, X_test)
for i in range(100):
    y_pred = fm.fit_predict(X_train, y_train, X_test, n_more_iter=1)
    y_pred[y_pred > 5] = 5
    y_pred[y_pred < 1] = 1
    print(i, np.sqrt(mean_squared_error(y_pred, y_test)))
0 1.04720819915
1 0.97778708587
2 0.948017085861
3 0.93420488937
4 0.927061672571
5 0.922935100294
6 0.920257539721
7 0.918207455438
8 0.916209819939
9 0.913894249208
10 0.911193471613
11 0.908216022258
12 0.905165877765
13 0.902210412052
14 0.899446393313
15 0.896925703595
16 0.894664102177
17 0.892659400427
18 0.890901483694
19 0.889378630713
20 0.888077107425
21 0.886984550257
22 0.886090970488
23 0.88538765354
24 0.884863438543
25 0.884506506975
26 0.884306239634
27 0.884247226983
28 0.884317448473
29 0.884505712397
30 0.884800603791
31 0.885189631326
32 0.885661840403
33 0.886204648292
34 0.886803484806
35 0.8874511923
36 0.888143165709
37 0.888869261405
38 0.889633023158
39 0.890425723873
40 0.891229955936
41 0.892040331705
42 0.892863900969
43 0.893707463792
44 0.894571302645
45 0.895459171301
46 0.896382680339
47 0.897345726293
48 0.8983542138
49 0.899390885256
50 0.900449139443
51 0.90153494656
52 0.902640912205
53 0.903777024948
54 0.904930834814
55 0.906096794935
56 0.907275765877
57 0.9084718919
58 0.909667324754
59 0.910873466425
60 0.912084967428
61 0.913287204773
62 0.914474364239
63 0.915653122817
64 0.916826161945
65 0.917994888944
66 0.919162248375
67 0.920329277189
68 0.92149676688
69 0.922659246729
70 0.923810308867
71 0.924949669371
72 0.926074311282
73 0.927179794147
74 0.928264040074
75 0.929326802974
76 0.930376434434
77 0.9314191654
78 0.932455845704
79 0.933482129994
80 0.93449735718
81 0.935486636427
82 0.936467682892
83 0.937443712518
84 0.938410321158
85 0.939356840954
86 0.94027321806
87 0.941148746045
88 0.942005000857
89 0.9428498981
90 0.943684777878
91 0.944508612458
92 0.945317518167
93 0.946115254746
94 0.946898829726
95 0.947669244301
96 0.948434140619
97 0.949193065287
98 0.949947628213
99 0.950688137695
fm = mcmc.FMRegression(n_iter=0, rank=10)
fm.fit_predict(X_train, y_train, X_test)
for i in range(10):
    y_pred = fm.fit_predict(X_train, y_train, X_test, n_more_iter=10)
    y_pred[y_pred > 5] = 5
    y_pred[y_pred < 1] = 1
    print(i, np.sqrt(mean_squared_error(y_pred, y_test)))
0 0.911849673248
1 0.902846141012
2 0.89065879739
3 0.880330818455
4 0.874373355886
5 0.870324418211
6 0.866544031989
7 0.863735153323
8 0.861829622252
9 0.860483981533

Verbose Optimizer Output

Is there any way to see the details of the optimization process, either during the run or as a summary?
I am currently having trouble with a dying kernel when I increase n_iter, and it's impossible to find the cause without any debug output.

Symbol not found: _PyBytes_Type error running nosetests

I thought I had managed to get this all working on a Mac until I ran the nose tests for the python wrapper. I get the following error

======================================================================
ERROR: Failure: ImportError (dlopen(/Users/mellypang/eduvee-data/fastfm/fastFM/ffm.so, 2): Symbol not found: _PyBytes_Type
  Referenced from: /Users/mellypang/eduvee-data/fastfm/fastFM/ffm.so
  Expected in: dynamic lookup
)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/mellypang/anaconda/envs/fastfm/lib/python2.7/site-packages/nose/loader.py", line 418, in loadTestsFromName
    addr.filename, addr.module)
  File "/Users/mellypang/anaconda/envs/fastfm/lib/python2.7/site-packages/nose/importer.py", line 47, in importFromPath
    return self.importFromDir(dir_path, fqname)
  File "/Users/mellypang/anaconda/envs/fastfm/lib/python2.7/site-packages/nose/importer.py", line 94, in importFromDir
    mod = load_module(part_fqname, fh, filename, desc)
  File "/Users/mellypang/eduvee-data/fastfm/fastFM/fastFM/tests/test_als.py", line 7, in <module>
    from fastFM import als
  File "/Users/mellypang/eduvee-data/fastfm/fastFM/fastFM/als.py", line 6, in <module>
    from base import FactorizationMachine, BaseFMClassifier,\
  File "/Users/mellypang/eduvee-data/fastfm/fastFM/fastFM/base.py", line 5, in <module>
    import ffm
ImportError: dlopen(/Users/mellypang/eduvee-data/fastfm/fastFM/ffm.so, 2): Symbol not found: _PyBytes_Type
  Referenced from: /Users/mellypang/eduvee-data/fastfm/fastFM/ffm.so
  Expected in: dynamic lookup

All the other steps seem to run fine: fastFM-core compiles and its tests pass, and pip freeze contains the fastFM package:

backports.ssl-match-hostname==3.4.0.2
certifi==2015.4.28
Cython==0.22
-e git+https://github.com/ibayer/fastFM.git@15b3a451978315d298348778ddc9de11b9e05f8f#egg=fastFM-master
functools32==3.2.3.post2
gnureadline==6.3.3
ipython==3.2.1
Jinja2==2.8
jsonschema==2.5.1
MarkupSafe==0.23
mistune==0.7
numpy==1.9.1
pandas==0.15.2
ptyprocess==0.5
Pygments==2.0.2
python-dateutil==2.4.2
pytz==2015.4
pyzmq==14.7.0
scikit-learn==0.16.0
scipy==0.15.1
six==1.9.0
terminado==0.5
tornado==4.2.1

Has anyone else encountered a similar issue?

CentOS ImportError: ./ffm.so: undefined symbol: cblas_daxpy

After compiling and installing fastFM according to the install instructions (no errors occurred), I run:

python
>>> from fastFM import als

and get the error "ImportError: ./ffm.so: undefined symbol: cblas_daxpy".

What happened?

My OS environment:
CentOS 6.6
sudo yum install blas blas-devel lapack lapack-devel atlas atlas-devel had already completed.

Can't reproduce results on warm_start_mcmc.py

Hi,

I'm trying to reproduce the results presented in http://arxiv.org/pdf/1505.00641v2.pdf using the example source code warm_start_mcmc.py with the same hyperparameters, but the resulting plots for both the test RMSE and the hyperparameters do not look like the results in the paper. I've run a grid search for hyperparameters that reproduce the results, but did not succeed.

I'm using the MovieLens 100K dataset files u1.train and u1.test (the same as in the example code), re-generated in libSVM format (the file format was verified with the cjlin1/libsvm checkdata tool).

What do you think could be the problem? Attached are the plot I'm getting and the dataset files I'm using.

Archive.zip

figure_1

Thanks for your help,

Cheers
Santiago

support more scikit-learn versions

Using the sklearn.utils functions requires specific sklearn versions,
as these functions are not considered part of the official API and change often.

Change this so that all recent sklearn versions can be used.

Train rmse of each iter for mcmc

In libFM, there's a log of the train RMSE and test RMSE. How can I compute the train RMSE when using MCMC?

self.w0_, self.w_, self.V_ = coef

I saw this line in the mcmc module, which suggests I could compute predictions on the training data. Is there any plan to offer an option for computing the train RMSE?

The following approach does not work:

fm = mcmc.FMRegression(n_iter=0, init_stdev=0.1, rank=2)
for i in range(1, n_iter):
    y_pred = fm.fit_predict(X_train, y_train, X_test, n_more_iter=step_size)
    rmse_train.append(np.sqrt(mean_squared_error(fm.predict(X_train), y_train)))
    rmse_test.append(np.sqrt(mean_squared_error(fm.predict(X_test), y_test)))
logging.info(rmse_train)

which gives an error
AttributeError: 'FMRegression' object has no attribute 'w0_'
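
One hedged workaround (not an official feature): the third argument of fit_predict is simply the matrix that gets scored with the running average of the MCMC samples, so stacking train and test rows there yields train predictions without relying on predict()/w0_. A sketch, reusing X_train, y_train, X_test, y_test, n_iter, and step_size from the snippet above:

import numpy as np
import scipy.sparse as sp
from sklearn.metrics import mean_squared_error
from fastFM import mcmc

# Score train and test rows in the same fit_predict call, then split.
fm = mcmc.FMRegression(n_iter=0, init_stdev=0.1, rank=2)
X_eval = sp.vstack([X_train, X_test]).tocsc()
n_train = X_train.shape[0]

rmse_train, rmse_test = [], []
for i in range(1, n_iter):
    y_eval = fm.fit_predict(X_train, y_train, X_eval, n_more_iter=step_size)
    rmse_train.append(np.sqrt(mean_squared_error(y_train, y_eval[:n_train])))
    rmse_test.append(np.sqrt(mean_squared_error(y_test, y_eval[n_train:])))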

Assert input issparse_matrix

The check_array function currently used to validate the input doesn't assert
that the input is a sparse matrix, but the implementation currently supports only sparse input!
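
A minimal sketch of the kind of guard that could be added (the helper name is illustrative, not the actual fastFM code):

import scipy.sparse as sp

def _check_sparse_input(X):
    # The solvers operate on CSC sparse matrices, so reject dense input
    # explicitly instead of failing later inside the C code.
    if not sp.issparse(X):
        raise TypeError("fastFM expects a scipy sparse matrix, got %r" % type(X))
    return X.tocsc()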

Feature extraction from fastFM

Hello
Thanks for a great tool. Could I please ask a couple of questions about its functionality?
Is it possible:

  • to extract the weights/latent feature vectors from a fitted fastFM model
  • to transform the feature space into the latent feature space using a fitted fastFM model (see the sketch at the end of this issue)

The interesting part of the task is not only to predict the actual ratings/probabilities, but also to get some information about product and client clusters.

Thanks a lot in advance,
Best regards
Vladimir Litvinyuk
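
For reference, the fitted coefficients are exposed as attributes after fit (as other issues on this page show), and the latent projection is a sparse matrix product. A hedged sketch, assuming V_ has shape (rank, n_features) as in the manual-prediction snippet further down this page; als.FMRegression and the X_train/y_train names are used for illustration:

from fastFM import als

fm = als.FMRegression(n_iter=100, rank=8, l2_reg_w=0.1, l2_reg_V=0.1)
fm.fit(X_train, y_train)             # X_train: scipy CSC matrix

w0 = fm.w0_                          # global bias (scalar)
w = fm.w_                            # linear weights, shape (n_features,)
V = fm.V_                            # latent factors, assumed (rank, n_features)

# Transform samples into the latent feature space.
X_latent = X_train * V.T             # shape (n_samples, rank)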

MCMC vs ALS

I have tested three approaches to regression on the Movielens 1M dataset:
properMCMC - Passing the test data to fit_predict
badMCMC - Passing empty test data to fit_predict and then the actual test data to predict
properALS - Calling predict after fit much like "badMCMC"

properMCMC does better (lower RMSE) than badMCMC which in turn seems to do better than properALS (despite some tuning efforts). My questions:

  1. Should properALS be able to match properMCMC or does MCMC's additional internal state (more than one V_ and w_) provide intrinsic advantages?
  2. badMCMC seems to do better than properALS. Is this expected, or do I just need to do more tuning of ALS?
  3. My understanding is that the V_ and w_ chosen by badMCMC are arbitrary. Is there some way to choose the best, or to create some sort of aggregate?

My goal is to determine the single optimal V_ and w_ coefficients for a given set of data so that I can make future predictions (i.e., based on data that is NOT present at training time). I am struggling to understand whether it is better to take the "bad" MCMC approach, which still seems to be both convenient and accurate (although not as much as properMCMC), or whether it would be better to use ALS the "proper" way (and presumably invest a lot more time in tuning).

OpenBLAS support

It would be nice to support OpenBLAS, since it is faster and is also required by other packages, e.g. Theano.

At the moment, pip install fastFM fails with OpenBLAS on Ubuntu:

gcc -pthread -shared -L/moosefs/miniconda/envs/ipython_py2/lib -Wl,-rpath=/moosefs/miniconda/envs/ipython_py2/lib,--no-as-needed build/temp.linux-x86_64-2.7/fastFM/ffm.o -LfastFM/ -LfastFM-core/bin/ -LfastFM-core/externals/CXSparse/Lib/ -L/usr/lib/ -L/usr/lib/atlas-base/ -L/moosefs/miniconda/envs/ipython_py2/lib -lm -lfastfm -lcxsparse -lcblas -lpython2.7 -o build/lib.linux-x86_64-2.7/ffm.so
  /usr/bin/ld: cannot find -lcblas
  collect2: error: ld returned 1 exit status

I am not sure about the consequences of changing BLAS (last time it required reinstalling the whole SciPy stack), so I'd prefer not to touch it :)

I get an error when I try to import it

"dlopen(/Users/tanle/Desktop/recommender/lib/python2.7/site-packages/ffm.so, 2): Symbol not found: _cs_di_norm
Referenced from: /Users/tanle/Desktop/recommender/lib/python2.7/site-packages/ffm.so
Expected in: flat namespace
in /Users/tanle/Desktop/recommender/lib/python2.7/site-packages/ffm.so"

Segmentation Fault in BPR from python Interface

With the information I gained from #58 I can now create an error report. I decided to use gdb --args python2.7 test.py to debug, because the bug only appears when using the Python interface.
test.py:

import cPickle as pickle
from scipy import io,sparse
preferencesLocalArray = pickle.load(open("preferences.pickle","rb"))
features = io.mmread(open("sparse_features.mmw","rb"))
features = features.tocsc()

#shuffle pairwise preferences for train and test split
import numpy as np
np.random.seed(123L)
random_indices = np.random.permutation(preferencesLocalArray.shape[0])
preferencesLocalArrayShuffled = np.array(preferencesLocalArray[random_indices])

train_percentage = 95
trainIdx = range(int(preferencesLocalArray.shape[0]/100.0*train_percentage))
testIdx = range(int(preferencesLocalArray.shape[0]/100.0*train_percentage),preferencesLocalArray.shape[0])
posExamples = preferencesLocalArrayShuffled[testIdx,0]
negExamples = preferencesLocalArrayShuffled[testIdx,1]

from fastFM import bpr
import numpy as np
fm = bpr.FMRecommender(n_iter=500000,init_stdev=0.01,l2_reg_w=.2,l2_reg_V=1.,step_size=.01,rank=100, random_state=11)

fm.fit(features,preferencesLocalArrayShuffled[140615:140616])

When fitting BPR with a big sparse feature matrix (just user and item one-hot encoded), the solver accesses non-existent memory here:
ffm_sgd.c

int p_n = Ap[sample_row_n];
int p_p = Ap[sample_row_p];
while (p_n < Ap[sample_row_n + 1] || p_p < Ap[sample_row_p + 1]) {
  double grad = 0;
  int i_to_update = Ai[p_p] <= Ai[p_n] ? Ai[p_p] : Ai[p_n];
  double theta_w = coef->w->data[i_to_update];
  // incrementing the smaller index or both if equal
  if (Ai[p_p] == i_to_update) {
    grad = Ax[p_p]; <------------------------------------------------------------ HERE
    p_p++;
  }
  if (Ai[p_n] == i_to_update) {
    grad -= Ax[p_n];
    p_n++;
  }

the gdb backtrace:

#0  ffm_fit_sgd_bpr (coef=coef@entry=0x7fffffffd520, A=A@entry=0x496d780, pairs=pairs@entry=0x7fffffffd4e0, param=...) at ffm_sgd.c:96
#1  0x00007fffdf7fbd34 in ffm_sgd_bpr_fit (w_0=w_0@entry=0x7fffffffd5e8, w=<optimized out>, V=<optimized out>, X=X@entry=0x496d780,
    pairs=0x4a2a2a0, n_pairs=1000000, param=param@entry=0x4a1b090) at ffm.c:131
#2  0x00007fffdf7e9c10 in __pyx_pf_3ffm_10ffm_fit_sgd_bpr (__pyx_v_fm=__pyx_v_fm@entry=0x7fffd4c10650,
    __pyx_v_X=__pyx_v_X@entry=0x7fffcef43a10, __pyx_v_pairs=0x7fffd4c02350, __pyx_self=<optimized out>) at fastFM/ffm.c:4748
#3  0x00007fffdf7eb5eb in __pyx_pw_3ffm_11ffm_fit_sgd_bpr (__pyx_self=<optimized out>, __pyx_args=<optimized out>,
    __pyx_kwds=<optimized out>) at fastFM/ffm.c:4468
#4  0x00007ffff7af6138 in PyEval_EvalFrameEx () from /usr/lib64/libpython2.7.so.1.0
#5  0x00007ffff7af6612 in PyEval_EvalFrameEx () from /usr/lib64/libpython2.7.so.1.0
#6  0x00007ffff7af77cd in PyEval_EvalCodeEx () from /usr/lib64/libpython2.7.so.1.0
#7  0x00007ffff7af78d2 in PyEval_EvalCode () from /usr/lib64/libpython2.7.so.1.0
#8  0x00007ffff7b1055f in ?? () from /usr/lib64/libpython2.7.so.1.0
#9  0x00007ffff7b1167e in PyRun_FileExFlags () from /usr/lib64/libpython2.7.so.1.0
#10 0x00007ffff7b127e9 in PyRun_SimpleFileExFlags () from /usr/lib64/libpython2.7.so.1.0
#11 0x00007ffff7b234bf in Py_Main () from /usr/lib64/libpython2.7.so.1.0
#12 0x00007ffff6d52b15 in __libc_start_main () from /lib64/libc.so.6
#13 0x00000000004006f1 in _start ()
(gdb) frame 0  
#0  ffm_fit_sgd_bpr (coef=coef@entry=0x7fffffffd520, A=A@entry=0x496d780, pairs=pairs@entry=0x7fffffffd4e0, param=...) at ffm_sgd.c:96
96              grad = Ax[p_p];

some variables that are defined in frame 0

(gdb) info locals
grad = 0
i_to_update = 0
theta_w = -0.002142975917008301
comparison_row = 140615
sample_row_p = 4977937
sample_row_n = 24502
pairs_err = -0.48447366131550462
p_n = 49005
p_p = 9956862
i = 140615
p = <optimized out>
Ap = 0x7fffc0716010
Ai = 0x7fffc1a14010
Ax = 0x7fffc4010010
step_size = 0.01
n_comparisons = 1000000
k = 100

(gdb) print A->n
$82 = 4978177
(gdb) print A->m
$83 = 1972040
(gdb) print A->p

#some tests:
(gdb) print Ax[p_n]
$58 = 1
(gdb) print Ax[p_p]
Cannot access memory at address 0x7fffc8c07000
(gdb) print Ax[p_p-1]
$59 = 0
(gdb) print Ax[p_p+1]
Cannot access memory at address 0x7fffc8c07008
(gdb) print Ax[*p_p]
$60 = 1  // I don't get why this is now possible; is it an arbitrary number from anywhere in the memory?

(gdb) print A->nzmax
$77 = 9956354
but 
p_p = 9956862 which is not in range anymore!!!

The Python CSC matrix does in fact have 9956354 ones.

(gdb) print A->n (columns (examples))
$78 = 4978177
(gdb) print A->m (rows (features))
$79 = 1972040

Things I've figured out:

  1. When writing the whole dataset to files and passing them to the CLI, training runs successfully (while crashing during prediction, but that is probably another story/issue). This hints that the error might not be in the C code itself but rather in the Cython code that prepares the data.
  2. I can crash the fitting process by passing just one specific learning pair, but the whole feature matrix. This is the exact learning pair you see in the stack above.
  3. The CXSparse matrix member p (X->p) can hold two different kinds of content, which might be the difference between using the CLI and Python:
p → Int32List
Column pointers (size n+1) or column indices (size nzmax).
https://www.dartdocs.org/documentation/csparse/0.4.1/cxsparse/Matrix-class.html

In externals/CXSparse/Source/cs_entry.c, the member p is filled with the column index:

#include "cs.h"
/* add an entry to a triplet matrix; return 1 if ok, 0 otherwise */
CS_INT cs_entry (cs *T, CS_INT i, CS_INT j, CS_ENTRY x)
{
    if (!CS_TRIPLET (T) || i < 0 || j < 0) return (0) ;     /* check inputs */
    if (T->nz >= T->nzmax && !cs_sprealloc (T,2*(T->nzmax))) return (0) ;
    if (T->x) T->x [T->nz] = x ;
    T->i [T->nz] = i ;
    T->p [T->nz++] = j ; <---------------------- HERE
    T->m = CS_MAX (T->m, i+1) ;
    T->n = CS_MAX (T->n, j+1) ;
    return (1) ;
}

In ffm.pyx, by contrast, the member p is filled from the matrix member indptr:

# Create a CsMatrix object and return as a capsule
def CsMatrix(X not None):
    cdef cffm.cs_di *p
    p = <cffm.cs_di *> malloc(sizeof(cffm.cs_di))
    if p == NULL:
        raise MemoryError("No memory to make a Point")

    cdef int i
    cdef np.ndarray[int, ndim=1, mode = 'c'] indptr = X.indptr <-------------- HERE
    cdef np.ndarray[int, ndim=1, mode = 'c'] indices = X.indices
    cdef np.ndarray[double, ndim=1, mode = 'c'] data = X.data

    # Put the scipy data into the CSparse struct. This is just copying some
    # pointers.
    p.nzmax = X.data.shape[0]
    p.m = X.shape[0]
    p.n = X.shape[1]
    p.p = &indptr[0] <------------------ AND HERE
    p.i = &indices[0]
    p.x = &data[0]
    p.nz = -1  # to indicate CSC format
    return PyCapsule_New(<void *>p, "CsMatrix",
                         <PyCapsule_Destructor>del_CsMatrix)

which is the indptr from the CSC sparse matrix format and differs in meaning from the above (as far as I understand).

I hope you can help me fix this issue, @ibayer.

Train/Test of different column dimension

Hi,

I was trying to train the factorization machine using a dataset with X_train and X_test where X_test.shape[1] < X_train.shape[1]. However, I could not proceed with training because of the following assertion:

assert X_test.shape[1] == len(self.w_)

Since self.w_ has its length initialized from X_train, X_test will fail this check. It seems perfectly reasonable to me that the number of columns in X_test could be less than or equal to the number of columns in X_train. The workaround is to zero-pad X_test on the right using scipy.sparse.hstack (a sketch follows below), which should not be necessary.

Is there any motivation for why this assertion should continue to exist? If the shape of X_test is necessary for the fastFM-core, perhaps we could perform the test and zero-pad the matrix if necessary?
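
The zero-padding workaround mentioned above, as a minimal sketch (X_train, X_test, and a fitted fm as in this report):

import scipy.sparse as sp

# Pad X_test on the right with empty columns so it matches the training width.
n_missing = X_train.shape[1] - X_test.shape[1]
if n_missing > 0:
    pad = sp.csc_matrix((X_test.shape[0], n_missing))
    X_test = sp.hstack([X_test, pad]).tocsc()

y_pred = fm.predict(X_test)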

Cannot clone object FMClassification

I receive this error:

RuntimeError: Cannot clone object FMClassification(init_stdev=0.1, l2_reg=None, l2_reg_V=0.1, l2_reg_w=0.1,
         n_iter=100, random_state=123, rank=8), as the constructor does not seem to set parameter l2_reg_V

when using FMClassification with cross_val_score.

Question:Using Custom Dataset

1. I am about to use a customized dataset in fastFM. The data is a collection of many messages (text content) and subjects; one feature is a keywords array (an array of a few words) and the other feature is the subject. As both features are textual, I was wondering how to use fastFM with such a dataset.
2. If I use corresponding unique IDs for each word and subject, which means I'll have an array of integer values for the keywords feature, do you think "OneHotEncoder" could help turn such values into something useful for fastFM? (A sketch is given after this list.)
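
A hedged sketch of one way to turn such categorical/text features into the sparse one-hot design matrix fastFM expects. DictVectorizer is used here for illustration (OneHotEncoder over integer IDs works similarly); the feature names and the target below are made up:

import numpy as np
from sklearn.feature_extraction import DictVectorizer
from fastFM import als

# Each message becomes a dict of binary indicator features: one per keyword
# plus one for the subject. DictVectorizer yields a scipy sparse matrix.
rows = [
    {"kw=price": 1, "kw=delivery": 1, "subject=billing": 1},
    {"kw=login": 1, "subject=account": 1},
]
v = DictVectorizer()
X = v.fit_transform(rows).tocsc()    # sparse one-hot design matrix
y = np.array([1.0, 0.0])             # whatever target the task defines

fm = als.FMRegression(n_iter=100, rank=4, l2_reg_w=0.1, l2_reg_V=0.1)
fm.fit(X, y)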

Expected behavior of bpr.FMRecommender on a trivial case

I really hope I am not missing something obvious here.
Using fastFM (0.2.6) on MacOS.

import numpy as np
import scipy as sp
from fastFM import bpr

# learn matching two binary features

#1-hot encoding
x=sp.sparse.csc_matrix(np.array([
    [ 0.,  1.,  0.,  1.],
    [ 1.,  0.,  1.,  0.],
    [ 0.,  1.,  1.,  0.],
    [ 1.,  0.,  0.,  1.]
    ]))

# first two observations are better than the last two
compares=np.array([
    [  0.,   2.],
    [  0.,   3.],
    [  1.,   2.],
    [  1.,   3.]
    ])

# fit
fm=bpr.FMRecommender(n_iter=1000,
   init_stdev=0.01, l2_reg_w=.5, l2_reg_V=.5, rank=2,
   step_size=.001, random_state=11)
fm.fit(x,compares)

# predict
p=fm.predict(x)

# expecting 0 and 1 before 2 and 3
print(np.argsort(-p))
print(p)

What I get instead is:

[3 0 1 2]
[-0.0020049  -0.00487495 -0.00704542 -0.00022524]

Assertion failed when fitting with l2_reg_w=0 l2_reg_V=0

When fitting als.FMRegression with l2_reg_w=0 or l2_reg_V=0 I get the following assertion failure:

Assertion failed: (isfinite(new_V_fl) && "V not finite"), function sparse_fit, file ffm_als_mcmc.c, line 218.
Abort trap: 6

I note that older documentation actually has these as defaults, but the current defaults are 0.1. Regardless, is this expected to fail? I am using pretty basic MovieLens training data (ml-1m):

5 1:1 11193:1
3 1:1 10661:1
3 1:1 10914:1
4 1:1 13408:1
5 1:1 12355:1
3 1:1 11197:1
5 1:1 11287:1
5 1:1 12804:1
4 1:1 10594:1
4 1:1 10919:1
5 1:1 10595:1
4 1:1 10938:1

Why do I get better results with libfm?

Why do I get better results with libfm?

Be careful if you use a regression model with a categorical target, such as the 1-5 star rating of the movielens dataset.

libFM automatically clips the predicted values to the highest/lowest value in the training data.
This makes sense if you predict ratings with a regression model and evaluate with RMSE.

For example, it's certainly better to predict a 5-star rating when the regression score is > 5 than to report the raw regression value.
With fastFM you have to do the clipping yourself, because clipping is not always a good idea.

But it's easy to do if you need it.

    # clip values                                                    
    y_pred[y_pred > y_true.max()] = y_true.max()                        
    y_pred[y_pred < y_true.min()] = y_true.min()

Why do I not get exactly the same results with fastFM as with libFM?

FMs are non-linear models that use random initialization. This means that the solver might end up in a different local optimum if the initialization changes. We can use a random seed in fastFM to make individual runs comparable, but that doesn't help when comparing results between different implementations. You should therefore always expect small differences between fastFM and libFM predictions.

Better documentation of mcmc hyperparameter

Access to the MCMC hyperparameters is fairly obscure and needs better documentation.

  • add documentation to python mcmc module
  • add function that generates string parameter description
  • add documentation to C fastFM-core/src/ffm.c L. 77 ff

Hyperparameter layout in fm.hyper_param_:

    print(['alpha'] +
          ['lambda_w'] + ['lambda_V' + str(i) for i in range(rank)] +
          ['mu_w'] + ['mu_V' + str(i) for i in range(rank)])

output example (rank=4):

['alpha', 'lambda_w', 'lambda_V0', 'lambda_V1', 'lambda_V2', 'lambda_V3', 'mu_w', 'mu_V0', 'mu_V1', 'mu_V2', 'mu_V3']
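
A small sketch that pairs these names with the values in fm.hyper_param_ (assuming the layout above and a fitted MCMC model fm):

names = (['alpha'] +
         ['lambda_w'] + ['lambda_V' + str(i) for i in range(rank)] +
         ['mu_w'] + ['mu_V' + str(i) for i in range(rank)])
# Map each hyperparameter name to its current sampled value.
hyper = dict(zip(names, fm.hyper_param_))
print(hyper['alpha'], hyper['lambda_w'])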

fm.fit_predict vs fm.predict

Hello,

I wanted to use fm.predict to have a fast prediction with MCMC FM, without using fit_predict.

Maybe I am wrong, but
Yhat = fm.fit_predict(X_train, ytrain, X_test)
and
Yhat0 = fm.predict(X_test)
(when executed after fit_predict)
should return the same thing, right?
However, that is not the case (even np.sort(Yhat) and np.sort(Yhat0) are different).

I tried to predict manually:
X = X_test
V = np.transpose(fm.V_)
w = fm.w_
w0 = fm.w0_
XV = X * V
R = np.square(XV) - X.multiply(X) * np.square(V)
Yhat1 = (X * w + w0) + np.sum(R, axis=1) / 2

And I have seen that Yhat1 == Yhat0, so I think the reason fm.predict doesn't return the same thing as fm.fit_predict is that some of the parameters among w, w0, and V are not really the ones fitted on X_train and ytrain by fm.fit_predict(X_train, ytrain, X_test).

Maybe I haven't understood how the library works, or maybe I haven't understood the true meaning of fit_predict, but it always gives me a better RMSE than fm.predict.

I have tried this:
Yhat2 = fm.fit_predict(X_train[:1], ytrain[:1], X_test, n_more_iter=1)

But even though it is fast to compute and similar to Yhat, it deteriorates the RMSE.

(And please find a solution for installing the package on OS X.)

I really hope my post is useful,

Thanks for that lib.

Question: Restrict Columns Involved with Interaction

Is there a way, or a workaround, to run an FM on a standard regression-type dataset but only allow factorized interactions between certain column sets? For example, only allow interactions between column A and columns B through Z?

fastFM has bad performance on classification with SGD

I tested fastFM on some datasets and the performance is really bad compared with libFM, LR, and GBDT.
I get precision=0.76, recall=0.55, while the other methods all give precision=1, recall=1.
Am I using it wrong?

Uploading agaricus.txt…

train_file = '../data/agaricus.txt.train'
test_file = '../data/agaricus.txt.test'
X_train, y_train, X_test, y_test = read_data(train_file, test_file)

y_train = transform_label(y_train)
y_test = transform_label(y_test)
n_iter = 50

clf = sgd.FMClassification(n_iter=1000, init_stdev=0.1, rank=5,
                           l2_reg_w=0, l2_reg_V=0, l2_reg=None, step_size=0.01)
clf.fit(X_train, y_train)
y_predict = clf.predict(X_test)
print classification_report(y_test, y_predict)
print clf.predict_proba(X_test)
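
One variant worth trying, as a sketch only and not a confirmed fix; it reuses the data pipeline from the snippet above and simply uses nonzero regularization and a smaller step size, since the SGD solver appears sensitive to both in other reports on this page:

clf = sgd.FMClassification(n_iter=100000, init_stdev=0.1, rank=5,
                           l2_reg_w=0.01, l2_reg_V=0.01, step_size=0.001)
clf.fit(X_train, y_train)
print(classification_report(y_test, clf.predict(X_test)))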

Easy installation on mac

Hello,

I was wondering if there is an easy way to install fastFM on a Mac. I have followed all the instructions in the updated README, but the last command fails.

In my terminal, when I execute 'sudo pip install -e .' in the fastFM folder, I get:

fastFM/ffm.c:7789:32: warning: unused function '__pyx_f_5numpy_get_array_base' [-Wunused-function]
static CYTHON_INLINE PyObject *__pyx_f_5numpy_get_array_base(PyArrayObject *__pyx_v_arr) {
^
21 warnings generated.
gcc -bundle -undefined dynamic_lookup -L/Users/edmondjacoupeau/anaconda/lib -arch x86_64 -arch x86_64 build/temp.macosx-10.5-x86_64-2.7/fastFM/ffm.o -LfastFM/ -LfastFM-core/bin/ -LfastFM-core/externals/CXSparse/Lib/ -L/usr/lib/ -L/usr/lib/atlas-base/ -L/Users/edmondjacoupeau/anaconda/lib -lm -lfastfm -lcxsparse -lgsl -lgslcblas -lglib-2.0 -o /Users/edmondjacoupeau/fastFM/ffm.so
ld: warning: directory not found for option '-LfastFM-core/bin/'
ld: warning: directory not found for option '-L/usr/lib/atlas-base/'
ld: library not found for -lfastfm
clang: error: linker command failed with exit code 1 (use -v to see invocation)
error: command 'gcc' failed with exit status 1


Command "/Users/edmondjacoupeau/anaconda/bin/python -c "import setuptools, tokenize; file='/Users/edmondjacoupeau/fastFM/setup.py'; exec(compile(getattr(tokenize, 'open', open)(file).read().replace('\r\n', '\n'), file, 'exec'))" develop --no-deps" failed with error code 1 in /Users/edmondjacoupeau/fastFM

Have I forgotten something?

Install fastFM in conda with openblas

I'm not sure if it is proper to open up an issue for documenting my own experiences installing fastFM on a Linux machine without root privilege. Administrator, please move it to some more suitable place if you can think of any.

I installed conda some time ago. It includes sklearn, numpy, and other useful things for machine learning and building recommendation systems. It has everything required by fastFM except BLAS, which can be easily installed with:

conda install openblas

However, it seems fastFM does not recognize openblas by default. Or probably I'm missing some conda setup. Anyway, when I tried compiling fastFM, it failed because the cblas.h file is missing.

The solution to that is to append the absolute path to cblas.h to fast_fm.h, which is under fastFM-core/src/. My cblas.h is under .../anaconda2/pkgs/openblas-0.2.14-4/include/.

After that, I encountered another issue, where fastFM cannot recognize openblas as blas. To solve this, I simply changed this line in setup.py (libraries=['m', 'fastfm', 'cxsparse', 'blas']) to (libraries=['m', 'fastfm', 'cxsparse', 'openblas']).

I also added "....anaconda2/pkgs/openblas-0.2.14-4/include/" and "..../anaconda2/pkgs/openblas-0.2.14-4/lib/" to include_dirs in setup.py. Not sure if this is necessary.

Note: anaconda2 is my conda environment.

Hope this will help others who have similar problems.

Python3?

Hi, does fastFM support Python3?

How to access the variables

I've trained an FM model on some data I have, and I want to have a look at the variables w and V that the FM fits. How do I access these?
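
For reference, a hedged sketch based on the attribute names that appear in other issues on this page; fm is any fitted fastFM estimator (e.g. als.FMRegression), and the V_ shape is assumed from the manual-prediction snippet above:

fm.fit(X_train, y_train)

print(fm.w0_)         # global bias
print(fm.w_.shape)    # linear weights w, one per feature
print(fm.V_.shape)    # pairwise factors V, assumed shape (rank, n_features)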

Create a Google Group

A forum to discuss topics that don't necessarily rise to the level of Github "Issue" (at least initially) would be quite valuable.

predict_proba method and n_more_iter functionality for MCMC Classification

It would be really nice to see both features coming soon :-).
For MCMC classification there is only a fit_predict method, which is not practical for production use (because it's necessary to retrain the model each time one wants to predict an example).
Also, the warm_start functionality of the other solvers is really nice, but it is not available for this solver.

Assertion failed error

While calling fm=als.FMClassification(n_iter=n_iter, l2_reg_w=0.1, l2_reg_V=0.1, rank=8) I am getting
Assertion failed: (isfinite(*w_0) && "w_0 not finite"), function sparse_fit, file ffm_als_mcmc.c, line 165.
Abort trap: 6

What is the problem?
