coreylynch / pyfm

Factorization machines in python

Python 100.00%

pyfm's Issues

Why do the pyFM examples perform poorly? Obvious overfitting

Hello, has anyone else noticed that the examples included with pyFM and shown in the README overfit?
The MSE is 0.4 on the training data and 0.8 on the test data, which is overfitting. The same goes for the classifier example.

Why are the descriptors weighted by area before projection?

This is the projection line:
return self.eigenvectors[:,:k].T @ (self.A @ func)
If I understand correctly, the basis vectors are orthogonal and are the solutions to lambda*L@x = A@x, where L holds the cotangent weights and A the area weights.
So isn't the multiplication by A redundant here, given that the goal is just to project the descriptors onto a basis for dimensionality reduction?
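
A small numerical check may help with this question. For a generalized eigenproblem written in the usual form L x = lambda A x, with symmetric L and a positive diagonal A, the eigenvectors are orthonormal with respect to A (Phi^T A Phi = I) rather than the plain dot product, so projection coefficients need the A factor. The sketch below uses random stand-ins for L and A (assumptions, not the library's actual matrices).

```python
import numpy as np

rng = np.random.default_rng(0)
n = 6

# Stand-ins (assumptions): a symmetric PSD matrix for L and positive lumped areas for A.
M = rng.standard_normal((n, n))
L = M @ M.T
area = rng.uniform(0.5, 2.0, size=n)
A = np.diag(area)

# Solve L x = lambda A x by reducing it to a standard symmetric eigenproblem.
inv_sqrt = 1.0 / np.sqrt(area)
S = inv_sqrt[:, None] * L * inv_sqrt[None, :]
evals, U = np.linalg.eigh(S)
Phi = inv_sqrt[:, None] * U  # generalized eigenvectors, one per column

gram_area = Phi.T @ A @ Phi  # identity: the basis is A-orthonormal
gram_plain = Phi.T @ Phi     # generally not the identity

f = rng.standard_normal(n)
recon_area = Phi @ (Phi.T @ (A @ f))  # recovers f when the full basis is used
recon_plain = Phi @ (Phi.T @ f)       # gives f / area instead of f
```

With a truncated basis (the first k columns), the same argument suggests that Phi[:, :k].T @ (A @ f) is the A-orthogonal projection, while dropping A distorts the coefficients by the per-vertex areas.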

Unbalanced data classification

How can I do classification on unbalanced data?
Could you add sample weights as an input parameter, as this project has, for example:
https://github.com/dstein64/PyFactorizationMachines
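
Until a weight parameter exists, one possible workaround is to rebalance the training set before calling fit. The sketch below randomly oversamples the smaller classes with NumPy; it is a generic technique, not a pyFM feature, and the function name `oversample_minority` is hypothetical.

```python
import numpy as np

def oversample_minority(X, y, seed=0):
    """Repeat rows of the smaller classes until every class matches the largest one."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y)
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max()
    index_parts = []
    for cls, count in zip(classes, counts):
        idx = np.flatnonzero(y == cls)
        # Draw the missing rows with replacement from the same class.
        extra = rng.choice(idx, size=target - count, replace=True)
        index_parts.append(np.concatenate([idx, extra]))
    order = rng.permutation(np.concatenate(index_parts))
    return X[order], y[order]

X = np.arange(20).reshape(10, 2)
y = np.array([0] * 8 + [1] * 2)
X_balanced, y_balanced = oversample_minority(X, y)
```

Row indexing with an integer array also works on SciPy CSR matrices, so the same function should apply to sparse input. Oversampling only the training split avoids leaking duplicated rows into the evaluation data.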

sklearn\cross_validation deprecation warning

sklearn 0.19.1

lib\sklearn\cross_validation.py:41: DeprecationWarning: This module was deprecated in version 0.18 in favor of the model_selection module into which all the refactored classes and functions are moved. Also note that the interface of the new CV iterators are different from that of this module. This module will be removed in 0.20.
  "This module will be removed in 0.20.", DeprecationWarning)

How to save the model

Since the computation of the model is time-consuming, is there any way to save the model for later prediction?

Can't install pyfm

Hi,
I am installing with pip install git+https://github.com/coreylynch/pyFM
It gave me the error below. Any ideas?
Error compiling Cython file:

...
& validation_sample_weight)
self._sgd_lambda_step(validation_x_data_ptr, validation_x_ind_ptr,
validation_xnnz, validation_y)
if self.verbose > 0:
error_type = "MSE" if self.task == REGRESSION else "log loss"
print "Training %s: %.5f" % (error_type, (self.sumloss / self.count))
^

LINK : fatal error LNK1181: cannot open input file 'm.lib'

On Windows, pip install first required the "Visual C++ Build Tools", but it now fails with LINK : fatal error LNK1181: cannot open input file 'm.lib'
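
The caret in the Cython error above points at a Python 2 style print statement, which recent Cython versions are likely to reject when they compile with Python 3 semantics. A minimal sketch of the same formatting written as a function call, assuming nothing else on that line needs to change (the helper name is hypothetical):

```python
def format_training_line(error_type, sumloss, count):
    """Build the progress line with the same format string as the failing statement."""
    return "Training %s: %.5f" % (error_type, sumloss / count)

# print(...) with a single argument is valid on Python 3 and also runs on Python 2.
print(format_training_line("log loss", 1.0, 4))
```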

Why is my training result (log loss) always 0?

Creating validation dataset of 0.01 of training for adaptive regularization
-- Epoch 1
Training MSE: nan
-- Epoch 2
Training MSE: nan
-- Epoch 3
Training MSE: nan
-- Epoch 4
Training MSE: nan
-- Epoch 5
Training MSE: nan
-- Epoch 6
Training MSE: nan
-- Epoch 7
Training MSE: nan
-- Epoch 8
Training MSE: nan
-- Epoch 9
Training MSE: nan
-- Epoch 10
Training MSE: nan

bug in classification prediction

It seems there is a bug in the pyfm_fast.pyx within the prediction part for classification tasks:

In the _predict method, the outcome is calculated in line 252 using the predict_instance method. predict_instance evaluates the FM model and then, for classification, scales the result with the sigmoid function in _scale_prediction (line 179). That is fine, and it is also needed for the training part.
The problem I see is that this sigmoid transformation is applied again in line 252, which means that _predict always returns values > 0.5 because the sigmoid is applied twice.
Regression tasks are not affected, because _scale_prediction handles classification and regression differently.

What do you think?
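
The arithmetic behind this report can be checked in isolation. The sketch below does not touch pyfm_fast.pyx; it only shows that applying a sigmoid to values that are already in (0, 1) squeezes them into roughly (0.5, 0.73), whatever the raw scores were.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

raw_scores = np.array([-10.0, -1.0, 0.0, 1.0, 10.0])
once = sigmoid(raw_scores)  # spans almost all of (0, 1)
twice = sigmoid(once)       # every value is now above 0.5

print(once.round(4))
print(twice.round(4))
```

If the second transformation is really applied, thresholding the output at 0.5 would label every sample as positive, which matches the behaviour described in this issue.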

install error

I tried pip but I got this:

LINK : fatal error LNK1181: cannot open input file 'm.lib'
error: command 'C:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\BIN\x86_amd64\link.exe' failed with exit status 1181
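
LNK1181 for 'm.lib' usually means the build asked the linker for the Unix math library, which does not exist as a separate library under MSVC. A hedged sketch of how a setup script could choose the library list per platform; the helper name and the way it would be wired into this project's setup.py are assumptions.

```python
import os

def math_libraries(os_name=os.name):
    """Return the extra link libraries for a C extension on the given platform."""
    # The MSVC runtime already provides the math functions, so there is no m.lib to link.
    return [] if os_name == "nt" else ["m"]

# Hypothetical use in a setup script:
# Extension("pyfm_fast", ["pyfm_fast.pyx"], libraries=math_libraries())
print(math_libraries())
```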

Does the tool only support binary classification? What about multi-class classification?

Installation: need to use `python setup.py build_clib` before build_ext for python3

I tried to build and load the module under python3 (with respective print corrections). Upon import pylibfm I got:

ImportError: /home/dima/data/external/pyFM/pyfm_fast.so: undefined symbol: _Py_ZeroStruct

Running following command fixed the problem:

 python setup.py build_clib

Incorrect format

I am trying to use libFM on the Frappe dataset. However, I get the following error when running the code:

Original exception was:
Traceback (most recent call last):
File "fm.py", line 19, in
(train_data, y_train, train_users, train_items)=loadData("traindata.mat")
File "fm.py", line 11, in loadData
for line in f:
File "/usr/lib/python3.5/codecs.py", line 321, in decode
(result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xda in position 133: invalid continuation byte

Is there some problem in the input format of the training and/or test dataset?
My training and test dataset are in the .mat format
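
The traceback suggests the .mat file is being read line by line as UTF-8 text, while MATLAB files are binary. One way to read them is `scipy.io.loadmat`; the sketch below writes a small stand-in file first so that it runs on its own, and the variable name "X" is an assumption about what the file contains.

```python
import os
import tempfile

import numpy as np
from scipy.io import loadmat, savemat

# Write a small stand-in file so the example is runnable; the user's file is "traindata.mat".
path = os.path.join(tempfile.mkdtemp(), "example.mat")
savemat(path, {"X": np.arange(6.0).reshape(2, 3)})

contents = loadmat(path)  # dict mapping MATLAB variable names to arrays
variable_names = [k for k in contents if not k.startswith("__")]
X = contents["X"]
print(variable_names, X.shape)
```

Files saved in MATLAB's v7.3 format are HDF5 containers and need a different reader, so this approach may not apply to every .mat file.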

Is there a parallel processing option?

Hi there

is there a way of running the fit method in parallel?

Thanks!

Is there a way to implement early stopping if we start overfitting on validation?

This would greatly help with speed.
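
The stopping rule itself is independent of the library. Below is a minimal patience-based sketch over a sequence of validation losses; how to obtain a validation loss per epoch from pyFM is not shown here and would have to be added, and the function name is hypothetical.

```python
def best_epoch_with_patience(validation_losses, patience=3, min_delta=0.0):
    """Return (best_epoch, last_epoch_seen) for a patience-based early stop."""
    best_loss = float("inf")
    best_epoch = -1
    epoch = -1
    for epoch, loss in enumerate(validation_losses):
        if loss < best_loss - min_delta:
            best_loss, best_epoch = loss, epoch
        elif epoch - best_epoch >= patience:
            break  # no improvement for `patience` epochs in a row
    return best_epoch, epoch

losses = [0.50, 0.45, 0.44, 0.445, 0.446, 0.447, 0.43]
print(best_epoch_with_patience(losses))  # (2, 5)
```

A NaN loss never compares as smaller than the best loss, so a diverging run also counts as "no improvement" and triggers the stop.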

Can't save and load model correctly

I tried to save a trained FM model and load it again using pickle.
I could save and load it, but the loaded model predicted anomalous values, such as all zeros.
How do I save and load a trained model correctly?
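
One possible fallback is to store the learned parameters as plain arrays rather than pickling the whole object. The sketch below assumes the trained values can be read as a scalar bias and two arrays, here called w0, w and v; whether a pyFM model can be restored from them is not confirmed in this thread.

```python
import os
import tempfile

import numpy as np

def save_fm_params(path, w0, w, v):
    """Write the three parameter groups into one .npz archive."""
    np.savez(path, w0=np.asarray(w0), w=np.asarray(w), v=np.asarray(v))

def load_fm_params(path):
    """Read the parameters back as (float, array, array)."""
    with np.load(path) as data:
        return float(data["w0"]), data["w"], data["v"]

# Stand-in values; with a real model they would come from the trained object.
path = os.path.join(tempfile.mkdtemp(), "fm_params.npz")
save_fm_params(path, 0.5, np.array([1.0, 0.0, -1.0]), np.ones((2, 3)))
w0, w, v = load_fm_params(path)
```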

Could you update the code to remove this: cross_validation.py:41: DeprecationWarning

In the near future your code may stop running because of: cross_validation.py:41: DeprecationWarning

This is the output of your test case:

from pyfm import pylibfm
from sklearn.feature_extraction import DictVectorizer
import numpy as np
train = [
{"user": "1", "item": "5", "age": 19},
{"user": "2", "item": "43", "age": 33},
{"user": "3", "item": "20", "age": 55},
{"user": "4", "item": "10", "age": 20},
]
v = DictVectorizer()
X = v.fit_transform(train)
print(X.toarray())
#[[ 19. 0. 0. 0. 1. 1. 0. 0. 0.]
#[ 33. 0. 0. 1. 0. 0. 1. 0. 0.]
#[ 55. 0. 1. 0. 0. 0. 0. 1. 0.]
#[ 20. 1. 0. 0. 0. 0. 0. 0. 1.]]
y = np.repeat(1.0,X.shape[0])
fm = pylibfm.FM()
fm.fit(X,y)
fm.predict(v.transform({"user": "1", "item": "10", "age": 24}))

C:\Users\sndr\Anaconda3\lib\site-packages\sklearn\cross_validation.py:41: DeprecationWarning: This module was deprecated in version 0.18 in favor of the model_selection module into which all the refactored classes and functions are moved. Also note that the interface of the new CV iterators are different from that of this module. This module will be removed in 0.20.
"This module will be removed in 0.20.", DeprecationWarning)
[[19. 0. 0. 0. 1. 1. 0. 0. 0.]
[33. 0. 0. 1. 0. 0. 1. 0. 0.]
[55. 0. 1. 0. 0. 0. 0. 1. 0.]
[20. 1. 0. 0. 0. 0. 0. 0. 1.]]
Creating validation dataset of 0.01 of training for adaptive regularization
-- Epoch 1
Training log loss: 0.13187

So this warning is triggered by the line
from pyfm import pylibfm

It is:
C:\Users\sndr\Anaconda3\lib\site-packages\sklearn\cross_validation.py:41: DeprecationWarning: This module was deprecated in version 0.18 in favor of the model_selection module into which all the refactored classes and functions are moved. Also note that the interface of the new CV iterators are different from that of this module. This module will be removed in 0.20.
"This module will be removed in 0.20.", DeprecationWarning)

thanks
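
The warning text itself names the replacement module. A hedged sketch of an import that works on both sides of the change, assuming the library only needs `train_test_split` from the old module (that detail is an assumption about pylibfm.py):

```python
import numpy as np

try:
    from sklearn.model_selection import train_test_split  # scikit-learn 0.18 and later
except ImportError:
    from sklearn.cross_validation import train_test_split  # older releases

X = np.arange(20).reshape(10, 2)
y = np.arange(10)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=0)
print(X_train.shape, X_val.shape)
```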

When I try to train a model I get

AttributeError: 'numpy.ndarray' object has no attribute 'indptr'

From
dataset = CSRDataset(X.data, X.indptr, X.indices, y_i, sample_weight)

The training X should be a NumPy array, right?
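
The failing line reads `.data`, `.indptr` and `.indices`, which are attributes of a SciPy CSR matrix rather than of a NumPy array. A minimal sketch of the conversion, with made-up values:

```python
import numpy as np
from scipy import sparse

X_dense = np.array([[19.0, 0.0, 1.0],
                    [33.0, 1.0, 0.0]])
X_sparse = sparse.csr_matrix(X_dense)

# These are the three attributes that the failing line accesses.
print(X_sparse.data, X_sparse.indptr, X_sparse.indices)
```

Passing `X_sparse` instead of `X_dense` to fit and predict should avoid the AttributeError, since the object then has the attributes the dataset wrapper expects.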

Different values for same item?

After training the model, I get different values for the same item when I add more items to the prediction list. Is this supposed to happen, or is it a bug?
For example:
print fm.predict(v.fit_transform({"user": "1", "item": "10", "age": 24}))

print fm.predict(v.fit_transform([{"user": "1", "item": "10", "age": 24},{"user": "1", "item": "12", "age": 24}]))

Both contain user 1 and item 10, yet the predicted ratings for that pair differ.
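
A likely cause is the call to `fit_transform` at prediction time: it refits the vectorizer on the query, so the column layout no longer matches the one the model was trained on. The sketch below uses the training dictionaries from the README-style example quoted elsewhere on this page and compares the two calls.

```python
from sklearn.feature_extraction import DictVectorizer

train = [
    {"user": "1", "item": "5", "age": 19},
    {"user": "2", "item": "43", "age": 33},
    {"user": "3", "item": "20", "age": 55},
    {"user": "4", "item": "10", "age": 20},
]
v = DictVectorizer()
X = v.fit_transform(train)  # learns 9 columns from the training data

query = {"user": "1", "item": "10", "age": 24}
same_layout = v.transform([query])               # reuses the 9 training columns
refit = DictVectorizer().fit_transform([query])  # learns only 3 columns

print(X.shape, same_layout.shape, refit.shape)
```

Using `v.transform(...)` for every prediction keeps each feature in the column the model saw during training, regardless of how many items are in the list.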

pyFM installation fails

Could you please share how to install it for Python 3 on a Windows computer?
I followed this advice:
#11

and got this output:
Microsoft Windows [Version 10.0.14393]
(c) 2016 Microsoft Corporation. All rights reserved.
e:\factirizatoin machine including FFM\code\pyFM_master_May5>python setup.py build_clib
running build_clib
e:\factirizatoin machine including FFM\code\pyFM_master_May5>

However, when I run it from Python,
this error occurs:

File "E:\factirizatoin machine including FFM\code\pyFM_master_May5\myexampl.py", line 1, in
from pyfm import pylibfm
File "E:\factirizatoin machine including FFM\code\pyFM_master_May5\pyfm\pylibfm.py", line 4, in
from pyfm_fast import FM_fast, CSRDataset

builtins.ImportError: No module named 'pyfm_fast'

My friend tried to install this on Python 2 but got this error:

An exception has occurred, use %tb to see the full traceback.
SystemExit: usage: main.py [global_opts] cmd1 [cmd1_opts] [cmd2 [cmd2_opts] ...]
   or: main.py --help [cmd1 cmd2 ...]
   or: main.py --help-commands
   or: main.py cmd --help
error: option -f not recognized

Adding user features reduces model performance?

Hi everyone,

I am trying to use the ml-1m data to build a recommender model for users. What seems weird to me is that the model performs better without the user features. Did I do something wrong when adding the features, or is this normal?

Fitting the dataset
dataset = Dataset()
dataset.fit(users = (row['UserID'] for index,row in users_df.iterrows()), items = (row['MovieID'] for index,row in movie_df.iterrows()), user_features = set(user_features_flat))

Creating the interaction and feature matrix
(interactions, weights) = dataset.build_interactions((row['UserID'],row['MovieID'],row['rating']) for index,row in ratings_df.iterrows())
user_feature_matrix = dataset.build_user_features((row['UserID'], [row['Gender'],row['Occupation'],row['age_group']]) for index,row in users.iterrows())

Model with user features
model = LightFM(no_components=70, loss='warp')
model.fit(interactions, user_features=user_feature_matrix, item_features=None, sample_weight=None, epochs=70, num_threads=4)
p_k = evaluation.precision_at_k(model, test, k=10, user_features=user_feature_matrix, item_features=None, preserve_rows=False, num_threads=4, check_intersections=True).mean()
p_k #0.14658715

Model without
model_cf = LightFM(no_components=70, loss='warp')
model_cf.fit(interactions, user_features=None, item_features=None, sample_weight=None, epochs=70, num_threads=4)
p_k_cf = evaluation.precision_at_k(model_cf, test, k=10, user_features=None, item_features=None, preserve_rows=False, num_threads=4, check_intersections=True).mean()
p_k_cf #0.1638668

Not converge when training?

You can see the rmse change as below, using the example from README.
Why does it not converge, and how could I fix it?

--- git/pyFM ‹master* ?› » python example.py 
Creating validation dataset of 0.01 of training for adaptive regularization
-- Epoch 1
Training RMSE: 0.49676
-- Epoch 2
Training RMSE: 0.44940
-- Epoch 3
Training RMSE: 0.44133
-- Epoch 4
Training RMSE: 0.43757
-- Epoch 5
Training RMSE: 0.43599
-- Epoch 6
Training RMSE: 0.43494
-- Epoch 7
Training RMSE: 0.43381
-- Epoch 8
Training RMSE: 0.43375
-- Epoch 9
Training RMSE: 0.43324
-- Epoch 10
Training RMSE: 0.43272
-- Epoch 11
Training RMSE: 0.43310
-- Epoch 12
Training RMSE: 0.43255
-- Epoch 13
Training RMSE: 0.43229
-- Epoch 14
Training RMSE: 0.43235
-- Epoch 15
Training RMSE: 0.43214
-- Epoch 16
Training RMSE: 0.43237
-- Epoch 17
Training RMSE: 0.43242
-- Epoch 18
Training RMSE: 0.43247
-- Epoch 19
Training RMSE: 0.43308
-- Epoch 20
Training RMSE: 0.44136
-- Epoch 21
Training RMSE: 0.44681
-- Epoch 22
Training RMSE: 0.44714
-- Epoch 23
Training RMSE: nan
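
A log that improves, drifts upward and then ends in nan is typical of SGD steps that are too large for the scale of the features. The toy sketch below is not pyFM code; it runs plain SGD on a one-parameter least-squares problem to show how the same learning rate can diverge on a raw feature and converge on a rescaled one.

```python
import numpy as np

def sgd_final_loss(x, y, learning_rate=0.1, epochs=50):
    """Run plain SGD on the one-parameter model y ~ w * x and return the final MSE."""
    w = np.float64(0.0)
    with np.errstate(all="ignore"):  # silence overflow warnings when the run diverges
        for _ in range(epochs):
            for xi, yi in zip(x, y):
                error = w * xi - yi
                w = w - learning_rate * 2.0 * error * xi
        return float(np.mean((w * x - y) ** 2))

x_raw = np.array([100.0, 200.0, 300.0])
y = 2.0 * x_raw
loss_raw = sgd_final_loss(x_raw, y)                   # unscaled feature: not finite
loss_scaled = sgd_final_loss(x_raw / x_raw.max(), y)  # feature scaled into (0, 1]: small
print(loss_raw, loss_scaled)
```

For the library itself, the analogous things to try would be a smaller initial learning rate and features on a comparable scale; which setting helps with the README example has not been verified here.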

Fatal error during the installation

I am trying to install pyFM on my machine using Python 2.7. During the installation I receive the following error: LINK: fatal error LNK1181: cannot open input file 'm.lib'

What does this error mean?

How to save the trained model?

I can't find a way to save the model. Could someone help me solve this? Thanks.
Right now I can run this:
fm = pylibfm.FM()
fm.fit(X,y)
fm.predict(v.transform({"user": "1", "item": "10", "age": 24}))
but I can't find a way to save the model.

AttributeError: 'numpy.ndarray' object has no attribute 'indptr'

Creating validation dataset of 0.01 of training for adaptive regularization
Traceback (most recent call last):
File "suanfa.py", line 25, in
fm.fit(trainX,trainY)
File "/home/ks/anaconda3/lib/python3.5/site-packages/pyfm/pylibfm.py", line 181, in fit
X_train_dataset = _make_dataset(X_train, train_labels)
File "/home/ks/anaconda3/lib/python3.5/site-packages/pyfm/pylibfm.py", line 239, in _make_dataset
dataset = CSRDataset(X.data, X.indptr, X.indices, y_i, sample_weight)
AttributeError: 'numpy.ndarray' object has no attribute 'indptr'

referring to w0, w, and v of trained model

Can I access w0, w, and v of the trained model?
If so, please tell me how to access them.

thanks
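
For reference, this is how the three parameter groups enter the standard second-order factorization machine score. The sketch is plain NumPy, not pyFM's internal code; it assumes v is stored with shape (num_factors, num_features), and that the parameters are exposed under these names on the trained object is also an assumption.

```python
import numpy as np

def fm_predict(X, w0, w, v):
    """Second-order FM score for each row of a dense X; v has shape (num_factors, num_features)."""
    X = np.asarray(X, dtype=float)
    linear = X @ w
    xv = X @ v.T                  # per-factor sums of v * x
    x2v2 = (X ** 2) @ (v.T ** 2)  # per-factor sums of v^2 * x^2
    pairwise = 0.5 * np.sum(xv ** 2 - x2v2, axis=1)
    return w0 + linear + pairwise

w0 = 0.5
w = np.array([1.0, 0.0, -1.0])
v = np.array([[1.0, 2.0, 3.0],
              [0.0, 1.0, 0.0]])
scores = fm_predict([[1.0, 2.0, 0.0]], w0, w, v)
print(scores)  # [5.5]
```

In the usage line, the only active pair is features 0 and 1, whose factor vectors have inner product 2 and whose values multiply to 2, so the score is 0.5 + 1 + 4. For classification, a sigmoid would be applied to this raw score.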

FM's on simple Sklearn's boston data giving NaN's

This is giving errors, am I missing something?

from scipy import sparse
from sklearn.datasets import load_boston
import pylibfm

# instantiate FM instance with 7 latent factors
fm = pylibfm.FM(num_factors=7, num_iter=6, verbose=True)

# load dataset
boston = load_boston()

# fit FM, making sure to wrap the ndarray as a sparse csr
fm.fit(sparse.csr_matrix(boston.data), boston.target)

Creating validation dataset of 0.01 of training for adaptive regularization
-- Epoch 1
Training log loss: nan
-- Epoch 2
Training log loss: nan
-- Epoch 3
Training log loss: nan
-- Epoch 4
Training log loss: nan
-- Epoch 5
Training log loss: nan
-- Epoch 6
Training log loss: nan

fm.v is also all nan.
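
Two things stand out in the log above: it reports "log loss" although the Boston target is a real-valued price, and the raw feature columns differ in scale by orders of magnitude. A hedged sketch of column standardization with NumPy follows; the pyFM call in the final comment, including the `task="regression"` argument, is an assumption about the constructor and has not been run here.

```python
import numpy as np

def standardize_columns(X):
    """Shift each column to zero mean and scale it to unit variance."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    std[std == 0.0] = 1.0  # leave constant columns unscaled instead of dividing by zero
    return (X - mean) / std, mean, std

Z, mean, std = standardize_columns([[1.0, 10.0, 5.0],
                                    [3.0, 30.0, 5.0]])
print(Z)

# Hypothetical use: fm = pylibfm.FM(num_factors=7, num_iter=6, verbose=True, task="regression")
#                   fm.fit(sparse.csr_matrix(Z), target)
```

The returned mean and std should be reused to transform any data passed to predict, so that training and prediction see the same scaling.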

It is written that this code performs badly. Can you explain?

pylibfm is out of the game. It is slow, it crashes on large datasets, sometimes simply diverge and hardly can compete in quality.
https://arogozhnikov.github.io/2016/02/15/TestingLibFM.html

Will it work for third-order categorical feature interactions?

Great code, thanks!

Please help me understand:
1. Will it work for third-order categorical feature interactions?
2. Will it run on a Windows computer?
3. Will it work for sparse data?

Pycharm error: Process finished with exit code -1073741819 (0xC0000005)

I was using pyFM to classify a data set. When only 50% of the data were used, the program went on normally. But when I tried to use all the data (the amount was about 300,000), an error occurred in pycharm at fm.fit(): Process finished with exit code -1073741819 (0xC0000005). I wonder if it was running out of memory?

'from pyfm import pylibfm' causes 'Python.exe stops working'

Is it because of Anaconda?

indptr not found

Train_x and Test_x are both SciPy sparse matrices. fm.fit() runs normally, but predict fails with the error "indptr not found" when it calls CSRDataset(). Why does this error not occur in fit()?

About regularization

Hi,

From the code I see:

    # Regularization Parameters (start with no regularization)
    self.reg_0 = 0.0
    self.reg_w = 0.0
    self.reg_v = np.repeat(0.0, num_factors)

However, I don't see where the regularization parameters are updated. Moreover, after fitting the model, I printed model.reg_v to inspect the regularization parameters and got an array of zeros. Does the model actually impose regularization on its parameters? Thanks

Could you remove these warnings: pyfm_fast.c(3174): warning C4018: '<': signed/unsigned mismatch

There are many warnings during installation:
pyfm_fast.c(3174): warning C4018: '<': signed/unsigned mismatch

E:\Factorisation machens\how to install\pyfm_installation\pyFM-master\pyFM-master>python setup.py build_ext --inplace
running build_ext
cythoning pyfm_fast.pyx to pyfm_fast.c
building 'pyfm_fast' extension
creating build
creating build\temp.win-amd64-3.6
creating build\temp.win-amd64-3.6\Release
C:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\BIN\x86_amd64\cl.exe /c /nologo /Ox /W3 /GL /DNDEBUG /MD -IC:\Users\sndr\Anaconda3\lib\site-packages\numpy\core\include -IC:\Users\sndr\Anaconda3\include -IC:\Users\sndr\Anaconda3\include "-IC:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\INCLUDE" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.10240.0\ucrt" "-IC:\Program Files (x86)\Windows Kits\8.1\include\shared" "-IC:\Program Files (x86)\Windows Kits\8.1\include\um" "-IC:\Program Files (x86)\Windows Kits\8.1\include\winrt" /Tcpyfm_fast.c /Fobuild\temp.win-amd64-3.6\Release\pyfm_fast.obj
pyfm_fast.c
c:\users\sndr\anaconda3\lib\site-packages\numpy\core\include\numpy\npy_1_7_deprecated_api.h(12) : Warning Msg: Using deprecated NumPy API, disable it by #defining NPY_NO_DEPRECATED_API NPY_1_7_API_VERSION
pyfm_fast.c(3174): warning C4018: '<': signed/unsigned mismatch
pyfm_fast.c(3214): warning C4018: '<': signed/unsigned mismatch
pyfm_fast.c(3245): warning C4018: '<': signed/unsigned mismatch
pyfm_fast.c(3843): warning C4018: '<': signed/unsigned mismatch
pyfm_fast.c(3910): warning C4018: '<': signed/unsigned mismatch
pyfm_fast.c(3941): warning C4018: '<': signed/unsigned mismatch
pyfm_fast.c(4885): warning C4018: '<': signed/unsigned mismatch
pyfm_fast.c(4953): warning C4018: '<': signed/unsigned mismatch
pyfm_fast.c(4964): warning C4018: '<': signed/unsigned mismatch
pyfm_fast.c(5505): warning C4018: '<': signed/unsigned mismatch
pyfm_fast.c(5572): warning C4018: '<': signed/unsigned mismatch
pyfm_fast.c(5610): warning C4018: '<': signed/unsigned mismatch
pyfm_fast.c(5996): warning C4018: '<': signed/unsigned mismatch
pyfm_fast.c(14070): warning C4244: 'initializing': conversion from 'double' to 'float', possible loss of data
pyfm_fast.c(14076): warning C4244: 'initializing': conversion from 'double' to 'float', possible loss of data
C:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\BIN\x86_amd64\link.exe /nologo /INCREMENTAL:NO /LTCG /DLL /MANIFEST:EMBED,ID=2 /MANIFESTUAC:NO /LIBPATH:C:\Users\sndr\Anaconda3\libs /LIBPATH:C:\Users\sndr\Anaconda3\PCbuild\amd64 "/LIBPATH:C:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\LIB\amd64" "/LIBPATH:C:\Program Files (x86)\Windows Kits\10\lib\10.0.10240.0\ucrt\x64" "/LIBPATH:C:\Program Files (x86)\Windows Kits\8.1\lib\winv6.3\um\x64" /EXPORT:PyInit_pyfm_fast build\temp.win-amd64-3.6\Release\pyfm_fast.obj "/OUT:E:\Factorisation machens\how to install\pyfm_installation\pyFM-master\pyFM-master\pyfm_fast.cp36-win_amd64.pyd" /IMPLIB:build\temp.win-amd64-3.6\Release\pyfm_fast.cp36-win_amd64.lib
pyfm_fast.obj : warning LNK4197: export 'PyInit_pyfm_fast' specified multiple times; using first specification
Creating library build\temp.win-amd64-3.6\Release\pyfm_fast.cp36-win_amd64.lib and object build\temp.win-amd64-3.6\Release\pyfm_fast.cp36-win_amd64.exp
Generating code
Finished generating code

E:\Factorisation machens\how to install\pyfm_installation\pyFM-master\pyFM-master>
