Factorization machines in python
Hello, does anyone else find that the pyFM example shown in the README overfits?
The MSE on the training data and the test data is 0.4 and 0.8, which looks like overfitting. The same goes for the classifier example.
This is the line that performs the projection:
return self.eigenvectors[:,:k].T @ (self.A @ func)
If I understand correctly, the basis itself is orthogonal, and the basis vectors are the solutions to lambda*L@x = A@x, where L holds the cotangent weights and A holds the area weights.
So isn't the matrix multiplication with A redundant here, since our goal is just to project the descriptors onto a basis set for dimensionality reduction?
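A small numerical check may help here. The sketch below assumes the basis comes from the generalized eigenproblem in its usual form, L x = lambda A x, solved with scipy.linalg.eigh, which normalizes the eigenvectors so that V.T @ A @ V is the identity. The random L and A are stand-ins, not real cotangent or area weights. Under that assumption the basis is orthonormal with respect to A rather than in the plain Euclidean sense, so the multiplication by A is what lets the projection recover the coefficients.

```python
import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(0)
n, k = 6, 3

# Stand-in matrices (assumptions): symmetric L, diagonal positive A.
M = rng.standard_normal((n, n))
L = M @ M.T
A = np.diag(rng.uniform(0.5, 2.0, n))

# Generalized problem L x = lambda A x; eigenvectors satisfy V.T @ A @ V = I.
evals, evecs = eigh(L, A)
basis = evecs[:, :k]

# Build a function that lies exactly in the span of the first k basis vectors.
coeffs_true = rng.standard_normal(k)
f = basis @ coeffs_true

with_A = basis.T @ (A @ f)   # projection as written in the quoted line
without_A = basis.T @ f      # projection with A dropped

print("A-orthonormal basis:", np.allclose(basis.T @ A @ basis, np.eye(k)))
print("with A recovers coefficients:", np.allclose(with_A, coeffs_true))
print("without A recovers coefficients:", np.allclose(without_A, coeffs_true))
```

If A were a multiple of the identity the two projections would agree up to scale, which is the only case where dropping it would be harmless.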
How can I do classification on unbalanced data?
Could you add sample weights as an input parameter, as this project does, for example:
https://github.com/dstein64/PyFactorizationMachines
sklearn 0.19.1
lib\sklearn\cross_validation.py:41: DeprecationWarning: This module was deprecated in version 0.18 in favor of the model_selection module into which all the refactored classes and functions are moved. Also note that the interface of the new CV iterators are different from that of this module. This module will be removed in 0.20.
"This module will be removed in 0.20.", DeprecationWarning)
Since the computation of the model is time-consuming, is there any way to save the model for later prediction?
On Windows, pip install first required the "Visual C++ Build Tools", but now it fails with LINK : fatal error LNK1181: cannot open input file 'm.lib'
Creating validation dataset of 0.01 of training for adaptive regularization
-- Epoch 1
Training MSE: nan
-- Epoch 2
Training MSE: nan
-- Epoch 3
Training MSE: nan
-- Epoch 4
Training MSE: nan
-- Epoch 5
Training MSE: nan
-- Epoch 6
Training MSE: nan
-- Epoch 7
Training MSE: nan
-- Epoch 8
Training MSE: nan
-- Epoch 9
Training MSE: nan
-- Epoch 10
Training MSE: nan
It seems there is a bug in pyfm_fast.pyx in the prediction part for classification tasks:
In the _predict method, the outcome is calculated in line 252 using the predict_instance method. predict_instance evaluates the FM model and then scales the result with the sigmoid function for classification in _scale_prediction (line 179), which is fine and is also needed for the training part.
The problem I see is that this sigmoid transformation is applied again in line 252, which means that _predict always returns values > 0.5 because the sigmoid is applied twice.
There is no effect on the regression task, because _scale_prediction handles classification and regression differently.
What do you think?
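The arithmetic behind this claim is easy to check in isolation. The sketch below does not reproduce pyfm_fast.pyx; it only shows that a sigmoid applied to an already-sigmoided value always lands above 0.5, whatever the raw score was.

```python
import math

def sigmoid(z):
    """Standard logistic function."""
    return 1.0 / (1.0 + math.exp(-z))

raw_scores = [-10.0, -1.0, 0.0, 1.0, 10.0]

once = [sigmoid(z) for z in raw_scores]   # valid probabilities in (0, 1)
twice = [sigmoid(p) for p in once]        # sigmoid applied a second time

for z, p1, p2 in zip(raw_scores, once, twice):
    print(f"raw={z:6.1f}  once={p1:.4f}  twice={p2:.4f}")
```

Because the first sigmoid maps every score into (0, 1), the second one can only produce values between sigmoid(0) = 0.5 and sigmoid(1), roughly 0.731, so a 0.5 threshold would label every instance as positive.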
I tried pip, but I got this:
LINK : fatal error LNK1181: cannot open input file 'm.lib'
error: command 'C:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\BIN\x86_amd64\link.exe' failed with exit status 1181
I tried to build and load the module under python3 (with the respective print corrections). Upon import pylibfm I got:
ImportError: /home/dima/data/external/pyFM/pyfm_fast.so: undefined symbol: _Py_ZeroStruct
Running the following command fixed the problem:
python setup.py build_clib
I am trying to use libFM on the Frappe dataset. However, I get the following error when running the code:
Original exception was:
Traceback (most recent call last):
File "fm.py", line 19, in
(train_data, y_train, train_users, train_items)=loadData("traindata.mat")
File "fm.py", line 11, in loadData
for line in f:
File "/usr/lib/python3.5/codecs.py", line 321, in decode
(result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xda in position 133: invalid continuation byte
Is there some problem with the input format of the training and/or test dataset?
My training and test datasets are in .mat format.
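The traceback shows the file being iterated line by line as UTF-8 text, which fails on binary content. If traindata.mat is a MATLAB-format binary file (an assumption, since the post does not say how it was produced), scipy.io.loadmat can read it into a dict of arrays instead. The sketch writes a small stand-in file first so that it is runnable; the file name and the variable names "X" and "y" are hypothetical.

```python
import os
import tempfile

import numpy as np
from scipy.io import loadmat, savemat

# Write a tiny stand-in .mat file (hypothetical contents) so the example runs.
path = os.path.join(tempfile.mkdtemp(), "traindata_demo.mat")
savemat(path, {"X": np.array([[1.0, 0.0], [0.0, 1.0]]),
               "y": np.array([1.0, 0.0])})

# Read it back as binary MATLAB data rather than as text lines.
contents = loadmat(path)

# loadmat adds bookkeeping keys such as __header__; skip them when listing variables.
variable_names = [key for key in contents if not key.startswith("__")]
print(variable_names)

X = contents["X"]
y = contents["y"].ravel()  # loadmat returns 1-D arrays as 2-D, so flatten
print(X.shape, y.shape)
```

Listing the variable names first is a quick way to see what a given .mat file actually contains before building the feature matrix from it.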
Hi there,
Is there a way of running the fit method in parallel?
Thanks!
This would greatly help with speed.
I tried to save a trained FM model and load it using pickle.
I could save and load it, but the loaded model predicted anomalous values, such as all zeros.
How do I save and load a trained model correctly?
In the near future your code may stop running because of this warning: cross_validation.py:41: DeprecationWarning
This is the output of your test case:
from pyfm import pylibfm
from sklearn.feature_extraction import DictVectorizer
import numpy as np
train = [
{"user": "1", "item": "5", "age": 19},
{"user": "2", "item": "43", "age": 33},
{"user": "3", "item": "20", "age": 55},
{"user": "4", "item": "10", "age": 20},
]
v = DictVectorizer()
X = v.fit_transform(train)
print(X.toarray())
#[[ 19. 0. 0. 0. 1. 1. 0. 0. 0.]
#[ 33. 0. 0. 1. 0. 0. 1. 0. 0.]
#[ 55. 0. 1. 0. 0. 0. 0. 1. 0.]
#[ 20. 1. 0. 0. 0. 0. 0. 0. 1.]]
y = np.repeat(1.0,X.shape[0])
fm = pylibfm.FM()
fm.fit(X,y)
fm.predict(v.transform({"user": "1", "item": "10", "age": 24}))
C:\Users\sndr\Anaconda3\lib\site-packages\sklearn\cross_validation.py:41: DeprecationWarning: This module was deprecated in version 0.18 in favor of the model_selection module into which all the refactored classes and functions are moved. Also note that the interface of the new CV iterators are different from that of this module. This module will be removed in 0.20.
"This module will be removed in 0.20.", DeprecationWarning)
[[19. 0. 0. 0. 1. 1. 0. 0. 0.]
[33. 0. 0. 1. 0. 0. 1. 0. 0.]
[55. 0. 1. 0. 0. 0. 0. 1. 0.]
[20. 1. 0. 0. 0. 0. 0. 0. 1.]]
Creating validation dataset of 0.01 of training for adaptive regularization
-- Epoch 1
Training log loss: 0.13187
So this warning is triggered by the line
from pyfm import pylibfm
The warning is:
C:\Users\sndr\Anaconda3\lib\site-packages\sklearn\cross_validation.py:41: DeprecationWarning: This module was deprecated in version 0.18 in favor of the model_selection module into which all the refactored classes and functions are moved. Also note that the interface of the new CV iterators are different from that of this module. This module will be removed in 0.20.
"This module will be removed in 0.20.", DeprecationWarning)
thanks
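The warning comes from importing sklearn.cross_validation, which scikit-learn replaced with sklearn.model_selection in 0.18. A minimal sketch of a version-tolerant import follows, assuming the only helper needed from that module is train_test_split (which helper pyFM actually imports is an assumption here, and the toy arrays are illustrative):

```python
import numpy as np

try:
    # scikit-learn >= 0.18
    from sklearn.model_selection import train_test_split
except ImportError:
    # older scikit-learn releases that still ship cross_validation
    from sklearn.cross_validation import train_test_split

X = np.arange(20).reshape(10, 2)
y = np.arange(10)

# Hold out 20% of the rows, similar in spirit to a validation split.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, random_state=0
)
print(X_train.shape, X_val.shape)
```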
AttributeError: 'numpy.ndarray' object has no attribute 'indptr'
It is raised from this line:
dataset = CSRDataset(X.data, X.indptr, X.indices, y_i, sample_weight)
The training X should be a numpy array, right?
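The data, indptr, and indices attributes used in that call are those of a SciPy CSR sparse matrix, which a dense numpy.ndarray does not have. A minimal sketch of wrapping a dense array before passing it on (the toy array is illustrative only):

```python
import numpy as np
from scipy import sparse

X_dense = np.array([[19.0, 0.0, 1.0],
                    [33.0, 1.0, 0.0]])

# A plain ndarray has no CSR attributes, hence the AttributeError.
print(hasattr(X_dense, "indptr"))

# Wrapping it as CSR provides the three arrays the call expects.
X_sparse = sparse.csr_matrix(X_dense)
print(X_sparse.data)     # non-zero values, row by row
print(X_sparse.indices)  # column index of each non-zero value
print(X_sparse.indptr)   # offsets where each row starts in data/indices
```

The same wrapping would apply to any matrix handed to predict, since it appears to go through the same dataset construction.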
After training the model, I get different values for the same item when I add more items to the prediction list. Is this supposed to happen, or is it a bug?
For example:
print fm.predict(v.fit_transform({"user": "1", "item": "10", "age": 24}))
print fm.predict(v.fit_transform([{"user": "1", "item": "10", "age": 24},{"user": "1", "item": "12", "age": 24}]))
Both calls include user 1 and item 10, yet the predicted ratings for that pair differ.
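One likely cause is visible in the calls themselves: fit_transform refits the DictVectorizer on the prediction inputs, so the column layout no longer matches the one the model was trained on, and it changes with the set of items passed in. The sketch below uses the README training dicts and separate vectorizer instances to show the difference in the resulting matrices.

```python
from sklearn.feature_extraction import DictVectorizer

train = [
    {"user": "1", "item": "5", "age": 19},
    {"user": "2", "item": "43", "age": 33},
    {"user": "3", "item": "20", "age": 55},
    {"user": "4", "item": "10", "age": 20},
]
v = DictVectorizer()
X_train = v.fit_transform(train)  # learns the 9-column layout

query = [{"user": "1", "item": "10", "age": 24}]
query_two = query + [{"user": "1", "item": "12", "age": 24}]

# transform reuses the training layout, so the row is comparable across calls.
X_ok = v.transform(query)

# fit_transform learns a new layout from whatever is passed in.
refit_one = DictVectorizer().fit_transform(query)
refit_two = DictVectorizer().fit_transform(query_two)

print(X_train.shape, X_ok.shape, refit_one.shape, refit_two.shape)
```

With transform, the first row of a two-item query is encoded identically to the single-item query, so the prediction for user 1 and item 10 should no longer depend on what else is in the list.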
Could you please share how to install it for Python 3 on a Windows computer?
I followed this advice:
#11
and got this output:
Microsoft Windows [Version 10.0.14393]
(c) 2016 Microsoft Corporation. All rights reserved.
e:\factirizatoin machine including FFM\code\pyFM_master_May5>python setup.py build_clib
running build_clib
e:\factirizatoin machine including FFM\code\pyFM_master_May5>
However, when I run it from Python, this error occurs:
File "E:\factirizatoin machine including FFM\code\pyFM_master_May5\myexampl.py", line 1, in
from pyfm import pylibfm
File "E:\factirizatoin machine including FFM\code\pyFM_master_May5\pyfm\pylibfm.py", line 4, in
from pyfm_fast import FM_fast, CSRDataset
builtins.ImportError: No module named 'pyfm_fast'
My friend tried to install this on Python 2, but he got this error:
An exception has occurred, use %tb to see the full traceback.
SystemExit: usage: main.py [global_opts] cmd1 [cmd1_opts] [cmd2 [cmd2_opts] ...]
or: main.py --help [cmd1 cmd2 ...]
or: main.py --help-commands
or: main.py cmd --help
error: option -f not recognized
Hi everyone,
I am trying to use the ml-1m data to build a recommender model for users. What seems odd to me is that the model performs better without the user features. Did I do something wrong when adding the features, or is this normal?
Fitting the dataset
dataset = Dataset()
dataset.fit(users=(row['UserID'] for index, row in users_df.iterrows()),
            items=(row['MovieID'] for index, row in movie_df.iterrows()),
            user_features=set(user_features_flat))
Creating the interaction and feature matrix
(interactions, weights) = dataset.build_interactions((row['UserID'],row['MovieID'],row['rating']) for index,row in ratings_df.iterrows())
user_feature_matrix = dataset.build_user_features((row['UserID'], [row['Gender'],row['Occupation'],row['age_group']]) for index,row in users.iterrows())
Model with user features
model = LightFM(no_components=70, loss='warp')
model.fit(interactions, user_features=user_feature_matrix, item_features=None, sample_weight=None, epochs=70, num_threads=4)
p_k = evaluation.precision_at_k(model, test, k=10, user_features=user_feature_matrix, item_features=None, preserve_rows=False, num_threads=4, check_intersections=True).mean()
p_k  # 0.14658715
Model without user features
model_cf = LightFM(no_components=70, loss='warp')
model_cf.fit(interactions, user_features=None, item_features=None, sample_weight=None, epochs=70, num_threads=4)
p_k_cf = evaluation.precision_at_k(model_cf, test, k=10, user_features=None, item_features=None, preserve_rows=False, num_threads=4, check_intersections=True).mean()
p_k_cf  # 0.1638668
You can see how the RMSE changes below, using the example from the README.
Why does it not converge, and how could I fix it?
--- git/pyFM ‹master* ?› » python example.py
Creating validation dataset of 0.01 of training for adaptive regularization
-- Epoch 1
Training RMSE: 0.49676
-- Epoch 2
Training RMSE: 0.44940
-- Epoch 3
Training RMSE: 0.44133
-- Epoch 4
Training RMSE: 0.43757
-- Epoch 5
Training RMSE: 0.43599
-- Epoch 6
Training RMSE: 0.43494
-- Epoch 7
Training RMSE: 0.43381
-- Epoch 8
Training RMSE: 0.43375
-- Epoch 9
Training RMSE: 0.43324
-- Epoch 10
Training RMSE: 0.43272
-- Epoch 11
Training RMSE: 0.43310
-- Epoch 12
Training RMSE: 0.43255
-- Epoch 13
Training RMSE: 0.43229
-- Epoch 14
Training RMSE: 0.43235
-- Epoch 15
Training RMSE: 0.43214
-- Epoch 16
Training RMSE: 0.43237
-- Epoch 17
Training RMSE: 0.43242
-- Epoch 18
Training RMSE: 0.43247
-- Epoch 19
Training RMSE: 0.43308
-- Epoch 20
Training RMSE: 0.44136
-- Epoch 21
Training RMSE: 0.44681
-- Epoch 22
Training RMSE: 0.44714
-- Epoch 23
Training RMSE: nan
I am trying to install pyFM on my machine using Python 2.7. During the installation I receive the following error: LINK: fatal error LNK1181: cannot open input file 'm.lib'
What does this error mean?
I can't find a way to save the model. Could someone help me solve this? Thanks.
Right now I can run this:
fm = pylibfm.FM()
fm.fit(X,y)
fm.predict(v.transform({"user": "1", "item": "10", "age": 24}))
but I can't find a way to save the model.
Creating validation dataset of 0.01 of training for adaptive regularization
Traceback (most recent call last):
File "suanfa.py", line 25, in
fm.fit(trainX,trainY)
File "/home/ks/anaconda3/lib/python3.5/site-packages/pyfm/pylibfm.py", line 181, in fit
X_train_dataset = _make_dataset(X_train, train_labels)
File "/home/ks/anaconda3/lib/python3.5/site-packages/pyfm/pylibfm.py", line 239, in _make_dataset
dataset = CSRDataset(X.data, X.indptr, X.indices, y_i, sample_weight)
AttributeError: 'numpy.ndarray' object has no attribute 'indptr'
Can I access w0, w, and v of the trained model?
If so, please tell me how to access them.
thanks
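Once the three parameters are available as arrays, a second-order factorization machine score can be reproduced outside the library. The sketch below assumes w0 is a scalar, w has shape (n_features,), and v has shape (n_features, num_factors). How pyFM names or lays out these attributes is not shown here and would need to be checked in its source (v may be stored transposed). The function names are hypothetical, and the random parameters only serve to compare the fast identity against the naive pairwise sum.

```python
import numpy as np

def fm_score(x, w0, w, v):
    """Raw FM score using the O(k * n) identity for the pairwise term."""
    linear = w0 + x @ w
    xv = x @ v                    # shape (num_factors,)
    x2v2 = (x ** 2) @ (v ** 2)    # shape (num_factors,)
    return linear + 0.5 * np.sum(xv ** 2 - x2v2)

def fm_score_naive(x, w0, w, v):
    """Same score with the explicit sum over feature pairs i < j."""
    total = w0 + x @ w
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            total += (v[i] @ v[j]) * x[i] * x[j]
    return total

rng = np.random.default_rng(0)
n_features, num_factors = 5, 3
w0 = 0.1
w = rng.standard_normal(n_features)
v = rng.standard_normal((n_features, num_factors))
x = rng.standard_normal(n_features)

fast = fm_score(x, w0, w, v)
slow = fm_score_naive(x, w0, w, v)
print(fast, slow)
```

For a classification model the raw score would still need to pass through a sigmoid once to become a probability.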
This is giving errors, am I missing something?
from scipy import sparse
from sklearn.datasets import load_boston
import pylibfm
fm = pylibfm.FM(num_factors=7, num_iter=6, verbose=True)
# load dataset
boston = load_boston()
# fit FM, making sure to wrap the ndarray as a sparse csr
fm.fit(sparse.csr_matrix(boston.data), boston.target)
Creating validation dataset of 0.01 of training for adaptive regularization
-- Epoch 1
Training log loss: nan
-- Epoch 2
Training log loss: nan
-- Epoch 3
Training log loss: nan
-- Epoch 4
Training log loss: nan
-- Epoch 5
Training log loss: nan
-- Epoch 6
Training log loss: nan
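Two things in this run may matter. The log reports log loss, which suggests the model was fitted as a classifier even though the Boston target is a continuous price; the default FM() run earlier on this page also prints log loss, so classification appears to be the default, and, as far as I recall, the README's regression example passes task="regression". The features are also on very different scales, which tends to destabilize SGD. Below is a minimal sketch of rescaling each column to [0, 1] before wrapping it as CSR. It uses random stand-in data, since load_boston has been removed from recent scikit-learn releases, and whether scaling alone removes the nan here is untested.

```python
import numpy as np
from scipy import sparse

rng = np.random.default_rng(0)

# Stand-in feature matrix (assumption): three columns on very different scales.
X = rng.uniform(0.0, 1.0, size=(50, 3)) * np.array([1.0, 100.0, 700.0])

# Min-max scale each column to [0, 1]; guard against constant columns.
col_min = X.min(axis=0)
col_range = X.max(axis=0) - col_min
col_range[col_range == 0] = 1.0
X_scaled = (X - col_min) / col_range

# Wrap as CSR, as in the snippet above, before handing it to fit.
X_csr = sparse.csr_matrix(X_scaled)
print(X_scaled.min(axis=0), X_scaled.max(axis=0), X_csr.shape)
```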
pylibfm is out of the game. It is slow, it crashes on large datasets, it sometimes simply diverges, and it can hardly compete in quality.
https://arogozhnikov.github.io/2016/02/15/TestingLibFM.html
Great code, thanks!
Please help me understand:
1. Will it work for third-order interactions between categorical features?
2. Will it run on a Windows computer?
3. Will it work for sparse data?
I was using pyFM to classify a data set. When only 50% of the data were used, the program went on normally. But when I tried to use all the data (the amount was about 300,000), an error occurred in pycharm at fm.fit(): Process finished with exit code -1073741819 (0xC0000005). I wonder if it was running out of memory?
Train_x and Test_x are both scipy sparse matrices. fm.fit() runs normally, but predict fails with the error "indptr not found" when it calls CSRDataset(). Why does this error not occur in fit()?
Hi,
From the code I see:
# Regularization Parameters (start with no regularization)
self.reg_0 = 0.0
self.reg_w = 0.0
self.reg_v = np.repeat(0.0, num_factors)
However, I don't see where the regularization parameters are updated. Moreover, after fitting the model, I printed model.reg_v to inspect the regularization parameters, and it gave me an array of zeros. Does the model impose any regularization on its parameters? Thanks.
There are many warnings during installation, for example:
pyfm_fast.c(3174): warning C4018: '<': signed/unsigned mismatch
Microsoft Windows [Version 10.0.16299.371]
(c) 2017 Microsoft Corporation. All rights reserved.
E:\Factorisation machens\how to install\pyfm_installation\pyFM-master\pyFM-master>python setup.py build_ext --inplace
running build_ext
cythoning pyfm_fast.pyx to pyfm_fast.c
building 'pyfm_fast' extension
creating build
creating build\temp.win-amd64-3.6
creating build\temp.win-amd64-3.6\Release
C:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\BIN\x86_amd64\cl.exe /c /nologo /Ox /W3 /GL /DNDEBUG /MD -IC:\Users\sndr\Anaconda3\lib\site-packages\numpy\core\include -IC:\Users\sndr\Anaconda3\include -IC:\Users\sndr\Anaconda3\include "-IC:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\INCLUDE" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.10240.0\ucrt" "-IC:\Program Files (x86)\Windows Kits\8.1\include\shared" "-IC:\Program Files (x86)\Windows Kits\8.1\include\um" "-IC:\Program Files (x86)\Windows Kits\8.1\include\winrt" /Tcpyfm_fast.c /Fobuild\temp.win-amd64-3.6\Release\pyfm_fast.obj
pyfm_fast.c
c:\users\sndr\anaconda3\lib\site-packages\numpy\core\include\numpy\npy_1_7_deprecated_api.h(12) : Warning Msg: Using deprecated NumPy API, disable it by #defining NPY_NO_DEPRECATED_API NPY_1_7_API_VERSION
pyfm_fast.c(3174): warning C4018: '<': signed/unsigned mismatch
pyfm_fast.c(3214): warning C4018: '<': signed/unsigned mismatch
pyfm_fast.c(3245): warning C4018: '<': signed/unsigned mismatch
pyfm_fast.c(3843): warning C4018: '<': signed/unsigned mismatch
pyfm_fast.c(3910): warning C4018: '<': signed/unsigned mismatch
pyfm_fast.c(3941): warning C4018: '<': signed/unsigned mismatch
pyfm_fast.c(4885): warning C4018: '<': signed/unsigned mismatch
pyfm_fast.c(4953): warning C4018: '<': signed/unsigned mismatch
pyfm_fast.c(4964): warning C4018: '<': signed/unsigned mismatch
pyfm_fast.c(5505): warning C4018: '<': signed/unsigned mismatch
pyfm_fast.c(5572): warning C4018: '<': signed/unsigned mismatch
pyfm_fast.c(5610): warning C4018: '<': signed/unsigned mismatch
pyfm_fast.c(5996): warning C4018: '<': signed/unsigned mismatch
pyfm_fast.c(14070): warning C4244: 'initializing': conversion from 'double' to 'float', possible loss of data
pyfm_fast.c(14076): warning C4244: 'initializing': conversion from 'double' to 'float', possible loss of data
C:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\BIN\x86_amd64\link.exe /nologo /INCREMENTAL:NO /LTCG /DLL /MANIFEST:EMBED,ID=2 /MANIFESTUAC:NO /LIBPATH:C:\Users\sndr\Anaconda3\libs /LIBPATH:C:\Users\sndr\Anaconda3\PCbuild\amd64 "/LIBPATH:C:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\LIB\amd64" "/LIBPATH:C:\Program Files (x86)\Windows Kits\10\lib\10.0.10240.0\ucrt\x64" "/LIBPATH:C:\Program Files (x86)\Windows Kits\8.1\lib\winv6.3\um\x64" /EXPORT:PyInit_pyfm_fast build\temp.win-amd64-3.6\Release\pyfm_fast.obj "/OUT:E:\Factorisation machens\how to install\pyfm_installation\pyFM-master\pyFM-master\pyfm_fast.cp36-win_amd64.pyd" /IMPLIB:build\temp.win-amd64-3.6\Release\pyfm_fast.cp36-win_amd64.lib
pyfm_fast.obj : warning LNK4197: export 'PyInit_pyfm_fast' specified multiple times; using first specification
Creating library build\temp.win-amd64-3.6\Release\pyfm_fast.cp36-win_amd64.lib and object build\temp.win-amd64-3.6\Release\pyfm_fast.cp36-win_amd64.exp
Generating code
Finished generating code
E:\Factorisation machens\how to install\pyfm_installation\pyFM-master\pyFM-master>