Comments (15)
@david-cortes that's interesting information. Do biases help to reduce loss?
My intuition is that item biases should be highly correlated with item popularity. I would expect them to help significantly for users with few interactions.
As for centering - I'm not sure if it will help.
from rsparse.
I'm realizing there might be a bug with the item biases in cmfrec. But here's a comparison with centering and user biases for now:
library(Matrix)
library(rsparse)
library(cmfrec)
data("movielens100k")
eval.full.loss <- function(X, W, A, B, user_bias=NULL, item_bias=NULL, glob_mean=NULL) {
    Xdense <- as(X, "matrix")
    X@x <- W
    Wdense <- as(X, "matrix")
    pred <- A %*% t(B)
    if (!is.null(glob_mean))
        pred <- pred + glob_mean
    if (NROW(user_bias))
        pred <- pred + user_bias
    if (NROW(item_bias))
        pred <- pred + item_bias
    err <- Wdense * ((Xdense - pred)^2)
    return(mean(err))
}
Xcoo <- as(movielens100k, "TsparseMatrix")
Xvalues <- Xcoo@x
Xcoo@x <- rep(1, length(Xcoo@x))
set.seed(123)
m.nobias.nocenter <- CMF(Xcoo, weight=Xvalues, NA_as_zero=TRUE, k=10, verbose=FALSE,
center=FALSE, user_bias=FALSE, item_bias=FALSE)
eval.full.loss(Xcoo, Xvalues,
t(m.nobias.nocenter$matrices$A),
t(m.nobias.nocenter$matrices$B))
set.seed(123)
m.nobias.center <- CMF(Xcoo, weight=Xvalues, NA_as_zero=TRUE, k=10, verbose=FALSE,
center=TRUE, user_bias=FALSE, item_bias=FALSE)
eval.full.loss(Xcoo, Xvalues,
t(m.nobias.center$matrices$A),
t(m.nobias.center$matrices$B),
glob_mean = m.nobias.center$matrices$glob_mean)
set.seed(123)
m.userbias.center <- CMF(Xcoo, weight=Xvalues, NA_as_zero=TRUE, k=10, verbose=FALSE,
center=TRUE, user_bias=TRUE, item_bias=FALSE)
eval.full.loss(Xcoo, Xvalues,
t(m.userbias.center$matrices$A),
t(m.userbias.center$matrices$B),
user_bias = m.userbias.center$matrices$user_bias,
glob_mean = m.userbias.center$matrices$glob_mean)
Results:
[1] 0.04399711
[1] 0.04234777
[1] 0.04188822
So the change doesn't look too big.
from rsparse.
@dselivanov My bad, there's actually no bug in cmfrec, it's this loss function that was wrong. Doing the experiment again:
EDIT: just realized that R doesn't do matrix + row-vector sums, here's the function redone and compared with all the biases
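eval.full.loss <- function(X, W, A, B, user_bias=NULL, item_bias=NULL, glob_mean=NULL) {
    Xdense <- as(X, "matrix")
    # Weights are 1 for missing entries and W for observed ones:
    # store W - 1 in the sparse slots, densify, then add 1 everywhere
    X@x <- W - 1
    Wdense <- as(X, "matrix") + 1
    pred <- A %*% t(B)
    if (!is.null(glob_mean))
        pred <- pred + glob_mean
    # user_bias has one value per row, so R's column-wise recycling lines up
    if (NROW(user_bias))
        pred <- pred + user_bias
    # item_bias has one value per column, so it must be expanded row-wise
    if (NROW(item_bias))
        pred <- pred + matrix(rep(item_bias, nrow(pred)), nrow=nrow(pred), byrow = TRUE)
    err <- Wdense * ((Xdense - pred)^2)
    return(mean(err))
}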
[1] 0.0726521 ## no bias, no center
[1] 0.07276726 ## center
[1] 0.07216536 ## center + user bias
[1] 0.07158445 ## center + user bias + item bias
So in the end the item biases do bring some small improvement in terms of loss, and my earlier tests were wrong (was summing them incorrectly).
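For readers who want to see the summing issue in isolation, here is a small base-R sketch on a toy matrix (the names `pred` and `item_bias` are only reused for illustration, and the numbers are made up). A plain `+` recycles a vector down the columns, which suits per-row user biases but misaligns per-column item biases:

```r
pred <- matrix(0, nrow = 2, ncol = 3)
item_bias <- c(10, 20, 30)  # one value per column (item)

# Plain `+` recycles the vector in column-major order, so the
# biases end up attached to the wrong columns
wrong <- pred + item_bias

# Two equivalent ways of adding a per-column vector
right <- sweep(pred, 2, item_bias, `+`)
same  <- pred + matrix(item_bias, nrow(pred), ncol(pred), byrow = TRUE)

print(wrong)
print(right)
```

In `wrong` the first row comes out as 10, 30, 20 instead of 10, 20, 30, which is the kind of misalignment the earlier loss function suffered from.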
from rsparse.
@david-cortes thanks for the example. I will try to play with cmfrec and user-item biases. Could you provide a link to the code where user-item biases for implicit feedback are implemented?
from rsparse.
It's implemented through different functions. First it has some aggregated steps like this:
https://github.com/david-cortes/cmfrec/blob/259057fcb59f2c0115f9737c6a18cbe1347925e9/src/collective.c#L7518
Then it calls the function factors_closed_form in a loop here:
https://github.com/david-cortes/cmfrec/blob/259057fcb59f2c0115f9737c6a18cbe1347925e9/src/common.c#L625
(key there are the variables named "bias")
If using the CG method, that function will then end up calling this other one:
https://github.com/david-cortes/cmfrec/blob/259057fcb59f2c0115f9737c6a18cbe1347925e9/src/common.c#L1067
But overall, the idea is that you're solving a system like this:
solve(t(W*X)*X + diag(lambda), t(W*X)*(Y-glob_bias-column_bias))
In which the RHS can be decomposed into some parts that apply to all rows and some parts that turn to zero for missing entries (where W = 1 and Y = 0):
t(W*X)*(Y) - t((W-1)*X)*(glob_bias+column_bias) - t(X)*(glob_bias+column_bias)
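The decomposition can be checked numerically. This is only a base-R sketch on random data, not the cmfrec code: it assumes a single row being solved, weights equal to 1 for missing entries, and a vector `b` standing in for glob_bias + column_bias. All variable names are made up for the illustration.

```r
set.seed(1)
n <- 6; k <- 3; lambda <- 0.1
X <- matrix(rnorm(n * k), n, k)  # fixed factors, one row per item
y <- c(1, 0, 0, 1, 0, 1)         # implicit targets, 0 = missing entry
w <- 1 + y * runif(n, 1, 5)      # weight is 1 wherever the entry is missing
b <- rnorm(n)                    # stands in for glob_bias + column_bias

# X * w scales row i of X by w[i], i.e. it plays the role of W*X
lhs <- crossprod(X * w, X) + diag(lambda, k)

# Right-hand side computed directly ...
rhs_direct <- crossprod(X * w, y - b)

# ... and split into a term over observed entries of y, a term that
# vanishes for missing entries (w - 1 = 0), and a term shared by all rows
rhs_split <- crossprod(X * w, y) -
  crossprod(X * (w - 1), b) -
  crossprod(X, b)

factors <- solve(lhs, rhs_direct)
print(max(abs(rhs_direct - rhs_split)))
```

Only the last term, t(X) times the biases, touches every entry, and it does not depend on the row being solved, so it can be computed once and reused.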
from rsparse.
@david-cortes I have some challenges with cmfrec... Would be great if you can provide an example on how to predict top n items for new users. Here I've put a template:
library(Matrix)
library(rsparse)
library(cmfrec)
data(movielens100k)
set.seed(1)
# take 100 users for validation
i = sample(nrow(movielens100k), 100)
val = movielens100k[i, ]
train = movielens100k[-i, ]
# now mark 30% of the interactions as observed and
# 70% as unobserved - will evaluate map@k at these 70%
val_split = rsparse:::train_test_split(val, test_proportion = 0.7)
str(val_split)
List of 2
$ train:Formal class 'dgCMatrix' [package "Matrix"] with 6 slots
.. ..@ i : int [1:3039] 10 13 24 32 45 46 52 54 55 59 ...
.. ..@ p : int [1:1683] 0 13 17 20 26 30 31 42 53 61 ...
.. ..@ Dim : int [1:2] 100 1682
.. ..@ Dimnames:List of 2
.. .. ..$ : chr [1:100] "836" "679" "129" "930" ...
.. .. ..$ : chr [1:1682] "1" "2" "3" "4" ...
.. ..@ x : num [1:3039] 3 3 4 3 4 2 5 5 4 5 ...
.. ..@ factors : list()
$ test :Formal class 'dgCMatrix' [package "Matrix"] with 6 slots
.. ..@ i : int [1:6983] 1 3 5 6 9 11 14 30 31 33 ...
.. ..@ p : int [1:1683] 0 41 48 54 66 71 72 102 117 137 ...
.. ..@ Dim : int [1:2] 100 1682
.. ..@ Dimnames:List of 2
.. .. ..$ : chr [1:100] "836" "679" "129" "930" ...
.. .. ..$ : chr [1:1682] "1" "2" "3" "4" ...
.. ..@ x : num [1:6983] 3 3 4 3 5 4 5 4 3 3 ...
.. ..@ factors : list()
train = as(train, "TsparseMatrix")
w = train@x
train@x = rep(1, length(train@x))
model = CMF(train,
weight = w,
NA_as_zero = TRUE,
k = 10,
verbose = TRUE,
center = FALSE,
user_bias = FALSE,
item_bias = FALSE)
Now the question is how can I use topN_new() to make predictions based on val_split$train and validate against val_split$test?
from rsparse.
Sorry, I had a bug with topN that threw incorrect results when not using biases, fixed now.
The easiest way to use that function is to pass the data as a sparseVector from Matrix:
val.csr = as(val, "RsparseMatrix")
w.val = val.csr@x
val.csr@x = rep(1, length(val.csr@x))
### TopN for row 1 in val
topN_new(model, as(val.csr[1L, , drop=FALSE], "sparseVector"),
weight = w.val[seq(val.csr@p[1L] + 1L, val.csr@p[2L])])
Although you could also take the matrices and replace the values of $components from a WRMF object. These are available under model$matrices (A is the user factors, B is the item factors, and the biases are user_bias, item_bias, glob_mean).
from rsparse.
> Although you could also take the matrices and replace the values of $components from a WRMF object. These are available under model$matrices (A is the user factors, B is the item factors, and the biases are user_bias, item_bias, glob_mean)
But there is still an ALS step when the item embeddings are fixed, and this should call the cmfrec solver under the hood.
map@k
Here is how I calculate map@k with cmfrec:
predict_cmfrec = function(model, X) {
  n = nrow(X)
  X = as(X, "RsparseMatrix")
  res = lapply(seq_len(n), function (i) {
    if (i %% 10 == 0) message(sprintf("%d/%d", i, n))
    x = as(X[i, , drop=FALSE], "sparseVector")
    w = x@x
    x@x = rep(1, length(x@x))
    preds = topN_new(
      model,
      x,
      weight = w,
      exclude = x@i
    )
    preds
  })
  do.call(rbind, res)
}
preds = predict_cmfrec(model, val_split$train)
mean(rsparse::ap_k(preds, val_split$test))
For lastfm360 there looks to be a moderate lift in map@10:
- 0.2880954 without user and item biases
- 0.298398 with user and item biases
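For reference, here is a base-R sketch of what an average-precision-at-k score measures for one user. It follows one common definition (precision at each hit, averaged over min(k, number of relevant items)); whether rsparse::ap_k normalizes in exactly this way is an assumption I have not checked, and `ap_at_k` is a hypothetical helper, not a function from either package.

```r
# Average precision at k for a single user.
# ranked:   item ids ordered from most to least recommended
# relevant: item ids the user actually interacted with in the test split
ap_at_k <- function(ranked, relevant, k = 10L) {
  ranked <- head(ranked, k)
  hits <- ranked %in% relevant
  if (!any(hits) || length(relevant) == 0L) return(0)
  # precision measured at each rank, kept only where a hit occurs
  precision_at_rank <- cumsum(hits) / seq_along(hits)
  sum(precision_at_rank[hits]) / min(k, length(relevant))
}

# Hits at ranks 2 and 4, three relevant items in total
score <- ap_at_k(c(3, 7, 1, 9), relevant = c(7, 9, 20), k = 4)
print(score)
```

map@k is then the mean of this score over all validation users, which is what the mean(...) call above computes from the per-user values.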
from rsparse.
@david-cortes also, when lambda > 0 I observe a very strong correlation between item bias and item popularity. Interestingly, when lambda is close to 0 this is not the case. I think there is a chance that some issue in the code is related to this.
from rsparse.
I'll have to guess that this increment in MAP@10 is less than what you'd see from using k+2 factors, ergo not worth it.
The low correlation might be due to numerical instability when using too small a lambda. By default it uses a CG solver and then switches to Cholesky in the last iteration, so perhaps it'd look a bit better using finalize_chol=FALSE.
Or perhaps it could be due to how you're measuring popularity. There is also a model MostPopular which will calculate only the biases, using their closed-form solution:
model = MostPopular(X, implicit=TRUE, lambda=0)
from rsparse.
> I'll have to guess that this increment in MAP@10 is less than what you'd see from using k+2 factors, ergo not worth it.
Well, it would be fairer to compare against a model with k+1 factors. Also, biases should make a huge difference for users with few or no interactions.
from rsparse.
What I've figured out so far:
- Only the code for rhs is affected.
- rhs = X * C_u * (p_u - x_biases) = X * eye * (0 - x_biases) + X * diag(1 + f(r_ui)) * (1 - x_biases)
- rhs_init = X * eye * (0 - x_biases) = -X * x_biases can be precomputed.
- Then for each user we calculate:
  rhs = rhs_init + X.cols(idx_nnz) * x_biases(idx_nnz) (removing the p=0 terms from init)
  rhs = rhs + X.cols(idx_nnz) * diag(confidence(idx_nnz)) * (1 - x_biases(idx_nnz)) (adding the p=1 terms)
from rsparse.
But that can be simplified further:
rhs = rhs_init + X_nnz * C_u - X_nnz * ((C_u-1) * (x_biases))
rhs = rhs_init + X_nnz * (C_u - (C_u-1) * x_biases)
That way you also avoid an extra matrix multiplication.
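A quick base-R check that the two-step update and the simplified one agree, and that both match the direct definition X * C_u * (p_u - x_biases). This is a sketch on random data; the names mirror the pseudocode above (`conf` stands for the diagonal of C_u on the non-zero entries), and the sizes and values are arbitrary.

```r
set.seed(42)
k <- 4; n_items <- 8
X <- matrix(rnorm(k * n_items), k, n_items)  # item factors, one column per item
x_biases <- rnorm(n_items)
idx_nnz <- c(2, 5, 7)                        # items this user interacted with
conf <- 1 + c(3, 1, 6)                       # confidence 1 + f(r_ui) on those items

rhs_init <- -X %*% x_biases                  # shared across users, precomputable

X_nnz <- X[, idx_nnz, drop = FALSE]
b_nnz <- x_biases[idx_nnz]

# Two-step version: undo the p=0 terms, then add the p=1 terms
rhs_two <- rhs_init + X_nnz %*% b_nnz + X_nnz %*% (conf * (1 - b_nnz))

# Simplified version: a single product with X_nnz
rhs_one <- rhs_init + X_nnz %*% (conf - (conf - 1) * b_nnz)

# Direct definition over all items, for comparison
p <- numeric(n_items); p[idx_nnz] <- 1
c_full <- rep(1, n_items); c_full[idx_nnz] <- conf
rhs_direct <- X %*% (c_full * (p - x_biases))

print(max(abs(rhs_one - rhs_direct)))
```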
from rsparse.
Yeah, I've done that in #54. However, the results are quite different from cmfrec, and map@k is significantly worse compared to the model without biases.
The user/item biases, on the other hand, are highly correlated with popularity...
from rsparse.
By the way, it's also straightforward to add it to the CG method: you just need to modify the calculation for the first residual.
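To illustrate the point about the first residual, here is a minimal textbook conjugate gradient in base R. It is a sketch, not the rsparse or cmfrec solver: it forms the system matrix densely for clarity, whereas a real implementation would avoid that. `cg_solve` and all the data below are hypothetical. The biases enter only through `rhs`, which is used once, when the first residual is computed.

```r
# Conjugate gradient for a symmetric positive-definite system A x = rhs
cg_solve <- function(A, rhs, x0 = numeric(length(rhs)), n_iter = 10L, tol = 1e-20) {
  x <- x0
  r <- rhs - drop(A %*% x)  # first residual: the only place rhs enters
  p <- r
  rs_old <- sum(r * r)
  for (it in seq_len(n_iter)) {
    if (rs_old < tol) break
    Ap <- drop(A %*% p)
    alpha <- rs_old / sum(p * Ap)
    x <- x + alpha * p
    r <- r - alpha * Ap
    rs_new <- sum(r * r)
    p <- r + (rs_new / rs_old) * p
    rs_old <- rs_new
  }
  x
}

set.seed(7)
k <- 4; n_items <- 8; lambda <- 0.1
X <- matrix(rnorm(k * n_items), k, n_items)  # item factors, one column per item
x_biases <- rnorm(n_items, sd = 0.1)
idx_nnz <- c(1, 4, 6)
conf <- c(3, 2, 5)                           # confidence on the non-zero entries

X_nnz <- X[, idx_nnz, drop = FALSE]
# X X' + X_nnz diag(conf - 1) X_nnz' + lambda I: identical with or without biases
A <- tcrossprod(X) + X_nnz %*% ((conf - 1) * t(X_nnz)) + diag(lambda, k)

# Right-hand side with biases, using the simplified form from the thread
rhs_bias <- drop(-X %*% x_biases +
                 X_nnz %*% (conf - (conf - 1) * x_biases[idx_nnz]))

a_bias <- cg_solve(A, rhs_bias)
print(max(abs(a_bias - solve(A, rhs_bias))))
```

Since the matrix A is the same in both cases, a CG routine written for the model without biases should only need the bias-aware right-hand side in that first residual line.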
from rsparse.