
Comments (15)

dselivanov commented on May 27, 2024

@david-cortes that's interesting information. Do biases help to reduce loss?

My intuition is that item biases should be highly correlated with item popularity. I would expect them to help significantly for users with few interactions.

As for centering - I'm not sure if it will help.


david-cortes commented on May 27, 2024

I'm realizing there might be a bug with the item biases in cmfrec. But here's a comparison with centering and user biases for now:

library(Matrix)
library(rsparse)
library(cmfrec)
data("movielens100k")

eval.full.loss <-  function(X, W, A, B, user_bias=NULL, item_bias=NULL, glob_mean=NULL) {
    Xdense <- as(X, "matrix")
    X@x <- W
    Wdense <- as(X, "matrix")
    pred <- A %*% t(B)
    if (!is.null(glob_mean))
        pred <- pred + glob_mean
    if (NROW(user_bias))
        pred <- pred + user_bias
    if (NROW(item_bias))
        pred <- pred + item_bias
    err <- Wdense * ((Xdense - pred)^2)
    return(mean(err))
}

Xcoo <- as(movielens100k, "TsparseMatrix")
Xvalues <- Xcoo@x
Xcoo@x <- rep(1, length(Xcoo@x))

set.seed(123)
m.nobias.nocenter <- CMF(Xcoo, weight=Xvalues, NA_as_zero=TRUE, k=10, verbose=FALSE,
                         center=FALSE, user_bias=FALSE, item_bias=FALSE)
eval.full.loss(Xcoo, Xvalues,
               t(m.nobias.nocenter$matrices$A),
               t(m.nobias.nocenter$matrices$B))



set.seed(123)
m.nobias.center <- CMF(Xcoo, weight=Xvalues, NA_as_zero=TRUE, k=10, verbose=FALSE,
                       center=TRUE, user_bias=FALSE, item_bias=FALSE)
eval.full.loss(Xcoo, Xvalues,
               t(m.nobias.center$matrices$A),
               t(m.nobias.center$matrices$B),
               glob_mean = m.nobias.center$matrices$glob_mean)


set.seed(123)
m.userbias.center <- CMF(Xcoo, weight=Xvalues, NA_as_zero=TRUE, k=10, verbose=FALSE,
                         center=TRUE, user_bias=TRUE, item_bias=FALSE)
eval.full.loss(Xcoo, Xvalues,
               t(m.userbias.center$matrices$A),
               t(m.userbias.center$matrices$B),
               user_bias = m.userbias.center$matrices$user_bias,
               glob_mean = m.userbias.center$matrices$glob_mean)

Results:

[1] 0.04399711
[1] 0.04234777
[1] 0.04188822

So the change doesn't look too big.


david-cortes commented on May 27, 2024

@dselivanov My bad, there's actually no bug in cmfrec, it's this loss function that was wrong. Doing the experiment again:

EDIT: I just realized that R doesn't broadcast a row vector when adding it to a matrix; here's the function redone and compared with all the biases:

eval.full.loss <- function(X, W, A, B, user_bias=NULL, item_bias=NULL, glob_mean=NULL) {
    Xdense <- as(X, "matrix")
    ## with NA_as_zero=TRUE, missing entries count as zeros with weight 1,
    ## so store W-1 in the sparse slot and add 1 back after densifying
    X@x <- W - 1
    Wdense <- as(X, "matrix") + 1
    pred <- A %*% t(B)
    if (!is.null(glob_mean))
        pred <- pred + glob_mean
    if (NROW(user_bias))
        pred <- pred + user_bias  ## recycles down columns: adds user_bias[i] to row i
    if (NROW(item_bias))          ## item biases must be added row-wise, hence 'byrow'
        pred <- pred + matrix(rep(item_bias, nrow(pred)), nrow=nrow(pred), byrow=TRUE)
    err <- Wdense * ((Xdense - pred)^2)
    return(mean(err))
}
Results:

[1] 0.0726521  ## no bias, no center
[1] 0.07276726 ## center
[1] 0.07216536 ## center + user bias
[1] 0.07158445 ## center + user bias + item bias
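As a side note (plain base R, not part of the model code), the row-vector point above can be seen directly: adding a length-ncol vector to a matrix recycles it column-wise rather than adding it to each row:

m <- matrix(0, nrow=2, ncol=3)
v <- c(10, 20, 30)
m + v                                                   ## recycled down the columns, NOT row-wise
m + matrix(rep(v, nrow(m)), nrow=nrow(m), byrow=TRUE)   ## adds v to each row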

So in the end the item biases do bring some small improvement in terms of loss, and my earlier tests were wrong (I was summing the biases incorrectly).


dselivanov commented on May 27, 2024

@david-cortes thanks for the example. I will try to play with cmfrec and user/item biases. Could you provide a link to the code where the user/item biases for implicit feedback are implemented?


david-cortes commented on May 27, 2024

It's implemented through different functions. First it has some aggregated steps like this:
https://github.com/david-cortes/cmfrec/blob/259057fcb59f2c0115f9737c6a18cbe1347925e9/src/collective.c#L7518

Then it calls the function factors_closed_form in a loop here:
https://github.com/david-cortes/cmfrec/blob/259057fcb59f2c0115f9737c6a18cbe1347925e9/src/common.c#L625
(the key parts there are the variables named "bias")

If using the CG method, that function will then end up calling this other one:
https://github.com/david-cortes/cmfrec/blob/259057fcb59f2c0115f9737c6a18cbe1347925e9/src/common.c#L1067

But overall, the idea is that you're solving a system like this:

solve(t(W*X)*X + diag(lambda),   t(W*X)*(Y - glob_bias - column_bias))

In which the RHS can be decomposed into some parts that apply to all rows and some parts that turn to zero for missing entries:

t(W*X)*Y - t((W-1)*X)*(glob_bias+column_bias) - t(X)*(glob_bias+column_bias)
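A minimal numeric check of that decomposition, in plain base R with made-up objects (here X stands for the item factors, W for the weights with 1 at missing entries, and the global and column biases are lumped into one vector):

set.seed(1)
n <- 6; k <- 3
X <- matrix(rnorm(n * k), n, k)          ## item factors (made-up values)
Y <- numeric(n); Y[c(2, 5)] <- 1         ## implicit targets: 1 for observed entries
W <- rep(1, n); W[c(2, 5)] <- 1 + 10     ## weight 1 for missing, 1 + confidence otherwise
bias <- rnorm(n)                         ## glob_bias + column_bias per item

lhs <- t(W * X) %*% (Y - bias)
rhs <- t(W * X) %*% Y -
       t((W - 1) * X) %*% bias -
       t(X) %*% bias
all.equal(as.numeric(lhs), as.numeric(rhs))   ## TRUE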


dselivanov commented on May 27, 2024

@david-cortes I have some challenges with cmfrec... It would be great if you could provide an example of how to predict the top-n items for new users. Here's a template:

library(Matrix)
library(rsparse)
library(cmfrec)
data(movielens100k)

set.seed(1)
# take 100 users for validation
i = sample(nrow(movielens100k), 100)

val = movielens100k[i, ]
train = movielens100k[-i, ]

# now mark 30% of the interactions as observed and
# 70% as unobserved - will evaluate map@k at these 70%
val_split = rsparse:::train_test_split(val, test_proportion = 0.7)
str(val_split)

List of 2
$ train:Formal class 'dgCMatrix' [package "Matrix"] with 6 slots
.. ..@ i : int [1:3039] 10 13 24 32 45 46 52 54 55 59 ...
.. ..@ p : int [1:1683] 0 13 17 20 26 30 31 42 53 61 ...
.. ..@ Dim : int [1:2] 100 1682
.. ..@ Dimnames:List of 2
.. .. ..$ : chr [1:100] "836" "679" "129" "930" ...
.. .. ..$ : chr [1:1682] "1" "2" "3" "4" ...
.. ..@ x : num [1:3039] 3 3 4 3 4 2 5 5 4 5 ...
.. ..@ factors : list()
$ test :Formal class 'dgCMatrix' [package "Matrix"] with 6 slots
.. ..@ i : int [1:6983] 1 3 5 6 9 11 14 30 31 33 ...
.. ..@ p : int [1:1683] 0 41 48 54 66 71 72 102 117 137 ...
.. ..@ Dim : int [1:2] 100 1682
.. ..@ Dimnames:List of 2
.. .. ..$ : chr [1:100] "836" "679" "129" "930" ...
.. .. ..$ : chr [1:1682] "1" "2" "3" "4" ...
.. ..@ x : num [1:6983] 3 3 4 3 5 4 5 4 3 3 ...
.. ..@ factors : list()

train = as(train, "TsparseMatrix")
w = train@x
train@x = rep(1, length(train@x))

model = CMF(train, 
            weight = w, 
            NA_as_zero = TRUE, 
            k = 10, 
            verbose = TRUE, 
            center = FALSE, 
            user_bias = FALSE, 
            item_bias = FALSE)

Now the question is: how can I use topN_new() to make predictions based on val_split$train and validate against val_split$test?


david-cortes commented on May 27, 2024

Sorry, there was a bug in topN that produced incorrect results when not using biases; it's fixed now.

The easiest way to use that function is to pass the data as a sparseVector from Matrix:

val.csr = as(val, "RsparseMatrix")
w.val = val.csr@x
val.csr@x = rep(1, length(val.csr@x))

### TopN for row 1 in val
topN_new(model, as(val.csr[1L, , drop=FALSE], "sparseVector"),
         weight = w.val[seq(val.csr@p[1L] + 1L, val.csr@p[2L])])

Although you could also take the matrices and replace the values of $components from a WRMF object. These are available under model$matrices (A is the user factors, B is the item factors, and the biases are user_bias, item_bias, glob_mean).


dselivanov commented on May 27, 2024

Although you could also take the matrices and replace the values of $components from a WRMF object. These are available under model$matrices (A is the user factors, B is the item factors, and the biases are user_bias, item_bias, glob_mean)

But there is still an ALS step in which the item embeddings are fixed, and that should call the cmfrec solver under the hood.

map@k

Here is how I calculate map@k with cmfrec:

predict_cmfrec = function(model, X) {
  n = nrow(X)
  X = as(X, "RsparseMatrix")
  res = lapply(seq_len(n), function(i) {
    if (i %% 10 == 0) message(sprintf("%d/%d", i, n))
    x = as(X[i, , drop = FALSE], "sparseVector")
    w = x@x                     # keep the interaction values as weights
    x@x = rep(1, length(x@x))   # binarize the implicit feedback
    preds = topN_new(
      model,
      x,
      weight = w,
      exclude = x@i             # don't recommend already-seen items
    )
    preds
  })
  do.call(rbind, res)
}

preds = predict_cmfrec(model, val_split$train)
mean(rsparse::ap_k(preds, val_split$test))

For lastfm360 it looks like there is a moderate lift in map@10:

  • 0.2880954 without user and item biases
  • 0.298398 with user and item biases


dselivanov commented on May 27, 2024

@david-cortes also, when lambda > 0 I observe a very strong correlation between item bias and item popularity. Interestingly, when lambda is close to 0 this is not the case. I think there is a chance there are some issues in the code related to this.


david-cortes commented on May 27, 2024

I'll have to guess that this increment in MAP@10 is less than what you'd see from using k+2 factors, ergo not worth it.

The low correlation might be due to numerical instability when using a very small lambda. By default it uses a CG solver and then switches to Cholesky in the last iteration, so perhaps it'd look a bit better with finalize_chol=FALSE.
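For instance, reusing the objects from the earlier experiments (the model name and the small lambda value here are just placeholders):

set.seed(123)
m.check.bias <- CMF(Xcoo, weight=Xvalues, NA_as_zero=TRUE, k=10, verbose=FALSE,
                    center=TRUE, user_bias=TRUE, item_bias=TRUE,
                    lambda=1e-3, finalize_chol=FALSE)
cor(m.check.bias$matrices$item_bias, colSums(Xcoo))   ## item bias vs. raw item popularity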

Or perhaps it could be due to how you're measuring popularity. There is also a MostPopular model which will calculate only the biases, using their closed-form solution:

model = MostPopular(X, implicit=TRUE, lambda=0)


dselivanov commented on May 27, 2024

I'll have to guess that this increment in MAP@10 is less than what you'd see from using k+2 factors, ergo not worth it.

Well, it would rather be worth comparing to a model with k+1 factors. Also, biases should make a huge difference for users with few or no interactions.


dselivanov commented on May 27, 2024

What I've figured out so far (a small R sketch follows the list):

  • only the code for the rhs is affected

  • rhs = X * C_u * (p_u - x_biases) = X * eye * (0 - x_biases) + X * diag(1 + f(r_ui)) * (1 - x_biases), where the first term runs over the unobserved items and the second over the observed ones

  • rhs_init = X * eye * (0 - x_biases) = -X * x_biases, taken over all items, can be precomputed

  • then for each user we calculate

    • rhs = rhs_init + X.cols(idx_nnz) * x_biases(idx_nnz)  (removes the extra p=0 terms that the init added for the observed items)
    • rhs = rhs + X.cols(idx_nnz) * diag(confidence(idx_nnz)) * (1 - x_biases(idx_nnz))  (adds the p=1 terms)
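A small base-R sketch of that per-user update, with made-up objects (X = k x n_items item factors, x_biases = global + item bias per item, idx_nnz and confidence describing one user's observed items):

set.seed(42)
n_items <- 8; k <- 3
X <- matrix(rnorm(k * n_items), nrow = k)    ## k x n_items item factors (made-up values)
x_biases <- rnorm(n_items)
idx_nnz <- c(2, 5, 7)                        ## items observed for this user
confidence <- 1 + c(3, 1, 6)                 ## 1 + f(r_ui) for those items

rhs_init <- -X %*% x_biases                  ## precomputed over all items

rhs <- rhs_init + X[, idx_nnz] %*% x_biases[idx_nnz]                    ## remove p=0 terms
rhs <- rhs + X[, idx_nnz] %*% (confidence * (1 - x_biases[idx_nnz]))    ## add p=1 terms

## reference: build C_u and p_u explicitly
C_u <- rep(1, n_items); C_u[idx_nnz] <- confidence
p_u <- rep(0, n_items); p_u[idx_nnz] <- 1
rhs_ref <- X %*% (C_u * (p_u - x_biases))
all.equal(as.numeric(rhs), as.numeric(rhs_ref))   ## TRUE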


david-cortes commented on May 27, 2024

But that can be simplified further:

rhs = rhs_init + X_nnz * C_u - X_nnz * ((C_u-1) * (x_biases))
rhs = rhs_init + X_nnz * (C_u - (C_u-1) * x_biases)

That way you also avoid an extra matrix multiplication.
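Using the same made-up objects as in the sketch above, the simplified form can be checked as:

rhs_simplified <- rhs_init +
  X[, idx_nnz] %*% (confidence - (confidence - 1) * x_biases[idx_nnz])
all.equal(as.numeric(rhs_simplified), as.numeric(rhs_ref))   ## TRUE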


dselivanov commented on May 27, 2024

Yeah, I've done that in #54 . However, the results are quite different from cmfrec's, and map@k is significantly worse compared to the model without biases.
The user/item biases, on the other hand, are highly correlated with popularity...


david-cortes commented on May 27, 2024

By the way, it's also straightforward to add it to the CG method: you just need to modify the calculation for the first residual.

