
Comments (15)

dselivanov commented on May 27, 2024

@david-cortes that's interesting information. Do biases help to reduce loss?

My intuition is that item biases should be highly correlated with item popularity. I would expect them to help significantly for users with few interactions.

As for centering - I'm not sure if it will help.


david-cortes commented on May 27, 2024

I'm realizing there might be a bug with the item biases in cmfrec. But here's a comparison with centering and user biases for now:

library(Matrix)
library(rsparse)
library(cmfrec)
data("movielens100k")

eval.full.loss <-  function(X, W, A, B, user_bias=NULL, item_bias=NULL, glob_mean=NULL) {
    Xdense <- as(X, "matrix")
    X@x <- W
    Wdense <- as(X, "matrix")
    pred <- A %*% t(B)
    if (!is.null(glob_mean))
        pred <- pred + glob_mean
    if (NROW(user_bias))
        pred <- pred + user_bias
    if (NROW(item_bias))
        pred <- pred + item_bias
    err <- Wdense * ((Xdense - pred)^2)
    return(mean(err))
}

Xcoo <- as(movielens100k, "TsparseMatrix")
Xvalues <- Xcoo@x
Xcoo@x <- rep(1, length(Xcoo@x))

set.seed(123)
m.nobias.nocenter <- CMF(Xcoo, weight=Xvalues, NA_as_zero=TRUE, k=10, verbose=FALSE,
                         center=FALSE, user_bias=FALSE, item_bias=FALSE)
eval.full.loss(Xcoo, Xvalues,
               t(m.nobias.nocenter$matrices$A),
               t(m.nobias.nocenter$matrices$B))



set.seed(123)
m.nobias.center <- CMF(Xcoo, weight=Xvalues, NA_as_zero=TRUE, k=10, verbose=FALSE,
                       center=TRUE, user_bias=FALSE, item_bias=FALSE)
eval.full.loss(Xcoo, Xvalues,
               t(m.nobias.center$matrices$A),
               t(m.nobias.center$matrices$B),
               glob_mean = m.nobias.center$matrices$glob_mean)


set.seed(123)
m.userbias.center <- CMF(Xcoo, weight=Xvalues, NA_as_zero=TRUE, k=10, verbose=FALSE,
                         center=TRUE, user_bias=TRUE, item_bias=FALSE)
eval.full.loss(Xcoo, Xvalues,
               t(m.userbias.center$matrices$A),
               t(m.userbias.center$matrices$B),
               user_bias = m.userbias.center$matrices$user_bias,
               glob_mean = m.userbias.center$matrices$glob_mean)

Results:

[1] 0.04399711
[1] 0.04234777
[1] 0.04188822

So the change doesn't look too big.


david-cortes commented on May 27, 2024

@dselivanov My bad, there's actually no bug in cmfrec, it's this loss function that was wrong. Doing the experiment again:

EDIT: I just realized that R doesn't broadcast a row vector when adding it to a matrix; here's the function redone and compared with all the biases:

eval.full.loss <- function(X, W, A, B, user_bias=NULL, item_bias=NULL, glob_mean=NULL) {
    Xdense <- as(X, "matrix")
    ## with NA_as_zero=TRUE, missing entries count as zeros with weight 1,
    ## so store W-1 in the sparse slot and add 1 back after densifying
    X@x <- W - 1
    Wdense <- as(X, "matrix") + 1
    pred <- A %*% t(B)
    if (!is.null(glob_mean))
        pred <- pred + glob_mean
    if (NROW(user_bias))
        pred <- pred + user_bias  ## recycles down columns: adds user_bias[i] to row i
    if (NROW(item_bias))          ## item biases must be added row-wise, hence 'byrow'
        pred <- pred + matrix(rep(item_bias, nrow(pred)), nrow=nrow(pred), byrow=TRUE)
    err <- Wdense * ((Xdense - pred)^2)
    return(mean(err))
}
Results:

[1] 0.0726521  ## no bias, no center
[1] 0.07276726 ## center
[1] 0.07216536 ## center + user bias
[1] 0.07158445 ## center + user bias + item bias
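As a side note (plain base R, not part of the model code), the row-vector point above can be seen directly: adding a length-ncol vector to a matrix recycles it column-wise rather than adding it to each row:

m <- matrix(0, nrow=2, ncol=3)
v <- c(10, 20, 30)
m + v                                                   ## recycled down the columns, NOT row-wise
m + matrix(rep(v, nrow(m)), nrow=nrow(m), byrow=TRUE)   ## adds v to each row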

So in the end the item biases do bring some small improvement in terms of loss, and my earlier tests were wrong (I was summing the biases incorrectly).


dselivanov commented on May 27, 2024

@david-cortes thanks for the example. I will try to play with cmfrec and user/item biases. Could you provide a link to the code where the user/item biases for implicit feedback are implemented?


david-cortes commented on May 27, 2024

It's implemented through different functions. First it has some aggregated steps like this:
https://github.com/david-cortes/cmfrec/blob/259057fcb59f2c0115f9737c6a18cbe1347925e9/src/collective.c#L7518

Then it calls the function factors_closed_form in a loop here:
https://github.com/david-cortes/cmfrec/blob/259057fcb59f2c0115f9737c6a18cbe1347925e9/src/common.c#L625
(the key parts there are the variables named "bias")

If using the CG method, that function will then end up calling this other one:
https://github.com/david-cortes/cmfrec/blob/259057fcb59f2c0115f9737c6a18cbe1347925e9/src/common.c#L1067

But overall, the idea is that you're solving a system like this:

solve(t(W*X)*X + diag(lambda),   t(W*X)*(Y - glob_bias - column_bias))

In which the RHS can be decomposed into some parts that apply to all rows and some parts that turn to zero for missing entries:

t(W*X)*Y - t((W-1)*X)*(glob_bias+column_bias) - t(X)*(glob_bias+column_bias)
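A minimal numeric check of that decomposition, in plain base R with made-up objects (here X stands for the item factors, W for the weights with 1 at missing entries, and the global and column biases are lumped into one vector):

set.seed(1)
n <- 6; k <- 3
X <- matrix(rnorm(n * k), n, k)          ## item factors (made-up values)
Y <- numeric(n); Y[c(2, 5)] <- 1         ## implicit targets: 1 for observed entries
W <- rep(1, n); W[c(2, 5)] <- 1 + 10     ## weight 1 for missing, 1 + confidence otherwise
bias <- rnorm(n)                         ## glob_bias + column_bias per item

lhs <- t(W * X) %*% (Y - bias)
rhs <- t(W * X) %*% Y -
       t((W - 1) * X) %*% bias -
       t(X) %*% bias
all.equal(as.numeric(lhs), as.numeric(rhs))   ## TRUE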


dselivanov commented on May 27, 2024

@david-cortes I have some challenges with cmfrec... It would be great if you could provide an example of how to predict the top-n items for new users. Here's a template:

library(Matrix)
library(rsparse)
library(cmfrec)
data(movielens100k)

set.seed(1)
# take 100 users for validation
i = sample(nrow(movielens100k), 100)

val = movielens100k[i, ]
train = movielens100k[-i, ]

# now mark 30% of the interactions as observed and
# 70% as unobserved - will evaluate map@k at these 70%
val_split = rsparse:::train_test_split(val, test_proportion = 0.7)
str(val_split)

List of 2
$ train:Formal class 'dgCMatrix' [package "Matrix"] with 6 slots
.. ..@ i : int [1:3039] 10 13 24 32 45 46 52 54 55 59 ...
.. ..@ p : int [1:1683] 0 13 17 20 26 30 31 42 53 61 ...
.. ..@ Dim : int [1:2] 100 1682
.. ..@ Dimnames:List of 2
.. .. ..$ : chr [1:100] "836" "679" "129" "930" ...
.. .. ..$ : chr [1:1682] "1" "2" "3" "4" ...
.. ..@ x : num [1:3039] 3 3 4 3 4 2 5 5 4 5 ...
.. ..@ factors : list()
$ test :Formal class 'dgCMatrix' [package "Matrix"] with 6 slots
.. ..@ i : int [1:6983] 1 3 5 6 9 11 14 30 31 33 ...
.. ..@ p : int [1:1683] 0 41 48 54 66 71 72 102 117 137 ...
.. ..@ Dim : int [1:2] 100 1682
.. ..@ Dimnames:List of 2
.. .. ..$ : chr [1:100] "836" "679" "129" "930" ...
.. .. ..$ : chr [1:1682] "1" "2" "3" "4" ...
.. ..@ x : num [1:6983] 3 3 4 3 5 4 5 4 3 3 ...
.. ..@ factors : list()

train = as(train, "TsparseMatrix")
w = train@x
train@x = rep(1, length(train@x))

model = CMF(train, 
            weight = w, 
            NA_as_zero = TRUE, 
            k = 10, 
            verbose = TRUE, 
            center = FALSE, 
            user_bias = FALSE, 
            item_bias = FALSE)

Now the question is: how can I use topN_new() to make predictions based on val_split$train and validate against val_split$test?


david-cortes commented on May 27, 2024

Sorry, there was a bug in topN that produced incorrect results when not using biases; it's fixed now.

The easiest way to use that function is to pass the data as a sparseVector from Matrix:

val.csr = as(val, "RsparseMatrix")
w.val = val.csr@x
val.csr@x = rep(1, length(val.csr@x))

### TopN for row 1 in val
topN_new(model, as(val.csr[1L, , drop=FALSE], "sparseVector"),
         weight = w.val[seq(val.csr@p[1L] + 1L, val.csr@p[2L])])

Although you could also take the matrices and replace the values of $components from a WRMF object. These are available under model$matrices (A is the user factors, B is the item factors, and the biases are user_bias, item_bias, glob_mean).


dselivanov commented on May 27, 2024

Although you could also take the matrices and replace the values of $components from a WRMF object. These are available under model$matrices (A is the user factors, B is the item factors, and the biases are user_bias, item_bias, glob_mean)

But there is still an ALS step in which the item embeddings are fixed, and that should call the cmfrec solver under the hood.

map@k

Here is how I calculate map@k with cmfrec:

predict_cmfrec = function(model, X) {
  n = nrow(X)
  X = as(X, "RsparseMatrix")
  res = lapply(seq_len(n), function(i) {
    if (i %% 10 == 0) message(sprintf("%d/%d", i, n))
    x = as(X[i, , drop = FALSE], "sparseVector")
    w = x@x                     # keep the interaction values as weights
    x@x = rep(1, length(x@x))   # binarize the implicit feedback
    preds = topN_new(
      model,
      x,
      weight = w,
      exclude = x@i             # don't recommend already-seen items
    )
    preds
  })
  do.call(rbind, res)
}

preds = predict_cmfrec(model, val_split$train)
mean(rsparse::ap_k(preds, val_split$test))

For lastfm360 it looks like there is a moderate lift in map@10:

  • 0.2880954 without user and item biases
  • 0.298398 with user and item biases


dselivanov commented on May 27, 2024

@david-cortes also, when lambda > 0 I observe a very strong correlation between item bias and item popularity. Interestingly, when lambda is close to 0 this is not the case. I think there is a chance there are some issues in the code related to this.


david-cortes commented on May 27, 2024

I'll have to guess that this increment in MAP@10 is less than what you'd see from using k+2 factors, ergo not worth it.

The low correlation might be due to numerical instability when using a very small lambda. By default it uses a CG solver and then switches to Cholesky in the last iteration, so perhaps it'd look a bit better with finalize_chol=FALSE.
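For instance, reusing the objects from the earlier experiments (the model name and the small lambda value here are just placeholders):

set.seed(123)
m.check.bias <- CMF(Xcoo, weight=Xvalues, NA_as_zero=TRUE, k=10, verbose=FALSE,
                    center=TRUE, user_bias=TRUE, item_bias=TRUE,
                    lambda=1e-3, finalize_chol=FALSE)
cor(m.check.bias$matrices$item_bias, colSums(Xcoo))   ## item bias vs. raw item popularity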

Or perhaps it could be due to how you're measuring popularity. There is also a MostPopular model which will calculate only the biases, using their closed-form solution:

model = MostPopular(X, implicit=TRUE, lambda=0)


dselivanov commented on May 27, 2024

I'll have to guess that this increment in MAP@10 is less than what you'd see from using k+2 factors, ergo not worth it.

Well, it would rather be worth comparing to a model with k+1 factors. Also, biases should make a huge difference for users with few or no interactions.


dselivanov commented on May 27, 2024

What I've figured out so far (a small R sketch follows the list):

  • only the code for the rhs is affected

  • rhs = X * C_u * (p_u - x_biases) = X * eye * (0 - x_biases) + X * diag(1 + f(r_ui)) * (1 - x_biases), where the first term runs over the unobserved items and the second over the observed ones

  • rhs_init = X * eye * (0 - x_biases) = -X * x_biases, taken over all items, can be precomputed

  • then for each user we calculate

    • rhs = rhs_init + X.cols(idx_nnz) * x_biases(idx_nnz)  (removes the extra p=0 terms that the init added for the observed items)
    • rhs = rhs + X.cols(idx_nnz) * diag(confidence(idx_nnz)) * (1 - x_biases(idx_nnz))  (adds the p=1 terms)
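A small base-R sketch of that per-user update, with made-up objects (X = k x n_items item factors, x_biases = global + item bias per item, idx_nnz and confidence describing one user's observed items):

set.seed(42)
n_items <- 8; k <- 3
X <- matrix(rnorm(k * n_items), nrow = k)    ## k x n_items item factors (made-up values)
x_biases <- rnorm(n_items)
idx_nnz <- c(2, 5, 7)                        ## items observed for this user
confidence <- 1 + c(3, 1, 6)                 ## 1 + f(r_ui) for those items

rhs_init <- -X %*% x_biases                  ## precomputed over all items

rhs <- rhs_init + X[, idx_nnz] %*% x_biases[idx_nnz]                    ## remove p=0 terms
rhs <- rhs + X[, idx_nnz] %*% (confidence * (1 - x_biases[idx_nnz]))    ## add p=1 terms

## reference: build C_u and p_u explicitly
C_u <- rep(1, n_items); C_u[idx_nnz] <- confidence
p_u <- rep(0, n_items); p_u[idx_nnz] <- 1
rhs_ref <- X %*% (C_u * (p_u - x_biases))
all.equal(as.numeric(rhs), as.numeric(rhs_ref))   ## TRUE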


david-cortes commented on May 27, 2024

But that can be simplified further:

rhs = rhs_init + X_nnz * C_u - X_nnz * ((C_u-1) * (x_biases))
rhs = rhs_init + X_nnz * (C_u - (C_u-1) * x_biases)

That way you also avoid an extra matrix multiplication.
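Using the same made-up objects as in the sketch above, the simplified form can be checked as:

rhs_simplified <- rhs_init +
  X[, idx_nnz] %*% (confidence - (confidence - 1) * x_biases[idx_nnz])
all.equal(as.numeric(rhs_simplified), as.numeric(rhs_ref))   ## TRUE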


dselivanov commented on May 27, 2024

Yeah, I've done that in #54 . However, the results are quite different from cmfrec's, and map@k is significantly worse compared to the model without biases.
The user/item biases, on the other hand, are highly correlated with popularity...


david-cortes commented on May 27, 2024

By the way, it's also straightforward to add it to the CG method: you just need to modify the calculation for the first residual.

