Comments (6)
Hyperparameter tuning is more art than science. Make sure that you also tune `rank`, `n_iter`, and even `init_std` (not all are equally important).
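To make the tuning advice concrete, here is a minimal self-contained sketch of random search over a hyperparameter, using a plain NumPy ridge regression as a stand-in for a fastFM solver call (the data, the log-uniform search range, and the ridge stand-in are all assumptions for illustration; with fastFM's ALS solver you would search `rank`, `n_iter`, `init_std` and the regularization strengths in the same loop):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy regression data standing in for a rating-prediction task.
X = rng.normal(size=(200, 10))
w_true = rng.normal(size=10)
y = X @ w_true + rng.normal(scale=0.5, size=200)
X_tr, X_va, y_tr, y_va = X[:150], X[150:], y[:150], y[150:]

def fit_ridge(X, y, l2):
    # Closed-form ridge solution, a stand-in for one solver fit.
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + l2 * np.eye(d), X.T @ y)

def rmse(y_hat, y):
    return float(np.sqrt(np.mean((y_hat - y) ** 2)))

# Random search over the regularization strength on a held-out set;
# extend the sampled dict with rank / n_iter / init_std for a real FM.
best_score, best_l2 = np.inf, None
for _ in range(20):
    l2 = 10 ** rng.uniform(-3, 2)        # log-uniform draw
    w = fit_ridge(X_tr, y_tr, l2)
    score = rmse(X_va @ w, y_va)
    if score < best_score:
        best_score, best_l2 = score, l2

print(f"best validation RMSE {best_score:.3f} at l2={best_l2:.4f}")
```

Log-uniform sampling is the usual choice for regularization strengths because their useful values span several orders of magnitude.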
- The fact that MCMC has one `l2_reg_V` for each layer (rank) does give MCMC an intrinsic advantage. The code could be extended to have the same number of `l2_reg_V` parameters for ALS, but that would lead to a hard-to-tune model. Theoretically MCMC has other advantages (read about Bayesian linear regression), but how much of a difference that makes depends...
- Hard to say; the "bad MCMC" performance should depend a lot on `n_iter`, and this is certainly not the intended way to use MCMC, but if it works...
- Have a look at our paper "Sample selection for MCMC-based recommender systems"; you can get it here without the paywall.
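For readers unfamiliar with what "one `l2_reg_V` per layer" refers to: each column of the factor matrix `V` is one layer, and MCMC effectively learns a separate penalty for each, while ALS in fastFM exposes a single shared scalar. A self-contained sketch of the second-order FM prediction and such a per-layer penalty (all values here are random illustrative stand-ins, not fitted parameters):

```python
import numpy as np

rng = np.random.default_rng(1)
n_features, rank = 6, 3
x = rng.normal(size=n_features)

w0 = 0.1
w = rng.normal(size=n_features)
V = rng.normal(scale=0.1, size=(n_features, rank))  # one column per layer

# Second-order FM term via Rendle's O(k*n) identity:
# sum_{i<j} <v_i, v_j> x_i x_j = 0.5 * sum_f ((xV)_f^2 - (x^2 V^2)_f)
pairwise = 0.5 * np.sum((x @ V) ** 2 - (x ** 2) @ (V ** 2))
y_hat = w0 + w @ x + pairwise

# Per-layer regularization: one penalty weight per column of V
# (hypothetical values; MCMC samples these, ALS shares one scalar).
l2_per_layer = np.array([0.1, 0.5, 1.0])
penalty = float(np.sum(l2_per_layer * np.sum(V ** 2, axis=0)))
```

The identity in the comment is what makes FM training linear in the number of features, and the per-column `l2_per_layer` vector is exactly the extra flexibility the comment above attributes to MCMC.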
from fastfm.
- I am hoping to get a sense of how "bad" the bad-MCMC approach is. Depending a lot on `n_iter` still seems much easier to optimize than the multiple ALS parameters. However, if this is an intrinsically worse approach (loss of information, known weaknesses of the algorithm, etc.), I would expect that the seemingly better performance is more a reflection of (so far) insufficient ALS tuning.
- Thank you for the link. I look forward to reading the paper. However, the links from your university website all seem to lead to the paywall (perhaps you have an account/cookie that lets you bypass it)?
- My goal may be unusual: I am using RMSE to measure performance and tune the model, but I am actually most interested in `V_` and the rank-sized column for each feature. In the MovieLens context I can use these for user similarity and movie clustering, and in a Word2Vec context they can serve as word "embeddings". I would love to use MCMC for this (who doesn't want less tuning), but I am struggling to understand whether I need to switch over to ALS or SGD if all I care about is a simple single vector per feature.
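The similarity use case described here can be sketched with plain NumPy. The matrix below is a random stand-in for a fitted model's factor matrix `V_` (shape and orientation assumed: one rank-sized vector per feature); cosine similarity between those vectors gives the user/movie similarity the comment mentions:

```python
import numpy as np

rng = np.random.default_rng(2)
n_features, rank = 5, 4

# Stand-in for a fitted factor matrix: one rank-sized embedding per feature.
emb = rng.normal(size=(n_features, rank))

def cosine_sim(a, b):
    # Cosine of the angle between two embedding vectors, in [-1, 1].
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Similarity of feature 0 to every other feature.
sims = [cosine_sim(emb[0], emb[j]) for j in range(1, n_features)]
self_sim = cosine_sim(emb[0], emb[0])
```

Ranking features by this score yields nearest neighbors in the latent space, which is how the "embedding" reading of `V_` is typically used for clustering.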
- Use whatever gives you the best result.
- link updated
- Go with ALS; MCMC doesn't really make sense for user/item embeddings (you would get a different embedding for each iteration). You can use MCMC for the hyper-parameter search if that helps in your case. Calling `predict` on an MCMC model gives you an "ALS" prediction using the hyper-parameters from the last MCMC chain.
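The distinction being made here — a full MCMC prediction averages over the chain's parameter draws, while a point prediction uses a single parameter state — can be illustrated with a self-contained simulation (the Gaussian "posterior draws" below are a stand-in for the states a real Gibbs chain would produce, not fastFM output):

```python
import numpy as np

rng = np.random.default_rng(3)

# Simulated posterior draws of a weight vector, standing in for the
# parameter states an MCMC chain visits at each iteration.
n_draws, d = 200, 5
w_draws = rng.normal(loc=1.0, scale=0.3, size=(n_draws, d))
x = rng.normal(size=d)

preds = w_draws @ x                 # one prediction per draw

# MCMC-style estimate: average the prediction over all draws.
y_mcmc = float(np.mean(preds))

# Point-style estimate from a single parameter state (here the last
# draw): fast, but it inherits the draw-to-draw variability below.
y_last = float(preds[-1])

spread = float(np.std(preds))       # variability across single draws
```

The averaged estimate integrates out the parameter uncertainty; any single draw is a noisy sample from the predictive distribution, which also explains why each iteration would hand you a different embedding.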
- This seems a bit in conflict with the answer to Question 4, where ALS is suggested as superior in this situation.
- I've done a hard reload on your webpage and it still seems to be linking to the paywall.
4a) So you are saying `predict` on a well-tuned ALS model should represent the embeddings better than `predict` on MCMC, which must use the less meaningful parameters from the last MCMC chain?
4b) Is there any sort of convergence going on where the parameters at the current end of the MCMC chain approach the parameters we might get from ALS (or at least move towards stability)?
While I certainly will use whatever gives me the best results, I am new to the "art" of tuning the parameters. It isn't clear to me whether the superior MCMC results I see are because I need to invest more effort into tuning ALS, or whether MCMC can actually "beat" ALS even when used "improperly" (your response to Question 4 seems to suggest it shouldn't)?
Thanks for all your help.
> This seems a bit in conflict with the answer to Question 4 where ALS is suggested as superior in this situation.

No, I didn't say superior: "(you would get a different embedding for each iteration)".

> I've done a hard reload on your webpage and it still seems to be linking to the paywall.

Works for me; here is the link again: http://www.informatik.uni-konstanz.de/rendle/pub0/

> 4b) Is there any sort of convergence going on where the parameters at the current end of the MCMC chain approach the parameters we might get from ALS (or at least move towards stability)?

No, it's not even clear how to define the end of an MCMC chain. :)
Thanks, the paper link seems to work now (perhaps there was some caching earlier).
My apologies for the lack of familiarity with the terminology. Instead of "the end of an MCMC chain" I meant "the hyper-parameters from the last MCMC chain", which `predict` uses with MCMC. The docs seem to strongly caution against this approach: "This evaluation is fast but usually of low quality." I am therefore surprised to see results that consistently beat ALS with defaults (and some basic regularization tuning). I am trying to understand whether this means I need to tune ALS more (it seems like it should be able to beat "low quality" predictions) or whether the warning is giving me the wrong impression of the limitations of MCMC.