Comments (6)
Hyperparameter tuning is more art than science. Make sure that you also tune `rank`, `n_iter`, and even `init_std` (not all are equally important).
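To make the tuning advice concrete, here is a minimal self-contained sketch of random search over a hyperparameter, using a plain NumPy ridge regression as a stand-in for a fastFM solver call (the data, the log-uniform search range, and the ridge stand-in are all assumptions for illustration; with fastFM's ALS solver you would search `rank`, `n_iter`, `init_std` and the regularization strengths in the same loop):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy regression data standing in for a rating-prediction task.
X = rng.normal(size=(200, 10))
w_true = rng.normal(size=10)
y = X @ w_true + rng.normal(scale=0.5, size=200)
X_tr, X_va, y_tr, y_va = X[:150], X[150:], y[:150], y[150:]

def fit_ridge(X, y, l2):
    # Closed-form ridge solution, a stand-in for one solver fit.
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + l2 * np.eye(d), X.T @ y)

def rmse(y_hat, y):
    return float(np.sqrt(np.mean((y_hat - y) ** 2)))

# Random search over the regularization strength on a held-out set;
# extend the sampled dict with rank / n_iter / init_std for a real FM.
best_score, best_l2 = np.inf, None
for _ in range(20):
    l2 = 10 ** rng.uniform(-3, 2)        # log-uniform draw
    w = fit_ridge(X_tr, y_tr, l2)
    score = rmse(X_va @ w, y_va)
    if score < best_score:
        best_score, best_l2 = score, l2

print(f"best validation RMSE {best_score:.3f} at l2={best_l2:.4f}")
```

Log-uniform sampling is the usual choice for regularization strengths because their useful values span several orders of magnitude.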
- The fact that MCMC has one `l2_reg_V` for each layer (rank) does give MCMC an intrinsic advantage. The code could be extended to have the same number of `l2_reg_V` parameters for ALS, but that would lead to a hard-to-tune model. Theoretically MCMC has other advantages (read about Bayesian linear regression), but how much of a difference that makes depends...
- Hard to say; the "bad MCMC" performance should depend a lot on `n_iter`, and this is certainly not the intended way to use MCMC, but if it works...
- Have a look at our paper "Sample selection for MCMC-based recommender systems"; you can get it here without the paywall.
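For readers unfamiliar with what "one `l2_reg_V` per layer" refers to: each column of the factor matrix `V` is one layer, and MCMC effectively learns a separate penalty for each, while ALS in fastFM exposes a single shared scalar. A self-contained sketch of the second-order FM prediction and such a per-layer penalty (all values here are random illustrative stand-ins, not fitted parameters):

```python
import numpy as np

rng = np.random.default_rng(1)
n_features, rank = 6, 3
x = rng.normal(size=n_features)

w0 = 0.1
w = rng.normal(size=n_features)
V = rng.normal(scale=0.1, size=(n_features, rank))  # one column per layer

# Second-order FM term via Rendle's O(k*n) identity:
# sum_{i<j} <v_i, v_j> x_i x_j = 0.5 * sum_f ((xV)_f^2 - (x^2 V^2)_f)
pairwise = 0.5 * np.sum((x @ V) ** 2 - (x ** 2) @ (V ** 2))
y_hat = w0 + w @ x + pairwise

# Per-layer regularization: one penalty weight per column of V
# (hypothetical values; MCMC samples these, ALS shares one scalar).
l2_per_layer = np.array([0.1, 0.5, 1.0])
penalty = float(np.sum(l2_per_layer * np.sum(V ** 2, axis=0)))
```

The identity in the comment is what makes FM training linear in the number of features, and the per-column `l2_per_layer` vector is exactly the extra flexibility the comment above attributes to MCMC.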
from fastfm.
- I am hoping to get a sense of how "bad" the bad-MCMC approach is. Depending a lot on `n_iter` still seems much easier to optimize than the multiple ALS parameters. However, if this is an intrinsically worse approach (loss of information, known weaknesses of the algorithm, etc.), I would expect that the seemingly better performance is more a reflection of (so far) insufficient ALS tuning.
- Thank you for the link. I look forward to reading the paper. However, the links from your university website all seem to lead to the paywall (perhaps you have an account/cookie that lets you bypass it)?
- My goal may be unusual: I am using RMSE to measure performance and tune the model, but I am actually most interested in `V_` and the rank-sized column for each feature. In the MovieLens context I can use these for user similarity and movie clustering, and in a Word2Vec context they can serve as word "embeddings". I would love to use MCMC for this (who doesn't want less tuning), but I am struggling to understand whether I need to switch over to ALS or SGD if all I care about is a simple single vector per feature.
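The similarity use case described here can be sketched with plain NumPy. The matrix below is a random stand-in for a fitted model's factor matrix `V_` (shape and orientation assumed: one rank-sized vector per feature); cosine similarity between those vectors gives the user/movie similarity the comment mentions:

```python
import numpy as np

rng = np.random.default_rng(2)
n_features, rank = 5, 4

# Stand-in for a fitted factor matrix: one rank-sized embedding per feature.
emb = rng.normal(size=(n_features, rank))

def cosine_sim(a, b):
    # Cosine of the angle between two embedding vectors, in [-1, 1].
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Similarity of feature 0 to every other feature.
sims = [cosine_sim(emb[0], emb[j]) for j in range(1, n_features)]
self_sim = cosine_sim(emb[0], emb[0])
```

Ranking features by this score yields nearest neighbors in the latent space, which is how the "embedding" reading of `V_` is typically used for clustering.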
- Use whatever gives you the best result.
- link updated
- Go with ALS; MCMC doesn't really make sense for user/item embeddings (you would get a different embedding for each iteration). You can use MCMC for the hyper-parameter search if that helps in your case. Calling `predict` on an MCMC model gives you an "ALS" prediction using the hyper-parameters from the last MCMC chain.
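The distinction being made here — a full MCMC prediction averages over the chain's parameter draws, while a point prediction uses a single parameter state — can be illustrated with a self-contained simulation (the Gaussian "posterior draws" below are a stand-in for the states a real Gibbs chain would produce, not fastFM output):

```python
import numpy as np

rng = np.random.default_rng(3)

# Simulated posterior draws of a weight vector, standing in for the
# parameter states an MCMC chain visits at each iteration.
n_draws, d = 200, 5
w_draws = rng.normal(loc=1.0, scale=0.3, size=(n_draws, d))
x = rng.normal(size=d)

preds = w_draws @ x                 # one prediction per draw

# MCMC-style estimate: average the prediction over all draws.
y_mcmc = float(np.mean(preds))

# Point-style estimate from a single parameter state (here the last
# draw): fast, but it inherits the draw-to-draw variability below.
y_last = float(preds[-1])

spread = float(np.std(preds))       # variability across single draws
```

The averaged estimate integrates out the parameter uncertainty; any single draw is a noisy sample from the predictive distribution, which also explains why each iteration would hand you a different embedding.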
- This seems a bit in conflict with the answer to Question 4, where ALS is suggested as superior in this situation.
- I've done a hard reload on your webpage and it still seems to be linking to the paywall.
4a) So you are saying `predict` on a well-tuned ALS model should represent the embeddings better than `predict` on MCMC, which must use the less meaningful parameters from the last MCMC chain?
4b) Is there any sort of convergence going on where the parameters at the current end of the MCMC chain approach the parameters we might get from ALS (or at least move towards stability)?
While I certainly will use whatever gives me the best results, I am new to the "art" of tuning the parameters. It isn't clear to me whether the superior MCMC results I see are because I need to invest more effort into tuning ALS, or whether MCMC can actually "beat" ALS even when used "improperly" (your response to Question 4 seems to suggest it shouldn't)?
Thanks for all your help.
> This seems a bit in conflict with the answer to Question 4 where ALS is suggested as superior in this situation.

No, I didn't say superior: "(you would get a different embedding for each iteration)".

> I've done a hard reload on your webpage and it still seems to be linking to the paywall.

Works for me; here is the link again: http://www.informatik.uni-konstanz.de/rendle/pub0/

> 4b) Is there any sort of convergence going on where the parameters at the current end of the MCMC chain approach the parameters we might get from ALS (or at least move towards stability)?

No, it's not even clear how to define the end of an MCMC chain. :)
Thanks, the paper link seems to work now (perhaps there was some caching earlier).
My apologies for the lack of familiarity with the terminology. Instead of "the end of an MCMC chain" I meant "the hyper-parameters from the last MCMC chain", which `predict` uses with MCMC. The docs seem to strongly caution against this approach: "This evaluation is fast but usually of low quality." I am therefore surprised to see results that consistently beat ALS with defaults (and some basic regularization tuning). I am trying to understand whether this means I need to tune ALS more (it seems like it should be able to beat "low quality" predictions) or whether the warning is giving me the wrong impression of the limitations of MCMC.