
Comments (8)

ibayer commented on June 7, 2024

Try different values for the step_size parameter. Unfortunately, SGD suffers from vanishing or exploding gradients if the step_size is not chosen properly. I would recommend using the ALS solver, which is more stable and usually faster.
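
To illustrate the step_size sensitivity, here is a minimal sketch in plain Python. It is a toy one-parameter least-squares problem, not fastFM's SGD solver; the function name `sgd_linear` and the data are made up for illustration. With large feature values, a step size that looks harmless makes the weight blow up, while a much smaller one converges.

```python
import random

def sgd_linear(xs, ys, step_size, n_iter, seed=0):
    """Plain SGD on squared error for the model y ~ w * x (hypothetical helper)."""
    rng = random.Random(seed)
    w = 0.0
    for _ in range(n_iter):
        i = rng.randrange(len(xs))                 # pick one sample at random
        grad = 2.0 * (w * xs[i] - ys[i]) * xs[i]   # d/dw of (w*x - y)^2
        w -= step_size * grad
    return w

# Toy data with large feature values; the true weight is 3.
xs = [100.0, 200.0, 300.0]
ys = [3.0 * x for x in xs]

# Each update scales the error (w - 3) by (1 - 2 * step_size * x^2).
w_small = sgd_linear(xs, ys, step_size=1e-6, n_iter=2000)  # factors < 1: converges
w_large = sgd_linear(xs, ys, step_size=1e-3, n_iter=50)    # |factors| >= 19: explodes

print("small step_size:", w_small)
print("large step_size:", w_large)
```

The point is only that the usable step_size depends on the scale of the features, so it may need to be shrunk by several orders of magnitude before SGD behaves.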

from fastfm.

chialikuo commented on June 7, 2024

Thanks @ibayer.
A small step_size works. I had to shrink it to around 1e-7 for SGD to work properly. BTW, why do you think ALS is usually faster? It seems to me SGD is always faster.


ibayer commented on June 7, 2024

BTW, why do you think ALS is usually faster? It seems to me SGD is always faster.

In my experience, ALS needs less wall-clock time to converge to the same quality as SGD, even though one SGD iteration is much faster than one ALS iteration.


chialikuo commented on June 7, 2024

In my experience, ALS needs less wall-clock time to converge to the same quality as SGD, even though one SGD iteration is much faster than one ALS iteration.

What counts as one ALS iteration? Is it when every model parameter (i.e., each coordinate as in general coordinate descent) gets updated once?

I guess the actual time may depend on the data size and the particular problem (e.g., is convergence intrinsically difficult). On my fairly small, sparse data (~90000 instances, 100 effective non-zero dimensions), SGD takes about 1/2 to 2/3 of the clock time needed by ALS to converge to comparable quality, as measured on some hold-out data. Of course, SGD needs a much bigger n_iter though.


ibayer commented on June 7, 2024

What counts as one ALS iteration? Is it when every model parameter (i.e., each coordinate as in general coordinate descent) gets updated once?

That's the definition I'm using.
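
As a rough sketch of that definition, the snippet below runs coordinate-wise least squares on a tiny dense problem, where one "iteration" is one sweep that updates every coordinate exactly once. This is an illustration under simplifying assumptions (plain linear model, dense lists, the hypothetical helper `als_sweep`), not fastFM's actual ALS implementation for factorization machines.

```python
def als_sweep(X, y, w):
    """One sweep: update every coordinate of w exactly once, in order."""
    n, d = len(X), len(w)
    for j in range(d):
        num = 0.0
        den = 0.0
        for i in range(n):
            # Prediction for row i with coordinate j left out.
            pred_wo_j = sum(X[i][k] * w[k] for k in range(d) if k != j)
            num += X[i][j] * (y[i] - pred_wo_j)
            den += X[i][j] ** 2
        if den > 0.0:
            w[j] = num / den   # exact minimizer for coordinate j, others fixed
    return w

# Toy data generated exactly by w = [1, 2].
X = [[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]]
y = [1.0, 4.0, 3.0]

w = [0.0, 0.0]
for iteration in range(50):   # 50 iterations = 50 full sweeps
    als_sweep(X, y, w)

print(w)
```

Note that every coordinate update scans all rows, which is why a single sweep is far more expensive than a single SGD step on one sample.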

I guess the actual time may depend on the data size and the particular problem (e.g., is convergence intrinsically difficult).

Absolutely. It's also easy to construct a data set where SGD converges faster: just oversample the original data heavily. This shouldn't influence SGD at all, but it should slow down ALS quite a bit.
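
The oversampling argument can be sketched with a toy model. Assumptions: a one-parameter linear model, a fixed SGD update budget, and a full-pass solver whose cost is simply proportional to the number of rows it scans. The helpers `sgd_fit` and `full_pass_row_visits` are hypothetical and this is not a benchmark of fastFM.

```python
import random

def sgd_fit(xs, ys, step_size, n_updates, seed=0):
    """SGD for y ~ w * x; the cost is n_updates row visits, whatever len(xs) is."""
    rng = random.Random(seed)
    w = 0.0
    for _ in range(n_updates):
        i = rng.randrange(len(xs))
        w -= step_size * 2.0 * (w * xs[i] - ys[i]) * xs[i]
    return w

def full_pass_row_visits(xs, n_passes):
    """Rows scanned by a solver that visits every row once per pass."""
    return len(xs) * n_passes

xs = [1.0, 2.0, 3.0]
ys = [2.0 * x for x in xs]           # the true weight is 2
xs_big, ys_big = xs * 10, ys * 10    # every row oversampled 10 times

# Same update budget on both data sets: SGD reaches the same solution.
w_orig = sgd_fit(xs, ys, step_size=0.05, n_updates=500)
w_big = sgd_fit(xs_big, ys_big, step_size=0.05, n_updates=500)

# A full-pass solver pays 10x more per pass on the oversampled data.
visits_orig = full_pass_row_visits(xs, n_passes=1)
visits_big = full_pass_row_visits(xs_big, n_passes=1)

print(w_orig, w_big, visits_orig, visits_big)
```

Duplicating rows leaves the distribution SGD samples from unchanged, so its progress per update is the same, while anything that scans the whole data set per iteration gets proportionally slower.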

On my fairly small, sparse data (~90000 instances, 100 effective non-zero dimensions), SGD takes about 1/2 to 2/3 of the clock time needed by ALS to converge to comparable quality, as measured on some hold-out data. Of course, SGD needs a much bigger n_iter though.

I'm still surprised. The ALS solver is fairly optimized and carefully profiled; the SGD solver, not so much.
Are you using a public data set? Can you share the experiments?


chialikuo commented on June 7, 2024

I'm using data that is not public, and for various reasons I do not wish (and might not be allowed) to share it at the moment.

I just did some quick runs with the two approaches, without very thorough experiments, but I'm definitely willing to share them. When I have more time, I'll run a few more experiments, organize the results a bit, and share them here.


ibayer commented on June 7, 2024

That would be great.


ibayer commented on June 7, 2024

I'm closing this one, as the original issue seems to be fixed after changing the SGD step_size.

