
Grow & Drop Ambiguity about rigl (closed)

google-research commented on August 27, 2024
Grow & Drop Ambiguity


Comments (6)

verbose-void commented on August 27, 2024

Also, I have another question regarding dense & sparse gradient calculations:

For your drop & grow code, it seems the scores are derived from a normal dense backprop calculation that is multiplied by the mask after the fact. Is this masked gradient also used for the sparse parameter update?

The way I have it implemented, I'm using a backward hook that applies the mask to each layer's gradients before they are passed on (the unmasked gradient matrix is copied and used to score the drops/grows). In other words, the mask is applied during the backward pass (affecting downstream gradients), as opposed to the "post-calculation" method you implemented.

Were you aware this is possible and deliberately chose the "post-calc" method? Do you think that could be a source of the accuracy loss I'm seeing?
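
For concreteness, here's a minimal PyTorch sketch of the hook idea I described (illustrative only, not my actual code; the mask here is a made-up fixed 0/1 tensor):

```python
import torch
import torch.nn as nn

# Keep a dense copy of the weight gradient for drop/grow scoring, and mask the
# gradient that is actually used for the parameter update.
layer = nn.Linear(128, 64)
mask = (torch.rand_like(layer.weight) < 0.1).float()  # hypothetical ~10%-dense mask

dense_grads = {}  # unmasked gradients, used only for scoring drops/grows

def mask_grad(grad):
    dense_grads["weight"] = grad.detach().clone()  # dense copy for scoring
    return grad * mask                             # masked gradient for the update

layer.weight.register_hook(mask_grad)

# After loss.backward(), layer.weight.grad is masked, while
# dense_grads["weight"] still holds the full dense gradient.
```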


evcu commented on August 27, 2024

Hi,

I realize now we have a typo/mistake in the notation. It should be the remaining connections that we exclude, not the dropped ones. In other words, we don't want to select existing connections (L286). Thanks for noticing this; we will fix it in the paper, and sorry for the confusion.

Yes, indeed it is: (1) yes, (2) no. For the second one we tried re-initialization, but it either didn't make a difference or harmed performance (I don't remember exactly). You can try it yourself using the `reinit_when_same` flag (L307).

In this implementation, gradients are sparse by default since the graph uses w*m to begin with. To obtain dense gradients we take the gradient with respect to masked_weights = w*m (see L556). What do you mean by the layer's gradient? The activation gradients/error signals that are passed to the previous layer are not masked. The weights can be masked during or after backprop; it makes no difference.
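
For illustration, the same idea in a minimal PyTorch sketch (the repo itself is TensorFlow, so this is an analogy, not the repo's code):

```python
import torch
import torch.nn.functional as F

# The forward pass uses masked_weight = weight * mask, so weight.grad is sparse,
# while the gradient with respect to masked_weight is dense and can be used to
# score the grow step.
weight = torch.randn(64, 128, requires_grad=True)
mask = (torch.rand_like(weight) < 0.1).float()   # hypothetical sparsity mask
x, target = torch.randn(32, 128), torch.randn(32, 64)

masked_weight = weight * mask
masked_weight.retain_grad()                      # keep the grad of this intermediate tensor

loss = F.mse_loss(F.linear(x, masked_weight), target)
loss.backward()

dense_grad = masked_weight.grad                  # dense: defined at masked-out positions too
sparse_grad = weight.grad                        # equals dense_grad * mask by the chain rule
```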

Let me know if you have other questions and feel free to reach out through e-mail, too. I can look at your code if needed. I would be happy to add your implementation to the README.md once you have it ready.

Best


verbose-void commented on August 27, 2024

Thanks for the quick reply! I've never received a reply from an author about their work so I really appreciate your involvement :)

I think your notation was without error: you explicitly said the indices returned by ArgTopK should not be within the set $\theta \setminus \mathbb{I}_\text{drop}$, which you further mentioned is the set of active connections remaining.

It only really got me thinking after reading Algorithm 1:

[Screenshot: Algorithm 1 from the paper]

Since you do the parameter updates AFTER the grow selection, I thought there was no way you could gather your grow-selection candidates after the drop phase, because the algorithm explicitly does these updates after both the drop and grow selections. There is technically no problem here, as you can update the masks before ever touching the parameters (like you did in your code), though I had to read the code to clear that up for myself.
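
For concreteness, the ordering I now understand is roughly the following (an illustrative sketch, not the repo's TF code):

```python
import torch

def update_mask(weight, mask, dense_grad, k):
    """Drop & grow only touch the mask, so they can happen before the
    parameter update ever touches the weights."""
    new_mask = mask.clone().flatten()

    # Drop: deactivate the k active connections with the smallest |weight|.
    drop_scores = torch.where(mask.bool(), weight.abs(),
                              torch.full_like(weight, float("inf"))).flatten()
    new_mask[torch.topk(drop_scores, k, largest=False).indices] = 0.0

    # Grow: activate the k connections with the largest |dense gradient| among
    # those that are inactive *after* the drop step (so just-dropped connections
    # are valid candidates).
    grow_scores = torch.where(new_mask.view_as(weight).bool(),
                              torch.full_like(weight, float("-inf")),
                              dense_grad.abs()).flatten()
    new_mask[torch.topk(grow_scores, k, largest=True).indices] = 1.0

    return new_mask.view_as(mask)
```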

Poking through your code more, I was able to draw the same conclusions, but the gradient question is still unclear to me. I will shoot you an email shortly; thanks for being awesome!


evcu commented on August 27, 2024

That's true, false alarm 👍

Yes, we either do the mask update or the gradient update. Let's continue offline.


varun19299 commented on August 27, 2024

To confirm, the paper's $i \notin \theta \setminus \mathbb{I}_\text{drop}$ is correct? i.e., we can grow connections from the dropped set.

[Screenshot: the grow criterion from the paper]

(no TeX support in GitHub :( )


verbose-void commented on August 27, 2024

> To confirm, the paper's $i \notin \theta \setminus \mathbb{I}_\text{drop}$ is correct? i.e., we can grow connections from the dropped set.
>
> [Screenshot: the grow criterion from the paper]
>
> (no TeX support in GitHub :( )

It IS correct, although slightly ambiguous. The "$i$ cannot be within the set $\theta \setminus \mathbb{I}_\text{drop}$" notation is slightly misleading, but all it means is that growing cannot select connections that still exist AFTER dropping has been done.
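
Spelled out, my reading of the grow step is (reconstructing the paper's notation, so take the exact symbols with a grain of salt):

$$\mathbb{I}_\text{grow} = \operatorname{ArgTopK}_{\,i \,\notin\, \theta \setminus \mathbb{I}_\text{drop}}\big(\,|\nabla_\theta L_t|,\ k\big)$$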

Growing CAN pick connections that were just dropped, and in the TensorFlow source code, when this happens the weights are left alone. The paper makes no reference to this, and before I read the source code I was confused about how those dropped-and-regrown connections would be handled.
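
In code terms, my understanding of that behaviour is roughly this (an illustrative sketch of my reading, not the actual TF code; I'm assuming newly activated connections that were not previously active are zero-initialised, as in the paper):

```python
import torch

def apply_grow_init(weight, old_mask, new_mask):
    """A connection that was active before this step and is regrown after being
    dropped keeps its old value; only connections that were inactive before the
    step and are now active get zero-initialised."""
    freshly_grown = new_mask.bool() & ~old_mask.bool()
    return torch.where(freshly_grown, torch.zeros_like(weight), weight)
```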

